@auditionhub/voice-detector
v0.1.4
Real-time line completion detection for browser apps (Web Speech default, Vosk-WASM fallback)
Real-time line completion detection for spoken lines in the browser. Detect when a user has finished a line using pause duration, fuzzy suffix matching, or AI-powered semantic verification. Optimized for low latency with support for AssemblyAI streaming transcription and Gemini semantic analysis.
Install
npm install @auditionhub/voice-detector
🔒 Security Best Practices
Important: API keys should never be exposed to the browser! For production applications:
- AssemblyAI: Generate temporary tokens server-side (see example below)
- Gemini: Use geminiProxyUrl to proxy API calls through your server
- See Remix Example for a complete secure implementation
// ✅ GOOD: Server generates tokens, client uses proxy
const detector = new LineDetector({
assemblyAiToken: await fetchTokenFromServer(),
geminiProxyUrl: '/api/gemini-proxy',
})
// ❌ BAD: Direct API keys in browser
const detector = new LineDetector({
assemblyAiApiKey: 'secret-key', // ⚠️ Exposed to browser!
geminiApiKey: 'secret-key', // ⚠️ Exposed to browser!
})
Basic usage
Simple Web Speech (Default)
import { LineDetector } from '@auditionhub/voice-detector'
const detector = new LineDetector({
pauseMs: 800,
suffixWords: 3,
useWebSpeech: true,
})
await detector.init()
detector.on('lineComplete', (ev) => {
console.log('Line complete:', ev.reason, ev.elapsedMs)
})
await detector.startLine({ text: 'I think we should head back now.' })
With AssemblyAI + Gemini (Recommended)
import { LineDetector } from '@auditionhub/voice-detector'
const detector = new LineDetector({
assemblyAiToken: 'your-temporary-token', // Generate server-side for browser
geminiApiKey: 'your-gemini-key', // In the browser, use geminiProxyUrl instead (see Security Best Practices)
semanticThreshold: 0.75,
enableSemanticMatching: true,
pauseMs: 800,
})
await detector.init()
detector.on('semanticMatch', (ev) => {
console.log('Match probability:', ev.probability)
console.log('Reason:', ev.reason)
})
detector.on('lineComplete', (ev) => {
console.log('Line complete:', ev.reason, ev.elapsedMs)
console.log('Semantic score:', ev.semanticProbability)
})
await detector.startLine({ text: 'To be or not to be, that is the question.' })
Configuration
Core Options
- pauseMs: number (default 800) – silence duration to trigger completion
- suffixWords: number (default 3) – trailing words to compare for suffix match
- suffixMaxDist: number (default 2) – max edit distance for suffix match
- baseWPS: number (default 2.7) – initial words/sec for timing estimate
- maxMultiplier: number (default 2.0) – timeout multiplier cap
- useWebSpeech: boolean (default true) – prefer Web Speech API locally
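To make suffixWords and suffixMaxDist concrete, here is a minimal sketch of one way a fuzzy suffix match could work. It is not the library's internal implementation: the helper names (normalizeWords, editDistance, suffixMatches) are hypothetical, and it assumes the edit distance is measured in characters over the joined trailing words of English text.

```typescript
// Hypothetical helpers for illustration; not part of @auditionhub/voice-detector.

/** Lowercase, replace punctuation with spaces, and split into words (ASCII only). */
function normalizeWords(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s']/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

/** Levenshtein edit distance between two strings, using a single rolling row. */
function editDistance(a: string, b: string): number {
  const row: number[] = [];
  for (let j = 0; j <= b.length; j++) row.push(j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // distance for (i-1, j-1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // distance for (i-1, j)
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

/**
 * True when the last `suffixWords` words of the transcript are within
 * `suffixMaxDist` character edits of the last `suffixWords` words of the target.
 */
function suffixMatches(
  target: string,
  transcript: string,
  suffixWords = 3,
  suffixMaxDist = 2,
): boolean {
  if (suffixWords <= 0) return false;
  const want = normalizeWords(target).slice(-suffixWords);
  const got = normalizeWords(transcript).slice(-suffixWords);
  if (want.length === 0 || got.length < want.length) return false;
  return editDistance(want.join(" "), got.join(" ")) <= suffixMaxDist;
}
```

Under these assumptions, a transcript ending in "head bak now" would still match a target ending in "head back now.", because the two suffixes differ by a single character edit.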
AssemblyAI Options
- assemblyAiToken: string (optional) – Temporary token for AssemblyAI real-time streaming (required for browser)
- assemblyAiApiKey: string (optional) – API key for AssemblyAI (Node.js only, not for browser)
- Important: For browser usage, you MUST use assemblyAiToken instead of assemblyAiApiKey. Generate tokens server-side:
// Server-side (Node.js)
import { AssemblyAI } from 'assemblyai'
const client = new AssemblyAI({ apiKey: 'your-api-key' })
const token = await client.streaming.createTemporaryToken({
expires_in_seconds: 480,
})
// Return token to client
- Get your API key: https://www.assemblyai.com
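On the client side, the token has to be fetched from your own server and replaced once it is close to expiring. The sketch below shows one possible shape for that. Everything in it is an assumption rather than part of the package: the /api/assemblyai-token route, the { token } response body, the injected fetchImpl parameter, and the 30-second safety margin.

```typescript
// Hypothetical client-side helpers; the endpoint path and the response shape
// ({ token: string }) are assumptions about your own server, not the package.

type FetchLike = (
  url: string,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

/** Ask your own server for a short-lived AssemblyAI token. */
function fetchTokenFromServer(
  fetchImpl: FetchLike,
  endpoint = "/api/assemblyai-token", // assumed route on your server
): Promise<string> {
  return fetchImpl(endpoint)
    .then((res) => {
      if (!res.ok) throw new Error("Token request failed: " + res.status);
      return res.json();
    })
    .then((body) => {
      const token = (body as { token?: unknown }).token;
      if (typeof token !== "string" || token.length === 0) {
        throw new Error("Token response did not contain a token");
      }
      return token;
    });
}

/**
 * True when a token issued at `issuedAtMs` is close enough to expiry that a
 * fresh one should be requested. `ttlSeconds` mirrors the expires_in_seconds
 * value used when the token was created on the server.
 */
function tokenNeedsRefresh(
  issuedAtMs: number,
  nowMs: number,
  ttlSeconds = 480,
  safetyMarginMs = 30000,
): boolean {
  return nowMs - issuedAtMs >= ttlSeconds * 1000 - safetyMarginMs;
}
```

In a browser you would typically pass the global fetch as fetchImpl and hand the resolved string to the assemblyAiToken option.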
Gemini Options
- geminiApiKey: string (optional) – API key for Gemini 2.5 Flash Lite semantic matching
- geminiProxyUrl: string (optional) – Recommended – Server-side proxy URL for Gemini API calls (keeps API key secure)
- semanticThreshold: number (default 0.75) – probability threshold for semantic completion (0-1)
- enableSemanticMatching: boolean (default true) – enable semantic verification
Security Best Practice: Use geminiProxyUrl instead of geminiApiKey in
browser applications to keep your API key secure on the server. See
Remix Example for a complete implementation.
Get your key: https://aistudio.google.com
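A proxy behind geminiProxyUrl mainly needs to attach the API key on the server before forwarding the call. The sketch below builds such an upstream request against the public Gemini REST endpoint. The payload the library actually POSTs to geminiProxyUrl is not documented here, so the prompt is treated as an opaque string; buildGeminiRequest and UpstreamRequest are hypothetical names.

```typescript
// Hypothetical server-side helper; not part of @auditionhub/voice-detector.
// Assumes your route handler has already extracted a prompt string from the
// incoming proxy request.

const GEMINI_ENDPOINT =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-lite:generateContent";

interface UpstreamRequest {
  url: string;
  init: { method: string; headers: { [name: string]: string }; body: string };
}

/** Build the upstream Gemini call; the API key is added here, on the server. */
function buildGeminiRequest(apiKey: string, prompt: string): UpstreamRequest {
  if (!apiKey) throw new Error("Gemini API key is not configured on the server");
  return {
    url: GEMINI_ENDPOINT,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-goog-api-key": apiKey, // stays on the server, never sent to the browser
      },
      body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
    },
  };
}
```

A route handler could then call fetch(request.url, request.init) with the key read from a server-side environment variable and relay the JSON response to the client.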
Legacy Options
- remoteURL: string | null – optional remote STT (future)
Events
- lineStart: { text } – Fired when a new line starts
- lineComplete: { reason: 'pause' | 'suffix' | 'semantic', elapsedMs, transcript?, avgWPS?, semanticProbability?, semanticReason? } – Fired when line is successfully detected
- lineTimeout: { reason: 'timeout', elapsedMs, transcript?, avgWPS?, semanticProbability?, semanticReason? } – Fired if no completion detected in time
- transcript: { text, isFinal } – Fired on each transcript update
- semanticMatch: { probability, reason } – Fired when semantic match is calculated (Gemini only)
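Since lineComplete and lineTimeout share most of their fields, a handler can treat them as one discriminated union keyed on reason. The sketch below transcribes the payload fields from the event list above; the type names and the describeOutcome helper are illustrative and are not exported by the package.

```typescript
// Payload fields come from the event list above; the type names are illustrative.
interface CompletionDetails {
  elapsedMs: number;
  transcript?: string;
  avgWPS?: number;
  semanticProbability?: number;
  semanticReason?: string;
}
interface LineCompleteEvent extends CompletionDetails {
  reason: "pause" | "suffix" | "semantic";
}
interface LineTimeoutEvent extends CompletionDetails {
  reason: "timeout";
}

/** One-line summary suitable for logging or a debug overlay. */
function describeOutcome(ev: LineCompleteEvent | LineTimeoutEvent): string {
  const seconds = (ev.elapsedMs / 1000).toFixed(1);
  const score =
    ev.semanticProbability === undefined
      ? ""
      : " (semantic " + Math.round(ev.semanticProbability * 100) + "%)";
  if (ev.reason === "timeout") {
    return "timed out after " + seconds + "s" + score;
  }
  return "completed by " + ev.reason + " after " + seconds + "s" + score;
}
```

The same function could be registered for both events, for example detector.on('lineComplete', (ev) => console.log(describeOutcome(ev))).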
Speech-to-Text Adapters
The library supports multiple STT backends with automatic fallback:
AssemblyAI (recommended): Real-time streaming transcription with WebSocket
- Best accuracy and latency
- Requires temporary token (for browser) or API key (for Node.js)
- Generate tokens server-side for browser security
- Automatic fallback if unavailable
Web Speech API: Browser-native STT (default fallback)
- No API key required
- Good for simple use cases
- Browser support varies
Vosk-WASM (offline, WIP): Local on-device transcription
- No internet required
- Scaffolding included; full integration pending
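The fallback behaviour described above can be pictured as building an ordered list of candidate backends and trying them in turn. The following sketch is only a guess at that ordering: the adapter names, the Environment fields, and chooseAdapters are all hypothetical, and the library may select backends differently.

```typescript
// Hypothetical selection logic; adapter names and ordering are assumptions.
type AdapterName = "assemblyai" | "webspeech" | "vosk";

interface Environment {
  assemblyAiToken?: string; // temporary token, if one was provided
  useWebSpeech: boolean;    // the useWebSpeech option
  hasWebSpeech: boolean;    // whether the browser exposes a SpeechRecognition API
  hasVoskModel: boolean;    // whether an offline model is available
}

/** Return the backends worth trying, most preferred first. */
function chooseAdapters(env: Environment): AdapterName[] {
  const order: AdapterName[] = [];
  if (env.assemblyAiToken) order.push("assemblyai");
  if (env.useWebSpeech && env.hasWebSpeech) order.push("webspeech");
  if (env.hasVoskModel) order.push("vosk");
  return order;
}
```

An empty result would mean no transcription backend is usable in the current environment, which is worth surfacing to the user before starting a line.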
Semantic Verification
When geminiApiKey or geminiProxyUrl is provided, the library uses Gemini 2.5 Flash Lite to
verify semantic meaning:
- Probability-based matching: Returns 0-1 score for how well transcript matches target
- Smart threshold: Configurable matching strictness (default 75%)
- Paraphrase-friendly: Understands synonyms and alternate phrasings
- Low cost: $0.10/1M input tokens
- Fast: ~180ms time-to-first-token
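The probability and threshold can be illustrated with a small parsing step. This sketch assumes the model was asked to reply with JSON such as {"probability": 0.82, "reason": "..."}; the prompt and reply format the library really uses are not documented here, and parseSemanticReply and isSemanticComplete are hypothetical helpers.

```typescript
// Hypothetical helpers; the JSON reply format is an assumption.
interface SemanticMatch {
  probability: number; // clamped to the range 0-1
  reason: string;
}

/** Extract { probability, reason } from a model reply, or null if it is unusable. */
function parseSemanticReply(reply: string): SemanticMatch | null {
  // Models sometimes wrap JSON in prose or code fences; take the outermost braces.
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(reply.slice(start, end + 1));
  } catch (err) {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const obj = parsed as { probability?: unknown; reason?: unknown };
  if (typeof obj.probability !== "number" || !isFinite(obj.probability)) return null;
  const probability = Math.min(1, Math.max(0, obj.probability));
  const reason = typeof obj.reason === "string" ? obj.reason : "";
  return { probability, reason };
}

/** Compare a parsed match against semanticThreshold (default 0.75). */
function isSemanticComplete(match: SemanticMatch | null, semanticThreshold = 0.75): boolean {
  return match !== null && match.probability >= semanticThreshold;
}
```

Treating an unparseable reply as "no match" keeps the pause and suffix signals in charge whenever the semantic check fails.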
Examples
- React: examples/react-basic – Complete demo with AssemblyAI + Gemini integration
- Remix (client-only): examples/remix
How It Works
- VAD Detection: Voice Activity Detection monitors audio for speech/silence boundaries
- Transcription: AssemblyAI streams real-time transcription (or Web Speech as fallback)
- Semantic Verification: Gemini compares meaning between spoken and target text
- Completion: Line completes when:
  - Pause detected (configurable duration), OR
  - Suffix match found, OR
  - Semantic probability crosses threshold (Gemini only)
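The completion rules above can be sketched as a single decision function. This is an illustration under stated assumptions, not the library's logic: the order in which signals are checked, the Signals and Options shapes, and the timeout formula (word count divided by baseWPS, scaled by maxMultiplier) are all guesses based on the option descriptions.

```typescript
// Hypothetical decision logic; signal priority and the timeout formula are assumptions.
interface Signals {
  silenceMs: number;            // time since speech was last detected
  suffixMatched: boolean;       // result of the fuzzy suffix comparison
  semanticProbability?: number; // present only when Gemini is enabled
  elapsedMs: number;            // time since the line started
}

interface Options {
  pauseMs: number;
  semanticThreshold: number;
  baseWPS: number;
  maxMultiplier: number;
}

// Defaults taken from the Configuration section.
const DEFAULTS: Options = { pauseMs: 800, semanticThreshold: 0.75, baseWPS: 2.7, maxMultiplier: 2.0 };

/** Assumed timeout: expected speaking time at baseWPS, scaled by maxMultiplier. */
function estimateTimeoutMs(targetText: string, opts: Options = DEFAULTS): number {
  const words = targetText.trim().split(/\s+/).filter((w) => w.length > 0).length;
  const expectedMs = (words / opts.baseWPS) * 1000;
  return Math.round(expectedMs * opts.maxMultiplier);
}

type Outcome = "suffix" | "semantic" | "pause" | "timeout" | null;

/** Return the first signal that fires, or null while the line is still in progress. */
function decide(targetText: string, s: Signals, opts: Options = DEFAULTS): Outcome {
  if (s.suffixMatched) return "suffix";
  if (s.semanticProbability !== undefined && s.semanticProbability >= opts.semanticThreshold) {
    return "semantic";
  }
  if (s.silenceMs >= opts.pauseMs) return "pause";
  if (s.elapsedMs >= estimateTimeoutMs(targetText, opts)) return "timeout";
  return null;
}
```

With the default options, a ten-word line such as 'To be or not to be, that is the question.' gets an assumed timeout of about 7.4 seconds under this formula.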
Roadmap
- [x] AssemblyAI real-time streaming
- [x] Gemini semantic verification
- [ ] Vosk-WASM offline model + caching
- [ ] Web Worker/AudioWorklet offload
- [ ] Multi-language support
