@auditionhub/voice-detector

v0.1.4

Real-time line completion detection for browser apps (Web Speech default, Vosk-WASM fallback)

Real-time line completion detection for spoken lines in the browser. Detect when a user has finished a line using pause duration, fuzzy suffix matching, or AI-powered semantic verification. Optimized for low latency with support for AssemblyAI streaming transcription and Gemini semantic analysis.

Install

npm install @auditionhub/voice-detector

🔒 Security Best Practices

Important: API keys should never be exposed to the browser! For production applications:

  • AssemblyAI: Generate temporary tokens server-side (see example below)
  • Gemini: Use geminiProxyUrl to proxy API calls through your server
  • See Remix Example for a complete secure implementation

// ✅ GOOD: Server generates tokens, client uses proxy
const detector = new LineDetector({
	assemblyAiToken: await fetchTokenFromServer(),
	geminiProxyUrl: '/api/gemini-proxy',
})

// ❌ BAD: Direct API keys in browser
const detector = new LineDetector({
	assemblyAiApiKey: 'secret-key', // ⚠️ Exposed to browser!
	geminiApiKey: 'secret-key', // ⚠️ Exposed to browser!
})
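
One way to enforce the rule above is a small guard that runs before the detector is constructed in browser builds. This is a minimal sketch: the option names assemblyAiApiKey and geminiApiKey come from this README, while findExposedKeys and assertNoRawKeys are hypothetical helpers and not part of the package.

```typescript
// Option names are taken from this README; the helpers are hypothetical.
const SECRET_OPTIONS = ['assemblyAiApiKey', 'geminiApiKey'] as const

type DetectorConfig = Record<string, unknown>

// Returns the names of secret-bearing options set to a non-empty string.
function findExposedKeys(config: DetectorConfig): string[] {
	return SECRET_OPTIONS.filter((name) => {
		const value = config[name]
		return typeof value === 'string' && value.length > 0
	})
}

// Throws if raw keys are present; intended to be called in browser code only.
function assertNoRawKeys(config: DetectorConfig): void {
	const exposed = findExposedKeys(config)
	if (exposed.length > 0) {
		throw new Error(`Raw API keys must not reach the browser: ${exposed.join(', ')}`)
	}
}
```

Calling assertNoRawKeys(config) just before new LineDetector(config) would fail fast in development if a raw key slips into client code.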

Basic usage

Simple Web Speech (Default)

import { LineDetector } from '@auditionhub/voice-detector'

const detector = new LineDetector({
	pauseMs: 800,
	suffixWords: 3,
	useWebSpeech: true,
})

await detector.init()

detector.on('lineComplete', (ev) => {
	console.log('Line complete:', ev.reason, ev.elapsedMs)
})

await detector.startLine({ text: 'I think we should head back now.' })

With AssemblyAI + Gemini (Recommended)

import { LineDetector } from '@auditionhub/voice-detector'

const detector = new LineDetector({
	assemblyAiToken: 'your-temporary-token', // Generate server-side for browser
	geminiApiKey: 'your-gemini-key', // Local testing only; prefer geminiProxyUrl in the browser
	semanticThreshold: 0.75,
	enableSemanticMatching: true,
	pauseMs: 800,
})

await detector.init()

detector.on('semanticMatch', (ev) => {
	console.log('Match probability:', ev.probability)
	console.log('Reason:', ev.reason)
})

detector.on('lineComplete', (ev) => {
	console.log('Line complete:', ev.reason, ev.elapsedMs)
	console.log('Semantic score:', ev.semanticProbability)
})

await detector.startLine({ text: 'To be or not to be, that is the question.' })

Configuration

Core Options

  • pauseMs: number (default 800) – silence duration to trigger completion
  • suffixWords: number (default 3) – trailing words to compare for suffix match
  • suffixMaxDist: number (default 2) – max edit distance for suffix match
  • baseWPS: number (default 2.7) – initial words/sec for timing estimate
  • maxMultiplier: number (default 2.0) – timeout multiplier cap
  • useWebSpeech: boolean (default true) – prefer Web Speech API locally
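
To make suffixWords and suffixMaxDist concrete, here is a minimal sketch of a fuzzy suffix match. It assumes the comparison is a character-level Levenshtein distance over the last suffixWords words of each string. The README does not say whether the library measures distance per character or per word, so treat editDistance, tailWords and suffixMatches as illustrative names rather than the package's internals.

```typescript
// Classic dynamic-programming Levenshtein distance between two strings.
function editDistance(a: string, b: string): number {
	let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j)
	for (let i = 1; i <= a.length; i++) {
		const curr: number[] = [i]
		for (let j = 1; j <= b.length; j++) {
			const cost = a[i - 1] === b[j - 1] ? 0 : 1
			curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
		}
		prev = curr
	}
	return prev[b.length]
}

// Lowercases, replaces punctuation with spaces, and joins the last `count` words.
// ASCII-only normalization, which is adequate for an English sketch.
function tailWords(text: string, count: number): string {
	const words = text
		.toLowerCase()
		.replace(/[^a-z0-9'\s]/g, ' ')
		.split(/\s+/)
		.filter(Boolean)
	return count > 0 ? words.slice(-count).join(' ') : ''
}

// True when the trailing words of the transcript are within suffixMaxDist
// edits of the trailing words of the target line.
function suffixMatches(
	transcript: string,
	target: string,
	suffixWords = 3,
	suffixMaxDist = 2,
): boolean {
	const expected = tailWords(target, suffixWords)
	if (expected.length === 0) return false
	return editDistance(tailWords(transcript, suffixWords), expected) <= suffixMaxDist
}
```

With the defaults, a transcript ending in "had back now" would still match a target ending in "head back now", since the two tails differ by a single edit.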

AssemblyAI Options

  • assemblyAiToken: string (optional) – Temporary token for AssemblyAI real-time streaming (required for browser)
  • assemblyAiApiKey: string (optional) – API key for AssemblyAI (Node.js only, not for browser)
  • Important: For browser usage, you MUST use assemblyAiToken instead of assemblyAiApiKey. Generate tokens server-side:
    // Server-side (Node.js)
    import { AssemblyAI } from 'assemblyai'
    const client = new AssemblyAI({ apiKey: 'your-api-key' })
    const token = await client.streaming.createTemporaryToken({
    	expires_in_seconds: 480,
    })
    // Return token to client
  • Get your API key: https://www.assemblyai.com

Gemini Options

  • geminiApiKey: string (optional) – API key for Gemini 2.5 Flash Lite semantic matching
  • geminiProxyUrl: string (optional) – Recommended – Server-side proxy URL for Gemini API calls (keeps API key secure)
  • semanticThreshold: number (default 0.75) – probability threshold for semantic completion (0-1)
  • enableSemanticMatching: boolean (default true) – enable semantic verification

Security Best Practice: Use geminiProxyUrl instead of geminiApiKey in browser applications to keep your API key secure on the server. See Remix Example for a complete implementation.

Get your key: https://aistudio.google.com

Legacy Options

  • remoteURL: string | null – optional remote STT (future)

Events

  • lineStart: { text } – Fired when a new line starts
  • lineComplete: { reason: 'pause' | 'suffix' | 'semantic', elapsedMs, transcript?, avgWPS?, semanticProbability?, semanticReason? } – Fired when line completion is detected
  • lineTimeout: { reason: 'timeout', elapsedMs, transcript?, avgWPS?, semanticProbability?, semanticReason? } – Fired if no completion detected in time
  • transcript: { text, isFinal } – Fired on each transcript update
  • semanticMatch: { probability, reason } – Fired when semantic match is calculated (Gemini only)
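
The completion and timeout payloads listed above can be modelled as a discriminated union on reason. This is a sketch only: the field names come from the list, but the field types (numbers for elapsedMs and avgWPS, strings for transcript and semanticReason) are assumptions, and describeLineEnd is a hypothetical helper.

```typescript
// Fields shared by lineComplete and lineTimeout; types are assumed.
interface CompletionDetails {
	elapsedMs: number
	transcript?: string
	avgWPS?: number
	semanticProbability?: number
	semanticReason?: string
}

interface LineCompleteEvent extends CompletionDetails {
	reason: 'pause' | 'suffix' | 'semantic'
}

interface LineTimeoutEvent extends CompletionDetails {
	reason: 'timeout'
}

type LineEndEvent = LineCompleteEvent | LineTimeoutEvent

// Formats either payload as a short log line, narrowing on `reason`.
function describeLineEnd(ev: LineEndEvent): string {
	const seconds = (ev.elapsedMs / 1000).toFixed(1)
	if (ev.reason === 'timeout') {
		return `timed out after ${seconds}s`
	}
	const score =
		ev.semanticProbability === undefined
			? ''
			: ` (semantic score ${ev.semanticProbability.toFixed(2)})`
	return `completed via ${ev.reason} after ${seconds}s${score}`
}
```

Sharing one handler between the two events keeps logging consistent, for example detector.on('lineComplete', (ev) => console.log(describeLineEnd(ev))).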

Speech-to-Text Adapters

The library supports multiple STT backends with automatic fallback:

  1. AssemblyAI (recommended): Real-time streaming transcription with WebSocket

    • Best accuracy and latency
    • Requires temporary token (for browser) or API key (for Node.js)
    • Generate tokens server-side for browser security
    • Falls back automatically to the Web Speech API if unavailable
  2. Web Speech API: Browser-native STT (default fallback)

    • No API key required
    • Good for simple use cases
    • Browser support varies
  3. Vosk-WASM (offline, WIP): Local on-device transcription

    • No internet required
    • Scaffolding included; full integration pending

Semantic Verification

When geminiApiKey or geminiProxyUrl is provided, the library uses Gemini 2.5 Flash Lite to verify semantic meaning:

  • Probability-based matching: Returns a 0-1 score for how well the transcript matches the target line
  • Smart threshold: Configurable matching strictness (semanticThreshold, default 0.75)
  • Paraphrase-friendly: Understands synonyms and alternate phrasings
  • Low cost: $0.10/1M input tokens
  • Fast: ~180ms time-to-first-token
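
As a rough worked example of the input price quoted above, the sketch below estimates spend for a batch of semantic checks. The tokens-per-check figure is an assumed input rather than a measurement, output tokens are ignored because the README only quotes an input price, and estimateInputCostUsd is a hypothetical helper.

```typescript
// Input price quoted in this README: $0.10 per 1M input tokens.
const INPUT_PRICE_PER_MILLION_USD = 0.1

// Estimated input-token cost in USD; tokensPerCheck is the caller's assumption.
function estimateInputCostUsd(checks: number, tokensPerCheck: number): number {
	return (checks * tokensPerCheck * INPUT_PRICE_PER_MILLION_USD) / 1e6
}
```

For instance, 10,000 checks at an assumed 150 input tokens each come to 1.5M tokens, or roughly $0.15.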

Examples

  • React: examples/react-basic – Complete demo with AssemblyAI + Gemini integration
  • Remix (client-only): examples/remix

How It Works

  1. VAD Detection: Voice Activity Detection monitors audio for speech/silence boundaries
  2. Transcription: AssemblyAI streams real-time transcription (or Web Speech as fallback)
  3. Semantic Verification: Gemini compares meaning between spoken and target text
  4. Completion: Line completes when:
    • Pause detected (configurable duration), OR
    • Suffix match found, OR
    • Semantic probability crosses threshold (Gemini only)
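
The completion rule in step 4 can be sketched as a single decision function. The three conditions and the default values come from this README, but the order in which they are checked (semantic, then suffix, then pause) is an assumption, and CompletionSignals and completionReason are illustrative names.

```typescript
type CompletionReason = 'pause' | 'suffix' | 'semantic'

// Hypothetical snapshot of the signals available at one point in time.
interface CompletionSignals {
	silenceMs: number
	suffixMatched: boolean
	semanticProbability?: number // only present when Gemini is in use
}

interface CompletionThresholds {
	pauseMs: number
	semanticThreshold: number
}

// Returns why the line is complete, or null if it is still in progress.
function completionReason(
	signals: CompletionSignals,
	thresholds: CompletionThresholds = { pauseMs: 800, semanticThreshold: 0.75 },
): CompletionReason | null {
	if (
		signals.semanticProbability !== undefined &&
		signals.semanticProbability >= thresholds.semanticThreshold
	) {
		return 'semantic'
	}
	if (signals.suffixMatched) return 'suffix'
	if (signals.silenceMs >= thresholds.pauseMs) return 'pause'
	return null
}
```

Because the conditions are joined by OR, any one of them is enough; the ordering only decides which reason is reported when several hold at once.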

Roadmap

  • [x] AssemblyAI real-time streaming
  • [x] Gemini semantic verification
  • [ ] Vosk-WASM offline model + caching
  • [ ] Web Worker/AudioWorklet offload
  • [ ] Multi-language support