
@steelbrain/media-speech-detection-web

v1.2.0


Production-ready speech detection using Silero VAD ONNX model for web browsers


Installation

npm install @steelbrain/media-speech-detection-web

Modern Bundler Support: This package is fully compatible with modern bundlers (Webpack 5, Next.js, Vite, etc.). The ONNX model file is automatically detected and bundled; no manual setup or public-folder configuration is required.

Quick Start

import { speechFilter, preloadModel } from '@steelbrain/media-speech-detection-web';
import { ingestAudioStream, RECOMMENDED_AUDIO_CONSTRAINTS } from '@steelbrain/media-ingest-audio';

// Optional: Preload model during app initialization for faster first use
await preloadModel();

// Get microphone access
const mediaStream = await navigator.mediaDevices.getUserMedia({
  audio: RECOMMENDED_AUDIO_CONSTRAINTS
});

// Create 16kHz audio stream
const audioStream = await ingestAudioStream(mediaStream);

// Filter audio to only speech chunks
const vadTransform = speechFilter({
  onSpeechStart: () => console.log('🎤 Speech started'),
  onSpeechEnd: () => console.log('🔇 Speech ended'),
  threshold: 0.5
});

// speechProcessor is your WritableStream that consumes speech audio (not shown)
await audioStream
  .pipeThrough(vadTransform)
  .pipeTo(speechProcessor);

// Alternative: events-only (no audio output) using the .tee() pattern.
// Call .tee() before piping the stream anywhere else, since piping locks it.
const [processStream, eventsStream] = audioStream.tee();

// Process audio on one branch
processStream.pipeTo(speechProcessor);

// Handle events on another branch without outputting audio
eventsStream.pipeThrough(speechFilter({
  noEmit: true,  // Don't emit audio chunks
  onSpeechStart: () => console.log('🎤 Speech started'),
  onSpeechEnd: () => console.log('🔇 Speech ended'),
  onMisfire: () => console.log('⚠️ Short speech segment filtered')
}));

API Reference

preloadModel(): Promise<void>

Preloads the Silero VAD ONNX model by fetching it into browser cache, eliminating network delay when speech detection is first used.

Usage: call await preloadModel() during app initialization for optimal performance.

speechFilter(options): TransformStream<Float32Array, Float32Array>

Creates a TransformStream that filters audio, outputting only speech chunks. Use the noEmit option for events-only processing.

Usage: audioStream.pipeThrough(speechFilter(options)).pipeTo(processor)
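The filtering behavior can be pictured as a TransformStream that forwards a chunk only when a per-chunk speech score clears the threshold. Below is a minimal sketch, not the package's actual implementation; makeSpeechFilter and scoreChunk are hypothetical names, with scoreChunk standing in for the Silero model inference:

```typescript
// Hypothetical sketch of the idea behind speechFilter(): forward a chunk
// only when an injected scorer rates it at or above the threshold.
// scoreChunk stands in for the real ONNX model inference.
function makeSpeechFilter(
  scoreChunk: (chunk: Float32Array) => number,
  threshold = 0.5,
): TransformStream<Float32Array, Float32Array> {
  return new TransformStream({
    transform(chunk, controller) {
      // Drop non-speech chunks; pass speech chunks through unchanged.
      if (scoreChunk(chunk) >= threshold) controller.enqueue(chunk);
    },
  });
}
```

The real speechFilter additionally maintains the lookback buffer and applies the timing rules described under Configuration Options before deciding what to emit.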

Configuration Options

interface VADOptions {
  // Event Handlers
  onSpeechStart?: () => void;
  onSpeechEnd?: (speechAudio: Float32Array) => void;
  onMisfire?: () => void;
  onError?: (error: Error) => void;
  onDebugLog?: (message: string) => void;

  // Detection Configuration
  threshold?: number;              // Speech detection threshold (0-1). Default: 0.5
  minSpeechDurationMs?: number;    // Minimum speech duration in ms. Default: 160ms
  redemptionDurationMs?: number;   // Grace period before confirming speech end. Default: 400ms
  lookBackDurationMs?: number;     // Lookback buffer for smooth speech start. Default: 384ms
  
  // Stream Control
  noEmit?: boolean;               // Don't emit chunks, only trigger callbacks. Default: false
}

Optimal Defaults

The package provides carefully tuned defaults that work well for most use cases:

| Parameter | Default | Purpose |
|-----------|---------|---------|
| threshold | 0.5 | Balanced speech detection |
| minSpeechDurationMs | 160ms | Filters out very short sounds |
| redemptionDurationMs | 400ms | Handles natural speech pauses |
| lookBackDurationMs | 384ms | Captures natural audio context before speech |
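Since the model consumes 512-sample windows at 16kHz (32ms per frame), each millisecond option ultimately translates into a whole number of frames. The helper below makes the arithmetic explicit; it is an illustrative calculation only (frameCountFor is not part of the package API, and the package's actual rounding may differ):

```typescript
// Convert a millisecond duration into a count of 512-sample frames at 16 kHz.
// Illustrative only; frameCountFor is not part of the package API.
const SAMPLE_RATE = 16_000;
const SAMPLES_PER_FRAME = 512;
const MS_PER_FRAME = (SAMPLES_PER_FRAME / SAMPLE_RATE) * 1000; // 32 ms

function frameCountFor(durationMs: number): number {
  // Assumes simple flooring; the package may round differently.
  return Math.floor(durationMs / MS_PER_FRAME);
}
```

With the defaults above, minSpeechDurationMs maps to exactly 5 frames, lookBackDurationMs to exactly 12, and redemptionDurationMs to 12 under flooring (400 / 32 = 12.5).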

Advanced Usage

Error Handling & Debugging

const vadTransform = speechFilter({
  onSpeechStart: () => console.log('🎤 Speech started'),
  onSpeechEnd: () => console.log('🔇 Speech ended'),
  onError: (error) => console.error('VAD Error:', error),
  onDebugLog: (message) => console.log('VAD Debug:', message),
  threshold: 0.6
});

Real-time Speech Transcription Pipeline

// Preload model during app startup
await preloadModel();

// Complete pipeline: microphone → VAD → transcription
await audioStream
  .pipeThrough(speechFilter({
    onSpeechStart: () => showRecordingIndicator(),
    onSpeechEnd: () => hideRecordingIndicator(),
    threshold: 0.5
  }))
  .pipeThrough(transcriptionTransform)
  .pipeTo(displayResults);

Performance Optimization

// Preload model early in your application lifecycle
window.addEventListener('load', async () => {
  try {
    await preloadModel();
    console.log('VAD model preloaded and cached');
  } catch (error) {
    console.warn('Failed to preload VAD model:', error);
  }
});

How It Works

  1. Silero VAD Model: Uses the pre-trained Silero VAD ONNX model for production-ready accuracy
  2. Audio Processing: Processes 16kHz mono audio in 512-sample windows (32ms frames)
  3. State Machine: Implements a sophisticated state machine with speech/intermediate/silent states
  4. Lookback Buffer: Maintains a buffer to capture speech starts smoothly
  5. Temporal Smoothing: Uses configurable timing thresholds to prevent false triggers
  6. Web Streams: Built on modern Web Streams API for optimal performance and composability
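The state machine in steps 3–5 can be sketched as follows. This is a simplified illustration under assumed semantics (VadStateMachine and its counters are hypothetical, not the package's internals): a tentative start fires on the first above-threshold frame, a below-threshold run shorter than the redemption period is forgiven, and a confirmed segment shorter than the minimum speech duration ends in a misfire rather than a speech end.

```typescript
// Simplified, hypothetical sketch of the VAD state machine: one speech
// probability per 32 ms frame drives transitions between "silent" and "speech".
type VadState = "silent" | "speech";

class VadStateMachine {
  state: VadState = "silent";
  readonly events = { starts: 0, ends: 0, misfires: 0 };
  private speechFrames = 0;
  private silentFrames = 0;

  constructor(
    private threshold = 0.5,
    private minSpeechFrames = 5,   // ≈ 160 ms at 32 ms/frame
    private redemptionFrames = 12, // ≈ 400 ms grace period
  ) {}

  push(probability: number): void {
    const isSpeech = probability >= this.threshold;
    if (this.state === "silent") {
      if (isSpeech) {
        // Tentative speech start.
        this.state = "speech";
        this.speechFrames = 1;
        this.silentFrames = 0;
        this.events.starts++;
      }
    } else if (isSpeech) {
      // Still speaking; a brief dip below threshold is forgiven.
      this.speechFrames++;
      this.silentFrames = 0;
    } else if (++this.silentFrames >= this.redemptionFrames) {
      // Redemption period exhausted: the segment is over. Segments shorter
      // than the minimum speech duration count as misfires, not speech ends.
      if (this.speechFrames >= this.minSpeechFrames) this.events.ends++;
      else this.events.misfires++;
      this.state = "silent";
      this.speechFrames = 0;
      this.silentFrames = 0;
    }
  }
}
```

A real implementation would also carry the lookback buffer and the captured audio handed to onSpeechEnd; the counters here only mirror when the callbacks would fire.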

Model Details

  • Model: Silero VAD v4.0 (MIT License)
  • Input: 16kHz mono audio, 512 samples per inference (32ms windows)
  • Output: Speech probability (0-1) per window + internal LSTM state
  • Model Size: ~2.3MB ONNX format
  • Performance: <1ms inference time per chunk on modern browsers
  • Accuracy: Reliable detection across diverse languages and acoustic conditions

Credits

This package uses the Silero VAD model developed by Silero Team, licensed under MIT License. The model provides state-of-the-art speech detection with excellent performance across various languages and acoustic conditions.

License

MIT License. See the LICENSE file for details.

Silero VAD Model: MIT License (© Silero Team)