avr-vad

v1.0.9

A Node.js library for Voice Activity Detection using Silero VAD

Agent Voice Response - AVR VAD - Silero Voice Activity Detection for Node.js

🎤 A Node.js library for Voice Activity Detection using the Silero VAD model.

✨ Features

  • 🚀 Based on Silero VAD: Uses the pre-trained Silero ONNX model (v5 and legacy versions) for accurate results
  • 🎯 Real-time processing: Supports real-time frame-by-frame processing
  • Non-real-time processing: Batch processing for audio files and streams
  • 🔧 Configurable: Customizable thresholds and parameters for different needs
  • 🎵 Audio processing: Includes utilities for resampling and audio manipulation
  • 📊 Multiple models: Support for both Silero VAD v5 and legacy models
  • 💾 Bundled models: Models are included in the package, no external downloads required
  • 📝 TypeScript: Fully typed with TypeScript

🚀 Installation

npm install avr-vad

📖 Quick Start

Real-time Processing

import { RealTimeVAD } from 'avr-vad';

// Initialize the VAD with default options (Silero v5 model)
const vad = await RealTimeVAD.new({
  model: 'v5', // or 'legacy'
  positiveSpeechThreshold: 0.5,
  negativeSpeechThreshold: 0.35,
  preSpeechPadFrames: 1,
  redemptionFrames: 8,
  frameSamples: 1536,
  minSpeechFrames: 3
});

// Process audio frames in real-time
const audioFrame = getAudioFrameFromMicrophone(); // Float32Array of 1536 samples at 16kHz
const result = await vad.processFrame(audioFrame);

console.log(`Speech probability: ${result.probability}`);
console.log(`Speech detected: ${result.msg === 'SPEECH_START' || result.msg === 'SPEECH_CONTINUE'}`);

// Clean up when done
vad.destroy();
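
The quick start above assumes the audio already arrives as 1536-sample frames. When you start from one longer buffer instead, it has to be chunked first. The following is a minimal sketch of that step, assuming mono audio at 16kHz; `splitIntoFrames` is a hypothetical helper and not part of avr-vad, and zero-padding the last short frame is just one possible choice.

```typescript
// Hypothetical helper (not part of avr-vad): split a mono buffer into
// fixed-size frames, zero-padding the final frame when it is short.
function splitIntoFrames(audio: Float32Array, frameSamples = 1536): Float32Array[] {
  if (!Number.isInteger(frameSamples) || frameSamples <= 0) {
    throw new RangeError("frameSamples must be a positive integer");
  }
  const chunks: Float32Array[] = [];
  for (let offset = 0; offset < audio.length; offset += frameSamples) {
    const frame = new Float32Array(frameSamples); // zero-filled by default
    // subarray clamps the end index, so the last chunk may be shorter
    frame.set(audio.subarray(offset, offset + frameSamples));
    chunks.push(frame);
  }
  return chunks;
}
```

Each returned frame can then be passed to `vad.processFrame(...)` in order.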

Non-Real-time Processing

import { NonRealTimeVAD } from 'avr-vad';

// Initialize for batch processing
const vad = await NonRealTimeVAD.new({
  model: 'v5',
  positiveSpeechThreshold: 0.5,
  negativeSpeechThreshold: 0.35
});

// Process entire audio buffer
const audioData = loadAudioData(); // Float32Array at 16kHz
const results = await vad.processAudio(audioData);

// Get speech segments
const speechSegments = vad.getSpeechSegments(results);
console.log(`Found ${speechSegments.length} speech segments`);

speechSegments.forEach((segment, i) => {
  console.log(`Segment ${i + 1}: ${segment.start}ms - ${segment.end}ms`);
});

// Clean up
vad.destroy();

⚙️ Configuration

Real-time VAD Options

interface RealTimeVADOptions {
  /** Model version to use ('v5' | 'legacy') */
  model?: 'v5' | 'legacy';
  
  /** Threshold for detecting speech start */
  positiveSpeechThreshold?: number;
  
  /** Threshold for detecting speech end */
  negativeSpeechThreshold?: number;
  
  /** Frames to include before speech detection */
  preSpeechPadFrames?: number;
  
  /** Frames to wait before ending speech */
  redemptionFrames?: number;
  
  /** Number of samples per frame (usually 1536 for 16kHz) */
  frameSamples?: number;
  
  /** Minimum frames for valid speech */
  minSpeechFrames?: number;
}
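
To make the interaction between these options easier to picture, here is an illustrative state machine. It is one plausible reading of the option descriptions above, not avr-vad's actual implementation: a range opens on the first frame at or above `positiveSpeechThreshold`, closes once `redemptionFrames` frames below `negativeSpeechThreshold` have accumulated since the last positive frame, and is discarded if it contained fewer than `minSpeechFrames` positive frames. The names `detectSpeechRanges`, `HysteresisOptions` and `FrameRange` are hypothetical.

```typescript
// Illustrative only: a simplified hysteresis detector, NOT avr-vad's real code.
interface HysteresisOptions {
  positiveSpeechThreshold: number;
  negativeSpeechThreshold: number;
  redemptionFrames: number;
  minSpeechFrames: number;
}

interface FrameRange {
  startFrame: number; // first frame at or above the positive threshold
  endFrame: number;   // frame on which the redemption budget ran out
}

function detectSpeechRanges(probabilities: number[], opts: HysteresisOptions): FrameRange[] {
  const ranges: FrameRange[] = [];
  let speaking = false;
  let startFrame = 0;
  let positiveFrames = 0;  // frames at or above the positive threshold in this range
  let redemptionCount = 0; // low-probability frames since the last positive frame

  probabilities.forEach((p, i) => {
    if (p >= opts.positiveSpeechThreshold) {
      if (!speaking) {
        speaking = true;
        startFrame = i;
        positiveFrames = 0;
      }
      positiveFrames++;
      redemptionCount = 0;
    } else if (speaking && p < opts.negativeSpeechThreshold) {
      redemptionCount++;
      if (redemptionCount >= opts.redemptionFrames) {
        // Keep the range only if it had enough positive frames
        if (positiveFrames >= opts.minSpeechFrames) {
          ranges.push({ startFrame, endFrame: i });
        }
        speaking = false;
        redemptionCount = 0;
      }
    }
    // Probabilities between the two thresholds leave the state unchanged.
  });
  // A range still open at the end of the input is not flushed in this sketch.
  return ranges;
}
```

The gap between the two thresholds is what keeps a probability hovering around 0.4 from toggling the state on every frame.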

Non-Real-time VAD Options

interface NonRealTimeVADOptions {
  /** Model version to use ('v5' | 'legacy') */
  model?: 'v5' | 'legacy';
  
  /** Threshold for detecting speech start */
  positiveSpeechThreshold?: number;
  
  /** Threshold for detecting speech end */
  negativeSpeechThreshold?: number;
}

Default Values

// Real-time VAD defaults
const defaultRealTimeOptions = {
  model: 'v5',
  positiveSpeechThreshold: 0.5,
  negativeSpeechThreshold: 0.35,
  preSpeechPadFrames: 1,
  redemptionFrames: 8,
  frameSamples: 1536,
  minSpeechFrames: 3
};

// Non-real-time VAD defaults
const defaultNonRealTimeOptions = {
  model: 'v5',
  positiveSpeechThreshold: 0.5,
  negativeSpeechThreshold: 0.35
};
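
Since several defaults are expressed in frames, it can help to convert them to time: duration = frames × frameSamples / sampleRate. A small sketch of that arithmetic, assuming the 16kHz sample rate used throughout this README (`framesToMs` is a hypothetical helper, not a library export):

```typescript
// Hypothetical helper: convert a frame count to milliseconds.
function framesToMs(frameCount: number, frameSamples = 1536, sampleRate = 16000): number {
  return (frameCount * frameSamples * 1000) / sampleRate;
}
```

With the defaults above, `framesToMs(1)` is 96, `framesToMs(3)` (minSpeechFrames) is 288, and `framesToMs(8)` (redemptionFrames) is 768.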

📊 Results and Messages

VAD Messages

The VAD returns different message types to indicate speech state changes:

enum Message {
  ERROR = 'ERROR',
  SPEECH_START = 'SPEECH_START',
  SPEECH_CONTINUE = 'SPEECH_CONTINUE', 
  SPEECH_END = 'SPEECH_END',
  SILENCE = 'SILENCE'
}

Processing Results

interface VADResult {
  /** Speech probability (0.0 - 1.0) */
  probability: number;
  
  /** Message indicating speech state */
  msg: Message;
  
  /** Audio data if speech segment ended */
  audio?: Float32Array;
}
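
One way to consume these results is an exhaustive `switch` over `msg`, so the compiler flags any message type that is left unhandled. The sketch below redeclares `Message` and `VADResult` locally, copied from the definitions above, only so that it compiles on its own; in real code they would come from the package. `describeResult` is a hypothetical helper.

```typescript
// Local copies of the README's types so this sketch is self-contained.
enum Message {
  ERROR = 'ERROR',
  SPEECH_START = 'SPEECH_START',
  SPEECH_CONTINUE = 'SPEECH_CONTINUE',
  SPEECH_END = 'SPEECH_END',
  SILENCE = 'SILENCE'
}

interface VADResult {
  probability: number;
  msg: Message;
  audio?: Float32Array;
}

// Hypothetical helper: turn a result into a log line.
function describeResult(result: VADResult, sampleRate = 16000): string {
  const p = result.probability.toFixed(2);
  switch (result.msg) {
    case Message.SPEECH_START:
      return `speech started (p=${p})`;
    case Message.SPEECH_CONTINUE:
      return `speech continuing (p=${p})`;
    case Message.SPEECH_END: {
      // audio is optional, so fall back to 0 seconds when it is absent
      const seconds = result.audio ? result.audio.length / sampleRate : 0;
      return `speech ended, ${seconds.toFixed(2)}s of audio`;
    }
    case Message.SILENCE:
      return `silence (p=${p})`;
    case Message.ERROR:
      return 'error while processing frame';
    default: {
      // Compile-time check: this assignment fails if a case is missing
      const unreachable: never = result.msg;
      return `unknown message: ${String(unreachable)}`;
    }
  }
}
```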

Speech Segments

interface SpeechSegment {
  /** Start time in milliseconds */
  start: number;
  
  /** End time in milliseconds */
  end: number;
  
  /** Speech probability for this segment */
  probability: number;
}
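
Segment lists often need light post-processing, for example joining segments separated by a short pause or summing the total speech time. The sketch below works on plain objects with the `SpeechSegment` shape shown above (redeclared locally so it is self-contained). `mergeCloseSegments` and `totalSpeechMs` are hypothetical helpers, and keeping the higher probability when merging is an arbitrary choice.

```typescript
// Local copy of the README's SpeechSegment shape.
interface SpeechSegment {
  start: number;       // milliseconds
  end: number;         // milliseconds
  probability: number;
}

// Hypothetical helper: merge segments whose gap is at most maxGapMs.
function mergeCloseSegments(segments: SpeechSegment[], maxGapMs: number): SpeechSegment[] {
  const sorted = [...segments].sort((a, b) => a.start - b.start);
  const merged: SpeechSegment[] = [];
  for (const seg of sorted) {
    const last = merged[merged.length - 1];
    if (last !== undefined && seg.start - last.end <= maxGapMs) {
      last.end = Math.max(last.end, seg.end);
      // Arbitrary choice for this sketch: keep the higher probability
      last.probability = Math.max(last.probability, seg.probability);
    } else {
      merged.push({ ...seg }); // copy so the input is not mutated
    }
  }
  return merged;
}

// Hypothetical helper: total speech time across all segments.
function totalSpeechMs(segments: SpeechSegment[]): number {
  return segments.reduce((sum, s) => sum + (s.end - s.start), 0);
}
```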

🔧 Audio Utilities

The library includes various audio processing utilities:

import { utils, Resampler } from 'avr-vad';

// Resample audio to 16kHz (required for VAD)
const resampler = new Resampler({
  nativeSampleRate: 44100,
  targetSampleRate: 16000,
  targetFrameSize: 1536
});

const resampledFrame = resampler.process(audioFrame);

// Other utilities
const frameSize = utils.frameSize; // Get frame size for current sample rate
const audioBuffer = utils.concatArrays([frame1, frame2]); // Concatenate audio arrays

🎯 Advanced Examples

Real-time Speech Detection with Callbacks

import { RealTimeVAD } from 'avr-vad';

class SpeechDetector {
  // Assigned in initialize(), so it is undefined until then
  private vad?: RealTimeVAD;
  private onSpeechStart?: (audio: Float32Array) => void;
  private onSpeechEnd?: (audio: Float32Array) => void;

  constructor(callbacks: {
    onSpeechStart?: (audio: Float32Array) => void;
    onSpeechEnd?: (audio: Float32Array) => void;
  }) {
    this.onSpeechStart = callbacks.onSpeechStart;
    this.onSpeechEnd = callbacks.onSpeechEnd;
  }

  async initialize() {
    this.vad = await RealTimeVAD.new({
      positiveSpeechThreshold: 0.5,
      negativeSpeechThreshold: 0.35,
      onSpeechStart: this.onSpeechStart,
      onSpeechEnd: this.onSpeechEnd
    });
  }

  async processFrame(audioFrame: Float32Array) {
    if (!this.vad) {
      throw new Error('Call initialize() before processFrame()');
    }
    return this.vad.processFrame(audioFrame);
  }

  destroy() {
    this.vad?.destroy();
  }
}

// Usage
const detector = new SpeechDetector({
  onSpeechStart: (audio) => console.log(`Speech started with ${audio.length} samples`),
  onSpeechEnd: (audio) => console.log(`Speech ended with ${audio.length} samples`)
});

await detector.initialize();

Batch Processing Audio File

import { NonRealTimeVAD } from 'avr-vad';

async function processAudioFile(filePath: string) {
  // loadWavFile is a placeholder: supply your own loader that returns
  // mono audio as a Float32Array at 16kHz
  const audioData = loadWavFile(filePath);
  
  const vad = await NonRealTimeVAD.new({
    model: 'v5',
    positiveSpeechThreshold: 0.6,
    negativeSpeechThreshold: 0.4
  });

  const results = await vad.processAudio(audioData);
  const segments = vad.getSpeechSegments(results);

  console.log(`Processing ${filePath}:`);
  console.log(`Total audio duration: ${(audioData.length / 16000).toFixed(2)}s`);
  console.log(`Speech segments found: ${segments.length}`);
  
  segments.forEach((segment, i) => {
    const duration = ((segment.end - segment.start) / 1000).toFixed(2);
    console.log(`  Segment ${i + 1}: ${segment.start}ms - ${segment.end}ms (${duration}s)`);
  });

  vad.destroy();
  return segments;
}
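
The example above leaves `loadWavFile` to you. As a starting point, here is a minimal sketch of a decoder for uncompressed 16-bit PCM WAV data that averages the channels down to mono. `parseWavPcm16` and `DecodedWav` are hypothetical names, not part of avr-vad, and the function does not resample: if `sampleRate` is not 16000 the result still needs to go through a resampler.

```typescript
interface DecodedWav {
  sampleRate: number;
  channels: number;
  samples: Float32Array; // mono, scaled to [-1, 1)
}

// Hypothetical helper: decode 16-bit PCM WAV bytes to mono Float32 samples.
function parseWavPcm16(bytes: Uint8Array): DecodedWav {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (o: number): string =>
    String.fromCharCode(bytes[o], bytes[o + 1], bytes[o + 2], bytes[o + 3]);

  if (bytes.length < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file');
  }

  let sampleRate = 0;
  let channels = 0;
  let offset = 12; // first chunk starts after the 12-byte RIFF header
  while (offset + 8 <= bytes.length) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    const body = offset + 8;
    if (id === 'fmt ') {
      const format = view.getUint16(body, true);
      channels = view.getUint16(body + 2, true);
      sampleRate = view.getUint32(body + 4, true);
      const bitsPerSample = view.getUint16(body + 14, true);
      if (format !== 1 || bitsPerSample !== 16) {
        throw new Error('Only 16-bit PCM is supported');
      }
    } else if (id === 'data') {
      if (channels === 0) throw new Error('Missing fmt chunk before data');
      const available = Math.min(size, bytes.length - body);
      const frameCount = Math.floor(available / (2 * channels));
      const samples = new Float32Array(frameCount);
      for (let i = 0; i < frameCount; i++) {
        let sum = 0;
        for (let c = 0; c < channels; c++) {
          sum += view.getInt16(body + 2 * (i * channels + c), true);
        }
        samples[i] = sum / channels / 32768; // average channels, then scale
      }
      return { sampleRate, channels, samples };
    }
    offset = body + size + (size % 2); // chunks are padded to even length
  }
  throw new Error('No data chunk found');
}
```

In Node.js, the `Buffer` returned by `fs.readFileSync(filePath)` is a `Uint8Array`, so it can be passed in directly.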

📝 Development

Requirements

  • Node.js >= 16.0.0
  • TypeScript >= 5.0.0

Build

npm run build

Test

npm test

Scripts

npm run lint      # Run ESLint
npm run clean     # Clean build directory
npm run prepare   # Build (npm runs this automatically on local install and before publish)

📁 Project Structure

avr-vad/
├── src/
│   ├── index.ts                    # Main exports
│   ├── real-time-vad.ts           # Real-time VAD implementation  
│   └── common/
│       ├── index.ts               # Common exports
│       ├── frame-processor.ts     # Core ONNX processing
│       ├── non-real-time-vad.ts  # Batch processing VAD
│       ├── utils.ts               # Utility functions
│       └── resampler.ts           # Audio resampling
├── dist/                          # Compiled JavaScript
├── test/                          # Test files
├── silero_vad_v5.onnx            # Silero VAD v5 model
├── silero_vad_legacy.onnx        # Silero VAD legacy model
└── package.json

🔧 Troubleshooting

Audio Format Requirements

The Silero VAD model requires:

  • Sample rate: 16kHz
  • Channels: Mono (single channel)
  • Format: Float32Array with values between -1.0 and 1.0
  • Frame size: 1536 samples (96ms at 16kHz)
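
When detection results look wrong, a quick check of each frame against the requirements listed above can rule out format problems. This is a minimal sketch; `validateFrame` is a hypothetical helper, not a library function, and it can only check what is visible in the array itself (length and value range), not the sample rate or channel count.

```typescript
// Hypothetical helper: return a list of problems found in a frame
// (an empty list means the frame passed every check).
function validateFrame(frame: Float32Array, frameSamples = 1536): string[] {
  const problems: string[] = [];
  if (frame.length !== frameSamples) {
    problems.push(`expected ${frameSamples} samples, got ${frame.length}`);
  }
  let nonFinite = 0;
  let outOfRange = 0;
  for (let i = 0; i < frame.length; i++) {
    const s = frame[i];
    if (!Number.isFinite(s)) {
      nonFinite++;
    } else if (s < -1 || s > 1) {
      outOfRange++;
    }
  }
  if (nonFinite > 0) problems.push(`${nonFinite} non-finite samples (NaN or Infinity)`);
  if (outOfRange > 0) problems.push(`${outOfRange} samples outside [-1.0, 1.0]`);
  return problems;
}
```

Out-of-range values are a common sign that 16-bit integer PCM was copied into a Float32Array without dividing by 32768.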

Use the Resampler utility to convert audio to the required format:

import { Resampler } from 'avr-vad';

const resampler = new Resampler({
  nativeSampleRate: 44100,    // Your audio sample rate
  targetSampleRate: 16000,    // Required by VAD
  targetFrameSize: 1536       // Required frame size
});

Model Selection

  • v5 model: Latest version with improved accuracy
  • legacy model: Original model for compatibility

Performance Tips

  • Use appropriate thresholds for your use case
  • Consider using the legacy model for lower resource usage
  • For real-time applications, ensure your audio processing pipeline can handle 16kHz/1536 samples per frame
  • Use redemptionFrames to avoid choppy speech detection

Support AVR

AVR is free and open-source. If you find it valuable, consider supporting its development.

License

MIT License - see the LICENSE.md file for details.