@restnpeacepk/worker-vad

v1.0.5

Published

15 days ago

Universal Voice Activity Detection SDK for WebAssembly - supports multiple VAD engines with a unified API

Downloads

589


restnpeacepk

vad voice-activity-detection webassembly wasm speech-detection audio cloudflare-workers fvad webrtc real-time streaming microphone voice audio-processing

# worker-vad

Universal Voice Activity Detection SDK - Multiple WASM engines, one simple API

Detect speech in audio streams with WebAssembly-powered engines. Perfect for Cloudflare Workers, browsers, and Node.js.

## ✨ Features

- 🎯 Unified API - One interface for all VAD engines
- 🔄 Multiple Engines - fvad, libfvad, rnnoise support

```javascript
// Create VAD instance
const vad = await VAD.create({ sampleRate: 16000, mode: 'aggressive' });

// Process audio
const result = vad.process(audioData);

if (result.isSpeech) {
  console.log('Speech detected!');
}

// Cleanup
vad.destroy();
```


## 📖 Usage

### Basic Example

```javascript
import { VAD } from 'worker-vad';

const vad = await VAD.create({ sampleRate: 16000 });
const audioData = new Int16Array(480); // 30ms at 16kHz

const result = vad.process(audioData);
console.log(result.isSpeech);      // true/false
console.log(result.probability);   // 0.0 - 1.0
```

### Web Audio API

```javascript
import { VAD } from 'worker-vad';

// Get microphone
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 16000 });
const source = audioContext.createMediaStreamSource(stream);

// Create VAD
const vad = await VAD.create({ sampleRate: 16000 });

// Process audio
// Note: ScriptProcessorNode is deprecated in favor of AudioWorklet; it is used here for brevity.
// Each callback delivers 4096 samples (256 ms at 16 kHz), which spans several 10-30 ms frames.
const processor = audioContext.createScriptProcessor(4096, 1, 1);
processor.onaudioprocess = (e) => {
  const float32 = e.inputBuffer.getChannelData(0);
  const pcm = VAD.floatTo16BitPCM(float32);

  const result = vad.process(pcm);
  if (result.isSpeech) {
    console.log('Speaking!');
  }
};

source.connect(processor);
processor.connect(audioContext.destination);
```
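
The callback above hands the VAD 4096 samples at a time, while the `frameDuration` option is limited to 10, 20, or 30 ms. Whether `vad.process` re-frames larger buffers internally is not stated here, so the following is a minimal sketch of splitting a buffer into fixed-length frames yourself. `splitIntoFrames` is a hypothetical helper, not part of worker-vad.

```javascript
// Hypothetical helper (not part of worker-vad): split PCM into fixed-length frames.
function splitIntoFrames(pcm, sampleRate = 16000, frameDurationMs = 30) {
  // Samples per frame, e.g. 16000 * 30 / 1000 = 480
  const frameSize = (sampleRate * frameDurationMs) / 1000;
  const frames = [];
  let offset = 0;
  for (; offset + frameSize <= pcm.length; offset += frameSize) {
    // subarray() returns a view on the same memory, so no samples are copied
    frames.push(pcm.subarray(offset, offset + frameSize));
  }
  // Samples that do not fill a whole frame; prepend them to the next buffer
  const remainder = pcm.subarray(offset);
  return { frames, remainder };
}

const { frames, remainder } = splitIntoFrames(new Int16Array(4096));
console.log(frames.length, remainder.length); // 8 256
```

Each element of `frames` could then be passed to `vad.process` in turn, carrying `remainder` over to the next callback.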

### Cloudflare Workers

```javascript
import { VAD } from 'worker-vad';

export default {
  async fetch(request) {
    const vad = await VAD.create({
      engine: 'fvad',
      sampleRate: 16000
    });

    // The request body is treated as raw 16-bit PCM samples.
    // new Int16Array(buffer) throws a RangeError if the byte length is odd.
    const audioBuffer = await request.arrayBuffer();
    const result = vad.process(new Int16Array(audioBuffer));

    vad.destroy();

    return Response.json(result);
  }
};
```
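
The Worker above assumes the request body is well-formed raw PCM. As a hedged sketch of a more defensive conversion, the hypothetical helper below (not part of worker-vad) drops a trailing odd byte and reads samples as little-endian explicitly. The little-endian, header-free body format is an assumption about what the client sends.

```javascript
// Hypothetical helper (not part of worker-vad): read a body as 16-bit little-endian PCM.
function bytesToInt16(arrayBuffer) {
  // Ignore a trailing odd byte instead of letting new Int16Array(buffer) throw
  const sampleCount = Math.floor(arrayBuffer.byteLength / 2);
  const view = new DataView(arrayBuffer);
  const samples = new Int16Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    samples[i] = view.getInt16(i * 2, true); // true = little-endian
  }
  return samples;
}

const body = new Uint8Array([0x01, 0x00, 0xff, 0xff, 0x7f]).buffer; // 5 bytes
console.log(Array.from(bytesToInt16(body))); // [ 1, -1 ]
```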

## 🎛️ API Reference

### `VAD.create(options)`

Create a new VAD instance.

Options:

- `engine` - Engine to use (`'auto'`, `'fvad'`, `'libfvad'`, `'rnnoise'`)
- `sampleRate` - Audio sample rate in Hz (8000, 16000, 32000, 48000)
- `mode` - VAD sensitivity (`'quality'`, `'low'`, `'aggressive'`, `'very-aggressive'`)
- `frameDuration` - Frame duration in ms (10, 20, 30)

Returns: `Promise<VAD>`
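
Since `sampleRate` and `frameDuration` each accept only a few values, it can help to check them before calling `VAD.create`. The sketch below is a hypothetical pre-flight check, not the library's own validation. The default values (16000 Hz, 30 ms) are assumptions chosen to match the examples in this README.

```javascript
const SAMPLE_RATES = [8000, 16000, 32000, 48000];
const FRAME_DURATIONS_MS = [10, 20, 30];

// Hypothetical check (not part of worker-vad): validate options and
// return the number of samples one frame should contain.
function samplesPerFrame({ sampleRate = 16000, frameDuration = 30 } = {}) {
  if (!SAMPLE_RATES.includes(sampleRate)) {
    throw new RangeError(`Unsupported sampleRate: ${sampleRate}`);
  }
  if (!FRAME_DURATIONS_MS.includes(frameDuration)) {
    throw new RangeError(`Unsupported frameDuration: ${frameDuration}`);
  }
  return (sampleRate * frameDuration) / 1000;
}

console.log(samplesPerFrame({ sampleRate: 16000, frameDuration: 30 })); // 480
console.log(samplesPerFrame({ sampleRate: 48000, frameDuration: 10 })); // 480
```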

### `vad.process(audioData)`

Process audio data.

Parameters:

- `audioData` - `Int16Array` of PCM audio data

Returns:

```javascript
{
  isSpeech: boolean,
  probability: number,
  timestamp: number,
  processingTime: number,
  engine: string,
  metadata: object
}
```
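
Per-frame `isSpeech` values tend to flicker during short pauses, so applications often smooth them before acting. The following is a minimal sketch of a "hangover" gate that reports speech start and end events. `createSpeechGate` is a hypothetical helper that reads only the `isSpeech` field; it is not part of worker-vad.

```javascript
// Hypothetical smoother (not part of worker-vad): speech ends only after
// `hangoverFrames` consecutive non-speech frames.
function createSpeechGate(hangoverFrames = 10) {
  let speaking = false;
  let silentRun = 0;
  return function update(result) {
    if (result.isSpeech) {
      silentRun = 0; // any speech frame resets the silence counter
      if (!speaking) {
        speaking = true;
        return 'speech-start';
      }
    } else if (speaking) {
      silentRun += 1;
      if (silentRun >= hangoverFrames) {
        speaking = false;
        silentRun = 0;
        return 'speech-end';
      }
    }
    return null; // no state change for this frame
  };
}

const gate = createSpeechGate(2);
const decisions = [true, false, true, false, false, false];
console.log(decisions.map((isSpeech) => gate({ isSpeech })));
// [ 'speech-start', null, null, null, 'speech-end', null ]
```

With 30 ms frames, a hangover of 10 frames keeps a segment open across pauses of up to roughly 300 ms.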

### Utility Methods

```javascript
VAD.floatTo16BitPCM(buffer)      // Float32Array → Int16Array
VAD.int16ToFloat(buffer)         // Int16Array → Float32Array
VAD.base64ToInt16(base64)        // Base64 → Int16Array
VAD.int16ToBase64(buffer)        // Int16Array → Base64
VAD.getAvailableEngines()        // List engines
VAD.getEngineCapabilities(name)  // Get engine info
```
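
For readers unfamiliar with the float-to-PCM step, here is an illustrative sketch of how such a conversion is commonly written: clamp each sample to [-1, 1], then scale to the signed 16-bit range. This is not the source of `VAD.floatTo16BitPCM`, which may round or scale differently.

```javascript
// Illustrative only: a common Float32 -> Int16 conversion, not the library's code.
function floatToInt16(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i])); // clamp out-of-range samples
    // Negative values scale by 32768, positive by 32767, to cover the full range
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

console.log(Array.from(floatToInt16(new Float32Array([0, 1, -1, 2, 0.5]))));
// [ 0, 32767, -32768, 32767, 16383 ]
```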

## 🔧 Supported Engines

| Engine | Size | Speed | Accuracy | Best For |
|--------|------|-------|----------|----------|
| fvad | 20KB | ⚡⚡⚡ | ⭐⭐⭐ | Workers, Browser, Node |
| libfvad | 20KB | ⚡⚡⚡ | ⭐⭐⭐ | Browser, Node |
| rnnoise | 100KB | ⚡⚡ | ⭐⭐⭐⭐ | Browser, Node |
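
The table can be turned into a small selection routine when the target runtime is known ahead of time. The sketch below encodes the "Best For" column as data; the runtime labels and the preference order are assumptions for illustration, and `chooseEngine` is a hypothetical helper rather than a worker-vad API.

```javascript
// Engine names and runtimes come from the table; the selection policy is an assumption.
const ENGINE_RUNTIMES = {
  fvad: ['workers', 'browser', 'node'],
  libfvad: ['browser', 'node'],
  rnnoise: ['browser', 'node'],
};

// Hypothetical helper (not part of worker-vad): pick an engine name for a runtime.
function chooseEngine(runtime, { preferAccuracy = false } = {}) {
  const order = preferAccuracy
    ? ['rnnoise', 'fvad', 'libfvad'] // larger but rated more accurate first
    : ['fvad', 'libfvad', 'rnnoise']; // smallest and fastest first
  const match = order.find((name) => ENGINE_RUNTIMES[name].includes(runtime));
  if (!match) throw new Error(`No engine listed for runtime: ${runtime}`);
  return match;
}

console.log(chooseEngine('workers', { preferAccuracy: true })); // fvad
console.log(chooseEngine('browser', { preferAccuracy: true })); // rnnoise
```

The returned name could then be passed as the `engine` option to `VAD.create`.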

## 📊 Performance

- Processing Speed: < 0.1ms per 30ms frame
- Bundle Size: 20KB (fvad engine)
- Memory Usage: < 1MB per instance
- Latency: < 50ms for real-time

## 🌐 Browser Support

- ✅ Chrome/Edge (latest)
- ✅ Firefox (latest)
- ✅ Safari (latest)
- ✅ Node.js 14+
- ✅ Cloudflare Workers

## 📝 Examples

See the examples directory for:

- Real-time microphone detection
- WebSocket streaming
- Batch processing
- Engine comparison

## 🤝 Contributing

Contributions welcome! Please read CONTRIBUTING.md first.

## 📄 License

## 🙏 Acknowledgments

- fvad-wasm - WebRTC VAD
- Cloudflare Workers - Serverless platform
