@restnpeacepk/worker-vad
v1.0.5
Universal Voice Activity Detection SDK for WebAssembly - supports multiple VAD engines with a unified API
# worker-vad
Universal Voice Activity Detection SDK - Multiple WASM engines, one simple API
Detect speech in audio streams with WebAssembly-powered engines. Perfect for Cloudflare Workers, browsers, and Node.js.
## ✨ Features
- 🎯 Unified API - One interface for all VAD engines
- 🔄 Multiple Engines - fvad, libfvad, rnnoise support
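
```javascript
import { VAD } from '@restnpeacepk/worker-vad';

// Create VAD instance
const vad = await VAD.create({ sampleRate: 16000, mode: 'aggressive' });

// Process audio (audioData: Int16Array of 16-bit PCM, e.g. 480 samples = 30 ms at 16 kHz)
const result = vad.process(audioData);
if (result.isSpeech) {
  console.log('Speech detected!');
}

// Cleanup
vad.destroy();
```
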
## 📖 Usage
### Basic Example
```javascript
import { VAD } from '@restnpeacepk/worker-vad';
const vad = await VAD.create({ sampleRate: 16000 });
const audioData = new Int16Array(480); // 480 samples = 30 ms at 16 kHz (silence)
const result = vad.process(audioData);
console.log(result.isSpeech); // true/false
console.log(result.probability); // 0.0 - 1.0
```

### Web Audio API

```javascript
import { VAD } from '@restnpeacepk/worker-vad';
// Get microphone
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 16000 });
const source = audioContext.createMediaStreamSource(stream);
// Create VAD
const vad = await VAD.create({ sampleRate: 16000 });
// Process audio
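// ScriptProcessorNode is deprecated in favor of AudioWorklet; it is used here for brevity.
// Each callback delivers 4096 samples (256 ms at 16 kHz).
const processor = audioContext.createScriptProcessor(4096, 1, 1);
processor.onaudioprocess = (e) => {
  const float32 = e.inputBuffer.getChannelData(0);
  const pcm = VAD.floatTo16BitPCM(float32);
  const result = vad.process(pcm);
  if (result.isSpeech) {
    console.log('Speaking!');
  }
};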
source.connect(processor);
processor.connect(audioContext.destination);
```

### Cloudflare Workers

```javascript
import { VAD } from '@restnpeacepk/worker-vad';
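export default {
  async fetch(request) {
    const vad = await VAD.create({
      engine: 'fvad',
      sampleRate: 16000
    });
    try {
      // Request body: raw 16-bit PCM. An Int16Array view needs an even byte length.
      const audioBuffer = await request.arrayBuffer();
      if (audioBuffer.byteLength % 2 !== 0) {
        return new Response('Body must be 16-bit PCM (even byte length)', { status: 400 });
      }
      const result = vad.process(new Int16Array(audioBuffer));
      return Response.json(result);
    } finally {
      // Free the instance even if processing throws.
      vad.destroy();
    }
  }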
};
```

## 🎛️ API Reference
### VAD.create(options)

Create a new VAD instance.

Options:
- `engine` - Engine to use (`'auto'`, `'fvad'`, `'libfvad'`, `'rnnoise'`)
- `sampleRate` - Audio sample rate in Hz (8000, 16000, 32000, 48000)
- `mode` - VAD sensitivity (`'quality'`, `'low'`, `'aggressive'`, `'very-aggressive'`)
- `frameDuration` - Frame duration in ms (10, 20, 30)

Returns: `Promise<VAD>`
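
This README does not say whether `vad.process()` accepts buffers of any length: the Basic Example passes exactly one 30 ms frame (480 samples at 16 kHz), while the Web Audio example passes 4096-sample blocks. If your engine expects whole frames, a small buffer can re-chunk input into frames of `sampleRate * frameDuration / 1000` samples. `FrameBuffer` is a hypothetical helper sketched under that assumption; it is not part of the package.

```javascript
// Hypothetical helper (not part of worker-vad): re-chunks Int16Array blocks of
// arbitrary length into fixed-size frames for vad.process().
class FrameBuffer {
  constructor(sampleRate = 16000, frameDuration = 30) {
    // Samples per frame, e.g. 16000 Hz * 30 ms / 1000 = 480.
    this.frameSize = (sampleRate * frameDuration) / 1000;
    this.pending = new Int16Array(0);
  }

  // Append samples and return every complete frame now available.
  push(samples) {
    const combined = new Int16Array(this.pending.length + samples.length);
    combined.set(this.pending, 0);
    combined.set(samples, this.pending.length);

    const frames = [];
    let offset = 0;
    while (offset + this.frameSize <= combined.length) {
      // slice() copies, so each frame owns its own memory.
      frames.push(combined.slice(offset, offset + this.frameSize));
      offset += this.frameSize;
    }
    // Keep the leftover samples for the next call.
    this.pending = combined.slice(offset);
    return frames;
  }
}
```

For example, with `const framer = new FrameBuffer(16000, 30)` inside the Web Audio `onaudioprocess` handler, each call to `framer.push(VAD.floatTo16BitPCM(float32))` returns frames you can pass to `vad.process()` one by one; a 4096-sample block yields 8 frames of 480 samples, with 256 samples carried over to the next callback.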
### vad.process(audioData)

Process audio data.

Parameters:
- `audioData` - `Int16Array` of 16-bit PCM audio samples
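
`vad.process()` takes 16-bit integers, while Web Audio delivers `Float32Array` samples nominally in [-1, 1]; `VAD.floatTo16BitPCM` performs this conversion for you. For reference, a common approach clamps each sample and scales it to the Int16 range. The sketch below assumes that approach; it is not the package's source, and `floatToInt16` is a hypothetical name.

```javascript
// Sketch of a Float32 -> Int16 PCM conversion (assumed approach; the exact
// scaling used by VAD.floatTo16BitPCM may differ).
function floatToInt16(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // Clamp to [-1, 1] so out-of-range samples saturate instead of wrapping.
    const s = Math.max(-1, Math.min(1, float32[i]));
    // Negative values scale toward -32768, positive toward 32767;
    // assignment into an Int16Array truncates the fractional part.
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```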
Returns:
```typescript
{
  isSpeech: boolean,
  probability: number, // 0.0 - 1.0
  timestamp: number,
  processingTime: number,
  engine: string,
  metadata: object
}
```

### Utility Methods
```javascript
VAD.floatTo16BitPCM(buffer)      // Float32Array → Int16Array
VAD.int16ToFloat(buffer)         // Int16Array → Float32Array
VAD.base64ToInt16(base64)        // Base64 string → Int16Array
VAD.int16ToBase64(buffer)        // Int16Array → Base64 string
VAD.getAvailableEngines()        // List available engines
VAD.getEngineCapabilities(name)  // Get info for one engine
```

## 🔧 Supported Engines
| Engine | Size | Speed | Accuracy | Best For |
|--------|------|-------|----------|----------|
| fvad | 20 KB | ⚡⚡⚡ | ⭐⭐⭐ | Workers, Browser, Node |
| libfvad | 20 KB | ⚡⚡⚡ | ⭐⭐⭐ | Browser, Node |
| rnnoise | 100 KB | ⚡⚡ | ⭐⭐⭐⭐ | Browser, Node |
## 📊 Performance
- Processing Speed: < 0.1 ms per 30 ms frame
- Bundle Size: 20 KB (fvad engine)
- Memory Usage: < 1 MB per instance
- Latency: < 50 ms for real-time
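
These figures depend on hardware, engine, and frame size. To check them on your own setup, time a batch of frames. `measureFrameTiming` is a hypothetical helper (not part of the package) that accepts any per-frame function, for example `(f) => vad.process(f)`. The real-time factor is processing time divided by audio duration, so 0.1 ms per 30 ms frame is about 0.0033, roughly 300× faster than real time. `performance.now()` is a global in browsers and Node.js 16+ (on Node.js 14, import `performance` from `perf_hooks`); Cloudflare Workers does not advance its clock during pure computation, so run such benchmarks elsewhere.

```javascript
// Hypothetical benchmark helper (not part of worker-vad).
// processFrame: any function that handles one frame, e.g. (f) => vad.process(f).
function measureFrameTiming(processFrame, frames, frameDurationMs = 30) {
  if (frames.length === 0) {
    throw new RangeError('frames must not be empty');
  }
  const start = performance.now();
  for (const frame of frames) {
    processFrame(frame);
  }
  const elapsed = performance.now() - start;
  const msPerFrame = elapsed / frames.length;
  return {
    msPerFrame,
    // Fraction of real time spent processing; 0.1 ms / 30 ms ≈ 0.0033.
    realTimeFactor: msPerFrame / frameDurationMs,
  };
}
```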
## 🌐 Platform Support
- ✅ Chrome/Edge (latest)
- ✅ Firefox (latest)
- ✅ Safari (latest)
- ✅ Node.js 14+
- ✅ Cloudflare Workers
## 📝 Examples
See the `examples` directory for:
- Real-time microphone detection
- WebSocket streaming
- Batch processing
- Engine comparison
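
The examples themselves are not reproduced here. As one illustration of batch post-processing, per-frame `isSpeech` flags from consecutive frames can be merged into time-stamped speech segments. `toSegments` is a hypothetical helper, not part of the package, and assumes the frames are contiguous and all of the same duration.

```javascript
// Hypothetical helper (not part of worker-vad): merges consecutive per-frame
// isSpeech flags into { startMs, endMs } segments (endMs is exclusive).
function toSegments(flags, frameDurationMs = 30) {
  const segments = [];
  let start = -1; // index of the first frame in the current speech run, -1 if none
  // Loop one step past the end so a run that reaches the last frame is closed.
  for (let i = 0; i <= flags.length; i++) {
    const speech = i < flags.length && flags[i];
    if (speech && start < 0) {
      start = i;
    } else if (!speech && start >= 0) {
      segments.push({ startMs: start * frameDurationMs, endMs: i * frameDurationMs });
      start = -1;
    }
  }
  return segments;
}
```

With flags collected as `frames.map((frame) => vad.process(frame).isSpeech)`, `toSegments(flags, 30)` returns one entry per continuous run of speech frames.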
## 🤝 Contributing
Contributions welcome! Please read `CONTRIBUTING.md` first.
## 📄 License
MIT © Your Name
## 🙏 Acknowledgments
- fvad-wasm - WebRTC VAD
- Cloudflare Workers - Serverless platform
