@jorastechnologies/silvero-vad-js
v0.1.0
Published
Pure-JS voice activity detection (SileroVAD v5) — no WASM, no ONNX Runtime, works on iOS Safari
Maintainers
Readme
silvero-vad-js
Pure-JavaScript SileroVAD v5 inference engine. No WASM, no ONNX Runtime — runs on all browsers with AudioWorklet support. Matches the official silero-vad Python wrapper's output frame-by-frame.
Why this exists
Every JS SileroVAD library depends on onnxruntime-web (WASM). That works on most browsers, but ONNX Runtime WASM has been broken on iOS Safari since iOS 16.4+, silently failing on iPhones. This engine reimplements the model's forward pass in vanilla JS — no WASM dependency at all — so it works everywhere: Chrome, Firefox, Safari (desktop and iOS), and any browser that supports AudioWorklet.
Install
npm install @jorastechnologies/silvero-vad-jsSetup
Copy the runtime assets to your app's public/static directory:
cp node_modules/@jorastechnologies/silvero-vad-js/weights/silero_vad_v5.bin public/vad/
cp node_modules/@jorastechnologies/silvero-vad-js/weights/silero_vad_v5.manifest.json public/vad/
cp node_modules/@jorastechnologies/silvero-vad-js/src/vad_processor.js public/vad/The weight files (~1.2 MB) and AudioWorklet script must be served as static files — they are fetched by the browser at runtime and cannot be inlined into a JS bundle.
Usage
import { VADRecorder } from '@jorastechnologies/silvero-vad-js';
const rec = new VADRecorder({
weightsBinUrl: '/vad/silero_vad_v5.bin',
weightsManifestUrl: '/vad/silero_vad_v5.manifest.json',
workletUrl: '/vad/vad_processor.js',
threshold: 0.5,
silenceMs: 480,
});
rec.addEventListener('speech-start', () => console.log('speaking'));
rec.addEventListener('speech-end', (e) => sendToSTT(e.detail.audio));
await rec.start();speech-end carries the accumulated audio as a Float32Array at 16kHz (event.detail.audio + event.detail.sampleRate).
Deploying inside an existing Flask + nginx app
Drop the runtime files into Flask's static folder:
your-flask-app/static/vad/
├── silero_vad_v5.bin # ~1.2 MB (16kHz-only weights)
├── silero_vad_v5.manifest.json # ~1 KB
├── vad_processor.js # AudioWorklet — MUST be same-origin
├── silero_vad.js
├── weights_loader.js
├── ops.js
├── vad_recorder.js
└── index.jsDeploy script:
cp node_modules/@jorastechnologies/silvero-vad-js/src/*.js your-flask-app/static/vad/
cp node_modules/@jorastechnologies/silvero-vad-js/weights/silero_vad_v5.bin your-flask-app/static/vad/
cp node_modules/@jorastechnologies/silvero-vad-js/weights/silero_vad_v5.manifest.json your-flask-app/static/vad/Template snippet:
<script type="module">
import { VADRecorder } from "{{ url_for('static', filename='vad/index.js') }}";
const rec = new VADRecorder({
weightsBinUrl: "{{ url_for('static', filename='vad/silero_vad_v5.bin') }}",
weightsManifestUrl: "{{ url_for('static', filename='vad/silero_vad_v5.manifest.json') }}",
workletUrl: "{{ url_for('static', filename='vad/vad_processor.js') }}",
});
rec.addEventListener('speech-end', (e) => sendToSTT(e.detail.audio));
await rec.start();
</script>nginx checklist
.jsMIME type —curl -I https://yourdomain/static/vad/vad_processor.jsmust returnContent-Type: application/javascript(ortext/javascript). Defaultmime.typescovers this.Long cache on weights:
location ~* /static/vad/.*\.(bin|json)$ { expires 1y; add_header Cache-Control "public, immutable"; }On model upgrades, version via query string (
silero_vad_v5.bin?v=2) — don't rename the file.Same-origin for the worklet — AudioWorklet refuses cross-origin module loads even with CORS. Serving from
/static/vad/(same origin as the page) is the simplest fix.
iOS Safari requirements
- HTTPS is mandatory — iOS blocks
getUserMediaon plain HTTP from non-localhost hosts. Use Caddy + Let's Encrypt, nginx + certbot, or Cloudflare in front of nginx. - iOS Safari 14.5+ is required for AudioWorklet.
- The
AudioContextsample rate will be 48kHz on iOS; Chrome on macOS returns 16kHz when requested. Both work — the engine doesn't care about the source rate because theAudioContextis explicitly constructed at 16kHz and the browser resamples.
Architecture
See docs/ARCHITECTURE.md for the per-layer breakdown of the 16kHz forward pass, including the non-obvious STFT input assembly (prepend 64 context samples, right-reflect by 64, total 640 samples into the STFT conv).
The engine ignores the 8kHz path in the ONNX graph and implements only the 16kHz branch — that's all iPhone audio capture needs.
Development
Contributing or regenerating weights from the ONNX source:
git clone <repo-url> && cd silvero-vad-js
npm install
python3 -m pip install -r scripts/requirements.txt
# One-time: download ONNX model + generate weights + golden fixtures
curl -L -o weights/silero_vad_v5.onnx \
https://github.com/snakers4/silero-vad/raw/master/src/silero_vad/data/silero_vad.onnx
npm run weights:export
npm run weights:golden
npm test # run all tests (28 pass)
npm run test:watch # TDD
npm run demo # http://localhost:8000/examples/index.htmlAll primitive ops (src/ops.js) are pure functions with unit tests. The integration test (tests/silero_vad.test.js) validates the forward pass against the official silero-vad Python wrapper output within 1e-4 per frame.
License
MIT (matches the SileroVAD model license).
