omnivad

v0.2.12

Cross-platform Voice Activity Detection and Audio Event Detection via WebAssembly. Runs in browsers, Web Workers, and Node.js. Built on FireRedVAD. Whisper-ready chunking included.


Cross-platform Voice Activity Detection and Audio Event Detection via WebAssembly. Runs in browsers, Web Workers, and Node.js with a single API. Zero runtime dependencies. Built on FireRedVAD from Xiaohongshu (DFSMN architecture, ~2.2 MB per model).

What's in the box

| Class | Use case | Output |
|-------|----------|--------|
| OmniVAD | Whole-audio voice activity detection | [start, end] timestamps |
| OmniStreamVAD | Real-time, frame-by-frame VAD with segment-boundary events | per-frame probability + start/end events |
| OmniAED | Audio event detection (3-class) | speech / singing / music timestamps |
| mergeChunks | Pack VAD output into Whisper-style 30 s chunks | { start, end, segStartIdx, segCount }[] |

All four share one WASM module (~2.2 MB SIMD-enabled), one C implementation, and a single bundle (~24 KB JS, ESM + CJS + types).

Install

pnpm add omnivad     # or: npm install omnivad / yarn add omnivad

Models are served from jsDelivr by default (zero config). For air-gapped or custom deployments, pass modelUrl or pre-loaded modelData.

Quickstart — whole-audio VAD

import { OmniVAD } from "omnivad";

const vad = await OmniVAD.create();

// Float32Array in [-1, 1] (Web Audio, decodeAudioData) or Int16Array (raw PCM)
const result = vad.detect(audioFloat32);
// { duration: 12.4, timestamps: [[0.35, 4.8], [5.1, 12.4]] }
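
As a rough illustration of working with that result shape, the sketch below totals the detected speech and its share of the audio. `VadResult` and `summarizeSpeech` are hypothetical names introduced here, not part of omnivad; the field names are taken from the comment in the quickstart.

```typescript
// Shape of the result shown in the quickstart comment (assumed, not imported).
interface VadResult {
  duration: number;               // total audio length in seconds
  timestamps: [number, number][]; // [start, end] pairs in seconds
}

// Hypothetical helper: total speech time and its share of the whole audio.
function summarizeSpeech(result: VadResult): { speechSecs: number; speechRatio: number } {
  let speechSecs = 0;
  for (const [start, end] of result.timestamps) {
    speechSecs += Math.max(0, end - start); // ignore malformed, negative spans
  }
  const speechRatio = result.duration > 0 ? speechSecs / result.duration : 0;
  return { speechSecs, speechRatio };
}

// Using the numbers from the quickstart comment: about 11.75 s of speech.
const quickstartSummary = summarizeSpeech({
  duration: 12.4,
  timestamps: [[0.35, 4.8], [5.1, 12.4]],
});
console.log(quickstartSummary.speechSecs.toFixed(2), quickstartSummary.speechRatio.toFixed(2));
```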

Streaming VAD — real-time, frame-by-frame

OmniStreamVAD processes 10 ms frames (160 samples @ 16 kHz) and emits segment-boundary events on the same call that confirms the boundary — bit-identical to upstream FireRedVAD's FireRedStreamVad.

processFrame() accepts Float32Array in [-1, 1] (Web Audio, AudioWorkletProcessor, decoded WebRTC tracks) or Int16Array PCM (WAV / microphone). Dispatch is by dtype — no scaling in JS.

import { OmniStreamVAD } from "omnivad";

const vad = await OmniStreamVAD.create();

// Float32Array [-1, 1] from Web Audio:
for (let i = 0; i + 160 <= floatPcm.length; i += 160) {
  const r = vad.processFrame(floatPcm.subarray(i, i + 160));
  if (!r) continue;
  if (r.isSpeechStart) console.log(`START @ ${(r.speechStartFrame * 0.01).toFixed(2)}s`);
  if (r.isSpeechEnd)   console.log(`END   @ ${(r.speechEndFrame   * 0.01).toFixed(2)}s`);
}

// Or Int16Array PCM from a WAV file — same call, same result:
for (let i = 0; i + 160 <= int16Pcm.length; i += 160) {
  vad.processFrame(int16Pcm.subarray(i, i + 160));
}
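
The loops above assume the PCM is already in one contiguous buffer. Live sources often deliver blocks of other sizes (an AudioWorkletProcessor, for instance, hands out 128-sample blocks by default), so some re-framing is needed before calling processFrame(). A minimal sketch, assuming the audio is already 16 kHz mono; `FrameAssembler` is a hypothetical helper, not part of omnivad:

```typescript
const FRAME_SAMPLES = 160; // 10 ms at 16 kHz, as stated above

// Hypothetical helper: buffers arbitrary-size blocks and hands out
// exact 160-sample frames.
class FrameAssembler {
  private pending = new Float32Array(FRAME_SAMPLES);
  private filled = 0;

  // Feed one block of any length; onFrame runs once per completed frame.
  push(block: Float32Array, onFrame: (frame: Float32Array) => void): void {
    let offset = 0;
    while (offset < block.length) {
      const take = Math.min(FRAME_SAMPLES - this.filled, block.length - offset);
      this.pending.set(block.subarray(offset, offset + take), this.filled);
      this.filled += take;
      offset += take;
      if (this.filled === FRAME_SAMPLES) {
        onFrame(this.pending.slice()); // copy, so the caller may keep the frame
        this.filled = 0;
      }
    }
  }
}
```

In use, each completed frame would be passed on with something like `assembler.push(block, (frame) => vad.processFrame(frame))`. Leftover samples stay buffered until the next block arrives.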

processFrame() returns { confidence, smoothedProb, isSpeech, isSpeechStart, isSpeechEnd, frameIdx, speechStartFrame, speechEndFrame } — every field comes straight from the C state machine.
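
To turn those per-frame events into whole segments, one option is to pair each start event with the next end event. The sketch below assumes the boundary fields are booleans and frame indices, as the example above suggests, and that a frame index times 0.01 gives seconds; `StreamFrameResult` and `collectSegments` are hypothetical names, not part of omnivad.

```typescript
// Subset of the processFrame() result fields listed above (types assumed).
interface StreamFrameResult {
  isSpeechStart: boolean;
  isSpeechEnd: boolean;
  speechStartFrame: number;
  speechEndFrame: number;
}

const SECS_PER_FRAME = 0.01; // 10 ms frames

// Hypothetical helper: folds boundary events into [start, end] pairs in seconds.
function collectSegments(results: (StreamFrameResult | null)[]): [number, number][] {
  const segments: [number, number][] = [];
  let openStart: number | null = null;
  for (const r of results) {
    if (!r) continue; // mirrors the `if (!r) continue` guard in the example
    if (r.isSpeechStart) openStart = r.speechStartFrame * SECS_PER_FRAME;
    if (r.isSpeechEnd && openStart !== null) {
      segments.push([openStart, r.speechEndFrame * SECS_PER_FRAME]);
      openStart = null; // wait for the next start event
    }
  }
  return segments;
}
```

A segment that is still open when the input ends is dropped here; a real consumer might prefer to close it at the last frame seen.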

Audio Event Detection — speech / singing / music

import { OmniAED } from "omnivad";

const aed = await OmniAED.create();
const events = aed.detect(audioFloat32);
// { duration: 22.0,
//   events: { speech: [[...]], singing: [[...]], music: [[...]] },
//   ratios: { speech: 0.41, singing: 0.0, music: 0.59 } }
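
A small sketch of consuming that output: picking the class with the largest ratio, for example to label a clip. `AedRatios` and `dominantEvent` are hypothetical names introduced here; the three keys come from the comment above.

```typescript
// Ratio fields as shown in the example output (assumed shape).
type AedRatios = { speech: number; singing: number; music: number };

// Hypothetical helper: returns the class with the highest ratio.
// Ties resolve to the earlier class in speech, singing, music order.
function dominantEvent(ratios: AedRatios): keyof AedRatios {
  let best: keyof AedRatios = "speech";
  for (const key of ["speech", "singing", "music"] as const) {
    if (ratios[key] > ratios[best]) best = key;
  }
  return best;
}

console.log(dominantEvent({ speech: 0.41, singing: 0.0, music: 0.59 })); // prints "music"
```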

Whisper / WhisperX-style chunking

OmniVAD + mergeChunks(mode: "greedy") is the 1:1 equivalent of WhisperX's Binarize(max_duration=chunk_size) + greedy packing. Use this recipe when feeding chunks into Whisper-family ASR models that expect a fixed 30 s window:

import { OmniVAD, mergeChunks } from "omnivad";

const vad = await OmniVAD.create();                 // threshold=0.4 default — safer for Whisper
const result = vad.detect(audioFloat32);

const chunks = await mergeChunks(result.timestamps, {
  maxChunkSecs:    30.0,                            // Whisper input window
  mode:            "greedy",                        // WhisperX behavior
  padOnsetSecs:    0.04,
  padOffsetSecs:   0.04,
  minSilenceSecs:  0.20,
});
// Slice the audio at [chunk.start, chunk.end] and feed each slice to Whisper.

A second mode "longest_gap" exists for variable-length-input models (forced alignment, TTS) — see the GitHub README for the comparison table.
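
The slicing step at the end of the greedy recipe can be sketched as follows. This assumes 16 kHz mono input and chunk objects with `start` and `end` in seconds; `TimeSpan` and `sliceChunks` are hypothetical names, not part of omnivad.

```typescript
interface TimeSpan {
  start: number; // seconds
  end: number;   // seconds
}

// Hypothetical helper: returns one Float32Array view per chunk.
// subarray() shares memory with `audio`, so nothing is copied.
function sliceChunks(audio: Float32Array, spans: TimeSpan[], sampleRate = 16000): Float32Array[] {
  return spans.map(({ start, end }) => {
    const from = Math.max(0, Math.round(start * sampleRate));
    const to = Math.min(audio.length, Math.round(end * sampleRate));
    return audio.subarray(from, Math.max(from, to)); // never a negative length
  });
}
```

Each returned slice can then be passed to the ASR model. If the model mutates its input, copy the view with `slice()` first.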

Multi-stream concurrency

OmniStreamVAD instances have mutable per-stream state and must not be shared across concurrent streams. Use clone() to spin up a fresh instance that shares the underlying model weights but has its own state — instant, near-zero memory overhead per stream.

const base = await OmniStreamVAD.create();
const streamA = base.clone();
const streamB = base.clone();
// Process two independent audio sessions in parallel.
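
For a server handling many sessions, one way to apply this is a registry that clones the base instance lazily, once per session id. This is a sketch only: `Cloneable` and `StreamRegistry` are hypothetical names, and whether a finished stream needs explicit cleanup beyond dropping the reference is not covered here.

```typescript
// Anything with a clone() method, such as the base instance above.
interface Cloneable<T> {
  clone(): T;
}

// Hypothetical helper: one lazily cloned stream per session id.
class StreamRegistry<T extends Cloneable<T>> {
  private streams = new Map<string, T>();
  private base: T;

  constructor(base: T) {
    this.base = base;
  }

  // Returns the session's stream, cloning the base on first use.
  get(sessionId: string): T {
    let stream = this.streams.get(sessionId);
    if (!stream) {
      stream = this.base.clone();
      this.streams.set(sessionId, stream);
    }
    return stream;
  }

  // Forgets a session; returns false if it was unknown.
  release(sessionId: string): boolean {
    return this.streams.delete(sessionId);
  }
}
```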

Models and CDN

By default, models are fetched from jsDelivr:

https://cdn.jsdelivr.net/npm/omnivad@<version>/models/{vad,stream-vad,aed}.omnivad
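
For reference, the pattern above expands as in this sketch. `defaultModelUrl` is a hypothetical helper that only mirrors the documented URL shape; it is not an omnivad export.

```typescript
// The three model file names from the URL pattern above.
type ModelName = "vad" | "stream-vad" | "aed";

// Hypothetical helper: fills in the documented jsDelivr URL pattern.
function defaultModelUrl(version: string, model: ModelName): string {
  return `https://cdn.jsdelivr.net/npm/omnivad@${version}/models/${model}.omnivad`;
}

console.log(defaultModelUrl("0.2.12", "stream-vad"));
// prints "https://cdn.jsdelivr.net/npm/omnivad@0.2.12/models/stream-vad.omnivad"
```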

Override per call when you need to host them yourself or pre-bundle:

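// Option 1: point at a self-hosted model file.
const vad = await OmniVAD.create({ modelUrl: "https://your-cdn/vad.omnivad" });

// Option 2: pass model bytes you have already loaded.
const vadFromBytes = await OmniVAD.create({ modelData: arrayBufferYouAlreadyHave });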

In Node.js, models are read from the installed package (omnivad/models/) — no network access required at runtime.

Performance

Real-Time Factor (lower = faster) on Apple M-series:

| Model | RTF | Speed |
|-------|-----|-------|
| VAD | ~0.003 | ~330× real-time |
| Streaming VAD | ~0.002 | ~500× real-time |
| AED | ~0.002 | ~500× real-time |

WASM is built with SIMD enabled and ncnn fp16 weights.
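
The RTF and speed columns are two views of the same measurement: RTF is processing time divided by audio duration, and the speed multiple is its reciprocal. A minimal sketch for reproducing such figures from your own timings (function names are hypothetical, and the timing numbers below are illustrative, not measured):

```typescript
// Real-Time Factor: seconds of compute per second of audio (lower = faster).
function realTimeFactor(processingSecs: number, audioSecs: number): number {
  if (audioSecs <= 0) throw new RangeError("audioSecs must be positive");
  return processingSecs / audioSecs;
}

// Speed multiple: how many seconds of audio are processed per second of compute.
function speedMultiple(rtf: number): number {
  if (rtf <= 0) throw new RangeError("rtf must be positive");
  return 1 / rtf;
}

// Illustrative numbers: 60 s of audio processed in 0.18 s gives an RTF of
// about 0.003, which is roughly 333x real-time.
const exampleRtf = realTimeFactor(0.18, 60);
console.log(exampleRtf.toFixed(3), Math.round(speedMultiple(exampleRtf)));
```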

Accuracy

Verified bit-identical to upstream PyTorch reference on 5 audio files × 3 models — see the accuracy table in the main repo.

Browser, Worker, Node — same API

The package detects its runtime and loads the right glue:

  • Browsers (main thread): classic-script injection of the Emscripten glue (works around MODULARIZE=1 IIFE issues with import()).
  • Web Workers / ServiceWorkers: same path via importScripts.
  • Node.js (≥ 18): createRequire + local CJS resolution. No bundler config needed.
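
How the package tells these environments apart is not documented here. The sketch below shows common heuristics for the same three-way split and should be read as an assumption, not as omnivad's actual checks; `Runtime` and `detectRuntime` are hypothetical names.

```typescript
type Runtime = "browser" | "worker" | "node" | "unknown";

// Common heuristics (assumed, not taken from omnivad's source).
// The global object is passed in so the function is easy to test.
function detectRuntime(g: Record<string, unknown>): Runtime {
  const proc = g["process"] as { versions?: { node?: string } } | undefined;
  if (proc?.versions?.node) return "node";                      // Node.js exposes process.versions.node
  if (typeof g["importScripts"] === "function") return "worker"; // worker scopes expose importScripts
  if (typeof g["document"] !== "undefined") return "browser";    // main thread has a document
  return "unknown";
}
```

In real code the argument would be the global object, for example `detectRuntime(globalThis as unknown as Record<string, unknown>)`.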

Credits

  • FireRedVAD — Kaituo Xu, Wenpeng Li, Kai Huang, Kun Liu (Xiaohongshu). Source models, DFSMN architecture, training pipeline.
  • ncnn — Tencent. Inference backend.
  • Emscripten — WebAssembly toolchain.

License

Apache-2.0 — same as upstream FireRedVAD.