
onnx-asr-web

v0.1.3

JavaScript ONNX ASR for Node.js and browser

onnx-asr-web

JavaScript ONNX ASR for Node.js and browser using onnxruntime-web. This package was heavily inspired by the Python package istupakov/onnx-asr and aims to be a minimal way to achieve state-of-the-art automatic speech recognition in JavaScript.

Online Demo Here!

Features

  • Loads models from Hugging Face or local directories
  • Auto-detects the model type from the files present
  • Supports quantized models
  • Works with WAV files/buffers
  • Uses Voice Activity Detection (VAD) for long-form speech-to-text
  • Extracts word-level timestamps
  • Minimal dependencies (just onnxruntime-web)
  • Full types

Supported Model Types

  • Nvidia Parakeet, Canary, FastConformer, and Conformer
  • OpenAI Whisper
  • GigaChat GigaAM
  • Kaldi Icefall Zipformer
  • T-Tech T-one
  • Custom CTC, RNNT, TDT, and Transformer models

Install

npm install onnx-asr-web

onnxruntime-web must be 1.24.x or newer. Earlier versions can fail on some models (notably browser VAD graphs).

API Reference

Generated API docs are published in API.md. They are emitted from the TypeScript declaration output during npm run build, so they stay aligned with the shipped package surface.

Node.js

import {
  loadLocalModel,
  loadHuggingfaceModel,
  loadLocalVadModel,
  loadHuggingfaceVadModel,
} from "onnx-asr-web/node";

const vad = await loadHuggingfaceVadModel("onnx-community/silero-vad", {
  cacheDir: "models",
  quantization: "int8",
});

const local = await loadLocalModel("models/istupakov/parakeet-tdt-0.6b-v3-onnx", {
  quantization: "int8", // default: prefers *.int8.onnx, falls back to *.onnx
  sessionOptions: { executionProviders: ["wasm"] },
  vadModel: vad, // optional: chunks long audio by non-speech
});

const hf = await loadHuggingfaceModel("istupakov/parakeet-tdt-0.6b-v3-onnx", {
  cacheDir: "models",
  quantization: "int8",
  revision: "main",
});

loadHuggingfaceModel() downloads into ${cacheDir}/${repo_id} and reuses cached files.
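
The cache layout can be sketched as follows. The helper below is hypothetical, written only to illustrate the ${cacheDir}/${repo_id} convention; it is not part of the package's API.

```javascript
// Hypothetical helper illustrating the ${cacheDir}/${repo_id} cache layout.
import { join } from "node:path";
import { existsSync } from "node:fs";

function cachedFilePath(cacheDir, repoId, fileName) {
  // Cached files mirror the Hugging Face repo layout under the cache dir.
  return join(cacheDir, repoId, fileName);
}

const p = cachedFilePath(
  "models",
  "istupakov/parakeet-tdt-0.6b-v3-onnx",
  "encoder-model.onnx",
);
// A loader following this layout would only re-download when the file is missing.
const needsDownload = !existsSync(p);
```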

Browser

import {
  configureOrtWeb,
  loadLocalModel,
  loadHuggingfaceModel,
  loadHuggingfaceVadModel,
} from "onnx-asr-web/browser";

configureOrtWeb({ wasmPaths: "/node_modules/onnxruntime-web/dist/" });

const vad = await loadHuggingfaceVadModel("onnx-community/silero-vad");
const modelA = await loadLocalModel("/models/parakeet-tdt-0.6b-v3-onnx/", { vadModel: vad });
const modelB = await loadHuggingfaceModel("istupakov/parakeet-tdt-0.6b-v3-onnx");

Transcription

const result = await model.transcribeWavBuffer(await file.arrayBuffer());
console.log(result.text);
console.log(result.words); // [{word, start, end}] in seconds

Use transcribeWavBuffer() when you have a real WAV file as bytes, for example from:

  • a browser file input
  • fs.readFile() in Node.js
  • a downloaded .wav asset

Use transcribeSamples() when you already have decoded mono PCM samples and know the sample rate:

const result = await model.transcribeSamples(float32Samples, sampleRate);
console.log(result.text);

This is usually the right choice when audio is coming from:

  • Web Audio API decoding such as AudioBuffer.getChannelData(...)
  • microphone capture pipelines that already produce PCM chunks
  • custom preprocessing or resampling code
  • non-WAV formats that you decode yourself before transcription

transcribeSamples() expects normalized mono PCM, typically a Float32Array with values in [-1, 1], plus the input sample rate. The library will resample internally when needed.
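
A minimal sketch of that preprocessing (the helper below is illustrative, not part of onnx-asr-web): downmixing interleaved 16-bit stereo PCM into the normalized mono Float32Array the method expects.

```javascript
// Illustrative helper: average interleaved int16 channels to mono and
// scale the int16 range into [-1, 1].
function toMonoFloat32(int16Interleaved, channels = 2) {
  const frames = int16Interleaved.length / channels;
  const mono = new Float32Array(frames);
  for (let i = 0; i < frames; i++) {
    let sum = 0;
    for (let c = 0; c < channels; c++) sum += int16Interleaved[i * channels + c];
    mono[i] = sum / channels / 32768;
  }
  return mono;
}

const stereo = Int16Array.from([16384, 16384, -32768, -32768]);
const mono = toMonoFloat32(stereo); // → Float32Array [0.5, -1]
```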

transcribeWavBuffer() is just a convenience wrapper: it decodes the WAV container first and then forwards the decoded samples into transcribeSamples().
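
To make the wrapper's first step concrete, here is a toy PCM16 WAV decoder. It is illustrative only and assumes a canonical 44-byte RIFF header; the package's real decoder is internal and handles more cases.

```javascript
// Toy decoder: read sample rate, channel count, and normalized PCM16
// samples from a canonical 44-byte-header WAV buffer.
function decodeWavPcm16(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const numChannels = view.getUint16(22, true);
  const sampleRate = view.getUint32(24, true);
  const dataLen = view.getUint32(40, true);
  const samples = new Float32Array(dataLen / 2);
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(44 + i * 2, true) / 32768;
  }
  return { sampleRate, numChannels, samples };
}

// Build a tiny one-sample mono 16 kHz WAV in memory to exercise it.
const buf = new ArrayBuffer(46);
const v = new DataView(buf);
const ascii = (off, s) => [...s].forEach((ch, i) => v.setUint8(off + i, ch.charCodeAt(0)));
ascii(0, "RIFF"); v.setUint32(4, 38, true); ascii(8, "WAVE");
ascii(12, "fmt "); v.setUint32(16, 16, true);
v.setUint16(20, 1, true);     // PCM format
v.setUint16(22, 1, true);     // mono
v.setUint32(24, 16000, true); // sample rate
v.setUint32(28, 32000, true); // byte rate
v.setUint16(32, 2, true);     // block align
v.setUint16(34, 16, true);    // bits per sample
ascii(36, "data"); v.setUint32(40, 2, true);
v.setInt16(44, 16384, true);  // one sample = 0.5

const wav = decodeWavPcm16(buf);
```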

Model Files

loadLocalModel() expects config.json plus model files referenced by model type:

  • TDT (nemo-conformer-tdt): nemo128.onnx, encoder-model.onnx, decoder_joint-model.onnx, and vocab.txt or tokens.txt
  • RNNT (nemo-conformer-rnnt): encoder-model.onnx, decoder_joint-model.onnx, and vocab.txt or tokens.txt
  • CTC (nemo-conformer-ctc): model.onnx and vocab.txt or tokens.txt
  • Canary AED (nemo-conformer-aed): encoder-model.onnx, decoder-model.onnx, and vocab.txt or tokens.txt
  • FastConformer (nemo-conformer): prefers RNNT split (encoder-model.onnx + decoder_joint-model.onnx) and falls back to CTC model.onnx, with vocab.txt or tokens.txt
  • GigaAM (gigaam): auto-detects v2_*/v3_* files, prefers RNNT triplet (*_rnnt_encoder/decoder/joint) and falls back to CTC (*_ctc.onnx), with v2_vocab.txt/v3_vocab.txt
  • Tone CTC (tone-ctc): model.onnx with vocab from decoder_params.vocabulary in config.json (or vocab.json)
  • Whisper ORT (whisper-ort): *_beamsearch.onnx model, plus vocab.json (and optionally added_tokens.json)
  • Whisper HF (whisper): onnx/encoder_model*.onnx, onnx/decoder_model_merged*.onnx, plus vocab.json (and optionally added_tokens.json)
  • Sherpa transducer (no config): am-onnx/ (or am/) with encoder.onnx, decoder.onnx, joiner.onnx, plus lang/tokens.txt (or tokens.txt)
  • VAD (onnx-community/silero-vad): onnx/model*.onnx (e.g. onnx/model_int8.onnx)

When quantization is enabled (quantization: "int8"), *.int8.onnx is preferred.
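
That preference rule can be sketched like this. The resolver below is hypothetical, not the package's actual code:

```javascript
// Hypothetical sketch of the quantization file preference: prefer
// *.int8.onnx when int8 quantization is requested, else use *.onnx.
function pickModelFile(files, base, quantization) {
  if (quantization === "int8") {
    const candidate = `${base}.int8.onnx`;
    if (files.includes(candidate)) return candidate;
  }
  return `${base}.onnx`; // fall back to the float model
}

const files = ["encoder-model.onnx", "encoder-model.int8.onnx"];
const quantized = pickModelFile(files, "encoder-model", "int8");
const fallback = pickModelFile(["encoder-model.onnx"], "encoder-model", "int8");
```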

For Node Hugging Face downloads, *.onnx.data sidecars are also fetched when present. In browser mode, models are loaded by URL so ONNX Runtime can fetch sidecars automatically.

When vadModel is supplied to loadLocalModel() / loadHuggingfaceModel(), transcription runs on VAD speech chunks and returns a segments array in output.
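
As a sketch of consuming such output, the snippet below stitches segment texts back into one transcript. The { start, end, text } segment shape is an assumption made for the example, not a documented guarantee.

```javascript
// Illustrative only: join per-chunk VAD transcripts into one string,
// assuming each segment carries { start, end, text }.
function joinSegments(segments) {
  return segments
    .map((s) => s.text.trim())
    .filter(Boolean)
    .join(" ");
}

const text = joinSegments([
  { start: 0.0, end: 1.2, text: "hello " },
  { start: 2.5, end: 3.1, text: "world" },
]);
// text === "hello world"
```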

Word timestamps are currently provided for NeMo transducer models. Whisper returns transcript text and token IDs; words is empty.

Examples

Node.js CLI

npm run build
node examples/node/transcribe.mjs --repo-id istupakov/parakeet-tdt-0.6b-v3-onnx --cache-dir models --audio test.wav

Browser UI

npx http-server . # then /examples/browser/index.html

Browser UI (CDN package import)

npx http-server . # then /examples/browser-cdn/index.html

Testing

Run type/syntax checks:

npm run check

Run integration model tests (requires local model folders under models/):

npm test

Build and Publish

Create distributable artifacts:

npm run build

This produces:

  • dist/index.js
  • dist/node.js
  • dist/browser.js

Publish to npm:

npm publish

Contributing

See CONTRIBUTING.md.