onnx-asr-web
v0.1.3

JavaScript ONNX ASR for Node.js and browser
JavaScript ONNX ASR for Node.js and the browser, using `onnxruntime-web`. This package was heavily inspired by the Python istupakov/onnx-asr and aims to be a minimalistic way to achieve state-of-the-art automatic speech recognition with JavaScript.
Features
- Loads models from Hugging Face or local directories
- Autodetects model-type from files
- Supports quantized models
- Works with WAV files/buffers
- Uses Voice Activity Detection (VAD) to do long-form speech-to-text
- Extracts word-level timestamps
- Minimal dependencies (just `onnxruntime-web`)
- Full types
Supported Model Types
- NVIDIA Parakeet, Canary, FastConformer, and Conformer
- OpenAI Whisper
- GigaChat GigaAM
- Kaldi Icefall Zipformer
- T-Tech T-one
- Custom CTC, RNNT, TDT, and Transformer models
Install
```sh
npm install onnx-asr-web
```

`onnxruntime-web` must be 1.24.x or newer. Earlier versions can fail on some models (notably browser VAD graphs).
API Reference
Generated API docs are published in `API.md`. They are emitted from the TypeScript declaration output during `npm run build`, so they stay aligned with the shipped package surface.
Node.js
```js
import {
  loadLocalModel,
  loadHuggingfaceModel,
  loadLocalVadModel,
  loadHuggingfaceVadModel,
} from "onnx-asr-web/node";

const vad = await loadHuggingfaceVadModel("onnx-community/silero-vad", {
  cacheDir: "models",
  quantization: "int8",
});

const local = await loadLocalModel("models/istupakov/parakeet-tdt-0.6b-v3-onnx", {
  quantization: "int8", // default: prefers *.int8.onnx, falls back to *.onnx
  sessionOptions: { executionProviders: ["wasm"] },
  vadModel: vad, // optional: chunks long audio by non-speech
});

const hf = await loadHuggingfaceModel("istupakov/parakeet-tdt-0.6b-v3-onnx", {
  cacheDir: "models",
  quantization: "int8",
  revision: "main",
});
```

`loadHuggingfaceModel()` downloads into `${cacheDir}/${repo_id}` and reuses cached files.
Browser
```js
import {
  configureOrtWeb,
  loadLocalModel,
  loadHuggingfaceModel,
  loadHuggingfaceVadModel,
} from "onnx-asr-web/browser";

configureOrtWeb({ wasmPaths: "/node_modules/onnxruntime-web/dist/" });

const vad = await loadHuggingfaceVadModel("onnx-community/silero-vad");
const modelA = await loadLocalModel("/models/parakeet-tdt-0.6b-v3-onnx/", { vadModel: vad });
const modelB = await loadHuggingfaceModel("istupakov/parakeet-tdt-0.6b-v3-onnx");
```

Transcription
```js
const result = await model.transcribeWavBuffer(await file.arrayBuffer());
console.log(result.text);
console.log(result.words); // [{word, start, end}] in seconds
```

Use `transcribeWavBuffer()` when you have a real WAV file as bytes, for example from:
- a browser file input
- `fs.readFile()` in Node.js
- a downloaded `.wav` asset
Use `transcribeSamples()` when you already have decoded mono PCM samples and know the sample rate:
```js
const result = await model.transcribeSamples(float32Samples, sampleRate);
console.log(result.text);
```

This is usually the right choice when audio is coming from:
- Web Audio API decoding, such as `AudioBuffer.getChannelData(...)`
- microphone capture pipelines that already produce PCM chunks
- custom preprocessing or resampling code
- non-WAV formats that you decode yourself before transcription
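For the Web Audio case above, multi-channel output needs to be reduced to mono before it is handed to the model. A minimal sketch, assuming a hypothetical `downmixToMono` helper (not part of onnx-asr-web):

```javascript
// Sketch: average decoded Web Audio channels into the single mono
// Float32Array that transcribeSamples() consumes. `downmixToMono`
// is an illustrative helper, not a library export.
function downmixToMono(channels) {
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) mono[i] += channel[i] / channels.length;
  }
  return mono;
}

// In a browser it could be fed from an AudioBuffer:
// const audioBuffer = await audioCtx.decodeAudioData(bytes);
// const channels = [];
// for (let c = 0; c < audioBuffer.numberOfChannels; c++) {
//   channels.push(audioBuffer.getChannelData(c));
// }
// const result = await model.transcribeSamples(downmixToMono(channels), audioBuffer.sampleRate);
```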
`transcribeSamples()` expects normalized mono PCM, typically a `Float32Array` with values in [-1, 1], plus the input sample rate. The library will resample internally when needed.
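To illustrate what "resample internally" means, here is a naive linear-interpolation resampler. This is only a sketch of the concept; it is not onnx-asr-web's actual resampling algorithm, and you normally never need to do this yourself:

```javascript
// Illustration only: naive linear-interpolation resampler mapping
// `samples` at `fromRate` Hz onto `toRate` Hz. Not the library's code.
function resampleLinear(samples, fromRate, toRate) {
  if (fromRate === toRate) return samples;
  const outLength = Math.round((samples.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = (i * fromRate) / toRate; // fractional source index
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, samples.length - 1);
    out[i] = samples[i0] + (samples[i1] - samples[i0]) * (pos - i0);
  }
  return out;
}
```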
`transcribeWavBuffer()` is just a convenience wrapper: it decodes the WAV container first and then forwards the decoded samples into `transcribeSamples()`.
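The wrapper's job can be sketched as: read the WAV header for the sample rate and format, then normalize the PCM payload. This standalone sketch handles only the simplest case (16-bit PCM with a canonical 44-byte header) and is not the library's real decoder:

```javascript
// Sketch: decode a canonical 16-bit PCM WAV buffer into the
// { samples, sampleRate } pair transcribeSamples() consumes.
// Assumes the simplest 44-byte-header layout; the library's
// decoder is more thorough.
function decodeSimpleWav(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const sampleRate = view.getUint32(24, true); // little-endian, per RIFF
  const bitsPerSample = view.getUint16(34, true);
  if (bitsPerSample !== 16) throw new Error("sketch supports 16-bit PCM only");
  const dataBytes = view.getUint32(40, true);
  const samples = new Float32Array(dataBytes / 2);
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(44 + i * 2, true) / 32768; // normalize to [-1, 1]
  }
  return { samples, sampleRate };
}
```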
Model Files
`loadLocalModel()` expects `config.json` plus model files referenced by model type:
- TDT (`nemo-conformer-tdt`): `nemo128.onnx`, `encoder-model.onnx`, `decoder_joint-model.onnx`, and `vocab.txt` or `tokens.txt`
- RNNT (`nemo-conformer-rnnt`): `encoder-model.onnx`, `decoder_joint-model.onnx`, and `vocab.txt` or `tokens.txt`
- CTC (`nemo-conformer-ctc`): `model.onnx` and `vocab.txt` or `tokens.txt`
- Canary AED (`nemo-conformer-aed`): `encoder-model.onnx`, `decoder-model.onnx`, and `vocab.txt` or `tokens.txt`
- FastConformer (`nemo-conformer`): prefers the RNNT split (`encoder-model.onnx` + `decoder_joint-model.onnx`) and falls back to CTC `model.onnx`, with `vocab.txt` or `tokens.txt`
- GigaAM (`gigaam`): auto-detects `v2_*`/`v3_*` files, prefers the RNNT triplet (`*_rnnt_encoder`/`decoder`/`joint`) and falls back to CTC (`*_ctc.onnx`), with `v2_vocab.txt`/`v3_vocab.txt`
- Tone CTC (`tone-ctc`): `model.onnx` with vocab from `decoder_params.vocabulary` in `config.json` (or `vocab.json`)
- Whisper ORT (`whisper-ort`): `*_beamsearch.onnx` model, plus `vocab.json` (and optionally `added_tokens.json`)
- Whisper HF (`whisper`): `onnx/encoder_model*.onnx`, `onnx/decoder_model_merged*.onnx`, plus `vocab.json` (and optionally `added_tokens.json`)
- Sherpa transducer (no config): `am-onnx/` (or `am/`) with `encoder.onnx`, `decoder.onnx`, `joiner.onnx`, plus `lang/tokens.txt` (or `tokens.txt`)
- VAD (`onnx-community/silero-vad`): `onnx/model*.onnx` (e.g. `onnx/model_int8.onnx`)
When quantization is enabled (`quantization: "int8"`), `*.int8.onnx` is preferred.
For Node Hugging Face downloads, `*.onnx.data` sidecars are also fetched when present.
In browser mode, models are loaded by URL so ONNX Runtime can fetch sidecars automatically.
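The quantization preference above can be sketched as a tiny resolver. `pickModelFile` is a hypothetical helper illustrating the documented rule, not the package's actual internals:

```javascript
// Sketch of the documented preference: with quantization "int8",
// *.int8.onnx wins when present; otherwise plain *.onnx is used.
// Hypothetical helper, not onnx-asr-web's real resolver.
function pickModelFile(files, base, quantization) {
  if (quantization === "int8" && files.includes(`${base}.int8.onnx`)) {
    return `${base}.int8.onnx`;
  }
  return `${base}.onnx`;
}
```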
When `vadModel` is supplied to `loadLocalModel()` / `loadHuggingfaceModel()`, transcription runs on VAD speech chunks and returns a `segments` array in the output.
Word timestamps are currently provided for NeMo transducer models. Whisper returns transcript text and token IDs; `words` is empty.
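Since `result.words` is a plain `[{word, start, end}]` array with times in seconds, rendering timestamps is straightforward. `formatWords` here is an illustrative helper, not part of the API:

```javascript
// Sketch: pretty-print word-level timestamps from result.words
// ([{word, start, end}], times in seconds). Illustrative helper only.
function formatWords(words) {
  return words
    .map(({ word, start, end }) => `[${start.toFixed(2)}-${end.toFixed(2)}] ${word}`)
    .join("\n");
}
```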
Examples
Node.js CLI
```sh
npm run build
node examples/node/transcribe.mjs --repo-id istupakov/parakeet-tdt-0.6b-v3-onnx --cache-dir models --audio test.wav
```

Browser UI
```sh
npx http-server . # then open /examples/browser/index.html
```

Browser UI (CDN package import)
```sh
npx http-server . # then open /examples/browser-cdn/index.html
```

Testing
Run type/syntax checks:
```sh
npm run check
```

Run integration model tests (requires local model folders under `models/`):
```sh
npm test
```

Build and Publish
Create distributable artifacts:
```sh
npm run build
```

This produces:
- `dist/index.js`
- `dist/node.js`
- `dist/browser.js`
Publish to npm:
```sh
npm publish
```

Contributing
See CONTRIBUTING.md.
