© 2026 – Pkg Stats / Ryan Hefner

@jeffpeng3/nemotron-asr-core

v0.2.3

Published

In-browser ASR using NVIDIA Nemotron 3.5 (FastConformer-RNNT) via onnxruntime-web on WebGPU

Downloads

1,464

Readme

nemotron-asr-core


Streaming in-browser speech recognition using NVIDIA Nemotron 3.5 (FastConformer-RNNT) via onnxruntime-web on WebGPU. Fully client-side — no server needed. Supports live mic capture, file transcription, and 5 latency-accuracy profiles from a single INT4-quantized encoder.

npm install @jeffpeng3/nemotron-asr-core

Live Demo

A demo app is included under example/. Run it with Vite:

npm install
npm run dev

Then open the URL shown in the terminal (usually http://localhost:5173).

Features

  • Single INT4 encoder — all 5 profiles share one encoder.onnx (~462 KB model + ~733 MB weights, asymmetric INT4 quantized)
  • 5 latency profiles — 80 ms to 1120 ms via freeDimensionOverrides
  • Streaming & full-audio — mic capture or file transcription
  • Greedy + beam search (configurable 1–5 beams)
  • blankPenalty=0.5 by default — suppresses blank-dominated output for both greedy and beam search
  • Multilingual — auto-detect or pick from 20+ language IDs
  • WebGPU encoder — decoder/joint on WASM (CPU); single-threaded by default (numThreads=1, overridable)
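In greedy RNN-T decoding, each step takes the argmax over the joint network's logits. A blank result advances to the next encoder frame, and any other result emits a token. The default blankPenalty can be pictured as a subtraction applied to the blank logit before that argmax. The following is a minimal sketch under stated assumptions: `greedyPick` is a hypothetical name, and the flat logits array and the `blankId` index are assumptions. It does not reproduce the package's actual decoder, and it leaves out blankTop2Threshold, whose exact rule is not documented here.

```javascript
// Sketch of one greedy RNN-T decision with a blank penalty.
// Assumptions (not taken from the package source): `logits` is the joint
// network output for a single step and `blankId` is the blank token's index.
function greedyPick(logits, blankId, blankPenalty = 0.5) {
  let best = -1;
  let bestScore = -Infinity;
  for (let i = 0; i < logits.length; i++) {
    // Subtracting the penalty makes blank less likely to win the argmax.
    const score = i === blankId ? logits[i] - blankPenalty : logits[i];
    if (score > bestScore) {
      bestScore = score;
      best = i;
    }
  }
  return best;
}

const logits = [1.0, 0.5, 1.2]; // index 2 plays the blank in this toy example
console.log(greedyPick(logits, 2, 0));   // 2: blank wins without a penalty
console.log(greedyPick(logits, 2, 0.5)); // 0: blank drops to 0.7
```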

Usage

import { AsrEngine } from "@jeffpeng3/nemotron-asr-core";

// Callbacks for UI updates
const engine = new AsrEngine({
  progress(label, loaded, total, cached) {
    console.log(`${label}: ${loaded}/${total}`);
  },
  status(detail) {
    console.log(detail);
  },
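  partial(text, lang, progress) {
    // progress is optional; only print a percentage when it is provided
    const pct = typeof progress === "number" ? ` (${(progress * 100).toFixed(0)}%)` : "";
    console.log(`partial${pct}: ${text}`);
  },
  ep(isEncoder, provider, note) {
    // isEncoder distinguishes the encoder session from the decoder/joint sessions
    const target = isEncoder ? "encoder" : "decoder/joint";
    console.log(`${target}: ${provider}${note ? ` (${note})` : ""}`);
  },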
});

// Download model weights (~863 MB total, cached on-device)
await engine.init();

// ── Full audio transcription ──
const result = await engine.transcribe(samples, 101);
// samples: Float32Array of 16 kHz PCM
// 101 = auto-detect language

console.log(result);
// { text: "hello world <en-US>", lang: "en-US", tokens: 12, timing: { ... } }

// ── Streaming (mic / tab capture) ──
const session = engine.session(101);

// push chunks as they arrive (audioChunks: Float32Array pieces of 16 kHz PCM)
for (const chunk of audioChunks) {
  const partial = await session.feed(chunk);
  if (partial) console.log(partial.text);
}

const final = await session.end();
console.log(final.text);

// ── Benchmark ──
const results = await engine.benchmark({ duration: 10 });
for (const r of results) {
  console.log(`${r.profile} RTF ${r.rtf.toFixed(3)}`);
}
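The usage above passes 16 kHz PCM as a Float32Array. Browser capture, however, usually runs at the AudioContext's rate (commonly 44.1 or 48 kHz), and this README does not say whether the package resamples internally. Here is a minimal sketch of converting and chunking such audio, assuming mono input. `resampleLinear` and `chunks` are hypothetical helpers, not package exports. Linear interpolation without a low-pass filter can alias high frequencies, so treat this as a starting point rather than a production resampler.

```javascript
// Hypothetical helper: linear-interpolation resampler from `inputRate` to
// `targetRate` (16 kHz by default). Assumes a mono Float32Array input.
function resampleLinear(input, inputRate, targetRate = 16000) {
  if (inputRate === targetRate) return Float32Array.from(input);
  const ratio = inputRate / targetRate;
  const outLength = Math.floor(input.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;                         // fractional source index
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1); // clamp at the last sample
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}

// Hypothetical helper: yield fixed-size views (the last one may be shorter).
function* chunks(samples, size) {
  for (let i = 0; i < samples.length; i += size) {
    yield samples.subarray(i, i + size);
  }
}

// Usage with the streaming session above (micSamples and audioContext are
// placeholders for your own capture code; 1600 samples = 100 ms at 16 kHz):
// const pcm16k = resampleLinear(micSamples, audioContext.sampleRate);
// for (const chunk of chunks(pcm16k, 1600)) await session.feed(chunk);
```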

Latency Profiles

All profiles use the same encoder.onnx with a dynamic time dimension pinned at runtime via freeDimensionOverrides.

| Profile  | Latency | Encoder Frames |
|----------|---------|----------------|
| TURBO    | 80 ms   | 17             |
| FAST     | 160 ms  | 25             |
| BALANCED | 320 ms  | 33             |
| NORMAL   | 560 ms  | 49             |
| HIGH     | 1120 ms | 65             |

Lower latency means fewer encoder frames of context, which lowers accuracy. Choose the profile that fits your latency budget.

await engine.switchProfile("HIGH");  // highest accuracy
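Every profile trades latency for accuracy, so one way to choose is to take the most accurate profile that still fits a latency budget. `pickProfile` is a hypothetical helper, not part of the package, and its latencies are copied from the table above. It only returns a profile name, which could then be passed to switchProfile().

```javascript
// Latencies copied from the profile table above, ordered fastest first.
const PROFILE_LATENCY_MS = [
  ["TURBO", 80],
  ["FAST", 160],
  ["BALANCED", 320],
  ["NORMAL", 560],
  ["HIGH", 1120],
];

// Hypothetical helper: return the slowest (most accurate) profile whose
// latency fits within `budgetMs`, or null if even TURBO is too slow.
function pickProfile(budgetMs) {
  let chosen = null;
  for (const [name, latency] of PROFILE_LATENCY_MS) {
    if (latency <= budgetMs) chosen = name;
  }
  return chosen;
}

// Usage:
// const name = pickProfile(400); // "BALANCED"
// if (name) await engine.switchProfile(name);
```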

API

AsrEngine.preload(onProgress?)

Pre-download and cache all model files (vocab, encoder, decoder, joint) before creating an engine instance. Subsequent init() calls will find the files already in the cache and skip the network. Useful for showing a download progress screen early in the app lifecycle.

await AsrEngine.preload((label, loaded, total, cached) => {
  console.log(`${label}: ${loaded}/${total}`);
});
// Now engine.init() will be near-instant
const engine = new AsrEngine(callbacks);
await engine.init();
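preload() reports progress per file label. To drive a single progress bar, those per-file callbacks can be aggregated. `createProgressTracker` is a hypothetical helper. It assumes `loaded` and `total` share a unit (for example bytes) and that `total` is known for each file, so files with a non-positive total are skipped. Files that have not reported yet are not counted, which means the percentage can dip when a new file starts downloading.

```javascript
// Hypothetical aggregator for the (label, loaded, total, cached?) callback.
// Tracks each file by label and reports an overall percentage.
function createProgressTracker(onUpdate = () => {}) {
  const files = new Map(); // label -> { loaded, total }

  function percent() {
    let loaded = 0;
    let total = 0;
    for (const f of files.values()) {
      if (f.total > 0) {   // ignore files whose size is unknown
        loaded += f.loaded;
        total += f.total;
      }
    }
    return total > 0 ? (loaded / total) * 100 : 0;
  }

  function onProgress(label, loaded, total, cached) {
    files.set(label, { loaded, total });
    onUpdate(percent(), label, Boolean(cached));
  }

  return { onProgress, percent };
}

// Usage (progressBar is a placeholder for your own UI element):
// const tracker = createProgressTracker((p) => { progressBar.value = p; });
// await AsrEngine.preload(tracker.onProgress);
```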

Arguments:

| Argument   | Type                                    | Description                        |
|------------|-----------------------------------------|------------------------------------|
| onProgress | (label, loaded, total, cached?) => void | Optional download progress callback |

new AsrEngine(callbacks?, options?)

Options:

| Option             | Default  | Description |
|--------------------|----------|-------------|
| profile            | "NORMAL" | Initial latency profile |
| beamWidth          | 1        | Beam search width (1 = greedy) |
| blankPenalty       | 0.5      | Value subtracted from the blank logit (greedy and beam search) |
| blankTop2Threshold | 0.3      | Secondary blank suppression threshold |
| numThreads         | 1        | WASM threads for decoder/joint |
| wasmPaths          | CDN      | Custom path for onnxruntime-web WASM files. Set to a local directory when bundling in a Chrome extension or offline environment to avoid CSP issues. |

Callbacks:

| Callback | Arguments                       | Description |
|----------|---------------------------------|-------------|
| progress | (label, loaded, total, cached?) | Model download progress |
| status   | (detail)                        | Status messages |
| partial  | (text, lang, progress?)         | Partial transcription result |
| ep       | (isEncoder, provider, note?)    | Execution provider selection |

session(langId) → Session

Create a streaming session.

| Method        | Returns                      | Description |
|---------------|------------------------------|-------------|
| feed(samples) | {text, lang} \| null         | Push audio chunk, get partial result |
| end()         | {text, lang, tokens, timing} | Finalize and get complete result |

benchmark(opts?) → BenchmarkProfileResult[]

Test all profiles and return RTF (Real-Time Factor) measurements.

Options: { profiles?, duration?, langId?, warmup?, forceAll?, samples? }
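RTF is conventionally the processing time divided by the duration of the audio processed. Values below 1 mean transcription runs faster than real time. Here is a minimal sketch of the same measurement done by hand around transcribe(), assuming 16 kHz input. `realTimeFactor` is a hypothetical helper, and this README does not say whether benchmark() computes RTF exactly this way (for example, whether warmup runs are excluded).

```javascript
// RTF = processing time / audio duration (both in milliseconds here).
function realTimeFactor(elapsedMs, sampleCount, sampleRate = 16000) {
  const audioMs = (sampleCount / sampleRate) * 1000;
  return elapsedMs / audioMs;
}

// 500 ms of processing for 10 s of 16 kHz audio (160000 samples):
console.log(realTimeFactor(500, 160000).toFixed(3)); // "0.050"

// Timing a real call (samples as in the Usage section):
// const t0 = performance.now();
// await engine.transcribe(samples, 101);
// const rtf = realTimeFactor(performance.now() - t0, samples.length);
```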

switchProfile(name)

Switch latency profile at runtime (reloads encoder session).

clearCache()

Remove cached model weights and reset all sessions. Next init() re-downloads from Hugging Face.

getPerfStats() → Record<string, {ms, calls, avg}>

Per-operation performance statistics (encoderStep, decoderStep, jointArgmax).
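When tuning profiles or thread counts, it helps to see which operation dominates. `summarizePerf` is a hypothetical formatter that sorts the getPerfStats() record by total time and adds each operation's share. It assumes `ms` is the total milliseconds spent in that operation. The input below uses illustrative values, not measurements.

```javascript
// Hypothetical formatter for the Record<string, {ms, calls, avg}> returned by
// getPerfStats(). Assumes `ms` is total milliseconds per operation.
function summarizePerf(stats) {
  const entries = Object.entries(stats);
  const totalMs = entries.reduce((sum, [, s]) => sum + s.ms, 0);
  return entries
    .map(([name, s]) => ({
      name,
      ms: s.ms,
      calls: s.calls,
      avg: s.avg,
      share: totalMs > 0 ? s.ms / totalMs : 0, // fraction of total time
    }))
    .sort((a, b) => b.ms - a.ms);             // slowest operation first
}

// Illustrative input only; in practice pass engine.getPerfStats().
const rows = summarizePerf({
  encoderStep: { ms: 900, calls: 30, avg: 30 },
  decoderStep: { ms: 60, calls: 120, avg: 0.5 },
  jointArgmax: { ms: 40, calls: 150, avg: 0.27 },
});
console.table(rows);
```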

Language IDs

Pass 101 for auto-detect. Use 0 for English, 4 for Chinese, etc. See LANG_TO_ID and langId() exports for the full list.

Architecture

app.js (main thread)  ←→  worker.js (Web Worker)  ←→  Hugging Face / Cache API
                            └── AsrEngine
                                  ├── Mel filterbank + FFT
                                  ├── Encoder (WebGPU — required)
                                  ├── Decoder + Joint (WASM, single-thread)
                                  └── RNN-T greedy / beam search
  • Model weights (~863 MB) are fetched once from Hugging Face and cached via the Cache API with a versioned cache name.
  • All inference runs off the main thread via a Web Worker.
  • Encoder requires WebGPU (D3D12 on Windows, Vulkan on Linux, Metal on macOS). Decoder + joint run on WASM (CPU).
  • WASM multi-threading (numThreads > 1) requires cross-origin isolation headers (Cross-Origin-Opener-Policy + Cross-Origin-Embedder-Policy). Disabled by default — pass { numThreads: 11 } to enable if your deployment supports it.
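WASM threads depend on SharedArrayBuffer, which browsers only expose on cross-origin-isolated pages, and `crossOriginIsolated` reports whether the current page qualifies. Here is a minimal sketch of choosing numThreads at runtime. `pickNumThreads` is a hypothetical helper, and the cap of 4 threads is an arbitrary assumption, not a package recommendation.

```javascript
// Hypothetical helper: request more than one WASM thread only when the page
// is cross-origin isolated; otherwise fall back to the default of 1.
function pickNumThreads(env = globalThis, maxThreads = 4) {
  if (!env.crossOriginIsolated) return 1;
  const cores = (env.navigator && env.navigator.hardwareConcurrency) || 1;
  return Math.max(1, Math.min(cores, maxThreads));
}

// Usage:
// const engine = new AsrEngine(callbacks, { numThreads: pickNumThreads() });
```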

Requirements

  • Browser: Chrome 113+ / Edge 113+ with WebGPU. Safari 18+ on iOS. Firefox Nightly with dom.webgpu.enabled.
  • GPU: ~750 MB of GPU-accessible memory. Integrated GPUs may page weights over PCIe (slower).
  • Network: Model weights (~863 MB) downloaded once, cached locally.
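Because the encoder requires WebGPU, a page can check for it before starting the ~863 MB download. `navigator.gpu` exists only in WebGPU-capable browsers, and `requestAdapter()` resolves to null when no suitable adapter is available. `hasWebGPU` is a hypothetical helper, and passing in the navigator object is only there to make it testable.

```javascript
// Hypothetical feature check to run before engine.init().
async function hasWebGPU(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return false;  // API not exposed at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    return Boolean(adapter);           // null means no usable GPU adapter
  } catch {
    return false;
  }
}

// Usage (showUnsupportedMessage is a placeholder for your own UI):
// if (!(await hasWebGPU())) showUnsupportedMessage();
// else await engine.init();
```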