browser-whisper
v1.0.0
Published
Browser-native audio transcription powered by WebGPU Whisper — zero server, fully local.
Downloads
1,437
Maintainers
Readme
browser-whisper
Browser-native audio transcription powered by WebCodecs + WebGPU. No server. No API keys.
browser-whisper runs OpenAI's Whisper model entirely in the browser. It uses WebCodecs to decode audio from any file format with hardware acceleration, and WebGPU to run ONNX inference — falling back to WASM automatically when either is unavailable.
Live Demo → · Vite example → · Next.js example →
Features
- WebGPU inference — runs ONNX Whisper models on the GPU via
@huggingface/transformers; falls back to WASM automatically - WebCodecs audio decoding — hardware-accelerated decode of any audio/video format via
mediabunny; falls back toAudioContexton older browsers - Concurrent pipeline — model loading and audio decoding run in parallel across two Web Workers
- Zero-copy PCM transfer — audio frames move from the decoder worker to the inference worker via
MessageChannelwithArrayBuffertransfer, no copying - Streaming API — results are yielded as an async iterator, segment by segment
- Model caching — weights are cached in the browser Cache API after the first download
- TypeScript-first — full type definitions included
How it works
File
│
▼
[Decoder Worker]
mediabunny (demux) → WebCodecs AudioDecoder → mono 16 kHz PCM → 30 s chunks
│ (fallback: AudioContext.decodeAudioData)
│ MessageChannel (zero-copy ArrayBuffer transfer)
▼
[Whisper Worker]
@huggingface/transformers → ONNX Runtime → WebGPU ──► TranscriptSegments
(fallback: WASM)
│
▼
Main Thread (async iterator / callbacks)Both workers are started concurrently: the Whisper worker begins downloading and compiling the model while the decoder worker demuxes and decodes the audio file. Chunks queued before the model is ready are buffered and processed in order.
Install
npm install browser-whisper
# or
bun add browser-whisperPeer dependencies: none.
@huggingface/transformersandmediabunnyare bundled into the library's worker blobs at build time and resolved automatically at runtime.
Setup
The ONNX inference engine uses SharedArrayBuffer for threading, which requires two HTTP headers on every page that loads the library:
Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-originVite
// vite.config.ts
export default defineConfig({
server: {
headers: {
'Cross-Origin-Embedder-Policy': 'require-corp',
'Cross-Origin-Opener-Policy': 'same-origin',
},
},
preview: {
headers: {
'Cross-Origin-Embedder-Policy': 'require-corp',
'Cross-Origin-Opener-Policy': 'same-origin',
},
},
})Next.js
See NEXTJS.md for the full guide, including SSR-safe dynamic imports and header configuration.
Quick start
import { BrowserWhisper } from 'browser-whisper'
const whisper = new BrowserWhisper()
// file from <input type="file"> or drag-and-drop
const file = event.target.files[0]
// Stream segments as they arrive
for await (const segment of whisper.transcribe(file)) {
console.log(`[${segment.start.toFixed(1)}s] ${segment.text}`)
}Collect all segments at once
const segments = await whisper.transcribe(file).collect()
console.log(segments.map(s => s.text).join(' '))With callbacks and options
const whisper = new BrowserWhisper({
model: 'whisper-small',
language: 'en',
})
whisper.transcribe(file, {
onSegment: (seg) => appendToUI(seg),
onProgress: (evt) => {
console.log(evt.stage) // 'loading' | 'decoding' | 'transcribing'
console.log(evt.progress) // 0 – 1
},
})API
new BrowserWhisper(options?)
Creates a reusable transcriber instance. The loaded model is cached in the worker between calls.
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| model | WhisperModel | 'whisper-base' | Which Whisper model to use |
| language | string | auto-detect | BCP-47 language code, e.g. 'en', 'fr', 'ja' |
| quantization | QuantizationType | 'hybrid' | Model precision |
whisper.transcribe(file, options?)
Returns a TranscribeStream. Options passed here override constructor options for this call only.
| Option | Type | Description |
|--------|------|-------------|
| model | WhisperModel | Override model for this file |
| language | string | Override language for this file |
| quantization | QuantizationType | Override quantization for this file |
| onSegment | (seg: TranscriptSegment) => void | Called for each transcribed segment |
| onProgress | (evt: TranscribeProgress) => void | Called with stage and 0–1 progress |
TranscribeStream
Returned by whisper.transcribe(). Implements the async iterator protocol and has one helper:
const segments = await stream.collect() // resolves with TranscriptSegment[]WhisperModel
Sizes below are for the default 'hybrid' quantization (encoder fp32 + decoder q4).
| Value | Download size | Notes |
|-------|--------------|-------|
| 'whisper-tiny' | ~64 MB | Fastest |
| 'whisper-base' | ~136 MB | Default |
| 'whisper-small' | ~510 MB | Better accuracy |
| 'whisper-large' | ~3 GB | whisper-large-v3-turbo; best accuracy |
Other quantizations will differ. Models are downloaded from Hugging Face Hub (onnx-community namespace) and cached in the browser after the first run.
QuantizationType
| Value | Description |
|-------|-------------|
| 'hybrid' | Encoder fp32 + decoder q4 — default, best speed/accuracy balance |
| 'fp32' | Full precision |
| 'fp16' | Half precision |
| 'q8' | 8-bit quantized |
| 'q4' | 4-bit quantized |
TranscriptSegment
interface TranscriptSegment {
text: string
start: number // seconds from start of file
end: number // seconds from start of file
}TranscribeProgress
interface TranscribeProgress {
stage: 'loading' | 'decoding' | 'transcribing'
progress: number // 0 – 1
}Errors
All errors extend BrowserWhisperError.
| Class | When thrown |
|-------|-------------|
| WebCodecsNotSupportedError | AudioDecoder is unavailable and the AudioContext fallback also fails |
| CodecNotSupportedError | The file's audio codec is not decodable in this browser |
| NoAudioTrackError | The file has no audio track |
| ModelLoadError | The Whisper model failed to download or initialise |
| DecoderError | The WebCodecs AudioDecoder emitted a fatal error |
Browser support
WebGPU and WebCodecs are the primary paths. Both have automatic fallbacks so the library works on a broader range of browsers.
| Browser | WebGPU inference | WASM inference fallback | |---------|-----------------|-------------------------| | Chrome | 113+ | 94+ | | Firefox | 141+ | 130+ | | Safari | 18+ | 16.4+ |
| Browser | WebCodecs decoding | AudioContext fallback | |---------|-------------------|----------------------| | Chrome | 94+ | all | | Firefox | 130+ | all | | Safari | 16.4+ | all |
The library detects both features at runtime and falls back silently — no configuration needed.
Network required on first run: WASM binaries (~1 MB) are loaded from jsDelivr CDN, and model weights (64 MB – 3 GB depending on model) are streamed from Hugging Face Hub. Both are cached in the browser after the first run; subsequent calls work offline.
Contributing
Contributions are welcome. Please open an issue before submitting a large PR so we can discuss the approach.
# Install dependencies
bun install
# Start dev server (runs the demo app)
bun run dev:site
# Type-check
bun run typecheck
# Build the library
bun run buildThe library is built with Vite. Workers are bundled as self-contained inline blobs using the ?worker&inline query — see vite.config.ts for details.
License
MIT — Tanpreet Singh Jolly
