browser-whisper

v1.1.0

Published

a month ago

Browser-native audio transcription powered by WebGPU Whisper — zero server, fully local.

Downloads

2,849

tanpreetjolly

whisper speech-to-text audio-to-text webgpu browser offline transformers.js

browser-whisper — in-browser speech-to-text with WebGPU

Transcribe audio and video in the browser with Whisper — fully local, no backend, no API keys.

Live demo · Documentation · Examples · GitHub

What is this?

browser-whisper is a TypeScript library that turns files (or microphone audio) into text using Whisper, entirely in the user’s browser.

WebCodecs decodes audio and video, with an AudioContext fallback when needed
WebGPU runs the ONNX model on the GPU, with WASM fallback when WebGPU is unavailable
Two Web Workers decode and transcribe in parallel so model load and file read overlap
OPFS caching keeps model weights after the first download for faster or offline repeat use

Audio never leaves the device. You do not need an OpenAI API key.

Install

npm install browser-whisper

bun add browser-whisper

No peer dependencies — mediabunny and @huggingface/transformers are used inside the library’s workers.

Quick start

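import { BrowserWhisper } from 'browser-whisper'

const whisper = new BrowserWhisper({ model: 'whisper-base' })

// Take the first file from the page's file input; stop early if none is selected.
const input = document.querySelector<HTMLInputElement>('input[type=file]')
const file = input?.files?.[0]
if (!file) throw new Error('Select an audio or video file first')

// Segments stream in as they are transcribed; start and end are in seconds.
for await (const { text, start, end } of whisper.transcribe(file)) {
  console.log(`[${start.toFixed(1)}s – ${end.toFixed(1)}s] ${text}`)
}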

Collect all segments:

const segments = await whisper.transcribe(file).collect()
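
Collected segments are convenient to export as subtitles. The following is a minimal sketch that formats segments as SubRip (SRT) text. It assumes only the `text`, `start`, and `end` fields seen in the quick start, with times in seconds; the `Segment` interface and the `srtTime` and `toSrt` helpers are hypothetical names for this example, not part of the library.

```typescript
// Minimal segment shape assumed from the quick start (times in seconds).
interface Segment {
  text: string
  start: number
  end: number
}

// Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000))
  const ms = totalMs % 1000
  const s = Math.floor(totalMs / 1000) % 60
  const m = Math.floor(totalMs / 60_000) % 60
  const h = Math.floor(totalMs / 3_600_000)
  const pad = (n: number, width = 2) => String(n).padStart(width, '0')
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`
}

// Build numbered SRT cues, one per segment, separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.text.trim()}\n`)
    .join('\n')
}
```

With the collected array from above, `toSrt(segments)` would return a string that can be saved as a `.srt` file.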

Transcribe raw PCM by passing mono 16 kHz samples as a Float32Array (e.g. from a VAD):

const segments = await whisper.transcribePCM(samples).collect()
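
Captured audio is often stereo at 44.1 or 48 kHz, so it has to be converted before it matches the mono 16 kHz input described above. The sketch below shows one simple way to do that with plain arrays. `downmixToMono` and `resampleLinear` are hypothetical helpers, not library functions, and the resampler uses naive linear interpolation with no low-pass filter, so it is meant to illustrate the shape of the data rather than to be a high-quality converter.

```typescript
// Average several equal-length channel arrays into one mono signal.
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0)
  const length = channels[0].length
  const mono = new Float32Array(length)
  for (const channel of channels) {
    for (let i = 0; i < length; i++) mono[i] += channel[i] / channels.length
  }
  return mono
}

// Naive linear-interpolation resampler (no anti-aliasing filter).
function resampleLinear(input: Float32Array, fromRate: number, toRate = 16_000): Float32Array {
  if (fromRate === toRate) return input
  const outLength = Math.floor((input.length * toRate) / fromRate)
  const output = new Float32Array(outLength)
  const step = fromRate / toRate
  for (let i = 0; i < outLength; i++) {
    const pos = i * step
    const left = Math.floor(pos)
    const right = Math.min(left + 1, input.length - 1)
    const frac = pos - left
    // Blend the two nearest input samples.
    output[i] = input[left] * (1 - frac) + input[right] * frac
  }
  return output
}
```

In a browser, rendering through an `OfflineAudioContext` created with a 16000 Hz sample rate is likely to give better quality, because the browser's resampler filters the signal.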

Before you ship: COOP / COEP headers

Threaded WASM needs cross-origin isolation on the page that loads the library:

Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin

For Vite, set them under server.headers and preview.headers in vite.config. For Next.js, return them from headers() in next.config and import the library only on the client.

Full setup (Vite, Next.js, deploy) is in the documentation.
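
As a rough illustration of the Vite case, a config along these lines should send both headers from the dev server and from `vite preview`. This is a minimal sketch, not the project's documented setup; the `crossOriginIsolationHeaders` and `config` names are chosen for this example, and production hosting needs the same headers configured separately.

```typescript
// vite.config.ts (sketch)
// Headers required for cross-origin isolation, as listed above.
const crossOriginIsolationHeaders = {
  'Cross-Origin-Embedder-Policy': 'require-corp',
  'Cross-Origin-Opener-Policy': 'same-origin',
}

const config = {
  // Applied by `vite` (dev server).
  server: { headers: crossOriginIsolationHeaders },
  // Applied by `vite preview` (serving the production build locally).
  preview: { headers: crossOriginIsolationHeaders },
}

export default config
```

To confirm the headers took effect, check that `self.crossOriginIsolated` is `true` in the page's console.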

Common patterns

Model and language

const whisper = new BrowserWhisper({
  model: 'whisper-small',
  language: 'en', // optional — omit for auto-detect
})

Use a *_timestamped model (e.g. whisper-base_timestamped) for word-level timestamps. See the live demo.

Progress and segments

whisper.transcribe(file, {
  onSegment: (seg) => renderLine(seg),
  onProgress: ({ stage, progress }) => {
    // stage: 'loading' | 'decoding' | 'transcribing'
    updateBar(progress) // 0 – 1
  },
})
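
Because `progress` arrives as a fraction per stage, a UI usually needs a small formatter in front of it. The sketch below shows one way to turn the callback argument into a status line. The `TranscribeProgress` type and `progressLabel` function are hypothetical names for this example, and the stage labels are illustrative wording rather than anything the library defines.

```typescript
type Stage = 'loading' | 'decoding' | 'transcribing'

// Shape assumed from the onProgress callback shown above.
interface TranscribeProgress {
  stage: Stage
  progress: number // expected range: 0 to 1
}

const STAGE_LABELS: Record<Stage, string> = {
  loading: 'Loading',
  decoding: 'Decoding',
  transcribing: 'Transcribing',
}

// Clamp the fraction to [0, 1] and render it as a whole percentage.
function progressLabel({ stage, progress }: TranscribeProgress): string {
  const clamped = Math.min(1, Math.max(0, progress))
  return `${STAGE_LABELS[stage]}: ${Math.round(clamped * 100)}%`
}
```

It could be wired in as `onProgress: (p) => { statusEl.textContent = progressLabel(p) }`, where `statusEl` is an element in your own page.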

Pre-download a model

await whisper.downloadModel({
  model: 'whisper-small',
  onProgress: ({ stage, progress }) => updateBar(progress),
})

Downloads can be cancelled with an AbortSignal. For an example, see the OPFS cache demo.

Clear cache

// Clear the entire cache
await BrowserWhisper.clearCache()
// Or delete a single model
await BrowserWhisper.deleteModel('whisper-tiny')

Models

The default model is whisper-base. Models use hybrid quantization (fp32 encoder, q4 decoder). Weights are downloaded from Hugging Face (onnx-community) and cached in the browser after first use.

| Model | Download (approx.) | Notes |
|-------|--------------------|-------|
| whisper-tiny | ~64 MB | Fastest |
| whisper-base | ~136 MB | Default |
| whisper-small | ~510 MB | Better accuracy |
| whisper-large-v3-turbo | ~2.7 GB | Strongest Whisper option |
| moonshine-tiny / moonshine-base | ~32–61 MB | English only |
| distil-whisper-small | ~185 MB | English only |
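
Download size is often the deciding factor on constrained devices. The sketch below picks the largest multilingual Whisper model from the table that fits a given size budget. `MODEL_SIZES_MB` and `pickModel` are hypothetical names, the sizes are the approximate figures from the table (with 2.7 GB rounded to 2700 MB), and "largest that fits" is only one possible selection policy.

```typescript
// Approximate download sizes in MB, copied from the models table.
const MODEL_SIZES_MB = {
  'whisper-tiny': 64,
  'whisper-base': 136,
  'whisper-small': 510,
  'whisper-large-v3-turbo': 2700,
} as const

type ModelName = keyof typeof MODEL_SIZES_MB

// Return the largest listed model that fits the budget, or null if none fits.
function pickModel(budgetMB: number): ModelName | null {
  let best: ModelName | null = null
  for (const name of Object.keys(MODEL_SIZES_MB) as ModelName[]) {
    const size = MODEL_SIZES_MB[name]
    if (size <= budgetMB && (best === null || size > MODEL_SIZES_MB[best])) {
      best = name
    }
  }
  return best
}
```

In a browser, the budget could be derived from `navigator.storage.estimate()`, which reports `quota` and `usage` in bytes.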

Timestamped, lite, and large-v3 variants are supported too. Full list: docs — Models.

How it works

Your file
  → Decoder worker (mediabunny + WebCodecs → 16 kHz mono chunks)
    → Whisper worker (Transformers.js + ONNX on WebGPU)
      → Main thread (async iterator / callbacks)

The decoder and model loader start together. Chunks that arrive before the model is ready are queued and processed in order.
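
The queueing behaviour described above can be pictured with a small buffer that holds items until a consumer is ready and then drains them in arrival order. This is an illustrative sketch of the pattern only, not the library's actual implementation; `ReadyQueue` and its methods are hypothetical names.

```typescript
// Buffers items until markReady() is called, then processes them in arrival order.
class ReadyQueue<T> {
  private pending: T[] = []
  private ready = false
  private readonly process: (item: T) => void

  constructor(process: (item: T) => void) {
    this.process = process
  }

  // Called for every chunk: process immediately if ready, otherwise queue it.
  push(item: T): void {
    if (this.ready) this.process(item)
    else this.pending.push(item)
  }

  // Called once the consumer (e.g. a loaded model) is available.
  markReady(): void {
    this.ready = true
    for (const item of this.pending) this.process(item)
    this.pending = []
  }
}
```

Here `push` would correspond to a decoded chunk arriving and `markReady` to the model finishing its load, so early chunks are never dropped or reordered.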

Browser support

| Feature | Chrome | Firefox | Safari |
|---------|--------|---------|--------|
| WebGPU | 113+ | 141+ | 18+ |
| WebCodecs | 94+ | 130+ | 16.4+ |

Missing features fall back automatically. First run needs network for WASM (~1 MB) and model weights; after caching, transcription can work offline.
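
Since the fallbacks are automatic, detection is not required, but it can be useful for telling users which path they are likely to get. The sketch below checks for the relevant browser globals. `Capabilities` and `detectCapabilities` are hypothetical names, not part of the library, and the presence of `navigator.gpu` does not guarantee that a GPU adapter can actually be obtained.

```typescript
interface Capabilities {
  webgpu: boolean
  webcodecs: boolean
  threadedWasm: boolean
}

// Inspect a global scope (window or worker) for the features the library can use.
// The scope parameter defaults to globalThis and exists mainly to make testing easy.
function detectCapabilities(scope: any = globalThis): Capabilities {
  return {
    // WebGPU entry point
    webgpu: scope.navigator?.gpu !== undefined,
    // WebCodecs audio decoding
    webcodecs: typeof scope.AudioDecoder === 'function',
    // Threaded WASM needs cross-origin isolation and SharedArrayBuffer
    threadedWasm:
      scope.crossOriginIsolated === true && typeof scope.SharedArrayBuffer === 'function',
  }
}
```

A page might call `detectCapabilities()` once at startup and show a note such as "running on CPU" when `webgpu` is `false`.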

Links

| Resource | Description |
|----------|-------------|
| Documentation | Install, headers, API, all models |
| Live demo | Upload and transcribe in the browser |
| Examples | OPFS cache, live mic + VAD |
| Vite example | Minimal Vite app |
| Next.js example | App Router, client-only |

API types (TranscriptSegment, errors, QuantizationType, …): docs — API or dist/index.d.ts after install.

Development

git clone https://github.com/tanpreetjolly/browser-whisper.git
cd browser-whisper
bun install
bun run dev:site
bun run typecheck
bun run build

Issues and PRs welcome on GitHub.
