mellon-stt
v1.1.0
Offline, fully in-browser hotword / wake-word detection powered by EfficientWord-Net (ResNet-50 ArcFace). Works as a zero-dependency npm library or as a standalone PWA.
- 100% offline: ONNX inference runs in the browser via WebAssembly; no server, no cloud.
- Speaker-independent: the model generalises across voices out of the box.
- Custom words: enroll any phrase with ≥ 3 audio samples; no retraining.
- TypeScript-ready: ships with full .d.ts declarations.
- Tiny API surface: one class for simple use, low-level primitives for advanced use.
Table of contents
- Browser requirements
- Installation
- Quick start
- Asset setup
- API reference
- Enrolling custom words
- Server / bundler configuration
- Browser support
Browser requirements
mellon-stt uses ONNX Runtime's multi-threaded WebAssembly backend, which requires SharedArrayBuffer. This in turn requires the page to be served with the following HTTP headers:
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
See Server / bundler configuration for ready-to-use snippets.
Additionally:
- The page must be served over HTTPS (or localhost).
- Microphone permission is requested when start() is called.
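
A page can verify these requirements at runtime before it starts downloading the model. The sketch below is not part of mellon-stt: `checkEnvironment` is a hypothetical helper that reads the standard `isSecureContext`, `crossOriginIsolated`, and `SharedArrayBuffer` globals and returns a list of human-readable problems. The `env` parameter exists only so the function can be exercised with a stub object.

```javascript
// Hypothetical pre-flight check; not part of the mellon-stt API.
// `env` defaults to globalThis so a stub can be passed in tests.
function checkEnvironment(env = globalThis) {
  const problems = []
  // Secure context (HTTPS or localhost) is needed for microphone access.
  if (env.isSecureContext !== true) {
    problems.push('Page is not a secure context (serve over HTTPS or localhost).')
  }
  // crossOriginIsolated is true only when both COOP and COEP headers are set.
  if (env.crossOriginIsolated !== true) {
    problems.push('Page is not cross-origin isolated (check the COOP/COEP headers).')
  }
  // Multi-threaded WebAssembly needs SharedArrayBuffer.
  if (typeof env.SharedArrayBuffer !== 'function') {
    problems.push('SharedArrayBuffer is unavailable.')
  }
  return problems
}
```

In a browser, calling `checkEnvironment()` with no arguments and showing the returned messages before `init()` gives users a clearer error than a failed model load.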
Installation
npm install mellon-stt
The package ships with the ONNX model (~88 MB) and all ONNX Runtime (ORT) WASM runtime files. Copy them to your public directory before your first deployment; see Asset setup.
Quick start
import { MellonStt } from 'mellon-stt'
const stt = new MellonStt({
// Tell the library where you copied the assets (see Asset setup below)
wasmBasePath: '/mellon-assets/wasm/',
modelUrl: '/mellon-assets/model.onnx',
})
// Optional: show a progress bar while the 88 MB model loads
await stt.init(pct => console.log(`Loading model: ${Math.round(pct * 100)}%`))
// Request mic and start listening for the built-in words
await stt.start()
stt.addEventListener('match', (e) => {
console.log(`Detected "${e.detail.name}" (confidence ${(e.detail.confidence * 100).toFixed(1)}%)`)
})
Built-in words: suivant (French: "next") and precedent (French: "previous"). You can enroll any custom word; see Enrolling custom words.
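
Since the built-in words mean "next" and "previous", one natural use is slide or page navigation. As a rough sketch, a `match` handler could turn each detected name into an index change. The `nextIndex` helper and the slide-deck framing are assumptions made for illustration and are not part of the library.

```javascript
// Hypothetical helper: compute the new slide index from a detected word name.
function nextIndex(current, wordName, slideCount) {
  if (wordName === 'suivant') return Math.min(current + 1, slideCount - 1)
  if (wordName === 'precedent') return Math.max(current - 1, 0)
  return current // any other word leaves the position unchanged
}
```

Inside the listener this could be called as `index = nextIndex(index, e.detail.name, slideCount)`, where `index` and `slideCount` are the application's own state.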
Asset setup
The WASM runtime and model cannot be bundled into JavaScript — they must be served as static files. After installing, run the provided helper to copy them to your project's public directory:
# Copy to public/mellon-assets/ (adjust --dest as needed)
node node_modules/mellon-stt/scripts/copy-assets.js --dest ./public/mellon-assets
Or copy manually:
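# Create the target directory first; cp does not create missing parent directories
mkdir -p public/mellon-assets
cp -r node_modules/mellon-stt/dist/assets/wasm public/mellon-assets/wasm
cp node_modules/mellon-stt/dist/assets/model.onnx public/mellon-assets/model.onnx
Then pass the serving paths to the constructor: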
new MellonStt({
wasmBasePath: '/mellon-assets/wasm/', // trailing slash required
modelUrl: '/mellon-assets/model.onnx',
})
Vite projects
Add the copy step to your Vite config using the vite-plugin-static-copy plugin:
// vite.config.js
import { defineConfig } from 'vite'
import { viteStaticCopy } from 'vite-plugin-static-copy'
export default defineConfig({
server: {
headers: {
'Cross-Origin-Opener-Policy': 'same-origin',
'Cross-Origin-Embedder-Policy': 'require-corp',
},
},
plugins: [
viteStaticCopy({
targets: [
{ src: 'node_modules/mellon-stt/dist/assets/wasm/*', dest: 'mellon-assets/wasm' },
{ src: 'node_modules/mellon-stt/dist/assets/model.onnx', dest: 'mellon-assets' },
],
}),
],
})
API reference
MellonStt (high-level)
The easiest way to use the library. Wraps mic access, AudioWorklet wiring, and detector management into a single class.
class MellonStt extends EventTarget {
static BUILTIN_WORDS: string[] // ['suivant', 'precedent']
constructor(opts?: MellonSttOptions)
readonly isInitialized: boolean
readonly isRunning: boolean
init(onProgress?: (pct: number) => void): Promise<void>
start(words?: string[]): Promise<void>
stop(): void
addCustomWord(refData: RefData): void
enrollWord(wordName: string): EnrollmentSession
}
MellonSttOptions
| Option | Type | Default | Description |
|---|---|---|---|
| words | string[] | BUILTIN_WORDS | Words to detect |
| threshold | number | 0.65 | Detection threshold (0–1) |
| relaxationMs | number | 2000 | Min ms between match events |
| inferenceGapMs | number | 300 | Min ms between inference runs |
| wasmBasePath | string | — | Base URL for ORT WASM (trailing /) |
| modelUrl | string | — | URL to model.onnx |
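
Both timing options, `relaxationMs` and `inferenceGapMs`, describe a minimum interval between two occurrences of something. The library's internals are not shown in this README, so the following is only an illustrative sketch of that semantics: `createGate` is a hypothetical helper that lets an action through at most once per `minGapMs`, with the clock value passed in explicitly so the behaviour is easy to follow.

```javascript
// Illustrative only: a minimum-interval gate, assumed to resemble how
// relaxationMs and inferenceGapMs limit events. Not mellon-stt's actual code.
function createGate(minGapMs) {
  let lastPassed = -Infinity
  // Returns true (and records the time) once at least minGapMs has elapsed.
  return function tryPass(nowMs) {
    if (nowMs - lastPassed < minGapMs) return false
    lastPassed = nowMs
    return true
  }
}

const matchGate = createGate(2000) // same value as the default relaxationMs
const results = [0, 500, 1999, 2000, 2100].map(t => matchGate(t))
// results: [true, false, false, true, false]
```

Under this reading, a larger `relaxationMs` suppresses repeated `match` events for one utterance, while a larger `inferenceGapMs` trades detection latency for lower CPU use.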
Events
| Event | Detail type | Fired when |
|---|---|---|
| ready | — | init() completes |
| match | { name, confidence, timestamp } | A word is detected |
| error | { error: Error } | Model load or mic access fails |
HotwordDetector
Stateful, single-word detector. Wire it to your own AudioWorklet pipeline.
class HotwordDetector extends EventTarget {
constructor(opts: DetectorOptions)
readonly name: string
readonly lastScore: number // most recent similarity score
threshold: number
relaxationMs: number
inferenceGapMs: number
scoreFrame(audioBuffer: Float32Array): Promise<number | null>
}
DetectorOptions
| Option | Type | Default | Description |
|---|---|---|---|
| name | string | — | Label for this word |
| refEmbeddings | number[][] | — | N × 256 embedding vectors |
| threshold | number | 0.65 | Detection threshold |
| relaxationMs | number | 2000 | Cooldown between matches |
| inferenceGapMs | number | 300 | Rate-limit on scoreFrame() |
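
The relationship between `refEmbeddings` and `threshold` can be pictured as: take the best similarity between the current embedding and the N reference vectors, then compare it with the threshold. The sketch below illustrates that idea with plain arrays. It assumes cosine similarity is rescaled to [0, 1] as `(cos + 1) / 2` and that a score equal to the threshold counts as a match; the README only states that the score is normalised to [0, 1], so both details are assumptions. The function names are deliberately different from the library's exports.

```javascript
// Cosine similarity rescaled to [0, 1]; the (cos + 1) / 2 mapping is an assumption.
function cosine01(a, b) {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0 // a zero vector has no direction
  return (dot / Math.sqrt(normA * normB) + 1) / 2
}

// Best score of one embedding against every reference vector.
function bestScore(embedding, refs) {
  let best = 0
  for (const ref of refs) best = Math.max(best, cosine01(embedding, ref))
  return best
}

// Hypothetical decision rule: match when the best score reaches the threshold.
function isMatch(embedding, refs, threshold = 0.65) {
  return bestScore(embedding, refs) >= threshold
}
```

Taking the maximum rather than the mean means a single close reference sample is enough to trigger a match, which is why enrolling varied samples tends to help.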
Example
import { loadModel, configure, HotwordDetector, BUILTIN_REFS } from 'mellon-stt'
configure({ wasmBasePath: '/assets/wasm/', modelUrl: '/assets/model.onnx' })
await loadModel()
const ref = BUILTIN_REFS['suivant']
const detector = new HotwordDetector({ name: 'suivant', refEmbeddings: ref.embeddings })
detector.addEventListener('match', e => {
console.log(e.detail) // { name: 'suivant', confidence: 0.72, timestamp: 1711234567890 }
})
// In your AudioWorklet onmessage handler:
workletNode.port.onmessage = async (e) => {
await detector.scoreFrame(e.data) // e.data is Float32Array[24000]
}
EnrollmentSession
Records audio samples from the mic (or uploaded files) and generates reference embeddings for a new custom word.
class EnrollmentSession extends EventTarget {
constructor(wordName: string)
readonly wordName: string
readonly sampleCount: number
readonly samples: { audioBuffer: Float32Array; name: string }[]
recordSample(): Promise<number> // → 1-based sample index
addAudioFile(file: File): Promise<number> // → 1-based sample index
removeSample(idx: number): void
clearSamples(): void
generateRef(): Promise<RefData> // requires ≥ 3 samples
}
Events
| Event | Detail |
|---|---|
| recording-start | — |
| sample-added | { count: number; name: string } |
| samples-changed | { count: number } |
| generating | { total: number } |
| progress | { done: number; total: number } |
Engine functions
// Configure asset paths (once, before loadModel)
configure({ wasmBasePath?: string, modelUrl?: string }): void
// Load (or return cached) ONNX inference session
loadModel(onProgress?: (pct: number) => void): Promise<void>
// Run inference — returns 256-dim L2-normalised embedding
embed(spectrogram: Float32Array): Promise<Float32Array>
Audio features
// Compute log-mel spectrogram — input: 24 000 samples at 16 kHz
// Output: Float32Array[149 × 64]
logfbank(signal: Float32Array): Float32Array
Similarity helpers
// Cosine similarity normalised to [0, 1]
cosineSim(a: Float32Array | number[], b: Float32Array | number[]): number
// Maximum cosine similarity against an array of reference embeddings
maxSimilarity(embedding: Float32Array, refs: number[][]): number
Storage helpers
// Constants
BUILTIN_WORDS: string[] // ['suivant', 'precedent']
BUILTIN_REFS: Record<string, RefData> // bundled, no fetch needed
// Network-based fetch (demo app / server usage)
fetchBuiltinRef(word: string): Promise<RefData>
// localStorage persistence
loadCustomRefs(): RefData[]
saveCustomRef(refData: RefData): void
deleteCustomRef(wordName: string): void
// File I/O
exportRef(refData: RefData): void // triggers browser download
importRefFile(file: File): Promise<RefData>
RefData shape
interface RefData {
word_name: string // e.g. 'hello'
model_type: 'resnet_50_arc'
embeddings: number[][] // N × 256 vectors
}
Compatible with the EfficientWord-Net _ref.json format; reference files generated by the Python toolkit can be imported directly.
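
Because reference data can arrive from a file, from localStorage, or from the Python toolkit, it may be worth sanity-checking an object against the RefData shape before handing it to a detector. `isRefData` below is a hypothetical validator, not a library export; it checks only the three documented fields and the 256-element vector length.

```javascript
// Hypothetical validator for the documented RefData shape; not a mellon-stt export.
function isRefData(value) {
  if (value === null || typeof value !== 'object') return false
  if (typeof value.word_name !== 'string' || value.word_name.length === 0) return false
  if (value.model_type !== 'resnet_50_arc') return false
  if (!Array.isArray(value.embeddings) || value.embeddings.length === 0) return false
  // Every embedding must be a 256-element array of finite numbers.
  return value.embeddings.every(vec =>
    Array.isArray(vec) &&
    vec.length === 256 &&
    vec.every(x => typeof x === 'number' && Number.isFinite(x)))
}
```

A check like this could run on parsed JSON before the object is passed on, so that a malformed file fails early with a clear message.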
Enrolling custom words
import { MellonStt, saveCustomRef } from 'mellon-stt'
const stt = new MellonStt({ wasmBasePath: '/assets/wasm/', modelUrl: '/assets/model.onnx' })
await stt.init()
// 1. Create an enrollment session
const session = stt.enrollWord('hey computer')
session.addEventListener('recording-start', () => console.log('Recording…'))
session.addEventListener('sample-added', e => console.log(`Sample ${e.detail.count} recorded`))
// 2. Record at least 3 samples (1.5 s each)
await session.recordSample()
await session.recordSample()
await session.recordSample()
// 3. Generate reference embeddings
session.addEventListener('progress', e => console.log(`Embedding ${e.detail.done}/${e.detail.total}`))
const ref = await session.generateRef()
// 4a. Use immediately in the running detector
stt.addCustomWord(ref)
// 4b. Persist for future sessions
saveCustomRef(ref)
You can also enroll from pre-recorded audio files:
const file = document.querySelector('input[type=file]').files[0]
await session.addAudioFile(file)
Server / bundler configuration
SharedArrayBuffer (required by multi-threaded WASM) is only available when the page is served with:
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
Vite dev server
Already configured in the demo app's vite.config.js. For your own project:
// vite.config.js
export default {
server: { headers: { 'Cross-Origin-Opener-Policy': 'same-origin', 'Cross-Origin-Embedder-Policy': 'require-corp' } },
preview: { headers: { 'Cross-Origin-Opener-Policy': 'same-origin', 'Cross-Origin-Embedder-Policy': 'require-corp' } },
}
Express
app.use((req, res, next) => {
res.setHeader('Cross-Origin-Opener-Policy', 'same-origin')
res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp')
next()
})
Nginx
add_header Cross-Origin-Opener-Policy "same-origin";
add_header Cross-Origin-Embedder-Policy "require-corp";
Netlify (public/_headers)
/*
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
Browser support
| Browser | Supported | Notes |
|---|---|---|
| Chrome / Edge 89+ | ✅ | Full support |
| Firefox 79+ | ✅ | Full support |
| Safari 15.2+ | ✅ | SharedArrayBuffer re-enabled with COOP/COEP |
| Safari < 15.2 | ❌ | SharedArrayBuffer not available |
| iOS Safari 15.2+ | ✅ | Works over HTTPS |
| Node.js | ❌ | Browser-only (AudioContext, getUserMedia) |
License
MIT
