mellon-stt

v1.1.0

Offline, in-browser hotword detection powered by EfficientWord-Net (ResNet-50 ArcFace). Works as a standalone app or npm library.

mellon-stt

Offline, fully in-browser hotword / wake-word detection powered by EfficientWord-Net (ResNet-50 ArcFace). Works as a zero-dependency npm library or as a standalone PWA.

  • 100% offline — ONNX inference runs in the browser via WebAssembly; no server, no cloud.
  • Speaker-independent — the model generalises across voices out of the box.
  • Custom words — enroll any phrase with ≥ 3 audio samples; no retraining.
  • TypeScript-ready — ships with full .d.ts declarations.
  • Tiny API surface — one class for simple use, low-level primitives for advanced use.

Table of contents

  1. Browser requirements
  2. Installation
  3. Quick start
  4. Asset setup
  5. API reference
  6. Enrolling custom words
  7. Server / bundler configuration
  8. Browser support

Browser requirements

mellon-stt uses ONNX Runtime's multi-threaded WebAssembly backend, which requires SharedArrayBuffer. This in turn requires the page to be served with the following HTTP headers:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

See Server / bundler configuration for ready-to-use snippets.

Additionally:

  • The page must be served over HTTPS (or localhost).
  • Microphone permission is requested when start() is called.
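These requirements can be checked at runtime before the model is loaded. The following is a minimal sketch, not part of mellon-stt: `checkEnvironment` is a hypothetical helper that reads the standard `isSecureContext` and `crossOriginIsolated` globals. It takes the global object as a parameter so that it can also be exercised outside a browser.

```javascript
// Hypothetical helper (not part of mellon-stt): lists the browser
// requirements that the given global object fails to meet.
function checkEnvironment(env = globalThis) {
  const problems = []
  // Pages must be served over HTTPS or from localhost.
  if (env.isSecureContext !== true) {
    problems.push('not a secure context (serve over HTTPS or localhost)')
  }
  // crossOriginIsolated reports whether the page is cross-origin isolated.
  if (env.crossOriginIsolated !== true) {
    problems.push('not cross-origin isolated (check the COOP/COEP headers)')
  }
  // SharedArrayBuffer is what the multi-threaded WASM backend needs.
  if (typeof env.SharedArrayBuffer !== 'function') {
    problems.push('SharedArrayBuffer is unavailable')
  }
  return problems
}
```

An empty array means all three checks passed; otherwise each entry describes one unmet requirement.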

Installation

npm install mellon-stt

The package ships with the ONNX model (~88 MB) and all ORT WASM runtime files. Copy them to your public directory before your first deployment — see Asset setup.


Quick start

import { MellonStt } from 'mellon-stt'

const stt = new MellonStt({
  // Tell the library where you copied the assets (see Asset setup below)
  wasmBasePath: '/mellon-assets/wasm/',
  modelUrl:     '/mellon-assets/model.onnx',
})

// Optional: show a progress bar while the 88 MB model loads
await stt.init(pct => console.log(`Loading model: ${Math.round(pct * 100)}%`))

// Register the listener first so that no early match is missed
stt.addEventListener('match', (e) => {
  console.log(`Detected "${e.detail.name}" (confidence ${(e.detail.confidence * 100).toFixed(1)}%)`)
})

// Request mic and start listening for the built-in words
await stt.start()

Built-in words: suivant (French: "next") and precedent (French: "previous"). You can enroll any custom word — see Enrolling custom words.
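Because the built-in words mean "next" and "previous", a natural use is slide or page navigation. The sketch below shows one way to route a match detail object to a per-word callback. `createMatchRouter` and the `slide` counter are hypothetical names for illustration, not part of the library.

```javascript
// Hypothetical helper (not part of mellon-stt): routes a match detail
// object ({ name, confidence, timestamp }) to a per-word callback.
function createMatchRouter(actions) {
  return function onMatch(detail) {
    const known = Object.prototype.hasOwnProperty.call(actions, detail.name)
    if (!known || typeof actions[detail.name] !== 'function') return false
    actions[detail.name](detail)
    return true
  }
}

// Example: drive a slide index with the two built-in words.
let slide = 0
const route = createMatchRouter({
  suivant:   () => { slide += 1 },
  precedent: () => { slide = Math.max(0, slide - 1) },
})
```

It would be wired up with `stt.addEventListener('match', e => route(e.detail))`.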


Asset setup

The WASM runtime and model cannot be bundled into JavaScript — they must be served as static files. After installing, run the provided helper to copy them to your project's public directory:

# Copy to public/mellon-assets/  (adjust --dest as needed)
node node_modules/mellon-stt/scripts/copy-assets.js --dest ./public/mellon-assets

Or copy manually:

mkdir -p public/mellon-assets
cp -r node_modules/mellon-stt/dist/assets/wasm  public/mellon-assets/wasm
cp    node_modules/mellon-stt/dist/assets/model.onnx  public/mellon-assets/model.onnx

Then pass the serving paths to the constructor:

new MellonStt({
  wasmBasePath: '/mellon-assets/wasm/',   // trailing slash required
  modelUrl:     '/mellon-assets/model.onnx',
})
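Since the trailing slash on wasmBasePath is required, it can be worth normalising the value when it comes from configuration. A minimal sketch, assuming a hypothetical helper named `withTrailingSlash` that is not part of the library:

```javascript
// Hypothetical helper (not part of mellon-stt): guarantees exactly one
// trailing slash so that file names can be appended to the base path.
function withTrailingSlash(basePath) {
  return basePath.replace(/\/*$/, '/')
}
```

For example, both `'/mellon-assets/wasm'` and `'/mellon-assets/wasm//'` come out as `'/mellon-assets/wasm/'`.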

Vite projects

Add the copy step to your Vite config using the vite-plugin-static-copy plugin:

// vite.config.js
import { defineConfig }   from 'vite'
import { viteStaticCopy } from 'vite-plugin-static-copy'

export default defineConfig({
  server: {
    headers: {
      'Cross-Origin-Opener-Policy':  'same-origin',
      'Cross-Origin-Embedder-Policy': 'require-corp',
    },
  },
  plugins: [
    viteStaticCopy({
      targets: [
        { src: 'node_modules/mellon-stt/dist/assets/wasm/*',       dest: 'mellon-assets/wasm' },
        { src: 'node_modules/mellon-stt/dist/assets/model.onnx',   dest: 'mellon-assets' },
      ],
    }),
  ],
})

API reference

MellonStt (high-level)

The easiest way to use the library. Wraps mic access, AudioWorklet wiring, and detector management into a single class.

class MellonStt extends EventTarget {
  static BUILTIN_WORDS: string[]          // ['suivant', 'precedent']

  constructor(opts?: MellonSttOptions)
  readonly isInitialized: boolean
  readonly isRunning:     boolean

  init(onProgress?: (pct: number) => void): Promise<void>
  start(words?: string[]): Promise<void>
  stop(): void
  addCustomWord(refData: RefData): void
  enrollWord(wordName: string): EnrollmentSession
}

MellonSttOptions

| Option | Type | Default | Description |
|---|---|---|---|
| words | string[] | BUILTIN_WORDS | Words to detect |
| threshold | number | 0.65 | Detection threshold (0–1) |
| relaxationMs | number | 2000 | Min ms between match events |
| inferenceGapMs | number | 300 | Min ms between inference runs |
| wasmBasePath | string | — | Base URL for ORT WASM (trailing /) |
| modelUrl | string | — | URL to model.onnx |

Events

| Event | Detail type | Fired when |
|---|---|---|
| ready | — | init() completes |
| match | { name, confidence, timestamp } | A word is detected |
| error | { error: Error } | Model load or mic access fails |


HotwordDetector

Stateful, single-word detector. Wire it to your own AudioWorklet pipeline.

class HotwordDetector extends EventTarget {
  constructor(opts: DetectorOptions)

  readonly name:      string
  readonly lastScore: number       // most recent similarity score
  threshold:      number
  relaxationMs:   number
  inferenceGapMs: number

  scoreFrame(audioBuffer: Float32Array): Promise<number | null>
}

DetectorOptions

| Option | Type | Default | Description |
|---|---|---|---|
| name | string | — | Label for this word |
| refEmbeddings | number[][] | — | N × 256 embedding vectors |
| threshold | number | 0.65 | Detection threshold |
| relaxationMs | number | 2000 | Cooldown between matches |
| inferenceGapMs | number | 300 | Rate-limit on scoreFrame() |
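The three tuning options are easier to reason about together. The sketch below is one reading of the semantics the table implies, not the library's actual implementation: `createGate` is a hypothetical name, timestamps are passed in explicitly, and whether the real comparison against the threshold is inclusive is an assumption.

```javascript
// Hypothetical sketch (not the library's code): applies threshold,
// relaxationMs and inferenceGapMs to a stream of (score, time) pairs.
function createGate({ threshold = 0.65, relaxationMs = 2000, inferenceGapMs = 300 } = {}) {
  let lastInferenceAt = -Infinity
  let lastMatchAt = -Infinity
  return {
    // True when enough time has passed to run inference again.
    shouldInfer(nowMs) {
      if (nowMs - lastInferenceAt < inferenceGapMs) return false
      lastInferenceAt = nowMs
      return true
    },
    // True when the score clears the threshold outside the cooldown window.
    shouldEmit(score, nowMs) {
      if (score < threshold) return false
      if (nowMs - lastMatchAt < relaxationMs) return false
      lastMatchAt = nowMs
      return true
    },
  }
}
```

Under this reading, a lower threshold makes detection more permissive, while a longer relaxationMs suppresses repeated events for one spoken word.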

Example

import { loadModel, configure, HotwordDetector, BUILTIN_REFS } from 'mellon-stt'

configure({ wasmBasePath: '/mellon-assets/wasm/', modelUrl: '/mellon-assets/model.onnx' })
await loadModel()

const ref = BUILTIN_REFS['suivant']
const detector = new HotwordDetector({ name: 'suivant', refEmbeddings: ref.embeddings })

detector.addEventListener('match', e => {
  console.log(e.detail)  // { name: 'suivant', confidence: 0.72, timestamp: 1711234567890 }
})

// In your AudioWorklet onmessage handler:
workletNode.port.onmessage = async (e) => {
  await detector.scoreFrame(e.data)   // e.data: Float32Array of 24 000 samples (1.5 s at 16 kHz)
}
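The example above assumes that the worklet posts complete 24 000-sample windows. AudioWorklet processors receive audio in much smaller blocks (typically 128 samples), so something has to accumulate them. The following is a minimal sketch of a sliding window, with `SlidingWindow` as a hypothetical class name; it is not the worklet shipped with the library.

```javascript
// Hypothetical sketch (not the library's worklet): keeps the most recent
// `size` samples so a full window can be handed to scoreFrame().
class SlidingWindow {
  constructor(size = 24000) {
    this.buffer = new Float32Array(size)
    this.filled = 0  // number of valid samples, capped at size
  }

  // Append a Float32Array block, discarding the oldest samples when full.
  push(block) {
    const size = this.buffer.length
    if (block.length >= size) {
      this.buffer.set(block.subarray(block.length - size))
    } else {
      this.buffer.copyWithin(0, block.length)       // shift existing samples left
      this.buffer.set(block, size - block.length)   // append the new block at the end
    }
    this.filled = Math.min(size, this.filled + block.length)
  }

  // Returns a copy of the window, or null until enough audio has arrived.
  snapshot() {
    return this.filled === this.buffer.length ? this.buffer.slice() : null
  }
}
```

Each incoming block is pushed, and `snapshot()` is passed to `detector.scoreFrame()` once it returns a non-null window.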

EnrollmentSession

Records audio samples from the mic (or uploaded files) and generates reference embeddings for a new custom word.

class EnrollmentSession extends EventTarget {
  constructor(wordName: string)

  readonly wordName:    string
  readonly sampleCount: number
  readonly samples:     { audioBuffer: Float32Array; name: string }[]

  recordSample():            Promise<number>   // → 1-based sample index
  addAudioFile(file: File):  Promise<number>   // → 1-based sample index
  removeSample(idx: number): void
  clearSamples():            void
  generateRef():             Promise<RefData>  // requires ≥ 3 samples
}

Events

| Event | Detail |
|---|---|
| recording-start | — |
| sample-added | { count: number; name: string } |
| samples-changed | { count: number } |
| generating | { total: number } |
| progress | { done: number; total: number } |


Engine functions

// Configure asset paths (once, before loadModel)
configure({ wasmBasePath?: string, modelUrl?: string }): void

// Load (or return cached) ONNX inference session
loadModel(onProgress?: (pct: number) => void): Promise<void>

// Run inference — returns 256-dim L2-normalised embedding
embed(spectrogram: Float32Array): Promise<Float32Array>
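"L2-normalised" means that each embedding has unit Euclidean length, so comparing two embeddings by cosine similarity reduces to a dot product. As an illustration only, with `l2Normalise` as a hypothetical helper that the library does not necessarily export:

```javascript
// Hypothetical helper: scales a vector to unit Euclidean length.
function l2Normalise(vec) {
  let sumSquares = 0
  for (const v of vec) sumSquares += v * v
  const norm = Math.sqrt(sumSquares)
  if (norm === 0) return Float32Array.from(vec)  // leave an all-zero vector unchanged
  return Float32Array.from(vec, v => v / norm)
}
```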

Audio features

// Compute log-mel spectrogram — input: 24 000 samples at 16 kHz
// Output: Float32Array[149 × 64]
logfbank(signal: Float32Array): Float32Array
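The 149 × 64 output shape is consistent with a common filterbank setup: 25 ms windows (400 samples at 16 kHz), a 10 ms hop (160 samples), a zero-padded final frame, and 64 mel filters. These framing values are an assumption, since they are not stated here. The arithmetic can be sketched with a hypothetical `frameCount` function:

```javascript
// Assumed framing parameters (not stated in this README): 25 ms window,
// 10 ms hop, final partial frame zero-padded.
function frameCount(numSamples, sampleRate = 16000, winSec = 0.025, hopSec = 0.010) {
  const frameLen = Math.round(winSec * sampleRate)   // 400 samples at 16 kHz
  const frameStep = Math.round(hopSec * sampleRate)  // 160 samples at 16 kHz
  if (numSamples <= frameLen) return 1
  return 1 + Math.ceil((numSamples - frameLen) / frameStep)
}
```

With 24 000 samples this gives 1 + ceil(23 600 / 160) = 149 frames, and 149 × 64 = 9536 values in the flattened output.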

Similarity helpers

// Cosine similarity normalised to [0, 1]
cosineSim(a: Float32Array | number[], b: Float32Array | number[]): number

// Maximum cosine similarity against an array of reference embeddings
maxSimilarity(embedding: Float32Array, refs: number[][]): number
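For intuition, here is a sketch of what these two helpers might compute. It assumes that "normalised to [0, 1]" means mapping the cosine from [-1, 1] with (cos + 1) / 2; the library may use a different mapping. The names `cosineSim01` and `maxSimilarity01` are hypothetical, chosen to avoid confusion with the real exports.

```javascript
// Sketch under an assumption: "normalised to [0, 1]" is taken to mean
// (cos + 1) / 2. The library may use a different mapping.
function cosineSim01(a, b) {
  if (a.length !== b.length) throw new Error('vectors must have the same length')
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0.5  // treat a zero vector as cos = 0
  const cos = dot / Math.sqrt(normA * normB)
  return (cos + 1) / 2
}

// Best score of one embedding against several reference embeddings.
function maxSimilarity01(embedding, refs) {
  let best = 0
  for (const ref of refs) best = Math.max(best, cosineSim01(embedding, ref))
  return best
}
```

Taking the maximum means that a single well-matching enrollment sample is enough to trigger a detection, which is one reason to record varied samples.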

Storage helpers

// Constants
BUILTIN_WORDS: string[]                              // ['suivant', 'precedent']
BUILTIN_REFS:  Record<string, RefData>               // bundled, no fetch needed

// Network-based fetch (demo app / server usage)
fetchBuiltinRef(word: string): Promise<RefData>

// localStorage persistence
loadCustomRefs():                  RefData[]
saveCustomRef(refData: RefData):   void
deleteCustomRef(wordName: string): void

// File I/O
exportRef(refData: RefData):         void           // triggers browser download
importRefFile(file: File): Promise<RefData>

RefData shape

interface RefData {
  word_name:  string           // e.g. 'hello'
  model_type: 'resnet_50_arc'
  embeddings: number[][]       // N × 256 vectors
}

Compatible with the EfficientWord-Net _ref.json format — you can import reference files generated by the Python toolkit directly.
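When reference files come from another tool or from user uploads, it can help to validate them against this shape before use. A minimal sketch, assuming a hypothetical `isRefData` validator that is not part of the library:

```javascript
// Hypothetical validator (not part of mellon-stt) for the RefData shape above.
function isRefData(value, dim = 256) {
  if (value === null || typeof value !== 'object') return false
  if (typeof value.word_name !== 'string' || value.word_name.length === 0) return false
  if (value.model_type !== 'resnet_50_arc') return false
  if (!Array.isArray(value.embeddings) || value.embeddings.length === 0) return false
  // Every embedding must be a plain array of `dim` finite numbers.
  return value.embeddings.every(vec =>
    Array.isArray(vec) && vec.length === dim && vec.every(Number.isFinite))
}
```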


Enrolling custom words

import { MellonStt, saveCustomRef } from 'mellon-stt'

const stt = new MellonStt({ wasmBasePath: '/mellon-assets/wasm/', modelUrl: '/mellon-assets/model.onnx' })
await stt.init()

// 1. Create an enrollment session
const session = stt.enrollWord('hey computer')

session.addEventListener('recording-start', () => console.log('Recording…'))
session.addEventListener('sample-added', e => console.log(`Sample ${e.detail.count} recorded`))

// 2. Record at least 3 samples (1.5 s each)
await session.recordSample()
await session.recordSample()
await session.recordSample()

// 3. Generate reference embeddings
session.addEventListener('progress', e => console.log(`Embedding ${e.detail.done}/${e.detail.total}`))
const ref = await session.generateRef()

// 4a. Use immediately in the running detector
stt.addCustomWord(ref)

// 4b. Persist for future sessions
saveCustomRef(ref)

You can also enroll from pre-recorded audio files:

const file = document.querySelector('input[type=file]').files[0]
await session.addAudioFile(file)
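Pre-recorded clips rarely last exactly 1.5 s. How addAudioFile() handles other lengths is not documented here. If you prepare clips yourself, one simple approach is to truncate or zero-pad decoded mono 16 kHz audio to 24 000 samples. The `fitToWindow` helper below is a hypothetical sketch of that approach, not the library's behaviour.

```javascript
// Hypothetical helper: fits a Float32Array of mono 16 kHz audio to a fixed
// window by truncating the tail or right-padding with silence (zeros).
function fitToWindow(samples, windowLength = 24000) {
  const out = new Float32Array(windowLength)   // zero-filled by default
  out.set(samples.subarray(0, windowLength))   // subarray clamps to the input length
  return out
}
```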

Server / bundler configuration

SharedArrayBuffer (required by multi-threaded WASM) is only available when the page is served with:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

Vite dev server

Already configured in the demo app's vite.config.js. For your own project:

// vite.config.js
export default {
  server:  { headers: { 'Cross-Origin-Opener-Policy': 'same-origin', 'Cross-Origin-Embedder-Policy': 'require-corp' } },
  preview: { headers: { 'Cross-Origin-Opener-Policy': 'same-origin', 'Cross-Origin-Embedder-Policy': 'require-corp' } },
}

Express

app.use((req, res, next) => {
  res.setHeader('Cross-Origin-Opener-Policy',  'same-origin')
  res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp')
  next()
})

Nginx

add_header Cross-Origin-Opener-Policy  "same-origin";
add_header Cross-Origin-Embedder-Policy "require-corp";

Netlify (public/_headers)

/*
  Cross-Origin-Opener-Policy: same-origin
  Cross-Origin-Embedder-Policy: require-corp
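Whichever server is used, it is worth confirming that the deployed page actually returns both headers. The sketch below uses a hypothetical `missingIsolationHeaders` helper; it accepts anything with a `get(name)` method, such as the Headers object of a fetch() response.

```javascript
// Hypothetical helper: reports which isolation headers are absent or wrong.
// `headers` is anything with a get(name) method, such as the Headers object
// of a fetch() Response.
function missingIsolationHeaders(headers) {
  const required = {
    'cross-origin-opener-policy': 'same-origin',
    'cross-origin-embedder-policy': 'require-corp',
  }
  return Object.entries(required)
    .filter(([name, expected]) => headers.get(name) !== expected)
    .map(([name]) => name)
}
```

In a browser console this could be called as `missingIsolationHeaders((await fetch(location.href)).headers)`; an empty array means both headers are set as expected.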

Browser support

| Browser | Supported | Notes |
|---|---|---|
| Chrome / Edge 89+ | ✅ | Full support |
| Firefox 79+ | ✅ | Full support |
| Safari 15.2+ | ✅ | SharedArrayBuffer re-enabled with COOP/COEP |
| Safari < 15.2 | ❌ | SharedArrayBuffer not available |
| iOS Safari 15.2+ | ✅ | Works over HTTPS |
| Node.js | ❌ | Browser-only (AudioContext, getUserMedia) |


License

MIT