@agency-lang/whisper-local

v0.0.1

Published

8 hours ago

Local Whisper transcription for Agency using vendored whisper.cpp

egonschiele

Local Whisper transcription for Agency. After the one-time model download, transcription needs no network access and no API key, and no audio leaves your machine.

Installation

This package ships as source and has no install or postinstall scripts: npm install will not run a compiler or fetch anything beyond the package itself. After installing, run the explicit build step below to compile the native addon. This is intentional: postinstall scripts are a common npm supply-chain attack vector, and silently invoking cmake on every install would surprise users who pulled the package in as a transitive dependency.

# 1. Install (no compilation, no network beyond the npm download itself)
npm install @agency-lang/whisper-local

# 2. Explicitly build the native addon (one-time, ~30-90 seconds)
npx -p @agency-lang/whisper-local agency-whisper build

System dependencies (install before running agency-whisper build):

| Platform | Build tools | Audio decoding |
|----------|-------------|----------------|
| macOS | xcode-select --install && brew install cmake | brew install ffmpeg |
| Debian/Ubuntu | sudo apt install cmake build-essential | sudo apt install ffmpeg |
| Fedora | sudo dnf install cmake gcc-c++ | sudo dnf install ffmpeg |
| Windows | Not yet supported in source-build v0; wait for prebuilts. | n/a |

If agency-whisper build fails because of missing build tools, the rest of your node_modules/ is unaffected.

Quick start

import { transcribe } from "pkg::@agency-lang/whisper-local"

node main() {
  const text = transcribe("interview.m4a", "en", "base.en")
  print(text)
}

The first call downloads ggml-base.en.bin (~150 MB) into ~/.agency/models/whisper/. Subsequent calls reuse the cached file.

API

transcribe(filepath: string, language: string = "", model: string = "base.en"): string

filepath — any audio file ffmpeg can read (mp3, m4a, wav, flac, ogg, webm, mp4, …). Must be a regular file on the local filesystem; URLs and ffmpeg pseudo-protocols (http://, concat:, subfile:, …) are explicitly rejected. See Operational notes.
language — ISO 639-1 code (e.g. "en", "fr", "de"). Empty string → whisper.cpp auto-detects.
model — model name. See the table below.

Returns the joined transcript text. Throws on missing ffmpeg, unknown model, corrupted audio, or SHA-256 mismatch on model download.
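
The argument rules above can be checked before a call is made. The following is a minimal TypeScript sketch of such a pre-flight check; `checkTranscribeArgs` and `ArgCheck` are hypothetical names, not part of this package, and the rule that an `.en` model should only be paired with `"en"` or auto-detect is an assumption drawn from the "English-only" column of the model table.

```typescript
// Hypothetical pre-flight check; not part of the package's API.
type ArgCheck = { ok: true } | { ok: false; reason: string };

function checkTranscribeArgs(language: string, model: string): ArgCheck {
  // An empty string means "let whisper.cpp auto-detect the language".
  if (language !== "" && !/^[a-z]{2}$/.test(language)) {
    return { ok: false, reason: `"${language}" is not a two-letter ISO 639-1 code` };
  }
  // Assumption: models whose name ends in ".en" are English-only.
  if (model.endsWith(".en") && language !== "" && language !== "en") {
    return { ok: false, reason: `${model} is English-only but language is "${language}"` };
  }
  return { ok: true };
}

console.log(checkTranscribeArgs("en", "base.en")); // accepted
console.log(checkTranscribeArgs("fr", "base.en")); // rejected: English-only model
```

A check like this only catches obviously malformed arguments early; transcribe() remains the authority on what it accepts.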

Models

All ten upstream whisper.cpp models are pinned in models.lock.json against HuggingFace commit 5359861c… with verified SHA-256 hashes.

| Name | Size | English-only | Notes |
|------|------|--------------|-------|
| tiny | 75 MB | no | Fastest, lowest accuracy. Good for quick prototypes. |
| tiny.en | 75 MB | yes | English-only variant of tiny; slightly more accurate. |
| base | 142 MB | no | Multilingual base model. |
| base.en | 142 MB | yes | Default. Good accuracy/speed trade-off for English. |
| small | 466 MB | no | Noticeable accuracy bump over base. |
| small.en | 466 MB | yes | English-only variant of small. |
| medium | 1.5 GB | no | Slow on CPU; usable on Apple Silicon. |
| medium.en | 1.5 GB | yes | English-only variant of medium. |
| large-v3 | 2.9 GB | no | Best accuracy. Use only with adequate RAM (~5 GB peak). |
| large-v3-turbo | 1.6 GB | no | Approaches large-v3 quality at ~half the size. |

.en variants are slightly more accurate on English-only audio. To add a model from a newer upstream release, see docs/DEV.md.
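
One way to choose a model programmatically is to take the largest one that fits a disk or download budget. The TypeScript sketch below copies the names and sizes from the table; `pickModel` and `MODELS` are hypothetical, and "larger means more accurate" is a simplifying assumption rather than something the package guarantees.

```typescript
// Sizes copied from the model table (in MB); the helper itself is hypothetical.
interface ModelInfo { name: string; sizeMb: number; englishOnly: boolean }

const MODELS: ModelInfo[] = [
  { name: "tiny", sizeMb: 75, englishOnly: false },
  { name: "tiny.en", sizeMb: 75, englishOnly: true },
  { name: "base", sizeMb: 142, englishOnly: false },
  { name: "base.en", sizeMb: 142, englishOnly: true },
  { name: "small", sizeMb: 466, englishOnly: false },
  { name: "small.en", sizeMb: 466, englishOnly: true },
  { name: "medium", sizeMb: 1500, englishOnly: false },
  { name: "medium.en", sizeMb: 1500, englishOnly: true },
  { name: "large-v3", sizeMb: 2900, englishOnly: false },
  { name: "large-v3-turbo", sizeMb: 1600, englishOnly: false },
];

// Largest model within the budget. For English audio, an ".en" variant wins
// a size tie; for other languages, ".en" models are excluded entirely.
function pickModel(budgetMb: number, englishAudio: boolean): string | undefined {
  const candidates = MODELS.filter(
    (m) => m.sizeMb <= budgetMb && (englishAudio || !m.englishOnly),
  );
  candidates.sort(
    (a, b) => b.sizeMb - a.sizeMb || Number(b.englishOnly) - Number(a.englishOnly),
  );
  return candidates[0]?.name; // undefined when nothing fits
}

console.log(pickModel(500, true));   // small.en
console.log(pickModel(2000, false)); // large-v3-turbo
```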

Pre-downloading models

npx -p @agency-lang/whisper-local agency-whisper pull base.en
npx -p @agency-lang/whisper-local agency-whisper list
npx -p @agency-lang/whisper-local agency-whisper verify base.en

Manual model placement

If you can't reach HuggingFace (network-restricted environment, etc.), download ggml-<name>.bin yourself and place it at ~/.agency/models/whisper/ggml-<name>.bin. The package will use it as-is, without re-hashing on every load. To verify a manually-placed file:

npx -p @agency-lang/whisper-local agency-whisper verify <name>
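
To illustrate the kind of check involved, here is a minimal TypeScript sketch that hashes data in chunks and compares the result with an expected digest. It assumes the lockfile records a hex-encoded SHA-256 per model; `sha256Hex` and `matchesLock` are hypothetical helpers, and how agency-whisper verify is actually implemented is not shown in this README.

```typescript
import { createHash } from "node:crypto";

// Hash a sequence of chunks incrementally, so a multi-gigabyte model never
// has to be held in memory all at once.
function sha256Hex(chunks: Iterable<Uint8Array>): string {
  const hash = createHash("sha256");
  for (const chunk of chunks) hash.update(chunk);
  return hash.digest("hex");
}

// `expectedHex` stands in for whatever digest models.lock.json records
// (assumption: hex-encoded, case-insensitive).
function matchesLock(chunks: Iterable<Uint8Array>, expectedHex: string): boolean {
  return sha256Hex(chunks) === expectedHex.toLowerCase();
}

// Chunk boundaries do not affect the digest: "a" + "bc" hashes like "abc".
console.log(sha256Hex([Buffer.from("a"), Buffer.from("bc")]));
```

For a real model file the chunks would typically come from a file read stream rather than an in-memory array.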

Custom model directory

Set AGENCY_WHISPER_MODELS_DIR to use a directory other than ~/.agency/models/whisper/.
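
The lookup rule described in this section and the previous one can be sketched as a small TypeScript function. `modelPath` is a hypothetical helper, not the package's code; treating an empty AGENCY_WHISPER_MODELS_DIR as unset and trimming trailing slashes are assumptions.

```typescript
// Hypothetical helper mirroring the documented lookup rule.
function modelPath(
  modelName: string,
  env: Record<string, string | undefined>,
  homeDir: string,
): string {
  // Assumption: an empty or missing variable falls back to the default directory.
  const dir = env["AGENCY_WHISPER_MODELS_DIR"] || `${homeDir}/.agency/models/whisper`;
  // Drop trailing slashes so the joined path has exactly one separator.
  return `${dir.replace(/\/+$/, "")}/ggml-${modelName}.bin`;
}

console.log(modelPath("base.en", {}, "/home/ada"));
// /home/ada/.agency/models/whisper/ggml-base.en.bin
console.log(modelPath("tiny", { AGENCY_WHISPER_MODELS_DIR: "/srv/models/" }, "/home/ada"));
// /srv/models/ggml-tiny.bin
```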

Operational notes

Threading. The native addon runs whisper inference on a libuv worker thread (Napi::AsyncWorker), so the JavaScript event loop is not blocked during a transcribe() call. Other JS work — HTTP requests, timers, etc. — continues to run.

Per-model serialization. A whisper_context holds mutable internal state, so we hold a per-model mutex around whisper_full. Concurrent transcribe() calls on the same model serialize cleanly (no races, no corruption) but do not run in parallel: each model handles one transcription at a time.

Cross-model parallelism. Calls on different model instances run concurrently on separate libuv worker threads. By default Node sizes the libuv pool at 4 threads. If you run many concurrent transcriptions in the same process (e.g. an Agency program serving multiple users), bump the pool: UV_THREADPOOL_SIZE=16 node ... Do this before node starts; the pool size is locked at startup.
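
Taken together, per-model serialization and the pool size give a simple upper bound on how many transcriptions can run at the same instant. The TypeScript sketch below computes that bound; `maxParallelTranscriptions` is a hypothetical helper, and it assumes nothing else in the process is occupying libuv pool threads. If calls waiting on a model's mutex also hold a pool thread, the real figure can be lower.

```typescript
// Upper bound on simultaneously running transcriptions.
// Default pool size of 4 matches Node's default libuv thread pool.
function maxParallelTranscriptions(modelsInUse: string[], poolSize = 4): number {
  // Calls on the same model serialize, so only distinct models count.
  const distinctModels = new Set(modelsInUse).size;
  return Math.min(distinctModels, poolSize);
}

// Three pending calls, but two share base.en, so at most 2 run at once.
console.log(maxParallelTranscriptions(["base.en", "base.en", "tiny"])); // 2
```

Raising UV_THREADPOOL_SIZE only helps once the number of distinct models in use exceeds the pool size.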

Loaded-model cache. Loaded model contexts are kept in an in-process LRU cache. The default cap is 2 entries (a large-v3 context can use ~3 GB, so an unbounded cache is dangerous in long-lived processes). Override via AGENCY_WHISPER_HANDLE_CACHE_MAX. Set to 0 to disable the cache (load + free per transcribe()).
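
The eviction behaviour can be pictured with a small LRU sketch in TypeScript. This is an illustration of the policy, not the package's implementation: `HandleCache`, `load`, and `free` are hypothetical stand-ins for creating and releasing a native whisper context, and the sketch ignores the question of evicting a context that is still in use.

```typescript
// Minimal LRU keyed by model name (illustrative only).
class HandleCache<H> {
  private readonly entries = new Map<string, H>(); // Map iterates in insertion order
  private readonly max: number;
  private readonly load: (model: string) => H;
  private readonly free: (handle: H) => void;

  constructor(max: number, load: (model: string) => H, free: (handle: H) => void) {
    this.max = max;
    this.load = load;
    this.free = free;
  }

  // Returns a handle for `model`. With max = 0 nothing is cached and the
  // caller is responsible for freeing the returned handle.
  acquire(model: string): H {
    if (this.entries.has(model)) {
      const hit = this.entries.get(model) as H;
      // Re-insert so this model becomes the most recently used entry.
      this.entries.delete(model);
      this.entries.set(model, hit);
      return hit;
    }
    const handle = this.load(model);
    if (this.max <= 0) return handle;
    this.entries.set(model, handle);
    if (this.entries.size > this.max) {
      // The first entry in iteration order is the least recently used.
      for (const [oldest, victim] of this.entries) {
        this.entries.delete(oldest);
        this.free(victim);
        break;
      }
    }
    return handle;
  }
}

const freed: string[] = [];
const cache = new HandleCache<string>(2, (m) => `ctx:${m}`, (h) => { freed.push(h); });
for (const m of ["tiny", "base.en", "tiny", "small"]) cache.acquire(m);
console.log(freed); // base.en was least recently used, so its context was freed
```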

Memory profile. The audio decode step buffers the entire decoded PCM in memory before handing it to whisper. At 16 kHz mono float32 that is ~230 MB per hour of audio. Peak RSS during a transcribe() call is roughly 3× the decoded size (decoded buffer + Float32Array copy + C++ std::vector copy). The package rejects any single decode that exceeds 2 GB by default; to handle longer audio, raise the limit per call via the lower-level decodeToPcm({ maxPcmBytes }) API, or split the audio into chunks client-side.
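
The figures above follow from simple arithmetic, sketched here in TypeScript. The helper names are hypothetical, and reading "2 GB" as 2 GiB (2 × 1024³ bytes) is an assumption; the 3× peak factor is the rough estimate quoted above, not a measured value.

```typescript
const SAMPLE_RATE_HZ = 16000;                      // 16 kHz mono
const BYTES_PER_SAMPLE = 4;                        // float32
const DEFAULT_CAP_BYTES = 2 * 1024 * 1024 * 1024;  // assumption: "2 GB" means 2 GiB

// Size of the fully decoded PCM buffer for a clip of the given length.
function decodedPcmBytes(durationSeconds: number): number {
  return Math.ceil(durationSeconds * SAMPLE_RATE_HZ) * BYTES_PER_SAMPLE;
}

// Rough memory estimate: three copies of the PCM data exist at the peak.
function estimateDecode(durationSeconds: number, capBytes = DEFAULT_CAP_BYTES) {
  const pcmBytes = decodedPcmBytes(durationSeconds);
  return { pcmBytes, approxPeakBytes: 3 * pcmBytes, withinCap: pcmBytes <= capBytes };
}

// One hour: 3600 s x 16000 samples/s x 4 bytes = 230,400,000 bytes (~230 MB).
console.log(estimateDecode(3600));
// Ten hours decodes to 2,304,000,000 bytes, which exceeds the assumed cap.
console.log(estimateDecode(10 * 3600).withinCap);
```

Under this reading of the cap, a single decode tops out a little past nine hours of audio.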

Timeout. Each ffmpeg invocation has a 10-minute wall-clock cap (configurable via decodeToPcm({ timeoutMs })). On expiry the ffmpeg process is SIGKILL-ed and the call rejects, so a stuck or pathological input cannot hold the worker forever.

Input restriction. transcribe(filepath) validates that filepath is a regular file before spawning ffmpeg, and the spawn is restricted to ffmpeg's file protocol. Inputs starting with - or specifying any other protocol (http://, tcp://, concat:, subfile:, …) are rejected up front. This matters because Agency programs often pass LLM-driven tool arguments straight into transcribe(); without this guard, a crafted "filepath" could turn a transcription into an outbound HTTP fetch or a read of an arbitrary local file.
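
A string-level version of this guard might look like the TypeScript sketch below. `rejectReason` is a hypothetical function written in the spirit of the description above; the package's real validation may differ, and the regular-file check (for example via fs.stat) that would follow is omitted. Note that the scheme pattern also rejects Windows drive paths such as C:\audio.wav, which is acceptable only because Windows is not supported in v0.

```typescript
// Returns a reason string when the path should be rejected, or null when it
// passes the string-level checks.
function rejectReason(filepath: string): string | null {
  if (filepath.length === 0) return "empty path";
  // A leading "-" could be parsed by ffmpeg as an option instead of an input.
  if (filepath.startsWith("-")) return "looks like a command-line option";
  // A "scheme:" prefix selects an ffmpeg protocol (http:, concat:, subfile:, ...).
  if (/^[A-Za-z][A-Za-z0-9+.-]*:/.test(filepath)) return "protocol prefix is not allowed";
  return null;
}

for (const candidate of ["interview.m4a", "http://example.com/a.mp3", "concat:a.wav|b.wav", "-i"]) {
  console.log(candidate, "->", rejectReason(candidate) ?? "ok");
}
```

Rejecting by pattern before spawning anything keeps the decision independent of how ffmpeg itself would interpret the argument.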

Troubleshooting

cmake: command not found — install cmake (see the install table).

ffmpeg not found on PATH — install ffmpeg (see the install table).

whisper_init_from_file_with_params failed. The .bin file is corrupted or is not in whisper.cpp's ggml format. Re-download it with agency-whisper pull <name>, or run agency-whisper verify <name> to check it.

SHA-256 mismatch — the downloaded model's hash doesn't match the lockfile. The partial file has been deleted. Re-running usually succeeds; if it persists, file an issue.

whisper-local native addon not found — you skipped step 2 of installation. Run npx -p @agency-lang/whisper-local agency-whisper build.

Not in v0

Prebuilt binaries (will arrive via an optional-deps platform matrix).
Windows source-build support.
Streaming partial results.
Speaker diarization.
Translation mode.

For maintainers / contributors

See docs/DEV.md for architecture, security model, C++ memory design, vendoring procedure, and testing notes.

Credits

This package vendors and depends on:

whisper.cpp by Georgi Gerganov and contributors — MIT License. See vendor/whisper.cpp/LICENSE. We use a pinned release; see vendor/whisper.cpp/VERSION.
ggml by Georgi Gerganov and contributors — MIT License. Vendored alongside whisper.cpp.
node-addon-api by the Node.js project — MIT License.
cmake-js — MIT License.

Audio decoding shells out to your system's ffmpeg, which is not bundled or distributed with this package.

License

ISC. See the LICENSE file in the repository root. The vendored whisper.cpp and ggml sources are MIT-licensed; see vendor/whisper.cpp/LICENSE for that notice.
