# voxtral-transcribe-ts
Minimal TypeScript wrapper for local transcription with Voxtral Mini 4B Realtime in Node.js.
This package targets the ONNX checkpoint `onnx-community/Voxtral-Mini-4B-Realtime-2602-ONNX`.
It is intentionally small:

- Node/TS only, no Python
- thin wrapper around `@huggingface/transformers` + ONNX Runtime
- no external audio decoder dependencies
- optional Mistral API transcription backend with no extra dependency
The built-in file loader only supports `.wav` input so the package can stay lightweight. If you already have PCM samples in memory, use `transcribeAudio()`.
Architecture and multi-target rollout plan: PLAN.md
## Install

```bash
npm install voxtral-transcribe-ts
```

## Quick Start
```ts
import { VoxtralTranscriber } from "voxtral-transcribe-ts";

const transcriber = new VoxtralTranscriber({
  device: "cpu",
  dtype: "q4",
});

const result = await transcriber.transcribeFile("./sample.wav");
console.log(result.text);

await transcriber.dispose();
```

By default, the package auto-selects the audio decoder backend:
- Node/local: `InternalWavDecoder`
- Browser: `BrowserNativeAudioDecoder`
The package ships conditional entries:

- package root in Node -> `dist/index.node.js`
- package root in browser-aware bundlers -> `dist/index.browser.js`
- explicit subpaths: `voxtral-transcribe-ts/node` and `voxtral-transcribe-ts/browser`
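For reference, a conditional `exports` map that produces this behavior looks roughly like the following. This is a sketch; the exact field in the published `package.json` may differ.

```jsonc
{
  "exports": {
    // Bundlers that honor the "browser" condition get the browser build;
    // everything else (Node) falls through to the Node build.
    ".": {
      "browser": "./dist/index.browser.js",
      "default": "./dist/index.node.js"
    },
    "./node": "./dist/index.node.js",
    "./browser": "./dist/index.browser.js"
  }
}
```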
## Environment Matrix
| Environment | Package entry | Inference runtime | Default decoder | File input strategy |
|---|---|---|---|---|
| Node / local | `voxtral-transcribe-ts` or `voxtral-transcribe-ts/node` | `@huggingface/transformers` + `onnxruntime-node` | `InternalWavDecoder` | `.wav` by default, multiformat via `FfmpegDecoder` |
| Browser | `voxtral-transcribe-ts` in browser-aware bundlers or `voxtral-transcribe-ts/browser` | browser-safe package entry | `BrowserNativeAudioDecoder` | URL, `Blob`, `File`; codec support depends on the runtime |
| Server high-perf | `voxtral-transcribe-ts/node` | `@huggingface/transformers` + `onnxruntime-node` | `FfmpegDecoder` recommended | multiformat through ffmpeg |
| Mistral API | `voxtral-transcribe-ts/node` | HTTPS API call to Mistral | Mistral-hosted | local path upload or `file_url` |
## Decoder Matrix
| Decoder | Environment | Purpose | Notes |
|---|---|---|---|
| `InternalWavDecoder` | Node, browser | Minimal fallback | `.wav` only |
| `FfmpegDecoder` | Node / server | Best multiformat local path | Not available in browser builds |
| `BrowserNativeAudioDecoder` | Browser | Native client-side decoding | Depends on browser codec support |
You can override the defaults with:

- `target: "auto" | "node" | "browser"`
- `audioDecoderBackend`
- `inferenceBackend`
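For example, pinning the target and decoder explicitly instead of relying on auto-selection:

```ts
import { FfmpegDecoder, VoxtralTranscriber } from "voxtral-transcribe-ts";

// Skip auto-detection: force the Node target and a specific decoder backend.
const transcriber = new VoxtralTranscriber({
  target: "node",
  audioDecoderBackend: new FfmpegDecoder(),
});
```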
## Raw Audio
```ts
import { transcribeAudio } from "voxtral-transcribe-ts";

const samples = new Float32Array([/* mono PCM samples */]);

const result = await transcribeAudio(samples, {
  sampleRate: 16_000,
});
console.log(result.text);
```

## API
### new VoxtralTranscriber(options?)
Options:

- `model`: defaults to `onnx-community/Voxtral-Mini-4B-Realtime-2602-ONNX`
- `modelPath`: optional local path or pre-provisioned snapshot path used instead of fetching by model id
- `device`: defaults to `cpu`
- `dtype`: defaults to `q4`
- `cacheDir`
- `localFilesOnly`
- `requireLocalModel`: when `true`, fail instead of attempting a runtime download
- `revision`
- `progressCallback`
- `target`: defaults to `auto`
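Putting a few of these together (the `cacheDir` value is illustrative, and the `progressCallback` payload shape is not specified here, so it is simply logged):

```ts
import { VoxtralTranscriber } from "voxtral-transcribe-ts";

const transcriber = new VoxtralTranscriber({
  model: "onnx-community/Voxtral-Mini-4B-Realtime-2602-ONNX",
  device: "cpu",
  dtype: "q4",
  cacheDir: "./.voxtral-cache",                              // illustrative path
  progressCallback: (progress: unknown) => console.log(progress),
});

await transcriber.load(); // preload so the first request doesn't pay the model-load cost
```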
### await transcriber.load()
Preloads the processor and model.
### await transcriber.transcribeFile(path, options?)
Reads a WAV file, downmixes it to mono, resamples it to the model sample rate, and returns:
```ts
type VoxtralTranscriptionResult = {
  decoder: string;
  durationMs: number;
  model: string;
  sampleRate: number;
  text: string;
};
```

### await transcriber.transcribeAudio(samples, options?)
Transcribes mono PCM samples already loaded in memory.
Options:

- `sampleRate`: defaults to `16000`
- `maxNewTokens`
- `skipSpecialTokens`: defaults to `true`
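Since `transcribeAudio()` expects mono PCM, multi-channel input has to be mixed down first. A sketch with a generic downmix helper (the helper is not part of this package):

```ts
import { VoxtralTranscriber } from "voxtral-transcribe-ts";

// Generic helper (not part of the package): average interleaved stereo
// frames down to the mono Float32Array that transcribeAudio() expects.
function downmixInterleavedStereo(interleaved: Float32Array): Float32Array {
  const mono = new Float32Array(interleaved.length / 2);
  for (let i = 0; i < mono.length; i++) {
    mono[i] = (interleaved[2 * i] + interleaved[2 * i + 1]) / 2;
  }
  return mono;
}

declare const stereo: Float32Array; // interleaved L/R samples, already at 16 kHz

const transcriber = new VoxtralTranscriber();
const result = await transcriber.transcribeAudio(downmixInterleavedStereo(stereo), {
  sampleRate: 16_000,
  maxNewTokens: 256,
});
console.log(result.text);
```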
## Advanced
The transcriber separates:

- inference backend
- audio decoder backend

The current default pairing is:

- inference: `TransformersInferenceBackend`
- decoder: `InternalWavDecoder` in Node, `BrowserNativeAudioDecoder` in browsers
For multiformat local/server decoding, use `FfmpegDecoder`.

```ts
import { FfmpegDecoder, VoxtralTranscriber } from "voxtral-transcribe-ts";

const transcriber = new VoxtralTranscriber({
  audioDecoderBackend: new FfmpegDecoder(),
});

const result = await transcriber.transcribeFile("./sample.mp3");
console.log(result.text);
```

Browser inputs can be passed as URLs or `Blob` / `File` objects when using `BrowserNativeAudioDecoder` or the default browser auto-selection.
You can also create an instance through `createTranscriber(options)`, which uses the same defaults and target rules as `new VoxtralTranscriber(options)`.
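For instance, wiring a file picker to the browser transcriber might look like the sketch below. It assumes `transcribeFile()` accepts `File` objects directly in the browser build, per the note above.

```ts
import { createTranscriber } from "voxtral-transcribe-ts/browser";

const transcriber = createTranscriber({ target: "browser" });

// Assumption: the browser build's transcribeFile() takes File/Blob inputs,
// as described above for BrowserNativeAudioDecoder.
const input = document.querySelector("input#audio") as HTMLInputElement;
input.addEventListener("change", async () => {
  const file = input.files?.[0];
  if (!file) return;
  const result = await transcriber.transcribeFile(file);
  console.log(result.text);
});
```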
## Optional Mistral API Backend
The package also exposes an optional hosted Voxtral transcription backend. This is not local/offline, but it is useful when latency matters more than self-hosting.
It adds no npm dependency and uses the platform `fetch` / `FormData` APIs.
```ts
import { MistralVoxtralApiTranscriber } from "voxtral-transcribe-ts/node";

const transcriber = new MistralVoxtralApiTranscriber({
  // Optional in Node if process.env.MISTRAL_API_KEY is set.
  apiKey: process.env.MISTRAL_API_KEY,
});

const result = await transcriber.transcribeFile("./sample.mp3", {
  language: "fr",
});
console.log(result.text);
```

For remote audio, pass the URL directly instead of downloading it yourself:
```ts
import { transcribeFileWithMistral } from "voxtral-transcribe-ts/node";

const result = await transcribeFileWithMistral("https://example.com/audio.wav", {
  apiKey: process.env.MISTRAL_API_KEY,
  language: "fr",
});
```

API options:

- `model`: defaults to `voxtral-mini-2602`
- `apiKey`: defaults to `process.env.MISTRAL_API_KEY` in the Node transcriber
- `baseUrl`: defaults to `https://api.mistral.ai/v1`
- `language`
- `diarize`
- `timestampGranularities`: `segment` or `word`
- `contextBias`
- `temperature`
Browser builds also expose `MistralVoxtralApiTranscriber`, but do not put a long-lived Mistral API key in frontend code. Use a short-lived token or a proxy if you need this path in a browser.
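A minimal sketch of that proxy pattern in Node (illustrative only: the `/transcribe` route is made up, and the upstream path is assumed to be the transcription endpoint under the `baseUrl` documented above):

```ts
// Illustrative only: keep MISTRAL_API_KEY server-side and forward uploads.
import { createServer } from "node:http";

createServer(async (req, res) => {
  if (req.method !== "POST" || req.url !== "/transcribe") {
    res.writeHead(404).end();
    return;
  }
  const chunks: Buffer[] = [];
  for await (const chunk of req) chunks.push(chunk as Buffer);

  // Assumed upstream path; adjust to match your Mistral API usage.
  const upstream = await fetch("https://api.mistral.ai/v1/audio/transcriptions", {
    method: "POST",
    headers: {
      authorization: `Bearer ${process.env.MISTRAL_API_KEY}`,
      "content-type": req.headers["content-type"] ?? "application/octet-stream",
    },
    body: Buffer.concat(chunks),
  });

  res.writeHead(upstream.status, { "content-type": "application/json" });
  res.end(await upstream.text());
}).listen(8787);
```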
## Enterprise / Artifactory
There are two separate concerns in enterprise environments:
- npm dependency installation
- model provisioning
`npm install voxtral-transcribe-ts` only installs the package and its npm dependencies. It does not download the Voxtral model checkpoint during package installation.
By default, the model may still be fetched later at runtime when the transcriber first loads. In registry-controlled environments such as Artifactory, the recommended setup is:
- proxy npm dependencies through your internal registry
- pre-provision the Voxtral model snapshot on disk or in an internal artifact store
- point the transcriber at that local snapshot
- require local-only model loading so runtime fails fast instead of reaching out to Hugging Face
```ts
import { FfmpegDecoder, VoxtralTranscriber } from "voxtral-transcribe-ts/node";

const transcriber = new VoxtralTranscriber({
  audioDecoderBackend: new FfmpegDecoder(),
  modelPath: "/opt/models/Voxtral-Mini-4B-Realtime-2602-ONNX",
  requireLocalModel: true,
});
```

With `modelPath` set, the package treats the model as a local artifact and enables local-only loading for the runtime backend. That is the mode to use for Artifactory + local model deployments.
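One way to pre-provision that snapshot, assuming the Hugging Face CLI is available in your provisioning environment (any internal artifact store that lays the same files out on disk works just as well):

```bash
huggingface-cli download onnx-community/Voxtral-Mini-4B-Realtime-2602-ONNX \
  --local-dir /opt/models/Voxtral-Mini-4B-Realtime-2602-ONNX
```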
## Browser Entry

```ts
import { createTranscriber } from "voxtral-transcribe-ts/browser";

const transcriber = createTranscriber({
  target: "browser",
});
```

## Node Entry
```ts
import { createTranscriber, FfmpegDecoder } from "voxtral-transcribe-ts/node";

const transcriber = createTranscriber({
  target: "node",
  audioDecoderBackend: new FfmpegDecoder(),
});
```

## WAV Support
The internal WAV decoder supports:
- PCM 8/16/24/32-bit
- IEEE float 32-bit
- mono or multi-channel input, mixed down to mono
For mp3, m4a, ogg, or flac, decode the audio yourself and call `transcribeAudio()`.
If you want the package to decode those formats for you on local/server, instantiate the transcriber with `FfmpegDecoder`.
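If you would rather shell out to ffmpeg yourself, a sketch of that decode-then-transcribe path (assumes an `ffmpeg` binary on `PATH`; the flags ask for 16 kHz mono 32-bit float PCM on stdout):

```ts
import { execFile } from "node:child_process";
import { promisify } from "node:util";
import { transcribeAudio } from "voxtral-transcribe-ts";

const run = promisify(execFile);

// Decode any ffmpeg-supported format to 16 kHz mono float32 PCM via stdout.
async function decodeToMonoPcm(path: string): Promise<Float32Array> {
  const { stdout } = await run(
    "ffmpeg",
    ["-v", "error", "-i", path, "-f", "f32le", "-ac", "1", "-ar", "16000", "pipe:1"],
    { encoding: "buffer", maxBuffer: 1 << 28 } // raw PCM can be large
  );
  // Copy into a fresh, aligned buffer before viewing it as Float32Array.
  const bytes = new Uint8Array(stdout);
  return new Float32Array(bytes.buffer, 0, bytes.byteLength / 4);
}

const result = await transcribeAudio(await decodeToMonoPcm("./sample.m4a"), {
  sampleRate: 16_000,
});
console.log(result.text);
```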
## Validation

```bash
npm run validate
npm run test:smoke
```

## Benchmark
The repository includes a benchmark harness for comparing voxtral-transcribe-ts against faster-whisper on WER, CER, and real-time factor.
See BENCHMARK.md.
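For reference, WER is the word-level Levenshtein distance between hypothesis and reference, normalized by reference length. A minimal TypeScript version of the metric (not the harness's implementation, which may normalize text differently):

```ts
// Minimal word error rate: Levenshtein distance over words, divided by the
// number of reference words.
function wer(reference: string, hypothesis: string): number {
  const ref = reference.trim().split(/\s+/);
  const hyp = hypothesis.trim().split(/\s+/);
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const cur = [i];
    for (let j = 1; j <= hyp.length; j++) {
      cur[j] = Math.min(
        prev[j] + 1,                                        // deletion
        cur[j - 1] + 1,                                     // insertion
        prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1)   // substitution
      );
    }
    prev = cur;
  }
  return prev[hyp.length] / ref.length;
}
```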
## CI / Release
The repository ships a GitHub Actions workflow in `.github/workflows/typescript-ci.yml` modeled after graphify.
It does four things:

- runs `npm run validate` on Node `20` and `22`
- builds a tarball and installs it into a pristine temp project with `npm install`
- verifies the published root, `node`, and `browser` exports
- publishes to npm on tags matching `v*`
Publish strategy:

- default: GitHub Actions trusted publishing with `id-token: write`
- fallback: if `NPM_TOKEN` is configured as a repository secret, the workflow uses that token instead
Local pre-publish check:

```bash
npm run test:smoke
```

The smoke test proves a fresh-machine install path: it packs the library, creates an empty temp project, runs `npm install <tarball>`, then verifies the installed dependencies and exports from that temp install.
The runtime tests also cover the enterprise local-model path: `modelPath` + `requireLocalModel` is forwarded as a local-only load contract, so the runtime can be configured with zero remote model fetches.
Typical release flow:

```bash
npm version patch
git push
git push --tags
```