pocket-tts-js

v0.1.0


Tiny browser API for the Pocket TTS ONNX model — streaming neural text-to-speech with voice cloning, running entirely client-side in a Web Worker so it never blocks your UI.

  • 🪶 Tiny package — pure-JS SentencePiece tokenizer (no 4 MB WASM build); onnxruntime-web is loaded from a CDN, not bundled.
  • 🌍 Per-language bundles — load only the language you need.
  • Quantized or full precision — INT8 by default; only the variant you choose is downloaded.
  • 📥 Downloads only what's used — the voice encoder is fetched only when cloning is enabled; built-in voices only when requested.
  • 🧵 Off the main thread — all inference runs in a Web Worker; audio streams out chunk by chunk.

Models are streamed at runtime from vlapky/pocket-tts-onnx on Hugging Face.

▶ Live demo — runs in your browser; the source is in example/.

Install

npm install pocket-tts-js

You also need a bundler that supports the new Worker(new URL('./worker.js', import.meta.url)) pattern. Vite, webpack 5, Rollup, and Parcel 2 all do.

Quick start

import { PocketTTS, StreamingPlayer } from "pocket-tts-js";

const tts = new PocketTTS({
  language: "english_2026-04", // see PocketTTS.LANGUAGES
  quantized: true,             // INT8 (smaller/faster) — set false for full precision
  voiceCloning: true,          // download the encoder so cloneVoice() works
});

await tts.load((p) => {
  if (p.total) console.log(`${p.label}: ${(p.loaded / p.total * 100) | 0}%`);
});

const player = new StreamingPlayer({ sampleRate: tts.sampleRate });
await player.resume(); // call from a user gesture

// Pick a built-in voice…
const voice = await tts.loadVoice("alba");

// …and stream speech. Pass `meta` so playback stays gapless.
await tts.generate("Hello from your browser!", {
  voice,
  onChunk: (audio /* Float32Array @ tts.sampleRate */, meta) => player.play(audio, meta),
});
player.flush(); // release any audio still held by the jitter buffer

See Examples for built-in voices, cloning, and cache management.
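
The progress callback passed to load() receives objects with label, loaded, and total fields, plus fromCache as described under Caching. The sketch below moves the formatting into a small function. formatProgress is a hypothetical helper, not part of pocket-tts-js, and the "(cached)" suffix is only an illustration.

```javascript
// Hypothetical helper (not part of pocket-tts-js): turn one progress event
// into a log line. Field names follow the README: label, loaded, total, fromCache.
function formatProgress(p) {
  // total may be missing or 0 when the size is unknown, so guard the division.
  if (!p.total) return `${p.label}: loading`;
  const percent = Math.floor((p.loaded / p.total) * 100);
  const source = p.fromCache ? " (cached)" : "";
  return `${p.label}: ${percent}%${source}`;
}

// Usage sketch: await tts.load((p) => console.log(formatProgress(p)));
```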

Examples

English + built-in voice "alba"

Built-in voices come from a voices.bin and need no encoder, so you can skip the ~21 MB cloning model.

import { PocketTTS, StreamingPlayer } from "pocket-tts-js";

const tts = new PocketTTS({
  language: "english_2026-04",
  voiceCloning: false, // built-in voices don't need the encoder
});
await tts.load((p) => p.total && console.log(`${p.label} ${(p.loaded / p.total * 100) | 0}%`));

const player = new StreamingPlayer({ sampleRate: tts.sampleRate });
await player.resume(); // must run inside a user gesture (click/tap)

const alba = await tts.loadVoice("alba"); // any name from tts.predefinedVoices

const metrics = await tts.generate("Hi, I'm Alba, speaking right inside your browser.", {
  voice: alba,
  onChunk: (audio, meta) => player.play(audio, meta),
});
player.flush();
console.log(`done — RTFx ${metrics.rtfx.toFixed(2)}x`);
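
The README does not define rtfx. It is commonly read as an inverse real-time factor: the amount of audio produced per unit of compute time, so values above 1 mean faster than real time. The sketch below shows that reading as an assumption, not as the library's formula. computeRtfx is a hypothetical helper, and it presumes genTime and audioDuration are expressed in the same unit.

```javascript
// Assumption: RTFx = audio duration / generation time, both in the same unit.
function computeRtfx(audioDuration, genTime) {
  if (!(genTime > 0)) return NaN; // avoid dividing by zero or a negative time
  return audioDuration / genTime;
}

// 6 s of audio generated in 2 s of compute is 3x faster than real time.
console.log(computeRtfx(6, 2).toFixed(2) + "x"); // prints "3.00x"
```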

English + voice cloning

Clone a voice from any mono reference clip (file upload, fetch, microphone…).

import { PocketTTS, StreamingPlayer } from "pocket-tts-js";

const tts = new PocketTTS({
  language: "english_2026-04",
  voiceCloning: true, // default — downloads the encoder
});
await tts.load();

const player = new StreamingPlayer({ sampleRate: tts.sampleRate });
await player.resume();

// Decode the reference clip to a mono Float32Array (any sample rate).
// `referenceFile` is a File or Blob, e.g. from an <input type="file">.
const fileBuffer = await referenceFile.arrayBuffer();
const audioCtx = new AudioContext();
const decoded = await audioCtx.decodeAudioData(fileBuffer);
const mono = decoded.getChannelData(0);

const myVoice = await tts.cloneVoice(mono, { inputSampleRate: decoded.sampleRate });
await audioCtx.close();

await tts.generate("This sentence is spoken in the cloned voice.", {
  voice: myVoice,
  onChunk: (audio, meta) => player.play(audio, meta),
});
player.flush();
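
The example above passes only channel 0 of the decoded clip. For a stereo reference, averaging the channels is a common alternative. Here is a minimal sketch. downmixToMono is a hypothetical helper that relies only on the standard AudioBuffer members numberOfChannels, length, and getChannelData().

```javascript
// Hypothetical helper: average all channels of an AudioBuffer into one
// Float32Array. Works with any object exposing the AudioBuffer interface.
function downmixToMono(buffer) {
  const mono = new Float32Array(buffer.length);
  for (let ch = 0; ch < buffer.numberOfChannels; ch++) {
    const data = buffer.getChannelData(ch);
    for (let i = 0; i < buffer.length; i++) mono[i] += data[i];
  }
  for (let i = 0; i < mono.length; i++) mono[i] /= buffer.numberOfChannels;
  return mono;
}

// Usage sketch: const mono = downmixToMono(decoded);
```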

Clearing the cached model

Free the persisted models/voices from disk and force a fresh download next time.

import { PocketTTS } from "pocket-tts-js";

// `tts` is a PocketTTS instance created earlier.
// Tear down the running instance first (worker + in-memory ONNX sessions)…
tts.destroy();

// …then delete the on-disk Cache Storage bucket.
await PocketTTS.clearCache();

// Optional: inspect what the browser still stores for this origin.
const est = await PocketTTS.storageEstimate(); // { usage, quota } in bytes, or null
if (est) console.log(`using ${(est.usage / 1e6).toFixed(0)} MB of ${(est.quota / 1e6).toFixed(0)} MB`);
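
A { usage, quota } pair that can also be null matches what the standard navigator.storage.estimate() call provides. The sketch below shows how such a wrapper could look. It is an assumption about the approach, not the library's source, and estimateStorage is a hypothetical name.

```javascript
// Sketch (not the library's code): wrap the StorageManager API and return
// null where it is unavailable. `nav` is injectable so it can be stubbed.
async function estimateStorage(nav = globalThis.navigator) {
  if (!nav || !nav.storage || typeof nav.storage.estimate !== "function") {
    return null;
  }
  const { usage, quota } = await nav.storage.estimate();
  return { usage, quota }; // both in bytes
}
```

The figures cover the whole origin, not just the model bucket, so they are an upper bound on what the cached models occupy.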

Cross-origin isolation (recommended)

onnxruntime-web runs multi-threaded when the page is cross-origin isolated. Serve your app with:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

Without these headers the library still works, but it runs single-threaded and is therefore slower. Hugging Face and jsDelivr both send the CORS/CORP headers required under require-corp.
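
How the two headers get set depends on your server. As one possibility, the sketch below adds them to a Node-style response and checks the result in the page through the standard crossOriginIsolated global. setIsolationHeaders and isIsolated are hypothetical helper names.

```javascript
// Add the two cross-origin isolation headers to a Node-style response.
// `res` is anything with setHeader(name, value), e.g. http.ServerResponse.
function setIsolationHeaders(res) {
  res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
  res.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
  return res;
}

// In the page or worker, the crossOriginIsolated global reports whether
// isolation is active. `scope` is injectable so it can be stubbed.
function isIsolated(scope = globalThis) {
  return scope.crossOriginIsolated === true;
}
```

In a plain Node server this would be called at the top of the request handler, before any body is written. Most dev servers and hosts offer an equivalent header setting.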

API

new PocketTTS(options)

| option | default | description |
| -------------- | ---------------------- | ----------- |
| language | "english_2026-04" | Language bundle (PocketTTS.LANGUAGES lists all). |
| quantized | true | INT8 models vs full precision. |
| voiceCloning | true | Download the encoder so cloneVoice() works (~21 MB). |
| modelBaseUrl | HF …/onnx | Base URL of the onnx/ folder. |
| ortBaseUrl | jsDelivr ORT 1.20.0 | Base URL for onnxruntime-web dist files. |
| voicesUrl | null | Explicit URL to a voices.bin (see Built-in voices). |
| maxThreads | 8 | Max WASM threads when cross-origin isolated. |

Methods

  • load(onProgress?) → Promise<BundleInfo> — download the runtime + selected models and initialise.
  • cloneVoice(audio: Float32Array, { inputSampleRate?, name? }) → Promise<string> — encode a reference clip into a voice reference.
  • loadVoice(name: string) → Promise<string> — prepare a built-in voice (requires voices.bin).
  • generate(text, { voice, onChunk }) → Promise<{ rtfx, genTime, audioDuration }> — stream synthesis.
  • stop() → Promise<void> — stop the current generation early.
  • destroy() — terminate the worker and free resources.
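
One way to combine generate() and stop() is a time limit. The sketch below uses only the two documented methods. generateWithTimeout is a hypothetical helper. The README does not say whether generate() resolves or rejects once stop() is called, so the caller should be ready for either.

```javascript
// Hypothetical helper: run generate(), but call stop() if it takes longer
// than `ms` milliseconds. `tts` only needs generate() and stop().
async function generateWithTimeout(tts, text, options, ms) {
  const timer = setTimeout(() => {
    // stop() returns a promise; swallow its errors so they are not unhandled.
    Promise.resolve(tts.stop()).catch(() => {});
  }, ms);
  try {
    return await tts.generate(text, options);
  } finally {
    clearTimeout(timer); // never leave the timer running
  }
}

// Usage sketch:
// await generateWithTimeout(tts, "Hello", { voice, onChunk }, 30000);
```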

Helpers

  • StreamingPlayer — gapless playback of streamed chunks (play, reset, stop, resume, analyser).
  • chunksToWavBlob(chunks, sampleRate) — assemble collected chunks into a downloadable WAV.
  • resampleLinear(data, fromRate, toRate) — simple linear resampler.
  • SentencePieceTokenizer — the standalone pure-JS Unigram tokenizer.
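
To show what a simple linear resampler does, here is an illustrative implementation. It is a sketch of the general technique, not the source of resampleLinear, and linearResample is a hypothetical name.

```javascript
// Illustrative linear-interpolation resampler (not the library's source).
function linearResample(data, fromRate, toRate) {
  if (fromRate === toRate) return Float32Array.from(data);
  const outLength = Math.round((data.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, data.length - 1); // clamp at the end
    const frac = pos - left;
    // Blend the two neighbouring input samples by their distance to pos.
    out[i] = data[left] * (1 - frac) + data[right] * frac;
  }
  return out;
}

// Doubling the rate inserts midpoints between neighbouring samples.
console.log(Array.from(linearResample([0, 1, 2, 3], 4, 8)));
// prints [ 0, 0.5, 1, 1.5, 2, 2.5, 3, 3 ]
```

Linear interpolation applies no low-pass filter, so downsampling this way can introduce aliasing.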

What gets downloaded

For the chosen language + quantized setting only:

| file | needed for | INT8 size |
| ---- | ---------- | --------- |
| bundle.json, tokenizer.model | always | ~80 KB |
| flow_lm_main, flow_lm_flow, mimi_decoder | always | ~109 MB |
| text_conditioner | always | ~16 MB |
| mimi_encoder | only if voiceCloning | ~21 MB |
| bos_before_voice.npy | only if voiceCloning | <1 KB |
| voices.bin | only when a built-in voice is requested | varies |
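
Adding up the approximate INT8 sizes from the table gives a rough first-load estimate. The sketch below does that arithmetic. estimateDownloadMB is a hypothetical helper. The totals leave out voices.bin (its size varies), the sub-kilobyte bos_before_voice.npy, and the onnxruntime-web files loaded from the CDN.

```javascript
// Approximate INT8 sizes in MB, copied from the table above.
const INT8_SIZES_MB = {
  // bundle.json + tokenizer.model, the three main models, text_conditioner
  core: 0.08 + 109 + 16,
  // mimi_encoder, downloaded only when voiceCloning is enabled
  encoder: 21,
};

function estimateDownloadMB({ voiceCloning = true } = {}) {
  return INT8_SIZES_MB.core + (voiceCloning ? INT8_SIZES_MB.encoder : 0);
}

console.log(Math.round(estimateDownloadMB({ voiceCloning: false }))); // prints 125
console.log(Math.round(estimateDownloadMB({ voiceCloning: true })));  // prints 146
```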

Full-precision (quantized: false) variants are larger.

Caching (no re-download every load)

Downloaded assets (models, tokenizer, voices.bin) are persisted in the browser's Cache Storage by default. After the first visit, later loads read these files from disk instead of the network, so they also work offline. Only ONNX session compilation still runs, which takes a few seconds.

new PocketTTS({ cache: true });            // default
new PocketTTS({ cache: false });           // always fetch from network
new PocketTTS({ cacheName: "my-bucket" }); // custom Cache Storage bucket

await PocketTTS.clearCache();              // free the space / force fresh download
await PocketTTS.storageEstimate();         // { usage, quota } in bytes (or null)

Progress callbacks include fromCache: true when a file is served from the cache. Caching needs a secure context (https:// or localhost); it silently falls back to plain network fetches if Cache Storage is unavailable or the quota is exceeded. Bump cacheName (or call clearCache()) when you publish new model weights.
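
The behaviour described above is the familiar cache-first pattern with a network fallback. The sketch below shows that pattern with the standard Cache Storage calls (caches.open, cache.match, cache.put). It is not the library's code, and cachedFetch is a hypothetical name.

```javascript
// Sketch of a cache-first fetch with network fallback (not the library's code).
// `cacheStorage` and `fetchFn` default to the browser globals so tests can stub them.
async function cachedFetch(url, cacheName,
                           cacheStorage = globalThis.caches,
                           fetchFn = globalThis.fetch) {
  let cache = null;
  try {
    if (cacheStorage) cache = await cacheStorage.open(cacheName);
  } catch {
    cache = null; // Cache Storage unavailable, e.g. an insecure context
  }
  if (cache) {
    const hit = await cache.match(url);
    if (hit) return { response: hit, fromCache: true };
  }
  const response = await fetchFn(url);
  if (cache && response.ok) {
    try {
      // Store a clone, because a response body can only be read once.
      await cache.put(url, response.clone());
    } catch {
      // Quota exceeded or similar: continue with the network response.
    }
  }
  return { response, fromCache: false };
}
```

Passing a different cacheName opens a fresh bucket, which is why bumping the name forces a new download.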

Built-in voices

Each language bundle ships a voices.bin with several ready-made speakers. List them via tts.predefinedVoices and prepare one with tts.loadVoice(name). The voices.bin for a language is downloaded lazily, only the first time you request a built-in voice.

console.log(tts.predefinedVoices); // e.g. ["alba", "azelma", "cosette", …]
const voice = await tts.loadVoice("alba");

To serve voices from a different location, point the library at your own file:

new PocketTTS({ voicesUrl: "https://example.com/english_2026-04/voices.bin" });

Voice cloning is also available and needs no voices.bin at all.

License

The library code is licensed under the MIT License.

The Pocket TTS model weights and the bundled built-in voice assets that this library downloads at runtime are © Kyutai and licensed under CC-BY-4.0 — see kyutai/pocket-tts. Your use of those assets is subject to the CC-BY-4.0 terms (including attribution), independently of this package's MIT license.