kitten-tts-webgpu

v0.1.1


Run Kitten TTS (80M) locally in the browser via WebGPU. One function call: textToSpeech('Hello!') → WAV blob.


Kitten TTS WebGPU


Pure WebGPU text-to-speech for the browser. 80M params, sub-second on desktop, ~1.2s on iPhone. No ONNX Runtime, no WASM inference — just 29 compute shaders. 753KB gzipped JS + model weights downloaded at runtime.

Live Demo | npm | Model Card


Quick Start

npm install kitten-tts-webgpu

import { textToSpeech } from 'kitten-tts-webgpu';

const blob = await textToSpeech("The quick brown fox jumps over the lazy dog.");
const audio = new Audio(URL.createObjectURL(blob));
audio.play();

One function. Text in, WAV blob out (16-bit PCM, 24 kHz mono). The model downloads on first call and is cached for subsequent calls. Full TypeScript types included.

Note: This library requires WebGPU. For server-side rendering frameworks (Next.js, Nuxt), dynamically import on the client side only.
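Feature detection can be wrapped in a small guard before the dynamic import. This is a hypothetical helper (not part of the library); it takes a navigator-like object as a parameter so it can be exercised outside a browser:

```javascript
// Hypothetical helper (not part of kitten-tts-webgpu): returns true
// when a navigator-like object exposes the WebGPU entry point.
function canUseWebGpuTts(nav) {
  return typeof nav !== 'undefined' && nav !== null && 'gpu' in nav && !!nav.gpu;
}

// In a browser, check canUseWebGpuTts(navigator) before dynamically
// importing the library on the client side (e.g. in a useEffect).
```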

Size & Performance

What gets downloaded

|               | Size                        | When                                         |
|---------------|-----------------------------|----------------------------------------------|
| JS bundle     | 753 KB gzipped (2.9 MB raw) | npm install / bundled into your app          |
| Model weights | 24–78 MB (see below)        | First textToSpeech() call, cached by browser |

The JS bundle includes the WebGPU engine, 29 compute shaders, and a 234K-word phonemizer dictionary. No WASM binaries, no ONNX Runtime.

Models

Three Kitten TTS v0.8 sizes, same API:

| Model | Params | Weights | M4 Pro (Chrome) | iPhone 17 Pro Max (Safari) |
|-------|--------|---------|-----------------|----------------------------|
| Mini  | 80M    | 78 MB   | 1.80s (3.3× RT) | ~1.2s                      |
| Micro | 40M    | 41 MB   | 1.05s (6.2× RT) | —                          |
| Nano  | 15M    | 24 MB   | 0.93s (7.3× RT) | —                          |

RT = real-time factor (audio duration ÷ generation time); higher is better. Times are for warm generation (model already loaded on the GPU). The first call adds roughly 2–4 s for the model download, depending on connection speed.
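The real-time factor follows directly from the definition above; a quick sanity check in plain JS:

```javascript
// Real-time factor as defined above: audio duration ÷ generation time.
function realTimeFactor(audioSeconds, generationSeconds) {
  return audioSeconds / generationSeconds;
}

// Audio duration from a sample count at the library's 24 kHz output rate.
function audioDurationSeconds(numSamples, sampleRate = 24000) {
  return numSamples / sampleRate;
}

// e.g. Mini: 1.80 s of generation producing ~5.94 s of audio is a
// real-time factor of ~3.3×.
```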

await textToSpeech("Hello world");                        // Default: nano (fastest, 24 MB)
await textToSpeech("Hello world", { model: 'micro' });    // Balanced (41 MB)
await textToSpeech("Hello world", { model: 'mini' });     // Best quality (78 MB)

Options

const blob = await textToSpeech("Welcome to the future.", {
  voice: "Leo",        // 8 voices: Bella, Luna, Rosie, Kiki, Jasper, Bruno, Hugo, Leo
  speed: 1.2,          // 0.5x – 2.0x
  model: "micro",      // mini | micro | nano
  onProgress: (stage) => console.log(stage), // string: "Initializing WebGPU…", "Downloading…", "Generating speech…", etc.
});

Voices

| Female | Male   |
|--------|--------|
| Bella  | Jasper |
| Luna   | Bruno  |
| Rosie  | Hugo   |
| Kiki   | Leo    |

Error Handling

// Check for WebGPU support
if (!navigator.gpu) {
  console.log("WebGPU not available — use Chrome 113+, Edge 113+, or Safari 26+");
}

// textToSpeech throws on:
// - No WebGPU support
// - Network error (model download fails)
// - Empty text input
try {
  const blob = await textToSpeech("Hello");
} catch (err) {
  console.error("TTS failed:", err.message);
}

Advanced: Direct Engine Access

For repeated generations or fine-grained control:

import { KittenTTSEngine, textToInputIds, float32ToWav } from 'kitten-tts-webgpu';

const engine = new KittenTTSEngine();
await engine.init();
await engine.loadModel(onnxUrl, voicesUrl);

const { ids } = await textToInputIds("Hello world");
const { waveform } = await engine.generate(ids, "Bella", 1.0);
// waveform: Float32Array of 24kHz PCM samples

const wavBlob = float32ToWav(waveform, 24000);
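For context, here is a minimal sketch of what a 16-bit PCM mono WAV encoder such as float32ToWav produces. This is an illustrative reimplementation under standard RIFF/WAVE conventions, not the library's actual code:

```javascript
// Illustrative 16-bit PCM mono WAV encoder (standard 44-byte RIFF
// header + samples). Not the library's actual implementation.
function encodeWav(samples, sampleRate) {
  const dataSize = samples.length * 2;        // 16-bit = 2 bytes/sample
  const buf = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buf);
  const writeStr = (off, s) => {
    for (let i = 0; i < s.length; i++) view.setUint8(off + i, s.charCodeAt(i));
  };

  writeStr(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);     // RIFF chunk size
  writeStr(8, 'WAVE');
  writeStr(12, 'fmt ');
  view.setUint32(16, 16, true);               // fmt chunk size
  view.setUint16(20, 1, true);                // format 1 = PCM
  view.setUint16(22, 1, true);                // 1 channel (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true);   // byte rate
  view.setUint16(32, 2, true);                // block align
  view.setUint16(34, 16, true);               // bits per sample
  writeStr(36, 'data');
  view.setUint32(40, dataSize, true);

  // Clamp floats to [-1, 1] and scale to signed 16-bit.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buf);
}
```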

How It Works

29 hand-written WGSL compute shaders execute the full TTS pipeline on GPU:

Text → Phonemes (234K-word dictionary + espeak rules in pure JS)
  → ALBERT encoder (embedding, multi-head attention, FFN)
  → Duration predictor (LSTM + CNN)
  → Acoustic decoder (LSTM + AdaIN + CNN, style-conditioned)
  → HiFi-GAN vocoder (ConvTranspose1d, Snake activations, iSTFT)
  → 24kHz WAV

Why not ONNX Runtime Web?

Most browser TTS uses ONNX Runtime Web (~2MB WASM binary + C++ runtime). This project takes a different approach:

  • Custom ONNX parser — dequantizes int8/uint8/float16 weights in pure TypeScript, no C++ runtime
  • 234K-word phonemizer — espeak-ng rules ported to pure JS (WASM espeak hangs on iOS Safari)
  • GPU buffer pooling — reuses buffers across HiFi-GAN iterations, ~130MB peak on mobile
  • Dynamic architecture — detects model dimensions from weight shapes, one engine for all 3 sizes
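The pure-TypeScript dequantization mentioned above can be sketched with the standard ONNX linear-quantization formula, real = (q − zeroPoint) × scale. Function and parameter names here are illustrative, not the library's internal API:

```javascript
// Sketch of int8 weight dequantization in plain JS, using the usual
// ONNX linear formula: real = (q - zeroPoint) * scale.
// Illustrative only — not the library's internal API.
function dequantizeInt8(quantized, scale, zeroPoint) {
  const out = new Float32Array(quantized.length);
  for (let i = 0; i < quantized.length; i++) {
    out[i] = (quantized[i] - zeroPoint) * scale;
  }
  return out;
}
```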

Browser Support

| Browser                | Status       |
|------------------------|--------------|
| Chrome 113+            | ✅           |
| Edge 113+              | ✅           |
| Safari 26+ (macOS/iOS) | ✅           |
| Firefox Nightly        | Experimental |

FAQ

Max input length? Recommended under ~500 characters per call. For longer text, split into sentences.
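One way to split longer text into per-call chunks under the ~500-character guideline — a sketch, not a library API:

```javascript
// Split text on sentence-ending punctuation, then pack sentences into
// chunks under a character budget, one chunk per textToSpeech() call.
// Illustrative only — the library does not ship this helper.
function chunkText(text, maxChars = 500) {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) || [text];
  const chunks = [];
  let current = '';
  for (const s of sentences) {
    if (current && (current + s).length > maxChars) {
      chunks.push(current.trim());
      current = '';
    }
    current += s;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```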

Languages? English only (matches the upstream Kitten TTS model).

Offline? Yes, after the model is cached in the browser. No server needed for inference.

Self-hosting models? Pass custom URLs to KittenTTSEngine.loadModel(onnxUrl, voicesUrl).

Bundle size? 753KB gzipped (2.9MB raw). Includes engine, 29 compute shaders, and 234K-word phonemizer dictionary. Model weights (24–78MB depending on model size) are downloaded separately at runtime on first call and cached by the browser.

Model license? Kitten TTS models are released under Apache 2.0. Code in this repo is MIT.

Development

git clone https://github.com/svenflow/kitten-tts-webgpu.git
cd kitten-tts-webgpu
npm install
npm run dev       # Dev server
npm run build     # Production build
npm test          # Phonemizer tests

Credits

  • Kitten TTS models by KittenML (Apache 2.0)
  • espeak-ng pronunciation dictionary and letter-to-sound rules (GPL-3.0, bundled as data files)
  • phonemizer by Xenova (espeak-ng WASM, used as primary backend on Chrome/Firefox; pure JS fallback on Safari)

License

MIT