
streamtts v0.1.5 · 649 downloads

streamtts

Chatterbox Turbo TTS in the browser — WebGPU/WASM, voice cloning, chunked model delivery.

Ships the Resemble AI Chatterbox Turbo ONNX model (~350M parameters) via a q4/q4f16 quantized pipeline that runs fully client-side through Transformers.js v4. A companion models branch hosts the model split into ≤99 MB chunks so it can be served from GitHub Pages or raw.githubusercontent.com without hitting GitHub's 100 MB per-file limit.

On WebGPU, streamtts loads spacekaren/chatterbox-turbo-webgpu instead — the same 350M model with all int64 ops replaced by int32 so it actually runs on WebGPU (WebGPU cannot execute int64 Cast operations). On WASM it falls back to the stock ResembleAI repo.

install

npm install streamtts @huggingface/transformers

use

import { ChatterboxSDK } from 'streamtts'

const tts = new ChatterboxSDK()

// optional — load model from a chunked mirror instead of HF Hub
await tts.configure({
  modelBasePath: 'https://raw.githubusercontent.com/AnEntrypoint/streamtts/models/',
  allowRemoteModels: false,
})

await tts.load()                                       // auto-detect WebGPU, fall back to WASM
await tts.encodeSpeaker('voice-a', float32AudioData)   // from a WAV decoded to mono f32
const { waveform } = await tts.generate('hello', 'voice-a', 0.5)  // 24kHz mono float32

For long text, use generateChunked(text, speakerId, exaggeration, onProgress) — it splits on sentence/paragraph boundaries and stitches the outputs with silence. Call tts.abort() to interrupt a chunked run in flight.
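The splitting step can be sketched as follows. This is an illustrative sketch, not the library's implementation: splitIntoChunks and the maxLen budget are hypothetical names chosen here, and the real generateChunked may split differently.

```javascript
// Sketch of sentence/paragraph chunking in the spirit of generateChunked().
// maxLen is a hypothetical character budget chosen to keep each chunk well
// inside the model's per-call output limit.
function splitIntoChunks(text, maxLen = 200) {
  // Break on sentence-ending punctuation followed by whitespace, or on
  // paragraph breaks (two or more newlines).
  const sentences = text.split(/(?<=[.!?])\s+|\n{2,}/).filter(Boolean);
  const chunks = [];
  let current = '';
  for (const s of sentences) {
    if (current && current.length + s.length + 1 > maxLen) {
      chunks.push(current);
      current = s;
    } else {
      current = current ? current + ' ' + s : s;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each resulting chunk would then be passed to generate() in turn, with silence stitched between the outputs.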

model chunks

The models branch is rebuilt by build-model.yml whenever the upstream HF commit changes. Every file over 99 MB is split into .part0, .part1, …; a chunks.json manifest at the branch root lists each split file with its total size and part count. The SDK's worker installs a fetch interceptor that transparently reassembles the .part* pieces before ONNX Runtime sees the bytes.
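The reassembly logic can be sketched like this. The manifest shape and the fetchBytes parameter are assumptions for illustration; the actual chunks.json schema and interceptor code may differ.

```javascript
// Sketch of reassembling a split model file from its .part* pieces, assuming
// a manifest shaped like { "model.onnx": { parts: 3, size: 12345 } }.
// fetchBytes is an injected fetcher (name -> Uint8Array) so the logic is
// testable outside a worker.
async function reassemble(name, manifest, fetchBytes) {
  const entry = manifest[name];
  if (!entry) return fetchBytes(name); // not split, fetch as-is
  const parts = [];
  for (let i = 0; i < entry.parts; i++) {
    parts.push(await fetchBytes(`${name}.part${i}`));
  }
  const out = new Uint8Array(entry.size);
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}
```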

demo

The live demo app is in this repo (React + Vite). Run npm run dev to bring it up locally; the Vercel preview is at https://transformersjs-chatterbox-demo.vercel.app/.

Features

  • Zero-shot voice cloning — Record or upload a 5-10 second voice sample, then generate speech in that voice
  • Expressiveness control — Adjust the exaggeration slider (0–1.5) to control how expressive the generated speech sounds
  • WebGPU acceleration — Automatically detects and uses WebGPU when available, falls back to WASM
  • Offline after first load — Model files (~1.5 GB) are cached by the browser after the initial download
  • Web Worker inference — All model computation runs off the main thread for a smooth UI

Demo Modes

Playground

Full-featured TTS explorer. Type text, record a reference voice, adjust expressiveness, and generate speech. Displays real-time performance metrics (inference time, audio duration, real-time factor).
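The real-time factor shown in the metrics is simply inference time over produced audio duration; a sketch, assuming the 24 kHz output rate noted earlier:

```javascript
// Real-time factor: wall-clock inference time divided by the duration of the
// audio produced. RTF < 1 means generation is faster than real time.
function realTimeFactor(inferenceMs, samples, sampleRate = 24000) {
  const audioSeconds = samples / sampleRate;
  return inferenceMs / 1000 / audioSeconds;
}
```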

Echo — Voice Message Maker

Create personalized voice message cards in three steps:

  1. Record your voice (or upload a sample)
  2. Compose your message — pick a themed card (Birthday, Thank You, Holiday, Congrats, Get Well, Love), write your text, and adjust expressiveness
  3. Preview & Share — listen to the result and download as a WAV file

VoiceCraft — Dialogue Creator

Build multi-character dialogues with different voices:

  • Add characters, each with their own voice sample and color
  • Write a script with per-line character assignment and expressiveness control
  • Generate all lines sequentially, each using the correct speaker embedding
  • View the dialogue as a color-coded timeline
  • Export the entire conversation as a single WAV file with natural pauses between lines
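The export step above can be sketched as concatenating the per-line waveforms with zero-filled gaps. The gap length and the 24 kHz rate are assumptions here; the app's audio-utils.js may stitch differently.

```javascript
// Stitch per-line Float32Array waveforms into one buffer, inserting
// gapSeconds of silence between consecutive lines.
function stitchWithSilence(waveforms, gapSeconds = 0.3, sampleRate = 24000) {
  const gap = Math.round(gapSeconds * sampleRate);
  const total =
    waveforms.reduce((n, w) => n + w.length, 0) +
    gap * Math.max(0, waveforms.length - 1);
  const out = new Float32Array(total); // zero-filled, so gaps are silence
  let offset = 0;
  waveforms.forEach((w, i) => {
    out.set(w, offset);
    offset += w.length + (i < waveforms.length - 1 ? gap : 0);
  });
  return out;
}
```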

Narrator — Story Reader

Turn text into narrated audio with automatic dialogue detection:

  • Paste any story or pick from built-in samples (The Fox and the Grapes, The Last Robot, Counting Stars)
  • Automatic dialogue detection via regex — identifies quoted speech and attributes characters
  • Assign different voices to the narrator and each detected character
  • Read-along display with paragraph-level highlighting during playback
  • Navigate between paragraphs
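The dialogue-detection idea can be sketched with a single regex pass. This is a simplified illustration; the app's actual pattern handles more attribution forms.

```javascript
// Minimal dialogue detector: pull quoted speech and an optional trailing
// "said X" attribution; unattributed quotes fall back to the narrator.
function detectDialogue(text) {
  const lines = [];
  const re = /"([^"]+)"(?:\s*,?\s*(?:said|asked|replied)\s+(\w+))?/g;
  let m;
  while ((m = re.exec(text)) !== null) {
    lines.push({ quote: m[1], speaker: m[2] || 'narrator' });
  }
  return lines;
}
```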

Getting Started

Prerequisites

  • Node.js 18+ (20+ recommended)
  • npm 9+
  • A modern browser with WebGPU support (Chrome 113+, Edge 113+) for best performance. Falls back to WASM on older browsers.

Installation

git clone https://github.com/resemble-ai/transformersjs-chatterbox-demo.git
cd transformersjs-chatterbox-demo
npm install

Development

npm run dev

Open http://localhost:5173 in your browser.

Production Build

npm run build
npm run preview   # preview the build locally

The built files are in dist/ and can be deployed to any static hosting (Vercel, Netlify, GitHub Pages, etc.).

How It Works

Architecture

┌─────────────────────────────────────────────────┐
│                  Main Thread                     │
│                                                  │
│  React App ──► tts-client.js ──► Web Worker     │
│    (UI)        (RPC bridge)     (Chatterbox)     │
│                                                  │
│  Zustand Store ◄── events ◄── Worker messages    │
└─────────────────────────────────────────────────┘
  1. Web Worker (src/workers/tts.worker.js) — Loads the Chatterbox ONNX model, handles all inference. The model has 4 ONNX sessions: embed_tokens, speech_encoder, language_model (quantized to q4/q4f16), and conditional_decoder.

  2. RPC Client (src/lib/tts-client.js) — Singleton that provides a promise-based API over the worker's postMessage interface. Handles progress events, error propagation, and worker lifecycle.

  3. React Hooks — useTTS() for model loading/generation, useAudioRecorder() for microphone recording with 24kHz resampling, useAudioPlayer() for playback with time tracking.

  4. State — Zustand store with per-mode slices. Audio buffers are stored as Float32Array to avoid serialization overhead.
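The RPC bridge in step 2 follows the common promise-over-postMessage pattern, sketched below. The names (makeRpcClient, call) are illustrative, not the library's API.

```javascript
// Minimal promise-based RPC over a worker's postMessage interface: each call
// gets an id, and the matching response message settles its promise.
function makeRpcClient(worker) {
  let nextId = 0;
  const pending = new Map();
  worker.onmessage = ({ data }) => {
    const p = pending.get(data.id);
    if (!p) return;
    pending.delete(data.id);
    data.error ? p.reject(new Error(data.error)) : p.resolve(data.result);
  };
  return {
    call(method, params) {
      return new Promise((resolve, reject) => {
        const id = nextId++;
        pending.set(id, { resolve, reject });
        worker.postMessage({ id, method, params });
      });
    },
  };
}
```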

Speaker Caching

Voice embeddings are computed once per speaker via model.encode_speech() and cached in the worker's memory. Subsequent generations with the same voice skip the encoding step entirely.
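The caching described above is plain memoization keyed by speaker id; a sketch, where encode stands in for model.encode_speech():

```javascript
// Compute a speaker embedding once per id and reuse it on later generations.
function makeSpeakerCache(encode) {
  const cache = new Map();
  return async function getEmbedding(speakerId, audio) {
    if (!cache.has(speakerId)) {
      cache.set(speakerId, await encode(audio));
    }
    return cache.get(speakerId);
  };
}
```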

Model Details

| Session             | Size    | Quantization               |
|---------------------|---------|----------------------------|
| Embed Tokens        | ~61 MB  | fp32                       |
| Speech Encoder      | ~591 MB | fp32                       |
| Language Model      | ~353 MB | q4 (WASM) / q4f16 (WebGPU) |
| Conditional Decoder | ~533 MB | fp32                       |

The model is loaded from onnx-community/chatterbox-ONNX on Hugging Face and cached by the browser after the first download.

Tech Stack

| Technology      | Version      | Purpose                       |
|-----------------|--------------|-------------------------------|
| React           | 19           | UI framework                  |
| Vite            | 7            | Build tool & dev server       |
| Tailwind CSS    | 4            | Styling                       |
| Zustand         | 5            | State management              |
| React Router    | 7            | Client-side routing           |
| Framer Motion   | 12           | Page transitions & animations |
| Transformers.js | 4.0.0-next.2 | In-browser ML inference       |

Project Structure

src/
├── main.jsx                          # Entry point
├── App.jsx                           # Router + layout shell
├── index.css                         # Tailwind + custom styles
├── workers/
│   └── tts.worker.js                 # Chatterbox model inference
├── lib/
│   ├── tts-client.js                 # Promise-based RPC to worker
│   ├── audio-recorder.js             # Mic recording + 24kHz resampling
│   ├── audio-utils.js                # WAV encoding, concat, silence
│   ├── audio-player.js               # AudioContext playback engine
│   └── constants.js                  # Model ID, sample rate, tags, templates
├── hooks/
│   ├── useTTS.js                     # Model load, generate, speaker encode
│   ├── useAudioRecorder.js           # Record / upload voice samples
│   ├── useAudioPlayer.js             # Play / pause / seek
│   └── useModelStatus.js             # Global model readiness
├── store/
│   └── app-store.js                  # Zustand store (model + per-mode state)
└── components/
    ├── layout/                       # AppShell, Sidebar, ModeHeader
    ├── shared/                       # ModelLoader, VoiceRecorder, AudioPlayer,
    │                                 # AudioWaveform, ExaggerationSlider, etc.
    ├── home/                         # Landing page with mode cards
    ├── playground/                   # TTS feature explorer
    ├── echo/                         # Voice message card maker
    ├── voicecraft/                   # Multi-character dialogue creator
    └── narrator/                     # Story reader with read-along

Browser Compatibility

| Browser     | WebGPU  | WASM Fallback |
|-------------|---------|---------------|
| Chrome 113+ | Yes     | Yes           |
| Edge 113+   | Yes     | Yes           |
| Firefox     | No      | Yes           |
| Safari 18+  | Partial | Yes           |

WebGPU provides significantly faster inference. The app auto-detects availability and falls back gracefully.
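The detect-and-fall-back decision can be sketched as below. Passing nav in as a parameter is an assumption made here so the logic runs outside a browser; in the app it would be the global navigator.

```javascript
// Prefer WebGPU when the browser exposes navigator.gpu and an adapter is
// actually available; otherwise fall back to WASM.
async function pickBackend(nav) {
  if (nav && nav.gpu) {
    try {
      const adapter = await nav.gpu.requestAdapter();
      if (adapter) return 'webgpu';
    } catch {
      // adapter request failed; fall through to WASM
    }
  }
  return 'wasm';
}
```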

Known Limitations

  • No paralinguistic tag support — The Transformers.js ONNX port of Chatterbox does not currently support emotion/paralinguistic tags (e.g. [laugh], [sigh]). Tags in input text will be ignored or read literally. This may be added in a future Transformers.js release.
  • First load is large — The model weighs ~1.5 GB and must be downloaded on first visit. Subsequent visits use the browser cache.
  • Audio length — Generation uses max_new_tokens: 256, which limits output to roughly 5-10 seconds per call. Longer text should be split into chunks.

License

MIT

Acknowledgments