npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@uzen/kokoro-js

v1.2.4

Published

High-quality text-to-speech for the web

Readme

Kokoro TTS

Language: English | 简体中文

@uzen/kokoro-js is a JavaScript runtime wrapper for Kokoro TTS based on Transformers.js. This fork is wired for the v1.1 Chinese model and supports Mandarin plus mixed Chinese/English text through the built-in phonemizer.

Goals

  • Add browser-local Mandarin and mixed Chinese/English TTS support to kokoro.js.
  • Use Python kokoro plus misaki v1.1 Chinese frontend behavior as the reference for Chinese phonemization, normalization, tone sandhi, and erhua handling.
  • Keep the JavaScript package self-contained and browser-friendly at runtime; Python is used only as a behavior reference, not as a runtime dependency.
  • Target practical web inference with the v1.1 Chinese ONNX model, local voice assets, and a WebGPU-first default path.
  • Preserve usable English spans in mixed text by routing them through the existing English phonemizer path instead of dropping them.

Current Status

  • Chinese phonemization now emits v1.1 Chinese tokenizer symbols directly instead of falling back to English espeak phonemization.
  • The implementation covers common Mandarin normalization cases including time expressions, dates, numeric ranges, fractions, negative numbers, percentages, phone numbers, temperatures, measurements, money, and quantified numbers.
  • Tone-related handling includes selected neutral-tone words and particles, reduplication, / sandhi, third-tone sandhi, and erhua handling.
  • Golden corpus tracking is based on Python misaki v1.1 outputs; the current plan records 164 examples with 143 matches and 21 known gaps.
  • Remaining gaps mainly come from Intl.Segmenter versus jieba.posseg segmentation differences, missing POS information in the browser path, and long-tail text normalization cases.

Features

  • Runs Kokoro TTS in browser and Node.js through Transformers.js.
  • Supports the v1.1 Chinese ONNX model: onnx-community/Kokoro-82M-v1.1-zh-ONNX.
  • Supports Mandarin text and mixed Chinese/English text with explicit Chinese phonemization.
  • Provides local voice loading through voicePath; voice .bin files are not bundled in the npm package.
  • Supports single-shot generation through tts.generate() and chunked generation through tts.stream().

Known Limitations

  • Voice assets must be hosted or copied by the application before inference can run.
  • The registered English voice set is limited to af_maple, af_sol, and bf_vale.
  • Chinese phonemization is implemented in JavaScript and may still differ from Python misaki on long-tail polyphones or text normalization cases.
  • Browser performance depends on WebGPU availability; use wasm only as a fallback when webgpu is unavailable.
  • fp32, fp16, and q4f16 are recommended; q8 and q4 are supported by the loader but not recommended for this model.

Install

npm i @uzen/kokoro-js

Installing this package also installs its runtime dependencies, including @huggingface/transformers. Transformers.js brings in ONNX Runtime packages such as onnxruntime-web as transitive dependencies; applications do not need to install them separately.

Model And Voices

Use the v1.1 Chinese ONNX model:

onnx-community/Kokoro-82M-v1.1-zh-ONNX

Voice .bin files are not bundled in the npm package. Download the voice files you need from the model repository and expose them through voicePath.

  • Browser/Vite apps: put files such as zf_001.bin under public/kokoro/voices/; the default voicePath is /kokoro/voices.
  • Node.js: put the files in a local directory and pass that directory as voicePath, for example voicePath: "./voices".

Current registered voices are the v1.1 Chinese voices plus two American English voices and one British English voice:

af_maple, af_sol, bf_vale
zf_001, zf_002, zf_003, zf_004, zf_005, zf_006, zf_007, zf_008,
zf_017, zf_018, zf_019, zf_021, zf_022, zf_023, zf_024, zf_026,
zf_027, zf_028, zf_032, zf_036, zf_038, zf_039, zf_040, zf_042,
zf_043, zf_044, zf_046, zf_047, zf_048, zf_049, zf_051, zf_059,
zf_060, zf_067, zf_070, zf_071, zf_072, zf_073, zf_074, zf_075,
zf_076, zf_077, zf_078, zf_079, zf_083, zf_084, zf_085, zf_086,
zf_087, zf_088, zf_090, zf_092, zf_093, zf_094, zf_099,
zm_009, zm_010, zm_011, zm_012, zm_013, zm_014, zm_015, zm_016,
zm_020, zm_025, zm_029, zm_030, zm_031, zm_033, zm_034, zm_035,
zm_037, zm_041, zm_045, zm_050, zm_052, zm_053, zm_054, zm_055,
zm_056, zm_057, zm_058, zm_061, zm_062, zm_063, zm_064, zm_065,
zm_066, zm_068, zm_069, zm_080, zm_081, zm_082, zm_089, zm_091,
zm_095, zm_096, zm_097, zm_098, zm_100

Call tts.list_voices() or read tts.voices at runtime for the authoritative list from the installed package.

Generate Speech

import { KokoroTTS } from "@uzen/kokoro-js";

const model_id = "onnx-community/Kokoro-82M-v1.1-zh-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "fp32",
  device: "webgpu",
  voicePath: "/kokoro/voices",
});

const audio = await tts.generate("你好,欢迎使用 Kokoro 中文语音。", {
  voice: "zf_001",
  speed: 1,
});

audio.save("audio.wav");

Use "fp32" as the default dtype and "webgpu" as the default browser device. Recommended dtype values are "fp32", "fp16", and "q4f16". Other supported quantized variants such as "q8" and "q4" are not recommended for this model. Supported device values are "webgpu", "wasm", "cpu", or null; browser apps should prefer "webgpu" when available and can fall back to "wasm", while Node.js uses "cpu".

Stream Speech

import { KokoroTTS, TextSplitterStream } from "@uzen/kokoro-js";

const model_id = "onnx-community/Kokoro-82M-v1.1-zh-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "fp32",
  device: "webgpu",
});

const splitter = new TextSplitterStream();
const stream = tts.stream(splitter, { voice: "zf_001" });

(async () => {
  let index = 0;
  for await (const { text, phonemes, audio } of stream) {
    console.log({ text, phonemes });
    audio.save(`audio-${index++}.wav`);
  }
})();

splitter.push("Kokoro 支持中文和中英 mixed text。它可以分段生成音频。");
splitter.close();

tts.stream() also accepts a string directly. By default it uses TextSplitterStream sentence splitting and further splits long chunks at punctuation boundaries.

API Notes

  • KokoroTTS.from_pretrained(model_id, options) loads the model and tokenizer through Transformers.js.
  • tts.generate(text, { voice, speed }) returns a Transformers.js RawAudio object.
  • tts.stream(textOrSplitter, options) yields { text, phonemes, audio } for each generated chunk.
  • voicePath is a base path, not a full file path; the runtime loads ${voicePath}/${voice}.bin in browsers and resolves the same pattern from the local filesystem in Node.js.
  • Chinese voices begin with zf_ or zm_ and use the Chinese phonemizer path. English voices beginning with af_ or bf_ use the English phonemizer path.