@pocketpalai/react-native-speech

v2.3.1

A high-performance React Native library for text-to-speech on iOS and Android

Downloads

927


On-device, multi-engine text-to-speech for React Native. Wraps the OS-native TTS (iOS AVSpeechSynthesizer / Android TextToSpeech) and three neural engines — Kokoro, Supertonic, Kitten — behind a single API, with native audio playback, progress events, and audio-focus handling.

New Architecture only. This library requires React Native's New Architecture, which RN 0.76+ enables by default. For 0.68–0.75, see the enable-apps guide.

Preview

Streaming (LLM token stream → TTS): iOS and Android previews (media not reproduced here).

One-shot speak: iOS and Android previews (media not reproduced here).

Features

  • Four engines behind one API: OS_NATIVE (platform TTS), KOKORO (high quality, multi-language), SUPERTONIC (fast, lightweight), KITTEN (compact IPA-driven).
  • License-neutral runner: the library is MIT and ships no model or dictionary data. Consumer apps supply both at runtime. See LICENSES.md.
  • On-device synthesis: neural TTS runs entirely on-device. The library performs no network I/O during synthesis. Any initial model or dictionary download is performed by the consumer app using its own network stack.
  • Interruption-aware audio: iOS AVAudioSession and Android AudioFocus are wired through a JS onAudioInterruption event so apps can react to phone calls and other interruptions.
  • Turbo-module native layer: native audio playback, progress events, and chunk progress for neural engines.
  • Permissive phonemization: default is phonemize (MIT). Optionally supply a mmap'd EPD1 dict via the NativeDict API for higher accuracy — see PHONEMIZATION.md.
  • HighlightedText component: highlight spoken text as it synthesizes.
  • TypeScript: full type definitions; per-engine config is a discriminated union on the engine field.
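The discriminated-union point above is easier to see with a small sketch. The type names and string tags below are hypothetical, modeled loosely on the fields used in the quickstarts; the library's real types may differ. The sketch only shows how a switch on the engine field lets TypeScript narrow the config to the right shape.

```typescript
// Hypothetical config shapes, modeled on the quickstart fields.
// These are NOT the library's actual exported types.
type OsNativeConfig = { engine: 'os-native' };
type KokoroConfig = {
  engine: 'kokoro';
  modelPath: string;
  voicesPath: string;
  tokenizerPath: string;
};
type KittenConfig = {
  engine: 'kitten';
  modelPath: string;
  voicesPath: string;
  dictPath?: string; // optional dictionary
};
type EngineConfig = OsNativeConfig | KokoroConfig | KittenConfig;

// Lists the files a config expects on disk. Inside each case,
// TypeScript narrows `config` to the matching union member.
function requiredFiles(config: EngineConfig): string[] {
  switch (config.engine) {
    case 'os-native':
      return [];
    case 'kokoro':
      return [config.modelPath, config.voicesPath, config.tokenizerPath];
    case 'kitten':
      return config.dictPath
        ? [config.modelPath, config.voicesPath, config.dictPath]
        : [config.modelPath, config.voicesPath];
    default: {
      // Exhaustiveness check: stops compiling if a union member is unhandled.
      const unreachable: never = config;
      return unreachable;
    }
  }
}
```

With this shape, passing a Kokoro config without tokenizerPath is a compile-time error rather than a runtime surprise.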

Installation

npm install @pocketpalai/react-native-speech
# or
yarn add @pocketpalai/react-native-speech

iOS:

cd ios && pod install

Expo (bare only — not supported in Expo Go):

npx expo install @pocketpalai/react-native-speech
npx expo prebuild

Neural engines (optional)

The neural engines need onnxruntime-react-native (optional peer):

npm install onnxruntime-react-native

OS-native TTS works without it.

Quickstart

import Speech, {TTSEngine} from '@pocketpalai/react-native-speech';

await Speech.initialize({engine: TTSEngine.OS_NATIVE});
// voiceId is optional for OS_NATIVE; when omitted, the platform default voice is used.
await Speech.speak('Hello world');

Neural engine quickstarts

The consumer app is responsible for downloading models and passing file paths. See example/src/utils/ for reference model managers.

// Kokoro
await Speech.initialize({
  engine: TTSEngine.KOKORO,
  modelPath: 'file:///.../kokoro.onnx',
  voicesPath: 'file:///.../voices.bin',
  tokenizerPath: 'file:///.../tokenizer.json',
});
await Speech.speak('Hello from Kokoro.', 'af_bella');

// Supertonic (4 ONNX files)
await Speech.initialize({
  engine: TTSEngine.SUPERTONIC,
  durationPredictorPath: 'file:///.../duration_predictor.onnx',
  textEncoderPath: 'file:///.../text_encoder.onnx',
  vectorEstimatorPath: 'file:///.../vector_estimator.onnx',
  vocoderPath: 'file:///.../vocoder.onnx',
  unicodeIndexerPath: 'file:///.../unicode_indexer.json',
  voicesPath: 'file:///.../voices/',
});
await Speech.speak('Hello from Supertonic.', 'F1');

// Kitten
await Speech.initialize({
  engine: TTSEngine.KITTEN,
  modelPath: 'file:///.../kitten.onnx',
  voicesPath: 'file:///.../voices.json',
  dictPath: 'file:///.../en-us.bin', // optional EPD1 dict
});
await Speech.speak('Hello from Kitten.', 'expr-voice-2-f');

Full options (execution providers, chunking, phonemizer selection) are documented in USAGE.md.

Streaming input (LLM token streams)

If your app plays a token-by-token LLM response through TTS, use createSpeechStream() instead of calling speak() per sentence. It buffers incoming text and adaptively flushes batches through the underlying engine so playback sounds continuous — the first sentence flushes as soon as it completes (low latency) and subsequent batches are packed up to targetChars characters.

const stream = Speech.createSpeechStream('af_bella', {
  targetChars: 300, // default
  onError: err => console.warn(err),
});

for await (const token of llmTokenStream) {
  stream.append(token); // non-blocking
}

await stream.finalize(); // flushes the tail and resolves when playback ends
// or: await stream.cancel(); // stops and discards

Per-sentence speak() chains produce audible gaps: each call resets the engine's internal synth pipeline, starting a fresh F0 contour and a cold first-chunk inference. The stream avoids this by keeping one continuous synth+play loop alive for the stream's entire lifetime — the next chunk is synthesized while the current one plays, so the only gap is genuine token-rate underrun (LLM slower than playback).
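To make the batching behavior concrete, here is a minimal sketch of one way such a buffer could work. SentenceBatcher is a hypothetical name and this is not the library's implementation: it assumes sentences end at ., ! or ? followed by whitespace, flushes the first complete sentence immediately, and afterwards waits until at least targetChars characters are buffered before flushing as many whole sentences as fit.

```typescript
// Returns the end index (exclusive) of every complete sentence in `text`.
// Assumption: a sentence ends at ., ! or ? followed by whitespace.
function sentenceEnds(text: string): number[] {
  const ends: number[] = [];
  const re = /[.!?]+(?=\s)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    ends.push(m.index + m[0].length);
  }
  return ends;
}

// Hypothetical helper, for illustration only.
class SentenceBatcher {
  private buffer = '';
  private flushedFirst = false;

  constructor(
    private readonly targetChars: number,
    private readonly onFlush: (batch: string) => void,
  ) {}

  // Non-blocking: stores the token and flushes whatever is ready.
  append(token: string): void {
    this.buffer += token;
    this.drain();
  }

  // Flushes the remaining tail, even if it is not a complete sentence.
  finalize(): void {
    const tail = this.buffer.trim();
    this.buffer = '';
    if (tail.length > 0) this.onFlush(tail);
  }

  private drain(): void {
    for (;;) {
      const ends = sentenceEnds(this.buffer);
      const first = ends[0];
      if (first === undefined) return; // no complete sentence yet
      let cut = first; // first sentence goes out immediately (low latency)
      if (this.flushedFirst) {
        if (this.buffer.length < this.targetChars) return; // keep packing
        const fitting = ends.filter(e => e <= this.targetChars);
        // Largest run of whole sentences that fits, else one long sentence.
        cut = fitting[fitting.length - 1] ?? first;
      }
      const batch = this.buffer.slice(0, cut).trim();
      this.buffer = this.buffer.slice(cut).trimStart();
      this.flushedFirst = true;
      if (batch.length > 0) this.onFlush(batch);
    }
  }
}
```

In a real pipeline the onFlush callback would hand each batch to the synthesis queue; here it is left as a plain callback so the batching logic can be read on its own.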

You can also track playback position with stream-absolute offsets:

stream.onProgress(event => {
  // event.streamRange is relative to the total text appended so far
  highlightText(event.streamRange.start, event.streamRange.end);
});
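As a rough illustration of what "stream-absolute" means, the sketch below converts a range that is relative to one synthesized batch into a range relative to everything appended so far. OffsetTracker and its methods are hypothetical names, and the sketch assumes batches are contiguous slices of the appended text (no trimmed whitespace between them), which a real implementation would need to account for.

```typescript
interface TextRange {
  start: number;
  end: number;
}

// Hypothetical helper: remembers where each batch starts in the full stream.
class OffsetTracker {
  private starts: number[] = [];
  private total = 0;

  // Registers a batch of `length` characters and returns its index.
  addBatch(length: number): number {
    this.starts.push(this.total);
    this.total += length;
    return this.starts.length - 1;
  }

  // Shifts a batch-local range by the batch's starting offset.
  toStreamRange(batchIndex: number, local: TextRange): TextRange {
    const base = this.starts[batchIndex];
    if (base === undefined) {
      throw new RangeError(`unknown batch ${batchIndex}`);
    }
    return {start: base + local.start, end: base + local.end};
  }
}
```

A highlighter can then use the converted range directly against the full text it has accumulated, regardless of how the text was split into batches.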

Works with all neural engines (Kokoro, Supertonic, Kitten) as well as the OS engine. See the Streaming tab in example/ for a live demo that simulates variable token rates.

Architecture (short)

  1. Speech is the public facade. Speech.initialize(config) dispatches on config.engine and constructs the matching engine.
  2. Each engine implements TTSEngineInterface<TConfig>. Neural engines run ONNX sessions under onnxruntime-react-native and stream PCM to the native audio player.
  3. Native code handles playback, progress events, and OS-level audio focus / session interruptions.
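The facade-and-dispatch structure in the list above can be sketched as follows. Only the name TTSEngineInterface comes from this README; its method signatures, the config types, and the RecordingEngine stand-in are assumptions made for illustration, and no real synthesis or native playback is involved.

```typescript
// Assumed shape of the engine contract; the real interface may differ.
interface TTSEngineInterface<TConfig> {
  initialize(config: TConfig): Promise<void>;
  speak(text: string, voiceId?: string): Promise<void>;
}

// Hypothetical configs with a discriminating `engine` field.
type NativeConfig = {engine: 'os-native'};
type NeuralConfig = {engine: 'kokoro'; modelPath: string};
type AnyConfig = NativeConfig | NeuralConfig;

// Stand-in engine that records calls instead of producing audio.
class RecordingEngine<TConfig> implements TTSEngineInterface<TConfig> {
  constructor(
    readonly name: string,
    private readonly log: string[],
  ) {}

  async initialize(_config: TConfig): Promise<void> {
    this.log.push(`${this.name}:init`);
  }

  async speak(text: string): Promise<void> {
    this.log.push(`${this.name}:speak:${text}`);
  }
}

// Facade: picks an engine from config.engine, then forwards speak() to it.
class SpeechFacade {
  private active: Pick<TTSEngineInterface<unknown>, 'speak'> | null = null;

  constructor(private readonly log: string[]) {}

  async initialize(config: AnyConfig): Promise<void> {
    switch (config.engine) {
      case 'os-native': {
        const engine = new RecordingEngine<NativeConfig>('os-native', this.log);
        await engine.initialize(config);
        this.active = engine;
        break;
      }
      case 'kokoro': {
        const engine = new RecordingEngine<NeuralConfig>('kokoro', this.log);
        await engine.initialize(config);
        this.active = engine;
        break;
      }
    }
  }

  async speak(text: string, voiceId?: string): Promise<void> {
    if (this.active === null) throw new Error('call initialize() first');
    await this.active.speak(text, voiceId);
  }
}
```

Keeping the dispatch in one place means callers never hold a reference to a concrete engine, so switching engines is a matter of calling initialize() again with a different config.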

See ARCHITECTURE.md for the full picture, including memory and device requirements.

Model & dictionary downloads

The library ships no model or dictionary assets. Consumer apps fetch them from their own origin (typically Hugging Face) and pass local paths into initialize(). See LICENSES.md for upstream sources and license notes per engine.

Known limitations

  • First run per engine has a 200–2000 ms cold-start (model load + compilation).
  • Neural engines recommend a 3 GB+ RAM device. Low-memory devices should prefer the Kitten nano/micro variants or fall back to OS_NATIVE.
  • OS TTS interruption handling is limited to what the platform provides — no library-level custom ducking beyond what iOS/Android expose.
  • Hermes is supported, but has no TextDecoder or WASM — relevant only if you extend the library's text pipeline.
  • Android 16 KB page sizes (Android 15+): the library's own native_dict.so is 16 KB-aligned, but onnxruntime-react-native (≤ 1.24.3 at time of writing) is not — apps that load a neural engine on a 16 KB-page device will fail with dlopen errors. Workaround: a one-line linker flag added to its CMakeLists.txt via patch-package. See example/patches/onnxruntime-react-native+1.24.3.patch and the postinstall wiring in example/package.json for the full setup. Drop the patch once upstream ships the fix.
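The memory guidance above suggests a simple selection policy that an app could apply before calling initialize(). The sketch below is one possible policy, not something the library provides: chooseEngine is a hypothetical function, the 3 GiB threshold follows the recommendation above, and the 2 GiB cutoff for Kitten is purely an assumption. How the app obtains total RAM (for example from a device-info library) is left out.

```typescript
type EngineChoice = 'kokoro' | 'kitten' | 'os-native';

const GIB = 1024 ** 3;

// Hypothetical policy: prefer a larger neural engine when memory allows,
// a compact one on mid-range devices, and the platform TTS otherwise.
function chooseEngine(
  totalRamBytes: number,
  hasOnnxRuntime: boolean,
): EngineChoice {
  // Neural engines need the optional ONNX runtime peer dependency.
  if (!hasOnnxRuntime) return 'os-native';
  // 3 GB+ is the recommendation stated in the limitations list.
  if (totalRamBytes >= 3 * GIB) return 'kokoro';
  // ASSUMPTION: 2 GiB is an illustrative cutoff, not a documented one.
  if (totalRamBytes >= 2 * GIB) return 'kitten';
  return 'os-native';
}
```

An app would map the returned choice to the matching initialize() config and should still be prepared to fall back to OS_NATIVE if a neural engine fails to load.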

Testing

Mock the module in tests by creating __mocks__/@pocketpalai/react-native-speech.ts:

module.exports = require('@pocketpalai/react-native-speech/jest');

Contributing

See CONTRIBUTING.md.

Credits

Forked from @mhpdev/react-native-speech by Mhpdev. The 1.x line provided the OS-native TTS foundation and the HighlightedText component; 2.0 extended the library into a multi-engine neural platform under a new package name.

Built on top of:

Neural model credits (weights are not bundled):

Full license details in LICENSES.md.

License

MIT. See LICENSE. For model and third-party data licenses, see LICENSES.md.