tiny-tts
v5.0.1
Ultra-lightweight text-to-speech (1.6M params). Pure Node.js inference via ONNX Runtime — zero Python dependency.
TinyTTS
Ultra-lightweight Text-to-Speech for Node.js — 1.6M params, 44.1kHz, ~53x real-time on CPU.
Pure Node.js offline TTS inference via ONNX Runtime. Zero Python dependency. The ONNX model (~6 MB) is auto-downloaded from HuggingFace on first use.
Installation
npm install tiny-tts
Quick Start
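const TinyTTS = require('tiny-tts');

// CommonJS has no top-level await, so the calls run inside an async function.
async function main() {
  const tts = new TinyTTS();
  try {
    // Synthesize and save to WAV
    await tts.speak('Hello world!', { output: 'hello.wav' });
    // With options
    await tts.speak('This is a fast speech test.', {
      output: 'fast.wav',
      speaker: 'MALE',
      speed: 1.5
    });
  } finally {
    // Clean up, even if synthesis failed
    await tts.dispose();
  }
}

main().catch(console.error);
CLI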
# Basic usage
npx tiny-tts "Hello world!" -o hello.wav
# With options
npx tiny-tts "The weather is nice today." -o output.wav -s MALE --speed 1.2
Features
- Offline inference — no server, no API calls, no Python needed
- ONNX Runtime — fast CPU inference (~53x real-time)
- Neural G2P — ported g2p_en GRU model for accurate pronunciation of any English word
- Full CMU dictionary — 123,463 entries for precise phoneme lookup
- 100% G2P match with Python (PyPI) version across 542 test sentences
- Auto model download — ONNX model fetched from HuggingFace on first run
API
new TinyTTS(options?)
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| modelPath | string | (auto-download) | Path to ONNX model file |
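On machines without network access, `modelPath` can point at a local copy of the model instead of relying on the first-run download. The following is a minimal sketch of that idea. The helper name `buildOptions` and the environment variable `TINY_TTS_MODEL` are illustrative assumptions, not part of the package; only the `modelPath` option comes from the table above.

```javascript
const fs = require('fs');

// Hypothetical helper (not part of tiny-tts): builds constructor options.
// If the given path exists, it is passed as `modelPath`; otherwise an empty
// object is returned so the library uses its default auto-download.
function buildOptions(localModelPath) {
  if (localModelPath && fs.existsSync(localModelPath)) {
    return { modelPath: localModelPath };
  }
  return {};
}

// Usage (TINY_TTS_MODEL is an assumed environment variable name):
// const TinyTTS = require('tiny-tts');
// const tts = new TinyTTS(buildOptions(process.env.TINY_TTS_MODEL));
```

Returning `{}` when the file is missing keeps the default behavior intact, so the same code works both online and offline.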
tts.speak(text, options?)
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| output | string | 'output.wav' | Output WAV file path |
| speaker | string | 'MALE' | Speaker ID (MALE or FEMALE) |
| speed | number | 1.0 | Speech speed (0.3–3.0) |
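The table does not say what happens when a value falls outside the documented range, so callers that accept user input may want to check it before calling `speak`. A minimal sketch, assuming a hypothetical helper `normalizeSpeakOptions` that applies the documented defaults and rejects anything else:

```javascript
// Allowed values taken from the options table above.
const SPEAKERS = ['MALE', 'FEMALE'];
const MIN_SPEED = 0.3;
const MAX_SPEED = 3.0;

// Hypothetical helper (not part of tiny-tts): fills in the documented
// defaults and throws on values outside the documented ranges.
function normalizeSpeakOptions(options = {}) {
  const { output = 'output.wav', speaker = 'MALE', speed = 1.0 } = options;
  if (!SPEAKERS.includes(speaker)) {
    throw new RangeError(`speaker must be one of ${SPEAKERS.join(', ')}`);
  }
  if (typeof speed !== 'number' || Number.isNaN(speed) ||
      speed < MIN_SPEED || speed > MAX_SPEED) {
    throw new RangeError(`speed must be a number between ${MIN_SPEED} and ${MAX_SPEED}`);
  }
  return { output, speaker, speed };
}

// Usage:
// await tts.speak(text, normalizeSpeakOptions({ speed: userSpeed }));
```

Throwing rather than clamping is a design choice for this sketch; the library itself may handle out-of-range values differently.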
Returns: Promise<Buffer> — WAV audio data (also saves to file if output is set)
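Because the promise resolves to the WAV data itself, the result can be inspected in memory without reading the file back. The sketch below parses a standard RIFF/WAVE buffer to report sample rate and duration. `inspectWav` is a hypothetical helper, and it assumes only that the returned buffer is a well-formed WAV file with `fmt ` and `data` chunks.

```javascript
// Hypothetical helper: reads basic facts from a RIFF/WAVE buffer by walking
// its chunks, so extra chunks before "data" are tolerated.
function inspectWav(buf) {
  if (buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE buffer');
  }
  let sampleRate, channels, bitsPerSample, dataBytes;
  let offset = 12; // first chunk starts after "RIFF", size, "WAVE"
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === 'fmt ') {
      channels = buf.readUInt16LE(offset + 10);
      sampleRate = buf.readUInt32LE(offset + 12);
      bitsPerSample = buf.readUInt16LE(offset + 22);
    } else if (id === 'data') {
      // Guard against a declared size larger than the actual buffer.
      dataBytes = Math.min(size, buf.length - (offset + 8));
      break;
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  if (sampleRate === undefined || dataBytes === undefined) {
    throw new Error('Missing fmt or data chunk');
  }
  const bytesPerFrame = channels * (bitsPerSample / 8);
  return {
    sampleRate,
    channels,
    bitsPerSample,
    durationSeconds: dataBytes / bytesPerFrame / sampleRate,
  };
}

// Usage:
// const wav = await tts.speak('Hello world!', { output: 'hello.wav' });
// console.log(inspectWav(wav));
```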
tts.dispose()
Release ONNX session resources.
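Since `dispose()` frees the session, it makes sense to create one instance, reuse it for several clips, and release it in a `finally` block so a failed synthesis does not leak the session. A minimal sketch of that pattern follows. `synthesizeAll` is a hypothetical helper, and running the calls one at a time is a conservative assumption, since concurrent use of one instance is not documented here.

```javascript
// Hypothetical helper: synthesizes several clips with one TinyTTS-like
// instance and always releases it, even if a call to speak() rejects.
// `tts` is any object exposing speak(text, options) and dispose().
async function synthesizeAll(tts, clips) {
  const results = [];
  try {
    for (const { text, ...options } of clips) {
      // Sequential on purpose: whether concurrent speak() calls on one
      // session are safe is not documented, so this sketch avoids them.
      results.push(await tts.speak(text, options));
    }
    return results;
  } finally {
    await tts.dispose();
  }
}

// Usage:
// const TinyTTS = require('tiny-tts');
// const buffers = await synthesizeAll(new TinyTTS(), [
//   { text: 'Hello world!', output: 'hello.wav' },
//   { text: 'This is a fast speech test.', output: 'fast.wav', speed: 1.5 },
// ]);
```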
How It Works
- Text normalization — expands numbers, abbreviations, time expressions
- G2P pipeline — converts text to phonemes:
  - Apostrophe-aware word splitting (matches Python BERT tokenizer behavior)
  - CMU dictionary lookup (123K entries)
  - Neural G2P fallback (GRU encoder-decoder, identical to Python g2p_en)
- Phoneme → IDs — maps phonemes + tones to model input tensors
- ONNX inference — generates 44.1kHz audio waveform
- WAV output — saves as standard WAV file
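The middle of the pipeline (dictionary lookup, fallback, and phoneme-to-ID mapping) can be illustrated with a toy version. This is a sketch of the general technique, not the package's implementation: the two-entry dictionary stands in for the full CMU dictionary, the `fallback` callback stands in for the neural G2P model, the symbol table and its IDs are invented, and treating the ARPAbet stress digit as the "tone" is an assumption.

```javascript
// Toy stand-in for the CMU dictionary (two entries only).
const TOY_DICT = new Map([
  ['hello', ['HH', 'AH0', 'L', 'OW1']],
  ['world', ['W', 'ER1', 'L', 'D']],
]);

// Hypothetical symbol table; the real model's phoneme IDs are not given here.
const SYMBOL_IDS = new Map(
  ['_', 'AH', 'D', 'ER', 'HH', 'L', 'OW', 'W'].map((symbol, id) => [symbol, id])
);

// Dictionary first, then a caller-supplied fallback for unknown words.
function wordToPhonemes(word, fallback) {
  return TOY_DICT.get(word.toLowerCase()) ?? fallback(word);
}

// Splits a symbol such as "OW1" into a base phoneme and a stress digit.
// The digit is used as the tone; symbols without a digit get tone 0.
function phonemesToIds(phonemes) {
  const ids = [];
  const tones = [];
  for (const p of phonemes) {
    const match = /^([A-Z]+)([0-2])?$/.exec(p);
    if (!match || !SYMBOL_IDS.has(match[1])) {
      throw new Error(`Unknown phoneme: ${p}`);
    }
    ids.push(SYMBOL_IDS.get(match[1]));
    tones.push(match[2] ? Number(match[2]) : 0);
  }
  return { ids, tones };
}

// Words in, parallel ID and tone arrays out (the shape a model input might take).
function textToModelInput(words, fallback) {
  return phonemesToIds(words.flatMap((w) => wordToPhonemes(w, fallback)));
}

// Usage:
// textToModelInput(['Hello', 'world'], (w) => { throw new Error(`OOV: ${w}`); });
```

Keeping the fallback as a parameter mirrors the dictionary-then-model ordering described above: the expensive path only runs for words the dictionary does not cover.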
Model Info
| Metric | Value |
|--------|-------|
| Parameters | 1.6M |
| Model size | ~3.4 MB (ONNX FP16) |
| Sample rate | 44.1 kHz |
| CPU speed | ~53x real-time |
| Language | English |
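To put these figures in practical terms: at roughly 53x real-time, a 10-second clip should take about 10 / 53 ≈ 0.19 s of compute, and at 44.1 kHz it contains 441,000 samples. The sketch below does this arithmetic for any clip length. The 16-bit mono defaults are assumptions, since the table states neither bit depth nor channel count, and real throughput depends on the CPU.

```javascript
const SAMPLE_RATE_HZ = 44100; // from the table above
const REAL_TIME_FACTOR = 53;  // approximate, from the table above

// Estimates compute time and raw PCM size for a clip of the given length.
// bytesPerSample = 2 and channels = 1 (16-bit mono) are assumed defaults.
function estimateClip(audioSeconds, bytesPerSample = 2, channels = 1) {
  const samples = Math.round(audioSeconds * SAMPLE_RATE_HZ);
  return {
    samples,
    pcmBytes: samples * bytesPerSample * channels,
    synthesisSeconds: audioSeconds / REAL_TIME_FACTOR,
  };
}

// Usage:
// console.log(estimateClip(10));
```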
Python Version
Also available on PyPI with the same G2P output:
pip install tiny-tts
from tiny_tts import TinyTTS
tts = TinyTTS()
tts.speak("Hello world!", output_path="hello.wav")
License
Apache License 2.0
