easy-supertonic-tts
v1.0.0
A standalone TypeScript library for Supertonic TTS using ONNX runtime.
Note: This is an unofficial library. It is NOT an official library from Supertone.
A standalone TypeScript library for Supertonic Text-to-Speech (TTS) using ONNX Runtime. This library allows you to generate high-quality speech from text with support for multiple voice styles and automatic asset management.
Features
- Standalone: No external dependencies required except ONNX Runtime.
- TypeScript First: Full type safety and modern syntax.
- Multiple Output Options: Generates WAV audio as a Buffer or a Readable stream, or writes it directly to a file.
- Automatic Asset Management: Automatically downloads required ONNX models and voice styles from HuggingFace on first initialization.
- Multi-language Support: Supports Korean (ko), English (en), Spanish (es), Portuguese (pt), and French (fr).
Installation
```bash
npm install easy-supertonic-tts
```
Note: This library depends on onnxruntime-node. Make sure your platform and Node.js version are supported by it.
Quick Start
```typescript
import { SupertonicTTS } from 'easy-supertonic-tts';
import path from 'path';
import { writeFile } from 'fs/promises';
import { createWriteStream } from 'fs';
import { pipeline } from 'stream/promises';

async function main() {
  const tts = new SupertonicTTS({
    assetsPath: path.join(process.cwd(), 'assets'), // Directory where models are stored/loaded
    useGpu: false // Only CPU inference is currently supported
  });

  // Initialize (downloads ~150MB of models on first run if they are missing)
  await tts.init();

  // Synthesize to a WAV file
  await tts.synthesizeToFile({
    text: "Hello, this is a test from Supertonic TTS.",
    voiceStyle: "M1", // Available styles: M1-M5, F1-F5
    lang: "en"
  }, "output.wav");

  // Synthesize to a Buffer (WAV bytes) and save it
  const buffer = await tts.synthesizeToBuffer({
    text: "안녕하세요, 슈퍼토닉 TTS 테스트입니다.", // "Hello, this is a Supertonic TTS test."
    voiceStyle: "F2",
    lang: "ko"
  });
  await writeFile("output-ko.wav", buffer);

  // Synthesize to a Readable stream (WAV bytes) and pipe it to a file
  const stream = await tts.synthesizeToStream({
    text: "Streaming audio content.",
    voiceStyle: "M2",
    lang: "en"
  });
  await pipeline(stream, createWriteStream("output-stream.wav"));
}

main().catch(console.error);
```
API
new SupertonicTTS(options)
Creates a TTS instance. Options:
- assetsPath: (Optional) Directory where ONNX models and style files are stored (default: "./assets").
- useGpu: (Optional) Whether to use GPU for inference (default: false). Only CPU inference is currently supported.
tts.init()
Downloads missing assets from HuggingFace and loads the ONNX models. It must be awaited before calling any synthesis method.
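Because init() may download roughly 150MB on first run, an application that synthesizes from several places (for example, several request handlers) should start it once and reuse the result. Below is a minimal sketch of that pattern. `Initializable` and `createEnsureReady` are hypothetical helpers, not part of easy-supertonic-tts; the sketch only assumes an object with an async `init()`, such as a `SupertonicTTS` instance.

```typescript
// Hypothetical structural type: any object with an async init(),
// such as a SupertonicTTS instance.
interface Initializable {
  init(): Promise<void>;
}

// Returns a function that calls target.init() at most once and hands every
// caller the same promise, so concurrent callers share one download/load.
// If init() rejects, the cached promise is cleared so a later call can retry.
function createEnsureReady(target: Initializable): () => Promise<void> {
  let pending: Promise<void> | null = null;
  return () => {
    if (pending === null) {
      pending = target.init().catch((err: unknown) => {
        pending = null; // allow a retry after a failed download
        throw err;
      });
    }
    return pending;
  };
}
```

With `const ensureReady = createEnsureReady(tts);` in place, each caller can `await ensureReady()` before a synthesis call without triggering a second initialization.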
tts.synthesizeToBuffer(options)
- Returns: Promise<Buffer> (WAV format)
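Since synthesizeToBuffer resolves to a complete WAV file, its header can be inspected to find the sample rate, channel count, and duration; the README does not state which sample rate the model produces. The following is a minimal sketch that walks the RIFF chunks of a standard PCM WAV buffer. `readWavInfo` and `WavInfo` are hypothetical names, and the sketch assumes the chunk sizes in the header are accurate (it clamps the data size to the bytes actually present).

```typescript
// Summary of a PCM WAV file's "fmt " and "data" chunks.
interface WavInfo {
  sampleRate: number;
  channels: number;
  bitsPerSample: number;
  dataBytes: number;
  durationSeconds: number;
}

// Walks the RIFF chunks of a WAV buffer and reads the fields needed to
// describe the audio. Throws if the buffer is not RIFF/WAVE or if the
// "fmt " or "data" chunk is missing.
function readWavInfo(buf: Buffer): WavInfo {
  if (
    buf.length < 12 ||
    buf.toString("ascii", 0, 4) !== "RIFF" ||
    buf.toString("ascii", 8, 12) !== "WAVE"
  ) {
    throw new Error("Not a RIFF/WAVE buffer");
  }
  let fmt: { channels: number; sampleRate: number; byteRate: number; bitsPerSample: number } | null = null;
  let dataBytes: number | null = null;
  let offset = 12; // first chunk starts right after "RIFF", size, "WAVE"
  while (offset + 8 <= buf.length) {
    const id = buf.toString("ascii", offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    const body = offset + 8;
    if (id === "fmt ") {
      fmt = {
        channels: buf.readUInt16LE(body + 2),
        sampleRate: buf.readUInt32LE(body + 4),
        byteRate: buf.readUInt32LE(body + 8),
        bitsPerSample: buf.readUInt16LE(body + 14),
      };
    } else if (id === "data") {
      // Never report more audio bytes than the buffer actually holds.
      dataBytes = Math.min(size, buf.length - body);
    }
    // Chunks are padded to an even number of bytes.
    offset = body + size + (size % 2);
  }
  if (fmt === null || dataBytes === null) {
    throw new Error("Missing fmt or data chunk");
  }
  return {
    sampleRate: fmt.sampleRate,
    channels: fmt.channels,
    bitsPerSample: fmt.bitsPerSample,
    dataBytes,
    durationSeconds: dataBytes / fmt.byteRate,
  };
}
```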
tts.synthesizeToStream(options)
- Returns: Promise<Readable> (WAV format)
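synthesizeToStream resolves to a Node.js Readable, so it can be piped to a file or an HTTP response. If a consumer instead needs the complete audio in memory, the stream can be drained with async iteration. This is a sketch using only Node's built-in stream module; `streamToBuffer` is a hypothetical helper, not part of the library.

```typescript
import { Readable } from "node:stream";

// Collects every chunk emitted by a Readable into a single Buffer.
async function streamToBuffer(stream: Readable): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    // Chunks are strings if an encoding was set on the stream; normalize them.
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
  }
  return Buffer.concat(chunks);
}
```

For large outputs, piping (as in the Quick Start) avoids holding the whole file in memory; buffering is mainly useful when another API expects a single Buffer.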
tts.synthesizeToFile(options, filePath)
- Saves the synthesized audio to filePath (WAV format).
SynthesizeOptions
- text: Text to synthesize.
- voiceStyle: Voice style ID (e.g., "M1", "F1").
- lang: Language code ("en", "ko", "es", "pt", "fr"; default: "en").
- speed: Speech speed multiplier; values above 1 speak faster (default: 1.05).
- totalStep: Number of inference (denoising) steps (default: 5).
- silenceDuration: Silence, in seconds, inserted between synthesized text chunks (default: 0.3).
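The documented defaults can be mirrored in a small helper, which is handy for logging or validating the effective settings of a request. This is a sketch under assumptions: `SynthesizeOptionsSketch` and `withDocumentedDefaults` are hypothetical local names, the library's own exported type may differ, and treating `voiceStyle` as required is an assumption since the README lists no default for it.

```typescript
// Language codes listed in this README.
type Lang = "en" | "ko" | "es" | "pt" | "fr";

// Hypothetical local mirror of the documented SynthesizeOptions.
interface SynthesizeOptionsSketch {
  text: string;
  voiceStyle: string; // assumed required: the README lists no default
  lang?: Lang;
  speed?: number;
  totalStep?: number;
  silenceDuration?: number;
}

// Resolves every option to the value a call would use, applying the
// README's documented defaults for anything left undefined.
function withDocumentedDefaults(
  opts: SynthesizeOptionsSketch,
): Required<SynthesizeOptionsSketch> {
  return {
    text: opts.text,
    voiceStyle: opts.voiceStyle,
    lang: opts.lang ?? "en",
    speed: opts.speed ?? 1.05,
    totalStep: opts.totalStep ?? 5,
    silenceDuration: opts.silenceDuration ?? 0.3,
  };
}
```

For example, `withDocumentedDefaults({ text: "Hi", voiceStyle: "M1" })` yields `lang: "en"`, `speed: 1.05`, `totalStep: 5`, and `silenceDuration: 0.3`. Using `??` rather than object spread keeps an explicitly `undefined` field from overriding a default.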
Voice Styles
The library provides ten voice styles:
- Male: M1, M2, M3, M4, M5
- Female: F1, F2, F3, F4, F5
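When the style ID comes from user input (a CLI flag or a request parameter), validating it against the list above catches typos before any inference runs; whether the library itself rejects unknown IDs is not documented. In this sketch, `VOICE_STYLES`, `isVoiceStyle`, and `parseVoiceStyle` are hypothetical helpers, and the list is copied from this README, so it would need updating if the package adds styles.

```typescript
// The ten style IDs listed in this README.
const VOICE_STYLES = ["M1", "M2", "M3", "M4", "M5", "F1", "F2", "F3", "F4", "F5"] as const;
type VoiceStyle = (typeof VOICE_STYLES)[number];

function isVoiceStyle(value: string): value is VoiceStyle {
  return (VOICE_STYLES as readonly string[]).includes(value);
}

// Normalizes input such as " f2 " and throws on unknown IDs,
// so a typo fails before any synthesis is attempted.
function parseVoiceStyle(input: string): VoiceStyle {
  const id = input.trim().toUpperCase();
  if (!isVoiceStyle(id)) {
    throw new Error(`Unknown voice style "${input}". Expected one of: ${VOICE_STYLES.join(", ")}`);
  }
  return id;
}
```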
Acknowledgments
This library is based on the Supertonic TTS model available on HuggingFace.
License
MIT
