easy-supertonic-tts

v1.0.0, published 14 days ago

A standalone TypeScript library for Supertonic TTS using ONNX runtime.


Maintainer: jhleee

Keywords: tts, text-to-speech, supertonic, onnx, typescript

easy-supertonic-tts

Note: This is an unofficial library. It is NOT an official library from Supertone.

A standalone TypeScript library for Supertonic Text-to-Speech (TTS) using ONNX Runtime. This library allows you to generate high-quality speech from text with support for multiple voice styles and automatic asset management.

Features

Standalone: No external dependencies other than ONNX Runtime.
TypeScript First: Full type safety and modern syntax.
Multiple Output Formats: Generates audio as a Buffer, a Stream, or a file.
Automatic Asset Management: Automatically downloads required ONNX models and voice styles from HuggingFace on first initialization.
Multi-language Support: Supports Korean (ko), English (en), Spanish (es), Portuguese (pt), and French (fr).

Installation

npm install easy-supertonic-tts

Note: This library depends on onnxruntime-node. Ensure your environment supports it.

Quick Start

import { SupertonicTTS } from 'easy-supertonic-tts';
import fs from 'fs';
import path from 'path';
import { pipeline } from 'stream/promises';

async function main() {
    const tts = new SupertonicTTS({
        assetsPath: path.join(process.cwd(), 'assets'), // Directory where models and voice styles are stored/loaded
        useGpu: false // CPU inference (the currently supported mode)
    });

    // Initialize (downloads ~150MB of models on first run if they are missing)
    await tts.init();

    // Synthesize directly to a WAV file
    await tts.synthesizeToFile({
        text: "Hello, this is a test from Supertonic TTS.",
        voiceStyle: "M1", // Available styles: M1-M5, F1-F5
        lang: "en"
    }, "output.wav");

    // Synthesize to an in-memory WAV Buffer, then write it out
    const buffer = await tts.synthesizeToBuffer({
        text: "안녕하세요, 슈퍼토닉 TTS 테스트입니다.", // "Hello, this is a Supertonic TTS test."
        voiceStyle: "F2",
        lang: "ko"
    });
    fs.writeFileSync("output-ko.wav", buffer);

    // Synthesize to a Readable stream of WAV data and pipe it to a file
    const stream = await tts.synthesizeToStream({
        text: "Streaming audio content.",
        voiceStyle: "M2",
        lang: "en"
    });
    await pipeline(stream, fs.createWriteStream("output-stream.wav"));
}

main().catch(console.error);

API

`new SupertonicTTS(options)`

Creates a TTS instance. Call tts.init() before synthesizing.

assetsPath: (Optional) Directory where the ONNX models and voice style files are stored and loaded from (default: "./assets").
useGpu: (Optional) Whether to use GPU for inference (default: false).

`tts.init()`

Downloads missing assets from HuggingFace and loads the ONNX models. This must be called before synthesis.
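Because init() may download about 150MB on the first run, an application that synthesizes from several places (for example, concurrent HTTP requests) may want to start initialization only once and let every caller share it. The sketch below assumes only that the instance exposes an init(): Promise<void> method as described above; lazyInit and Initializable are hypothetical names, not part of easy-supertonic-tts.

```typescript
// Minimal structural type: anything with an async init(), such as a
// SupertonicTTS instance (assumed shape, based on the API described above).
interface Initializable {
  init(): Promise<void>;
}

// Hypothetical helper: returns a function that calls init() at most once and
// hands every caller the same promise, so concurrent callers wait for a single
// initialization. If init() rejects, the cached promise is cleared so that a
// later call can retry (e.g. after a failed download).
function lazyInit<T extends Initializable>(target: T): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = target.init().then(
        () => target,
        (err: unknown) => {
          pending = undefined; // allow a retry on the next call
          throw err;
        },
      );
    }
    return pending;
  };
}
```

For example, create const ready = lazyInit(new SupertonicTTS({ assetsPath: './assets' })) once at startup, then call const tts = await ready() wherever synthesis is needed.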

`tts.synthesizeToBuffer(options)`

Synthesizes speech and returns the complete audio in memory.

Returns: Promise<Buffer> (WAV format)
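Since the returned Buffer holds WAV audio, its sample rate, channel count, and duration can be read from the standard RIFF/WAVE header. The sample rate and bit depth the library produces are not documented here, so this sketch reads them from the header instead of hard-coding them. It assumes a standard RIFF layout with a "fmt " chunk before the "data" chunk; readWavInfo is a hypothetical helper, not part of easy-supertonic-tts.

```typescript
import { Buffer } from "node:buffer";

// Basic format information read from a RIFF/WAVE header.
interface WavInfo {
  audioFormat: number;   // 1 = integer PCM, 3 = IEEE float
  channels: number;
  sampleRate: number;    // samples per second
  bitsPerSample: number;
  dataBytes: number;     // size of the audio payload in bytes
  durationSec: number;
}

// Hypothetical helper: walks the RIFF chunks to find "fmt " and "data".
function readWavInfo(wav: Buffer): WavInfo {
  if (wav.toString("ascii", 0, 4) !== "RIFF" || wav.toString("ascii", 8, 12) !== "WAVE") {
    throw new Error("not a RIFF/WAVE buffer");
  }
  let offset = 12; // first chunk starts after "RIFF", size, "WAVE"
  let fmt: Omit<WavInfo, "dataBytes" | "durationSec"> | undefined;
  while (offset + 8 <= wav.length) {
    const id = wav.toString("ascii", offset, offset + 4);
    const size = wav.readUInt32LE(offset + 4);
    const body = offset + 8;
    if (id === "fmt ") {
      fmt = {
        audioFormat: wav.readUInt16LE(body),
        channels: wav.readUInt16LE(body + 2),
        sampleRate: wav.readUInt32LE(body + 4),
        bitsPerSample: wav.readUInt16LE(body + 14), // after byteRate and blockAlign
      };
    } else if (id === "data") {
      if (fmt === undefined) throw new Error("data chunk before fmt chunk");
      const bytesPerSecond = fmt.sampleRate * fmt.channels * (fmt.bitsPerSample / 8);
      return { ...fmt, dataBytes: size, durationSec: size / bytesPerSecond };
    }
    offset = body + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error("no data chunk found");
}
```

For example, readWavInfo(await tts.synthesizeToBuffer({ text: "Hi", voiceStyle: "M1" })).durationSec gives the clip length in seconds.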

`tts.synthesizeToStream(options)`

Synthesizes speech and returns it as a readable stream.

Returns: Promise<Readable> (WAV format)
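The Readable can be forwarded to any Writable, such as an HTTP response or a file stream. This sketch uses Node's stream.pipeline, which propagates errors and cleans up every stream if one fails, and counts the bytes passed through for logging. pipeWithByteCount is a hypothetical helper, not part of easy-supertonic-tts.

```typescript
import { Readable, Transform, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Hypothetical helper: forwards a WAV stream (for example, the Readable
// returned by synthesizeToStream) to any Writable while counting bytes.
// Resolves with the total byte count once the destination has finished.
async function pipeWithByteCount(source: Readable, destination: Writable): Promise<number> {
  let bytes = 0;
  const counter = new Transform({
    transform(chunk: Buffer, _encoding, callback) {
      bytes += chunk.length;
      callback(null, chunk); // pass the chunk through unchanged
    },
  });
  await pipeline(source, counter, destination);
  return bytes;
}
```

For example, await pipeWithByteCount(await tts.synthesizeToStream({ text: "Hi", voiceStyle: "M1" }), fs.createWriteStream("hi.wav")) writes the file and returns its size.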

`tts.synthesizeToFile(options, filePath)`

Synthesizes speech and saves it as a WAV file at filePath.

`SynthesizeOptions`

text: Text to synthesize.
voiceStyle: Voice style ID (e.g., "M1" or "F1"; see Voice Styles).
lang: Language code: "en", "ko", "es", "pt", or "fr" (default: "en").
speed: Speech speed (default: 1.05).
totalStep: Number of inference steps (default: 5).
silenceDuration: Length of the silence inserted at the end of each text segment (default: 0.3).
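The documented fields and defaults can be mirrored in a local type to fill in defaults and reject obviously bad input before calling the library. This is a sketch under stated assumptions: SynthesizeOptionsSketch and withDefaults are hypothetical names, the library's own SynthesizeOptions type may differ, and the sanity checks are assumptions rather than rules the library is known to enforce.

```typescript
// Hypothetical mirror of the documented SynthesizeOptions fields.
interface SynthesizeOptionsSketch {
  text: string;
  voiceStyle: string;
  lang?: "en" | "ko" | "es" | "pt" | "fr";
  speed?: number;
  totalStep?: number;
  silenceDuration?: number;
}

type ResolvedOptions = Required<SynthesizeOptionsSketch>;

// Defaults as listed in the README.
const DEFAULTS = { lang: "en", speed: 1.05, totalStep: 5, silenceDuration: 0.3 } as const;

// Fills in documented defaults, then applies basic (assumed) sanity checks.
function withDefaults(opts: SynthesizeOptionsSketch): ResolvedOptions {
  const resolved: ResolvedOptions = {
    text: opts.text,
    voiceStyle: opts.voiceStyle,
    lang: opts.lang ?? DEFAULTS.lang,
    speed: opts.speed ?? DEFAULTS.speed,
    totalStep: opts.totalStep ?? DEFAULTS.totalStep,
    silenceDuration: opts.silenceDuration ?? DEFAULTS.silenceDuration,
  };
  if (resolved.text.trim() === "") throw new Error("text must not be empty");
  // Written as !(x > 0) so that NaN is rejected too.
  if (!(resolved.speed > 0)) throw new Error("speed must be positive");
  if (!Number.isInteger(resolved.totalStep) || resolved.totalStep < 1) {
    throw new Error("totalStep must be a positive integer");
  }
  if (resolved.silenceDuration < 0) throw new Error("silenceDuration must be >= 0");
  return resolved;
}
```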

Voice Styles

The following voice style IDs are available:

Male: M1, M2, M3, M4, M5
Female: F1, F2, F3, F4, F5
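The ten IDs follow a regular pattern, so they can be expressed as a TypeScript template-literal type, which catches typos such as "M6" at compile time, plus a runtime guard for untrusted input. VoiceStyle, VOICE_STYLES, isVoiceStyle, and voicesFor are hypothetical names for illustration; the library may export its own type.

```typescript
// Hypothetical type: "M1" | ... | "M5" | "F1" | ... | "F5".
type VoiceStyle = `${"M" | "F"}${1 | 2 | 3 | 4 | 5}`;

// The documented voice style IDs.
const VOICE_STYLES: readonly VoiceStyle[] = [
  "M1", "M2", "M3", "M4", "M5",
  "F1", "F2", "F3", "F4", "F5",
];

// Runtime check, e.g. for a voice ID received in an HTTP query string.
function isVoiceStyle(value: string): value is VoiceStyle {
  return (VOICE_STYLES as readonly string[]).includes(value);
}

// Lists the male ("M") or female ("F") styles.
function voicesFor(gender: "M" | "F"): VoiceStyle[] {
  return VOICE_STYLES.filter((v) => v.startsWith(gender));
}
```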

Acknowledgments

This library is based on the Supertonic TTS model available on HuggingFace.

License

MIT
