AI audio content creation CLI — stories, podcasts, narration, dubbing, transcription, translation, and TTS synthesis.
Quick Start
# Synthesize a single sentence
npx ai-tts say "你好世界"
# Output as MP3 (smaller file size)
npx ai-tts say "你好世界" --format mp3
# Generate a story with TTS narration
npx ai-tts story --topic "三只小猪"
# Dub a video from SRT subtitles
npx ai-tts dub --srt subtitles.srt --video input.mp4 --output dubbed.mp4
# Transcribe audio to subtitles (SRT)
npx ai-tts asr --input recording.mp3
# Translate SRT subtitles to another language
npx ai-tts translate --srt subtitles.srt --to en
# End-to-end video translation (ASR → translate → dub → merge)
npx ai-tts video-translate --input video.mp4 --to en
# Browse available voices
npx ai-tts voices --search "温柔"

A browser window will open for login on first use. After that, your token is cached automatically.
Install
npm install -g ai-tts

Commands
ai-tts say <text> / ai-tts synthesize <text>
Synthesize a single text snippet to audio.
ai-tts say "你好世界"
ai-tts say "你好世界" --format mp3
ai-tts synthesize "Welcome" --voice v-male-Bk7vD3xP --format mp3
ai-tts say "快速测试" --speed 1.5 --volume 0.8 --pitch 2

| Flag | Default | Description |
|------|---------|-------------|
| <text> | (required) | Text to synthesize (positional or --text) |
| --voice <id> | v-female-R2s4N9qJ | TTS voice ID |
| --format <fmt> | pcm | Output format: pcm (WAV), wav, mp3 |
| --speed <n> | 1.0 | Speed 0.5-2.0 |
| --volume <n> | 1.0 | Volume 0.1-2.0 |
| --pitch <n> | 0 | Pitch -12 to 12 |
| --output <path> | ./tts-<timestamp>.wav | Output file path |
ai-tts narrate [options]
Narrate a document, text, or script to multi-segment audio.
ai-tts narrate --input article.txt
ai-tts narrate --input article.txt --format mp3
ai-tts narrate --input readme.md --voice v-male-Bk7vD3xP
ai-tts narrate --text "第一段。第二段。第三段。"
ai-tts narrate --script narration-script.json
echo "Hello world" | ai-tts narrate

| Flag | Default | Description |
|------|---------|-------------|
| --input <file> | | Input .txt or .md file |
| --text <text> | | Inline text to narrate |
| --script <file> | | JSON script with per-segment voice control |
| --voice <id> | v-female-R2s4N9qJ | Default voice ID |
| --format <fmt> | pcm | Output format: pcm (WAV), wav, mp3 |
| --speed <n> | 1.0 | Speed 0.5-2.0 |
| --silence <sec> | 0.8 | Silence between segments 0-5.0 |
| --output <path> | ./narration-<timestamp>.wav | Output file path |
Script JSON format (per-segment voice/speed control):
{
"segments": [
{ "text": "第一段内容", "voiceId": "v-female-R2s4N9qJ", "speed": 1.0 },
{ "text": "第二段内容", "voiceId": "v-male-Bk7vD3xP", "speed": 0.8 }
],
"silence": 1.0,
"output": "my-narration.wav"
}

ai-tts voices [options]
Browse and filter available TTS voices (no login required).
ai-tts voices
ai-tts voices --search "温柔" --gender female
ai-tts voices --language en --extended
ai-tts voices --json

| Flag | Default | Description |
|------|---------|-------------|
| --search <query> | | Search by name, tone, style |
| --gender <m\|f> | | Filter by gender |
| --language <code> | | Filter by language: zh, en, etc. |
| --extended | false | Include extended voice library (380+) |
| --json | false | Output raw JSON |
ai-tts story [options]
Generate a story with AI and synthesize TTS audio.
ai-tts story --topic "小红帽的故事"
ai-tts story --topic "太空探险" --paragraphs 8 --speed 0.8

| Flag | Default | Description |
|------|---------|-------------|
| --topic <text> | Children's story | Story prompt |
| --voice <id> | v-female-R2s4N9qJ | TTS voice ID |
| --output <path> | ./story-<timestamp>.wav | Output WAV file |
| --paragraphs <n> | 5 | Number of paragraphs (1-20) |
| --speed <n> | 1.0 | Speed (0.5-2.0) |
| --silence <sec> | 0.8 | Silence between paragraphs (0-5.0) |
ai-tts podcast [options]
Generate a multi-speaker podcast dialogue.
ai-tts podcast --topic "AI趋势" --exchanges 10
ai-tts podcast --topic "科技新闻" --style casual --length long

| Flag | Default | Description |
|------|---------|-------------|
| --topic <text> | Tech trends | Podcast topic |
| --style <style> | professional | Dialogue style |
| --length <len> | medium | short / medium / long |
| --exchanges <n> | 8 | Number of exchanges (2-30) |
| --output <path> | ./podcast-<timestamp>.wav | Output WAV file |
| --speed <n> | 1.0 | Speed (0.5-2.0) |
| --silence <sec> | 0.5 | Silence between segments (0-5.0) |
ai-tts dub [options]
Dub audio from SRT subtitles with timeline-precise TTS synthesis. Supports multi-speaker voice mapping, dynamic speed compensation, video merge, and background music mixing.
# Basic: generate dubbed audio from SRT
ai-tts dub --srt subtitles.srt
# Dub and merge into video
ai-tts dub --srt subtitles.srt --video input.mp4 --output dubbed.mp4
# Multi-speaker with voice mapping
ai-tts dub --srt subtitles.srt --voices speakers.json --speed-auto
# Add background music with ducking
ai-tts dub --srt subtitles.srt --bgm music.mp3 --ducking 0.3
# Patch a single caption without full rebuild
ai-tts dub --srt subtitles.srt --patch 5 --output dub-existing.wav

| Flag | Default | Description |
|------|---------|-------------|
| --srt <file> | (required) | SRT subtitle file |
| --video <file> | | Video file — merge dubbed audio into video |
| --voice <id> | v-female-R2s4N9qJ | Default TTS voice ID |
| --voices <file> | | JSON speaker-to-voiceId map for multi-speaker dubbing |
| --speed <n> | 1.0 | TTS speed 0.5-2.0 |
| --speed-auto | false | Auto-adjust speed when audio overflows timeslot |
| --bgm <file> | | Background music file to mix in |
| --ducking <n> | 0.5 | BGM volume ducking 0-1.0 (lower = quieter BGM) |
| --patch <id> | | Re-synthesize a single caption by ID (patch mode) |
| --output <path> | ./dub-<timestamp>.wav | Output file path (.wav or .mp4 with --video) |
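The --speed-auto behavior can be pictured with a small calculation: when synthesized audio overflows its subtitle timeslot, the speed needed to make it fit is the ratio of audio length to slot length, clamped to the supported 0.5-2.0 range. A hedged Python sketch of that idea (illustrative only, not the CLI's actual code):

```python
def fit_speed(audio_sec: float, slot_sec: float,
              min_speed: float = 0.5, max_speed: float = 2.0) -> float:
    """Speed multiplier needed so audio_sec of speech fits slot_sec.

    A speed of 1.0 is kept when the audio already fits; otherwise the
    ratio audio/slot is clamped to the documented 0.5-2.0 range.
    """
    if audio_sec <= slot_sec:
        return 1.0
    return min(max_speed, max(min_speed, audio_sec / slot_sec))

# 3.0 s of speech in a 2.0 s slot needs 1.5x speed
print(fit_speed(3.0, 2.0))
```

Note that very long captions can still overflow even at the 2.0 ceiling, which is why pairing --speed-auto with shorter captions gives better sync.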
SRT format with speaker tags (optional [Speaker: xxx] extension):
1
00:00:01,000 --> 00:00:03,500
[Speaker: Alice]
Hello, welcome to the show!
2
00:00:04,000 --> 00:00:06,500
[Speaker: Bob]
Thanks for having me.

Voice mapping JSON (speakers.json):
{
"Alice": "v-female-R2s4N9qJ",
"Bob": "v-male-Bk7vD3xP"
}

Requires ffmpeg in PATH for --video, --bgm, and --speed-auto features.
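The [Speaker: xxx] convention shown above is simple enough to pre-process yourself, for example to count lines per speaker before building speakers.json. A minimal Python sketch that extracts the speaker tag from each caption block (illustrative only, not the CLI's own parser):

```python
import re

def parse_srt_speakers(srt_text: str):
    """Return (index, timing, speaker_or_None, text) per caption block.

    Assumes the optional "[Speaker: name]" line sits directly under the
    timing line, as in the example above.
    """
    captions = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        index, timing = lines[0].strip(), lines[1].strip()
        speaker, body = None, lines[2:]
        m = re.match(r"\[Speaker:\s*(.+?)\]", body[0])
        if m:
            speaker = m.group(1)
            body = body[1:]
        captions.append((index, timing, speaker, " ".join(body)))
    return captions

sample = """1
00:00:01,000 --> 00:00:03,500
[Speaker: Alice]
Hello, welcome to the show!

2
00:00:04,000 --> 00:00:06,500
[Speaker: Bob]
Thanks for having me."""
for idx, timing, speaker, text in parse_srt_speakers(sample):
    print(idx, speaker, text)
```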
ai-tts asr [options] / ai-tts transcribe [options]
Transcribe audio or video files to text. Supports cloud ASR (Tencent Cloud, 3 modes) and local Whisper (offline, no quota).
# Transcribe with auto engine detection (local Whisper if available, else cloud)
ai-tts asr --input recording.mp3
# Force local Whisper (no login needed, no quota used)
ai-tts asr --input recording.mp3 --engine local
# Use a larger Whisper model for better accuracy
ai-tts asr --input meeting.wav --engine local --model small
# Cloud ASR with speaker diarization
ai-tts asr --input meeting.wav --engine cloud --speakers --speaker-number 3
# Transcribe video file, output plain text
ai-tts asr --input video.mp4 --format txt
# Remote URL (cloud only)
ai-tts asr --url https://example.com/audio.wav --mode flash
# Record from microphone (cloud only)
ai-tts asr --mic --format txt

| Flag | Default | Description |
|------|---------|-------------|
| --input <file> | | Local audio or video file |
| --url <url> | | Remote audio URL (cloud only) |
| --mic | | Record from microphone (cloud only, requires sox) |
| --engine <type> | auto | Engine: auto, local, cloud |
| --model <name> | base | Whisper model: tiny, base, small, medium, large |
| --mode <type> | auto | Cloud mode: auto, sentence, flash, file |
| --lang <model> | 16k_zh | Language: 16k_zh, 16k_en, 16k_zh_en, 16k_ja, 16k_ko |
| --format <fmt> | srt | Output: srt, txt, json |
| --output <path> | <input>.<format> | Output file path |
| --speakers | false | Enable speaker diarization (cloud only) |
| --speaker-number <n> | | Expected speakers (with --speakers) |
| --task-id <id> | | Resume async task polling (cloud only) |
Engine selection:
- auto — uses local Whisper if nodejs-whisper is installed, otherwise falls back to cloud
- local — local Whisper via whisper.cpp (no login, no quota, offline capable)
- cloud — Tencent Cloud ASR (requires login, uses quota)
Local Whisper setup (optional):
npm install -g nodejs-whisper
# Model downloads automatically on first use (~142 MB for base)

Requires ffmpeg in PATH for audio extraction from video files.
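The auto engine choice boils down to a simple fallback rule: explicit --engine values are honored, and auto prefers local Whisper (no quota) when it is available. A hedged sketch of that decision logic (the real CLI may probe for nodejs-whisper differently):

```python
def pick_engine(requested: str, local_whisper_available: bool) -> str:
    """Resolve --engine per the documented rules.

    "auto" prefers local Whisper when available, otherwise falls back
    to cloud; explicit "local"/"cloud" choices are honored as-is.
    """
    if requested in ("local", "cloud"):
        return requested
    if requested == "auto":
        return "local" if local_whisper_available else "cloud"
    raise ValueError(f"unknown engine: {requested}")

print(pick_engine("auto", True))   # local Whisper installed -> local
print(pick_engine("auto", False))  # not installed -> cloud
```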
ai-tts translate [options]
Translate SRT subtitles, plain text, or text files using LLM-powered batch translation.
# Translate SRT file (Chinese → English)
ai-tts translate --srt subtitles.srt --to en
# Translate with timing realignment for target language
ai-tts translate --srt subtitles.srt --to en --realign
# Translate a text file
ai-tts translate --input article.txt --to ja --output article-ja.txt
# Translate inline text
ai-tts translate --text "你好世界" --to en
# Auto-detect source language
ai-tts translate --srt movie.srt --to ko

| Flag | Default | Description |
|------|---------|-------------|
| --srt <file> | | SRT subtitle file to translate |
| --input <file> | | Plain text / markdown file to translate |
| --text <string> | | Inline text to translate |
| --from <lang> | auto-detect | Source language: zh, en, ja, ko, fr, de, es, etc. |
| --to <lang> | (required) | Target language code |
| --realign | false | Adjust subtitle timing for target language length differences |
| --batch-size <n> | 10 | Captions per translation batch (1-20) |
| --output <path> | <input>-<lang>.srt | Output file path |
Supported languages: zh, en, ja, ko, fr, de, es, pt, ru, ar, th, vi, id, and more.
Cost: 1 quota per batch (~10 captions). A 100-caption SRT costs ~10 quota.
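The per-batch cost works out by ceiling division over --batch-size. A quick sketch of the arithmetic, assuming the rates stated above:

```python
import math

def translate_quota(captions: int, batch_size: int = 10) -> int:
    """Quota cost: 1 per batch, where batches = ceil(captions / batch_size)."""
    return math.ceil(captions / batch_size)

print(translate_quota(100))  # 100 captions -> 10 quota
print(translate_quota(101))  # one extra caption starts an 11th batch
```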
ai-tts video-translate [options]
End-to-end video translation: extracts audio, transcribes, translates subtitles, dubs with TTS, and merges back into video.
# Translate Chinese video to English
ai-tts video-translate --input video.mp4 --to en
# Specify source language
ai-tts video-translate --input video.mp4 --from zh --to ja
# Keep intermediate files (SRT, audio) for debugging
ai-tts video-translate --input video.mp4 --to en --keep-intermediates
# Custom voice and speed
ai-tts video-translate --input video.mp4 --to en --voice v-male-Bk7vD3xP --speed 0.9

| Flag | Default | Description |
|------|---------|-------------|
| --input <file> | (required) | Input video file |
| --from <lang> | auto-detect | Source language code |
| --to <lang> | (required) | Target language code |
| --voice <id> | v-female-R2s4N9qJ | TTS voice ID for dubbing |
| --voices <file> | | Voice mapping JSON for multi-speaker |
| --realign | false | Adjust subtitle timing for target language |
| --speed <n> | 1.0 | TTS speed (0.5-2.0) |
| --batch-size <n> | 10 | Translation batch size |
| --keep-intermediates | false | Keep temp files (SRT, audio) |
| --output <path> | <input>-<lang>.mp4 | Output MP4 path |
| --asr-mode <mode> | auto | Override ASR mode: auto, sentence, flash, file |
| --asr-lang <engine> | auto | Override ASR engine: 16k_zh, 16k_en, 16k_ja, 16k_ko |
Pipeline: Video → FFmpeg extract audio → ASR transcribe → LLM translate → TTS dub → FFmpeg merge → Output MP4
Cost: ~3-N quota (1 ASR + 1+ translate batches + 1 per TTS caption)
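That ~3-N estimate can be made concrete: 1 quota for ASR, one per translation batch, and one per caption for TTS dubbing. A hedged estimator, assuming the documented rates:

```python
import math

def video_translate_quota(captions: int, batch_size: int = 10) -> int:
    """1 (ASR) + ceil(captions/batch_size) (translate) + captions (TTS)."""
    return 1 + math.ceil(captions / batch_size) + captions

print(video_translate_quota(20))  # 1 + 2 + 20 = 23 quota
```

On the free tier of 100 quota per day, that puts a practical ceiling of roughly 80-90 captions on a single cloud-only video translation.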
Requires ffmpeg in PATH.
ai-tts login / logout / status / dashboard
ai-tts login # Open browser to login via email OTP
ai-tts logout # Clear cached token
ai-tts status # Show login status and token info
ai-tts dashboard # Open Web dashboard in browser

Authentication
AI-TTS uses browser-based email OTP login (Supabase):
- CLI starts a temporary local HTTP server
- Opens your browser to the login page
- You enter your email and verification code
- Browser redirects back to the CLI with your token
- Token is cached at
~/.config/ai-tts/token.json
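This handshake follows a common loopback-callback pattern: a short-lived local HTTP server receives the browser redirect and captures the credential. A minimal, illustrative Python sketch of the pattern (not the CLI's actual implementation; the real flow goes through Supabase with a different payload):

```python
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class CallbackHandler(BaseHTTPRequestHandler):
    """Capture a token from the browser redirect's query string."""
    token = None

    def do_GET(self):
        query = urllib.parse.urlparse(self.path).query
        params = urllib.parse.parse_qs(query)
        CallbackHandler.token = params.get("token", [None])[0]
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Login complete. You can close this tab.")

    def log_message(self, *args):  # keep the demo quiet
        pass

# Bind an ephemeral localhost port and serve exactly one request.
server = HTTPServer(("127.0.0.1", 0), CallbackHandler)
threading.Thread(target=server.handle_request, daemon=True).start()

# In the real flow, the browser performs this redirect after email OTP.
port = server.server_address[1]
urllib.request.urlopen(f"http://127.0.0.1:{port}/?token=demo-token")
print(CallbackHandler.token)
```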
Quota
- Free tier: 100 quota per day
say/synthesize: 1 quota per callnarrate: 1 quota per segmentstory: ~6-8 quota (1 LLM + N TTS)podcast: ~10-20 quotadub: 1 quota per SRT captionasr(cloud): 1 quota per recognitionasr(local): free (no quota)translate: 1 quota per batch (~10 captions)video-translate: ~3-N quota (ASR + translate + TTS)voices: free (no quota)- Quota resets daily
Requirements
- Node.js >= 18.0.0
- ffmpeg recommended — needed by most audio/video features:
| Command | Without FFmpeg | With FFmpeg |
|---------|---------------|-------------|
| say / synthesize | Full support | Full support |
| narrate | Full support | Full support |
| story / podcast | Full support | Full support |
| voices | Full support | Full support |
| dub --srt file.srt | Audio output only | Audio output only |
| dub --video / --bgm / --speed-auto | Not available | Full support |
| asr --input file.wav (16kHz mono) | Works (cloud) | Works (cloud + local) |
| asr --input file.mp3 / video | Not available | Full support |
| asr --engine local | Not available | Full support |
| translate | Full support | Full support |
| video-translate | Not available | Full support |
Install FFmpeg:
# macOS
brew install ffmpeg
# Ubuntu / Debian
sudo apt install ffmpeg
# Windows — download from https://ffmpeg.org/download.html

Optional dependencies:
- nodejs-whisper — for local Whisper ASR without cloud API (npm install -g nodejs-whisper)
- sox — for microphone recording (asr --mic)
License
UNLICENSED - All rights reserved.
