voxflow

v1.5.1

AI audio content creation CLI — stories, podcasts, narration, dubbing, transcription, translation, and video translation with TTS

ai-tts

AI audio content creation CLI — stories, podcasts, narration, dubbing, transcription, translation, and TTS synthesis.

Quick Start

# Synthesize a single sentence
npx ai-tts say "你好世界"

# Output as MP3 (smaller file size)
npx ai-tts say "你好世界" --format mp3

# Generate a story with TTS narration
npx ai-tts story --topic "三只小猪"

# Dub a video from SRT subtitles
npx ai-tts dub --srt subtitles.srt --video input.mp4 --output dubbed.mp4

# Transcribe audio to subtitles (SRT)
npx ai-tts asr --input recording.mp3

# Translate SRT subtitles to another language
npx ai-tts translate --srt subtitles.srt --to en

# End-to-end video translation (ASR → translate → dub → merge)
npx ai-tts video-translate --input video.mp4 --to en

# Browse available voices
npx ai-tts voices --search "温柔"

A browser window will open for login on first use. After that, your token is cached automatically.

Install

npm install -g ai-tts

Commands

ai-tts say <text> / ai-tts synthesize <text>

Synthesize a single text snippet to audio.

ai-tts say "你好世界"
ai-tts say "你好世界" --format mp3
ai-tts synthesize "Welcome" --voice v-male-Bk7vD3xP --format mp3
ai-tts say "快速测试" --speed 1.5 --volume 0.8 --pitch 2

| Flag | Default | Description |
|------|---------|-------------|
| <text> | (required) | Text to synthesize (positional or --text) |
| --voice <id> | v-female-R2s4N9qJ | TTS voice ID |
| --format <fmt> | pcm | Output format: pcm (WAV), wav, mp3 |
| --speed <n> | 1.0 | Speed 0.5-2.0 |
| --volume <n> | 1.0 | Volume 0.1-2.0 |
| --pitch <n> | 0 | Pitch -12 to 12 |
| --output <path> | ./tts-<timestamp>.wav | Output file path |
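
The documented ranges can be checked before the CLI is invoked. The following is a minimal Python sketch, not part of ai-tts itself: `build_say_command` is a hypothetical helper that validates options against the table above and returns an argument list. Only flags listed in the table are used.

```python
def build_say_command(text, voice="v-female-R2s4N9qJ", fmt="pcm",
                      speed=1.0, volume=1.0, pitch=0, output=None):
    """Validate options against the documented ranges and return an argv list."""
    if not text:
        raise ValueError("text is required")
    if fmt not in ("pcm", "wav", "mp3"):
        raise ValueError(f"unsupported format: {fmt}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if not 0.1 <= volume <= 2.0:
        raise ValueError("volume must be between 0.1 and 2.0")
    if not -12 <= pitch <= 12:
        raise ValueError("pitch must be between -12 and 12")

    argv = ["ai-tts", "say", text,
            "--voice", voice,
            "--format", fmt,
            "--speed", str(speed),
            "--volume", str(volume),
            "--pitch", str(pitch)]
    if output:
        argv += ["--output", output]
    return argv
```

Passing the returned list to `subprocess.run(argv, check=True)` would run the command. Building the list separately keeps the validation testable without a login or network access.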

ai-tts narrate [options]

Narrate a document, text, or script to multi-segment audio.

ai-tts narrate --input article.txt
ai-tts narrate --input article.txt --format mp3
ai-tts narrate --input readme.md --voice v-male-Bk7vD3xP
ai-tts narrate --text "第一段。第二段。第三段。"
ai-tts narrate --script narration-script.json
echo "Hello world" | ai-tts narrate

| Flag | Default | Description |
|------|---------|-------------|
| --input <file> | | Input .txt or .md file |
| --text <text> | | Inline text to narrate |
| --script <file> | | JSON script with per-segment voice control |
| --voice <id> | v-female-R2s4N9qJ | Default voice ID |
| --format <fmt> | pcm | Output format: pcm (WAV), wav, mp3 |
| --speed <n> | 1.0 | Speed 0.5-2.0 |
| --silence <sec> | 0.8 | Silence between segments 0-5.0 |
| --output <path> | ./narration-<timestamp>.wav | Output file path |

Script JSON format (per-segment voice/speed control):

{
  "segments": [
    { "text": "第一段内容", "voiceId": "v-female-R2s4N9qJ", "speed": 1.0 },
    { "text": "第二段内容", "voiceId": "v-male-Bk7vD3xP", "speed": 0.8 }
  ],
  "silence": 1.0,
  "output": "my-narration.wav"
}
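
A script in this shape can be generated rather than written by hand. The sketch below is illustrative only: `build_script` is a hypothetical helper that rotates a list of voice IDs across paragraphs and emits the `segments`, `silence`, and `output` fields shown above. It assumes the script accepts the same speed and silence ranges as the command-line flags, which the README does not state.

```python
import json


def build_script(paragraphs, voices, speed=1.0, silence=1.0,
                 output="my-narration.wav"):
    """Build a narration script dict, rotating voice IDs across paragraphs."""
    if not voices:
        raise ValueError("at least one voice ID is required")
    if not 0.5 <= speed <= 2.0:       # assumed to match the --speed range
        raise ValueError("speed must be between 0.5 and 2.0")
    if not 0 <= silence <= 5.0:       # assumed to match the --silence range
        raise ValueError("silence must be between 0 and 5.0")

    # Drop empty paragraphs first so the voice rotation is not disturbed.
    texts = [p.strip() for p in paragraphs if p.strip()]
    segments = [
        {"text": text, "voiceId": voices[i % len(voices)], "speed": speed}
        for i, text in enumerate(texts)
    ]
    return {"segments": segments, "silence": silence, "output": output}


if __name__ == "__main__":
    script = build_script(
        ["First paragraph.", "", "Second paragraph."],
        ["v-female-R2s4N9qJ", "v-male-Bk7vD3xP"],
    )
    print(json.dumps(script, ensure_ascii=False, indent=2))
```

Saving the printed JSON as narration-script.json lets it be passed to `ai-tts narrate --script narration-script.json`.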

ai-tts voices [options]

Browse and filter available TTS voices (no login required).

ai-tts voices
ai-tts voices --search "温柔" --gender female
ai-tts voices --language en --extended
ai-tts voices --json

| Flag | Default | Description |
|------|---------|-------------|
| --search <query> | | Search by name, tone, style |
| --gender <m\|f> | | Filter by gender |
| --language <code> | | Filter by language: zh, en, etc. |
| --extended | false | Include extended voice library (380+) |
| --json | false | Output raw JSON |

ai-tts story [options]

Generate a story with AI and synthesize TTS audio.

ai-tts story --topic "小红帽的故事"
ai-tts story --topic "太空探险" --paragraphs 8 --speed 0.8

| Flag | Default | Description |
|------|---------|-------------|
| --topic <text> | Children's story | Story prompt |
| --voice <id> | v-female-R2s4N9qJ | TTS voice ID |
| --output <path> | ./story-<timestamp>.wav | Output WAV file |
| --paragraphs <n> | 5 | Number of paragraphs (1-20) |
| --speed <n> | 1.0 | Speed (0.5-2.0) |
| --silence <sec> | 0.8 | Silence between paragraphs (0-5.0) |

ai-tts podcast [options]

Generate a multi-speaker podcast dialogue.

ai-tts podcast --topic "AI趋势" --exchanges 10
ai-tts podcast --topic "科技新闻" --style casual --length long

| Flag | Default | Description |
|------|---------|-------------|
| --topic <text> | Tech trends | Podcast topic |
| --style <style> | professional | Dialogue style |
| --length <len> | medium | short / medium / long |
| --exchanges <n> | 8 | Number of exchanges (2-30) |
| --output <path> | ./podcast-<timestamp>.wav | Output WAV file |
| --speed <n> | 1.0 | Speed (0.5-2.0) |
| --silence <sec> | 0.5 | Silence between segments (0-5.0) |

ai-tts dub [options]

Dub audio from SRT subtitles with timeline-precise TTS synthesis. Supports multi-speaker voice mapping, dynamic speed compensation, video merge, and background music mixing.

# Basic: generate dubbed audio from SRT
ai-tts dub --srt subtitles.srt

# Dub and merge into video
ai-tts dub --srt subtitles.srt --video input.mp4 --output dubbed.mp4

# Multi-speaker with voice mapping
ai-tts dub --srt subtitles.srt --voices speakers.json --speed-auto

# Add background music with ducking
ai-tts dub --srt subtitles.srt --bgm music.mp3 --ducking 0.3

# Patch a single caption without full rebuild
ai-tts dub --srt subtitles.srt --patch 5 --output dub-existing.wav

| Flag | Default | Description |
|------|---------|-------------|
| --srt <file> | (required) | SRT subtitle file |
| --video <file> | | Video file: merge dubbed audio into video |
| --voice <id> | v-female-R2s4N9qJ | Default TTS voice ID |
| --voices <file> | | JSON speaker-to-voiceId map for multi-speaker dubbing |
| --speed <n> | 1.0 | TTS speed 0.5-2.0 |
| --speed-auto | false | Auto-adjust speed when audio overflows timeslot |
| --bgm <file> | | Background music file to mix in |
| --ducking <n> | 0.5 | BGM volume ducking 0-1.0 (lower = quieter BGM) |
| --patch <id> | | Re-synthesize a single caption by ID (patch mode) |
| --output <path> | ./dub-<timestamp>.wav | Output file path (.wav or .mp4 with --video) |

SRT format with speaker tags (optional [Speaker: xxx] extension):

1
00:00:01,000 --> 00:00:03,500
[Speaker: Alice]
Hello, welcome to the show!

2
00:00:04,000 --> 00:00:06,500
[Speaker: Bob]
Thanks for having me.

Voice mapping JSON (speakers.json):

{
  "Alice": "v-female-R2s4N9qJ",
  "Bob": "v-male-Bk7vD3xP"
}
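
To show how the speaker tags and the voice map fit together, here is a minimal Python sketch that parses SRT text with optional `[Speaker: xxx]` lines and resolves a voice ID for each caption. It is not the CLI's own parser. The function names are hypothetical, and falling back to the default voice for untagged or unmapped speakers is an assumption.

```python
import re

SPEAKER_TAG = re.compile(r"^\[Speaker:\s*(.+?)\]\s*$")


def parse_srt(srt_text):
    """Parse SRT text into dicts with id, start, end, speaker, and text."""
    captions = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = [part.strip() for part in lines[1].split("-->")]
        body = lines[2:]
        speaker = None
        if body:
            match = SPEAKER_TAG.match(body[0])
            if match:
                speaker = match.group(1)
                body = body[1:]  # the tag line is not spoken text
        captions.append({
            "id": int(lines[0]),
            "start": start,
            "end": end,
            "speaker": speaker,
            "text": " ".join(body),
        })
    return captions


def assign_voices(captions, voice_map, default_voice="v-female-R2s4N9qJ"):
    """Return (caption id, voice ID) pairs; unknown speakers get the default."""
    return [(c["id"], voice_map.get(c["speaker"], default_voice))
            for c in captions]
```

Running `assign_voices(parse_srt(text), voice_map)` on the two-caption example with the speakers.json mapping above pairs caption 1 with Alice's voice and caption 2 with Bob's.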

Requires ffmpeg in PATH for --video, --bgm, and --speed-auto features.
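
The README does not describe how --speed-auto computes its adjustment. As an illustration only, one plausible approach is to scale the speed by the ratio of synthesized duration to the caption's timeslot and clamp the result to the documented 0.5-2.0 range. The function names below are hypothetical.

```python
def srt_time_to_seconds(timestamp):
    """Convert an SRT timestamp such as '00:00:03,500' to seconds."""
    hms, millis = timestamp.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000.0


def compensated_speed(start, end, audio_seconds, base_speed=1.0,
                      min_speed=0.5, max_speed=2.0):
    """Return a speed at which the audio would fit the caption's timeslot.

    audio_seconds is the duration of the clip synthesized at base_speed.
    """
    slot = srt_time_to_seconds(end) - srt_time_to_seconds(start)
    if slot <= 0:
        raise ValueError("caption end must be after its start")
    if audio_seconds <= slot:
        return base_speed  # already fits, no change needed
    needed = base_speed * audio_seconds / slot
    return min(max(needed, min_speed), max_speed)
```

Under these assumptions, a 3.0-second clip in a 2.5-second slot would be re-synthesized at about 1.2x, while a clip that needs more than 2.0x would still overflow after clamping.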

ai-tts asr [options] / ai-tts transcribe [options]

Transcribe audio or video files to text. Supports cloud ASR (Tencent Cloud, with three modes: sentence, flash, and file) and local Whisper (offline, no quota).

# Transcribe with auto engine detection (local Whisper if available, else cloud)
ai-tts asr --input recording.mp3

# Force local Whisper (no login needed, no quota used)
ai-tts asr --input recording.mp3 --engine local

# Use a larger Whisper model for better accuracy
ai-tts asr --input meeting.wav --engine local --model small

# Cloud ASR with speaker diarization
ai-tts asr --input meeting.wav --engine cloud --speakers --speaker-number 3

# Transcribe video file, output plain text
ai-tts asr --input video.mp4 --format txt

# Remote URL (cloud only)
ai-tts asr --url https://example.com/audio.wav --mode flash

# Record from microphone (cloud only)
ai-tts asr --mic --format txt

| Flag | Default | Description |
|------|---------|-------------|
| --input <file> | | Local audio or video file |
| --url <url> | | Remote audio URL (cloud only) |
| --mic | | Record from microphone (cloud only, requires sox) |
| --engine <type> | auto | Engine: auto, local, cloud |
| --model <name> | base | Whisper model: tiny, base, small, medium, large |
| --mode <type> | auto | Cloud mode: auto, sentence, flash, file |
| --lang <model> | 16k_zh | Language: 16k_zh, 16k_en, 16k_zh_en, 16k_ja, 16k_ko |
| --format <fmt> | srt | Output: srt, txt, json |
| --output <path> | <input>.<format> | Output file path |
| --speakers | false | Enable speaker diarization (cloud only) |
| --speaker-number <n> | | Expected speakers (with --speakers) |
| --task-id <id> | | Resume async task polling (cloud only) |

Engine selection:

  • auto — Uses local Whisper if nodejs-whisper is installed, otherwise falls back to cloud
  • local — Local Whisper via whisper.cpp (no login, no quota, offline capable)
  • cloud — Tencent Cloud ASR (requires login, uses quota)
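
The selection rules above can be written as a small decision function. This is a sketch under stated assumptions, not the CLI's code: how ai-tts detects nodejs-whisper is not documented, so the caller passes `whisper_available` in, and routing `auto` to cloud when a cloud-only flag is present is an assumption based on the flag table.

```python
# Flags the table marks as "cloud only" (names without leading dashes).
CLOUD_ONLY_FLAGS = {"url", "mic", "speakers", "task-id"}


def select_engine(requested="auto", whisper_available=False, flags=()):
    """Pick 'local' or 'cloud' following the documented engine rules."""
    if requested not in ("auto", "local", "cloud"):
        raise ValueError(f"unknown engine: {requested}")
    cloud_only = CLOUD_ONLY_FLAGS.intersection(flags)

    if requested == "cloud":
        return "cloud"
    if requested == "local":
        if not whisper_available:
            raise RuntimeError("nodejs-whisper is not installed")
        if cloud_only:
            raise ValueError(
                f"cloud-only flags used with local engine: {sorted(cloud_only)}")
        return "local"
    # auto: prefer local Whisper, otherwise fall back to cloud.
    if whisper_available and not cloud_only:
        return "local"
    return "cloud"
```

Keeping the check for cloud-only flags in one place makes it easy to report a clear error before any audio is uploaded or any quota is spent.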

Local Whisper setup (optional):

npm install -g nodejs-whisper
# Model downloads automatically on first use (~142 MB for base)

Requires ffmpeg in PATH for audio extraction from video files.

ai-tts translate [options]

Translate SRT subtitles, plain text, or text files using LLM-powered batch translation.

# Translate SRT file (Chinese → English)
ai-tts translate --srt subtitles.srt --to en

# Translate with timing realignment for target language
ai-tts translate --srt subtitles.srt --to en --realign

# Translate a text file
ai-tts translate --input article.txt --to ja --output article-ja.txt

# Translate inline text
ai-tts translate --text "你好世界" --to en

# Auto-detect source language
ai-tts translate --srt movie.srt --to ko

| Flag | Default | Description |
|------|---------|-------------|
| --srt <file> | | SRT subtitle file to translate |
| --input <file> | | Plain text / markdown file to translate |
| --text <string> | | Inline text to translate |
| --from <lang> | auto-detect | Source language: zh, en, ja, ko, fr, de, es, etc. |
| --to <lang> | (required) | Target language code |
| --realign | false | Adjust subtitle timing for target language length differences |
| --batch-size <n> | 10 | Captions per translation batch (1-20) |
| --output <path> | <input>-<lang>.srt | Output file path |

Supported languages: zh, en, ja, ko, fr, de, es, pt, ru, ar, th, vi, id, and more.

Cost: 1 quota per batch (~10 captions). A 100-caption SRT costs ~10 quota.
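
The batch arithmetic can be checked with a few lines of Python. This sketch assumes that a partially filled final batch is charged as a full batch, which the README implies but does not state. `translate_quota` is a hypothetical name.

```python
import math


def translate_quota(caption_count, batch_size=10):
    """Estimate quota for translating an SRT: one unit per batch."""
    if not 1 <= batch_size <= 20:
        raise ValueError("batch size must be between 1 and 20")
    if caption_count < 0:
        raise ValueError("caption count cannot be negative")
    return math.ceil(caption_count / batch_size)
```

With the default batch size, 100 captions come to 10 quota and 101 captions to 11. Raising --batch-size to 20 would halve the number of batches for the same file.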

ai-tts video-translate [options]

End-to-end video translation: extracts audio, transcribes, translates subtitles, dubs with TTS, and merges back into video.

# Translate Chinese video to English
ai-tts video-translate --input video.mp4 --to en

# Specify source language
ai-tts video-translate --input video.mp4 --from zh --to ja

# Keep intermediate files (SRT, audio) for debugging
ai-tts video-translate --input video.mp4 --to en --keep-intermediates

# Custom voice and speed
ai-tts video-translate --input video.mp4 --to en --voice v-male-Bk7vD3xP --speed 0.9

| Flag | Default | Description |
|------|---------|-------------|
| --input <file> | (required) | Input video file |
| --from <lang> | auto-detect | Source language code |
| --to <lang> | (required) | Target language code |
| --voice <id> | v-female-R2s4N9qJ | TTS voice ID for dubbing |
| --voices <file> | | Voice mapping JSON for multi-speaker |
| --realign | false | Adjust subtitle timing for target language |
| --speed <n> | 1.0 | TTS speed (0.5-2.0) |
| --batch-size <n> | 10 | Translation batch size |
| --keep-intermediates | false | Keep temp files (SRT, audio) |
| --output <path> | <input>-<lang>.mp4 | Output MP4 path |
| --asr-mode <mode> | auto | Override ASR mode: auto, sentence, flash, file |
| --asr-lang <engine> | auto | Override ASR engine: 16k_zh, 16k_en, 16k_ja, 16k_ko |

Pipeline: Video → FFmpeg extract audio → ASR transcribe → LLM translate → TTS dub → FFmpeg merge → Output MP4

Cost: roughly 3 to N quota (1 for ASR, 1 or more for translation batches, plus 1 per dubbed caption).
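
Because the TTS stage is charged per caption, the total grows quickly with video length. The sketch below adds up the three stages named in the cost line. It is an estimate under the assumption that ASR is charged once per video and that translation is charged per batch of `batch_size` captions. The function name is hypothetical.

```python
import math


def video_translate_quota(caption_count, batch_size=10):
    """Estimate quota per pipeline stage for one video."""
    asr = 1                                            # one recognition
    translate = math.ceil(caption_count / batch_size)  # one per batch
    tts = caption_count                                # one per dubbed caption
    return {"asr": asr, "translate": translate, "tts": tts,
            "total": asr + translate + tts}
```

A single-caption clip comes to 3 quota, the low end of the range. A video with 100 captions comes to 1 + 10 + 100 = 111, which would exceed a 100-per-day allowance on its own.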

Requires ffmpeg in PATH.

ai-tts login / logout / status / dashboard

ai-tts login       # Open browser to login via email OTP
ai-tts logout      # Clear cached token
ai-tts status      # Show login status and token info
ai-tts dashboard   # Open Web dashboard in browser

Authentication

AI-TTS uses browser-based email OTP login (Supabase):

  1. CLI starts a temporary local HTTP server
  2. Opens your browser to the login page
  3. You enter your email and verification code
  4. Browser redirects back to the CLI with your token
  5. Token is cached at ~/.config/ai-tts/token.json
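
The steps above follow the common loopback-redirect pattern for CLI logins. The Python sketch below illustrates that pattern only and is not the ai-tts implementation: the `token` query parameter, the callback path, and the JSON layout of the cache file are all assumptions.

```python
import http.server
import json
import urllib.parse
from pathlib import Path


class _CallbackHandler(http.server.BaseHTTPRequestHandler):
    """Receives the browser redirect and stores the token on the server."""

    def do_GET(self):
        query = urllib.parse.urlparse(self.path).query
        params = urllib.parse.parse_qs(query)
        token = params.get("token", [None])[0]  # hypothetical parameter name
        if token:
            self.server.token = token
        self.send_response(200 if token else 400)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"Login complete. You can close this tab."
                         if token else b"Missing token.")

    def log_message(self, format, *args):
        pass  # keep the terminal quiet


def start_callback_server():
    """Step 1: bind a temporary server on a free loopback port."""
    server = http.server.HTTPServer(("127.0.0.1", 0), _CallbackHandler)
    server.token = None
    return server


def wait_for_token(server, cache_path):
    """Steps 4-5: wait for the redirect, then cache the token on disk."""
    try:
        while server.token is None:
            server.handle_request()  # blocks until one request arrives
    finally:
        server.server_close()
    cache_path = Path(cache_path)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps({"token": server.token}))
    return server.token
```

In a real flow, the CLI would call `webbrowser.open()` on the login page between the two functions (step 2), passing the port from `server.server_address[1]` so the page knows where to redirect. Binding to port 0 lets the operating system pick a free port.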

Quota

  • Free tier: 100 quota per day
  • say/synthesize: 1 quota per call
  • narrate: 1 quota per segment
  • story: ~6-8 quota (1 LLM + N TTS)
  • podcast: ~10-20 quota
  • dub: 1 quota per SRT caption
  • asr (cloud): 1 quota per recognition
  • asr (local): free (no quota)
  • translate: 1 quota per batch (~10 captions)
  • video-translate: ~3-N quota (ASR + translate + TTS)
  • voices: free (no quota)
  • Quota resets daily
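
For planning a day's work against the free tier, the per-command costs above can be tallied in a small table-driven helper. This is a rough sketch: the command keys `asr-cloud` and `asr-local` are hypothetical labels, the story rule (1 LLM call plus 1 TTS call per paragraph) is an interpretation of the range given above, and podcast and video-translate are left out.

```python
import math

FREE_TIER_DAILY_QUOTA = 100  # free tier allowance from the list above


def job_cost(command, units=1, batch_size=10):
    """Quota for one job; units are calls, segments, captions, or paragraphs."""
    if command in ("say", "synthesize", "narrate", "dub", "asr-cloud"):
        return units
    if command in ("asr-local", "voices"):
        return 0
    if command == "translate":
        return math.ceil(units / batch_size)
    if command == "story":
        return 1 + units  # assumption: 1 LLM call + 1 TTS call per paragraph
    raise ValueError(f"no cost rule for {command!r}")


def remaining_quota(jobs, daily_quota=FREE_TIER_DAILY_QUOTA):
    """jobs is a list of (command, units) pairs; returns quota left over."""
    used = sum(job_cost(command, units) for command, units in jobs)
    return daily_quota - used
```

A negative result means the plan does not fit into one day and some jobs would need to wait for the daily reset or move to a free path such as local Whisper.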

Requirements

  • Node.js >= 18.0.0
  • ffmpeg recommended — needed by most audio/video features:

| Command | Without FFmpeg | With FFmpeg |
|---------|---------------|-------------|
| say / synthesize | Full support | Full support |
| narrate | Full support | Full support |
| story / podcast | Full support | Full support |
| voices | Full support | Full support |
| dub --srt file.srt | Audio output only | Audio output only |
| dub --video / --bgm / --speed-auto | Not available | Full support |
| asr --input file.wav (16kHz mono) | Works (cloud) | Works (cloud + local) |
| asr --input file.mp3 / video | Not available | Full support |
| asr --engine local | Not available | Full support |
| translate | Full support | Full support |
| video-translate | Not available | Full support |

Install FFmpeg:

# macOS
brew install ffmpeg

# Ubuntu / Debian
sudo apt install ffmpeg

# Windows — download from https://ffmpeg.org/download.html

Optional dependencies:

  • nodejs-whisper — for local Whisper ASR without cloud API (npm install -g nodejs-whisper)
  • sox — for microphone recording (asr --mic)
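
A wrapper script can check for these tools before starting a long job. The sketch below uses Python's `shutil.which` to look for ffmpeg and sox in PATH and lists the features from the table above that would be unavailable. The feature labels and function names are illustrative.

```python
import shutil

# Features the requirements table marks "Not available" without FFmpeg.
FFMPEG_FEATURES = [
    "dub --video / --bgm / --speed-auto",
    "asr --input file.mp3 / video",
    "asr --engine local",
    "video-translate",
]


def check_tools(which=shutil.which):
    """Report whether ffmpeg and sox can be found in PATH."""
    return {"ffmpeg": which("ffmpeg") is not None,
            "sox": which("sox") is not None}


def unavailable_features(tools):
    """List features that cannot run with the given tool availability."""
    missing = []
    if not tools["ffmpeg"]:
        missing.extend(FFMPEG_FEATURES)
    if not tools["sox"]:
        missing.append("asr --mic")
    return missing


if __name__ == "__main__":
    for feature in unavailable_features(check_tools()):
        print(f"unavailable: {feature}")
```

The `which` parameter exists only so the lookup can be replaced in tests. In normal use `check_tools()` is called with no arguments.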

License

UNLICENSED - All rights reserved.