
voxflow

v1.8.8


AI audio content creation CLI — stories, podcasts, narration, dubbing, transcription, translation, and video translation with TTS


Quick Start

# Synthesize a single sentence
npx voxflow say "你好世界"

# Output as MP3 (smaller file size)
npx voxflow say "你好世界" --format mp3

# Generate a story with TTS narration
npx voxflow story --topic "三只小猪"

# Dub a video from SRT subtitles
npx voxflow dub --srt subtitles.srt --video input.mp4 --output dubbed.mp4

# Transcribe audio to subtitles (SRT)
npx voxflow asr --input recording.mp3

# Translate SRT subtitles to another language
npx voxflow translate --srt subtitles.srt --to en

# End-to-end video translation (ASR → translate → dub → merge)
npx voxflow video-translate --input video.mp4 --to en

# One-command build + local delivery (for Skill/agent orchestration)
npx voxflow publish --video input.mp4 --audio narration.wav --publish local

# Browse available voices
npx voxflow voices --search "温柔"

A browser window will open for login on first use. After that, your token is cached automatically.

Install

npm install -g voxflow

Commands

voxflow say <text> / voxflow synthesize <text>

Synthesize a single text snippet to audio.

voxflow say "你好世界"
voxflow say "你好世界" --format mp3
voxflow synthesize "Welcome" --voice v-male-Bk7vD3xP --format mp3
voxflow say "快速测试" --speed 1.5 --volume 0.8 --pitch 2

| Flag | Default | Description |
|------|---------|-------------|
| <text> | (required) | Text to synthesize (positional or --text) |
| --voice <id> | v-female-R2s4N9qJ | TTS voice ID |
| --format <fmt> | pcm | Output format: pcm (WAV), wav, mp3 |
| --speed <n> | 1.0 | Speed 0.5-2.0 |
| --volume <n> | 1.0 | Volume 0.1-2.0 |
| --pitch <n> | 0 | Pitch -12 to 12 |
| --output <path> | ./tts-<timestamp>.wav | Output file path |

voxflow narrate [options]

Narrate a document, text, or script to multi-segment audio.

voxflow narrate --input article.txt
voxflow narrate --input article.txt --format mp3
voxflow narrate --input readme.md --voice v-male-Bk7vD3xP
voxflow narrate --text "第一段。第二段。第三段。"
voxflow narrate --script narration-script.json
echo "Hello world" | voxflow narrate

| Flag | Default | Description |
|------|---------|-------------|
| --input <file> | | Input .txt or .md file |
| --text <text> | | Inline text to narrate |
| --script <file> | | JSON script with per-segment voice control |
| --voice <id> | v-female-R2s4N9qJ | Default voice ID |
| --format <fmt> | pcm | Output format: pcm (WAV), wav, mp3 |
| --speed <n> | 1.0 | Speed 0.5-2.0 |
| --silence <sec> | 0.8 | Silence between segments 0-5.0 |
| --output <path> | ./narration-<timestamp>.wav | Output file path |

Script JSON format (per-segment voice/speed control):

{
  "segments": [
    { "text": "第一段内容", "voiceId": "v-female-R2s4N9qJ", "speed": 1.0 },
    { "text": "第二段内容", "voiceId": "v-male-Bk7vD3xP", "speed": 0.8 }
  ],
  "silence": 1.0,
  "output": "my-narration.wav"
}
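
When voxflow is driven from a script or an agent, the file above can be generated inline before calling narrate. A minimal sketch, reusing the default voice IDs documented in this README:

```shell
# Write a two-segment narration script for --script.
# Texts are placeholders; voice IDs are the defaults shown above.
cat > narration-script.json <<'EOF'
{
  "segments": [
    { "text": "第一段内容", "voiceId": "v-female-R2s4N9qJ", "speed": 1.0 },
    { "text": "第二段内容", "voiceId": "v-male-Bk7vD3xP", "speed": 0.8 }
  ],
  "silence": 1.0,
  "output": "my-narration.wav"
}
EOF
```

Then run voxflow narrate --script narration-script.json as shown above.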

voxflow voices [options]

Browse and filter available TTS voices (no login required).

voxflow voices
voxflow voices --search "温柔" --gender female
voxflow voices --language en --extended
voxflow voices --json

| Flag | Default | Description |
|------|---------|-------------|
| --search <query> | | Search by name, tone, style |
| --gender <m\|f> | | Filter by gender |
| --language <code> | | Filter by language: zh, en, etc. |
| --extended | false | Include extended voice library (380+) |
| --json | false | Output raw JSON |

voxflow story [options]

Generate a story with AI and synthesize TTS audio.

voxflow story --topic "小红帽的故事"
voxflow story --topic "太空探险" --paragraphs 8 --speed 0.8

| Flag | Default | Description |
|------|---------|-------------|
| --topic <text> | Children's story | Story prompt |
| --voice <id> | v-female-R2s4N9qJ | TTS voice ID |
| --output <path> | ./story-<timestamp>.wav | Output WAV file |
| --paragraphs <n> | 5 | Number of paragraphs (1-20) |
| --speed <n> | 1.0 | Speed (0.5-2.0) |
| --silence <sec> | 0.8 | Silence between paragraphs (0-5.0) |

voxflow podcast [options]

Generate a multi-speaker podcast dialogue with AI script generation and multi-voice TTS.

# Quick start — AI generates script + synthesizes audio
voxflow podcast --topic "AI in healthcare"

# Use a template with colloquial control
voxflow podcast --topic "tech news" --template news --colloquial high --speakers 3

# English podcast
voxflow podcast --topic "AI ethics debate" --language en --template discussion

# Generate script only (no TTS), export as JSON
voxflow podcast --topic "量子计算入门" --format json --no-tts

# Synthesize from a previously exported .podcast.json
voxflow podcast --input my-podcast.podcast.json --output final.wav

# Legacy engine (lower quota cost)
voxflow podcast --topic "AI趋势" --engine legacy --exchanges 10

| Flag | Default | Description |
|------|---------|-------------|
| --topic <text> | tech trends | Podcast topic or prompt |
| --engine <type> | auto (→ ai-sdk) | auto, legacy, or ai-sdk |
| --template <name> | interview | interview, discussion, news, story, tutorial |
| --colloquial <lvl> | medium | Conversational tone: low, medium, high |
| --speakers <n> | 2 | Speaker count: 1, 2, or 3 |
| --language <code> | zh-CN | zh-CN, en, ja |
| --format json | — | Also output .podcast.json alongside audio |
| --input <file> | — | Load .podcast.json for synthesis (skip LLM) |
| --no-tts | false | Generate script only, skip TTS synthesis |
| --length <len> | medium | short, medium, long |
| --exchanges <n> | 8 | Number of exchanges, 2-30 (legacy engine) |
| --style <style> | — | Legacy: dialogue style (maps to --template) |
| --voice <id> | — | Override TTS voice for all speakers |
| --bgm <file> | — | Background music file to mix in |
| --ducking <n> | 0.2 | BGM volume ducking (0-1.0) |
| --output <path> | ./podcast-<ts>.wav | Output file path |
| --speed <n> | 1.0 | TTS speed (0.5-2.0) |
| --silence <sec> | 0.5 | Silence between segments (0-5.0) |

Three-step workflow (recommended when you want to edit the script):

  1. voxflow podcast --topic "..." --format json --no-tts → generates .podcast.json
  2. Edit the JSON (speakers, dialogue, voice mapping)
  3. voxflow podcast --input edited.podcast.json → synthesizes audio

voxflow dub [options]

Dub audio from SRT subtitles with timeline-precise TTS synthesis. Supports multi-speaker voice mapping, dynamic speed compensation, video merge, and background music mixing.

# Basic: generate dubbed audio from SRT
voxflow dub --srt subtitles.srt

# Dub and merge into video
voxflow dub --srt subtitles.srt --video input.mp4 --output dubbed.mp4

# Multi-speaker with voice mapping
voxflow dub --srt subtitles.srt --voices speakers.json --speed-auto

# Add background music with ducking
voxflow dub --srt subtitles.srt --bgm music.mp3 --ducking 0.3

# Patch a single caption without full rebuild
voxflow dub --srt subtitles.srt --patch 5 --output dub-existing.wav

| Flag | Default | Description |
|------|---------|-------------|
| --srt <file> | (required) | SRT subtitle file |
| --video <file> | | Video file — merge dubbed audio into video |
| --voice <id> | v-female-R2s4N9qJ | Default TTS voice ID |
| --voices <file> | | JSON speaker-to-voiceId map for multi-speaker dubbing |
| --speed <n> | 1.0 | TTS speed 0.5-2.0 |
| --speed-auto | false | Auto-adjust speed when audio overflows timeslot |
| --bgm <file> | | Background music file to mix in |
| --ducking <n> | 0.2 | BGM volume ducking 0-1.0 (lower = quieter BGM) |
| --patch <id> | | Re-synthesize a single caption by ID (patch mode) |
| --output <path> | ./dub-<timestamp>.wav | Output file path (.wav or .mp4 with --video) |

SRT format with speaker tags (optional [Speaker: xxx] extension):

1
00:00:01,000 --> 00:00:03,500
[Speaker: Alice]
Hello, welcome to the show!

2
00:00:04,000 --> 00:00:06,500
[Speaker: Bob]
Thanks for having me.

Voice mapping JSON (speakers.json):

{
  "Alice": "v-female-R2s4N9qJ",
  "Bob": "v-male-Bk7vD3xP"
}
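
For scripted or agent-driven runs, the same mapping can be created inline. A sketch using the default voice IDs from this README (swap in IDs from voxflow voices):

```shell
# Create the speaker-to-voiceId map consumed by --voices.
cat > speakers.json <<'EOF'
{
  "Alice": "v-female-R2s4N9qJ",
  "Bob": "v-male-Bk7vD3xP"
}
EOF
```

Then pass it with voxflow dub --srt subtitles.srt --voices speakers.json.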

Requires ffmpeg in PATH for --video, --bgm, and --speed-auto features.

voxflow asr [options] / voxflow transcribe [options]

Transcribe audio or video files to text. Supports cloud ASR (Tencent Cloud, 3 modes) and local Whisper (offline, no quota).

# Transcribe with auto engine detection (local Whisper if available, else cloud)
voxflow asr --input recording.mp3

# Force local Whisper (no login needed, no quota used)
voxflow asr --input recording.mp3 --engine local

# Use a larger Whisper model for better accuracy
voxflow asr --input meeting.wav --engine local --model small

# Cloud ASR with speaker diarization
voxflow asr --input meeting.wav --engine cloud --speakers --speaker-number 3

# Transcribe video file, output plain text
voxflow asr --input video.mp4 --format txt

# Remote URL (cloud only)
voxflow asr --url https://example.com/audio.wav --mode flash

# Record from microphone (cloud only)
voxflow asr --mic --format txt

| Flag | Default | Description |
|------|---------|-------------|
| --input <file> | | Local audio or video file |
| --url <url> | | Remote audio URL (cloud only) |
| --mic | | Record from microphone (cloud only, requires sox) |
| --engine <type> | auto | Engine: auto, local, cloud |
| --model <name> | base | Whisper model: tiny, base, small, medium, large |
| --mode <type> | auto | Cloud mode: auto, sentence, flash, file |
| --lang <model> | 16k_zh | Language: 16k_zh, 16k_en, 16k_zh_en, 16k_ja, 16k_ko |
| --format <fmt> | srt | Output: srt, txt, json |
| --output <path> | <input>.<format> | Output file path |
| --speakers | false | Enable speaker diarization (cloud only) |
| --speaker-number <n> | | Expected speakers (with --speakers) |
| --task-id <id> | | Resume async task polling (cloud only) |

Engine selection:

  • auto — Uses local Whisper if nodejs-whisper is installed, otherwise falls back to cloud
  • local — Local Whisper via whisper.cpp (no login, no quota, offline capable)
  • cloud — Tencent Cloud ASR (requires login, uses quota)

Local Whisper setup (optional):

npm install -g nodejs-whisper
# Model downloads automatically on first use (~142 MB for base)

Requires ffmpeg in PATH for audio extraction from video files.

voxflow translate [options]

Translate SRT subtitles, plain text, or text files using LLM-powered batch translation.

# Translate SRT file (Chinese → English)
voxflow translate --srt subtitles.srt --to en

# Translate with timing realignment for target language
voxflow translate --srt subtitles.srt --to en --realign

# Translate a text file
voxflow translate --input article.txt --to ja --output article-ja.txt

# Translate inline text
voxflow translate --text "你好世界" --to en

# Auto-detect source language
voxflow translate --srt movie.srt --to ko

| Flag | Default | Description |
|------|---------|-------------|
| --srt <file> | | SRT subtitle file to translate |
| --input <file> | | Plain text / markdown file to translate |
| --text <string> | | Inline text to translate |
| --from <lang> | auto-detect | Source language: zh, en, ja, ko, fr, de, es, etc. |
| --to <lang> | (required) | Target language code |
| --realign | false | Adjust subtitle timing for target language length differences |
| --batch-size <n> | 10 | Captions per translation batch (1-20) |
| --output <path> | <input>-<lang>.srt | Output file path |

Supported languages: zh, en, ja, ko, fr, de, es, pt, ru, ar, th, vi, id, and more.

Cost: 1 quota per batch (~10 captions). A 100-caption SRT costs ~10 quota.

voxflow video-translate [options]

End-to-end video translation: extracts audio, transcribes, translates subtitles, dubs with TTS, and merges back into video.

# Translate Chinese video to English
voxflow video-translate --input video.mp4 --to en

# Specify source language
voxflow video-translate --input video.mp4 --from zh --to ja

# Keep intermediate files (SRT, audio) for debugging
voxflow video-translate --input video.mp4 --to en --keep-intermediates

# Custom voice and speed
voxflow video-translate --input video.mp4 --to en --voice v-male-Bk7vD3xP --speed 0.9

| Flag | Default | Description |
|------|---------|-------------|
| --input <file> | (required) | Input video file |
| --from <lang> | auto-detect | Source language code |
| --to <lang> | (required) | Target language code |
| --voice <id> | v-female-R2s4N9qJ | TTS voice ID for dubbing |
| --voices <file> | | Voice mapping JSON for multi-speaker |
| --realign | false | Adjust subtitle timing for target language |
| --speed <n> | 1.0 | TTS speed (0.5-2.0) |
| --batch-size <n> | 10 | Translation batch size |
| --keep-intermediates | false | Keep temp files (SRT, audio) |
| --output <path> | <input>-<lang>.mp4 | Output MP4 path |
| --asr-mode <mode> | auto | Override ASR mode: auto, sentence, flash, file |
| --asr-lang <engine> | auto | Override ASR engine: 16k_zh, 16k_en, 16k_ja, 16k_ko |

Pipeline: Video → FFmpeg extract audio → ASR transcribe → LLM translate → TTS dub → FFmpeg merge → Output MP4

Cost: ~3-N quota (1 ASR + 1+ translate batches + 1 per TTS caption)

Requires ffmpeg in PATH.

voxflow publish [options]

Single command for final deliverables. Designed for agent skills and automation orchestration:

  • Build final MP4 (translate+dub / dub / merge)
  • Deliver to local directory or via webhook
  • Return structured JSON output for downstream processing

Note: --platform is a metadata tag only — it does NOT upload to any platform. Use --publish webhook to integrate with your own distribution service.

# Mode A: video-translate + local delivery
voxflow publish --input video.mp4 --to en --publish local

# Mode B: dub existing subtitles into video
voxflow publish --srt subtitles.srt --video input.mp4 --publish local

# Mode C: merge existing audio into video
voxflow publish --video input.mp4 --audio narration.mp3 --publish local

# Deliver via webhook (e.g. custom distribution service)
voxflow publish --input video.mp4 --to ja \
  --publish webhook \
  --publish-webhook https://publisher.example.com/hook \
  --json

| Flag | Default | Description |
|------|---------|-------------|
| --input <video> | | Mode A: source video for translate+dub (requires --to) |
| --to <lang> | | Target language for Mode A |
| --from <lang> | auto | Source language for Mode A |
| --srt <file> | | Mode B: SRT subtitle file (requires --video) |
| --video <file> | | Mode B/Mode C video file |
| --audio <file> | | Mode C: external narration audio |
| --voice <id> | v-female-R2s4N9qJ | TTS voice for Mode A/B |
| --voices <file> | | Multi-speaker voice mapping JSON |
| --output <path> | auto | Final MP4 output path |
| --publish <target> | local | local \| webhook \| none |
| --publish-dir <dir> | ./published | Local publish directory |
| --publish-webhook <url> | | Webhook URL for distribution service |
| --platform <name> | generic | Platform metadata tag (not an actual upload target) |
| --title <text> | filename | Title metadata |
| --json | false | Print machine-readable JSON result |

voxflow login / logout / status / dashboard

voxflow login       # Open browser to login via email OTP
voxflow logout      # Clear cached token
voxflow status      # Show login status and token info
voxflow dashboard   # Open Web dashboard in browser

voxflow add <name> (experimental — Day-1 MVP)

Pull a curated flow / voice recipe / CLI preset from the official registry into your current project.

voxflow add --list                     # Browse all items in the registry
voxflow add dub-anime-jp-zh            # Pull a preset (resolves to voxflow/dub-anime-jp-zh)
voxflow add chico/my-recipe            # Explicit author for community items
voxflow add foo --force                # Overwrite existing local files
voxflow add foo --registry <url>       # Use a different registry (e.g. enterprise private)

After voxflow add, files land under presets/<name>/ (preset), recipes/<name>/ (voice-recipe), or flows/<name>/ (flow). The CLI prints a "Try it:" hint with the exact command to use the just-installed item.

Day-1 MVP scope: no dependsOn cascading, no ETag cache, no private-registry token. Coming in Phase 2 — see docs/product/cli-registry.md.

Authentication

voxflow uses browser-based email OTP login (Supabase):

  1. CLI starts a temporary local HTTP server
  2. Opens your browser to the login page
  3. You enter your email and verification code
  4. Browser redirects back to the CLI with your token
  5. Token is cached at ~/.config/voxflow/token.json

Quota

  • Free tier: 10,000 quota per month (1 basic TTS = 100 quota)
  • say/synthesize: 100 quota per call
  • narrate: 100 quota per segment
  • story: ~600-800 quota (1 LLM + N TTS)
  • podcast (ai-sdk): ~5,000-10,000 quota (script) + 100/segment (TTS)
  • podcast (legacy): ~200 quota (script) + 100/segment (TTS)
  • dub: 100 quota per SRT caption
  • asr (cloud): 100 quota per recognition
  • asr (local): free (no quota)
  • translate: 100 quota per batch (~10 captions)
  • video-translate: ~300-N quota (ASR + translate + TTS)
  • voices: free (no quota)
  • Quota resets monthly
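
As a back-of-envelope check against the free tier, the per-unit costs above multiply out directly. An illustrative calculation for dubbing a 40-caption SRT:

```shell
# 40 captions × 100 quota/caption = 4,000 quota per dub,
# so the 10,000/month free tier covers two such dubs.
captions=40
quota_per_caption=100
cost=$((captions * quota_per_caption))
echo "$cost"    # prints 4000
```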

Requirements

  • Node.js >= 18.0.0
  • ffmpeg recommended — needed by most audio/video features:

| Command | Without FFmpeg | With FFmpeg |
|---------|---------------|-------------|
| say / synthesize | Full support | Full support |
| narrate | Full support | Full support |
| story / podcast | Full support | Full support |
| voices | Full support | Full support |
| dub --srt file.srt | Audio output only | Audio output only |
| dub --video / --bgm / --speed-auto | Not available | Full support |
| asr --input file.wav (16kHz mono) | Works (cloud) | Works (cloud + local) |
| asr --input file.mp3 / video | Not available | Full support |
| asr --engine local | Not available | Full support |
| translate | Full support | Full support |
| video-translate | Not available | Full support |

Install FFmpeg:

# macOS
brew install ffmpeg

# Ubuntu / Debian
sudo apt install ffmpeg

# Windows — download from https://ffmpeg.org/download.html

Optional dependencies:

  • nodejs-whisper — for local Whisper ASR without cloud API (npm install -g nodejs-whisper)
  • sox — for microphone recording (asr --mic)

Claude Code / AI Agent Integration

The voxflow CLI is designed to be called by AI agents (Claude Code, Cursor, etc.) as the unified execution layer. No API keys or Python scripts needed — all auth goes through voxflow login (JWT).

Skill documentation: See cli/skills/podcast/SKILL.md for the full podcast skill reference.

Plugin install (Claude Code / Cursor / Codex)

The CLI ships agent plugin manifests so it can be installed as a first-class plugin in any major AI coding environment. Each manifest points to the shared cli/skills/ directory.

# Claude Code — local try (plugin root is cli/, manifest at cli/.claude-plugin/plugin.json)
claude --plugin-dir cli

# Verify the manifest is valid:
claude plugin tag cli --dry-run

# Codex — sparse install from GitHub (plugin metadata + skills only)
codex plugin marketplace add VoxFlowStudio/FlowStudio \
  --sparse cli/.codex-plugin --sparse cli/skills

# Cursor — sideload from a cloned repo (Settings → Plugins → Load unpacked → cli/)

Note: plugin install only ships the agent-side manifest and skill docs. To actually run voxflow commands, install the CLI separately: npm install -g voxflow.

Manifests live at cli/.claude-plugin/plugin.json, cli/.cursor-plugin/plugin.json, cli/.codex-plugin/plugin.json. Claude Code discovers skills/ relative to the plugin root (cli/), so the Claude manifest omits the skills field and relies on the folder-name convention.

Typical agent workflow:

# 1. Login (one-time)
voxflow login

# 2. Generate script only
voxflow podcast --topic "Your topic" --format json --no-tts

# 3. Agent edits the .podcast.json as needed

# 4. Synthesize from edited script
voxflow podcast --input edited.podcast.json --output final.wav

CI/non-interactive environments: Set VOXFLOW_TOKEN env var to skip browser login.
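
For example, a CI job can export the token before any voxflow call. The value below is a placeholder, not a real token — inject the real one from your secret store:

```shell
# Placeholder token value; obtain a real one from an interactive login.
export VOXFLOW_TOKEN="placeholder-jwt"
# Subsequent voxflow commands authenticate with this token
# instead of opening a browser.
```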

License

UNLICENSED - All rights reserved.