@visionengine/subtitle-generate

v1.0.2

VisionEngine Subtitle Generation MCP Server - Generate subtitles from audio/video files with automatic timing alignment

VisionEngine Subtitle Generation MCP Server - Generate subtitles from audio/video files with automatic timing alignment using ByteDance's speech recognition API.

Features

  • Subtitle Generation - Generate subtitles from audio/video files with speech recognition
  • Subtitle Alignment - Align existing subtitle text with audio for precise timing
  • Multiple Languages - Support for Chinese, English, Japanese, Korean, and more
  • SRT Output - Automatically save subtitles in SRT format
  • Word-level Timing - Get precise timing for each word/character
  • Speaker Detection - Optional speaker identification

Installation

As MCP Server

Add to your MCP client configuration:

{
  "mcpServers": {
    "ve-subtitle-generate": {
      "type": "local",
      "command": "npx",
      "args": ["-y", "@visionengine/subtitle-generate@latest"],
      "transport": "stdio",
      "env": {
        "APP_ID": "your_app_id",
        "ACCESS_TOKEN": "your_access_token",
        "WORKDIR": "./media"
      }
    }
  }
}

As NPM Package

npm install -g @visionengine/subtitle-generate

Configuration

Environment variables:

  • API_BASE_URL - API endpoint (default: https://openspeech.bytedance.com)
  • APP_ID - Your application ID (required)
  • ACCESS_TOKEN - Your Bearer token for authentication (required)
  • WORKDIR - Base directory for relative file paths (default: ./)
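The defaults listed above could be applied along these lines. This is a minimal sketch, not the package's actual implementation: `resolveConfig`, `SubtitleConfig`, and `Env` are hypothetical names introduced for illustration, and the environment is passed in as a plain object so the function is easy to test.

```typescript
// Hypothetical shape for the resolved settings; field names are illustrative.
interface SubtitleConfig {
  apiBaseUrl: string;
  appId: string;
  accessToken: string;
  workdir: string;
}

// A plain map of environment variables (process.env has this shape in Node).
type Env = Record<string, string | undefined>;

// Apply the documented defaults and fail early if a required variable is missing.
function resolveConfig(env: Env): SubtitleConfig {
  const missing = ["APP_ID", "ACCESS_TOKEN"].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    apiBaseUrl: env.API_BASE_URL ?? "https://openspeech.bytedance.com",
    appId: env.APP_ID as string,
    accessToken: env.ACCESS_TOKEN as string,
    workdir: env.WORKDIR ?? "./",
  };
}
```

In a Node script this would typically be called as `resolveConfig(process.env)`.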

Tools

subtitle_generate

Generate subtitles from audio/video files using speech recognition.

Parameters:

  • audioPath (string, required) - Audio/video file path (relative to WORKDIR or absolute)
  • language (string, optional) - Language code: zh-CN, en-US, ja-JP, ko-KR, etc.
  • wordsPerLine (number, optional) - Maximum words per line (default: 46)
  • maxLines (number, optional) - Maximum lines per screen (default: 1)
  • useItn (boolean, optional) - Convert Chinese numbers to Arabic numerals
  • captionType (enum, optional) - 'auto', 'speech', or 'singing'
  • usePunc (boolean, optional) - Add punctuation marks
  • useDdc (boolean, optional) - Add silence annotations
  • withSpeakerInfo (boolean, optional) - Return speaker information

Supported Languages:

| Language | Code | Recommended words_per_line |
|----------|------|---------------------------|
| Chinese (Simplified) | zh-CN | 15 |
| Cantonese | yue | 15 |
| English (US) | en-US | 55 |
| Japanese | ja-JP | 32 |
| Korean | ko-KR | 32 |
| Spanish | es-MX | 55 |
| Russian | ru-RU | 55 |
| French | fr-FR | 55 |
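If you pick `wordsPerLine` per language, a small lookup can keep the recommended values in one place. The sketch below only encodes the table above plus the documented default of 46; the helper name `recommendedWordsPerLine` is hypothetical and not part of the package.

```typescript
// Recommended values copied from the Supported Languages table.
const RECOMMENDED_WORDS_PER_LINE: Record<string, number> = {
  "zh-CN": 15,
  yue: 15,
  "en-US": 55,
  "ja-JP": 32,
  "ko-KR": 32,
  "es-MX": 55,
  "ru-RU": 55,
  "fr-FR": 55,
};

// Documented default for the wordsPerLine parameter.
const DEFAULT_WORDS_PER_LINE = 46;

// Return the recommended value for a language code, or the default
// when the code is absent or not listed in the table.
function recommendedWordsPerLine(language?: string): number {
  if (language === undefined) return DEFAULT_WORDS_PER_LINE;
  const value = RECOMMENDED_WORDS_PER_LINE[language];
  return value === undefined ? DEFAULT_WORDS_PER_LINE : value;
}
```

The result can then be passed as `wordsPerLine` when calling `subtitle_generate`.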

Example:

// Basic usage
await subtitle_generate({
  audioPath: "./video.mp4"
});

// With options
await subtitle_generate({
  audioPath: "./video.mp4",
  language: "zh-CN",
  wordsPerLine: 15,
  maxLines: 2,
  captionType: "speech",
  usePunc: true
});

Output:

  • SRT file saved in the same directory as the input file
  • JSON response with utterances and timing information

subtitle_align

Align existing subtitle text with audio for precise timing.

Parameters:

  • audioPath (string, required) - Audio/video file path (relative to WORKDIR or absolute)
  • subtitleText (string, required) - The subtitle text to align with the audio
  • captionType (enum, required) - 'speech' or 'singing'
  • staPuncMode (enum, optional) - Punctuation mode: '1', '2', or '3'

Punctuation Modes:

  • 1 (default) - Omit trailing punctuation from alignment results
  • 2 - Replace punctuation with spaces
  • 3 - Keep original punctuation
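To make the three modes concrete, here is a local illustration of how each one might change a line of text. This is only an approximation under stated assumptions: the real handling happens in the speech service and may differ, `applyPuncMode` is a hypothetical helper, and the set of characters treated as punctuation is a guess.

```typescript
type PuncMode = "1" | "2" | "3";

// Assumed punctuation set (ASCII plus common full-width CJK marks).
const PUNCTUATION = /[.,!?;:，。！？；：、]/g;
const TRAILING_PUNCTUATION = /[.,!?;:，。！？；：、]+$/;

// Approximate the documented effect of each staPuncMode on a single line.
function applyPuncMode(text: string, mode: PuncMode = "1"): string {
  switch (mode) {
    case "1":
      // Omit trailing punctuation only.
      return text.replace(TRAILING_PUNCTUATION, "");
    case "2":
      // Replace every punctuation mark with a space.
      return text.replace(PUNCTUATION, " ");
    default:
      // Mode 3: keep the original punctuation.
      return text;
  }
}
```

For example, `applyPuncMode("Hello, world.", "1")` keeps the comma and drops only the final period.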

Example:

// Align speech subtitle
await subtitle_align({
  audioPath: "./speech.wav",
  subtitleText: "Hello, welcome to our presentation today.",
  captionType: "speech"
});

// Align song lyrics
await subtitle_align({
  audioPath: "./song.mp3",
  subtitleText: "这是一首美丽的歌曲",
  captionType: "singing",
  staPuncMode: "3"
});

Output:

  • SRT file saved with _aligned suffix
  • JSON response with word-level timing information
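If a script needs to find the saved file afterwards, the path can be derived from the input path. The sketch below assumes the SRT file keeps the input's base name, sits next to the input, and (for alignment) adds `_aligned` before the `.srt` extension. The README only states the directory and the suffix, so the exact naming and the helper `srtPathFor` are assumptions.

```typescript
// Derive the expected SRT path from an input media path.
// Assumption: <dir>/<name>.srt for generation, <dir>/<name>_aligned.srt for alignment.
function srtPathFor(inputPath: string, aligned: boolean = false): string {
  // Find the last path separator, accepting both POSIX and Windows styles.
  const slash = Math.max(inputPath.lastIndexOf("/"), inputPath.lastIndexOf("\\"));
  const dir = inputPath.slice(0, slash + 1);
  const file = inputPath.slice(slash + 1);
  // Strip the extension, but keep dotfiles such as ".hidden" intact.
  const dot = file.lastIndexOf(".");
  const stem = dot > 0 ? file.slice(0, dot) : file;
  return `${dir}${stem}${aligned ? "_aligned" : ""}.srt`;
}
```

It is worth checking the path against an actual run before relying on it.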

Usage Examples

MCP Client

Once configured as an MCP server, the tools are available through your MCP client:

> Generate subtitles for video.mp4
> Align these lyrics with song.mp3: "今天天气真好..."

Direct Usage

# Install globally
npm install -g @visionengine/subtitle-generate

# Set environment variables
export APP_ID="your_app_id"
export ACCESS_TOKEN="your_access_token"
export WORKDIR="./media"

# Run the server
ve-subtitle-generate

Output Format

The tools save subtitles in SRT format:

1
00:00:00,000 --> 00:00:03,197
如果您没有其他需要举报的话这边就先挂断了

2
00:00:03,442 --> 00:00:04,877
祝您生活愉快再见
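The same layout can be produced from timing data with a few lines of code, which is handy if you post-process the JSON response yourself. This is a sketch under assumptions: the `Cue` shape (millisecond start and end times plus text) is hypothetical and does not necessarily match the field names in the actual response.

```typescript
// Hypothetical cue shape; the real response fields may be named differently.
interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTime(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3600000);
  const minutes = Math.floor((total % 3600000) / 60000);
  const seconds = Math.floor((total % 60000) / 1000);
  const millis = total % 1000;
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(millis, 3)}`;
}

// Render cues as SRT: 1-based index, time range, text, blank line between cues.
function toSrt(cues: Cue[]): string {
  return cues
    .map(
      (cue, i) =>
        `${i + 1}\n${formatSrtTime(cue.startMs)} --> ${formatSrtTime(cue.endMs)}\n${cue.text}\n`
    )
    .join("\n");
}
```

Calling `toSrt` with two cues at 0 to 3197 ms and 3442 to 4877 ms reproduces the timestamps shown in the sample above.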

Error Codes

| Code | Meaning | Description |
|------|---------|-------------|
| 0 | Success | - |
| 2000 | Processing | Task is being processed |
| 1001 | Invalid parameters | Missing/invalid request parameters |
| 1002 | No permission | Token invalid/expired |
| 1003 | Rate limited | QPS exceeded |
| 1010 | Audio too long | Duration exceeded threshold |
| 1012 | Invalid audio format | Audio decode failure |
| 1013 | Silent audio | No speech detected |
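When handling responses in your own code, the table can be turned into a lookup. The sketch below copies the codes and meanings from the table. The helper names are hypothetical, and treating 2000 (still processing) and 1003 (rate limited) as worth retrying is an assumption of this example, not something the package documents.

```typescript
interface ErrorInfo {
  meaning: string;
  description: string;
}

// Codes, meanings, and descriptions copied from the Error Codes table.
const ERROR_CODES = new Map<number, ErrorInfo>([
  [0, { meaning: "Success", description: "-" }],
  [2000, { meaning: "Processing", description: "Task is being processed" }],
  [1001, { meaning: "Invalid parameters", description: "Missing/invalid request parameters" }],
  [1002, { meaning: "No permission", description: "Token invalid/expired" }],
  [1003, { meaning: "Rate limited", description: "QPS exceeded" }],
  [1010, { meaning: "Audio too long", description: "Duration exceeded threshold" }],
  [1012, { meaning: "Invalid audio format", description: "Audio decode failure" }],
  [1013, { meaning: "Silent audio", description: "No speech detected" }],
]);

// Human-readable summary for a code, with a fallback for unlisted codes.
function describeCode(code: number): string {
  const info = ERROR_CODES.get(code);
  return info ? `${code}: ${info.meaning} (${info.description})` : `Unknown code ${code}`;
}

// Assumption: only "processing" and "rate limited" are worth retrying.
function isRetryable(code: number): boolean {
  return code === 2000 || code === 1003;
}
```

A caller might poll again while `isRetryable` returns true and surface `describeCode` to the user otherwise.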

Development

Build

pnpm build

Test

pnpm test

Local Testing

# Build first
pnpm build

# Run locally
node dist/index.js

Support

For issues and questions: