@visionengine/subtitle-generate

v1.0.2




Maintainer: yc.ma

Keywords: mcp, subtitle, caption, speech-to-text, video-caption, audio-recognition, visionengine


VisionEngine Subtitle Generation MCP Server - Generate subtitles from audio/video files with automatic timing alignment using ByteDance's speech recognition API.

Features

Subtitle Generation - Generate subtitles from audio/video files with speech recognition
Subtitle Alignment - Align existing subtitle text with audio for precise timing
Multiple Languages - Support for Chinese, English, Japanese, Korean, and more
SRT Output - Automatically save subtitles in SRT format
Word-level Timing - Get precise timing for each word/character
Speaker Detection - Optional speaker identification

Installation

As MCP Server

Add to your MCP client configuration:

{
  "mcpServers": {
    "ve-subtitle-generate": {
      "type": "local",
      "command": "npx",
      "args": ["-y", "@visionengine/subtitle-generate@latest"],
      "transport": "stdio",
      "env": {
        "APP_ID": "your_app_id",
        "ACCESS_TOKEN": "your_access_token",
        "WORKDIR": "./media"
      }
    }
  }
}

As NPM Package

npm install -g @visionengine/subtitle-generate

Configuration

Environment variables:

API_BASE_URL - API endpoint (default: https://openspeech.bytedance.com)
APP_ID - Your application ID (required)
ACCESS_TOKEN - Your Bearer token for authentication (required)
WORKDIR - Base directory for relative file paths (default: ./)
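The variables above can be read with their defaults applied roughly as follows. This is a minimal sketch, not the package's actual source: `SubtitleConfig`, `loadConfig`, and `resolveMediaPath` are hypothetical names, and resolving relative paths against WORKDIR with Node's `path.resolve` is our assumption about how "relative to WORKDIR" behaves.

```typescript
import * as path from "node:path";

// Hypothetical settings object mirroring the documented environment variables.
interface SubtitleConfig {
  apiBaseUrl: string;
  appId: string;
  accessToken: string;
  workdir: string;
}

type Env = Record<string, string | undefined>;

// Read the documented variables, apply the documented defaults,
// and fail fast when a required one is missing.
function loadConfig(env: Env): SubtitleConfig {
  const appId = env.APP_ID;
  const accessToken = env.ACCESS_TOKEN;
  if (!appId) throw new Error("APP_ID is required");
  if (!accessToken) throw new Error("ACCESS_TOKEN is required");
  return {
    apiBaseUrl: env.API_BASE_URL ?? "https://openspeech.bytedance.com",
    appId,
    accessToken,
    workdir: env.WORKDIR ?? "./",
  };
}

// Absolute paths pass through unchanged; relative ones are resolved against WORKDIR.
function resolveMediaPath(config: SubtitleConfig, audioPath: string): string {
  return path.isAbsolute(audioPath)
    ? audioPath
    : path.resolve(config.workdir, audioPath);
}
```

In a Node process you would call `loadConfig(process.env)` once at startup.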

Tools

subtitle_generate

Generate subtitles from audio/video files using speech recognition.

Parameters:

audioPath (string, required) - Audio/video file path (relative to WORKDIR or absolute)
language (string, optional) - Language code: zh-CN, en-US, ja-JP, ko-KR, etc.
wordsPerLine (number, optional) - Maximum words per line (default: 46)
maxLines (number, optional) - Maximum lines per screen (default: 1)
useItn (boolean, optional) - Apply inverse text normalization (ITN), e.g. convert Chinese numerals to Arabic numerals
captionType (enum, optional) - 'auto', 'speech', or 'singing'
usePunc (boolean, optional) - Add punctuation marks
useDdc (boolean, optional) - Add silence annotations
withSpeakerInfo (boolean, optional) - Return speaker information
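If you call this tool from TypeScript, the parameters above can be captured in a type with the two documented defaults (46 and 1) filled in. This is a sketch only: `SubtitleGenerateArgs` and `withDefaults` are hypothetical names rather than exports of this package, and the positive-integer check is our own assumption, not documented server behavior.

```typescript
// Hypothetical typing of the subtitle_generate arguments.
type CaptionType = "auto" | "speech" | "singing";

interface SubtitleGenerateArgs {
  audioPath: string;
  language?: string;
  wordsPerLine?: number;
  maxLines?: number;
  useItn?: boolean;
  captionType?: CaptionType;
  usePunc?: boolean;
  useDdc?: boolean;
  withSpeakerInfo?: boolean;
}

// Fill in the documented defaults and reject obviously invalid values.
function withDefaults(
  args: SubtitleGenerateArgs,
): SubtitleGenerateArgs & { wordsPerLine: number; maxLines: number } {
  if (!args.audioPath) throw new Error("audioPath is required");
  const wordsPerLine = args.wordsPerLine ?? 46;
  const maxLines = args.maxLines ?? 1;
  if (!Number.isInteger(wordsPerLine) || wordsPerLine <= 0) {
    throw new Error("wordsPerLine must be a positive integer");
  }
  if (!Number.isInteger(maxLines) || maxLines <= 0) {
    throw new Error("maxLines must be a positive integer");
  }
  return { ...args, wordsPerLine, maxLines };
}
```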

Supported Languages:

| Language | Code | Recommended wordsPerLine |
|----------|------|--------------------------|
| Chinese (Simplified) | zh-CN | 15 |
| Cantonese | yue | 15 |
| English (US) | en-US | 55 |
| Japanese | ja-JP | 32 |
| Korean | ko-KR | 32 |
| Spanish (Mexico) | es-MX | 55 |
| Russian | ru-RU | 55 |
| French | fr-FR | 55 |
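To pick `wordsPerLine` from the language code, the table can be turned into a lookup. This is a sketch: `recommendedWordsPerLine` is a hypothetical helper, and falling back to the tool's documented default of 46 for unlisted languages is our choice.

```typescript
// Recommended wordsPerLine values, copied from the language table.
const RECOMMENDED_WORDS_PER_LINE = new Map<string, number>([
  ["zh-CN", 15],
  ["yue", 15],
  ["en-US", 55],
  ["ja-JP", 32],
  ["ko-KR", 32],
  ["es-MX", 55],
  ["ru-RU", 55],
  ["fr-FR", 55],
]);

// Fall back to the documented default (46) when no recommendation exists.
function recommendedWordsPerLine(language?: string): number {
  if (language === undefined) return 46;
  return RECOMMENDED_WORDS_PER_LINE.get(language) ?? 46;
}
```

For example, `recommendedWordsPerLine("ja-JP")` returns 32, which you would pass as `wordsPerLine` alongside `language: "ja-JP"`.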

Example:

// Basic usage
await subtitle_generate({
  audioPath: "./video.mp4"
});

// With options
await subtitle_generate({
  audioPath: "./video.mp4",
  language: "zh-CN",
  wordsPerLine: 15,
  maxLines: 2,
  captionType: "speech",
  usePunc: true
});

Output:

SRT file saved in the same directory as the input file
JSON response with utterances and timing information
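The SRT file and the JSON utterances carry the same information. The sketch below shows one way to render utterances as SRT. It assumes each utterance has `start_time` and `end_time` in milliseconds plus `text`; those field names are an assumption and may not match this server's actual JSON response.

```typescript
// Assumed utterance shape (times in milliseconds); field names are hypothetical.
interface Utterance {
  start_time: number;
  end_time: number;
  text: string;
}

// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm.
function toSrtTime(ms: number): string {
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3_600_000);
  const minutes = Math.floor((total % 3_600_000) / 60_000);
  const seconds = Math.floor((total % 60_000) / 1000);
  const millis = total % 1000;
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(millis, 3)}`;
}

// Number cues from 1 and separate them with a blank line.
function utterancesToSrt(utterances: Utterance[]): string {
  return utterances
    .map(
      (u, i) =>
        `${i + 1}\n${toSrtTime(u.start_time)} --> ${toSrtTime(u.end_time)}\n${u.text}\n`,
    )
    .join("\n");
}
```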

subtitle_align

Align existing subtitle text with audio for precise timing.

Parameters:

audioPath (string, required) - Audio/video file path (relative to WORKDIR or absolute)
subtitleText (string, required) - The subtitle text to align with the audio
captionType (enum, required) - 'speech' or 'singing'
staPuncMode (enum, optional) - Punctuation mode: '1', '2', or '3'

Punctuation Modes:

1 (default) - Omit trailing punctuation from alignment results
2 - Replace punctuation with spaces
3 - Keep original punctuation
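The three modes can be illustrated on a single line of text. This is only our reading of the descriptions above, not the server's algorithm: `applyPuncMode` is a hypothetical helper, and the punctuation set (ASCII plus common CJK marks) is an assumption.

```typescript
type StaPuncMode = "1" | "2" | "3";

// Punctuation set assumed for this illustration.
const PUNCT = /[.,!?;:，。！？；：、]/g;
const TRAILING_PUNCT = /[.,!?;:，。！？；：、]+$/;

// Illustrative only: one plausible reading of each mode applied to one line.
function applyPuncMode(text: string, mode: StaPuncMode = "1"): string {
  switch (mode) {
    case "1":
      // Mode 1: drop punctuation at the end of the line.
      return text.replace(TRAILING_PUNCT, "");
    case "2":
      // Mode 2: turn each mark into a space, then collapse repeated spaces.
      return text.replace(PUNCT, " ").replace(/\s+/g, " ").trim();
    default:
      // Mode 3: keep the original punctuation.
      return text;
  }
}
```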

Example:

// Align speech subtitle
await subtitle_align({
  audioPath: "./speech.wav",
  subtitleText: "Hello, welcome to our presentation today.",
  captionType: "speech"
});

// Align song lyrics
await subtitle_align({
  audioPath: "./song.mp3",
  subtitleText: "这是一首美丽的歌曲", // "This is a beautiful song"
  captionType: "singing",
  staPuncMode: "3"
});

Output:

SRT file saved with an _aligned suffix in its filename
JSON response with word-level timing information
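Word-level timings can be regrouped into subtitle cues on the client side, for example to re-wrap aligned lyrics. This is a sketch: the `WordTiming` shape (`text`, `startMs`, `endMs`) is an assumption about the JSON response, and `groupWords` is a hypothetical helper. Each cue runs from its first word's start to its last word's end.

```typescript
// Assumed shape of one word-level timing entry (milliseconds).
interface WordTiming {
  text: string;
  startMs: number;
  endMs: number;
}

interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

// Group consecutive words into cues of at most maxWords words.
// Use joiner "" for languages written without spaces (e.g. Chinese).
function groupWords(words: WordTiming[], maxWords: number, joiner = " "): Cue[] {
  if (maxWords <= 0) throw new Error("maxWords must be positive");
  const cues: Cue[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const chunk = words.slice(i, i + maxWords);
    cues.push({
      startMs: chunk[0].startMs,
      endMs: chunk[chunk.length - 1].endMs,
      text: chunk.map((w) => w.text).join(joiner),
    });
  }
  return cues;
}
```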

Usage Examples

MCP Client

Once configured as an MCP server, the tools are available through your MCP client:

> Generate subtitles for video.mp4
> Align these lyrics with song.mp3: "今天天气真好..."
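Behind such a prompt, the MCP client invokes a tool by writing a JSON-RPC 2.0 `tools/call` request, one JSON object per line, to the server's stdin. The sketch below builds that message under our understanding of the MCP stdio transport; your client normally does this for you, the required `initialize` handshake is not shown, and `toolCallMessage` is a hypothetical helper.

```typescript
// JSON-RPC 2.0 request shape used by MCP for tool calls.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: { name: string; arguments: Record<string, unknown> };
}

// Build the newline-delimited message that invokes one tool over stdio.
function toolCallMessage(
  id: number,
  name: string,
  args: Record<string, unknown>,
): string {
  const request: JsonRpcRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
  return JSON.stringify(request) + "\n";
}
```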

Direct Usage

# Install globally
npm install -g @visionengine/subtitle-generate

# Set environment variables
export APP_ID="your_app_id"
export ACCESS_TOKEN="your_access_token"
export WORKDIR="./media"

# Run the server
ve-subtitle-generate

Output Format

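The tools save subtitles in SRT format. The sample below contains Chinese speech; in English the two cues read "If you have nothing else to report, I'll hang up now." and "Have a nice day, goodbye.":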

1
00:00:00,000 --> 00:00:03,197
如果您没有其他需要举报的话这边就先挂断了

2
00:00:03,442 --> 00:00:04,877
祝您生活愉快再见
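To post-process a saved SRT file, it can be parsed back into cues. This is a minimal sketch that handles the layout shown above (index line, timing line, text lines, blank-line separators); `parseSrt` and `parseSrtTime` are hypothetical helpers, not part of this package.

```typescript
interface SrtCue {
  index: number;
  startMs: number;
  endMs: number;
  text: string;
}

// Convert "HH:MM:SS,mmm" to milliseconds.
function parseSrtTime(ts: string): number {
  const m = /^(\d{2}):(\d{2}):(\d{2}),(\d{3})$/.exec(ts.trim());
  if (!m) throw new Error(`Bad SRT timestamp: ${ts}`);
  const [, h, min, s, ms] = m.map(Number);
  return ((h * 60 + min) * 60 + s) * 1000 + ms;
}

// Split an SRT document into cues separated by blank lines.
function parseSrt(srt: string): SrtCue[] {
  return srt
    .replace(/\r\n/g, "\n")
    .trim()
    .split(/\n\s*\n/)
    .map((block) => {
      const [indexLine, timingLine = "", ...textLines] = block.split("\n");
      const parts = timingLine.split("-->");
      if (parts.length !== 2) throw new Error(`Bad timing line: ${timingLine}`);
      return {
        index: Number(indexLine),
        startMs: parseSrtTime(parts[0]),
        endMs: parseSrtTime(parts[1]),
        text: textLines.join("\n"),
      };
    });
}
```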

Error Codes

| Code | Meaning | Description |
|------|---------|-------------|
| 0 | Success | - |
| 2000 | Processing | Task is being processed |
| 1001 | Invalid parameters | Missing/invalid request parameters |
| 1002 | No permission | Token invalid/expired |
| 1003 | Rate limited | QPS exceeded |
| 1010 | Audio too long | Duration exceeded threshold |
| 1012 | Invalid audio format | Audio decode failure |
| 1013 | Silent audio | No speech detected |
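A client can map these codes to an action. The policy below is our suggestion, not one prescribed by the API: keep polling on 2000, retry with exponential backoff on 1003, and treat every other non-zero code as fatal. `actionFor`, `describeStatus`, and `backoffDelayMs` are hypothetical helpers, and the 500 ms base and 10 s cap are arbitrary example values.

```typescript
// Documented status codes, copied from the error table.
const STATUS_MESSAGES = new Map<number, string>([
  [0, "Success"],
  [2000, "Processing"],
  [1001, "Invalid parameters"],
  [1002, "No permission"],
  [1003, "Rate limited"],
  [1010, "Audio too long"],
  [1012, "Invalid audio format"],
  [1013, "Silent audio"],
]);

type Action = "done" | "poll" | "retry_with_backoff" | "fail";

// Suggested policy: poll while processing, back off when rate limited.
function actionFor(code: number): Action {
  switch (code) {
    case 0:
      return "done";
    case 2000:
      return "poll";
    case 1003:
      return "retry_with_backoff";
    default:
      return "fail";
  }
}

function describeStatus(code: number): string {
  return STATUS_MESSAGES.get(code) ?? `Unknown status ${code}`;
}

// Exponential backoff: baseMs * 2^attempt, capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 10_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```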

Development

Build

pnpm build

Test

pnpm test

Local Testing

# Build first
pnpm build

# Run locally
node dist/index.js

Support

For issues and questions:

Email: [email protected]
Website: https://visionengine-tech.com
