@visionengine/subtitle-generate
v1.0.2
VisionEngine Subtitle Generation MCP Server - Generate subtitles from audio/video files with automatic timing alignment using ByteDance's speech recognition API.
Features
- Subtitle Generation - Generate subtitles from audio/video files with speech recognition
- Subtitle Alignment - Align existing subtitle text with audio for precise timing
- Multiple Languages - Support for Chinese, English, Japanese, Korean, and more
- SRT Output - Automatically save subtitles in SRT format
- Word-level Timing - Get precise timing for each word/character
- Speaker Detection - Optional speaker identification
Installation
As MCP Server
Add to your MCP client configuration:

    {
      "mcpServers": {
        "ve-subtitle-generate": {
          "type": "local",
          "command": "npx",
          "args": ["-y", "@visionengine/subtitle-generate@latest"],
          "transport": "stdio",
          "env": {
            "APP_ID": "your_app_id",
            "ACCESS_TOKEN": "your_access_token",
            "WORKDIR": "./media"
          }
        }
      }
    }

As NPM Package

    npm install -g @visionengine/subtitle-generate

Configuration
Environment variables:
- `API_BASE_URL` - API endpoint (default: https://openspeech.bytedance.com)
- `APP_ID` - Your application ID (required)
- `ACCESS_TOKEN` - Your Bearer token for authentication (required)
- `WORKDIR` - Base directory for relative file paths (default: ./)
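As an illustration of how these variables and their defaults fit together, here is a minimal sketch. The `loadConfig` helper is hypothetical and not part of the package; only the variable names, the required/optional split, and the default values are taken from the list above.

```javascript
// Hypothetical helper (not part of the package): resolves settings from an
// env-like object, applying the documented defaults and rejecting a missing
// APP_ID or ACCESS_TOKEN.
function loadConfig(env) {
  const missing = ["APP_ID", "ACCESS_TOKEN"].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    apiBaseUrl: env.API_BASE_URL || "https://openspeech.bytedance.com",
    appId: env.APP_ID,
    accessToken: env.ACCESS_TOKEN,
    workdir: env.WORKDIR || "./",
  };
}

// Under Node.js you would typically pass process.env instead of a literal.
const config = loadConfig({ APP_ID: "your_app_id", ACCESS_TOKEN: "your_access_token" });
console.log(config.apiBaseUrl); // https://openspeech.bytedance.com
console.log(config.workdir);    // ./
```

Failing fast on missing credentials gives a clearer message than waiting for the API to reject the request.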
Tools
subtitle_generate
Generate subtitles from audio/video files using speech recognition.
Parameters:
- `audioPath` (string, required) - Audio/video file path (relative to WORKDIR or absolute)
- `language` (string, optional) - Language code: zh-CN, en-US, ja-JP, ko-KR, etc.
- `wordsPerLine` (number, optional) - Maximum words per line (default: 46)
- `maxLines` (number, optional) - Maximum lines per screen (default: 1)
- `useItn` (boolean, optional) - Convert Chinese numbers to Arabic numerals
- `captionType` (enum, optional) - 'auto', 'speech', or 'singing'
- `usePunc` (boolean, optional) - Add punctuation marks
- `useDdc` (boolean, optional) - Add silence annotations
- `withSpeakerInfo` (boolean, optional) - Return speaker information
Supported Languages:
| Language | Code | Recommended `wordsPerLine` |
|----------|------|---------------------------|
| Chinese (Simplified) | zh-CN | 15 |
| Cantonese | yue | 15 |
| English (US) | en-US | 55 |
| Japanese | ja-JP | 32 |
| Korean | ko-KR | 32 |
| Spanish | es-MX | 55 |
| Russian | ru-RU | 55 |
| French | fr-FR | 55 |
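The recommended values in the table differ from the documented default of 46, so it can help to choose `wordsPerLine` from the language code. The sketch below shows one way to do that. `buildGenerateArgs` is a hypothetical helper, not something the package exports; the numbers are copied from the table.

```javascript
// Recommended wordsPerLine per language code, copied from the table above.
const RECOMMENDED_WORDS_PER_LINE = new Map([
  ["zh-CN", 15], ["yue", 15], ["en-US", 55], ["ja-JP", 32],
  ["ko-KR", 32], ["es-MX", 55], ["ru-RU", 55], ["fr-FR", 55],
]);

// Hypothetical helper: builds an argument object for subtitle_generate.
// For a language outside the table, wordsPerLine is left unset so the
// documented default applies. Explicit overrides always win.
function buildGenerateArgs(audioPath, language, overrides = {}) {
  const args = { audioPath };
  if (language) {
    args.language = language;
    const recommended = RECOMMENDED_WORDS_PER_LINE.get(language);
    if (recommended !== undefined) args.wordsPerLine = recommended;
  }
  return { ...args, ...overrides };
}

const exampleArgs = buildGenerateArgs("./video.mp4", "ja-JP", { maxLines: 2 });
console.log(exampleArgs.wordsPerLine); // 32
```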
Example:

    // Basic usage
    await subtitle_generate({
      audioPath: "./video.mp4"
    });

    // With options
    await subtitle_generate({
      audioPath: "./video.mp4",
      language: "zh-CN",
      wordsPerLine: 15,
      maxLines: 2,
      captionType: "speech",
      usePunc: true
    });

Output:
- SRT file saved in the same directory as the input file
- JSON response with utterances and timing information
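If you want to post-process the saved SRT file, a small parser is usually enough. The sketch below assumes the file follows the common SRT layout: an index line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, one or more text lines, and a blank line between cues. `srtTimeToMs` and `parseSrt` are illustrative names, not part of the package.

```javascript
// Converts an SRT timestamp such as "00:00:03,197" to milliseconds.
function srtTimeToMs(timestamp) {
  const match = /^(\d{2,}):(\d{2}):(\d{2})[,.](\d{3})$/.exec(timestamp.trim());
  if (!match) throw new Error(`Invalid SRT timestamp: ${timestamp}`);
  const [, h, m, s, ms] = match.map(Number);
  return ((h * 60 + m) * 60 + s) * 1000 + ms;
}

// Parses SRT text into cues of the form { index, startMs, endMs, text }.
// Assumes cues are separated by at least one blank line.
function parseSrt(srt) {
  return srt
    .replace(/\r\n/g, "\n")
    .trim()
    .split(/\n{2,}/)
    .filter((block) => block.trim() !== "")
    .map((block) => {
      const [indexLine, timeLine, ...textLines] = block.split("\n");
      const [start, end] = timeLine.split("-->");
      return {
        index: Number(indexLine),
        startMs: srtTimeToMs(start),
        endMs: srtTimeToMs(end),
        text: textLines.join("\n"),
      };
    });
}

const sample = "1\n00:00:00,000 --> 00:00:03,197\nHello\n\n2\n00:00:03,442 --> 00:00:04,877\nGoodbye\n";
const cues = parseSrt(sample);
console.log(cues.length);     // 2
console.log(cues[1].startMs); // 3442
```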
subtitle_align
Align existing subtitle text with audio for precise timing.
Parameters:
- `audioPath` (string, required) - Audio/video file path (relative to WORKDIR or absolute)
- `subtitleText` (string, required) - The subtitle text to align with the audio
- `captionType` (enum, required) - 'speech' or 'singing'
- `staPuncMode` (enum, optional) - Punctuation mode: '1', '2', or '3'
Punctuation Modes:
- `1` (default) - Omit trailing punctuation from alignment results
- `2` - Replace punctuation with spaces
- `3` - Keep original punctuation
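Unlike `subtitle_generate`, `subtitle_align` requires `captionType` and does not list `'auto'` as a value, and the punctuation modes are written as strings rather than numbers. A client-side check along the following lines can catch those mistakes before a request is sent. `normalizeAlignArgs` is a hypothetical helper, and the server's own validation may behave differently.

```javascript
const ALIGN_CAPTION_TYPES = ["speech", "singing"];
const PUNC_MODES = ["1", "2", "3"];

// Hypothetical helper: validates subtitle_align arguments against the
// documented parameters and fills in the documented default mode "1".
function normalizeAlignArgs({ audioPath, subtitleText, captionType, staPuncMode = "1" }) {
  if (!audioPath) throw new Error("audioPath is required");
  if (!subtitleText) throw new Error("subtitleText is required");
  if (!ALIGN_CAPTION_TYPES.includes(captionType)) {
    throw new Error(`captionType must be one of: ${ALIGN_CAPTION_TYPES.join(", ")}`);
  }
  // Accept 3 as well as "3" by coercing to the string form used in the docs.
  const mode = String(staPuncMode);
  if (!PUNC_MODES.includes(mode)) {
    throw new Error(`staPuncMode must be one of: ${PUNC_MODES.join(", ")}`);
  }
  return { audioPath, subtitleText, captionType, staPuncMode: mode };
}

const alignArgs = normalizeAlignArgs({
  audioPath: "./speech.wav",
  subtitleText: "Hello, welcome to our presentation today.",
  captionType: "speech",
});
console.log(alignArgs.staPuncMode); // 1
```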
Example:

    // Align speech subtitle
    await subtitle_align({
      audioPath: "./speech.wav",
      subtitleText: "Hello, welcome to our presentation today.",
      captionType: "speech"
    });

    // Align song lyrics
    await subtitle_align({
      audioPath: "./song.mp3",
      subtitleText: "这是一首美丽的歌曲",
      captionType: "singing",
      staPuncMode: "3"
    });

Output:
- SRT file saved with `_aligned` suffix
- JSON response with word-level timing information
Usage Examples
MCP Client
Once configured as an MCP server, the tools are available through your MCP client:

    > Generate subtitles for video.mp4
    > Align these lyrics with song.mp3: "今天天气真好..."

Direct Usage

    # Install globally
    npm install -g @visionengine/subtitle-generate

    # Set environment variables
    export APP_ID="your_app_id"
    export ACCESS_TOKEN="your_access_token"
    export WORKDIR="./media"

    # Run the server
    ve-subtitle-generate

Output Format
The tools save subtitles in SRT format:

    1
    00:00:00,000 --> 00:00:03,197
    如果您没有其他需要举报的话这边就先挂断了

    2
    00:00:03,442 --> 00:00:04,877
    祝您生活愉快再见

Error Codes
| Code | Meaning | Description |
|------|---------|-------------|
| 0 | Success | - |
| 2000 | Processing | Task is being processed |
| 1001 | Invalid parameters | Missing/invalid request parameters |
| 1002 | No permission | Token invalid/expired |
| 1003 | Rate limited | QPS exceeded |
| 1010 | Audio too long | Duration exceeded threshold |
| 1012 | Invalid audio format | Audio decode failure |
| 1013 | Silent audio | No speech detected |
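A caller that sees these codes may want to sort them into "finished", "still running", "worth retrying", and "give up". The sketch below is one possible grouping. The codes and meanings come from the table, but the `action` values and the `classifyResponseCode` helper are assumptions made for illustration, not behavior documented by the package.

```javascript
// Codes and meanings copied from the table above. The "action" field is an
// assumed client-side policy, not something the server or package defines.
const RESPONSE_CODES = new Map([
  [0, { meaning: "Success", action: "done" }],
  [2000, { meaning: "Processing", action: "poll" }],
  [1001, { meaning: "Invalid parameters", action: "fail" }],
  [1002, { meaning: "No permission", action: "fail" }],
  [1003, { meaning: "Rate limited", action: "retry" }],
  [1010, { meaning: "Audio too long", action: "fail" }],
  [1012, { meaning: "Invalid audio format", action: "fail" }],
  [1013, { meaning: "Silent audio", action: "fail" }],
]);

// Hypothetical helper: looks up a code and treats anything undocumented
// as a failure so it is never silently ignored.
function classifyResponseCode(code) {
  const entry = RESPONSE_CODES.get(code);
  return entry ? { code, ...entry } : { code, meaning: "Unknown", action: "fail" };
}

console.log(classifyResponseCode(2000).action); // poll
console.log(classifyResponseCode(1003).action); // retry
```

Only rate limiting (1003) is treated as retryable here, on the assumption that the other errors will not resolve without changing the input or the credentials.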
Development
Build

    pnpm build

Test

    pnpm test

Local Testing

    # Build first
    pnpm build

    # Run locally
    node dist/index.js

Support
For issues and questions:
- Email: [email protected]
- Website: https://visionengine-tech.com
