@visionengine/audio-tts
v1.0.4
VisionEngine Audio TTS MCP Server - Text-to-speech synthesis and voice list query service
VisionEngine Audio TTS MCP Server - Text-to-speech synthesis using Volcengine TTS API with support for multiple voices and languages.
Features
- Text-to-Speech Synthesis - Convert text to natural-sounding speech audio files
- Multiple Voices - Support for various voice types including male/female voices with different styles
- Voice Query - Filter available voices by language
- Audio Customization - Adjust speech rate, volume, pitch, and emotion
- Multiple Formats - Support for MP3, OGG Opus, and PCM output formats
- TTS 2.0 Support - Context-aware speech synthesis with style hints
Installation
As MCP Server
Add to your MCP client configuration:
{
  "mcpServers": {
    "ve-audio-tts": {
      "type": "local",
      "command": "npx",
      "args": ["-y", "@visionengine/audio-tts@latest"],
      "transport": "stdio",
      "env": {
        "API_URL": "https://openspeech.bytedance.com/api/v3/tts/unidirectional",
        "APP_ID": "your_app_id",
        "ACCESS_TOKEN": "your_access_key",
        "RESOURCE_ID": "seed-tts-2.0",
        "WORKDIR": "./public"
      }
    }
  }
}
As NPM Package
npm install -g @visionengine/audio-tts
Configuration
Environment variables:
- API_URL - TTS API endpoint (default: https://openspeech.bytedance.com/api/v3/tts/unidirectional)
- APP_ID - Your Volcengine App ID (required)
- ACCESS_TOKEN - Your Volcengine Access Key (required)
- RESOURCE_ID - TTS resource ID (default: seed-tts-2.0)
- WORKDIR - Directory for saving generated audio files (default: ./)
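The defaults and required variables above can be sketched as a small loader. This is an illustration only, not the package's actual code: `TtsConfig` and `loadConfig` are hypothetical names, and the environment is passed in as a plain object so the sketch does not depend on any runtime.

```typescript
// Hypothetical config shape; field names are assumptions, values mirror the documented variables.
interface TtsConfig {
  apiUrl: string;
  appId: string;
  accessToken: string;
  resourceId: string;
  workdir: string;
}

type Env = Record<string, string | undefined>;

// Applies the documented defaults and rejects a missing APP_ID or ACCESS_TOKEN.
function loadConfig(env: Env): TtsConfig {
  const missing = ["APP_ID", "ACCESS_TOKEN"].filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error("Missing required environment variables: " + missing.join(", "));
  }
  return {
    apiUrl: env["API_URL"] ?? "https://openspeech.bytedance.com/api/v3/tts/unidirectional",
    appId: env["APP_ID"] as string,
    accessToken: env["ACCESS_TOKEN"] as string,
    resourceId: env["RESOURCE_ID"] ?? "seed-tts-2.0",
    workdir: env["WORKDIR"] ?? "./",
  };
}
```

In a Node.js process, `loadConfig(process.env)` would be the natural call; failing fast on the two required variables avoids sending unauthenticated requests to the API.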
Tools
tts
Synthesize speech from text and save to an audio file.
Parameters:
- text (string, required) - Text content to synthesize into speech
- speaker (string, required) - Voice speaker ID (e.g., 'zh_female_vv_uranus_bigtts')
- format (string, optional) - Audio format: mp3 (default), ogg_opus, or pcm
- sampleRate (number, optional) - Audio sample rate: 8000, 16000, 22050, 24000, 32000, 44100, 48000 (default: 24000)
- speechRate (number, optional) - Speech rate: -50 (0.5x) to 100 (2.0x), default: 0
- loudnessRate (number, optional) - Volume: -50 (0.5x) to 100 (2.0x), default: 0
- emotion (string, optional) - Emotion setting for supported voices (e.g., 'happy', 'sad')
- emotionScale (number, optional) - Emotion intensity: 1-5, default: 4
- contextTexts (string[], optional) - Context hints for TTS 2.0 to adjust style
- explicitLanguage (string, optional) - Explicit language: zh-cn, en, ja, es-mx, id, pt-br, de, fr
- pitch (number, optional) - Pitch adjustment: -12 to 12, default: 0
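The documented ranges lend themselves to a client-side check before calling the tool. The sketch below is a hypothetical helper, not part of the package: `TtsParams`, `validateTtsParams`, and `rateToMultiplier` are illustrative names. The linear rate mapping is an assumption that happens to agree with the three documented points (-50 is 0.5x, 0 is 1.0x, 100 is 2.0x); the server may interpolate differently.

```typescript
// Hypothetical parameter type covering the numeric options documented above.
interface TtsParams {
  text: string;
  speaker: string;
  format?: "mp3" | "ogg_opus" | "pcm";
  sampleRate?: number;
  speechRate?: number;
  loudnessRate?: number;
  emotionScale?: number;
  pitch?: number;
}

const SAMPLE_RATES = [8000, 16000, 22050, 24000, 32000, 44100, 48000];

// Returns a list of human-readable problems; an empty list means the params look valid.
function validateTtsParams(p: TtsParams): string[] {
  const errors: string[] = [];
  const inRange = (name: string, v: number | undefined, min: number, max: number): void => {
    if (v !== undefined && (v < min || v > max)) {
      errors.push(name + " must be between " + min + " and " + max);
    }
  };
  if (p.text.trim() === "") errors.push("text must not be empty");
  if (p.speaker.trim() === "") errors.push("speaker must not be empty");
  if (p.sampleRate !== undefined && SAMPLE_RATES.indexOf(p.sampleRate) === -1) {
    errors.push("sampleRate must be one of " + SAMPLE_RATES.join(", "));
  }
  inRange("speechRate", p.speechRate, -50, 100);
  inRange("loudnessRate", p.loudnessRate, -50, 100);
  inRange("emotionScale", p.emotionScale, 1, 5);
  inRange("pitch", p.pitch, -12, 12);
  return errors;
}

// Assumed linear mapping from the rate value to a playback multiplier.
function rateToMultiplier(rate: number): number {
  return 1 + rate / 100;
}
```

Collecting all problems in one pass, rather than throwing on the first, lets a caller report every invalid field at once.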
Example:
// Basic usage
await tts({
  text: "Hello, welcome to VisionEngine!",
  speaker: "zh_female_vv_uranus_bigtts"
});
// With customization
await tts({
  text: "This is a test with custom settings.",
  speaker: "zh_male_m191_uranus_bigtts",
  format: "mp3",
  speechRate: 10,
  loudnessRate: 5,
  pitch: 2
});
list-voices
Query available TTS voices filtered by language.
Parameters:
- language (string, optional) - Filter by language code: zh, zh-cn, en, ja, es, id, pt, de, fr. Leave empty for all voices.
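The filter accepts both short codes (zh, es, pt) and regional ones (zh-cn), while voices report regional tags in their `languages` field. One plausible reading, sketched below, is that a short code matches any regional variant. This matching rule is an assumption, not documented behavior, and `Voice`, `matchesLanguage`, and `filterVoices` are hypothetical names.

```typescript
// Minimal subset of the voice fields returned by list-voices.
interface Voice {
  voiceType: string;
  name: string;
  languages: string[];
}

// Assumed rule: "zh" matches "zh" and any regional variant such as "zh-cn".
function matchesLanguage(voice: Voice, language?: string): boolean {
  if (!language) return true; // an empty filter returns every voice
  const wanted = language.toLowerCase();
  return voice.languages.some((l) => {
    const tag = l.toLowerCase();
    return tag === wanted || tag.indexOf(wanted + "-") === 0;
  });
}

// Returns the matching voices together with their count.
function filterVoices(voices: Voice[], language?: string): { total: number; voices: Voice[] } {
  const matched = voices.filter((v) => matchesLanguage(v, language));
  return { total: matched.length, voices: matched };
}
```

Requiring the hyphen after the prefix keeps a partial code such as "z" from matching "zh-cn".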
Example:
// Get all voices
await listVoices({});
// Get Chinese voices only
await listVoices({
  language: "zh"
});
Response:
{
  "total": 10,
  "language": "zh",
  "voices": [
    {
      "voiceType": "zh_female_vv_uranus_bigtts",
      "name": "Vivi 2.0",
      "gender": "女",
      "age": "青年",
      "description": "语调平稳、咬字柔和、自带治愈安抚力的女声音色",
      "categories": ["通用场景"],
      "languages": ["zh-cn"],
      "trialURL": "https://..."
    }
  ]
}
Usage Examples
MCP Client
Once configured as an MCP server, the tools are available through your MCP client:
> Use the tts tool to generate speech from "Hello World" with speaker zh_female_vv_uranus_bigtts
> Use the list-voices tool to get available Chinese voices
Direct Usage
# Install globally
npm install -g @visionengine/audio-tts
# Set environment variables
export APP_ID="your_app_id"
export ACCESS_TOKEN="your_access_key"
export WORKDIR="./audio"
# Run the server
ve-audio-tts
Claude Desktop Configuration
Add to your Claude Desktop configuration file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "ve-audio-tts": {
      "command": "npx",
      "args": ["-y", "@visionengine/audio-tts@latest"],
      "env": {
        "APP_ID": "your_app_id",
        "ACCESS_TOKEN": "your_access_key",
        "WORKDIR": "/Users/username/Audio"
      }
    }
  }
}
Restart Claude Desktop to apply the configuration.
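Under the hood, an MCP client invokes these tools with JSON-RPC 2.0 `tools/call` requests, which the stdio transport sends as newline-delimited JSON. The sketch below only builds such a message to show its shape. `buildToolCall` is a hypothetical helper, and a real client must complete the MCP `initialize` handshake before sending it.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Hypothetical helper that wraps a tool name and its arguments in a request envelope.
function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const request = buildToolCall("tts", {
  text: "Hello World",
  speaker: "zh_female_vv_uranus_bigtts",
});

// One message per line: serialize without embedded newlines, then terminate with "\n".
const wireLine = JSON.stringify(request) + "\n";
```

In practice an MCP client library handles this framing; the sketch is mainly useful for debugging the server by hand.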
Available Voices
| Voice Type | Name | Gender | Description |
|------------|------|--------|-------------|
| zh_female_vv_uranus_bigtts | Vivi 2.0 | Female | Gentle and soothing female voice |
| zh_female_xiaohe_uranus_bigtts | 小何 2.0 | Female | Sweet and lively young female voice |
| zh_male_taocheng_uranus_bigtts | 小天 2.0 | Male | Clear and warm young male voice |
| zh_male_m191_uranus_bigtts | 云舟 2.0 | Male | Mature and magnetic male voice |
| zh_female_santongyongns_saturn_bigtts | 流畅女声 | Female | Smooth and natural female voice |
| zh_female_meilinvyou_saturn_bigtts | 魅力女友 | Female | Charming and gentle female voice |
Use the list-voices tool to get the complete list.
Development
Build
npm run build
Test
npm test
Local Testing
# Build first
npm run build
# Run locally
node dist/index.js
Supported Audio Formats
- MP3 - Compressed audio (default)
- OGG Opus - High-quality compressed audio
- PCM - Raw uncompressed audio
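Raw PCM carries no header, so most players cannot open it directly. A common workaround is to prepend a 44-byte WAV header. The sketch below assumes 16-bit, mono, little-endian samples at the default 24000 Hz; this README does not state the PCM sample layout, so check those values against the actual output. `wrapPcmAsWav` is a hypothetical helper, not part of the package.

```typescript
// Wraps raw PCM bytes in a canonical 44-byte RIFF/WAVE header.
// ASSUMPTION: 16-bit mono little-endian samples; adjust the defaults if the output differs.
function wrapPcmAsWav(
  pcm: Uint8Array,
  sampleRate: number = 24000,
  channels: number = 1,
  bitsPerSample: number = 16
): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);
  const writeAscii = (offset: number, s: string): void => {
    for (let i = 0; i < s.length; i++) out[offset + i] = s.charCodeAt(i);
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // remaining file size after this field
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size for PCM
  view.setUint16(20, 1, true);              // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true);     // size of the sample data
  out.set(pcm, 44);
  return out;
}
```

Writing the returned bytes to a `.wav` file yields something ordinary players can open, provided the assumed sample layout is correct.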
Support
For issues and questions:
- Email: [email protected]
- Website: https://visionengine-tech.com
