@visionengine/audio-tts

Version: v1.0.4

Published: 10 days ago

VisionEngine Audio TTS MCP Server - Text-to-speech synthesis and voice list query service

Vulnerabilities: 0 High, 0 Medium, 0 Low

yc.ma

Keywords: mcp, tts, text-to-speech, audio, speech-synthesis, volcengine, visionengine

@visionengine/audio-tts

VisionEngine Audio TTS MCP Server - Text-to-speech synthesis using Volcengine TTS API with support for multiple voices and languages.

Features

Text-to-Speech Synthesis - Convert text to natural-sounding speech audio files
Multiple Voices - Support for various voice types including male/female voices with different styles
Voice Query - Filter available voices by language
Audio Customization - Adjust speech rate, volume, pitch, and emotion
Multiple Formats - Support for MP3, OGG Opus, and PCM output formats
TTS 2.0 Support - Context-aware speech synthesis with style hints

Installation

As MCP Server

Add to your MCP client configuration:

{
  "mcpServers": {
    "ve-audio-tts": {
      "type": "local",
      "command": "npx",
      "args": ["-y", "@visionengine/audio-tts@latest"],
      "transport": "stdio",
      "env": {
        "API_URL": "https://openspeech.bytedance.com/api/v3/tts/unidirectional",
        "APP_ID": "your_app_id",
        "ACCESS_TOKEN": "your_access_key",
        "RESOURCE_ID": "seed-tts-2.0",
        "WORKDIR": "./public"
      }
    }
  }
}

As NPM Package

npm install -g @visionengine/audio-tts

Configuration

Environment variables:

API_URL - TTS API endpoint (default: https://openspeech.bytedance.com/api/v3/tts/unidirectional)
APP_ID - Your Volcengine App ID (required)
ACCESS_TOKEN - Your Volcengine Access Key (required)
RESOURCE_ID - TTS resource ID (default: seed-tts-2.0)
WORKDIR - Directory for saving generated audio files (default: ./)
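
The defaults and required variables listed above can be summarized in a small loader. This is only a sketch of how such settings could be resolved, not the package's actual implementation; the names `TtsConfig`, `Env`, and `loadConfig` are hypothetical.

```typescript
// Hypothetical helper: TtsConfig, Env, and loadConfig are illustrative names,
// not part of the package's documented API.
interface TtsConfig {
  apiUrl: string;
  appId: string;
  accessToken: string;
  resourceId: string;
  workdir: string;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): TtsConfig {
  // APP_ID and ACCESS_TOKEN are documented as required.
  const missing = ["APP_ID", "ACCESS_TOKEN"].filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    // The fallback values are the defaults listed in the Configuration section.
    apiUrl: env["API_URL"] ?? "https://openspeech.bytedance.com/api/v3/tts/unidirectional",
    appId: env["APP_ID"] as string,
    accessToken: env["ACCESS_TOKEN"] as string,
    resourceId: env["RESOURCE_ID"] ?? "seed-tts-2.0",
    workdir: env["WORKDIR"] ?? "./",
  };
}
```

In a Node.js process, passing `process.env` as the argument would resolve the same variables that the MCP client configuration sets under `env`.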

Tools

tts

Synthesize speech from text and save to an audio file.

Parameters:

text (string, required) - Text content to synthesize into speech
speaker (string, required) - Voice speaker ID (e.g., 'zh_female_vv_uranus_bigtts')
format (string, optional) - Audio format: mp3 (default), ogg_opus, or pcm
sampleRate (number, optional) - Audio sample rate: 8000, 16000, 22050, 24000, 32000, 44100, 48000 (default: 24000)
speechRate (number, optional) - Speech rate: -50 (0.5x) to 100 (2.0x), default: 0
loudnessRate (number, optional) - Volume: -50 (0.5x) to 100 (2.0x), default: 0
emotion (string, optional) - Emotion setting for supported voices (e.g., 'happy', 'sad')
emotionScale (number, optional) - Emotion intensity: 1-5, default: 4
contextTexts (string[], optional) - Context hints for TTS 2.0 to adjust style
explicitLanguage (string, optional) - Explicit language: zh-cn, en, ja, es-mx, id, pt-br, de, fr
pitch (number, optional) - Pitch adjustment: -12 to 12, default: 0
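
The speechRate and loudnessRate ranges pair each endpoint with a multiplier (-50 with 0.5x, 100 with 2.0x). The sketch below converts between the two, assuming the mapping is linear (multiplier = 1 + rate / 100), which matches both documented endpoints but is not stated by the source. The function names are hypothetical.

```typescript
// Assumption: the mapping between rate and multiplier is linear.
// Documented endpoints: -50 -> 0.5x, 100 -> 2.0x (and a default of 0).
function rateToMultiplier(rate: number): number {
  if (!(rate >= -50 && rate <= 100)) {
    throw new RangeError(`rate must be between -50 and 100, got ${rate}`);
  }
  return 1 + rate / 100;
}

// Inverse direction: pick the integer rate closest to a desired multiplier.
function multiplierToRate(multiplier: number): number {
  if (!(multiplier >= 0.5 && multiplier <= 2.0)) {
    throw new RangeError(`multiplier must be between 0.5 and 2.0, got ${multiplier}`);
  }
  return Math.round((multiplier - 1) * 100);
}
```

Under this assumption, a speechRate of 50 would correspond to 1.5x speed.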

Example:

// Basic usage
await tts({
  text: "Hello, welcome to VisionEngine!",
  speaker: "zh_female_vv_uranus_bigtts"
});

// With customization
await tts({
  text: "This is a test with custom settings.",
  speaker: "zh_male_m191_uranus_bigtts",
  format: "mp3",
  speechRate: 10,
  loudnessRate: 5,
  pitch: 2
});
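
Before sending a request, the documented parameter ranges can be checked on the client side. The following is a minimal sketch of such a check; the `TtsArgs` type and `validateTtsArgs` helper are hypothetical, and the server may apply different or additional rules.

```typescript
// Hypothetical types and helper; field names mirror the documented parameters.
type AudioFormat = "mp3" | "ogg_opus" | "pcm";

interface TtsArgs {
  text: string;
  speaker: string;
  format?: AudioFormat;
  sampleRate?: number;
  speechRate?: number;
  loudnessRate?: number;
  emotion?: string;
  emotionScale?: number;
  contextTexts?: string[];
  explicitLanguage?: string;
  pitch?: number;
}

const SAMPLE_RATES = [8000, 16000, 22050, 24000, 32000, 44100, 48000];

// An omitted optional value is accepted; NaN fails both comparisons.
function inRange(value: number | undefined, min: number, max: number): boolean {
  return value === undefined || (value >= min && value <= max);
}

// Returns a list of problems; an empty list means the arguments look valid.
function validateTtsArgs(args: TtsArgs): string[] {
  const errors: string[] = [];
  if (args.text.trim() === "") errors.push("text must not be empty");
  if (args.speaker.trim() === "") errors.push("speaker must not be empty");
  if (args.sampleRate !== undefined && SAMPLE_RATES.indexOf(args.sampleRate) === -1) {
    errors.push(`sampleRate must be one of ${SAMPLE_RATES.join(", ")}`);
  }
  if (!inRange(args.speechRate, -50, 100)) errors.push("speechRate must be between -50 and 100");
  if (!inRange(args.loudnessRate, -50, 100)) errors.push("loudnessRate must be between -50 and 100");
  if (!inRange(args.emotionScale, 1, 5)) errors.push("emotionScale must be between 1 and 5");
  if (!inRange(args.pitch, -12, 12)) errors.push("pitch must be between -12 and 12");
  return errors;
}
```

Collecting every problem in one pass, rather than throwing on the first, lets a caller report all invalid fields at once.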

list-voices

Query available TTS voices, optionally filtered by language.

Parameters:

language (string, optional) - Filter by language code: zh, zh-cn, en, ja, es, id, pt, de, fr. Leave empty for all voices.

Example:

// Get all voices
await listVoices({});

// Get Chinese voices only
await listVoices({
  language: "zh"
});

Response:

{
  "total": 10,
  "language": "zh",
  "voices": [
    {
      "voiceType": "zh_female_vv_uranus_bigtts",
      "name": "Vivi 2.0",
      "gender": "女",
      "age": "青年",
      "description": "语调平稳、咬字柔和、自带治愈安抚力的女声音色",
      "categories": ["通用场景"],
      "languages": ["zh-cn"],
      "trialURL": "https://..."
    }
  ]
}
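
The response shape shown above can be described with TypeScript types, which makes it easier to pick a speaker ID for the tts tool. This is a sketch inferred from the sample response only; the type names and the `speakerIdsFor` helper are hypothetical, and the real response may contain fields not shown here.

```typescript
// Types inferred from the sample response; not an official schema.
interface Voice {
  voiceType: string;
  name: string;
  gender: string;
  age: string;
  description: string;
  categories: string[];
  languages: string[];
  trialURL: string;
}

interface ListVoicesResponse {
  total: number;
  // Assumption: this field may be absent when no language filter is applied.
  language?: string;
  voices: Voice[];
}

// Hypothetical helper: collect the voiceType IDs of voices that list a language.
function speakerIdsFor(response: ListVoicesResponse, language: string): string[] {
  return response.voices
    .filter((voice) => voice.languages.indexOf(language) !== -1)
    .map((voice) => voice.voiceType);
}
```

Each returned `voiceType` value is what the tts tool expects in its `speaker` parameter.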

Usage Examples

MCP Client

Once configured as an MCP server, the tools are available through your MCP client:

> Use the tts tool to generate speech from "Hello World" with speaker zh_female_vv_uranus_bigtts
> Use the list-voices tool to get the available Chinese voices
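
Behind a prompt like the ones above, an MCP client sends a JSON-RPC 2.0 `tools/call` request to the server. The sketch below builds such a message for the tts tool; `ToolCallRequest` and `buildToolCall` are hypothetical names, and a real client would first complete the MCP initialization handshake, which is omitted here.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that assembles the request for a named tool.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "tts", {
  text: "Hello World",
  speaker: "zh_female_vv_uranus_bigtts",
});

// Over the stdio transport, each message is written as one line of JSON.
console.log(JSON.stringify(request));
```

In practice an MCP client library handles this framing, so the message is mainly useful for debugging or for understanding what the client sends.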

Direct Usage

# Install globally
npm install -g @visionengine/audio-tts

# Set environment variables
export APP_ID="your_app_id"
export ACCESS_TOKEN="your_access_key"
export WORKDIR="./audio"

# Run the server
ve-audio-tts

Claude Desktop Configuration

Add to your Claude Desktop configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "ve-audio-tts": {
      "command": "npx",
      "args": ["-y", "@visionengine/audio-tts@latest"],
      "env": {
        "APP_ID": "your_app_id",
        "ACCESS_TOKEN": "your_access_key",
        "WORKDIR": "/Users/username/Audio"
      }
    }
  }
}

Restart Claude Desktop for the new configuration to take effect.

Available Voices

| Voice Type | Name | Gender | Description |
|------------|------|--------|-------------|
| zh_female_vv_uranus_bigtts | Vivi 2.0 | Female | Gentle and soothing female voice |
| zh_female_xiaohe_uranus_bigtts | 小何 2.0 | Female | Sweet and lively young female voice |
| zh_male_taocheng_uranus_bigtts | 小天 2.0 | Male | Clear and warm young male voice |
| zh_male_m191_uranus_bigtts | 云舟 2.0 | Male | Mature and magnetic male voice |
| zh_female_santongyongns_saturn_bigtts | 流畅女声 | Female | Smooth and natural female voice |
| zh_female_meilinvyou_saturn_bigtts | 魅力女友 | Female | Charming and gentle female voice |

Use the list-voices tool to get the complete list.

Development

Build

npm run build

Test

npm test

Local Testing

# Build first
npm run build

# Run locally
node dist/index.js

Supported Audio Formats

MP3 - Compressed audio (default)
OGG Opus - High-quality compressed audio
PCM - Raw uncompressed audio
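
Because PCM output is raw sample data with no header, most players cannot open it directly. One common workaround is to prepend a 44-byte WAV header. The sketch below does this under loudly stated assumptions: the README does not specify the bit depth or channel count of the pcm output, so 16-bit little-endian mono is assumed here, with the documented default sample rate of 24000. The `wrapPcmAsWav` helper is hypothetical.

```typescript
// Assumption: pcm holds 16-bit little-endian mono samples. The README does not
// state the bit depth or channel count, so adjust the arguments if they differ.
function wrapPcmAsWav(
  pcm: Uint8Array,
  sampleRate: number = 24000,
  channels: number = 1,
  bitsPerSample: number = 16
): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // size of the fmt chunk for PCM
  view.setUint16(20, 1, true);              // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true);     // size of the sample data
  out.set(pcm, 44);                         // samples follow the header
  return out;
}
```

The sample rate passed here should match the `sampleRate` used in the tts request; otherwise playback speed and pitch will be wrong.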

Support

For issues and questions:

Email: [email protected]
Website: https://visionengine-tech.com
