@sylweriusz/mcp-kokoro-voice
v1.2.0
MCP Kokoro Voice - Local-first voice synthesis with Kokoro TTS and macOS fallback for AI agents.
🎵 MCP Nexus Voice v5.0
Local-First Voice Synthesis with Multiple Engine Support
AI agents can express themselves with natural voice synthesis using zero cloud dependencies. Support for XTTS2, Kokoro TTS, and macOS system voice fallback.
✨ Architecture
Multi-Engine Support with Intelligent Fallback:
- 🎤 XTTS2 - High-quality voice synthesis with custom voice support (optional, configured via VOICE_CHANNEL)
- 🎌 Kokoro TTS - Local high-quality synthesis with bf_isabella (female English voice)
- 🍎 macOS Fallback - Automatic fallback to system 'say' command with Zoe (Premium) voice
- 🔒 Security Hardened - Command injection protection, input validation, queue limits
- ⚡ Dual Queue System - Sequential synthesis (1 concurrent worker) with sequential playback
- 🛡️ Production Ready - Mutex protection, DoS prevention, proper error handling, waiting queue with 30s timeout
Zero cloud dependencies. Complete privacy. Enterprise security.
🚀 Quick Start
Prerequisites
- Node.js 18+ - Required for MCP server
- macOS - Required for audio playback (afplay) and fallback voice (say)
- Kokoro TTS Server (Optional) - If running, provides high-quality voice synthesis. Otherwise falls back to macOS system voice
Installation Methods
Method 1: From Distribution Package (USB Drive)
Creating the Package:
# Build distribution package
npm run package
# Output files in dist-packages/:
# - mcp-kokoro-voice-v1.0.0.zip (~927 KB)
# - install.sh
Installing on Target Mac:
- Copy both files from dist-packages/ to a USB drive
- On the target Mac, navigate to the USB drive location
- Run the installer:
chmod +x install.sh
./install.sh
- Follow the on-screen instructions to configure Claude Desktop
- Add the configuration to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"speech": {
"command": "node",
"args": [
"/Users/YOUR_USERNAME/.mcp-servers/mcp-kokoro-voice-v1.0.0/dist/index.js"
],
"env": {
"KOKORO_API_URL": "http://localhost:8880",
"KOKORO_MODEL": "mlx-community/Kokoro-82M-8bit"
}
}
}
}
- Restart Claude Desktop
- Verify with the /mcp command
Method 2: NPM Installation (Recommended)
# Install MCP Kokoro Voice globally
npm install -g @sylweriusz/mcp-kokoro-voice
Claude Desktop Configuration (for NPM installation)
Add this to: ~/Library/Application Support/Claude/claude_desktop_config.json
Option 1: Kokoro TTS (Default)
{
"mcpServers": {
"speech": {
"command": "npx",
"args": ["-y", "@sylweriusz/mcp-kokoro-voice"],
"env": {
"VOICE_CHANNEL": "KOKORO",
"KOKORO_API_URL": "http://localhost:8880"
}
}
}
}
Option 2: XTTS2 with Kokoro Fallback
{
"mcpServers": {
"speech": {
"command": "npx",
"args": ["-y", "@sylweriusz/mcp-kokoro-voice"],
"env": {
"VOICE_CHANNEL": "XTTS2",
"XTTS2_API_URL": "http://your-xtts2-server:5002",
"XTTS2_MALE_VOICE": "patrick_stewart",
"XTTS2_FEMALE_VOICE": "langusta",
"XTTS2_SPEED": "1.0",
"XTTS2_PITCH": "1.0",
"KOKORO_API_URL": "http://localhost:8880"
}
}
}
}
Configuration Notes:
- VOICE_CHANNEL: Choose between XTTS2 or KOKORO (defaults to KOKORO)
- KOKORO_API_URL: Optional (defaults to http://localhost:8880)
- XTTS2_API_URL: Required when using the XTTS2 channel
- XTTS2_MALE_VOICE and XTTS2_FEMALE_VOICE: Custom voice names (optional)
- XTTS2_SPEED: Speech tempo - 1.0=normal, 1.2=20% faster, 0.8=20% slower (optional, defaults to 1.0)
- XTTS2_PITCH: Voice pitch/frequency - 1.0=normal, 1.1=10% higher, 0.9=10% lower (optional, defaults to 1.0)
- If the primary engine is unavailable, the server automatically falls back to the next available engine
- Restart Claude Desktop after configuration changes
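The defaults and requirements listed in these notes can be summarized in a small resolver. This is only a sketch of how the documented rules fit together, not the package's actual code: `resolveVoiceConfig`, `VoiceConfig`, and `parseFactor` are hypothetical names, and treating an invalid number as "use the default" is an assumption.

```typescript
// Hypothetical helper; it mirrors the documented defaults, not the real source.
type VoiceChannel = "XTTS2" | "KOKORO";

interface VoiceConfig {
  channel: VoiceChannel;
  kokoroApiUrl: string;
  xtts2ApiUrl: string | undefined;
  xtts2Speed: number;
  xtts2Pitch: number;
}

type Env = Record<string, string | undefined>;

// Parse a positive numeric factor; fall back when unset or invalid (assumption).
function parseFactor(raw: string | undefined, fallback: number): number {
  const value = Number(raw);
  return raw !== undefined && Number.isFinite(value) && value > 0 ? value : fallback;
}

function resolveVoiceConfig(env: Env): VoiceConfig {
  // VOICE_CHANNEL defaults to KOKORO unless XTTS2 is requested explicitly.
  const channel: VoiceChannel = env["VOICE_CHANNEL"] === "XTTS2" ? "XTTS2" : "KOKORO";
  const xtts2ApiUrl = env["XTTS2_API_URL"];
  if (channel === "XTTS2" && !xtts2ApiUrl) {
    throw new Error("XTTS2_API_URL is required when VOICE_CHANNEL=XTTS2");
  }
  return {
    channel,
    kokoroApiUrl: env["KOKORO_API_URL"] ?? "http://localhost:8880",
    xtts2ApiUrl,
    xtts2Speed: parseFactor(env["XTTS2_SPEED"], 1.0),
    xtts2Pitch: parseFactor(env["XTTS2_PITCH"], 1.0),
  };
}
```

Passing the environment in as a plain object (for example `resolveVoiceConfig(process.env)` in Node.js) keeps the rules easy to check in isolation.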
Verification
# Test installation
npx @sylweriusz/mcp-kokoro-voice
# Should show:
# 🎵 MCP Nexus Voice v1.0 ready
# 🎌 Kokoro TTS: Available (or Unavailable if fallback active)
🎮 Usage
Basic Voice Expression
With XTTS2 (Multi-language):
// English with default settings
say({
text: "Hello! I'm excited to help you today!",
language: "en"
})
// Polish with specific voice
say({
text: "Dzień dobry! Jak się masz?",
language: "pl",
voice: "narrator"
})
// Spanish with custom voice
say({
text: "¡Hola! ¿Cómo estás?",
language: "es",
voice: "langusta"
})
With Kokoro (English only):
// Voice synthesis with automatic fallback
// - Kokoro TTS available: Uses bf_isabella (female English)
// - Kokoro unavailable: Falls back to macOS Zoe (Premium)
say("Hello! I'm excited to help you today!")
Voice Quality Tuning (XTTS2)
Speed and pitch are configured globally via environment variables and apply to all synthesis:
- XTTS2_SPEED: Controls speaking tempo (how fast the voice talks)
- XTTS2_PITCH: Controls voice frequency/octave (how high/low the voice sounds)
Example configurations:
# Natural female voice (recommended starting point)
XTTS2_SPEED=1.04 # Slightly faster for better flow
XTTS2_PITCH=1.10 # Higher pitch for feminine tone
# Natural male voice
XTTS2_SPEED=1.0 # Normal tempo
XTTS2_PITCH=0.95 # Slightly lower for masculine tone
# Fast-paced narration
XTTS2_SPEED=1.2 # 20% faster
XTTS2_PITCH=1.0 # Normal pitch
# Slow, deliberate speech
XTTS2_SPEED=0.85 # 15% slower
XTTS2_PITCH=1.0 # Normal pitch
📝 Text Preprocessing Requirements (CRITICAL)
For optimal synthesis quality, preprocess text according to these guidelines:
1. Language & Translation
- When synthesizing with Kokoro (English only), always provide English text
- Translate non-English text to natural, conversational English
2. TTS Optimization (Critical for Quality)
// Expand abbreviations
say("Doctor Smith called Mister Johnson") // Not: Dr. Smith called Mr. Johnson
// Convert numbers to words
say("one hundred twenty-three") // Not: 123
say("twenty twenty-four") // Not: 2024
// Spell out currency
say("fifty dollars") // Not: $50
say("twenty-five euros") // Not: €25
// Expand dates
say("January first") // Not: Jan 1st
say("December twenty-fifth") // Not: 12/25
// Convert times
say("three thirty PM") // Not: 3:30 PM
say("two PM") // Not: 14:00
// Spell out symbols
say("and") // Not: &
say("percent") // Not: %
say("at") // Not: @
// Handle acronyms with dots for pronunciation
say("N.A.S.A.") // Not: NASA
say("F.B.I.") // Not: FBI
3. Speech Flow Optimization
// Use natural punctuation for speech pauses
say("Welcome to our platform. Let's get started with your project.")
// Break long sentences into speakable segments
say("First, we'll analyze the data. Then, we'll generate the report.")
// Ensure text flows naturally when spoken aloud
say("The meeting is scheduled for tomorrow at nine AM in conference room B.")
Example Transformation
// ❌ Poor: Raw text with abbreviations and symbols
"Dr. Smith earned $1,000 on Jan 1st @ 3:30 PM (approx. 50%)"
// ✅ Good: Preprocessed for optimal synthesis
say("Doctor Smith earned one thousand dollars on January first at three thirty PM, approximately fifty percent")
🛠️ Environment Setup
Environment Variables
# Voice Channel Selection (XTTS2 or KOKORO)
export VOICE_CHANNEL=XTTS2 # or KOKORO (default)
# XTTS2 Configuration (when using XTTS2 channel)
export XTTS2_API_URL=http://your-xtts2-server:5002
export XTTS2_MALE_VOICE=patrick_stewart
export XTTS2_FEMALE_VOICE=langusta
# Optional: Adjust speed and pitch for optimal voice quality
export XTTS2_SPEED=1.0 # Speech tempo: 1.0=normal, 1.04=4% faster, 1.2=20% faster
export XTTS2_PITCH=1.0 # Voice pitch: 1.0=normal, 1.05=5% higher, 1.1=10% higher
# Example: If voice sounds too deep/slow, try these values:
# export XTTS2_SPEED=1.04
# export XTTS2_PITCH=1.10
# Kokoro Configuration
export KOKORO_API_URL=http://localhost:8880 # Optional, defaults to this value
export KOKORO_MODEL=mlx-community/Kokoro-82M-8bit # Optional
Note: The system automatically falls back through available engines: XTTS2 → Kokoro → macOS system voice.
📊 Engine Status
The system shows real-time engine availability at startup:
When using XTTS2:
🎵 MCP Nexus Voice v5.0 ready
🎯 Active Channel: XTTS2
🎤 XTTS2 TTS: Available
🎌 Kokoro TTS (fallback): Available
When using Kokoro (default):
🎵 MCP Nexus Voice v5.0 ready
🎯 Active Channel: KOKORO
🎌 Kokoro TTS: Available
🎤 XTTS2 TTS (alternative): Available
When the primary engine is unavailable, synthesis automatically falls back to the next available engine.
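The fallback behavior can be pictured as walking an ordered list of engines and taking the first one that is reachable. The sketch below assumes the order stated in this README (XTTS2 → Kokoro → macOS system voice). `pickEngine` and `fallbackChain` are hypothetical names, and it is an assumption that the KOKORO channel skips XTTS2 and that the macOS `say` voice is always present as the last resort.

```typescript
type Engine = "XTTS2" | "KOKORO" | "MACOS";

// Ordered candidates per channel (assumed from the README's fallback note).
function fallbackChain(channel: "XTTS2" | "KOKORO"): Engine[] {
  return channel === "XTTS2" ? ["XTTS2", "KOKORO", "MACOS"] : ["KOKORO", "MACOS"];
}

// Return the first engine in the chain that is currently available.
function pickEngine(channel: "XTTS2" | "KOKORO", available: ReadonlySet<Engine>): Engine {
  for (const engine of fallbackChain(channel)) {
    if (available.has(engine)) return engine;
  }
  // Assumption: the system voice is the unconditional last resort on macOS.
  return "MACOS";
}
```

For example, with `VOICE_CHANNEL=XTTS2` and only the Kokoro server running, this sketch would select Kokoro.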
🔒 Security Features
- Command Injection Protection - Secure execFile usage prevents malicious commands
- Input Validation - Text length limits, path traversal prevention
- Queue Limits - DoS protection with configurable synthesis and playback limits
- Mutex Protection - Race condition prevention in queue operations
- Secure File Handling - Temporary file cleanup and path validation
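To illustrate why execFile-style invocation resists command injection, the sketch below builds the argument list for the macOS `say` command as separate values instead of one shell string. It is an illustration under assumptions, not the package's code: `buildSayCommand` is a hypothetical helper, and the 1000-character limit is the one quoted in the troubleshooting section.

```typescript
const MAX_TEXT_LENGTH = 1000; // limit quoted in the troubleshooting section

interface SayCommand {
  file: string;
  args: string[];
}

// Validate the text and return argv for `say`. Nothing here is run by a shell.
function buildSayCommand(text: string, voice: string = "Zoe (Premium)"): SayCommand {
  const trimmed = text.trim();
  if (trimmed.length === 0) throw new Error("text must not be empty");
  if (trimmed.length > MAX_TEXT_LENGTH) {
    throw new Error("text exceeds " + MAX_TEXT_LENGTH + " characters");
  }
  // Each value is its own argv entry, so quotes, semicolons, or $(...) inside
  // the text stay literal. Caveat: text starting with "-" could still be read
  // as an option by `say`; a fuller version would need to handle that case.
  return { file: "say", args: ["-v", voice, trimmed] };
}
```

The result is meant to be handed to Node's `execFile(file, args, callback)` from `node:child_process`, which by default starts the program directly rather than through a shell.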
🛠️ Troubleshooting
Common Issues
XTTS2 TTS Not Available
If you're using VOICE_CHANNEL=XTTS2 and XTTS2 is unavailable, the system automatically falls back to Kokoro or macOS voice.
To enable XTTS2:
# Check if XTTS2 server is running
curl http://your-xtts2-server:5002/speakers
# Should return: JSON array of available voices
# Check environment variables
echo $VOICE_CHANNEL # Should be: XTTS2
echo $XTTS2_API_URL # Should be: http://your-xtts2-server:5002
# Verify server status in logs
tail -f ~/Library/Logs/Claude/mcp-server-speech.log
Kokoro TTS Not Available (Fallback Active)
This is not an error - the system automatically uses macOS system voice.
To enable Kokoro TTS for better quality:
# Check if Kokoro server is running
curl -I http://localhost:8880/tts
# Should return: HTTP/1.1 405 Method Not Allowed (endpoint exists)
# Check environment variable (if set)
echo $KOKORO_API_URL
# Should show: http://localhost:8880 or empty (uses default)
# Verify server status in logs
tail -f ~/Library/Logs/Claude/mcp-server-speech.log
XTTS2 Audio Speed/Pitch Adjustment
If XTTS2 audio sounds too slow/fast or the pitch is too high/low, adjust it with the XTTS2_SPEED and XTTS2_PITCH environment variables:
# In .env or Claude Desktop config
# Speed (tempo of speech)
XTTS2_SPEED=1.0 # Normal speed
XTTS2_SPEED=1.04 # 4% faster (subtle)
XTTS2_SPEED=1.2 # 20% faster
XTTS2_SPEED=0.8 # 20% slower
# Pitch (voice frequency/octave)
XTTS2_PITCH=1.0 # Normal pitch
XTTS2_PITCH=1.05 # 5% higher (subtle)
XTTS2_PITCH=1.10 # 10% higher (noticeable)
XTTS2_PITCH=0.9 # 10% lower (deeper voice)
Common Adjustments:
- Voice sounds too low/deep: XTTS2_PITCH=1.05 to 1.10
- Voice speaks too slow: XTTS2_SPEED=1.04 to 1.20
- Voice sounds robotic: Try XTTS2_SPEED=0.95 for more natural tempo
Example Configuration:
{
"env": {
"VOICE_CHANNEL": "XTTS2",
"XTTS2_API_URL": "http://your-server:5002",
"XTTS2_SPEED": "1.04",
"XTTS2_PITCH": "1.10"
}
}
Note: These parameters are applied via ffmpeg post-processing (XTTS2 API doesn't reliably support native speed/pitch adjustment). Requires ffmpeg to be installed.
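The note above says speed and pitch are applied with ffmpeg but does not say which filters are used. One common approach, shown here purely as an assumption, shifts pitch by resampling (`asetrate` followed by `aresample`) and then corrects the tempo with `atempo`. `buildFfmpegFilter` is a hypothetical helper, and the sample rate must match the actual audio (24000 Hz in the usage example is an assumed value).

```typescript
// Build an ffmpeg audio filter string for the given speed and pitch factors.
// Assumed technique; the server's real filter chain is not documented here.
function buildFfmpegFilter(speed: number, pitch: number, sampleRate: number): string {
  if (!(speed > 0) || !(pitch > 0) || !(sampleRate > 0)) {
    throw new Error("speed, pitch, and sampleRate must be positive");
  }
  // asetrate reinterprets the samples at a new rate, which raises the pitch
  // and also speeds playback up by the same factor.
  const shiftedRate = Math.round(sampleRate * pitch);
  // atempo therefore compensates: the net tempo change ends up equal to `speed`.
  const tempo = speed / pitch;
  return "asetrate=" + shiftedRate + ",aresample=" + sampleRate + ",atempo=" + tempo.toFixed(4);
}
```

A filter built this way, for example `buildFfmpegFilter(1.04, 1.10, 24000)`, would be passed to ffmpeg as `ffmpeg -i in.wav -filter:a "<filter>" out.wav`.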
Text Not Playing
- Ensure text is preprocessed according to guidelines (abbreviations expanded, etc.)
- Check text length (max 1000 characters)
- Verify audio system is working:
afplay /System/Library/Sounds/Glass.aiff
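Part of the preprocessing mentioned in the first item can be automated. The sketch below covers only a few of the guideline categories (some abbreviations and the symbols &, %, @). Numbers, dates, times, and currency would still need their own handling. `preprocessForSpeech` and both rule tables are hypothetical, not part of the package.

```typescript
// Small, illustrative rule tables; a real list would be much longer.
const ABBREVIATION_RULES: Array<[RegExp, string]> = [
  [/\bDr\./g, "Doctor"],
  [/\bMr\./g, "Mister"],
  [/\bapprox\./g, "approximately"],
];

const SYMBOL_RULES: Array<[RegExp, string]> = [
  [/\s*&\s*/g, " and "],
  [/\s*%/g, " percent"],
  [/\s*@\s*/g, " at "],
];

// Expand known abbreviations and symbols, then normalize whitespace.
function preprocessForSpeech(text: string): string {
  let out = text;
  for (const [pattern, replacement] of ABBREVIATION_RULES.concat(SYMBOL_RULES)) {
    out = out.replace(pattern, replacement);
  }
  return out.replace(/\s+/g, " ").trim();
}
```

With these rules, `preprocessForSpeech("Dr. Smith & Mr. Johnson")` yields "Doctor Smith and Mister Johnson".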
Queue Full Errors
- System has protective limits: max 10 synthesis tasks (1 concurrent worker), max 20 playback tasks
- Requests exceeding synthesis queue enter waiting queue with 30-second timeout
- Sequential processing prevents Kokoro TTS server overload
- Wait for current tasks to complete or restart the server
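The admission rules above can be modeled as a bounded queue plus a waiting list with deadlines. This is a simplified, synchronous sketch with an injected clock so the behavior is easy to follow. `SynthesisAdmission` is a hypothetical class, and expiring waiters lazily on completion (rather than with timers) is an assumption made for brevity.

```typescript
type Admission = "queued" | "waiting";

interface Waiter {
  id: string;
  deadline: number; // timestamp in ms after which the request has timed out
}

class SynthesisAdmission {
  private readonly queued: string[] = [];
  private readonly waiting: Waiter[] = [];

  constructor(
    private readonly maxQueued: number = 10,
    private readonly waitMs: number = 30000,
  ) {}

  // Accept a request into the synthesis queue, or park it in the waiting queue.
  submit(id: string, now: number): Admission {
    if (this.queued.length < this.maxQueued) {
      this.queued.push(id);
      return "queued";
    }
    this.waiting.push({ id, deadline: now + this.waitMs });
    return "waiting";
  }

  // The single worker finished the head task: remove it, then promote waiters
  // whose deadline has not passed. Returns the ids that timed out.
  completeHead(now: number): string[] {
    this.queued.shift();
    const expired: string[] = [];
    while (this.waiting.length > 0 && this.queued.length < this.maxQueued) {
      const next = this.waiting.shift()!;
      if (next.deadline >= now) this.queued.push(next.id);
      else expired.push(next.id);
    }
    return expired;
  }

  get head(): string | undefined {
    return this.queued[0];
  }

  get size(): number {
    return this.queued.length;
  }
}
```

Because only the head task is ever in progress, this models the single concurrent worker: a new task starts only after `completeHead` is called.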
Debug Mode
# Enable MCP debugging
export MCP_TIMEOUT=10000
export PYTHONUNBUFFERED=1
# Run with debug output
claude --mcp-debug
📄 License
MIT License - see LICENSE file for details.
🎵 Express yourself vocally - the local way!
