@redlasha/talk-to
v0.1.2
Korean voice MCP server for Claude Code - STT/TTS with local Whisper + Edge TTS
talk-to
Korean voice MCP server for Claude Code. Speak and listen in Korean.
Features
- voice_listen - Microphone → Korean text (STT)
- voice_speak - Text → Korean speech (TTS)
- voice_converse - Speak then listen (bidirectional)
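Conceptually, voice_converse chains the other two tools. Below is a minimal TypeScript sketch of that composition. The `speak` and `listen` functions and their signatures are hypothetical stand-ins passed in as parameters. They are not talk-to's actual internals.

```typescript
// Hypothetical signatures for the two building blocks (not talk-to's real API).
type Speak = (text: string, voice: string) => Promise<void>;
type Listen = () => Promise<string>;

// Sketch of voice_converse: speak first, then listen for the reply.
// Awaiting speak() means recording only starts after playback has finished.
async function voiceConverse(
  text: string,
  speak: Speak,
  listen: Listen,
  voice = "ko-KR-SunHiNeural", // default female voice from the Korean Voices table
): Promise<string> {
  await speak(text, voice);
  return listen(); // Korean text transcribed from the microphone
}
```

One plausible reason for the strict ordering is that it keeps the synthesized speech from being recorded as part of the user's reply.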
Quick Start
1. Install system dependencies
Linux:
sudo apt install sox libsox-fmt-all
pip install edge-tts
macOS:
brew install sox
pip install edge-tts
Windows:
pip install edge-tts
# sox is NOT required on Windows; audio uses PowerShell natively
2. Add to Claude Code
Linux / macOS:
claude mcp add talk-to -- npx -y @redlasha/talk-to mcp
Windows:
claude mcp add talk-to -- cmd /c npx -y @redlasha/talk-to mcp
With Groq API key (optional cloud STT fallback):
# Linux / macOS
claude mcp add --env GROQ_API_KEY=gsk_xxx talk-to -- npx -y @redlasha/talk-to mcp
# Windows
claude mcp add --env GROQ_API_KEY=gsk_xxx talk-to -- cmd /c npx -y @redlasha/talk-to mcp
Done. Start a new Claude Code session and the tools are available.
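The `setup` CLI command prints an .mcp.json config instead of registering the server. Its exact output is not reproduced here. The sketch below therefore assumes Claude Code's standard project-level `mcpServers` layout (`command`, `args`, optional `env`). It builds an entry equivalent to the `claude mcp add` commands above, including the `cmd /c` wrapper that native Windows needs for npx. `mcpJson` is a hypothetical helper name.

```typescript
// One server entry in Claude Code's .mcp.json ("mcpServers" map).
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

// Sketch: build an .mcp.json document equivalent to `claude mcp add talk-to ...`.
// `platform` uses Node's process.platform names ("linux", "darwin", "win32").
function mcpJson(
  platform: string,
  groqApiKey?: string,
): { mcpServers: Record<string, McpServerEntry> } {
  const npxArgs = ["-y", "@redlasha/talk-to", "mcp"];
  // On native Windows npx is a .cmd shim, so it is launched through `cmd /c`.
  const entry: McpServerEntry =
    platform === "win32"
      ? { command: "cmd", args: ["/c", "npx", ...npxArgs] }
      : { command: "npx", args: npxArgs };
  if (groqApiKey) entry.env = { GROQ_API_KEY: groqApiKey }; // optional cloud STT fallback
  return { mcpServers: { "talk-to": entry } };
}
```

For example, `JSON.stringify(mcpJson("linux"), null, 2)` produces the body of a project .mcp.json that launches `npx -y @redlasha/talk-to mcp`.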
3. STT Backend (choose one)
Option A: Local Whisper (recommended, free)
# whisper.cpp server on port 2022
./whisper-server -m models/ggml-medium.bin -l ko --port 2022
GPU acceleration is strongly recommended. Without a GPU, STT takes 5-10 seconds per utterance; with one, it drops to under 1 second. See GPU Setup for Whisper below.
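To check the local server on its own before connecting talk-to, you can POST a recording to the whisper.cpp server. The sketch below relies on several assumptions:

- the server exposes a `/inference` endpoint that takes a multipart `file` field;
- `response_format=json` returns `{"text": ...}`, as described in the whisper.cpp server example;
- Node 18+ is available for the built-in `fetch`, `FormData` and `Blob`.

This is not talk-to's own client code; `transcribeLocal` is a hypothetical name.

```typescript
// Minimal fetch signature this sketch relies on (injectable for testing).
type FetchLike = (
  url: string,
  init: { method: string; body: FormData },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: send one recorded utterance (a WAV Blob) to whisper-server
// and return the transcribed text. The default URL matches WHISPER_URL's default.
async function transcribeLocal(
  wav: Blob,
  baseUrl = "http://localhost:2022",
  fetchImpl: FetchLike = fetch,
): Promise<string> {
  const form = new FormData();
  form.append("file", wav, "utterance.wav");
  form.append("response_format", "json");

  // Strip trailing slashes so "http://host:2022/" and "http://host:2022" both work.
  const url = `${baseUrl.replace(/\/+$/, "")}/inference`;
  const res = await fetchImpl(url, { method: "POST", body: form });
  if (!res.ok) throw new Error(`whisper-server returned HTTP ${res.status}`);

  const data = (await res.json()) as { text?: string };
  return (data.text ?? "").trim();
}
```

With the server above running, `transcribeLocal(wavBlob)` should return Korean text. The language is already set by the server's `-l ko` flag.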
Option B: Groq API (cloud fallback)
# Set via claude mcp add --env, or export directly
export GROQ_API_KEY=gsk_your_key_here
CLI Commands
npx @redlasha/talk-to check # Verify dependencies (auto-detects OS)
npx @redlasha/talk-to test # Quick voice roundtrip test
npx @redlasha/talk-to setup # Print .mcp.json config
npx @redlasha/talk-to help # Show help
Korean Voices
| Voice | ID |
|-------|----|
| Female (default) | ko-KR-SunHiNeural |
| Male | ko-KR-InJoonNeural |
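Both voices are standard edge-tts voice IDs, so you can try them directly with the edge-tts CLI installed in step 1. The TypeScript sketch below maps the table to edge-tts arguments using the CLI's `--voice`, `--text` and `--write-media` flags. `edgeTtsArgs` and the `female`/`male` keys are hypothetical names.

```typescript
// Korean voices from the table above.
const KOREAN_VOICES = {
  female: "ko-KR-SunHiNeural", // default
  male: "ko-KR-InJoonNeural",
} as const;

type VoiceName = keyof typeof KOREAN_VOICES;

// Sketch: build an edge-tts argument list that writes Korean speech to an MP3 file.
function edgeTtsArgs(text: string, outFile: string, voice: VoiceName = "female"): string[] {
  return ["--voice", KOREAN_VOICES[voice], "--text", text, "--write-media", outFile];
}
```

You can pass the result to `execFile("edge-tts", args)` from `node:child_process`, or run the shell equivalent `edge-tts --voice ko-KR-InJoonNeural --text "안녕하세요" --write-media hello.mp3`. Because `execFile` passes the text as a separate argument without a shell, no quoting is needed.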
Environment Variables
| Variable | Description | Required |
|----------|-------------|----------|
| GROQ_API_KEY | Groq API key for cloud STT | No (if local Whisper) |
| WHISPER_URL | Local Whisper server URL | No (default: http://localhost:2022) |
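The table implies a simple resolution order: WHISPER_URL defaults to http://localhost:2022, and GROQ_API_KEY only matters as a fallback. The sketch below assumes, without confirmation from this README, that talk-to tries local Whisper first. It switches to Groq only when the local call fails and a key is set. The transcriber functions are hypothetical placeholders.

```typescript
// Configuration derived from the environment variables in the table above.
interface SttConfig {
  whisperUrl: string;
  groqApiKey: string | undefined;
}

function resolveSttConfig(env: Record<string, string | undefined>): SttConfig {
  return {
    // Unset or empty WHISPER_URL falls back to the documented default.
    whisperUrl: env["WHISPER_URL"] || "http://localhost:2022",
    // An empty key is treated the same as no key.
    groqApiKey: env["GROQ_API_KEY"] || undefined,
  };
}

// Hypothetical transcriber signature: audio file path in, Korean text out.
type Transcriber = (audioPath: string) => Promise<string>;

// Assumed order: local Whisper first, Groq only if local fails and a key exists.
async function transcribeWithFallback(
  audioPath: string,
  cfg: SttConfig,
  local: Transcriber,
  groq: Transcriber,
): Promise<string> {
  try {
    return await local(audioPath);
  } catch (err) {
    if (!cfg.groqApiKey) throw err; // no fallback configured: surface the local error
    return groq(audioPath);
  }
}
```

In a Node process, `resolveSttConfig(process.env)` would read the real environment.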
GPU Setup for Whisper
Without GPU acceleration, STT is too slow for natural back-and-forth conversation; with it, transcription keeps up in real time.
| Setup | STT latency per utterance | Experience |
|-------|---------------------------|------------|
| CPU only | 5-10 s | Painful; conversation flow breaks |
| NVIDIA GPU (CUDA) | < 1 s | Natural conversation flow |
| Apple Silicon (Metal) | < 1 s | Built-in, no extra setup |
Linux (NVIDIA CUDA)
# 1. Install CUDA toolkit
sudo apt install nvidia-cuda-toolkit
# 2. Build whisper.cpp with CUDA
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
# 3. Download the multilingual medium model (recommended for Korean)
./models/download-ggml-model.sh medium
# 4. Run server with GPU
./build/bin/whisper-server -m models/ggml-medium.bin -l ko --port 2022
macOS (Apple Silicon)
Metal is enabled by default, so no extra build flags are needed. From a whisper.cpp checkout with the model downloaded as in the Linux steps:
cmake -B build
cmake --build build --config Release
./build/bin/whisper-server -m models/ggml-medium.bin -l ko --port 2022
Windows (NVIDIA CUDA)
# Install the CUDA Toolkit from https://developer.nvidia.com/cuda-downloads, then run these from a whisper.cpp checkout that contains models\ggml-medium.bin
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
.\build\bin\Release\whisper-server.exe -m models\ggml-medium.bin -l ko --port 2022
Model Selection
| Model | Size | Accuracy | GPU VRAM |
|-------|------|----------|----------|
| base | 150MB | Good | ~300MB |
| small | 460MB | Better | ~1GB |
| medium | 1.5GB | Best balance for Korean | ~2.5GB |
| large-v3 | 3GB | Highest accuracy | ~5GB |
medium is recommended for Korean - best balance of accuracy and speed.
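The table can be turned into a simple rule of thumb. The helper below is a hypothetical sketch, not part of talk-to. It uses the approximate VRAM figures from the table, prefers medium as recommended, and picks large-v3 only when asked to.

```typescript
// Approximate GPU VRAM needs from the Model Selection table (GB), largest first.
const MODELS: { name: string; vramGb: number }[] = [
  { name: "large-v3", vramGb: 5 },
  { name: "medium", vramGb: 2.5 },
  { name: "small", vramGb: 1 },
  { name: "base", vramGb: 0.3 },
];

// Hypothetical helper: pick the largest model that fits in the given VRAM.
// large-v3 is skipped unless preferLargest is set, since medium is the
// recommended accuracy/speed balance for Korean.
function pickWhisperModel(freeVramGb: number, preferLargest = false): string {
  const candidates = preferLargest
    ? MODELS
    : MODELS.filter((m) => m.name !== "large-v3");
  const fit = candidates.find((m) => m.vramGb <= freeVramGb);
  return fit ? fit.name : "base"; // smallest listed model if nothing fits
}
```

You can pass the returned name to `./models/download-ggml-model.sh`. For example, a GPU with about 2 GB free yields `small`.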
Platform Support
| Platform | Audio | edge-tts | Whisper | Status |
|----------|-------|----------|---------|--------|
| Linux (Ubuntu/Debian) | sox (apt) | pip | whisper.cpp | Tested |
| macOS | sox (brew) | pip | whisper.cpp | Supported |
| Windows | PowerShell (built-in) | pip | whisper.cpp | Supported |
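The `check` command is described as auto-detecting the OS, but its logic isn't documented. Here is a hypothetical sketch of the per-platform requirements as this README lists them. `platformDeps` is an illustrative name, not talk-to's API.

```typescript
// Audio backend and install commands per platform, as listed in this README.
interface PlatformDeps {
  audio: "sox" | "powershell";
  install: string[];
}

// Hypothetical sketch of what a dependency check could look up for each OS.
// `platform` uses Node's process.platform names ("linux", "darwin", "win32").
function platformDeps(platform: string): PlatformDeps {
  switch (platform) {
    case "linux": // Ubuntu/Debian (apt)
      return { audio: "sox", install: ["sudo apt install sox libsox-fmt-all", "pip install edge-tts"] };
    case "darwin":
      return { audio: "sox", install: ["brew install sox", "pip install edge-tts"] };
    case "win32":
      // Audio goes through built-in PowerShell, so sox is not needed.
      return { audio: "powershell", install: ["pip install edge-tts"] };
    default:
      throw new Error(`Unsupported platform: ${platform}`);
  }
}
```

Every platform also needs an STT backend: a whisper.cpp server or a Groq key. These install commands do not cover either.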
License
MIT
