@redlasha/talk-to

v0.1.2

Published 2 months ago

Korean voice MCP server for Claude Code - STT/TTS with local Whisper + Edge TTS

redlasha

mcp korean voice stt tts claude-code whisper edge-tts

talk-to

Korean voice MCP server for Claude Code. Speak and listen in Korean.

Features

voice_listen - Microphone → Korean text (STT)
voice_speak - Text → Korean speech (TTS)
voice_converse - Speak then listen (bidirectional)

Quick Start

1. Install system dependencies

Linux:

sudo apt install sox libsox-fmt-all
pip install edge-tts

macOS:

brew install sox
pip install edge-tts

Windows:

pip install edge-tts
# sox is NOT required on Windows — audio uses PowerShell natively

2. Add to Claude Code

Linux / macOS:

claude mcp add talk-to -- npx -y @redlasha/talk-to mcp

Windows:

claude mcp add talk-to -- cmd /c npx -y @redlasha/talk-to mcp

With a Groq API key (optional cloud STT fallback):

# Linux / macOS
claude mcp add --env GROQ_API_KEY=gsk_xxx talk-to -- npx -y @redlasha/talk-to mcp

# Windows
claude mcp add --env GROQ_API_KEY=gsk_xxx talk-to -- cmd /c npx -y @redlasha/talk-to mcp

Done. Start a new Claude Code session and the tools are available.
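For a project-scoped setup, the same server can be declared in a `.mcp.json` file instead of being registered with `claude mcp add`. The sketch below prints such a file. The helper name `print_mcp_config` is hypothetical, and the JSON follows Claude Code's `mcpServers` layout; it is not necessarily the exact output of `npx @redlasha/talk-to setup`, which is not shown here.

```shell
# Hypothetical helper: print a project-scoped .mcp.json entry for talk-to.
# Mirrors the Linux / macOS command: npx -y @redlasha/talk-to mcp
print_mcp_config() {
  cat <<'EOF'
{
  "mcpServers": {
    "talk-to": {
      "command": "npx",
      "args": ["-y", "@redlasha/talk-to", "mcp"]
    }
  }
}
EOF
}
```

Running `print_mcp_config > .mcp.json` in the project root writes the file, and `claude mcp list` should then show the server. On Windows the entry would presumably use `cmd` as the command with `/c npx ...` as the arguments, matching the Windows command above.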

3. STT Backend (choose one)

Option A: Local Whisper (recommended, free)

# Start the whisper.cpp server on port 2022 (run from the directory that contains the whisper-server binary)
./whisper-server -m models/ggml-medium.bin -l ko --port 2022

GPU acceleration is strongly recommended. Without a GPU, STT takes 5-10 seconds per utterance. With a GPU, it drops to under 1 second. See GPU Setup for Whisper below.
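Once the server is running, it can be exercised by hand before involving Claude Code. This is a minimal sketch, assuming the stock whisper.cpp server, which accepts multipart uploads on `/inference`. The helper names `inference_url` and `transcribe_wav` are hypothetical, and how talk-to itself calls the server is not documented here.

```shell
# Build the transcription endpoint from WHISPER_URL (default taken from this README).
inference_url() {
  echo "${WHISPER_URL:-http://localhost:2022}/inference"
}

# Upload a WAV file and print the server's JSON response.
# Usage: transcribe_wav sample.wav
transcribe_wav() {
  curl -sS "$(inference_url)" \
    -F "file=@$1" \
    -F "response_format=json"
}
```

whisper.cpp expects 16 kHz WAV input, so on Linux or macOS a five-second test clip can be recorded with sox using `rec -r 16000 -c 1 -b 16 sample.wav trim 0 5` and then passed to `transcribe_wav sample.wav`.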

Option B: Groq API (cloud fallback)

# Set via claude mcp add --env, or export directly
export GROQ_API_KEY=gsk_your_key_here
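A quick preflight can show which backend a session is likely to end up with. The sketch below assumes that local Whisper is tried first and Groq is used only when the local server does not answer; that order is inferred from the word "fallback" and is not confirmed by this README. The function name `stt_backend` is hypothetical.

```shell
# Report which STT backend looks usable right now.
# Assumption: local Whisper first, Groq only as a fallback.
stt_backend() {
  local url="${WHISPER_URL:-http://localhost:2022}"
  if curl -s -o /dev/null --max-time 2 "$url"; then
    echo "local whisper at $url"
  elif [ -n "${GROQ_API_KEY:-}" ]; then
    echo "groq (cloud fallback)"
  else
    echo "none: start whisper-server or set GROQ_API_KEY"
    return 1
  fi
}
```

The function returns a non-zero status when neither backend is available, so it can gate a script: `stt_backend || exit 1`.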

CLI Commands

npx @redlasha/talk-to check   # Verify dependencies (auto-detects OS)
npx @redlasha/talk-to test    # Quick voice roundtrip test
npx @redlasha/talk-to setup   # Print .mcp.json config
npx @redlasha/talk-to help    # Show help

Korean Voices

| Voice | ID |
|-------|----|
| Female (default) | ko-KR-SunHiNeural |
| Male | ko-KR-InJoonNeural |
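To hear either voice outside of Claude Code, the edge-tts CLI installed in step 1 can be called directly. This is a minimal sketch; the wrapper name `speak_ko` and the default output file are illustrative choices, and how `voice_speak` selects a voice internally is not shown in this README.

```shell
# Synthesize Korean speech to an MP3 with the edge-tts CLI.
# Usage: speak_ko "text" [voice-id] [output-file]
speak_ko() {
  local text="$1"
  local voice="${2:-ko-KR-SunHiNeural}"   # female voice, the default listed above
  local out="${3:-speech.mp3}"
  edge-tts --voice "$voice" --text "$text" --write-media "$out"
}
```

For example, `speak_ko "안녕하세요" ko-KR-InJoonNeural hello.mp3` writes the male voice to `hello.mp3`. Running `edge-tts --list-voices` prints every voice the service offers.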

Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| GROQ_API_KEY | Groq API key for cloud STT | No (if local Whisper) |
| WHISPER_URL | Local Whisper server URL | No (default: http://localhost:2022) |

GPU Setup for Whisper

GPU acceleration makes the difference between unusable and real-time STT.

| Setup | Latency | Experience |
|-------|---------|------------|
| CPU only | 5-10s per utterance | Painful, conversation breaks |
| NVIDIA GPU (CUDA) | < 1s | Natural conversation flow |
| Apple Silicon (Metal) | < 1s | Built-in, no extra setup |

Linux (NVIDIA CUDA)

# 1. Install CUDA toolkit
sudo apt install nvidia-cuda-toolkit

# 2. Build whisper.cpp with CUDA
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release

# 3. Download the multilingual medium model (recommended for Korean)
./models/download-ggml-model.sh medium

# 4. Run server with GPU
./build/bin/whisper-server -m models/ggml-medium.bin -l ko --port 2022

macOS (Apple Silicon)

Metal is enabled by default, so no extra build flags are needed. From the whisper.cpp checkout:

cmake -B build
cmake --build build --config Release
./build/bin/whisper-server -m models/ggml-medium.bin -l ko --port 2022

Windows (NVIDIA CUDA)

# Install CUDA Toolkit from https://developer.nvidia.com/cuda-downloads
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
.\build\bin\Release\whisper-server.exe -m models\ggml-medium.bin -l ko --port 2022

Model Selection

| Model | Size | Accuracy | GPU VRAM |
|-------|------|----------|----------|
| base | 150MB | Good | ~300MB |
| small | 460MB | Better | ~1GB |
| medium | 1.5GB | Best for Korean | ~2.5GB |
| large-v3 | 3GB | Highest accuracy | ~5GB |

medium is recommended for Korean: it offers the best balance of accuracy and speed.
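When VRAM is tight, the table can be turned into a simple rule: take medium if it fits, otherwise step down. The sketch below encodes that rule using the approximate VRAM figures from the table. The function name `pick_model` is hypothetical, and the thresholds are estimates, not measured limits.

```shell
# Suggest a model for the given free VRAM (in MB), capped at the recommended medium.
# Usage: pick_model 4096
pick_model() {
  local vram_mb="$1"
  if   [ "$vram_mb" -ge 2500 ]; then echo "medium"
  elif [ "$vram_mb" -ge 1000 ]; then echo "small"
  elif [ "$vram_mb" -ge 300 ];  then echo "base"
  else
    echo "none"   # below the ~300MB the base model needs
    return 1
  fi
}
```

The result plugs into the download script from the whisper.cpp checkout: `./models/download-ggml-model.sh "$(pick_model 4096)"`. On NVIDIA hardware, free memory in MiB can be read with `nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits | head -n1`.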

Platform Support

| Platform | Audio | edge-tts | Whisper | Status |
|----------|-------|----------|---------|--------|
| Linux (Ubuntu/Debian) | sox (apt) | pip | whisper.cpp | Tested |
| macOS | sox (brew) | pip | whisper.cpp | Supported |
| Windows | PowerShell (built-in) | pip | whisper.cpp | Supported |
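The per-platform requirements above can be checked by hand with a few lines of shell. This is a rough sketch of such a check and not the implementation of `npx @redlasha/talk-to check`; the function names `required_tools` and `check_deps` are hypothetical, and the OS names are those reported by `uname -s`.

```shell
# Command-line tools the Quick Start asks for on each platform.
required_tools() {
  case "$1" in
    Linux|Darwin) echo "sox edge-tts" ;;
    *)            echo "edge-tts" ;;   # anything else, e.g. Windows shells, where sox is not needed
  esac
}

# Print each required tool that is not on PATH; succeed only if none are missing.
# Usage: check_deps [os-name]   (defaults to the current OS)
check_deps() {
  local os="${1:-$(uname -s)}" missing=0 tool
  for tool in $(required_tools "$os"); do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_deps` with no arguments inspects the current machine and prints one `missing:` line per absent tool.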

License

MIT