sotto
v1.1.4
Voice input for Claude Code — real-time local speech-to-text via whisper.cpp
sotto
Voice input for Claude Code. Speak instead of typing.
A local, open-source MCP server that streams your voice to whisper.cpp for real-time transcription and sends the text to Claude Code. Everything runs on your machine — no cloud APIs, no network calls.
macOS only. Sotto uses osascript and the Cocoa framework for its floating status indicator. Linux and Windows are not supported.
How It Works
You speak → sotto streams audio to whisper-stream for live transcription
→ a floating indicator shows status and live text
→ silence detected or you click stop → text returned to Claude
→ Claude treats it as your message and responds
Prerequisites
- macOS (Apple Silicon recommended, Intel works too)
- Node.js >= 18
- whisper-cpp — local speech-to-text with live streaming
Install system dependencies:
brew install whisper-cpp
Installation
npm install -g sotto
sotto-setup
The setup command will:
- Verify whisper-stream is installed (ships with whisper-cpp)
- Download the Whisper Base English model (~150MB) to ~/.local/share/sotto/models/
- Create a default config at ~/.config/sotto/config.json
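The three setup steps above can be pictured as a simple checklist. The following is only a sketch, not sotto's actual setup code: the `SetupProbe` interface and the `pendingSetupSteps` function are hypothetical names, and the model filename is taken from the default `modelPath` in the Configuration table. The probe is injected so the logic can be exercised without touching the real filesystem.

```typescript
// Hypothetical sketch of the checks sotto-setup is described as performing.
// SetupProbe and pendingSetupSteps are illustrative names, not sotto's API.
interface SetupProbe {
  hasCommand(name: string): boolean; // e.g. is "whisper-stream" on PATH?
  fileExists(path: string): boolean; // e.g. does the model file exist?
}

const MODEL_PATH = "~/.local/share/sotto/models/ggml-base.en.bin";
const CONFIG_PATH = "~/.config/sotto/config.json";

function pendingSetupSteps(probe: SetupProbe): string[] {
  const steps: string[] = [];
  // whisper-stream ships with the whisper-cpp Homebrew package.
  if (!probe.hasCommand("whisper-stream")) {
    steps.push("brew install whisper-cpp");
  }
  // The Whisper Base English model is downloaded on first setup.
  if (!probe.fileExists(MODEL_PATH)) {
    steps.push(`download model to ${MODEL_PATH}`);
  }
  // A default config is created if none exists yet.
  if (!probe.fileExists(CONFIG_PATH)) {
    steps.push(`create default config at ${CONFIG_PATH}`);
  }
  return steps;
}
```

An empty result would mean nothing is left to do; otherwise each entry names one remaining step.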
Then register with Claude Code:
# Available in all projects (recommended for most users)
claude mcp add sotto -s user -- sotto
# Or, available only in the current project
claude mcp add sotto -s local -- sotto
Use user scope if you want voice input everywhere. Use local scope if you only want sotto in a specific project.
On first use, macOS will prompt you to grant microphone access to your terminal app (Terminal, iTerm2, etc.) in System Settings > Privacy & Security > Microphone.
Usage
In Claude Code, type:
/sotto:listen
A floating indicator appears at the bottom of your screen showing:
- Recording status (listening / transcribing)
- Live transcription text as you speak
- A stop button to end recording early
Recording stops automatically after silence is detected, or when you click the stop button. Your speech is transcribed and sent to Claude as text.
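The stop conditions described above can be sketched as one decision function. This is an illustration under assumptions, not sotto's implementation: the README does not say how silence is detected, so `silenceMs` is a hypothetical threshold, and the type and function names are made up for the example. The only value taken from the README is the `maxDuration` setting (default 30 seconds).

```typescript
// Hypothetical sketch of when a recording ends. Names and the silence
// threshold are assumptions; only maxDuration comes from sotto's config.
interface RecordingState {
  startedAtMs: number;    // when recording began
  lastSpeechAtMs: number; // last time speech was heard
  stopClicked: boolean;   // user pressed the stop button
}

type StopReason = "stop-button" | "silence" | "max-duration" | null;

function stopReason(
  state: RecordingState,
  nowMs: number,
  silenceMs: number,      // assumed silence threshold, not documented by sotto
  maxDurationSec: number, // corresponds to the maxDuration setting
): StopReason {
  // An explicit click always ends the recording.
  if (state.stopClicked) return "stop-button";
  // The configured maximum caps the total recording length.
  if (nowMs - state.startedAtMs >= maxDurationSec * 1000) return "max-duration";
  // Otherwise stop once nothing has been heard for the silence window.
  if (nowMs - state.lastSpeechAtMs >= silenceMs) return "silence";
  return null; // keep recording
}
```

Checking the stop button first means a click is never masked by a timeout that happens to expire in the same tick.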
Configuration
Edit ~/.config/sotto/config.json:
| Setting | Default | Env Var | Description |
|---|---|---|---|
| modelPath | ~/.local/share/sotto/models/ggml-base.en.bin | WHISPER_MODEL_PATH | Path to GGML model |
| language | en | WHISPER_LANGUAGE | Language code |
| maxDuration | 30 | WHISPER_MAX_DURATION | Max recording seconds |
Environment variables take precedence over the config file.
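The precedence rule can be sketched as a small merge: defaults first, then the config file, then environment variables. The setting names, defaults, and variable names below come from the table; the `SottoConfig` type, the `resolveConfig` function, and the handling of a non-numeric `WHISPER_MAX_DURATION` are assumptions made for illustration, not sotto's actual code.

```typescript
// Hypothetical sketch of config resolution. Setting names, defaults, and
// env var names follow the table; everything else is illustrative.
interface SottoConfig {
  modelPath: string;
  language: string;
  maxDuration: number; // seconds
}

const DEFAULTS: SottoConfig = {
  modelPath: "~/.local/share/sotto/models/ggml-base.en.bin",
  language: "en",
  maxDuration: 30,
};

function resolveConfig(
  fileConfig: Partial<SottoConfig>,
  env: Record<string, string | undefined>,
): SottoConfig {
  // Start from the defaults, then apply whatever the config file provides.
  const merged: SottoConfig = { ...DEFAULTS, ...fileConfig };

  // Environment variables take precedence over the config file when set.
  if (env.WHISPER_MODEL_PATH) merged.modelPath = env.WHISPER_MODEL_PATH;
  if (env.WHISPER_LANGUAGE) merged.language = env.WHISPER_LANGUAGE;
  if (env.WHISPER_MAX_DURATION) {
    const seconds = Number(env.WHISPER_MAX_DURATION);
    // Assumption: values that are not positive numbers are ignored.
    if (Number.isFinite(seconds) && seconds > 0) merged.maxDuration = seconds;
  }
  return merged;
}
```

In a Node.js process this would typically be called with the parsed contents of ~/.config/sotto/config.json and `process.env`.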
Troubleshooting
| Problem | Solution |
|---|---|
| "whisper-stream is not installed" | brew install whisper-cpp |
| "Model not found" | Run sotto-setup |
| "Microphone access denied" | Grant mic access to your terminal in System Settings > Privacy & Security > Microphone |
| No speech detected | Make sure your microphone is working and you're speaking loudly enough |
| Transcription is slow | The base model takes about 3s to transcribe a 5s clip on Apple Silicon. Try the tiny model for faster results. |
Development
git clone https://github.com/sourabhbgp/sotto.git
cd sotto
npm install
npm run build
npm test
License
MIT
