sotto

v1.1.4

Published

13 days ago

Voice input for Claude Code — real-time local speech-to-text via whisper.cpp

Downloads

603


sourabhbgp

mcp whisper speech-to-text voice claude claude-code

sotto

Voice input for Claude Code. Speak instead of typing.

A local, open-source MCP server that streams your voice to whisper.cpp for real-time transcription and sends the text to Claude Code. Transcription runs entirely on your machine, with no cloud APIs and no network calls; the only download is the one-time model fetch during setup.

macOS only. Sotto uses osascript and the Cocoa framework for its floating status indicator. Linux and Windows are not supported.

How It Works

You speak → sotto streams audio to whisper-stream for live transcription
→ a floating indicator shows status and live text
→ silence detected or you click stop → text returned to Claude
→ Claude treats it as your message and responds
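The stop decision in the flow above can be sketched as a small pure function. This is a minimal illustration, not sotto's actual implementation: the type names, the `stopReason` function, and the `silenceMs` threshold are all hypothetical, and only `maxDuration` corresponds to a documented setting.

```typescript
// Hypothetical sketch: none of these names come from sotto's source code.
type StopReason = "user-stop" | "max-duration" | "silence";

interface ListenState {
  startedAtMs: number;    // when recording began
  lastSpeechAtMs: number; // last time new transcribed text arrived
  stopClicked: boolean;   // true once the user clicks the stop button
}

interface StopPolicy {
  silenceMs: number;     // assumed silence window; not a documented sotto setting
  maxDurationMs: number; // the documented maxDuration setting, converted to ms
}

// Returns why recording should stop, or null if it should keep listening.
function stopReason(
  state: ListenState,
  policy: StopPolicy,
  nowMs: number
): StopReason | null {
  if (state.stopClicked) return "user-stop";
  if (nowMs - state.startedAtMs >= policy.maxDurationMs) return "max-duration";
  if (nowMs - state.lastSpeechAtMs >= policy.silenceMs) return "silence";
  return null;
}
```

Checking the explicit stop first means a click always wins, even if the silence window has also elapsed in the same tick.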

Prerequisites

macOS (Apple Silicon recommended, Intel works too)
Node.js >= 18
whisper-cpp — local speech-to-text with live streaming

Install system dependencies:

brew install whisper-cpp
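If you want to check the Node.js requirement programmatically, a version string can be compared against the minimum major version. The helper below is a hypothetical sketch and is not part of sotto; it assumes version strings in the form Node reports them, such as `v18.19.0`.

```typescript
// Hypothetical helper, not part of sotto: checks a Node.js version string
// against the minimum major version listed in the prerequisites.
function meetsMinimumNode(version: string, minimumMajor: number = 18): boolean {
  // Accept both "v18.19.0" and "18.19.0"; capture only the major component.
  const match = /^v?(\d+)\./.exec(version.trim());
  if (match === null) return false;
  return Number(match[1]) >= minimumMajor;
}
```

In a Node script you would typically pass `process.version` as the argument.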

Installation

npm install -g sotto
sotto-setup

The setup command will:

Verify whisper-stream is installed (ships with whisper-cpp)
Download the Whisper Base English model (~150MB) to ~/.local/share/sotto/models/
Create a default config at ~/.config/sotto/config.json

Then register with Claude Code:

# Available in all projects (recommended for most users)
claude mcp add sotto -s user -- sotto

# Or, available only in the current project
claude mcp add sotto -s local -- sotto

Use user scope if you want voice input everywhere. Use local scope if you only want sotto in a specific project.

On first use, macOS will prompt you to grant microphone access to your terminal app (Terminal, iTerm2, etc.) in System Settings > Privacy & Security > Microphone.

Usage

In Claude Code, type:

/sotto:listen

A floating indicator appears at the bottom of your screen showing:

Recording status (listening / transcribing)
Live transcription text as you speak
A stop button to end recording early

Recording stops automatically after silence is detected, or when you click the stop button. Recordings are also capped by the maxDuration setting (30 seconds by default; see Configuration). Your speech is transcribed and sent to Claude as text.

Configuration

Edit ~/.config/sotto/config.json:

| Setting | Default | Env Var | Description |
|---|---|---|---|
| modelPath | ~/.local/share/sotto/models/ggml-base.en.bin | WHISPER_MODEL_PATH | Path to GGML model |
| language | en | WHISPER_LANGUAGE | Language code |
| maxDuration | 30 | WHISPER_MAX_DURATION | Max recording length, in seconds |

Environment variables take precedence over the config file.
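The precedence rule can be illustrated with a small merge function: defaults first, then the config file, then environment variables. This is a sketch under assumptions, not sotto's actual loader. The `SottoConfig` type, `resolveConfig` function, and the validation of `WHISPER_MAX_DURATION` are hypothetical; the setting names, defaults, and variable names are the documented ones.

```typescript
interface SottoConfig {
  modelPath: string;
  language: string;
  maxDuration: number; // seconds
}

// Documented defaults for the three settings.
const DEFAULTS: SottoConfig = {
  modelPath: "~/.local/share/sotto/models/ggml-base.en.bin",
  language: "en",
  maxDuration: 30,
};

type Env = Record<string, string | undefined>;

// Hypothetical helper: merges defaults < config file < environment variables.
function resolveConfig(fileConfig: Partial<SottoConfig>, env: Env): SottoConfig {
  const merged: SottoConfig = { ...DEFAULTS, ...fileConfig };
  if (env.WHISPER_MODEL_PATH) merged.modelPath = env.WHISPER_MODEL_PATH;
  if (env.WHISPER_LANGUAGE) merged.language = env.WHISPER_LANGUAGE;
  if (env.WHISPER_MAX_DURATION) {
    const seconds = Number(env.WHISPER_MAX_DURATION);
    // Assumption: ignore values that are not positive finite numbers.
    if (Number.isFinite(seconds) && seconds > 0) merged.maxDuration = seconds;
  }
  return merged;
}
```

For example, with `maxDuration: 20` in the config file and `WHISPER_MAX_DURATION=45` in the environment, the resolved value is 45, while `modelPath` and `language` fall back to their defaults.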

Troubleshooting

| Problem | Solution |
|---|---|
| "whisper-stream is not installed" | Run brew install whisper-cpp |
| "Model not found" | Run sotto-setup |
| "Microphone access denied" | Grant mic access to your terminal in System Settings > Privacy & Security > Microphone |
| No speech detected | Make sure your microphone is working and you're speaking loudly enough |
| Transcription is slow | The base model takes about 3 s to transcribe a 5 s clip on Apple Silicon. Try the tiny model for faster results. |

Development

git clone https://github.com/sourabhbgp/sotto.git
cd sotto
npm install
npm run build
npm test

License

MIT
