mcp-listen
Give your AI agents the ability to listen
Microphone capture and speech-to-text tools for MCP-compatible agents.
Tools
| Tool | Description |
| ------ | ------------- |
| list_audio_devices | List available microphone input devices |
| capture_audio | Record audio from the microphone and save as WAV |
| voice_query | Capture, transcribe (whisper.cpp), and query a local LLM (Ollama) |
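Under the hood, an MCP client invokes these tools with standard `tools/call` JSON-RPC requests. A minimal sketch of what a `capture_audio` invocation looks like on the wire (the envelope is the generic MCP shape; the helper function is illustrative, not part of mcp-listen):

```typescript
// Sketch: the JSON-RPC 2.0 message an MCP client sends to invoke a tool.
// The envelope is standard MCP; tool names and arguments match this server.
function buildToolCall(id: number, name: string, args: Record<string, unknown>) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const req = buildToolCall(1, "capture_audio", { duration_ms: 3000 });
console.log(JSON.stringify(req));
```

Your MCP client builds and sends these messages for you; the sketch only shows what "calling a tool" means at the protocol level.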
Quick Start
Claude Code
claude mcp add mcp-listen npx mcp-listen
Claude Desktop / ChatGPT Desktop / Cursor / Windsurf / VS Code
Add to your MCP configuration:
{
"mcpServers": {
"mcp-listen": {
"command": "npx",
"args": ["-y", "mcp-listen"]
}
}
}
Compatible with Claude Desktop, ChatGPT Desktop, Cursor, GitHub Copilot, Windsurf, VS Code, Gemini, Zed, and any MCP-compatible client.
Global Install
npm install -g mcp-listen
Requirements
For list_audio_devices and capture_audio:
- Node.js 18+
- A microphone
For voice_query (optional):
- Ollama running locally
- Whisper GGML model (see Whisper Model Setup)
Tool Reference
list_audio_devices
Returns a JSON array of available audio input devices.
Parameters: None
Example response:
[
{ "index": 3, "name": "Microphone (Creative Live! Cam)", "isDefault": true, "maxInputChannels": 2, "defaultSampleRate": 48000 },
{ "index": 4, "name": "Microphone Array (Intel)", "isDefault": false, "maxInputChannels": 2, "defaultSampleRate": 48000 }
]
capture_audio
Records audio from the microphone and saves as a WAV file.
Parameters:
| Parameter | Type | Default | Description |
| ---------- | ------ | --------- | ------------- |
| duration_ms | number | 5000 | Recording duration in milliseconds (100-30000) |
| device | number | system default | Device index from list_audio_devices |
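The `duration_ms` bounds (100-30000 ms) can be enforced client-side before the tool is ever called. A small sketch; the clamping helper is illustrative only, and the server performs its own validation:

```typescript
// Clamp a requested duration to the range capture_audio accepts (100-30000 ms).
// Illustrative helper only; mcp-listen validates its own inputs.
function clampDuration(ms: number): number {
  return Math.min(30000, Math.max(100, ms));
}

console.log(clampDuration(50)); // below the minimum -> 100
console.log(clampDuration(5000)); // in range -> unchanged
console.log(clampDuration(60000)); // above the maximum -> 30000
```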
Example response:
{
"path": "/tmp/mcp-listen-1712345678901.wav",
"duration_ms": 5000,
"sample_rate": 16000,
"channels": 1,
"size_bytes": 160044
}
voice_query
Full voice pipeline: capture audio, transcribe with whisper.cpp, send to Ollama, return the response. Entirely offline.
Parameters:
| Parameter | Type | Default | Description |
| ----------- | ------ | --------- | ------------- |
| duration_ms | number | 5000 | Recording duration in milliseconds (100-30000) |
| device | number | system default | Device index from list_audio_devices |
| whisper_model | string | ggml-base.en.bin | Path or filename of Whisper GGML model |
| language | string | en | Language code for transcription |
| model | string | llama3.2 | Ollama model name |
| prompt | string | You are a helpful assistant. | System prompt for the LLM |
Example response:
{
"transcription": "What is the default port for PostgreSQL?",
"response": "PostgreSQL runs on port 5432 by default.",
"model": "llama3.2"
}
How It Works
mcp-listen uses decibri for cross-platform microphone capture. No ffmpeg, no SoX, no system audio tools required. Pre-built native binaries with zero setup.
Audio is captured as 16-bit PCM at 16kHz mono, the standard format for speech-to-text engines.
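That fixed format also makes file sizes easy to predict: 16,000 samples per second × 2 bytes per 16-bit sample × 1 channel, plus the 44-byte RIFF/WAV header. A quick sanity check against the capture_audio example response:

```typescript
// Expected WAV file size for 16-bit PCM, 16 kHz, mono, plus the 44-byte header.
const SAMPLE_RATE = 16000;
const BYTES_PER_SAMPLE = 2; // 16-bit PCM
const CHANNELS = 1;
const WAV_HEADER_BYTES = 44;

function expectedWavBytes(durationMs: number): number {
  const samples = (SAMPLE_RATE * durationMs) / 1000;
  return WAV_HEADER_BYTES + samples * BYTES_PER_SAMPLE * CHANNELS;
}

console.log(expectedWavBytes(5000)); // 160044, matching size_bytes in the example
```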
The voice_query tool replicates the pipeline from voxagent: capture audio, transcribe locally with whisper.cpp, and send to a local Ollama LLM. Fully offline, nothing leaves your machine.
Whisper Model Setup
The voice_query tool requires a Whisper GGML model file. Download one:
Linux / macOS:
mkdir -p ~/.mcp-listen/models
curl -L -o ~/.mcp-listen/models/ggml-base.en.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
Windows (PowerShell):
mkdir "$env:USERPROFILE\.mcp-listen\models" -Force
Invoke-WebRequest -Uri "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin" -OutFile "$env:USERPROFILE\.mcp-listen\models\ggml-base.en.bin"
The model is ~150MB and only needs to be downloaded once. You can also set the WHISPER_MODEL_PATH environment variable to a custom directory.
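One way a bare `whisper_model` filename could be resolved against the default `~/.mcp-listen/models` directory and the WHISPER_MODEL_PATH override is sketched below. The resolution logic is an assumption for illustration, not mcp-listen's actual implementation:

```typescript
import { join } from "node:path";
import { homedir } from "node:os";

// Illustrative only: resolve a whisper_model value to a file path.
// A bare filename is looked up under WHISPER_MODEL_PATH (if set) or the
// default ~/.mcp-listen/models directory; explicit paths pass through.
function resolveModelPath(model: string, env = process.env): string {
  if (model.includes("/") || model.includes("\\")) return model; // already a path
  const dir = env.WHISPER_MODEL_PATH ?? join(homedir(), ".mcp-listen", "models");
  return join(dir, model);
}

console.log(resolveModelPath("ggml-base.en.bin"));
```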
Ollama Setup
- Install Ollama from https://ollama.com
- Pull a model:
ollama pull llama3.2
- Ensure Ollama is running:
ollama serve
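You can verify Ollama is reachable by hitting its local REST API (it listens on port 11434 by default); the `/api/tags` endpoint lists the models you have pulled. A sketch, with the response parsing split out so it works on any captured payload:

```typescript
// GET http://localhost:11434/api/tags lists locally pulled Ollama models.
interface TagsResponse {
  models: { name: string }[];
}

function pulledModelNames(payload: TagsResponse): string[] {
  return payload.models.map((m) => m.name);
}

async function checkOllama(base = "http://localhost:11434"): Promise<string[]> {
  const res = await fetch(`${base}/api/tags`);
  if (!res.ok) throw new Error(`Ollama not reachable: HTTP ${res.status}`);
  return pulledModelNames((await res.json()) as TagsResponse);
}

checkOllama()
  .then((names) => console.log("models:", names.join(", ")))
  .catch((err) => console.error(err.message));
```

If the request fails, start Ollama with ollama serve before using voice_query.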
Known Limitations
- Fixed recording duration. You specify how long to record. There is no "stop when I stop talking" mode yet.
- voice_query requires Ollama running. If Ollama isn't running, the tool returns a clear error message.
- Whisper model must be downloaded before first use. The first voice_query call requires a pre-downloaded model (~150MB).
- No streaming. MCP's request/response pattern means the entire recording is captured, then transcribed, then sent to the LLM. No real-time partial results.
- Temp files. capture_audio writes WAV files to the system temp directory; they are not automatically cleaned up. voice_query cleans up after itself.
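Since capture_audio's recordings follow the /tmp/mcp-listen-<timestamp>.wav pattern shown in its example response, leftover files are easy to sweep up yourself. A small cleanup sketch (the filename pattern is taken from that example; adjust it if your files differ):

```typescript
import { readdirSync, unlinkSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Matches the recording paths shown in capture_audio's example response.
const RECORDING_RE = /^mcp-listen-\d+\.wav$/;

// Delete leftover mcp-listen recordings from the system temp directory.
function cleanupRecordings(dir = tmpdir()): string[] {
  const removed: string[] = [];
  for (const name of readdirSync(dir)) {
    if (RECORDING_RE.test(name)) {
      unlinkSync(join(dir, name));
      removed.push(name);
    }
  }
  return removed;
}

console.log(`removed ${cleanupRecordings().length} file(s)`);
```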
Troubleshooting
Windows: "Error opening microphone" Windows may block microphone access by default. Go to Settings > Privacy & security > Microphone and ensure microphone access is enabled for desktop apps.
Ollama: "Ollama is not running"
Some Ollama installations start as a background service automatically. If you see this error, run ollama serve manually or check that the Ollama service is running.
Whisper: "model not found" The whisper model file must be downloaded before first use. See Whisper Model Setup for instructions.
Powered By
- decibri: Cross-platform microphone capture for Node.js
- voxagent: Voice-powered terminal agent (inspiration for the voice_query pipeline)
License
Apache-2.0. See LICENSE for details.
Copyright 2026 Analytics in Motion
