mcp-listen
Give your AI agents the ability to listen
Microphone capture and speech-to-text tools for MCP-compatible agents.
Tools
| Tool | Description |
| ------ | ------------- |
| list_audio_devices | List available microphone input devices |
| capture_audio | Record audio from the microphone and save as WAV |
| voice_query | Capture, transcribe (whisper.cpp), and query a local LLM (Ollama) |
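Under the hood, an MCP client invokes these tools with standard `tools/call` JSON-RPC requests. A minimal sketch of what a `capture_audio` invocation looks like on the wire (the envelope is the generic MCP shape; the helper function is illustrative, not part of mcp-listen):

```typescript
// Sketch: the JSON-RPC 2.0 message an MCP client sends to invoke a tool.
// The envelope is standard MCP; tool names and arguments match this server.
function buildToolCall(id: number, name: string, args: Record<string, unknown>) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const req = buildToolCall(1, "capture_audio", { duration_ms: 3000 });
console.log(JSON.stringify(req));
```

Your MCP client builds and sends these messages for you; the sketch only shows what "calling a tool" means at the protocol level.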
Quick Start
Claude Code
claude mcp add mcp-listen npx mcp-listen
Claude Desktop / ChatGPT Desktop / Cursor / Windsurf / VS Code
Add to your MCP configuration:
{
"mcpServers": {
"mcp-listen": {
"command": "npx",
"args": ["-y", "mcp-listen"]
}
}
}
Compatible with Claude Desktop, ChatGPT Desktop, Cursor, GitHub Copilot, Windsurf, VS Code, Gemini, Zed, and any MCP-compatible client.
Global Install
npm install -g mcp-listen
Requirements
For list_audio_devices and capture_audio:
- Node.js 18+
- A microphone
For voice_query (optional):
- Ollama running locally
- Whisper GGML model (see Whisper Model Setup)
Tool Reference
list_audio_devices
Returns a JSON array of available audio input devices.
Parameters: None
Example response:
[
{ "index": 3, "name": "Microphone (Creative Live! Cam)", "isDefault": true, "maxInputChannels": 2, "defaultSampleRate": 48000 },
{ "index": 4, "name": "Microphone Array (Intel)", "isDefault": false, "maxInputChannels": 2, "defaultSampleRate": 48000 }
]
capture_audio
Records audio from the microphone and saves as a WAV file.
Parameters:
| Parameter | Type | Default | Description |
| ---------- | ------ | --------- | ------------- |
| duration_ms | number | 5000 | Recording duration in milliseconds (100-30000) |
| device | number | system default | Device index from list_audio_devices |
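The `duration_ms` bounds (100-30000 ms) can be enforced client-side before the tool is ever called. A small sketch; the clamping helper is illustrative only, and the server performs its own validation:

```typescript
// Clamp a requested duration to the range capture_audio accepts (100-30000 ms).
// Illustrative helper only; mcp-listen validates its own inputs.
function clampDuration(ms: number): number {
  return Math.min(30000, Math.max(100, ms));
}

console.log(clampDuration(50)); // below the minimum -> 100
console.log(clampDuration(5000)); // in range -> unchanged
console.log(clampDuration(60000)); // above the maximum -> 30000
```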
Example response:
{
"path": "/tmp/mcp-listen-1712345678901.wav",
"duration_ms": 5000,
"sample_rate": 16000,
"channels": 1,
"size_bytes": 160044
}
voice_query
Full voice pipeline: capture audio, transcribe with whisper.cpp, send to Ollama, return the response. Entirely offline.
Parameters:
| Parameter | Type | Default | Description |
| ----------- | ------ | --------- | ------------- |
| duration_ms | number | 5000 | Recording duration in milliseconds (100-30000) |
| device | number | system default | Device index from list_audio_devices |
| whisper_model | string | ggml-base.en.bin | Path or filename of Whisper GGML model |
| language | string | en | Language code for transcription |
| model | string | llama3.2 | Ollama model name |
| prompt | string | You are a helpful assistant. | System prompt for the LLM |
Example response:
{
"transcription": "What is the default port for PostgreSQL?",
"response": "PostgreSQL runs on port 5432 by default.",
"model": "llama3.2"
}
How It Works
mcp-listen uses decibri for cross-platform microphone capture. No ffmpeg, no SoX, no system audio tools required. Pre-built native binaries with zero setup.
Audio is captured as 16-bit PCM at 16kHz mono, the standard format for speech-to-text engines.
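That fixed format also makes file sizes easy to predict: 16,000 samples per second × 2 bytes per 16-bit sample × 1 channel, plus the 44-byte RIFF/WAV header. A quick sanity check against the capture_audio example response:

```typescript
// Expected WAV file size for 16-bit PCM, 16 kHz, mono, plus the 44-byte header.
const SAMPLE_RATE = 16000;
const BYTES_PER_SAMPLE = 2; // 16-bit PCM
const CHANNELS = 1;
const WAV_HEADER_BYTES = 44;

function expectedWavBytes(durationMs: number): number {
  const samples = (SAMPLE_RATE * durationMs) / 1000;
  return WAV_HEADER_BYTES + samples * BYTES_PER_SAMPLE * CHANNELS;
}

console.log(expectedWavBytes(5000)); // 160044, matching size_bytes in the example
```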
The voice_query tool replicates the pipeline from voxagent: capture audio, transcribe locally with whisper.cpp, and send to a local Ollama LLM. Fully offline, nothing leaves your machine.
Whisper Model Setup
The voice_query tool requires a Whisper GGML model file. Download one:
Linux / macOS:
mkdir -p ~/.mcp-listen/models
curl -L -o ~/.mcp-listen/models/ggml-base.en.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
Windows (PowerShell):
mkdir "$env:USERPROFILE\.mcp-listen\models" -Force
Invoke-WebRequest -Uri "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin" -OutFile "$env:USERPROFILE\.mcp-listen\models\ggml-base.en.bin"
The model is ~150MB and only needs to be downloaded once. You can also set the WHISPER_MODEL_PATH environment variable to a custom directory.
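One way a bare `whisper_model` filename could be resolved against the default `~/.mcp-listen/models` directory and the WHISPER_MODEL_PATH override is sketched below. The resolution logic is an assumption for illustration, not mcp-listen's actual implementation:

```typescript
import { join } from "node:path";
import { homedir } from "node:os";

// Illustrative only: resolve a whisper_model value to a file path.
// A bare filename is looked up under WHISPER_MODEL_PATH (if set) or the
// default ~/.mcp-listen/models directory; explicit paths pass through.
function resolveModelPath(model: string, env = process.env): string {
  if (model.includes("/") || model.includes("\\")) return model; // already a path
  const dir = env.WHISPER_MODEL_PATH ?? join(homedir(), ".mcp-listen", "models");
  return join(dir, model);
}

console.log(resolveModelPath("ggml-base.en.bin"));
```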
Ollama Setup
- Install Ollama from https://ollama.com
- Pull a model:
ollama pull llama3.2
- Ensure Ollama is running:
ollama serve
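You can verify Ollama is reachable by hitting its local REST API (it listens on port 11434 by default); the `/api/tags` endpoint lists the models you have pulled. A sketch, with the response parsing split out so it works on any captured payload:

```typescript
// GET http://localhost:11434/api/tags lists locally pulled Ollama models.
interface TagsResponse {
  models: { name: string }[];
}

function pulledModelNames(payload: TagsResponse): string[] {
  return payload.models.map((m) => m.name);
}

async function checkOllama(base = "http://localhost:11434"): Promise<string[]> {
  const res = await fetch(`${base}/api/tags`);
  if (!res.ok) throw new Error(`Ollama not reachable: HTTP ${res.status}`);
  return pulledModelNames((await res.json()) as TagsResponse);
}

checkOllama()
  .then((names) => console.log("models:", names.join(", ")))
  .catch((err) => console.error(err.message));
```

If the request fails, start Ollama with ollama serve before using voice_query.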
Known Limitations
- Fixed recording duration. You specify how long to record. There is no "stop when I stop talking" mode yet.
- voice_query requires Ollama running. If Ollama isn't running, the tool returns a clear error message.
- Whisper model must be downloaded before first use. The first voice_query call requires a pre-downloaded model (~150MB).
- No streaming. MCP's request/response pattern means the entire recording is captured, then transcribed, then sent to the LLM. No real-time partial results.
- Temp files. capture_audio writes WAV files to the system temp directory; they are not automatically cleaned up. voice_query cleans up after itself.
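Since capture_audio's recordings follow the /tmp/mcp-listen-<timestamp>.wav pattern shown in its example response, leftover files are easy to sweep up yourself. A small cleanup sketch (the filename pattern is taken from that example; adjust it if your files differ):

```typescript
import { readdirSync, unlinkSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Matches the recording paths shown in capture_audio's example response.
const RECORDING_RE = /^mcp-listen-\d+\.wav$/;

// Delete leftover mcp-listen recordings from the system temp directory.
function cleanupRecordings(dir = tmpdir()): string[] {
  const removed: string[] = [];
  for (const name of readdirSync(dir)) {
    if (RECORDING_RE.test(name)) {
      unlinkSync(join(dir, name));
      removed.push(name);
    }
  }
  return removed;
}

console.log(`removed ${cleanupRecordings().length} file(s)`);
```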
Troubleshooting
Windows: "Error opening microphone" Windows may block microphone access by default. Go to Settings > Privacy & security > Microphone and ensure microphone access is enabled for desktop apps.
Ollama: "Ollama is not running"
Some Ollama installations start as a background service automatically. If you see this error, run ollama serve manually or check that the Ollama service is running.
Whisper: "model not found" The whisper model file must be downloaded before first use. See Whisper Model Setup for instructions.
Powered By
- decibri: Cross-platform microphone capture for Node.js
- voxagent: Voice-powered terminal agent (inspiration for the voice_query pipeline)
License
Apache-2.0. See LICENSE for details.
Copyright 2026 Analytics in Motion
