cloud-asr-mcp
v0.4.0
MCP server using audio multimodal models for transcription and output styling in one pass. Format-specific outputs (email, todo, blog, etc.) via OpenRouter, Voxtral, OpenAI, and Gemini.
Cloud ASR MCP
A Model Context Protocol (MCP) server that uses audio multimodal models to transcribe audio and style the output in a single pass, rather than a traditional ASR pipeline.
How It Works
Unlike conventional speech-to-text models (Whisper, etc.), this MCP server uses audio-capable multimodal models (Gemini, GPT-4o Audio, Voxtral) that process audio in a single pass. The key advantage is that you can supply a text prompt that cleans up the transcript as it is generated: removing filler words, formatting speaker turns, or applying custom instructions, all in one API call.
Tested: successfully transcribed 50-minute audio files with Gemini in a single pass, with no chunking required.
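To make "one API call" concrete, the sketch below builds a single OpenAI-compatible chat-completions request body that carries both the audio and the cleanup instructions. It assumes the provider (for example OpenRouter) accepts the `input_audio` content part used by OpenAI's audio-capable chat models; the function name `buildSinglePassRequest`, the message layout, and the `google/gemini-2.5-flash` slug are illustrative assumptions, not this package's source.

```typescript
// Hypothetical sketch: one request that carries the audio AND the styling
// instructions, using an OpenAI-compatible chat-completions message shape.
type TextPart = { type: "text"; text: string };
type AudioPart = {
  type: "input_audio";
  input_audio: { data: string; format: "wav" | "mp3" };
};
type Message = {
  role: "system" | "user";
  content: string | Array<TextPart | AudioPart>;
};
interface ChatRequest {
  model: string;
  messages: Message[];
}

function buildSinglePassRequest(
  model: string,
  instructions: string, // cleanup/styling guidance, e.g. "remove filler words"
  audioBase64: string, // audio bytes, base64-encoded by the caller
  format: "wav" | "mp3",
): ChatRequest {
  return {
    model,
    messages: [
      // The system prompt steers how the transcript is written.
      { role: "system", content: instructions },
      {
        role: "user",
        content: [
          { type: "text", text: "Transcribe this audio." },
          { type: "input_audio", input_audio: { data: audioBase64, format } },
        ],
      },
    ],
  };
}

const req = buildSinglePassRequest(
  "google/gemini-2.5-flash", // assumed OpenRouter model slug
  "Remove filler words and add paragraph breaks.",
  "UklGRg==", // placeholder base64, not real audio
  "wav",
);
console.log(JSON.stringify(req, null, 2));
```

Because the instructions travel in the same request as the audio, no second "clean up this transcript" call is needed.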
Features
- Single-pass transcription + formatting - Multimodal LLMs process audio and apply formatting in one call
- 4 streamlined tools - Raw, clean, custom prompt, or format-specific output
- 14 provider/model presets - Set once via env var, use everywhere
- 10 built-in formats - Email, to-do, blog, meeting notes, bug report, and more
- Long-form support - Validated with 50+ minute audio files
Tools
| Tool | Description |
|------|-------------|
| transcribe_raw | Verbatim transcription, including filler words, false starts, and repetitions |
| transcribe_basic_clean | Removes filler words and adds punctuation and paragraph breaks |
| transcribe_user_prompt | Applies your own system prompt for custom formatting |
| transcribe_format | Applies a built-in format (email, todo, blog, etc.) |
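An MCP client invokes these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such requests. `format` and `system_prompt` are the argument names used in this README's usage examples; `file_path` is an assumed name for the audio path argument, so check the server's `tools/list` response for the real input schema.

```typescript
// Hypothetical sketch of MCP "tools/call" requests for this server's tools.
// "format" and "system_prompt" come from the README; "file_path" is ASSUMED.
type ToolName =
  | "transcribe_raw"
  | "transcribe_basic_clean"
  | "transcribe_user_prompt"
  | "transcribe_format";

interface ToolCall {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: ToolName; arguments: Record<string, string> };
}

function toolCall(
  id: number,
  name: ToolName,
  filePath: string,
  extra: Record<string, string> = {},
): ToolCall {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: { file_path: filePath, ...extra } },
  };
}

// Roughly: Use transcribe_format with format "email" on /path/to/voice-note.mp3
const emailCall = toolCall(1, "transcribe_format", "/path/to/voice-note.mp3", {
  format: "email",
});
// Roughly: Use transcribe_user_prompt with system_prompt "Format as a LinkedIn post"
const customCall = toolCall(2, "transcribe_user_prompt", "/path/to/audio.mp3", {
  system_prompt: "Format as a LinkedIn post",
});
console.log(JSON.stringify([emailCall, customCall], null, 2));
```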
Available Formats for transcribe_format
summary, email, todo, json, blog, feature-request, bug-report, meeting-notes, formal, friendly
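The format names form a closed set, so a client can reject a misspelled name (for example "to-do" instead of "todo") before any audio is uploaded. The guard below is a hypothetical helper, not part of the package; whether the server validates names the same way is not stated here.

```typescript
// The ten format names accepted by transcribe_format, as listed above.
const FORMATS = [
  "summary", "email", "todo", "json", "blog",
  "feature-request", "bug-report", "meeting-notes", "formal", "friendly",
] as const;

type OutputFormat = (typeof FORMATS)[number];

// Hypothetical client-side type guard for format names.
function isOutputFormat(value: string): value is OutputFormat {
  return (FORMATS as readonly string[]).includes(value);
}

// Throws a readable error listing the valid names.
function assertOutputFormat(value: string): OutputFormat {
  if (!isOutputFormat(value)) {
    throw new Error(`Unknown format "${value}". Expected one of: ${FORMATS.join(", ")}`);
  }
  return value;
}

console.log(isOutputFormat("meeting-notes")); // true
console.log(isOutputFormat("to-do")); // false
```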
Installation
npm install -g cloud-asr-mcp
Configuration
Transcription Presets
Set TRANSCRIPTION_PRESET to select your provider and model (default: 1):
| # | Provider | Model | Description |
|---|----------|-------|-------------|
| 1 | OpenRouter | gemini-2.5-flash | Default - Fast and cost-effective |
| 2 | OpenRouter | gemini-2.5-pro | Higher quality |
| 3 | OpenRouter | gemini-2.5-flash-lite | Economy option |
| 4 | OpenRouter | gpt-4o-audio | OpenAI via OpenRouter |
| 5 | OpenRouter | voxtral-mini | Mistral voice model |
| 6 | OpenRouter | voxtral-small | Larger Mistral model |
| 7 | Voxtral Direct | voxtral-mini-latest | Direct Mistral API |
| 8 | Voxtral Direct | voxtral-small-latest | Direct Mistral API |
| 9 | OpenAI Direct | gpt-4o-transcribe | Direct OpenAI API |
| 10 | OpenAI Direct | gpt-4o-mini-transcribe | Economy OpenAI |
| 11 | Gemini Direct | gemini-flash-latest | Dynamic latest model |
| 12 | Gemini Direct | gemini-2.5-flash | Stable flash model |
| 13 | Gemini Direct | gemini-2.5-pro | Pro model |
| 14 | Gemini Direct | gemini-2.5-flash-lite | Lite model |
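Each preset fixes a provider and model, and the provider determines which API key from the environment variables below must be set. The hypothetical resolver below encodes that mapping. It mirrors the table but is not the package's own implementation; its error handling (for example, rejecting values outside 1 to 14) is an assumption, and the model names are the table's short names rather than guaranteed API slugs.

```typescript
type Provider = "OpenRouter" | "Voxtral Direct" | "OpenAI Direct" | "Gemini Direct";

// API key variable per provider, from the Environment Variables section.
const KEY_FOR_PROVIDER: Record<Provider, string> = {
  "OpenRouter": "OPENROUTER_API_KEY",
  "Voxtral Direct": "MISTRAL_API_KEY",
  "OpenAI Direct": "OPENAI_API_KEY",
  "Gemini Direct": "GEMINI_API_KEY",
};

// Preset table (short model names as shown in the README).
const PRESETS: Record<number, { provider: Provider; model: string }> = {
  1: { provider: "OpenRouter", model: "gemini-2.5-flash" },
  2: { provider: "OpenRouter", model: "gemini-2.5-pro" },
  3: { provider: "OpenRouter", model: "gemini-2.5-flash-lite" },
  4: { provider: "OpenRouter", model: "gpt-4o-audio" },
  5: { provider: "OpenRouter", model: "voxtral-mini" },
  6: { provider: "OpenRouter", model: "voxtral-small" },
  7: { provider: "Voxtral Direct", model: "voxtral-mini-latest" },
  8: { provider: "Voxtral Direct", model: "voxtral-small-latest" },
  9: { provider: "OpenAI Direct", model: "gpt-4o-transcribe" },
  10: { provider: "OpenAI Direct", model: "gpt-4o-mini-transcribe" },
  11: { provider: "Gemini Direct", model: "gemini-flash-latest" },
  12: { provider: "Gemini Direct", model: "gemini-2.5-flash" },
  13: { provider: "Gemini Direct", model: "gemini-2.5-pro" },
  14: { provider: "Gemini Direct", model: "gemini-2.5-flash-lite" },
};

interface ResolvedPreset {
  preset: number;
  provider: Provider;
  model: string;
  apiKeyVar: string;
}

// Hypothetical resolver: reads TRANSCRIPTION_PRESET (default "1") and checks
// that the matching provider key is present in the given environment.
function resolvePreset(env: Record<string, string | undefined>): ResolvedPreset {
  const preset = Number(env.TRANSCRIPTION_PRESET ?? "1");
  const entry = PRESETS[preset];
  if (!entry) {
    throw new Error(`TRANSCRIPTION_PRESET must be 1-14, got "${env.TRANSCRIPTION_PRESET}"`);
  }
  const apiKeyVar = KEY_FOR_PROVIDER[entry.provider];
  if (!env[apiKeyVar]) {
    throw new Error(`Preset ${preset} requires ${apiKeyVar} to be set`);
  }
  return { preset, ...entry, apiKeyVar };
}

// Logs preset 1: OpenRouter, gemini-2.5-flash, OPENROUTER_API_KEY.
console.log(resolvePreset({ OPENROUTER_API_KEY: "sk-test" }));
```

In a Node.js process you would pass `process.env`; taking the environment as a parameter keeps the function easy to test.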
Environment Variables
# Required: the API key for the provider of your selected preset
OPENROUTER_API_KEY="your-key" # For presets 1-6
MISTRAL_API_KEY="your-key" # For presets 7-8
OPENAI_API_KEY="your-key" # For presets 9-10
GEMINI_API_KEY="your-key" # For presets 11-14
# Provider/Model selection (default: 1)
TRANSCRIPTION_PRESET="1"
# Optional: Override default cleanup prompt
DEFAULT_TRANSCRIPTION_PROMPT="Your custom cleanup instructions"
# Optional: For email formatting
USER_NAME="Your Name"
USER_SIGNATURE="Your Title | Company"
# Optional: network transport for remote clients such as MetaMCP
MCP_TRANSPORT="sse"
MCP_PORT="3000"
Claude Code Configuration
Add to ~/.claude/settings.json:
{
"mcpServers": {
"cloud-asr": {
"command": "npx",
"args": ["-y", "cloud-asr-mcp"],
"env": {
"OPENROUTER_API_KEY": "your-key",
"TRANSCRIPTION_PRESET": "1"
}
}
}
}
Usage Examples
# Basic clean transcription
Use transcribe_basic_clean on /path/to/audio.mp3
# Verbatim transcription
Use transcribe_raw on /path/to/meeting.wav
# Format as email
Use transcribe_format with format "email" on /path/to/voice-note.mp3
# Extract to-do items
Use transcribe_format with format "todo" on /path/to/tasks.mp3
# Custom formatting
Use transcribe_user_prompt with system_prompt "Format as a LinkedIn post" on /path/to/audio.mp3
# Save to file
Transcribe /path/to/audio.mp3 using transcribe_format with format "blog" and save to /home/user/posts/
Supported Audio Formats
MP3, WAV, OGG, FLAC, AAC, AIFF, M4A, WEBM, MPEG
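A client or wrapper script can check a path against this list before calling a tool. The helper below is a hypothetical sketch that assumes the file extension names the format (one extension per listed format, lowercased); it does not inspect file contents, and the package may accept or reject files by other means.

```typescript
// One extension per format listed above (assumed: extension == format name).
const SUPPORTED_EXTENSIONS = new Set([
  "mp3", "wav", "ogg", "flac", "aac", "aiff", "m4a", "webm", "mpeg",
]);

// Hypothetical pre-flight check based only on the file extension.
function isSupportedAudio(filePath: string): boolean {
  // Take the last path segment (handles "/" and "\" separators).
  const base = filePath.split(/[\\/]/).pop() ?? "";
  const dot = base.lastIndexOf(".");
  if (dot <= 0) return false; // no extension, or a dotfile such as ".wav"
  return SUPPORTED_EXTENSIONS.has(base.slice(dot + 1).toLowerCase());
}

console.log(isSupportedAudio("/path/to/meeting.WAV")); // true
console.log(isSupportedAudio("/path/to/notes.txt")); // false
```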
Requirements
- Node.js >= 18.0.0
- ffmpeg (for audio downsampling of large files)
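The README does not say which ffmpeg settings the server uses when it downsamples large files. As a hedged illustration only, the sketch below builds an ffmpeg argument list that converts a file to mono, 16 kHz, 32 kbps MP3, settings commonly used for speech; the function name `downsampleArgs` and all parameter values are assumptions.

```typescript
// Hypothetical ffmpeg arguments for shrinking a large recording before upload.
// Mono + 16 kHz + 32 kbps are ASSUMED speech-friendly values, not the
// package's documented settings.
function downsampleArgs(input: string, output: string, sampleRate = 16000): string[] {
  return [
    "-y", // overwrite the output file if it exists
    "-i", input, // source audio
    "-ac", "1", // downmix to a single channel
    "-ar", String(sampleRate), // resample to the target rate
    "-b:a", "32k", // target audio bitrate
    output, // container/codec inferred from the extension (e.g. .mp3)
  ];
}

// In Node.js these arguments could be run with:
//   import { spawnSync } from "node:child_process";
//   spawnSync("ffmpeg", downsampleArgs("in.wav", "out.mp3"), { stdio: "inherit" });
console.log(["ffmpeg", ...downsampleArgs("/path/to/meeting.wav", "/tmp/meeting-16k.mp3")].join(" "));
```

Lowering the channel count and sample rate reduces upload size substantially while keeping speech intelligible.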
License
MIT
