autoglm-asr-mcp

v0.2.2

Published

5 months ago

MCP server for AutoGLM ASR speech-to-text transcription with long-audio chunking and context passing

0High
0Medium
0Low

mcp model-context-protocol asr speech-to-text transcription audio-transcription chinese-asr autoglm zhipu agent-tools llm-tools

AutoGLM ASR MCP Server

MCP server for high-quality speech-to-text transcription using Zhipu AutoGLM ASR.

CN: 一个面向 Agent 的语音转文字 MCP 服务，支持长音频分块、上下文传递和时间戳分段。

For AI-oriented setup details, see AI_SETUP_GUIDE.md.

For AI Agents (TL;DR)

Type: MCP Server
Domain: ASR / speech-to-text / transcription
Input: local audio file path
Output: full transcript text + timestamp segments
Best for: meeting notes, call analysis, subtitle draft, voice memo transcription
Supported audio formats: mp3, wav, m4a, flac, ogg, webm
Core tools: transcribe_audio, get_audio_info

What It Does

Transcribes short and long audio files with automatic chunking.
Uses context-aware modes to balance speed and quality.
Returns readable full text and segment-level timestamps.
Runs over stdio as an MCP server for coding assistants.

Tool Index

| Tool | Purpose | Required Args | Optional Args | Returns | |------|---------|---------------|---------------|---------| | transcribe_audio | Transcribe audio to text | audio_path | context_mode, max_concurrency | Full transcript and time-aligned segments | | get_audio_info | Inspect audio before transcription | audio_path | None | Duration, format, channels, sample rate, estimated chunks |

Features

Fast long-audio transcription with sliding-window concurrency.
Better accuracy through chunk-to-chunk context passing.
Automatic splitting for long inputs (API limit friendly).
Zero-install runtime with npx.
Works with common MCP clients.

Installation

Prerequisites

ffmpeg must be installed:

# macOS
brew install ffmpeg

# Ubuntu/Debian
apt install ffmpeg

# Windows
choco install ffmpeg

Get your API key from Zhipu AI Open Platform.

NPX (Recommended)

npx autoglm-asr-mcp

Quick Start

Add this MCP server to your client config and set AUTOGLM_ASR_API_KEY.

{
  "mcpServers": {
    "autoglm-asr": {
      "command": "npx",
      "args": ["-y", "autoglm-asr-mcp"],
      "env": {
        "AUTOGLM_ASR_API_KEY": "your-api-key"
      }
    }
  }
}

Compatibility

Claude Desktop / Claude Code
Cursor
Windsurf
VS Code MCP
Other MCP-compatible clients

VS Code quick install:

Tools

`transcribe_audio`

Transcribe an audio file into text with timing segments.

Arguments:

| Name | Type | Required | Description | |------|------|----------|-------------| | audio_path | string | Yes | Absolute path to the audio file | | context_mode | string | No | sliding (default), none (fastest), full_serial (best quality, slower) | | max_concurrency | integer | No | Max parallel requests, range 1-20, default 5 |

Returns:

Full transcription text
Timestamped segment list
Basic run stats (chunks, mode, elapsed time)

Common errors:

File not found or unreadable path
Unsupported format or broken audio stream
Missing/invalid API key

`get_audio_info`

Inspect an audio file before transcription.

Arguments:

| Name | Type | Required | Description | |------|------|----------|-------------| | audio_path | string | Yes | Absolute path to the audio file |

Returns:

Duration
Format
Sample rate
Channels
Estimated chunks

Context Modes

| Mode | Speed | Quality | Description | |------|-------|---------|-------------| | sliding | Fast | High | First chunk initializes context, later chunks run in parallel with context | | none | Fastest | Medium | Chunks run independently in parallel | | full_serial | Slow | Best | All chunks transcribed sequentially with full context chain |

Environment Variables

| Variable | Default | Description | |----------|---------|-------------| | AUTOGLM_ASR_API_KEY | required | Your Zhipu API key | | AUTOGLM_ASR_API_BASE | https://open.bigmodel.cn/api/paas/v4/audio/transcriptions | API endpoint | | AUTOGLM_ASR_MODEL | glm-asr-2512 | ASR model name | | AUTOGLM_ASR_MAX_CHUNK_DURATION | 25 | Max chunk duration (seconds) | | AUTOGLM_ASR_MAX_CONCURRENCY | 5 | Default concurrency | | AUTOGLM_ASR_CONTEXT_MAX_CHARS | 2000 | Max context size passed between chunks |

Use Cases

Meeting recording to editable transcript
Customer support call transcription
Podcast/video subtitle draft generation
Voice memo indexing and search

Limitations

Requires local file path input (not remote URL input).
Audio quality strongly affects transcription quality.
Very noisy or multi-speaker overlap can reduce accuracy.

Troubleshooting

ffmpeg not found: install ffmpeg and retry.
File not found: pass an absolute existing path.
API errors: verify AUTOGLM_ASR_API_KEY and account quota.

Keywords

mcp, model-context-protocol, asr, speech-to-text, transcription, autoglm, zhipu, chinese-asr, audio-transcription, meeting-transcript, subtitle-generation, voice-to-text, agent-tools, llm-tools, coding-agent

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

AutoGLM ASR MCP Server

For AI Agents (TL;DR)

What It Does

Tool Index

Features

Installation

Prerequisites

NPX (Recommended)

Quick Start

Compatibility

Tools

transcribe_audio

get_audio_info

Context Modes

Environment Variables

Use Cases

Limitations

Troubleshooting

Keywords

License

`transcribe_audio`

`get_audio_info`