Transcript Server

An MCP App Server for live speech transcription using the Web Speech API.
MCP Client Configuration
Add to your MCP client configuration (stdio transport):
{
  "mcpServers": {
    "transcript": {
      "command": "npx",
      "args": [
        "-y",
        "--silent",
        "--registry=https://registry.npmjs.org/",
        "@modelcontextprotocol/server-transcript",
        "--stdio"
      ]
    }
  }
}

Local Development
To test local modifications, use this configuration (replace ~/code/ext-apps with your clone path):
{
  "mcpServers": {
    "transcript": {
      "command": "bash",
      "args": [
        "-c",
        "cd ~/code/ext-apps/examples/transcript-server && npm run build >&2 && node dist/index.js --stdio"
      ]
    }
  }
}

Features
- Live Transcription: Real-time speech-to-text using the browser's Web Speech API
- Transitional Model Context: Streams interim transcriptions to the model via ui/update-model-context, allowing the model to see what the user is saying as they speak (see the sketch after this list)
- Audio Level Indicator: Visual feedback showing microphone input levels
- Send to Host: Button to send completed transcriptions as a ui/message to the MCP host
- Start/Stop Control: Toggle listening on and off
- Clear Transcript: Reset the transcript area
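The model-context and send-to-host features boil down to two host-bound messages. As a minimal sketch only, assuming the embedded app reaches its host over a postMessage JSON-RPC channel and that the params simply carry transcript text (the real mcp-app.ts may use an SDK helper and a different payload shape):

// Illustrative only: assumes the sandboxed app talks to its host via
// window.parent.postMessage with JSON-RPC 2.0 framing, and that the params
// carry plain transcript text. The actual mcp-app.ts may differ.
function notifyHost(method: string, params: unknown): void {
  window.parent.postMessage({ jsonrpc: "2.0", method, params }, "*");
}

// Stream interim speech so the model can follow along as the user talks.
function updateModelContext(interimText: string): void {
  notifyHost("ui/update-model-context", { text: interimText });
}

// Send the finished transcript to the host (the Send button behavior above).
function sendTranscriptToHost(finalText: string): void {
  notifyHost("ui/message", { text: finalText });
}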
Setup
Prerequisites
- Node.js 18+
- Chrome, Edge, or Safari (Web Speech API support)
Installation
npm install

Running
# Development mode (with hot reload)
npm run dev
# Production build and serve
npm run start

Usage
The server exposes a single tool:
transcribe
Opens a live speech transcription interface.
Parameters: None
Example:
{
  "name": "transcribe",
  "arguments": {}
}

How It Works
- Click Start to begin listening
- Speak into your microphone
- Watch your speech appear as text in real time (interim text is streamed to the model context via ui/update-model-context; see the sketch below)
- Click Send to send the transcript as a ui/message to the host (clears the model context)
- Click Clear to reset the transcript
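For the speech-to-text loop itself, here is a rough sketch using the browser's SpeechRecognition API with continuous mode, interim results, and the auto-restart behavior described under Notes. The onInterim and onFinal callbacks are hypothetical placeholders for the app's real handlers in mcp-app.ts:

// Minimal Web Speech API loop: continuous recognition with interim results,
// restarting automatically when the engine stops. Not the actual app code.
type TranscriptHandler = (text: string) => void;

function startRecognition(onInterim: TranscriptHandler, onFinal: TranscriptHandler) {
  const SpeechRecognitionImpl =
    (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
  const recognition = new SpeechRecognitionImpl();
  recognition.continuous = true;      // keep listening across utterances
  recognition.interimResults = true;  // surface partial results as the user speaks

  let stopped = false;

  recognition.onresult = (event: any) => {
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      const text = result[0].transcript;
      if (result.isFinal) {
        onFinal(text);
      } else {
        onInterim(text);
      }
    }
  };

  // "Continuous Mode": the engine ends periodically; restart unless Stop was pressed.
  recognition.onend = () => {
    if (!stopped) recognition.start();
  };

  recognition.start();
  return () => {
    // returned stop function for the Stop button
    stopped = true;
    recognition.stop();
  };
}

In the sketch above, onInterim is where streaming to ui/update-model-context would hook in, and onFinal is where text would accumulate for the Send button.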
Architecture
transcript-server/
├── server.ts         # MCP server with transcribe tool
├── server-utils.ts   # HTTP transport utilities
├── mcp-app.html      # Transcript UI entry point
├── src/
│   ├── mcp-app.ts    # App logic, Web Speech API integration
│   ├── mcp-app.css   # Transcript UI styles
│   └── global.css    # Base styles
└── dist/             # Built output (single HTML file)

Notes
- Microphone Permission: Requires allow="microphone" on the sandbox iframe (configured via permissions: { microphone: {} } in the resource _meta.ui; see the sketch after this list)
- Browser Support: The Web Speech API is well supported in Chrome and Edge and available in Safari; Firefox has limited support.
- Continuous Mode: Recognition automatically restarts when it ends, for seamless transcription
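For the microphone permission note, a hedged sketch of roughly what the resource's _meta.ui might carry; the key names come from the note above, but the surrounding shape is an assumption and may not match server.ts exactly:

// Illustrative only: an assumed shape for the UI resource's _meta, built from
// the key names in the note above (ui.permissions.microphone). The real
// server.ts may structure or register this differently.
const transcriptUiMeta = {
  ui: {
    permissions: {
      microphone: {}, // host grants allow="microphone" on the sandbox iframe
    },
  },
};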
Future Enhancements
- Language selection dropdown
- Whisper-based offline transcription (see TRANSCRIPTION.md)
- Export transcript to file
- Timestamps toggle
