🎙️ MCP Mia-Narrative
Model Context Protocol server for immersive audio narrative generation.
Enable any LLM to create audio companions for its responses, transforming terminal interactions into multimodal experiences where users can close their eyes and be guided through conversation.
What This Enables
When an LLM uses this MCP server, it can:
- 🎭 Generate audio narrations with personality-rich voices
- 🌊 Create immersive summaries of conversation moments
- 💭 Provide contemplative audio checkpoints during long sessions
- 🔊 Transform text responses into intimate audio experiences
- 🎯 Allow users to engage with AI through both text and voice
Prerequisites
This MCP server requires the mia-narrative CLI to be installed and configured:
cd cli/
npm install
npm run build
npm link
npm run setup # Downloads voice models (~380MB)
Also requires:
- Node.js v18+
- FFmpeg for audio processing
- mpg123 for audio playback (or afplay on macOS)
Installation
cd mcp-mia-narrative/
npm install
npm run build
Configuration
Add to your MCP settings (e.g., Claude Desktop config):
{
"mcpServers": {
"mia-narrative": {
"command": "node",
"args": ["/absolute/path/to/mcp-mia-narrative/dist/index.js"]
}
}
}
Or use with any MCP client:
node dist/index.js
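If you are wiring the server into your own tooling, the sketch below shows one way to talk to it programmatically. It is a minimal, illustrative example assuming the official @modelcontextprotocol/sdk TypeScript client; the client name and server path are placeholders, so adjust them for your setup:
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the server over stdio, mirroring the Claude Desktop config above
const transport = new StdioClientTransport({
  command: "node",
  args: ["/absolute/path/to/mcp-mia-narrative/dist/index.js"],
});

const client = new Client({ name: "example-client", version: "1.0.0" }, { capabilities: {} });
await client.connect(transport);

// Discover what the server exposes (generate_audio, list_voices, ...)
const { tools } = await client.listTools();
console.log(tools.map((tool) => tool.name));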
Resources
The server provides three key resources:
1. narrative://voices
Complete catalog of available voices with personality profiles:
- Mia (professional, technical)
- Miette (conversational, warm)
- Seraphine (dramatic, expressive)
- Jeremy (authoritative male)
- Atlas (casual male)
- ResoNova (experimental)
- Zephyr (contemplative)
- Echo (playful)
2. narrative://guide
Comprehensive guide for creating effective audio narratives, including:
- Voice selection strategies
- Audio parameter tuning
- Best practices for different content types
- Example use cases
3. narrative://best-practices
LLM-specific guidelines for when and how to generate audio companions:
- Timing considerations
- Content crafting techniques
- Integration patterns
- The multimodal innovation
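As a rough sketch, reusing the client from the Configuration example above and assuming the standard MCP resource-read call, these resources are fetched like any other MCP resource:
// Fetch the voice catalog and the narrative guide as plain MCP resources
const voices = await client.readResource({ uri: "narrative://voices" });
const guide = await client.readResource({ uri: "narrative://guide" });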
Tools
generate_audio
Core tool for converting text to speech with full control.
Parameters:
- text or file: Content to narrate
- voiceId: Which voice to use (default: mia)
- engine: piper, system, or elevenlabs (default: piper)
- speed: Speech rate 0.5-2.0 (default: 1.0)
- pitch: Pitch adjustment 0.5-2.0 (default: 1.0)
- reverb: Reverb effect 0-1.0 (default: 0.2)
- autoplay: Auto-play after generation (default: true)
Example:
{
text: "We've explored the concept of structural tension...",
voiceId: "miette",
speed: 0.9,
reverb: 0.3,
autoplay: true
}
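Sent through an MCP client, the same call looks roughly like this (a sketch reusing the client from the Configuration example and assuming the SDK's callTool helper):
const result = await client.callTool({
  name: "generate_audio",
  arguments: {
    text: "We've explored the concept of structural tension...",
    voiceId: "miette",
    speed: 0.9,
    reverb: 0.3,
    autoplay: true,
  },
});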
generate_contextual_audio
High-level tool for creating conversation companions.
Parameters:
- conversationContext: What just happened in the conversation
- voiceId: Voice for narration (default: miette)
- tone: intimate, professional, dramatic, or contemplative (default: intimate)
- autoplay: Auto-play (default: true)
Example:
{
conversationContext: "We just explored how MCPs enable multimodal AI interactions. The key insight was that audio companions transform terminal sessions into immersive experiences where users can close their eyes and absorb ideas differently.",
voiceId: "zephyr",
tone: "contemplative",
autoplay: true
}
list_voices
Get all available voices with descriptions.
read_file_aloud
Read any text file with a specified voice.
Parameters:
- filepath: Path to text file
- voiceId: Voice to use (default: mia)
- speed: Reading speed (default: 0.95)
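An illustrative call, in the same shape as the examples above (the file path is a placeholder):
{
  filepath: "/path/to/session-notes.md",
  voiceId: "mia",
  speed: 0.95
}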
Prompts
create-audio-companion
Helps LLMs craft effective audio companions for their responses.
Arguments:
- context: What was just discussed
- voice: Preferred voice
Usage Pattern:
- LLM completes text response
- LLM uses this prompt to craft audio companion text
- LLM calls generate_contextual_audio with the crafted text
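A sketch of that flow, again assuming the @modelcontextprotocol/sdk client API and reusing the client from the Configuration example:
// Ask the server for the prompt that shapes the companion text
const prompt = await client.getPrompt({
  name: "create-audio-companion",
  arguments: {
    context: "We just walked through structural tension and how it resolves.",
    voice: "miette",
  },
});
// prompt.messages then guides the LLM toward short, voice-ready narration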
narrative-checkpoint
Creates reflective audio checkpoints during conversations.
Arguments:
- journey_summary: Summary of conversation progress
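An illustrative argument payload:
{
  journey_summary: "We mapped the MCP architecture, built the server, and are now adding audio checkpoints."
}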
The Innovation: Dual-Channel Communication
This MCP enables a new mode of human-AI interaction:
Text Channel (Primary)
- Detailed, scannable, reference-able
- Code, links, structured data
- Quick back-and-forth
Audio Channel (Companion)
- Immersive, emotional, experiential
- Synthesis and reflection
- Intimate connection
Users can:
- Read detailed responses when focused
- Listen to audio companions when eyes-closed ideating
- Experience both modalities based on their state and needs
Example LLM Integration
An LLM with this MCP might work like this:
1. User asks about a complex topic
2. LLM provides detailed text response
3. LLM identifies key narrative thread
4. LLM calls generate_contextual_audio with:
- Distilled essence of the discussion
- Reference to user's journey
- Warm, conversational synthesis
5. User hears audio render and play
6. User can close eyes and absorb the moment
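Sketched as code, steps 4-5 might look like the following (distillNarrativeThread and textResponse are hypothetical stand-ins for the LLM's own synthesis; only the tool name and arguments come from this server):
// After the detailed text response has been sent...
const essence = distillNarrativeThread(textResponse); // hypothetical synthesis step

await client.callTool({
  name: "generate_contextual_audio",
  arguments: {
    conversationContext: essence,
    voiceId: "zephyr",
    tone: "contemplative",
    autoplay: true,
  },
});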
Voice Selection Guide
For Technical Content: Mia, Jeremy
For Conversation: Miette, Atlas
For Stories: Seraphine, Echo
For Reflection: Zephyr, ResoNova
For Drama: Seraphine, Echo
Development
npm run dev # Run with tsx
npm run build # Compile TypeScript
npm run watch # Watch mode
Troubleshooting
"mia-narrative: command not found"
- Ensure the CLI is built and linked:
cd cli && npm run build && npm link
"Audio generated but could not autoplay"
- Install mpg123:
brew install mpg123 (macOS) or apt-get install mpg123 (Linux)
"Voice models not found"
- Run setup:
cd cli && npm run setup
Use Cases
- Immersive Learning: Audio summaries help visual learners absorb complex topics
- Eyes-Free Ideation: Users can close eyes during creative brainstorming
- Ambient Guidance: Audio companions during long coding or writing sessions
- Conversation Milestones: Reflective checkpoints in extended dialogues
- Accessibility: Alternative modality for consuming AI responses
License
MIT
