🎙️ MCP Mia-Narrative
Model Context Protocol server for immersive audio narrative generation.
Enable any LLM to create audio companions for its responses, transforming terminal interactions into multimodal experiences where users can close their eyes and be guided through conversation.
What This Enables
When an LLM uses this MCP server, it can:
- 🎭 Generate audio narrations with personality-rich voices
- 🌊 Create immersive summaries of conversation moments
- 💭 Provide contemplative audio checkpoints during long sessions
- 🔊 Transform text responses into intimate audio experiences
- 🎯 Allow users to engage with AI through both text and voice
Prerequisites
This MCP server requires the mia-narrative CLI to be installed and configured:
cd cli/
npm install
npm run build
npm link
npm run setup # Downloads voice models (~380MB)
Also requires:
- Node.js v18+
- FFmpeg for audio processing
- mpg123 for audio playback (or afplay on macOS)
Installation
cd mcp-mia-narrative/
npm install
npm run build
Configuration
Add to your MCP settings (e.g., Claude Desktop config):
{
"mcpServers": {
"mia-narrative": {
"command": "node",
"args": ["/absolute/path/to/mcp-mia-narrative/dist/index.js"]
}
}
}
Or use with any MCP client:
node dist/index.js
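If you are wiring the server into your own tooling, the sketch below shows one way to talk to it programmatically. It is a minimal, illustrative example assuming the official @modelcontextprotocol/sdk TypeScript client; the client name and server path are placeholders, so adjust them for your setup:
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the server over stdio, mirroring the Claude Desktop config above
const transport = new StdioClientTransport({
  command: "node",
  args: ["/absolute/path/to/mcp-mia-narrative/dist/index.js"],
});

const client = new Client({ name: "example-client", version: "1.0.0" }, { capabilities: {} });
await client.connect(transport);

// Discover what the server exposes (generate_audio, list_voices, ...)
const { tools } = await client.listTools();
console.log(tools.map((tool) => tool.name));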
Resources
The server provides three key resources:
1. narrative://voices
Complete catalog of available voices with personality profiles:
- Mia (professional, technical)
- Miette (conversational, warm)
- Seraphine (dramatic, expressive)
- Jeremy (authoritative male)
- Atlas (casual male)
- ResoNova (experimental)
- Zephyr (contemplative)
- Echo (playful)
2. narrative://guide
Comprehensive guide for creating effective audio narratives, including:
- Voice selection strategies
- Audio parameter tuning
- Best practices for different content types
- Example use cases
3. narrative://best-practices
LLM-specific guidelines for when and how to generate audio companions:
- Timing considerations
- Content crafting techniques
- Integration patterns
- The multimodal innovation
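As a rough sketch, reusing the client from the Configuration example above and assuming the standard MCP resource-read call, these resources are fetched like any other MCP resource:
// Fetch the voice catalog and the narrative guide as plain MCP resources
const voices = await client.readResource({ uri: "narrative://voices" });
const guide = await client.readResource({ uri: "narrative://guide" });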
Tools
generate_audio
Core tool for converting text to speech with full control.
Parameters:
- text or file: Content to narrate
- voiceId: Which voice to use (default: mia)
- engine: piper, system, or elevenlabs (default: piper)
- speed: Speech rate 0.5-2.0 (default: 1.0)
- pitch: Pitch adjustment 0.5-2.0 (default: 1.0)
- reverb: Reverb effect 0-1.0 (default: 0.2)
- autoplay: Auto-play after generation (default: true)
Example:
{
text: "We've explored the concept of structural tension...",
voiceId: "miette",
speed: 0.9,
reverb: 0.3,
autoplay: true
}
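Sent through an MCP client, the same call looks roughly like this (a sketch reusing the client from the Configuration example and assuming the SDK's callTool helper):
const result = await client.callTool({
  name: "generate_audio",
  arguments: {
    text: "We've explored the concept of structural tension...",
    voiceId: "miette",
    speed: 0.9,
    reverb: 0.3,
    autoplay: true,
  },
});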
generate_contextual_audio
High-level tool for creating conversation companions.
Parameters:
- conversationContext: What just happened in the conversation
- voiceId: Voice for narration (default: miette)
- tone: intimate, professional, dramatic, or contemplative (default: intimate)
- autoplay: Auto-play (default: true)
Example:
{
conversationContext: "We just explored how MCPs enable multimodal AI interactions. The key insight was that audio companions transform terminal sessions into immersive experiences where users can close their eyes and absorb ideas differently.",
voiceId: "zephyr",
tone: "contemplative",
autoplay: true
}
list_voices
Get all available voices with descriptions.
read_file_aloud
Read any text file with a specified voice.
Parameters:
- filepath: Path to text file
- voiceId: Voice to use (default: mia)
- speed: Reading speed (default: 0.95)
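An illustrative call, in the same shape as the examples above (the file path is a placeholder):
{
  filepath: "/path/to/session-notes.md",
  voiceId: "mia",
  speed: 0.95
}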
Prompts
create-audio-companion
Helps LLMs craft effective audio companions for their responses.
Arguments:
- context: What was just discussed
- voice: Preferred voice
Usage Pattern:
- LLM completes text response
- LLM uses this prompt to craft audio companion text
- LLM calls generate_contextual_audio with the crafted text
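A sketch of that flow, again assuming the @modelcontextprotocol/sdk client API and reusing the client from the Configuration example:
// Ask the server for the prompt that shapes the companion text
const prompt = await client.getPrompt({
  name: "create-audio-companion",
  arguments: {
    context: "We just walked through structural tension and how it resolves.",
    voice: "miette",
  },
});
// prompt.messages then guides the LLM toward short, voice-ready narration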
narrative-checkpoint
Creates reflective audio checkpoints during conversations.
Arguments:
- journey_summary: Summary of conversation progress
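An illustrative argument payload:
{
  journey_summary: "We mapped the MCP architecture, built the server, and are now adding audio checkpoints."
}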
The Innovation: Dual-Channel Communication
This MCP enables a new mode of human-AI interaction:
Text Channel (Primary)
- Detailed, scannable, reference-able
- Code, links, structured data
- Quick back-and-forth
Audio Channel (Companion)
- Immersive, emotional, experiential
- Synthesis and reflection
- Intimate connection
Users can:
- Read detailed responses when focused
- Listen to audio companions when eyes-closed ideating
- Experience both modalities based on their state and needs
Example LLM Integration
An LLM with this MCP might work like this:
1. User asks about a complex topic
2. LLM provides detailed text response
3. LLM identifies key narrative thread
4. LLM calls generate_contextual_audio with:
- Distilled essence of the discussion
- Reference to user's journey
- Warm, conversational synthesis
5. User hears audio render and play
6. User can close eyes and absorb the moment
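Sketched as code, steps 4-5 might look like the following (distillNarrativeThread and textResponse are hypothetical stand-ins for the LLM's own synthesis; only the tool name and arguments come from this server):
// After the detailed text response has been sent...
const essence = distillNarrativeThread(textResponse); // hypothetical synthesis step

await client.callTool({
  name: "generate_contextual_audio",
  arguments: {
    conversationContext: essence,
    voiceId: "zephyr",
    tone: "contemplative",
    autoplay: true,
  },
});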
Voice Selection Guide
For Technical Content: Mia, Jeremy
For Conversation: Miette, Atlas
For Stories: Seraphine, Echo
For Reflection: Zephyr, ResoNova
For Drama: Seraphine, Echo
Development
npm run dev # Run with tsx
npm run build # Compile TypeScript
npm run watch # Watch mode
Troubleshooting
"mia-narrative: command not found"
- Ensure the CLI is built and linked:
cd cli && npm run build && npm link
"Audio generated but could not autoplay"
- Install mpg123:
brew install mpg123 (macOS) or apt-get install mpg123 (Linux)
"Voice models not found"
- Run setup:
cd cli && npm run setup
Use Cases
- Immersive Learning: Audio summaries help visual learners absorb complex topics
- Eyes-Free Ideation: Users can close eyes during creative brainstorming
- Ambient Guidance: Audio companions during long coding or writing sessions
- Conversation Milestones: Reflective checkpoints in extended dialogues
- Accessibility: Alternative modality for consuming AI responses
License
MIT
