
voicesmith-mcp

v1.0.18


Local AI voice for coding assistants — TTS & STT via MCP. Kokoro ONNX + faster-whisper, fully offline.

Readme

VoiceSmith MCP

Local AI voice for coding assistants. Gives your AI a real voice (text-to-speech) and ears (speech-to-text) via the Model Context Protocol (MCP). Fully offline — no cloud APIs, no data leaves your machine.

What You Get

  • 54 distinct voices via Kokoro ONNX (local TTS, ~300MB model)
  • Speech-to-text via faster-whisper (local STT, ~150MB model)
  • Voice activity detection via Silero VAD (local, 2MB)
  • Multi-session support — run multiple Claude Code sessions, each with its own voice (single session for Cursor/Codex)
  • Works with Claude Code, Cursor, and Codex

Quick Start

npx voicesmith-mcp install

The installer will:

  1. Check system dependencies (Python 3.11+, espeak-ng, mpv)
  2. Set up a Python virtual environment with all packages
  3. Download TTS and STT models
  4. Configure your IDE's MCP settings
  5. Let you pick a voice
  6. Inject voice behavior rules so the AI knows how to speak

Restart your IDE session after installing. The AI will greet you by voice on the first response.
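Step 1 of the installer can also be reproduced by hand. A minimal sketch of such a dependency check (function names are illustrative, not the actual installer code):

```python
# Hypothetical sketch of the installer's dependency check (step 1).
# Function names are illustrative, not the real installer's code.
import shutil
import sys

def check_deps(commands):
    """Return the subset of commands not found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]

def python_ok(minimum=(3, 11)):
    """True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum

missing = check_deps(["espeak-ng", "mpv"])
if missing or not python_ok():
    print("Missing commands:", ", ".join(missing) or "none")
    print("Python 3.11+ detected:", python_ok())
```

If anything is reported missing, install it with your package manager (for example, Homebrew on macOS) before re-running the installer.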

Usage

[!NOTE] Everything works out of the box. After installing, just start a session — the AI speaks automatically. No configuration needed. The installer sets up voice behavior rules that teach the AI when and how to use its voice.

What the AI does automatically:

| Moment | What happens |
|--------|--------------|
| You give it a task | Gets to work (speaks only when clarifying approach) |
| It finishes work | Speaks a summary of what was done |
| It has a question | Asks out loud, then listens for your voice response |
| Voice tools unavailable | Falls back to text silently |


Changing Voices Mid-Session

Ask the AI to switch voices at any time:

"Switch to Nova"

If the voice is available, the AI switches immediately. If it's occupied by another session, the AI will tell you and show available alternatives.

Browse all 54 voices:

"Show me the available voices"

Or preview them in a terminal: npx voicesmith-mcp voices


Voice Persistence

[!TIP] When you switch voices, the choice is saved automatically. Next time you start or resume a session, the AI uses the same voice — no need to switch again.


Muting

In a meeting or shared space? Just ask:

"Mute the voice"

The AI continues working normally — it just won't play audio. Say "unmute" when you're ready.

Alternative Install

If you don't have Node.js or prefer a shell script:

```shell
git clone https://github.com/shshalom/voicesmith-mcp.git
cd voicesmith-mcp
./install.sh
```

The script accepts the same flags as the npx installer: --claude, --cursor, --codex, --all.

MCP Tools

Once installed, your AI assistant has access to these tools:

| Tool | Description |
|------|-------------|
| speak | Synthesize and play speech for a named agent |
| listen | Open the mic, record speech, return transcribed text |
| speak_then_listen | Speak a question, then immediately listen for the answer |
| set_voice | Change the voice for an agent name |
| get_voice_registry | See which voices are assigned and available |
| list_voices | Browse all 54 Kokoro voices |
| mute / unmute | Silence or resume voice output |
| stop | Stop playback or cancel an active recording |
| status | Server health and session info |
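Under the hood, the IDE invokes these tools with standard MCP JSON-RPC `tools/call` requests over stdio. The argument names below are assumptions inferred from the tool descriptions, not the server's documented schema:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "speak",
    "arguments": {
      "agent": "Eric",
      "text": "Build finished. Two tests need attention."
    }
  }
}
```

You never write these requests yourself; the IDE's MCP client issues them when the AI decides to speak.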

How It Works

The MCP server runs as a local process alongside your IDE. It communicates over stdio (the MCP protocol). All processing happens on your machine:

  • TTS: Kokoro ONNX — fast neural TTS, 54 voices, no GPU needed
  • STT: faster-whisper — OpenAI Whisper running locally via CTranslate2
  • VAD: Silero VAD — voice activity detection for clean recordings
  • Audio: mpv for playback; CoreAudio via native app bundle on macOS (sounddevice fallback on Linux)
  • Media ducking: Auto-pauses Apple Music, Spotify, and browser audio during speech (macOS)

Multi-Session

Claude Code: Full multi-session support. Multiple Claude Code sessions can run simultaneously, each with its own voice. Session identity is tracked via Claude's session_id — resuming a session reclaims the same voice, and multiple terminals sharing the same session share the same voice. Orphaned servers are detected and cleaned up automatically.
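The session-to-voice bookkeeping can be pictured as a small registry keyed by session_id. This is an illustrative sketch of the behavior described above, not the server's actual data structures:

```python
# Illustrative sketch of a session-keyed voice registry.
# The real server's data structures and behavior may differ.
class VoiceRegistry:
    def __init__(self, voices):
        self.voices = list(voices)    # all known voice ids
        self.assignments = {}         # session_id -> voice

    def claim(self, session_id, preferred=None):
        """Resuming reclaims the prior voice; otherwise grant a free one."""
        if session_id in self.assignments:
            return self.assignments[session_id]
        taken = set(self.assignments.values())
        candidates = ([preferred] if preferred else []) + self.voices
        for voice in candidates:
            if voice not in taken:
                self.assignments[session_id] = voice
                return voice
        return None

    def release(self, session_id):
        """Called when an orphaned server is cleaned up."""
        self.assignments.pop(session_id, None)
```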

Cursor / Codex: Single session only. Cursor runs one MCP server per config (shared across tabs), and Codex has no multi-session hooks. Voice works normally — just no multi-session coordination.

Cross-session audio is serialized via flock to prevent overlapping playback.
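The flock-based serialization amounts to taking an exclusive lock around playback. A minimal sketch, assuming a hypothetical lock-file path (the server's actual path and playback code may differ):

```python
# Sketch of serializing playback across sessions with flock.
# The lock-file path and function are illustrative, not the server's code.
import fcntl
import subprocess

LOCK_PATH = "/tmp/voicesmith-audio.lock"  # hypothetical location

def play_serialized(audio_path):
    with open(LOCK_PATH, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)      # blocks until other playback ends
        try:
            subprocess.run(["mpv", "--no-video", audio_path], check=False)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)  # also released when fd closes
```

Because flock is advisory and per open file description, every session that opens the same lock file queues behind whichever one is currently speaking.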

Configuration

Config lives at ~/.local/share/voicesmith-mcp/config.json. Key settings:

```json
{
  "main_agent": "Eric",
  "tts": {
    "default_voice": "am_eric",
    "audio_player": "mpv",
    "duck_media": true
  },
  "stt": {
    "model_size": "base",
    "language": "en",
    "vad_threshold": 0.3,
    "nudge_on_timeout": false
  }
}
```

| Setting | Description | Default |
|---------|-------------|---------|
| tts.duck_media | Auto-pause music/browser audio during speech (macOS) | true |
| stt.nudge_on_timeout | Speak "I didn't catch that" when listen times out | false |
| stt.vad_threshold | Voice detection sensitivity (lower = more sensitive) | 0.3 |

Re-run npx voicesmith-mcp install to change your voice or update settings. Existing configuration is preserved — only new defaults are added.
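The "only new defaults are added" behavior is a standard defaults merge: user values win, and missing keys are filled in from the shipped defaults. A sketch of that logic (illustrative, not the installer's actual code):

```python
# Illustrative deep merge: existing user settings win, and keys the
# user has not set are filled from the shipped defaults.
def merge_defaults(user, defaults):
    out = dict(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(defaults.get(key), dict):
            out[key] = merge_defaults(value, defaults[key])
        else:
            out[key] = value
    return out

defaults = {"stt": {"vad_threshold": 0.3, "nudge_on_timeout": False}}
user = {"stt": {"vad_threshold": 0.2}}
merged = merge_defaults(user, defaults)
# merged["stt"] == {"vad_threshold": 0.2, "nudge_on_timeout": False}
```

This is why a customized vad_threshold survives upgrades while newly introduced settings still appear with sensible defaults.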

Requirements

  • Python 3.11+ (3.11 or 3.12 recommended)
  • macOS (primary platform) or Linux (partial support)
  • espeak-ng — phoneme backend for Kokoro
  • mpv — audio playback
  • ~500MB disk space for models

[!WARNING] Windows is not supported yet. The server uses Unix-specific features (file locking, audio commands, process detection). Windows support is planned — see TODO for details.

Supported IDEs

| IDE | Config Location | Rules Location | Multi-Session |
|-----|-----------------|----------------|---------------|
| Claude Code | ~/.claude.json | ~/.claude/CLAUDE.md | Yes (via session_id) |
| Cursor | ~/.cursor/mcp.json | ~/.cursor/rules/voicesmith.mdc | No (single server) |
| Codex | ~/.codex/mcp.json | ~/.codex/AGENTS.md | No (single session) |

Troubleshooting

The AI can't hear me (listen returns empty or times out)

Check microphone permissions. On macOS, VoiceSmith uses a native app bundle (VoiceSmithMCP.app) for mic access. The first time it records, macOS should show a permission dialog for the app. If it didn't:

  1. Open System Settings > Privacy & Security > Microphone
  2. Look for VoiceSmithMCP and make sure it's enabled
  3. If it's not listed, the LaunchAgent may not be running — try reinstalling: npx voicesmith-mcp install

[!IMPORTANT] If the server detects silent audio (all zeros for ~320ms), it returns an error pointing you to the microphone permission settings. This usually means macOS TCC denied mic access.
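The silent-audio check boils down to scanning a short window of 16-bit PCM for any nonzero sample. A sketch of that idea, assuming 16 kHz mono capture (the server's actual window size and names may differ):

```python
# Sketch of the silent-capture check: if every sample in roughly 320 ms
# of 16-bit PCM is zero, assume the mic was blocked (TCC denial).
# Sample rate and names are assumptions, not the server's constants.
import array

SAMPLE_RATE = 16000                          # assumed capture rate
WINDOW_SAMPLES = SAMPLE_RATE * 320 // 1000   # ~320 ms of samples

def looks_silent(pcm_bytes):
    samples = array.array("h")               # signed 16-bit little-endian
    samples.frombytes(pcm_bytes[: WINDOW_SAMPLES * 2])
    return all(s == 0 for s in samples)
```

Real ambient noise virtually never produces exact zeros, so an all-zero window is a reliable signal that the OS handed the process a muted stream rather than the microphone.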

Check your audio input device. If an external mic is selected but not connected, the server opens it but gets silence:

  • Open System Settings > Sound > Input and verify the correct mic is selected
  • Or ask the AI: "What's the server status?" — check that stt.loaded and vad.loaded are both true

Another app is using the mic. Apps like Zoom, Teams, or FaceTime can hold exclusive mic access. Close them and try again.

Voice too quiet for VAD. The voice activity detector might not pick up soft speech. You can lower the sensitivity threshold in ~/.local/share/voicesmith-mcp/config.json:

```json
{
  "stt": {
    "vad_threshold": 0.2
  }
}
```

Lower values = more sensitive. Default is 0.3. Restart the session after changing.
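If you prefer to script the change, the edit is a plain JSON read-modify-write. A sketch, using the config path from the Configuration section (adjust if yours differs):

```python
# Sketch: lower the VAD threshold in config.json programmatically.
# The path follows the Configuration section; adjust if yours differs.
import json
from pathlib import Path

def set_vad_threshold(value, path="~/.local/share/voicesmith-mcp/config.json"):
    config_path = Path(path).expanduser()
    config = json.loads(config_path.read_text())
    config.setdefault("stt", {})["vad_threshold"] = value
    config_path.write_text(json.dumps(config, indent=2))
    return config

# set_vad_threshold(0.2)  # then restart the session
```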

The AI doesn't speak

  • Check that espeak-ng and mpv are installed: which espeak-ng mpv
  • Check the AI's status: ask "What's your voice status?"
  • If muted, say "Unmute"

The AI speaks with the wrong voice

This can happen when another session is holding your preferred voice name. Ask the AI: "Switch to Eric" — it will either switch or tell you what's available.

Uninstall

npx voicesmith-mcp uninstall

Removes all files, models, MCP config entries, and voice rules cleanly.

License

Apache 2.0