@madeinoz67/voice-server

v0.1.11

PAI voice server using MLX-audio Kokoro TTS

voice-server

A local-first Text-to-Speech (TTS) voice server for PAI (Personal AI Infrastructure). It uses MLX-audio with the Kokoro-82M model and exposes an ElevenLabs-compatible API for PAI agent voices, with no API costs, rate limits, or external network dependencies.

Features

  • Local TTS - All audio generation happens on your machine
  • Cost-Free - No per-character or per-minute charges
  • Private - No data sent to external services
  • 41 Built-in Voices - Numeric voice IDs for easy configuration
  • Fast Streaming - Smooth real-time audio playback (RTF ~1.0x)
  • Multi-language - American English, British English, Japanese, and Chinese voices
  • macOS Integration - Native notifications and audio playback

Requirements

Platform

  • macOS 13+ (Ventura or later) - Required for native afplay audio
  • Apple Silicon (M1/M2/M3/M4) - Required for MLX-audio backend

Required Tools

| Tool | Version | Purpose | Install |
|------|---------|---------|---------|
| Bun | >= 1.0 | TypeScript runtime | curl -fsSL https://bun.sh/install \| bash |
| pipx | >= 1.0 | Python package installer | brew install pipx |
| ffmpeg | any | Audio conversion | brew install ffmpeg |

TTS Backend

| Backend | Requirements | Use Case |
|---------|--------------|----------|
| MLX-audio | Apple Silicon only | Fast local TTS, 41 built-in voices |

Quick Start

Option 1: Install via Homebrew (Recommended)

# Tap the repository
brew tap madeinoz67/tap

# Install the voice server
brew install madeinoz67/tap/voice-server

# Install MLX-audio backend
pipx install mlx-audio

# Start as a service
brew services start voice-server

# Or run directly
voice-server

Note: After Homebrew installation, the MLX-audio backend needs to be installed separately:

pipx install mlx-audio

Option 2: Install via npm/bun

# Install globally using bun
bun install -g @madeinoz67/voice-server

# Or install globally using npm
npm install -g @madeinoz67/voice-server

# Install MLX-audio backend
pipx install mlx-audio

# Start the server
bunx @madeinoz67/voice-server

# Or specify port
PORT=8888 bunx @madeinoz67/voice-server

Note: After npm/bun installation, the MLX-audio backend needs to be installed separately:

pipx install mlx-audio

Option 3: Install from Source

1. Install Prerequisites

# Install Bun (if not already installed)
curl -fsSL https://bun.sh/install | bash

# Install ffmpeg (for audio conversion)
brew install ffmpeg

# Install pipx for MLX-audio (if not already installed)
brew install pipx

2. Install Project Dependencies

# Navigate to the project directory (clone the repository first)
cd voice-server

# Install TypeScript dependencies
bun install

# Install MLX-audio for Kokoro TTS backend
pipx install mlx-audio

3. Run the Server

# Run on the production port (8888)
PORT=8888 bun run dev

# Development mode (uses port 8889 to avoid conflicts)
NODE_ENV=development PORT=8889 bun run dev

4. Test the Server

# Health check
curl http://localhost:8888/health

# Test TTS notification
curl -X POST http://localhost:8888/notify \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello from Kokoro TTS!", "voice_id": "1"}'

Voice Configuration

The server uses numeric voice IDs (1-41) for easy configuration:

# Test voice ID 1 (warm, friendly)
curl -X POST http://localhost:8888/notify \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!", "voice_id": "1"}'

# Test voice ID 12 (professional male)
curl -X POST http://localhost:8888/notify \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!", "voice_id": "12"}'

# Test voice ID 21 (sophisticated British)
curl -X POST http://localhost:8888/notify \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!", "voice_id": "21"}'

See docs/VOICE_GUIDE.md for complete voice documentation.

API Endpoints

POST /notify

Send a notification with text-to-speech.

curl -X POST http://localhost:8888/notify \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Hello",
    "message": "This is a test notification",
    "voice_id": "1",
    "volume": 1.0
  }'

Request:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| title | string | No | Notification title (default: "Notification") |
| message | string | Yes | Text to speak |
| voice_id | string | No | Numeric voice ID 1-41 (default: "1") |
| voice_settings | object | No | Voice configuration |
| volume | number | No | Volume 0.0-1.0 (default: 1.0) |
| voice_enabled | boolean | No | Enable TTS (default: true) |

Response:

{
  "status": "success",
  "message": "Notification sent"
}
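The request fields above can also be checked on the client side before they are sent. The following is a minimal sketch, not the project's own client: `NotifyRequest`, `buildNotifyRequest`, `sendNotification`, and the `baseUrl` default are hypothetical names chosen for illustration, and the server's own validation may differ.

```typescript
// Hypothetical client-side type; field names follow the request table above.
interface NotifyRequest {
  message: string;
  title?: string;
  voice_id?: string;
  volume?: number;
  voice_enabled?: boolean;
}

// Validate a payload locally before sending it. The checks mirror the
// documented ranges (voice_id 1-41, volume 0.0-1.0).
function buildNotifyRequest(req: NotifyRequest): NotifyRequest {
  if (req.message.trim() === "") {
    throw new Error("message is required");
  }
  if (req.voice_id !== undefined) {
    const id = Number(req.voice_id);
    if (!Number.isInteger(id) || id < 1 || id > 41) {
      throw new Error(`voice_id must be "1" to "41", got "${req.voice_id}"`);
    }
  }
  if (req.volume !== undefined && !(req.volume >= 0 && req.volume <= 1)) {
    throw new Error(`volume must be between 0.0 and 1.0, got ${req.volume}`);
  }
  return { ...req };
}

// Send the payload to a running server. The default baseUrl is an assumption
// that matches the port used in the curl examples.
async function sendNotification(
  req: NotifyRequest,
  baseUrl = "http://localhost:8888",
): Promise<unknown> {
  const response = await fetch(`${baseUrl}/notify`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildNotifyRequest(req)),
  });
  if (!response.ok) {
    throw new Error(`notify failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Keeping validation in a separate pure function means it can be exercised without a running server; only `sendNotification` touches the network.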

POST /pai

PAI-specific notification endpoint with default voice settings.

curl -X POST http://localhost:8888/pai \
  -H "Content-Type: application/json" \
  -d '{
    "title": "PAI Alert",
    "message": "Task completed successfully"
  }'

GET /health

Health check endpoint with server status.

curl http://localhost:8888/health

Response:

{
  "status": "healthy",
  "port": 8888,
  "voice_system": "Kokoro-82M",
  "default_voice_id": "1",
  "model_loaded": true,
  "available_voices": ["1", "2", "3", "..."]
}
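A script that waits for the server can parse this response and decide whether the model is ready. This is a sketch under the assumption that the response always has the shape shown above; `HealthResponse`, `isHealthResponse`, and `isReady` are illustrative names, not part of the package.

```typescript
// Shape of the documented /health response. Treat it as an assumption:
// the server may add or omit fields.
interface HealthResponse {
  status: string;
  port: number;
  voice_system: string;
  default_voice_id: string;
  model_loaded: boolean;
  available_voices: string[];
}

// Narrow an unknown JSON value to HealthResponse by checking each field's type.
function isHealthResponse(value: unknown): value is HealthResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.status === "string" &&
    typeof v.port === "number" &&
    typeof v.voice_system === "string" &&
    typeof v.default_voice_id === "string" &&
    typeof v.model_loaded === "boolean" &&
    Array.isArray(v.available_voices)
  );
}

// Ready means the server reports "healthy" and the model has finished loading.
function isReady(value: unknown): boolean {
  return isHealthResponse(value) && value.status === "healthy" && value.model_loaded;
}
```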

Configuration

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| PORT | 8888 | Server port |
| DEFAULT_VOICE_ID | 1 | Default voice ID (1-41 for Kokoro) |
| MLX_MODEL | mlx-community/Kokoro-82M-bf16 | MLX model to use |
| MLX_STREAMING_INTERVAL | 0.3 | Streaming chunk size (seconds) |
| ENABLE_MACOS_NOTIFICATIONS | true | Enable macOS notifications |
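To illustrate how these variables and defaults fit together, here is a small sketch of a config loader. It is not the server's actual parsing code: `ServerConfig` and `loadConfig` are hypothetical, and the rule that only the literal string "false" disables notifications is an assumption.

```typescript
interface ServerConfig {
  port: number;
  defaultVoiceId: string;
  mlxModel: string;
  streamingInterval: number;
  macosNotifications: boolean;
}

// Read settings from an environment-like object, falling back to the
// documented defaults when a variable is unset.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const port = Number(env.PORT ?? "8888");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid PORT: ${env.PORT}`);
  }
  const streamingInterval = Number(env.MLX_STREAMING_INTERVAL ?? "0.3");
  if (!(streamingInterval > 0)) {
    throw new Error(`invalid MLX_STREAMING_INTERVAL: ${env.MLX_STREAMING_INTERVAL}`);
  }
  return {
    port,
    defaultVoiceId: env.DEFAULT_VOICE_ID ?? "1",
    mlxModel: env.MLX_MODEL ?? "mlx-community/Kokoro-82M-bf16",
    streamingInterval,
    // Assumption: any value other than "false" keeps notifications enabled.
    macosNotifications: (env.ENABLE_MACOS_NOTIFICATIONS ?? "true") !== "false",
  };
}
```

Under Bun or Node this would typically be called as `loadConfig(process.env)`.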

Voice Configuration

Voices are configured in docs/agent-voices.md:

## marrvin

**Description:** Default voice for DAIV

**Prosody:** speak with moderate consistency, speak in a neutral tone, speak at a normal pace

**Speed:** 1.0
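The entry format above (a `##` heading followed by `**Key:** value` lines) is simple enough to read with a few regular expressions. The sketch below shows one way to do that; it is an assumption about the format based only on this example, and `VoiceConfig` and `parseVoiceConfigs` are hypothetical names rather than the server's parser.

```typescript
interface VoiceConfig {
  name: string;
  // Keys are lowercased field labels, e.g. "description", "prosody", "speed".
  fields: Record<string, string>;
}

// Parse "## name" headings and the "**Key:** value" lines that follow them.
function parseVoiceConfigs(markdown: string): VoiceConfig[] {
  const voices: VoiceConfig[] = [];
  for (const rawLine of markdown.split("\n")) {
    const line = rawLine.trim();
    const heading = /^##\s+(.+)$/.exec(line);
    if (heading) {
      voices.push({ name: heading[1], fields: {} });
      continue;
    }
    const field = /^\*\*(.+?):\*\*\s*(.*)$/.exec(line);
    const current = voices[voices.length - 1];
    // Field lines that appear before any heading are ignored.
    if (field && current) {
      current.fields[field[1].toLowerCase()] = field[2];
    }
  }
  return voices;
}
```

Values are kept as strings here, so a caller that needs a numeric speed would convert `fields.speed` itself.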

Pronunciation Rules

Add custom pronunciations in ~/.claude/pronunciations.json:

{
  "DAIV": "DAY-vee",
  "PAI": "PIE",
  "Kokoro": "ko-KO-ro"
}
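One plausible way such a map could be applied is a whole-word substitution on the text before it reaches the TTS model. The sketch below assumes case-sensitive, whole-word matching; how the server actually applies these rules is not documented here, and `applyPronunciations` is a hypothetical helper.

```typescript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace each whole-word occurrence of a term with its phonetic spelling.
// Rules are applied one after another in the object's key order.
function applyPronunciations(text: string, rules: Record<string, string>): string {
  let result = text;
  for (const [term, spoken] of Object.entries(rules)) {
    const pattern = new RegExp(`\\b${escapeRegExp(term)}\\b`, "g");
    // A replacer function avoids special handling of "$" in the replacement.
    result = result.replace(pattern, () => spoken);
  }
  return result;
}
```

Because rules run sequentially, the output of one rule could be matched by a later one; that does not happen with the three example entries above.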

Project Structure

.
├── src/
│   └── ts/                    # TypeScript main server
│       ├── models/            # Type definitions
│       ├── services/          # Business logic
│       ├── utils/             # Utilities
│       ├── middleware/        # HTTP middleware
│       └── server.ts          # Main server
├── tests/
│   └── ts/                    # TypeScript tests
├── scripts/                   # Utility scripts
│   └── generate-reference.ts  # ElevenLabs reference generator
├── docs/
│   ├── VOICE_GUIDE.md         # User voice configuration guide
│   ├── VOICE_QUICK_REF.md     # Quick reference for all 41 voices
│   ├── KOKORO_VOICES.md       # Technical voice documentation
│   └── MIGRATION.md           # ElevenLabs migration guide
├── specs/                     # Feature specifications
└── AGENTPERSONALITIES.md      # Voice configurations

Development

# Install all dependencies
bun install              # TypeScript/Bun dependencies
pipx install mlx-audio  # MLX-audio backend

# Run development server
# Production: PORT=8888
# Development (to avoid conflict): NODE_ENV=development PORT=8889
PORT=8888 bun run dev

# Run tests
bun test

# Type checking
bun run typecheck

# Linting
bun run lint

# Build for production
bun run build

Available Voices

The server includes 41 built-in Kokoro voices accessible via numeric IDs:

Popular Voices

| ID | Voice | Description |
|----|-------|-------------|
| 1 | af_heart | Warm, friendly (default) |
| 4 | af_sky | Bright, energetic |
| 12 | am_michael | Professional male |
| 13 | am_adam | Youthful, energetic |
| 21 | bf_emma | Sophisticated British |

Quick Reference

| Category | IDs | Examples |
|----------|-----|----------|
| American Female | 1-11 | af_heart, af_sky, af_bella |
| American Male | 12-20 | am_michael, am_adam, am_eric |
| British Female | 21-24 | bf_emma, bf_isabella |
| British Male | 25-28 | bm_george, bm_lewis |
| Japanese | 29-33 | jf_alpha, jm_kumo |
| Chinese | 34-41 | zf_xiaoxiao, zm_yunjian |

See docs/VOICE_QUICK_REF.md for complete voice listings.
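The ID ranges in the quick reference can be turned into a small lookup, which may be handy when choosing a voice programmatically. This is an illustrative sketch; `VOICE_CATEGORIES` and `voiceCategory` are hypothetical names, and the ranges are copied from the table above rather than read from the server.

```typescript
// ID ranges copied from the quick reference table.
const VOICE_CATEGORIES = [
  { name: "American Female", first: 1, last: 11 },
  { name: "American Male", first: 12, last: 20 },
  { name: "British Female", first: 21, last: 24 },
  { name: "British Male", first: 25, last: 28 },
  { name: "Japanese", first: 29, last: 33 },
  { name: "Chinese", first: 34, last: 41 },
] as const;

// Return the category for a numeric voice ID string, or undefined when the
// ID is not an integer in the range 1-41.
function voiceCategory(voiceId: string): string | undefined {
  const id = Number(voiceId);
  if (!Number.isInteger(id)) return undefined;
  return VOICE_CATEGORIES.find((c) => id >= c.first && id <= c.last)?.name;
}
```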

Scripts

Generate Reference Voice

Generate a reference audio file using ElevenLabs:

ELEVENLABS_API_KEY=your-key bun scripts/generate-reference.ts <voice_id>

Creates ~/.claude/voices/<voice_id>.reference.wav.

TTS Backend

MLX-audio

Fast local TTS optimized for Apple Silicon using the Kokoro-82M model.

# Install MLX-audio
pipx install mlx-audio

# Run server (production port 8888)
PORT=8888 bun run dev

# Development mode (port 8889 to avoid conflicts)
NODE_ENV=development PORT=8889 bun run dev

Features:

  • 41 built-in voices
  • Near real-time streaming on Apple Silicon (~1.0x RTF)
  • Supports American English, British English, Japanese, and Chinese

Model Testing

Kokoro (Current)

The Kokoro-82M model via MLX-audio delivers near real-time performance on Apple Silicon:

  • Real-Time Factor (RTF): ~1.0x (audio generates as fast as it plays)
  • Low latency: < 500ms to first audio chunk
  • Smooth streaming with no buffering delays
  • Good voice quality for most use cases
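For readers comparing their own measurements, the real-time factor can be computed from two timings. The sketch below assumes the common definition (generation time divided by audio duration), which matches the description above where ~1.0x means audio is generated as fast as it plays; the function names are illustrative.

```typescript
// Real-time factor: seconds spent generating divided by seconds of audio produced.
// Values at or below 1.0 mean generation keeps up with playback.
function realTimeFactor(generationSeconds: number, audioSeconds: number): number {
  if (!(audioSeconds > 0)) {
    throw new Error("audioSeconds must be positive");
  }
  return generationSeconds / audioSeconds;
}

// True when audio is produced at least as fast as it is played back.
function keepsUpWithPlayback(generationSeconds: number, audioSeconds: number): boolean {
  return realTimeFactor(generationSeconds, audioSeconds) <= 1.0;
}
```

Some tools report the inverse ratio, so it is worth checking which convention a benchmark uses before comparing numbers.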

qwen-tts (Alternative MLX-audio Model)

qwen-tts is a supported MLX-audio model that was tested alongside Kokoro. It exhibited the following issues:

  • Delay: Significant latency before audio playback (6-15 seconds)
  • Quality: Inconsistent audio quality with stuttering, artifacts, and streaming issues
  • Streaming: Intermittent playback and buffering problems

Like Kokoro, qwen-tts is self-contained (no external API dependencies). Despite the issues listed above, qwen-tts models offer greater flexibility for custom voice cloning and fine-tuning, and we remain interested in exploring them for specialized use cases.

Community Contributions

We welcome additional model testing results and comparisons from the community. If you have experience with any of the following, please share your findings via issues or pull requests:

  • Other TTS backends on Apple Silicon
  • qwen-tts improvements or configurations
  • Alternative local TTS solutions

Architecture

The server uses a modular TypeScript architecture:

  1. TypeScript Main Server (Bun)

    • HTTP API endpoints (/notify, /pai, /health)
    • Request routing and validation
    • Voice configuration management
    • macOS notification integration
  2. MLX-audio Backend

    • Direct CLI integration for Apple Silicon
    • Kokoro-82M model for high-quality TTS
    • 41 built-in voices across multiple languages
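As a rough illustration of the routing layer described above, the three documented endpoints can be matched with a single pure function. This is a hypothetical sketch, not the code in src/ts/server.ts; `Route` and `matchRoute` are invented for the example.

```typescript
type Route = "notify" | "pai" | "health";

// Map an HTTP method and pathname to one of the documented endpoints.
// Returns undefined for anything else, which a server could answer with 404.
function matchRoute(method: string, pathname: string): Route | undefined {
  switch (`${method.toUpperCase()} ${pathname}`) {
    case "POST /notify":
      return "notify";
    case "POST /pai":
      return "pai";
    case "GET /health":
      return "health";
    default:
      return undefined;
  }
}
```

Separating route matching from request handling keeps the HTTP layer thin and makes the mapping easy to test on its own.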

Migration from ElevenLabs

See docs/MIGRATION.md for detailed migration instructions.

TL;DR: No code changes required - the API is drop-in compatible.

Troubleshooting

Server won't start

# Check if port is in use
lsof -i :8888

# Kill existing process
pkill -f "bun run src/ts/server.ts"

# Check Bun is installed
bun --version

# Verify dependencies
bun install

MLX-audio backend issues

# Check MLX-audio is installed
pipx list

# Reinstall MLX-audio if needed
pipx uninstall mlx-audio
pipx install mlx-audio

# Verify MLX-audio works directly
mlx-audio --help

# Check Apple Silicon compatibility
uname -m  # Should show arm64

Audio not playing

# Test afplay directly
afplay /System/Library/Sounds/Ping.aiff

# Check system volume
osascript -e 'get volume settings'

# Verify ffmpeg is installed
ffmpeg -version

License

MIT

Attribution

This server uses MLX-audio with the Kokoro-82M model for high-quality text-to-speech.