pi-voice

v0.4.0

Published

7 days ago

Voice interface for pi coding agent


pi-voice

Headless voice interface for the Pi Coding Agent. Hold a key, speak, and pi executes your instructions with voice feedback.

Demo using the ElevenLabs provider (make sure your sound is on)

https://github.com/user-attachments/assets/76adb941-83cf-4394-b8d2-f6d73a1df8bc

Installation

npm i -g pi-voice
# or
bun i -g pi-voice

Usage

pi-voice runs as a background daemon. Once it is started, you talk to the agent using push-to-talk.

pi-voice start    # start the daemon in the background
pi-voice status   # show state, PID, and uptime
pi-voice stop     # stop the daemon

The push-to-talk shortcut defaults to Cmd+Shift+I (macOS) / Win+Shift+I (Windows). Hold the shortcut to record, then release it to send.

Settings

pi agent configuration

pi-voice launches a Pi agent session in the directory where pi-voice start was run, so that directory becomes the session's working directory. This means all standard pi configuration works as-is:

AGENTS.md — walked up from cwd to the filesystem root
.pi/settings.json — project-level settings
.pi/skills/, .pi/extensions/, .pi/prompts/ — project-level resources
~/.pi/agent/ — global settings, skills, extensions, prompts, and models
and more

Refer to the Pi documentation for details on these settings.
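To illustrate what "walked up from cwd to the filesystem root" means for AGENTS.md, here is a minimal sketch that lists the paths such a walk would visit. The function name agentFileCandidates is hypothetical and is not part of pi or pi-voice. How pi orders or merges the files it finds is not covered here.

```typescript
import { dirname, join, resolve } from "node:path";

// Hypothetical helper: lists every AGENTS.md location between `cwd`
// and the filesystem root, nearest directory first.
function agentFileCandidates(cwd: string): string[] {
  const candidates: string[] = [];
  let dir = resolve(cwd);
  while (true) {
    candidates.push(join(dir, "AGENTS.md"));
    const parent = dirname(dir);
    // dirname() of the root returns the root itself, which ends the walk.
    if (parent === dir) break;
    dir = parent;
  }
  return candidates;
}
```

Since the session uses the directory where pi-voice start was run, that directory is the starting point of the walk.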

pi-voice configuration

You can configure pi-voice in .pi/pi-voice.json:

{
  "key": "ctrl+t",
  "provider": "local"
}

| Key | Description |
| --- | --- |
| key | Push-to-talk shortcut. Combine modifiers (ctrl, shift, alt/opt, meta/cmd) and a main key with +. Examples: "ctrl+t", "alt+space", "ctrl+shift+r". Default: "meta+shift+i". |
| provider | Speech provider for STT & TTS. "local", "gemini" (Vertex AI or Gemini API), "openai", or "elevenlabs". Default: "local". |
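The shortcut format for key can be read as zero or more modifiers plus exactly one main key, joined with +. The sketch below shows one way such a string could be parsed. It is an illustration only: parseShortcut, the Shortcut type, and the validation rules (lowercasing, rejecting a second main key) are assumptions, not pi-voice's actual parser.

```typescript
// Hypothetical parser for the documented "modifier+...+key" format.
type Modifier = "ctrl" | "shift" | "alt" | "meta";

interface Shortcut {
  modifiers: Modifier[];
  key: string;
}

// Aliases taken from the table: alt/opt and meta/cmd name the same modifier.
const MODIFIER_ALIASES = new Map<string, Modifier>([
  ["ctrl", "ctrl"],
  ["shift", "shift"],
  ["alt", "alt"],
  ["opt", "alt"],
  ["meta", "meta"],
  ["cmd", "meta"],
]);

function parseShortcut(input: string): Shortcut {
  const parts = input.toLowerCase().split("+").map((p) => p.trim());
  const modifiers: Modifier[] = [];
  let key: string | undefined;

  for (const part of parts) {
    const modifier = MODIFIER_ALIASES.get(part);
    if (modifier !== undefined) {
      // Ignore a repeated modifier instead of failing (assumed behavior).
      if (!modifiers.includes(modifier)) modifiers.push(modifier);
    } else if (part !== "" && key === undefined) {
      key = part;
    } else {
      // Empty segment or a second main key.
      throw new Error(`Invalid shortcut: "${input}"`);
    }
  }
  if (key === undefined) {
    throw new Error(`Shortcut has no main key: "${input}"`);
  }
  return { modifiers, key };
}
```

Under these assumptions, "cmd+shift+i" and "meta+shift+i" parse to the same result, which matches the alias pairs listed in the table.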

Environment variables

| Provider | Required variables |
| --- | --- |
| local | None (model is auto-downloaded on first launch). Optional: WHISPER_MODEL_PATH (custom model path), WHISPER_MODEL (model name, default medium-q5_0), SAY_VOICE (macOS say voice name, e.g. "Kyoko"). |
| gemini | Vertex AI: GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION (optional, default us-central1). Gemini API: GEMINI_API_KEY or GOOGLE_API_KEY. If GOOGLE_CLOUD_PROJECT is set, Vertex AI is used; set GOOGLE_GENAI_USE_VERTEXAI=false to force API key mode. |
| openai | OPENAI_API_KEY |
| elevenlabs | ELEVENLABS_API_KEY. Optional: ELEVENLABS_VOICE_ID (TTS voice, default CwhRBWXzGAHq8TQ4Fs17), ELEVENLABS_TTS_MODEL (default eleven_flash_v2_5). |
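The rules in this table, including the Vertex AI versus API key selection for gemini, can be sketched as a small pre-flight check. The function names geminiMode and missingVariables are hypothetical, and the exact comparison against the string "false" is an assumption; pi-voice's own checks may differ.

```typescript
type Provider = "local" | "gemini" | "openai" | "elevenlabs";
type Env = Record<string, string | undefined>;

// Treat unset and empty variables the same way (an assumption).
function isSet(env: Env, name: string): boolean {
  return (env[name] ?? "") !== "";
}

// Vertex AI is used when GOOGLE_CLOUD_PROJECT is set, unless
// GOOGLE_GENAI_USE_VERTEXAI=false forces API key mode.
function geminiMode(env: Env): "vertex" | "api-key" {
  const forcedOff = env["GOOGLE_GENAI_USE_VERTEXAI"] === "false";
  return isSet(env, "GOOGLE_CLOUD_PROJECT") && !forcedOff ? "vertex" : "api-key";
}

// Returns the required variables that are still missing for a provider.
function missingVariables(provider: Provider, env: Env): string[] {
  switch (provider) {
    case "local":
      return []; // no required variables
    case "openai":
      return isSet(env, "OPENAI_API_KEY") ? [] : ["OPENAI_API_KEY"];
    case "elevenlabs":
      return isSet(env, "ELEVENLABS_API_KEY") ? [] : ["ELEVENLABS_API_KEY"];
    case "gemini": {
      if (geminiMode(env) === "vertex") return [];
      const hasKey = isSet(env, "GEMINI_API_KEY") || isSet(env, "GOOGLE_API_KEY");
      return hasKey ? [] : ["GEMINI_API_KEY or GOOGLE_API_KEY"];
    }
  }
}
```

In a Node or Bun script, passing process.env as the env argument would report what still needs to be exported before running pi-voice start.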

Logging

The daemon writes structured JSON logs to both the console and a log file. The default log file path is $XDG_CONFIG_HOME/pi-voice/daemon.log (falls back to ~/.config/pi-voice/daemon.log).

To override the log file path:

export PI_VOICE_LOG_PATH=/path/to/custom.log
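The lookup order described in this section (explicit override, then $XDG_CONFIG_HOME, then ~/.config) can be sketched as follows. This is useful if you want a script to locate the log file. resolveLogPath is a hypothetical helper, not a pi-voice API, and treating an empty variable as unset is an assumption.

```typescript
import { join } from "node:path";

type Env = Record<string, string | undefined>;

// Hypothetical helper mirroring the documented precedence:
// PI_VOICE_LOG_PATH > $XDG_CONFIG_HOME/pi-voice/daemon.log > ~/.config/pi-voice/daemon.log
function resolveLogPath(env: Env, homeDir: string): string {
  const override = env["PI_VOICE_LOG_PATH"];
  if (override) return override;

  // Empty or unset XDG_CONFIG_HOME falls back to ~/.config.
  const configHome = env["XDG_CONFIG_HOME"] || join(homeDir, ".config");
  return join(configHome, "pi-voice", "daemon.log");
}
```

A typical call would pass process.env and the result of homedir() from node:os.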

Whisper model (local provider)

The local provider uses Whisper for STT and the macOS say command for TTS. On first launch, a ggml-format Whisper model (medium-q5_0, ~514 MB) is automatically downloaded to ~/.pi-agent/whisper/ and cached for subsequent runs.

To use a different model, set WHISPER_MODEL:

export WHISPER_MODEL=base     # smaller & faster

Or point to your own model file directly:

export WHISPER_MODEL_PATH=/path/to/ggml-custom.bin

Contributing

See CONTRIBUTING.md for development setup, build commands, and release workflow.
