pi-vox
v0.1.0
Lightweight voice dictation for Pi: /voice-toggle records locally and transcribes with ElevenLabs.
Voice input for the pi coding agent.
It records your microphone, sends the audio to ElevenLabs speech-to-text, and puts the transcript into the current pi input box.
ElevenLabs is the only speech provider right now. Their free tier is generous enough for normal testing and light use.
Install
From npm:
pi install npm:pi-vox
Or from GitHub:
pi install https://github.com/denismrvoljak/pi-vox
Setup
1. Add your ElevenLabs API key
Create an API key in ElevenLabs, then set it before starting pi:
export ELEVENLABS_API_KEY="your-key-here"
You can also put it in a .env file in the directory where you launch pi:
ELEVENLABS_API_KEY=your-key-here
pi-vox redacts this key from status messages and common error output.
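As a rough illustration of what redacting a key from a message can look like, here is a minimal sketch. The helper name `redactSecret` and the `[redacted]` placeholder are assumptions made for this example; the README does not show pi-vox's actual redaction code.

```typescript
// Hypothetical helper: replace every occurrence of a secret in a message.
// This is an illustration, not pi-vox's real redaction logic.
function redactSecret(message: string, secret: string | undefined): string {
  // Nothing to redact when the key is unset or empty.
  if (!secret) return message;
  // split/join replaces all literal occurrences without regex escaping.
  return message.split(secret).join("[redacted]");
}

const apiKey = "sk_example_123"; // placeholder value, not a real key
const errorText = `401 Unauthorized for key ${apiKey}`;
console.log(redactSecret(errorText, apiKey)); // prints "401 Unauthorized for key [redacted]"
```

Matching the literal key string is simple, but it only catches the key when it appears verbatim, so encoded or truncated copies would slip through.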
2. Install a recorder
pi-vox needs a local command-line recorder. On macOS, install one of these:
brew install sox
or:
brew install ffmpeg
If you install sox, pi-vox can use rec or sox. If you install ffmpeg, it can use ffmpeg.
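Choosing between rec, sox, and ffmpeg could be sketched as below. The preference order, the `pickRecorder` function, and the `isAvailable` probe are all assumptions for illustration; the README names the three recorders and an `auto` setting but not how pi-vox actually picks one.

```typescript
type Recorder = "rec" | "sox" | "ffmpeg";

// Assumed preference order; the README does not state a priority.
const CANDIDATES: Recorder[] = ["rec", "sox", "ffmpeg"];

// `isAvailable` is a hypothetical probe, e.g. a PATH lookup for the binary.
function pickRecorder(
  isAvailable: (name: Recorder) => boolean,
  preferred: Recorder | "auto" = "auto",
): Recorder | undefined {
  // An explicit choice wins, but only if that binary is actually present.
  if (preferred !== "auto") {
    return isAvailable(preferred) ? preferred : undefined;
  }
  // Otherwise take the first candidate the probe reports as installed.
  return CANDIDATES.find((name) => isAvailable(name));
}

// Example: a machine with only ffmpeg installed.
const installed = new Set<Recorder>(["ffmpeg"]);
console.log(pickRecorder((name) => installed.has(name))); // prints "ffmpeg"
```

Passing the probe in as a function keeps the selection logic testable without touching the real filesystem.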
3. Reload pi
Inside pi:
/reload
Then check that everything is connected:
/voice-status
You should see something like:
Voice input: version=..., provider=elevenlabs, key=configured, autoSubmit=off, cleanup=on, audio=rec/sox/ffmpeg
How to use it
Start recording:
/voice-toggle
Speak your prompt.
Stop recording and insert the transcript:
/voice-toggle
Cancel the recording:
/voice-cancel
That's the main workflow.
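The toggle and cancel workflow above behaves like a small state machine. The sketch below models it with hypothetical names (`VoiceState`, `VoiceEvent`, `nextState`); the `transcribing` state and the choice to ignore toggles while transcribing are assumptions, not documented pi-vox behavior.

```typescript
type VoiceState = "idle" | "recording" | "transcribing";
type VoiceEvent = "toggle" | "cancel" | "transcribed";

// Pure transition function: given a state and an event, return the next state.
function nextState(state: VoiceState, event: VoiceEvent): VoiceState {
  switch (state) {
    case "idle":
      // Toggling while idle starts a recording; other events are ignored.
      return event === "toggle" ? "recording" : "idle";
    case "recording":
      if (event === "toggle") return "transcribing"; // stop and transcribe
      if (event === "cancel") return "idle"; // discard the recording
      return "recording";
    case "transcribing":
      // Assumption: input is ignored until the transcript arrives.
      return event === "transcribed" ? "idle" : "transcribing";
  }
}

// Walk through one full dictation: start, stop, transcript inserted.
let current: VoiceState = "idle";
const events: VoiceEvent[] = ["toggle", "toggle", "transcribed"];
for (const step of events) {
  current = nextState(current, step);
  console.log(step, "->", current);
}
```

Keeping the transitions in one pure function makes it easy to see that a cancel only has an effect while a recording is in progress.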
Commands
/voice-toggle
Starts recording when idle. Stops recording when active, transcribes, and inserts the text into the editor.
/voice-cancel
Stops the current recording and deletes the temporary audio file.
/voice-status
Shows whether the API key is configured, which recorder is available, and a few current settings.
/voice-glossary
Adds custom cleanup rules for words speech-to-text gets wrong.
/voice-glossary list
/voice-glossary add pi-vox pyvox "bye vox"
/voice-glossary add pi-coding-agent pycodingagent "bye coding agent"
/voice-glossary clear
Settings are saved here:
~/.pi/pi-vox/config.json
You can use another config file with:
export PI_VOX_CONFIG=/path/to/config.json
Transcript cleanup
Speech-to-text often gets project names wrong, so pi-vox cleans up common mistakes before inserting the text.
Examples:
- py-coding agent → pi-coding-agent
- pie coding agent → pi-coding-agent
- bye coding agent → pi-coding-agent
- pyvox → pi-vox
- pytutor → pi-tutor
- pyoverwatch → pi-overwatch
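One way such alias-to-canonical cleanup might work is a case-insensitive, whole-word replacement. This is a minimal sketch under that assumption: `applyGlossary` and `escapeRegExp` are hypothetical names, and only the entry shape (`canonical` plus `aliases`) is taken from the config format in this README. pi-vox's real matching rules may differ.

```typescript
interface GlossaryEntry {
  canonical: string;
  aliases: string[];
}

// Escape regex metacharacters so aliases are matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace each alias with its canonical form, case-insensitively.
function applyGlossary(transcript: string, glossary: GlossaryEntry[]): string {
  let result = transcript;
  for (const { canonical, aliases } of glossary) {
    // Try longer aliases first so a short alias cannot split a longer one.
    const ordered = [...aliases].sort((a, b) => b.length - a.length);
    for (const alias of ordered) {
      // Lookarounds stop matches inside larger words or hyphenated names.
      const pattern = new RegExp(
        `(?<![\\w-])${escapeRegExp(alias)}(?![\\w-])`,
        "gi",
      );
      // A replacer function keeps "$" in canonical names from being special.
      result = result.replace(pattern, () => canonical);
    }
  }
  return result;
}

const glossary: GlossaryEntry[] = [
  { canonical: "pi-vox", aliases: ["pyvox", "bye vox"] },
];
console.log(applyGlossary("Install Pyvox and run bye vox", glossary));
// prints "Install pi-vox and run pi-vox"
```

The word-boundary lookarounds are a design choice for this sketch: without them, an alias like "pyvox" would also rewrite the middle of an unrelated word.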
You can add your own glossary entries in config:
{
  "transcriptGlossary": [
    { "canonical": "my-product", "aliases": ["my product", "mai product"] }
  ]
}
Or use the command:
/voice-glossary add my-product "my product" "mai product"
To turn cleanup off:
{
  "transcriptCleanup": false
}
Why it uses commands instead of hold-space
Some terminals handle key press/release events differently. Holding space can be unreliable, and it can interfere with normal typing.
So the default is simple and safe:
/voice-toggle
There is internal support for shortcuts and hold-to-talk, but the command workflow is the supported default.
Config defaults
{
  provider: 'elevenlabs',
  holdKey: 'space',
  holdToTalk: false,
  holdThresholdMs: 350,
  fallbackToggleShortcut: 'ctrl+v',
  cancelShortcut: 'escape',
  autoSubmit: false,
  appendMode: 'append',
  recorder: 'auto',
  transcriptCleanup: true,
  transcriptGlossary: undefined,
  transcriptReplacements: undefined
}
Privacy notes
When you stop recording, pi-vox sends that audio to ElevenLabs for transcription.
It does not keep a recording history. Temporary audio files are deleted after transcription or cancellation.
Still, don't dictate secrets into any networked voice tool.
Development
pnpm install
pnpm test
pnpm check
pnpm pack:smoke
Local install while developing:
pi install /absolute/path/to/pi-vox
Inside pi:
/reload
/voice-status
/voice-toggle
Known limitations
- ElevenLabs is the only provider right now
- command-based toggle is the supported path
- hold-space is disabled by default
- no streaming partial transcripts yet
- no text-to-speech, wake word, or daemon
License
MIT
