@juicesharp/rpiv-voice
v1.9.0
Pi extension. Voice dictation via /voice — local on-device STT with sherpa-onnx Whisper (base multilingual int8), microphone capture via decibri.
rpiv-voice
Talk to Pi Agent instead of typing. rpiv-voice adds the /voice slash command — open the overlay, speak, hit Enter, and your transcript drops straight into Pi's editor. Speech-to-text runs entirely on your machine via sherpa-onnx Whisper (base multilingual int8). No cloud, no API keys, no telemetry.

Features
- 100% on-device — audio never leaves your laptop. No accounts, no API keys, no network calls after the first model download.
- ~99 languages, autodetected — Whisper base multilingual handles the full Whisper language set with per-utterance autodetection. Switch languages mid-session without changing settings.
- Live transcript — committed lines render as you finish phrases, with a dim rolling partial showing the still-active utterance in real time. What you see is what gets pasted (no waiting for a "proper" final).
- VAD-driven chunking — Silero voice-activity detection breaks long monologues at natural pauses, so latency stays bounded even on a 5-minute rant.
- Settings screen built-in — Tab flips to a settings panel showing your active mic, detected language, and a hallucination filter toggle. Ctrl-S to save, Esc or Tab to return to dictation.
- Whisper hallucination filter — strips spurious "Thanks for watching", "[Music]", and repeating-token loops that Whisper sometimes emits on silence. Toggle off if you're dictating short single words.
- Pause / resume — hit Space to mute the mic without closing the overlay; great for stepping aside mid-thought.
- Localized UI — overlay, status bar, and settings render in German, English, Spanish, French, Portuguese (European + Brazilian), Russian, and Ukrainian when @juicesharp/rpiv-i18n is installed. Falls back to English when it isn't.
- Honest first-run UX — the splash overlay shows download progress (percent + bytes), then Extracting…, Verifying…, Loading engine…, Initializing mic… before the dictation overlay opens. Half-loaded states never reach you.
- Configurable cancel keybinding — bind cancel to whatever your fingers prefer; no longer hardcoded to Esc.
- Errors persisted, not swallowed — recognition failures land in ~/.config/rpiv-voice/errors.log so you can see why a phrase didn't transcribe.
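
As a rough illustration of the VAD-driven chunking listed above, the sketch below splits a stream of per-frame speech probabilities into utterances whenever a pause lasts long enough. It is not rpiv-voice's actual code: the name `chunkBySilence`, the 0.5 threshold, and the 3-frame minimum pause are assumptions made for the example, and a real Silero VAD integration would supply the probabilities.

```typescript
// Hypothetical sketch: split audio frames into utterances at pauses.
// `probs[i]` is an assumed per-frame speech probability in [0, 1], such as
// a VAD model might emit. Names and default values are illustrative only.
interface Segment {
  start: number; // index of the first speech frame (inclusive)
  end: number;   // index one past the last speech frame (exclusive)
}

function chunkBySilence(
  probs: number[],
  threshold: number = 0.5,
  minSilenceFrames: number = 3,
): Segment[] {
  const segments: Segment[] = [];
  let start = -1;      // -1 means "not currently inside an utterance"
  let lastSpeech = -1; // index of the most recent speech frame

  probs.forEach((p, i) => {
    if (p >= threshold) {
      if (start === -1) start = i; // a new utterance begins here
      lastSpeech = i;
    } else if (start !== -1 && i - lastSpeech >= minSilenceFrames) {
      // The pause is long enough: close the utterance at its last speech frame.
      segments.push({ start, end: lastSpeech + 1 });
      start = -1;
    }
  });

  // Flush an utterance that is still open when the audio ends.
  if (start !== -1) segments.push({ start, end: lastSpeech + 1 });
  return segments;
}
```

Because each segment closes as soon as the pause threshold is met, a decoder only ever has to process one bounded utterance at a time, which is the property that keeps latency flat on long recordings.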
Install
rpiv-voice is opt-in — it's not part of /rpiv-setup because the native deps (sherpa-onnx, decibri) are heavyweight. Install it directly:
pi install npm:@juicesharp/rpiv-voice

Then restart your Pi session.
Optional: localized UI
Install @juicesharp/rpiv-i18n alongside it to flip the overlay, status bar, and settings strings to your active locale:
pi install npm:@juicesharp/rpiv-i18n

/languages switches the locale live — no restart.
Usage
Type /voice in Pi's input — the overlay opens with a recording glyph, a session timer, and Listening….
| Key | Action |
|---|---|
| (speak) | Equalizer animates; transcript fills in live as Whisper decodes |
| Enter | Close overlay, paste transcript into the Pi editor |
| Esc | Close overlay, paste nothing (configurable — see below) |
| Space | Pause / resume the mic |
| Tab | Flip between dictation and settings screens |
| Ctrl-S (in Settings) | Save settings to disk |
The dim trailing text after the committed transcript is the rolling partial — it's already part of what will paste, so you can hit Enter the moment you're done.
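
To make the "rolling partial is already part of what will paste" behavior concrete, here is a minimal sketch of how committed lines and the partial might be joined when Enter is pressed. The `TranscriptState` shape and the `buildPasteText` function are hypothetical names for illustration; the extension's internal representation is not documented here.

```typescript
// Hypothetical transcript state: finished phrases plus the in-progress one.
interface TranscriptState {
  committed: string[]; // phrases already finalized
  partial: string;     // rolling partial for the still-active utterance
}

// Join committed phrases and the partial into the text Enter would paste.
// Empty pieces are dropped so a blank partial adds no trailing space.
function buildPasteText(state: TranscriptState): string {
  return state.committed
    .concat(state.partial)
    .map((piece) => piece.trim())
    .filter((piece) => piece.length > 0)
    .join(" ");
}
```

Under this model there is no separate "finalize" step: pressing Enter mid-utterance pastes whatever the partial holds at that moment.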
First run
The first time you run /voice, the splash overlay downloads the Whisper base multilingual model (~198 MB compressed, ~157 MB on disk) into ~/.pi/models/whisper-base/. Subsequent runs load directly from disk in under a second. If a previous download was interrupted, the stale model directory is detected and re-downloaded automatically.
Configuration
rpiv-voice works without any config file. To customize, drop a JSON file at ~/.config/rpiv-voice/voice.json:
{
  "hallucinationFilterEnabled": false
}

| Field | Default | Effect |
|---|---|---|
| hallucinationFilterEnabled | true | When false, keeps Whisper's "Thanks for watching" / "[Music]" / repeating-token loops. Useful when dictating short single words that the filter might mistake for noise. |
You can also flip the toggle interactively from the Settings screen (Tab from dictation, Ctrl-S to save).
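
The sketch below shows one way the `hallucinationFilterEnabled` field could be read with its documented default of `true` and then used to gate a simple filter. Everything except that field name is an assumption: `parseVoiceConfig`, `filterTranscript`, `collapseRepeats`, the fallback-to-defaults behavior on malformed JSON, and the cap of three repeats are illustrative choices, not the extension's actual implementation.

```typescript
// Hypothetical config shape mirroring the documented voice.json field.
interface VoiceConfig {
  hallucinationFilterEnabled: boolean;
}

const DEFAULTS: VoiceConfig = { hallucinationFilterEnabled: true };

// Parse raw file contents. Assumed behavior: fall back to defaults when the
// file is missing (undefined), malformed, or the field has the wrong type.
function parseVoiceConfig(raw: string | undefined): VoiceConfig {
  if (raw === undefined) return { ...DEFAULTS };
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return { ...DEFAULTS };
    const flag = (parsed as Record<string, unknown>).hallucinationFilterEnabled;
    return {
      hallucinationFilterEnabled:
        typeof flag === "boolean" ? flag : DEFAULTS.hallucinationFilterEnabled,
    };
  } catch {
    return { ...DEFAULTS };
  }
}

// The two phrases this README names as typical silence hallucinations.
const KNOWN_HALLUCINATIONS = ["thanks for watching", "[music]"];

// Cap any run of the same consecutive token at `maxRun` copies.
function collapseRepeats(text: string, maxRun: number = 3): string {
  const out: string[] = [];
  let run = 0;
  text.split(/\s+/).forEach((tok) => {
    const prev = out.length > 0 ? out[out.length - 1] : undefined;
    const same = prev !== undefined && prev.toLowerCase() === tok.toLowerCase();
    run = same ? run + 1 : 1;
    if (run <= maxRun) out.push(tok);
  });
  return out.join(" ");
}

// Apply the filter only when the config enables it.
function filterTranscript(text: string, config: VoiceConfig): string {
  if (!config.hallucinationFilterEnabled) return text;
  const bare = text.trim().toLowerCase().replace(/[.!?]+$/, "");
  if (KNOWN_HALLUCINATIONS.indexOf(bare) !== -1) return "";
  return collapseRepeats(text.trim());
}
```

A filter along these lines also shows why the toggle exists: a deliberately repeated word such as "go go go go" would be shortened, so turning the filter off keeps short or repetitive dictation intact.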
The microphone is the OS default input — rpiv-voice does not expose device selection. The bundled Whisper base multilingual model is loaded from ~/.pi/models/whisper-base/; alternative models aren't supported today.
Privacy
- No cloud STT. Audio is decoded on your CPU via sherpa-onnx; nothing leaves the machine.
- No telemetry. No usage events, no crash reports, no install pings. Errors are written to a local log only.
- No API keys. Nothing to provision, nothing to revoke.
- Network only on first run — to download the model. After that, /voice works offline.
Requirements
- Pi Agent CLI
- A working microphone reachable by decibri (mic permission granted to your terminal on macOS)
- ~200 MB free disk under ~/.pi/models/whisper-base/
- Network access on first run only
Troubleshooting
- "Microphone init failed" on the splash — grant your terminal app microphone access (System Settings → Privacy & Security → Microphone on macOS), then re-run /voice.
- Transcript looks like "Thanks for watching" — Whisper hallucinated on near-silence; either speak louder/closer, or leave the hallucination filter on (the default).
- /voice not found — restart your Pi session after install. If it's still missing, confirm the entry exists in ~/.pi/agent/settings.json.
- Recognition errors — check ~/.config/rpiv-voice/errors.log for the underlying sherpa-onnx error.
Related packages
- @juicesharp/rpiv-i18n — localizes the /voice overlay UI.
- @juicesharp/rpiv-pi — umbrella + /rpiv-setup for the rest of the rpiv-* family.
License
MIT — see LICENSE.
