@juicesharp/rpiv-voice

v1.20.0

Published

a month ago

Pi extension. Voice dictation via /voice — local on-device STT with sherpa-onnx Whisper (base multilingual int8), microphone capture via decibri.

Downloads

2,954

0High
0Medium
0Low

juicesharp

pi-package pi-extension rpiv voice dictation stt sherpa-onnx overlay

rpiv-voice

Talk to Pi Agent instead of typing. rpiv-voice adds the /voice slash command — open the overlay, speak, hit Enter, and your transcript drops straight into Pi's editor. Speech-to-text runs entirely on your machine via sherpa-onnx Whisper (base multilingual int8). No cloud, no API keys, no telemetry.

Voice dictation overlay above the Pi editor

Features

100% on-device — audio never leaves your laptop. No accounts, no API keys, no network calls after the first model download.
~99 languages, autodetected — Whisper base multilingual handles the full Whisper language set with per-utterance autodetection. Switch languages mid-session without changing settings.
Live transcript — committed lines render as you finish phrases, with a dim rolling partial showing the still-active utterance in real time. What you see is what gets pasted (no waiting for a "proper" final).
VAD-driven chunking — Silero voice-activity detection breaks long monologues at natural pauses, so latency stays bounded even on a 5-minute rant.
Settings screen built-in — Tab flips to a settings panel showing your active mic, detected language, and a hallucination filter toggle. Ctrl-S to save, Esc or Tab to return to dictation.
Whisper hallucination filter — strips spurious "Thanks for watching", "[Music]", and repeating-token loops that Whisper sometimes emits on silence. Toggle off if you're dictating short single words.
Pause / resume — hit Space to mute the mic without closing the overlay; great for stepping aside mid-thought.
Localized UI — overlay, status bar, and settings render in German, English, Spanish, French, Portuguese (European + Brazilian), Russian, and Ukrainian when @juicesharp/rpiv-i18n is installed. Falls back to English when it isn't.
Honest first-run UX — the splash overlay shows download progress (percent + bytes), then Extracting…, Verifying…, Loading engine…, Initializing mic… before the dictation overlay opens. Half-loaded states never reach you.
Configurable cancel keybinding — bind cancel to whatever your fingers prefer; no longer hardcoded to Esc.
Errors persisted, not swallowed — recognition failures land in ~/.config/rpiv-voice/errors.log so you can see why a phrase didn't transcribe.

Install

rpiv-voice is opt-in — it's not part of /rpiv-setup because the native deps (sherpa-onnx, decibri) are heavyweight. Install it directly:

pi install npm:@juicesharp/rpiv-voice

Then restart your Pi session.

Optional: localized UI

Install @juicesharp/rpiv-i18n alongside it to flip the overlay, status bar, and settings strings to your active locale:

pi install npm:@juicesharp/rpiv-i18n

/languages switches the locale live — no restart.

Usage

Type /voice in Pi's input — the overlay opens with a recording glyph, a session timer, and Listening….

| Key | Action | |---|---| | (speak) | Equalizer animates; transcript fills in live as Whisper decodes | | Enter | Close overlay, paste transcript into the Pi editor | | Esc | Close overlay, paste nothing (configurable — see below) | | Space | Pause / resume the mic | | Tab | Flip between dictation and settings screens | | Ctrl-S (in Settings) | Save settings to disk |

The dim trailing text after the committed transcript is the rolling partial — it's already part of what will paste, so you can hit Enter the moment you're done.

First run

The first time you run /voice, the splash overlay downloads the Whisper base multilingual model (~198 MB compressed, ~157 MB on disk) into ~/.pi/models/whisper-base/. Subsequent runs load directly from disk in under a second. If a previous download was interrupted, the stale model directory is detected and re-downloaded automatically.

Configuration

rpiv-voice works without any config file. To customize, drop a JSON file at ~/.config/rpiv-voice/voice.json:

{
  "hallucinationFilterEnabled": false
}

| Field | Default | Effect | |---|---|---| | hallucinationFilterEnabled | true | When false, keeps Whisper's "Thanks for watching" / "[Music]" / repeating-token loops. Useful when dictating short single words that the filter might mistake for noise. |

You can also flip the toggle interactively from the Settings screen (Tab from dictation, Ctrl-S to save).

The microphone is the OS default input — rpiv-voice does not expose device selection. The bundled Whisper base multilingual model is loaded from ~/.pi/models/whisper-base/; alternative models aren't supported today.

Privacy

No cloud STT. Audio is decoded on your CPU via sherpa-onnx; nothing leaves the machine.
No telemetry. No usage events, no crash reports, no install pings. Errors are written to a local log only.
No API keys. Nothing to provision, nothing to revoke.
Network only on first run — to download the model. After that, /voice works offline.

Requirements

Pi Agent CLI
A working microphone reachable by decibri (mic permission granted to your terminal on macOS)
~200 MB free disk under ~/.pi/models/whisper-base/
Network access on first run only

Troubleshooting

"Microphone init failed" on the splash — grant your terminal app microphone access (System Settings → Privacy & Security → Microphone on macOS), then re-run /voice.
Transcript looks like "Thanks for watching" — Whisper hallucinated on near-silence; either speak louder/closer, or leave the hallucination filter on (the default).
/voice not found — restart your Pi session after install. If it's still missing, confirm the entry exists in ~/.pi/agent/settings.json.
Recognition errors — check ~/.config/rpiv-voice/errors.log for the underlying sherpa-onnx error.

Related packages

@juicesharp/rpiv-i18n — localizes the /voice overlay UI.
@juicesharp/rpiv-pi — umbrella + /rpiv-setup for the rest of the rpiv-* family.

License

MIT — see LICENSE.