whisperroo
v0.1.9
WhisperRoo
Push-to-talk transcription with a video-chat persona overlay so you don't look crazy talking to your computer.
Hold a hotkey, talk to a small floating "video meeting" window, release — your words are transcribed (locally, via whisper.cpp) and pasted into whichever app was focused.
This is a personal-use app for Apple Silicon macOS. It can run from npm, from source, or as a packaged local macOS .app.
Install From Terminal

    npx whisperroo

Or install it globally:

    npm install -g whisperroo
    whisperroo

The npm package includes Electron and the native whisper.cpp runtime for Apple Silicon Macs. On first launch it downloads the default large-v3-turbo model (~1.6 GB).
Requirements: macOS on Apple Silicon, Node.js/npm, internet access for the first model download, and macOS Microphone + Accessibility permissions.
Source Setup

    git clone <this repo> WhisperRoo && cd WhisperRoo
    npm install

The first time you launch WhisperRoo:

- Open the tray menu and click ❌ Microphone, then allow the macOS microphone prompt.
- Open the tray menu and click ❌ Accessibility. System Settings opens to Privacy & Security → Accessibility. Enable WhisperRoo there.
- The first transcription downloads ggml-large-v3-turbo.bin (~1.6 GB) to ~/Library/Application Support/whisperroo/models/.

Developer smoke-test scripts need ffmpeg, but normal app usage does not.
Run

    npm start

A tray icon appears (no window). Hold Right Option, speak, and release; the text appears in your focused field.
To quit: tray menu → Quit.
If the hotkey is not active, open the tray menu and look for the two permission rows: ✅ or ❌ Microphone, and ✅ or ❌ Accessibility. Click either row to open the right macOS permission flow; Accessibility also registers the current app with macOS before opening Settings.
The tray menu also keeps the last 10 successful recordings under Recent Recordings. Click any recent item to copy its transcript to the clipboard; hover over an item to preview more of the transcript. Audio .wav files are stored in ~/Library/Application Support/whisperroo/recordings/ and old entries are cleaned up automatically.
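The "last 10" behaviour can be pictured as a small bounded list. The sketch below is illustrative only: `MAX_RECENT`, `addRecent`, `menuLabel`, and the entry shape are hypothetical names, not taken from WhisperRoo's source.

```javascript
// Hypothetical sketch of a bounded "recent recordings" list.
// MAX_RECENT mirrors the limit of 10 described above; the entry shape is assumed.
const MAX_RECENT = 10;

/**
 * Return the new list with `entry` first (at most MAX_RECENT items),
 * plus the entries that fell off the end, whose .wav files could be deleted.
 */
function addRecent(recent, entry) {
  const next = [entry, ...recent];
  return {
    kept: next.slice(0, MAX_RECENT),
    evicted: next.slice(MAX_RECENT),
  };
}

/** Shorten a transcript for a menu label; the full text stays available for copying. */
function menuLabel(transcript, maxChars = 40) {
  const oneLine = transcript.replace(/\s+/g, " ").trim();
  return oneLine.length <= maxChars
    ? oneLine
    : oneLine.slice(0, maxChars - 1) + "…";
}
```

Returning the evicted entries separately keeps the list logic free of file-system calls, so cleanup of old audio can happen wherever it is convenient.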
Build A Mac App

    npm run pack:mac
    open dist/mac-arm64/WhisperRoo.app

That creates a local app bundle with the persona videos bundled in Contents/Resources/assets/personas/. Packaged builds store config and custom persona clips in:

    ~/Library/Application Support/whisperroo/

For the cleanest macOS Accessibility entry, install the app as /Applications/WhisperRoo.app and launch it from there. If macOS shows an old cached WhisperRoo entry in Accessibility, toggle that old entry off, remove it if needed, then click ❌ Accessibility from the tray and add /Applications/WhisperRoo.app.
Configuration
Edit config.json (created in the project folder on first launch):

    {
      "hotkey": "AltRight",
      "position": "top-center",
      "persona": "main-persona.mov",
      "model": "large-v3-turbo",
      "keepCameraMinutes": 0
    }

| Key | Values |
|---|---|
| hotkey | Any UiohookKey name, e.g. AltRight, Ctrl, F13, CapsLock. |
| position | top-left, top-center, top-right, center, bottom-left, or bottom-right. Switchable from the tray. |
| persona | A file in assets/personas/. Switchable from the tray. |
| model | A whisper model key — see Models below. Switchable from the tray. |
| keepCameraMinutes | Minutes to keep the video-call overlay visible after releasing the hotkey. 0 hides immediately. Switchable from the tray with presets and a custom value. |
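A hand-edited config.json can contain missing keys or typos, so it helps to see how defaults might be merged in. This is a minimal sketch under assumptions: `normalizeConfig` is a hypothetical helper, and the fallback rules (reset invalid values to the documented defaults) are a guess at reasonable behaviour, not WhisperRoo's confirmed logic.

```javascript
// Hypothetical sketch: merge a parsed config.json with defaults and reset bad values.
// DEFAULTS and POSITIONS copy the values documented above; normalizeConfig is an assumed name.
const DEFAULTS = Object.freeze({
  hotkey: "AltRight",
  position: "top-center",
  persona: "main-persona.mov",
  model: "large-v3-turbo",
  keepCameraMinutes: 0,
});

const POSITIONS = new Set([
  "top-left", "top-center", "top-right", "center", "bottom-left", "bottom-right",
]);

function normalizeConfig(raw) {
  const cfg = { ...DEFAULTS, ...(raw && typeof raw === "object" ? raw : {}) };
  // Unknown positions fall back to the default.
  if (!POSITIONS.has(cfg.position)) cfg.position = DEFAULTS.position;
  // keepCameraMinutes must be a non-negative finite number.
  const minutes = Number(cfg.keepCameraMinutes);
  cfg.keepCameraMinutes = Number.isFinite(minutes) && minutes >= 0 ? minutes : 0;
  // String-valued keys must be non-empty strings.
  for (const key of ["hotkey", "persona", "model"]) {
    if (typeof cfg[key] !== "string" || cfg[key] === "") cfg[key] = DEFAULTS[key];
  }
  return cfg;
}

// Example: a partial file with one invalid value.
const example = normalizeConfig(JSON.parse('{"position": "middle", "keepCameraMinutes": 5}'));
console.log(example.position, example.keepCameraMinutes); // prints "top-center 5"
```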
Models
Pick a model from the tray icon → Model submenu. Each entry shows its size; models that are not cached yet show (download). The default is large-v3-turbo; on first launch WhisperRoo downloads it from HuggingFace and caches it in ~/Library/Application Support/whisperroo/models/.
| Key | Size | Languages | Notes |
|---|---|---|---|
| tiny.en | 39 MB | English | Fastest, lower quality |
| base.en | 142 MB | English | Fast whisper.cpp fallback; good balance |
| small.en | 466 MB | English | Better quality |
| medium.en | 1.5 GB | English | Near-best for English |
| large-v3-turbo-q5_0 | 547 MB | Multilingual | Smaller/faster quantized turbo |
| large-v3-turbo-q8_0 | 834 MB | Multilingual | Smaller quantized turbo with higher precision |
| large-v3-turbo | 1.6 GB | Multilingual | Default; best local speed/quality balance |
| large-v3 | 3.1 GB | Multilingual | Highest local Whisper quality, slower |
Bigger models need more RAM and are slower on first use because whisper.cpp warms up Metal/CUDA. Use large-v3-turbo for the best local dictation default, large-v3 when maximum local accuracy matters more than speed, or a large-v3-turbo-q* model when you want a smaller download.
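To make the cache layout concrete, here is a small sketch that maps a model key to a file under the models folder and builds a tray label. The `ggml-<key>.bin` naming rule is an assumption generalised from the one file name this README mentions (ggml-large-v3-turbo.bin), and `modelFileName`, `modelPath`, and `modelMenuLabel` are hypothetical helpers.

```javascript
const os = require("node:os");
const path = require("node:path");

// Model keys and sizes copied from the table above.
const MODELS = new Map([
  ["tiny.en", "39 MB"],
  ["base.en", "142 MB"],
  ["small.en", "466 MB"],
  ["medium.en", "1.5 GB"],
  ["large-v3-turbo-q5_0", "547 MB"],
  ["large-v3-turbo-q8_0", "834 MB"],
  ["large-v3-turbo", "1.6 GB"],
  ["large-v3", "3.1 GB"],
]);

// Assumed naming rule: "large-v3-turbo" -> "ggml-large-v3-turbo.bin".
function modelFileName(key) {
  if (!MODELS.has(key)) throw new Error(`Unknown model key: ${key}`);
  return `ggml-${key}.bin`;
}

// Cache location as documented: ~/Library/Application Support/whisperroo/models/
function modelPath(key, home = os.homedir()) {
  return path.join(
    home, "Library", "Application Support", "whisperroo", "models",
    modelFileName(key),
  );
}

// Label with size, plus "(download)" when the file is not cached yet.
function modelMenuLabel(key, isCached) {
  const size = MODELS.get(key);
  if (size === undefined) throw new Error(`Unknown model key: ${key}`);
  return isCached ? `${key} (${size})` : `${key} (${size}) (download)`;
}
```

In a real app, `isCached` would typically come from a file-existence check on `modelPath(key)`; it is passed in here so the label logic stays easy to test.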
WhisperRoo also filters likely non-dictation captures before pasting: very low-signal clips are ignored, and short common Whisper hallucinations such as "thank you" are suppressed when the audio looks like background room noise rather than foreground speech.
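One way such a filter could work is to combine an audio level check with a short phrase list. The sketch below is an assumption-heavy illustration: the thresholds `MIN_RMS` and `NOISE_RMS`, the phrase list, and the function names are all hypothetical, since the README does not document WhisperRoo's actual values or method.

```javascript
// Hypothetical thresholds and phrases; WhisperRoo's real values are not documented here.
const MIN_RMS = 0.005;  // assumed: below this the clip is treated as silence
const NOISE_RMS = 0.02; // assumed: below this the clip looks like room noise
const HALLUCINATIONS = new Set(["thank you", "thanks for watching"]);

/** Root-mean-square level of Float32 PCM samples in the range [-1, 1]. */
function rms(samples) {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

/** Decide whether a transcript should be pasted into the focused app. */
function shouldPaste(samples, transcript) {
  const level = rms(samples);
  if (level < MIN_RMS) return false; // very low-signal clip: ignore
  // Normalise case and strip trailing punctuation before matching phrases.
  const text = transcript.trim().toLowerCase().replace(/[.!?,]+$/, "");
  if (text === "") return false;
  // Suppress a known hallucination only when the audio is also quiet.
  if (HALLUCINATIONS.has(text) && level < NOISE_RMS) return false;
  return true;
}
```

Gating the phrase check on the audio level means a deliberately spoken "thank you" at normal volume would still be pasted in this sketch.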
Personas
Tray icon → Persona submenu — pick from any .mp4, .mov, or .webm in the personas folder. The bundled default is main-persona.mov.
To use your own clips during development, drop any short looping video into assets/personas/ and pick it from the tray menu. In the packaged app, use tray menu → Reveal user personas folder and drop clips there. The app seeds that folder with the bundled clip on launch and rescans it each time the tray menu opens.
There's also a built-in canvas avatar fallback ("persona": "alex" | "sam" | "jordan") if you'd rather use a stylised, non-photo "person". It is clearly stylised, so it is only useful if no one is looking closely.
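The persona rules above (three video extensions plus three built-in avatar names) can be sketched as a small resolver. `listPersonaClips` and `resolvePersona` are hypothetical names, and both the case-insensitive extension match and the fallback to the bundled clip are assumptions rather than documented behaviour.

```javascript
const path = require("node:path");

// Extensions and avatar names as listed above.
const VIDEO_EXTENSIONS = new Set([".mp4", ".mov", ".webm"]);
const BUILT_IN_AVATARS = new Set(["alex", "sam", "jordan"]);

/** Keep only video clips from a folder listing, skipping hidden files. */
function listPersonaClips(fileNames) {
  return fileNames
    .filter(
      (name) =>
        !name.startsWith(".") &&
        VIDEO_EXTENSIONS.has(path.extname(name).toLowerCase()),
    )
    .sort();
}

/** Turn the "persona" config value into either an avatar or a video clip. */
function resolvePersona(persona, fileNames) {
  if (BUILT_IN_AVATARS.has(persona)) return { kind: "avatar", name: persona };
  if (listPersonaClips(fileNames).includes(persona)) {
    return { kind: "video", file: persona };
  }
  // Assumed fallback: the bundled default clip.
  return { kind: "video", file: "main-persona.mov" };
}
```

The folder listing would normally come from something like `fs.readdirSync` on the personas folder each time the tray menu opens.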
Troubleshooting
- Hotkey does nothing on macOS: Accessibility permission isn't granted. Open the tray menu, click ❌ Accessibility, enable WhisperRoo in System Settings, then open the tray again and confirm the row changes to ✅ Accessibility. If you do not see WhisperRoo in the app picker, install and launch /Applications/WhisperRoo.app; running with npx whisperroo can show Electron or your terminal instead because macOS sees the launched binary.
- `whisper-cli not found`: see platform setup above.
- Transcription is slow on first call: whisper.cpp loads the model and warms up Metal/CUDA on the first call. Subsequent calls are much faster.
- Spinner never stops: WhisperRoo now times out a stuck whisper process and returns to idle instead of spinning forever. Try again once the model has finished warming up.
- Pasted text appears in the overlay instead of my app: this should not happen since the overlay is `focusable: false`, but if it does, increase `PASTE_DELAY_MS` in src/main/text-injector.js.

How it works

    [Right Option held]
      └─► uiohook-napi keydown
            └─► IDLE → RECORDING
                  ├─► show overlay window (frameless, top-center)
                  └─► renderer: getUserMedia → AudioWorklet → 16 kHz Float32 PCM → IPC

    [Right Option released]
      └─► uiohook-napi keyup
            └─► RECORDING → TRANSCRIBING
                  └─► spawn `whisper-cli -m model.bin -f clip.wav -nt -np`
                        └─► → INJECTING
                              ├─► clipboard.writeText(text)
                              ├─► nut-js: Cmd/Ctrl+V
                              └─► hide overlay → IDLE

Same engine MacWhisper uses (whisper.cpp), bound via the whisper-cli binary on PATH. No cloud calls, no API keys.
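The flow above is a four-state machine, which can be sketched as a transition table. The state names and the whisper-cli arguments come from the diagram; the event names, the `transcribeFailed` path back to IDLE (inspired by the timeout note in Troubleshooting), and the helper names are assumptions for illustration.

```javascript
// States and their order come from the diagram; event names are hypothetical.
const TRANSITIONS = {
  IDLE: { hotkeyDown: "RECORDING" },
  RECORDING: { hotkeyUp: "TRANSCRIBING" },
  TRANSCRIBING: { transcriptReady: "INJECTING", transcribeFailed: "IDLE" },
  INJECTING: { pasted: "IDLE" },
};

/** Return the next state, ignoring events that do not apply in the current state. */
function nextState(state, event) {
  const table = TRANSITIONS[state];
  return table && Object.hasOwn(table, event) ? table[event] : state;
}

/**
 * Arguments as shown in the diagram: -m model file, -f input WAV,
 * -nt (no timestamps), -np (no extra prints).
 */
function whisperArgs(modelPath, wavPath) {
  return ["-m", modelPath, "-f", wavPath, "-nt", "-np"];
}

// Example: one full push-to-talk cycle ends back in IDLE.
let state = "IDLE";
for (const event of ["hotkeyDown", "hotkeyUp", "transcriptReady", "pasted"]) {
  state = nextState(state, event);
}
console.log(state); // prints "IDLE"
```

Ignoring out-of-place events (for example a second keydown while transcribing) keeps a key repeat or stray release from disturbing a cycle already in progress.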
License
MIT (this app). whisper.cpp is MIT. Persona videos: provide your own.
