whisperroo

v0.1.9

Published

22 days ago

Push-to-talk transcription with a video-chat persona overlay so you don't look crazy talking to your computer.

uplo69

WhisperRoo

Hold a hotkey, talk to a small floating "video meeting" window, release — your words are transcribed (locally, via whisper.cpp) and pasted into whichever app was focused.

This is a personal-use app for Apple Silicon macOS. It can run from npm, from source, or as a packaged local macOS .app.

Install From Terminal

npx whisperroo

Or install it globally:

npm install -g whisperroo
whisperroo

The npm package includes Electron and the native whisper.cpp runtime for Apple Silicon Macs. On first launch it downloads the default large-v3-turbo model (~1.6 GB).

Requirements: macOS on Apple Silicon, Node.js/npm, internet access for the first model download, and macOS Microphone + Accessibility permissions.

Source Setup

git clone <this repo> WhisperRoo && cd WhisperRoo
npm install

The first time you launch WhisperRoo:

Open the tray menu and click ❌ Microphone — allow the macOS microphone prompt.
Open the tray menu and click ❌ Accessibility — System Settings opens to Privacy & Security → Accessibility. Enable WhisperRoo there.
The first transcription downloads ggml-large-v3-turbo.bin (~1.6 GB) to ~/Library/Application Support/whisperroo/models/.

Developer smoke-test scripts need ffmpeg, but normal app usage does not.

Run

npm start

A tray icon appears (no window). Hold Right Option, speak, release — text appears in your focused field.

To quit: tray menu → Quit.

If the hotkey is not active, open the tray menu and look for the two permission rows: ✅ or ❌ Microphone, and ✅ or ❌ Accessibility. Click either row to open the right macOS permission flow; Accessibility also registers the current app with macOS before opening Settings.

The tray menu also keeps the last 10 successful recordings under Recent Recordings. Click any recent item to copy its transcript to the clipboard; hover over an item to preview more of the transcript. Audio .wav files are stored in ~/Library/Application Support/whisperroo/recordings/ and old entries are cleaned up automatically.
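
The "last 10" behaviour can be pictured as a small bounded list. The sketch below is illustrative only: `addRecent`, `previewLabel`, and the 40-character preview length are hypothetical names and values, not taken from the WhisperRoo source.

```javascript
// Hypothetical helper names; only the limit of 10 comes from the README.
const MAX_RECENT = 10;

// Return a new list with the newest entry first, trimmed to the limit.
function addRecent(recent, entry, limit = MAX_RECENT) {
  return [entry, ...recent].slice(0, limit);
}

// Shorten a transcript for a menu label (the 40-character cut-off is an assumption).
function previewLabel(transcript, maxChars = 40) {
  const text = transcript.trim();
  return text.length <= maxChars ? text : text.slice(0, maxChars - 1) + "…";
}
```

Entries that fall off the end of the list are the ones whose .wav files would be candidates for cleanup.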

Build A Mac App

npm run pack:mac
open dist/mac-arm64/WhisperRoo.app

That creates a local app bundle with the persona videos bundled in Contents/Resources/assets/personas/. Packaged builds store config and custom persona clips in:

~/Library/Application Support/whisperroo/

For the cleanest macOS Accessibility entry, install the app as /Applications/WhisperRoo.app and launch it from there. If macOS shows an old cached WhisperRoo entry in Accessibility, toggle that old entry off, remove it if needed, then click ❌ Accessibility from the tray and add /Applications/WhisperRoo.app.

Configuration

Edit config.json (created in the project folder on first launch):

{
  "hotkey": "AltRight",
  "position": "top-center",
  "persona": "main-persona.mov",
  "model": "large-v3-turbo",
  "keepCameraMinutes": 0
}

| Key | Values |
|---|---|
| hotkey | Any UiohookKey name, e.g. AltRight, Ctrl, F13, CapsLock. |
| position | top-left, top-center, top-right, center, bottom-left, or bottom-right. Switchable from the tray. |
| persona | A file in assets/personas/. Switchable from the tray. |
| model | A whisper model key (see Models below). Switchable from the tray. |
| keepCameraMinutes | Minutes to keep the video-call overlay visible after releasing the hotkey. 0 hides immediately. Switchable from the tray with presets and a custom value. |
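
To make the defaults and allowed values concrete, here is a minimal sketch of how a config object could be merged with defaults and sanity-checked. `normalizeConfig` and the fallback rules are assumptions for illustration; the app's real loader may behave differently.

```javascript
// Defaults copied from the example config.json above.
const DEFAULT_CONFIG = {
  hotkey: "AltRight",
  position: "top-center",
  persona: "main-persona.mov",
  model: "large-v3-turbo",
  keepCameraMinutes: 0,
};

// Positions listed in the configuration table.
const POSITIONS = [
  "top-left", "top-center", "top-right",
  "center", "bottom-left", "bottom-right",
];

// Hypothetical helper: fill in missing keys and repair invalid values.
function normalizeConfig(raw) {
  const config = { ...DEFAULT_CONFIG, ...raw };
  if (!POSITIONS.includes(config.position)) {
    config.position = DEFAULT_CONFIG.position;
  }
  const minutes = Number(config.keepCameraMinutes);
  // Assumption: negative or non-numeric values fall back to "hide immediately".
  config.keepCameraMinutes = Number.isFinite(minutes) && minutes >= 0 ? minutes : 0;
  return config;
}
```

In practice the input would come from something like `JSON.parse(fs.readFileSync("config.json", "utf8"))`.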

Models

Pick a model from the tray icon → Model submenu. Each entry shows its size; models that are not cached yet show (download). The default is large-v3-turbo; on first launch WhisperRoo downloads it from HuggingFace and caches it in ~/Library/Application Support/whisperroo/models/.

| Key | Size | Languages | Notes |
|---|---|---|---|
| tiny.en | 39 MB | English | Fastest, lower quality |
| base.en | 142 MB | English | Fast whisper.cpp fallback; good balance |
| small.en | 466 MB | English | Better quality |
| medium.en | 1.5 GB | English | Near-best for English |
| large-v3-turbo-q5_0 | 547 MB | Multilingual | Smaller/faster quantized turbo |
| large-v3-turbo-q8_0 | 834 MB | Multilingual | Smaller quantized turbo with higher precision |
| large-v3-turbo | 1.6 GB | Multilingual | Default; best local speed/quality balance |
| large-v3 | 3.1 GB | Multilingual | Highest local Whisper quality, slower |

Bigger models are slower on first use because whisper.cpp warms up Metal/CUDA, and they need more RAM. Use large-v3-turbo for the best local dictation default, large-v3 when maximum local accuracy matters more than speed, or a large-v3-turbo-q* model when you want a smaller download.
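
As a rough illustration of how a model key could map to a cached file, the sketch below builds the path under the models folder and checks whether it exists. Only ggml-large-v3-turbo.bin is named in this README; the `ggml-<key>.bin` pattern for the other keys, and the names `MODEL_SIZES`, `modelPath`, and `isCached`, are assumptions.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Sizes copied from the Models table; the object name is hypothetical.
const MODEL_SIZES = {
  "tiny.en": "39 MB",
  "base.en": "142 MB",
  "small.en": "466 MB",
  "medium.en": "1.5 GB",
  "large-v3-turbo-q5_0": "547 MB",
  "large-v3-turbo-q8_0": "834 MB",
  "large-v3-turbo": "1.6 GB",
  "large-v3": "3.1 GB",
};

// Assumed naming scheme: ggml-<key>.bin inside the models cache folder.
function modelPath(key, homeDir = os.homedir()) {
  if (!Object.hasOwn(MODEL_SIZES, key)) {
    throw new Error(`Unknown model key: ${key}`);
  }
  return path.posix.join(
    homeDir, "Library", "Application Support", "whisperroo", "models", `ggml-${key}.bin`
  );
}

// A model with no file on disk is one a menu could label "(download)".
function isCached(key, homeDir = os.homedir()) {
  return fs.existsSync(modelPath(key, homeDir));
}
```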

WhisperRoo also filters likely non-dictation captures before pasting: very low-signal clips are ignored, and short common Whisper hallucinations such as "thank you" are suppressed when the audio looks like background room noise rather than foreground speech.
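
The filtering idea can be sketched as two checks: an energy gate on the audio, then a phrase check that only applies to quiet clips. This is a guess at the general shape, not the app's actual logic. The function names, both RMS thresholds, and every phrase other than "thank you" are assumptions.

```javascript
// "thank you" comes from the README; the other phrases are illustrative guesses.
const COMMON_HALLUCINATIONS = new Set(["thank you", "thanks for watching", "bye"]);

// Root-mean-square level of Float32 samples in the range [-1, 1].
function rms(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  return Math.sqrt(sumSquares / samples.length);
}

// Hypothetical gate: returns false when the capture should not be pasted.
function shouldPaste(samples, transcript, opts = {}) {
  const { silenceRms = 0.005, noiseRms = 0.02 } = opts; // assumed thresholds
  const level = rms(samples);
  if (level < silenceRms) return false; // very low-signal clip
  const normalized = transcript.trim().toLowerCase().replace(/[.!?,]+$/, "");
  if (normalized === "") return false;
  // Suppress stock phrases only when the audio looks like room noise.
  if (level < noiseRms && COMMON_HALLUCINATIONS.has(normalized)) return false;
  return true;
}
```

Keeping the phrase check conditional on the audio level means a deliberate, clearly spoken "thank you" still gets pasted.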

Personas

Tray icon → Persona submenu — pick from any .mp4, .mov, or .webm in the personas folder. The bundled default is main-persona.mov.

To use your own clips during development, drop any short looping video into assets/personas/ and pick it from the tray menu. In the packaged app, use tray menu → Reveal user personas folder and drop clips there. The app seeds that folder with the bundled clip on launch and rescans it each time the tray menu opens.
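
A rescan of the personas folder amounts to filtering its file names by extension. The sketch below shows that step on a plain list of names; `listPersonaFiles` is a hypothetical helper, and the case-insensitive match and hidden-file skip are assumptions.

```javascript
// Extensions listed in the Personas section.
const PERSONA_EXTENSIONS = [".mp4", ".mov", ".webm"];

// Hypothetical helper: keep only video files a persona menu could offer.
function listPersonaFiles(fileNames) {
  return fileNames
    .filter((name) => !name.startsWith(".")) // skip hidden files such as .DS_Store
    .filter((name) => PERSONA_EXTENSIONS.some((ext) => name.toLowerCase().endsWith(ext)))
    .sort();
}
```

The input list would typically come from `fs.readdirSync` on the personas folder each time the menu is built.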

There's also a built-in canvas avatar fallback ("persona": "alex" | "sam" | "jordan") if you'd rather have a stylised, non-photo "person". It's clearly stylised, so it is only useful if no one's looking closely.

Troubleshooting

Hotkey does nothing on macOS — Accessibility permission isn't granted. Open the tray menu, click ❌ Accessibility, enable WhisperRoo in System Settings, then open the tray again and confirm the row changes to ✅ Accessibility. If you do not see WhisperRoo in the app picker, install and launch /Applications/WhisperRoo.app; running with npx whisperroo can show Electron or your terminal instead because macOS sees the launched binary.
whisper-cli not found — see platform setup above.
Transcription is slow on first call — whisper.cpp loads the model and warms up Metal/CUDA on the first call. Subsequent calls are much faster.
Spinner never stops — WhisperRoo now times out a stuck whisper process and returns to idle instead of spinning forever. Try again once the model has finished warming up.
Pasted text appears in the overlay instead of my app — should not happen since the overlay is focusable: false, but if it does, increase PASTE_DELAY_MS in src/main/text-injector.js.
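
The "Spinner never stops" entry describes a watchdog around the whisper process. One way to get that behaviour in Node.js is the `timeout` option of `child_process.execFile`, sketched below. `runWithTimeout` and the `timedOut` flag are hypothetical names; this is not WhisperRoo's actual code.

```javascript
const { execFile } = require("node:child_process");

// Run a command and reject if it fails or exceeds timeoutMs.
function runWithTimeout(command, args, timeoutMs) {
  return new Promise((resolve, reject) => {
    execFile(command, args, { timeout: timeoutMs }, (error, stdout) => {
      if (error) {
        // error.killed is true when Node terminated the child itself,
        // for example because the timeout elapsed.
        error.timedOut = Boolean(error.killed);
        reject(error);
        return;
      }
      resolve(stdout.trim());
    });
  });
}
```

A caller could wrap the whisper-cli invocation, for example `runWithTimeout("whisper-cli", ["-m", "model.bin", "-f", "clip.wav", "-nt", "-np"], 60000)`, and return to idle in the rejection handler. The 60-second value is an arbitrary example.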

How it works

[Right Option held]
  └─► uiohook-napi keydown
        └─► IDLE → RECORDING
              ├─► show overlay window (frameless, top-center)
              └─► renderer: getUserMedia → AudioWorklet → 16 kHz Float32 PCM → IPC

[Right Option released]
  └─► uiohook-napi keyup
        └─► RECORDING → TRANSCRIBING
              └─► spawn `whisper-cli -m model.bin -f clip.wav -nt -np`
                    └─► → INJECTING
                          ├─► clipboard.writeText(text)
                          ├─► nut-js: Cmd/Ctrl+V
                          └─► hide overlay → IDLE
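
The states in the diagram form a small finite-state machine. The sketch below encodes them as a transition table. The state names come from the diagram, while the event names (`keydown`, `keyup`, `transcribed`, `failed`, `pasted`) and the failure path back to IDLE are assumptions for illustration.

```javascript
// State names from the diagram; event names are hypothetical.
const TRANSITIONS = {
  IDLE: { keydown: "RECORDING" },
  RECORDING: { keyup: "TRANSCRIBING" },
  TRANSCRIBING: { transcribed: "INJECTING", failed: "IDLE" },
  INJECTING: { pasted: "IDLE" },
};

// Return the next state, or stay put when the event is not valid right now.
function nextState(state, event) {
  const next = TRANSITIONS[state] && TRANSITIONS[state][event];
  return next || state;
}
```

Ignoring invalid events means a key press that arrives while a clip is still being transcribed does not start a second recording.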

Same engine MacWhisper uses (whisper.cpp), bound via the whisper-cli binary on PATH. No cloud calls, no API keys.
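
The flow above shows 16 kHz Float32 PCM arriving over IPC and whisper-cli reading clip.wav. One plausible bridge, sketched here as an assumption rather than the app's actual code, is to convert the floats to 16-bit mono PCM and prepend a standard 44-byte WAV header. `encodeWav` is a hypothetical name.

```javascript
// Encode Float32 samples in [-1, 1] as a 16-bit mono PCM WAV file in memory.
function encodeWav(samples, sampleRate = 16000) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = Buffer.alloc(44 + dataSize);

  buffer.write("RIFF", 0, "ascii");
  buffer.writeUInt32LE(36 + dataSize, 4);                // remaining file size
  buffer.write("WAVE", 8, "ascii");
  buffer.write("fmt ", 12, "ascii");
  buffer.writeUInt32LE(16, 16);                          // fmt chunk size
  buffer.writeUInt16LE(1, 20);                           // format 1 = PCM
  buffer.writeUInt16LE(1, 22);                           // one channel (mono)
  buffer.writeUInt32LE(sampleRate, 24);
  buffer.writeUInt32LE(sampleRate * bytesPerSample, 28); // byte rate
  buffer.writeUInt16LE(bytesPerSample, 32);              // block align
  buffer.writeUInt16LE(16, 34);                          // bits per sample
  buffer.write("data", 36, "ascii");
  buffer.writeUInt32LE(dataSize, 40);

  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot overflow the 16-bit range.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const value = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    buffer.writeInt16LE(Math.round(value), 44 + i * bytesPerSample);
  }
  return buffer;
}
```

The returned buffer can be written to disk with `fs.writeFileSync("clip.wav", encodeWav(samples))` before the transcription step runs.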

License

MIT (this app). whisper.cpp is MIT. Persona videos: provide your own.
