mrmd-voice
v0.1.0
Voice-to-text for MRMD — recording, transcription, and text routing
Voice-to-text for MRMD. Works in every deployment: the Electron desktop app, the markco.dev browser app, and phones (TWA, Trusted Web Activity).
Voice is an app-level service, not an editor feature. It captures audio regardless of what's focused (editor, terminal, file picker, rename input) and routes transcribed text to the active element.
Architecture
Core Principle: Never Lose Audio
Every recording is saved as a project asset immediately. Transcription is secondary — if it fails, the audio is safe and can be re-transcribed later. Audio assets are auto-cleaned after 1 hour (configurable). The goal is crash safety, not archiving.
During recording, checkpoint blobs are saved every 30 seconds. On stop, the final complete blob overwrites the checkpoints. Format: WebM/Opus (~12KB/sec).
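The checkpoint scheme above can be sketched with MediaRecorder's timeslice mode. Calling `recorder.start(30_000)` makes the recorder emit a `dataavailable` event roughly every 30 seconds, and concatenating all chunks received so far should give a playable (if typically unseekable) WebM file. This is a minimal sketch, not the code in recorder.js: `attachCheckpointing`, `saveAsset`, and `removeAsset` are hypothetical names, and the two callbacks stand in for whatever writes to `_assets/voice/`.

```javascript
// Sketch only: attachCheckpointing, saveAsset and removeAsset are hypothetical names.
// `recorder` is a MediaRecorder (or anything with the same events and `state`).
function attachCheckpointing(recorder, { baseName, saveAsset, removeAsset, mimeType = 'audio/webm;codecs=opus' }) {
  const chunks = [];
  const checkpointName = `${baseName}.checkpoint.webm`;

  recorder.addEventListener('dataavailable', (event) => {
    if (event.data && event.data.size > 0) chunks.push(event.data);
    // While still recording, persist everything captured so far.
    // After stop() the state is already 'inactive', so the last chunk
    // is handled by the 'stop' listener below instead.
    if (recorder.state === 'recording') {
      saveAsset(checkpointName, new Blob(chunks, { type: mimeType }));
    }
  });

  recorder.addEventListener('stop', () => {
    // The final complete blob replaces the checkpoint.
    saveAsset(`${baseName}.webm`, new Blob(chunks, { type: mimeType }));
    removeAsset(checkpointName);
  });
}

// Browser usage (assumed):
// const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
// const recorder = new MediaRecorder(stream, { mimeType: 'audio/webm;codecs=opus' });
// attachCheckpointing(recorder, { baseName, saveAsset, removeAsset });
// recorder.start(30_000); // dataavailable roughly every 30 s
```

Relying on `recorder.state` keeps the final chunk out of the checkpoint path, so the complete file is written exactly once, on `stop`.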
Package Structure
```
mrmd-voice/
├── src/
│ ├── recorder.js # MediaRecorder wrapper
│ │ # - start/stop recording
│ │ # - periodic checkpoint saves (crash safety)
│ │ # - final blob on stop
│ │
│ ├── transcriber.js # Transcription backend router
│ │ # - tries backends in priority order
│ │ # - common interface for all backends
│ │
│ ├── parakeet-client.js # WebSocket client for Parakeet servers
│ │ # - Same protocol as parakeet-stream server
│ │ # - Sends int16 PCM audio, receives segments
│ │ # - Works direct (ws://) or via proxy (/proxy/)
│ │
│ ├── api-client.js # REST client for Groq/OpenAI Whisper API
│ │ # - POST audio blob, get text back
│ │
│ ├── text-router.js # Routes transcribed text to focused element
│ │ # - CodeMirror → editor.dispatch() insert
│ │ # - xterm terminal → write to PTY stdin
│ │ # - <input>/<textarea> → set value + input event
│ │ # - File picker → set search query
│ │ # - Nothing focused → buffer / last known target
│ │
│ └── index.js # Public API
├── dist/
│ └── mrmd-voice.iife.js # Browser bundle (loaded by index.html)
├── rollup.config.js
└── package.json
```
Where Each Piece Lives (Across the Ecosystem)
| Component | Package | Why |
|---|---|---|
| Recording, transcription clients, text routing | mrmd-voice/ | Pure browser lib, shared by all deployment modes |
| Shimmer overlay, mic button, Alt+W, rail panel | mrmd-electron/index.html | App shell (served to browser by mrmd-server too) |
| Voice service (Parakeet process lifecycle) | mrmd-electron/src/services/voice-service.js | Node service, same pattern as session services |
| Voice IPC handlers | mrmd-electron/main.js | voice:status, voice:startLocal, voice:stopLocal |
| Voice IPC bridge | mrmd-electron/preload.cjs | electronAPI.voice.* namespace |
| Voice HTTP routes | mrmd-server/src/api/voice.js | HTTP mirror of voice IPC |
| Voice http-shim entries | mrmd-server/static/http-shim.js | Browser compat layer |
| FixTranscriptionPredict | mrmd-ai/ | Already exists |
| Voice settings | Settings service | Already exists, just add voice section |
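The IPC rows in the table (voice:status, voice:startLocal, voice:stopLocal, exposed as electronAPI.voice.*) fit Electron's usual ipcRenderer.invoke / ipcMain.handle pattern. A minimal sketch, assuming the preload script forwards each method to its channel unchanged; `createVoiceBridge` is a hypothetical helper, and the argument shapes for startLocal/stopLocal are not specified in this README. Taking `invoke` as a parameter means the same object could, in principle, be backed by an HTTP shim in the browser.

```javascript
// Hypothetical helper: maps electronAPI.voice.* methods onto the IPC channels
// listed above. `invoke` is ipcRenderer.invoke in Electron, or any function
// with the same (channel, ...args) => Promise shape (e.g. an HTTP shim).
function createVoiceBridge(invoke) {
  return {
    status: () => invoke('voice:status'),
    startLocal: (...args) => invoke('voice:startLocal', ...args),
    stopLocal: (...args) => invoke('voice:stopLocal', ...args),
  };
}

// preload.cjs (sketch):
// const { contextBridge, ipcRenderer } = require('electron');
// contextBridge.exposeInMainWorld('electronAPI', {
//   // ...other namespaces
//   voice: createVoiceBridge((channel, ...args) => ipcRenderer.invoke(channel, ...args)),
// });
//
// main.js (sketch; the voiceService method name is assumed):
// ipcMain.handle('voice:status', () => voiceService.getStatus());
```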
Transcription Backends
Current providers wired in app settings:
Parakeet (WebSocket, GPU-friendly)
- Config: voice.provider = "parakeet"
- URL: voice.parakeetUrl = "ws://192.168.2.24:8765"
OpenAI Whisper API
- Config: voice.provider = "openai"
- Uses apiKeys.openai
- Endpoint: https://api.openai.com/v1/audio/transcriptions
Groq Whisper API
- Config: voice.provider = "groq"
- Uses apiKeys.groq
- Endpoint: https://api.groq.com/openai/v1/audio/transcriptions
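Both cloud endpoints follow the OpenAI-compatible transcription API: a multipart/form-data POST carrying the audio as `file` plus a `model` field, authenticated with a Bearer key, returning JSON with a `text` field. A minimal sketch of what api-client.js could look like under those assumptions. The function name and the model choices (`whisper-1` for OpenAI, `whisper-large-v3` for Groq) are assumptions, not taken from mrmd-voice; `fetchFn` is injectable so the sketch can run without network access.

```javascript
// Hypothetical sketch of api-client.js. Endpoints come from the README;
// the model names are assumptions.
const WHISPER_PROVIDERS = {
  openai: { url: 'https://api.openai.com/v1/audio/transcriptions', model: 'whisper-1' },
  groq: { url: 'https://api.groq.com/openai/v1/audio/transcriptions', model: 'whisper-large-v3' },
};

async function transcribeWithWhisperApi(blob, { provider, apiKey, fetchFn = fetch }) {
  const config = WHISPER_PROVIDERS[provider];
  if (!config) throw new Error(`Unknown Whisper provider: ${provider}`);
  if (!apiKey) throw new Error(`Missing API key for ${provider}`);

  // multipart/form-data body; fetch sets the boundary header itself.
  const form = new FormData();
  form.append('file', blob, 'recording.webm');
  form.append('model', config.model);

  const response = await fetchFn(config.url, {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!response.ok) {
    throw new Error(`${provider} transcription failed: HTTP ${response.status}`);
  }
  const { text } = await response.json();
  return text;
}

// Usage (sketch): await transcribeWithWhisperApi(blob, { provider: 'groq', apiKey: settings.apiKeys.groq });
```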
If no backend is configured, the audio is still saved and the user gets a non-fatal toast.
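transcriber.js is described as trying backends in priority order behind a common interface. One way to sketch that, assuming each backend is an object `{ name, isConfigured(), transcribe(blob) }` (a hypothetical shape) and `notify` shows the non-fatal toast. Because the audio is already saved before this runs, every failure path returns null instead of throwing.

```javascript
// Hypothetical backend shape: { name, isConfigured(): boolean, transcribe(blob): Promise<string> }.
// The recording is assumed to be saved as an asset before this is called.
async function transcribeWithFallback(blob, backends, notify) {
  const available = backends.filter((backend) => backend.isConfigured());
  if (available.length === 0) {
    notify('No transcription backend configured; audio was saved.');
    return null;
  }
  const errors = [];
  for (const backend of available) {
    try {
      const text = await backend.transcribe(blob);
      return { text, backend: backend.name };
    } catch (err) {
      // Fall through to the next backend in priority order.
      errors.push(`${backend.name}: ${err.message}`);
    }
  }
  notify(`Transcription failed (${errors.join('; ')}); audio was saved and can be re-transcribed.`);
  return null;
}
```

Returning the backend name alongside the text is what would let the .json sidecar record which backend produced the transcript.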
How Phone/Browser Uses Desktop GPU (Tunnel)
The existing Runtime Tunnel already proxies arbitrary HTTP and WebSocket traffic to the desktop Electron app. Voice piggybacks on this:
```
Phone (markco.dev)
→ mrmd-voice/parakeet-client.js connects via WebSocket
→ ws://server/sync/8765/ (proxy path)
→ RuntimeTunnelClient in mrmd-server
→ WebSocket relay (markco.dev/sync-relay)
→ RuntimeTunnel provider in mrmd-electron on desktop
→ ws://localhost:8765 (Parakeet server on GPU)
```
No new tunnel infrastructure is needed; the Parakeet port just has to be registered as a tunnel port.
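Whether the WebSocket goes direct or through the tunnel, parakeet-client.js sends int16 PCM. If the samples come from the Web Audio API as Float32 values in [-1, 1] (an assumption; this README does not say how streaming audio is captured or which sample rate the server expects), the conversion is a clamp and scale. `floatTo16BitPCM` is a hypothetical name.

```javascript
// Convert Web Audio Float32 samples in [-1, 1] to signed 16-bit PCM.
// Out-of-range samples are clamped; resampling is out of scope here.
function floatTo16BitPCM(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    // The int16 range is asymmetric (-32768..32767), so scale each sign separately.
    pcm[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return pcm;
}

// Sending a frame (sketch; the framing expected by parakeet-stream is assumed):
// socket.send(floatTo16BitPCM(float32Frame).buffer);
```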
Text Routing
When transcription completes, text is routed to wherever focus is:
| Focus target | How text is inserted |
|---|---|
| CodeMirror editor | view.dispatch({ changes: { from: cursor, insert: text } }) |
| xterm.js terminal | Write to PTY stdin (simulates typing — user sees text, presses Enter) |
| <input> element (file picker, rename, search) | el.value += text; el.dispatchEvent(new Event('input')) |
| Nothing focused | Insert at last known editor cursor position |
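The table can be read as a small dispatch on the focus target. A minimal sketch, assuming focus has already been classified into a hypothetical descriptor (`{ kind: 'codemirror', view }`, `{ kind: 'terminal', writeToPty }`, `{ kind: 'input', el }`); how text-router.js detects the focused element is not covered. The CodeMirror branch uses the CodeMirror 6 `view.dispatch` transaction API and also moves the cursor past the inserted text.

```javascript
// Hypothetical focus descriptors:
//   { kind: 'codemirror', view }        CodeMirror 6 EditorView
//   { kind: 'terminal', writeToPty }    function that writes to the PTY stdin
//   { kind: 'input', el }               <input> or <textarea>
// Returns the kind that received the text, or 'none' so the caller can buffer it.
function routeText(text, target, lastEditorTarget = null) {
  switch (target && target.kind) {
    case 'codemirror': {
      const { view } = target;
      const from = view.state.selection.main.head;
      // Insert at the cursor and move the cursor to the end of the insertion.
      view.dispatch({ changes: { from, insert: text }, selection: { anchor: from + text.length } });
      return 'codemirror';
    }
    case 'terminal':
      // No trailing newline: the user reviews the text and presses Enter.
      target.writeToPty(text);
      return 'terminal';
    case 'input':
      target.el.value += text;
      // Notify listeners (e.g. file-picker search) that the value changed.
      target.el.dispatchEvent(new Event('input', { bubbles: true }));
      return 'input';
    default:
      // Nothing focused: fall back to the last known editor, if any.
      return lastEditorTarget ? routeText(text, lastEditorTarget) : 'none';
  }
}
```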
UI Elements
- Full-screen shimmer: Animated border glow on the .app container when recording. Unmistakable visual signal.
- Mic button: In the mobile toolbar (alongside search, AI, terminal buttons). On desktop, in the status bar area.
- Alt+W: Toggle recording on/off (same shortcut as existing parakeet-hotkey service).
- Duration timer: Shows recording time while active.
- Rail menu panel: Voice status, recent recordings (last hour), re-transcribe button, backend settings, mic test.
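The Alt+W toggle can be driven by the shortcut object stored in the voice settings (see the Settings section). A sketch, assuming keydown events are matched against that object; `matchesShortcut` and `toggleRecording` are hypothetical. It also compares `event.code`, because on macOS holding Option changes `event.key` (Option+W typically yields `∑`), while `event.code` stays `KeyW`.

```javascript
// Hypothetical helper: does a KeyboardEvent match a settings shortcut like
// { altKey: true, ctrlKey: false, metaKey: false, shiftKey: false, key: 'w' }?
function matchesShortcut(event, shortcut) {
  const modifiers = ['altKey', 'ctrlKey', 'metaKey', 'shiftKey'];
  if (!modifiers.every((m) => Boolean(event[m]) === Boolean(shortcut[m]))) return false;
  const key = shortcut.key.toLowerCase();
  // Accept either the produced character or the physical key ("KeyW"),
  // which survives macOS Option remapping.
  return event.key.toLowerCase() === key || event.code === `Key${key.toUpperCase()}`;
}

// Usage (sketch): capture phase so the toggle works whatever is focused.
// window.addEventListener('keydown', (e) => {
//   if (matchesShortcut(e, settings.voice.shortcut)) { e.preventDefault(); toggleRecording(); }
// }, true);
```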
Audio Safety Details
- During recording: checkpoint blob saved every 30 seconds to _assets/voice/
- On stop: final complete blob saved, checkpoints removed
- After transcription: .json sidecar with transcript + metadata
- Cleanup: background job deletes voice assets older than 1 hour
- Format: WebM/Opus (native to MediaRecorder, good compression)
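The naming used in the asset layout and the one-hour cleanup can be sketched together. This assumes the timestamp is UTC, derived from `toISOString()` with colons swapped for dashes (which reproduces the example names but may differ from the real code), and that the cleanup job reads each asset's age from its name rather than from file metadata. `voiceAssetNames`, `parseVoiceAssetTime`, and `expiredVoiceAssets` are hypothetical.

```javascript
// Hypothetical helpers for _assets/voice/ naming and cleanup.
// Assumption: names use the UTC ISO timestamp with ':' replaced by '-'.
function voiceAssetNames(date) {
  const stamp = date.toISOString().slice(0, 19).replace(/:/g, '-'); // e.g. 2026-02-23T06-31-56
  return {
    audio: `${stamp}.webm`,
    sidecar: `${stamp}.json`,
    checkpoint: `${stamp}.checkpoint.webm`,
  };
}

// Recover the recording time from a name produced by voiceAssetNames().
function parseVoiceAssetTime(fileName) {
  const match = /^(\d{4}-\d{2}-\d{2})T(\d{2})-(\d{2})-(\d{2})\./.exec(fileName);
  if (!match) return null;
  const [, day, h, m, s] = match;
  return new Date(`${day}T${h}:${m}:${s}Z`);
}

// Select assets older than maxAgeMs (default 1 hour) for deletion.
// Note: a checkpoint of a recording running longer than maxAgeMs would also
// match; a real job would presumably skip the active recording.
function expiredVoiceAssets(fileNames, now = new Date(), maxAgeMs = 60 * 60 * 1000) {
  return fileNames.filter((name) => {
    const time = parseVoiceAssetTime(name);
    return time !== null && now - time > maxAgeMs;
  });
}
```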
```
_assets/voice/
├── 2026-02-23T06-31-56.webm # Audio recording
├── 2026-02-23T06-31-56.json # { audio, text, backend, duration, timestamp }
└── 2026-02-23T06-31-56.checkpoint.webm # Only exists during active recording
```
Settings
In SettingsService (~/.mrmd/settings.json), the voice section looks like this:
```json
{
  "voice": {
    "shortcut": { "altKey": true, "ctrlKey": false, "metaKey": false, "shiftKey": false, "key": "w" },
    "parakeetUrl": "ws://192.168.2.24:8765"
  }
}
```
You can preseed parakeetUrl from the backend's environment (no manual typing):
```bash
MRMD_PARAKEET_URL=ws://192.168.2.24:8765 mrmd-electron
# or
MRMD_PARAKEET_URL=ws://192.168.2.24:8765 mrmd-server
```
The Settings panel still lets users override it at runtime.
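The precedence described here (the environment preseeds, the Settings panel overrides) can be sketched as a merge that only fills parakeetUrl when the stored settings do not already contain one. `preseedVoiceSettings` is hypothetical, and rejecting values that are not ws:// or wss:// URLs is an added assumption rather than documented behavior.

```javascript
// Hypothetical: fill voice.parakeetUrl from MRMD_PARAKEET_URL without
// overwriting a value the user already saved in ~/.mrmd/settings.json.
function preseedVoiceSettings(settings, env) {
  const voice = { ...(settings.voice || {}) };
  const raw = env.MRMD_PARAKEET_URL;
  if (!voice.parakeetUrl && raw) {
    try {
      const url = new URL(raw);
      // Assumption: only WebSocket URLs are accepted.
      if (url.protocol === 'ws:' || url.protocol === 'wss:') voice.parakeetUrl = raw;
    } catch {
      // Malformed URL: ignore it and leave the setting empty.
    }
  }
  return { ...settings, voice };
}

// Usage (Node side, sketch): preseedVoiceSettings(loadedSettings, process.env)
```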
Implementation Phases
Phase 1 — Core Recording + Parakeet (Current)
- mrmd-voice/ package with recorder, parakeet-client, text-router
- Full-screen shimmer in index.html
- Alt+W shortcut, mic button in mobile toolbar
- Audio safety (save blobs as assets, 1-hour cleanup)
- Direct Parakeet connection (configurable URL in settings)
- Text insertion to editor, inputs, terminal
Phase 2 — Electron Parakeet Management + Tunnel
- voice-service.js in Electron (GPU detection, auto-start Parakeet)
- Register Parakeet port in the tunnel
- Phone/browser → tunnel → desktop Parakeet (zero config)
- electronAPI.voice.* IPC + server HTTP mirror + http-shim entries
Phase 3 — Cloud API + Rail Panel + Polish
- Groq/OpenAI Whisper API client (api-client.js)
- Rail menu voice panel (status, re-transcribe, settings)
- FixTranscriptionPredict auto-cleanup option
- Streaming partial transcription during recording
Development
```bash
cd mrmd-voice
npm install
npm run build # Produces dist/mrmd-voice.iife.js
npm run dev # Watch mode
```