mrmd-voice

v0.1.0

Voice-to-text for MRMD — recording, transcription, and text routing

Voice-to-text for MRMD. Works everywhere: Electron desktop, markco.dev browser, phone (TWA).

Voice is an app-level service, not an editor feature. It captures audio regardless of what's focused (editor, terminal, file picker, rename input) and routes transcribed text to the active element.

Architecture

Core Principle: Never Lose Audio

Every recording is saved as a project asset immediately. Transcription is secondary — if it fails, the audio is safe and can be re-transcribed later. Audio assets are auto-cleaned after 1 hour (configurable). The goal is crash safety, not archiving.

During recording, checkpoint blobs are saved every 30 seconds. On stop, the final complete blob is saved and the checkpoints are removed. Format: WebM/Opus (~12 KB/s).
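The checkpoint scheme can be sketched with the standard MediaRecorder `timeslice` argument. This is a minimal illustration, not the actual recorder.js: `createCheckpointRecorder`, the `saveAsset(kind, blob)` callback, and the `RecorderImpl` option are hypothetical names introduced here.

```javascript
// Sketch only. `saveAsset(kind, blob)` is a hypothetical persistence callback;
// the real mrmd-voice recorder may be structured differently.
function createCheckpointRecorder(stream, saveAsset, options = {}) {
  const {
    checkpointMs = 30_000,                   // 30 s checkpoints, as described above
    mimeType = 'audio/webm;codecs=opus',
    RecorderImpl = globalThis.MediaRecorder, // injectable so the logic can be tested
  } = options;

  const chunks = [];
  let stopping = false;
  const recorder = new RecorderImpl(stream, { mimeType });

  recorder.ondataavailable = (event) => {
    if (event.data && event.data.size > 0) chunks.push(event.data);
    // Only the first chunk carries the WebM header, so a checkpoint must
    // contain every chunk recorded so far, not just the latest one.
    if (!stopping) saveAsset('checkpoint', new Blob(chunks, { type: mimeType }));
  };

  recorder.onstop = () => {
    // The final, complete blob supersedes any checkpoint.
    saveAsset('final', new Blob(chunks, { type: mimeType }));
  };

  return {
    // Passing a timeslice makes MediaRecorder emit `dataavailable` periodically.
    start: () => recorder.start(checkpointMs),
    stop: () => {
      stopping = true;
      recorder.stop();
    },
  };
}
```

At roughly 12 KB/s, each 30-second interval adds about 360 KB to the checkpoint, so rewriting the cumulative blob stays cheap for typical dictation lengths.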

Package Structure

mrmd-voice/
├── src/
│   ├── recorder.js          # MediaRecorder wrapper
│   │                        #   - start/stop recording
│   │                        #   - periodic checkpoint saves (crash safety)
│   │                        #   - final blob on stop
│   │
│   ├── transcriber.js        # Transcription backend router
│   │                        #   - tries backends in priority order
│   │                        #   - common interface for all backends
│   │
│   ├── parakeet-client.js    # WebSocket client for Parakeet servers
│   │                        #   - Same protocol as parakeet-stream server
│   │                        #   - Sends int16 PCM audio, receives segments
│   │                        #   - Works direct (ws://) or via proxy (/proxy/)
│   │
│   ├── api-client.js         # REST client for Groq/OpenAI Whisper API
│   │                        #   - POST audio blob, get text back
│   │
│   ├── text-router.js        # Routes transcribed text to focused element
│   │                        #   - CodeMirror → editor.dispatch() insert
│   │                        #   - xterm terminal → write to PTY stdin
│   │                        #   - <input>/<textarea> → set value + input event
│   │                        #   - File picker → set search query
│   │                        #   - Nothing focused → buffer / last known target
│   │
│   └── index.js              # Public API
├── dist/
│   └── mrmd-voice.iife.js    # Browser bundle (loaded by index.html)
├── rollup.config.js
└── package.json
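The tree notes that parakeet-client.js sends int16 PCM audio. Web Audio delivers samples as 32-bit floats in the range [-1, 1], so a conversion step is needed somewhere before sending. A minimal sketch of that conversion follows; `floatToInt16` is a hypothetical helper name, and the sample rate, channel layout, and message framing the Parakeet server expects are not specified here.

```javascript
// Sketch only: converts Web Audio float samples to 16-bit signed PCM.
// `floatToInt16` is an illustrative name, not part of the mrmd-voice API.
function floatToInt16(samples) {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples clip instead of wrapping around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative and positive ranges are asymmetric in int16 (-32768..32767).
    out[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return out;
}
```

The resulting `Int16Array` exposes its bytes through `.buffer`, which a WebSocket can send as a binary message.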

Where Each Piece Lives (Across the Ecosystem)

| Component | Package | Why |
|---|---|---|
| Recording, transcription clients, text routing | mrmd-voice/ | Pure browser lib, shared by all deployment modes |
| Shimmer overlay, mic button, Alt+W, rail panel | mrmd-electron/index.html | App shell (served to browser by mrmd-server too) |
| Voice service (Parakeet process lifecycle) | mrmd-electron/src/services/voice-service.js | Node service, same pattern as session services |
| Voice IPC handlers | mrmd-electron/main.js | voice:status, voice:startLocal, voice:stopLocal |
| Voice IPC bridge | mrmd-electron/preload.cjs | electronAPI.voice.* namespace |
| Voice HTTP routes | mrmd-server/src/api/voice.js | HTTP mirror of voice IPC |
| Voice http-shim entries | mrmd-server/static/http-shim.js | Browser compat layer |
| FixTranscriptionPredict | mrmd-ai/ | Already exists |
| Voice settings | Settings service | Already exists, just add voice section |

Transcription Backends

Current providers wired in app settings:

  1. Parakeet (WebSocket, GPU-friendly)

    • Config: voice.provider = "parakeet"
    • URL: voice.parakeetUrl = "ws://192.168.2.24:8765"
  2. OpenAI Whisper API

    • Config: voice.provider = "openai"
    • Uses apiKeys.openai
    • Endpoint: https://api.openai.com/v1/audio/transcriptions
  3. Groq Whisper API

    • Config: voice.provider = "groq"
    • Uses apiKeys.groq
    • Endpoint: https://api.groq.com/openai/v1/audio/transcriptions

If no backend is configured, the audio is still saved and the user gets a non-fatal toast.
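The provider settings above could be resolved into a concrete backend roughly as follows. This is a sketch under assumptions: `resolveBackend` and `buildWhisperRequest` are hypothetical names, `apiKeys` is assumed to sit at the top level of the settings object, and the multipart fields (`file`, `model`) follow the OpenAI-compatible transcription API rather than anything confirmed in api-client.js.

```javascript
// Sketch only: names and settings layout are assumptions, not the real API.
const WHISPER_ENDPOINTS = {
  openai: 'https://api.openai.com/v1/audio/transcriptions',
  groq: 'https://api.groq.com/openai/v1/audio/transcriptions',
};

// Returns a backend description, or null when nothing usable is configured.
// A null result is the "audio saved, non-fatal toast" case.
function resolveBackend(settings) {
  const voice = settings.voice ?? {};
  const apiKeys = settings.apiKeys ?? {};
  switch (voice.provider) {
    case 'parakeet':
      return voice.parakeetUrl ? { kind: 'websocket', url: voice.parakeetUrl } : null;
    case 'openai':
    case 'groq': {
      const apiKey = apiKeys[voice.provider];
      return apiKey
        ? { kind: 'rest', url: WHISPER_ENDPOINTS[voice.provider], apiKey }
        : null;
    }
    default:
      return null;
  }
}

// Builds the arguments for fetch() against a REST backend.
// The caller supplies the model name; none is assumed here.
function buildWhisperRequest(backend, audioBlob, model) {
  const form = new FormData();
  form.append('file', audioBlob, 'recording.webm');
  form.append('model', model);
  return {
    url: backend.url,
    init: {
      method: 'POST',
      headers: { Authorization: `Bearer ${backend.apiKey}` },
      body: form, // fetch sets the multipart Content-Type and boundary itself
    },
  };
}
```

A caller would then run `fetch(req.url, req.init)` and read the `text` field of the JSON response, which is what these OpenAI-compatible endpoints return by default.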

How Phone/Browser Uses Desktop GPU (Tunnel)

The existing Runtime Tunnel already proxies arbitrary HTTP and WebSocket to the desktop Electron app. Voice piggybacks on this:

Phone (markco.dev)
  → mrmd-voice/parakeet-client.js connects via WebSocket
  → ws://server/sync/8765/  (proxy path)
  → RuntimeTunnelClient in mrmd-server
  → WebSocket relay (markco.dev/sync-relay)  
  → RuntimeTunnel provider in mrmd-electron on desktop
  → ws://localhost:8765  (Parakeet server on GPU)

No new tunnel infrastructure needed. Just register the Parakeet port as a tunnel port.
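From the client's point of view, the only difference between the direct and tunneled paths is the WebSocket URL. A rough sketch of how that URL might be chosen is below. `parakeetSocketUrl` is a hypothetical helper; the `/sync/<port>/` shape is taken from the diagram above, and the rule "use the direct URL when one is given, otherwise go through the proxy" is an assumption.

```javascript
// Sketch only: picks a direct Parakeet URL or a same-origin proxy URL.
// `location` is any object with `protocol` and `host` (e.g. window.location).
function parakeetSocketUrl({ directUrl, location, port = 8765 }) {
  if (directUrl) return directUrl;
  // Match the page's security level: https pages must use wss.
  const scheme = location.protocol === 'https:' ? 'wss:' : 'ws:';
  return `${scheme}//${location.host}/sync/${port}/`;
}
```

In a browser this would be called as `parakeetSocketUrl({ location: window.location })` and the result passed to `new WebSocket(...)`.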

Text Routing

When transcription completes, text is routed to wherever focus is:

| Focus target | How text is inserted |
|---|---|
| CodeMirror editor | view.dispatch({ changes: { from: cursor, insert: text } }) |
| xterm.js terminal | Write to PTY stdin (simulates typing: the user sees the text, then presses Enter) |
| <input> element (file picker, rename, search) | el.value += text; el.dispatchEvent(new Event('input')) |
| Nothing focused | Insert at last known editor cursor position |
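The routing table can be read as a single dispatch function. The sketch below is illustrative, not the actual text-router.js: `routeText`, the `kind` tags on targets, `writeToPty`, and the `fallbackView` parameter are all hypothetical. The editor branch uses the CodeMirror 6 `view.dispatch` call shown in the table.

```javascript
// Sketch only: target shapes (`kind`, `view`, `writeToPty`) are assumptions.
function insertIntoEditor(view, text) {
  const cursor = view.state.selection.main.head;
  view.dispatch({
    changes: { from: cursor, insert: text },
    // Move the cursor past the inserted text so dictation can continue.
    selection: { anchor: cursor + text.length },
  });
}

function routeText(text, target, fallbackView = null) {
  if (!text) return 'skipped';

  if (target && target.kind === 'codemirror') {
    insertIntoEditor(target.view, text);
    return 'editor';
  }
  if (target && target.kind === 'terminal') {
    // No trailing newline: the user reviews the text and presses Enter.
    target.writeToPty(text);
    return 'terminal';
  }
  if (target && (target.tagName === 'INPUT' || target.tagName === 'TEXTAREA')) {
    target.value += text;
    // Bubbling lets framework-level listeners observe the change.
    target.dispatchEvent(new Event('input', { bubbles: true }));
    return 'input';
  }
  if (fallbackView) {
    // Nothing focused: fall back to the last known editor.
    insertIntoEditor(fallbackView, text);
    return 'fallback-editor';
  }
  return 'dropped';
}
```

Returning a label for the chosen route keeps the function easy to log and test; a real router would likely also buffer text when no target exists at all.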

UI Elements

  • Full-screen shimmer: Animated border glow on the .app container when recording. Unmistakable visual signal.
  • Mic button: In the mobile toolbar (alongside search, AI, terminal buttons). On desktop, in the status bar area.
  • Alt+W: Toggle recording on/off (same shortcut as the existing parakeet-hotkey service).
  • Duration timer: Shows recording time while active.
  • Rail menu panel: Voice status, recent recordings (last hour), re-transcribe button, backend settings, mic test.

Audio Safety Details

  • During recording: checkpoint blob saved every 30 seconds to _assets/voice/
  • On stop: final complete blob saved, checkpoints removed
  • After transcription: .json sidecar with transcript + metadata
  • Cleanup: background job deletes voice assets older than 1 hour
  • Format: WebM/Opus (native to MediaRecorder, good compression)

_assets/voice/
├── 2026-02-23T06-31-56.webm       # Audio recording
├── 2026-02-23T06-31-56.json       # { audio, text, backend, duration, timestamp }
└── 2026-02-23T06-31-56.checkpoint.webm  # Only exists during active recording
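The naming scheme and the one-hour cleanup rule can be sketched as two small pure functions. These are illustrative helpers with hypothetical names (`assetPaths`, `expiredAssets`); in particular, using UTC for the timestamp and file modification time for the age check are assumptions.

```javascript
// Sketch only: helper names, UTC timestamps, and mtime-based aging are assumptions.
const VOICE_DIR = '_assets/voice';
const ONE_HOUR_MS = 60 * 60 * 1000;

// Derives the three sibling file paths for one recording.
function assetPaths(startedAt) {
  // "2026-02-23T06:31:56.000Z" -> "2026-02-23T06-31-56" (colons are unsafe in filenames)
  const base = startedAt.toISOString().slice(0, 19).replace(/:/g, '-');
  return {
    audio: `${VOICE_DIR}/${base}.webm`,
    sidecar: `${VOICE_DIR}/${base}.json`,
    checkpoint: `${VOICE_DIR}/${base}.checkpoint.webm`,
  };
}

// Returns the names of assets older than maxAgeMs.
// `assets` is a list of { name, modifiedMs } entries from a directory listing.
function expiredAssets(assets, nowMs, maxAgeMs = ONE_HOUR_MS) {
  return assets
    .filter((asset) => nowMs - asset.modifiedMs > maxAgeMs)
    .map((asset) => asset.name);
}
```

Keeping the expiry check free of filesystem calls means the same rule can run in Electron, in mrmd-server, or in a test with a fake listing.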

Settings

Voice settings live in the voice section of SettingsService (~/.mrmd/settings.json):

{
  "voice": {
    "shortcut": { "altKey": true, "ctrlKey": false, "metaKey": false, "shiftKey": false, "key": "w" },
    "parakeetUrl": "ws://192.168.2.24:8765"
  }
}
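The `shortcut` object mirrors the modifier fields of a DOM `KeyboardEvent`, so matching it against a keydown event is a field-by-field comparison. A minimal sketch, with `matchesShortcut` as a hypothetical helper name:

```javascript
// Sketch only: compares a keydown event against the stored shortcut object.
function matchesShortcut(event, shortcut) {
  return (
    String(event.key).toLowerCase() === String(shortcut.key).toLowerCase() &&
    Boolean(event.altKey) === Boolean(shortcut.altKey) &&
    Boolean(event.ctrlKey) === Boolean(shortcut.ctrlKey) &&
    Boolean(event.metaKey) === Boolean(shortcut.metaKey) &&
    Boolean(event.shiftKey) === Boolean(shortcut.shiftKey)
  );
}
```

One caveat: on some platforms an Alt-modified key reports a different `key` value (a composed character instead of the letter), so matching on `event.code` may turn out to be more robust than `event.key`.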

You can preseed parakeetUrl from the environment, so nobody has to type it in manually:

MRMD_PARAKEET_URL=ws://192.168.2.24:8765 mrmd-electron
# or
MRMD_PARAKEET_URL=ws://192.168.2.24:8765 mrmd-server

The Settings panel still lets users override it at runtime.
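Read together, the two rules suggest a simple precedence: a value saved through the Settings panel wins, and the environment variable only fills the gap. A sketch of that precedence follows; `effectiveParakeetUrl` is a hypothetical name, and the actual seeding may instead write the environment value into settings.json once at startup.

```javascript
// Sketch only: the precedence order here is an assumption, not confirmed behavior.
function effectiveParakeetUrl(env, settings) {
  const fromSettings = settings && settings.voice && settings.voice.parakeetUrl;
  // A user-chosen value overrides the preseeded environment default.
  return fromSettings || env.MRMD_PARAKEET_URL || null;
}
```

In Node this would be called as `effectiveParakeetUrl(process.env, settings)`.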

Implementation Phases

Phase 1 — Core Recording + Parakeet (Current)

  • mrmd-voice/ package with recorder, parakeet-client, text-router
  • Full-screen shimmer in index.html
  • Alt+W shortcut, mic button in mobile toolbar
  • Audio safety (save blobs as assets, 1-hour cleanup)
  • Direct Parakeet connection (configurable URL in settings)
  • Text insertion to editor, inputs, terminal

Phase 2 — Electron Parakeet Management + Tunnel

  • voice-service.js in Electron (GPU detection, auto-start Parakeet)
  • Register Parakeet port in the tunnel
  • Phone/browser → tunnel → desktop Parakeet (zero config)
  • electronAPI.voice.* IPC + server HTTP mirror + http-shim entries

Phase 3 — Cloud API + Rail Panel + Polish

  • Groq/OpenAI Whisper API client (api-client.js)
  • Rail menu voice panel (status, re-transcribe, settings)
  • FixTranscriptionPredict auto-cleanup option
  • Streaming partial transcription during recording

Development

cd mrmd-voice
npm install
npm run build    # Produces dist/mrmd-voice.iife.js
npm run dev      # Watch mode