hyprvox

v0.11.2

Published

23 days ago

Production-ready STT daemon for Linux with global hotkeys and clipboard history

snehit70

hyprvox

Voice input for AI workflows on Linux.

hyprvox is published on the npm registry as hyprvox.

The Problem

You're deep in a session with a coding agent. You know exactly what you want to ask — a complex refactor, a debugging question, a feature request. But now you have to type it all out.

By the time you're done, you've lost the thread.

Context switching kills flow. And typing at 40 WPM when you can speak at 150 WPM is a bottleneck you don't need.

The Solution

Press a key. Speak. Press again. Paste.

hyprvox is a voice-to-text daemon for Linux. It runs in the background, transcribes when you need it, and puts the result on your clipboard — ready to paste into Claude, Copilot, or whatever agent you're working with.

Built for Hyprland/Wayland first. Works on X11 too.

Quick Start

Prerequisites

# Install Bun (if not already installed)
curl -fsSL https://bun.sh/install | bash

# Install ffmpeg (required for Opus audio conversion)
# Arch:   sudo pacman -S ffmpeg
# Ubuntu: sudo apt install ffmpeg
# Fedora: sudo dnf install ffmpeg

Installation

git clone https://github.com/Snehit70/hyprvox.git
cd hyprvox
bun install

bun run index.ts config init   # Set up API keys (Groq + Deepgram)
bun run index.ts install       # Install as systemd service

If you want to install the published CLI from npm instead of building from source:

npm install -g hyprvox

Or with Bun:

bun add -g hyprvox

Press Right Ctrl to record. Press again to stop. Paste anywhere.

Works on both Wayland and X11. On X11/GNOME/KDE, the built-in hotkey works out of the box. On Wayland (Hyprland, Sway), bind the toggle command in your compositor for a reliable system-wide hotkey; see Hyprland Setup.

For AI Agents

Copy this prompt to your coding agent:

Install and configure hyprvox on this Linux system:

1. Clone: git clone https://github.com/Snehit70/hyprvox.git
2. Install: cd hyprvox && bun install
3. Run `bun run index.ts config init` — I'll provide API keys when prompted:
   - Groq API key (get from console.groq.com)
   - Deepgram API key (get from console.deepgram.com)
4. Install service: bun run index.ts install
5. For Hyprland, add keybind to ~/.config/hypr/hyprland.conf:
    bind = , code:105, exec, bun run /path/to/hyprvox/index.ts toggle
    # code:105 = Right Control (use `wev` to find other key codes)
6. For Hyprland overlay, add to ~/.config/hypr/UserConfigs/WindowRules.conf:
    windowrule = match:class hyprvox-overlay, float on
    windowrule = match:class hyprvox-overlay, pin on
    windowrule = match:class hyprvox-overlay, no_focus on
    windowrule = match:class hyprvox-overlay, no_shadow on
    windowrule = match:class hyprvox-overlay, no_anim on
    windowrule = match:class hyprvox-overlay, move ((monitor_w-window_w)*0.5) (monitor_h-window_h-50)
7. Reload: hyprctl reload
8. Verify: bun run index.ts health

How It Works

Dual-engine transcription. Audio is sent to Groq (Whisper V3) and Deepgram (Nova-3) in parallel, and an LLM merges the two results for better accuracy. If one engine fails, the other's transcript is still used.

Streaming or batch. ~500ms latency in streaming mode. Higher accuracy in batch mode. Your choice.

Runs as a daemon. Systemd service starts on login. Always ready when you need it.
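
The dual-engine flow described above can be sketched roughly as follows. This is a minimal illustration, not hyprvox's actual code: `Engine`, `Merger`, and `transcribeDual` are hypothetical names, and the real Groq and Deepgram calls are stood in for by plain async functions.

```typescript
// Hypothetical engine signature: takes audio bytes, resolves to a transcript.
type Engine = (audio: Uint8Array) => Promise<string>;

// Hypothetical merge step; the README says an LLM plays this role.
type Merger = (a: string, b: string) => Promise<string>;

async function transcribeDual(
  audio: Uint8Array,
  engineA: Engine,
  engineB: Engine,
  merge: Merger,
): Promise<string> {
  // Start both requests at once. allSettled never rejects, so one
  // engine failing does not discard the other engine's result.
  const [a, b] = await Promise.allSettled([engineA(audio), engineB(audio)]);

  // Both succeeded: hand the two transcripts to the merge step.
  if (a.status === "fulfilled" && b.status === "fulfilled") {
    return merge(a.value, b.value);
  }
  // Exactly one succeeded: fall back to that transcript unmerged.
  if (a.status === "fulfilled") return a.value;
  if (b.status === "fulfilled") return b.value;

  throw new Error("both transcription engines failed");
}
```

Using `Promise.allSettled` rather than `Promise.all` is what makes the fallback possible: `Promise.all` would reject as soon as either engine failed.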

Performance

| Metric | Value |
|--------|-------|
| Median latency | 882ms |
| Real-time factor | 39x faster than real-time |
| Dual-engine success | 93.5% |
| Filler words removed | 12.3% (by LLM cleanup) |
| LLM merge overhead | ~280ms |

The LLM doesn't just merge — it removes filler words ("um", "uh"), false starts, and self-corrections automatically.
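
For readers who want to reproduce figures like the median latency and real-time factor on their own recordings, here is one common way to compute them. It is a sketch under assumptions: the `Sample` shape and both function names are illustrative, and the README does not say how its own numbers were derived.

```typescript
// One measured transcription: length of the audio and time taken to transcribe it.
// Field names are illustrative, not taken from hyprvox.
interface Sample {
  audioMs: number;
  processingMs: number;
}

// Median of a non-empty list of numbers.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((x, y) => x - y);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Real-time factor expressed as "times faster than real time":
// total audio duration divided by total processing time.
function realTimeFactor(samples: Sample[]): number {
  const audio = samples.reduce((sum, s) => sum + s.audioMs, 0);
  const processing = samples.reduce((sum, s) => sum + s.processingMs, 0);
  return audio / processing;
}
```

With three made-up samples of 10 s, 20 s, and 30 s of audio processed in 500 ms, 1000 ms, and 1500 ms, the median latency is 1000 ms and the real-time factor is 60000 / 3000 = 20.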

The Overlay

A small waveform appears at the bottom of your screen while recording — visual feedback that it's listening.

[Image: overlay showing the waveform during recording]

For Hyprland, add these window rules:

# ~/.config/hypr/UserConfigs/WindowRules.conf
windowrule = match:class hyprvox-overlay, float on
windowrule = match:class hyprvox-overlay, pin on
windowrule = match:class hyprvox-overlay, no_focus on
windowrule = match:class hyprvox-overlay, no_shadow on
windowrule = match:class hyprvox-overlay, no_anim on
windowrule = match:class hyprvox-overlay, move ((monitor_w-window_w)*0.5) (monitor_h-window_h-50)

Installation

Dependencies

Audio — alsa-utils

Arch: sudo pacman -S alsa-utils
Ubuntu: sudo apt install alsa-utils
Fedora: sudo dnf install alsa-utils

Clipboard

Wayland: wl-clipboard
X11: xclip or xsel
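
To show why the clipboard dependency differs per display server, here is a small sketch of how a tool might choose the clipboard command. The function name, the `hasXclip` flag, and the detection logic are assumptions for illustration; the README does not describe how hyprvox itself decides. The commands are the standard ones: `wl-copy` ships with wl-clipboard, and `xclip` and `xsel` take the flags shown.

```typescript
// Environment variables as a plain map, so the function is easy to test.
type Env = Record<string, string | undefined>;

// Return the argv of a clipboard writer; the transcript would be piped to its stdin.
function clipboardCommand(env: Env, hasXclip = true): string[] {
  // A Wayland session normally sets WAYLAND_DISPLAY and/or XDG_SESSION_TYPE=wayland.
  const wayland =
    Boolean(env.WAYLAND_DISPLAY) || env.XDG_SESSION_TYPE === "wayland";
  if (wayland) return ["wl-copy"];

  // On X11, prefer xclip and fall back to xsel.
  return hasXclip
    ? ["xclip", "-selection", "clipboard"]
    : ["xsel", "--clipboard", "--input"];
}
```

Calling `clipboardCommand(process.env)` under Bun or Node would pick the command for the current session.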

Permissions

sudo usermod -aG audio,input $USER
# Log out and back in

API Keys

| Provider | Purpose | Link |
|----------|---------|------|
| Groq | Whisper V3 (fast) | console.groq.com |
| Deepgram | Nova-3 (accurate) | console.deepgram.com |

Run bun run index.ts config init to set them up.

Usage

bun run index.ts status      # Check daemon status
bun run index.ts health      # Test system setup
bun run index.ts toggle      # Start/stop recording
bun run index.ts history     # View past transcriptions
bun run index.ts logs        # Tail daemon logs
bun run index.ts errors      # Show last error
bun run index.ts config init # Set up API keys
bun run index.ts boost add   # Add custom vocabulary

Configuration

Config file: ~/.config/hypr/vox/config.json

{
  "apiKeys": { "groq": "...", "deepgram": "..." },
  "transcription": {
    "streaming": true,
    "boostWords": ["Hyprland", "WebSocket", "refactor"]
  }
}

Streaming mode: ~500ms latency, slightly lower accuracy.
Batch mode: 2-8 seconds, higher accuracy.
Boost words: improve recognition for technical terms.
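
If you script around the config file, a typed view of the fields shown above can catch mistakes early. The following is a sketch only: it covers just the four fields in the example, treats all of them as required (an assumption), and `HyprvoxConfig` and `parseConfig` are hypothetical names rather than part of hyprvox.

```typescript
// Shape of the example config above; the real file may have more options.
interface HyprvoxConfig {
  apiKeys: { groq: string; deepgram: string };
  transcription: { streaming: boolean; boostWords: string[] };
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Parse a JSON string and check each field's type, throwing on the first problem.
function parseConfig(json: string): HyprvoxConfig {
  const raw: unknown = JSON.parse(json);
  if (!isRecord(raw)) throw new Error("config must be a JSON object");

  const apiKeys = raw.apiKeys;
  if (!isRecord(apiKeys)) throw new Error("apiKeys must be an object");
  const groq = apiKeys.groq;
  const deepgram = apiKeys.deepgram;
  if (typeof groq !== "string" || typeof deepgram !== "string") {
    throw new Error("apiKeys.groq and apiKeys.deepgram must be strings");
  }

  const transcription = raw.transcription;
  if (!isRecord(transcription)) throw new Error("transcription must be an object");
  const streaming = transcription.streaming;
  if (typeof streaming !== "boolean") {
    throw new Error("transcription.streaming must be a boolean");
  }
  const words = transcription.boostWords;
  if (!Array.isArray(words) || !words.every((w) => typeof w === "string")) {
    throw new Error("transcription.boostWords must be an array of strings");
  }

  return {
    apiKeys: { groq, deepgram },
    transcription: { streaming, boostWords: words as string[] },
  };
}
```

Setting `"streaming": false` in the file selects batch mode, per the descriptions above.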

Full options: Configuration Guide

Hyprland Setup

Add keybind for global hotkey:

# ~/.config/hypr/hyprland.conf
bind = , code:105, exec, bun run /path/to/hyprvox/index.ts toggle
# code:105 = Right Control

Use wev | grep -A5 "key event" to find key codes.

Binding the key in the compositor bypasses the XWayland limitations that make the built-in hotkey unreliable on Wayland.

Full guide: Wayland Support

Troubleshooting

| Problem | Fix |
|---------|-----|
| Hotkey not working | Add user to input group; use compositor binds on Wayland |
| No audio | Add user to audio group |
| Clipboard issues | Install wl-clipboard (Wayland) or xclip (X11) |
| Service won't start | Check logs: journalctl --user -u hyprvox -f |

Full guide: Troubleshooting

Documentation

Architecture — How it works under the hood
Configuration — All options explained
CLI Commands — Every command and flag
Wayland Support — Platform-specific setup

Release Workflow

Use Conventional Commits on branches merged into main; feat: triggers a minor bump and fix: triggers a patch bump.
.github/workflows/release-please.yml opens or updates the release PR, and .github/workflows/release.yml publishes tagged releases after tests pass.
The root CLI package is also published to npm as hyprvox.
Release Please uses release-please-config.json and .release-please-manifest.json to track the root package version.
Set repository Actions permissions to "Read and write", and either enable "Allow GitHub Actions to create and approve pull requests" or provide a RELEASE_PLEASE_TOKEN secret with repo scope.
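
The commit-to-version rule above (feat: gives a minor bump, fix: gives a patch bump) can be illustrated with a short sketch. This is not Release Please's implementation: `bumpFor` is a hypothetical helper, and breaking-change handling is deliberately left out because the README does not describe it.

```typescript
type Bump = "minor" | "patch" | "none";

// Decide the version bump implied by a list of commit subject lines.
function bumpFor(commitSubjects: string[]): Bump {
  const rank: Record<Bump, number> = { none: 0, patch: 1, minor: 2 };
  let result: Bump = "none";

  for (const subject of commitSubjects) {
    // Conventional Commits header: type, optional "(scope)", then ": ".
    const match = /^(\w+)(\([^)]*\))?: /.exec(subject);
    if (!match) continue;

    const type = match[1];
    const bump: Bump =
      type === "feat" ? "minor" : type === "fix" ? "patch" : "none";

    // Keep the largest bump seen so far.
    if (rank[bump] > rank[result]) result = bump;
  }
  return result;
}
```

For example, a release containing `fix: handle empty audio` and `feat(cli): add history search` would get a minor bump, because the largest bump wins.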

License

MIT
