npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@drakulavich/kesha-voice-kit

v1.23.0

Published

Give your tools a voice — speech to text and back, 25 languages, up to ~19× faster than Whisper. On your machine.

Readme

  • Transcribe locally25 languages, up to ~19x faster than Whisper on Apple Silicon, ~2.5x on CPU
  • Speak back — text-to-speech in 9 languages
  • Plug into agents — ship voice workflows as CLI commands, an MCP server, an OpenClaw skill, or a Hermes agent
  • Small Rust engine — single ~20MB binary, no ffmpeg, no Python, no native Node addons

Quick Start

Runtime: Bun >= 1.3.0 · Platforms: macOS arm64, Linux x64, Windows x64.

# 1. Install Bun (skip if you have it) — Linux & macOS:
curl -fsSL https://bun.sh/install | bash        # or: brew install oven-sh/bun/bun
# Windows: powershell -c "irm bun.sh/install.ps1 | iex"

# 2. Install Kesha:
bun add -g @drakulavich/kesha-voice-kit
kesha install        # downloads engine + models (explicit — never automatic)

# 3. Transcribe:
kesha audio.ogg      # transcript to stdout

Prefer Homebrew, .deb/.rpm, Docker, or Nix? See Other install methods. Air-gapped or behind a corporate mirror? See docs/model-mirror.md.

Speech-to-text

kesha audio.ogg                            # transcribe (plain text)
kesha --format transcript audio.ogg        # text + language/confidence
kesha --format json audio.ogg              # full JSON with lang fields
kesha --json --timestamps audio.ogg        # JSON with timestamped segments
kesha --toon audio.ogg                     # compact LLM-friendly TOON
kesha status                               # show installed backend info

Multiple files get head-style headers; stdout is the transcript, stderr is errors — pipe-friendly:

$ kesha freedom.ogg tahiti.ogg
=== freedom.ogg ===
Свободу попугаям! Свободу!

=== tahiti.ogg ===
Таити, Таити! Не были мы ни в какой Таити! Нас и тут неплохо кормят.
  • Long / silence-heavy audio: install VAD (kesha install --vad); Kesha auto-uses it past 120 s. Without VAD, long audio falls back to fixed ASR chunks. See docs/vad.md.
  • Speaker diarization (darwin-arm64): kesha install --diarize, then kesha --json --vad --speakers meeting.m4a stamps each segment with a speaker id. Linux/Windows return a clear "darwin-arm64 only" error (#199).

Text-to-speech

Kesha speaks back in 9 languages, auto-picking the voice from the text's language. Override with --lang <code> or --voice <id>.

kesha install --tts                              # opt-in models (~990MB)
kesha say "Hello, world" > hello.wav
kesha say "Привет, мир" > privet.wav             # auto-routes by language
kesha say --voice ru-vosk-m02 "Голос в текст." > ru.wav

Output formats (--format, or inferred from the --out extension):

kesha say "Hello" --out hi.wav                    # WAV (default, uncompressed)
kesha say "Hello" --format ogg-opus --out hi.ogg  # OGG/Opus — messenger voice notes
kesha say "Hello" --format flac --out hi.flac     # FLAC — lossless, plays in every browser incl. Safari/iOS

kesha say --list-voices lists what's installed. Voices, the full catalogue, macOS system voices, SSML, speaking rate (--rate, <prosody>), Russian word stress, and Russian/English abbreviation handling are all in docs/tts.md.

Languages

Speech-to-text spans 25 languages and text-to-speech covers English, Russian, and select multilingual voices — full tables with codes and flags in docs/languages.md. Audio language detection identifies 107 languages.

Performance

Up to ~19x faster than Whisper on Apple Silicon (M2), ~2.5x faster on CPU

Compared against Whisper large-v3-turbo, all engines auto-detecting language:

Benchmark: openai-whisper vs faster-whisper vs Kesha Voice Kit

Full per-file breakdown (Russian + English): BENCHMARK.md.

Other install methods

All of these install the Bun CLI wrapper; engine + models still download explicitly via kesha install.

  • Homebrewbrew install drakulavich/tap/kesha-voice-kit · docs/homebrew.md
  • Linux packages (.deb/.rpm, x64) — docs/linux-packages.md
  • Docker (GHCR image) — docs/docker.md
  • Nix (aarch64-darwin / x86_64-linux) — nix run github:drakulavich/kesha-voice-kit -- install · docs/nix-install.md
  • Shell completions + manpagekesha completions bash|zsh|fish and kesha manpage print the packaged files to install wherever your shell expects them.

Integrations

  • MCP serverkesha mcp exposes transcribe/synthesize/list tools to any MCP client (Claude, Cursor, Codex, Gemini). Setup: docs/mcp.md.
  • OpenClaw — give your LLM agent ears. Install & config: docs/openclaw.md.
  • Hermes Agent — local STT/TTS through Hermes command providers. Setup: docs/hermes.md.
  • Raycast (macOS) — transcribe selected audio & speak the clipboard from the launcher. Source + install: raycast/.
  • Programmatic API@drakulavich/kesha-voice-kit/core for use inside a Bun program. See docs/api.md.

More

  • Architecture — runtime data flow, the models that ship, the CLI ↔ Rust engine boundary, model pinning, and where tests live.
  • Use cases — copy-paste recipes (transcribe a meeting, speak from OpenClaw, run offline, move the cache).
  • Product positioning — supported workflows, non-goals, maturity labels, platform matrix.
  • Diagnostics: kesha doctor, kesha support-bundle (redacted .tar.gz for issues), and kesha logs produce local, content-free diagnostics — see docs/diagnostic-logs.md. Every failure prints a stable error [CODE]: … line (docs/errors.md).
  • Privacy / Local Stats: Stats are off by default and fully local. Opt in with kesha stats enable to record content-free operational metrics in a local SQLite database — never networked, never storing audio, transcripts, text, or paths. Full commands & lifecycle: docs/local-stats.md.

Contributing

See CONTRIBUTING.md, the Roadmap (Now / Next / Later), and the Decision log (why platform/model choices were made — and reversed). Dev setup: make dev-setup (Bun, Rust, nextest, platform libs).

License

Made with 💛🩵 and 🥤 energy under MIT License