@kstonekuan/gemini-voice

v0.0.2

**Voice mode for [Gemini CLI](https://github.com/google-gemini/gemini-cli).** Talk to Gemini from your terminal, powered by the Gemini Live API.

Gemini CLI Voice Extension

This repo ships two things:

  • gemini-voice CLI, a standalone real-time voice transcription tool for the terminal with an audio waveform display. It captures speech from your microphone, streams it to the Gemini Live API, and returns a transcript.
  • Gemini CLI Extension, which adds a /voice command to Gemini CLI so you can speak instead of type.

The CLI was built first as the core transcription engine, and the extension wraps it to bring voice input into Gemini CLI. Think of it like voice mode for Claude Code, but for Gemini CLI.

Current limitations

The extension approach works, but Gemini CLI's extension system has some constraints that limit the experience:

  • No push-to-talk. You need to type /voice (or use your OS's built-in dictation) to start listening. There's no hotkey to hold and talk.
  • No live feedback. The standalone gemini-voice CLI shows a real-time audio waveform, but Gemini CLI doesn't support live output from extension subprocesses, so the interactive UI is suppressed when used as an extension.

These are platform limitations, not bugs. A true voice mode with push-to-talk, live waveforms, and tight integration needs to be built natively into Gemini CLI itself. I'm working on that, and this project, built on top of the Gemini Live API, is a stepping stone towards it.

Features

  • Voice input for Gemini CLI via the /voice extension command
  • Native microphone capture via a Rust addon (cpal + lock-free ring buffer)
  • Real-time audio streaming to the Gemini Live API for transcription
  • Server-side voice activity detection (VAD), no local VAD needed
  • Automatic shutdown after speech ends
  • Ink-based terminal UI with spinner and live audio level meter (standalone CLI)
  • Standalone CLI with transcribe and devices subcommands
  • Pre-built native binaries, no Rust toolchain needed for end users
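The lock-free ring buffer mentioned above (and in step 2 of How it works) is what lets the audio callback hand off samples without ever blocking. The addon's buffer lives in Rust; purely as an illustration, here is a hypothetical TypeScript analogue of the same single-producer, single-consumer idea, with the head and tail indices kept in a `SharedArrayBuffer` and updated through `Atomics`. The class name, the capacity handling, and the drop-when-full policy are assumptions, not taken from the project.

```typescript
// Hypothetical SPSC ring buffer for 16-bit PCM samples (illustrative analogue
// of the addon's Rust buffer). The producer only advances `tail`, the consumer
// only advances `head`, so neither side needs a lock.
class PcmRingBuffer {
  private readonly capacity: number;
  private readonly samples: Int16Array;
  // indices[0] = head (next slot to read), indices[1] = tail (next slot to write)
  private readonly indices: Int32Array;

  constructor(maxSamples: number) {
    // One slot stays empty so that "full" and "empty" can be told apart.
    this.capacity = maxSamples + 1;
    this.samples = new Int16Array(new SharedArrayBuffer(this.capacity * 2));
    this.indices = new Int32Array(new SharedArrayBuffer(8));
  }

  /** Producer side: copies as many samples as fit and returns how many were written. */
  push(input: ArrayLike<number>): number {
    const head = Atomics.load(this.indices, 0);
    let tail = Atomics.load(this.indices, 1);
    let written = 0;
    while (written < input.length) {
      const next = (tail + 1) % this.capacity;
      if (next === head) break; // full: drop the remainder instead of waiting
      this.samples[tail] = input[written];
      tail = next;
      written++;
    }
    Atomics.store(this.indices, 1, tail); // publish the new samples to the consumer
    return written;
  }

  /** Consumer side: removes and returns every sample currently buffered. */
  drain(): Int16Array {
    let head = Atomics.load(this.indices, 0);
    const tail = Atomics.load(this.indices, 1);
    const count = (tail - head + this.capacity) % this.capacity;
    const out = new Int16Array(count);
    for (let i = 0; i < count; i++) {
      out[i] = this.samples[head];
      head = (head + 1) % this.capacity;
    }
    Atomics.store(this.indices, 0, head); // hand the freed slots back to the producer
    return out;
  }
}
```

Run on a single thread like this, the atomics do no real work; they only matter once the two `SharedArrayBuffer`s are shared with a `Worker` so producer and consumer run on different threads (that plumbing is omitted). Dropping samples when the buffer is full, rather than blocking, follows the usual rule that a real-time audio callback must never wait.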

How it works

The Gemini Live API is actually a speech-to-speech API designed for real-time voice conversations with the model. We're repurposing it here, using only its real-time input transcription and server-side voice activity detection to build a transcription tool. The model's audio responses are ignored entirely.
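To make the repurposing concrete, here is a minimal sketch of the messages involved: a `setup` message that turns on input transcription, a `realtimeInput` message carrying one chunk of audio, and a filter that keeps only `serverContent.inputTranscription.text` from whatever the server sends back. The field names follow my reading of the Live API's WebSocket (BidiGenerateContent) reference; the helper names and the model string are hypothetical, and the project's actual code may be structured differently.

```typescript
// Hypothetical helpers sketching the Live API messages described above.
// Field names follow the public BidiGenerateContent WebSocket reference;
// helper names and the model string are illustrative assumptions.

interface ServerMessage {
  serverContent?: {
    inputTranscription?: { text?: string };
  };
}

/** First message on the socket: ask the server to transcribe the audio we send. */
function buildSetupMessage(model: string) {
  return {
    setup: {
      model, // e.g. "models/<live-model-name>"; the README does not name one
      generationConfig: { responseModalities: ["AUDIO"] },
      inputAudioTranscription: {}, // enables inputTranscription messages
      // Server-side VAD is the default; spelled out here only for clarity.
      realtimeInputConfig: { automaticActivityDetection: { disabled: false } },
    },
  };
}

/** Writes 16-bit samples as little-endian bytes, then base64-encodes them. */
function pcm16ToBase64(samples: Int16Array): string {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    view.setInt16(i * 2, samples[i], true); // true = little-endian
  }
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary); // global in modern Node.js and browsers
}

/** Wraps one chunk of 16 kHz PCM as a realtimeInput message. */
function buildAudioMessage(samples: Int16Array) {
  return {
    realtimeInput: {
      audio: { mimeType: "audio/pcm;rate=16000", data: pcm16ToBase64(samples) },
    },
  };
}

/** Returns transcription text if present; model audio and other content are ignored. */
function extractInputTranscription(raw: string): string | undefined {
  const msg = JSON.parse(raw) as ServerMessage;
  return msg.serverContent?.inputTranscription?.text;
}
```

Each object would be serialized with `JSON.stringify` before being sent over the WebSocket. Because `extractInputTranscription` looks only at `inputTranscription`, any audio the model streams back simply falls through and is discarded.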

  1. The native Rust addon captures 16 kHz, 16-bit mono PCM audio from the microphone using cpal
  2. Audio samples are written to a lock-free ring buffer and drained on a dedicated thread
  3. The drain thread pushes samples into Node.js via a NAPI ThreadsafeFunction (non-blocking)
  4. TypeScript code base64-encodes the PCM chunks and sends them as realtimeInput over a WebSocket to the Gemini Live API
  5. The server performs voice activity detection and streams back inputTranscription messages
  6. Once transcription is complete (or a settle timeout elapses), the transcript is printed to stdout and the process exits
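Step 6 comes down to a small piece of state: concatenate transcription fragments as they arrive, and stop once either an explicit completion signal shows up or no new text has arrived for a while. The sketch below is a hypothetical version of that logic. The class name, the `settleMs` parameter, and the assumption that fragments are incremental pieces to be joined are mine, and the README does not say which server event the real tool treats as "complete". Time is passed in as a number so the decision can be checked without real timers.

```typescript
// Hypothetical sketch of the "settle" logic in step 6. Assumes transcription
// fragments are incremental pieces of text that should be concatenated.
class TranscriptCollector {
  private readonly settleMs: number;
  private readonly parts: string[] = [];
  private lastUpdateMs: number | null = null;
  private complete = false;

  constructor(settleMs: number) {
    this.settleMs = settleMs;
  }

  /** Record one inputTranscription fragment received at time nowMs. */
  addFragment(text: string, nowMs: number): void {
    this.parts.push(text);
    this.lastUpdateMs = nowMs;
  }

  /** Record whatever explicit completion signal the client decides to trust. */
  markComplete(): void {
    this.complete = true;
  }

  /**
   * Done if completion was signalled, or if some text arrived and then nothing
   * new came in for settleMs. Silence before any speech never counts as settled.
   */
  isDone(nowMs: number): boolean {
    if (this.complete) return true;
    return this.lastUpdateMs !== null && nowMs - this.lastUpdateMs >= this.settleMs;
  }

  /** The full transcript so far, with surrounding whitespace removed. */
  get transcript(): string {
    return this.parts.join("").trim();
  }
}
```

In a real client you might poll `isDone(Date.now())` from a short interval timer, then print `transcript` to stdout and close the WebSocket. Requiring at least one fragment before the timeout can fire keeps the tool from exiting while the user is still drawing breath.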

Prerequisites

Installation

As a Gemini CLI extension

From GitHub:

gemini extensions install https://github.com/kstonekuan/gemini-cli-voice-extension

From npm:

gemini extensions install @kstonekuan/gemini-voice

Set up your API key:

gemini-voice auth

Standalone CLI

npm install -g @kstonekuan/gemini-voice
gemini-voice auth

Development

See CONTRIBUTING.md for development setup.

Usage

Inside Gemini CLI

/voice

Standalone CLI

# Transcribe speech from the default microphone
gemini-voice transcribe

# Transcribe from a specific audio device
gemini-voice transcribe --device 1

# Quiet mode -- only output the final transcript (no UI)
gemini-voice transcribe --quiet

# List available audio input devices
gemini-voice devices

Note: When using /voice inside Gemini CLI, the --quiet flag is used automatically. Gemini CLI's !{...} syntax does not support live output from subprocesses, so the interactive UI is suppressed. The model will echo back the transcription before responding.

Press Ctrl+C to cancel at any time.