@sorenpeng/rtstt

v1.0.0

Real-time speech-to-text CLI tool using OpenAI Realtime API

RTSTT - Real-Time Speech-to-Text

██████╗ ████████╗███████╗████████╗████████╗
██╔══██╗╚══██╔══╝██╔════╝╚══██╔══╝╚══██╔══╝
██████╔╝   ██║   ███████╗   ██║      ██║   
██╔══██╗   ██║   ╚════██║   ██║      ██║   
██║  ██║   ██║   ███████║   ██║      ██║   
╚═╝  ╚═╝   ╚═╝   ╚══════╝   ╚═╝      ╚═╝   
                                           
   Real-Time Speech-To-Text with OpenAI

A command-line tool for real-time speech-to-text transcription using OpenAI's Realtime API. It follows the Unix philosophy: it does one thing well, writes to stdout, and composes easily with other tools.

Features

  • 🎙️ Real-time audio capture from system microphone
  • 🔄 Live transcription using OpenAI Realtime API
  • 📝 Output to stdout (perfect for piping)
  • 💾 Optional file output
  • 🔧 Cross-platform support (Linux, macOS, Windows)
  • ⚡ Low latency streaming transcription
  • 🎛️ Configurable audio settings

Installation

Prerequisites

Linux (Recommended)

# Ubuntu/Debian
sudo apt install alsa-utils

# Arch Linux
sudo pacman -S alsa-utils

# CentOS/RHEL/Fedora
sudo dnf install alsa-utils

macOS/Windows

# Install ffmpeg
# macOS with Homebrew
brew install ffmpeg

# Windows with Chocolatey
choco install ffmpeg

# Or download from https://ffmpeg.org/download.html

Install RTSTT

npm install -g @sorenpeng/rtstt

Setup

  1. Get your OpenAI API key from OpenAI Platform
  2. Set the environment variable:
export OPENAI_API_KEY="your_api_key_here"

Or create a .env file:

cp .env.example .env
# Edit .env and add your API key

Usage

Basic Usage

# Start real-time transcription
rtstt

# Save transcription to file
rtstt --out transcript.txt

# Quiet mode (suppress status messages)
rtstt --quiet

Advanced Usage

# Use specific model
rtstt --model gpt-4o-realtime-preview

# Custom audio settings
rtstt --rate 24000 --chunks 0.1

# Linux: specify audio device
rtstt --device hw:1,0

Composable Examples

# Search for keywords in real-time
rtstt | grep -i "important"

# Log with timestamps (ts is part of the moreutils package)
rtstt | ts '[%H:%M:%S]' >> meeting_notes.txt

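# Pipe to other tools (process substitution requires bash or zsh;
# --line-buffered makes grep write each match to the file immediately)
rtstt --quiet | tee >(grep --line-buffered "action item" >> tasks.txt)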

# Use with fzf for searchable transcription
rtstt --out history.txt | fzf --tac

Command Line Options

| Option | Alias | Default | Description |
|--------|-------|---------|-------------|
| --model | -m | gpt-4o-realtime-preview | OpenAI model to use |
| --rate | -r | 16000 | Audio sample rate in Hz |
| --device | -d | | Audio input device (Linux only) |
| --chunks | -c | 0.2 | Audio chunk duration in seconds |
| --out | -o | | Output file to append transcription |
| --quiet | -q | false | Suppress status messages |
| --help | -h | | Show help |
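As a rough illustration of how the options in the table above fit together, here is a minimal sketch of a parser that maps aliases to long names and starts from the documented defaults. It is not rtstt's actual parser: `Options`, `ALIASES`, `parseOptions`, `takeValue`, and `takeNumber` are hypothetical names, and the error messages are invented for the example.

```typescript
// Hypothetical sketch: rtstt's real argument parser may work differently.
interface Options {
  model: string;
  rate: number;
  device?: string;
  chunks: number;
  out?: string;
  quiet: boolean;
  help: boolean;
}

// Short aliases from the options table, mapped to their long names.
const ALIASES: Record<string, string> = {
  "-m": "--model", "-r": "--rate", "-d": "--device", "-c": "--chunks",
  "-o": "--out", "-q": "--quiet", "-h": "--help",
};

// Remove and return the value that follows a flag, or fail if it is missing.
function takeValue(queue: string[], flag: string): string {
  const value = queue.shift();
  if (value === undefined) throw new Error(`${flag} requires a value`);
  return value;
}

// Same as takeValue, but the value must parse as a finite number.
function takeNumber(queue: string[], flag: string): number {
  const n = Number(takeValue(queue, flag));
  if (!Number.isFinite(n)) throw new Error(`${flag} expects a number`);
  return n;
}

function parseOptions(argv: string[]): Options {
  // Start from the defaults documented in the table.
  const opts: Options = {
    model: "gpt-4o-realtime-preview",
    rate: 16000,
    chunks: 0.2,
    quiet: false,
    help: false,
  };
  const queue = [...argv];
  let arg: string | undefined;
  while ((arg = queue.shift()) !== undefined) {
    const flag = ALIASES[arg] ?? arg;
    switch (flag) {
      case "--model": opts.model = takeValue(queue, flag); break;
      case "--rate": opts.rate = takeNumber(queue, flag); break;
      case "--device": opts.device = takeValue(queue, flag); break;
      case "--chunks": opts.chunks = takeNumber(queue, flag); break;
      case "--out": opts.out = takeValue(queue, flag); break;
      case "--quiet": opts.quiet = true; break;
      case "--help": opts.help = true; break;
      default: throw new Error(`Unknown option: ${arg}`);
    }
  }
  return opts;
}
```

Under these assumptions, `parseOptions(["-r", "24000", "-c", "0.1"])` would correspond to the `rtstt --rate 24000 --chunks 0.1` example from Advanced Usage.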

Audio Requirements

The tool captures audio in the following format (optimized for OpenAI Realtime API):

  • Sample Rate: 16 kHz (configurable)
  • Bit Depth: 16-bit
  • Channels: Mono
  • Format: PCM (little-endian)
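To get a feel for what these settings mean in bytes, the sketch below computes the size of one raw PCM chunk from the sample rate, bit depth, channel count, and chunk duration. `AudioFormat` and `chunkBytes` are hypothetical names used only for illustration, not part of rtstt.

```typescript
// Hypothetical helper for illustration; not taken from rtstt's source.
interface AudioFormat {
  sampleRateHz: number; // samples per second, e.g. 16000
  bitDepth: number;     // bits per sample, e.g. 16
  channels: number;     // 1 = mono
}

// Bytes of raw PCM in one chunk lasting `chunkSeconds`.
function chunkBytes(format: AudioFormat, chunkSeconds: number): number {
  // One frame holds one sample for every channel.
  const frameBytes = (format.bitDepth / 8) * format.channels;
  // Round to whole frames so a chunk never splits a sample.
  const frames = Math.round(format.sampleRateHz * chunkSeconds);
  return frames * frameBytes;
}

const defaultFormat: AudioFormat = { sampleRateHz: 16000, bitDepth: 16, channels: 1 };
console.log(chunkBytes(defaultFormat, 0.2)); // 6400
```

With the documented defaults (16 kHz, 16-bit, mono, 0.2 s chunks) each chunk is 3200 samples, or 6400 bytes. Shorter chunks mean more frequent, smaller sends.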

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| OPENAI_API_KEY | OpenAI API key (required) | - |
| RTSTT_MODEL | Default model to use | gpt-4o-realtime-preview |
| RTSTT_BASE_URL | Custom API base URL | wss://api.openai.com |
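One plausible way to combine these variables with the `--model` flag is sketched below. The precedence shown (flag, then `RTSTT_MODEL`, then the built-in default) is an assumption, since this README does not state it, and `RtsttConfig`, `Env`, and `resolveConfig` are hypothetical names.

```typescript
// Hypothetical sketch; the precedence order is an assumption, not documented behaviour.
interface RtsttConfig {
  apiKey: string;
  model: string;
  baseUrl: string;
}

type Env = Record<string, string | undefined>;

function resolveConfig(env: Env, modelFlag?: string): RtsttConfig {
  const apiKey = env["OPENAI_API_KEY"];
  if (!apiKey) {
    // Same message as the first entry under Common Issues.
    throw new Error("OPENAI_API_KEY environment variable is required");
  }
  return {
    apiKey,
    // Assumed order: --model flag, then RTSTT_MODEL, then the documented default.
    model: modelFlag ?? env["RTSTT_MODEL"] ?? "gpt-4o-realtime-preview",
    baseUrl: env["RTSTT_BASE_URL"] ?? "wss://api.openai.com",
  };
}
```

In a Node.js program the first argument would typically be `process.env`. Taking the environment as a parameter keeps the function easy to test.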

Troubleshooting

Audio Issues

Linux

# List audio devices
arecord -l

# Test recording (5 seconds, audio is discarded)
arecord -f S16_LE -r 16000 -c 1 -t raw -d 5 /dev/null

# Check ALSA configuration
cat /proc/asound/cards

macOS

# List audio devices
ffmpeg -f avfoundation -list_devices true -i ""

# Test recording
ffmpeg -f avfoundation -i ":0" -t 5 test.wav

Windows

# List audio devices
ffmpeg -list_devices true -f dshow -i dummy

# Test recording (replace "Microphone" with a device name from the list above)
ffmpeg -f dshow -i audio="Microphone" -t 5 test.wav

Common Issues

  1. "OPENAI_API_KEY environment variable is required"

    • Set your OpenAI API key as an environment variable
    • Or create a .env file with your key
  2. "Failed to start audio recording"

    • Install audio tools (alsa-utils on Linux, ffmpeg on macOS/Windows)
    • Check microphone permissions
    • Try a different audio device with --device (Linux only)
  3. "Failed to connect to OpenAI API"

    • Check internet connection
    • Verify API key is correct and has Realtime API access
    • Check if you have sufficient credits

Development

# Clone repository
git clone <repo-url>
cd rtstt

# Install dependencies
npm install

# Run in development mode
npm run dev

# Build
npm run build

# Test built version
npm start

Architecture

  • src/cli.ts: Main CLI interface and orchestration
  • src/audio.ts: Audio capture and chunking logic
  • src/rtws.ts: OpenAI Realtime WebSocket client
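The chunking step mentioned for src/audio.ts can be pictured roughly as follows: bytes arrive from the recorder in arbitrary sizes and are regrouped into fixed-size chunks before being sent on. This is a minimal sketch under that assumption, not the code in src/audio.ts, and `PcmChunker` is a hypothetical name.

```typescript
// Hypothetical sketch of fixed-size chunking; not taken from src/audio.ts.
class PcmChunker {
  private pending: Uint8Array = new Uint8Array(0);
  private readonly chunkBytes: number;

  constructor(chunkBytes: number) {
    if (!Number.isInteger(chunkBytes) || chunkBytes <= 0) {
      throw new Error("chunkBytes must be a positive integer");
    }
    this.chunkBytes = chunkBytes;
  }

  // Append newly captured bytes and return every complete chunk now available.
  push(data: Uint8Array): Uint8Array[] {
    const merged = new Uint8Array(this.pending.length + data.length);
    merged.set(this.pending, 0);
    merged.set(data, this.pending.length);

    const chunks: Uint8Array[] = [];
    let offset = 0;
    while (merged.length - offset >= this.chunkBytes) {
      chunks.push(merged.slice(offset, offset + this.chunkBytes));
      offset += this.chunkBytes;
    }
    // Keep leftover bytes until the next push completes them.
    this.pending = merged.slice(offset);
    return chunks;
  }
}
```

For the default audio format, `new PcmChunker(6400)` would yield one chunk per 0.2 seconds of 16 kHz, 16-bit mono audio.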

Unix Philosophy

This tool follows Unix philosophy principles:

  1. Do one thing well: Only handles speech-to-text conversion
  2. Work together: Outputs to stdout for easy piping
  3. Text streams: All data flows through standard text streams
  4. Separation of concerns: Status goes to stderr, data to stdout
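The stdout/stderr split in point 4 is what keeps pipelines clean: downstream tools only ever see transcript text. A minimal sketch of that idea, assuming status messages are the only thing `--quiet` suppresses, might look like this. `Sink` and `createReporter` are hypothetical names, not rtstt's API.

```typescript
// Hypothetical sketch; names are illustrative and not taken from rtstt's source.
interface Sink {
  write(text: string): unknown;
}

function createReporter(stdout: Sink, stderr: Sink, quiet: boolean) {
  return {
    // Transcribed text is data: it always goes to stdout.
    transcript(text: string): void {
      stdout.write(text + "\n");
    },
    // Status messages go to stderr and are dropped in quiet mode.
    status(message: string): void {
      if (!quiet) stderr.write(message + "\n");
    },
  };
}
```

In Node.js, `process.stdout` and `process.stderr` both satisfy `Sink`, so `createReporter(process.stdout, process.stderr, false)` would wire this to the real streams. Because status never reaches stdout, a command such as `rtstt | grep -i "important"` filters only the transcript.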

License

MIT

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request
