@sorenpeng/rtstt

v1.0.0

Published

a year ago

Real-time speech-to-text CLI tool using OpenAI Realtime API

sorenpeng

speech-to-text realtime openai cli voice transcription audio stt voice-recognition command-line real-time streaming

RTSTT - Real-Time Speech-to-Text

██████╗ ████████╗███████╗████████╗████████╗
██╔══██╗╚══██╔══╝██╔════╝╚══██╔══╝╚══██╔══╝
██████╔╝   ██║   ███████╗   ██║      ██║   
██╔══██╗   ██║   ╚════██║   ██║      ██║   
██║  ██║   ██║   ███████║   ██║      ██║   
╚═╝  ╚═╝   ╚═╝   ╚══════╝   ╚═╝      ╚═╝   
                                           
   Real-Time Speech-To-Text with OpenAI

A command-line tool for real-time speech-to-text transcription using OpenAI's Realtime API. It follows the Unix philosophy: it does one thing well, writes to stdout, and composes easily with other tools.

Features

🎙️ Real-time audio capture from system microphone
🔄 Live transcription using OpenAI Realtime API
📝 Output to stdout (perfect for piping)
💾 Optional file output
🔧 Cross-platform support (Linux, macOS, Windows)
⚡ Low latency streaming transcription
🎛️ Configurable audio settings

Installation

Prerequisites

Linux (Recommended)

# Ubuntu/Debian
sudo apt install alsa-utils

# Arch Linux
sudo pacman -S alsa-utils

# CentOS/RHEL/Fedora
sudo dnf install alsa-utils

macOS/Windows

# Install ffmpeg
# macOS with Homebrew
brew install ffmpeg

# Windows with Chocolatey
choco install ffmpeg

# Or download from https://ffmpeg.org/download.html

Install RTSTT

npm install -g @sorenpeng/rtstt

Setup

Get your OpenAI API key from OpenAI Platform
Set the environment variable:

export OPENAI_API_KEY="your_api_key_here"

Or create a .env file:

cp .env.example .env
# Edit .env and add your API key

Usage

Basic Usage

# Start real-time transcription
rtstt

# Save transcription to file
rtstt --out transcript.txt

# Quiet mode (suppress status messages)
rtstt --quiet

Advanced Usage

# Use specific model
rtstt --model gpt-4o-realtime-preview

# Custom audio settings
rtstt --rate 24000 --chunks 0.1

# Linux: specify audio device
rtstt --device hw:1,0

Composable Examples

# Search for keywords in real-time
rtstt | grep -i "important"

# Log with timestamps (ts is part of the moreutils package)
rtstt | ts '[%H:%M:%S]' >> meeting_notes.txt

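# Pipe to other tools (bash/zsh process substitution; --line-buffered flushes each match to the file immediately)
rtstt --quiet | tee >(grep --line-buffered "action item" >> tasks.txt)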

# Use with fzf for searchable transcription
rtstt --out history.txt | fzf --tac

Command Line Options

| Option | Alias | Default | Description |
|--------|-------|---------|-------------|
| --model | -m | gpt-4o-realtime-preview | OpenAI model to use |
| --rate | -r | 16000 | Audio sample rate in Hz |
| --device | -d | | Audio input device (Linux only) |
| --chunks | -c | 0.2 | Audio chunk duration in seconds |
| --out | -o | | Output file to append transcription |
| --quiet | -q | false | Suppress status messages |
| --help | -h | | Show help |
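
As a rough illustration of how the options and defaults in the table fit together, the following TypeScript sketch parses an argument list into a typed options object. It is a hand-rolled parser written for this README, assuming nothing about RTSTT's internals: the names `CliOptions`, `parseCliArgs`, and `positiveNumber` are hypothetical, the real CLI may use a parsing library and behave differently, and `--help` is omitted for brevity.

```typescript
// Illustrative only: a hand-rolled parser mirroring the options table.
// RTSTT's real parser is not shown in this README and may differ.

interface CliOptions {
  model: string;
  rate: number;   // sample rate in Hz
  chunks: number; // chunk duration in seconds
  quiet: boolean;
  device?: string;
  out?: string;
}

// Defaults copied from the table.
const DEFAULT_OPTIONS: CliOptions = {
  model: "gpt-4o-realtime-preview",
  rate: 16000,
  chunks: 0.2,
  quiet: false,
};

// Short aliases from the table, mapped to their long form.
const ALIASES = new Map<string, string>([
  ["-m", "--model"], ["-r", "--rate"], ["-d", "--device"],
  ["-c", "--chunks"], ["-o", "--out"], ["-q", "--quiet"],
]);

// Flags that consume the next argument as their value.
const VALUE_FLAGS = new Set(["--model", "--rate", "--device", "--chunks", "--out"]);

function positiveNumber(flag: string, raw: string): number {
  const n = Number(raw);
  if (!Number.isFinite(n) || n <= 0) {
    throw new Error(`${flag} expects a positive number, got "${raw}"`);
  }
  return n;
}

function parseCliArgs(argv: string[]): CliOptions {
  const opts: CliOptions = { ...DEFAULT_OPTIONS };
  for (let i = 0; i < argv.length; i++) {
    const raw = argv[i] ?? "";
    const flag = ALIASES.get(raw) ?? raw;
    if (flag === "--quiet") { opts.quiet = true; continue; }
    if (!VALUE_FLAGS.has(flag)) throw new Error(`Unknown option: ${raw}`);
    const value = argv[++i];
    if (value === undefined) throw new Error(`Missing value for ${raw}`);
    if (flag === "--model") opts.model = value;
    else if (flag === "--rate") opts.rate = positiveNumber(flag, value);
    else if (flag === "--chunks") opts.chunks = positiveNumber(flag, value);
    else if (flag === "--device") opts.device = value;
    else opts.out = value; // only --out remains
  }
  return opts;
}
```

For example, `parseCliArgs(["-r", "24000", "-c", "0.1"])` would yield the defaults with `rate` and `chunks` overridden, matching the "Custom audio settings" invocation shown earlier.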

Audio Requirements

The tool captures audio in the following format (optimized for OpenAI Realtime API):

Sample Rate: 16 kHz (configurable)
Bit Depth: 16-bit
Channels: Mono
Format: PCM (little-endian)
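
To make the format above concrete, here is a minimal TypeScript sketch of how a chunk size follows from these settings and how floating-point samples could be packed as 16-bit little-endian PCM. The helper names `chunkBytes` and `floatToPcm16le` are hypothetical and not taken from the RTSTT source, which may obtain PCM directly from its capture tool rather than convert samples itself.

```typescript
// Illustrative only: these helpers are hypothetical and not part of RTSTT.

const BYTES_PER_SAMPLE = 2; // 16-bit samples
const CHANNEL_COUNT = 1;    // mono

// Number of bytes in one audio chunk for a given sample rate and duration.
function chunkBytes(sampleRateHz: number, chunkSeconds: number): number {
  const samples = Math.round(sampleRateHz * chunkSeconds);
  return samples * BYTES_PER_SAMPLE * CHANNEL_COUNT;
}

// Convert float samples in [-1, 1] to signed 16-bit little-endian PCM.
function floatToPcm16le(samples: Float32Array): Uint8Array {
  const out = new Uint8Array(samples.length * BYTES_PER_SAMPLE);
  const view = new DataView(out.buffer);
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    // Scale asymmetrically so that -1 maps to -32768 and +1 to 32767.
    const value = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    // The final argument `true` selects little-endian byte order.
    view.setInt16(i * BYTES_PER_SAMPLE, Math.round(value), true);
  });
  return out;
}
```

With the default settings (16000 Hz, 0.2 s chunks), `chunkBytes(16000, 0.2)` works out to 6400 bytes per chunk.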

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| OPENAI_API_KEY | OpenAI API key (required) | - |
| RTSTT_MODEL | Default model to use | gpt-4o-realtime-preview |
| RTSTT_BASE_URL | Custom API base URL | wss://api.openai.com |
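
The sketch below shows one way these variables could be resolved into a connection configuration. It is not RTSTT's actual code: `resolveConfig` and `realtimeUrl` are hypothetical names, the precedence (command-line flag, then environment variable, then default) is an assumption, and so is the way the URL is assembled. The `/v1/realtime?model=...` path is the endpoint OpenAI documents for the Realtime API, but whether RTSTT appends it to `RTSTT_BASE_URL` in this manner is not stated in this README.

```typescript
// Illustrative only: resolveConfig and realtimeUrl are hypothetical helpers,
// not functions from the RTSTT source.

interface RtsttConfig {
  apiKey: string;
  model: string;
  baseUrl: string;
}

type Env = Record<string, string | undefined>;

// Defaults as listed in the environment variable table.
const DEFAULT_MODEL = "gpt-4o-realtime-preview";
const DEFAULT_BASE_URL = "wss://api.openai.com";

// Assumed precedence: explicit flag value, then environment variable, then default.
function resolveConfig(env: Env, flags: { model?: string } = {}): RtsttConfig {
  const apiKey = env["OPENAI_API_KEY"];
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY environment variable is required");
  }
  return {
    apiKey,
    model: flags.model ?? env["RTSTT_MODEL"] ?? DEFAULT_MODEL,
    baseUrl: env["RTSTT_BASE_URL"] ?? DEFAULT_BASE_URL,
  };
}

// Assumption: the model is passed as a query parameter on /v1/realtime.
function realtimeUrl(config: RtsttConfig): string {
  const base = config.baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/v1/realtime?model=${encodeURIComponent(config.model)}`;
}
```

In a Node program, `process.env` could be passed as the `env` argument; taking it as a parameter keeps the function easy to test.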

Troubleshooting

Audio Issues

Linux

# List audio devices
arecord -l

# Test recording
arecord -f S16_LE -r 16000 -c 1 -t raw /dev/null

# Check ALSA configuration
cat /proc/asound/cards

macOS

# List audio devices
ffmpeg -f avfoundation -list_devices true -i ""

# Test recording
ffmpeg -f avfoundation -i ":0" -t 5 test.wav

Windows

# List audio devices
ffmpeg -list_devices true -f dshow -i dummy

# Test recording
ffmpeg -f dshow -i audio="Microphone" -t 5 test.wav

Common Issues

"OPENAI_API_KEY environment variable is required"
- Set your OpenAI API key as an environment variable
- Or create a .env file containing your key

"Failed to start audio recording"
- Install the audio tools (alsa-utils on Linux, ffmpeg on macOS/Windows)
- Check microphone permissions
- Try a different audio device with --device (Linux only)

"Failed to connect to OpenAI API"
- Check your internet connection
- Verify that the API key is correct and has Realtime API access
- Check that your account has sufficient credits

Development

# Clone repository
git clone <repo-url>
cd rtstt

# Install dependencies
npm install

# Run in development mode
npm run dev

# Build
npm run build

# Test built version
npm start

Architecture

src/cli.ts: Main CLI interface and orchestration
src/audio.ts: Audio capture and chunking logic
src/rtws.ts: OpenAI Realtime WebSocket client
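
This README does not show how src/rtws.ts talks to the API. The OpenAI Realtime API accepts audio as base64 text inside `input_audio_buffer.append` client events, so one plausible shape for the hand-off between the audio and WebSocket modules is sketched below. The names `toBase64` and `appendEvent` are hypothetical, and the encoder is written out by hand only to keep the example dependency-free; in Node, `Buffer.from(bytes).toString("base64")` would be the usual choice.

```typescript
// Illustrative only: a guess at how a PCM chunk might be wrapped for the
// Realtime API. None of these names come from the RTSTT source.

const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 encoding with "=" padding.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0;
    const b2 = bytes[i + 2] ?? 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += B64_ALPHABET.charAt((triple >> 18) & 63);
    out += B64_ALPHABET.charAt((triple >> 12) & 63);
    // Pad with "=" when the final group has fewer than three bytes.
    out += i + 1 < bytes.length ? B64_ALPHABET.charAt((triple >> 6) & 63) : "=";
    out += i + 2 < bytes.length ? B64_ALPHABET.charAt(triple & 63) : "=";
  }
  return out;
}

interface AppendEvent {
  type: "input_audio_buffer.append";
  audio: string; // base64-encoded PCM chunk
}

// Serialize one PCM chunk as a JSON message ready to send over the socket.
function appendEvent(pcmChunk: Uint8Array): string {
  const event: AppendEvent = {
    type: "input_audio_buffer.append",
    audio: toBase64(pcmChunk),
  };
  return JSON.stringify(event);
}
```

Under this sketch, the audio module would produce fixed-size PCM chunks and the WebSocket client would send one `appendEvent(chunk)` message per chunk.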

Unix Philosophy

This tool follows Unix philosophy principles:

Do one thing well: Only handles speech-to-text conversion
Work together: Outputs to stdout for easy piping
Text streams: All data flows through standard text streams
Separation of concerns: Status goes to stderr, data to stdout
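
The separation between status and data can be sketched as a small reporter that routes transcript lines and status messages to different sinks. This is a minimal illustration, not RTSTT's implementation: `makeReporter`, `Sink`, and `Streams` are hypothetical names, and tying `--quiet` to the status sink only is an assumption based on the option's description.

```typescript
// Illustrative only: a hypothetical reporter showing the stdout/stderr split.

type Sink = (text: string) => void;

interface Streams {
  data: Sink;   // transcript text, intended for stdout
  status: Sink; // progress and diagnostics, intended for stderr
}

function makeReporter(streams: Streams, quiet: boolean) {
  return {
    // Transcript lines are always written, so pipes receive only data.
    transcript(line: string): void {
      streams.data(line + "\n");
    },
    // Status messages are skipped in quiet mode and never reach the data sink.
    status(message: string): void {
      if (!quiet) streams.status(message + "\n");
    },
  };
}
```

In Node, the sinks would typically be `(s) => process.stdout.write(s)` and `(s) => process.stderr.write(s)`, so that `rtstt | grep ...` sees transcript text only while status messages still appear on the terminal.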

License

MIT

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request