@sorenpeng/rtstt
v1.0.0
Real-time speech-to-text CLI tool using OpenAI Realtime API
RTSTT - Real-Time Speech-to-Text
██████╗ ████████╗███████╗████████╗████████╗
██╔══██╗╚══██╔══╝██╔════╝╚══██╔══╝╚══██╔══╝
██████╔╝ ██║ ███████╗ ██║ ██║
██╔══██╗ ██║ ╚════██║ ██║ ██║
██║ ██║ ██║ ███████║ ██║ ██║
╚═╝ ╚═╝ ╚═╝ ╚══════╝ ╚═╝ ╚═╝
Real-Time Speech-To-Text with OpenAI
A command-line tool for real-time speech-to-text transcription using OpenAI's Realtime API. Follows Unix philosophy: does one thing well, outputs to stdout, and is easily composable with other tools.
Features
- 🎙️ Real-time audio capture from system microphone
- 🔄 Live transcription using OpenAI Realtime API
- 📝 Output to stdout (perfect for piping)
- 💾 Optional file output
- 🔧 Cross-platform support (Linux, macOS, Windows)
- ⚡ Low-latency streaming transcription
- 🎛️ Configurable audio settings
Installation
Prerequisites
Linux (Recommended)
# Ubuntu/Debian
sudo apt install alsa-utils
# Arch Linux
sudo pacman -S alsa-utils
# CentOS/RHEL/Fedora
sudo dnf install alsa-utils
macOS/Windows
# Install ffmpeg
# macOS with Homebrew
brew install ffmpeg
# Windows with Chocolatey
choco install ffmpeg
# Or download from https://ffmpeg.org/download.html
Install RTSTT
npm install -g @sorenpeng/rtstt
Setup
- Get your OpenAI API key from OpenAI Platform
- Set the environment variable:
export OPENAI_API_KEY="your_api_key_here"
Or create a .env file:
cp .env.example .env
# Edit .env and add your API key
Usage
Basic Usage
# Start real-time transcription
rtstt
# Save transcription to file
rtstt --out transcript.txt
# Quiet mode (suppress status messages)
rtstt --quiet
Advanced Usage
# Use specific model
rtstt --model gpt-4o-realtime-preview
# Custom audio settings
rtstt --rate 24000 --chunks 0.1
# Linux: specify audio device
rtstt --device hw:1,0
Composable Examples
# Search for keywords in real-time
rtstt | grep -i "important"
# Log with timestamps (ts is part of the moreutils package)
rtstt | ts '[%H:%M:%S]' >> meeting_notes.txt
# Pipe to other tools
rtstt --quiet | tee >(grep "action item" >> tasks.txt)
# Use with fzf for searchable transcription
rtstt --out history.txt | fzf --tac
Command Line Options
| Option | Alias | Default | Description |
|--------|-------|---------|-------------|
| --model | -m | gpt-4o-realtime-preview | OpenAI model to use |
| --rate | -r | 16000 | Audio sample rate in Hz |
| --device | -d | | Audio input device (Linux only) |
| --chunks | -c | 0.2 | Audio chunk duration in seconds |
| --out | -o | | Output file to append transcription |
| --quiet | -q | false | Suppress status messages |
| --help | -h | | Show help |
Audio Requirements
The tool captures audio in the following format (optimized for OpenAI Realtime API):
- Sample Rate: 16 kHz (configurable)
- Bit Depth: 16-bit
- Channels: Mono
- Format: PCM (little-endian)
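The format above determines how many bytes each audio chunk occupies, which can help when reasoning about the `--rate` and `--chunks` settings. The following is a minimal sketch, not taken from the rtstt source: the helper names `chunkBytes` and `floatToPcm16le` are hypothetical, and it assumes 2 bytes per sample (16-bit) and a single channel as listed above.

```typescript
// Hypothetical helpers for illustration; not part of the rtstt codebase.
const BYTES_PER_SAMPLE = 2; // 16-bit PCM
const CHANNELS = 1; // mono

/** Number of bytes in one chunk of 16-bit mono PCM audio. */
function chunkBytes(sampleRateHz: number, chunkSeconds: number): number {
  // Round to a whole number of samples to avoid floating-point drift.
  const samples = Math.round(sampleRateHz * chunkSeconds);
  return samples * CHANNELS * BYTES_PER_SAMPLE;
}

/** Convert float samples in [-1, 1] to little-endian signed 16-bit PCM. */
function floatToPcm16le(samples: Float32Array): Uint8Array {
  const out = new Uint8Array(samples.length * BYTES_PER_SAMPLE);
  const view = new DataView(out.buffer);
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    // Negative values scale to -32768, positive values to 32767.
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * BYTES_PER_SAMPLE, Math.round(scaled), true); // true = little-endian
  });
  return out;
}

// Defaults from the options table: 16000 Hz and 0.2 s chunks.
console.log(chunkBytes(16000, 0.2)); // 6400
// Values from the "Custom audio settings" example.
console.log(chunkBytes(24000, 0.1)); // 4800
```

With the default settings each chunk holds 3200 samples, so a shorter `--chunks` value means smaller, more frequent chunks.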
Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| OPENAI_API_KEY | OpenAI API key (required) | - |
| RTSTT_MODEL | Default model to use | gpt-4o-realtime-preview |
| RTSTT_BASE_URL | Custom API base URL | wss://api.openai.com |
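One way to read the table above is as a lookup with fallbacks. The sketch below is an assumption about how such a lookup could work, not the tool's actual code: `resolveConfig` and `RtsttConfig` are hypothetical names, and treating an empty variable as unset is a design choice made here for illustration. The defaults and the error message are the ones documented in this README.

```typescript
// Hypothetical sketch; names and behavior are assumptions, not rtstt internals.
interface RtsttConfig {
  apiKey: string;
  model: string;
  baseUrl: string;
}

function resolveConfig(env: Record<string, string | undefined>): RtsttConfig {
  const apiKey = env.OPENAI_API_KEY;
  if (!apiKey) {
    // Same wording as the error listed under Common Issues.
    throw new Error("OPENAI_API_KEY environment variable is required");
  }
  return {
    apiKey,
    // `||` (rather than `??`) also falls back when the variable is set but empty.
    model: env.RTSTT_MODEL || "gpt-4o-realtime-preview",
    baseUrl: env.RTSTT_BASE_URL || "wss://api.openai.com",
  };
}

// In a Node.js program this would typically be called as resolveConfig(process.env).
const example = resolveConfig({ OPENAI_API_KEY: "your_api_key_here" });
console.log(example.model, example.baseUrl);
```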
Troubleshooting
Audio Issues
Linux
# List audio devices
arecord -l
# Test recording
arecord -f S16_LE -r 16000 -c 1 -t raw /dev/null
# Check ALSA configuration
cat /proc/asound/cards
macOS
# List audio devices
ffmpeg -f avfoundation -list_devices true -i ""
# Test recording
ffmpeg -f avfoundation -i ":0" -t 5 test.wav
Windows
# List audio devices
ffmpeg -list_devices true -f dshow -i dummy
# Test recording
ffmpeg -f dshow -i audio="Microphone" -t 5 test.wav
Common Issues
"OPENAI_API_KEY environment variable is required"
- Set your OpenAI API key as an environment variable
- Or create a .env file with your key
"Failed to start audio recording"
- Install audio tools (alsa-utils on Linux, ffmpeg on macOS/Windows)
- Check microphone permissions
- Try a different audio device with --device
"Failed to connect to OpenAI API"
- Check internet connection
- Verify API key is correct and has Realtime API access
- Check if you have sufficient credits
Development
# Clone repository
git clone <repo-url>
cd rtstt
# Install dependencies
npm install
# Run in development mode
npm run dev
# Build
npm run build
# Test built version
npm start
Architecture
- src/cli.ts: Main CLI interface and orchestration
- src/audio.ts: Audio capture and chunking logic
- src/rtws.ts: OpenAI Realtime WebSocket client
Unix Philosophy
This tool follows Unix philosophy principles:
- Do one thing well: Only handles speech-to-text conversion
- Work together: Outputs to stdout for easy piping
- Text streams: All data flows through standard text streams
- Separation of concerns: Status goes to stderr, data to stdout
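The separation of status and data is what keeps pipes such as `rtstt | grep` clean. The sketch below shows one possible way to structure it and is not the tool's actual implementation: `createReporter`, `Writer`, and `Streams` are hypothetical names, and the writers are injected so the example runs without a microphone or API key.

```typescript
// Hypothetical sketch; not taken from the rtstt source.
type Writer = (text: string) => void;

interface Streams {
  out: Writer; // data channel (stdout in a real CLI)
  err: Writer; // status channel (stderr in a real CLI)
}

function createReporter(streams: Streams, quiet: boolean) {
  return {
    /** Transcribed text always goes to the data channel. */
    transcript(line: string): void {
      streams.out(line + "\n");
    },
    /** Status messages go to the status channel unless quiet mode is on. */
    status(message: string): void {
      if (!quiet) streams.err(message + "\n");
    },
  };
}

// Demo with in-memory buffers. In Node.js the writers could instead be
// (t) => process.stdout.write(t) and (t) => process.stderr.write(t).
const captured = { out: "", err: "" };
const reporter = createReporter(
  {
    out: (t) => { captured.out += t; },
    err: (t) => { captured.err += t; },
  },
  false,
);
reporter.status("Listening...");
reporter.transcript("hello world");
console.log(JSON.stringify(captured));
```

Because only transcript lines reach the data channel, downstream tools never have to filter out status messages.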
License
MIT
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
