@okw/stt

v1.0.0

Published 6 months ago

Text to Speech CLI using ElevenLabs API

okwasniewski

text to speech cli elevenlabs tts

`stt` - Text to Speech CLI 🎤

Give your AI agent a voice using the ElevenLabs API. Audio streams directly from your terminal.

Why?

AI coding agents can use this CLI to speak to you: to announce completions, read errors aloud, or give audio feedback while you work.

Requirements

- Node.js 18+
- mpv for audio playback
- ElevenLabs API key (created in your ElevenLabs account)

# macOS
brew install mpv

Setup

Set your ElevenLabs API key:

export ELEVENLABS_API_KEY=your_api_key_here
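
Before the first run it can help to confirm that everything under Requirements and Setup is in place. The function below is a minimal sketch of such a check, not part of `@okw/stt`; the name `check_requirements` is hypothetical, and it only tests for the three items listed above.

```shell
#!/usr/bin/env bash
# check_requirements (hypothetical helper, not shipped with @okw/stt):
# prints one line per missing requirement and returns non-zero if any is found.
check_requirements() {
  local problems=0
  local major

  # Node.js must be installed and at major version 18 or newer.
  if command -v node >/dev/null 2>&1; then
    major="$(node -p 'process.versions.node.split(".")[0]')"
    if [ "$major" -lt 18 ]; then
      echo "Node.js 18+ required, found major version $major"
      problems=$((problems + 1))
    fi
  else
    echo "node not found on PATH"
    problems=$((problems + 1))
  fi

  # mpv is listed as the audio player.
  if ! command -v mpv >/dev/null 2>&1; then
    echo "mpv not found on PATH"
    problems=$((problems + 1))
  fi

  # The API key must be set and non-empty.
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY is not set"
    problems=$((problems + 1))
  fi

  [ "$problems" -eq 0 ]
}
```

Calling `check_requirements && npx @okw/stt "Ready"` would then speak only when all three checks pass.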

Usage

# Basic usage (recommended)
npx @okw/stt "Hello, world!"

# Save to file instead of playing
npx @okw/stt -o output.mp3 "Hello, world!"

# Use a different voice
npx @okw/stt -v CwhRBWXzGAHq8TQ4Fs17 "Hello, world!"

# Adjust speech speed
npx @okw/stt --speed 1.5 "Faster speech"

# Use audio tags (v3 model)
npx @okw/stt "I'm so excited! [laughs] This is amazing!"

Global Installation

If you use it frequently, install globally:

npm install -g @okw/stt
stt "Hello, world!"

Give Your AI Agent a Voice

Add this to your AGENTS.md to let your AI agent speak:

## Voice Output

You can speak to the user using: `npx @okw/stt "Your message"`

Use sparingly for:
- Task completion announcements
- Critical errors
- When user explicitly asks for voice output

Keep messages to 1-2 sentences. Don't read everything aloud.
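
To follow the "use sparingly" guidance from a script rather than from agent instructions, one option is a small wrapper that runs a command and speaks a single sentence about the result. This is a sketch under assumptions: the `announce` function and the `STT_CMD` override variable are hypothetical and not part of `@okw/stt`; only the `npx @okw/stt "message"` invocation comes from this README.

```shell
#!/usr/bin/env bash
# announce COMMAND [ARGS...]: run a command, then speak a one-sentence summary.
# STT_CMD is a hypothetical override (not part of @okw/stt) so the speaker can
# be swapped out, for example with `echo` when testing without audio.
announce() {
  local status=0
  "$@" || status=$?

  local message
  if [ "$status" -eq 0 ]; then
    message="Task finished: $1 succeeded."
  else
    message="Task failed: $1 exited with code $status."
  fi

  # STT_CMD is left unquoted on purpose so it can hold "npx @okw/stt".
  ${STT_CMD:-npx @okw/stt} "$message"

  # Preserve the exit code of the wrapped command.
  return "$status"
}
```

For example, `announce npm test` would run the test suite and then speak either the success or the failure sentence, while still returning the original exit code.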

Options

| Option | Description | Default |
| --- | --- | --- |
| `-v, --voice <id>` | Voice ID | `UgBBYS2sOqTuMpoF3BR0` |
| `-m, --model <id>` | Model ID | `eleven_v3` |
| `-s, --stability <n>` | Voice stability (0-1) | 0.5 |
| `-b, --boost <n>` | Similarity boost (0-1) | 0.75 |
| `--speed <n>` | Speech speed (0.5-2.0) | 1.0 |
| `-o, --output <file>` | Save to file instead of playing | - |
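
The numeric options have documented ranges, and this README does not say whether the CLI checks them itself. The sketch below validates values before calling the CLI; `in_range` and `speak_checked` are hypothetical helper names, and only the `-s`, `-b`, and `--speed` flags come from the table above.

```shell
#!/usr/bin/env bash
# in_range VALUE MIN MAX: succeeds when MIN <= VALUE <= MAX (decimals allowed).
# Note: awk treats non-numeric input as 0, so this is not a full input validator.
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { exit !(v + 0 >= lo + 0 && v + 0 <= hi + 0) }'
}

# speak_checked STABILITY BOOST SPEED TEXT: check each value against the
# documented range, then pass them to the CLI with the matching flags.
speak_checked() {
  local stability="$1" boost="$2" speed="$3" text="$4"
  in_range "$stability" 0 1 || { echo "stability must be 0-1" >&2; return 2; }
  in_range "$boost" 0 1 || { echo "boost must be 0-1" >&2; return 2; }
  in_range "$speed" 0.5 2.0 || { echo "speed must be 0.5-2.0" >&2; return 2; }
  npx @okw/stt -s "$stability" -b "$boost" --speed "$speed" "$text"
}
```

A call such as `speak_checked 0.3 0.8 1.2 "Deploy complete."` passes all three checks, whereas a speed of `3` is rejected before any request is made.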

Available Voices

Run this to list the voices available to your API key (requires curl and jq):

curl -s "https://api.elevenlabs.io/v1/voices" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" | \
  jq -r '.voices[] | "\(.voice_id) | \(.name)"'

Popular premade voices:

| ID | Name |
| --- | --- |
| `CwhRBWXzGAHq8TQ4Fs17` | Roger - Laid-Back, Casual |
| `EXAVITQu4vr4xnSDxMaL` | Sarah - Mature, Reassuring |
| `IKne3meq5aSn9XLyUdCD` | Charlie - Deep, Confident |
| `onwK4e9ZLuTAKqWW03F9` | Daniel - Steady Broadcaster |
| `pFZP5JQG7iQjIQuC4Bku` | Lily - Velvety Actress |

Models

| Model | Description |
| --- | --- |
| `eleven_v3` | Most expressive, supports audio tags |
| `eleven_turbo_v2_5` | Low latency, good quality |
| `eleven_flash_v2_5` | Ultra-low latency (<75ms) |
| `eleven_multilingual_v2` | Best for non-English |
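
If you switch models often, a small lookup can keep the IDs out of your head. The function below is only an illustration: `pick_model` and its use-case labels (`expressive`, `fast`, `realtime`, `multilingual`) are hypothetical names, mapped to the model IDs from the table above.

```shell
#!/usr/bin/env bash
# pick_model USE_CASE: print the model ID matching a use case.
# The use-case labels are hypothetical; the model IDs come from the Models table.
pick_model() {
  case "$1" in
    expressive)   echo "eleven_v3" ;;
    fast)         echo "eleven_turbo_v2_5" ;;
    realtime)     echo "eleven_flash_v2_5" ;;
    multilingual) echo "eleven_multilingual_v2" ;;
    *)            echo "unknown use case: $1" >&2; return 1 ;;
  esac
}
```

The result can be passed to the documented `-m` option, for example `npx @okw/stt -m "$(pick_model realtime)" "Build finished."`.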

License

MIT
