@elizaos/plugin-elevenlabs-root
v2.0.0-alpha.1
Published
High-quality text-to-speech and speech-to-text plugin for ElizaOS using ElevenLabs API
Readme
ElevenLabs Plugin for ElizaOS
High-quality text-to-speech (TTS) and speech-to-text (STT) plugin for ElizaOS using the ElevenLabs API.
Features
- Text-to-Speech (TTS): High-quality voice synthesis with multiple voice models
- Speech-to-Text (STT): Accurate transcription with Scribe v1 model
- Speaker Diarization: Identify up to 32 different speakers
- Multi-language Support: 99 languages for STT
- Audio Event Detection: Detect laughter, applause, and other audio events
- Streaming Support: Efficient memory usage with streaming audio output
- Multi-runtime: Available for TypeScript, Python, and Rust
Installation
TypeScript (npm)
npm install @elizaos/plugin-elevenlabs
# or
bun add @elizaos/plugin-elevenlabsPython (PyPI)
pip install eliza-plugin-elevenlabsRust (crates.io)
Add to your Cargo.toml:
[dependencies]
eliza-plugin-elevenlabs = "0.1.0"Quick Start
TypeScript
import { elevenLabsPlugin } from '@elizaos/plugin-elevenlabs';
// Add to your character configuration
const character = {
plugins: ['@elizaos/plugin-elevenlabs'],
settings: {
ELEVENLABS_API_KEY: 'your-api-key',
},
};Python
from eliza_plugin_elevenlabs import ElevenLabsService
async with ElevenLabsService(api_key="your-api-key") as service:
# Text-to-speech
audio = await service.text_to_speech_bytes("Hello, world!")
# Speech-to-text
transcript = await service.speech_to_text(audio_bytes)Rust
use eliza_plugin_elevenlabs::ElevenLabsService;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let service = ElevenLabsService::new("your-api-key");
// Text-to-speech
let audio = service.text_to_speech("Hello, world!").await?;
// Speech-to-text
let transcript = service.speech_to_text(&audio).await?;
Ok(())
}Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| ELEVENLABS_API_KEY | ElevenLabs API key | Required |
| ELEVENLABS_VOICE_ID | Voice ID for TTS | EXAVITQu4vr4xnSDxMaL |
| ELEVENLABS_MODEL_ID | TTS model ID | eleven_monolingual_v1 |
| ELEVENLABS_VOICE_STABILITY | Voice stability (0-1) | 0.5 |
| ELEVENLABS_VOICE_SIMILARITY_BOOST | Similarity boost (0-1) | 0.75 |
| ELEVENLABS_VOICE_STYLE | Voice style intensity (0-1) | 0 |
| ELEVENLABS_VOICE_USE_SPEAKER_BOOST | Enable speaker boost | true |
| ELEVENLABS_OPTIMIZE_STREAMING_LATENCY | Latency optimization (0-4) | 0 |
| ELEVENLABS_OUTPUT_FORMAT | Audio output format | mp3_44100_128 |
| ELEVENLABS_BROWSER_URL | Browser proxy URL | - |
| ELEVENLABS_STT_MODEL_ID | STT model ID | scribe_v1 |
| ELEVENLABS_STT_LANGUAGE_CODE | Language code for STT | auto-detect |
| ELEVENLABS_STT_TIMESTAMPS_GRANULARITY | Timestamp detail level | word |
| ELEVENLABS_STT_DIARIZE | Enable speaker diarization | false |
| ELEVENLABS_STT_NUM_SPEAKERS | Expected number of speakers (1-32) | - |
| ELEVENLABS_STT_TAG_AUDIO_EVENTS | Tag audio events | false |
Project Structure
plugin-elevenlabs/
├── package.json # Root package with multi-language scripts
├── README.md # This file
├── .gitignore
├── .github/
│ └── workflows/
│ ├── ci.yml # CI for all languages
│ ├── npm-deploy.yml # npm publishing
│ ├── pypi-deploy.yml # PyPI publishing
│ └── crates-deploy.yml # crates.io publishing
├── typescript/ # TypeScript implementation
│ ├── package.json
│ ├── src/
│ └── README.md
├── python/ # Python implementation
│ ├── pyproject.toml
│ ├── src/
│ └── README.md
└── rust/ # Rust implementation
├── Cargo.toml
├── src/
└── README.mdDevelopment
Build All
bun run buildBuild Individual Languages
bun run build:ts # TypeScript
bun run build:python # Python
bun run build:rust # RustTest All
bun run testTest Individual Languages
bun run test:ts # TypeScript
bun run test:python # Python
bun run test:rust # RustLint
bun run lint # All languages
bun run lint:ts # TypeScript
bun run lint:python # Python
bun run lint:rust # RustModel Types
TEXT_TO_SPEECH
Converts text into spoken audio. Supports:
- Multiple voice models
- Configurable voice parameters
- Streaming output
- Various audio formats (MP3, PCM, etc.)
TRANSCRIPTION
Converts audio/video into text transcripts. Supports:
- 99 languages with auto-detection
- Speaker diarization (up to 32 speakers)
- Word/character-level timestamps
- Audio event tagging
Supported Models
TTS Models
eleven_monolingual_v1eleven_multilingual_v1eleven_multilingual_v2eleven_turbo_v2eleven_turbo_v2_5
STT Models
scribe_v1
License
MIT
