@elizaos/plugin-elevenlabs-root

v2.0.0-alpha.1

Published

21 days ago

ElevenLabs Plugin for ElizaOS

High-quality text-to-speech (TTS) and speech-to-text (STT) plugin for ElizaOS using the ElevenLabs API.

Features

Text-to-Speech (TTS): High-quality voice synthesis with multiple voice models
Speech-to-Text (STT): Accurate transcription with the Scribe v1 model
Speaker Diarization: Identify up to 32 different speakers
Multi-language Support: 99 languages for STT
Audio Event Detection: Detect laughter, applause, and other audio events
Streaming Support: Efficient memory usage with streaming audio output
Multi-runtime: Available for TypeScript, Python, and Rust
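
To illustrate why streaming keeps memory usage low, here is a minimal sketch that writes audio to disk chunk by chunk instead of buffering the whole clip. The `fake_audio_stream` generator is a hypothetical stand-in for a streaming TTS response, and `write_stream` is an illustrative helper; neither is part of the plugin's API.

```python
import asyncio
from pathlib import Path
from typing import AsyncIterator


async def fake_audio_stream(
    total_chunks: int = 4, chunk_size: int = 1024
) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response (not the plugin API)."""
    for _ in range(total_chunks):
        await asyncio.sleep(0)  # yield control, as a network read would
        yield b"\x00" * chunk_size


async def write_stream(chunks: AsyncIterator[bytes], destination: Path) -> int:
    """Write each chunk as it arrives and return the number of bytes written."""
    written = 0
    with destination.open("wb") as fh:
        async for chunk in chunks:
            fh.write(chunk)  # only one chunk is held in memory at a time
            written += len(chunk)
    return written
```

Peak memory here is bounded by the chunk size rather than by the length of the audio, which is the property the streaming feature is meant to provide.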

Installation

TypeScript (npm)

npm install @elizaos/plugin-elevenlabs
# or
bun add @elizaos/plugin-elevenlabs

Python (PyPI)

pip install eliza-plugin-elevenlabs

Rust (crates.io)

Add to your Cargo.toml:

[dependencies]
eliza-plugin-elevenlabs = "0.1.0"

Quick Start

TypeScript

import { elevenLabsPlugin } from '@elizaos/plugin-elevenlabs';

// Add to your character configuration
const character = {
  plugins: ['@elizaos/plugin-elevenlabs'],
  settings: {
    ELEVENLABS_API_KEY: 'your-api-key',
  },
};

Python

import asyncio

from eliza_plugin_elevenlabs import ElevenLabsService


async def main() -> None:
    async with ElevenLabsService(api_key="your-api-key") as service:
        # Text-to-speech
        audio = await service.text_to_speech_bytes("Hello, world!")

        # Speech-to-text: transcribe the audio generated above
        transcript = await service.speech_to_text(audio)
        print(transcript)


asyncio.run(main())

Rust

use eliza_plugin_elevenlabs::ElevenLabsService;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let service = ElevenLabsService::new("your-api-key");
    
    // Text-to-speech
    let audio = service.text_to_speech("Hello, world!").await?;
    
    // Speech-to-text
    let transcript = service.speech_to_text(&audio).await?;
    
    Ok(())
}

Configuration

| Environment Variable | Description | Default |
|----------------------|-------------|---------|
| ELEVENLABS_API_KEY | ElevenLabs API key | Required |
| ELEVENLABS_VOICE_ID | Voice ID for TTS | EXAVITQu4vr4xnSDxMaL |
| ELEVENLABS_MODEL_ID | TTS model ID | eleven_monolingual_v1 |
| ELEVENLABS_VOICE_STABILITY | Voice stability (0-1) | 0.5 |
| ELEVENLABS_VOICE_SIMILARITY_BOOST | Similarity boost (0-1) | 0.75 |
| ELEVENLABS_VOICE_STYLE | Voice style intensity (0-1) | 0 |
| ELEVENLABS_VOICE_USE_SPEAKER_BOOST | Enable speaker boost | true |
| ELEVENLABS_OPTIMIZE_STREAMING_LATENCY | Latency optimization (0-4) | 0 |
| ELEVENLABS_OUTPUT_FORMAT | Audio output format | mp3_44100_128 |
| ELEVENLABS_BROWSER_URL | Browser proxy URL | - |
| ELEVENLABS_STT_MODEL_ID | STT model ID | scribe_v1 |
| ELEVENLABS_STT_LANGUAGE_CODE | Language code for STT | auto-detect |
| ELEVENLABS_STT_TIMESTAMPS_GRANULARITY | Timestamp detail level | word |
| ELEVENLABS_STT_DIARIZE | Enable speaker diarization | false |
| ELEVENLABS_STT_NUM_SPEAKERS | Expected number of speakers (1-32) | - |
| ELEVENLABS_STT_TAG_AUDIO_EVENTS | Tag audio events | false |
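
As a rough sketch of how these variables could be read and range-checked before they reach the service, the snippet below loads a subset of them with the defaults from the table. The `ElevenLabsSettings` dataclass and `load_settings` helper are hypothetical names used for illustration and are not part of the plugin; the plugin's own parsing may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ElevenLabsSettings:
    """Hypothetical container for a subset of the documented variables."""
    api_key: str
    voice_id: str
    model_id: str
    stability: float
    similarity_boost: float
    use_speaker_boost: bool
    optimize_streaming_latency: int
    stt_num_speakers: Optional[int]


def _unit_interval(env: Mapping[str, str], name: str, default: float) -> float:
    value = float(env.get(name, default))
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return value


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    # Assumption: "true", "1", and "yes" (any case) count as enabled.
    return env.get(name, str(default)).strip().lower() in {"1", "true", "yes"}


def load_settings(env: Mapping[str, str] = os.environ) -> ElevenLabsSettings:
    api_key = env.get("ELEVENLABS_API_KEY")
    if not api_key:
        raise ValueError("ELEVENLABS_API_KEY is required")

    latency = int(env.get("ELEVENLABS_OPTIMIZE_STREAMING_LATENCY", 0))
    if not 0 <= latency <= 4:
        raise ValueError("ELEVENLABS_OPTIMIZE_STREAMING_LATENCY must be 0-4")

    raw_speakers = env.get("ELEVENLABS_STT_NUM_SPEAKERS")
    speakers = int(raw_speakers) if raw_speakers else None
    if speakers is not None and not 1 <= speakers <= 32:
        raise ValueError("ELEVENLABS_STT_NUM_SPEAKERS must be 1-32")

    return ElevenLabsSettings(
        api_key=api_key,
        voice_id=env.get("ELEVENLABS_VOICE_ID", "EXAVITQu4vr4xnSDxMaL"),
        model_id=env.get("ELEVENLABS_MODEL_ID", "eleven_monolingual_v1"),
        stability=_unit_interval(env, "ELEVENLABS_VOICE_STABILITY", 0.5),
        similarity_boost=_unit_interval(
            env, "ELEVENLABS_VOICE_SIMILARITY_BOOST", 0.75
        ),
        use_speaker_boost=_flag(env, "ELEVENLABS_VOICE_USE_SPEAKER_BOOST", True),
        optimize_streaming_latency=latency,
        stt_num_speakers=speakers,
    )
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise in tests, for example `load_settings({"ELEVENLABS_API_KEY": "your-api-key"})`.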

Project Structure

plugin-elevenlabs/
├── package.json          # Root package with multi-language scripts
├── README.md             # This file
├── .gitignore
├── .github/
│   └── workflows/
│       ├── ci.yml        # CI for all languages
│       ├── npm-deploy.yml    # npm publishing
│       ├── pypi-deploy.yml   # PyPI publishing
│       └── crates-deploy.yml # crates.io publishing
├── typescript/           # TypeScript implementation
│   ├── package.json
│   ├── src/
│   └── README.md
├── python/               # Python implementation
│   ├── pyproject.toml
│   ├── src/
│   └── README.md
└── rust/                 # Rust implementation
    ├── Cargo.toml
    ├── src/
    └── README.md

Development

Build All

bun run build

Build Individual Languages

bun run build:ts      # TypeScript
bun run build:python  # Python
bun run build:rust    # Rust

Test All

bun run test

Test Individual Languages

bun run test:ts      # TypeScript
bun run test:python  # Python
bun run test:rust    # Rust

Lint

bun run lint         # All languages
bun run lint:ts      # TypeScript
bun run lint:python  # Python
bun run lint:rust    # Rust

Model Types

TEXT_TO_SPEECH

Converts text into spoken audio. Supports:

Multiple voice models
Configurable voice parameters
Streaming output
Various audio formats (MP3, PCM, etc.)
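
The default `ELEVENLABS_OUTPUT_FORMAT` value, `mp3_44100_128`, suggests a `codec_samplerate_bitrate` naming scheme. The sketch below splits such a string into its parts, assuming the bitrate segment is optional (as in `pcm_16000`). `OutputFormat` and `parse_output_format` are hypothetical helpers for illustration, not plugin functions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutputFormat:
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # None when the format string has no bitrate part


def parse_output_format(value: str) -> OutputFormat:
    """Split an output format string such as 'mp3_44100_128' into its fields."""
    parts = value.split("_")
    if len(parts) not in (2, 3):
        raise ValueError(f"Unrecognized output format: {value!r}")
    try:
        sample_rate = int(parts[1])
        bitrate = int(parts[2]) if len(parts) == 3 else None
    except ValueError as exc:
        raise ValueError(f"Unrecognized output format: {value!r}") from exc
    return OutputFormat(parts[0], sample_rate, bitrate)
```

A caller could use the parsed codec to pick a file extension or the sample rate to configure a playback device.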

TRANSCRIPTION

Converts audio/video into text transcripts. Supports:

99 languages with auto-detection
Speaker diarization (up to 32 speakers)
Word/character-level timestamps
Audio event tagging
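
When diarization and word-level timestamps are both enabled, a common follow-up step is to merge consecutive words from the same speaker into turns. The sketch below assumes each word carries `text`, `start`, `end`, and `speaker_id` fields; that shape, and the `Word`, `Turn`, and `group_turns` names, are assumptions for illustration rather than the plugin's actual return types.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds
    speaker_id: str


@dataclass
class Turn:
    speaker_id: str
    start: float
    end: float
    text: str


def group_turns(words: Iterable[Word]) -> List[Turn]:
    """Merge consecutive words spoken by the same speaker into one turn."""
    turns: List[Turn] = []
    for word in words:
        if turns and turns[-1].speaker_id == word.speaker_id:
            # Same speaker continues: extend the current turn.
            turns[-1].end = word.end
            turns[-1].text = f"{turns[-1].text} {word.text}"
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(Turn(word.speaker_id, word.start, word.end, word.text))
    return turns
```

The result preserves word order, so a speaker who talks, is interrupted, and then resumes appears as two separate turns.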

Supported Models

TTS Models

eleven_monolingual_v1
eleven_multilingual_v1
eleven_multilingual_v2
eleven_turbo_v2
eleven_turbo_v2_5

STT Models

scribe_v1
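
A configured model ID can be checked against the lists above before any request is made. This is a minimal sketch; `validate_model` is a hypothetical helper, and the sets simply mirror the models listed in this README.

```python
TTS_MODELS = frozenset({
    "eleven_monolingual_v1",
    "eleven_multilingual_v1",
    "eleven_multilingual_v2",
    "eleven_turbo_v2",
    "eleven_turbo_v2_5",
})
STT_MODELS = frozenset({"scribe_v1"})


def validate_model(model_id: str, kind: str) -> str:
    """Return model_id if it is listed for the given kind ('tts' or 'stt')."""
    if kind == "tts":
        allowed = TTS_MODELS
    elif kind == "stt":
        allowed = STT_MODELS
    else:
        raise ValueError(f"kind must be 'tts' or 'stt', got {kind!r}")
    if model_id not in allowed:
        raise ValueError(
            f"Unsupported {kind.upper()} model {model_id!r}; "
            f"expected one of {sorted(allowed)}"
        )
    return model_id
```

Failing early with the list of accepted IDs tends to produce a clearer error than a rejected API call.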

License

MIT