
@create-voice-agent/elevenlabs

v0.1.0


ElevenLabs Text-to-Speech integration for voice agents


@create-voice-agent/elevenlabs 🔊

ElevenLabs Text-to-Speech integration for create-voice-agent.

This package provides high-quality, low-latency voice synthesis using ElevenLabs' streaming TTS API.

Installation

npm install @create-voice-agent/elevenlabs
# or
pnpm add @create-voice-agent/elevenlabs

Quick Start

import { createVoiceAgent } from "create-voice-agent";
import { AssemblyAISpeechToText } from "@create-voice-agent/assemblyai";
import { ElevenLabsTextToSpeech } from "@create-voice-agent/elevenlabs";
import { ChatOpenAI } from "@langchain/openai";

const voiceAgent = createVoiceAgent({
  model: new ChatOpenAI({ model: "gpt-4o" }),
  
  stt: new AssemblyAISpeechToText({ /* ... */ }),
  
  tts: new ElevenLabsTextToSpeech({
    apiKey: process.env.ELEVENLABS_API_KEY!,
    voiceId: process.env.ELEVENLABS_VOICE_ID!,
  }),
});

API Reference

ElevenLabsTextToSpeech

Streaming Text-to-Speech model using ElevenLabs' HTTP API.

import { ElevenLabsTextToSpeech } from "@create-voice-agent/elevenlabs";

const tts = new ElevenLabsTextToSpeech({
  apiKey: process.env.ELEVENLABS_API_KEY!,
  voiceId: "21m00Tcm4TlvDq8ikWAM", // Rachel
  
  // Optional configuration
  modelId: "eleven_flash_v2_5",
  outputFormat: "pcm_16000",
  optimizeStreamingLatency: 3,
  
  // Voice settings
  voiceSettings: {
    stability: 0.5,
    similarityBoost: 0.75,
    style: 0.3,
    speed: 1.0,
    useSpeakerBoost: true,
  },
  
  // Token batching
  flushDelayMs: 300,
  
  // Callbacks
  onAudioComplete: () => console.log("Finished speaking"),
  onInterrupt: () => console.log("Speech interrupted"),
});

Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| apiKey | string | required | ElevenLabs API key |
| voiceId | string | required | Voice ID to use |
| modelId | string | "eleven_flash_v2_5" | TTS model ID |
| languageCode | string | - | ISO 639-1 language code (e.g., "en", "es") |
| outputFormat | string | "pcm_16000" | Audio output format |
| optimizeStreamingLatency | 0-4 | 3 | Latency optimization level |
| flushDelayMs | number | 300 | Token batching delay (ms) |
| seed | number | - | Seed for deterministic generation |
| previousText | string | - | Context text before current request |
| nextText | string | - | Context text after current request |
| applyTextNormalization | "auto" \| "on" \| "off" | "auto" | Text normalization mode |
| applyLanguageTextNormalization | boolean | false | Language-specific normalization (⚠️ high latency) |

Voice Settings

Fine-tune the generated speech characteristics:

interface ElevenLabsVoiceSettings {
  /** Speech stability (0-1). Lower = more expressive, higher = more consistent */
  stability?: number;
  
  /** Voice similarity (0-1). Higher = closer to reference voice */
  similarityBoost?: number;
  
  /** Enable speaker boost for enhanced clarity */
  useSpeakerBoost?: boolean;
  
  /** Style/expressiveness (0-1). Only for certain models */
  style?: number;
  
  /** Speech speed (0.5-2.0) */
  speed?: number;
}
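The ranges documented in the interface comments can be checked before a settings object is passed to the constructor. The following is a minimal sketch, assuming you want a list of human-readable problems rather than a thrown error; `VoiceSettingsInput` and `checkVoiceSettings` are hypothetical names and are not exported by this package.

```typescript
// Hypothetical helper, not part of @create-voice-agent/elevenlabs.
// Mirrors the fields of ElevenLabsVoiceSettings shown above.
interface VoiceSettingsInput {
  stability?: number;
  similarityBoost?: number;
  useSpeakerBoost?: boolean;
  style?: number;
  speed?: number;
}

type NumericField = "stability" | "similarityBoost" | "style" | "speed";

// Allowed [min, max] per numeric field, taken from the interface comments.
const VOICE_SETTING_RANGES: Record<NumericField, [number, number]> = {
  stability: [0, 1],
  similarityBoost: [0, 1],
  style: [0, 1],
  speed: [0.5, 2.0],
};

// Returns one message per out-of-range field; an empty array means all good.
function checkVoiceSettings(settings: VoiceSettingsInput): string[] {
  const problems: string[] = [];
  for (const field of Object.keys(VOICE_SETTING_RANGES) as NumericField[]) {
    const [min, max] = VOICE_SETTING_RANGES[field];
    const value = settings[field];
    if (value === undefined) continue; // optional fields may be omitted
    if (Number.isNaN(value) || value < min || value > max) {
      problems.push(`${field} must be between ${min} and ${max}, got ${value}`);
    }
  }
  return problems;
}
```

Calling `checkVoiceSettings({ stability: 1.5 })` returns a single message about `stability`, which can be logged or surfaced before any request is made.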

Example: Expressive Storytelling Voice

const tts = new ElevenLabsTextToSpeech({
  apiKey: process.env.ELEVENLABS_API_KEY!,
  voiceId: "your-voice-id",
  voiceSettings: {
    stability: 0.3,        // More expressive
    similarityBoost: 0.8,  // Close to reference
    style: 0.6,            // More stylized
    speed: 0.9,            // Slightly slower
  },
});

Example: Consistent Professional Voice

const tts = new ElevenLabsTextToSpeech({
  apiKey: process.env.ELEVENLABS_API_KEY!,
  voiceId: "your-voice-id",
  voiceSettings: {
    stability: 0.8,        // Very consistent
    similarityBoost: 0.7,
    useSpeakerBoost: true, // Enhanced clarity
    speed: 1.0,
  },
});

Models

| Model ID | Description | Best For |
|----------|-------------|----------|
| eleven_flash_v2_5 | Fastest, lowest latency (default) | Real-time conversations |
| eleven_turbo_v2_5 | Fast with higher quality | Balanced speed/quality |
| eleven_multilingual_v2 | Best multilingual support | Non-English or mixed languages |
| eleven_monolingual_v1 | Original English model | Legacy compatibility |

Output Formats

PCM (Recommended for voice agents)

| Format | Sample Rate | Description |
|--------|-------------|-------------|
| pcm_8000 | 8 kHz | Telephone quality |
| pcm_16000 | 16 kHz | Standard voice (default) |
| pcm_22050 | 22.05 kHz | Higher quality |
| pcm_24000 | 24 kHz | High quality |
| pcm_44100 | 44.1 kHz | CD quality |
| pcm_48000 | 48 kHz | Professional quality |
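When buffering or pacing playback, it helps to know how much audio a chunk of PCM bytes represents. The sketch below derives that from the format name. It assumes the `pcm_*` formats carry 16-bit signed little-endian mono samples (2 bytes per sample), which is how ElevenLabs documents them; `pcmSampleRate` and `pcmDurationMs` are hypothetical helpers, not part of this package.

```typescript
// Assumption: pcm_* output is 16-bit mono, i.e. 2 bytes per sample.
const PCM_BYTES_PER_SAMPLE = 2;

// Extracts the sample rate in Hz from a format name such as "pcm_16000".
function pcmSampleRate(format: string): number {
  const match = /^pcm_(\d+)$/.exec(format);
  if (!match) throw new Error(`Not a PCM output format: ${format}`);
  return Number(match[1]);
}

// Duration in milliseconds represented by `byteLength` bytes of PCM audio.
function pcmDurationMs(format: string, byteLength: number): number {
  const bytesPerSecond = pcmSampleRate(format) * PCM_BYTES_PER_SAMPLE;
  return (byteLength / bytesPerSecond) * 1000;
}

console.log(pcmDurationMs("pcm_16000", 32000)); // 1000
```

At the default `pcm_16000`, one second of speech is 32,000 bytes, so a 3,200-byte chunk covers 100 ms.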

MP3

| Format | Sample Rate | Bitrate |
|--------|-------------|---------|
| mp3_22050_32 | 22.05 kHz | 32 kbps |
| mp3_44100_64 | 44.1 kHz | 64 kbps |
| mp3_44100_128 | 44.1 kHz | 128 kbps |
| mp3_44100_192 | 44.1 kHz | 192 kbps |

Other Formats

| Format | Description |
|--------|-------------|
| ulaw_8000 | μ-law 8 kHz (telephony) |
| alaw_8000 | A-law 8 kHz (telephony) |
| opus_48000_* | Opus codec (32-192 kbps) |

Latency Optimization

Control the trade-off between latency and quality:

| Level | Description | Use Case |
|-------|-------------|----------|
| 0 | No optimization | Highest quality |
| 1 | ~50% of the maximum latency improvement | Balanced |
| 2 | ~75% of the maximum latency improvement | Lower latency |
| 3 | Maximum optimization (default) | Real-time conversations |
| 4 | Max + disable text normalizer | Fastest (may mispronounce numbers/dates) |

// For real-time conversations (fastest)
const realtimeTts = new ElevenLabsTextToSpeech({
  apiKey: process.env.ELEVENLABS_API_KEY!,
  voiceId: "your-voice-id",
  optimizeStreamingLatency: 4,
});

// For pre-recorded content (highest quality)
const highQualityTts = new ElevenLabsTextToSpeech({
  apiKey: process.env.ELEVENLABS_API_KEY!,
  voiceId: "your-voice-id",
  optimizeStreamingLatency: 0,
});

Token Batching

The TTS model batches incoming text tokens before sending them to ElevenLabs, which produces more natural-sounding speech than synthesizing each token on its own:

const tts = new ElevenLabsTextToSpeech({
  apiKey: process.env.ELEVENLABS_API_KEY!,
  voiceId: "your-voice-id",
  
  // Wait 300ms after last token before generating speech
  flushDelayMs: 300,
});

  • Lower values (100-200ms): Faster response, may sound choppy
  • Higher values (400-500ms): More natural speech, higher latency
  • Default (300ms): Good balance for most use cases
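The trade-off in the list above follows from how a delay-based flush behaves: each new token restarts the timer, and the buffered text is sent only after the stream has been quiet for `flushDelayMs`. The sketch below illustrates that debounce pattern in isolation. It is an assumption about the general technique, not the package's actual implementation; `TokenBatcher` and `Scheduler` are hypothetical names.

```typescript
// Hypothetical sketch of debounce-style batching; the package's internals may differ.
type Scheduler = {
  set: (fn: () => void, ms: number) => unknown;
  clear: (handle: unknown) => void;
};

// Default scheduler backed by the runtime's timers.
const realTimers: Scheduler = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

class TokenBatcher {
  private buffer = "";
  private pending: unknown = undefined;
  private readonly flushDelayMs: number;
  private readonly onFlush: (text: string) => void;
  private readonly timers: Scheduler;

  constructor(flushDelayMs: number, onFlush: (text: string) => void, timers: Scheduler = realTimers) {
    this.flushDelayMs = flushDelayMs;
    this.onFlush = onFlush;
    this.timers = timers;
  }

  // Append a token and restart the quiet-period timer.
  push(token: string): void {
    this.buffer += token;
    if (this.pending !== undefined) this.timers.clear(this.pending);
    this.pending = this.timers.set(() => this.flush(), this.flushDelayMs);
  }

  // Emit everything buffered so far as one piece of text.
  flush(): void {
    if (this.pending !== undefined) {
      this.timers.clear(this.pending);
      this.pending = undefined;
    }
    if (this.buffer.length === 0) return;
    const text = this.buffer;
    this.buffer = "";
    this.onFlush(text);
  }
}
```

With `new TokenBatcher(300, sendToTts)`, where `sendToTts` stands in for whatever performs the synthesis request, tokens arriving less than 300 ms apart are merged into a single request. A shorter delay splits the text into more, smaller requests, which is why low values can sound choppy.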

Instance Methods

interrupt()

Interrupt the current speech generation. Useful for barge-in handling.

// User started speaking - stop the agent
tts.interrupt();

speak(text: string): ReadableStream<Buffer>

Generate speech directly without going through the voice pipeline. Returns a ReadableStream of PCM audio buffers.

This is useful for:

  • Initial greetings when a call starts
  • System announcements that bypass the agent
  • One-off speech synthesis outside of conversations

const tts = new ElevenLabsTextToSpeech({
  apiKey: process.env.ELEVENLABS_API_KEY!,
  voiceId: "your-voice-id",
});

// Generate and play a greeting
const audioStream = tts.speak("Welcome to our service! How can I help you?");

for await (const chunk of audioStream) {
  // Send to audio output (speakers, WebRTC, etc.)
  audioOutput.write(chunk);
}

The speak() method uses the same voice settings and configuration as the main TTS pipeline, ensuring consistent voice quality.
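For one-off synthesis, it can be convenient to store the result as a playable file. Raw PCM has no header, so most players need it wrapped in a WAV container first. The sketch below builds the standard 44-byte RIFF/WAVE header. It assumes the audio is 16-bit little-endian mono PCM (as with the `pcm_*` output formats) and that the chunks from `speak()` have already been concatenated into one `Uint8Array`; `wrapPcmInWav` is a hypothetical helper, not part of this package.

```typescript
// Assumption: `pcm` holds raw 16-bit little-endian mono samples.
function wrapPcmInWav(pcm: Uint8Array, sampleRate: number): Uint8Array {
  const channels = 1;
  const bitsPerSample = 16;
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const header = new DataView(new ArrayBuffer(44));

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      header.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  header.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  header.setUint32(16, 16, true); // size of the fmt chunk
  header.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  header.setUint16(22, channels, true);
  header.setUint32(24, sampleRate, true);
  header.setUint32(28, sampleRate * blockAlign, true); // byte rate
  header.setUint16(32, blockAlign, true);
  header.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  header.setUint32(40, pcm.length, true); // size of the audio payload

  const wav = new Uint8Array(44 + pcm.length);
  wav.set(new Uint8Array(header.buffer), 0);
  wav.set(pcm, 44);
  return wav;
}
```

The `sampleRate` argument must match the configured `outputFormat`, for example 16000 for `pcm_16000`; a mismatch plays back at the wrong speed and pitch.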

Callbacks

onAudioComplete

Called when speech generation finishes without being interrupted.

const tts = new ElevenLabsTextToSpeech({
  apiKey: process.env.ELEVENLABS_API_KEY!,
  voiceId: "your-voice-id",
  onAudioComplete: () => {
    console.log("Agent finished speaking");
    // Trigger next action, update UI, etc.
  },
});

onInterrupt

Called when speech is interrupted (e.g., by barge-in).

const tts = new ElevenLabsTextToSpeech({
  apiKey: process.env.ELEVENLABS_API_KEY!,
  voiceId: "your-voice-id",
  onInterrupt: () => {
    console.log("Speech was interrupted");
  },
});
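Because exactly one of the two callbacks ends a given utterance, they can be combined into a single signal that records how the utterance ended. The following is a minimal sketch of that idea, assuming the callbacks take no arguments as in the examples above; `createSpeechSignal` and `SpeechOutcome` are hypothetical names, not part of this package.

```typescript
type SpeechOutcome = "completed" | "interrupted";

// Hypothetical helper: produces a callback pair plus a promise that settles
// with whichever outcome happens first.
function createSpeechSignal() {
  let outcome: SpeechOutcome | undefined;
  let resolveDone: (value: SpeechOutcome) => void = () => {};
  const done = new Promise<SpeechOutcome>((resolve) => {
    resolveDone = resolve; // the executor runs synchronously
  });

  const settle = (value: SpeechOutcome): void => {
    if (outcome !== undefined) return; // first outcome wins
    outcome = value;
    resolveDone(value);
  };

  return {
    done,
    get outcome(): SpeechOutcome | undefined {
      return outcome;
    },
    onAudioComplete: () => settle("completed"),
    onInterrupt: () => settle("interrupted"),
  };
}
```

The returned `onAudioComplete` and `onInterrupt` functions can be passed straight into the constructor options, after which `await signal.done` yields `"completed"` or `"interrupted"`. Each signal settles only once, so a long-running agent would need a fresh signal per utterance.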

Finding Voice IDs

Using the API

const response = await fetch("https://api.elevenlabs.io/v1/voices", {
  headers: { "xi-api-key": process.env.ELEVENLABS_API_KEY! },
});
const { voices } = await response.json();

for (const voice of voices) {
  console.log(`${voice.name}: ${voice.voice_id}`);
}
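Once the voice list has been fetched, looking up an ID by display name is a small filtering step. The sketch below assumes only the two response fields used in the loop above (`name` and `voice_id`); `VoiceSummary` and `findVoiceId` are hypothetical names, and the sample data reuses IDs from the table of pre-made voices.

```typescript
// Minimal shape of the fields used here; the real response carries more.
interface VoiceSummary {
  name: string;
  voice_id: string;
}

// Case-insensitive lookup by display name; returns undefined when not found.
function findVoiceId(voices: VoiceSummary[], name: string): string | undefined {
  const wanted = name.trim().toLowerCase();
  return voices.find((v) => v.name.trim().toLowerCase() === wanted)?.voice_id;
}

const sampleVoices: VoiceSummary[] = [
  { name: "Rachel", voice_id: "21m00Tcm4TlvDq8ikWAM" },
  { name: "Adam", voice_id: "pNInz6obpgDQGcFmaJgB" },
];

console.log(findVoiceId(sampleVoices, "rachel")); // 21m00Tcm4TlvDq8ikWAM
```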

Popular Pre-made Voices

| Voice | ID | Description |
|-------|-----|-------------|
| Rachel | 21m00Tcm4TlvDq8ikWAM | American female, calm |
| Domi | AZnzlk1XvdvUeBnXmlld | American female, strong |
| Bella | EXAVITQu4vr4xnSDxMaL | American female, soft |
| Antoni | ErXwobaYiN019PkySvjV | American male, warm |
| Josh | TxGEqnHWrfWFTfGW9XjX | American male, deep |
| Arnold | VR6AewLTigWG4xSOukaG | American male, crisp |
| Adam | pNInz6obpgDQGcFmaJgB | American male, deep |
| Sam | yoZ06aMxZJJ28mfd3POQ | American male, raspy |

Multilingual Support

For non-English or mixed-language content:

const tts = new ElevenLabsTextToSpeech({
  apiKey: process.env.ELEVENLABS_API_KEY!,
  voiceId: "your-voice-id",
  modelId: "eleven_multilingual_v2",
  languageCode: "es", // Spanish
});

Supported Languages

The eleven_multilingual_v2 model supports 29 languages including: English, Spanish, French, German, Italian, Portuguese, Polish, Hindi, Arabic, Japanese, Korean, Mandarin, and more.

Text Normalization

Control how text is processed before synthesis:

const tts = new ElevenLabsTextToSpeech({
  apiKey: process.env.ELEVENLABS_API_KEY!,
  voiceId: "your-voice-id",
  
  // "auto" - Let the system decide (default)
  // "on"   - Always normalize (spell out numbers, dates, etc.)
  // "off"  - Skip normalization
  applyTextNormalization: "on",
});

Note: For eleven_turbo_v2_5 and eleven_flash_v2_5 models, text normalization requires an Enterprise plan.

Deterministic Generation

Use seeds for reproducible output:

const tts = new ElevenLabsTextToSpeech({
  apiKey: process.env.ELEVENLABS_API_KEY!,
  voiceId: "your-voice-id",
  seed: 12345, // 0 to 4294967295
});

Note: Determinism is not guaranteed but the system will attempt to produce consistent results.
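The seed must be an integer in the range shown in the comment above (0 to 4294967295, i.e. an unsigned 32-bit value). One way to get a stable seed per conversation is to hash an identifier into that range. The sketch below uses an FNV-1a-style hash over the string's UTF-16 code units; this is an illustrative choice, not something the package prescribes, and `isValidSeed` and `seedFromString` are hypothetical helpers.

```typescript
const MAX_SEED = 4294967295; // 2^32 - 1

// True when `seed` is an integer within the accepted range.
function isValidSeed(seed: number): boolean {
  return Number.isInteger(seed) && seed >= 0 && seed <= MAX_SEED;
}

// FNV-1a-style 32-bit hash over UTF-16 code units: the same key always
// yields the same seed.
function seedFromString(key: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // reinterpret as unsigned 32-bit
}
```

Passing `seed: seedFromString(conversationId)` (with `conversationId` being whatever identifier your application uses) keeps the seed stable across retries of the same conversation, subject to the best-effort determinism noted above.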

Complete Example

import { createVoiceAgent, createThinkingFillerMiddleware } from "create-voice-agent";
import { AssemblyAISpeechToText } from "@create-voice-agent/assemblyai";
import { ElevenLabsTextToSpeech } from "@create-voice-agent/elevenlabs";
import { ChatOpenAI } from "@langchain/openai";

const tts = new ElevenLabsTextToSpeech({
  apiKey: process.env.ELEVENLABS_API_KEY!,
  voiceId: process.env.ELEVENLABS_VOICE_ID!,
  modelId: "eleven_flash_v2_5",
  outputFormat: "pcm_16000",
  optimizeStreamingLatency: 3,
  
  voiceSettings: {
    stability: 0.5,
    similarityBoost: 0.75,
    useSpeakerBoost: true,
  },
  
  onAudioComplete: () => console.log("Agent finished speaking"),
});

const stt = new AssemblyAISpeechToText({
  apiKey: process.env.ASSEMBLYAI_API_KEY!,
  onSpeechStart: () => {
    // Barge-in: user started speaking, interrupt the agent
    tts.interrupt();
  },
});

const voiceAgent = createVoiceAgent({
  model: new ChatOpenAI({ model: "gpt-4o" }),
  prompt: "You are a friendly voice assistant. Keep responses concise.",
  
  stt,
  tts,
  
  middleware: [
    createThinkingFillerMiddleware({ thresholdMs: 1000 }),
  ],
});

// Process audio streams (audioInputStream is your incoming audio source, assumed to be defined elsewhere)
const audioOutput = voiceAgent.process(audioInputStream);

License

MIT