@reaatech/voice-agent-tts

v0.1.0

Provider-agnostic text-to-speech interface with Deepgram Aura, AWS Polly, and Google Cloud Text-to-Speech adapters

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Provider-agnostic text-to-speech interface with five adapter implementations: Deepgram Aura, AWS Polly, Google Cloud Text-to-Speech, ElevenLabs, and Cartesia. Streaming audio output via AsyncIterable<AudioChunk>, cancelable synthesis, and Twilio-ready audio formatting.

Installation

npm install @reaatech/voice-agent-tts
pnpm add @reaatech/voice-agent-tts

Provider SDKs (install only what you use)

The cloud adapters load their provider SDKs lazily and declare them as optional peer dependencies, so you only install the SDK for the provider you actually use. Deepgram needs no extra SDK.

# AWS Polly
npm install @aws-sdk/client-polly @aws-sdk/credential-provider-ini

# Google Cloud Text-to-Speech
npm install @google-cloud/text-to-speech

Feature Overview

  • Unified TTS interface: TTSProvider with synthesize() returning AsyncIterable<AudioChunk>
  • Deepgram Aura adapter: Low-latency HTTP/2 streaming with voice selection and mulaw encoding
  • AWS Polly adapter: Neural engine with SSML support, multiple voice IDs, sample rate configuration
  • Google Cloud TTS adapter: 220+ voices, speaking rate, pitch, volume control, and SSML gender
  • ElevenLabs adapter: Streaming HTTP/2 with ultra-realistic voices (Turbo v2.5, Flash v2.5)
  • Cartesia adapter: Ultra-low latency streaming with Sonic model and emotion control
  • Cancelable synthesis: cancel() stops in-progress TTS immediately (barge-in support)
  • Twilio audio formatting: Automatic mulaw 8kHz conversion via formatAudioForTwilio()
  • Silence generation: createSilenceChunk() for injecting pauses between utterances
  • Text chunking: chunkTextForStreaming() to split long responses for streaming TTS
  • Provider factory: createTTSProvider() for runtime provider selection

Quick Start

import { DeepgramTTSProvider } from '@reaatech/voice-agent-tts';

const tts = new DeepgramTTSProvider();

for await (const chunk of tts.synthesize('Hello, how can I help you today?', {
  provider: 'deepgram',
  apiKey: process.env.DEEPGRAM_API_KEY,
  voice: 'asteria',
  model: 'aura',
  encoding: 'mulaw',
  sampleRate: 8000,
})) {
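  // Forward each chunk (audio bytes in chunk.buffer) to a Twilio Media Stream.
  // twilioHandler is not defined in this snippet; supply your own handler.
  twilioHandler.sendAudio(chunk);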
}

API Reference

TTSProvider Interface

interface TTSProvider {
  readonly name: string;
  synthesize(text: string, config: DeepgramTTSConfig | AWSPollyConfig | GoogleCloudTTSConfig): AsyncIterable<AudioChunk>;
  readonly supportsStreaming: boolean;
  readonly firstByteLatencyMs: number | null;
  cancel(): void;
  connect?(config: unknown): Promise<void>;
}
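
The interface exposes firstByteLatencyMs, but how each adapter measures it is not documented here. As a rough illustration only, the sketch below times the first item of any AsyncIterable from the consumer side. The names withFirstChunkLatency, onFirstChunk, fakeChunks, and collectWithLatency are hypothetical and not part of this package.

```typescript
// Wraps any async stream and reports how long the first item took to arrive.
// The timer starts when the consumer begins iterating, not when the wrapper is created.
async function* withFirstChunkLatency<T>(
  source: AsyncIterable<T>,
  onFirstChunk: (elapsedMs: number) => void,
): AsyncGenerator<T> {
  const startedAt = Date.now();
  let sawFirst = false;
  for await (const item of source) {
    if (!sawFirst) {
      sawFirst = true;
      onFirstChunk(Date.now() - startedAt);
    }
    yield item;
  }
}

// Stand-in stream so the sketch runs without a real provider.
async function* fakeChunks(): AsyncGenerator<string> {
  yield "a";
  yield "b";
  yield "c";
}

// Drains the wrapped stream and records every latency callback.
async function collectWithLatency(): Promise<{ items: string[]; latencies: number[] }> {
  const latencies: number[] = [];
  const items: string[] = [];
  for await (const item of withFirstChunkLatency(fakeChunks(), (ms) => latencies.push(ms))) {
    items.push(item);
  }
  return { items, latencies };
}
```

With a real provider, the source argument would be the iterable returned by synthesize(); the callback fires once per stream.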

TTSProviderInterface (Static Utilities)

class TTSProviderInterface {
  static formatAudioForTwilio(chunk: AudioChunk): AudioChunk;
  static createSilenceChunk(durationMs: number, sampleRate?: number): AudioChunk;
  static chunkTextForStreaming(text: string, maxChunkSize?: number): string[];
}

| Method | Description |
|--------|-------------|
| formatAudioForTwilio | Converts any audio chunk to mulaw 8kHz for Twilio Media Streams |
| createSilenceChunk | Creates a mulaw silence buffer of specified duration (default 8kHz) |
| chunkTextForStreaming | Splits long text at sentence boundaries for sentence-by-sentence TTS |
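
For readers unfamiliar with the mulaw format mentioned above, here is a minimal sketch of the standard G.711 mu-law companding step for 16-bit PCM samples. It is not the package's formatAudioForTwilio() implementation, it does not resample to 8kHz, and the function names linearToMulaw and encodeMulaw are hypothetical.

```typescript
const MULAW_BIAS = 0x84; // 132, added before finding the segment
const MULAW_CLIP = 32635; // largest magnitude that survives the bias

// Compresses one signed 16-bit PCM sample into one mu-law byte.
function linearToMulaw(sample: number): number {
  const sign = sample < 0 ? 0x80 : 0x00;
  let magnitude = Math.min(Math.abs(sample), MULAW_CLIP) + MULAW_BIAS;

  // Find the segment (exponent): position of the highest set bit, 0..7.
  let exponent = 7;
  for (let mask = 0x4000; (magnitude & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }

  // Four bits just below the leading bit form the mantissa.
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;

  // mu-law bytes are stored bit-inverted.
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Encodes a whole buffer of 16-bit samples, one output byte per sample.
function encodeMulaw(pcm: Int16Array): Uint8Array {
  const out = new Uint8Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = linearToMulaw(pcm[i]);
  }
  return out;
}
```

A zero sample encodes to 0xFF, which is why mulaw silence buffers are filled with that byte.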

DeepgramTTSProvider

class DeepgramTTSProvider implements TTSProvider {
  readonly name = 'deepgram';
  readonly supportsStreaming = true;
  constructor(options?: DeepgramTTSOptions);
  getLastFirstByteLatency(): number | null;
}

interface DeepgramTTSOptions {
  apiUrl?: string;   // default: 'api.deepgram.com'
  version?: string;  // default: 'v1'
}

interface DeepgramTTSConfig extends TTSConfig {
  model?: 'aura';
  voice?: string;        // e.g., 'asteria', 'luna', 'stella', 'arcas'
  encoding?: 'mulaw' | 'linear16' | 'pcm';
  sampleRate?: number;   // 8000, 16000, 24000, 48000
  container?: 'none' | 'wav';
}

AWSPollyProvider

class AWSPollyProvider extends EventEmitter implements TTSProvider {
  readonly name = 'aws-polly';
  readonly supportsStreaming = true;
  constructor(options?: AWSPollyOptions);
  connect(config: AWSPollyConfig): Promise<void>;
  onError(cb: (error: Error) => void): void;
  close(): Promise<void>;
  isConnected(): boolean;
}

interface AWSPollyOptions {
  region?: string;          // default: 'us-east-1'
  defaultVoiceId?: string;  // default: 'Joanna'
  defaultEngine?: Engine;   // default: NEURAL
}

interface AWSPollyConfig extends TTSConfig {
  region: string;
  voiceId?: string;          // Joanna, Matthew, Salli, etc.
  engine?: 'standard' | 'neural';
  languageCode?: string;
  sampleRate?: number;       // 8000, 16000, 22050
  textType?: 'text' | 'ssml';
}

GoogleCloudTTSProvider

class GoogleCloudTTSProvider implements TTSProvider {
  readonly name = 'google-cloud-tts';
  readonly supportsStreaming = true;
  constructor(options?: GoogleCloudTTSOptions);
  getLastFirstByteLatency(): number | null;
}

interface GoogleCloudTTSOptions {
  projectId?: string;
  keyFilename?: string;
}

interface GoogleCloudTTSConfig extends TTSConfig {
  projectId: string;
  voiceName?: string;              // e.g., 'en-US-Standard-A'
  languageCode?: string;           // e.g., 'en-US'
  ssmlGender?: 'MALE' | 'FEMALE' | 'NEUTRAL';
  audioEncoding?: 'MP3' | 'LINEAR16' | 'OGG_OPUS' | 'MULAW' | 'ALAW';
  sampleRateHertz?: number;
  speakingRate?: number;           // 0.25–4.0
  pitch?: number;                  // -20.0–20.0
  volumeGainDb?: number;           // -96.0–16.0
}

ElevenLabsProvider

class ElevenLabsProvider implements TTSProvider {
  readonly name = 'elevenlabs';
  readonly supportsStreaming = true;
  constructor(options?: ElevenLabsOptions);
  getLastFirstByteLatency(): number | null;
}

interface ElevenLabsConfig extends TTSConfig {
  modelId?: 'eleven_turbo_v2_5' | 'eleven_flash_v2_5';
  voiceId?: string;
  stability?: number;
  similarityBoost?: number;
  optimizeStreamingLatency?: number;
  outputFormat?: 'mp3_44100' | 'pcm_8000' | 'mulaw_8000';
}

Streaming HTTP/2 adapter for ElevenLabs ultra-realistic voices. Supports latency optimization and multiple output formats.

CartesiaProvider

class CartesiaProvider implements TTSProvider {
  readonly name = 'cartesia';
  readonly supportsStreaming = true;
  constructor(options?: CartesiaOptions);
  getLastFirstByteLatency(): number | null;
}

interface CartesiaConfig extends TTSConfig {
  modelId?: 'sonic' | 'sonic-2';
  voiceId?: string;
  speed?: 'slowest' | 'slow' | 'normal' | 'fast' | 'fastest';
  emotion?: 'anger' | 'positivity' | 'surprise' | 'sadness' | 'curiosity' | 'neutral';
  language?: string;
  outputFormat?: 'raw' | 'wav' | 'mp3';
  sampleRate?: number;
}

Ultra-low latency streaming adapter with Sonic model and emotion control. Sub-100ms P50 latency for real-time use.

Provider Factory

import { createTTSProvider } from '@reaatech/voice-agent-tts';

const tts = createTTSProvider({
  provider: 'deepgram',             // 'deepgram' | 'aws-polly' | 'google-cloud-tts' | 'elevenlabs' | 'cartesia'
  config: { provider: 'deepgram', apiKey: '...' },
});
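
Because the provider is chosen at runtime, the name often comes from configuration or an environment variable. The sketch below validates such a string against the five provider names listed above before it is passed to createTTSProvider(). The helpers isProviderName and resolveProviderName, and the "deepgram" fallback, are assumptions for illustration and are not exported by this package.

```typescript
// Provider names as listed in the factory example above.
const PROVIDER_NAMES = [
  "deepgram",
  "aws-polly",
  "google-cloud-tts",
  "elevenlabs",
  "cartesia",
] as const;

type ProviderName = (typeof PROVIDER_NAMES)[number];

// Type guard: narrows an arbitrary string to a known provider name.
function isProviderName(value: string): value is ProviderName {
  return (PROVIDER_NAMES as readonly string[]).indexOf(value) !== -1;
}

// Normalizes a raw config value; falls back when it is missing or blank,
// and throws on anything that is not a known provider.
function resolveProviderName(
  raw: string | undefined,
  fallback: ProviderName = "deepgram",
): ProviderName {
  if (raw === undefined || raw.trim() === "") {
    return fallback;
  }
  const normalized = raw.trim().toLowerCase();
  if (!isProviderName(normalized)) {
    throw new Error(
      `Unknown TTS provider "${raw}". Expected one of: ${PROVIDER_NAMES.join(", ")}`,
    );
  }
  return normalized;
}
```

Failing fast on an unknown name keeps a typo in configuration from surfacing later as a confusing adapter error.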

Usage Patterns

Barge-In (Cancel In-Progress TTS)

// Start TTS
const ttsStream = tts.synthesize(text, config);

// User interrupts — cancel immediately
tts.cancel();
// The synthesize() generator will exit cleanly
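
One way a cancel() call can make a streaming generator exit cleanly is a flag checked between chunks. The sketch below shows that pattern with a stand-in class; FakeTTS, playUntilBargeIn, and the AudioChunk shape are hypothetical, and the real adapters may cancel differently (for example by aborting the underlying request).

```typescript
// Assumed minimal chunk shape for this sketch only.
interface AudioChunk {
  buffer: Uint8Array;
}

// Stand-in provider: yields one placeholder chunk per word of input.
// The flag is one-shot here; a reusable provider would reset it per call.
class FakeTTS {
  private cancelled = false;

  async *synthesize(text: string): AsyncGenerator<AudioChunk> {
    for (const word of text.split(" ")) {
      if (this.cancelled) {
        return; // stop before producing more audio
      }
      await Promise.resolve(); // stand-in for waiting on the network
      yield { buffer: new Uint8Array(word.length) };
    }
  }

  cancel(): void {
    this.cancelled = true;
  }
}

// Plays chunks until a simulated interruption after `interruptAfter` chunks.
// Returns the chunks that were actually delivered.
async function playUntilBargeIn(
  tts: FakeTTS,
  text: string,
  interruptAfter: number,
): Promise<AudioChunk[]> {
  const delivered: AudioChunk[] = [];
  for await (const chunk of tts.synthesize(text)) {
    delivered.push(chunk);
    if (delivered.length === interruptAfter) {
      tts.cancel(); // user started speaking
    }
  }
  return delivered;
}
```

The for await loop ends on its own once the generator returns, so the caller needs no extra cleanup after cancel().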

Sentence-Level Streaming for Low Latency

import { TTSProviderInterface } from '@reaatech/voice-agent-tts';

const sentences = TTSProviderInterface.chunkTextForStreaming(longText, 200);

for (const sentence of sentences) {
  for await (const chunk of tts.synthesize(sentence, config)) {
    handler.sendAudio(chunk);
  }
}
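
The exact splitting rules of chunkTextForStreaming() are not documented here. As an illustration of the general idea, the sketch below splits at sentence-ending punctuation and greedily packs sentences into chunks no longer than a size limit. The name splitIntoSentenceChunks is hypothetical, and the naive regex does not handle abbreviations or decimals such as "3.14".

```typescript
// Splits text at '.', '!' or '?' and packs whole sentences into chunks
// of at most maxChunkSize characters. A single sentence longer than the
// limit is kept whole rather than cut mid-sentence.
function splitIntoSentenceChunks(text: string, maxChunkSize = 200): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*/g) ?? [];
  const chunks: string[] = [];
  let current = "";

  for (const raw of sentences) {
    const sentence = raw.trim();
    if (sentence === "") {
      continue;
    }
    const candidate = current === "" ? sentence : `${current} ${sentence}`;
    if (candidate.length <= maxChunkSize) {
      current = candidate; // still fits, keep accumulating
    } else {
      if (current !== "") {
        chunks.push(current);
      }
      current = sentence; // start a new chunk with this sentence
    }
  }

  if (current !== "") {
    chunks.push(current);
  }
  return chunks;
}
```

Smaller limits mean the first audio arrives sooner, at the cost of more synthesis requests per response.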

Silence Between Utterances

import { TTSProviderInterface } from '@reaatech/voice-agent-tts';

// 500ms silence gap
const silence = TTSProviderInterface.createSilenceChunk(500);
handler.sendAudio(silence);
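
The size of a mulaw silence buffer follows directly from the duration: mulaw uses one byte per sample, so 500ms at 8kHz is 500 × 8000 / 1000 = 4000 bytes, each set to 0xFF (the mu-law encoding of a zero sample). The sketch below works through that arithmetic; makeMulawSilence and the SilenceChunk shape are hypothetical and may differ from what createSilenceChunk() returns.

```typescript
// Assumed chunk shape for this sketch only.
interface SilenceChunk {
  buffer: Uint8Array;
  sampleRate: number;
  encoding: "mulaw";
}

// Builds a buffer of mu-law silence: one 0xFF byte per sample.
function makeMulawSilence(durationMs: number, sampleRate = 8000): SilenceChunk {
  const sampleCount = Math.round((durationMs * sampleRate) / 1000);
  return {
    buffer: new Uint8Array(sampleCount).fill(0xff),
    sampleRate,
    encoding: "mulaw",
  };
}
```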

License

MIT