# VoiceBridge

Unified ASR & TTS SDK for Node.js — one interface for every voice AI provider.
VoiceBridge lets you integrate multiple speech-to-text (ASR) and text-to-speech (TTS) providers through a single, consistent API. Switch providers with a config change — no code rewrite required.
## Table of Contents
- Features
- Supported Providers
- Installation
- Quick Start
- Examples
- Full Configuration Reference
- Error Handling
- API Reference
- Types
- Requirements
- License
## Features
| Feature | Description |
|---------|-------------|
| Unified API | Single interface for ASR and TTS across all providers |
| Provider Adapters | Deepgram, Google Cloud, ElevenLabs — with Azure, OpenAI, Polly, Cartesia planned |
| Automatic Fallback | Seamless failover to backup providers on errors |
| Retry with Backoff | Exponential backoff with jitter for transient failures |
| Rate Limiting | Per-provider sliding-window rate limiting with request queuing |
| TTS Caching | LRU in-memory cache for repeated synthesis requests |
| Voice Aliases | Map friendly names like female-warm to provider-specific voice IDs |
| Cost Estimation | Estimate and compare costs across providers before calling |
| SSML Builder | Fluent API for building W3C-compliant SSML markup |
| Typed Events | Full event system for monitoring provider activity |
| Zero Lock-in | Provider SDKs are optional peer dependencies — install only what you use |
| Tree-shakeable | ESM + CJS dual output, ships with TypeScript declarations |
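The retry schedule referenced above (using the `initialDelay: 500`, `backoffMultiplier: 2`, and `maxDelay: 10_000` defaults shown in the configuration reference below) can be sketched as a pure function. This is an illustration of the scheme, not VoiceBridge's internal code; real retries also add random jitter on top of these base delays.

```ts
// Exponential backoff: delay = initialDelay * backoffMultiplier^(attempt - 1),
// capped at maxDelay. Jitter omitted for clarity.
interface RetryOptions {
  initialDelay: number;      // first retry delay (ms)
  maxDelay: number;          // cap on retry delay (ms)
  backoffMultiplier: number; // exponential factor
}

function backoffDelay(attempt: number, opts: RetryOptions): number {
  // attempt is 1-based: attempt 1 waits initialDelay, attempt 2 waits twice that, ...
  const raw = opts.initialDelay * Math.pow(opts.backoffMultiplier, attempt - 1);
  return Math.min(raw, opts.maxDelay);
}

const opts: RetryOptions = { initialDelay: 500, maxDelay: 10_000, backoffMultiplier: 2 };

// Delays for attempts 1..6: 500, 1000, 2000, 4000, 8000, 10000 (capped)
console.log([1, 2, 3, 4, 5, 6].map((n) => backoffDelay(n, opts)));
```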
## Supported Providers
| Provider | ASR | TTS | Streaming | Status |
|----------|:---:|:---:|:---------:|--------|
| Deepgram | ✅ | — | Planned | Shipped |
| Google Cloud | ✅ | ✅ | Planned | Shipped |
| ElevenLabs | — | ✅ | Planned | Shipped |
| Azure | Planned | Planned | Planned | P1 |
| OpenAI (Whisper) | Planned | Planned | Planned | P1 |
| AWS Polly | — | Planned | Planned | P1 |
| Cartesia | — | Planned | Planned | P2 |
## Installation

```bash
npm install @voicebridge/voicebridge
```

Then install only the provider SDK(s) you need (they are optional peer dependencies):

```bash
# Deepgram — Speech-to-Text
npm install @deepgram/sdk

# Google Cloud — Speech-to-Text + Text-to-Speech
npm install @google-cloud/speech @google-cloud/text-to-speech

# ElevenLabs — Text-to-Speech
npm install elevenlabs
```

> **Tip:** You don't need every SDK — VoiceBridge only loads adapters for installed SDKs.
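The "only loads adapters for installed SDKs" behavior boils down to probing whether a package resolves at runtime. A minimal sketch (the `isSdkInstalled` helper and the missing package name are hypothetical, not VoiceBridge's actual loader):

```ts
import { createRequire } from 'node:module';

// Build a require() anchored at the current working directory so we can
// probe package resolution from both ESM and CJS contexts.
const req = createRequire(process.cwd() + '/');

function isSdkInstalled(pkg: string): boolean {
  try {
    req.resolve(pkg); // throws if the package cannot be resolved
    return true;
  } catch {
    return false;
  }
}

console.log(isSdkInstalled('path'));                      // built-in module: true
console.log(isSdkInstalled('@voicebridge/not-installed')); // missing package: false
```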
## Quick Start

```ts
import { VoiceBridge } from '@voicebridge/voicebridge';
import fs from 'node:fs';

// 1. Create an instance with your provider(s)
const vu = new VoiceBridge({
  providers: {
    deepgram: {
      provider: 'deepgram',
      credentials: { apiKey: process.env.DEEPGRAM_API_KEY },
    },
  },
  defaultASRProvider: 'deepgram',
});

// 2. Initialize
await vu.initialize();

// 3. Transcribe
const audio = fs.readFileSync('./recording.wav');
const result = await vu.asr.transcribe(audio, {
  provider: 'deepgram',
  language: 'en-US',
});

console.log(result.text);       // "Hello, world!"
console.log(result.confidence); // 0.98

// 4. Clean up when done
await vu.dispose();
```

## Examples
### Example 1: Transcribe Audio with Deepgram

```ts
import { VoiceBridge } from '@voicebridge/voicebridge';
import fs from 'node:fs';

const vu = new VoiceBridge({
  providers: {
    deepgram: {
      provider: 'deepgram',
      credentials: { apiKey: process.env.DEEPGRAM_API_KEY },
    },
  },
  defaultASRProvider: 'deepgram',
});
await vu.initialize();

const audio = fs.readFileSync('./interview.wav');
const result = await vu.asr.transcribe(audio, {
  provider: 'deepgram',
  language: 'en-US',
  model: 'nova-2',
  diarization: true, // identify different speakers
  punctuation: true,
});

// Full transcript
console.log(result.text);

// Per-word timestamps & confidence
for (const word of result.words) {
  console.log(`${word.word} [${word.start}s - ${word.end}s] (${(word.confidence * 100).toFixed(1)}%)`);
}

// Speaker segments (when diarization is enabled)
for (const seg of result.speakers) {
  console.log(`Speaker ${seg.speaker}: "${seg.text}" [${seg.start}s - ${seg.end}s]`);
}

console.log(`Provider: ${result.provider}, Duration: ${result.duration}s`);
```

### Example 2: Synthesize Speech with ElevenLabs
```ts
import { VoiceBridge } from '@voicebridge/voicebridge';
import fs from 'node:fs';

const vu = new VoiceBridge({
  providers: {
    elevenlabs: {
      provider: 'elevenlabs',
      credentials: { apiKey: process.env.ELEVENLABS_API_KEY },
    },
  },
  defaultTTSProvider: 'elevenlabs',
});
await vu.initialize();

const result = await vu.tts.synthesize('Hello from VoiceBridge! This is a test.', {
  provider: 'elevenlabs',
  voice: 'EXAVITQu4vr4xnSDxMaL', // Bella voice ID
  outputFormat: 'mp3',
  sampleRate: 44100,
});

fs.writeFileSync('./output.mp3', result.audio);
console.log(`Generated ${result.audio.length} bytes, format: ${result.format}`);
console.log(`Characters used: ${result.charactersUsed}`);
```

### Example 3: Automatic Fallback Between Providers
If the primary provider fails (network error, rate limit, outage), VoiceBridge automatically tries the next one:
```ts
import { VoiceBridge } from '@voicebridge/voicebridge';
import fs from 'node:fs';

const vu = new VoiceBridge({
  providers: {
    deepgram: {
      provider: 'deepgram',
      credentials: { apiKey: process.env.DEEPGRAM_API_KEY },
    },
    google: {
      provider: 'google',
      credentials: { credentials: JSON.parse(process.env.GOOGLE_CREDS!) },
    },
  },
  defaultASRProvider: 'deepgram',
  // With both providers configured, Google serves as the backup in the fallback chain
});
await vu.initialize();

// Listen for fallback events
vu.on('asr:fallback', ({ from, to, reason }) => {
  console.warn(`⚠️ Fell back from ${from} → ${to}: ${reason}`);
});

const audio = fs.readFileSync('./audio.wav');

// If Deepgram is down, this automatically falls back to Google
const result = await vu.asr.transcribe(audio, {
  provider: 'deepgram',
  language: 'en-US',
});
console.log(`Transcribed by: ${result.provider}`); // "deepgram" or "google"
```

### Example 4: Compare Costs Before Calling
```ts
import { VoiceBridge } from '@voicebridge/voicebridge';

const vu = new VoiceBridge({
  providers: {
    deepgram: { provider: 'deepgram', credentials: { apiKey: '...' } },
    google: { provider: 'google', credentials: { apiKey: '...' } },
  },
});

// Compare ASR cost for a 5-minute audio file
const asrCosts = vu.compareASRCosts(300); // 300 seconds
console.log('ASR cost comparison (5 min):');
for (const c of asrCosts) {
  console.log(`  ${c.provider}: $${c.estimatedCost.toFixed(6)} (${c.unitType})`);
}
// Output (sorted cheapest first):
//   deepgram: $0.021500 (per minute)
//   google: $0.030000 (per minute)

// Compare TTS cost for 10,000 characters
const ttsCosts = vu.compareTTSCosts(10_000);
console.log('\nTTS cost comparison (10K chars):');
for (const c of ttsCosts) {
  console.log(`  ${c.provider}: $${c.estimatedCost.toFixed(6)} (${c.unitType})`);
}

// Estimate a single provider
const estimate = vu.estimateASRCost('deepgram', 60, 'nova-2');
console.log(`\nDeepgram nova-2 for 1 min: $${estimate.estimatedCost}`);
```

### Example 5: Build SSML for Expressive TTS
Use the fluent `SSMLBuilder` to create W3C-compliant SSML without writing XML by hand:
```ts
import { SSMLBuilder } from '@voicebridge/voicebridge';

const ssml = new SSMLBuilder()
  .text('Welcome to VoiceBridge!')
  .pause(800)                                      // 800ms pause
  .emphasis('This is really important.', 'strong') // emphasized text
  .pause(400)
  .prosody('And now, speaking more slowly...', {   // control rate/pitch
    rate: 'slow',
    pitch: 'low',
  })
  .pause(300)
  .sayAs('12345', 'digits')           // read as individual digits
  .pause(200)
  .sentence('Here is a sentence.')    // <s> wrapper
  .paragraph('And a full paragraph.') // <p> wrapper
  .build();

console.log(ssml);
// <speak>Welcome to VoiceBridge!<break time="800ms"/>
// <emphasis level="strong">This is really important.</emphasis>
// <break time="400ms"/>
// <prosody rate="slow" pitch="low">And now, speaking more slowly...</prosody>
// ...
// </speak>

// Use it with any TTS provider that supports SSML
// (`vu` is a VoiceBridge instance configured as in the earlier examples)
const result = await vu.tts.synthesize(ssml, {
  provider: 'google',
  ssml: true,
});
```

You can also check and strip SSML:

```ts
import { isSSML, stripSSML } from '@voicebridge/voicebridge';

isSSML('<speak>Hello</speak>'); // true
isSSML('Hello');                // false

stripSSML('<speak>Hello <break time="500ms"/> World</speak>');
// "Hello World"
```

### Example 6: Voice Aliases
Instead of memorizing provider-specific voice IDs, use human-friendly aliases:
```ts
import { VoiceBridge } from '@voicebridge/voicebridge';

const vu = new VoiceBridge({
  providers: {
    elevenlabs: { provider: 'elevenlabs', credentials: { apiKey: '...' } },
    google: { provider: 'google', credentials: { apiKey: '...' } },
  },
});
await vu.initialize();

// Use a built-in alias — VoiceBridge resolves it to the right voice ID per provider
const result = await vu.tts.synthesize('Hello!', {
  provider: 'elevenlabs',
  voiceAlias: 'female-warm', // resolves to "EXAVITQu4vr4xnSDxMaL" (Bella) for ElevenLabs
});
```

Built-in aliases:
| Alias | Gender | ElevenLabs | Google | Azure | OpenAI |
|-------|--------|------------|--------|-------|--------|
| female-warm | Female | Bella | en-US-Journey-F | en-US-JennyNeural | nova |
| male-professional | Male | Adam | en-US-Journey-D | en-US-GuyNeural | onyx |
| female-energetic | Female | Lily | en-US-Studio-O | en-US-AriaNeural | shimmer |
| male-calm | Male | Daniel | en-US-Neural2-D | en-US-DavisNeural | echo |
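Conceptually, each alias is a per-provider lookup table keyed by alias name. Resolution can be pictured as follows, using the `VoiceAlias` shape from the Types section; the `resolveVoice` helper is an illustration, not the library's internals:

```ts
interface VoiceAlias {
  name: string;
  mappings: { [provider: string]: string };
}

// Look up the alias by name, then pick the voice ID registered for that provider.
function resolveVoice(
  aliases: VoiceAlias[],
  aliasName: string,
  provider: string,
): string | undefined {
  return aliases.find((a) => a.name === aliasName)?.mappings[provider];
}

const builtIns: VoiceAlias[] = [
  {
    name: 'female-warm',
    mappings: { elevenlabs: 'EXAVITQu4vr4xnSDxMaL', google: 'en-US-Journey-F' },
  },
];

console.log(resolveVoice(builtIns, 'female-warm', 'elevenlabs')); // "EXAVITQu4vr4xnSDxMaL"
console.log(resolveVoice(builtIns, 'female-warm', 'azure'));      // undefined (no mapping registered)
```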
Register your own aliases:
```ts
vu.registerVoiceAlias({
  name: 'brand-narrator',
  description: 'Our brand narrator voice',
  gender: 'female',
  mappings: {
    elevenlabs: 'your-custom-voice-id',
    google: 'en-US-Studio-O',
    azure: 'en-US-JennyNeural',
  },
});

// Now use it anywhere
const result = await vu.tts.synthesize('Our latest product...', {
  provider: 'elevenlabs',
  voiceAlias: 'brand-narrator',
});
```

### Example 7: Streaming ASR
Open a live transcription session and send audio chunks as they arrive:
```ts
const stream = await vu.asr.stream({
  provider: 'deepgram',
  language: 'en-US',
  interimResults: true,
  endpointing: 300,
});

// Listen for real-time partial results
stream.on('partial', (result) => {
  process.stdout.write(`\r Hearing: ${result.text}`);
});

// Final, punctuated transcripts
stream.on('final', (result) => {
  console.log(`\n✅ ${result.text} (confidence: ${result.confidence})`);
});

stream.on('error', (err) => console.error('Stream error:', err));

// Feed audio from a microphone, file, or WebSocket
stream.send(audioChunk1);
stream.send(audioChunk2);

// When done
await stream.close();
```

### Example 8: Event Monitoring
VoiceBridge emits typed events for every stage of the pipeline:
```ts
// Provider initialization
vu.on('provider:init', ({ provider, type }) => {
  console.log(`✓ Initialized ${type} provider: ${provider}`);
});

// ASR events
vu.on('asr:transcribe:start', ({ provider }) => {
  console.log(`🎙 Transcribing with ${provider}...`);
});
vu.on('asr:transcribe:end', ({ provider, duration }) => {
  console.log(`🎙 Transcription complete (${provider}, ${duration}s)`);
});

// TTS events
vu.on('tts:synthesize:start', ({ provider, characters }) => {
  console.log(`🔊 Synthesizing ${characters} chars with ${provider}...`);
});

// Fallback events
vu.on('asr:fallback', ({ from, to, reason }) => {
  console.warn(`⚠️ ASR fallback: ${from} → ${to} (${reason})`);
});
vu.on('tts:fallback', ({ from, to, reason }) => {
  console.warn(`⚠️ TTS fallback: ${from} → ${to} (${reason})`);
});

// Retry events
vu.on('retry', ({ provider, attempt, delay }) => {
  console.log(`🔄 Retry #${attempt} for ${provider} in ${delay}ms`);
});

// Cost tracking
vu.on('cost:estimated', (estimate) => {
  console.log(`💰 Estimated: $${estimate.estimatedCost} (${estimate.provider})`);
});

// Cache events
vu.on('cache:hit', ({ key }) => console.log(`📦 Cache hit: ${key}`));
vu.on('cache:miss', ({ key }) => console.log(`📦 Cache miss: ${key}`));
```

### Example 9: Register a Custom Provider Plugin
Build your own adapter and plug it in at runtime:
```ts
import type { ASRProviderAdapter, ASRConfig, ASRResult } from '@voicebridge/voicebridge';

const myAdapter: ASRProviderAdapter = {
  name: 'my-whisper-server',
  supportsBatch: true,
  supportsStreaming: false,
  supportsDiarization: false,
  supportsLanguageDetection: true,

  async initialize() { /* connect to your server */ },

  async transcribe(audio: Buffer, config: ASRConfig): Promise<ASRResult> {
    // Call your custom ASR endpoint
    const response = await fetch('https://my-whisper.internal/v1/transcribe', {
      method: 'POST',
      body: audio,
      headers: { 'Content-Type': 'audio/wav' },
    });
    const data = await response.json();
    return {
      text: data.text,
      confidence: data.confidence,
      words: [],
      speakers: [],
      language: config.language ?? 'en',
      duration: data.duration,
      provider: 'my-whisper-server',
      metadata: {},
    };
  },

  isAvailable: () => true,
  getSupportedLanguages: () => ['en-US', 'es-ES', 'fr-FR'],
  getSupportedModels: () => ['whisper-large-v3'],
  async dispose() {},
};

// Register the plugin
vu.registerASRPlugin('my-whisper-server', myAdapter);

// Now use it like any other provider
const result = await vu.asr.transcribe(audio, {
  provider: 'my-whisper-server',
  language: 'en-US',
});
```

### Example 10: Per-Provider Rate Limiting
Prevent hitting provider rate limits with built-in sliding-window limiters:
```ts
const vu = new VoiceBridge({
  providers: {
    deepgram: { provider: 'deepgram', credentials: { apiKey: '...' } },
    elevenlabs: { provider: 'elevenlabs', credentials: { apiKey: '...' } },
  },
  // Configure rate limits per provider
  rateLimits: {
    deepgram: {
      maxRequests: 100, // max 100 requests
      windowMs: 60_000, // per 60-second window
      maxQueueSize: 50, // queue up to 50 overflow requests
    },
    elevenlabs: {
      maxRequests: 20,
      windowMs: 60_000,
    },
  },
});

// Requests exceeding the limit are queued automatically.
// If the queue is full, an error is thrown.
```

### Example 11: Audio Format Utilities
Detect audio formats and convert between them:
```ts
import { detectAudioFormat, convertAudio } from '@voicebridge/voicebridge';
import fs from 'node:fs';

// Detect format from buffer headers
const buffer = fs.readFileSync('./unknown-file');
const format = detectAudioFormat(buffer);
console.log(format); // 'wav', 'mp3', 'flac', 'ogg', 'webm', or null

// Convert audio between PCM formats
const wavBuffer = convertAudio(pcmBuffer, {
  fromFormat: 'pcm',
  toFormat: 'wav',
  sampleRate: 16000,
  channels: 1,
  bitDepth: 16,
});
```

## Full Configuration Reference
```ts
import { VoiceBridge } from '@voicebridge/voicebridge';

const vu = new VoiceBridge({
  // ── Provider credentials ──────────────────────────────────
  providers: {
    deepgram: {
      provider: 'deepgram',
      credentials: {
        apiKey: 'dg-...',
      },
    },
    google: {
      provider: 'google',
      credentials: {
        credentials: { /* service account JSON */ },
        projectId: 'my-gcp-project',
      },
    },
    elevenlabs: {
      provider: 'elevenlabs',
      credentials: {
        apiKey: 'el-...',
      },
    },
  },

  // ── Default providers ─────────────────────────────────────
  defaultASRProvider: 'deepgram',
  defaultTTSProvider: 'elevenlabs',

  // ── Retry configuration ───────────────────────────────────
  retry: {
    maxRetries: 3,        // attempts after first failure
    initialDelay: 500,    // first retry delay (ms)
    maxDelay: 10_000,     // cap on retry delay
    backoffMultiplier: 2, // exponential factor
    retryableStatusCodes: [429, 500, 502, 503, 504],
    timeout: 30_000,      // per-request timeout (ms)
  },

  // ── Fallback chain ────────────────────────────────────────
  fallback: {
    providers: [
      { provider: 'deepgram', credentials: { apiKey: '...' } },
      { provider: 'google', credentials: { apiKey: '...' } },
    ],
  },

  // ── Cache configuration (TTS only) ────────────────────────
  cache: {
    enabled: true,
    backend: 'memory',          // 'memory' | 'redis'
    maxSize: 100 * 1024 * 1024, // 100 MB
    ttl: 3_600_000,             // 1 hour
  },

  // ── Per-provider rate limits ──────────────────────────────
  rateLimits: {
    deepgram: { maxRequests: 100, windowMs: 60_000 },
    elevenlabs: { maxRequests: 20, windowMs: 60_000, maxQueueSize: 50 },
  },

  // ── Logging ───────────────────────────────────────────────
  logLevel: 'info', // 'debug' | 'info' | 'warn' | 'error' | 'silent'
  // Optional: provide your own log transport (e.g., winston, pino)
  // logger: { debug: ..., info: ..., warn: ..., error: ... },
});
```

## Error Handling
VoiceBridge provides a structured error hierarchy so you can handle each failure type precisely:
```ts
import {
  VoiceBridgeError,         // Base class — all errors extend this
  ProviderError,            // Generic provider failure
  AuthenticationError,      // Invalid or expired API key
  RateLimitError,           // Provider rate limit exceeded
  QuotaExceededError,       // Usage quota exhausted
  ProviderUnavailableError, // Provider is down
  ValidationError,          // Invalid input / config
  TimeoutError,             // Request timed out
  SDKNotInstalledError,     // Missing peer dependency
} from '@voicebridge/voicebridge';

try {
  const result = await vu.asr.transcribe(audio, { provider: 'deepgram' });
} catch (error) {
  if (error instanceof AuthenticationError) {
    // error.provider → "deepgram"
    // error.suggestion → "Verify your API key for deepgram is correct and not expired."
    console.error(`🔑 Bad API key for ${error.provider}`);
  } else if (error instanceof RateLimitError) {
    // error.retryAfterMs → 30000 (if provider sends Retry-After header)
    console.error(`⏱ Rate limited. Retry after ${error.retryAfterMs}ms`);
  } else if (error instanceof QuotaExceededError) {
    console.error(`📊 Quota exceeded for ${error.provider}`);
  } else if (error instanceof ProviderUnavailableError) {
    // error.retryable → true
    console.error(`🔌 ${error.provider} is down`);
  } else if (error instanceof TimeoutError) {
    // error.timeoutMs → 30000
    console.error(`⏰ Timed out after ${error.timeoutMs}ms`);
  } else if (error instanceof ValidationError) {
    // error.field → "language", error.expected → "BCP-47 code"
    console.error(`❌ Invalid input: ${error.message}`);
  } else if (error instanceof SDKNotInstalledError) {
    // error.packageName → "@deepgram/sdk"
    // error.suggestion → "Run: npm install @deepgram/sdk"
    console.error(error.suggestion);
  }
}
```

Every error includes:
| Property | Type | Description |
|----------|------|-------------|
| code | string | Machine-readable code (PROVIDER_ERROR, TIMEOUT_ERROR, etc.) |
| provider | string? | Which provider caused the error |
| operation | string? | The operation that failed |
| retryable | boolean | Whether automatic retry is safe |
| suggestion | string? | Human-readable fix suggestion |
| cause | Error? | Original underlying error |
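Since every error carries `retryable`, generic code can decide whether a retry is safe without enumerating error classes. A small sketch against the documented property shape (the `shouldRetry` helper is hypothetical, not part of the SDK):

```ts
// Minimal view of the documented error properties.
interface BridgeErrorShape {
  code: string;
  retryable: boolean;
  provider?: string;
  suggestion?: string;
}

// Treat anything without an explicit retryable === true as non-retryable.
function shouldRetry(err: unknown): boolean {
  const e = err as Partial<BridgeErrorShape> | null;
  return typeof e === 'object' && e !== null && e.retryable === true;
}

console.log(shouldRetry({ code: 'PROVIDER_UNAVAILABLE', retryable: true })); // true
console.log(shouldRetry({ code: 'VALIDATION_ERROR', retryable: false }));    // false
console.log(shouldRetry(new Error('plain error')));                          // false
```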
## API Reference

### VoiceBridge (main class)

```ts
class VoiceBridge {
  asr: ASR; // Access ASR methods
  tts: TTS; // Access TTS methods

  constructor(config: VoiceBridgeConfig);
  initialize(): Promise<void>;
  dispose(): Promise<void>;

  // Provider plugins
  registerASRPlugin(name: string, adapter: ASRProviderAdapter): void;
  registerTTSPlugin(name: string, adapter: TTSProviderAdapter): void;
  listASRProviders(): string[];
  listTTSProviders(): string[];

  // Voice aliases
  registerVoiceAlias(alias: VoiceAlias): void;

  // Cost estimation
  estimateASRCost(provider: string, durationSeconds: number, model?: string): CostEstimate;
  estimateTTSCost(provider: string, characterCount: number, model?: string): CostEstimate;
  compareASRCosts(durationSeconds: number, model?: string): CostEstimate[];
  compareTTSCosts(characterCount: number, model?: string): CostEstimate[];
  pricingLastUpdated: string; // e.g. "2026-03-01"

  // Events
  on<K extends keyof VoiceBridgeEvents>(event: K, listener: (payload: VoiceBridgeEvents[K]) => void): void;
  off<K extends keyof VoiceBridgeEvents>(event: K, listener: (payload: VoiceBridgeEvents[K]) => void): void;
}
```

### ASR
```ts
class ASR {
  transcribe(audio: Buffer, config: ASRConfig): Promise<ASRResult>;
  stream(config: ASRStreamConfig): Promise<ASRStream>;
}
```

### TTS
```ts
class TTS {
  synthesize(text: string, config: TTSConfig): Promise<TTSResult>;
  synthesizeStream(text: string, config: TTSStreamConfig): Promise<TTSStream>;
  listVoices(provider?: string, language?: string): Promise<TTSVoice[]>;
}
```

### SSMLBuilder
```ts
class SSMLBuilder {
  text(content: string): this;
  pause(timeMs: number): this;
  emphasis(text: string, level?: 'strong' | 'moderate' | 'reduced'): this;
  prosody(text: string, options: { rate?: string; pitch?: string; volume?: string }): this;
  sayAs(text: string, interpretAs: string, format?: string): this;
  paragraph(text: string): this;
  sentence(text: string): this;
  build(): string;
}
```

### Utility Functions
```ts
detectAudioFormat(buffer: Buffer): AudioFormat | null;
convertAudio(buffer: Buffer, options: AudioConvertOptions): Buffer;
stripSSML(ssml: string): string;
isSSML(text: string): boolean;
```

## Types
```ts
interface ASRConfig {
  provider: string;
  language?: string;
  model?: string;
  modelTier?: 'fast' | 'accurate' | 'balanced';
  diarization?: boolean;
  punctuation?: boolean;
  profanityFilter?: boolean;
  keywordBoosting?: string[];
  encoding?: AudioFormat;
  sampleRate?: SampleRate;
  channels?: number;
  options?: Record<string, unknown>;
}

interface ASRResult {
  text: string;
  confidence: number;
  words: ASRWord[];
  speakers: ASRSpeakerSegment[];
  language: string;
  duration: number;
  provider: string;
  model?: string;
  metadata: Record<string, unknown>;
}

interface TTSConfig {
  provider: string;
  voice?: string;
  voiceAlias?: string;
  model?: string;
  outputFormat?: AudioFormat;
  sampleRate?: SampleRate;
  speed?: number;
  pitch?: number;
  ssml?: boolean;
  language?: string;
  options?: Record<string, unknown>;
}

interface TTSResult {
  audio: Buffer;
  format: AudioFormat;
  sampleRate: SampleRate;
  duration?: number;
  provider: string;
  charactersUsed: number;
  metadata: Record<string, unknown>;
}

interface CostEstimate {
  provider: string;
  model?: string;
  unitType: string;
  unitPrice: number;
  estimatedCost: number;
  currency: string;
}

interface VoiceAlias {
  name: string;
  description?: string;
  gender?: 'male' | 'female' | 'neutral';
  mappings: { [provider: string]: string };
}

type AudioFormat = 'pcm' | 'wav' | 'mp3' | 'ogg' | 'mulaw' | 'alaw' | 'flac' | 'webm';
type SampleRate = 8000 | 16000 | 22050 | 24000 | 44100 | 48000;
```

## Requirements
- Node.js >= 18
- TypeScript >= 5.0 (recommended; the package ships with `.d.ts` declarations)
## License
MIT — see LICENSE for details.
