# VoiceBridge

Unified ASR & TTS SDK for Node.js — one interface for every voice AI provider.
VoiceBridge lets you integrate multiple speech-to-text (ASR) and text-to-speech (TTS) providers through a single, consistent API. Switch providers with a config change — no code rewrite required.
## Table of Contents
- Features
- Supported Providers
- Installation
- Quick Start
- Examples
- Full Configuration Reference
- Error Handling
- API Reference
- Types
- Requirements
- License
## Features
| Feature | Description |
|---------|-------------|
| Unified API | Single interface for ASR and TTS across all providers |
| Provider Adapters | Deepgram, Google Cloud, ElevenLabs — with Azure, OpenAI, Polly, Cartesia planned |
| Automatic Fallback | Seamless failover to backup providers on errors |
| Retry with Backoff | Exponential backoff with jitter for transient failures |
| Rate Limiting | Per-provider sliding-window rate limiting with request queuing |
| TTS Caching | LRU in-memory cache for repeated synthesis requests |
| Voice Aliases | Map friendly names like female-warm to provider-specific voice IDs |
| Cost Estimation | Estimate and compare costs across providers before calling |
| SSML Builder | Fluent API for building W3C-compliant SSML markup |
| Typed Events | Full event system for monitoring provider activity |
| Zero Lock-in | Provider SDKs are optional peer dependencies — install only what you use |
| Tree-shakeable | ESM + CJS dual output, ships with TypeScript declarations |
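The retry schedule referenced above (using the `initialDelay: 500`, `backoffMultiplier: 2`, and `maxDelay: 10_000` defaults shown in the configuration reference below) can be sketched as a pure function. This is an illustration of the scheme, not VoiceBridge's internal code; real retries also add random jitter on top of these base delays.

```ts
// Exponential backoff: delay = initialDelay * backoffMultiplier^(attempt - 1),
// capped at maxDelay. Jitter omitted for clarity.
interface RetryOptions {
  initialDelay: number;      // first retry delay (ms)
  maxDelay: number;          // cap on retry delay (ms)
  backoffMultiplier: number; // exponential factor
}

function backoffDelay(attempt: number, opts: RetryOptions): number {
  // attempt is 1-based: attempt 1 waits initialDelay, attempt 2 waits twice that, ...
  const raw = opts.initialDelay * Math.pow(opts.backoffMultiplier, attempt - 1);
  return Math.min(raw, opts.maxDelay);
}

const opts: RetryOptions = { initialDelay: 500, maxDelay: 10_000, backoffMultiplier: 2 };

// Delays for attempts 1..6: 500, 1000, 2000, 4000, 8000, 10000 (capped)
console.log([1, 2, 3, 4, 5, 6].map((n) => backoffDelay(n, opts)));
```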
## Supported Providers
| Provider | ASR | TTS | Streaming | Status |
|----------|:---:|:---:|:---------:|--------|
| Deepgram | ✅ | — | Planned | Shipped |
| Google Cloud | ✅ | ✅ | Planned | Shipped |
| ElevenLabs | — | ✅ | Planned | Shipped |
| Azure | Planned | Planned | Planned | P1 |
| OpenAI (Whisper) | Planned | Planned | Planned | P1 |
| AWS Polly | — | Planned | Planned | P1 |
| Cartesia | — | Planned | Planned | P2 |
## Installation

```bash
npm install @voicebridge/voicebridge
```

Then install only the provider SDK(s) you need (they are optional peer dependencies):

```bash
# Deepgram — Speech-to-Text
npm install @deepgram/sdk

# Google Cloud — Speech-to-Text + Text-to-Speech
npm install @google-cloud/speech @google-cloud/text-to-speech

# ElevenLabs — Text-to-Speech
npm install elevenlabs
```

> **Tip:** You don't need every SDK — VoiceBridge only loads adapters for installed SDKs.
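The "only loads adapters for installed SDKs" behavior boils down to probing whether a package resolves at runtime. A minimal sketch (the `isSdkInstalled` helper and the missing package name are hypothetical, not VoiceBridge's actual loader):

```ts
import { createRequire } from 'node:module';

// Build a require() anchored at the current working directory so we can
// probe package resolution from both ESM and CJS contexts.
const req = createRequire(process.cwd() + '/');

function isSdkInstalled(pkg: string): boolean {
  try {
    req.resolve(pkg); // throws if the package cannot be resolved
    return true;
  } catch {
    return false;
  }
}

console.log(isSdkInstalled('path'));                      // built-in module: true
console.log(isSdkInstalled('@voicebridge/not-installed')); // missing package: false
```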
## Quick Start

```ts
import { VoiceBridge } from '@voicebridge/voicebridge';
import fs from 'node:fs';

// 1. Create an instance with your provider(s)
const vu = new VoiceBridge({
  providers: {
    deepgram: {
      provider: 'deepgram',
      credentials: { apiKey: process.env.DEEPGRAM_API_KEY },
    },
  },
  defaultASRProvider: 'deepgram',
});

// 2. Initialize
await vu.initialize();

// 3. Transcribe
const audio = fs.readFileSync('./recording.wav');
const result = await vu.asr.transcribe(audio, {
  provider: 'deepgram',
  language: 'en-US',
});

console.log(result.text);       // "Hello, world!"
console.log(result.confidence); // 0.98

// 4. Clean up when done
await vu.dispose();
```

## Examples
### Example 1: Transcribe Audio with Deepgram

```ts
import { VoiceBridge } from '@voicebridge/voicebridge';
import fs from 'node:fs';

const vu = new VoiceBridge({
  providers: {
    deepgram: {
      provider: 'deepgram',
      credentials: { apiKey: process.env.DEEPGRAM_API_KEY },
    },
  },
  defaultASRProvider: 'deepgram',
});
await vu.initialize();

const audio = fs.readFileSync('./interview.wav');
const result = await vu.asr.transcribe(audio, {
  provider: 'deepgram',
  language: 'en-US',
  model: 'nova-2',
  diarization: true, // identify different speakers
  punctuation: true,
});

// Full transcript
console.log(result.text);

// Per-word timestamps & confidence
for (const word of result.words) {
  console.log(`${word.word} [${word.start}s - ${word.end}s] (${(word.confidence * 100).toFixed(1)}%)`);
}

// Speaker segments (when diarization is enabled)
for (const seg of result.speakers) {
  console.log(`Speaker ${seg.speaker}: "${seg.text}" [${seg.start}s - ${seg.end}s]`);
}

console.log(`Provider: ${result.provider}, Duration: ${result.duration}s`);
```

### Example 2: Synthesize Speech with ElevenLabs
```ts
import { VoiceBridge } from '@voicebridge/voicebridge';
import fs from 'node:fs';

const vu = new VoiceBridge({
  providers: {
    elevenlabs: {
      provider: 'elevenlabs',
      credentials: { apiKey: process.env.ELEVENLABS_API_KEY },
    },
  },
  defaultTTSProvider: 'elevenlabs',
});
await vu.initialize();

const result = await vu.tts.synthesize('Hello from VoiceBridge! This is a test.', {
  provider: 'elevenlabs',
  voice: 'EXAVITQu4vr4xnSDxMaL', // Bella voice ID
  outputFormat: 'mp3',
  sampleRate: 44100,
});

fs.writeFileSync('./output.mp3', result.audio);
console.log(`Generated ${result.audio.length} bytes, format: ${result.format}`);
console.log(`Characters used: ${result.charactersUsed}`);
```

### Example 3: Automatic Fallback Between Providers
If the primary provider fails (network error, rate limit, outage), VoiceBridge automatically tries the next one:
```ts
import { VoiceBridge } from '@voicebridge/voicebridge';
import fs from 'node:fs';

const vu = new VoiceBridge({
  providers: {
    deepgram: {
      provider: 'deepgram',
      credentials: { apiKey: process.env.DEEPGRAM_API_KEY },
    },
    google: {
      provider: 'google',
      credentials: { credentials: JSON.parse(process.env.GOOGLE_CREDS!) },
    },
  },
  defaultASRProvider: 'deepgram',
  // With both providers configured, Google serves as the backup in the fallback chain
});
await vu.initialize();

// Listen for fallback events
vu.on('asr:fallback', ({ from, to, reason }) => {
  console.warn(`⚠️ Fell back from ${from} → ${to}: ${reason}`);
});

const audio = fs.readFileSync('./audio.wav');

// If Deepgram is down, this automatically falls back to Google
const result = await vu.asr.transcribe(audio, {
  provider: 'deepgram',
  language: 'en-US',
});
console.log(`Transcribed by: ${result.provider}`); // "deepgram" or "google"
```

### Example 4: Compare Costs Before Calling
```ts
import { VoiceBridge } from '@voicebridge/voicebridge';

const vu = new VoiceBridge({
  providers: {
    deepgram: { provider: 'deepgram', credentials: { apiKey: '...' } },
    google: { provider: 'google', credentials: { apiKey: '...' } },
  },
});

// Compare ASR cost for a 5-minute audio file
const asrCosts = vu.compareASRCosts(300); // 300 seconds
console.log('ASR cost comparison (5 min):');
for (const c of asrCosts) {
  console.log(`  ${c.provider}: $${c.estimatedCost.toFixed(6)} (${c.unitType})`);
}
// Output (sorted cheapest first):
//   deepgram: $0.021500 (per minute)
//   google: $0.030000 (per minute)

// Compare TTS cost for 10,000 characters
const ttsCosts = vu.compareTTSCosts(10_000);
console.log('\nTTS cost comparison (10K chars):');
for (const c of ttsCosts) {
  console.log(`  ${c.provider}: $${c.estimatedCost.toFixed(6)} (${c.unitType})`);
}

// Estimate a single provider
const estimate = vu.estimateASRCost('deepgram', 60, 'nova-2');
console.log(`\nDeepgram nova-2 for 1 min: $${estimate.estimatedCost}`);
```

### Example 5: Build SSML for Expressive TTS
Use the fluent `SSMLBuilder` to create W3C-compliant SSML without writing XML by hand:
```ts
import { SSMLBuilder } from '@voicebridge/voicebridge';

const ssml = new SSMLBuilder()
  .text('Welcome to VoiceBridge!')
  .pause(800)                                      // 800ms pause
  .emphasis('This is really important.', 'strong') // emphasized text
  .pause(400)
  .prosody('And now, speaking more slowly...', {   // control rate/pitch
    rate: 'slow',
    pitch: 'low',
  })
  .pause(300)
  .sayAs('12345', 'digits')           // read as individual digits
  .pause(200)
  .sentence('Here is a sentence.')    // <s> wrapper
  .paragraph('And a full paragraph.') // <p> wrapper
  .build();

console.log(ssml);
// <speak>Welcome to VoiceBridge!<break time="800ms"/>
// <emphasis level="strong">This is really important.</emphasis>
// <break time="400ms"/>
// <prosody rate="slow" pitch="low">And now, speaking more slowly...</prosody>
// ...
// </speak>

// Use it with any TTS provider that supports SSML
// (`vu` is a VoiceBridge instance configured as in the earlier examples)
const result = await vu.tts.synthesize(ssml, {
  provider: 'google',
  ssml: true,
});
```

You can also check and strip SSML:

```ts
import { isSSML, stripSSML } from '@voicebridge/voicebridge';

isSSML('<speak>Hello</speak>'); // true
isSSML('Hello');                // false

stripSSML('<speak>Hello <break time="500ms"/> World</speak>');
// "Hello World"
```

### Example 6: Voice Aliases
Instead of memorizing provider-specific voice IDs, use human-friendly aliases:
```ts
import { VoiceBridge } from '@voicebridge/voicebridge';

const vu = new VoiceBridge({
  providers: {
    elevenlabs: { provider: 'elevenlabs', credentials: { apiKey: '...' } },
    google: { provider: 'google', credentials: { apiKey: '...' } },
  },
});
await vu.initialize();

// Use a built-in alias — VoiceBridge resolves it to the right voice ID per provider
const result = await vu.tts.synthesize('Hello!', {
  provider: 'elevenlabs',
  voiceAlias: 'female-warm', // resolves to "EXAVITQu4vr4xnSDxMaL" (Bella) for ElevenLabs
});
```

Built-in aliases:
| Alias | Gender | ElevenLabs | Google | Azure | OpenAI |
|-------|--------|------------|--------|-------|--------|
| female-warm | Female | Bella | en-US-Journey-F | en-US-JennyNeural | nova |
| male-professional | Male | Adam | en-US-Journey-D | en-US-GuyNeural | onyx |
| female-energetic | Female | Lily | en-US-Studio-O | en-US-AriaNeural | shimmer |
| male-calm | Male | Daniel | en-US-Neural2-D | en-US-DavisNeural | echo |
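Conceptually, each alias is a per-provider lookup table keyed by alias name. Resolution can be pictured as follows, using the `VoiceAlias` shape from the Types section; the `resolveVoice` helper is an illustration, not the library's internals:

```ts
interface VoiceAlias {
  name: string;
  mappings: { [provider: string]: string };
}

// Look up the alias by name, then pick the voice ID registered for that provider.
function resolveVoice(
  aliases: VoiceAlias[],
  aliasName: string,
  provider: string,
): string | undefined {
  return aliases.find((a) => a.name === aliasName)?.mappings[provider];
}

const builtIns: VoiceAlias[] = [
  {
    name: 'female-warm',
    mappings: { elevenlabs: 'EXAVITQu4vr4xnSDxMaL', google: 'en-US-Journey-F' },
  },
];

console.log(resolveVoice(builtIns, 'female-warm', 'elevenlabs')); // "EXAVITQu4vr4xnSDxMaL"
console.log(resolveVoice(builtIns, 'female-warm', 'azure'));      // undefined (no mapping registered)
```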
Register your own aliases:
```ts
vu.registerVoiceAlias({
  name: 'brand-narrator',
  description: 'Our brand narrator voice',
  gender: 'female',
  mappings: {
    elevenlabs: 'your-custom-voice-id',
    google: 'en-US-Studio-O',
    azure: 'en-US-JennyNeural',
  },
});

// Now use it anywhere
const result = await vu.tts.synthesize('Our latest product...', {
  provider: 'elevenlabs',
  voiceAlias: 'brand-narrator',
});
```

### Example 7: Streaming ASR
Open a live transcription session and send audio chunks as they arrive:
```ts
const stream = await vu.asr.stream({
  provider: 'deepgram',
  language: 'en-US',
  interimResults: true,
  endpointing: 300,
});

// Listen for real-time partial results
stream.on('partial', (result) => {
  process.stdout.write(`\r Hearing: ${result.text}`);
});

// Final, punctuated transcripts
stream.on('final', (result) => {
  console.log(`\n✅ ${result.text} (confidence: ${result.confidence})`);
});

stream.on('error', (err) => console.error('Stream error:', err));

// Feed audio from a microphone, file, or WebSocket
stream.send(audioChunk1);
stream.send(audioChunk2);

// When done
await stream.close();
```

### Example 8: Event Monitoring
VoiceBridge emits typed events for every stage of the pipeline:
```ts
// Provider initialization
vu.on('provider:init', ({ provider, type }) => {
  console.log(`✓ Initialized ${type} provider: ${provider}`);
});

// ASR events
vu.on('asr:transcribe:start', ({ provider }) => {
  console.log(`🎙 Transcribing with ${provider}...`);
});
vu.on('asr:transcribe:end', ({ provider, duration }) => {
  console.log(`🎙 Transcription complete (${provider}, ${duration}s)`);
});

// TTS events
vu.on('tts:synthesize:start', ({ provider, characters }) => {
  console.log(`🔊 Synthesizing ${characters} chars with ${provider}...`);
});

// Fallback events
vu.on('asr:fallback', ({ from, to, reason }) => {
  console.warn(`⚠️ ASR fallback: ${from} → ${to} (${reason})`);
});
vu.on('tts:fallback', ({ from, to, reason }) => {
  console.warn(`⚠️ TTS fallback: ${from} → ${to} (${reason})`);
});

// Retry events
vu.on('retry', ({ provider, attempt, delay }) => {
  console.log(`🔄 Retry #${attempt} for ${provider} in ${delay}ms`);
});

// Cost tracking
vu.on('cost:estimated', (estimate) => {
  console.log(`💰 Estimated: $${estimate.estimatedCost} (${estimate.provider})`);
});

// Cache events
vu.on('cache:hit', ({ key }) => console.log(`📦 Cache hit: ${key}`));
vu.on('cache:miss', ({ key }) => console.log(`📦 Cache miss: ${key}`));
```

### Example 9: Register a Custom Provider Plugin
Build your own adapter and plug it in at runtime:
```ts
import type { ASRProviderAdapter, ASRConfig, ASRResult } from '@voicebridge/voicebridge';

const myAdapter: ASRProviderAdapter = {
  name: 'my-whisper-server',
  supportsBatch: true,
  supportsStreaming: false,
  supportsDiarization: false,
  supportsLanguageDetection: true,

  async initialize() { /* connect to your server */ },

  async transcribe(audio: Buffer, config: ASRConfig): Promise<ASRResult> {
    // Call your custom ASR endpoint
    const response = await fetch('https://my-whisper.internal/v1/transcribe', {
      method: 'POST',
      body: audio,
      headers: { 'Content-Type': 'audio/wav' },
    });
    const data = await response.json();
    return {
      text: data.text,
      confidence: data.confidence,
      words: [],
      speakers: [],
      language: config.language ?? 'en',
      duration: data.duration,
      provider: 'my-whisper-server',
      metadata: {},
    };
  },

  isAvailable: () => true,
  getSupportedLanguages: () => ['en-US', 'es-ES', 'fr-FR'],
  getSupportedModels: () => ['whisper-large-v3'],
  async dispose() {},
};

// Register the plugin
vu.registerASRPlugin('my-whisper-server', myAdapter);

// Now use it like any other provider
const result = await vu.asr.transcribe(audio, {
  provider: 'my-whisper-server',
  language: 'en-US',
});
```

### Example 10: Per-Provider Rate Limiting
Prevent hitting provider rate limits with built-in sliding-window limiters:
```ts
const vu = new VoiceBridge({
  providers: {
    deepgram: { provider: 'deepgram', credentials: { apiKey: '...' } },
    elevenlabs: { provider: 'elevenlabs', credentials: { apiKey: '...' } },
  },
  // Configure rate limits per provider
  rateLimits: {
    deepgram: {
      maxRequests: 100, // max 100 requests
      windowMs: 60_000, // per 60-second window
      maxQueueSize: 50, // queue up to 50 overflow requests
    },
    elevenlabs: {
      maxRequests: 20,
      windowMs: 60_000,
    },
  },
});

// Requests exceeding the limit are queued automatically.
// If the queue is full, an error is thrown.
```

### Example 11: Audio Format Utilities
Detect audio formats and convert between them:
```ts
import { detectAudioFormat, convertAudio } from '@voicebridge/voicebridge';
import fs from 'node:fs';

// Detect format from buffer headers
const buffer = fs.readFileSync('./unknown-file');
const format = detectAudioFormat(buffer);
console.log(format); // 'wav', 'mp3', 'flac', 'ogg', 'webm', or null

// Convert audio between PCM formats
const wavBuffer = convertAudio(pcmBuffer, {
  fromFormat: 'pcm',
  toFormat: 'wav',
  sampleRate: 16000,
  channels: 1,
  bitDepth: 16,
});
```

## Full Configuration Reference
```ts
import { VoiceBridge } from '@voicebridge/voicebridge';

const vu = new VoiceBridge({
  // ── Provider credentials ──────────────────────────────────
  providers: {
    deepgram: {
      provider: 'deepgram',
      credentials: {
        apiKey: 'dg-...',
      },
    },
    google: {
      provider: 'google',
      credentials: {
        credentials: { /* service account JSON */ },
        projectId: 'my-gcp-project',
      },
    },
    elevenlabs: {
      provider: 'elevenlabs',
      credentials: {
        apiKey: 'el-...',
      },
    },
  },

  // ── Default providers ─────────────────────────────────────
  defaultASRProvider: 'deepgram',
  defaultTTSProvider: 'elevenlabs',

  // ── Retry configuration ───────────────────────────────────
  retry: {
    maxRetries: 3,        // attempts after first failure
    initialDelay: 500,    // first retry delay (ms)
    maxDelay: 10_000,     // cap on retry delay
    backoffMultiplier: 2, // exponential factor
    retryableStatusCodes: [429, 500, 502, 503, 504],
    timeout: 30_000,      // per-request timeout (ms)
  },

  // ── Fallback chain ────────────────────────────────────────
  fallback: {
    providers: [
      { provider: 'deepgram', credentials: { apiKey: '...' } },
      { provider: 'google', credentials: { apiKey: '...' } },
    ],
  },

  // ── Cache configuration (TTS only) ────────────────────────
  cache: {
    enabled: true,
    backend: 'memory',          // 'memory' | 'redis'
    maxSize: 100 * 1024 * 1024, // 100 MB
    ttl: 3_600_000,             // 1 hour
  },

  // ── Per-provider rate limits ──────────────────────────────
  rateLimits: {
    deepgram: { maxRequests: 100, windowMs: 60_000 },
    elevenlabs: { maxRequests: 20, windowMs: 60_000, maxQueueSize: 50 },
  },

  // ── Logging ───────────────────────────────────────────────
  logLevel: 'info', // 'debug' | 'info' | 'warn' | 'error' | 'silent'
  // Optional: provide your own log transport (e.g., winston, pino)
  // logger: { debug: ..., info: ..., warn: ..., error: ... },
});
```

## Error Handling
VoiceBridge provides a structured error hierarchy so you can handle each failure type precisely:
```ts
import {
  VoiceBridgeError,         // Base class — all errors extend this
  ProviderError,            // Generic provider failure
  AuthenticationError,      // Invalid or expired API key
  RateLimitError,           // Provider rate limit exceeded
  QuotaExceededError,       // Usage quota exhausted
  ProviderUnavailableError, // Provider is down
  ValidationError,          // Invalid input / config
  TimeoutError,             // Request timed out
  SDKNotInstalledError,     // Missing peer dependency
} from '@voicebridge/voicebridge';

try {
  const result = await vu.asr.transcribe(audio, { provider: 'deepgram' });
} catch (error) {
  if (error instanceof AuthenticationError) {
    // error.provider → "deepgram"
    // error.suggestion → "Verify your API key for deepgram is correct and not expired."
    console.error(`🔑 Bad API key for ${error.provider}`);
  } else if (error instanceof RateLimitError) {
    // error.retryAfterMs → 30000 (if provider sends Retry-After header)
    console.error(`⏱ Rate limited. Retry after ${error.retryAfterMs}ms`);
  } else if (error instanceof QuotaExceededError) {
    console.error(`📊 Quota exceeded for ${error.provider}`);
  } else if (error instanceof ProviderUnavailableError) {
    // error.retryable → true
    console.error(`🔌 ${error.provider} is down`);
  } else if (error instanceof TimeoutError) {
    // error.timeoutMs → 30000
    console.error(`⏰ Timed out after ${error.timeoutMs}ms`);
  } else if (error instanceof ValidationError) {
    // error.field → "language", error.expected → "BCP-47 code"
    console.error(`❌ Invalid input: ${error.message}`);
  } else if (error instanceof SDKNotInstalledError) {
    // error.packageName → "@deepgram/sdk"
    // error.suggestion → "Run: npm install @deepgram/sdk"
    console.error(error.suggestion);
  }
}
```

Every error includes:
| Property | Type | Description |
|----------|------|-------------|
| code | string | Machine-readable code (PROVIDER_ERROR, TIMEOUT_ERROR, etc.) |
| provider | string? | Which provider caused the error |
| operation | string? | The operation that failed |
| retryable | boolean | Whether automatic retry is safe |
| suggestion | string? | Human-readable fix suggestion |
| cause | Error? | Original underlying error |
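Since every error carries `retryable`, generic code can decide whether a retry is safe without enumerating error classes. A small sketch against the documented property shape (the `shouldRetry` helper is hypothetical, not part of the SDK):

```ts
// Minimal view of the documented error properties.
interface BridgeErrorShape {
  code: string;
  retryable: boolean;
  provider?: string;
  suggestion?: string;
}

// Treat anything without an explicit retryable === true as non-retryable.
function shouldRetry(err: unknown): boolean {
  const e = err as Partial<BridgeErrorShape> | null;
  return typeof e === 'object' && e !== null && e.retryable === true;
}

console.log(shouldRetry({ code: 'PROVIDER_UNAVAILABLE', retryable: true })); // true
console.log(shouldRetry({ code: 'VALIDATION_ERROR', retryable: false }));    // false
console.log(shouldRetry(new Error('plain error')));                          // false
```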
## API Reference

### VoiceBridge (main class)

```ts
class VoiceBridge {
  asr: ASR; // Access ASR methods
  tts: TTS; // Access TTS methods

  constructor(config: VoiceBridgeConfig);
  initialize(): Promise<void>;
  dispose(): Promise<void>;

  // Provider plugins
  registerASRPlugin(name: string, adapter: ASRProviderAdapter): void;
  registerTTSPlugin(name: string, adapter: TTSProviderAdapter): void;
  listASRProviders(): string[];
  listTTSProviders(): string[];

  // Voice aliases
  registerVoiceAlias(alias: VoiceAlias): void;

  // Cost estimation
  estimateASRCost(provider: string, durationSeconds: number, model?: string): CostEstimate;
  estimateTTSCost(provider: string, characterCount: number, model?: string): CostEstimate;
  compareASRCosts(durationSeconds: number, model?: string): CostEstimate[];
  compareTTSCosts(characterCount: number, model?: string): CostEstimate[];
  pricingLastUpdated: string; // e.g. "2026-03-01"

  // Events
  on<K extends keyof VoiceBridgeEvents>(event: K, listener: (payload: VoiceBridgeEvents[K]) => void): void;
  off<K extends keyof VoiceBridgeEvents>(event: K, listener: (payload: VoiceBridgeEvents[K]) => void): void;
}
```

### ASR
```ts
class ASR {
  transcribe(audio: Buffer, config: ASRConfig): Promise<ASRResult>;
  stream(config: ASRStreamConfig): Promise<ASRStream>;
}
```

### TTS
```ts
class TTS {
  synthesize(text: string, config: TTSConfig): Promise<TTSResult>;
  synthesizeStream(text: string, config: TTSStreamConfig): Promise<TTSStream>;
  listVoices(provider?: string, language?: string): Promise<TTSVoice[]>;
}
```

### SSMLBuilder
```ts
class SSMLBuilder {
  text(content: string): this;
  pause(timeMs: number): this;
  emphasis(text: string, level?: 'strong' | 'moderate' | 'reduced'): this;
  prosody(text: string, options: { rate?: string; pitch?: string; volume?: string }): this;
  sayAs(text: string, interpretAs: string, format?: string): this;
  paragraph(text: string): this;
  sentence(text: string): this;
  build(): string;
}
```

### Utility Functions
```ts
detectAudioFormat(buffer: Buffer): AudioFormat | null;
convertAudio(buffer: Buffer, options: AudioConvertOptions): Buffer;
stripSSML(ssml: string): string;
isSSML(text: string): boolean;
```

## Types
```ts
interface ASRConfig {
  provider: string;
  language?: string;
  model?: string;
  modelTier?: 'fast' | 'accurate' | 'balanced';
  diarization?: boolean;
  punctuation?: boolean;
  profanityFilter?: boolean;
  keywordBoosting?: string[];
  encoding?: AudioFormat;
  sampleRate?: SampleRate;
  channels?: number;
  options?: Record<string, unknown>;
}

interface ASRResult {
  text: string;
  confidence: number;
  words: ASRWord[];
  speakers: ASRSpeakerSegment[];
  language: string;
  duration: number;
  provider: string;
  model?: string;
  metadata: Record<string, unknown>;
}

interface TTSConfig {
  provider: string;
  voice?: string;
  voiceAlias?: string;
  model?: string;
  outputFormat?: AudioFormat;
  sampleRate?: SampleRate;
  speed?: number;
  pitch?: number;
  ssml?: boolean;
  language?: string;
  options?: Record<string, unknown>;
}

interface TTSResult {
  audio: Buffer;
  format: AudioFormat;
  sampleRate: SampleRate;
  duration?: number;
  provider: string;
  charactersUsed: number;
  metadata: Record<string, unknown>;
}

interface CostEstimate {
  provider: string;
  model?: string;
  unitType: string;
  unitPrice: number;
  estimatedCost: number;
  currency: string;
}

interface VoiceAlias {
  name: string;
  description?: string;
  gender?: 'male' | 'female' | 'neutral';
  mappings: { [provider: string]: string };
}

type AudioFormat = 'pcm' | 'wav' | 'mp3' | 'ogg' | 'mulaw' | 'alaw' | 'flac' | 'webm';
type SampleRate = 8000 | 16000 | 22050 | 24000 | 44100 | 48000;
```

## Requirements
- Node.js >= 18
- TypeScript >= 5.0 (recommended; the package ships with `.d.ts` declarations)
## License
MIT — see LICENSE for details.
