voice-router-dev
v0.8.0
Universal speech-to-text router for Gladia, AssemblyAI, Deepgram, Azure, OpenAI Whisper, Speechmatics, and Soniox
Voice Router SDK
Universal speech-to-text router for 7 transcription providers with a single, unified API.
Why Voice Router?
Switch between speech-to-text providers without changing your code. One API for Gladia, AssemblyAI, Deepgram, Azure, OpenAI Whisper, Speechmatics, and Soniox.
import { VoiceRouter } from 'voice-router-dev';
const router = new VoiceRouter({
providers: {
gladia: { apiKey: process.env.GLADIA_KEY },
deepgram: { apiKey: process.env.DEEPGRAM_KEY }
}
});
// Same code works with ANY provider
const result = await router.transcribe(audio, {
provider: 'gladia' // Switch to 'deepgram' anytime
});
Features
- Provider-Agnostic - Switch providers with one line
- Unified API - Same interface for all providers
- Webhook Normalization - Auto-detect and parse webhooks
- Real-time Streaming - WebSocket support (Gladia, AssemblyAI, Deepgram, Soniox, OpenAI Realtime)
- Advanced Features - Diarization, sentiment, summarization, chapters, entities
- Type-Safe - Full TypeScript support with OpenAPI-generated types
- Typed Extended Data - Access provider-specific features with full autocomplete
- Provider Fallback - Automatic failover strategies
- Zero Config - Works out of the box
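The fallback feature above can also be approximated by hand when you want explicit control. A minimal sketch, assuming a caller-supplied attempt function; the `Transcriber` shape here is illustrative, not the SDK's actual signature:

```typescript
// Illustrative fallback helper: try providers in order until one succeeds.
// `Transcriber` is an assumed minimal shape, not the library's real type.
type Transcriber = (provider: string) => Promise<{ success: boolean; text?: string }>;

async function transcribeWithFallback(
  providers: string[],
  attempt: Transcriber
): Promise<{ provider: string; text: string }> {
  for (const provider of providers) {
    const result = await attempt(provider);
    if (result.success && result.text !== undefined) {
      // First successful provider wins; the rest are skipped.
      return { provider, text: result.text };
    }
  }
  throw new Error(`All providers failed: ${providers.join(', ')}`);
}
```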
Supported Providers
| Provider | Batch | Streaming | Webhooks | Special Features |
|----------|-------|-----------|----------|------------------|
| Gladia | Yes | WebSocket | Yes | Multi-language, code-switching, translation |
| AssemblyAI | Yes | Real-time | HMAC | Chapters, entities, content moderation |
| Deepgram | Sync | WebSocket | Yes | PII redaction, keyword boosting |
| Azure STT | Async | No | HMAC | Custom models, language ID |
| OpenAI | Sync | Realtime | No | gpt-4o, diarization, Realtime API |
| Speechmatics | Async | No | Query params | High accuracy, summarization |
| Soniox | Yes | WebSocket | No | 60+ languages, translation, regions |
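When you need to gate features at runtime, the support matrix above can be encoded as a plain lookup table. A sketch derived directly from the table; the provider keys follow the config keys used elsewhere in this README:

```typescript
// Capability matrix from the table above, as a runtime lookup.
type Capability = { batch: boolean; streaming: boolean; webhooks: boolean };

const CAPABILITIES: Record<string, Capability> = {
  gladia:           { batch: true, streaming: true,  webhooks: true  },
  assemblyai:       { batch: true, streaming: true,  webhooks: true  },
  deepgram:         { batch: true, streaming: true,  webhooks: true  },
  'azure-stt':      { batch: true, streaming: false, webhooks: true  },
  'openai-whisper': { batch: true, streaming: true,  webhooks: false },
  speechmatics:     { batch: true, streaming: false, webhooks: true  },
  soniox:           { batch: true, streaming: true,  webhooks: false },
};

function supportsStreaming(provider: string): boolean {
  // Unknown providers are treated as unsupported.
  return CAPABILITIES[provider]?.streaming ?? false;
}
```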
Installation
npm install voice-router-dev
# or
pnpm add voice-router-dev
# or
yarn add voice-router-dev
Quick Start
Basic Transcription
import { VoiceRouter, GladiaAdapter } from 'voice-router-dev';
// Initialize router
const router = new VoiceRouter({
providers: {
gladia: { apiKey: 'YOUR_GLADIA_KEY' }
},
defaultProvider: 'gladia'
});
// Register adapter
router.registerAdapter(new GladiaAdapter());
// Transcribe from URL
const result = await router.transcribe({
type: 'url',
url: 'https://example.com/audio.mp3'
}, {
language: 'en',
diarization: true
});
if (result.success) {
console.log('Transcript:', result.data.text);
console.log('Speakers:', result.data.speakers);
}
Multi-Provider with Fallback
import {
VoiceRouter,
GladiaAdapter,
AssemblyAIAdapter,
DeepgramAdapter
} from 'voice-router-dev';
const router = new VoiceRouter({
providers: {
gladia: { apiKey: process.env.GLADIA_KEY },
assemblyai: { apiKey: process.env.ASSEMBLYAI_KEY },
deepgram: { apiKey: process.env.DEEPGRAM_KEY }
},
selectionStrategy: 'round-robin' // Auto load-balance
});
// Register all providers
router.registerAdapter(new GladiaAdapter());
router.registerAdapter(new AssemblyAIAdapter());
router.registerAdapter(new DeepgramAdapter());
// Automatically rotates between providers
await router.transcribe(audio1); // Uses Gladia
await router.transcribe(audio2); // Uses AssemblyAI
await router.transcribe(audio3); // Uses Deepgram
Real-time Streaming
import { VoiceRouter, DeepgramAdapter } from 'voice-router-dev';
const router = new VoiceRouter({
providers: {
deepgram: { apiKey: process.env.DEEPGRAM_KEY }
}
});
router.registerAdapter(new DeepgramAdapter());
// Start streaming session
const session = await router.transcribeStream({
provider: 'deepgram',
encoding: 'linear16',
sampleRate: 16000,
language: 'en',
interimResults: true
}, {
onTranscript: (event) => {
if (event.isFinal) {
console.log('Final:', event.text);
} else {
console.log('Interim:', event.text);
}
},
onError: (error) => console.error(error)
});
// Send audio chunks
const audioStream = getMicrophoneStream();
for await (const chunk of audioStream) {
await session.sendAudio({ data: chunk });
}
await session.close();
Webhook Normalization
import express from 'express';
// Webhooks use node:crypto - import from separate entry point
import { WebhookRouter } from 'voice-router-dev/webhooks';
const app = express();
const webhookRouter = new WebhookRouter();
// Single endpoint handles ALL providers
app.post('/webhooks/transcription', express.json(), (req, res) => {
// Auto-detect provider from payload
const result = webhookRouter.route(req.body, {
queryParams: req.query,
userAgent: req.headers['user-agent'],
verification: {
signature: req.headers['x-signature'],
secret: process.env.WEBHOOK_SECRET
}
});
if (!result.success) {
return res.status(400).json({ error: result.error });
}
// Unified format across all providers
console.log('Provider:', result.provider); // 'gladia' | 'assemblyai' | etc
console.log('Event:', result.event?.eventType); // 'transcription.completed'
console.log('ID:', result.event?.data?.id);
console.log('Text:', result.event?.data?.text);
res.json({ received: true });
});
Advanced Usage
Provider-Specific Features with Type Safety
Use typed provider options for full autocomplete and compile-time safety:
// Gladia - Full type-safe options
const result = await router.transcribe(audio, {
provider: 'gladia',
gladia: {
translation: true,
translation_config: { target_languages: ['fr', 'es'] },
moderation: true,
named_entity_recognition: true,
sentiment_analysis: true,
chapterization: true,
audio_to_llm: true,
audio_to_llm_config: [{ prompt: 'Summarize key points' }],
custom_metadata: { session_id: 'abc123' }
}
});
// Access typed extended data
if (result.extended) {
const translations = result.extended.translation?.results;
const chapters = result.extended.chapters?.results;
const entities = result.extended.entities?.results;
console.log('Custom metadata:', result.extended.customMetadata);
}
// AssemblyAI - Typed options with extended data
const assemblyResult = await router.transcribe(audio, {
provider: 'assemblyai',
assemblyai: {
auto_chapters: true,
entity_detection: true,
sentiment_analysis: true,
auto_highlights: true,
content_safety: true,
iab_categories: true
}
});
if (assemblyResult.extended) {
assemblyResult.extended.chapters?.forEach(ch => {
console.log(`${ch.headline}: ${ch.summary}`);
});
assemblyResult.extended.entities?.forEach(e => {
console.log(`${e.entity_type}: ${e.text}`);
});
}
// Deepgram - Typed options with metadata tracking
const deepgramResult = await router.transcribe(audio, {
provider: 'deepgram',
deepgram: {
model: 'nova-3',
smart_format: true,
paragraphs: true,
detect_topics: true,
tag: ['meeting', 'sales'],
extra: { user_id: '12345' }
}
});
if (deepgramResult.extended) {
console.log('Request ID:', deepgramResult.extended.requestId);
console.log('Audio SHA256:', deepgramResult.extended.sha256);
console.log('Tags:', deepgramResult.extended.tags);
}
// OpenAI Whisper - Typed options
const whisperResult = await router.transcribe(audio, {
provider: 'openai-whisper',
diarization: true,
openai: {
temperature: 0.2,
prompt: 'Technical discussion about APIs'
}
});
// Speechmatics - Enhanced accuracy with summarization
const speechmaticsResult = await router.transcribe(audio, {
provider: 'speechmatics',
model: 'enhanced',
summarization: true,
diarization: true
});
// All providers include request tracking
console.log('Request ID:', result.tracking?.requestId);
Error Handling
const result = await router.transcribe(audio, {
provider: 'gladia',
language: 'en'
});
if (!result.success) {
console.error('Provider:', result.provider);
console.error('Error:', result.error);
console.error('Details:', result.data);
// Implement fallback strategy
const fallbackResult = await router.transcribe(audio, {
provider: 'assemblyai' // Try different provider
});
}
Custom Provider Selection
// Explicit provider selection
const router = new VoiceRouter({
providers: {
gladia: { apiKey: '...' },
deepgram: { apiKey: '...' }
},
selectionStrategy: 'explicit' // Must specify provider
});
// Round-robin load balancing
const router = new VoiceRouter({
providers: { /* ... */ },
selectionStrategy: 'round-robin'
});
// Default fallback
const router = new VoiceRouter({
providers: { /* ... */ },
defaultProvider: 'gladia',
selectionStrategy: 'default'
});
API Reference
VoiceRouter
Main class for provider-agnostic transcription.
Constructor:
new VoiceRouter(config: VoiceRouterConfig)
Methods:
- `registerAdapter(adapter: TranscriptionAdapter)` - Register a provider adapter
- `transcribe(audio: AudioInput, options?: TranscribeOptions)` - Transcribe audio
- `transcribeStream(options: StreamingOptions, callbacks: StreamingCallbacks)` - Stream audio
- `getTranscript(id: string, provider: string)` - Get transcript by ID
- `getProviderCapabilities(provider: string)` - Get provider features
WebhookRouter
Automatic webhook detection and normalization.
Methods:
- `route(payload: unknown, options?: WebhookRouterOptions)` - Parse webhook
- `detectProvider(payload: unknown)` - Detect provider from payload
- `validate(payload: unknown)` - Validate webhook structure
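Conceptually, routing combines two steps: detect the provider from the payload shape, then verify the signature for providers that sign their webhooks. A hedged sketch of both steps; the discriminating field names are invented for illustration, not the providers' documented schemas:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Sketch of shape-based provider detection. The fields used to discriminate
// here are assumptions, not real provider webhook schemas.
function detectProviderSketch(payload: Record<string, unknown>): string | null {
  if ('session_id' in payload && 'payload' in payload) return 'gladia';
  if ('transcript_id' in payload && 'status' in payload) return 'assemblyai';
  if ('request_id' in payload && 'channels' in payload) return 'deepgram';
  return null; // Unknown shape: let the caller respond with a 400.
}

// HMAC-SHA256 signature check over the raw body, compared in constant time.
function verifyHmacSignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(expected, 'utf8');
  const b = Buffer.from(signatureHex, 'utf8');
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Note that signature verification must run over the raw request body, not the re-serialized JSON, which is why body parsing order matters in the Express handler above.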
Adapters
Provider-specific implementations:
- `GladiaAdapter` - Gladia transcription
- `AssemblyAIAdapter` - AssemblyAI transcription
- `DeepgramAdapter` - Deepgram transcription
- `AzureSTTAdapter` - Azure Speech-to-Text
- `OpenAIWhisperAdapter` - OpenAI Whisper + Realtime API
- `SpeechmaticsAdapter` - Speechmatics transcription
- `SonioxAdapter` - Soniox transcription (batch + streaming)
TypeScript Support
Full type definitions included with provider-specific type safety:
import type {
VoiceRouter,
VoiceRouterConfig,
AudioInput,
TranscribeOptions,
UnifiedTranscriptResponse,
StreamingSession,
StreamingOptions,
UnifiedWebhookEvent,
TranscriptionProvider,
// Normalized data types
TranscriptData,
Word,
Utterance,
Speaker
} from 'voice-router-dev';
Normalized Result Structure
All providers return a `UnifiedTranscriptResponse` with the same data structure:
interface UnifiedTranscriptResponse<P extends TranscriptionProvider> {
success: boolean;
provider: P;
// Normalized data - same structure for ALL providers
data?: {
id: string; // Transcript ID
text: string; // Full transcript text
status: TranscriptionStatus;
confidence?: number; // 0-1 confidence score
duration?: number; // Audio duration in seconds
language?: string; // Detected/specified language
// Normalized arrays - consistent across providers
words?: Word[]; // { word, start, end, confidence, speaker }
utterances?: Utterance[]; // { text, start, end, speaker, words }
speakers?: Speaker[]; // { id, label, confidence }
summary?: string;
metadata?: TranscriptMetadata;
};
// Provider-specific rich data (typed per provider)
extended?: ProviderExtendedData;
// Request tracking
tracking?: { requestId, audioHash, processingTimeMs };
// Error info (on failure)
error?: { code, message, details, statusCode };
// Raw provider response (fully typed per provider)
raw?: ProviderRawResponse;
}
Use cases:
- Display transcripts - Use `data.text`, `data.words`, `data.utterances`
- Re-normalize stored responses - Store `raw`, reconstruct via adapter
- Access provider features - Use `extended` for chapters, entities, etc.
See docs/NORMALIZED_RESULTS.md for detailed documentation.
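As an illustration of the re-normalization use case: a stored raw payload with millisecond timestamps can be mapped back into the unified `Word` shape shown above. The `RawWordMs` input shape is hypothetical; real raw payloads differ per provider and should go through the adapter:

```typescript
// Unified word shape from the interface above (timestamps in seconds).
interface Word {
  word: string;
  start: number;
  end: number;
  confidence?: number;
  speaker?: string;
}

// Hypothetical raw provider word with millisecond timestamps.
interface RawWordMs {
  text: string;
  start_ms: number;
  end_ms: number;
  conf?: number;
}

function normalizeWords(raw: RawWordMs[]): Word[] {
  return raw.map((w) => ({
    word: w.text,
    start: w.start_ms / 1000, // unified shape uses seconds
    end: w.end_ms / 1000,
    confidence: w.conf,
  }));
}
```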
Provider-Specific Type Safety
The SDK provides full type safety for provider-specific responses:
// Generic response - raw and extended fields are unknown
const result: UnifiedTranscriptResponse = await router.transcribe(audio);
// Provider-specific response - raw and extended are properly typed!
const deepgramResult: UnifiedTranscriptResponse<'deepgram'> = await router.transcribe(audio, {
provider: 'deepgram'
});
// TypeScript knows raw is ListenV1Response
const metadata = deepgramResult.raw?.metadata;
// TypeScript knows extended is DeepgramExtendedData
const requestId = deepgramResult.extended?.requestId;
const sha256 = deepgramResult.extended?.sha256;
Provider-specific raw response types:
- `gladia` - `PreRecordedResponse`
- `deepgram` - `ListenV1Response`
- `openai-whisper` - `CreateTranscription200One`
- `assemblyai` - `AssemblyAITranscript`
- `azure-stt` - `AzureTranscription`
Provider-specific extended data types:
- `gladia` - `GladiaExtendedData` (translation, moderation, entities, sentiment, chapters, audioToLlm, customMetadata)
- `assemblyai` - `AssemblyAIExtendedData` (chapters, entities, sentimentResults, highlights, contentSafety, topics)
- `deepgram` - `DeepgramExtendedData` (metadata, requestId, sha256, modelInfo, tags)
Typed Extended Data
Access rich provider-specific data beyond basic transcription:
import type {
GladiaExtendedData,
AssemblyAIExtendedData,
DeepgramExtendedData,
// Individual types for fine-grained access
GladiaTranslation,
GladiaChapters,
AssemblyAIChapter,
AssemblyAIEntity,
DeepgramMetadata
} from 'voice-router-dev';
// Gladia extended data
const gladiaResult = await router.transcribe(audio, { provider: 'gladia', gladia: { translation: true } });
const translation: GladiaTranslation | undefined = gladiaResult.extended?.translation;
// AssemblyAI extended data
const assemblyResult = await router.transcribe(audio, { provider: 'assemblyai', assemblyai: { auto_chapters: true } });
const chapters: AssemblyAIChapter[] | undefined = assemblyResult.extended?.chapters;
// All responses include tracking info
console.log('Request ID:', gladiaResult.tracking?.requestId);
Exported Parameter Enums
Import and use provider-specific enums for type-safe configuration:
import {
// Deepgram enums
ListenV1EncodingParameter,
ListenV1ModelParameter,
SpeakV1EncodingParameter,
// Gladia enums
StreamingSupportedEncodingEnum,
StreamingSupportedSampleRateEnum,
// OpenAI types
AudioResponseFormat
} from 'voice-router-dev';
// Type-safe Deepgram encoding
const session = await router.transcribeStream({
provider: 'deepgram',
encoding: ListenV1EncodingParameter.linear16,
model: ListenV1ModelParameter['nova-2'],
sampleRate: 16000
});
// Type-safe Gladia encoding - use unified format
const gladiaSession = await router.transcribeStream({
provider: 'gladia',
encoding: 'linear16', // Unified format - mapped to Gladia's 'wav/pcm'
sampleRate: 16000
});
Type-Safe Streaming Options
Streaming options are fully typed based on provider OpenAPI specifications:
// Deepgram streaming - all options are type-safe
const deepgramSession = await router.transcribeStream({
provider: 'deepgram',
encoding: 'linear16',
model: 'nova-3',
language: 'en-US',
diarization: true
}, callbacks);
// Gladia streaming - with typed gladiaStreaming options
const gladiaSession = await router.transcribeStream({
provider: 'gladia',
encoding: 'linear16', // Unified format - mapped to Gladia's 'wav/pcm'
sampleRate: 16000,
gladiaStreaming: {
realtime_processing: { words_accurate_timestamps: true },
messages_config: { receive_partial_transcripts: true }
}
}, callbacks);
// AssemblyAI streaming
const assemblySession = await router.transcribeStream({
provider: 'assemblyai',
sampleRate: 16000,
wordTimestamps: true
}, callbacks);
Benefits:
- Full IntelliSense - Autocomplete for all provider-specific options
- Compile-time Safety - Invalid options caught before runtime
- Provider Discrimination - Type system knows which provider you're using
- OpenAPI-Generated - Types come directly from provider specifications
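On the consumer side, the interim/final event pattern used by the streaming callbacks is often paired with a small accumulator: keep only the latest interim text and append each final segment. A sketch; the event shape mirrors the `onTranscript` callback above and nothing here is SDK-specific:

```typescript
// Accumulates streaming transcript events into a single display string.
interface TranscriptEvent {
  text: string;
  isFinal: boolean;
}

class TranscriptAccumulator {
  private finals: string[] = [];
  private interim = '';

  push(event: TranscriptEvent): void {
    if (event.isFinal) {
      // A final segment replaces whatever interim text preceded it.
      this.finals.push(event.text);
      this.interim = '';
    } else {
      // Interim results supersede each other; keep only the latest.
      this.interim = event.text;
    }
  }

  get text(): string {
    return [...this.finals, this.interim].filter(Boolean).join(' ');
  }
}
```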
Typed Field Configs
Build type-safe UI field overrides with compile-time validation:
import {
GladiaStreamingFieldName,
GladiaStreamingConfig,
FieldOverrides,
GladiaStreamingSchema,
FieldConfig
} from 'voice-router-dev/field-configs'
// Type-safe field overrides - typos caught at compile time!
const overrides: Partial<Record<GladiaStreamingFieldName, FieldConfig | null>> = {
encoding: { name: 'encoding', type: 'select', required: false },
language_config: null, // Hide this field
// typo_field: null, // ✗ TypeScript error!
}
// Fully typed config values - option values validated too!
const config: Partial<GladiaStreamingConfig> = {
encoding: 'wav/pcm', // ✓ Only valid options allowed
sample_rate: 16000,
}
// Extract specific field's valid options
type EncodingOptions = GladiaStreamingConfig['encoding']
// = 'wav/pcm' | 'wav/alaw' | 'wav/ulaw'
Available for all 7 providers:
- `GladiaStreamingFieldName`, `DeepgramTranscriptionFieldName`, `AssemblyAIStreamingFieldName`, etc.
- `GladiaStreamingConfig`, `DeepgramTranscriptionConfig`, `AzureTranscriptionConfig`, etc.
- `GladiaStreamingSchema`, `DeepgramTranscriptionSchema`, etc. (Zod schemas for advanced extraction)
Lightweight Field Metadata (Performance-Optimized)
For UI form generation without heavy Zod schema types (156KB vs 2.8MB):
// Lightweight import - 156KB types instead of 2.8MB
import {
GLADIA_STREAMING_FIELDS,
GladiaStreamingFieldName,
PROVIDER_FIELDS,
FieldMetadata
} from 'voice-router-dev/field-metadata'
// Pre-computed field metadata - no Zod at runtime
GLADIA_STREAMING_FIELDS.forEach(field => {
if (field.type === 'select' && field.options) {
renderDropdown(field.name, field.options)
}
})
When to use which:
| Use Case | Import | Types Size |
|----------|--------|------------|
| UI form generation (no validation) | field-metadata | 156 KB |
| Runtime Zod validation needed | field-configs | 2.8 MB |
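For the form-generation path, the pre-computed metadata can be mapped straight to widget descriptors with no Zod in the bundle. The `FieldMetadata` shape below is a simplified assumption for the sketch; check the exported type for the real fields:

```typescript
// Simplified stand-in for the exported FieldMetadata type (assumed shape).
interface FieldMetadata {
  name: string;
  type: 'select' | 'boolean' | 'number' | 'text';
  options?: string[];
  required?: boolean;
}

// Map metadata to plain widget descriptors a UI layer can render directly.
function toFormSpec(
  fields: FieldMetadata[]
): Array<{ name: string; widget: string; required: boolean }> {
  return fields.map((f) => ({
    name: f.name,
    widget: f.type === 'select' ? `select:${(f.options ?? []).join('|')}` : f.type,
    required: f.required ?? false,
  }));
}
```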
Requirements
- Node.js: 20.0.0 or higher
- TypeScript: 5.0+ (optional)
- Package Managers: npm, pnpm, or yarn
Documentation
API Reference (Auto-Generated)
Comprehensive API documentation is auto-generated with TypeDoc from TypeScript source code:
docs/generated/ - Complete API reference
Main Documentation Sets:
router/ - Core SDK API
- `voice-router.md` - VoiceRouter class (main entry point)
- `types.md` - Unified types (UnifiedTranscriptResponse, StreamingOptions, etc.)
- `adapters/base-adapter.md` - BaseAdapter interface
webhooks/ - Webhook handling
- `webhook-router.md` - WebhookRouter class (auto-detect providers)
- `types.md` - Webhook event types
- `{provider}-webhook.md` - Provider-specific webhook handlers
Provider-Specific Adapters:
- gladia/ - Gladia adapter API
- deepgram/ - Deepgram adapter API
- assemblyai/ - AssemblyAI adapter API
- openai/ - OpenAI Whisper adapter API
- azure/ - Azure STT adapter API
- speechmatics/ - Speechmatics adapter API
Most Important Files:
- `docs/generated/router/router/voice-router.md` - Main router class
- `docs/generated/router/router/types.md` - Core types
- `docs/generated/webhooks/webhook-router.md` - Webhook handling
Developer Documentation
- docs/DEVELOPMENT.md - Quick reference for developers
- docs/SDK_GENERATION_WORKFLOW.md - Technical workflow
Provider Setup Guides
Gladia
import { VoiceRouter, GladiaAdapter } from 'voice-router-dev';
const router = new VoiceRouter({
providers: { gladia: { apiKey: 'YOUR_KEY' } }
});
router.registerAdapter(new GladiaAdapter());
Get your API key: https://gladia.io
AssemblyAI
import { VoiceRouter, AssemblyAIAdapter } from 'voice-router-dev';
const router = new VoiceRouter({
providers: { assemblyai: { apiKey: 'YOUR_KEY' } }
});
router.registerAdapter(new AssemblyAIAdapter());
Get your API key: https://assemblyai.com
Deepgram
import { VoiceRouter, DeepgramAdapter } from 'voice-router-dev';
const router = new VoiceRouter({
providers: { deepgram: { apiKey: 'YOUR_KEY' } }
});
router.registerAdapter(new DeepgramAdapter());
Get your API key: https://deepgram.com
Azure Speech-to-Text
import { VoiceRouter, AzureSTTAdapter } from 'voice-router-dev';
const router = new VoiceRouter({
providers: {
'azure-stt': {
apiKey: 'YOUR_KEY',
region: 'eastus' // Required
}
}
});
router.registerAdapter(new AzureSTTAdapter());
Get your credentials: https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/
OpenAI Whisper
import { VoiceRouter, OpenAIWhisperAdapter } from 'voice-router-dev';
const router = new VoiceRouter({
providers: { 'openai-whisper': { apiKey: 'YOUR_KEY' } }
});
router.registerAdapter(new OpenAIWhisperAdapter());
Get your API key: https://platform.openai.com
Speechmatics
import { VoiceRouter, SpeechmaticsAdapter } from 'voice-router-dev';
const router = new VoiceRouter({
providers: { speechmatics: { apiKey: 'YOUR_KEY' } }
});
router.registerAdapter(new SpeechmaticsAdapter());
Get your API key: https://speechmatics.com
Soniox
import { VoiceRouter, SonioxAdapter, SonioxRegion } from 'voice-router-dev';
const router = new VoiceRouter({
providers: {
soniox: {
apiKey: 'YOUR_KEY',
region: SonioxRegion.us // or 'eu', 'jp'
}
}
});
router.registerAdapter(new SonioxAdapter());
Get your API key: https://soniox.com
Contributing
Contributions welcome! Please read our Contributing Guide.
License
MIT © Lazare Zemliak
Support
- Issues: GitHub Issues
- Repository: GitHub
Note: This is a development version (voice-router-dev). The stable release will be published as voice-router.
