Live Transcribe
Professional live speech transcription library for TypeScript/JavaScript with multi-provider support
Built by 360labs - We build AI-powered developer tools and APIs.
Table of Contents
- Features
- Installation
- Quick Start
- Providers
- Session Management
- Events
- Export Formats
- Supported Languages
- API Reference
- Examples
- Browser Support
- Error Handling
- Audio Processing Utilities
- Contributing
- License
Features
| Feature | Description |
|---------|-------------|
| Multi-Provider Support | Web Speech API, Deepgram, AssemblyAI, and custom providers |
| Real-time Transcription | Live results with interim and final transcripts |
| 40+ Languages | Extensive language support across all providers |
| Session Management | Full control with start, stop, pause, and resume |
| Voice Activity Detection | Automatic speech detection (VAD) |
| Audio Recording | Built-in recording capabilities |
| Export Formats | JSON, Plain Text, SRT, VTT, CSV |
| TypeScript First | Complete type definitions and IntelliSense support |
| Event-Driven | Subscribe to transcription events easily |
| Lightweight | ~200KB package size with zero runtime dependencies |
| Cross-Platform | Works in browsers and Node.js |
Installation
# npm
npm install @360labs/live-transcribe
# yarn
yarn add @360labs/live-transcribe
# pnpm
pnpm add @360labs/live-transcribe
Quick Start
Basic Usage (Web Speech API)
import { createTranscriber, TranscriptionProvider } from '@360labs/live-transcribe';
// Create a transcriber (Web Speech API - no API key required)
const transcriber = createTranscriber({
provider: TranscriptionProvider.WebSpeechAPI,
language: 'en-US',
interimResults: true,
});
// Listen for transcription results
transcriber.on('transcript', (result) => {
if (result.isFinal) {
console.log('Final:', result.text);
} else {
console.log('Interim:', result.text);
}
});
// Handle errors
transcriber.on('error', (error) => {
console.error('Error:', error.message);
});
// Start transcribing
await transcriber.initialize();
await transcriber.start();
// Stop when done
await transcriber.stop();
Using Sessions (Recommended)
import { createSession, TranscriptionProvider } from '@360labs/live-transcribe';
// Create a session for full control
const session = createSession({
provider: TranscriptionProvider.WebSpeechAPI,
language: 'en-US',
});
// Access the provider for events
session.provider.on('transcript', (result) => {
console.log(result.text);
// Add to session for later export
if (result.isFinal) {
session.addTranscript(result);
}
});
// Lifecycle control
await session.start();
session.pause(); // Pause transcription
session.resume(); // Resume transcription
await session.stop();
// Get results
const transcripts = session.getTranscripts();
const fullText = session.getFullText();
const stats = session.getStatistics();
// Export in various formats
const srtFile = session.export('srt');
const jsonFile = session.export('json');
Providers
Web Speech API (Browser)
The Web Speech API is built into modern browsers and requires no API key. It's perfect for quick prototypes and applications that don't need cloud-based accuracy.
import { createTranscriber, TranscriptionProvider } from '@360labs/live-transcribe';
const transcriber = createTranscriber({
provider: TranscriptionProvider.WebSpeechAPI,
language: 'en-US',
interimResults: true, // Get real-time interim results
});
// Check browser support
if (transcriber.isSupported()) {
await transcriber.initialize();
await transcriber.start();
}
Pros:
- No API key required
- Free to use
- Works offline (in some browsers)
- Low latency
Cons:
- Accuracy varies by browser
- Limited language support compared to cloud providers
- Requires internet connection in most browsers
Deepgram
Deepgram offers high-accuracy, real-time transcription with advanced features like speaker diarization and custom vocabularies.
import { createTranscriber, TranscriptionProvider } from '@360labs/live-transcribe';
const transcriber = createTranscriber({
provider: TranscriptionProvider.Deepgram,
apiKey: 'your-deepgram-api-key',
language: 'en-US',
model: 'nova-2', // Latest model
punctuate: true, // Auto-punctuation
interimResults: true,
});
transcriber.on('transcript', (result) => {
console.log(result.text);
console.log('Confidence:', result.confidence);
});
await transcriber.initialize();
await transcriber.start();
Configuration Options:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| apiKey | string | required | Your Deepgram API key |
| model | string | 'nova-2' | Model to use (nova-2, nova, enhanced, base) |
| language | string | 'en-US' | Language code |
| punctuate | boolean | true | Enable auto-punctuation |
| interimResults | boolean | true | Enable interim results |
| smartFormat | boolean | false | Enable smart formatting |
| diarize | boolean | false | Enable speaker diarization |
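For example, diarization and smart formatting can be enabled together; a minimal sketch, assuming the provider surfaces speaker labels via the optional speaker field on each result:
import { createTranscriber, TranscriptionProvider } from '@360labs/live-transcribe';
const transcriber = createTranscriber({
  provider: TranscriptionProvider.Deepgram,
  apiKey: 'your-deepgram-api-key',
  model: 'nova-2',
  diarize: true, // label results with a speaker ID
  smartFormat: true, // format dates, numbers, and the like
});
transcriber.on('final', (result) => {
  // speaker is only populated when diarization is enabled
  console.log(`[${result.speaker ?? 'unknown'}] ${result.text}`);
});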
AssemblyAI
AssemblyAI provides state-of-the-art transcription with features like automatic language detection and content moderation.
import { createTranscriber, TranscriptionProvider } from '@360labs/live-transcribe';
const transcriber = createTranscriber({
provider: TranscriptionProvider.AssemblyAI,
apiKey: 'your-assemblyai-api-key',
sampleRate: 16000,
});
transcriber.on('transcript', (result) => {
console.log(result.text);
});
await transcriber.initialize();
await transcriber.start();
Configuration Options:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| apiKey | string | required | Your AssemblyAI API key |
| sampleRate | number | 16000 | Audio sample rate in Hz |
| wordBoost | string[] | [] | Words to boost recognition |
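For instance, wordBoost can bias recognition toward domain vocabulary; a short sketch (the boosted terms are just examples):
import { createTranscriber, TranscriptionProvider } from '@360labs/live-transcribe';
const transcriber = createTranscriber({
  provider: TranscriptionProvider.AssemblyAI,
  apiKey: 'your-assemblyai-api-key',
  sampleRate: 16000,
  wordBoost: ['Kubernetes', 'GraphQL', 'PostgreSQL'], // bias recognition toward these terms
});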
Custom Provider
You can create custom providers by extending the BaseTranscriber class:
import { BaseTranscriber, TranscriptionConfig, SessionState } from '@360labs/live-transcribe';
class MyCustomProvider extends BaseTranscriber {
private recognition: any;
constructor(config: TranscriptionConfig) {
super(config);
}
isSupported(): boolean {
return true; // Check if your provider is available
}
async initialize(): Promise<void> {
// Initialize your provider
this.setState(SessionState.INITIALIZING);
}
async start(): Promise<void> {
this.setState(SessionState.ACTIVE);
this.emit('start');
// Start transcription
}
async stop(): Promise<void> {
this.setState(SessionState.STOPPED);
this.emit('stop');
// Stop transcription
}
pause(): void {
this.setState(SessionState.PAUSED);
this.emit('pause');
}
resume(): void {
this.setState(SessionState.ACTIVE);
this.emit('resume');
}
sendAudio(audioData: ArrayBuffer): void {
// Send audio to your provider
}
async cleanup(): Promise<void> {
// Clean up resources
}
}
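A subclass can then be used like any built-in provider; a minimal sketch, assuming the cast below is enough to satisfy the required provider field (a custom class may not need it):
const custom = new MyCustomProvider({ language: 'en-US' } as TranscriptionConfig);
custom.on('transcript', (result) => console.log(result.text));
if (custom.isSupported()) {
  await custom.initialize();
  await custom.start();
}
Session Management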
Sessions provide a higher-level API for managing transcription with built-in transcript storage and export capabilities.
import { createSession, SessionManager, TranscriptionProvider } from '@360labs/live-transcribe';
// Single session
const session = createSession({
provider: TranscriptionProvider.WebSpeechAPI,
language: 'en-US',
});
// Session properties
console.log(session.id); // Unique session ID
console.log(session.getState()); // Current state
// Multiple sessions with SessionManager
const manager = new SessionManager();
const session1 = manager.createSession({
provider: TranscriptionProvider.WebSpeechAPI,
language: 'en-US',
});
const session2 = manager.createSession({
provider: TranscriptionProvider.WebSpeechAPI,
language: 'es-ES',
});
// Get all sessions
const allSessions = manager.getAllSessions();
// Get session by ID
const found = manager.getSession('session-id');
// Get active sessions
const activeSessions = manager.getActiveSessions();
Session States
import { SessionState } from '@360labs/live-transcribe';
// Available states
SessionState.IDLE // Initial state
SessionState.INITIALIZING // Provider initializing
SessionState.ACTIVE // Transcription in progress
SessionState.PAUSED // Transcription paused
SessionState.STOPPING // Stopping transcription
SessionState.STOPPED // Transcription stopped
SessionState.ERROR // Error occurred
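States are handy as guards around lifecycle calls; a small sketch using getState():
const state = session.getState();
if (state === SessionState.ACTIVE) {
  session.pause();
} else if (state === SessionState.PAUSED) {
  session.resume();
}
Events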
Subscribe to events for real-time updates:
const transcriber = createTranscriber({ /* config */ });
// Transcript events
transcriber.on('transcript', (result) => {
console.log('Text:', result.text);
console.log('Is Final:', result.isFinal);
console.log('Confidence:', result.confidence);
console.log('Timestamp:', result.timestamp);
});
transcriber.on('final', (result) => {
// Only final transcripts
console.log('Final transcript:', result.text);
});
transcriber.on('interim', (result) => {
// Only interim transcripts
console.log('Interim:', result.text);
});
// Lifecycle events
transcriber.on('start', () => {
console.log('Transcription started');
});
transcriber.on('stop', () => {
console.log('Transcription stopped');
});
transcriber.on('pause', () => {
console.log('Transcription paused');
});
transcriber.on('resume', () => {
console.log('Transcription resumed');
});
// State changes
transcriber.on('stateChange', (state) => {
console.log('State changed to:', state);
});
// Language changes
transcriber.on('languageChange', (change) => {
console.log(`Language: ${change.from} -> ${change.to}`);
});
// Error handling
transcriber.on('error', (error) => {
console.error('Error code:', error.code);
console.error('Error message:', error.message);
console.error('Provider:', error.provider);
});
// Remove listeners
transcriber.off('transcript', myHandler);
transcriber.removeAllListeners();
TranscriptionResult Object
interface TranscriptionResult {
text: string; // Transcribed text
isFinal: boolean; // Is this a final result?
confidence?: number; // Confidence score (0-1)
timestamp: number; // Unix timestamp
speaker?: string; // Speaker ID (if diarization enabled)
language?: string; // Detected language
words?: Word[]; // Word-level timing
}
interface Word {
text: string;
start: number; // Start time in ms
end: number; // End time in ms
confidence?: number;
}
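Word-level timings enable features like per-word highlighting; a brief sketch over the fields above (the words array is only present when the provider supplies it):
function formatWords(result: TranscriptionResult): string {
  if (!result.words) return result.text;
  return result.words
    .map((w) => `${(w.start / 1000).toFixed(2)}s ${w.text}`)
    .join(' ');
}
Export Formats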
Export transcripts in multiple formats:
const session = createSession({ /* config */ });
// Add transcripts during session
session.provider.on('transcript', (result) => {
if (result.isFinal) {
session.addTranscript(result);
}
});
// After transcription, export in various formats
// JSON - Full data with metadata
const jsonExport = session.export('json');
console.log(jsonExport.data); // JSON string
console.log(jsonExport.filename); // 'transcript-{id}.json'
console.log(jsonExport.mimeType); // 'application/json'
// Plain Text - Just the text
const textExport = session.export('text');
// Output: "Hello world. How are you today?"
// SRT - SubRip subtitles
const srtExport = session.export('srt');
// Output:
// 1
// 00:00:01,000 --> 00:00:03,500
// Hello world.
//
// 2
// 00:00:04,000 --> 00:00:06,500
// How are you today?
// VTT - WebVTT subtitles
const vttExport = session.export('vtt');
// Output:
// WEBVTT
//
// 00:00:01.000 --> 00:00:03.500
// Hello world.
//
// 00:00:04.000 --> 00:00:06.500
// How are you today?
// CSV - Spreadsheet format
const csvExport = session.export('csv');
// Output: timestamp,text,confidence,isFinal
// 1234567890,Hello world,0.95,true
// Download in browser
function downloadTranscript(format: string) {
const exported = session.export(format);
const blob = new Blob([exported.data], { type: exported.mimeType });
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = exported.filename;
a.click();
URL.revokeObjectURL(url);
}
Supported Languages
The library supports 40+ languages. Language support varies by provider.
Web Speech API Languages
| Language | Code | Language | Code |
|----------|------|----------|------|
| English (US) | en-US | English (UK) | en-GB |
| English (Australia) | en-AU | English (India) | en-IN |
| Spanish (Spain) | es-ES | Spanish (Mexico) | es-MX |
| French (France) | fr-FR | French (Canada) | fr-CA |
| German | de-DE | Italian | it-IT |
| Portuguese (Brazil) | pt-BR | Portuguese (Portugal) | pt-PT |
| Chinese (Simplified) | zh-CN | Chinese (Traditional) | zh-TW |
| Japanese | ja-JP | Korean | ko-KR |
| Hindi | hi-IN | Arabic (Saudi Arabia) | ar-SA |
| Russian | ru-RU | Dutch | nl-NL |
| Polish | pl-PL | Turkish | tr-TR |
| Thai | th-TH | Vietnamese | vi-VN |
| Indonesian | id-ID | Hebrew | he-IL |
| Czech | cs-CZ | Greek | el-GR |
| Swedish | sv-SE | Danish | da-DK |
| Finnish | fi-FI | Norwegian | no-NO |
| Ukrainian | uk-UA | Romanian | ro-RO |
| Hungarian | hu-HU | Malay | ms-MY |
Setting Language
// At creation
const transcriber = createTranscriber({
provider: TranscriptionProvider.WebSpeechAPI,
language: 'es-ES', // Spanish (Spain)
});
// Or use session
const session = createSession({
provider: TranscriptionProvider.WebSpeechAPI,
language: 'fr-FR', // French
});
Changing Language Mid-Transcript
You can change the language during an active transcription session. The library automatically handles stopping and restarting the transcription with the new language:
const session = createSession({
provider: TranscriptionProvider.WebSpeechAPI,
language: 'en-US',
});
await session.start();
// User switches to Spanish mid-conversation
await session.setLanguage('es-ES');
// Transcription continues seamlessly in Spanish
// Switch to French
await session.setLanguage('fr-FR');
// Get current language
console.log(session.getLanguage()); // 'fr-FR'
With Transcriber Directly:
const transcriber = createTranscriber({
provider: TranscriptionProvider.WebSpeechAPI,
language: 'en-US',
});
// Listen for language changes
transcriber.on('languageChange', (change) => {
console.log(`Language changed from ${change.from} to ${change.to}`);
});
await transcriber.initialize();
await transcriber.start();
// Change language while recording
await transcriber.setLanguage('de-DE');
// Get current language
console.log(transcriber.getLanguage()); // 'de-DE'
React Example with Language Selector:
import { useRef, useState } from 'react';
import { TranscriptionSession } from '@360labs/live-transcribe';
function TranscriptionWithLanguageSwitch() {
const [language, setLanguage] = useState('en-US');
const sessionRef = useRef<TranscriptionSession | null>(null);
const changeLanguage = async (newLang: string) => {
setLanguage(newLang);
if (sessionRef.current) {
await sessionRef.current.setLanguage(newLang);
}
};
return (
<div>
<select value={language} onChange={(e) => changeLanguage(e.target.value)}>
<option value="en-US">English</option>
<option value="es-ES">Spanish</option>
<option value="fr-FR">French</option>
<option value="de-DE">German</option>
<option value="ja-JP">Japanese</option>
</select>
{/* Transcription continues without interruption */}
</div>
);
}
API Reference
createTranscriber(config)
Creates a new transcriber instance.
function createTranscriber(config: TranscriptionConfig): ITranscriptionProvider;
Config Options:
| Option | Type | Required | Default | Description |
|--------|------|----------|---------|-------------|
| provider | TranscriptionProvider | Yes | - | Provider to use |
| apiKey | string | For cloud | - | API key for cloud providers |
| language | string | No | 'en-US' | Language code |
| interimResults | boolean | No | true | Enable interim results |
| punctuation | boolean | No | true | Enable auto-punctuation |
| profanityFilter | boolean | No | false | Filter profanity |
createSession(config)
Creates a new transcription session.
function createSession(config: TranscriptionConfig): TranscriptionSession;
TranscriptionSession
| Method | Returns | Description |
|--------|---------|-------------|
| start() | Promise | Start transcription |
| stop() | Promise | Stop transcription |
| pause() | void | Pause transcription |
| resume() | void | Resume transcription |
| getState() | SessionState | Get current state |
| getTranscripts(finalOnly?) | TranscriptionResult[] | Get all transcripts |
| getFullText() | string | Get concatenated text |
| getStatistics() | SessionStatistics | Get session stats |
| addTranscript(result) | void | Add a transcript |
| export(format) | ExportResult | Export transcripts |
| setLanguage(language) | Promise | Change language mid-session |
| getLanguage() | string | Get current language |
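For example, getTranscripts takes an optional finalOnly flag to filter out interim results:
const all = session.getTranscripts(); // every stored result
const finals = session.getTranscripts(true); // final results only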
SessionStatistics
interface SessionStatistics {
wordCount: number;
transcriptCount: number;
duration: number;
averageConfidence: number;
}
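A quick sketch of reporting these statistics; treating duration as milliseconds is an assumption here:
const stats = session.getStatistics();
console.log(`Words: ${stats.wordCount}, transcripts: ${stats.transcriptCount}`);
console.log(`Avg confidence: ${(stats.averageConfidence * 100).toFixed(1)}%`);
// Assumes duration is reported in milliseconds
const minutes = stats.duration / 60000;
if (minutes > 0) console.log(`~${Math.round(stats.wordCount / minutes)} words/min`);
Examples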
React Integration
import React, { useState, useEffect, useRef } from 'react';
import { createSession, TranscriptionProvider, TranscriptionSession } from '@360labs/live-transcribe';
function TranscriptionComponent() {
const [isRecording, setIsRecording] = useState(false);
const [transcript, setTranscript] = useState('');
const sessionRef = useRef<TranscriptionSession | null>(null);
useEffect(() => {
return () => {
sessionRef.current?.stop();
};
}, []);
const startRecording = async () => {
const session = createSession({
provider: TranscriptionProvider.WebSpeechAPI,
language: 'en-US',
});
session.provider.on('transcript', (result) => {
if (result.isFinal) {
setTranscript(prev => prev + ' ' + result.text);
session.addTranscript(result);
}
});
sessionRef.current = session;
await session.start();
setIsRecording(true);
};
const stopRecording = async () => {
await sessionRef.current?.stop();
setIsRecording(false);
};
return (
<div>
<button onClick={isRecording ? stopRecording : startRecording}>
{isRecording ? 'Stop' : 'Start'} Recording
</button>
<p>{transcript}</p>
</div>
);
}
Vue Integration
<template>
<div>
<button @click="toggleRecording">
{{ isRecording ? 'Stop' : 'Start' }} Recording
</button>
<p>{{ transcript }}</p>
</div>
</template>
<script setup lang="ts">
import { ref, onUnmounted } from 'vue';
import { createSession, TranscriptionProvider, TranscriptionSession, TranscriptionResult } from '@360labs/live-transcribe';
const isRecording = ref(false);
const transcript = ref('');
let session: TranscriptionSession | null = null;
const toggleRecording = async () => {
if (isRecording.value) {
await session?.stop();
isRecording.value = false;
} else {
session = createSession({
provider: TranscriptionProvider.WebSpeechAPI,
language: 'en-US',
});
session.provider.on('transcript', (result: TranscriptionResult) => {
if (result.isFinal) {
transcript.value += ' ' + result.text;
}
});
await session.start();
isRecording.value = true;
}
};
onUnmounted(() => {
session?.stop();
});
</script>
Node.js with Deepgram
import { createTranscriber, TranscriptionProvider } from '@360labs/live-transcribe';
import { createReadStream } from 'fs';
const transcriber = createTranscriber({
provider: TranscriptionProvider.Deepgram,
apiKey: process.env.DEEPGRAM_API_KEY,
language: 'en-US',
});
transcriber.on('transcript', (result) => {
console.log(result.text);
});
await transcriber.initialize();
await transcriber.start();
// Send audio data
const audioStream = createReadStream('audio.wav');
audioStream.on('data', (chunk) => {
transcriber.sendAudio(chunk);
});
audioStream.on('end', async () => {
await transcriber.stop();
});
Browser Support
| Browser | Web Speech API | WebSocket (Cloud) |
|---------|----------------|-------------------|
| Chrome 33+ | ✅ Full | ✅ Full |
| Edge 79+ | ✅ Full | ✅ Full |
| Safari 14.1+ | ✅ Partial | ✅ Full |
| Firefox | ❌ | ✅ Full |
| Opera 20+ | ✅ Full | ✅ Full |
Note: Web Speech API requires an internet connection in most browsers as it uses cloud-based recognition.
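A common pattern is to feature-detect and fall back to a cloud provider where the Web Speech API is unavailable (e.g. Firefox); a sketch, assuming a Deepgram key is on hand:
import { createTranscriber, TranscriptionProvider } from '@360labs/live-transcribe';
let transcriber = createTranscriber({
  provider: TranscriptionProvider.WebSpeechAPI,
  language: 'en-US',
});
if (!transcriber.isSupported()) {
  // Fall back to a cloud provider in browsers without Web Speech support
  transcriber = createTranscriber({
    provider: TranscriptionProvider.Deepgram,
    apiKey: 'your-deepgram-api-key',
    language: 'en-US',
  });
}
await transcriber.initialize();
await transcriber.start();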
Error Handling
import { TranscriptionError, ErrorCode } from '@360labs/live-transcribe';
transcriber.on('error', (error: TranscriptionError) => {
switch (error.code) {
case ErrorCode.MICROPHONE_ACCESS_DENIED:
console.log('Please allow microphone access');
break;
case ErrorCode.NETWORK_ERROR:
console.log('Network error - check your connection');
break;
case ErrorCode.AUTHENTICATION_FAILED:
console.log('Invalid API key');
break;
case ErrorCode.UNSUPPORTED_BROWSER:
console.log('Browser not supported');
break;
default:
console.log('Error:', error.message);
}
});
Error Codes
| Code | Description |
|------|-------------|
| INITIALIZATION_FAILED | Provider failed to initialize |
| AUTHENTICATION_FAILED | Invalid or missing API key |
| NETWORK_ERROR | Network connection error |
| MICROPHONE_ACCESS_DENIED | Microphone permission denied |
| UNSUPPORTED_BROWSER | Browser doesn't support required APIs |
| INVALID_CONFIG | Invalid configuration provided |
| PROVIDER_ERROR | Provider-specific error |
| UNKNOWN_ERROR | Unknown error occurred |
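Transient network failures can often be handled by restarting the transcriber; the retry policy below is an illustration, not something built into the library:
let retries = 0;
transcriber.on('final', () => { retries = 0; }); // reset after a successful result
transcriber.on('error', async (error) => {
  if (error.code === ErrorCode.NETWORK_ERROR && retries < 3) {
    retries += 1;
    await new Promise((resolve) => setTimeout(resolve, 1000 * retries)); // linear backoff
    await transcriber.stop();
    await transcriber.start();
  }
});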
Audio Processing Utilities
The library includes audio processing utilities:
import { AudioProcessor } from '@360labs/live-transcribe';
// Convert Float32 to Int16 (for sending to APIs)
const int16Data = AudioProcessor.convertFloat32ToInt16(float32Array);
// Convert Int16 to Float32
const float32Data = AudioProcessor.convertInt16ToFloat32(int16Array);
// Resample audio
const resampled = AudioProcessor.resampleBuffer(buffer, 44100, 16000);
// Normalize audio levels
const normalized = AudioProcessor.normalizeBuffer(buffer);
// Apply gain
const amplified = AudioProcessor.applyGain(buffer, 1.5);
// Mix two audio buffers
const mixed = AudioProcessor.mixBuffers(buffer1, buffer2, 0.5);
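These helpers pair naturally with sendAudio when you capture microphone audio yourself; a browser sketch, assuming a cloud provider expecting 16 kHz Int16 PCM (ScriptProcessorNode is deprecated but keeps the wiring short):
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const ctx = new AudioContext();
const source = ctx.createMediaStreamSource(stream);
const processor = ctx.createScriptProcessor(4096, 1, 1);
processor.onaudioprocess = (event) => {
  const input = event.inputBuffer.getChannelData(0); // Float32Array from the mic
  const resampled = AudioProcessor.resampleBuffer(input, ctx.sampleRate, 16000);
  const int16 = AudioProcessor.convertFloat32ToInt16(resampled);
  transcriber.sendAudio(int16.buffer); // forward raw PCM to the provider
};
source.connect(processor);
processor.connect(ctx.destination);
Contributing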
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- Website: 360labs.dev
- Issues: GitHub Issues
- Discussions: GitHub Discussions
About 360labs
360labs builds AI-powered developer tools and APIs. We focus on creating simple, powerful libraries that help developers build amazing products faster.
Other products by 360labs:
- Visit 360labs.dev to see all our tools and APIs
Made with ❤️ by 360labs
