# @sathsarabandaraj/audio-handler
A unified TypeScript library for browser-based audio management, featuring Speech-to-Text (STT) and Text-to-Speech (TTS) powered by Deepgram AI.
## 🌟 Features
- 🎤 Speech-to-Text: Real-time speech transcription using Deepgram's AI
- 🔊 Text-to-Speech: High-quality voice synthesis with multiple voice options
- 🔌 Provider System: Pluggable architecture for different AI providers
- ⚡️ Event-Driven: React to audio events in real-time
- 🌐 Browser Ready: Works seamlessly in web applications
- 🛡️ TypeScript: Full type safety and IntelliSense support
- 🔄 Smart Fallbacks: Automatic fallback to the Web Speech API when needed (see the sketch after this list)
- 🎯 Easy Integration: Simple API, minimal setup
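The fallback path uses the browser-native Web Speech API. As a rough illustration of what that degraded mode looks like, here is a standalone sketch of the browser API itself (not the library's internal fallback code):

```typescript
// Standalone sketch of browser-native TTS via the Web Speech API.
// This is what the library can fall back to; it is not the library's
// internal implementation.
function speakWithWebSpeech(text: string): void {
  if (!('speechSynthesis' in window)) {
    console.warn('Speech synthesis is not available in this browser');
    return;
  }
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = 'en-US';
  window.speechSynthesis.speak(utterance);
}
```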
## 📦 Installation

```bash
npm install @sathsarabandaraj/audio-handler
```

Or with pnpm:

```bash
pnpm add @sathsarabandaraj/audio-handler
```

## 🚀 Quick Start
### 1. Initialize AudioManager

```typescript
import AudioManager from '@sathsarabandaraj/audio-handler';

// Create an instance
const audioManager = new AudioManager({
  apiKey: 'your-deepgram-api-key',
  language: 'en-US' // Optional
});

// Initialize
await audioManager.initialize();
```

### 2. Speech-to-Text
```typescript
// Start listening
await audioManager.startListening((result) => {
  console.log('Transcript:', result.text);
  console.log('Is Final:', result.isFinal);
  console.log('Confidence:', result.confidence);
});

// Stop listening
await audioManager.stopListening();
```

### 3. Text-to-Speech
```typescript
// Generate speech
const result = await audioManager.speak('Hello, world!', {
  model: 'aura-asteria-en'
});

// Play the audio
const blob = new Blob([result.audioBuffer], { type: 'audio/mpeg' });
const url = URL.createObjectURL(blob);
const audio = new Audio(url);
audio.play();
```
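Putting the two halves together, here is a minimal round-trip sketch built only from the calls shown above: listen for a final transcript, stop, and speak it back. The `playBuffer` helper is ours, not part of the library.

```typescript
// Minimal echo loop: transcribe one utterance, then speak it back.
// playBuffer is a local helper; everything else is the documented API.
function playBuffer(audioBuffer: ArrayBuffer): void {
  const url = URL.createObjectURL(new Blob([audioBuffer], { type: 'audio/mpeg' }));
  const audio = new Audio(url);
  audio.addEventListener('ended', () => URL.revokeObjectURL(url)); // free the object URL
  audio.play();
}

await audioManager.startListening(async (result) => {
  if (result.isFinal && result.text.trim()) {
    await audioManager.stopListening();
    const tts = await audioManager.speak(`You said: ${result.text}`);
    playBuffer(tts.audioBuffer);
  }
});
```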
## 📚 Complete Examples

We provide three complete, working examples that demonstrate different implementation approaches:
### 1. HTML Example (Vanilla JavaScript)

A single-file HTML application with no build step required.

```bash
cd examples/html-example
python3 -m http.server 8080
# Open http://localhost:8080
```

Features:
- ✅ Single HTML file
- ✅ No framework needed
- ✅ ES6 modules
- ✅ Smart TTS fallback
- ✅ Deferred transcription
### 2. React Example (Component-Based)

A modern React application built with Vite.

```bash
cd examples/react-example
pnpm install
pnpm run dev
# Open http://localhost:5173
```

Features:
- ✅ React 18 + Hooks
- ✅ Component architecture
- ✅ Hot Module Replacement
- ✅ TypeScript-ready
- ✅ Production-ready structure
### 3. Backend Proxy (Node.js/Express)

A secure proxy server that solves CORS issues and protects your API keys.

```bash
cd examples/backend
npm install
cp .env.example .env
# Edit .env and add your API key
npm start
# Server runs on http://localhost:3001
```

Features:
- ✅ Express.js REST API
- ✅ CORS-enabled
- ✅ API key security
- ✅ Audio streaming
- ✅ Health checks
## 🎯 Complete Documentation
For a comprehensive understanding of how everything works together, read our detailed guide:
- Examples Documentation - Complete architecture overview, data flow, and system design
## 📖 API Reference

### AudioManager

#### Constructor

```typescript
new AudioManager(config: AudioManagerConfig)
```

Config Options:
```typescript
interface AudioManagerConfig {
  apiKey: string;    // Required: Deepgram API key
  language?: string; // Optional: Language code (default: 'en-US')
}
```

#### Methods
##### initialize()

Initialize the audio manager. Must be called before using STT/TTS.

```typescript
await audioManager.initialize();
```

##### startListening(callback)
Start speech-to-text transcription.
```typescript
await audioManager.startListening((result) => {
  console.log(result.text);       // Transcribed text
  console.log(result.isFinal);    // Is this a final result?
  console.log(result.confidence); // Confidence score (0-1)
});
```

Result Interface:

```typescript
interface TranscriptionResult {
  text: string;       // Transcribed text
  isFinal: boolean;   // Final result or interim?
  confidence: number; // Confidence score (0-1)
}
```
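Interim results arrive repeatedly while a phrase is still being spoken, so a common pattern is to accumulate only the finalized segments. A small sketch using the callback above:

```typescript
// Collect only finalized segments into a running transcript.
const finalSegments: string[] = [];

await audioManager.startListening((result) => {
  if (result.isFinal) {
    finalSegments.push(result.text);
    console.log('Transcript so far:', finalSegments.join(' '));
  } else {
    console.log('Interim:', result.text); // superseded by a later final result
  }
});
```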
##### stopListening()

Stop speech-to-text transcription.

```typescript
await audioManager.stopListening();
```

##### speak(text, options)
Generate speech from text.
```typescript
const result = await audioManager.speak('Hello, world!', {
  model: 'aura-asteria-en' // Optional: Voice model
});

console.log(result.audioBuffer); // ArrayBuffer
console.log(result.duration);    // Duration in seconds
```

Options:
```typescript
interface TTSOptions {
  model?: string; // Voice model (default: 'aura-asteria-en')
}
```

Result Interface:
```typescript
interface TTSResult {
  audioBuffer: ArrayBuffer; // Audio data
  duration: number;         // Duration in seconds
}
```

#### Available Voice Models
| Model | Gender | Language | Description |
|-------|--------|----------|-------------|
| aura-asteria-en | Female | English | Warm and friendly |
| aura-luna-en | Female | English | Clear and professional |
| aura-stella-en | Female | English | Expressive |
| aura-athena-en | Female | English | Authoritative |
| aura-hera-en | Female | English | Confident |
| aura-orion-en | Male | English | Deep and commanding |
| aura-arcas-en | Male | English | Neutral and clear |
| aura-perseus-en | Male | English | Energetic |
| aura-angus-en | Male | English | Smooth |
| aura-orpheus-en | Male | English | Rich and resonant |
| aura-helios-en | Male | English | Bright |
| aura-zeus-en | Male | English | Powerful |
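To use a different voice, pass its model id to `speak()`:

```typescript
// Switch to a male voice for this utterance
const result = await audioManager.speak('System online.', {
  model: 'aura-orion-en'
});
```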
## 🏗️ Architecture

```
AudioManager (Core)
│
├── DeepgramSTTProvider
│   ├── Microphone Permission
│   ├── AudioContext
│   ├── WebSocket (Live Transcription)
│   └── Event Emitter
│
└── DeepgramTTSProvider
    ├── HTTP Client (REST API)
    ├── Audio Buffer Handler
    └── Duration Calculator
```
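The provider layer is pluggable: each backend implements a common STT/TTS contract behind `AudioManager`. The real interfaces live under `src/core/types.ts` and `src/providers/`; the shape below is an illustrative assumption, not the published API:

```typescript
// Hypothetical provider contracts -- illustrative only. The actual
// interfaces are defined in src/core/types.ts and src/providers/.
// TranscriptionResult, TTSOptions, and TTSResult are the interfaces
// documented in the API reference above.
interface STTProvider {
  start(onResult: (result: TranscriptionResult) => void): Promise<void>;
  stop(): Promise<void>;
}

interface TTSProvider {
  synthesize(text: string, options?: TTSOptions): Promise<TTSResult>;
}
```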
## 🔒 Security Considerations

### API Key Protection

❌ Don't expose API keys in the frontend:

```typescript
// Bad - the API key is visible to anyone who views the source
const audioManager = new AudioManager({
  apiKey: 'abc123...' // Visible in source code!
});
```

✅ Use a backend proxy instead:
1. Set up the backend proxy server
2. The frontend calls your proxy
3. The proxy adds the API key server-side
4. The proxy calls the Deepgram API
### CORS Issues

- STT (WebSocket): works directly from the browser
- TTS (REST): requires a backend proxy due to CORS

See our backend example for a complete proxy implementation.
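For orientation, here is a minimal sketch of such a proxy. It assumes Node 18+ (for the global `fetch`) and Deepgram's `POST /v1/speak` REST endpoint; it is a simplified illustration, not the bundled `examples/backend` server:

```typescript
// Minimal Express TTS proxy sketch (simplified; see examples/backend for
// the full implementation). The Deepgram API key stays server-side.
import express from 'express';
import cors from 'cors';

const app = express();
app.use(cors());
app.use(express.json());

app.post('/api/tts', async (req, res) => {
  const { text, model = 'aura-asteria-en' } = req.body;
  const dgRes = await fetch(
    `https://api.deepgram.com/v1/speak?model=${encodeURIComponent(model)}`,
    {
      method: 'POST',
      headers: {
        Authorization: `Token ${process.env.DEEPGRAM_API_KEY}`, // never sent to the browser
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ text }),
    }
  );
  if (!dgRes.ok) {
    res.status(dgRes.status).send(await dgRes.text());
    return;
  }
  res.set('Content-Type', 'audio/mpeg');
  res.send(Buffer.from(await dgRes.arrayBuffer()));
});

app.listen(3001, () => console.log('Proxy listening on http://localhost:3001'));
```

The browser then calls your proxy instead of Deepgram directly:

```typescript
// Browser side: the API key never leaves the server
const res = await fetch('http://localhost:3001/api/tts', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: 'Hello from the proxy!' })
});
const audioBuffer = await res.arrayBuffer();
```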
## 🛠️ Development

### Build from Source

```bash
# Clone the repository
git clone https://github.com/sathsarabandaraj/audio-handler.git
cd audio-handler

# Install dependencies
pnpm install

# Build the package
pnpm run build
# Output: dist/index.js
```

### Project Structure
```
audio-handler/
├── src/
│   ├── index.ts                 # Main exports
│   ├── core/
│   │   ├── audio-manager.ts     # Core AudioManager class
│   │   ├── types.ts             # TypeScript interfaces
│   │   └── mic-permission.ts    # Microphone permission handler
│   └── providers/
│       └── deepgram/
│           ├── index.ts         # Provider exports
│           ├── stt.ts           # STT implementation
│           ├── tts.ts           # TTS implementation
│           └── types.ts         # Provider types
├── examples/
│   ├── html-example/            # Vanilla JS example
│   ├── react-example/           # React example
│   └── backend/                 # Express proxy server
├── dist/                        # Build output
├── package.json
├── tsconfig.json
└── rollup.config.js
```

## 🧪 Testing
### Manual Testing

Use the provided examples to test functionality:

```bash
# Test the HTML example
cd examples/html-example
python3 -m http.server 8080

# Test the React example
cd examples/react-example
pnpm run dev

# Test the backend proxy
cd examples/backend
npm start
```

## 🤝 Contributing
Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Commit your changes: `git commit -m 'Add amazing feature'`
4. Push to the branch: `git push origin feature/amazing-feature`
5. Open a Pull Request
## 📝 License
MIT License
Copyright (c) 2025 Sathsara Bandaraj
## 🙏 Acknowledgments
- Deepgram for their excellent AI-powered speech APIs
- Rollup for module bundling
- TypeScript for type safety
## 💬 Support
- 📧 Email: [email protected]
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
## 📊 Browser Compatibility

| Feature | Chrome | Firefox | Safari | Edge |
|---------|--------|---------|--------|------|
| STT (getUserMedia) | ✅ 53+ | ✅ 36+ | ✅ 11+ | ✅ 79+ |
| AudioContext | ✅ 35+ | ✅ 25+ | ✅ 14.1+ | ✅ 79+ |
| WebSocket | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Web Speech API | ✅ 33+ | ✅ 49+ | ✅ 14.1+ | ✅ 79+ |
Recommendation: Use the latest versions of Chrome, Firefox, Edge, or Safari for the best experience.
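If you need to support older browsers, gate initialization on a quick capability probe. A minimal sketch (the helper name is ours, not part of the library):

```typescript
// Quick capability probe before initializing audio features.
// checkAudioSupport is a local helper, not part of the library.
function checkAudioSupport(): { stt: boolean; tts: boolean } {
  return {
    stt: !!navigator.mediaDevices?.getUserMedia && 'WebSocket' in window,
    tts: 'AudioContext' in window || 'webkitAudioContext' in window,
  };
}

const support = checkAudioSupport();
if (!support.stt) {
  console.warn('Live transcription is not supported in this browser');
}
```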
## 📈 Roadmap
- [ ] Additional provider support (Azure, AWS, Google)
- [ ] Language detection
- [ ] Custom vocabulary support
- [ ] Audio recording and export
- [ ] Real-time waveform visualization
- [ ] Voice activity detection
- [ ] Noise cancellation
- [ ] Multiple speaker detection
## 🎉 Getting Started
Ready to add voice capabilities to your app? Start with our examples:
- Quick Test: Try the HTML example - no build required
- Production Ready: Use the React example as a template
- Security: Set up the backend proxy for production
Have questions? Check out our comprehensive documentation!
Built with ❤️ by Sathsara Bandaraj
Star ⭐ this repo if you find it useful!
