

@sathsarabandaraj/audio-handler

A unified TypeScript library for browser-based audio management, featuring Speech-to-Text (STT) and Text-to-Speech (TTS) powered by Deepgram AI.


🌟 Features

  • 🎤 Speech-to-Text: Real-time speech transcription using Deepgram's AI
  • 🔊 Text-to-Speech: High-quality voice synthesis with multiple voice options
  • 🔌 Provider System: Pluggable architecture for different AI providers
  • ⚡️ Event-Driven: React to audio events in real-time
  • 🌐 Browser Ready: Works seamlessly in web applications
  • 🛡️ TypeScript: Full type safety and IntelliSense support
  • 🔄 Smart Fallbacks: Automatic fallback to the Web Speech API when needed (see the sketch after this list)
  • 🎯 Easy Integration: Simple API, minimal setup
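
The fallback is handled internally, but for reference this is roughly what a plain Web Speech API TTS call looks like on its own (an illustrative sketch of the fallback path, not this library's internal code):

// Illustrative only: a plain Web Speech API TTS call, the kind of
// fallback used when Deepgram TTS is unavailable
function speakWithWebSpeech(text) {
  if (!('speechSynthesis' in window)) {
    throw new Error('Web Speech API not supported in this browser');
  }
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = 'en-US';
  window.speechSynthesis.speak(utterance);
}

speakWithWebSpeech('Hello from the fallback path!');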

📦 Installation

npm install @sathsarabandaraj/audio-handler

Or with pnpm:

pnpm add @sathsarabandaraj/audio-handler

🚀 Quick Start

1. Initialize AudioManager

import AudioManager from '@sathsarabandaraj/audio-handler';

// Create instance
const audioManager = new AudioManager({
  apiKey: 'your-deepgram-api-key',
  language: 'en-US'  // Optional
});

// Initialize
await audioManager.initialize();

2. Speech-to-Text

// Start listening
await audioManager.startListening((result) => {
  console.log('Transcript:', result.text);
  console.log('Is Final:', result.isFinal);
  console.log('Confidence:', result.confidence);
});

// Stop listening
await audioManager.stopListening();

3. Text-to-Speech

// Generate speech
const result = await audioManager.speak('Hello, world!', {
  model: 'aura-asteria-en'
});

// Play audio
const blob = new Blob([result.audioBuffer], { type: 'audio/mpeg' });
const url = URL.createObjectURL(blob);
const audio = new Audio(url);
audio.play();
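
Object URLs are never released automatically, so if you synthesize speech repeatedly it's worth revoking each one after playback. This uses only standard browser APIs and isn't specific to this library:

// Release the object URL once playback has finished
audio.addEventListener('ended', () => URL.revokeObjectURL(url));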

📚 Complete Examples

We provide three complete, working examples that demonstrate different implementation approaches:

1. HTML Example (Vanilla JavaScript)

Single-file HTML application with no build step required.

cd examples/html-example
python3 -m http.server 8080
# Open http://localhost:8080

Features:

  • ✅ Single HTML file
  • ✅ No framework needed
  • ✅ ES6 modules
  • ✅ Smart TTS fallback
  • ✅ Deferred transcription

📖 HTML Example Documentation

2. React Example (Component-Based)

Modern React application with Vite build system.

cd examples/react-example
pnpm install
pnpm run dev
# Open http://localhost:5173

Features:

  • ✅ React 18 + Hooks
  • ✅ Component architecture
  • ✅ Hot Module Replacement
  • ✅ TypeScript-ready
  • ✅ Production-ready structure

📖 React Example Documentation

3. Backend Proxy (Node.js/Express)

Secure proxy server to solve CORS issues and protect API keys.

cd examples/backend
npm install
cp .env.example .env
# Edit .env and add your API key
npm start
# Server runs on http://localhost:3001

Features:

  • ✅ Express.js REST API
  • ✅ CORS-enabled
  • ✅ API key security
  • ✅ Audio streaming
  • ✅ Health checks

📖 Backend Documentation

🎯 Complete Documentation

For a comprehensive understanding of how everything works together, read our detailed guide:

📖 API Reference

AudioManager

Constructor

new AudioManager(config: AudioManagerConfig)

Config Options:

interface AudioManagerConfig {
  apiKey: string;         // Required: Deepgram API key
  language?: string;      // Optional: Language code (default: 'en-US')
}

Methods

initialize()

Initialize the audio manager. Must be called before using STT/TTS.

await audioManager.initialize();

startListening(callback)

Start speech-to-text transcription.

await audioManager.startListening((result) => {
  console.log(result.text);        // Transcribed text
  console.log(result.isFinal);     // Is this a final result?
  console.log(result.confidence);  // Confidence score (0-1)
});

Result Interface:

interface TranscriptionResult {
  text: string;          // Transcribed text
  isFinal: boolean;      // Final result or interim?
  confidence: number;    // Confidence score (0-1)
}

stopListening()

Stop speech-to-text transcription.

await audioManager.stopListening();

speak(text, options)

Generate speech from text.

const result = await audioManager.speak('Hello, world!', {
  model: 'aura-asteria-en'  // Optional: Voice model
});

console.log(result.audioBuffer);  // ArrayBuffer
console.log(result.duration);     // Duration in seconds

Options:

interface TTSOptions {
  model?: string;  // Voice model (default: 'aura-asteria-en')
}

Result Interface:

interface TTSResult {
  audioBuffer: ArrayBuffer;  // Audio data
  duration: number;          // Duration in seconds
}

Available Voice Models

| Model | Gender | Language | Description |
|-------|--------|----------|-------------|
| aura-asteria-en | Female | English | Warm and friendly |
| aura-luna-en | Female | English | Clear and professional |
| aura-stella-en | Female | English | Expressive |
| aura-athena-en | Female | English | Authoritative |
| aura-hera-en | Female | English | Confident |
| aura-orion-en | Male | English | Deep and commanding |
| aura-arcas-en | Male | English | Neutral and clear |
| aura-perseus-en | Male | English | Energetic |
| aura-angus-en | Male | English | Smooth |
| aura-orpheus-en | Male | English | Rich and resonant |
| aura-helios-en | Male | English | Bright |
| aura-zeus-en | Male | English | Powerful |
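
Any of these models can be passed through the model option of speak(), for example:

// Use a male voice for this utterance
const result = await audioManager.speak('Good evening.', {
  model: 'aura-orion-en'
});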

🏗️ Architecture

AudioManager (Core)
│
├── DeepgramSTTProvider
│   ├── Microphone Permission
│   ├── AudioContext
│   ├── WebSocket (Live Transcription)
│   └── Event Emitter
│
└── DeepgramTTSProvider
    ├── HTTP Client (REST API)
    ├── Audio Buffer Handler
    └── Duration Calculator

🔒 Security Considerations

API Key Protection

❌ Don't expose API keys in frontend:

// Bad - API key visible in browser
const audioManager = new AudioManager({
  apiKey: 'abc123...'  // Visible in source code!
});

✅ Use a backend proxy (a minimal sketch follows the steps below):

  1. Set up the backend proxy server
  2. Frontend calls your proxy
  3. Proxy adds API key server-side
  4. Proxy calls Deepgram API
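
As a rough illustration of steps 2–4, here is a minimal Express proxy route. The /api/tts route name and the exact Deepgram request shown are assumptions for the sketch; see examples/backend for the real implementation:

// Minimal sketch, assuming Express and Node 18+ (global fetch).
// Route name and Deepgram request details are illustrative.
import express from 'express';

const app = express();
app.use(express.json());

app.post('/api/tts', async (req, res) => {
  const response = await fetch(
    'https://api.deepgram.com/v1/speak?model=aura-asteria-en',
    {
      method: 'POST',
      headers: {
        Authorization: `Token ${process.env.DEEPGRAM_API_KEY}`,  // key stays server-side
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ text: req.body.text })
    }
  );
  if (!response.ok) {
    return res.status(502).send('TTS request failed');
  }
  res.set('Content-Type', 'audio/mpeg');
  res.send(Buffer.from(await response.arrayBuffer()));
});

app.listen(3001);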

CORS Issues

  • STT (WebSocket): Works directly from browser
  • TTS (REST): Requires backend proxy due to CORS

See our backend example for a complete proxy implementation.
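
With a proxy in place, the browser only ever talks to your own server (again a sketch, reusing the hypothetical /api/tts route from above):

// Browser-side call to the proxy; no Deepgram key in the page
const response = await fetch('http://localhost:3001/api/tts', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: 'Hello, world!' })
});
const audio = new Audio(URL.createObjectURL(await response.blob()));
audio.play();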

🛠️ Development

Build from Source

# Clone repository
git clone https://github.com/sathsarabandaraj/audio-handler.git
cd audio-handler

# Install dependencies
pnpm install

# Build package
pnpm run build

# Output: dist/index.js

Project Structure

audio-handler/
├── src/
│   ├── index.ts              # Main exports
│   ├── core/
│   │   ├── audio-manager.ts  # Core AudioManager class
│   │   ├── types.ts          # TypeScript interfaces
│   │   └── mic-permission.ts # Microphone permission handler
│   └── providers/
│       └── deepgram/
│           ├── index.ts      # Provider exports
│           ├── stt.ts        # STT implementation
│           ├── tts.ts        # TTS implementation
│           └── types.ts      # Provider types
├── examples/
│   ├── html-example/         # Vanilla JS example
│   ├── react-example/        # React example
│   └── backend/              # Express proxy server
├── dist/                     # Build output
├── package.json
├── tsconfig.json
└── rollup.config.js

🧪 Testing

Manual Testing

Use the provided examples to test functionality:

# Test HTML example
cd examples/html-example
python3 -m http.server 8080

# Test React example
cd examples/react-example
pnpm run dev

# Test backend proxy
cd examples/backend
npm start

🤝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit your changes: git commit -m 'Add amazing feature'
  4. Push to the branch: git push origin feature/amazing-feature
  5. Open a Pull Request

📝 License

MIT License

Copyright (c) 2025 Sathsara Bandaraj

📊 Browser Compatibility

| Feature | Chrome | Firefox | Safari | Edge |
|---------|--------|---------|--------|------|
| STT (getUserMedia) | ✅ 53+ | ✅ 36+ | ✅ 11+ | ✅ 79+ |
| AudioContext | ✅ 35+ | ✅ 25+ | ✅ 14.1+ | ✅ 79+ |
| WebSocket | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Web Speech API | ✅ 33+ | ✅ 49+ | ✅ 14.1+ | ✅ 79+ |

Recommendation: Use the latest versions of Chrome, Firefox, Edge, or Safari for the best experience.
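
If you need to gate features at runtime, a plain capability check like the following works in all of the browsers above (generic browser APIs, nothing specific to this library):

// Basic capability checks before enabling STT/TTS UI
const canRecord = !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
const hasAudioContext = typeof (window.AudioContext || window.webkitAudioContext) !== 'undefined';
const hasWebSpeechTTS = 'speechSynthesis' in window;

if (!canRecord) {
  console.warn('Microphone capture not available; STT will be disabled.');
}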

📈 Roadmap

  • [ ] Additional provider support (Azure, AWS, Google)
  • [ ] Language detection
  • [ ] Custom vocabulary support
  • [ ] Audio recording and export
  • [ ] Real-time waveform visualization
  • [ ] Voice activity detection
  • [ ] Noise cancellation
  • [ ] Multiple speaker detection

🎉 Getting Started

Ready to add voice capabilities to your app? Start with our examples:

  1. Quick Test: Try the HTML example - no build required
  2. Production Ready: Use the React example as a template
  3. Security: Set up the backend proxy for production

Have questions? Check out our comprehensive documentation!


Built with ❤️ by Sathsara Bandaraj

Star ⭐ this repo if you find it useful!