voice-to-text-converter

v1.0.0

Voice-to-Text Converter

A modern, lightweight Node.js package for speech-to-text conversion with support for multiple engines and both Node.js and browser environments.

Features

  • 🎤 Multiple Input Sources: Microphone, audio files, and streams
  • 🔧 Multiple Engines: Web Speech API, Vosk (offline), Google Cloud Speech-to-Text
  • 🌐 Cross-Platform: Works in Node.js and browsers
  • 📝 TypeScript Support: Full type definitions included
  • 🔄 Real-time Processing: Streaming and continuous recognition
  • 🛡️ Error Handling: Comprehensive error handling and fallback mechanisms
  • 🎯 Simple API: Clean, intuitive interface for developers

Installation

npm install voice-to-text-converter

Optional Dependencies

For offline processing with Vosk:

npm install vosk

For Google Cloud Speech-to-Text:

npm install @google-cloud/speech

For microphone recording in Node.js:

npm install node-record-lpcm16

Quick Start

Node.js

import { transcribeFromFile, transcribeFromMicrophone } from 'voice-to-text-converter';

// Quick transcription from file
const results = await transcribeFromFile('audio.wav', {
  language: 'en-US'
});
console.log(results[0].transcript);

// Quick transcription from microphone
const micResults = await transcribeFromMicrophone({
  duration: 5000, // 5 seconds
  language: 'en-US'
});
console.log(micResults[0].transcript);

Browser

<script src="https://unpkg.com/voice-to-text-converter/lib/browser.js"></script>
<script>
  // Quick transcription from microphone
  voiceToText.transcribeFromMicrophone({
    duration: 5000,
    language: 'en-US'
  }).then(results => {
    console.log(results[0].transcript);
  });
</script>

Usage Examples

Basic Usage

import { VoiceToText } from 'voice-to-text-converter';

const voiceToText = new VoiceToText({
  defaultEngine: { engine: 'vosk', modelPath: './models/vosk-model-en-us' },
  defaultRecognitionConfig: {
    language: 'en-US',
    continuous: true,
    interimResults: true
  }
});

// Initialize the converter
await voiceToText.initialize();

// Set up event listeners
voiceToText.on('result', (result) => {
  console.log(`Transcript: ${result.transcript}`);
  console.log(`Confidence: ${result.confidence}`);
  console.log(`Is Final: ${result.isFinal}`);
});

voiceToText.on('error', (error) => {
  console.error('Recognition error:', error.message);
});

// Start listening from microphone
await voiceToText.fromMicrophone({
  duration: 10000 // Record for 10 seconds
});

// Clean up
await voiceToText.cleanup();

File Processing

import { VoiceToText } from 'voice-to-text-converter';

const voiceToText = new VoiceToText();
await voiceToText.initialize();

// Process single file
const results = await voiceToText.fromFile('speech.wav', {
  language: 'en-US',
  maxAlternatives: 3
});

results.forEach((result, index) => {
  console.log(`Result ${index + 1}: ${result.transcript}`);
  console.log(`Confidence: ${result.confidence}`);
});

Stream Processing

import { VoiceToText } from 'voice-to-text-converter';
import fs from 'fs';

const voiceToText = new VoiceToText();
await voiceToText.initialize();

const audioStream = fs.createReadStream('audio.wav');
const results = await voiceToText.fromStream(audioStream, {
  language: 'es-ES'
});

console.log('Transcription:', results.map(r => r.transcript).join(' '));

Real-time Recognition

import { VoiceToText } from 'voice-to-text-converter';

const voiceToText = new VoiceToText({
  defaultRecognitionConfig: {
    continuous: true,
    interimResults: true
  }
});

await voiceToText.initialize();

// Handle real-time results
voiceToText.on('result', (result) => {
  if (result.isFinal) {
    console.log('Final:', result.transcript);
  } else {
    console.log('Interim:', result.transcript);
  }
});

// Start continuous listening
await voiceToText.startListening({
  source: 'microphone'
});

// Stop after 30 seconds
setTimeout(async () => {
  await voiceToText.stopListening();
}, 30000);
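
With `interimResults` enabled, the `result` handler receives many partial guesses before each final one, so an application usually needs to merge them into one running transcript. The sketch below shows one way to do that. `TranscriptBuffer` and `LiveResult` are hypothetical names, not part of this package, and the code assumes each interim result replaces the previous interim guess for the same utterance.

```typescript
// Hypothetical helper; only the `transcript` and `isFinal` fields of a result are used.
type LiveResult = { transcript: string; isFinal: boolean };

class TranscriptBuffer {
  private finalized: string[] = [];
  private interim = '';

  // Feed every 'result' event payload into this method.
  push(result: LiveResult): void {
    if (result.isFinal) {
      // A final result closes the utterance and discards the interim guess.
      this.finalized.push(result.transcript.trim());
      this.interim = '';
    } else {
      // Assumption: a newer interim result supersedes the older one.
      this.interim = result.transcript.trim();
    }
  }

  // Finalized text followed by the latest interim guess, if any.
  get text(): string {
    return [...this.finalized, this.interim].filter(Boolean).join(' ');
  }
}

const demoBuffer = new TranscriptBuffer();
demoBuffer.push({ transcript: 'hello wor', isFinal: false });
demoBuffer.push({ transcript: 'hello world', isFinal: true });
demoBuffer.push({ transcript: 'how are', isFinal: false });
console.log(demoBuffer.text); // hello world how are
```

In the example above, the handler would become `voiceToText.on('result', (result) => demoBuffer.push(result))`.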

Engine-Specific Usage

Vosk (Offline)

import { VoiceToText, VoskEngine } from 'voice-to-text-converter';

// Download and setup Vosk model first
const modelPath = await VoskEngine.downloadModel('en-US', 'small');

const voiceToText = new VoiceToText({
  defaultEngine: {
    engine: 'vosk',
    modelPath: modelPath
  }
});

await voiceToText.initialize();
const results = await voiceToText.fromFile('audio.wav');

Google Cloud Speech-to-Text

import { VoiceToText } from 'voice-to-text-converter';

const voiceToText = new VoiceToText({
  defaultEngine: {
    engine: 'google-cloud',
    apiKey: 'your-api-key',
    projectId: 'your-project-id'
  }
});

await voiceToText.initialize();
const results = await voiceToText.fromFile('audio.wav', {
  language: 'en-US',
  encoding: 'FLAC'
});

Web Speech API (Browser)

import { VoiceToText } from 'voice-to-text-converter';

const voiceToText = new VoiceToText({
  defaultEngine: { engine: 'web-speech' }
});

await voiceToText.initialize();

// Only works in browsers with microphone access
await voiceToText.fromMicrophone({
  duration: 5000,
  language: 'en-US'
});

API Reference

VoiceToText Class

Constructor

new VoiceToText(options?: VoiceToTextOptions)

Options:

  • defaultEngine?: EngineConfig - Default engine configuration
  • defaultRecognitionConfig?: SpeechRecognitionConfig - Default recognition settings
  • enableFallback?: boolean - Enable automatic engine fallback (default: true)
  • enginePriority?: Array<'web-speech' | 'vosk' | 'google-cloud'> - Engine priority order
  • debug?: boolean - Enable debug logging (default: false)
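
This README does not spell out how `enginePriority` and `enableFallback` interact. One plausible reading is sketched below: walk the priority list, keep the engines that are available, and try only the first one when fallback is disabled. `resolveEngineOrder` is a hypothetical helper written for illustration and may not match the package's real selection logic.

```typescript
// Hypothetical helper: illustrates one possible selection rule, not the package's code.
type EngineName = 'web-speech' | 'vosk' | 'google-cloud';

function resolveEngineOrder(
  priority: EngineName[],
  available: EngineName[],
  enableFallback = true
): EngineName[] {
  // Keep only engines that are actually available, preserving priority order.
  const usable = priority.filter((engine) => available.includes(engine));
  // Without fallback, only the first usable engine would ever be tried.
  return enableFallback ? usable : usable.slice(0, 1);
}

const demoOrder = resolveEngineOrder(
  ['google-cloud', 'vosk', 'web-speech'],
  ['vosk', 'web-speech']
);
console.log(demoOrder); // [ 'vosk', 'web-speech' ]
```

The `available` argument could come from `VoiceToText.getAvailableEngines()`, which is documented under Static Methods.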

Methods

initialize(): Promise<void>

Initialize the voice-to-text converter and select the best available engine.

fromMicrophone(options?: MicrophoneOptions): Promise<void>

Start speech recognition from microphone input.

Options:

  • duration?: number - Recording duration in milliseconds
  • deviceId?: string - Specific microphone device ID
  • sampleRate?: number - Audio sample rate (default: 16000)
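
To get a feel for how `duration` and `sampleRate` translate into data volume, the arithmetic below estimates the size of a raw recording. It assumes mono, 16-bit linear PCM (2 bytes per sample), which is an assumption about the capture format rather than something this README states. `pcm16ByteLength` is a hypothetical helper.

```typescript
// Hypothetical helper: size of raw 16-bit PCM audio for a given duration.
function pcm16ByteLength(durationMs: number, sampleRate = 16000, channels = 1): number {
  const bytesPerSample = 2; // 16-bit samples (assumed format)
  return Math.round((durationMs / 1000) * sampleRate * channels * bytesPerSample);
}

// 5 s x 16000 samples/s x 1 channel x 2 bytes
console.log(pcm16ByteLength(5000)); // 160000
```
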

fromFile(filePath: string, config?: SpeechRecognitionConfig): Promise<SpeechRecognitionResult[]>

Process an audio file and return transcription results.

fromStream(stream: NodeJS.ReadableStream, config?: SpeechRecognitionConfig): Promise<SpeechRecognitionResult[]>

Process an audio stream and return transcription results.

startListening(audioConfig: AudioInputConfig, config?: SpeechRecognitionConfig): Promise<void>

Start continuous speech recognition.

stopListening(): Promise<void>

Stop ongoing speech recognition.

abort(): Promise<void>

Abort speech recognition immediately.

switchEngine(engineConfig: EngineConfig): Promise<void>

Switch to a different speech recognition engine.

getCurrentEngine(): EngineInfo | null

Get information about the currently active engine.

cleanup(): Promise<void>

Clean up resources and stop all recognition processes.

Properties

isListening: boolean

Whether the converter is currently listening/recording.

Static Methods

getAvailableEngines(): Array<'web-speech' | 'vosk' | 'google-cloud'>

Get list of available engines in the current environment.

isEngineAvailable(engine: string): boolean

Check if a specific engine is available.

getEngineCapabilities(engine: string): EngineCapabilities

Get capabilities and features of a specific engine.

getBrowserSupport(): BrowserSupport

Get browser compatibility information.

quickTranscribe(source, options): Promise<SpeechRecognitionResult[]>

Quick one-time transcription without managing instance lifecycle.

Events

The VoiceToText class extends EventEmitter and emits the following events:

  • start - Recognition started
  • end - Recognition ended
  • result - Transcription result available
  • error - Error occurred
  • audiostart - Audio input started
  • audioend - Audio input ended
  • soundstart - Sound detected
  • soundend - Sound ended
  • speechstart - Speech detected
  • speechend - Speech ended

Types and Interfaces

SpeechRecognitionResult

interface SpeechRecognitionResult {
  transcript: string;
  confidence: number;
  isFinal: boolean;
  alternatives?: Array<{
    transcript: string;
    confidence: number;
  }>;
  timestamp?: {
    start: number;
    end: number;
  };
}
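
When `maxAlternatives` is greater than 1, a result may carry several candidate transcripts. The README does not say whether the top-level `transcript` is always the highest-confidence one, so the sketch below compares it against the alternatives explicitly. `bestAlternative` and the local type names are hypothetical. They mirror the documented fields under different names to avoid clashing with the browser's built-in `SpeechRecognitionResult` type.

```typescript
// Local mirrors of the documented result shape (hypothetical names).
type Alternative = { transcript: string; confidence: number };
type ResultWithAlternatives = Alternative & {
  isFinal: boolean;
  alternatives?: Alternative[];
};

// Returns the candidate with the highest confidence, including the top-level one.
function bestAlternative(result: ResultWithAlternatives): Alternative {
  const candidates: Alternative[] = [
    { transcript: result.transcript, confidence: result.confidence },
    ...(result.alternatives ?? []),
  ];
  return candidates.reduce((best, candidate) =>
    candidate.confidence > best.confidence ? candidate : best
  );
}

const demoResult: ResultWithAlternatives = {
  transcript: 'recognize speech',
  confidence: 0.72,
  isFinal: true,
  alternatives: [
    { transcript: 'wreck a nice beach', confidence: 0.55 },
    { transcript: 'recognise speech', confidence: 0.81 },
  ],
};
console.log(bestAlternative(demoResult).transcript); // recognise speech
```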

SpeechRecognitionConfig

interface SpeechRecognitionConfig {
  language?: string; // Language code (e.g., 'en-US')
  sampleRate?: number; // Audio sample rate
  continuous?: boolean; // Enable continuous recognition
  interimResults?: boolean; // Return interim results
  maxAlternatives?: number; // Maximum alternatives to return
  confidenceThreshold?: number; // Confidence threshold (0-1)
  phrases?: string[]; // Custom vocabulary
  encoding?: 'LINEAR16' | 'FLAC' | 'MULAW' | 'AMR' | 'AMR_WB' | 'OGG_OPUS';
}

EngineConfig

interface EngineConfig {
  engine: 'web-speech' | 'vosk' | 'google-cloud';
  apiKey?: string; // For cloud services
  modelPath?: string; // For offline engines
  projectId?: string; // For Google Cloud
  endpoint?: string; // Custom endpoint URL
}

Quick Start Functions

transcribeFromFile(filePath: string, options?): Promise<SpeechRecognitionResult[]>

Quick transcription from an audio file.

transcribeFromMicrophone(options?): Promise<SpeechRecognitionResult[]>

Quick transcription from microphone input.

transcribeFromStream(stream: NodeJS.ReadableStream, options?): Promise<SpeechRecognitionResult[]>

Quick transcription from an audio stream.

createVoiceToText(options?): VoiceToText

Factory function to create a VoiceToText instance.

getSystemInfo(): SystemInfo

Get system information and available engines.

Engine Comparison

| Feature | Web Speech API | Vosk | Google Cloud Speech |
|---------|----------------|------|---------------------|
| Environment | Browser only | Node.js + Browser | Node.js + Browser |
| Online/Offline | Online | Offline | Online |
| Accuracy | High | Medium-High | Very High |
| Speed | Fast | Fast | Fast |
| Privacy | Data sent to Google | Fully private | Data sent to Google |
| Cost | Free | Free | Pay per use |
| Languages | 60+ | 20+ | 125+ |
| File Processing | Limited | Yes | Yes |
| Streaming | Yes | Yes | Yes |
| Setup Complexity | None | Model download required | API key required |

When to Use Each Engine

Web Speech API:

  • Browser-based applications
  • Quick prototyping
  • No setup required
  • Real-time microphone input

Vosk:

  • Privacy-sensitive applications
  • Offline processing required
  • Cost-sensitive projects
  • Edge computing scenarios

Google Cloud Speech:

  • High accuracy requirements
  • Production applications
  • Multiple language support
  • Advanced features needed
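
The guidance above can be condensed into a small decision function. This is a deliberately simplified sketch: `suggestEngine` and the `Requirements` fields are hypothetical, and real projects will weigh accuracy, language coverage, and cost in more detail.

```typescript
// Hypothetical helper; the package does not document an engine chooser.
type EngineName = 'web-speech' | 'vosk' | 'google-cloud';

interface Requirements {
  mustStayOffline: boolean; // audio may not leave the machine
  runsInBrowser: boolean; // code executes in a browser tab
  canPayPerUse: boolean; // a metered cloud API is acceptable
}

function suggestEngine(req: Requirements): EngineName {
  if (req.mustStayOffline) return 'vosk'; // the only fully offline engine in the comparison
  if (req.canPayPerUse) return 'google-cloud'; // highest listed accuracy, most languages
  if (req.runsInBrowser) return 'web-speech'; // free and needs no setup
  return 'vosk'; // free option outside the browser
}

console.log(
  suggestEngine({ mustStayOffline: false, runsInBrowser: true, canPayPerUse: false })
); // web-speech
```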

Configuration

Environment Variables

# Google Cloud Speech (optional)
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
GOOGLE_CLOUD_PROJECT=your-project-id

# OpenAI API (if using Whisper integration)
OPENAI_API_KEY=your-openai-api-key
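
A missing variable is easier to diagnose at startup than in the middle of a transcription. The sketch below checks the two Google variables and builds a configuration object from them. `googleConfigFromEnv` and `GoogleEnvConfig` are hypothetical, and `credentialsPath` is not a documented `EngineConfig` field; it is returned only so the caller can log it.

```typescript
// Hypothetical helper: validates the Google-related variables listed above.
type Env = Record<string, string | undefined>;

interface GoogleEnvConfig {
  engine: 'google-cloud';
  projectId: string;
  credentialsPath: string; // for diagnostics only, not an EngineConfig field
}

function googleConfigFromEnv(env: Env): GoogleEnvConfig {
  const projectId = env['GOOGLE_CLOUD_PROJECT'];
  const credentialsPath = env['GOOGLE_APPLICATION_CREDENTIALS'];

  // Collect every missing name so the error reports them all at once.
  const missing: string[] = [];
  if (!projectId) missing.push('GOOGLE_CLOUD_PROJECT');
  if (!credentialsPath) missing.push('GOOGLE_APPLICATION_CREDENTIALS');
  if (!projectId || !credentialsPath) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  return { engine: 'google-cloud', projectId, credentialsPath };
}

const demoConfig = googleConfigFromEnv({
  GOOGLE_CLOUD_PROJECT: 'your-project-id',
  GOOGLE_APPLICATION_CREDENTIALS: '/path/to/service-account.json',
});
console.log(demoConfig.projectId); // your-project-id
```

In Node.js the function would be called as `googleConfigFromEnv(process.env)`.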

Vosk Model Setup

  1. Download a Vosk model from alphacephei.com/vosk/models
  2. Extract the model to a directory
  3. Use the model path in your configuration:

const voiceToText = new VoiceToText({
  defaultEngine: {
    engine: 'vosk',
    modelPath: './models/vosk-model-en-us-0.22'
  }
});

Google Cloud Setup

  1. Create a Google Cloud project
  2. Enable the Speech-to-Text API
  3. Create a service account and download the JSON key
  4. Set the environment variable or pass credentials directly:

const voiceToText = new VoiceToText({
  defaultEngine: {
    engine: 'google-cloud',
    apiKey: 'your-api-key',
    projectId: 'your-project-id'
  }
});

Browser Usage

CDN

<script src="https://unpkg.com/voice-to-text-converter/lib/browser.js"></script>

ES Modules

import { VoiceToText } from 'voice-to-text-converter/browser';

const voiceToText = new VoiceToText();
await voiceToText.initialize();

// Request microphone permission
const hasPermission = await voiceToText.requestMicrophonePermission();
if (hasPermission) {
  await voiceToText.fromMicrophone({ duration: 5000 });
}

Browser Compatibility

  • Chrome 25+
  • Firefox 44+
  • Safari 14.1+
  • Edge 79+

Note: Use HTTPS in production. Browsers expose getUserMedia (microphone capture) only in secure contexts; localhost is treated as secure during development.
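
Version numbers are only a rough guide, so it can help to feature-detect before offering voice input. The sketch below checks for the standard `SpeechRecognition` constructor and the `webkitSpeechRecognition` prefixed form that Chrome exposes. `hasSpeechRecognition` is a hypothetical helper that takes the global object as a parameter so it can also run outside a browser.

```typescript
// Hypothetical helper: detects a speech recognition constructor on a global-like object.
type GlobalLike = {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
};

function hasSpeechRecognition(scope: GlobalLike): boolean {
  return (
    typeof scope.SpeechRecognition === 'function' ||
    typeof scope.webkitSpeechRecognition === 'function'
  );
}

// Stand-ins for `window` in a browser with and without support.
console.log(hasSpeechRecognition({ webkitSpeechRecognition: function () {} })); // true
console.log(hasSpeechRecognition({})); // false
```

In a page, pass `window` as the argument. The package's own `VoiceToText.getBrowserSupport()` presumably performs a similar check.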

Error Handling

import { VoiceToText, SpeechRecognitionError, SpeechRecognitionErrorType } from 'voice-to-text-converter';

const voiceToText = new VoiceToText();

voiceToText.on('error', (error) => {
  switch (error.type) {
    case SpeechRecognitionErrorType.NO_SPEECH:
      console.log('No speech detected');
      break;
    case SpeechRecognitionErrorType.AUDIO_CAPTURE:
      console.log('Audio capture failed (is a microphone connected?)');
      break;
    case SpeechRecognitionErrorType.NETWORK:
      console.log('Network error occurred');
      break;
    case SpeechRecognitionErrorType.NOT_ALLOWED:
      console.log('Permission denied');
      break;
    default:
      console.error('Recognition error:', error.message);
  }
});

try {
  await voiceToText.initialize();
  const results = await voiceToText.fromFile('audio.wav');
} catch (error) {
  console.error('Failed to process audio:', error.message);
}
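
Some failures, such as network errors with a cloud engine, are worth retrying, while others (permission denied, missing model) are not. The sketch below is a generic retry wrapper with exponential backoff. `withRetry` is a hypothetical helper, not part of this package, and the attempt count and delays are arbitrary starting points.

```typescript
// Hypothetical helper: retries a task while `isRetryable` approves of the error.
async function withRetry<T>(
  task: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 200
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      // Give up on the last attempt or on errors that retrying cannot fix.
      if (attempt >= maxAttempts || !isRetryable(error)) throw error;
      // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

A call might look like `withRetry(() => voiceToText.fromFile('audio.wav'), (e) => (e as { type?: unknown }).type === SpeechRecognitionErrorType.NETWORK)`, assuming thrown errors carry the same `type` field as emitted ones.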

Performance Tips

Optimization

  1. Choose the right engine for your use case
  2. Set appropriate sample rates (16000 Hz is usually sufficient)
  3. Use confidence thresholds to filter low-quality results
  4. Enable interim results only when needed
  5. Implement proper cleanup to prevent memory leaks
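
Tip 3 can be applied on the caller's side even if the engine returns everything. The sketch below keeps only final results at or above a threshold. `keepConfident` and `ScoredResult` are hypothetical names, and 0.8 is an arbitrary default that should be tuned per engine. Whether the package's own `confidenceThreshold` option already filters results is not stated in this README.

```typescript
// Local mirror of the documented result fields used here (hypothetical name).
type ScoredResult = { transcript: string; confidence: number; isFinal: boolean };

// Keeps final results whose confidence meets the threshold.
function keepConfident<T extends ScoredResult>(results: T[], threshold = 0.8): T[] {
  return results.filter((result) => result.isFinal && result.confidence >= threshold);
}

const demoResults: ScoredResult[] = [
  { transcript: 'turn on the lights', confidence: 0.93, isFinal: true },
  { transcript: 'turn on the flights', confidence: 0.41, isFinal: true },
  { transcript: 'turn on', confidence: 0.9, isFinal: false },
];
console.log(keepConfident(demoResults).map((result) => result.transcript));
// [ 'turn on the lights' ]
```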

Memory Management

// Always clean up resources
const voiceToText = new VoiceToText();
try {
  await voiceToText.initialize();
  // ... use the converter
} finally {
  await voiceToText.cleanup();
}

// Or use the quick functions for one-time use
const results = await transcribeFromFile('audio.wav');

Batch Processing

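// Process multiple files with one shared converter instance
const voiceToText = new VoiceToText();
try {
  await voiceToText.initialize();

  const files = ['audio1.wav', 'audio2.wav', 'audio3.wav'];
  // Promise.all starts every file at once; results keep the order of `files`
  const results = await Promise.all(
    files.map(file => voiceToText.fromFile(file))
  );
} finally {
  // Release resources even if one of the files fails
  await voiceToText.cleanup();
}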
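
`Promise.all` over the whole list starts every transcription at once, which may exhaust memory or hit cloud quotas on large batches. The sketch below caps the number of files in flight. `mapWithConcurrency` is a hypothetical helper, not part of this package, and it assumes the converter tolerates a few concurrent `fromFile` calls on one instance.

```typescript
// Hypothetical helper: maps items through an async worker, at most `limit` at a time.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results; // same order as `items`
}
```

With the batch example above, this would be called as `mapWithConcurrency(files, 2, (file) => voiceToText.fromFile(file))`.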

Troubleshooting

Common Issues

"No speech recognition engines are available"

  • Cause: No compatible engines are installed or available
  • Solution: Install optional dependencies (vosk, @google-cloud/speech) or use in a browser environment

"Microphone access denied"

  • Cause: Browser blocked microphone access
  • Solution: Enable microphone permissions in browser settings, ensure HTTPS in production

"Model not found" (Vosk)

  • Cause: Vosk model path is incorrect or model not downloaded
  • Solution: Download the correct model and verify the path

"Authentication failed" (Google Cloud)

  • Cause: Invalid API credentials
  • Solution: Verify API key and project ID, check service account permissions

Poor recognition accuracy

  • Cause: Low audio quality, wrong language setting, or inappropriate engine
  • Solution:
    • Improve audio quality (reduce noise, use better microphone)
    • Set correct language in configuration
    • Try different engines
    • Adjust confidence threshold

Debug Mode

Enable debug mode to get detailed logging:

const voiceToText = new VoiceToText({ debug: true });

Testing Audio Setup

import { getSystemInfo, VoiceToText } from 'voice-to-text-converter';

// Check system capabilities
const systemInfo = getSystemInfo();
console.log('Available engines:', systemInfo.availableEngines);
console.log('Platform:', systemInfo.platform);

// Test browser support
if (systemInfo.platform === 'browser') {
  const support = VoiceToText.getBrowserSupport();
  console.log('Web Speech API:', support.webSpeechAPI);
  console.log('Media Recorder:', support.mediaRecorder);
  console.log('getUserMedia:', support.getUserMedia);
}

Examples

See the examples/ directory for complete working examples:

  • examples/node-basic.js - Basic Node.js usage
  • examples/node-advanced.js - Advanced Node.js features
  • examples/browser-simple.html - Simple browser implementation
  • examples/browser-advanced.html - Advanced browser features
  • examples/real-time.js - Real-time speech recognition
  • examples/file-processing.js - Batch file processing

Testing

Run the test suite:

npm test

Run tests with coverage:

npm run test:coverage

Run tests in watch mode:

npm run test:watch

Building

Build the package:

npm run build

Build in watch mode:

npm run build:watch

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Development Setup

  1. Clone the repository:

git clone https://github.com/yourusername/voice-to-text-converter.git
cd voice-to-text-converter

  2. Install dependencies:

npm install

  3. Install optional dependencies for testing:

npm install vosk @google-cloud/speech node-record-lpcm16

  4. Run tests:

npm test

  5. Build the package:

npm run build

Code Style

This project uses ESLint and TypeScript for code quality. Run linting:

npm run lint
npm run lint:fix

Security Considerations

Privacy

  • Web Speech API: Audio data is sent to the browser vendor's speech service (Google's servers in Chrome)
  • Google Cloud Speech: Audio data is sent to Google Cloud (with enterprise privacy controls)
  • Vosk: Fully offline, no data transmission

Permissions

  • Browser applications require microphone permission
  • Ensure HTTPS for production browser deployments
  • Validate and sanitize all audio file inputs
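
As a first line of input validation, a file that claims to be WAV can be checked for the RIFF container signature before it reaches an engine. The sketch below inspects the first 12 bytes only: `RIFF` at offset 0 and `WAVE` at offset 8. `looksLikeWav` is a hypothetical helper. It does not prove the audio is decodable, and size limits and path checks would still be needed.

```typescript
// Reads `length` bytes starting at `offset` as an ASCII string.
function asciiAt(bytes: Uint8Array, offset: number, length: number): string {
  let text = '';
  for (let i = offset; i < offset + length; i++) {
    text += String.fromCharCode(bytes[i]);
  }
  return text;
}

// Hypothetical helper: checks the RIFF/WAVE magic bytes of a WAV header.
function looksLikeWav(bytes: Uint8Array): boolean {
  if (bytes.length < 12) return false; // too short to hold the signature
  return asciiAt(bytes, 0, 4) === 'RIFF' && asciiAt(bytes, 8, 4) === 'WAVE';
}

// "RIFF", four size bytes, then "WAVE"
const demoHeader = new Uint8Array([
  0x52, 0x49, 0x46, 0x46, 0, 0, 0, 0, 0x57, 0x41, 0x56, 0x45,
]);
console.log(looksLikeWav(demoHeader)); // true
console.log(looksLikeWav(new Uint8Array([1, 2, 3]))); // false
```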

Best Practices

  1. Always request explicit user consent for microphone access
  2. Implement proper error handling for permission denials
  3. Use HTTPS in production environments
  4. Consider data retention policies for transcribed text
  5. Implement rate limiting for cloud-based engines
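
For point 5, a token bucket is one common way to cap how often a pay-per-use engine is called. The sketch below is a minimal in-memory version. `TokenBucket` is a hypothetical class, not part of this package, and the capacity and refill rate are placeholders to be matched to your own quota. The clock is passed in explicitly so the behavior is easy to test.

```typescript
// Hypothetical helper: allows bursts up to `capacity`, refilled at a steady rate.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefill: number;

  constructor(capacity: number, refillPerSecond: number, now: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = now;
  }

  // Returns true and consumes a token if a request may proceed at time `now` (ms).
  tryAcquire(now: number = Date.now()): boolean {
    const elapsedSeconds = Math.max(0, now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Two requests allowed in a burst, then one more per second.
const demoBucket = new TokenBucket(2, 1, 0);
console.log(demoBucket.tryAcquire(0)); // true
console.log(demoBucket.tryAcquire(0)); // true
console.log(demoBucket.tryAcquire(0)); // false
console.log(demoBucket.tryAcquire(1000)); // true
```

A caller would check `tryAcquire()` before each cloud transcription and queue or reject the request when it returns false.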

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGELOG.md for version history and changes.

Made with ❤️ by the Voice-to-Text Converter team