aac-speech-recognition v1.2.0

Multi-API speech recognition library with confidence scoring for AAC applications

Downloads: 22

AAC Speech Recognition Library

Multi-API speech recognition library with confidence scoring, designed for AAC (Augmentative and Alternative Communication) applications. Supports Whisper, Google, and Sphinx APIs with automatic best-result selection.

Installation

For Users of This Library

Option 1: Install from npm (if published)

npm install aac-speech-recognition

Option 2: Install from GitHub

npm install git+https://github.com/Capstone-Projects-2025-Fall/project-002-aac-api.git#library:Initial_API

Option 3: Install Locally

git clone https://github.com/Capstone-Projects-2025-Fall/project-002-aac-api.git
cd project-002-aac-api
git checkout library
cd Initial_API
npm install

Python Requirements

This library requires Python 3.11+ with the following packages:

# macOS (recommended: use Homebrew Python)
brew install python@3.11
/opt/homebrew/bin/pip3.11 install openai-whisper SpeechRecognition pocketsphinx

# Linux
pip3.11 install openai-whisper SpeechRecognition pocketsphinx

# Windows
pip install openai-whisper SpeechRecognition pocketsphinx

See INSTALLATION.md for detailed setup instructions.

Usage

As a Library (Node.js/Server-Side)

⚠️ IMPORTANT: The main export is for Node.js/server-side use only. Do not import it in browser/client components (React, Next.js client components, etc.) as it uses Node.js modules like fs and express.

const { transcribeAudio } = require('aac-speech-recognition');
// or if installed locally:
// const { transcribeAudio } = require('./index');
const fs = require('fs');

// CommonJS modules do not allow top-level await, so the calls live in an async function
async function main() {
    // Read audio file
    const audioBuffer = fs.readFileSync('audio.wav');

    // Transcribe with default APIs (whisper,google,sphinx)
    const result = await transcribeAudio(audioBuffer);

    console.log('Transcription:', result.transcription);
    console.log('Confidence:', result.confidenceScore);
    console.log('Selected API:', result.selectedApi);

    // Or specify which APIs to use
    const result2 = await transcribeAudio(audioBuffer, {
        speechApis: 'whisper,google'
    });
    console.log('Transcription (whisper,google):', result2.transcription);
}

main().catch(console.error);

As a Browser Client (React/Next.js Client Components)

For browser/client-side usage, use the browser export:

'use client'; // In Next.js, this directive must be the first statement in the file

// ✅ CORRECT - Use browser export in client components
import { transcribeAudio } from 'aac-speech-recognition/browser';

// ❌ WRONG - This will cause "Module not found: Can't resolve 'fs'" error
// import { transcribeAudio } from 'aac-speech-recognition';

// Call this from an event handler or effect.
// audioData comes from MediaRecorder, a File input, etc.
async function transcribe(audioData) {
    const audioBlob = new Blob([audioData], { type: 'audio/wav' });

    const result = await transcribeAudio(audioBlob, {
        apiUrl: 'http://localhost:8080/upload', // Your API server URL
        speechApis: 'whisper,google,sphinx'
    });

    console.log('Transcription:', result.transcription);
    console.log('Confidence:', result.confidenceScore);
    console.log('Selected API:', result.selectedApi);
    return result;
}

Note: The browser version requires the API server to be running. Make sure to start the server first (see "As a Server" section below).

As a Server

# Start the API server
npm start
# or
node index.js
# or
node server.js

The server runs on http://localhost:8080, or on the port given in the PORT environment variable if it is set.

API Endpoint

POST /upload

Upload an audio file for transcription.

curl -X POST http://localhost:8080/upload \
  -F "[email protected]" \
  -H "x-logging-consent: true"

Response:

{
  "success": true,
  "transcription": "Hello, how are you?",
  "confidenceScore": 0.85,
  "aggregatedConfidenceScore": 0.72,
  "selectedApi": "whisper",
  "apiResults": [...],
  "audio": {
    "filename": "audio.wav",
    "size": 12345,
    "format": "WAV",
    "duration": 2.5,
    "sampleRate": 16000
  }
}
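
The same upload can be prepared from JavaScript. The sketch below only builds the request and does not send it, so it runs without a server. It assumes Node.js 18+ (for the global `FormData` and `Blob`). The helper `buildUploadRequest` and its default form field name `file` are placeholders, not part of this library; check the server code for the field name it actually expects.

```javascript
// Build the pieces of a POST /upload request without sending it.
// ASSUMPTION: `fieldName` defaults to a placeholder; the real form field
// name expected by the server is not shown in this README.
function buildUploadRequest(audioBlob, options = {}) {
  const {
    apiUrl = 'http://localhost:8080/upload',
    fieldName = 'file',
    filename = 'audio.wav',
    consent = false,
  } = options;

  const form = new FormData();
  form.append(fieldName, audioBlob, filename);

  return {
    url: apiUrl,
    init: {
      method: 'POST',
      // Same consent header as the curl example above.
      headers: { 'x-logging-consent': String(consent) },
      body: form,
    },
  };
}

// Stand-in bytes; real code would pass recorded or uploaded audio.
const blob = new Blob([new Uint8Array([0, 1, 2, 3])], { type: 'audio/wav' });
const { url, init } = buildUploadRequest(blob, { consent: true });
console.log(url, init.method, init.headers);
```

To send the request, pass both parts to `fetch(url, init)` and read the JSON body from the response.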

Custom Server Setup

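const { app } = require('./index');
// or, to load the server module directly (use one or the other, not both):
// const app = require('./server');

// Add custom routes, middleware, etc.
// (customRouter is your own Express router, defined elsewhere)
app.use('/custom', customRouter);

app.listen(3000);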

API Reference

transcribeAudio(audioBuffer, options)

Transcribe audio buffer using multi-API speech recognition.

Parameters:

  • audioBuffer (Buffer): Audio file buffer
  • options (Object, optional):
    • pythonPath (string): Path to Python executable (default: auto-detect)
    • speechApis (string): Comma-separated list of APIs (default: "whisper,google,sphinx")

Returns: Promise that resolves to an object with:

  • success (boolean): Whether transcription succeeded
  • transcription (string): Transcribed text
  • confidenceScore (number): Confidence score of selected API
  • aggregatedConfidenceScore (number): Average confidence across all APIs
  • selectedApi (string): API that provided the best result
  • apiResults (Array): Results from all APIs tried
  • duration (number): Audio duration in seconds
  • format (string): Audio format
  • sampleRate (number): Sample rate
  • error (Object, optional): Error information if failed
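
As a rough illustration of consuming these fields, the sketch below formats a result object into a one-line summary and handles the failure case. `describeResult` is a hypothetical helper, not part of the library, and the shape of `error` (here assumed to carry a `message`) is a guess, since the README does not document it.

```javascript
// Turn a transcribeAudio-style result into a one-line summary.
// Field names follow the list above; `error.message` is an ASSUMPTION.
function describeResult(result) {
  if (!result || result.success !== true) {
    const reason =
      result && result.error && result.error.message
        ? result.error.message
        : 'unknown error';
    return `Transcription failed: ${reason}`;
  }
  const percent = Math.round(result.confidenceScore * 100);
  return `"${result.transcription}" (${result.selectedApi}, ${percent}% confidence)`;
}

// Sample object mirroring the example response shown earlier.
const sample = {
  success: true,
  transcription: 'Hello, how are you?',
  confidenceScore: 0.85,
  selectedApi: 'whisper',
};
console.log(describeResult(sample));
```

Checking `success` before reading `transcription` keeps the caller from displaying an empty or stale string when every API fails.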

parseUserAgent(userAgent)

Parse user agent string to extract browser and device info.

Parameters:

  • userAgent (string): User agent string

Returns: Object

  • browser (string): Browser name
  • device (string): Device type (Mobile/Tablet/Desktop)
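
To give a feel for what this kind of parsing involves, here is a deliberately simplified sketch. It is not the library's implementation; `sketchParseUserAgent` and its regular expressions are illustrative only and miss many real-world user agents.

```javascript
// Simplified illustration only; NOT the library's actual parseUserAgent.
function sketchParseUserAgent(userAgent = '') {
  // Order matters: Edge and Chrome user agents also contain "Safari".
  let browser = 'Unknown';
  if (/Edg\//.test(userAgent)) browser = 'Edge';
  else if (/Firefox\//.test(userAgent)) browser = 'Firefox';
  else if (/Chrome\//.test(userAgent)) browser = 'Chrome';
  else if (/Safari\//.test(userAgent)) browser = 'Safari';

  // Android tablets typically omit the "Mobile" token that phones include.
  let device = 'Desktop';
  if (/iPad|Tablet|Android(?!.*Mobi)/.test(userAgent)) device = 'Tablet';
  else if (/Mobi|iPhone/.test(userAgent)) device = 'Mobile';

  return { browser, device };
}

const iphoneUa =
  'Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 ' +
  '(KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1';
console.log(sketchParseUserAgent(iphoneUa));
```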

logRequest(data, consentGiven, logDir)

Log request data to file (with consent).

Parameters:

  • data (Object): Data to log
  • consentGiven (boolean): Whether user consented to logging
  • logDir (string, optional): Directory for log files (default: ./logs)
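
The consent gate described here can be sketched as follows. This is an illustration of the idea, not the library's code: the function name `sketchLogRequest`, the `requests.jsonl` file name, and the one-JSON-object-per-line format are all assumptions.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Illustrative consent-gated logger.
// ASSUMPTIONS: file name and JSON-lines format are made up for this sketch.
function sketchLogRequest(data, consentGiven, logDir = './logs') {
  if (!consentGiven) return false; // never write without consent
  fs.mkdirSync(logDir, { recursive: true });
  const entry = { timestamp: new Date().toISOString(), ...data };
  fs.appendFileSync(
    path.join(logDir, 'requests.jsonl'),
    JSON.stringify(entry) + '\n'
  );
  return true;
}

// Usage against a throwaway directory so nothing lands in ./logs.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'aac-logs-'));
console.log(sketchLogRequest({ selectedApi: 'whisper' }, false, demoDir)); // false, nothing written
console.log(sketchLogRequest({ selectedApi: 'whisper' }, true, demoDir)); // true, one line appended
```

Returning a boolean lets the caller tell a skipped write apart from a completed one without inspecting the file system.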

Supported APIs

  1. Whisper (OpenAI) - Best for robotic/synthesized voices
    • Excellent accuracy with synthesized voices
    • Works offline
    • High confidence scores (~0.85)

  2. Google Speech Recognition - Good for natural speech
    • Free tier available
    • Requires internet connection
    • Default confidence: 0.7

  3. Sphinx (CMU) - Offline fallback
    • Works offline
    • Better with synthesized voices than Google
    • Default confidence: 0.6
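
One plausible reading of "automatic best-result selection" is to pick the highest-confidence result and average the rest. The sketch below does exactly that. The per-API result shape `{ api, transcription, confidence }` and the helper `pickBestResult` are hypothetical; the library's actual selection logic may differ. With the confidence values listed above, the average is (0.85 + 0.7 + 0.6) / 3 ≈ 0.72, which is consistent with the example response shown earlier.

```javascript
// ASSUMPTION: each per-API result looks like { api, transcription, confidence }.
// This shows the selection idea, not the library's implementation.
function pickBestResult(apiResults) {
  const usable = apiResults.filter((r) => typeof r.confidence === 'number');
  if (usable.length === 0) return null;

  // Highest confidence wins; ties keep the earlier entry.
  const best = usable.reduce((a, b) => (b.confidence > a.confidence ? b : a));
  const average =
    usable.reduce((sum, r) => sum + r.confidence, 0) / usable.length;

  return {
    transcription: best.transcription,
    selectedApi: best.api,
    confidenceScore: best.confidence,
    aggregatedConfidenceScore: Math.round(average * 100) / 100,
  };
}

const picked = pickBestResult([
  { api: 'whisper', transcription: 'hello how are you', confidence: 0.85 },
  { api: 'google', transcription: 'hello how are you', confidence: 0.7 },
  { api: 'sphinx', transcription: 'hello how are u', confidence: 0.6 },
]);
console.log(picked.selectedApi, picked.aggregatedConfidenceScore);
```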

Configuration

Set the SPEECH_APIS environment variable to customize which APIs to use:

export SPEECH_APIS=whisper,google,sphinx  # All three (default)
export SPEECH_APIS=whisper,google         # Whisper + Google
export SPEECH_APIS=whisper               # Only Whisper
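
If you need to read the same variable in your own code, a small parser along these lines may help. How the library itself parses `SPEECH_APIS` is not documented here, so `parseSpeechApis` and its trimming, lowercasing, and fallback behavior are assumptions for illustration.

```javascript
const KNOWN_APIS = ['whisper', 'google', 'sphinx'];

// Sketch of turning a SPEECH_APIS-style value into a clean list.
// ASSUMPTION: unknown names are ignored and an empty result falls back
// to the documented default of all three APIs.
function parseSpeechApis(value) {
  const requested = (value || '')
    .split(',')
    .map((name) => name.trim().toLowerCase())
    .filter((name) => KNOWN_APIS.includes(name));
  // Drop duplicates while keeping the requested order.
  const unique = [...new Set(requested)];
  return unique.length > 0 ? unique : KNOWN_APIS.slice();
}

console.log(parseSpeechApis(process.env.SPEECH_APIS));
console.log(parseSpeechApis(' Whisper, google ,bogus'));
```

The resulting array can be joined with commas and passed as the `speechApis` option.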

Testing

npm test

License

ISC