aac-speech-recognition
v1.2.0
AAC Speech Recognition Library
Multi-API speech recognition library with confidence scoring, designed for AAC (Augmentative and Alternative Communication) applications. Supports Whisper, Google, and Sphinx APIs with automatic best-result selection.
Installation
For Users of This Library
Option 1: Install from npm (if published)
npm install aac-speech-recognition
Option 2: Install from GitHub
npm install git+https://github.com/Capstone-Projects-2025-Fall/project-002-aac-api.git#library:Initial_API
Option 3: Install Locally
git clone https://github.com/Capstone-Projects-2025-Fall/project-002-aac-api.git
cd project-002-aac-api
git checkout library
cd Initial_API
npm install
Python Requirements
This library requires Python 3.11+ with the following packages:
# macOS (recommended: use Homebrew Python)
brew install [email protected]
/opt/homebrew/bin/pip3.11 install openai-whisper SpeechRecognition pocketsphinx
# Linux
pip3.11 install openai-whisper SpeechRecognition pocketsphinx
# Windows
pip install openai-whisper SpeechRecognition pocketsphinx
See INSTALLATION.md for detailed setup instructions.
Usage
As a Library (Node.js/Server-Side)
⚠️ IMPORTANT: The main export is for Node.js/server-side use only. Do not import it in browser/client components (React, Next.js client components, etc.) as it uses Node.js modules like fs and express.
const { transcribeAudio } = require('aac-speech-recognition');
// or if installed locally:
// const { transcribeAudio } = require('./index');
const fs = require('fs');
// Read audio file
const audioBuffer = fs.readFileSync('audio.wav');
// Transcribe with default APIs (whisper,google,sphinx)
const result = await transcribeAudio(audioBuffer);
console.log('Transcription:', result.transcription);
console.log('Confidence:', result.confidenceScore);
console.log('Selected API:', result.selectedApi);
// Or specify which APIs to use
const result2 = await transcribeAudio(audioBuffer, {
speechApis: 'whisper,google'
});
As a Browser Client (React/Next.js Client Components)
For browser/client-side usage, use the browser export:
// ✅ CORRECT - Use browser export in client components
import { transcribeAudio } from 'aac-speech-recognition/browser';
// ❌ WRONG - This will cause "Module not found: Can't resolve 'fs'" error
// import { transcribeAudio } from 'aac-speech-recognition';
// In Next.js, make sure to mark as client component
'use client'; // Add this at the top of your file
// Use with audio Blob (from MediaRecorder, File input, etc.)
const audioBlob = new Blob([audioData], { type: 'audio/wav' });
const result = await transcribeAudio(audioBlob, {
apiUrl: 'http://localhost:8080/upload', // Your API server URL
speechApis: 'whisper,google,sphinx'
});
console.log('Transcription:', result.transcription);
console.log('Confidence:', result.confidenceScore);
console.log('Selected API:', result.selectedApi);
Note: The browser version requires the API server to be running. Make sure to start the server first (see "As a Server" section below).
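Under the hood, the browser export presumably wraps the Blob in a multipart form and POSTs it to the server, matching the curl example in the API Endpoint section. A minimal sketch of such a request (the helper names `buildUploadForm` and `uploadForTranscription` are hypothetical, not part of the library's API; the `speechApis` form field is an assumption):

```javascript
// Build the multipart body the /upload endpoint expects.
// The "audio" field name matches the curl example below.
function buildUploadForm(audioBlob, speechApis) {
  const form = new FormData();
  form.append('audio', audioBlob, 'audio.wav');
  form.append('speechApis', speechApis);
  return form;
}

// Modern browsers and Node 18+ provide fetch, FormData, and Blob globally.
async function uploadForTranscription(audioBlob, apiUrl) {
  const res = await fetch(apiUrl, {
    method: 'POST',
    headers: { 'x-logging-consent': 'true' },
    body: buildUploadForm(audioBlob, 'whisper,google,sphinx'),
  });
  return res.json(); // resolves to the response shape shown below
}
```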
As a Server
# Start the API server
npm start
# or
node index.js
# or
node server.js
The server will run on http://localhost:8080 (or the port specified in the PORT environment variable).
API Endpoint
POST /upload
Upload an audio file for transcription.
curl -X POST http://localhost:8080/upload \
-F "[email protected]" \
-H "x-logging-consent: true"
Response:
{
"success": true,
"transcription": "Hello, how are you?",
"confidenceScore": 0.85,
"aggregatedConfidenceScore": 0.72,
"selectedApi": "whisper",
"apiResults": [...],
"audio": {
"filename": "audio.wav",
"size": 12345,
"format": "WAV",
"duration": 2.5,
"sampleRate": 16000
}
}
Custom Server Setup
const { app } = require('./index');
// or
const app = require('./server');
// Add custom routes, middleware, etc.
app.use('/custom', customRouter);
app.listen(3000);
API Reference
transcribeAudio(audioBuffer, options)
Transcribe audio buffer using multi-API speech recognition.
Parameters:
- audioBuffer (Buffer): Audio file buffer
- options (Object, optional):
  - pythonPath (string): Path to Python executable (default: auto-detect)
  - speechApis (string): Comma-separated list of APIs (default: "whisper,google,sphinx")
Returns: Promise&lt;Object&gt;
- success (boolean): Whether transcription succeeded
- transcription (string): Transcribed text
- confidenceScore (number): Confidence score of selected API
- aggregatedConfidenceScore (number): Average confidence across all APIs
- selectedApi (string): API that provided the best result
- apiResults (Array): Results from all APIs tried
- duration (number): Audio duration in seconds
- format (string): Audio format
- sampleRate (number): Sample rate
- error (Object, optional): Error information if failed
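The relationship between `confidenceScore`, `aggregatedConfidenceScore`, and `selectedApi` can be sketched with a small selection function. This is a hypothetical illustration of how the best result might be derived from `apiResults` (field names follow the response shape above), not the library's actual implementation:

```javascript
// Pick the highest-confidence transcription and average confidence
// across all APIs that returned text. Hypothetical sketch.
function pickBestResult(apiResults) {
  const ok = apiResults.filter((r) => r.success && r.transcription);
  if (ok.length === 0) return { success: false };

  // Best single result wins; the average becomes the aggregated score.
  const best = ok.reduce((a, b) => (b.confidenceScore > a.confidenceScore ? b : a));
  const aggregated = ok.reduce((sum, r) => sum + r.confidenceScore, 0) / ok.length;

  return {
    success: true,
    transcription: best.transcription,
    confidenceScore: best.confidenceScore,
    aggregatedConfidenceScore: aggregated,
    selectedApi: best.api,
  };
}
```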
parseUserAgent(userAgent)
Parse user agent string to extract browser and device info.
Parameters:
userAgent(string): User agent string
Returns: Object
- browser (string): Browser name
- device (string): Device type (Mobile/Tablet/Desktop)
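A minimal sketch of what such parsing might look like; real user-agent detection (and the library's own `parseUserAgent`) is more involved, so treat this as illustrative only:

```javascript
// Classify browser and device from a user-agent string.
// Order matters: Chrome UAs also contain "Safari/", so check Chrome first.
function parseUserAgentSketch(userAgent) {
  const ua = userAgent || '';
  const browser =
    /Firefox\//.test(ua) ? 'Firefox' :
    /Edg\//.test(ua)     ? 'Edge'    :
    /Chrome\//.test(ua)  ? 'Chrome'  :
    /Safari\//.test(ua)  ? 'Safari'  : 'Unknown';
  const device =
    /Mobile/.test(ua)       ? 'Mobile' :
    /Tablet|iPad/.test(ua)  ? 'Tablet' : 'Desktop';
  return { browser, device };
}
```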
logRequest(data, consentGiven, logDir)
Log request data to file (with consent).
Parameters:
- data (Object): Data to log
- consentGiven (boolean): Whether user consented to logging
- logDir (string, optional): Directory for log files (default: ./logs)
Supported APIs
Whisper (OpenAI) - Best for robotic/synthesized voices
- Excellent accuracy with synthesized voices
- Works offline
- High confidence scores (~0.85)
Google Speech Recognition - Good for natural speech
- Free tier available
- Requires internet connection
- Default confidence: 0.7
Sphinx (CMU) - Offline fallback
- Works offline
- Better with synthesized voices than Google
- Default confidence: 0.6
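The per-API scores above suggest how a missing confidence value might be filled in before results are compared. A hypothetical sketch (the 0.85 figure for Whisper is its typical score, not a documented default, and the 0.5 fallback is an assumption):

```javascript
// Fallback confidence per API when the engine reports none
// (values taken from the list above).
const DEFAULT_CONFIDENCE = { whisper: 0.85, google: 0.7, sphinx: 0.6 };

// Return a copy of the result with a confidence score guaranteed to be set.
function withDefaultConfidence(result) {
  return {
    ...result,
    confidenceScore:
      result.confidenceScore ?? DEFAULT_CONFIDENCE[result.api] ?? 0.5,
  };
}
```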
Configuration
Set the SPEECH_APIS environment variable to customize which APIs to use:
export SPEECH_APIS=whisper,google,sphinx # All three (default)
export SPEECH_APIS=whisper,google # Whisper + Google
export SPEECH_APIS=whisper               # Only Whisper
Testing
npm test
License
ISC
