whisper-nodejs-wrapper v1.0.0
# Whisper for Node.js
A Node.js wrapper for OpenAI's Whisper speech recognition model. This package provides an easy-to-use interface for transcribing audio files with word-level timestamps.
## Features
- 🎯 Simple async/await API
- 🔄 Automatic retry with exponential backoff
- 📝 Word-level timestamps
- 🌍 Multi-language support
- 🔧 TypeScript support
- 🚀 Automatic dependency installation
- 💻 CPU and GPU support
## Installation

```bash
npm install @whisper/nodejs
```

During `npm install`, the package automatically creates a Python virtual environment and installs its Python dependencies. This avoids conflicts with system Python packages.
## Quick Start

```js
const { whisper } = require('@whisper/nodejs');

// Basic transcription
const result = await whisper.transcribe('audio.mp3');
console.log(result.text);

// With options
const resultWithOptions = await whisper.transcribe('audio.mp3', {
  language: 'en',
  modelSize: 'base'
});
```

## TypeScript Usage
```ts
import { WhisperTranscriber, WhisperOptions, WhisperResult } from '@whisper/nodejs';

const transcriber = new WhisperTranscriber();

const options: WhisperOptions = {
  language: 'en',
  modelSize: 'base',
  verbose: true
};

const result: WhisperResult = await transcriber.transcribe('audio.mp3', options);

// Access word-level timestamps
result.segments.forEach(segment => {
  console.log(`[${segment.start}-${segment.end}] ${segment.text}`);
  segment.words?.forEach(word => {
    console.log(`  ${word.text} (${word.start}-${word.end})`);
  });
});
```

## API Reference
### WhisperTranscriber

#### Constructor

```ts
new WhisperTranscriber(options?: { pythonPath?: string })
```

- `pythonPath` (optional): Path to the Python executable. Auto-detected if not provided.
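When `pythonPath` is omitted, auto-detection presumably tries common interpreter names in order. A minimal sketch of that idea (the `detectPython` helper, its candidate list, and the injected `exists` check are illustrative, not part of the package's API):

```typescript
// Illustrative sketch of Python auto-detection; not the package's actual code.
// The `exists` check is injected so the selection logic is easy to test.
type ExistsCheck = (cmd: string) => boolean;

function detectPython(
  exists: ExistsCheck,
  candidates: string[] = ['python3', 'python']
): string {
  for (const cmd of candidates) {
    if (exists(cmd)) return cmd; // first candidate found on the system wins
  }
  throw new Error('No Python interpreter found; pass pythonPath explicitly.');
}
```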
#### Methods

##### `transcribe(audioPath: string, options?: WhisperOptions): Promise<WhisperResult>`

Transcribe an audio file.

Parameters:

- `audioPath`: Path to the audio file
- `options`: Transcription options
##### `transcribeWithRetry(audioPath: string, options?: WhisperOptions, maxRetries?: number): Promise<WhisperResult>`

Transcribe with automatic retry on failure.

Parameters:

- `audioPath`: Path to the audio file
- `options`: Transcription options
- `maxRetries`: Maximum number of retry attempts (default: 3)
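Retries are paired with the exponential backoff mentioned under Features. The package's actual implementation isn't shown in this README; a generic sketch of the pattern (the `withRetry` helper and its delay constants are illustrative):

```typescript
// Generic retry-with-exponential-backoff sketch; not the package's actual code.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxRetries) break;
      // Delay doubles each attempt: 500 ms, 1000 ms, 2000 ms, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```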
##### `initialize(): Promise<void>`

Initialize the transcriber and check/install Python dependencies.

##### `checkDependencies(): Promise<boolean>`

Check whether the required Python dependencies are installed.
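How the check works internally isn't documented here; a common approach is to probe the interpreter with an import. A dependency-injected sketch of that idea (the `runPython` parameter is illustrative, not part of the package's API):

```typescript
// Illustrative sketch: a dependency check as an interpreter probe.
// `runPython` abstracts spawning Python so the logic stays testable.
type PythonRunner = (code: string) => { ok: boolean };

function checkWhisperInstalled(runPython: PythonRunner): boolean {
  // A real implementation might spawn: python -c "import whisper"
  return runPython('import whisper').ok;
}
```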
### Types

#### WhisperOptions

```ts
interface WhisperOptions {
  language?: string;   // Language code (e.g., 'en', 'es', 'fr')
  modelSize?: 'tiny' | 'base' | 'small' | 'medium' | 'large';
  pythonPath?: string; // Custom Python path
  cpuOnly?: boolean;   // Force CPU-only mode
  verbose?: boolean;   // Enable verbose logging
}
```

#### WhisperResult
```ts
interface WhisperResult {
  text: string;               // Full transcribed text
  segments: WhisperSegment[]; // Time-aligned segments
  language?: string;          // Detected language
  duration?: number;          // Total audio duration
}
```

#### WhisperSegment
```ts
interface WhisperSegment {
  text: string;          // Segment text
  start: number;         // Start time in seconds
  end: number;           // End time in seconds
  words?: WhisperWord[]; // Word-level timestamps
}
```

## Model Sizes
| Model  | Parameters | English-only | Multilingual | Required VRAM | Relative Speed |
|--------|------------|--------------|--------------|---------------|----------------|
| tiny   | 39 M       | ✓            | ✓            | ~1 GB         | ~32x           |
| base   | 74 M       | ✓            | ✓            | ~1 GB         | ~16x           |
| small  | 244 M      | ✓            | ✓            | ~2 GB         | ~6x            |
| medium | 769 M      | ✓            | ✓            | ~5 GB         | ~2x            |
| large  | 1550 M     | ✗            | ✓            | ~10 GB        | 1x             |
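Note that the `WhisperWord` type is referenced by `WhisperSegment` and the usage examples above but isn't defined in this README. Inferred from that usage, it presumably has the following shape (this is an assumption, not the package's published declaration):

```typescript
// Assumed shape of WhisperWord, inferred from the usage example;
// not the package's official declaration.
interface WhisperWord {
  text: string;  // The word itself
  start: number; // Start time in seconds
  end: number;   // End time in seconds
}

const example: WhisperWord = { text: 'hello', start: 0.0, end: 0.42 };
```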
## Language Support

Supports 100+ languages, including:

- English (`en`)
- Spanish (`es`)
- French (`fr`)
- German (`de`)
- Italian (`it`)
- Portuguese (`pt`)
- Russian (`ru`)
- Chinese (`zh`)
- Japanese (`ja`)
- Korean (`ko`)
- Vietnamese (`vi`)
- And many more...
## Environment Variables

- `WHISPER_CPU_ONLY`: Set to `"1"` to force CPU-only mode
- `WHISPER_VERBOSE`: Set to `"true"` for verbose logging
- `SKIP_WHISPER_SETUP`: Set to `"true"` to skip automatic setup
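How the package reads these flags internally isn't documented; a sketch of the conventional Node.js pattern the values above imply (the `readWhisperEnv` helper is illustrative, not an exported function):

```typescript
// Illustrative: how these environment flags would typically be interpreted.
// The env object is a parameter so the parsing is easy to test.
function readWhisperEnv(env: Record<string, string | undefined> = process.env) {
  return {
    cpuOnly: env.WHISPER_CPU_ONLY === '1',     // exact string "1"
    verbose: env.WHISPER_VERBOSE === 'true',   // exact string "true"
    skipSetup: env.SKIP_WHISPER_SETUP === 'true',
  };
}
```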
## Requirements
- Node.js >= 16.0.0
- Python >= 3.7
- FFmpeg (for audio processing)
## Troubleshooting

### Python not found

Make sure Python 3.7+ is installed and available on your `PATH`:

```bash
python3 --version
```

### Manual dependency installation
If automatic installation fails, install the Python dependencies manually:

```bash
pip install openai-whisper torch
```

### GPU Support

For GPU acceleration, install a CUDA-enabled PyTorch build:

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```

## License
MIT
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
