whisper-nodejs-wrapper v1.0.0
# Whisper for Node.js
A Node.js wrapper for OpenAI's Whisper speech recognition model. This package provides an easy-to-use interface for transcribing audio files with word-level timestamps.
## Features
- 🎯 Simple async/await API
- 🔄 Automatic retry with exponential backoff
- 📝 Word-level timestamps
- 🌍 Multi-language support
- 🔧 TypeScript support
- 🚀 Automatic dependency installation
- 💻 CPU and GPU support
## Installation

```bash
npm install @whisper/nodejs
```

During `npm install`, the package automatically creates a Python virtual environment and installs its Python dependencies. This avoids conflicts with system Python packages.
## Quick Start

```js
const { whisper } = require('@whisper/nodejs');

// Basic transcription
const result = await whisper.transcribe('audio.mp3');
console.log(result.text);

// With options
const resultWithOptions = await whisper.transcribe('audio.mp3', {
  language: 'en',
  modelSize: 'base'
});
```

## TypeScript Usage
```ts
import { WhisperTranscriber, WhisperOptions, WhisperResult } from '@whisper/nodejs';

const transcriber = new WhisperTranscriber();

const options: WhisperOptions = {
  language: 'en',
  modelSize: 'base',
  verbose: true
};

const result: WhisperResult = await transcriber.transcribe('audio.mp3', options);

// Access word-level timestamps
result.segments.forEach(segment => {
  console.log(`[${segment.start}-${segment.end}] ${segment.text}`);
  segment.words?.forEach(word => {
    console.log(`  ${word.text} (${word.start}-${word.end})`);
  });
});
```

## API Reference
### WhisperTranscriber

#### Constructor

```ts
new WhisperTranscriber(options?: { pythonPath?: string })
```

- `pythonPath` (optional): Path to the Python executable. Auto-detected if not provided.
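When `pythonPath` is omitted, auto-detection presumably tries common interpreter names in order. A minimal sketch of that idea (the `detectPython` helper, its candidate list, and the injected `exists` check are illustrative, not part of the package's API):

```typescript
// Illustrative sketch of Python auto-detection; not the package's actual code.
// The `exists` check is injected so the selection logic is easy to test.
type ExistsCheck = (cmd: string) => boolean;

function detectPython(
  exists: ExistsCheck,
  candidates: string[] = ['python3', 'python']
): string {
  for (const cmd of candidates) {
    if (exists(cmd)) return cmd; // first candidate found on the system wins
  }
  throw new Error('No Python interpreter found; pass pythonPath explicitly.');
}
```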
#### Methods

##### `transcribe(audioPath: string, options?: WhisperOptions): Promise<WhisperResult>`

Transcribe an audio file.

Parameters:

- `audioPath`: Path to the audio file
- `options`: Transcription options
##### `transcribeWithRetry(audioPath: string, options?: WhisperOptions, maxRetries?: number): Promise<WhisperResult>`

Transcribe with automatic retry on failure.

Parameters:

- `audioPath`: Path to the audio file
- `options`: Transcription options
- `maxRetries`: Maximum number of retry attempts (default: 3)
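Retries are paired with the exponential backoff mentioned under Features. The package's actual implementation isn't shown in this README; a generic sketch of the pattern (the `withRetry` helper and its delay constants are illustrative):

```typescript
// Generic retry-with-exponential-backoff sketch; not the package's actual code.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxRetries) break;
      // Delay doubles each attempt: 500 ms, 1000 ms, 2000 ms, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```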
##### `initialize(): Promise<void>`

Initialize the transcriber and check/install Python dependencies.

##### `checkDependencies(): Promise<boolean>`

Check whether the required Python dependencies are installed.
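How the check works internally isn't documented here; a common approach is to probe the interpreter with an import. A dependency-injected sketch of that idea (the `runPython` parameter is illustrative, not part of the package's API):

```typescript
// Illustrative sketch: a dependency check as an interpreter probe.
// `runPython` abstracts spawning Python so the logic stays testable.
type PythonRunner = (code: string) => { ok: boolean };

function checkWhisperInstalled(runPython: PythonRunner): boolean {
  // A real implementation might spawn: python -c "import whisper"
  return runPython('import whisper').ok;
}
```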
### Types

#### WhisperOptions

```ts
interface WhisperOptions {
  language?: string;   // Language code (e.g., 'en', 'es', 'fr')
  modelSize?: 'tiny' | 'base' | 'small' | 'medium' | 'large';
  pythonPath?: string; // Custom Python path
  cpuOnly?: boolean;   // Force CPU-only mode
  verbose?: boolean;   // Enable verbose logging
}
```

#### WhisperResult
```ts
interface WhisperResult {
  text: string;               // Full transcribed text
  segments: WhisperSegment[]; // Time-aligned segments
  language?: string;          // Detected language
  duration?: number;          // Total audio duration
}
```

#### WhisperSegment
```ts
interface WhisperSegment {
  text: string;          // Segment text
  start: number;         // Start time in seconds
  end: number;           // End time in seconds
  words?: WhisperWord[]; // Word-level timestamps
}
```

## Model Sizes
| Model  | Parameters | English-only | Multilingual | Required VRAM | Relative Speed |
|--------|------------|--------------|--------------|---------------|----------------|
| tiny   | 39 M       | ✓            | ✓            | ~1 GB         | ~32x           |
| base   | 74 M       | ✓            | ✓            | ~1 GB         | ~16x           |
| small  | 244 M      | ✓            | ✓            | ~2 GB         | ~6x            |
| medium | 769 M      | ✓            | ✓            | ~5 GB         | ~2x            |
| large  | 1550 M     | ✗            | ✓            | ~10 GB        | 1x             |
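Note that the `WhisperWord` type is referenced by `WhisperSegment` and the usage examples above but isn't defined in this README. Inferred from that usage, it presumably has the following shape (this is an assumption, not the package's published declaration):

```typescript
// Assumed shape of WhisperWord, inferred from the usage example;
// not the package's official declaration.
interface WhisperWord {
  text: string;  // The word itself
  start: number; // Start time in seconds
  end: number;   // End time in seconds
}

const example: WhisperWord = { text: 'hello', start: 0.0, end: 0.42 };
```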
## Language Support

Supports 100+ languages, including:

- English (`en`)
- Spanish (`es`)
- French (`fr`)
- German (`de`)
- Italian (`it`)
- Portuguese (`pt`)
- Russian (`ru`)
- Chinese (`zh`)
- Japanese (`ja`)
- Korean (`ko`)
- Vietnamese (`vi`)
- And many more...
## Environment Variables

- `WHISPER_CPU_ONLY`: Set to `"1"` to force CPU-only mode
- `WHISPER_VERBOSE`: Set to `"true"` for verbose logging
- `SKIP_WHISPER_SETUP`: Set to `"true"` to skip automatic setup
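How the package reads these flags internally isn't documented; a sketch of the conventional Node.js pattern the values above imply (the `readWhisperEnv` helper is illustrative, not an exported function):

```typescript
// Illustrative: how these environment flags would typically be interpreted.
// The env object is a parameter so the parsing is easy to test.
function readWhisperEnv(env: Record<string, string | undefined> = process.env) {
  return {
    cpuOnly: env.WHISPER_CPU_ONLY === '1',     // exact string "1"
    verbose: env.WHISPER_VERBOSE === 'true',   // exact string "true"
    skipSetup: env.SKIP_WHISPER_SETUP === 'true',
  };
}
```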
## Requirements
- Node.js >= 16.0.0
- Python >= 3.7
- FFmpeg (for audio processing)
## Troubleshooting

### Python not found

Make sure Python 3.7+ is installed and available on your `PATH`:

```bash
python3 --version
```

### Manual dependency installation
If automatic installation fails, install the Python dependencies manually:

```bash
pip install openai-whisper torch
```

### GPU Support

For GPU acceleration, install a CUDA-enabled PyTorch build:

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```

## License
MIT
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
