whisper-coreml

v1.1.0

Published

5 days ago

OpenAI Whisper ASR for Node.js with CoreML/ANE acceleration on Apple Silicon


swernerx

whisper openai asr speech-recognition transcription coreml apple-silicon neural-engine macos whisper.cpp


Transcribe audio in 99 languages. Run 100% offline on your Mac.

OpenAI's Whisper is the gold standard for speech recognition accuracy. This package brings it to Node.js – powered by Apple's Neural Engine for fast, private, local transcription.

The Pitch

🎯 Accuracy first. Whisper large-v3-turbo delivers state-of-the-art transcription quality – right on your Mac.

🌍 99 languages. From Afrikaans to Zulu. Handles accents, dialects, and background noise.

🔒 100% private. Your audio never leaves your device. No API keys. No cloud. No subscription.

⚡ Fast enough. 14x real-time on M1 Ultra – transcribe 1 hour of audio in under 5 minutes.

Why CoreML?

Running Whisper without hardware acceleration is painfully slow. Here's how the alternatives compare:

| Approach                | Speed             | Drawbacks                          |
| ----------------------- | ----------------- | ---------------------------------- |
| OpenAI Whisper (Python) | ~2x real-time     | Slow, needs Python                 |
| whisper.cpp (CPU)       | ~4x real-time     | No GPU or Neural Engine acceleration |
| faster-whisper          | ~6x real-time     | GPU acceleration needs NVIDIA/CUDA |
| Cloud APIs              | ~1x + latency     | Costs $$$, privacy concerns        |
| whisper-coreml          | 14x real-time     | macOS only ✓                       |

The Neural Engine in every Apple Silicon Mac is a dedicated ML accelerator that usually sits idle. This package puts it to work.

vs. parakeet-coreml

Need even more speed? Our sister project parakeet-coreml trades language coverage for 40x real-time performance.

|               | whisper-coreml           | parakeet-coreml |
| ------------- | ------------------------ | --------------- |
| Best for      | Accuracy, rare languages | Maximum speed   |
| Speed         | 14x real-time            | 40x real-time   |
| Languages     | 99                       | 25 European     |

Features

🎯 99 Languages – Full OpenAI Whisper multilingual support
🚀 14x real-time – 1 hour of audio in ~4.5 minutes (M1 Ultra)
🍎 Neural Engine – Runs on Apple's dedicated ML chip via CoreML
🔒 Fully Offline – No internet required after setup
📦 Zero Dependencies – No Python, no subprocess, no hassle
📝 Timestamps – Segment-level timing for subtitles
⬇️ One Command Setup – npx whisper-coreml download

Get Started

# Install
npm install whisper-coreml

# Download the model (~3GB, one-time)
npx whisper-coreml download

Requirements: macOS 14+ (Sonoma), Apple Silicon (M1/M2/M3/M4), Node.js 20+

Performance

Measured on M1 Ultra:

5 min audio  →  22 seconds  →  14x real-time
1 hour audio →  4.5 minutes

Run npx whisper-coreml benchmark to test on your machine (the benchmark requires a cloned copy of the repository).
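To compare your own measurements with the M1 Ultra figures above, the real-time factor is simply audio duration divided by processing time. A minimal sketch, assuming 16 kHz input and using the `durationMs` processing time reported in `TranscriptionResult`; `audioDurationMs` and `realTimeFactor` are hypothetical helpers, not exports of whisper-coreml:

```typescript
// Hypothetical helpers, not part of the whisper-coreml API.
const SAMPLE_RATE = 16000

/** Audio length in milliseconds for a mono sample buffer at the given rate. */
function audioDurationMs(sampleCount: number, sampleRate: number = SAMPLE_RATE): number {
  return (sampleCount / sampleRate) * 1000
}

/** Real-time factor: milliseconds of audio processed per millisecond of compute. */
function realTimeFactor(audioMs: number, processingMs: number): number {
  if (processingMs <= 0) {
    throw new RangeError("processingMs must be positive")
  }
  return audioMs / processingMs
}

// 5 minutes of audio processed in 22 seconds, as in the M1 Ultra measurement.
const rtf = realTimeFactor(5 * 60 * 1000, 22 * 1000)
console.log(`${rtf.toFixed(1)}x real-time`) // 13.6x real-time
```

With the engine, you would pass `audioDurationMs(audioSamples.length)` and `result.durationMs`.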

Quick Start

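// ESM module (uses top-level await)
import { readFileSync } from "node:fs"
import { WhisperAsrEngine, getModelPath } from "whisper-coreml"

// Load 16 kHz mono 32-bit float PCM (see "Converting Audio Files" below)
const buffer = readFileSync("output.pcm")
const audioSamples = new Float32Array(buffer.buffer, buffer.byteOffset, buffer.length / 4)

const engine = new WhisperAsrEngine({
  modelPath: getModelPath()
})

await engine.initialize()

try {
  // Transcribe audio (16kHz, mono, Float32Array)
  const result = await engine.transcribe(audioSamples, 16000)

  console.log(result.text)
  // "Hello, this is a test transcription."

  console.log(`Language: ${result.language}`)
  console.log(`Processed in ${result.durationMs}ms`)

  // Segments include timestamps
  for (const seg of result.segments) {
    console.log(`[${seg.startMs}ms - ${seg.endMs}ms] ${seg.text}`)
  }
} finally {
  // Release native resources even if transcription throws
  engine.cleanup()
}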

Audio Format

| Property    | Requirement                                      |
| ----------- | ------------------------------------------------ |
| Sample Rate | 16,000 Hz (16 kHz)                               |
| Channels    | Mono (single channel)                            |
| Format      | Float32Array with values in the range -1.0 to 1.0 |
| Duration    | Any length (auto-chunked internally)             |
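If your decoder hands you 16-bit integer PCM, possibly interleaved stereo, it has to be converted to mono Float32 in [-1.0, 1.0] before calling `transcribe()`. A minimal sketch, assuming the input is already sampled at 16 kHz (resampling is not shown) and using a plain channel average as the downmix; `toMonoFloat32` is a hypothetical helper, not part of the package:

```typescript
/**
 * Hypothetical helper: convert interleaved 16-bit PCM into mono Float32 samples
 * in [-1.0, 1.0] by averaging the channels. Assumes the input is already 16 kHz.
 */
function toMonoFloat32(pcm: Int16Array, channels: number): Float32Array {
  if (!Number.isInteger(channels) || channels < 1) {
    throw new RangeError("channels must be a positive integer")
  }
  if (pcm.length % channels !== 0) {
    throw new RangeError("sample count is not a multiple of the channel count")
  }
  const frames = pcm.length / channels
  const out = new Float32Array(frames)
  for (let frame = 0; frame < frames; frame++) {
    let sum = 0
    for (let ch = 0; ch < channels; ch++) {
      sum += pcm[frame * channels + ch]
    }
    // Dividing by 32768 maps -32768 to exactly -1.0 and 32767 to just under 1.0.
    out[frame] = sum / channels / 32768
  }
  return out
}

// Two stereo frames become two mono samples.
const mono = toMonoFloat32(new Int16Array([16384, 16384, -32768, 0]), 2)
console.log(Array.from(mono)) // [ 0.5, -0.5 ]
```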

Converting Audio Files

Example with ffmpeg:

ffmpeg -i input.mp3 -ar 16000 -ac 1 -f f32le output.pcm

Then load the raw PCM file:

import { readFileSync } from "node:fs"

const buffer = readFileSync("output.pcm")
// Each f32le sample is 4 bytes: view the file contents as 32-bit floats without copying
const samples = new Float32Array(buffer.buffer, buffer.byteOffset, buffer.length / 4)
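The view-based load above relies on the file length being a multiple of 4 bytes and on the Buffer's byte offset being 4-byte aligned. A slightly more defensive sketch copies the samples out through a `DataView`, which reads little-endian explicitly and works at any offset; `decodeF32le` is a hypothetical helper, not part of the package:

```typescript
/**
 * Hypothetical helper: decode raw 32-bit little-endian float PCM
 * (ffmpeg's `-f f32le` output) into a new Float32Array.
 * Copies the data, so the alignment of the input bytes does not matter.
 */
function decodeF32le(bytes: Uint8Array): Float32Array {
  if (bytes.byteLength % 4 !== 0) {
    throw new RangeError(`PCM byte length ${bytes.byteLength} is not a multiple of 4`)
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
  const samples = new Float32Array(bytes.byteLength / 4)
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getFloat32(i * 4, true) // true = little-endian
  }
  return samples
}

// Usage with the file produced by the ffmpeg command above
// (a Node Buffer is a Uint8Array, so it can be passed directly):
// const samples = decodeF32le(readFileSync("output.pcm"))
```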

CLI Commands

# Download the model (~1.5GB)
npx whisper-coreml download

# Check status
npx whisper-coreml status

# Run benchmark (requires cloned repo)
npx whisper-coreml benchmark

# Get model directory path
npx whisper-coreml path

API Reference

`WhisperAsrEngine`

The main class for speech recognition.

new WhisperAsrEngine(options: WhisperAsrOptions)

Options

| Option      | Type     | Default  | Description                       |
| ----------- | -------- | -------- | --------------------------------- |
| modelPath   | string   | required | Path to the ggml model file       |
| language    | string   | "auto"   | Language code or "auto" to detect |
| threads     | number   | 0        | CPU threads (0 = auto)            |

Methods

| Method                      | Description                                       |
| --------------------------- | ------------------------------------------------- |
| initialize()                | Load model (async)                                |
| transcribe(samples, rate)   | Transcribe audio (async, resolves to a TranscriptionResult) |
| isReady()                   | Check if engine is initialized                    |
| cleanup()                   | Release native resources                          |
| getVersion()                | Get version information                           |
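Because `cleanup()` releases native resources, it is worth calling it even when transcription throws. A minimal sketch of a try/finally wrapper; `AsrEngineLike` is a hypothetical structural type mirroring the lifecycle methods in the table above (the real `WhisperAsrEngine` typings may differ, for example in what `initialize()` resolves to), and `withEngine` is not part of the package:

```typescript
/** Hypothetical structural type covering the documented lifecycle methods. */
interface AsrEngineLike {
  initialize(): Promise<unknown>
  isReady(): boolean
  cleanup(): void | Promise<void>
}

/**
 * Initialize the engine, run `work`, and always call cleanup() afterwards.
 * If initialize() itself rejects, cleanup() is not called.
 */
async function withEngine<E extends AsrEngineLike, T>(
  engine: E,
  work: (engine: E) => Promise<T>,
): Promise<T> {
  await engine.initialize()
  try {
    if (!engine.isReady()) {
      throw new Error("engine is not ready after initialize()")
    }
    return await work(engine)
  } finally {
    await engine.cleanup()
  }
}

// Usage with the real engine (ESM, top-level await):
// const text = await withEngine(
//   new WhisperAsrEngine({ modelPath: getModelPath() }),
//   async (e) => (await e.transcribe(audioSamples, 16000)).text,
// )
```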

`TranscriptionResult`

interface TranscriptionResult {
  text: string // Full transcription
  language: string // Detected language (ISO code)
  durationMs: number // Processing time in milliseconds
  segments: TranscriptionSegment[]
}

interface TranscriptionSegment {
  startMs: number // Segment start in milliseconds
  endMs: number // Segment end in milliseconds
  text: string // Transcription for this segment
  confidence: number // Confidence score (0-1)
}
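Segment timestamps map directly onto subtitle formats. A minimal sketch that renders segments as SRT cues, assuming `startMs`/`endMs` are offsets from the start of the audio as documented above; `srtTimestamp` and `toSrt` are hypothetical helpers, not part of the package, and the segment interface is redeclared locally so the snippet stands alone:

```typescript
interface TranscriptionSegment {
  startMs: number
  endMs: number
  text: string
  confidence: number
}

/** Format milliseconds as an SRT timestamp: HH:MM:SS,mmm */
function srtTimestamp(ms: number): string {
  const total = Math.max(0, Math.round(ms))
  const hours = Math.floor(total / 3600000)
  const minutes = Math.floor((total % 3600000) / 60000)
  const seconds = Math.floor((total % 60000) / 1000)
  const millis = total % 1000
  const pad = (n: number, width: number) => String(n).padStart(width, "0")
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(millis, 3)}`
}

/** Render segments as numbered SRT cues separated by blank lines. */
function toSrt(segments: TranscriptionSegment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${srtTimestamp(seg.startMs)} --> ${srtTimestamp(seg.endMs)}\n${seg.text.trim()}\n`)
    .join("\n")
}

console.log(toSrt([
  { startMs: 0, endMs: 2500, text: " Hello, this is a test transcription.", confidence: 0.9 },
]))
```

Writing the returned string to a `.srt` file next to the audio is usually enough for video players to pick it up.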

Helper Functions

| Function               | Description                            |
| ---------------------- | -------------------------------------- |
| isAvailable()          | Check if running on supported platform |
| getDefaultModelDir()   | Get default model cache path           |
| getModelPath()         | Get path to the model file             |
| isModelDownloaded()    | Check if model is downloaded           |
| downloadModel()        | Download the model                     |

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Your Node.js App                     │
├─────────────────────────────────────────────────────────┤
│                  whisper-coreml API                     │  TypeScript
├─────────────────────────────────────────────────────────┤
│                   Native Addon                          │  N-API + C++
│                  (whisper_engine)                       │
├─────────────────────────────────────────────────────────┤
│                    whisper.cpp                          │  C++
├─────────────────────────────────────────────────────────┤
│                      CoreML                             │  Apple Framework
├─────────────────────────────────────────────────────────┤
│                 Apple Neural Engine                     │  Dedicated ML Silicon
└─────────────────────────────────────────────────────────┘

Use Cases

Maximum accuracy – When other solutions aren't good enough
Rare languages – 99 languages, far beyond English/European
Accented speech – Whisper handles accents and dialects well
Noisy audio – Robust to background noise and music

Contributing

Contributions are welcome! Please read our Contributing Guide for details.

License

MIT – see LICENSE for details.

Credits

whisper.cpp by Georgi Gerganov
OpenAI Whisper by OpenAI
