parakeet-coreml

NVIDIA Parakeet TDT ASR for Node.js with CoreML/ANE acceleration on Apple Silicon (v2.2.0). Powered by NVIDIA's Parakeet model running on Apple's Neural Engine via CoreML.

Why parakeet-coreml?

Modern Macs contain a powerful Neural Engine (ANE) – dedicated silicon for machine learning that often sits idle. This library puts it to work for speech recognition, delivering real-time transcription without cloud dependencies.

The Problem with Alternatives

| Approach | Drawbacks |
| ------------------------------------ | ----------------------------------------------------------------- |
| Cloud APIs (OpenAI, Google, AWS) | Privacy concerns, ongoing costs, latency, requires internet |
| Whisper.cpp | CPU-bound, significantly slower on Apple Silicon |
| Python solutions | Requires Python runtime, complex deployment, subprocess overhead |
| Electron + subprocess | Memory overhead, IPC latency, complex architecture |

Our Solution

parakeet-coreml is a native Node.js addon that directly interfaces with CoreML. No Python. No subprocess. No cloud. Just fast, private speech recognition leveraging the full power of Apple Silicon.

Features

  • 🚀 40x real-time – Transcribe 1 hour of audio in 90 seconds (M1 Ultra, measured)
  • 🍎 Neural Engine Acceleration – Runs on Apple's dedicated ML silicon, not CPU
  • 🔒 Fully Offline – All processing happens locally. Your audio never leaves your device.
  • 📦 Zero Runtime Dependencies – No Python, no subprocess, no external services
  • 🎯 Smart Voice Detection – Built-in VAD automatically segments long recordings
  • 🌍 Multilingual – English and major European languages (German, French, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Ukrainian, and more)
  • ⬇️ Automatic Setup – Models download on first use. Just npm install and go.

Performance

The Apple Neural Engine delivers exceptional speech recognition performance:

Measured: M1 Ultra

  • 5 minutes of audio → 7.7 seconds
  • Speed: 40x real-time
  • 1 hour of audio in 90 seconds

Run your own benchmark:

git clone https://github.com/sebastian-software/parakeet-coreml
cd parakeet-coreml && pnpm install && pnpm benchmark

Estimated Performance by Chip

Based on Neural Engine TOPS (tera operations per second):

| Chip | ANE TOPS | Estimated Speed |
| -------- | -------- | --------------- |
| M4 Pro | 38 | 70x real-time |
| M3 Pro | 18 | 35x real-time |
| M2 Pro | 16 | 30x real-time |
| M1 Ultra | 22 | 40x (measured) |
| M1 Pro | 11 | 20x real-time |

Performance scales roughly with Neural Engine compute. Ultra variants have 2x ANE cores. Results may vary based on thermal conditions and system load.

Use Cases

  • Meeting transcription – Process recordings without uploading to third-party services
  • Podcast production – Generate transcripts for show notes and accessibility
  • Voice interfaces – Build voice-controlled applications with predictable latency
  • Content indexing – Make audio/video content searchable
  • Accessibility tools – Real-time captioning for the hearing impaired
  • Privacy-sensitive applications – Healthcare, legal, finance – where data cannot leave the device

Requirements

  • macOS 14.0+ (Sonoma or later)
  • Apple Silicon (M1, M2, M3, M4 – any variant)
  • Node.js 20+

Installation

npm install parakeet-coreml

The native addon compiles during installation. Xcode Command Line Tools are required.

Quick Start

import { ParakeetAsrEngine } from "parakeet-coreml"

const engine = new ParakeetAsrEngine()

// First run downloads models (cached for future use)
await engine.initialize()

// Transcribe audio of ANY length (16kHz, mono, Float32Array)
const result = await engine.transcribe(audioSamples)

console.log(result.text)
// "Hello, this is a test transcription."

console.log(`Processed in ${result.durationMs}ms`)

// Every result includes timestamps
for (const seg of result.segments) {
  console.log(`[${seg.startTime}s] ${seg.text}`)
}

engine.cleanup()

That's it. No API keys. No configuration. No internet required after the initial model download. No length limits – audio of any duration is automatically handled.
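
To relate the performance numbers above to your own runs, you can compute the real-time factor from the Quick Start variables. A small illustrative snippet; audioSamples and result come from the example above:

// Real-time factor = audio duration / processing time
const audioSeconds = audioSamples.length / 16000        // 16 kHz mono samples
const realTimeFactor = audioSeconds / (result.durationMs / 1000)
console.log(`${realTimeFactor.toFixed(1)}x real-time`)  // e.g. 3600 s / 90 s = 40x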

Audio Format

| Property | Requirement |
| ----------- | ---------------------------------------------- |
| Sample Rate | 16,000 Hz (16 kHz) |
| Channels | Mono (single channel) |
| Format | Float32Array with values between -1.0 and 1.0 |
| Duration | Any length |

Voice Activity Detection (VAD) automatically finds speech segments and provides timestamps. The result always includes segments with timing information – useful for subtitles, search indexing, or speaker diarization.
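
These timestamps make it straightforward to emit subtitle files. Below is a minimal sketch that converts segments into SRT, using the segment fields documented in the API Reference; the srtTime and toSrt helpers are illustrative, not part of the library:

// Format seconds as an SRT timestamp (HH:MM:SS,mmm)
function srtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000)
  const pad = (n: number, width = 2) => String(n).padStart(width, "0")
  const h = Math.floor(totalMs / 3_600_000)
  const m = Math.floor((totalMs % 3_600_000) / 60_000)
  const s = Math.floor((totalMs % 60_000) / 1000)
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(totalMs % 1000, 3)}`
}

// Turn transcription segments into an SRT subtitle document
function toSrt(segments: { startTime: number; endTime: number; text: string }[]): string {
  return segments
    .map((seg, i) => `${i + 1}\n${srtTime(seg.startTime)} --> ${srtTime(seg.endTime)}\n${seg.text.trim()}\n`)
    .join("\n")
}

// Usage: const srt = toSrt(result.segments)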

Converting Audio Files

This library processes raw PCM samples, not audio files directly. You'll need to decode your audio files before transcription. Common approaches:

  • ffmpeg – Convert any audio/video format to raw PCM
  • node-wav – Parse WAV files in Node.js
  • Web Audio API – Decode audio in browser/Electron environments

Example with ffmpeg (CLI):

ffmpeg -i input.mp3 -ar 16000 -ac 1 -f f32le output.pcm

Then load the raw PCM file:

import { readFileSync } from "fs"

const buffer = readFileSync("output.pcm")
const samples = new Float32Array(buffer.buffer, buffer.byteOffset, buffer.length / 4)
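
If you prefer to decode in a single step from Node.js, here is a minimal sketch that shells out to ffmpeg and returns samples ready for transcribe(). It assumes ffmpeg is on your PATH; the decodeToSamples helper is illustrative, not part of this library:

import { execFileSync } from "node:child_process"

// Decode any audio/video file to 16 kHz mono Float32Array via ffmpeg
function decodeToSamples(inputPath: string): Float32Array {
  const pcm = execFileSync(
    "ffmpeg",
    ["-i", inputPath, "-ar", "16000", "-ac", "1", "-f", "f32le", "pipe:1"],
    { maxBuffer: 1024 * 1024 * 1024 } // allow up to 1 GB of decoded PCM
  )
  return new Float32Array(pcm.buffer, pcm.byteOffset, pcm.length / 4)
}

const samples = decodeToSamples("input.mp3")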

Model Management

Models are automatically downloaded on first use:

  • ASR models (~1.5GB) → ~/.cache/parakeet-coreml/models
  • VAD model (~1MB) → ~/.cache/parakeet-coreml/vad

CLI Commands

# Download all models (~1.5GB)
npx parakeet-coreml download

# Run benchmark
npx parakeet-coreml benchmark

# Check status
npx parakeet-coreml status

# Force re-download
npx parakeet-coreml download --force

Custom Configuration

// Use custom model directories
const engine = new ParakeetAsrEngine({
  modelDir: "./my-models",
  vadDir: "./my-vad-model"
})

// Disable auto-download (for controlled environments)
const offlineEngine = new ParakeetAsrEngine({
  autoDownload: false // Will throw if models are not present
})

API Reference

ParakeetAsrEngine

The main class for speech recognition.

new ParakeetAsrEngine(options?: AsrEngineOptions)

Options

| Option | Type | Default | Description |
| -------------- | --------- | --------------------------------- | ------------------------------- |
| modelDir | string | ~/.cache/parakeet-coreml/models | Path to ASR model directory |
| vadDir | string | ~/.cache/parakeet-coreml/vad | Path to VAD model directory |
| autoDownload | boolean | true | Auto-download models if missing |

Methods

| Method | Description |
| ---------------------------- | --------------------------------- |
| initialize() | Load models (downloads if needed) |
| transcribe(samples, opts?) | Transcribe audio of any length |
| isReady() | Check if engine is initialized |
| cleanup() | Release native resources |
| getVersion() | Get version information |

TranscriptionResult

interface TranscriptionResult {
  text: string // Combined transcription
  durationMs: number // Processing time in milliseconds
  segments: TranscribedSegment[] // Speech segments with timestamps
}

interface TranscribedSegment {
  startTime: number // Segment start in seconds
  endTime: number // Segment end in seconds
  text: string // Transcription for this segment
}

TranscribeOptions

interface TranscribeOptions {
  sampleRate?: number // Default: 16000
  vadThreshold?: number // Speech detection sensitivity (0-1), default: 0.5
  minSilenceDurationMs?: number // Pause length to split, default: 300
  minSpeechDurationMs?: number // Minimum segment length, default: 250
}
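
For example, to split less eagerly on recordings with long pauses, the documented options can be combined like this (the values are illustrative, not recommendations):

const result = await engine.transcribe(audioSamples, {
  vadThreshold: 0.6,          // require stronger evidence of speech
  minSilenceDurationMs: 500,  // only split on pauses of 500 ms or longer
  minSpeechDurationMs: 250    // keep the default minimum segment length
})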

Helper Functions

| Function | Description |
| ----------------------- | --------------------------------------- |
| isAvailable() | Check if running on supported platform |
| getDefaultModelDir() | Get default ASR model cache path |
| areModelsDownloaded() | Check if ASR models are present |
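
A typical preflight check might look like the following; this sketch assumes the helpers are named exports from the package root, alongside ParakeetAsrEngine:

import { isAvailable, areModelsDownloaded, getDefaultModelDir } from "parakeet-coreml"

// Bail out early on unsupported platforms
if (!isAvailable()) {
  throw new Error("parakeet-coreml requires Apple Silicon and macOS 14+")
}

// Warn if the ~1.5GB ASR models still need to be downloaded
if (!areModelsDownloaded()) {
  console.log(`Models not found in ${getDefaultModelDir()}; initialize() will download them`)
}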

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Your Node.js App                     │
├─────────────────────────────────────────────────────────┤
│                  parakeet-coreml API                    │  TypeScript
├─────────────────────────────────────────────────────────┤
│          ASR Engine          │       VAD Engine         │  N-API + Objective-C++
│      (Parakeet TDT v3)       │      (Silero VAD)        │
├─────────────────────────────────────────────────────────┤
│                      CoreML                             │  Apple Framework
├─────────────────────────────────────────────────────────┤
│                 Apple Neural Engine                     │  Dedicated ML Silicon
└─────────────────────────────────────────────────────────┘

The library bridges Node.js directly to Apple's CoreML framework via a native N-API addon written in Objective-C++. Both ASR and VAD models run on the Neural Engine:

  1. VAD detects speech segments with timestamps
  2. ASR transcribes each segment (splitting at 15s if needed)
  3. Results are combined with full timing information

This eliminates subprocess overhead and Python interop, resulting in minimal latency and efficient memory usage.

Contributing

Contributions are welcome! Please read our Contributing Guide for details on:

  • Development setup
  • Code style guidelines
  • Pull request process

License

MIT – see LICENSE for details.

Credits


Copyright © 2026 Sebastian Software GmbH, Mainz, Germany