mac-livetext v1.0.0
Apple's VisionKit LiveText OCR for Node.js and Bun on macOS with per-character bounding boxes.
This library provides a simple TypeScript/JavaScript API to Apple's state-of-the-art LiveText OCR technology, which powers text recognition in Photos, Safari, and other Apple apps. It extracts text with precise character-level bounding boxes from images.
Features
- ✅ Apple's LiveText OCR: Uses VisionKit's ImageAnalyzer for best-in-class text recognition
- ✅ Character-level bounding boxes: Get precise coordinates for each recognized character
- ✅ Multi-language support: Excellent recognition for English, Chinese, Japanese, and many other languages
- ✅ Zero build required: Precompiled binaries included - no Xcode or build tools needed
- ✅ Node.js & Bun compatible: Works with both Node.js 18+ and Bun
- ✅ TypeScript support: Full type definitions included
- ✅ Normalized coordinates: Optional normalized [0,1] coordinate system
Requirements
- macOS 14.0+ (Sonoma or later)
- Node.js 18+ or Bun
- Apple Silicon (M1/M2/M3) or Intel Mac
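If your code may also run off-macOS or on older systems, you can guard calls at runtime. A minimal sketch using Node's `os` module (`isLiveTextSupported` is a hypothetical helper, not part of this package; it relies on macOS 14 "Sonoma" shipping Darwin kernel 23.x):

```ts
import { release } from "os";

// Hypothetical guard: macOS 14 (Sonoma) corresponds to Darwin kernel 23,
// so a Darwin major version >= 23 implies a supported OS.
export function isLiveTextSupported(): boolean {
  if (process.platform !== "darwin") return false;
  const darwinMajor = Number.parseInt(release().split(".")[0], 10);
  return darwinMajor >= 23;
}
```

Checking up front lets you fall back to another OCR engine instead of catching the library's macOS-version error.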
Installation
```sh
npm install mac-livetext
```

Quick Start
```ts
import ocr from "mac-livetext";

// Simple text extraction
const result = await ocr.livetextFromImage("path/to/image.png");
console.log("Text:", result.text);

// With character bounding boxes
const detailed = await ocr.livetextFromImage("path/to/image.png", {
  normalizedBoxes: true,
});
console.log("Text:", detailed.text);
console.log("Characters:", detailed.characters.length);

// Print each character with its position
detailed.characters.forEach((char, i) => {
  console.log(`${i}: '${char.char}' at [${char.box.x}, ${char.box.y}]`);
});
```

API Reference
livetextFromImage(input, options?)
Performs OCR on an image and returns recognized text with character bounding boxes.
Parameters:

- `input`: `string | URL | Blob | ArrayBuffer | Uint8Array` - file path, `file://` URL, or raw image data
- `options`: `LiveTextOptions` (optional)
  - `normalizedBoxes?: boolean` - return coordinates in the [0, 1] range instead of pixels
  - `timeoutMs?: number` - timeout in milliseconds (default: 15000)

Returns: `Promise<LiveTextRecognizeResult>`
```ts
interface LiveTextRecognizeResult {
  text: string;                    // Full recognized text
  characters: LiveTextCharacter[]; // Per-character results
}

interface LiveTextCharacter {
  char: string;     // The character
  box: BoundingBox; // Position and size
}

interface BoundingBox {
  x: number;      // Left coordinate
  y: number;      // Top coordinate (top-left origin)
  width: number;  // Character width
  height: number; // Character height
}
```

Examples
Basic Usage
```ts
import ocr from "mac-livetext";

const result = await ocr.livetextFromImage("screenshot.png");
console.log(result.text);
```

Character-level Analysis
```ts
import ocr from "mac-livetext";

const result = await ocr.livetextFromImage("document.jpg", {
  normalizedBoxes: true,
});

// Find all digits in the image
const numbers = result.characters.filter(c => /\d/.test(c.char));
console.log("Found numbers:", numbers.map(c => c.char).join(""));

// Get text lines by clustering characters whose vertical centers are close.
// groupCharactersByLine is not part of the library; this is one simple
// approach (the tolerance assumes normalized coordinates):
function groupCharactersByLine(chars: typeof result.characters, tolerance = 0.02): string[] {
  const lines: { cy: number; chars: typeof result.characters }[] = [];
  for (const c of chars) {
    const cy = c.box.y + c.box.height / 2;
    const line = lines.find(l => Math.abs(l.cy - cy) < tolerance);
    if (line) line.chars.push(c);
    else lines.push({ cy, chars: [c] });
  }
  return lines.map(l => l.chars.map(ch => ch.char).join(""));
}

const lines = groupCharactersByLine(result.characters);
console.log("Text lines:", lines);
```

Processing Multiple Images
```ts
import ocr from "mac-livetext";
import { readdir } from "fs/promises";

const imageFiles = await readdir("./images");
const results = await Promise.all(
  imageFiles
    .filter(f => f.match(/\.(jpg|png|heic)$/i))
    .map(async file => ({
      file,
      result: await ocr.livetextFromImage(`./images/${file}`),
    }))
);

results.forEach(({ file, result }) => {
  console.log(`${file}: ${result.text}`);
});
```

Using with Image Buffers
```ts
import ocr from "mac-livetext";
import { readFile } from "fs/promises";

// From a file buffer
const imageBuffer = await readFile("image.png");
const result = await ocr.livetextFromImage(imageBuffer);

// From a fetched Blob (Node.js 18+ or Bun)
const response = await fetch("https://example.com/image.jpg");
const blob = await response.blob();
const result2 = await ocr.livetextFromImage(blob);
```

Command Line Usage
The package also includes a CLI tool:
```sh
# Using npx
npx mac-livetext image.png

# Or install globally
npm install -g mac-livetext
mac-livetext image.png --normalized

# CLI options
mac-livetext image.png --normalized --timeout 30000
```

Performance
- Cold start: ~1-2 seconds (includes process startup)
- Warm performance: ~0.5-1 seconds per image
- Memory usage: ~50-100MB during processing
- Supported formats: PNG, JPEG, HEIC, and other common image formats
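Because the first call pays a cold-start cost, it can be useful to measure cold and warm latency separately when profiling your own workload. A small generic timing wrapper (`timed` is a hypothetical helper, not part of the package):

```ts
// Hypothetical helper: wraps any async call and logs its wall-clock time,
// useful for separating cold-start cost from warm per-image cost.
async function timed<T>(label: string, fn: () => Promise<T>): Promise<T> {
  const t0 = performance.now();
  try {
    return await fn();
  } finally {
    console.log(`${label}: ${(performance.now() - t0).toFixed(0)} ms`);
  }
}
```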
How it Works
This library uses a hybrid approach for maximum reliability:
- Native Swift CLI: The core OCR is implemented in Swift using VisionKit's ImageAnalyzer
- Node.js Integration: A TypeScript wrapper manages the CLI process and handles I/O
- Process Isolation: Running OCR in a separate process prevents crashes and memory leaks
- Universal Binaries: Precompiled for both Apple Silicon and Intel Macs
The Swift implementation directly calls Apple's VisionKit framework, ensuring you get the same high-quality OCR that powers Apple's own apps.
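The spawn-and-parse pattern described above can be sketched roughly as follows. This is illustrative only: the binary path, arguments, and JSON-on-stdout protocol here are assumptions, not the package's actual contract.

```ts
import { execFile } from "child_process";
import { promisify } from "util";

const execFileAsync = promisify(execFile);

// Illustrative sketch: run an isolated helper binary that is assumed to
// print a JSON result on stdout, with a hard timeout so a hung process
// cannot block the caller.
async function runJsonCli(
  binary: string,
  args: string[],
  timeoutMs = 15000
): Promise<unknown> {
  const { stdout } = await execFileAsync(binary, args, { timeout: timeoutMs });
  return JSON.parse(stdout);
}
```

Keeping the native work in a child process means a crash in the OCR engine surfaces as a rejected promise rather than taking down the Node.js process.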
Troubleshooting
"Command not found" or Permission Errors
The CLI binary should be executable after installation. If you encounter permission issues:
```sh
chmod +x node_modules/mac-livetext/src/native/objc/livetext-cli
```

macOS Version Compatibility
This library requires macOS 14.0+. On older versions, you'll get an error:
```
VisionKit Live Text requires macOS 14+
```

Development
Building from Source
```sh
git clone https://github.com/adambarbato/mac-livetext.git
cd mac-livetext
npm install
npm run build:native
```

Running Examples
```sh
npm run example path/to/your/image.png
```

Related Projects
- ocrmac - Python bindings that inspired this project
- mac-system-ocr - Implements JS bindings for the older Apple 'Vision' API
License
MIT
