mac-livetext v1.0.0
Apple's VisionKit LiveText OCR for Node.js and Bun on macOS with per-character bounding boxes.
This library provides a simple TypeScript/JavaScript API to Apple's state-of-the-art LiveText OCR technology, which powers text recognition in Photos, Safari, and other Apple apps. It extracts text with precise character-level bounding boxes from images.
Features
- ✅ Apple's LiveText OCR: Uses VisionKit's ImageAnalyzer for best-in-class text recognition
- ✅ Character-level bounding boxes: Get precise coordinates for each recognized character
- ✅ Multi-language support: Excellent recognition for English, Chinese, Japanese, and many other languages
- ✅ Zero build required: Precompiled binaries included - no Xcode or build tools needed
- ✅ Node.js & Bun compatible: Works with both Node.js 18+ and Bun
- ✅ TypeScript support: Full type definitions included
- ✅ Normalized coordinates: Optional normalized [0,1] coordinate system
Requirements
- macOS 14.0+ (Sonoma or later)
- Node.js 18+ or Bun
- Apple Silicon (M1/M2/M3) or Intel Mac
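If your code may also run off-macOS or on older systems, you can guard calls at runtime. A minimal sketch using Node's `os` module (`isLiveTextSupported` is a hypothetical helper, not part of this package; it relies on macOS 14 "Sonoma" shipping Darwin kernel 23.x):

```ts
import { release } from "os";

// Hypothetical guard: macOS 14 (Sonoma) corresponds to Darwin kernel 23,
// so a Darwin major version >= 23 implies a supported OS.
export function isLiveTextSupported(): boolean {
  if (process.platform !== "darwin") return false;
  const darwinMajor = Number.parseInt(release().split(".")[0], 10);
  return darwinMajor >= 23;
}
```

Checking up front lets you fall back to another OCR engine instead of catching the library's macOS-version error.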
Installation
```sh
npm install mac-livetext
```

Quick Start
```ts
import ocr from "mac-livetext";

// Simple text extraction
const result = await ocr.livetextFromImage("path/to/image.png");
console.log("Text:", result.text);

// With character bounding boxes
const detailed = await ocr.livetextFromImage("path/to/image.png", {
  normalizedBoxes: true,
});
console.log("Text:", detailed.text);
console.log("Characters:", detailed.characters.length);

// Print each character with its position
detailed.characters.forEach((char, i) => {
  console.log(`${i}: '${char.char}' at [${char.box.x}, ${char.box.y}]`);
});
```

API Reference
livetextFromImage(input, options?)
Performs OCR on an image and returns recognized text with character bounding boxes.
Parameters:

- `input`: `string | URL | Blob | ArrayBuffer | Uint8Array` - file path, `file://` URL, or raw image data
- `options`: `LiveTextOptions` (optional)
  - `normalizedBoxes?: boolean` - return coordinates in the [0, 1] range instead of pixels
  - `timeoutMs?: number` - timeout in milliseconds (default: 15000)

Returns: `Promise<LiveTextRecognizeResult>`
```ts
interface LiveTextRecognizeResult {
  text: string;                    // Full recognized text
  characters: LiveTextCharacter[]; // Per-character results
}

interface LiveTextCharacter {
  char: string;     // The character
  box: BoundingBox; // Position and size
}

interface BoundingBox {
  x: number;      // Left coordinate
  y: number;      // Top coordinate (top-left origin)
  width: number;  // Character width
  height: number; // Character height
}
```

Examples
Basic Usage
```ts
import ocr from "mac-livetext";

const result = await ocr.livetextFromImage("screenshot.png");
console.log(result.text);
```

Character-level Analysis
```ts
import ocr from "mac-livetext";

const result = await ocr.livetextFromImage("document.jpg", {
  normalizedBoxes: true,
});

// Find all digits in the image
const numbers = result.characters.filter(c => /\d/.test(c.char));
console.log("Found numbers:", numbers.map(c => c.char).join(""));

// Get text lines by clustering characters whose vertical centers are close.
// groupCharactersByLine is not part of the library; this is one simple
// approach (the tolerance assumes normalized coordinates):
function groupCharactersByLine(chars: typeof result.characters, tolerance = 0.02): string[] {
  const lines: { cy: number; chars: typeof result.characters }[] = [];
  for (const c of chars) {
    const cy = c.box.y + c.box.height / 2;
    const line = lines.find(l => Math.abs(l.cy - cy) < tolerance);
    if (line) line.chars.push(c);
    else lines.push({ cy, chars: [c] });
  }
  return lines.map(l => l.chars.map(ch => ch.char).join(""));
}

const lines = groupCharactersByLine(result.characters);
console.log("Text lines:", lines);
```

Processing Multiple Images
```ts
import ocr from "mac-livetext";
import { readdir } from "fs/promises";

const imageFiles = await readdir("./images");
const results = await Promise.all(
  imageFiles
    .filter(f => f.match(/\.(jpg|png|heic)$/i))
    .map(async file => ({
      file,
      result: await ocr.livetextFromImage(`./images/${file}`),
    }))
);

results.forEach(({ file, result }) => {
  console.log(`${file}: ${result.text}`);
});
```

Using with Image Buffers
```ts
import ocr from "mac-livetext";
import { readFile } from "fs/promises";

// From a file buffer
const imageBuffer = await readFile("image.png");
const result = await ocr.livetextFromImage(imageBuffer);

// From a fetched Blob (Node.js 18+ or Bun)
const response = await fetch("https://example.com/image.jpg");
const blob = await response.blob();
const result2 = await ocr.livetextFromImage(blob);
```

Command Line Usage
The package also includes a CLI tool:
```sh
# Using npx
npx mac-livetext image.png

# Or install globally
npm install -g mac-livetext
mac-livetext image.png --normalized

# CLI options
mac-livetext image.png --normalized --timeout 30000
```

Performance
- Cold start: ~1-2 seconds (includes process startup)
- Warm performance: ~0.5-1 seconds per image
- Memory usage: ~50-100MB during processing
- Supported formats: PNG, JPEG, HEIC, and other common image formats
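Because the first call pays a cold-start cost, it can be useful to measure cold and warm latency separately when profiling your own workload. A small generic timing wrapper (`timed` is a hypothetical helper, not part of the package):

```ts
// Hypothetical helper: wraps any async call and logs its wall-clock time,
// useful for separating cold-start cost from warm per-image cost.
async function timed<T>(label: string, fn: () => Promise<T>): Promise<T> {
  const t0 = performance.now();
  try {
    return await fn();
  } finally {
    console.log(`${label}: ${(performance.now() - t0).toFixed(0)} ms`);
  }
}
```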
How it Works
This library uses a hybrid approach for maximum reliability:
- Native Swift CLI: The core OCR is implemented in Swift using VisionKit's ImageAnalyzer
- Node.js Integration: A TypeScript wrapper manages the CLI process and handles I/O
- Process Isolation: Running OCR in a separate process prevents crashes and memory leaks
- Universal Binaries: Precompiled for both Apple Silicon and Intel Macs
The Swift implementation directly calls Apple's VisionKit framework, ensuring you get the same high-quality OCR that powers Apple's own apps.
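The spawn-and-parse pattern described above can be sketched roughly as follows. This is illustrative only: the binary path, arguments, and JSON-on-stdout protocol here are assumptions, not the package's actual contract.

```ts
import { execFile } from "child_process";
import { promisify } from "util";

const execFileAsync = promisify(execFile);

// Illustrative sketch: run an isolated helper binary that is assumed to
// print a JSON result on stdout, with a hard timeout so a hung process
// cannot block the caller.
async function runJsonCli(
  binary: string,
  args: string[],
  timeoutMs = 15000
): Promise<unknown> {
  const { stdout } = await execFileAsync(binary, args, { timeout: timeoutMs });
  return JSON.parse(stdout);
}
```

Keeping the native work in a child process means a crash in the OCR engine surfaces as a rejected promise rather than taking down the Node.js process.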
Troubleshooting
"Command not found" or Permission Errors
The CLI binary should be executable after installation. If you encounter permission issues:
```sh
chmod +x node_modules/mac-livetext/src/native/objc/livetext-cli
```

macOS Version Compatibility
This library requires macOS 14.0+. On older versions, you'll get an error:
```
VisionKit Live Text requires macOS 14+
```

Development
Building from Source
```sh
git clone https://github.com/adambarbato/mac-livetext.git
cd mac-livetext
npm install
npm run build:native
```

Running Examples
```sh
npm run example path/to/your/image.png
```

Related Projects
- ocrmac - Python bindings that inspired this project
- mac-system-ocr - Implements JS bindings for the older Apple 'Vision' API
License
MIT
