mac-ocr

v1.0.0

Published

12 days ago

macOS CLI for OCR and searchable PDFs using Apple's Vision framework. Recognize text in images and PDFs, stream PDF pages, and add an invisible selectable text layer over scanned pages.


hirokiosame

mac macos ocr text-recognition apple-vision vision pdf searchable-pdf ocr-pdf scanned-pdf cli native swift

[!TIP] Useful for AI agents too: instead of spending vision tokens reading documents, an agent can run mac-ocr locally for free. A skill is bundled so agents know how to use it.

Features

Read text from an image: mac-ocr photo.png
Read text from many images: mac-ocr *.png
Stream text from a PDF, page by page: mac-ocr scan.pdf --format jsonl
Turn an image into a searchable PDF: mac-ocr searchable-pdf photo.png → photo.ocr.pdf
Add a selectable text layer to a scanned PDF: mac-ocr searchable-pdf scan.pdf → scan.ocr.pdf

Install

npm install -g mac-ocr

Or run it without installing:

npx mac-ocr receipt.jpg

Requirements: macOS 10.15+. The npm package ships a prebuilt universal binary, so no Xcode or Swift toolchain is needed.

Recognize text

OCR is the default action — you don't need a subcommand:

mac-ocr receipt.jpg                 # text → stdout
mac-ocr page1.png page2.png         # multiple images
mac-ocr scan.pdf                    # multi-page PDF
cat screenshot.png | mac-ocr        # stdin
mac-ocr https://example.com/a.png   # URL (simple GET)

Default output is plain text. Use JSON when you need bounding boxes, confidence, or page metadata:

mac-ocr receipt.jpg --format json
mac-ocr document.pdf --format jsonl   # one JSON object per page, streamed

PDF pages stream as they're recognized, so with a large document you see the first page's text right away.
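To consume that stream from a script, the JSONL text has to be split into lines as chunks arrive, because a chunk can end in the middle of a page's JSON object. The following is a minimal sketch of such a line buffer. `createJsonlParser` is a hypothetical helper, not part of mac-ocr, and the `page` and `text` field names in the sample are an assumption borrowed from the Node API shown later; docs/CLI.md has the real schema.

```javascript
// Incrementally parse JSONL text that arrives in arbitrary chunks.
// Hypothetical helper: returns { push, flush }, each giving an array of parsed objects.
function createJsonlParser() {
  let buffer = ''
  const parseLines = (lines) =>
    lines.filter((line) => line.trim() !== '').map((line) => JSON.parse(line))
  return {
    // Feed one chunk; get back every object whose line is now complete.
    push(chunk) {
      buffer += chunk
      const lines = buffer.split('\n')
      buffer = lines.pop() // the last piece may be an unfinished line
      return parseLines(lines)
    },
    // Call once at end of input to parse a final line that lacks a trailing newline.
    flush() {
      const rest = buffer
      buffer = ''
      return parseLines([rest])
    },
  }
}

// Made-up sample: a two-page stream split mid-line, as a pipe might deliver it.
// The field names are assumed, not taken from the documented schema.
const parser = createJsonlParser()
const first = parser.push('{"page":1,"text":"Hello"}\n{"page":2,')
const second = parser.push('"text":"World"}\n')
console.log(first.length, second[0].text) // prints: 1 World
```

In a real script the chunks would come from the stdout of a spawned `mac-ocr scan.pdf --format jsonl` process, so page 1 can be handled while later pages are still being recognized.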

Save text to files

mac-ocr ~/Screenshots/*.png -o '[dir]/[name].txt'   # a .txt next to each image
mac-ocr scan.pdf -o notes.md                        # recognized text to a chosen .txt/.md file
mac-ocr receipts/*.pdf -o out/                      # one file per input in out/
grep -rli "invoice" ~/Screenshots                    # then search with normal tools

-o takes a file, a directory (out/), or a filename template built from the placeholders [name], [ext], [dir], and [page]. Quote templates, since […] is a glob pattern in zsh. Whatever the extension, the content is the plain recognized text.
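To make the template behavior concrete, here is a rough sketch of how the two placeholders demonstrated above could expand. `expandTemplate` is a hypothetical function written for illustration, not mac-ocr's implementation. It covers only [dir] and [name], since the exact semantics of [ext] and [page] are not spelled out in this section.

```javascript
// Illustrative only: mimics the two placeholders demonstrated above (POSIX paths).
// [dir]  -> directory of the input file
// [name] -> input file name without its last extension
function expandTemplate(template, inputPath) {
  const slash = inputPath.lastIndexOf('/')
  const dir = slash === -1 ? '.' : inputPath.slice(0, slash) || '/'
  const base = inputPath.slice(slash + 1)
  const dot = base.lastIndexOf('.')
  const name = dot > 0 ? base.slice(0, dot) : base // keep dotfiles like ".env" whole
  // Function replacements avoid special handling of "$" in paths.
  return template.replaceAll('[dir]', () => dir).replaceAll('[name]', () => name)
}

console.log(expandTemplate('[dir]/[name].txt', '/Users/me/Screenshots/shot.png'))
// prints: /Users/me/Screenshots/shot.txt
console.log(expandTemplate('[name]-ocr.pdf', 'scan.pdf'))
// prints: scan-ocr.pdf
```

Because each input expands to its own path, a template works with many inputs where a fixed file name would not.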

Create a searchable PDF

searchable-pdf takes a PDF or an image and writes a PDF that looks identical to the source but whose text is selectable and searchable. By default it writes [name].ocr.pdf next to each input — one searchable PDF per input (inputs are never merged):

mac-ocr searchable-pdf scan.pdf            # writes scan.ocr.pdf
mac-ocr searchable-pdf photo.jpg            # image → one-page photo.ocr.pdf
mac-ocr searchable-pdf *.pdf                # writes <name>.ocr.pdf for each

Use -o to control the destination — a directory, a [name] template, a fixed file, or - for stdout:

mac-ocr searchable-pdf scan.pdf -o out/              # out/scan.ocr.pdf
mac-ocr searchable-pdf scan.pdf -o '[name]-ocr.pdf'  # scan-ocr.pdf
mac-ocr searchable-pdf scan.pdf -o searchable.pdf    # fixed path
mac-ocr searchable-pdf scan.pdf -o - > out.pdf       # stdout (redirect to a file other than the input, or the shell truncates the input first)

A fixed path or - (stdout) takes a single input; for multiple inputs use a directory or a [name] template.

Pages that already have selectable text are skipped — only scanned pages get OCR. A PDF that needs no OCR at all passes through unchanged. To OCR every page regardless, pass --ocr-all-pages. The finer points (what survives a rewrite, how "already has text" is decided) are in docs/CLI.md.

In an interactive terminal you get a live [page/total] progress counter. Piped or redirected runs are silent on success, so scripts stay clean.

Options

Both OCR and searchable-pdf accept the recognition options:

| Flag | Effect |
|------|--------|
| --fast | Faster, lower-accuracy recognition (details) |
| --password <password> | Password for an encrypted PDF (or set MAC_OCR_PDF_PASSWORD) |
| -l, --language <code> | Recognition language (BCP-47, repeatable). e.g. -l en-US -l ja-JP |
| -c, --confidence <0–1> | Drop observations below this confidence |
| -w, --custom-words <word> | Add custom vocabulary (repeatable) |
| --custom-words-file <path> | Custom vocabulary file, one word per line |
| --no-language-correction | Disable language correction |
| --min-text-height <0–1> | Ignore text shorter than this fraction of image height |
| --pdf-dpi <auto\|72–600> | PDF rasterization DPI (default auto) |
| --roi <x,y,w,h> | Region of interest: restrict recognition to a normalized region (top-left origin) |
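Since --roi expects normalized values, a pixel rectangle measured in an image editor has to be converted first. The sketch below assumes that "normalized" means fractions of the image width and height in the range 0 to 1, which matches how the other 0–1 flags are written but is not confirmed here. `toNormalizedRoi` is a hypothetical helper, not part of mac-ocr.

```javascript
// Convert a pixel rectangle (top-left origin) into a normalized "x,y,w,h" string.
// Assumption: --roi values are fractions of the image width and height.
function toNormalizedRoi(rect, imageWidth, imageHeight) {
  const inside =
    rect.x >= 0 &&
    rect.y >= 0 &&
    rect.width > 0 &&
    rect.height > 0 &&
    rect.x + rect.width <= imageWidth &&
    rect.y + rect.height <= imageHeight
  if (!inside) {
    throw new RangeError('rectangle must lie inside the image')
  }
  const round = (value) => Number(value.toFixed(4)) // trim floating-point noise
  return [
    rect.x / imageWidth,
    rect.y / imageHeight,
    rect.width / imageWidth,
    rect.height / imageHeight,
  ]
    .map(round)
    .join(',')
}

// Made-up example: a 1000×500 image, selecting a 200×100 box at (100, 50).
console.log(toNormalizedRoi({ x: 100, y: 50, width: 200, height: 100 }, 1000, 500))
// prints: 0.1,0.1,0.2,0.2
```

The resulting string can be passed as the value of --roi.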

`mac-ocr <file>`

| Flag | Effect |
|------|--------|
| -f, --format <text\|json\|jsonl> | Output format (default text) |
| -o, --output <path> | Output path, directory, or template ([name], [ext], [dir], [page]). Default: stdout. Any extension — e.g. .txt or .md. |
| --max-candidates <1–10> | Alternative text candidates per observation |

`mac-ocr searchable-pdf <file>`

| Flag | Effect |
|------|--------|
| -o, --output <dest> | Output path, [name] template, directory, or - for stdout. Default: [name].ocr.pdf next to each input. |
| --ocr-all-pages | OCR every page, including pages that already have selectable text (skipped by default) |

List the recognition languages available on your macOS version with mac-ocr languages (add --fast for the fast recognizer's set).

See docs/CLI.md for the full reference — every command and flag, plus the JSON output schema.

Node.js API

The same package exposes a typed, promise-based API that wraps the binary. Inputs are image or PDF bytes — read files or fetch URLs in your own code and pass the bytes:

npm install mac-ocr

import fs from 'node:fs/promises'
import { ocr, createSearchablePdf, supportedLanguages } from 'mac-ocr'

// Recognize text in an image or single-page PDF
const result = await ocr(await fs.readFile('receipt.jpg'))
console.log(result.text)
for (const { text, confidence, boundingBox } of result.observations) { /* … */ }

// Multi-page PDF: stream pages as they finish…
for await (const page of ocr.pages(await fs.readFile('book.pdf'))) {
    console.log(page.page, '/', page.pageCount, page.text)
}
// …or collect the whole thing into an array
const pages = await Array.fromAsync(ocr.pages(await fs.readFile('book.pdf')))

// Build a searchable PDF (returns the PDF bytes)
const pdf = await createSearchablePdf(await fs.readFile('scan.pdf'), { fast: true })
await fs.writeFile('scan.ocr.pdf', pdf)

// Recognition languages supported on this macOS version (for ocr and createSearchablePdf)
const languages = await supportedLanguages()

Options mirror the CLI flags (like { fast: true } above), plus an AbortSignal for cancellation. Failures throw a MacOcrError with a kind you can branch on. See docs/NODE.md for every option, the result types, and error handling.
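As one way to use the per-observation data, the sketch below splits observations into text you can trust and text worth reviewing by hand. It relies only on the `text` and `confidence` fields shown in the example above and assumes confidence is a number from 0 to 1, as the -c flag suggests. `partitionByConfidence` and the sample observations are hypothetical, so the block runs without the package or a Mac.

```javascript
// Split observations into trusted text and items to review by hand.
// Hypothetical helper; expects objects shaped like result.observations.
function partitionByConfidence(observations, threshold) {
  const trusted = []
  const review = []
  for (const observation of observations) {
    if (observation.confidence >= threshold) {
      trusted.push(observation)
    } else {
      review.push(observation)
    }
  }
  return { trusted, review }
}

// Made-up observations standing in for `(await ocr(bytes)).observations`.
const sampleObservations = [
  { text: 'TOTAL', confidence: 0.98 },
  { text: '$12.4O', confidence: 0.41 },
  { text: 'Thank you', confidence: 0.9 },
]
const { trusted, review } = partitionByConfidence(sampleObservations, 0.5)
console.log(trusted.map((o) => o.text).join(' | ')) // prints: TOTAL | Thank you
console.log(review.map((o) => o.text).join(' | ')) // prints: $12.4O
```

Unlike the -c flag, which drops low-confidence observations, this keeps them so they can be flagged instead of lost.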

How it works

mac-ocr is a native Swift binary built on Apple's Vision framework (VNRecognizeTextRequest). Recognition happens entirely on-device — nothing is uploaded. The searchable-PDF layer is invisible text drawn with Core Graphics + Core Text, placed word by word where Vision found each word.

Agent Skills

The package bundles an agent skill covering the CLI and Node API — set up skills-npm in your project and coding agents discover it automatically.
