smart-ocr

v1.3.1

OCR library for scanned and text-based PDFs and for image files, built on tesseract.js, with optional AI-powered structured output.

Smart OCR

smart-ocr is a Node.js OCR library for:

  • text-based PDFs
  • scanned PDFs
  • mixed PDFs with both text-native and scanned pages
  • PNG and other common raster image formats
  • optional AI-assisted structured output from extracted OCR text

For PDFs, each page is handled independently. If a page already contains selectable text, Smart OCR extracts it directly. If a page is image-only, it renders the page and falls back to OCR.
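The per-page decision described above can be pictured with a small sketch. This is not the library's internal code: the page shape ({ text }) and the ocrPage callback are hypothetical stand-ins, and the real detection of "selectable text" may use a different heuristic.

```javascript
// Illustrative only: `pages` is a hypothetical array of { text } objects, where
// `text` holds any embedded text found on the page (empty for image-only pages).
// `ocrPage` is an assumed async callback that renders and OCRs a single page.
async function extractPages(pages, ocrPage) {
  const results = [];
  for (const [index, page] of pages.entries()) {
    const embedded = (page.text ?? "").trim();
    if (embedded.length > 0) {
      // Text-native page: use the embedded text directly.
      results.push({ index, source: "embedded", text: embedded });
    } else {
      // Image-only page: fall back to OCR.
      results.push({ index, source: "ocr", text: await ocrPage(page, index) });
    }
  }
  return results;
}
```

Because each page is decided on its own, a mixed PDF yields a combination of "embedded" and "ocr" entries in page order.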

Requirements

  • Node.js >=20.6.0

This package is designed for Node.js. It is not set up for browser use.

Installation

npm install smart-ocr

Quick Start

import { SmartOCR } from "smart-ocr";
import path from "node:path";
import { fileURLToPath } from "node:url";

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

const ocr = new SmartOCR({ language: "eng", workerCount: 2 });

try {
  const pdfText = await ocr.processPDF(path.join(__dirname, "sample-scanned.pdf"));
  console.log(pdfText);
} finally {
  await ocr.terminate();
}

Structured Output

Smart OCR can optionally turn extracted text into structured JSON.

  • OCR still runs first
  • the extracted text is then sent to an AI model to produce structured output

When structuredOutputOptions.ai is configured, processFile(), processPDF(), and processImage() return a JSON object instead of a plain text string.
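Since the same methods can return either a string or an object, calling code that has to work in both modes may want to branch on the result type. A minimal sketch, assuming only what is stated above (plain text is a string, structured output is a JSON object); describeResult is a hypothetical helper, not part of smart-ocr:

```javascript
// Hypothetical helper: tags a result from processFile(), processPDF(), or
// processImage() so downstream code can handle both return shapes.
function describeResult(result) {
  if (typeof result === "string") {
    return { kind: "text", text: result };
  }
  if (result !== null && typeof result === "object") {
    return { kind: "structured", data: result };
  }
  throw new TypeError(`Unexpected OCR result type: ${typeof result}`);
}
```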

Supported providers:

  • openai - uses structured outputs (response_format: json_schema)
  • anthropic - uses tool use to enforce schema-shaped output
  • gemini - uses responseMimeType: "application/json" with responseSchema

Example (OpenAI):

import { SmartOCR } from "smart-ocr";

const ocr = new SmartOCR({
  language: "eng",
  structuredOutputOptions: {
    ai: {
      provider: "openai",
      model: "gpt-4.1-mini",
      apiKey: process.env.OPENAI_API_KEY,
      prompt: "Extract the document fields. Use null when a value is missing or unclear.",
    },
    schema: {
      type: "object",
      properties: {
        fullName: { type: ["string", "null"] },
        idNumber: { type: ["string", "null"] },
        dateOfBirth: { type: ["string", "null"] },
        sex: { type: ["string", "null"] },
      },
      required: ["fullName", "idNumber", "dateOfBirth", "sex"],
      additionalProperties: false,
    },
  },
});

try {
  const result = await ocr.processFile("./id.pdf");
  console.log(result);
} finally {
  await ocr.terminate();
}

Example (Anthropic):

import { SmartOCR } from "smart-ocr";

const ocr = new SmartOCR({
  structuredOutputOptions: {
    ai: {
      provider: "anthropic",
      model: "claude-opus-4-5",
      apiKey: process.env.ANTHROPIC_API_KEY,
    },
    schema: {
      type: "object",
      properties: {
        fullName: { type: ["string", "null"] },
        idNumber: { type: ["string", "null"] },
      },
      required: ["fullName", "idNumber"],
    },
  },
});

Example (Gemini):

import { SmartOCR } from "smart-ocr";

const ocr = new SmartOCR({
  structuredOutputOptions: {
    ai: {
      provider: "gemini",
      model: "gemini-2.5-flash-lite",
      apiKey: process.env.GOOGLE_API_KEY,
    },
    schema: {
      type: "object",
      properties: {
        fullName: { type: ["string", "null"] },
        idNumber: { type: ["string", "null"] },
      },
      required: ["fullName", "idNumber"],
    },
  },
});

Notes for AI mode:

  • apiKey is required for all providers
  • prompt overrides the default extraction instruction
  • schema should be a JSON schema describing the object you want back
  • for OpenAI strict mode, required must list every key in properties
  • Gemini schemas are automatically normalized: array type values (e.g. ["string", "null"]) are converted to nullable: true, and unsupported fields like additionalProperties are stripped
  • when AI mode is enabled, the raw OCR text is not returned by these methods
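The OpenAI strict-mode note is easy to check before a schema is sent anywhere. The helper below is a hypothetical pre-flight check, not something smart-ocr exports, and it only inspects the top level of the schema (nested objects would need the same check applied recursively):

```javascript
// Hypothetical pre-flight check: returns the property names that are declared
// in `properties` but missing from `required`.
function missingRequiredKeys(schema) {
  const keys = Object.keys(schema.properties ?? {});
  const required = new Set(schema.required ?? []);
  return keys.filter((key) => !required.has(key));
}

const schema = {
  type: "object",
  properties: {
    fullName: { type: ["string", "null"] },
    idNumber: { type: ["string", "null"] },
  },
  required: ["fullName"],
  additionalProperties: false,
};

console.log(missingRequiredKeys(schema)); // → [ 'idNumber' ]
```

An empty array means every key in properties is listed in required.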

Reference

new SmartOCR(options?)

Creates an OCR processor.

Options:

  • language: Tesseract language or language list. Default: "eng"
  • pdfRenderScale: render scale used before OCR on scanned PDF pages. Default: 2
  • workerOptions: options passed to the Tesseract worker, such as langPath, cachePath, or logger
  • workerCount: number of OCR workers to run in parallel
  • structuredOutputOptions: optional AI configuration for returning structured JSON instead of plain text

Language codes use Tesseract traineddata identifiers, not 2-letter locale codes. For example:

  • "eng" for English
  • "spa" for Spanish
  • "fra" for French
  • ["eng", "spa"] for multilingual OCR

Use "eng", not "en".
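If language values arrive as 2-letter codes from elsewhere in an application, a small lookup can translate them before they reach the constructor. This is a sketch, not part of smart-ocr: toTesseractLanguage and its table are hypothetical, and the table only covers the three languages listed above.

```javascript
// Hypothetical lookup from 2-letter codes to the Tesseract identifiers
// listed above. Extend the table for other languages as needed.
const TESSERACT_CODES = new Map([
  ["en", "eng"],
  ["es", "spa"],
  ["fr", "fra"],
]);

// Accepts a single code or an array of codes. Values that are not in the
// table are passed through unchanged, so "eng" stays "eng".
function toTesseractLanguage(input) {
  const convert = (code) => TESSERACT_CODES.get(String(code).toLowerCase()) ?? code;
  return Array.isArray(input) ? input.map(convert) : convert(input);
}
```

For example, toTesseractLanguage(["en", "spa"]) yields ["eng", "spa"], which can be passed as the language option.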

structuredOutputOptions shape:

  • ai.provider: AI provider name. One of "openai", "anthropic", or "gemini"
  • ai.model: model name to call for structured extraction
  • ai.apiKey: API key for the chosen provider
  • ai.prompt: optional custom extraction prompt
  • schema: JSON schema describing the expected response object. Gemini schemas are automatically normalized from JSON Schema to Gemini's OpenAPI 3.0 subset.
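To make the Gemini normalization concrete, here is a rough sketch of the two transformations the README describes: array type values containing "null" become nullable: true, and additionalProperties is stripped. This is an illustration written from that description, not the library's actual implementation; normalizeForGemini is a hypothetical name, and the sketch keeps only the first non-null type and ignores any other unsupported keywords.

```javascript
// Illustrative sketch of the described normalization. Not smart-ocr's code.
function normalizeForGemini(schema) {
  if (schema === null || typeof schema !== "object" || Array.isArray(schema)) {
    return schema;
  }
  // Drop additionalProperties, which the README lists as unsupported.
  const { additionalProperties, ...out } = schema;

  if (Array.isArray(out.type)) {
    const nonNull = out.type.filter((t) => t !== "null");
    if (nonNull.length !== out.type.length) out.nullable = true;
    // Simplification: keep only the first non-null type.
    if (nonNull.length > 0) out.type = nonNull[0];
    else delete out.type;
  }
  if (out.properties) {
    out.properties = Object.fromEntries(
      Object.entries(out.properties).map(([key, value]) => [key, normalizeForGemini(value)])
    );
  }
  if (out.items) out.items = normalizeForGemini(out.items);
  return out;
}
```

Under these assumptions, { type: ["string", "null"] } becomes { type: "string", nullable: true }.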

processFile(filePath)

Routes a supported file to the correct handler based on file extension.

Returns:

  • extracted text by default
  • structured JSON when structuredOutputOptions.ai is configured

Supported extensions:

  • .pdf
  • .png
  • .jpg
  • .jpeg
  • .tif
  • .tiff
  • .bmp
  • .webp
  • .gif
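Extension-based routing of this kind can be sketched in a few lines, which is handy for filtering a directory listing before calling processFile(). classifyFile is a hypothetical helper, not part of the package, and treating extensions case-insensitively is an assumption; the README does not say how the library handles uppercase extensions.

```javascript
// The image extensions listed above.
const IMAGE_EXTENSIONS = new Set([
  ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".webp", ".gif",
]);

// Hypothetical helper: classifies a path as "pdf", "image", or "unsupported"
// from its final extension. Lowercasing the extension is an assumption.
function classifyFile(filePath) {
  const match = /\.[^./\\]+$/.exec(filePath);
  const ext = match ? match[0].toLowerCase() : "";
  if (ext === ".pdf") return "pdf";
  if (IMAGE_EXTENSIONS.has(ext)) return "image";
  return "unsupported";
}
```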

processPDF(pdfPath)

Extracts text from a PDF. Text-native pages are read directly. Scanned pages are rendered to images and OCRed.

The OCR language only affects scanned/image-only pages. If a PDF page already contains selectable text, Smart OCR returns that embedded text directly instead of re-OCRing it.

Returns:

  • extracted text by default
  • structured JSON when structuredOutputOptions.ai is configured

processImage(imagePath)

Runs OCR on an image file.

Returns:

  • extracted text by default
  • structured JSON when structuredOutputOptions.ai is configured

init(language?)

Eagerly initializes the Tesseract worker. This is optional because processing methods initialize on demand.

If you pass a language to init(language), Smart OCR keeps using that language for later OCR calls until you switch it again or create a new instance.

terminate()

Terminates the Tesseract worker and frees resources.
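Because terminate() should run even when processing throws, the try/finally pattern from the Quick Start can be wrapped once and reused. A minimal sketch: withOcr is a hypothetical helper, and createOcr is any factory that returns an object with an async terminate() method, for example () => new SmartOCR({ language: "eng" }).

```javascript
// Hypothetical wrapper: creates an OCR instance, runs `task` with it, and
// always calls terminate() afterwards, whether `task` resolves or throws.
async function withOcr(createOcr, task) {
  const ocr = createOcr();
  try {
    return await task(ocr);
  } finally {
    await ocr.terminate();
  }
}
```

A call then looks like withOcr(() => new SmartOCR(), (ocr) => ocr.processFile("./id.pdf")), with cleanup handled in one place.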

Notes

  • Smart OCR is optimized for Node.js workloads, not browser runtimes.
  • Rendering uses @napi-rs/canvas, which avoids the extra Cairo system setup required by canvas.
  • Scanned PDFs are preprocessed before OCR so sparse content, such as ID cards on large blank pages, is easier to detect.
  • Structured output is an optional post-processing step on top of OCR, not a replacement for OCR itself.
  • AI mode supports OpenAI, Anthropic, and Gemini.
  • OCR quality still depends on the source document quality, scan resolution, and language data.

Development

npm run typecheck
npm run lint
npm test
npm run build
npm run sample

npm run sample builds the library and runs it against the bundled sample files in src/.

License

MIT

Support

Buy Me a Coffee

  • ₿ Bitcoin: bc1qnxq2avwcw3z0zmzphwhegdaclpsarf6p7htmud
  • ⟠ ETH / USDT / USDC (ERC-20): 0x4bC31D1569a68d56CA74DE5e993c4007CAffaaEf