unocr

v0.0.3

unocr 🦜

Unified OCR library with multi-driver support for Tesseract.js and AI vision models, providing structured text extraction using hast-based output format.

✨ Features

  • 🔍 Multi-Engine Support: Tesseract.js and AI vision drivers with unified interface
  • 📝 Structured Output: Hast-based OCR results for rich document structure
  • 🌐 Universal Input: Support for various image formats via undio integration
  • ⚡️ High Performance: Parallel processing with scheduler support
  • 🔄 Batch Processing: Efficient batch OCR operations with configurable parallelism
  • 🛡️ TypeScript: Full TypeScript support with comprehensive type definitions
  • 🎯 Driver Architecture: Extensible driver system for easy engine integration
  • 🤖 AI-Powered: Advanced AI vision models for enhanced OCR accuracy
  • 🔧 Flexible AI Configuration: Customizable system prompts and model parameters
  • 📊 Rich Metadata: Comprehensive processing metadata and engine information
  • 🔧 Flexible Options: Customizable OCR options for different use cases

📥 Installation

# Using npm
npm install unocr

# Using yarn
yarn add unocr

# Using pnpm
pnpm add unocr

🚀 Basic Usage

Tesseract.js Driver

import { createOCRManager } from "unocr";
import tesseractDriver from "unocr/drivers/tesseract";

// Create OCR manager with Tesseract driver
const ocr = createOCRManager({
  driver: tesseractDriver({
    langs: ["eng", "chi_sim"], // English and Simplified Chinese
    logger: (m) => console.log(m), // Progress logging
  }),
});

// Single image OCR
const result = await ocr.recognize(imageBuffer);
console.log(result); // hast Root object

// Batch OCR with parallel processing
const results = await ocr.recognizes(imageArray, { parallel: 4 });
console.log(results); // Array of hast Root objects

// Clean up
await ocr.dispose();

AI Vision Driver

import { createOCRManager } from "unocr";
import aiDriver from "unocr/drivers/ai";
import { createOpenAICompatible } from "@ai-sdk/openai-compatible";

// Create AI model client
const openai = createOpenAICompatible({
  name: "openai",
  baseURL:
    process.env.OPENAI_COMPATIBLE_URL ||
    "https://dashscope.aliyuncs.com/compatible-mode/v1",
  apiKey: process.env.OPENAI_API_KEY,
});

const modelName = process.env.AI_MODEL || "qwen3-vl-flash";

// Create OCR manager with AI driver
const ocr = createOCRManager({
  driver: aiDriver({
    model: openai(modelName),
    system:
      "Extract all text from this image and return it as HTML. Use appropriate tags like h1-h6 for headings, p for paragraphs, and ul/ol for lists.",
  }),
});

// Single image OCR with AI
const result = await ocr.recognize(imageBuffer);
console.log(result); // hast Root object

// Batch OCR with AI
const results = await ocr.recognizes(imageArray, { parallel: 2 });
console.log(results); // Array of hast Root objects

// Clean up
await ocr.dispose();

🔧 Advanced Usage

🎯 Custom Driver Configuration

Tesseract.js Advanced Configuration

import { createOCRManager } from "unocr";
import tesseractDriver from "unocr/drivers/tesseract";

// Advanced Tesseract configuration
const ocr = createOCRManager({
  driver: tesseractDriver({
    langs: ["eng", "fra", "deu"],
    oem: 1, // LSTM only
    corePath: "./tesseract-core",
    langPath: "./lang-data",
    cacheMethod: "write",
    logger: (progress) => {
      if (progress.status === "recognizing text") {
        console.log(`Progress: ${Math.round(progress.progress * 100)}%`);
      }
    },
  }),
});

const result = await ocr.recognize(image);

AI Vision Custom Configuration

import { createOCRManager } from "unocr";
import aiDriver from "unocr/drivers/ai";
import { createOpenAICompatible } from "@ai-sdk/openai-compatible";

const openai = createOpenAICompatible({
  name: "openai",
  baseURL:
    process.env.OPENAI_COMPATIBLE_URL ||
    "https://dashscope.aliyuncs.com/compatible-mode/v1",
  apiKey: process.env.OPENAI_API_KEY,
});

const modelName = process.env.AI_MODEL || "qwen3-vl-flash";

// Advanced AI configuration
const ocr = createOCRManager({
  driver: aiDriver({
    model: openai(modelName),
    system:
      "You are an expert OCR system. Extract all visible text with precise formatting and return it as structured HTML.",
    temperature: 0.1,
    maxOutputTokens: 4000,
    maxRetries: 3,
  }),
});

const result = await ocr.recognize(image);

📊 Batch Processing with Custom Parallelism

import { createOCRManager } from "unocr";
import tesseractDriver from "unocr/drivers/tesseract";

const ocr = createOCRManager({
  driver: tesseractDriver({ langs: "eng" }),
});

// Process many images efficiently
const images = [image1, image2, image3, image4, image5];

// Use 2 workers for lower resource usage
const results = await ocr.recognizes(images, { parallel: 2 });

// Use maximum parallelism (up to image count)
const maxResults = await ocr.recognizes(images, { parallel: images.length });

await ocr.dispose();
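
To make the effect of the parallel option more concrete, here is a minimal sketch of a concurrency-limited map. It is not unocr's scheduler; mapWithConcurrency and its parameters are hypothetical names used only for illustration. At most `limit` tasks are in flight at once, and results keep the order of the inputs.

```typescript
// Hypothetical helper, not part of unocr: run `worker` over `items`
// with at most `limit` calls in flight, preserving input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item nobody has claimed yet

  // Never start more runners than there are items, and always at least one.
  const runnerCount = Math.max(1, Math.min(limit, items.length));

  async function runner(): Promise<void> {
    // Each runner keeps claiming the next unclaimed index until none remain.
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

With a limit of 2 and five images, two recognitions start immediately and each finished one frees a slot for the next, which is the trade-off between throughput and resource usage described above.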

🌐 Input Format Support

import { createOCRManager } from "unocr";
import tesseractDriver from "unocr/drivers/tesseract";

const ocr = createOCRManager({
  driver: tesseractDriver({ langs: "eng" }),
});

// Various input formats supported via undio
// (empty placeholder values are shown only to illustrate the accepted types)
const imageInputs = [
  "https://example.com/image.jpg", // URL (string)
  new ArrayBuffer(0), // ArrayBuffer (ArrayBufferLike)
  Uint8Array.from([]), // Uint8Array
  "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA", // Base64 data URL
  new Blob(), // Blob
  new ReadableStream(), // ReadableStream
  new Response(), // Response
];

const results = await ocr.recognizes(imageInputs);
await ocr.dispose();

🔍 Working with Hast Output

import { createOCRManager } from "unocr";
import tesseractDriver from "unocr/drivers/tesseract";
import { toHtml } from "hast-util-to-html";

const ocr = createOCRManager({
  driver: tesseractDriver({ langs: "eng" }),
});

const result = await ocr.recognize(image);

// Convert hast to HTML
const html = toHtml(result);
console.log(html);
// <div class="ocr_page">...</div>

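// Extract text content by walking the tree recursively
type HastLike = { type: string; value?: string; children?: HastLike[] };

function extractText(node: HastLike): string {
  if (node.type === "text") {
    return node.value ?? "";
  }
  if (node.children) {
    return node.children.map(extractText).join("");
  }
  return "";
}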

const text = extractText(result);
console.log(text);
// "Extracted text content"

await ocr.dispose();

📚 API Reference

🔧 Manager Creation

createOCRManager(options: OCRManagerOptions)

Create an OCR manager with unified API for text recognition.

import { createOCRManager } from "unocr";
import tesseractDriver from "unocr/drivers/tesseract";

const ocr = createOCRManager({
  driver: tesseractDriver({
    langs: "eng",
    logger: console.log,
  }),
});

const result = await ocr.recognize(image);
const results = await ocr.recognizes(images, { parallel: 4 });
await ocr.dispose();

🖼️ Input and Output Types

OCRInput

Universal input type supporting various image formats via undio integration:

  • string - URLs or base64 data URLs
  • ArrayBufferLike - ArrayBuffer and similar types
  • Uint8Array - Typed array data
  • Blob - File/Blob objects
  • ReadableStream - Stream data
  • Response - Fetch API Response objects
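
For string inputs, a base64 data URL carries the image bytes inline. The sketch below shows one way such a string can be turned into bytes and checked for a PNG signature. It does not show how undio or unocr handle inputs internally; parseDataUrl, decodeBase64 and hasPngSignature are hypothetical helpers written for this example.

```typescript
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: decode standard base64 (with or without padding).
function decodeBase64(base64: string): Uint8Array {
  const clean = base64.replace(/=+$/, "");
  const bytes = new Uint8Array(Math.floor((clean.length * 6) / 8));
  let buffer = 0; // bit accumulator
  let bits = 0; // number of valid bits currently in the accumulator
  let offset = 0;
  for (let i = 0; i < clean.length; i++) {
    const value = BASE64_ALPHABET.indexOf(clean.charAt(i));
    if (value === -1) throw new Error("Invalid base64 character at " + i);
    buffer = ((buffer << 6) | value) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes[offset++] = (buffer >> bits) & 0xff;
    }
  }
  return bytes;
}

// Hypothetical helper: split a base64 data URL into media type and bytes.
function parseDataUrl(dataUrl: string): { mediaType: string; bytes: Uint8Array } {
  const match = /^data:([^;,]*);base64,(.*)$/.exec(dataUrl);
  if (!match) throw new Error("Not a base64 data URL");
  const [, mediaType = "", payload = ""] = match;
  return { mediaType, bytes: decodeBase64(payload) };
}

// The first eight bytes of every PNG file.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

function hasPngSignature(bytes: Uint8Array): boolean {
  return (
    bytes.length >= PNG_SIGNATURE.length &&
    PNG_SIGNATURE.every((expected, i) => bytes[i] === expected)
  );
}
```

Applied to the truncated data URL used in the Input Format Support example, parseDataUrl yields the media type image/png and 18 bytes that begin with the PNG signature.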

OCRResult

Structured OCR output in hast format for rich document structure.
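
Because the result is a tree rather than a flat string, text can be grouped by structural element. The following sketch collects the text of every element carrying a given class. It assumes hOCR-style class names such as ocr_line; this README only shows ocr_page, so treat the other class names, the sample tree, and the collectByClass helper as illustrative assumptions. The node types are a simplified local subset of hast so that the example needs no dependencies.

```typescript
// Simplified local subset of the hast node types.
interface HastText { type: "text"; value: string }
interface HastElement {
  type: "element";
  tagName: string;
  properties?: { className?: string[] };
  children: HastNode[];
}
interface HastRoot { type: "root"; children: HastNode[] }
type HastNode = HastText | HastElement | HastRoot;

// Concatenate all text below a node.
function textOf(node: HastNode): string {
  return node.type === "text" ? node.value : node.children.map(textOf).join("");
}

// Hypothetical helper: text of each element whose className contains `className`.
function collectByClass(node: HastNode, className: string, out: string[] = []): string[] {
  if (node.type === "text") return out;
  const classes = node.type === "element" ? node.properties?.className ?? [] : [];
  if (classes.indexOf(className) !== -1) {
    out.push(textOf(node).trim());
    return out; // do not descend into a matched element
  }
  for (const child of node.children) collectByClass(child, className, out);
  return out;
}

// Hand-written sample tree; real driver output may be shaped differently.
const sampleTree: HastRoot = {
  type: "root",
  children: [{
    type: "element", tagName: "div", properties: { className: ["ocr_page"] },
    children: [
      { type: "element", tagName: "span", properties: { className: ["ocr_line"] },
        children: [{ type: "text", value: "Hello world" }] },
      { type: "text", value: "\n" },
      { type: "element", tagName: "span", properties: { className: ["ocr_line"] },
        children: [
          { type: "element", tagName: "span", properties: { className: ["ocrx_word"] },
            children: [{ type: "text", value: "Second" }] },
          { type: "text", value: " " },
          { type: "element", tagName: "span", properties: { className: ["ocrx_word"] },
            children: [{ type: "text", value: "line" }] },
        ] },
    ],
  }],
};

const lines = collectByClass(sampleTree, "ocr_line");
// lines is ["Hello world", "Second line"]
```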

🚗 Available Drivers

tesseractDriver(options?: TesseractOptions)

Create a Tesseract.js-based OCR driver with advanced configuration options.

aiDriver(options?: AIDriverOptions)

Create an AI vision-based OCR driver with support for multimodal AI models like GPT-4 Vision.

AI Driver Options:

  • model - AI model instance (from @ai-sdk/openai-compatible or other AI SDK providers)
  • system - System prompt for text extraction instructions
  • temperature - Response randomness (0-1)
  • maxOutputTokens - Maximum tokens in AI response
  • maxRetries - Number of retry attempts on failure
  • Additional AI SDK parameters supported
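
As a rough illustration of what a maxRetries setting means, the sketch below wraps a task in a retry loop. It assumes retries are counted in addition to the first attempt, so maxRetries: 3 allows up to four calls. This is neither unocr's nor the AI SDK's code; withRetries is a hypothetical helper, and production retry logic usually also waits between attempts.

```typescript
// Hypothetical helper: call `task` once, then retry up to `maxRetries`
// more times if it rejects. Rethrows the last error when all attempts fail.
async function withRetries<T>(
  task: (attempt: number) => Promise<T>,
  maxRetries: number,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw lastError;
}
```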

⚡ Performance

📊 Characteristics

  • 🚀 Multi-Engine Support: Choose between traditional OCR and AI vision models
  • ⚡️ Parallel Processing: Configurable parallel worker execution
  • 📦 Efficient Memory: Worker reuse and proper cleanup
  • 🔄 Batch Operations: Optimized batch processing with scheduler
  • 🤖 AI Accuracy: Enhanced text recognition with AI vision models
  • 🔧 Flexible Processing: Local Tesseract for speed, AI for complex layouts

🎯 Performance Tips

// Reuse one OCR manager for multiple operations
const ocr = createOCRManager({
  driver: tesseractDriver({ langs: "eng" }),
});

// Batch process when possible
const results = await ocr.recognizes(images, { parallel: 4 });

// Configure parallelism based on hardware
// (navigator.hardwareConcurrency is a browser API; fall back to 4 where it is missing)
const cpuCount = globalThis.navigator?.hardwareConcurrency || 4;
const tunedResults = await ocr.recognizes(images, { parallel: cpuCount });

// Choose driver based on use case
// (openai and modelName are set up as in the AI Vision Driver example)
const fastOcr = createOCRManager({
  driver: tesseractDriver({ langs: "eng" }), // Fast for simple documents
});

const accurateOcr = createOCRManager({
  driver: aiDriver({ model: openai(modelName) }), // Better for complex layouts
});

// Dispose every manager once it is no longer needed
await ocr.dispose();
await fastOcr.dispose();
await accurateOcr.dispose();

🔧 Configuration

Batch Processing

Configure parallel processing for batch operations:

const ocr = createOCRManager({
  driver: tesseractDriver({ langs: "eng" }),
});

// Process with 2 workers (conservative)
await ocr.recognizes(images, { parallel: 2 });

// Process with 8 workers (high performance)
await ocr.recognizes(images, { parallel: 8 });
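
One way to pick a value for parallel is to clamp it between 1 and the smallest of the image count, the available CPU cores, and an upper cap. The helper below is a sketch of that idea; pickParallelism is a hypothetical name, and the default cap of 8 simply mirrors the "high performance" example above rather than any documented unocr limit.

```typescript
// Hypothetical helper: choose a worker count that never exceeds the number
// of images, the number of CPU cores, or a fixed cap, and is at least 1.
function pickParallelism(imageCount: number, cpuCount: number, cap: number = 8): number {
  if (imageCount <= 0) return 1;
  const usableCores = Math.max(1, Math.floor(cpuCount));
  return Math.max(1, Math.min(imageCount, usableCores, cap));
}
```

The core count can come from navigator.hardwareConcurrency in browsers or os.availableParallelism() in recent Node.js versions. The result can then be passed as the parallel option, for example ocr.recognizes(images, { parallel: pickParallelism(images.length, cpuCount) }).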

🌐 Ecosystem Integration

📝 Hast Processing

import { createOCRManager } from "unocr";
import tesseractDriver from "unocr/drivers/tesseract";
import aiDriver from "unocr/drivers/ai";
import { toHtml } from "hast-util-to-html";
import { toText } from "hast-util-to-text";
import { rehype } from "rehype";
import { unified } from "unified";

// Works with both Tesseract and AI drivers
// (openai and modelName are set up as in the AI Vision Driver example)
const tesseractOcr = createOCRManager({
  driver: tesseractDriver({ langs: "eng" }),
});

const aiOcr = createOCRManager({
  driver: aiDriver({ model: openai(modelName) }),
});

// Process with Tesseract
const tesseractResult = await tesseractOcr.recognize(image);
const tesseractHtml = toHtml(tesseractResult);

// Process with AI
const aiResult = await aiOcr.recognize(image);
const aiHtml = toHtml(aiResult);

// Both outputs are hast trees, so hast utilities work on either one
const text = toText(aiResult);

// Run rehype/unified plugins directly on the tree. Use run()/runSync() here:
// process() expects file contents, not an already parsed tree.
// myPlugin stands in for any plugin of your own.
const transformed = await rehype().use(myPlugin).run(aiResult);
const processed = unified().use(myPlugin).runSync(tesseractResult);

await tesseractOcr.dispose();
await aiOcr.dispose();

📄 License

MIT © By Sages