npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

ocr-ai

v1.0.4

Published

Multi-provider AI document extraction library - Extract text or structured JSON from documents using Gemini, OpenAI, Grok, or Claude

Downloads

604

Readme

ocr-ai

Multi-provider AI document extraction for Node.js. Extract text or structured JSON from documents using Gemini, OpenAI, Claude, Grok, or Vertex AI.

Installation

npm install ocr-ai

Quick Start

Using Gemini

import { OcrAI } from 'ocr-ai';

const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_GEMINI_API_KEY',
});

const result = await ocr.extract('./invoice.png');

if (result.success) {
  const text = result.content;
  console.log(text);
}

Using OpenAI

import { OcrAI } from 'ocr-ai';

const ocr = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_OPENAI_API_KEY',
});

const result = await ocr.extract('./document.pdf');

if (result.success) {
  const text = result.content;
  console.log(text);
}

Custom Model

You can specify a custom model for any provider:

const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_GEMINI_API_KEY',
  model: 'gemini-2.0-flash', // Use a specific model
});

// Or with OpenAI
const ocrOpenAI = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_OPENAI_API_KEY',
  model: 'gpt-4o-mini', // Use a different model
});

From URL

Extract directly from a URL:

const result = await ocr.extract('https://example.com/invoice.png');

if (result.success) {
  console.log(result.content);
}

Custom Instructions

You can provide custom instructions to guide the extraction:

const result = await ocr.extract('./receipt.png', {
  prompt: 'Extract only the total amount and date from this receipt',
});

if (result.success) {
  console.log(result.content);
  // Output: "Total: $154.06, Date: 11/02/2019"
}

Output Format

By default, extraction returns text. You can also extract structured JSON:

// Text output (default)
const textResult = await ocr.extract('./invoice.png', {
  format: 'text',
});

if (textResult.success) {
  console.log(textResult.content); // string
}

// JSON output with schema
const jsonResult = await ocr.extract('./invoice.png', {
  format: 'json',
  schema: {
    invoice_number: 'string',
    date: 'string',
    total: 'number',
    items: [{ name: 'string', quantity: 'number', price: 'number' }],
  },
});

if (jsonResult.success) {
  console.log(jsonResult.data); // { invoice_number: "US-001", date: "11/02/2019", total: 154.06, items: [...] }
}

JSON Schema

The schema defines the structure of the data you want to extract. Use a simple object where keys are field names and values are types:

Basic types:

  • 'string' - Text values
  • 'number' - Numeric values
  • 'boolean' - True/false values

Nested objects:

const schema = {
  company: {
    name: 'string',
    address: 'string',
    phone: 'string',
  },
  customer: {
    name: 'string',
    email: 'string',
  },
};

Arrays:

const schema = {
  // Array of objects
  items: [
    {
      description: 'string',
      quantity: 'number',
      unit_price: 'number',
      total: 'number',
    },
  ],
  // Simple array
  tags: ['string'],
};

Complete example (invoice):

const invoiceSchema = {
  invoice_number: 'string',
  date: 'string',
  due_date: 'string',
  company: {
    name: 'string',
    address: 'string',
    phone: 'string',
    email: 'string',
  },
  bill_to: {
    name: 'string',
    address: 'string',
  },
  items: [
    {
      description: 'string',
      quantity: 'number',
      unit_price: 'number',
      total: 'number',
    },
  ],
  subtotal: 'number',
  tax: 'number',
  total: 'number',
};

const result = await ocr.extract('./invoice.png', {
  format: 'json',
  schema: invoiceSchema,
  prompt: 'Extract all invoice data from this document.',
});

Model Configuration

You can pass model-specific parameters like temperature, max tokens, and more:

// Gemini with model config
const result = await ocr.extract('./invoice.png', {
  modelConfig: {
    temperature: 0.2,
    maxTokens: 4096,
    topP: 0.8,
    topK: 40,
  },
});

// OpenAI with model config
const result = await ocr.extract('./invoice.png', {
  modelConfig: {
    temperature: 0,
    maxTokens: 2048,
    topP: 1,
  },
});

Available options:

| Option | Description | Supported Providers | |--------|-------------|---------------------| | temperature | Controls randomness (0.0-1.0+) | All | | maxTokens | Maximum tokens to generate | All | | topP | Nucleus sampling | All | | topK | Top-k sampling | Gemini, Claude, Vertex | | stopSequences | Stop generation at these strings | All |

Token Usage

Access token usage information from the metadata:

const result = await ocr.extract('./invoice.png');

if (result.success) {
  console.log(result.content);

  // Access metadata
  console.log(result.metadata.processingTimeMs); // 2351
  console.log(result.metadata.tokens?.inputTokens); // 1855
  console.log(result.metadata.tokens?.outputTokens); // 260
  console.log(result.metadata.tokens?.totalTokens); // 2115
}

Supported Providers

| Provider | Default Model | Auth | |----------|---------------|------| | gemini | gemini-1.5-flash | API Key | | openai | gpt-4o | API Key | | claude | claude-sonnet-4-20250514 | API Key | | grok | grok-2-vision-1212 | API Key | | vertex | gemini-2.0-flash | Google Cloud |

Note: For enterprise OCR needs, see Advanced: Vertex AI section below.

Supported Inputs

  • Local files: ./invoice.png, ./document.pdf
  • URLs: https://example.com/invoice.png

Supported Files

  • Images: jpg, png, gif, webp
  • Documents: pdf
  • Text: txt, md, csv, json, xml, html

Advanced: Vertex AI (Google Cloud)

The vertex provider enables access to Google Cloud's AI infrastructure, which is useful for enterprise scenarios requiring:

  • Compliance: Data residency and regulatory requirements
  • Integration: Native integration with Google Cloud services (BigQuery, Cloud Storage, etc.)
  • Specialized OCR: Access to Google's Document AI and Vision AI processors

Basic Setup

Vertex AI uses Google Cloud authentication instead of API keys:

import { OcrAI } from 'ocr-ai';

const ocr = new OcrAI({
  provider: 'vertex',
  vertexConfig: {
    project: 'your-gcp-project-id',
    location: 'us-central1',
  },
});

const result = await ocr.extract('./invoice.png');

Requirements:

  1. Install the gcloud CLI
  2. Run gcloud auth application-default login
  3. Enable the Vertex AI API in your GCP project

When to Use Vertex AI vs Gemini API

| Scenario | Recommended | |----------|-------------| | Quick prototyping | Gemini (API Key) | | Personal projects | Gemini (API Key) | | Enterprise/production | Vertex AI | | Data residency requirements | Vertex AI | | High-volume processing | Vertex AI |

Related Google Cloud OCR Services

For specialized document processing beyond what Gemini models offer, Google Cloud provides dedicated OCR services:

Document AI - Optimized for structured documents:

  • Invoice Parser, Receipt Parser, Form Parser
  • W2, 1040, Bank Statement processors
  • Custom extractors for domain-specific documents
  • Higher accuracy for tables, forms, and handwritten text

Vision API - Optimized for images:

  • Real-time OCR with low latency
  • 80+ language support
  • Handwriting detection
  • Simple integration, ~98% accuracy on clean documents

These services are separate from ocr-ai but can complement it for enterprise document pipelines.

Gemini Model Benchmarks

Performance benchmarks for Gemini models extracting data from an invoice image:

| Model | Text Extraction | JSON Extraction | Best For | |-------|-----------------|-----------------|----------| | gemini-2.0-flash-lite | 2.8s | 2.1s | High-volume processing, cost optimization | | gemini-2.5-flash-lite | 2.2s | 1.9s | Fastest option, simple documents | | gemini-2.0-flash | 3.9s | 2.9s | General purpose, good balance | | gemini-2.5-flash | 5.0s | 5.0s | Standard documents, reliable | | gemini-3-flash-preview | 12.3s | 10.6s | Complex layouts, newer capabilities | | gemini-3-pro-image-preview | 8.0s | 11.9s | Image-heavy documents | | gemini-2.5-pro | 12.6s | 5.5s | High accuracy, complex documents | | gemini-3-pro-preview | 24.8s | 13.1s | Maximum accuracy, handwritten text |

Model Recommendations

For digital documents (invoices, receipts, forms):

// Fast and cost-effective
const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
  model: 'gemini-2.5-flash-lite', // ~2s response time
});

For complex documents or when accuracy is critical:

// Higher accuracy, slower processing
const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
  model: 'gemini-2.5-pro', // Best accuracy/speed ratio
});

For handwritten documents or poor quality scans:

// Maximum accuracy for difficult documents
const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
  model: 'gemini-3-pro-preview', // Best for handwriting
});

Quick Reference

| Use Case | Recommended Model | |----------|-------------------| | High-volume batch processing | gemini-2.5-flash-lite | | Standard invoices/receipts | gemini-2.0-flash | | Complex tables and layouts | gemini-2.5-pro | | Handwritten documents | gemini-3-pro-preview | | Poor quality scans | gemini-3-pro-preview | | Real-time applications | gemini-2.5-flash-lite |

OpenAI Model Benchmarks

Performance benchmarks for OpenAI models extracting data from an invoice image:

| Model | Text Extraction | JSON Extraction | Best For | |-------|-----------------|-----------------|----------| | gpt-4.1-nano | 4.4s | 2.4s | Fastest, cost-effective | | gpt-4.1-mini | 4.8s | 3.2s | Good balance speed/accuracy | | gpt-4.1 | 8.2s | 5.4s | High accuracy, reliable | | gpt-4o-mini | 7.2s | 5.7s | Budget-friendly | | gpt-4o | 12.3s | 10.7s | Standard high accuracy | | gpt-5.2 | 6.4s | 5.0s | Latest generation | | gpt-5-mini | 12.2s | 7.9s | GPT-5 balanced option | | gpt-5-nano | 19.9s | 16.1s | GPT-5 economy tier |

Note: gpt-5.2-pro and gpt-image-1 use different API endpoints and are not currently supported.

Model Recommendations

For digital documents (invoices, receipts, forms):

// Fast and cost-effective
const ocr = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_API_KEY',
  model: 'gpt-4.1-nano', // ~2-4s response time
});

For complex documents or when accuracy is critical:

// Higher accuracy, reliable extraction
const ocr = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_API_KEY',
  model: 'gpt-4.1', // Best accuracy/speed ratio
});

For handwritten documents or poor quality scans:

// Maximum accuracy for difficult documents
const ocr = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_API_KEY',
  model: 'gpt-5.2', // Latest generation, best accuracy
});

Quick Reference

| Use Case | Recommended Model | |----------|-------------------| | High-volume batch processing | gpt-4.1-nano | | Standard invoices/receipts | gpt-4.1-mini | | Complex tables and layouts | gpt-4.1 | | Handwritten documents | gpt-5.2 | | Poor quality scans | gpt-5.2 | | Real-time applications | gpt-4.1-nano | | Budget-conscious projects | gpt-4o-mini |

Promise API

For users who prefer callbacks or need more control over async operations, ocr-ai provides an alternative OcrAIPromise class with additional features.

Basic Usage with Callbacks

import { OcrAIPromise } from 'ocr-ai';

const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Using callback style
ocr.extract('./invoice.png', {}, (error, result) => {
  if (error) {
    console.error('Extraction failed:', error.message);
    return;
  }
  console.log('Extracted:', result.content);
});

Using .then()/.catch()

import { OcrAIPromise } from 'ocr-ai';

const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Promise chain style
ocr.extract('./invoice.png')
  .then((result) => {
    if (result.success) {
      console.log('Content:', result.content);
    }
  })
  .catch((error) => {
    console.error('Error:', error);
  });

Extract Multiple Files in Parallel

const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Extract many files at once
const results = await ocr.extractMany([
  './invoice1.png',
  './invoice2.png',
  './invoice3.png',
]);

results.forEach((result, index) => {
  if (result.success) {
    console.log(`File ${index + 1}:`, result.content);
  }
});

Batch Extraction with Individual Options

const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Each file with its own options
const results = await ocr.extractBatch([
  { source: './invoice.png', options: { format: 'json', schema: invoiceSchema } },
  { source: './receipt.png', options: { format: 'text' } },
  { source: './contract.pdf', options: { prompt: 'Extract key dates and amounts' } },
]);

Automatic Retry on Failure

const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Retry up to 3 times with 1 second delay between attempts
const result = await ocr.extractWithRetry(
  './invoice.png',
  { format: 'json', schema: invoiceSchema },
  3,    // retries
  1000  // delay in ms
);

if (result.success) {
  console.log('Extracted after retries:', result.data);
}

Access Underlying OcrAI Instance

const ocrPromise = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Get the underlying OcrAI instance for direct access
const ocr = ocrPromise.getOcrAI();

// Use standard async/await if needed
const result = await ocr.extract('./invoice.png');

License

MIT