@pandi2352/gemini-ocr

v4.0.0, published 22 days ago

A lightweight OCR processing wrapper using Google Gemini Vision models.

Maintainer: pandi2352

Keywords: ocr, gemini, google, ai, pdf, image, vision

🔮 Gemini OCR (@pandi2352/gemini-ocr)

The Next-Gen Document Intelligence Wrapper

⚡ Why Gemini OCR?

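Traditional OCR tools such as Tesseract and AWS Textract are built to extract text and layout from documents. Gemini OCR sends your documents to a Gemini vision model, so in addition to the text it can return summaries, Mermaid.js mindmaps, and structured entity data.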

📚 Step-by-Step Usage Guide

1. Prerequisites

You need a Google Gemini API key, which you can create in Google AI Studio. The examples below read it from the GEMINI_API_KEY environment variable.
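Every example in this guide reads the key from process.env.GEMINI_API_KEY, so a missing key would only surface once the API call fails. A minimal sketch of a fail-fast check; requireApiKey is a hypothetical helper, not part of @pandi2352/gemini-ocr:

```typescript
// Hypothetical helper, not part of @pandi2352/gemini-ocr: look up the Gemini
// API key in an environment map and fail fast with a clear message if missing.
function requireApiKey(
  env: Record<string, string | undefined>,
  name: string = 'GEMINI_API_KEY'
): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing ${name}: set it to your Gemini API key before running.`);
  }
  return value;
}

// Usage (Node.js): const apiKey = requireApiKey(process.env);
```

Passing the environment in as a parameter keeps the helper easy to test with a plain object.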

2. Installation

Install the package in your Node.js project:

```shell
npm install @pandi2352/gemini-ocr
```

3. Basic Usage (Text Extraction)

Create a file (for example, index.ts) with the following code. The input option accepts local file paths as well as URLs.

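```typescript
import { processOCR } from '@pandi2352/gemini-ocr';

async function main() {
  const apiKey = process.env.GEMINI_API_KEY;
  if (!apiKey) {
    throw new Error('Set the GEMINI_API_KEY environment variable first.');
  }

  const results = await processOCR({
    // Input can be a single file path/URL string or an array of them
    input: ['./my-document.pdf'],
    apiKey,
  });

  console.log(results[0].extractedText);
}

// Report failures instead of leaving an unhandled promise rejection
main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```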
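The input option takes either a single string or an array, and each entry may be a local path or a URL. If you want to validate or log inputs before calling processOCR, a minimal sketch that normalizes both shapes and tags each entry; classifyInputs is a hypothetical helper, and treating only http(s) strings as URLs is an assumption:

```typescript
// Hypothetical pre-flight helper, not part of the package.
type InputEntry = { source: string; kind: 'url' | 'local' };

function classifyInputs(input: string | string[]): InputEntry[] {
  // Accept the same `string | string[]` shape as the `input` option
  const list = Array.isArray(input) ? input : [input];
  return list.map((source): InputEntry => ({
    source,
    // Assumption: only http:// and https:// strings are remote URLs
    kind: /^https?:\/\//i.test(source) ? 'url' : 'local',
  }));
}

// Usage: const localFiles = classifyInputs(input).filter((e) => e.kind === 'local');
```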

4. Batch Processing (Multiple Files)

Pass an array of file paths or URLs. They are processed in parallel.

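```typescript
import { processOCR } from '@pandi2352/gemini-ocr';

// Top-level await requires an ES module; otherwise wrap this code in an
// async function as in the previous example.
const results = await processOCR({
  input: [
    './invoice_january.pdf',
    'https://example.com/receipt.jpg',
    './meeting_notes.docx'
  ],
  apiKey: process.env.GEMINI_API_KEY,
  summarize: true // Optional: also generate a summary for each document
});

results.forEach((doc, index) => {
  if (doc.status === 'success') {
    console.log(`File ${index + 1}: ${doc.summary}`);
  } else {
    console.warn(`File ${index + 1}: status ${doc.status}`);
  }
});
```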
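Because every entry in one call is processed in parallel, a very long list of files may run into Gemini API rate limits. A minimal sketch, assuming you are willing to trade throughput for fewer concurrent requests: split the inputs into chunks and make one call per chunk. mapInBatches is a hypothetical helper, not part of the package:

```typescript
// Hypothetical helper, not part of the package: process `items` in chunks of
// `batchSize`, waiting for each chunk before starting the next. Results keep
// the original input order.
async function mapInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (batch: T[]) => Promise<R[]>
): Promise<R[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batchResults = await worker(items.slice(i, i + batchSize));
    results.push(...batchResults);
  }
  return results;
}

// Usage (at most 5 files per processOCR call):
// const all = await mapInBatches(files, 5, (batch) =>
//   processOCR({ input: batch, apiKey: process.env.GEMINI_API_KEY }));
```

Within each chunk the package still runs its files in parallel; the chunk size only caps how many are in flight at once.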

5. Advanced Intelligence (Mindmaps & Entities)

Enable additional flags to generate a mindmap and extract structured entities from a document.

```typescript
import { processOCR } from '@pandi2352/gemini-ocr';

const [result] = await processOCR({
  input: ['./complex_contract.pdf'],
  apiKey: process.env.GEMINI_API_KEY,

  // Enable advanced features
  mindmap: true,         // Generates a Mermaid.js mindmap
  extractEntities: true, // Extracts structured data as JSON
  entitySchema: ['Contract Value', 'Start Date', 'Parties Involved'] // Optional custom fields
});

// 1. Get the mindmap (Mermaid.js code)
console.log('Mindmap Code:', result.mindmap);

// 2. Get the structured data
console.log('Extracted Data:', result.entityResult);
/* Example output:
{
  "contract_value": "$50,000",
  "start_date": "2024-01-01",
  "parties_involved": "Company A, Vendor B"
}
*/
```
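The mindmap comes back as Mermaid.js code, which needs a Mermaid renderer to display. A minimal sketch, assuming result.mindmap is a string of Mermaid code: wrap it in a standalone HTML page that loads Mermaid from the jsDelivr CDN. mindmapToHtml and escapeHtml are hypothetical helpers, not part of the package:

```typescript
// Hypothetical helpers, not part of the package.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function mindmapToHtml(mermaidCode: string, title: string = 'Mindmap'): string {
  return [
    '<!DOCTYPE html>',
    '<html>',
    `<head><meta charset="utf-8"><title>${escapeHtml(title)}</title></head>`,
    '<body>',
    // Mermaid renders elements with class "mermaid"; escaping keeps characters
    // such as "<" and "&" in the diagram from being parsed as HTML.
    `<pre class="mermaid">${escapeHtml(mermaidCode)}</pre>`,
    '<script type="module">',
    "  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';",
    '  mermaid.initialize({ startOnLoad: true });',
    '</script>',
    '</body>',
    '</html>',
  ].join('\n');
}
```

For example, writing fs.writeFileSync('mindmap.html', mindmapToHtml(result.mindmap)) from node:fs produces a file you can open in a browser.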


6. Real-time Progress Feedback
Get granular updates on the processing stages.

```typescript
import { processOCR } from '@pandi2352/gemini-ocr';

await processOCR({
  input: ['./large_document.pdf'],
  apiKey: process.env.GEMINI_API_KEY,

  onProgress: (stage, message) => {
    // stage: 'upload' | 'generate_text' | 'enrich' | 'complete'
    console.log(`[${stage}]: ${message}`);
  }
});
```
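To see how long each stage takes, the callback can record timestamps as updates arrive. A minimal sketch; createProgressTracker is a hypothetical helper, not part of the package, and stage is typed as a plain string in case the package reports values beyond the four listed above:

```typescript
interface StageUpdate {
  stage: string;     // e.g. 'upload' | 'generate_text' | 'enrich' | 'complete'
  message: string;
  elapsedMs: number; // milliseconds since the tracker was created
}

// Hypothetical helper, not part of the package: builds an onProgress callback
// that logs each update with its elapsed time and keeps a history for later.
function createProgressTracker(now: () => number = Date.now) {
  const start = now();
  const updates: StageUpdate[] = [];
  const onProgress = (stage: string, message: string): void => {
    const update: StageUpdate = { stage, message, elapsedMs: now() - start };
    updates.push(update);
    console.log(`[+${update.elapsedMs}ms] [${stage}]: ${message}`);
  };
  return { onProgress, updates };
}

// Usage:
// const { onProgress, updates } = createProgressTracker();
// await processOCR({ input: ['./large_document.pdf'], apiKey: process.env.GEMINI_API_KEY, onProgress });
```

Injecting the clock through the now parameter makes the tracker deterministic in tests.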

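🛠️ Configuration Options

The examples in this guide use the following options:

- input: a file path or URL string, or an array of them
- apiKey: your Gemini API key
- summarize: optional flag to also generate a summary of each document
- mindmap: optional flag to also generate a Mermaid.js mindmap
- extractEntities: optional flag to also extract structured data as JSON
- entitySchema: optional list of custom field names for entity extraction
- onProgress: optional callback that receives (stage, message) progress updates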

🤝 Contributing

We love contributions! Please feel free to submit a Pull Request.

1. Fork the repository.
2. Create your feature branch (git checkout -b feature/cool-feature).
3. Commit your changes.
4. Push to the branch.
5. Open a Pull Request.