@pandi2352/gemini-ocr
v4.0.0
A lightweight OCR processing wrapper using Google Gemini Vision models.
# 🔮 Gemini OCR (@pandi2352/gemini-ocr)

The Next-Gen Document Intelligence Wrapper

## ⚡ Why Gemini OCR?
Traditional OCR (Tesseract, AWS Textract) gives you just text. Gemini OCR gives you understanding.
| Feature | Description |
| :--- | :--- |
| 🧠 Deep Understanding | Go beyond raw text extraction: get summaries, titles, and context. |
| 🗺️ Mindmaps | Auto-generate Mermaid.js mindmaps to visualize complex documents. |
| 🏎️ Batch Processing | Process arrays of file paths and URLs (`['path', 'url']`) in parallel. |
| 🎯 Entity Extraction | Extract specific fields (Dates, Names, IDs) into strict JSON. |
| 🌈 Multimodal | Works on PDFs, Images, Word Docs, Audio, and Video. |
## 📚 Step-by-Step Usage Guide

### 1. Prerequisites

You need a Google Gemini API key, which you can create in Google AI Studio.
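The examples below read the key from a `GEMINI_API_KEY` environment variable. Here is a minimal sketch of a fail-fast check, assuming that variable name; `requireApiKey` is a hypothetical helper, not part of the package:

```typescript
// Hypothetical helper (not exported by @pandi2352/gemini-ocr):
// read the API key from the environment and fail early with a clear message.
function requireApiKey(varName: string = "GEMINI_API_KEY"): string {
  // Trim so a stray space or newline in a .env file doesn't produce a bad key.
  const key = process.env[varName]?.trim();
  if (!key) {
    throw new Error(`Missing ${varName}. Set it before calling processOCR.`);
  }
  return key;
}
```

You could then pass `apiKey: requireApiKey()`, which also gives TypeScript a plain `string` instead of `string | undefined`.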
### 2. Installation
Install the package in your Node.js project:
```bash
npm install @pandi2352/gemini-ocr
```

### 3. Basic Usage (Text Extraction)
Create a file (e.g., index.ts) and add the following. This works for locally stored files or URLs.
```typescript
import { processOCR } from '@pandi2352/gemini-ocr';

async function main() {
  const results = await processOCR({
    // Input can be a single file string or an array
    input: ['./my-document.pdf'],
    apiKey: process.env.GEMINI_API_KEY
  });

  console.log(results[0].extractedText);
}

// Surface any rejection instead of leaving it unhandled
main().catch(console.error);
```

### 4. Batch Processing (Multiple Files)
Pass an array of file paths or URLs. They are processed in parallel.
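This README doesn't describe how the package parallelizes work internally, but a common JavaScript pattern for independent jobs is `Promise.allSettled`: it starts every job at once, waits for all of them, and records each outcome, so one failed file does not reject the whole batch. The sketch below illustrates that pattern with a hypothetical `processOne` worker that returns extracted text; it is not the library's implementation, though its per-item `status` mirrors the `doc.status` check in the example that follows.

```typescript
// Per-item outcome: either the extracted text or the error message.
type BatchResult =
  | { status: "success"; input: string; text: string }
  | { status: "error"; input: string; error: string };

// Run `processOne` on every input concurrently and keep each item's outcome.
// Results come back in the same order as `inputs`.
async function processAll(
  inputs: string[],
  processOne: (input: string) => Promise<string>, // hypothetical per-file worker
): Promise<BatchResult[]> {
  const settled = await Promise.allSettled(inputs.map((input) => processOne(input)));
  return settled.map((outcome, i): BatchResult =>
    outcome.status === "fulfilled"
      ? { status: "success", input: inputs[i], text: outcome.value }
      : { status: "error", input: inputs[i], error: String(outcome.reason) },
  );
}
```

Using `Promise.all` instead would reject as soon as any single file failed, discarding the results of the files that succeeded.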
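```typescript
import { processOCR } from '@pandi2352/gemini-ocr';

// Top-level await requires an ES module (or wrap this in an async function)
const results = await processOCR({
  input: [
    './invoice_january.pdf',
    'https://example.com/receipt.jpg',
    './meeting_notes.docx'
  ],
  apiKey: process.env.GEMINI_API_KEY,
  summarize: true // Optional: get summaries for all files
});

// Check each entry's status before reading its fields
results.forEach((doc, index) => {
  if (doc.status === 'success') {
    console.log(`File ${index + 1}: ${doc.summary}`);
  }
});
```

### 5. Advanced Intelligence (Mindmaps & Entities)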
Enable optional flags to generate a mindmap and extract structured fields.
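With `mindmap: true`, the result's `mindmap` field holds Mermaid.js source (printed as `result.mindmap` in the example below). One way to view it, sketched here, is to wrap that source in a `mermaid` code block inside a Markdown file, which Mermaid-aware renderers such as GitHub draw as a diagram. This assumes `result.mindmap` is a plain string; `toMermaidMarkdown`, `saveMindmap`, and the output path are hypothetical.

```typescript
import { writeFile } from "node:fs/promises";

// Three backticks, built with repeat() so this snippet can itself sit inside a Markdown code block.
const FENCE = "`".repeat(3);

// Wrap raw Mermaid source in a Markdown code block tagged `mermaid`.
function toMermaidMarkdown(mermaidSource: string): string {
  return `${FENCE}mermaid\n${mermaidSource.trim()}\n${FENCE}\n`;
}

// Hypothetical helper: write the wrapped diagram to a Markdown file (UTF-8).
async function saveMindmap(mermaidSource: string, path: string): Promise<void> {
  await writeFile(path, toMermaidMarkdown(mermaidSource), "utf8");
}
```

Usage, after running the example below: `await saveMindmap(result.mindmap, './contract-mindmap.md');`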
```typescript
import { processOCR } from '@pandi2352/gemini-ocr';

const [result] = await processOCR({
  input: ['./complex_contract.pdf'],
  apiKey: process.env.GEMINI_API_KEY,
  // Enable advanced features
  mindmap: true,         // Generates Mermaid.js mindmap syntax
  extractEntities: true, // Extracts fields into JSON
  entitySchema: ['Contract Value', 'Start Date', 'Parties Involved'] // Optional custom fields
});

// 1. Get the mindmap
console.log('Mindmap Code:', result.mindmap);

// 2. Get structured data
console.log('Extracted Data:', result.entityResult);
/* Example output:
{
  "contract_value": "$50,000",
  "start_date": "2024-01-01",
  "parties_involved": "Company A, Vendor B"
}
*/
```
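The example output suggests that each `entitySchema` label comes back as a lower-case, underscore-separated key (`'Contract Value'` → `contract_value`). This README does not document that convention, so treat it as an assumption. Under that assumption, a small helper can map your labels back to the returned keys and flag any fields the model left out. `toEntityKey` and `pickEntities` are hypothetical helpers, and `entityResult` is treated as a plain object.

```typescript
// Assumption: keys are the schema labels in lower snake_case,
// as in the sample output ("Contract Value" -> "contract_value").
function toEntityKey(label: string): string {
  return label
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse spaces/punctuation into one underscore
    .replace(/^_+|_+$/g, "");    // drop leading/trailing underscores
}

// Look up each requested field; report labels with no (or a null) value.
function pickEntities(
  entityResult: Record<string, unknown>,
  schema: string[],
): { found: Record<string, unknown>; missing: string[] } {
  const found: Record<string, unknown> = {};
  const missing: string[] = [];
  for (const label of schema) {
    const key = toEntityKey(label);
    if (key in entityResult && entityResult[key] != null) {
      found[label] = entityResult[key];
    } else {
      missing.push(label);
    }
  }
  return { found, missing };
}
```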
### 6. Realtime Progress Feedback
Get granular updates on the processing stages.
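Since `onProgress` receives a stage name and a message, you can turn it into a simple progress display. A minimal sketch, assuming the stage names listed in the example's comment; the percentages are hand-picked for illustration, since nothing here says the library reports percentages, and `makeProgressLogger` is a hypothetical helper:

```typescript
// Rough, hand-picked fractions for the stages named in the example below.
const STAGE_FRACTION = new Map<string, number>([
  ["upload", 0.25],
  ["generate_text", 0.5],
  ["enrich", 0.75],
  ["complete", 1],
]);

// Build an onProgress-compatible callback that prefixes each message with a rough percentage.
// Unknown stage names are shown as "?" rather than guessed.
function makeProgressLogger(log: (line: string) => void = console.log) {
  return (stage: string, message: string): void => {
    const fraction = STAGE_FRACTION.get(stage);
    const pct = fraction === undefined ? "?" : `${Math.round(fraction * 100)}%`;
    log(`[${pct}] ${stage}: ${message}`);
  };
}
```

Usage: pass `onProgress: makeProgressLogger()` in the options object.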
```typescript
import { processOCR } from '@pandi2352/gemini-ocr';

await processOCR({
  input: ['./large_document.pdf'],
  apiKey: process.env.GEMINI_API_KEY,
  onProgress: (stage, message) => {
    // stage: 'upload' | 'generate_text' | 'enrich' | 'complete'
    console.log(`[${stage}]: ${message}`);
  }
});
```

## 🛠️ Configuration Options
| Option | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `input` | `Array<string \| Buffer \| Object>` | Required | Array of file paths, URLs, Buffers, or Base64 strings. |
| `apiKey` | `string` | Required | Your Google Gemini API key. |
| `model` | `string` | `gemini-1.5-flash` | The Gemini model to use (use `gemini-1.5-flash-8b` for speed). |
| `summarize` | `boolean` | `false` | Generate metadata (title, description, thumbnail). |
| `mindmap` | `boolean` | `false` | Generate Mermaid.js syntax for visual mapping. |
| `extractEntities` | `boolean` | `false` | Enable structured field extraction. |
| `entitySchema` | `string[]` | auto | Custom fields to extract (optional). |
| `onProgress` | `(stage, userMsg) => void` | `undefined` | Callback for real-time progress updates. |
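If you wrap `processOCR` in your own code, the table can be transcribed into a TypeScript shape for type-checking. The sketch below is a hand-written mirror of the documented options, not the package's exported type (prefer the package's own types if it ships them). `OcrOptionsSketch`, `ResolvedOcrOptions`, and `withDefaults` are hypothetical names, and `Object` inputs are approximated as `Record<string, unknown>`.

```typescript
// Hand-written mirror of the options table; not the package's exported type.
interface OcrOptionsSketch {
  input: Array<string | Buffer | Record<string, unknown>>;
  apiKey: string;
  model?: string;
  summarize?: boolean;
  mindmap?: boolean;
  extractEntities?: boolean;
  entitySchema?: string[]; // left unset = "auto" per the table
  onProgress?: (stage: string, userMsg: string) => void;
}

// Same shape, with every option that has a documented default made required.
type ResolvedOcrOptions = OcrOptionsSketch &
  Required<Pick<OcrOptionsSketch, "model" | "summarize" | "mindmap" | "extractEntities">>;

// Apply the defaults listed in the table; `??` also replaces explicit `undefined`.
function withDefaults(opts: OcrOptionsSketch): ResolvedOcrOptions {
  return {
    ...opts,
    model: opts.model ?? "gemini-1.5-flash",
    summarize: opts.summarize ?? false,
    mindmap: opts.mindmap ?? false,
    extractEntities: opts.extractEntities ?? false,
  };
}
```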
## 🤝 Contributing

We love contributions! Please feel free to submit a Pull Request.

- Fork the repository
- Create your feature branch (`git checkout -b feature/cool-feature`)
- Commit your changes
- Push to the branch
- Open a Pull Request
