brocconi

v1.0.10

Published

a year ago

CLI for OCRing PDFs using AI platforms.

ragaeeb

nodejs ocr cli bun

🤖 Brocconi: AI-Powered PDF OCR CLI 📚

A CLI for OCRing PDFs using Gemini AI, with ocr.space as a fallback. ✨

🛠️ Installation

Get started with Brocconi in just a few steps:

Clone the Repository:

git clone git@github.com:ragaeeb/brocconi.git
cd brocconi

Install Dependencies:
```
bun install
```
Build the project:
```
bun run build
```

Once the command is linked globally (optional), invoke it as:

brocconi [options] <directory> [<directory2> ...]

Prerequisites

Brocconi uses pdftoppm (part of Poppler) to convert each PDF into images that can be used for OCR. Ensure you have pdftoppm installed.

You can install it using Homebrew, then verify the installation:

brew install poppler
pdftoppm -v

🚀 Usage

Set API keys

To make calls to the Gemini API, you need an API key. Get one from Google AI Studio, then set it like this:

bunx brocconi -k "GEMINI_API_KEY"

To work around rate-limiting, you can also set multiple API keys:

bunx brocconi -k "GEMINI_API_KEY1 GEMINI_API_KEY2 GEMINI_API_KEY3"

At runtime, the app will pick a random one.
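
A minimal sketch of the selection described above, assuming the keys arrive as one whitespace-separated string (as in the `-k` example). The names `parseApiKeys` and `pickRandomKey` are hypothetical and not taken from brocconi's source; the actual implementation may differ.

```typescript
// Hypothetical helpers; not brocconi's actual API.

/** Split a whitespace-separated string of keys into a clean list. */
function parseApiKeys(raw: string): string[] {
    return raw.split(/\s+/).filter((key) => key.length > 0);
}

/**
 * Pick one key uniformly at random.
 * `random` is injectable so the choice can be made deterministic in tests.
 */
function pickRandomKey(keys: string[], random: () => number = Math.random): string {
    if (keys.length === 0) {
        throw new Error('No API keys configured');
    }
    const index = Math.floor(random() * keys.length);
    return keys[index];
}
```

Picking a key at random on each run spreads requests across keys without keeping any state between runs, which is one simple way to soften per-key rate limits.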

Set ocr.space API key

Sometimes Gemini fails to OCR an image. The app retries with different models, but if none of them succeeds, it can fall back to a different platform such as ocr.space. To enable this fallback, get an ocr.space API key and set it like this:

bunx brocconi -b "OCRSPACEKEY"
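
The retry-then-fallback behaviour described above can be sketched as follows. This is an illustration only: the `OcrEngine` type, the `ocrWithFallback` function, and the idea of passing engines as an ordered list are assumptions, not brocconi's actual code.

```typescript
// Hypothetical sketch; the engine list would hold the Gemini models first
// and an ocr.space-backed engine last.
type OcrEngine = {
    name: string;
    recognize: (image: Uint8Array) => Promise<string>;
};

/**
 * Try each engine in order and return the first successful result.
 * If every engine fails, throw an error listing what was attempted.
 */
async function ocrWithFallback(
    image: Uint8Array,
    engines: OcrEngine[],
): Promise<{ engine: string; text: string }> {
    const failures: string[] = [];
    for (const engine of engines) {
        try {
            const text = await engine.recognize(image);
            return { engine: engine.name, text };
        } catch (error) {
            const reason = error instanceof Error ? error.message : String(error);
            failures.push(`${engine.name}: ${reason}`);
        }
    }
    throw new Error(`All OCR engines failed (${failures.join('; ')})`);
}
```

Keeping the engines in a single ordered list means the fallback is just the last entry, so omitting the ocr.space key would simply shorten the list.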

OCR a PDF

bunx brocconi /path/to/file.pdf

This will process the PDF and output the results to /path/to/file.json.
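
The default output location follows from the input path by swapping the extension. A minimal sketch, assuming the rule is simply "replace a trailing .pdf with .json"; the function name `defaultOutputPath` is hypothetical, and the real CLI may resolve paths differently.

```typescript
// Hypothetical helper; not brocconi's actual API.
function defaultOutputPath(pdfPath: string): string {
    // Replace a trailing ".pdf" (any letter case); otherwise append ".json".
    return /\.pdf$/i.test(pdfPath)
        ? pdfPath.replace(/\.pdf$/i, '.json')
        : `${pdfPath}.json`;
}
```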

Specify output file

bunx brocconi /path/to/file.pdf -o ./outputFile.json

Extract footnotes

The -f flag makes a best-effort attempt to separate footnotes from the paragraph body text and includes the footnote text in a footnotes property on each page.

bunx brocconi /path/to/file.pdf -f

Include Volume Number

If you have a multi-volume book, you can include the part (volume) number like this:

bunx brocconi /path/to/file.pdf -p 3

This will add part: 3 to each page.
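
To illustrate how the per-page footnotes and part properties might fit together, here is a hypothetical shape for the JSON output. Only the `footnotes` and `part` property names come from the descriptions in this README; `page`, `text`, and the helper `withPart` are assumptions for illustration.

```typescript
// Hypothetical output shape; only `footnotes` and `part` are named in this README.
interface PageResult {
    page: number; // assumed: 1-based page number
    text: string; // assumed: main body text of the page
    footnotes?: string; // present when footnote extraction (-f) is used
    part?: number; // present when a part number (-p) is given
}

/** Return copies of the pages with the same part number attached to each. */
function withPart(pages: PageResult[], part: number): PageResult[] {
    return pages.map((page) => ({ ...page, part }));
}
```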

Delete all uploads before starting

In case of errors, you might want to clean up all the previously uploaded files. You can run the reset command like this:

bunx brocconi /path/to/file.pdf -r

This first deletes all the files in your Gemini AI Studio, then starts the OCR. Be careful with this command: it deletes ALL the files in your Gemini AI Studio. Use it cautiously! The author of this package is NOT responsible for you accidentally erasing your data.

Method

brocconi works by turning the PDF into images and filtering out blank pages so that no API calls are wasted on them. It then gives Gemini a sample image together with the expected output showing how the OCR results should be formatted (this is called the "training image"), followed by the actual page to OCR. Showing this example first makes it possible to tune how the text is formatted and improves accuracy.
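
The pipeline above can be sketched in two pieces: a blank-page filter and a request that places the example before the real page. Everything here is an assumption for illustration: the near-white-pixel heuristic, the thresholds, and the names `isBlankPage` and `buildFewShotParts` are not taken from brocconi's source, and the parts array is a generic structure rather than the exact Gemini request format.

```typescript
// Hypothetical sketch; not brocconi's actual implementation.

// One byte per pixel, 0 = black, 255 = white.
type GrayscalePage = { pixels: Uint8Array };

/** Treat a page as blank when almost all of its pixels are near-white. */
function isBlankPage(page: GrayscalePage, whiteLevel = 250, minWhiteRatio = 0.995): boolean {
    if (page.pixels.length === 0) return true;
    let white = 0;
    for (let i = 0; i < page.pixels.length; i++) {
        if (page.pixels[i] >= whiteLevel) white++;
    }
    return white / page.pixels.length >= minWhiteRatio;
}

type PromptPart =
    | { kind: 'text'; text: string }
    | { kind: 'image'; data: Uint8Array };

/** Order the parts so the model sees the example image and its expected output first. */
function buildFewShotParts(
    instructions: string,
    trainingImage: Uint8Array,
    expectedOutput: string,
    pageImage: Uint8Array,
): PromptPart[] {
    return [
        { kind: 'text', text: instructions },
        { kind: 'image', data: trainingImage },
        { kind: 'text', text: expectedOutput },
        { kind: 'image', data: pageImage },
    ];
}
```

In this sketch the example steers the model only through the prompt; no model weights are changed, so the "training image" acts as an in-context example rather than training in the machine-learning sense.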

✨ Features

🤖 AI-Powered OCR: Leverages advanced AI models for accurate text extraction.
📄 PDF to Text: Converts PDFs into structured JSON output.
✂️ Footnote Isolation: Isolates footnotes for cleaner main text.
🔑 API Key Management: Easily configure and manage multiple API keys.
⚙️ Configurable: Customize OCR behavior with various command-line flags.

🤝 Contributing

Want to help improve Brocconi? Here's how:

🐛 Report Bugs: Submit detailed bug reports to help us squash those pesky critters.
✨ Suggest Enhancements: Share your ideas for new features and improvements.
💻 Submit Pull Requests: Contribute code fixes and new features (see guidelines below).

Contribution Guidelines

Fork the repository.
Create a new branch for your feature or bug fix.
Write clear, maintainable code.
Include tests for your changes.
Submit a pull request with a detailed description of your changes.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

Author Info

Ragaeeb Haq
https://github.com/ragaeeb