brocconi
v1.0.10
CLI for OCRing PDFs using AI platforms.
🤖 Brocconi: AI-Powered PDF OCR CLI 📚
A CLI for OCRing images using Gemini AI with ocr.space as a fallback. ✨
🛠️ Installation
Get started with Brocconi in just a few steps:
Clone the Repository:
git clone git@github.com:ragaeeb/brocconi.git
cd brocconi
Install Dependencies:
bun install
Build the project:
bun run build
Installation
# Clone the repository
git clone https://github.com/ragaeeb/brocconi.git
cd brocconi
# Install dependencies
bun install
# Link the command globally (optional)
brocconi [options] <directory> [<directory2> ...]
Prerequisites
This library uses pdftoppm to convert the PDF to images which can be used for OCR. Ensure you have pdftoppm installed.
You can install it with Homebrew, then verify the installation:
brew install poppler
pdftoppm -v
🚀 Usage
Set API keys
To call the Gemini API, you need an API key. Get one from Google AI Studio, then set it like this:
bunx brocconi -k "GEMINI_API_KEY"
To work around rate-limiting, you can also set multiple API keys:
bunx brocconi -k "GEMINI_API_KEY1 GEMINI_API_KEY2 GEMINI_API_KEY3"
At runtime, the app will pick a random one.
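The random key selection can be pictured with a short sketch. This is not brocconi's actual code: `pickApiKey` and its injectable `rng` parameter are hypothetical names, and the only behaviour taken from this README is that several keys are passed as one space-separated string and one of them is chosen at random.

```typescript
// Hypothetical helper (not part of brocconi): split a space-separated
// key string and return one key chosen at random.
// `rng` is injectable so the choice can be made deterministic in tests.
function pickApiKey(rawKeys: string, rng: () => number = Math.random): string {
    // Split on whitespace and drop empty fragments from leading/trailing spaces.
    const keys = rawKeys.split(/\s+/).filter((key) => key.length > 0);
    if (keys.length === 0) {
        throw new Error("No API keys configured");
    }
    // rng() is expected to return a value in [0, 1), like Math.random.
    const index = Math.floor(rng() * keys.length);
    return keys[index];
}

const key = pickApiKey("GEMINI_API_KEY1 GEMINI_API_KEY2 GEMINI_API_KEY3");
console.log(key);
```

Spreading requests over several keys this way only reduces the chance of hitting a single key's rate limit; it does not track which keys are currently throttled.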
Set ocr.space API key
Sometimes Gemini fails to OCR the image. The app will retry with different models, but if none of them succeeds, it can fall back to a different platform such as ocr.space. If you want this fallback, get an ocr.space API key, then set it like this:
bunx brocconi -b "OCRSPACEKEY"
OCR a PDF
bunx brocconi /path/to/file.pdf
This will process the PDF and output the results to /path/to/file.json.
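As an illustration of the default output location, the sketch below derives a .json path from the input path. `defaultOutputPath` is a hypothetical name, and the handling of inputs that do not end in .pdf is an assumption; the README only states that /path/to/file.pdf produces /path/to/file.json.

```typescript
// Hypothetical helper (not part of brocconi): derive the default
// output path from the input PDF path.
function defaultOutputPath(pdfPath: string): string {
    // Swap a trailing ".pdf" (any letter case) for ".json".
    if (/\.pdf$/i.test(pdfPath)) {
        return pdfPath.replace(/\.pdf$/i, ".json");
    }
    // Assumption: inputs without a .pdf extension get ".json" appended.
    return `${pdfPath}.json`;
}

console.log(defaultOutputPath("/path/to/file.pdf")); // prints "/path/to/file.json"
```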
Specify output file
bunx brocconi /path/to/file.pdf -o ./outputFile.json
Extract footnotes
The -f flag makes a best-effort attempt to separate footnotes from the paragraph body text and includes the footnote text in a footnotes property on each page.
bunx brocconi /path/to/file.pdf -f
Include Volume Number
If you have a multi-volume book, you can include the part number like this:
bunx brocconi /path/to/file.pdf -p 3
This will add part: 3 to each page.
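To make the per-page properties concrete, here is a rough sketch of what a page record might look like. Only the `footnotes` and `part` property names come from this README; the `page` and `text` fields, the `footnotes` type, and the `withPart` helper are assumptions for illustration, and the real output may differ.

```typescript
// Assumed shape of one page in the output JSON.
// `footnotes` and `part` are named in the README; `page` and `text`
// are hypothetical field names.
interface PageResult {
    page: number;
    text: string;
    footnotes?: string; // assumed to be plain text
    part?: number;
}

// Hypothetical helper: stamp the same part number on every page.
function withPart(pages: PageResult[], part?: number): PageResult[] {
    if (part === undefined) {
        return pages;
    }
    return pages.map((p) => ({ ...p, part }));
}

const pages: PageResult[] = [
    { page: 1, text: "Body text", footnotes: "(1) A footnote" },
    { page: 2, text: "More body text" },
];
console.log(JSON.stringify(withPart(pages, 3), null, 2));
```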
Delete all uploads before starting
In case of errors, you might want to clean up all previously uploaded files. You can run the reset command like this:
bunx brocconi /path/to/file.pdf -r
This will first delete all the files in your Gemini AI Studio, then start OCR. Be careful with this command: it deletes ALL the files in your Gemini AI Studio. The author of this package is NOT responsible for you accidentally erasing your data.
Method
brocconi works by turning the PDF into images and filtering out blank pages so that no API calls are wasted on them. It then gives Gemini a sample image together with the expected output for it, showing how the OCR results should be formatted (this is called the "training image"), followed by the actual page to OCR. This allows fine-tuning of how the text is formatted and improves accuracy.
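The blank-page filtering step could look roughly like the sketch below. It is not brocconi's implementation: the `GrayPage` type, the function names, and both thresholds are assumptions. The sketch works on raw 8-bit grayscale pixels and avoids library calls, so decoding the rendered page images is left out.

```typescript
// A rendered page as 8-bit grayscale pixels (0 = black, 255 = white).
// Hypothetical type for this sketch only.
interface GrayPage {
    name: string;
    pixels: Uint8Array;
}

// Fraction of pixels darker than `darkThreshold`.
function inkRatio(pixels: Uint8Array, darkThreshold = 200): number {
    if (pixels.length === 0) {
        return 0;
    }
    const dark = pixels.reduce(
        (count, value) => count + (value < darkThreshold ? 1 : 0),
        0,
    );
    return dark / pixels.length;
}

// Keep only pages whose ink ratio reaches `minInk`.
// Both thresholds are illustrative guesses, not tuned values.
function dropBlankPages(pages: GrayPage[], minInk = 0.001): GrayPage[] {
    return pages.filter((page) => inkRatio(page.pixels) >= minInk);
}

// Usage: an all-white page is dropped, a page with some dark pixels is kept.
const blankPage: GrayPage = { name: "page-1", pixels: new Uint8Array(100).fill(255) };
const textPage: GrayPage = {
    name: "page-2",
    pixels: new Uint8Array(100).fill(255).fill(0, 0, 10),
};
console.log(dropBlankPages([blankPage, textPage]).map((p) => p.name)); // only page-2 remains
```

A ratio-based check like this tolerates a few specks of scanner noise, which an exact "all pixels white" test would not.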
✨ Features
- 🤖 AI-Powered OCR: Leverages advanced AI models for accurate text extraction.
- 📄 PDF to Text: Converts PDFs into structured JSON output.
- ✂️ Footnote Isolation: Isolates footnotes for cleaner main text.
- 🔑 API Key Management: Easily configure and manage multiple API keys.
- ⚙️ Configurable: Customize OCR behavior with various command-line flags.
🛠️ Technologies Used
| Technology | Link |
| :---------------------------------------- | :----------------------------------------- |
| Node.js | https://nodejs.org/ |
| Bun | https://bun.sh/ |
| Google Gemini API | https://ai.google.dev/ |
| OCR Space API | https://ocr.space/ |
| catsa-janga (for progress saving) | https://www.npmjs.com/package/catsa-janga |
| sharp (for image processing) | https://sharp.pixelplumbing.com/ |
| semantic-release (for release automation) | https://semantic-release.org/ |
| eslint/prettier | https://eslint.org/, https://prettier.io/ |
🤝 Contributing
Want to help improve Brocconi? Here's how:
- 🐛 Report Bugs: Submit detailed bug reports to help us squash those pesky critters.
- ✨ Suggest Enhancements: Share your ideas for new features and improvements.
- 💻 Submit Pull Requests: Contribute code fixes and new features (see guidelines below).
Contribution Guidelines
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Write clear, maintainable code.
- Include tests for your changes.
- Submit a pull request with a detailed description of your changes.
📜 License
This project is licensed under the MIT License - see the LICENSE file for details.
Author Info
- Ragaeeb Haq
- https://github.com/ragaeeb
