ollama-starter-document-ocr
v0.0.1
OCR documents using Ollama
Ollama Starter Document OCR
Simple CLI for extracting text from images and PDFs using Ollama.
Prerequisites
- Install and run Ollama
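A quick way to confirm the prerequisite is met is to ask the Ollama server for its list of installed models. The sketch below is not part of this package: `check_ollama` is a hypothetical helper name, and it assumes `curl` is installed. It relies only on Ollama's `GET /api/tags` endpoint.

```shell
# Hypothetical helper (not shipped with this package): succeeds if an
# Ollama server answers at the given base URL, fails otherwise.
check_ollama() {
  # -f: treat HTTP errors as failures, -sS: quiet but still print errors,
  # --max-time: give up after 5 seconds instead of hanging.
  curl -fsS --max-time 5 "$1/api/tags" >/dev/null
}
```

For example, `check_ollama http://localhost:11434 && echo "Ollama is running"` prints the message only when the server responds.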
Usage
# Process explicit files
npx ollama-starter-document-ocr ./receipts/scan1.png ./receipts/scan2.jpg
# Process all images in a folder (non-recursive)
npx ollama-starter-document-ocr ./receipts
# Process a PDF (one .txt per page)
npx ollama-starter-document-ocr ./receipts/statement.pdf
# Override output directory
npx ollama-starter-document-ocr ./receipts --out-dir ./output
# Use a different model
npx ollama-starter-document-ocr ./receipts --model deepseek-ocr
Output
- Each image has a corresponding .txt file with the extracted text
- For PDFs, each page is rendered to an image and then processed
- A JSON file is written to the output directory with the full results of every image/page
- Some models will detect text bounding boxes and annotate the images with them
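Since every image or page gets its own .txt file, one simple sanity check after a run is to look for files that came out empty. This is only a sketch: `list_empty_txt` is a hypothetical helper, and treating an empty file as a sign of failed extraction is an assumption, not documented behavior of this tool.

```shell
# Hypothetical helper: print every zero-byte .txt file under a directory.
# An empty file may mean the model returned no text for that image or page.
list_empty_txt() {
  # -size 0 matches files whose size is zero bytes.
  find "$1" -type f -name '*.txt' -size 0
}
```

With the `--out-dir ./output` example from the usage section, this could be run as `list_empty_txt ./output`.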
Environment
Set OLLAMA_HOST if your Ollama server is not running at the default address, http://localhost:11434.
OLLAMA_HOST=http://localhost:11444 npx ollama-starter-document-ocr ./receipts
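The fallback described above follows a common shell pattern: use the variable when it is set, otherwise use the default. The sketch below illustrates that pattern in a wrapper script. `ollama_base_url` is a hypothetical name, and this is not the package's own code.

```shell
# Hypothetical helper: print OLLAMA_HOST if it is set and non-empty,
# otherwise print the documented default address.
ollama_base_url() {
  printf '%s\n' "${OLLAMA_HOST:-http://localhost:11434}"
}
```

To avoid repeating the prefix on every command, the variable can also be exported once per shell session with `export OLLAMA_HOST=http://localhost:11444`.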