pdf-read
v1.0.0
Agent-friendly CLI for PDF text extraction with OCR fallback
pdf-read
An agent-first CLI for reading PDFs fast. It is designed for programmatic consumption by AI agents and falls back to OCR when a PDF yields little extractable text.
Features
- Native Extraction: Uses `pdftotext` for blazing-fast text recovery from text-based PDFs.
- OCR Fallback: Automatically runs Tesseract OCR when text density is low (e.g., scanned documents).
- Deep Mode: Forces OCR extraction with `--deep` to maximize text recovery.
- Agent-First Output: JSON by default, with rich metadata and semantic exit codes.
- Page Selection: Supports complex ranges such as `1-5, 8, 11-13`.
- Help-as-Data: `--help-json` for automated tool discovery.
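Page selections like `1-5, 8, 11-13` expand into an explicit list of pages. The sketch below shows one way a caller could validate and expand such a string before passing it to `--pages`. The function name `parsePageRanges` and the optional `pageCount` bound are hypothetical; this is not pdf-read's own parser, and the README does not say how it handles duplicates or out-of-range pages.

```typescript
/**
 * Hypothetical helper: expand a page selection such as "1-5, 8, 11-13"
 * into a sorted list of unique 1-based page numbers.
 * Throws on malformed tokens, reversed ranges, or pages beyond pageCount.
 */
function parsePageRanges(spec: string, pageCount?: number): number[] {
  const pages = new Set<number>();

  for (const raw of spec.split(",")) {
    const part = raw.trim();
    // Either a single page ("8") or an inclusive range ("11-13").
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(part);
    if (!match) {
      throw new Error(`Invalid page token: "${part}"`);
    }

    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${part}"`);
    }
    // Bounding by the document length also keeps huge ranges from looping.
    if (pageCount !== undefined && end > pageCount) {
      throw new Error(`Page ${end} exceeds document length (${pageCount})`);
    }

    for (let p = start; p <= end; p++) {
      pages.add(p);
    }
  }

  // A Set removes duplicates such as "3,1,3"; sort numerically.
  return Array.from(pages).sort((a, b) => a - b);
}
```

For example, `parsePageRanges("1,3-5")` yields `[1, 3, 4, 5]`, matching the selection used in the Usage section.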
Installation
npm install -g pdf-read

Requires poppler-utils (for `pdftotext` and `pdftoppm`).
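The two poppler tools correspond to the two extraction paths: `pdftotext` recovers embedded text, and `pdftoppm` can rasterize pages for OCR. The README says OCR is triggered when text density is low but does not define density or publish a threshold, so the sketch below is a hypothetical heuristic: average non-whitespace characters per page, compared against an assumed cutoff of 100 (`DEFAULT_MIN_CHARS_PER_PAGE`).

```typescript
// Hypothetical cutoff: the README does not publish pdf-read's actual threshold.
const DEFAULT_MIN_CHARS_PER_PAGE = 100;

/** Average number of non-whitespace characters per page. */
function textDensity(text: string, pageCount: number): number {
  if (!Number.isInteger(pageCount) || pageCount < 1) {
    throw new RangeError(`pageCount must be a positive integer, got ${pageCount}`);
  }
  // \s also matches form feeds, so page-break characters are not counted as text.
  const visibleChars = text.replace(/\s/g, "").length;
  return visibleChars / pageCount;
}

/** True when native extraction looks too sparse and OCR should be tried. */
function shouldFallBackToOcr(
  nativeText: string,
  pageCount: number,
  minCharsPerPage: number = DEFAULT_MIN_CHARS_PER_PAGE,
): boolean {
  return textDensity(nativeText, pageCount) < minCharsPerPage;
}
```

A three-page scanned PDF for which `pdftotext` returns only page-break form feeds (`"\f\f\f"`) has a density of 0 and would trigger the fallback. Measuring per page rather than in total keeps long documents from masking a handful of readable pages, though a real implementation might instead check each page individually.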
Usage
Basic Extraction (JSON)
pdf-read read document.pdf

Plain Text Output
pdf-read read document.pdf --text

Specific Pages
pdf-read read document.pdf --pages 1,3-5

Force OCR (Deep Mode)
pdf-read read document.pdf --deep --verbose

Agent Discovery
pdf-read read --help-json

Exit Codes
- 0: Success
- 80: Invalid argument
- 91: File not found
- 92: Not a PDF file
- 93: No text extracted
- 106: OCR failed
- 110: Internal error
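Because the exit codes are semantic, an agent can branch on them without parsing error text. The sketch below shows hypothetical agent-side helpers: `buildReadArgs` assembles an argument list using only the flags documented above, and `describeExit` maps a status to its README meaning. The `ReadOptions` shape and both function names are assumptions for illustration, not part of pdf-read.

```typescript
// Exit codes as listed in the pdf-read README.
const EXIT_CODES: Record<number, string> = {
  0: "Success",
  80: "Invalid argument",
  91: "File not found",
  92: "Not a PDF file",
  93: "No text extracted",
  106: "OCR failed",
  110: "Internal error",
};

// Hypothetical options shape covering only the documented `read` flags.
interface ReadOptions {
  file: string;
  pages?: string; // e.g. "1,3-5"
  deep?: boolean; // --deep: force OCR
  text?: boolean; // --text: plain text instead of JSON
  verbose?: boolean; // --verbose
}

/** Build the argv for `pdf-read read <file> [flags]`, in the README's order. */
function buildReadArgs(opts: ReadOptions): string[] {
  const args = ["read", opts.file];
  if (opts.pages !== undefined) args.push("--pages", opts.pages);
  if (opts.deep) args.push("--deep");
  if (opts.text) args.push("--text");
  if (opts.verbose) args.push("--verbose");
  return args;
}

/** Human-readable meaning of an exit status; unknown codes are reported as such. */
function describeExit(code: number): string {
  return EXIT_CODES[code] ?? `Unknown exit code ${code}`;
}
```

In Node.js, the arguments could be passed to `spawnSync("pdf-read", buildReadArgs(opts), { encoding: "utf8" })` from `node:child_process`, and the result's `status` fed to `describeExit`. Note that `status` is `null` when the process is killed by a signal, so check for that before looking up a code.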
License
MIT
