pdf-ocr-cli

v1.0.1

A CLI tool for OCR processing of PDF files using Mistral API with optional LLM verification

PDF-OCR CLI Tool

Overview

A powerful TypeScript CLI tool that transforms scanned PDFs into searchable documents by:

  • Taking a PDF file input
  • Processing each page with Mistral API's OCR capabilities
  • Optionally verifying and improving text quality with Together.ai's free LLM
  • Reassembling everything into a searchable PDF

Perfect for digitizing paper documents, making image-based PDFs searchable, and extracting text from scanned materials.

Quick Start

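Prerequisites

  • Node.js with npm
  • A Mistral API key (MISTRAL_API_KEY)
  • A Together.ai API key (TOGETHER_API_KEY), used for --verify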

Installation

# Install globally
npm install -g pdf-ocr-cli

# Or use without installing
npx pdf-ocr-cli --input input.pdf --output output.pdf

Set Up API Keys

Create a .env file in your working directory:

echo "MISTRAL_API_KEY=your_mistral_api_key_here" > .env
echo "TOGETHER_API_KEY=your_together_api_key_here" >> .env

Or set environment variables in your shell:

export MISTRAL_API_KEY=your_mistral_api_key_here
export TOGETHER_API_KEY=your_together_api_key_here
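
To catch a missing key before any page is sent to an API, a check along these lines could run at startup. This is a minimal sketch, not the tool's actual code: `missingApiKeys` is a hypothetical helper, and it assumes TOGETHER_API_KEY is only needed when --verify is enabled.

```typescript
// Hypothetical helper, not part of pdf-ocr-cli.
type Env = Record<string, string | undefined>;

// Returns the names of the required API keys that are unset or blank.
function missingApiKeys(env: Env, verify: boolean): string[] {
  const required = ["MISTRAL_API_KEY"];
  // Assumption: the Together.ai key is only needed for LLM verification.
  if (verify) required.push("TOGETHER_API_KEY");
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js script this could be called as `missingApiKeys(process.env, true)`, printing the returned names and exiting if the list is not empty.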

Basic Usage

# Process a PDF file
pdf-ocr --input input.pdf --output output.pdf

# With verification to improve OCR quality
pdf-ocr --input input.pdf --output output.pdf --verify

Common Use Cases

Process Large Documents Efficiently

# Process 3 pages at a time
pdf-ocr --input input.pdf --output output.pdf --concurrency 3

Handle Network Issues

# Increase retries and timeout for unstable connections
pdf-ocr --input input.pdf --output output.pdf --retries 5 --timeout 60000
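
One way the --retries, --retry-delay, and --timeout flags could fit together is sketched below. The helper names (`withTimeout`, `withRetries`, `RetryOptions`) are hypothetical, and the sketch assumes that --retries counts extra attempts after the first one; the tool's real behavior may differ.

```typescript
// Hypothetical option shape mirroring the CLI flags.
interface RetryOptions {
  retries: number;    // --retries
  retryDelay: number; // --retry-delay, in ms
  timeout: number;    // --timeout, in ms
}

const wait = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); }
    );
  });
}

// Runs `task`, retrying on failure or timeout with a fixed delay between attempts.
async function withRetries<T>(task: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  // Assumption: one initial attempt plus up to `retries` further attempts.
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    try {
      return await withTimeout(task(), opts.timeout);
    } catch (error) {
      lastError = error;
      if (attempt < opts.retries) await wait(opts.retryDelay);
    }
  }
  throw lastError;
}
```

Under these assumptions, raising --timeout helps with slow responses, while raising --retries and --retry-delay helps with intermittent failures.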

Process Carefully with Detailed Logs

# Process one page at a time with longer pauses and verbose logging
pdf-ocr --input input.pdf --output output.pdf --concurrency 1 --sleep 10000 --verbose
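
The interaction between --concurrency and --sleep can be pictured as batch processing: run a few pages in parallel, pause, then continue. The sketch below is an assumption about how such throttling might be written, not the tool's implementation; `processInBatches` is a hypothetical name.

```typescript
const pause = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Processes `items` in batches of `concurrency`, pausing `sleepMs` between batches.
async function processInBatches<T, R>(
  items: T[],
  worker: (item: T, index: number) => Promise<R>,
  concurrency: number,
  sleepMs: number
): Promise<R[]> {
  // Guard against zero or fractional values so the loop always advances.
  const size = Math.max(1, Math.floor(concurrency));
  const results: R[] = [];
  for (let start = 0; start < items.length; start += size) {
    const batch = items.slice(start, start + size);
    // Pages in a batch run in parallel; Promise.all preserves their order.
    const batchResults = await Promise.all(batch.map((item, i) => worker(item, start + i)));
    results.push(...batchResults);
    // Pause between batches, but not after the last one.
    if (start + size < items.length) await pause(sleepMs);
  }
  return results;
}
```

With a concurrency of 1 and a sleep of 10000 ms, this sketch processes one page, waits ten seconds, and moves on, which is the gentlest setting for rate-limited APIs.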

Command Options

Basic Options

| Option | Alias | Description | Default |
|--------|-------|-------------|---------|
| --input | -i | Input PDF file path | Required |
| --output | -o | Output PDF file path | Required |
| --concurrency | -c | Pages to process in parallel | 2 |
| --max-pages | -m | Maximum pages to process | All |
| --help | -h | Display help information | |
| --version | -v | Display version information | |

OCR Options

| Option | Alias | Description | Default |
|--------|-------|-------------|---------|
| --retries | -r | Maximum OCR retry attempts | 3 |
| --retry-delay | -d | Delay between retries (ms) | 1000 |
| --timeout | -t | OCR API request timeout (ms) | 30000 |
| --sleep | -s | Time between processing pages (ms) | 5000 |
| --verbose | -v | Enable detailed logging | |

Verification Options

| Option | Description | Default |
|--------|-------------|---------|
| --verify | Enable LLM verification | |
| --max-tokens | Maximum tokens for verification | 1000 |
| --temperature | Temperature for verification | 0.7 |
| --top-p | Top-p for verification | 0.9 |
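
For reference, the defaults from the three option tables can be collected into a single typed object. The sketch below is illustrative only: `CliOptions`, `DEFAULTS`, and `resolveOptions` are hypothetical names, and it assumes that flags without a listed default (--verbose, --verify) are off unless passed.

```typescript
// Hypothetical shape of the parsed options; field names are camel-cased flag names.
interface CliOptions {
  input: string;
  output: string;
  concurrency: number;
  maxPages?: number; // undefined means "all pages"
  retries: number;
  retryDelay: number; // ms
  timeout: number;    // ms
  sleep: number;      // ms
  verbose: boolean;
  verify: boolean;
  maxTokens: number;
  temperature: number;
  topP: number;
}

type OptionDefaults = Omit<CliOptions, "input" | "output">;

// Values copied from the option tables; boolean flags are assumed to default to false.
const DEFAULTS: OptionDefaults = {
  concurrency: 2,
  retries: 3,
  retryDelay: 1000,
  timeout: 30000,
  sleep: 5000,
  verbose: false,
  verify: false,
  maxTokens: 1000,
  temperature: 0.7,
  topP: 0.9,
};

// Fills in defaults for every option the caller did not supply.
function resolveOptions(
  given: Pick<CliOptions, "input" | "output"> & Partial<OptionDefaults>
): CliOptions {
  return { ...DEFAULTS, ...given };
}
```

For example, `resolveOptions({ input: "input.pdf", output: "output.pdf", retries: 5 })` keeps the 30000 ms timeout and the concurrency of 2 while overriding only the retry count.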

Advanced Installation

Install from Source

# Clone and build
git clone https://github.com/luandro/pdf-ocr.git
cd pdf-ocr
npm install
npm run build

# Set up environment
cp .env.example .env
# Edit .env with your API keys

Development

This project follows Test-Driven Development principles:

# Run tests with coverage
npm test

# Run tests in watch mode
npm run test:watch

# Build the project
npm run build

# Run in development mode
npm run dev -- --input input.pdf --output output.pdf

Test Coverage

The project maintains high test coverage (>80%) for quality assurance:

# Run tests with coverage
npm test

# View coverage report
open coverage/lcov-report/index.html

Continuous Integration

GitHub Actions automates testing and publishing:

  • Tests run on every push to main
  • Coverage reports are generated
  • Automatic npm publishing when tests pass

Architecture

The application consists of these key modules:

  1. PDF Splitter (src/splitPdf.ts): Divides PDFs into individual pages
  2. OCR Module (src/ocr.ts): Extracts text using Mistral API
  3. Content Verification (src/contentVerification.ts): Improves text with LLM
  4. Text-to-PDF Converter (src/textToPdf.ts): Converts text back to PDF
  5. PDF Merger (src/mergePdfs.ts): Combines processed pages
  6. CLI (src/cli.ts): Provides the command interface

Processing Pipeline

  1. Split input PDF into individual pages
  2. Process each page, up to --concurrency pages at a time:
    • Extract text with Mistral API OCR
    • Optionally verify/improve text with Together.ai
    • Convert text back to PDF format
  3. Merge all processed pages into final PDF
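
The pipeline above can be sketched as a function over injected stages. The stage names below echo the module file names, but their signatures are assumptions made for illustration and are not the project's actual exports. Pages are handled one at a time here to keep the sketch short.

```typescript
type PdfBytes = Uint8Array;

// Hypothetical stage signatures; the real modules may differ.
interface Stages {
  splitPdf(input: PdfBytes): Promise<PdfBytes[]>;
  ocr(page: PdfBytes): Promise<string>;
  verify?(text: string): Promise<string>; // only present when verification is enabled
  textToPdf(text: string): Promise<PdfBytes>;
  mergePdfs(pages: PdfBytes[]): Promise<PdfBytes>;
}

// Split, then OCR (and optionally verify) each page, convert back to PDF, and merge.
async function runPipeline(input: PdfBytes, stages: Stages): Promise<PdfBytes> {
  const pages = await stages.splitPdf(input);
  const processed: PdfBytes[] = [];
  for (const page of pages) {
    let text = await stages.ocr(page);
    if (stages.verify) text = await stages.verify(text);
    processed.push(await stages.textToPdf(text));
  }
  return stages.mergePdfs(processed);
}
```

Passing the stages in as an object keeps each step replaceable with a stub, which suits the test-driven workflow described under Development.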

Troubleshooting

  • API Key Errors: Ensure your .env file or shell environment contains valid MISTRAL_API_KEY and TOGETHER_API_KEY values
  • Network Issues: Try increasing --retries, --timeout, and --retry-delay
  • Poor OCR Quality: Enable --verify to improve text with LLM
  • Processing Large Files: Reduce --concurrency and increase --sleep
  • Memory Issues: Limit the number of pages processed with --max-pages, or lower --concurrency

Contributing

Please see CONTRIBUTING.md for guidelines on contributing to this project.

License

This project is licensed under the ISC License - see the LICENSE file for details.