ocr-provenance-mcp

v1.6.9

Published

2 months ago

MCP server for document OCR processing, semantic search, document comparison, clustering, and provenance tracking

cabdru

mcp ocr provenance semantic-search document-clustering sqlite vector-search claude model-context-protocol

OCR Provenance MCP

Your AI can't read documents. This fixes that. One command. Zero configuration.

OCR Provenance MCP gives your AI complete document intelligence — 153 tools for OCR, semantic search, vision AI, compliance, and provenance tracking — all running on local GPU. No cloud APIs. No data leaves your machine. Ever.

npx -y ocr-provenance-mcp install          # Docker (Linux/WSL + NVIDIA GPU)
npx -y ocr-provenance-mcp install --mac    # macOS (Apple Silicon, no Docker)

Works with Claude Code, Claude Desktop, Cursor, and Windsurf. No GPU? Deploy on a cloud GPU in 4 steps.

After installing, tell your AI: "Open the OCR Provenance dashboard." The dashboard is your command center — view documents, manage databases, add funds, and monitor processing in real time at http://localhost:3367.

NEW: Legal-Domain Embedding Model

General-purpose AI search struggles with legal language. "Consideration" in a contract means contractual value, not thought. Case citations, statutory references, and legal terminology need a model trained on that vocabulary.

OCR Provenance now ships with a legal-domain embedding model (FreeLawProject ModernBERT) fine-tuned on court opinions from CourtListener and RECAP. Choose it when creating a database for legal documents:

ocr_db_create { name: "vendor-contracts-2025", embedding_model: "legal" }

After that, all search, RAG, and embedding operations on that database automatically use the legal model. No extra configuration. See details below.

Install

Docker (Linux / WSL2 + NVIDIA GPU)

Requirements: Docker Desktop, Node.js 20+, NVIDIA GPU recommended (CPU works, just slower).

npx -y ocr-provenance-mcp install

That's it. The installer:

Pulls the AI models image (~14 GB, cached after first install)
Pulls the application image (~700 MB)
Starts the container with automatic GPU detection
Provisions your license key
Registers with any detected AI client (Claude Code, Claude Desktop, Cursor, Windsurf)

Restart your AI client. Then tell your AI: "Open the OCR Provenance dashboard" — this opens the web UI where you can view documents, manage databases, add funds, and monitor everything.

macOS (Apple Silicon — No Docker Required)

Requirements:

Node.js 20+ (brew install node)
Python 3.11+ (brew install python@3.11)
Apple Silicon Mac (M1/M2/M3/M4) with 12GB+ unified memory
~15 GB free disk space (for AI models, one-time download)

npx -y ocr-provenance-mcp install --mac

The installer:

Creates a Python virtual environment at ~/.ocr-provenance-mcp/venv/
Installs PyTorch with MPS (Metal Performance Shaders) GPU acceleration
Installs Marker OCR, Chandra OCR 2, nomic embeddings, and all dependencies
Downloads model weights (~14 GB total, one-time, cached at ~/.cache/huggingface/hub/)
Installs the dashboard web UI for account management and billing
Registers with any detected AI client (Claude Code, Claude Desktop, Cursor, Windsurf)

Restart your AI client. The first OCR run downloads the Marker/Surya models (~3.3 GB, one-time).

What runs on Mac: When your AI client connects, three services start automatically:

MCP Server — 153 tools via stdio (direct connection to your AI client)
License Server — billing, auth, Stripe payments (port 3000)
Dashboard — web UI at http://localhost:3367 for account management, document viewing, and adding funds

All three start together when your AI connects and stop together when it disconnects.

Optional: Install LibreOffice for Office-to-PDF conversion in the document viewer:

brew install --cask libreoffice

GPU memory guide:

| Mac | Unified Memory | What Works |
|-----|----------------|------------|
| M1/M2 (8 GB) | 8 GB | OCR + embeddings + search (VLM skipped automatically) |
| M1/M2 Pro (16 GB) | 16 GB | Full pipeline. VLM may be skipped on large documents; close other apps for best results |
| M3/M4 Pro (18-24 GB) | 18-24 GB | Full pipeline, comfortable headroom |
| M4 Max/Ultra (48-192 GB) | 48-192 GB | Full pipeline, fast processing |

Where data is stored on Mac:

~/.ocr-provenance-mcp/
├── venv/              # Python virtual environment
├── databases/         # SQLite databases (your documents, chunks, embeddings)
├── dashboard/         # Dashboard web UI
└── wrapper.json       # Configuration (license key, settings)

~/.cache/huggingface/hub/
├── models--datalab-to--chandra-ocr-2/     # VLM model (~10 GB)
└── models--nomic-ai--nomic-embed-text-v1.5/  # Embedding model (~500 MB)

Mac troubleshooting:

| Issue | Fix |
|-------|-----|
| Python 3.11+ required | brew install python@3.11 |
| No module named torch | Re-run npx -y ocr-provenance-mcp install --mac |
| VLM skipped (8 GB Mac) | Normal: OCR and search still work, just no image descriptions |
| First OCR run is slow | Marker models downloading (~3.3 GB), subsequent runs are fast |
| Dashboard not opening | Visit http://localhost:3367 manually in your browser |

Linux Bare-Metal (No Docker)

Same as macOS but for Linux machines without Docker. Auto-detects NVIDIA CUDA GPUs and falls back to CPU.

Requirements: Node.js 20+, Python 3.11+, ~15 GB free disk space.

npx -y ocr-provenance-mcp install --bare

Cloud Deploy (No Local GPU Required)

No GPU? No problem. Deploy on RunPod with a cloud GPU in 4 steps.

Step 1: Create a Template

Go to RunPod → My Templates → New Template. Fill in:

Compute type: Nvidia GPU
Container image: ocrprovenance/ocr-provenance-mcp:latest
Container disk: 20 GB
Volume disk: 100 GB
Volume mount path: /data
HTTP ports: 3000 (License Server), 3366 (MCP Server API), 3367 (Dashboard)
Environment variables: MCP_TRANSPORT=http, TORCH_DEVICE=auto, MCP_HTTP_PORT=3366, PORT=3367

Click Save Template.

Step 2: Deploy a Pod

Go to Pods → Deploy. Select your template and pick a GPU:

| GPU | VRAM | On-Demand | Spot | Best For |
|-----|------|-----------|------|----------|
| RTX 3090 | 24 GB | ~$0.34/hr | ~$0.20/hr | Budget processing |
| RTX 4090 | 24 GB | ~$0.44/hr | ~$0.28/hr | Fast processing |
| RTX 5090 | 32 GB | ~$0.89/hr | ~$0.53/hr | Large docs + full VLM |

Set GPU count to 1. Click Deploy.

Step 3: Access Your Dashboard

Wait ~2-3 minutes for startup. Click the 3367 port link to open the dashboard. All 153 MCP tools are available via the 3366 port.

Your data persists on the volume across pod stop/start. Stop (don't delete) the pod when not using it — you only pay while the GPU is running.
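As a rough worked example of the pay-while-running model, the sketch below multiplies the approximate hourly rates from the GPU table by the number of hours a pod stays running. The function name and dictionary layout are illustrative, the rates are the approximate figures quoted above (real RunPod prices change), and storage charges are not included.

```python
# Approximate hourly GPU rates in USD, copied from the table in this README.
# These are placeholders: check current RunPod pricing before budgeting.
RATES = {
    "RTX 3090": {"on_demand": 0.34, "spot": 0.20},
    "RTX 4090": {"on_demand": 0.44, "spot": 0.28},
    "RTX 5090": {"on_demand": 0.89, "spot": 0.53},
}


def estimate_cost(gpu: str, hours: float, pricing: str = "on_demand") -> float:
    """Return the estimated GPU cost in USD, rounded to cents."""
    return round(RATES[gpu][pricing] * hours, 2)


if __name__ == "__main__":
    # Compare on-demand and spot cost for a 10-hour processing session.
    for gpu in RATES:
        print(gpu, estimate_cost(gpu, 10), estimate_cost(gpu, 10, "spot"))
```

Because billing stops when the pod is stopped, the `hours` argument is the time the pod is actually running, not the time since it was created.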

Go to https://runpod.io and set up a GPU pod:

1. Go to My Templates → New Template with these settings:
   - Template Name: OCR Provenance MCP
   - Compute type: Nvidia GPU
   - Container Image: ocrprovenance/ocr-provenance-mcp:latest
   - Container Disk: 20 GB
   - Volume Disk: 100 GB
   - Volume Mount Path: /data
   - HTTP Ports: 3000 (License Server), 3366 (MCP Server API), 3367 (Dashboard)
   - Environment Variables: MCP_TRANSPORT=http, TORCH_DEVICE=auto, MCP_HTTP_PORT=3366, PORT=3367
   - Save the template

2. Go to Pods → Deploy:
   - Select the "OCR Provenance MCP" template
   - Choose RTX 5090 (32 GB) GPU, GPU count 1, On-Demand
   - Click Deploy

3. Wait for "Running" status, then click the 3367 port link to verify
   the dashboard loads. Click the 3366 port link and append /health
   to verify {"status":"ok"}.

Create a Vast.ai account, add $5 credits
Create a template: image ocrprovenance/ocr-provenance-mcp:latest, ports 3000 3366 3367, 50GB+ disk
Search for machines with 12GB+ VRAM, select one, click Rent
Access dashboard via the Instance Portal port links

Update

Updates are safe — your data is never deleted. All databases, processed documents, embeddings, and settings live on a persistent Docker volume that is untouched during updates.

Local Docker

npx -y ocr-provenance-mcp update

This pulls the latest application image (~700 MB), recreates the container with the new code, and reconnects your existing data volume. Your databases, license key, and processed documents remain exactly as they were.

macOS / Linux Bare-Metal

pkill -f "ocr-provenance-mcp"              # Stop running server
npm install -g ocr-provenance-mcp@latest    # Update the package

That's it. Your balance, license key, databases, and processed documents persist automatically. Never delete ~/.ocr-provenance-mcp/.machine_id — this file ties your machine to your account and balance.

Cloud (RunPod / Vast.ai)

Stop your pod
Edit the pod → change Container Image to leapable/ocr-provenance-mcp:latest
Start the pod

Your /data volume with all databases and documents is preserved across image updates.

What gets updated vs. what stays

| Updated (new image) | Preserved (data volume) |
|---------------------|-------------------------|
| MCP server code | All SQLite databases |
| License server | Processed documents and chunks |
| Dashboard UI | Embeddings and vectors |
| Python workers | License key and secrets |
| LibreOffice | Machine ID and settings |
| Bug fixes and features | Provenance chains |

What You Get

153 MCP Tools Across 18 Categories

| Category | Tools | What It Does |
|---|---|---|
| Document Ingestion | 7 | Ingest files and directories. Process pending documents, retry failures, reprocess with new settings |
| Search | 7 | Keyword (BM25), semantic (vector), and hybrid search with cross-encoder reranking. Cross-database, saved searches, RAG context |
| Document Management | 10 | List, view, delete, find similar, detect duplicates, version history, structural analysis |
| Provenance Tracking | 6 | Full chain-of-custody: get, verify, export (W3C PROV), query, timeline, all SHA-256 backed |
| Vision AI (VLM) | 3 | Describe images, charts, diagrams, and figures using local Chandra VLM |
| Image Processing | 8 | List, view, search, delete, reanalyze images. Status monitoring and statistics |
| Embeddings | 4 | Generate, list, inspect, rebuild 768-dim vector embeddings. Domain-specific models: general (nomic) or legal (ModernBERT) |
| Document Comparison | 6 | Side-by-side diff, comparison history, auto-discover similar pairs, batch compare, similarity matrix |
| Clustering | 7 | Auto-cluster documents by similarity, inspect clusters, assign, reassign, merge |
| Contract Lifecycle | 9 | Extract clauses, track obligations, calendar view, playbook creation, summarization |
| Compliance & Audit | 5 | SOC 2, HIPAA, SOX reports. Compliance exports. Full audit trail |
| Collaboration | 11 | Annotations, document locking, search alerts, review workflows |
| Workflow & Approvals | 8 | Multi-step approval chains, assignment, queue management, state machine |
| Database Management | 24 | Multi-database support, backup/restore, clone, merge, snapshot, archive, share, transfer, workspace |
| Tags & Organization | 6 | Create, apply, search, remove tags across documents, chunks, images |
| Reports & Analytics | 7 | Quality evaluation, cost tracking, performance metrics, error analytics, trend analysis |
| Intelligence | 5 | Interactive guide, table extraction, smart recommendations |
| System | 20 | Health checks, config, maintenance, license management, live dashboard, webhooks |

Document Viewer

Click any document in the dashboard to open a full-screen viewer with three tabs:

View — Renders the original file in your browser. PDFs, Office docs, spreadsheets, images, text, markdown — all 18 file types
Chunks — See exactly what the OCR engine extracted. Every chunk, searchable, expandable
Info — Full metadata, OCR quality scores, processing timeline, provenance chain verification

Spreadsheets render in landscape with all columns visible. Files auto-cache for 24 hours and clean up automatically.

6 AI Models. All Local.

| Model | Purpose | What It Does |
|---|---|---|
| Marker-pdf v1.10.2 | Document OCR | Converts PDF, DOCX, images to structured text with full layout preservation |
| Chandra OCR 2 v0.2.0 | Vision AI | Describes images, charts, diagrams extracted from your documents |
| nomic-embed-text-v1.5 | General Embeddings | 768-dimensional vectors for general-purpose semantic search |
| ModernBERT Legal | Legal Embeddings | 768-dimensional vectors fine-tuned on court opinions for legal document search |
| HDBSCAN | Document Clustering | Auto-discovers document groups by semantic similarity |
| ms-marco-MiniLM-L-12-v2 | Search Reranking | Cross-encoder that re-scores results for maximum relevance |

Models load into GPU VRAM at the start of each processing job and unload when done. The embedding model is selected per-database — general for everyday documents, legal for contracts and court opinions. Reranker falls back to CPU when GPU is unavailable.

How People Use It

"Search my contracts for indemnification clauses that cap liability"

Create a legal database, ingest your contracts, and search with a legal embedding model that understands contractual language. Results ranked by relevance with exact source attribution and page numbers.

"Compare the draft and executed versions of this agreement"

Structured diff showing every addition, deletion, and modification between two document versions. Side-by-side with similarity scores.

"Extract all obligations from this vendor agreement and show me the deadlines"

Contract metadata, financial terms, obligations, renewal dates, termination provisions. Calendar view of upcoming deadlines across all contracts.

"Process the entire /litigation/discovery/ folder"

One command ingests hundreds of documents. OCR, chunking, embeddings, image extraction, VLM descriptions — the full pipeline runs automatically. Legal embedding model handles all the domain-specific semantics.

"What does the chart on page 5 show?"

Chandra VLM analyzes the extracted image and provides a detailed natural-language description.

"Prove this text came from the original PDF"

Cryptographic provenance chain: source document → OCR result → chunk → embedding. SHA-256 verification at every step. Export in W3C PROV format. Admissible chain of custody.

"Generate a HIPAA compliance report"

Structured compliance reports covering access controls, audit trails, and data handling verification. SOC 2, HIPAA, SOX built in.

Legal Document Intelligence

OCR Provenance includes purpose-built tooling for legal professionals. Every feature works with contracts, court opinions, briefs, statutes, and regulatory filings — with domain-specific search that understands legal language.

Why Legal-Specific Embeddings Matter

General-purpose search models (like the ones in ChatGPT, Copilot, and most AI tools) treat legal terms as ordinary English. This causes real problems:

| Term | General AI Interpretation | Legal Meaning |
|------|---------------------------|---------------|
| "Consideration" | Thinking about something | Value exchanged in a contract |
| "Party" | A social event | A person or entity in an agreement |
| "Action" | Doing something | A lawsuit or legal proceeding |
| "Interest" | Curiosity | A legal right, share, or stake |
| "Relief" | Comfort | A court-ordered remedy |

The legal embedding model (FreeLawProject ModernBERT) is fine-tuned on millions of court opinions from the Free Law Project (CourtListener, RECAP). It understands legal vocabulary, case citation patterns, and statutory references natively.

How to Use It

Step 1: Create a legal database

Tell your AI: "Create a database called vendor-contracts with the legal embedding model"

Or via MCP tool:

{ "name": "vendor-contracts", "embedding_model": "legal", "description": "2025 vendor agreements" }
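MCP clients normally build the request for you, but it can help to see what the arguments above look like on the wire. The sketch below wraps them in a JSON-RPC 2.0 `tools/call` message, which is how MCP invokes tools. The tool name `ocr_db_create` and the argument names come from this README; the helper name `build_tool_call` is illustrative, and a real session also needs the MCP initialization handshake before any tool call.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Arguments copied from the example above.
    print(
        build_tool_call(
            1,
            "ocr_db_create",
            {
                "name": "vendor-contracts",
                "embedding_model": "legal",
                "description": "2025 vendor agreements",
            },
        )
    )
```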

Step 2: Ingest your documents

Tell your AI: "Ingest all files from /contracts/2025/"

Step 3: Search with legal precision

Tell your AI: "Search for indemnification clauses that limit liability to direct damages"

The search engine embeds your query using the legal model, so it understands what "indemnification" and "direct damages" mean in a contractual context — not just as English words.

What Legal Professionals Can Do

| Task | How |
|------|-----|
| Find specific clauses | "Search for non-compete provisions with geographic restrictions" |
| Compare contract versions | "Compare the draft and executed versions of the Smith agreement" |
| Track obligations | "List all payment obligations due in Q2 2025 with deadlines" |
| Extract terms | "Extract the termination provisions from this vendor agreement" |
| Compliance review | "Generate a HIPAA compliance report for the medical records database" |
| Prove chain of custody | "Verify the provenance chain for this extracted clause" |
| Batch processing | "Process all 200 contracts in the /litigation/discovery/ folder" |
| Cross-database search | "Search across all my contract databases for force majeure clauses" |

Model Details

| Property | General Model | Legal Model |
|----------|---------------|-------------|
| Name | nomic-embed-text-v1.5 | FreeLawProject ModernBERT |
| Dimensions | 768 | 768 |
| Max tokens | 8,192 | 8,192 |
| VRAM | ~1 GB | ~1.5 GB |
| License | Apache 2.0 | Apache 2.0 |
| Best for | General content, transcripts, research | Contracts, court opinions, statutes, briefs |
| Trained on | Web text, Wikipedia, books | Court opinions (CourtListener, RECAP) |

Both models run 100% locally on your GPU. Both are 768-dimensional, so existing databases and the vector search infrastructure work without any changes. You choose the model per database at creation time.
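To illustrate why a shared dimension keeps the search layer model-agnostic, here is a minimal pure-Python sketch of cosine-similarity ranking. It is not the project's implementation (the README attributes vector search to sqlite-vec); the function names and the tiny 2-dimensional vectors are illustrative stand-ins for 768-dimensional embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 3):
    """Return the k (id, score) pairs most similar to the query."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: -pair[1])[:k]


if __name__ == "__main__":
    chunks = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(top_k([1.0, 0.0], chunks, k=2))
```

The ranking code only cares that the query and the stored vectors have the same length, which is why swapping the model per database does not require a different index. Vectors from different models are still not comparable with each other, so a query must be embedded with the same model as the database it searches.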

Data Privacy for Legal Work

This system was designed for sensitive documents:

100% local processing — OCR, search, and AI analysis run entirely on your hardware
Zero cloud APIs — no document content is ever transmitted externally
Cryptographic provenance — every extracted clause traces back to its source via SHA-256 hash chains
HIPAA/SOC 2/SOX compliance exports — built-in audit trail reporting
Air-gap capable — can run with --network=none for mathematically provable zero egress
Non-root container — Docker runs as unprivileged user with --cap-drop=ALL

Your client's documents never leave the machine. Period.

Supported File Formats

| Format | Extensions | Viewer Rendering |
|---|---|---|
| PDF | .pdf | Native browser PDF viewer |
| Microsoft Office | .docx, .doc, .pptx, .ppt, .xlsx, .xls | Converted to PDF via LibreOffice |
| Spreadsheets | .xlsx, .xls, .csv | Landscape PDF with fit-to-width (all columns visible) |
| Images | .png, .jpg, .jpeg, .tiff, .tif, .bmp, .gif, .webp | Direct image display with zoom controls |
| Text | .txt | Monospace text viewer |
| Markdown | .md | Rendered as styled HTML |

18 file types. Documents are OCR'd with layout preservation. Images get OCR + VLM analysis. All files viewable in the dashboard's built-in document viewer.

Client Registration

If auto-detection missed your AI client during install, register it manually. For Claude Code, run:

claude mcp add ocr-provenance-mcp -s user -- npx -y ocr-provenance-mcp

For Claude Desktop, add this to your claude_desktop_config.json:

{
  "mcpServers": {
    "ocr-provenance-mcp": {
      "command": "npx",
      "args": ["-y", "ocr-provenance-mcp"]
    }
  }
}

Config location:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
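If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. The sketch below shows one way to do that with the standard library. The helper names are illustrative, the entry is the one shown above, and it is worth backing up the file before running anything like this.

```python
import json
from pathlib import Path

SERVER_NAME = "ocr-provenance-mcp"
SERVER_ENTRY = {"command": "npx", "args": ["-y", "ocr-provenance-mcp"]}


def add_server(config: dict) -> dict:
    """Return a copy of `config` with the server entry added under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep existing servers
    servers[SERVER_NAME] = SERVER_ENTRY
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the config if it exists, add the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_server(config), indent=2) + "\n")


if __name__ == "__main__":
    # Pure-function demo; call update_config_file(Path(...)) with the
    # location for your OS from the list above to edit the real file.
    print(json.dumps(add_server({"mcpServers": {"other": {"command": "x"}}}), indent=2))
```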

For Cursor, add the same entry to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "ocr-provenance-mcp": {
      "command": "npx",
      "args": ["-y", "ocr-provenance-mcp"]
    }
  }
}

{
  "command": "npx",
  "args": ["-y", "ocr-provenance-mcp"]
}

CLI Commands

npx ocr-provenance-mcp install           # Full setup: models + app + register
npx ocr-provenance-mcp update            # Pull latest image, preserve all data
npx ocr-provenance-mcp start             # Start the container
npx ocr-provenance-mcp stop              # Stop the container (data preserved)
npx ocr-provenance-mcp restart           # Restart with latest config
npx ocr-provenance-mcp status            # Health check + endpoints
npx ocr-provenance-mcp logs [N]          # View last N lines of logs
npx ocr-provenance-mcp config show       # Show all configuration
npx ocr-provenance-mcp config set K=V    # Set a config value
npx ocr-provenance-mcp uninstall         # Remove container (data volume kept)

Endpoints

| Endpoint | URL | Description |
|---|---|---|
| MCP | http://localhost:3366/mcp | MCP protocol endpoint (SSE) |
| Health | http://localhost:3366/health | Service health + tool count |
| File Upload | http://localhost:3366/api/upload | Multipart file upload with dedup |
| Document Viewer | http://localhost:3366/api/viewer/* | Prepare, serve, and cache documents |
| Dashboard | http://localhost:3367 | Web UI with document viewer and account management |
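A quick way to confirm the server is up is to poll the Health endpoint. The sketch below assumes the response is a JSON object containing `"status": "ok"`, as described in the Cloud Deploy verification step; any other fields (such as the tool count) are ignored. The function names are illustrative.

```python
import json
import urllib.request


def is_healthy(body: str) -> bool:
    """Return True when a /health response body reports status "ok"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(url: str = "http://localhost:3366/health", timeout: float = 5.0) -> bool:
    """Fetch the health endpoint; network or decoding errors count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy(response.read().decode("utf-8"))
    except (OSError, ValueError):
        return False


if __name__ == "__main__":
    print("healthy" if check_health() else "not reachable or not healthy")
```

For a cloud pod, pass the URL behind the 3366 port link with `/health` appended instead of the localhost default.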

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Your AI Client                        │
│         (Claude Code / Desktop / Cursor / Windsurf)      │
└────────────────────────┬────────────────────────────────┘
                         │ MCP Protocol (stdio)
┌────────────────────────▼────────────────────────────────┐
│              NPX Wrapper (npm package)                   │
│   stdio ↔ HTTP bridge · Ingest intercept (auto-upload)   │
│   AI client auto-registration · WSL auto-detection       │
└────────────────────────┬────────────────────────────────┘
                         │ HTTP (localhost:3366)
┌────────────────────────▼────────────────────────────────┐
│                  Docker Container                        │
│                                                          │
│  ┌─────────────────────────────────────────────────┐    │
│  │          MCP Server (TypeScript)                 │    │
│  │   153 tools · REST API · Document Viewer         │    │
│  │   File Upload · Multi-session · Rate limiting    │    │
│  └──────┬──────────┬──────────┬────────────────────┘    │
│         │          │          │                           │
│  ┌──────▼───┐ ┌────▼────┐ ┌──▼──────────┐              │
│  │ OCR      │ │ VLM     │ │ Embedding   │  GPU Daemons  │
│  │(Marker)  │ │(Chandra)│ │(nomic/legal)│              │
│  └──────────┘ └─────────┘ └─────────────┘              │
│  ┌──────────┐ ┌─────────────────────────┐              │
│  │ Reranker │ │ Spreadsheet Prep        │  CPU Workers  │
│  │(MiniLM)  │ │ (openpyxl + LibreOffice)│              │
│  └──────────┘ └─────────────────────────┘              │
│                                                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │ SQLite +     │  │   License    │  │  Dashboard   │  │
│  │ sqlite-vec   │  │   Server     │  │  (Next.js)   │  │
│  │ FTS5 + vec   │  │  (port 3000) │  │ (port 3367)  │  │
│  └──────────────┘  └──────────────┘  └──────────────┘  │
└─────────────────────────────────────────────────────────┘

Provenance Chain

Every piece of extracted data is cryptographically linked to its source:

DOCUMENT(0) → OCR_RESULT(1) → CHUNK(2)     → EMBEDDING(3)
                              → IMAGE(2)     → VLM_DESC(3) → EMBEDDING(4)

Each node carries a SHA-256 hash. Verify any result traces back to its original document with ocr_provenance_verify. Export the full chain in W3C PROV format.
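The idea of a hash-linked chain can be sketched in a few lines. This is an illustration of the general technique, not the project's actual record format: the `Node` fields, the way the parent digest is mixed into each hash, and the function names are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    kind: str                  # e.g. "DOCUMENT", "OCR_RESULT", "CHUNK"
    content: bytes
    parent: Optional["Node"]
    digest: str                # hex SHA-256 recorded at creation time


def compute_digest(kind: str, content: bytes, parent: Optional[Node]) -> str:
    """SHA-256 over the node kind, the parent's digest, and the content."""
    h = hashlib.sha256()
    h.update(kind.encode("utf-8") + b"\x00")
    h.update((parent.digest if parent else "").encode("ascii") + b"\x00")
    h.update(content)
    return h.hexdigest()


def make_node(kind: str, content: bytes, parent: Optional[Node] = None) -> Node:
    return Node(kind, content, parent, compute_digest(kind, content, parent))


def verify(node: Node) -> bool:
    """Walk from a node back to the root, recomputing every digest."""
    current: Optional[Node] = node
    while current is not None:
        expected = compute_digest(current.kind, current.content, current.parent)
        if expected != current.digest:
            return False
        current = current.parent
    return True


if __name__ == "__main__":
    doc = make_node("DOCUMENT", b"original PDF bytes")
    ocr = make_node("OCR_RESULT", b"extracted text", doc)
    chunk = make_node("CHUNK", b"extracted", ocr)
    print(verify(chunk))
```

Because each digest covers its parent's digest, changing the source document or any intermediate result invalidates every node derived from it.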

Performance

Benchmarked on NVIDIA RTX 5090 (32 GB VRAM):

| Operation | Speed | Notes |
|---|---|---|
| OCR | ~2-5 s/page | Layout-preserving with Marker-pdf (daemon mode) |
| Embedding | ~12 ms/chunk | Daemon mode: model loaded once, reused for entire batch |
| VLM image description | ~2-5 s/image | Chandra generates detailed descriptions |
| Semantic search | < 100 ms | sqlite-vec cosine similarity across 10K+ vectors |
| Hybrid search | < 200 ms | BM25 + semantic with Reciprocal Rank Fusion + cross-encoder reranking |
| Full pipeline (1,150 docs) | ~3 min | OCR + chunk + embed for 1,150 markdown transcripts |
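Reciprocal Rank Fusion, named in the hybrid search row, merges ranked lists by summing `1 / (k + rank)` for each document across the lists. The sketch below shows the standard formula with the commonly used `k = 60`; the constant this project uses is not stated here, and the document IDs are made up for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]      # hypothetical keyword ranking
    semantic_hits = ["b", "d", "a"]  # hypothetical vector ranking
    for doc_id, score in reciprocal_rank_fusion([bm25_hits, semantic_hits]):
        print(doc_id, round(score, 5))
```

Documents that appear near the top of both lists rise above documents that rank first in only one, which is why the fused list is a reasonable input for a cross-encoder reranking pass.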

Security

Your data never leaves your machine.

100% local processing — all OCR, VLM, and embedding inference runs on your hardware
Ed25519 signed license tokens — cryptographic authentication with offline verification
SHA-256 provenance chains — every extraction linked to its source document
HMAC-signed balances — tamper detection on all billing operations
Container hardening — --cap-drop=ALL, --security-opt=no-new-privileges, non-root user
Secret isolation — signing keys stripped from dashboard process
Input validation — Zod schema validation on all 153 tool inputs
Path sanitization — directory traversal prevention on all file operations
Magic byte validation — upload rejects files with mismatched extensions and suggests the correct format
Cloud auth protection — dev tokens blocked on non-localhost origins to prevent email spoofing on cloud deployments
Zero telemetry — no analytics, no tracking, no phone-home
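As an illustration of the magic byte check listed above, the sketch below compares a file's leading bytes with its declared extension and suggests the detected format on a mismatch. The signature table covers only a few well-known formats, and the function names and return shape are assumptions for the example, not the server's actual validation code.

```python
import os
from typing import Optional

# Well-known file signatures (leading bytes) and their canonical extension.
SIGNATURES = {
    b"%PDF-": ".pdf",
    b"\x89PNG\r\n\x1a\n": ".png",
    b"\xff\xd8\xff": ".jpg",
    b"GIF87a": ".gif",
    b"GIF89a": ".gif",
    b"BM": ".bmp",
}
# Extensions that name the same format as a canonical one.
EQUIVALENT = {".jpeg": ".jpg"}


def detect_extension(header: bytes) -> Optional[str]:
    """Return the extension implied by the leading bytes, if recognized."""
    for magic, ext in SIGNATURES.items():
        if header.startswith(magic):
            return ext
    return None


def check_upload(filename: str, header: bytes) -> tuple[bool, Optional[str]]:
    """Return (accepted, suggested_extension) for an uploaded file."""
    declared = os.path.splitext(filename)[1].lower()
    declared = EQUIVALENT.get(declared, declared)
    detected = detect_extension(header)
    if detected is None:
        # Unrecognized signature (e.g. plain text): this sketch does not judge.
        return True, None
    if declared == detected:
        return True, None
    return False, detected


if __name__ == "__main__":
    print(check_upload("scan.pdf", b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
```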

Docker Images

Two-image architecture for fast updates:

| Image | Size | Contents | Updates |
|---|---|---|---|
| leapable/ocr-provenance-models:v2 | ~14 GB | PyTorch, CUDA, all 5 AI models | Rarely |
| leapable/ocr-provenance-mcp:latest | ~700 MB | MCP server, license server, dashboard, LibreOffice | Every release |

Models image is pulled once and cached. Updates only download the ~700 MB app layer. Your data volume is never touched during updates.

System Requirements

| Component | Minimum | Recommended |
|---|---|---|
| Docker | Docker Engine 20+ | Docker Desktop (latest) |
| Node.js | 20.0+ | 22+ LTS |
| RAM | 8 GB | 16+ GB |
| Disk | 30 GB | 50+ GB |
| GPU | Optional (CPU works) | NVIDIA RTX 3060+ (8GB+ VRAM) |
| OS | Linux, macOS, Windows (WSL2) | Linux with NVIDIA drivers |

No GPU? Use Cloud Deploy to run on a cloud GPU instead.

Development

git clone https://github.com/ChrisRoyse/ocrprovenancelocalmcp.git
cd ocrprovenancelocalmcp
npm install
npm run build
npm test          # 4,067 tests across 144 files
npm run lint:all  # TypeScript + Python
npm run typecheck

License

Dual License — free for non-commercial use (personal, academic, research, non-profit). Commercial use requires a commercial license.

See LICENSE for full terms.

Contact: [email protected]
