docqa-mcp
v1.0.2
MCP server for DocQA — AI document verification, PDF extraction, OCR, and format conversion via api.agentsconsultants.com
docqa-mcp
MCP (Model Context Protocol) server for DocQA — verify AI extraction results against original documents, plus PDF extraction, OCR, and format conversion.
Catches what your AI missed: hallucinated invoice totals, wrong dates, arithmetic errors, mismatched field values — all without sending documents to any LLM.
Quick Start
```bash
npx docqa-mcp
```

Requires the `DOCQA_API_KEY` environment variable. Get your key at: https://api.agentsconsultants.com
Setup
1. Get an API Key
Request a key at api.agentsconsultants.com. Free trial available.
2. Configure Claude Desktop
On macOS, add the following to `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "docqa": {
      "command": "npx",
      "args": ["docqa-mcp"],
      "env": {
        "DOCQA_API_KEY": "your-api-key-here"
      }
    }
  }
}
```

3. Or install globally
```bash
npm install -g docqa-mcp
```

Then set `"command": "docqa-mcp"` in the config and provide your `DOCQA_API_KEY` under `env`.
Tools
docqa-verify ⭐ Core Tool
Verify AI-extracted fields against the original document. Detects hallucinations, arithmetic errors, and field mismatches.
Basic tier: pattern validation + arithmetic checks (no document needed):

```json
{
  "extraction": {
    "invoice_number": "INV-2024-0042",
    "date": "03/15/2024",
    "total": "$1,296.00",
    "subtotal": "$1,200.00",
    "tax": "$96.00"
  }
}
```

Standard tier: full cross-extraction, with an independent OCR/PDF re-parse of the original document:
```json
{
  "extraction": { "total": "$1,296.00", "date": "03/15/2024" },
  "document": "<base64-encoded-pdf>",
  "callerTool": "tesseract"
}
```

Gate mode: simple pass/fail for agent workflow integration:
```json
{
  "extraction": { ... },
  "mode": "gate"
}
```

Returns:

```json
{ "pass": true, "confidence": 0.94, "failReasons": [] }
```
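In an agent pipeline, the gate-mode response can decide whether an extraction moves on or goes to a human. The TypeScript sketch below relies only on the three fields in the documented response (`pass`, `confidence`, `failReasons`); the `GateResult` and `Route` types, the `routeExtraction` function, the 0.9 confidence floor, and treating `failReasons` as an array of strings are all hypothetical choices for illustration, not part of the DocQA API.

```typescript
// Hypothetical type for a gate-mode response. The field names come from the
// documented example; treating failReasons as string[] is an assumption.
interface GateResult {
  pass: boolean;
  confidence: number; // assumed to be in [0, 1], like the 0.94 in the example
  failReasons: string[];
}

type Route =
  | { action: "accept" }
  | { action: "review"; reasons: string[] };

// Accept only when the gate passes AND confidence clears a caller-chosen floor
// (0.9 is an arbitrary illustrative default); otherwise send to review.
function routeExtraction(result: GateResult, minConfidence = 0.9): Route {
  if (!result.pass) {
    // Surface DocQA's own reasons, with a generic fallback if none are given.
    const reasons = result.failReasons.length > 0 ? result.failReasons : ["gate failed"];
    return { action: "review", reasons };
  }
  if (result.confidence < minConfidence) {
    return { action: "review", reasons: [`confidence ${result.confidence} < ${minConfidence}`] };
  }
  return { action: "accept" };
}

console.log(routeExtraction({ pass: true, confidence: 0.94, failReasons: [] }));
// prints { action: 'accept' }
```

Keeping a separate confidence floor lets a pipeline be stricter than the gate itself without changing the request.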
Pricing:
- Basic: $0.05/verification — pattern + arithmetic checks
- Standard: $0.50/verification — full cross-extraction
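The Basic tier's arithmetic check can be pictured as parsing the currency strings into integer cents and confirming that subtotal plus tax equals the total. This is a minimal TypeScript sketch, not DocQA's implementation: it assumes US-style `$1,234.56` formatting and a total that is exactly `subtotal + tax`, and `parseUsdCents` and `checkTotals` are hypothetical names.

```typescript
// Parse a US-formatted amount such as "$1,296.00" into integer cents, which
// avoids floating-point rounding. Returns null if the string doesn't match.
function parseUsdCents(value: string): number | null {
  const match = /^\$?(\d{1,3}(?:,\d{3})*|\d+)(?:\.(\d{2}))?$/.exec(value.trim());
  if (match === null) return null;
  const dollars = Number(match[1].replace(/,/g, ""));
  const cents = match[2] === undefined ? 0 : Number(match[2]);
  return dollars * 100 + cents;
}

interface Totals {
  subtotal: string;
  tax: string;
  total: string;
}

// Hypothetical check: does subtotal + tax equal total, to the cent?
function checkTotals(fields: Totals): { ok: boolean; expectedCents: number | null } {
  const subtotal = parseUsdCents(fields.subtotal);
  const tax = parseUsdCents(fields.tax);
  const total = parseUsdCents(fields.total);
  if (subtotal === null || tax === null || total === null) {
    return { ok: false, expectedCents: null }; // unparseable amount
  }
  const expected = subtotal + tax;
  return { ok: expected === total, expectedCents: expected };
}

// Basic-tier example: $1,200.00 + $96.00 = $1,296.00
console.log(checkTotals({ subtotal: "$1,200.00", tax: "$96.00", total: "$1,296.00" }).ok); // true
// "Why DocQA?" example: $1,200 + $96 is not $1,394
console.log(checkTotals({ subtotal: "$1,200", tax: "$96", total: "$1,394" }).ok); // false
```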
docqa-sample
Fetch a sample verification report to understand the output format. Free, no API key needed.
pdf-extract
Extract structured text and tables from PDF documents.
- Input: `file` (base64 PDF), optional `filename`
- Output: Extracted text, page count, tables, metadata, text quality assessment
- Tip: If `textQuality` is `"low"`, the PDF is likely scanned; use `ocr` instead.
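The tip above implies a simple fallback: call `pdf-extract` first and switch to `ocr` when the result reports low text quality. The TypeScript sketch below builds the base64 arguments and makes that decision. Only `file`, `filename`, and `textQuality: "low"` come from this README; the `PdfExtractResult` shape and the helper names are hypothetical, and an MCP client would send these arguments through its own tool-call mechanism. Note that `ocr` lists only image formats, so scanned pages may first need to be rendered as images.

```typescript
import { Buffer } from "node:buffer";

// Arguments documented for pdf-extract: base64 `file`, optional `filename`.
interface PdfExtractArgs {
  file: string;
  filename?: string;
}

// Hypothetical partial result shape; only textQuality is named in the README.
interface PdfExtractResult {
  textQuality?: string;
}

function buildPdfExtractArgs(pdfBytes: Uint8Array, filename?: string): PdfExtractArgs {
  const args: PdfExtractArgs = { file: Buffer.from(pdfBytes).toString("base64") };
  if (filename !== undefined) args.filename = filename;
  return args;
}

// Follow the README tip: low text quality suggests a scanned PDF, so try OCR.
function nextTool(result: PdfExtractResult): "ocr" | "done" {
  return result.textQuality === "low" ? "ocr" : "done";
}

const args = buildPdfExtractArgs(Buffer.from("%PDF-1.7"), "invoice.pdf");
console.log(args.file); // JVBERi0xLjc=
console.log(nextTool({ textQuality: "low" })); // ocr
```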
ocr
Extract text from images using Tesseract OCR.
- Input: `file` (base64 image), optional `language` (default: `eng`), optional `filename`
- Supports: PNG, JPG, TIFF, BMP, WebP
- Output: Recognized text, confidence score, character count
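Before calling `ocr`, a client might check the file type against the supported list and apply the documented `eng` default. In this TypeScript sketch, the extension-to-format mapping (including the `.jpeg` and `.tif` aliases) and the `buildOcrArgs` helper are assumptions; the server's own validation is not described in this README.

```typescript
import { Buffer } from "node:buffer";

// Extensions for the listed formats (PNG, JPG, TIFF, BMP, WebP);
// the .jpeg and .tif aliases are an assumption.
const SUPPORTED_EXTENSIONS = new Set(["png", "jpg", "jpeg", "tif", "tiff", "bmp", "webp"]);

interface OcrArgs {
  file: string;
  language: string;
  filename: string;
}

// Hypothetical helper: reject unsupported types early, base64-encode the
// image, and fall back to the documented default language "eng".
function buildOcrArgs(imageBytes: Uint8Array, filename: string, language = "eng"): OcrArgs {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  if (!SUPPORTED_EXTENSIONS.has(ext)) {
    throw new Error(`unsupported image type: ${filename}`);
  }
  return { file: Buffer.from(imageBytes).toString("base64"), language, filename };
}

console.log(buildOcrArgs(Buffer.from("img"), "receipt.PNG").language); // eng
```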
convert
Convert documents between formats.
- Input formats: `pdf`, `md`, `json`, `txt`
- Output formats: `txt`, `md`, `json`
- Input: `file` (base64), `from`, `to`
- Output: Converted document (base64) + size stats
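The two format lists imply a simple client-side check before calling `convert`: `pdf` can only be a source, and `txt`, `md`, and `json` work in either direction. In this TypeScript sketch, `isSupportedConversion` and `supportedPairs` are hypothetical, and because the README does not say whether same-format requests such as `md` to `md` are accepted, the sketch rejects them as an assumption.

```typescript
// Formats listed for the convert tool.
const INPUT_FORMATS = ["pdf", "md", "json", "txt"] as const;
const OUTPUT_FORMATS = ["txt", "md", "json"] as const;

type InputFormat = (typeof INPUT_FORMATS)[number];
type OutputFormat = (typeof OUTPUT_FORMATS)[number];

function isSupportedConversion(from: string, to: string): boolean {
  const fromOk = (INPUT_FORMATS as readonly string[]).includes(from);
  const toOk = (OUTPUT_FORMATS as readonly string[]).includes(to);
  // Assumption: identity conversions (e.g. md -> md) are treated as unsupported.
  return fromOk && toOk && from !== to;
}

// Enumerate every pair the check allows, e.g. for a menu or a tool description.
function supportedPairs(): Array<[InputFormat, OutputFormat]> {
  const pairs: Array<[InputFormat, OutputFormat]> = [];
  for (const f of INPUT_FORMATS) {
    for (const t of OUTPUT_FORMATS) {
      if (isSupportedConversion(f, t)) pairs.push([f, t]);
    }
  }
  return pairs;
}

console.log(isSupportedConversion("pdf", "md")); // true
console.log(isSupportedConversion("md", "pdf")); // false: pdf is input-only
console.log(supportedPairs().length); // 9
```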
Why DocQA?
AI extraction tools (OCR, LLMs, form recognizers) hallucinate. DocQA independently re-extracts and cross-verifies:
| Problem | DocQA Catch |
|---------|-------------|
| "Invoice total: $1,296" (actual: $1,269) | ✅ Amount mismatch |
| "Date: 2024-03-15" (actual: 2024-03-51) | ✅ Date format invalid |
| "Tax: $96, Subtotal: $1,200, Total: $1,394" | ✅ Arithmetic mismatch |
| LLM invents a line item not in the PDF | ✅ Unreported field detected |
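The date row rests on a calendar check: `2024-03-51` has the `YYYY-MM-DD` shape but names a day March does not have. A minimal TypeScript sketch of that kind of check follows, accepting the two date shapes that appear in this README (`YYYY-MM-DD`, and `MM/DD/YYYY`, assumed to be US month-first). `isValidDate` is a hypothetical name, not DocQA's validator.

```typescript
// Return true if `value` is a real calendar date in YYYY-MM-DD or MM/DD/YYYY form.
function isValidDate(value: string): boolean {
  let year: number;
  let month: number;
  let day: number;
  const iso = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value);
  const us = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec(value);
  if (iso) {
    year = Number(iso[1]);
    month = Number(iso[2]);
    day = Number(iso[3]);
  } else if (us) {
    month = Number(us[1]);
    day = Number(us[2]);
    year = Number(us[3]);
  } else {
    return false; // unrecognized shape
  }
  if (month < 1 || month > 12 || day < 1) return false;
  // Date.UTC months are 0-based, so (year, month, 0) is day 0 of the *next*
  // month, i.e. the last day of this one. This also handles leap years.
  const daysInMonth = new Date(Date.UTC(year, month, 0)).getUTCDate();
  return day <= daysInMonth;
}

console.log(isValidDate("03/15/2024")); // true
console.log(isValidDate("2024-03-51")); // false
console.log(isValidDate("2024-02-29")); // true (leap year)
```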
vs. Competitors
| Feature | docqa-mcp | PDF Extraction MCP | Advanced OCR MCP | Mistral OCR MCP |
|---------|-----------|--------------------|------------------|-----------------|
| Cross-extraction verification | ✅ | ❌ | ❌ | ❌ |
| Hallucination detection | ✅ | ❌ | ❌ | ❌ |
| Arithmetic validation | ✅ | ❌ | ❌ | ❌ |
| Gate mode for pipelines | ✅ | ❌ | ❌ | ❌ |
| PDF extraction | ✅ | ✅ | ❌ | ✅ |
| OCR | ✅ | ❌ | ✅ | ✅ |
| Price per doc | $0.05-$0.50 | varies | varies | varies |
API Base URL
Default: https://api.agentsconsultants.com
Override: set the `OPENCLAW_API_URL` environment variable, e.g. `OPENCLAW_API_URL=https://your-instance.com`
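A launcher script or test harness could check the two environment variables this README names before starting the server. This TypeScript sketch is not the package's source: `resolveDocqaConfig` is hypothetical, as are the choices to fail fast when `DOCQA_API_KEY` is missing and to strip a trailing slash from the override.

```typescript
const DEFAULT_BASE_URL = "https://api.agentsconsultants.com";

interface DocqaConfig {
  apiKey: string;
  baseUrl: string;
}

// Hypothetical helper: reads DOCQA_API_KEY (required) and OPENCLAW_API_URL
// (optional override). In Node you would pass process.env.
function resolveDocqaConfig(env: Record<string, string | undefined>): DocqaConfig {
  const apiKey = env["DOCQA_API_KEY"]?.trim();
  if (!apiKey) {
    // Fail fast at startup (a design choice of this sketch).
    throw new Error(`DOCQA_API_KEY is not set; get a key at ${DEFAULT_BASE_URL}`);
  }
  const override = env["OPENCLAW_API_URL"]?.trim();
  // Strip trailing slashes so request paths can be appended uniformly (assumption).
  const baseUrl = (override || DEFAULT_BASE_URL).replace(/\/+$/, "");
  return { apiKey, baseUrl };
}

console.log(resolveDocqaConfig({ DOCQA_API_KEY: "your-api-key-here" }).baseUrl);
// prints https://api.agentsconsultants.com
```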
License
MIT
