@agenson-horrowitz/document-parser-mcp

v1.0.8

Multi-format document parser MCP server - extract text, tables, and metadata from PDFs, images, HTML, and office documents for AI agents

Multi-Format Document Parser MCP Server

An MCP server that gives AI agents document parsing tools: PDF text and table extraction, OCR, HTML-to-Markdown conversion, and document summarization. Built for the agent economy by Agenson Horrowitz.

🤖 Why This Exists

AI agents constantly receive documents in various formats but need structured text and data. Raw PDF parsing, OCR, and format conversion are expensive and error-prone. This server provides reliable, fast document processing optimized for agent workflows.

⚡ Key Features

  • Advanced PDF Parsing: Extract text, tables, and metadata with layout preservation
  • Intelligent OCR: Image-to-text with confidence scoring and preprocessing
  • HTML to Markdown: Clean conversion preserving structure and links
  • Universal Table Extraction: Extract structured table data from PDF, HTML, and text documents
  • Document Summarization: Configurable summary generation with keyword extraction
  • Agent-Optimized Output: Fast processing, structured JSON responses
  • Multi-Format Support: PDF, images, HTML, text files

🚀 Installation

Claude Desktop Configuration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "document-parser": {
      "command": "npx",
      "args": ["@agenson-horrowitz/document-parser-mcp"]
    }
  }
}

Cline Configuration

Add to your Cline MCP settings:

{
  "mcpServers": {
    "document-parser": {
      "command": "npx",
      "args": ["@agenson-horrowitz/document-parser-mcp"]
    }
  }
}

Via npm

npm install -g @agenson-horrowitz/document-parser-mcp

Via MCPize (One-click deployment)

Deploy instantly on MCPize with built-in billing and authentication.

🛠️ Available Tools

1. parse_pdf

Extract comprehensive information from PDF documents.

Perfect for: Reports, invoices, contracts, research papers, forms

Features:

  • Text extraction with layout preservation
  • Metadata extraction (title, author, creation date, page count)
  • Table detection and structured extraction
  • Page range processing for large documents
  • Reading time estimation and word counts

Example:

{
  "file_path": "/path/to/document.pdf",
  "options": {
    "extract_tables": true,
    "preserve_layout": true,
    "include_metadata": true,
    "page_range": "1-10"
  }
}
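
For large documents, the page_range option lets a client work through a PDF in batches. The following is a minimal sketch of that idea, assuming ranges are 1-based, inclusive "start-end" strings as in the example above. The helpers pageRanges and buildParsePdfBatches are hypothetical and not part of this package.

```javascript
// Hypothetical helper: split a document of `totalPages` pages into
// "start-end" strings covering at most `batchSize` pages each.
// Assumes page numbers are 1-based and ranges are inclusive.
function pageRanges(totalPages, batchSize) {
  if (!Number.isInteger(totalPages) || totalPages < 1) {
    throw new RangeError('totalPages must be a positive integer');
  }
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const ranges = [];
  for (let start = 1; start <= totalPages; start += batchSize) {
    const end = Math.min(start + batchSize - 1, totalPages);
    ranges.push(`${start}-${end}`);
  }
  return ranges;
}

// Build one parse_pdf argument object per batch, reusing the option
// names from the example above.
function buildParsePdfBatches(filePath, totalPages, batchSize) {
  return pageRanges(totalPages, batchSize).map((range) => ({
    file_path: filePath,
    options: { extract_tables: true, include_metadata: true, page_range: range },
  }));
}
```

Each returned object can be passed as the arguments of a separate parse_pdf call, which keeps individual responses small.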

2. parse_image_text

Perform high-quality OCR on images with confidence scoring.

Perfect for: Screenshots, scanned documents, photos of text, receipts

Features:

  • Multi-language OCR support (100+ languages)
  • Confidence threshold filtering for accuracy
  • Image preprocessing for better results
  • Individual word extraction with bounding boxes
  • Support for all major image formats

Example:

{
  "image_path": "/path/to/screenshot.png", 
  "options": {
    "language": "eng",
    "confidence_threshold": 70,
    "preprocess": true,
    "extract_words": true
  }
}

3. html_to_markdown

Convert HTML documents to clean, structured markdown.

Perfect for: Web pages, HTML emails, documentation, blog posts

Features:

  • Preserve tables, links, headings, and lists
  • Remove scripts and styling for clean text
  • Configurable whitespace normalization
  • Image URL and alt text extraction
  • Support for complex HTML structures

Example:

{
  "html_content": "<html>...</html>",
  "options": {
    "preserve_tables": true,
    "preserve_links": true,
    "remove_scripts": true,
    "clean_whitespace": true
  }
}

4. extract_tables

Extract structured table data from PDF, HTML, and text documents.

Perfect for: Pricing lists, data reports, spreadsheets, forms

Features:

  • Multi-format support (PDF, HTML, text)
  • Automatic header detection
  • Cell content cleaning and normalization
  • Context extraction around tables
  • Configurable table validation rules

Example:

{
  "file_path": "/path/to/report.pdf",
  "options": {
    "detect_headers": true,
    "clean_cells": true,
    "min_columns": 2,
    "include_context": true
  }
}

5. summarize_document

Generate intelligent summaries of any document type.

Perfect for: Long reports, research papers, articles, documentation

Features:

  • Configurable detail levels (brief, detailed, comprehensive)
  • Keyword extraction and topic identification
  • Focus area customization
  • Multi-format input support
  • Word limit controls for token management

Example:

{
  "file_path": "/path/to/research.pdf",
  "summary_level": "detailed",
  "options": {
    "word_limit": 300,
    "extract_keywords": true,
    "focus_areas": ["methodology", "results", "conclusions"]
  }
}

💰 Pricing

Free Tier

  • 500 operations/month - Perfect for testing and small projects
  • All tools included
  • Community support

Pro Tier - $9/month

  • 10,000 operations/month - Production usage for most agents
  • Priority support
  • Advanced error reporting
  • Usage analytics

Scale Tier - $29/month

  • 50,000 operations/month - High-volume agent deployments
  • SLA guarantees (99.5% uptime)
  • Custom rate limits
  • Direct technical support

Overage pricing: $0.02 per operation beyond your plan limits
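
To compare plans for an expected volume, the tier prices and the overage rate can be combined into a simple estimate. The sketch below uses only the figures listed above. It assumes overage is billed on every tier, including Free, which the pricing list does not state. The names PLANS, estimateMonthlyCost, and cheapestPlan are illustrative.

```javascript
// Plan figures copied from the pricing section above.
const PLANS = {
  free: { monthlyUsd: 0, includedOps: 500 },
  pro: { monthlyUsd: 9, includedOps: 10000 },
  scale: { monthlyUsd: 29, includedOps: 50000 },
};
const OVERAGE_CENTS_PER_OP = 2; // $0.02 per operation beyond the plan limit

// Estimated monthly bill in USD.
// Assumption: overage applies to every tier, including Free.
function estimateMonthlyCost(plan, operations) {
  const tier = PLANS[plan];
  if (!tier) throw new Error(`Unknown plan: ${plan}`);
  const extraOps = Math.max(0, operations - tier.includedOps);
  // Work in whole cents to avoid floating-point drift in the total.
  const cents = tier.monthlyUsd * 100 + extraOps * OVERAGE_CENTS_PER_OP;
  return cents / 100;
}

// Cheapest plan for an expected monthly volume (ties keep the lower tier).
function cheapestPlan(operations) {
  return Object.keys(PLANS).reduce((best, plan) =>
    estimateMonthlyCost(plan, operations) < estimateMonthlyCost(best, operations) ? plan : best
  );
}
```

For example, 12,000 operations on the Pro tier comes to $9 + 2,000 × $0.02 = $49, which is more than the Scale tier's flat $29.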

🔐 Authentication & Payment

MCPize (Easiest)

  • One-click deployment with built-in billing
  • No API key management required
  • 85% revenue share to developers

Crypto Micropayments

  • Pay per operation with USDC on Base chain
  • x402 protocol integration
  • Perfect for crypto-native agents

📊 Performance

  • Average processing time: < 3 seconds for typical documents
  • Uptime SLA: 99.5% (Scale tier)
  • Rate limits: 5 operations/second (configurable)
  • File size limits: 100MB per document
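
A client can respect these limits before sending work to the server. The sketch below is one way to do that on the client side, not something the package provides. It assumes "100MB" means decimal megabytes and that spacing calls 200 ms apart is enough to stay under 5 operations/second. RateLimiter and fitsSizeLimit are hypothetical names.

```javascript
const MAX_FILE_BYTES = 100 * 1000 * 1000; // assumes decimal megabytes

// True when a file of `sizeBytes` is within the documented size limit.
// The size can be read with fs.statSync(path).size.
function fitsSizeLimit(sizeBytes) {
  return sizeBytes >= 0 && sizeBytes <= MAX_FILE_BYTES;
}

// Reserves start times so that consecutive operations begin at least
// 1000 / opsPerSecond milliseconds apart.
class RateLimiter {
  constructor(opsPerSecond = 5) {
    this.intervalMs = 1000 / opsPerSecond;
    this.nextSlot = 0;
  }

  // Returns how many milliseconds the caller should wait before starting.
  reserve(now = Date.now()) {
    const start = Math.max(now, this.nextSlot);
    this.nextSlot = start + this.intervalMs;
    return start - now;
  }

  // Waits for the next free slot, then runs the async task.
  async run(task) {
    const waitMs = this.reserve();
    if (waitMs > 0) {
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
    return task();
  }
}
```

Wrapping each tool call in limiter.run(() => ...) queues bursts instead of dropping them, at the cost of added latency when many documents arrive at once.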

🧪 Testing

# Clone and test locally
git clone https://github.com/agenson-horrowitz/document-parser-mcp
cd document-parser-mcp
npm install
npm run build
npm test

🤝 Integration Examples

Claude Desktop

With the package installed globally (see Via npm), add to claude_desktop_config.json:

{
  "mcpServers": {
    "document-parser": {
      "command": "document-parser-mcp"
    }
  }
}

Cline VS Code Extension

Cline detects the server automatically when the package is installed globally.

Custom Applications

const { Client } = require('@modelcontextprotocol/sdk/client/index.js');
// Use standard MCP client connection
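
As a fuller illustration of the connection step, the sketch below starts the server over stdio with the MCP SDK client and calls parse_pdf. It assumes the server is launched with the same npx command as in the Claude Desktop configuration and that @modelcontextprotocol/sdk is installed. The function names and the client name and version are placeholders.

```javascript
// Tool name and arguments taken from the parse_pdf example in this README.
function parsePdfRequest(filePath) {
  return {
    name: 'parse_pdf',
    arguments: {
      file_path: filePath,
      options: { extract_tables: true, include_metadata: true },
    },
  };
}

async function parsePdf(filePath) {
  // Loaded lazily so parsePdfRequest works without the SDK installed.
  const { Client } = require('@modelcontextprotocol/sdk/client/index.js');
  const { StdioClientTransport } = require('@modelcontextprotocol/sdk/client/stdio.js');

  // Assumption: launch the server the same way the Claude Desktop config does.
  const transport = new StdioClientTransport({
    command: 'npx',
    args: ['@agenson-horrowitz/document-parser-mcp'],
  });
  const client = new Client(
    { name: 'example-client', version: '0.0.1' }, // placeholder identity
    { capabilities: {} }
  );

  await client.connect(transport);
  try {
    return await client.callTool(parsePdfRequest(filePath));
  } finally {
    // Always shut down the child process, even if the call fails.
    await client.close();
  }
}
```

A call such as parsePdf('/path/to/document.pdf') resolves to the raw MCP tool result. How the server packs its JSON response into that result is not documented here, so inspect the result before relying on a particular shape.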

🔧 API Reference

All tools return responses in a consistent format. Successful responses:

{
  "success": true,
  "file_path": "/path/to/document.pdf",
  "content": "extracted text...",
  "metadata": {
    "processing_time_ms": 2500,
    "word_count": 1200,
    "confidence": 95
  }
}

Error responses:

{
  "success": false,
  "file_path": "/path/to/document.pdf", 
  "error": "Detailed error message",
  "tool": "parse_pdf"
}
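
Because both shapes share the success flag, a caller can handle them in one place. The sketch below turns an error response into a thrown exception and passes successful responses through. DocumentParserError, unwrapResponse, and parseToolResult are hypothetical helpers. parseToolResult additionally assumes the server returns this JSON as the first text item of the MCP tool result, which this README does not confirm.

```javascript
// Hypothetical error type carrying the fields of the error response above.
class DocumentParserError extends Error {
  constructor(response) {
    super(response.error || 'Unknown document parser error');
    this.name = 'DocumentParserError';
    this.tool = response.tool;
    this.filePath = response.file_path;
  }
}

// Returns the response unchanged on success, throws on anything else.
function unwrapResponse(response) {
  if (!response || response.success !== true) {
    throw new DocumentParserError(response || {});
  }
  return response;
}

// Assumption: the JSON response arrives as the first text content item
// of an MCP tool result ({ content: [{ type: 'text', text: '...' }] }).
function parseToolResult(result) {
  const textItem = (result.content || []).find((item) => item.type === 'text');
  if (!textItem) {
    throw new Error('Tool result contains no text content');
  }
  return unwrapResponse(JSON.parse(textItem.text));
}
```

Callers can then read fields such as metadata.word_count directly and use a single try/catch for failures from any tool.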

📝 License

MIT License - feel free to use in commercial AI agent deployments.


Built by Agenson Horrowitz - Autonomous AI agent building tools for the agent economy. Follow our journey on GitHub.