@paradyno/pdf-mcp-server

v0.1.1

MCP server for PDF processing - text extraction, search, and outline extraction

📄 PDF MCP Server

A high-performance MCP server for PDF processing, built in Rust.

Give your AI agents powerful PDF capabilities — extract text, search, split, merge, encrypt, and more. All dependencies are Apache 2.0 licensed, keeping your project clean and permissive.

✨ Features

| Category | Tools |
|----------|-------|
| 📖 Reading | extract_text · extract_metadata · extract_outline · extract_annotations · extract_links · extract_form_fields |
| 🔍 Search & Discovery | search · list_pdfs · get_page_info · summarize_structure |
| 🖼️ Media | Image extraction (via extract_text) · convert_page_to_image |
| ✂️ Manipulation | split_pdf · merge_pdfs · compress_pdf · fill_form |
| 🔒 Security | protect_pdf · unprotect_pdf · Password-protected PDF support |
| 📦 Resources | Expose PDFs as MCP Resources for direct client access |
| ⚡ Performance | Batch processing · LRU caching · Operation chaining via cache keys |

🚀 Installation

npm (Recommended)

npm install -g @paradyno/pdf-mcp-server

Pre-built Binaries

Download from GitHub Releases:

| Platform | x86_64 | ARM64 |
|----------|--------|-------|
| 🐧 Linux | pdf-mcp-server-linux-x64 | pdf-mcp-server-linux-arm64 |
| 🍎 macOS | pdf-mcp-server-darwin-x64 | pdf-mcp-server-darwin-arm64 |
| 🪟 Windows | pdf-mcp-server-windows-x64.exe | — |

From Source

cargo install --git https://github.com/paradyno/pdf-mcp-server

⚙️ Configuration

Claude Desktop

Add to your claude_desktop_config.json:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "pdf": {
      "command": "npx",
      "args": ["@paradyno/pdf-mcp-server"]
    }
  }
}

Claude Code

claude mcp add pdf -- npx @paradyno/pdf-mcp-server

VS Code

{
  "mcp.servers": {
    "pdf": {
      "command": "npx",
      "args": ["@paradyno/pdf-mcp-server"]
    }
  }
}

🛠️ Tools

Source Types

All tools accept PDF sources in multiple formats:

{ "path": "/documents/file.pdf" }
{ "base64": "JVBERi0xLjQK..." }
{ "url": "https://example.com/document.pdf" }
{ "cache_key": "abc123" }
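As a minimal sketch, a client could build these source objects with a small helper. The name `make_source` and its keyword arguments are hypothetical and not part of the server; only the four keys (`path`, `base64`, `url`, `cache_key`) come from the list above.

```python
import base64


def make_source(*, path=None, data=None, url=None, cache_key=None):
    """Build one source object; exactly one argument must be given.

    Hypothetical client-side helper, not a server API.
    """
    candidates = {"path": path, "data": data, "url": url, "cache_key": cache_key}
    given = {k: v for k, v in candidates.items() if v is not None}
    if len(given) != 1:
        raise ValueError("provide exactly one of path, data, url, cache_key")
    kind, value = next(iter(given.items()))
    if kind == "data":
        # Raw PDF bytes travel as a base64 string under the "base64" key.
        return {"base64": base64.b64encode(value).decode("ascii")}
    return {kind: value}


print(make_source(path="/documents/file.pdf"))
print(make_source(data=b"%PDF-1.4\n"))  # {'base64': 'JVBERi0xLjQK'}
```

The second call reproduces the prefix shown in the base64 example, since every PDF file starts with the `%PDF-` header.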

📖 extract_text

Extract text content with LLM-optimized formatting (paragraph detection, multi-column reordering, watermark removal).

{
  "sources": [{ "path": "/documents/report.pdf" }],
  "pages": "1-10",
  "include_metadata": true
}

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| sources | array | Yes | — | PDF sources |
| pages | string | No | all | Page selection (e.g., "1-5,10,15-20") |
| include_metadata | boolean | No | true | Include PDF metadata |
| include_images | boolean | No | false | Include extracted images (base64 PNG) |
| password | string | No | — | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |
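The extract_text example is the tool's arguments object. On the wire, an MCP client wraps such arguments in a JSON-RPC 2.0 `tools/call` request, and most clients do this for you. A sketch of that envelope follows; the helper name `tool_call` and the request id are illustrative only.

```python
import json


def tool_call(name, arguments, request_id=1):
    """Wrap tool arguments in an MCP `tools/call` JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call("extract_text", {
    "sources": [{"path": "/documents/report.pdf"}],
    "pages": "1-10",
    "include_metadata": True,
})
print(json.dumps(request, indent=2))
```

The same wrapper applies to every tool in this section; only `name` and `arguments` change.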

📖 extract_outline

Extract PDF bookmarks / table of contents.

{
  "sources": [{ "path": "/documents/book.pdf" }]
}

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| sources | array | Yes | — | PDF sources |
| password | string | No | — | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |

Response:

{
  "results": [{
    "source": "/documents/book.pdf",
    "outline": [
      {
        "title": "Chapter 1: Introduction",
        "page": 1,
        "children": [
          { "title": "1.1 Background", "page": 3, "children": [] }
        ]
      }
    ]
  }]
}
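Because the outline is a tree of `title` / `page` / `children` nodes, a short recursive walk is enough to turn it into a flat table of contents. A minimal sketch, using the response shape shown above (`flatten_outline` is a hypothetical client-side helper):

```python
def flatten_outline(items, depth=0):
    """Yield (depth, title, page) for every outline entry, depth-first."""
    for item in items:
        yield depth, item["title"], item["page"]
        yield from flatten_outline(item.get("children", []), depth + 1)


response = {
    "results": [{
        "source": "/documents/book.pdf",
        "outline": [{
            "title": "Chapter 1: Introduction",
            "page": 1,
            "children": [{"title": "1.1 Background", "page": 3, "children": []}],
        }],
    }]
}

for result in response["results"]:
    for depth, title, page in flatten_outline(result["outline"]):
        # Indent nested entries by two spaces per level.
        print(f"{'  ' * depth}{title} (p. {page})")
```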

📖 extract_metadata

Extract PDF metadata (author, title, dates, etc.) without loading full content.

{
  "sources": [{ "path": "/documents/report.pdf" }]
}

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| sources | array | Yes | — | PDF sources |
| password | string | No | — | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |

📖 extract_annotations

Extract highlights, comments, underlines, and other annotations.

{
  "sources": [{ "path": "/documents/report.pdf" }],
  "annotation_types": ["highlight", "text"],
  "pages": "1-5"
}

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| sources | array | Yes | — | PDF sources |
| annotation_types | array | No | all | Filter by types (highlight, underline, text, etc.) |
| pages | string | No | all | Page selection |
| password | string | No | — | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |

📖 extract_links

Extract hyperlinks and internal page navigation links.

{
  "sources": [{ "path": "/documents/paper.pdf" }],
  "pages": "1-10"
}

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| sources | array | Yes | — | PDF sources |
| pages | string | No | all | Page selection |
| password | string | No | — | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |

Response:

{
  "results": [{
    "source": "/documents/paper.pdf",
    "links": [
      { "page": 1, "url": "https://example.com", "text": "Click here" },
      { "page": 3, "dest_page": 10, "text": "See Chapter 5" }
    ],
    "total_count": 2
  }]
}
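In the sample response, external links carry a `url` key and internal jumps carry `dest_page`. A sketch of separating the two on the client side, assuming that distinction holds for every entry (`split_links` is a hypothetical helper):

```python
def split_links(links):
    """Separate external URLs from internal page jumps."""
    external = [link for link in links if "url" in link]
    internal = [link for link in links if "dest_page" in link]
    return external, internal


links = [
    {"page": 1, "url": "https://example.com", "text": "Click here"},
    {"page": 3, "dest_page": 10, "text": "See Chapter 5"},
]
external, internal = split_links(links)
print([link["url"] for link in external])
print([(link["page"], link["dest_page"]) for link in internal])
```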

📖 extract_form_fields

Read form field names, types, current values, and properties from PDF forms.

{
  "sources": [{ "path": "/documents/form.pdf" }],
  "pages": "1"
}

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| sources | array | Yes | — | PDF sources |
| pages | string | No | all | Page selection |
| password | string | No | — | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |

Response:

{
  "results": [{
    "source": "/documents/form.pdf",
    "fields": [
      {
        "page": 1,
        "name": "full_name",
        "field_type": "text",
        "value": "John Doe",
        "is_read_only": false,
        "is_required": true,
        "properties": { "is_multiline": false, "is_password": false }
      },
      {
        "page": 1,
        "name": "agree_terms",
        "field_type": "checkbox",
        "is_checked": true,
        "is_read_only": false,
        "is_required": false,
        "properties": {}
      }
    ],
    "total_fields": 2
  }]
}

🖼️ convert_page_to_image

Render PDF pages as PNG images (base64). Enables Vision LLMs to understand visual layouts, charts, and diagrams.

{
  "sources": [{ "path": "/documents/chart.pdf" }],
  "pages": "1-3",
  "width": 1200
}

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| sources | array | Yes | — | PDF sources |
| pages | string | No | all | Page selection |
| width | integer | No | 1200 | Target width in pixels |
| height | integer | No | — | Target height in pixels |
| scale | float | No | — | Scale factor (overrides width/height) |
| password | string | No | — | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |

Response:

{
  "results": [{
    "source": "/documents/chart.pdf",
    "pages": [
      {
        "page": 1,
        "width": 1200,
        "height": 1553,
        "data_base64": "iVBORw0KGgo...",
        "mime_type": "image/png"
      }
    ]
  }]
}
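Each rendered page arrives as a base64 string in `data_base64`. A minimal sketch of writing those pages to disk is shown here; `save_pages` and the `page-<n>.png` naming are assumptions for illustration, and the sample payload is just the 8-byte PNG signature rather than a real image.

```python
import base64
import tempfile
from pathlib import Path


def save_pages(result, out_dir):
    """Decode each rendered page, write it as page-<n>.png, return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for page in result["pages"]:
        target = out_dir / f"page-{page['page']}.png"
        target.write_bytes(base64.b64decode(page["data_base64"]))
        written.append(target)
    return written


# Stand-in payload: only the PNG file signature, not a full image.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
sample = {
    "source": "/documents/chart.pdf",
    "pages": [{
        "page": 1, "width": 1200, "height": 1553,
        "data_base64": base64.b64encode(PNG_SIGNATURE).decode("ascii"),
        "mime_type": "image/png",
    }],
}

with tempfile.TemporaryDirectory() as tmp:
    paths = save_pages(sample, tmp)
    print(paths[0].name, paths[0].read_bytes() == PNG_SIGNATURE)
```

The encoded signature is `iVBORw0KGgo=`, which is why real `data_base64` values begin with `iVBORw0KGgo`.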

🔍 search

Full-text search within PDFs with surrounding context.

{
  "sources": [{ "path": "/documents/manual.pdf" }],
  "query": "error handling",
  "context_chars": 100
}

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| sources | array | Yes | — | PDF sources |
| query | string | Yes | — | Search query |
| case_sensitive | boolean | No | false | Case-sensitive search |
| max_results | integer | No | 100 | Maximum results to return |
| context_chars | integer | No | 50 | Characters of context around match |
| password | string | No | — | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |

🔍 get_page_info

Get page dimensions, word/char counts, token estimates, and file sizes. Useful for planning LLM context usage.

{
  "sources": [{ "path": "/documents/report.pdf" }]
}

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| sources | array | Yes | — | PDF sources |
| password | string | No | — | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |
| skip_file_sizes | boolean | No | false | Skip file size calculation (faster) |

Response:

{
  "results": [{
    "source": "/documents/report.pdf",
    "pages": [{
      "page": 1,
      "width": 612.0, "height": 792.0,
      "rotation": 0, "orientation": "portrait",
      "char_count": 2500, "word_count": 450,
      "estimated_token_count": 625,
      "file_size": 102400
    }],
    "total_pages": 10,
    "total_chars": 25000,
    "total_words": 4500,
    "total_estimated_token_count": 6250
  }]
}

Note: Token counts are model-dependent approximations (~4 chars/token for Latin, ~2 tokens/char for CJK). Use as rough guidance only.
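The heuristic in the note can be approximated in a few lines. This is only a sketch of the stated ratios, not the server's actual estimator; the Unicode ranges used to detect CJK text are an assumption and deliberately incomplete.

```python
def is_cjk(ch):
    """Rough CJK test covering common ideograph, kana, and Hangul blocks."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7AF   # Hangul syllables
    )


def estimate_tokens(text):
    """Apply ~4 chars/token to non-CJK text and ~2 tokens/char to CJK text."""
    cjk = sum(1 for ch in text if is_cjk(ch))
    other = len(text) - cjk
    return round(other / 4 + cjk * 2)


print(estimate_tokens("a" * 2500))  # 625, matching page 1 in the response above
print(estimate_tokens("日本語"))     # 6
```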

🔍 summarize_structure

One-call comprehensive overview of a PDF's structure. Helps LLMs decide how to process a document.

{
  "sources": [{ "path": "/documents/report.pdf" }]
}

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| sources | array | Yes | — | PDF sources |
| password | string | No | — | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |

Response:

{
  "results": [{
    "source": "/documents/report.pdf",
    "page_count": 25,
    "file_size": 1048576,
    "metadata": { "title": "Annual Report", "author": "Acme Corp" },
    "has_outline": true,
    "outline_items": 12,
    "total_chars": 50000,
    "total_words": 9000,
    "total_estimated_tokens": 12500,
    "pages": [
      { "page": 1, "width": 612.0, "height": 792.0, "char_count": 2000, "word_count": 360, "has_images": true, "has_links": false, "has_annotations": false }
    ],
    "total_images": 5,
    "total_links": 3,
    "total_annotations": 2,
    "has_form": false,
    "form_field_count": 0,
    "form_field_types": {},
    "is_encrypted": false
  }]
}
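One way an agent might act on this summary is sketched below. The decision rules, the `plan_tools` name, and the 8000-token budget are all assumptions chosen for illustration; only the field names come from the response above.

```python
def plan_tools(summary, token_budget=8000):
    """Pick follow-up tools from a summarize_structure result (illustrative rules)."""
    if summary["total_estimated_tokens"] <= token_budget:
        tools = ["extract_text"]                     # small enough to read whole
    elif summary["has_outline"]:
        tools = ["extract_outline", "extract_text"]  # read section by section
    else:
        tools = ["search", "extract_text"]           # locate relevant pages first
    return {"tools": tools, "needs_password": summary["is_encrypted"]}


summary = {
    "has_outline": True,
    "total_estimated_tokens": 12500,
    "is_encrypted": False,
}
print(plan_tools(summary))
```

With the sample values, the document exceeds the budget and has an outline, so the sketch suggests reading it section by section.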

🔍 list_pdfs

Discover PDF files in a directory with optional filtering.

{
  "directory": "/documents",
  "recursive": true,
  "pattern": "invoice*.pdf"
}

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| directory | string | Yes | — | Directory to search |
| recursive | boolean | No | false | Search subdirectories |
| pattern | string | No | — | Filename pattern (e.g., "report*.pdf") |

✂️ split_pdf

Extract specific pages from a PDF to create a new PDF.

{
  "source": { "path": "/documents/book.pdf" },
  "pages": "1-10,15,20-z",
  "output_path": "/output/excerpt.pdf"
}

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| source | object | Yes | — | PDF source |
| pages | string | Yes | — | Page range (see syntax below) |
| output_path | string | No | — | Save output to file |
| password | string | No | — | PDF password if encrypted |

Page Range Syntax:

| Syntax | Description |
|--------|-------------|
| 1-5 | Pages 1 through 5 |
| 1,3,5 | Specific pages |
| z | Last page |
| r1 | Last page (reverse) |
| 5-z | Page 5 to end |
| z-1 | All pages reversed |
| 1-z:odd | Odd pages only |
| 1-z:even | Even pages only |
| 1-10,x5 | Pages 1–10 except page 5 |
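To make the syntax concrete, here is a small interpreter for the forms listed in the table. It is a sketch for previewing a selection on the client side, not the server's parser: `:odd` / `:even` are treated as positions within the range (which coincides with odd/even page numbers for `1-z`), and an `x` exclusion is applied to everything selected so far, which is a simplification.

```python
def resolve(token, n):
    """Resolve one page token: a number, 'z' (last page), or 'rN' (Nth from end)."""
    if token == "z":
        return n
    if token.startswith("r"):
        return n - int(token[1:]) + 1
    return int(token)


def expand(spec, n):
    """Expand one comma-free spec such as '5-z' or '1-z:odd' into page numbers."""
    spec, _, modifier = spec.partition(":")
    start, sep, end = spec.partition("-")
    first = resolve(start, n)
    last = resolve(end, n) if sep else first
    step = 1 if first <= last else -1      # 'z-1' counts downwards
    pages = list(range(first, last + step, step))
    if modifier == "odd":
        pages = pages[0::2]                # 1st, 3rd, 5th entry of the range
    elif modifier == "even":
        pages = pages[1::2]                # 2nd, 4th, 6th entry of the range
    return pages


def parse_pages(selection, n):
    """Return the selected page numbers, in order, for an n-page document."""
    pages = []
    for part in (piece.strip() for piece in selection.split(",")):
        if part.startswith("x"):
            excluded = set(expand(part[1:], n))
            pages = [page for page in pages if page not in excluded]
        else:
            pages.extend(expand(part, n))
    if any(page < 1 or page > n for page in pages):
        raise ValueError(f"page out of range 1-{n}: {selection!r}")
    return pages


print(parse_pages("1-10,15,20-z", 20))
print(parse_pages("z-1", 3))      # [3, 2, 1]
print(parse_pages("1-10,x5", 12))
```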

✂️ merge_pdfs

Merge multiple PDFs into a single file.

{
  "sources": [
    { "path": "/documents/chapter1.pdf" },
    { "path": "/documents/chapter2.pdf" }
  ],
  "output_path": "/output/complete-book.pdf"
}

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| sources | array | Yes | — | PDF sources to merge (in order) |
| output_path | string | No | — | Save output to file |

✂️ compress_pdf

Reduce PDF file size using stream optimization, object deduplication, and compression.

{
  "source": { "path": "/documents/large-report.pdf" },
  "compression_level": 9,
  "output_path": "/output/compressed.pdf"
}

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| source | object | Yes | — | PDF source |
| object_streams | string | No | "generate" | "generate" (best) · "preserve" · "disable" |
| compression_level | integer | No | 9 | 1–9 (higher = better compression) |
| output_path | string | No | — | Save output to file |
| password | string | No | — | PDF password if encrypted |

Response:

{
  "results": [{
    "source": "/documents/large-report.pdf",
    "original_size": 5242880,
    "compressed_size": 2097152,
    "compression_ratio": 0.4,
    "bytes_saved": 3145728
  }]
}
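The numbers in the sample response are consistent with `compression_ratio` being the compressed size divided by the original size, so a lower value means a smaller file. That reading is inferred from the figures rather than stated by the source; a quick check:

```python
def compression_stats(original_size, compressed_size):
    """Recompute the derived fields from the two sizes (in bytes)."""
    return {
        "compression_ratio": round(compressed_size / original_size, 2),
        "bytes_saved": original_size - compressed_size,
    }


print(compression_stats(5242880, 2097152))
# {'compression_ratio': 0.4, 'bytes_saved': 3145728}
```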

✂️ fill_form

Write values into existing PDF form fields and produce a new PDF.

{
  "source": { "path": "/documents/form.pdf" },
  "field_values": [
    { "name": "full_name", "value": "Jane Smith" },
    { "name": "agree_terms", "checked": true }
  ],
  "output_path": "/output/filled-form.pdf"
}

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| source | object | Yes | — | PDF source |
| field_values | array | Yes | — | Fields to fill (see below) |
| output_path | string | No | — | Save output to file |
| password | string | No | — | PDF password if encrypted |

Field value format:

| Field | Type | Description |
|-------|------|-------------|
| name | string | Field name (use extract_form_fields to discover names) |
| value | string | Text value (for text fields) |
| checked | boolean | Checked state (for checkbox/radio fields) |

Supported field types: Text fields, checkboxes, radio buttons. ComboBox/ListBox selection is read-only.
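A typical workflow is to call extract_form_fields first and then build `field_values` from its result. The sketch below assumes the field shape shown in the extract_form_fields response; `to_field_values` is a hypothetical helper, and the `"radio"` type string is a guess, since only `"text"` and `"checkbox"` appear in the examples.

```python
def to_field_values(fields, updates):
    """Build fill_form field_values from extracted fields and a name->value map."""
    values = []
    for field in fields:
        name = field["name"]
        if name not in updates or field.get("is_read_only"):
            continue  # nothing to write, or the field cannot be changed
        if field["field_type"] == "text":
            values.append({"name": name, "value": str(updates[name])})
        elif field["field_type"] in ("checkbox", "radio"):  # "radio" is assumed
            values.append({"name": name, "checked": bool(updates[name])})
        # Other types (combo/list boxes) are skipped: selection is read-only.
    return values


fields = [
    {"page": 1, "name": "full_name", "field_type": "text",
     "value": "John Doe", "is_read_only": False, "is_required": True},
    {"page": 1, "name": "agree_terms", "field_type": "checkbox",
     "is_checked": True, "is_read_only": False, "is_required": False},
]
print(to_field_values(fields, {"full_name": "Jane Smith", "agree_terms": True}))
```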

🔒 protect_pdf

Add password protection using 256-bit AES encryption.

{
  "source": { "path": "/documents/confidential.pdf" },
  "user_password": "secret123",
  "allow_print": "none",
  "allow_copy": false
}

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| source | object | Yes | — | PDF source |
| user_password | string | Yes | — | Password to open the PDF |
| owner_password | string | No | user_password | Password to change permissions |
| allow_print | string | No | "full" | "full" · "low" · "none" |
| allow_copy | boolean | No | true | Allow copying text/images |
| allow_modify | boolean | No | true | Allow modifying the document |
| output_path | string | No | — | Save output to file |
| password | string | No | — | Password for source PDF if encrypted |

🔓 unprotect_pdf

Remove password protection from an encrypted PDF.

{
  "source": { "path": "/documents/protected.pdf" },
  "password": "secret123",
  "output_path": "/output/unprotected.pdf"
}

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| source | object | Yes | — | PDF source |
| password | string | Yes | — | Password for the encrypted PDF |
| output_path | string | No | — | Save output to file |

📦 MCP Resources

Expose PDFs from configured directories as MCP Resources for direct client discovery and reading.

Enabling Resources

# Command line
pdf-mcp-server --resource-dir /documents --resource-dir /data/pdfs

# Short form
pdf-mcp-server -r /documents -r /data/pdfs

# Environment variable (colon-separated)
PDF_RESOURCE_DIRS=/documents:/data/pdfs pdf-mcp-server

Claude Desktop with resources:

{
  "mcpServers": {
    "pdf": {
      "command": "npx",
      "args": ["@paradyno/pdf-mcp-server", "--resource-dir", "/documents"],
      "env": {
        "PDF_RESOURCE_DIRS": "/data/pdfs:/shared/documents"
      }
    }
  }
}

Both methods can be combined — command line arguments are added to environment variable paths.
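The combination rule can be pictured as follows. This Python sketch only illustrates the described behaviour (environment paths first, command-line paths appended); the real server is written in Rust, and the ordering shown is an assumption based on the sentence above.

```python
import argparse


def resource_dirs(argv, environ):
    """Merge PDF_RESOURCE_DIRS (colon-separated) with -r/--resource-dir flags."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-r", "--resource-dir", action="append", default=None)
    args, _unknown = parser.parse_known_args(argv)
    # Empty segments (e.g. from a trailing colon) are dropped.
    env_dirs = [d for d in environ.get("PDF_RESOURCE_DIRS", "").split(":") if d]
    return env_dirs + (args.resource_dir or [])


print(resource_dirs(
    ["--resource-dir", "/documents"],
    {"PDF_RESOURCE_DIRS": "/data/pdfs:/shared/documents"},
))
```

For the Claude Desktop configuration above, this yields `/data/pdfs`, `/shared/documents`, and `/documents`.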

Resource URIs

PDFs are exposed with file:// URIs:

file:///documents/report.pdf
file:///documents/2024/invoice.pdf
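For clients that need to map between local paths and these URIs, a minimal sketch using only the standard library is shown below. Whether the server percent-encodes special characters in the same way is not stated in the source, so treat the encoding step as an assumption.

```python
from urllib.parse import quote, unquote, urlsplit


def path_to_uri(path):
    """Turn an absolute POSIX path into a file:// URI."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    return "file://" + quote(path)


def uri_to_path(uri):
    """Recover the local path from a file:// URI."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("expected a file:// URI")
    return unquote(parts.path)


print(path_to_uri("/documents/2024/invoice.pdf"))
print(uri_to_path("file:///documents/report.pdf"))
```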

Operations

  • resources/list — Returns all PDFs with URI, name, MIME type, size, and description
  • resources/read — Returns extracted text content, formatted for LLM consumption

Resources vs Tools vs Caching

| Feature | Purpose | Use Case |
|---------|---------|----------|
| Resources | Passive file discovery | Browse and preview available PDFs |
| Tools | Active PDF processing | Extract, search, manipulate PDFs |
| CacheRef | Tool chaining | Pass output between operations |

🔗 Caching

When cache: true is specified, the server returns a cache_key for use in subsequent requests:

// Step 1: Extract with caching
{ "sources": [{ "path": "/documents/large.pdf" }], "cache": true }

// Step 2: Use cache_key from response
{ "sources": [{ "cache_key": "a1b2c3d4" }], "pages": "50-60" }
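One use of a cache key is to walk a large PDF in fixed-size page windows without resending the file each time. The sketch below builds the follow-up argument objects; the helper names and the window size are illustrative, and the exact place where the response carries `cache_key` is not shown in the source, so the key is simply passed in.

```python
def page_windows(total_pages, size):
    """Yield page selections such as '1-10', '11-20', ... covering the document."""
    for start in range(1, total_pages + 1, size):
        end = min(start + size - 1, total_pages)
        yield f"{start}-{end}"


def cached_requests(cache_key, total_pages, size=10):
    """Build one arguments object per window, all pointing at the cached PDF."""
    return [
        {"sources": [{"cache_key": cache_key}], "pages": window}
        for window in page_windows(total_pages, size)
    ]


for arguments in cached_requests("a1b2c3d4", total_pages=25):
    print(arguments)
```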

🏗️ Architecture

block-beta
  columns 1
  block:server["MCP Server (rmcp)"]
    columns 3
    extract_text search split_pdf
  end
  block:common["Common Layer"]
    columns 3
    Cache["Cache Manager"] Source["Source Resolver"] Batch["Batch Executor"]
  end
  block:pdf["PDF Processing"]
    columns 2
    PDFium["pdfium-render\n(reading)"] qpdf["qpdf FFI\n(manipulation)"]
  end

  server --> common --> pdf

⚡ Performance

Benchmarked with a 14-page technical paper (tracemonkey.pdf, ~1 MB) on Docker (Apple Silicon):

| Operation | Time | What it means |
|-----------|------|---------------|
| Extract text (14 pages) | 170 ms | Process ~80 pages per second |
| Metadata only | 0.26 ms | ~4,000 documents per second |
| Search | 0.01 ms | Instant results on extracted text |
| 100 files batch | 4.8 s | ~21 documents per second |

Key takeaways

  • Fast enough for interactive use — Text extraction completes in under 200ms
  • Metadata is nearly instant — Use extract_metadata or summarize_structure to quickly assess documents before full processing
  • Search is blazing fast — Once text is extracted, searching is essentially free
  • Batch processing scales linearly — No significant overhead when processing many files
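The throughput figures follow from the measured times by simple division. A quick check, using only numbers from the benchmark table:

```python
def per_second(count, seconds):
    """Items processed per second, given a count and the elapsed time."""
    return count / seconds


print(round(per_second(14, 0.170)))      # 82 pages per second (text extraction)
print(round(per_second(1, 0.00026)))     # 3846 documents per second (metadata)
print(round(per_second(100, 4.8), 1))    # 20.8 documents per second (batch)
```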

Run benchmarks yourself:

docker compose --profile dev run --rm bench

🧑‍💻 Development

# Build
docker compose --profile dev run --rm dev cargo build

# Run tests
docker compose --profile dev run --rm test

# Run tests with coverage
docker compose --profile dev run --rm coverage

# Format code
docker compose --profile dev run --rm dev cargo fmt --all

# Lint
docker compose --profile dev run --rm clippy

# Performance benchmarks
docker compose --profile dev run --rm bench

# Build production image (~120MB)
docker compose --profile prod build production

# Clean up
docker compose --profile dev down --rmi local

Building natively (without Docker) requires PDFium installed locally. Download it from pdfium-binaries and set PDFIUM_PATH.

cargo build --release
cargo test
cargo bench
cargo llvm-cov --html

Project structure:

src/
├── main.rs              # Entry point, CLI args
├── lib.rs               # Library root
├── server.rs            # MCP server & tool handlers
├── error.rs             # Error types
├── pdf/
│   ├── reader.rs        # PDFium wrapper (text, metadata, outline)
│   ├── annotations.rs   # Annotation extraction
│   ├── images.rs        # Image extraction
│   └── qpdf.rs          # qpdf FFI (split, merge, encrypt)
└── source/
    ├── resolver.rs      # Path/URL/Base64 resolution
    └── cache.rs         # LRU caching layer

🗺️ Roadmap

Phase 1: Core Reading ✅

extract_text · extract_outline · search · extract_metadata · extract_annotations · Image extraction · Batch processing · Caching

Phase 2: PDF Manipulation ✅

split_pdf · merge_pdfs · protect_pdf · unprotect_pdf · compress_pdf · extract_links · get_page_info

Phase 2.5: LLM-Optimized Text ✅

Dynamic thresholds · Paragraph detection · Multi-column layout · Watermark removal

Phase 2.6: Discovery & Resources ✅

list_pdfs · MCP Resources · Resource directory configuration

Phase 2.7: Vision & Forms ✅

convert_page_to_image · extract_form_fields · fill_form · summarize_structure

Phase 3: Advanced Features (Planned)

  • rotate_pages — Rotate specific pages
  • extract_tables — Structured table extraction
  • add_watermark — Text/image watermarks
  • linearize_pdf — Web optimization
  • OCR support · PDF/A validation · Digital signature verification

Blocked pending MCP protocol support:

  • Large file upload — MCP lacks a standard API for uploading large files (>20MB). Discussed in #1197, #1220, #1659.
  • Chunked file transfer — No standard mechanism exists yet.

Current workarounds: shared filesystem (path), object storage with pre-signed URLs (url), or base64 encoding.

Not planned, because these provide limited value for LLM use cases:

  • Hyphenation merging — LLMs understand hyphenated words
  • Fixed-pitch mode — Limited use cases
  • Bounding box output — LLMs don't need coordinates
  • Invisible text removal — Not supported by pdfium-render API

📄 License

Apache License 2.0

🙏 Acknowledgments

  • PDFium — PDF rendering engine (Apache 2.0)
  • pdfium-render — Rust PDFium bindings (Apache 2.0)
  • qpdf — PDF transformation library, vendored via FFI (Apache 2.0)
  • rmcp — Rust MCP SDK