@nova-mind-cloud/pdf-parser-mcp

v1.0.4 · Published 9 months ago

MCP Server for PDF parsing and content extraction

Vulnerabilities: 0 high · 0 medium · 0 low

Maintainer: gdm-pixel

Keywords: mcp model-context-protocol pdf parser pdf-extraction document-parsing claude gdm-pixel

@gdm-pixel/pdf-parser-mcp

🔐 Subscription-Based PDF Parser MCP

PDF parsing and text extraction.

⚠️ Requires Nova-Mind Cloud subscription - Starting at €39/month

💎 Open Source Code + Cloud Services

✅ Code is open - Audit, learn, modify freely
🔐 Usage requires subscription - Backend authentication & infrastructure

👉 View pricing | Plans: €39 / €89 / €149 per month

✨ Features

📄 PDF Text Extraction - Extract all text content from PDF files
📊 Metadata Extraction - Get PDF metadata (title, author, pages, etc.)
📂 Batch Processing - Parse multiple PDFs in a directory
🔍 Page Limiting - Limit extraction to the first N pages (maxPages)
⚡ Fast Processing - Efficient parsing engine
🌐 Cross-platform - Works on Windows, macOS, Linux

📦 Installation

With Claude Desktop

Add to your claude_desktop_config.json:

```json
{
  "mcpServers": {
    "pdf-parser": {
      "command": "npx",
      "args": ["-y", "@gdm-pixel/pdf-parser-mcp@latest"]
    }
  }
}
```

Config location:

Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
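If you prefer to script this change instead of editing the file by hand, the Python sketch below resolves the config location listed above for the running OS and merges the `pdf-parser` entry into any existing `mcpServers` block without touching other servers. The helper names (`default_config_path`, `merge_pdf_parser`, `install`) are illustrative and not part of this package; back up your config before writing to it.

```python
import copy
import json
import os
import platform
from pathlib import Path
from typing import Optional

# Entry copied from the claude_desktop_config.json snippet above.
PDF_PARSER_ENTRY = {
    "command": "npx",
    "args": ["-y", "@gdm-pixel/pdf-parser-mcp@latest"],
}


def default_config_path(system: Optional[str] = None) -> Path:
    """Return the Claude Desktop config path for `system` (defaults to the running OS)."""
    system = system or platform.system()
    if system == "Windows":
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    if system == "Darwin":  # macOS
        return (Path.home() / "Library" / "Application Support"
                / "Claude" / "claude_desktop_config.json")
    return Path.home() / ".config" / "Claude" / "claude_desktop_config.json"  # Linux


def merge_pdf_parser(config: dict, name: str = "pdf-parser") -> dict:
    """Return a copy of `config` with the pdf-parser server added under mcpServers."""
    merged = copy.deepcopy(config)  # never mutate the caller's dict
    merged.setdefault("mcpServers", {})[name] = copy.deepcopy(PDF_PARSER_ENTRY)
    return merged


def install(config_path: Optional[Path] = None) -> Path:
    """Read the config (if present), add the entry, and write it back as indented JSON."""
    path = config_path or default_config_path()
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_pdf_parser(config), indent=2), encoding="utf-8")
    return path
```

Calling `install()` and then restarting Claude Desktop should pick up the server. Note that the merge overwrites any existing entry named `pdf-parser`.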

🚀 Usage

Extract Text from PDF

Extract text from this PDF: C:\Documents\report.pdf

Returns:

Full text content
Page count
Metadata (if available)

Extract with Page Limit

Extract first 10 pages from C:\Documents\long-document.pdf

Useful for:

Large documents
Quick previews
Opening chapters

Get PDF Metadata

Analyze this PDF file: C:\Documents\report.pdf

Returns:

Title
Author
Creator
Creation date
Number of pages
PDF version

List PDFs in Directory

List all PDF files in C:\Documents\Reports

Returns:

All PDF files found
File sizes
File paths

📋 Available Tools

`parse-pdf`

Extract text content from a PDF file.

Parameters:

- filePath (required) - Absolute path to PDF file
- options (optional) - Parsing options
  - extractMetadata (boolean) - Extract metadata (default: true)
  - maxPages (number) - Maximum pages to extract

Example:

```json
{
  "filePath": "C:\\Documents\\report.pdf",
  "options": {
    "extractMetadata": true,
    "maxPages": 10
  }
}
```
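An MCP client invokes this tool with a JSON-RPC 2.0 `tools/call` request whose `arguments` carry an object like the one above. The Python sketch below builds that message and rejects relative paths up front, since the tool expects an absolute `filePath`. `parse_pdf_request` is a hypothetical helper, and rejecting a non-positive `maxPages` is an assumption: the README does not say how the server handles such values.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath
from typing import Optional


def is_absolute_path(path: str) -> bool:
    """Accept both Windows-style (C:\\...) and POSIX-style (/...) absolute paths."""
    return PureWindowsPath(path).is_absolute() or PurePosixPath(path).is_absolute()


def parse_pdf_request(file_path: str, request_id: int = 1,
                      extract_metadata: bool = True,
                      max_pages: Optional[int] = None) -> dict:
    """Build a JSON-RPC 2.0 tools/call message for the parse-pdf tool."""
    if not is_absolute_path(file_path):
        raise ValueError(f"filePath must be absolute, got {file_path!r}")
    options = {"extractMetadata": extract_metadata}
    if max_pages is not None:
        if max_pages < 1:  # assumption: only positive page limits make sense
            raise ValueError("maxPages must be a positive integer")
        options["maxPages"] = max_pages
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "parse-pdf",
            "arguments": {"filePath": file_path, "options": options},
        },
    }


if __name__ == "__main__":
    print(json.dumps(parse_pdf_request("C:\\Documents\\report.pdf", max_pages=10), indent=2))
```

Omitting `max_pages` leaves `maxPages` out of `options` entirely, so the server falls back to its own default rather than receiving an explicit `null`.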

`analyze-pdf`

Get detailed PDF information without extracting text.

Parameters:

filePath (required) - Absolute path to PDF file

Returns:

Metadata (title, author, dates)
Page count
PDF version
File size

`list-pdf-files`

List all PDF files in a directory.

Parameters:

directory (required) - Directory path to scan

Returns:

Array of PDF file paths
File sizes
File names
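To preview locally which files `list-pdf-files` is likely to report, the sketch below performs a comparable scan with Python's `pathlib`. It is an approximation, not the server's code: the README does not document whether the tool recurses into subdirectories or matches `.PDF` case-insensitively, so this version assumes a non-recursive, case-insensitive match. The `name`/`path`/`size` keys mirror the fields listed above, with size in bytes; `list_pdf_files` is a hypothetical name.

```python
from pathlib import Path
from typing import Dict, List


def list_pdf_files(directory: str) -> List[Dict[str, object]]:
    """Non-recursive scan for *.pdf files (case-insensitive), sorted by name."""
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(directory)
    results = []
    for entry in sorted(root.iterdir(), key=lambda p: p.name.lower()):
        # Skip subdirectories, even ones whose names end in ".pdf".
        if entry.is_file() and entry.suffix.lower() == ".pdf":
            results.append({
                "name": entry.name,
                "path": str(entry.resolve()),
                "size": entry.stat().st_size,  # bytes
            })
    return results
```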

🔧 Troubleshooting

File Not Found

Problem: "File not found" or "Cannot read PDF" errors

Solutions:

Verify file path is absolute (not relative)
Check file exists at specified location
Ensure file has .pdf extension
Verify you have read permissions
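The checklist above can be run before calling the tool. This sketch uses a hypothetical `preflight_check` helper that applies each check on the machine where the server runs and returns a list of problems (empty when the file looks usable). `os.path.isabs` follows the local OS rules, so a `C:\...` path only counts as absolute on Windows.

```python
import os
from pathlib import Path
from typing import List


def preflight_check(file_path: str) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not os.path.isabs(file_path):
        problems.append("path is not absolute")
    path = Path(file_path)
    if not path.is_file():
        problems.append("file does not exist (or is not a regular file)")
    if path.suffix.lower() != ".pdf":
        problems.append("file does not have a .pdf extension")
    # Only test permissions when there is actually a file to read.
    if path.is_file() and not os.access(path, os.R_OK):
        problems.append("file is not readable by the current user")
    return problems
```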

Extraction Fails

Problem: PDF opens but text extraction fails

Solutions:

Check that the PDF is not password-protected
Verify that the PDF contains actual text (not scanned images)
Image-based PDFs require OCR, which is not included
Try extracting fewer pages with maxPages

Garbled Text

Problem: Extracted text is unreadable or contains unexpected characters

Solutions:

The PDF may use a non-standard encoding
Open the file in a different PDF reader to verify its content
Check whether the PDF is corrupted
Some encrypted PDFs may produce garbled output
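Some of these causes can be screened cheaply before blaming the parser. The sketch below, built around a hypothetical `sniff_pdf` helper, checks that the file starts with the `%PDF-` header, ends with an `%%EOF` marker, and contains an `/Encrypt` entry (which encrypted PDFs carry in their trailer dictionary). These are byte-level heuristics, not a full parse: a file can pass all three and still be damaged, the `/Encrypt` test can misfire on unusual files, and the whole file is read into memory.

```python
from pathlib import Path
from typing import Dict


def sniff_pdf(path: str, tail_bytes: int = 2048) -> Dict[str, object]:
    """Cheap structural checks on a PDF file (a heuristic, not a parser)."""
    data = Path(path).read_bytes()
    version = None
    # Well-formed files start with "%PDF-x.y"; leading junk bytes are not tolerated here.
    if data.startswith(b"%PDF-"):
        first_line = data[5:].split(b"\n", 1)[0].split(b"\r", 1)[0]
        version = first_line.strip().decode("ascii", "replace")
    return {
        "has_header": version is not None,
        "version": version,
        # A complete file normally ends with %%EOF, possibly followed by whitespace.
        "has_eof_marker": b"%%EOF" in data[-tail_bytes:],
        # Encrypted files reference an /Encrypt dictionary from the trailer.
        "looks_encrypted": b"/Encrypt" in data,
    }
```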

Performance Issues

Problem: Large PDFs take too long to parse

Solutions:

Use maxPages option to limit extraction
Process PDFs in smaller chunks
Close other applications to free up memory
Consider splitting large PDFs into smaller files

💡 Use Cases

Document Analysis

Extract and summarize this PDF report: C:\Reports\Q4-2024.pdf

Batch Processing

List all PDFs in C:\Documents then extract text from each
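For larger folders, this two-step workflow can be staged in batches rather than issued all at once. The sketch assumes you already have the file paths from `list-pdf-files` as plain strings (the README does not document the response format) and turns them into groups of `parse-pdf` `tools/call` messages with unique request ids. `chunked` and `parse_calls` are hypothetical helpers, and the batch size of 5 is an arbitrary default.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while (chunk := list(islice(it, size))):
        yield chunk


def parse_calls(paths: Iterable[str], batch_size: int = 5,
                max_pages: Optional[int] = None,
                first_id: int = 1) -> Iterator[List[dict]]:
    """Group PDF paths into batches of parse-pdf tools/call messages."""
    next_id = first_id  # JSON-RPC ids must be unique across all batches
    for chunk in chunked(paths, batch_size):
        calls = []
        for path in chunk:
            options = {"extractMetadata": True}
            if max_pages is not None:
                options["maxPages"] = max_pages
            calls.append({
                "jsonrpc": "2.0",
                "id": next_id,
                "method": "tools/call",
                "params": {"name": "parse-pdf",
                           "arguments": {"filePath": path, "options": options}},
            })
            next_id += 1
        yield calls
```

Sending one batch and waiting for its responses before starting the next keeps memory use bounded, which matches the batching advice in the Performance Tips.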

Research & Data Extraction

Extract first 5 pages from C:\Papers\research.pdf and find key findings

Content Migration

Extract all text from old PDFs in C:\Archive for new system

Metadata Inspection

Analyze these PDFs and show their metadata: C:\Downloads\*.pdf

📝 Notes

Supported PDF Types

✅ Text-based PDFs (created from Word, LaTeX, etc.)
✅ PDFs with embedded fonts
✅ Multi-page documents
❌ Image-only PDFs (requires OCR)
❌ Password-protected PDFs
⚠️ Scanned documents (may need OCR)

Performance Tips

Use maxPages for large documents
Process PDFs in batches
Run analyze-pdf first to check the page count
Close other applications for large files
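The first and third tips combine into a small decision rule. In this sketch the page count is assumed to come from an earlier `analyze-pdf` call, and `plan_parse_options` (hypothetical) returns `parse-pdf` options that cap extraction with `maxPages` only when the document exceeds a limit. The 50-page default is arbitrary, and `extractMetadata` is switched off on the assumption that the metadata was already retrieved by `analyze-pdf`.

```python
from typing import Dict, Union

LARGE_DOC_PAGES = 50  # hypothetical threshold; tune for your machine and documents


def plan_parse_options(page_count: int,
                       limit: int = LARGE_DOC_PAGES) -> Dict[str, Union[bool, int]]:
    """Choose parse-pdf options from the page count reported by analyze-pdf."""
    if page_count < 0 or limit < 1:
        raise ValueError("page_count must be >= 0 and limit >= 1")
    # Metadata was already fetched by analyze-pdf, so skip re-extracting it.
    options: Dict[str, Union[bool, int]] = {"extractMetadata": False}
    if page_count > limit:
        # Large document: read only the first `limit` pages.
        options["maxPages"] = limit
    return options
```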

Limitations

No OCR support (image-based PDFs)
No password-protected PDF support
No PDF editing/creation
Text extraction only (no images)

🔒 Security

This tool:

✅ Only reads PDF files (no modifications)
✅ Works with local files only
✅ No data sent to external services
✅ No file system modifications
✅ Requires explicit file paths

📄 License

🔗 Links

🙏 Credits

Created by Charles Annoni (GDM-Pixel)

Part of the Nova-Mind ecosystem - AI-powered coaching platform.

🆘 Need OCR?

For image-based PDFs or scanned documents, you'll need an OCR solution. Consider:

Adobe Acrobat Pro (commercial)
Tesseract OCR (open source)
Online OCR services
Dedicated OCR MCP server (coming soon)
