@nova-mind-cloud/pdf-parser-mcp
v1.0.4
MCP Server for PDF parsing and content extraction
@gdm-pixel/pdf-parser-mcp
🔐 Subscription-Based PDF Parser MCP
PDF parsing and text extraction.
⚠️ Requires a Nova-Mind Cloud subscription, starting at €39/month
💎 Open Source Code + Cloud Services
✅ Code is open: audit, learn from, and modify it freely
🔐 Usage requires a subscription (backend authentication & infrastructure)
👉 Plans: €39 / €89 / €149 per month
✨ Features
- 📄 PDF Text Extraction - Extract all text content from PDF files
- 📊 Metadata Extraction - Get PDF metadata (title, author, pages, etc.)
- 📂 Batch Processing - Parse multiple PDFs in a directory
- 🔍 Page Limiting - Limit extraction to the first N pages (maxPages)
- ⚡ Fast Processing - Efficient parsing engine
- 🌐 Cross-platform - Works on Windows, macOS, Linux
📦 Installation
With Claude Desktop
Add to your claude_desktop_config.json:
```json
{
  "mcpServers": {
    "pdf-parser": {
      "command": "npx",
      "args": ["-y", "@gdm-pixel/pdf-parser-mcp@latest"]
    }
  }
}
```
Config location:
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
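If you prefer to script the setup, the sketch below shows one way to locate the config file using the paths listed above and add the `pdf-parser` entry without overwriting other servers. The helpers `claudeConfigPath` and `addPdfParser` are hypothetical (not exported by this package), and the platform and environment are passed in explicitly, assuming the caller reads them from Node's `process.platform` and `process.env`, so the logic stays free of file I/O.

```typescript
// Hypothetical setup helpers; not part of @gdm-pixel/pdf-parser-mcp.
type Platform = "win32" | "darwin" | "linux";

interface McpServerEntry {
  command: string;
  args: string[];
}

interface ClaudeConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown; // keep any other top-level settings
}

function required(value: string | undefined, name: string): string {
  if (!value) throw new Error(`${name} is not set`);
  return value;
}

// Mirrors the per-OS config locations listed in this README.
function claudeConfigPath(
  platform: Platform,
  env: { APPDATA?: string; HOME?: string },
): string {
  switch (platform) {
    case "win32":
      return `${required(env.APPDATA, "APPDATA")}\\Claude\\claude_desktop_config.json`;
    case "darwin":
      return `${required(env.HOME, "HOME")}/Library/Application Support/Claude/claude_desktop_config.json`;
    case "linux":
      return `${required(env.HOME, "HOME")}/.config/Claude/claude_desktop_config.json`;
  }
}

// Returns a new config with the "pdf-parser" entry added (or replaced),
// leaving every other key and every other MCP server untouched.
function addPdfParser(config: ClaudeConfig): ClaudeConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      "pdf-parser": {
        command: "npx",
        args: ["-y", "@gdm-pixel/pdf-parser-mcp@latest"],
      },
    },
  };
}

// Example: merge into a config that already registers another server.
const updated = addPdfParser({
  mcpServers: { other: { command: "node", args: ["server.js"] } },
});
console.log(claudeConfigPath("linux", { HOME: "/home/alice" }));
console.log(JSON.stringify(updated, null, 2));
```

In a real script you would read the existing file (for example with Node's `fs.readFileSync` and `JSON.parse`), pass the parsed object through `addPdfParser`, and write the result back; returning a new object keeps that read-modify-write step easy to test.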
🚀 Usage
Extract Text from PDF
Extract text from this PDF: C:\Documents\report.pdf
Returns:
- Full text content
- Page count
- Metadata (if available)
Extract with Page Limit
Extract first 10 pages from C:\Documents\long-document.pdf
Useful for:
- Large documents
- Quick previews
- Specific chapters
Get PDF Metadata
Analyze this PDF file: C:\Documents\report.pdf
Returns:
- Title
- Author
- Creator
- Creation date
- Number of pages
- PDF version
List PDFs in Directory
List all PDF files in C:\Documents\Reports
Returns:
- All PDF files found
- File sizes
- File paths
📋 Available Tools
parse-pdf
Extract text content from a PDF file.
Parameters:
- filePath (required): Absolute path to the PDF file
- options (optional): Parsing options
  - extractMetadata (boolean): Extract metadata (default: true)
  - maxPages (number): Maximum number of pages to extract
Example:
```json
{
  "filePath": "C:\\Documents\\report.pdf",
  "options": {
    "extractMetadata": true,
    "maxPages": 10
  }
}
```
analyze-pdf
Get detailed PDF information without extracting text.
Parameters:
- filePath (required): Absolute path to the PDF file
Returns:
- Metadata (title, author, dates)
- Page count
- PDF version
- File size
list-pdf-files
List all PDF files in a directory.
Parameters:
- directory (required): Directory path to scan
Returns:
- Array of PDF file paths
- File sizes
- File names
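Under the hood, MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. The sketch below builds such requests for the three tools using the parameter names documented above; the TypeScript types and the `toolCall` helper are illustrative assumptions rather than types shipped with this package, and Claude Desktop normally constructs these messages for you.

```typescript
// Argument shapes as documented in "Available Tools" (types are illustrative).
interface ParsePdfArgs {
  filePath: string;
  options?: { extractMetadata?: boolean; maxPages?: number };
}
interface AnalyzePdfArgs {
  filePath: string;
}
interface ListPdfFilesArgs {
  directory: string;
}

// Maps each tool name to its argument type.
type ToolArgs = {
  "parse-pdf": ParsePdfArgs;
  "analyze-pdf": AnalyzePdfArgs;
  "list-pdf-files": ListPdfFilesArgs;
};

interface ToolCallRequest<N extends keyof ToolArgs> {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: N; arguments: ToolArgs[N] };
}

// Builds a JSON-RPC 2.0 "tools/call" message; the compiler checks that the
// arguments match the chosen tool.
function toolCall<N extends keyof ToolArgs>(
  id: number,
  name: N,
  args: ToolArgs[N],
): ToolCallRequest<N> {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Same arguments as the parse-pdf example above.
const parseRequest = toolCall(1, "parse-pdf", {
  filePath: "C:\\Documents\\report.pdf",
  options: { extractMetadata: true, maxPages: 10 },
});
console.log(JSON.stringify(parseRequest));
```

Keying the argument types by tool name means a typo such as passing `directory` to `parse-pdf` is caught at compile time instead of surfacing as a server-side error.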
🔧 Troubleshooting
File Not Found
Problem: "File not found" or "Cannot read PDF" errors
Solutions:
- Verify the file path is absolute (not relative)
- Check that the file exists at the specified location
- Ensure the file has a .pdf extension
- Verify you have read permissions
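The path and extension checks can be done before calling the tool at all. Below is a rough sketch (the `isAbsolutePath` and `pathProblems` names are hypothetical) that flags relative paths and non-`.pdf` extensions for Windows-style (`C:\...`, `\\server\share`) and POSIX-style (`/...`) paths; whether the file exists and is readable still has to be checked against the real file system.

```typescript
// Hypothetical pre-flight check mirroring the "File Not Found" checklist.
function isAbsolutePath(p: string): boolean {
  return (
    /^[A-Za-z]:[\\/]/.test(p) || // Windows drive path, e.g. C:\Documents
    p.startsWith("\\\\") || // Windows UNC path, e.g. \\server\share
    p.startsWith("/") // POSIX absolute path
  );
}

// Returns a list of human-readable problems; an empty list means the path
// passes these two static checks.
function pathProblems(p: string): string[] {
  const problems: string[] = [];
  if (!isAbsolutePath(p)) problems.push("path is not absolute");
  if (!p.toLowerCase().endsWith(".pdf")) problems.push("file does not have a .pdf extension");
  return problems;
}

console.log(pathProblems("C:\\Documents\\report.pdf")); // []
console.log(pathProblems("report.txt"));
```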
Extraction Fails
Problem: PDF opens but text extraction fails
Solutions:
- Check that the PDF is not password-protected
- Verify that the PDF contains actual text (not scanned images)
- For image-based PDFs, you need OCR (not included)
- Try extracting fewer pages with maxPages
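Encryption can often be spotted before extraction: every PDF begins with a `%PDF-x.y` header, and an encrypted PDF declares an `/Encrypt` entry in its trailer (or cross-reference stream) dictionary. The sketch below is a crude byte-level heuristic, not a parser; the `sniffPdf` helper is hypothetical and not part of this package. Note that a file can be encrypted yet open without a password (owner-password-only PDFs), so treat `declaresEncryption` as a hint.

```typescript
// Hypothetical heuristic; a plain text search, not a PDF parser.
interface PdfSniff {
  isPdf: boolean;
  version: string | null; // e.g. "1.7", taken from the %PDF-x.y header
  declaresEncryption: boolean; // an /Encrypt key appears somewhere in the bytes
}

function sniffPdf(bytes: Uint8Array): PdfSniff {
  // Map each byte to one char so ASCII keywords can be searched directly.
  // For large files you could pass only the first and last few kilobytes.
  let text = "";
  for (let i = 0; i < bytes.length; i++) text += String.fromCharCode(bytes[i]);

  // Readers tolerate some leading junk, so look in the first 1024 bytes.
  const header = /%PDF-(\d\.\d)/.exec(text.slice(0, 1024));
  return {
    isPdf: header !== null,
    version: header ? header[1] : null,
    // \b keeps longer names such as /EncryptMetadata from matching on their own.
    declaresEncryption: /\/Encrypt\b/.test(text),
  };
}

const sample = new TextEncoder().encode(
  "%PDF-1.7\n1 0 obj << /Type /Catalog >> endobj\ntrailer\n<< /Root 1 0 R /Encrypt 5 0 R >>\n%%EOF",
);
console.log(sniffPdf(sample));
```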
Garbled Text
Problem: Extracted text is unreadable or has weird characters
Solutions:
- The PDF may use a non-standard encoding
- Open the PDF in a different reader to verify its content
- Check whether the PDF is corrupted
- Some encrypted PDFs may produce garbled output
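One way to flag garbled output automatically is to measure how much of it consists of Unicode replacement characters (U+FFFD), control characters, or Private Use Area code points, which can appear when a font lacks a usable text mapping. The functions below are an illustrative heuristic, not part of this package; the 0.05 default threshold is an arbitrary assumption you would tune on your own documents.

```typescript
// Illustrative heuristic; counts UTF-16 code units, which is enough here
// because every character class checked below lies in the BMP.
function suspiciousRatio(text: string): number {
  if (text.length === 0) return 0;
  let suspicious = 0;
  for (let i = 0; i < text.length; i++) {
    const c = text.charCodeAt(i);
    const isControl = c < 0x20 && c !== 0x0a && c !== 0x0d && c !== 0x09; // allow \n, \r, \t
    const isReplacement = c === 0xfffd;
    const isPrivateUse = c >= 0xe000 && c <= 0xf8ff;
    if (isControl || isReplacement || isPrivateUse) suspicious++;
  }
  return suspicious / text.length;
}

// The default threshold is an assumption, not a documented value.
function looksGarbled(text: string, threshold = 0.05): boolean {
  return suspiciousRatio(text) > threshold;
}

console.log(looksGarbled("Quarterly revenue grew 12%.")); // false
console.log(looksGarbled("\uFFFD\uFFFD\uE001\uE002 rev")); // true
```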
Performance Issues
Problem: Large PDFs take too long to parse
Solutions:
- Use the maxPages option to limit extraction
- Process PDFs in smaller chunks
- Close other applications to free up memory
- Consider splitting large PDFs into smaller files
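If you drive extraction from your own script (for example, an MCP client looping over many files), one way to follow the "smaller chunks" advice is to split the file list into fixed-size batches and finish each batch before starting the next. In this sketch, `chunk` and `processInBatches` are hypothetical helpers, and `parseOne` is a caller-supplied stand-in for whatever actually calls `parse-pdf`.

```typescript
// Split items into consecutive groups of at most `size`.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) throw new Error("size must be a positive integer");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Runs up to `batchSize` parses concurrently and waits for each batch to
// finish before starting the next, capping how many PDFs are in flight.
// Results come back in the same order as `paths`.
async function processInBatches<R>(
  paths: readonly string[],
  batchSize: number,
  parseOne: (path: string) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunk(paths, batchSize)) {
    results.push(...(await Promise.all(batch.map((p) => parseOne(p)))));
  }
  return results;
}

// Stand-in for a real parse-pdf call; it just returns the path length.
const fakeParse = async (p: string) => p.length;
processInBatches(["a.pdf", "bb.pdf", "ccc.pdf"], 2, fakeParse).then((r) => console.log(r)); // [ 5, 6, 7 ]
```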
💡 Use Cases
Document Analysis
Extract and summarize this PDF report: C:\Reports\Q4-2024.pdf
Batch Processing
List all PDFs in C:\Documents then extract text from each
Research & Data Extraction
Extract first 5 pages from C:\Papers\research.pdf and find key findings
Content Migration
Extract all text from old PDFs in C:\Archive for new system
Metadata Inspection
Analyze these PDFs and show their metadata: C:\Downloads\*.pdf
📝 Notes
Supported PDF Types
- ✅ Text-based PDFs (created from Word, LaTeX, etc.)
- ✅ PDFs with embedded fonts
- ✅ Multi-page documents
- ❌ Image-only PDFs (requires OCR)
- ❌ Password-protected PDFs
- ⚠️ Scanned documents (may need OCR)
Performance Tips
- Use maxPages for large documents
- Process PDFs in batches
- Extract metadata first to check page count
- Close other applications for large files
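The "metadata first" tip amounts to a two-step plan: call `analyze-pdf`, read the page count, and only then decide whether `parse-pdf` needs `maxPages`. The sketch below encodes that decision. `planParseOptions` is a hypothetical helper and its `pageBudget` default of 50 is an arbitrary assumption; only the `extractMetadata` and `maxPages` option names come from the parse-pdf parameters documented above.

```typescript
// Option names match the documented parse-pdf parameters.
interface ParseOptions {
  extractMetadata?: boolean;
  maxPages?: number;
}

// Given the page count reported by analyze-pdf, read short documents in full
// and cap long ones at `pageBudget` pages. Metadata is skipped because the
// analyze-pdf call has already returned it.
function planParseOptions(pageCount: number, pageBudget = 50): ParseOptions {
  if (pageCount <= pageBudget) {
    return { extractMetadata: false };
  }
  return { extractMetadata: false, maxPages: pageBudget };
}

console.log(planParseOptions(12)); // { extractMetadata: false }
console.log(planParseOptions(300)); // { extractMetadata: false, maxPages: 50 }
```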
Limitations
- No OCR support (image-based PDFs)
- No password-protected PDF support
- No PDF editing/creation
- Text extraction only (no images)
🔒 Security
This tool:
- ✅ Only reads PDF files (no modifications)
- ✅ Works with local files only
- ✅ No data sent to external services
- ✅ No file system modifications
- ✅ Requires explicit file paths
📄 License
MIT © Charles Annoni
🙏 Credits
Created by Charles Annoni (GDM-Pixel)
Part of the Nova-Mind ecosystem, an AI-powered coaching platform.
🆘 Need OCR?
For image-based PDFs or scanned documents, you'll need an OCR solution. Consider:
- Adobe Acrobat Pro (commercial)
- Tesseract OCR (open source)
- Online OCR services
- Dedicated OCR MCP server (coming soon)
