@hole-foundation/mcp-server-pdf
v0.1.0
MCP server for PDF processing with Ghostscript - merge, split, and optimize PDFs for legal exhibits
MCP PDF Server - Ghostscript-based PDF Processing
A production-grade MCP (Model Context Protocol) server for advanced PDF processing using Ghostscript. Designed for legal document management with tools for merging, splitting, optimizing, and compressing PDFs into exhibit-ready formats.
Made by The HOLE Foundation
Features
- ✨ PDF Optimization - Reduce file size while maintaining quality
- 📄 Merge PDFs - Combine multiple documents with optional bookmarking
- ✂️ Split PDFs - Extract individual pages or page ranges with custom naming
- 🔐 Compression - Aggressive compression optimized for legal filing (150 DPI)
- ⚡ Fast Processing - Leverages Ghostscript for high-performance operations
- 📊 Legal-Ready Output - Meets document exhibit requirements
Installation
Prerequisites
- Node.js 18+
- Ghostscript installed and available in PATH
Install Ghostscript
macOS (with Homebrew):
brew install ghostscript
Linux (Debian/Ubuntu):
sudo apt-get install ghostscript
Linux (RedHat/CentOS):
sudo yum install ghostscript
Windows: Download from ghostscript.com
Install MCP PDF Server
Via npm:
npm install @hole-foundation/mcp-pdf-server
Or globally:
npm install -g @hole-foundation/mcp-pdf-server
Quick Start
As an MCP Server (for Claude, IDE integrations)
Configure in your Claude Code settings or IDE:
{
  "mcp-pdf-server": {
    "command": "mcp-pdf-server",
    "args": ["server"]
  }
}
Command Line
Optimize a PDF
mcp-pdf-server optimize input.pdf output.pdf --compression medium
Options:
- --compression low|medium|high - Compression intensity (default: medium)
- --preserve-images - Keep original image quality
Merge PDFs
mcp-pdf-server merge combined.pdf doc1.pdf doc2.pdf doc3.pdf --bookmarks
Options:
- --bookmarks - Create bookmarks for each merged document
Split PDF
# Extract all pages individually
mcp-pdf-server split document.pdf ./pages
# Extract specific page ranges
mcp-pdf-server split document.pdf ./pages --range 1-5,10-15
# Custom output pattern
mcp-pdf-server split document.pdf ./pages --all --pattern "doc_%03d.pdf"
Options:
- --range <range> - Page range (e.g., "1-5,7,10-15"); if not specified, all pages are extracted
- --pattern <pattern> - Output filename pattern (default: page_%03d.pdf)
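The --range syntax above can be illustrated with a small parser. This is a minimal sketch, not the server's actual implementation: `parsePageRange` and its `pageCount` argument are hypothetical names, and the validation rules (1-based pages, ascending ranges, duplicates dropped) are assumptions.

```typescript
// Hypothetical helper: expands a range string such as "1-5,7,10-15"
// into a sorted list of unique 1-based page numbers.
function parsePageRange(range: string, pageCount: number): number[] {
  const pages: number[] = [];
  for (const part of range.split(",")) {
    const token = part.trim();
    // Accept either a single page ("7") or an inclusive span ("10-15").
    const match = /^(\d+)(?:-(\d+))?$/.exec(token);
    if (!match) {
      throw new Error(`Invalid page range token: "${token}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start || end > pageCount) {
      throw new Error(`Page range out of bounds: "${token}"`);
    }
    for (let page = start; page <= end; page++) {
      pages.push(page);
    }
  }
  // Sort ascending, then drop duplicates from overlapping tokens.
  pages.sort((a, b) => a - b);
  return pages.filter((page, index) => index === 0 || page !== pages[index - 1]);
}

// For a 20-page document this prints
// [1, 2, 3, 4, 5, 7, 10, 11, 12, 13, 14, 15].
console.log(parsePageRange("1-5,7,10-15", 20));
```

Rejecting malformed or out-of-bounds tokens up front means a typo in the range fails before any pages are written.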
Compress PDF
mcp-pdf-server compress input.pdf output.pdf --dpi 150 --quality medium
Options:
- --dpi <number> - Target DPI for images (default: 150, minimum: 72)
- --quality low|medium|high - Compression quality (default: medium)
- --keep-metadata - Preserve PDF metadata (removed by default)
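To make the --dpi and --quality options more concrete, here is a sketch of how they might translate into Ghostscript arguments. The pdfwrite switches are standard Ghostscript parameters, but the mapping from low/medium/high to the /screen, /ebook, and /printer presets and the `buildCompressArgs` helper are assumptions; the README does not document what the server passes to Ghostscript.

```typescript
type Quality = "low" | "medium" | "high";

// Assumed mapping from the README's quality levels to Ghostscript presets.
const QUALITY_PRESETS: Record<Quality, string> = {
  low: "/screen",
  medium: "/ebook",
  high: "/printer",
};

// Hypothetical helper: builds the argument list for a `gs` invocation
// that downsamples images to the target DPI.
function buildCompressArgs(
  input: string,
  output: string,
  dpi: number = 150,
  quality: Quality = "medium"
): string[] {
  if (dpi < 72) {
    throw new Error("Target DPI must be at least 72");
  }
  return [
    "-sDEVICE=pdfwrite",
    "-dNOPAUSE",
    "-dBATCH",
    "-dQUIET",
    `-dPDFSETTINGS=${QUALITY_PRESETS[quality]}`,
    "-dDownsampleColorImages=true",
    "-dDownsampleGrayImages=true",
    "-dDownsampleMonoImages=true",
    `-dColorImageResolution=${dpi}`,
    `-dGrayImageResolution=${dpi}`,
    `-dMonoImageResolution=${dpi}`,
    `-sOutputFile=${output}`,
    input,
  ];
}

console.log(["gs"].concat(buildCompressArgs("input.pdf", "output.pdf")).join(" "));
```

Returning an argument array rather than a single command string lets the list be passed straight to a process-spawning API without shell quoting.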
API Usage (Programmatic)
import {
  optimizePDF,
  mergePDFs,
  splitPDF,
  compressPDF,
} from '@hole-foundation/mcp-pdf-server';

// Optimize a PDF
const result = await optimizePDF('input.pdf', 'output.pdf', {
  compressionLevel: 'high',
  preserveImages: false,
});

// Merge PDFs
const merged = await mergePDFs(
  ['doc1.pdf', 'doc2.pdf', 'doc3.pdf'],
  'combined.pdf',
  { bookmarkPages: true }
);

// Split PDF
const split = await splitPDF('document.pdf', './pages', {
  splitType: 'range',
  pageRange: '1-5,10-15',
  outputPattern: 'page_%03d.pdf',
});

// Compress for legal filing
const compressed = await compressPDF('input.pdf', 'output.pdf', {
  targetDpi: 150,
  removeMetadata: true,
  quality: 'medium',
});
Configuration
Environment Variables
# Path to Ghostscript binary (auto-detected if not set)
GHOSTSCRIPT_PATH=/usr/bin/gs
# Maximum file size allowed (bytes)
MAX_FILE_SIZE=500000000
# Temporary directory for processing
TEMP_DIR=/tmp
# Debug mode
DEBUG=false
dotenvx Integration
This server integrates with dotenvx for secrets management. Add to .env.enc:
GHOSTSCRIPT_API_KEY=your_key_here
Load with:
dotenvx run -- mcp-pdf-server server
MCP Tools
Tool: optimize_pdf
Optimize a PDF for size and quality, suitable for legal exhibits.
Parameters:
- input_path (string, required) - Path to PDF file
- output_path (string, required) - Output path for optimized PDF
- compression_level (string, default: "medium") - low|medium|high
- preserve_images (boolean, default: false) - Keep original image quality
Returns: Operation result with file sizes and compression ratio
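The README does not define how the compression ratio is calculated or what the result object looks like. As an illustration only, the sketch below assumes the ratio is the original size divided by the output size; `summarizeSizes` and the `SizeReport` fields are hypothetical.

```typescript
// Hypothetical result shape; the server's actual fields may differ.
interface SizeReport {
  originalBytes: number;
  outputBytes: number;
  ratio: number;        // assumed: original size / output size
  savedPercent: number; // share of the original size that was removed
}

function summarizeSizes(originalBytes: number, outputBytes: number): SizeReport {
  if (originalBytes <= 0 || outputBytes <= 0) {
    throw new Error("File sizes must be positive");
  }
  return {
    originalBytes,
    outputBytes,
    ratio: originalBytes / outputBytes,
    savedPercent: (1 - outputBytes / originalBytes) * 100,
  };
}

// A 10,000,000-byte input reduced to 2,500,000 bytes gives
// a ratio of 4 and 75 percent saved.
console.log(summarizeSizes(10000000, 2500000));
```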
Tool: merge_pdfs
Merge multiple PDF files into a single document.
Parameters:
- input_paths (array, required) - Array of PDF file paths
- output_path (string, required) - Output path for merged PDF
- bookmark_pages (boolean, default: false) - Create bookmarks
Returns: Operation result with merged file info
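MCP clients invoke tools through a JSON-RPC 2.0 `tools/call` request. The sketch below shows what such a request could look like for merge_pdfs, using the parameter names listed above. The `buildToolCall` helper is hypothetical; in practice an MCP client library or host application usually constructs this message.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: { [key: string]: unknown };
  };
}

// Hypothetical helper that wraps a tool name and its arguments.
function buildToolCall(
  id: number,
  toolName: string,
  toolArguments: { [key: string]: unknown }
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: toolArguments },
  };
}

const mergeRequest = buildToolCall(1, "merge_pdfs", {
  input_paths: ["doc1.pdf", "doc2.pdf", "doc3.pdf"],
  output_path: "combined.pdf",
  bookmark_pages: true,
});

console.log(JSON.stringify(mergeRequest, null, 2));
```

The same helper works for the other tools by changing the tool name and arguments.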
Tool: split_pdf
Split a PDF into separate documents by page or range.
Parameters:
- input_path (string, required) - Path to PDF file
- output_dir (string, required) - Directory for split files
- split_type (string, default: "all") - all|range
- page_range (string) - Page range for split_type=range
- output_pattern (string) - Filename pattern
Returns: Array of output files with operation details
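The output_pattern parameter uses printf-style placeholders such as page_%03d.pdf. Ghostscript itself expands %d patterns in its output file name, so the server may simply pass the pattern through. The sketch below only illustrates how such a pattern maps a page number to a file name; `formatOutputName` is a hypothetical helper that handles `%d` and zero-padded `%0Nd`.

```typescript
// Hypothetical helper: replaces the first %d or %0Nd placeholder
// in the pattern with the (optionally zero-padded) page number.
function formatOutputName(pattern: string, pageNumber: number): string {
  return pattern.replace(/%(?:0(\d+))?d/, (_match: string, width?: string) => {
    let digits = String(pageNumber);
    const targetWidth = width !== undefined ? Number(width) : 0;
    // Left-pad with zeros until the requested width is reached.
    while (digits.length < targetWidth) {
      digits = "0" + digits;
    }
    return digits;
  });
}

console.log(formatOutputName("page_%03d.pdf", 7));  // page_007.pdf
console.log(formatOutputName("doc_%03d.pdf", 12));  // doc_012.pdf
```

Zero-padded names keep the split files in page order when a directory listing is sorted alphabetically.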
Tool: compress_pdf
Aggressively compress PDF for legal filing requirements.
Parameters:
- input_path (string, required) - Path to PDF file
- output_path (string, required) - Output path
- target_dpi (number, default: 150) - Target DPI (min: 72)
- remove_metadata (boolean, default: true) - Remove metadata
- quality (string) - low|medium|high
Returns: Compression result with ratio and file sizes
Legal Document Guidelines
For legal documents and court filings:
- DPI Settings: Use 150 DPI for text-heavy documents, 200 DPI for scans
- Compression: Use "medium" or "high" for filing, "low" for archival
- Metadata Removal: Metadata is removed by default for privacy; preserve it (--keep-metadata) when e-discovery requires it
- File Size Limits: Most court systems accept up to 50MB; this server maintains exhibits under 20MB
- Quality Preservation: Text remains OCR-readable at 72+ DPI
Performance
- Merge: ~100 pages/sec
- Split: ~200 pages/sec
- Optimize: ~50 pages/sec
- Compress: ~30 pages/sec
Actual performance varies based on PDF complexity and system resources.
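The throughput figures above give a rough way to estimate processing time. The sketch below divides the page count by the stated pages-per-second rate; `estimateSeconds` is a hypothetical helper, and the result is only a ballpark given the caveat about PDF complexity and system resources.

```typescript
// Approximate throughput figures from the list above (pages per second).
const PAGES_PER_SECOND = {
  merge: 100,
  split: 200,
  optimize: 50,
  compress: 30,
};

type Operation = keyof typeof PAGES_PER_SECOND;

// Hypothetical helper: rough processing time in seconds.
function estimateSeconds(operation: Operation, pageCount: number): number {
  return pageCount / PAGES_PER_SECOND[operation];
}

console.log(estimateSeconds("compress", 600)); // 20
console.log(estimateSeconds("split", 1000));   // 5
```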
Docker / Cloudflare Containers
Build and deploy:
# Build Docker image
docker build -t mcp-pdf-server .
# Run locally
docker run -p 3000:3000 mcp-pdf-server
# Deploy to Cloudflare Containers
wrangler deploy
See DEPLOYMENT.md for detailed Cloudflare Containers setup.
Testing
# Run all tests
npm test
# Run with coverage
npm run test:coverage
# Run specific test suite
npm run test:unit
npm run test:integration
npm run test:e2e
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Security
- Input validation on all file paths
- Runs in isolated Ghostscript process
- Temporary files cleaned up automatically
- Non-root user in Docker
- No external network calls
For security concerns, see SECURITY.md
Troubleshooting
"Ghostscript not found"
Ensure Ghostscript is installed and in your PATH:
which gs
"Permission denied" errors
Ensure the output directory is writable:
chmod 755 /output/directory
Large file processing
For files >100MB, consider splitting first or increasing system memory:
mcp-pdf-server split huge.pdf ./parts --range 1-50
mcp-pdf-server split huge.pdf ./parts --range 51-100
License
MIT - See LICENSE for details
Support
- GitHub Issues: Report bugs
- Documentation: Full docs
- HOLE Foundation: theholefoundation.org
Acknowledgments
Built on top of:
- Ghostscript - PDF rendering engine
- Model Context Protocol - MCP specification
- The HOLE Foundation - Transparency & FOIA tools
Made with ❤️ for legal transparency and document access
