botrun-mcp-rust-pdf
v5.1124.1715
botrun-mcp-rust-pdf
🤖 High-performance PDF processing MCP Server built with Rust, with optional CLI tools.
🚀 Quick Start
Install the MCP Server in Claude Code
Edit your Claude Code configuration file ~/.claude.json and add the following settings:
{
  "/your/project/path": {
    "allowedTools": [],
    "mcpContextUris": [],
    "mcpServers": {
      "botrun-mcp-rust-pdf": {
        "type": "stdio",
        "command": "npx",
        "args": [
          "botrun-mcp-rust-pdf"
        ],
        "env": {}
      }
    }
  }
}
Once configured, verify it in Claude Code interactive mode:
use botrun-mcp-rust-pdf
Read the PDF file online at https://www-api.moda.gov.tw/File/Get/moda/zh-tw/Q36yy38R4W7QI02
I'd like to know what its table of contents is
Claude will automatically download and parse the PDF, then answer your question.
✨ Features
- 📥 Dual Source Support: Process both remote URLs and local files (new in v5.11.0!)
- 📄 PDF Metadata: Get page count, file size, token estimation
- 📝 Text Extraction: Extract text from specific pages or entire PDFs
- 🔍 Keyword Search: Search with context (like ripgrep for PDFs)
- 🧩 Chunked Extraction: Handle large PDFs without token limits
- 🔒 Path Security: Multi-layer protection against directory traversal attacks
- 🤖 MCP Server: Integrate with Claude Code and other AI assistants
- ⚡ Blazing Fast: Built with Rust for maximum performance
- 🌍 Cross-Platform: macOS (Intel & Apple Silicon) and Linux
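To make the "search with context" feature concrete, here is a minimal sketch of a ripgrep-style line search over text that has already been extracted. It is not the server's actual implementation: the `Hit` struct, the `search_with_context` function, and its parameters are hypothetical names chosen to mirror the CLI options (`--context-before`, `--context-after`, `--case-sensitive`, `--max-results`), and regex matching is left out.

```rust
/// One match: the 1-based line number of the hit plus its surrounding lines.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct Hit {
    pub line_no: usize,
    pub context: Vec<String>,
}

/// Return up to `max_results` lines containing `keyword`, each with
/// `before` lines of leading context and `after` lines of trailing context.
pub fn search_with_context(
    text: &str,
    keyword: &str,
    before: usize,
    after: usize,
    case_sensitive: bool,
    max_results: usize,
) -> Vec<Hit> {
    let lines: Vec<&str> = text.lines().collect();
    // Normalize the needle once; each line is normalized the same way below.
    let needle = if case_sensitive {
        keyword.to_string()
    } else {
        keyword.to_lowercase()
    };

    let mut hits = Vec::new();
    for (i, line) in lines.iter().enumerate() {
        if hits.len() >= max_results {
            break;
        }
        let haystack = if case_sensitive {
            line.to_string()
        } else {
            line.to_lowercase()
        };
        if haystack.contains(&needle) {
            // Clamp the context window to the bounds of the text.
            let start = i.saturating_sub(before);
            let end = (i + after + 1).min(lines.len());
            hits.push(Hit {
                line_no: i + 1,
                context: lines[start..end].iter().map(|s| s.to_string()).collect(),
            });
        }
    }
    hits
}
```

Calling `search_with_context(text, "AI", 1, 1, true, 10)` would return each case-sensitive match together with one line before and one line after it.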
📦 Installation
MCP Server (primary use)
See the Quick Start section above for .claude.json configuration.
CLI Tools (optional)
# NPX (no installation required)
npx botrun-mcp-rust-pdf metadata --url https://example.com/doc.pdf
# NPM global install
npm install -g botrun-mcp-rust-pdf
botrun-mcp-rust-pdf-cli metadata --url https://example.com/doc.pdf
🎯 Usage
📥 Source Support (Remote URL + Local Files)
Remote URL (Internet PDF):
npx botrun-mcp-rust-pdf metadata --url https://example.com/doc.pdf
Local File (Your computer):
npx botrun-mcp-rust-pdf metadata --url ./data/report.pdf
Security: Local paths must start with ./ and stay inside the current working directory, which prevents path traversal attacks.
✅ Allowed: ./data/report.pdf, ./reports/2024/annual.pdf
❌ Blocked: ../secret.pdf, /etc/passwd, ~/documents/file.pdf
CLI Mode
1. Get PDF Metadata
Remote URL:
npx botrun-mcp-rust-pdf metadata --url https://www-api.moda.gov.tw/File/Get/moda/zh-tw/Q36yy38R4W7QI02
Local File:
npx botrun-mcp-rust-pdf metadata --url ./data/sample.pdf
Output:
{
  "total_pages": 90,
  "file_size_mb": 24.19,
  "estimated_chars_per_page": 2000,
  "estimated_total_chars": 180000,
  "estimated_tokens": 45000,
  "suggestion": "Use chunked extraction (large PDF)"
}
2. Extract Text from Specific Pages
Remote URL:
npx botrun-mcp-rust-pdf extract \
  --url https://example.com/doc.pdf \
  --start-page 1 \
  --end-page 10 \
  --max-chars 20000
Local File:
npx botrun-mcp-rust-pdf extract \
  --url ./reports/annual-report.pdf \
  --start-page 5 \
  --end-page 15
3. Extract Large PDF in Chunks
# Get metadata first to see total chunks
npx botrun-mcp-rust-pdf metadata --url ./data/large-document.pdf
# Extract chunk by chunk
npx botrun-mcp-rust-pdf chunked --url ./data/large-document.pdf --chunk-index 0
npx botrun-mcp-rust-pdf chunked --url ./data/large-document.pdf --chunk-index 1
# ...
4. Search Keywords
Remote URL:
npx botrun-mcp-rust-pdf search \
  --url https://example.com/doc.pdf \
  --keyword "artificial intelligence" \
  --context 3 \
  --max-results 10
Local File:
npx botrun-mcp-rust-pdf search \
  --url ./data/technical-doc.pdf \
  --keyword "API" \
  --context 3
Advanced Search:
# Case-sensitive search
npx botrun-mcp-rust-pdf search --url ./data/doc.pdf --keyword "AI" --case-sensitive
# Regex search
npx botrun-mcp-rust-pdf search --url ./data/doc.pdf --keyword "AI|ML|DL" --regex
# Custom context
npx botrun-mcp-rust-pdf search --url ./data/doc.pdf --keyword "error" --context-before 5 --context-after 2
MCP Server Mode (primary use)
Available MCP Tools
All tools support both remote URLs and local files (with security protection):
1. get_pdf_metadata - Get PDF information (use FIRST)
   - Remote: {"url": "https://example.com/file.pdf"}
   - Local: {"url": "./data/sample.pdf"}
2. extract_pdf_text - Extract specific page ranges
   - Remote: {"url": "https://...", "start_page": 1, "end_page": 10}
   - Local: {"url": "./reports/report.pdf", "max_chars": 10000}
3. extract_pdf_chunked - Handle large PDFs in chunks
   - Remote: {"url": "https://...", "chunk_index": 0}
   - Local: {"url": "./data/large.pdf", "chunk_index": 1}
4. search_pdf_keyword - Search with context
   - Remote: {"url": "https://...", "keyword": "AI", "context": 3}
   - Local: {"url": "./data/doc.pdf", "keyword": "error", "regex": true}
🔒 Security Note: Local files must use ./ prefix (current directory only) for safety.
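Calling get_pdf_metadata first matters because its size estimates indicate whether a document fits in one request. The sample metadata output in the CLI section (90 pages, 2000 estimated characters per page, 180000 characters, 45000 tokens) is consistent with the simple arithmetic sketched below. This is an inference from that single sample rather than documented behavior: the ratio of 4 characters per token, the `Estimate` struct, and both function names are assumptions made for illustration, and the 20000-character budget in the usage note is borrowed from the `--max-chars` example.

```rust
/// Size estimates shaped like the fields of the sample metadata output.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct Estimate {
    pub estimated_total_chars: u64,
    pub estimated_tokens: u64,
}

/// Assumed heuristic, inferred from the sample output: about 4 chars per token.
pub const CHARS_PER_TOKEN: u64 = 4;

/// Multiply pages by the per-page character estimate, then convert to tokens.
pub fn estimate(total_pages: u64, chars_per_page: u64) -> Estimate {
    let total = total_pages * chars_per_page;
    Estimate {
        estimated_total_chars: total,
        estimated_tokens: total / CHARS_PER_TOKEN,
    }
}

/// Ceiling division: how many chunks of at most `max_chars` characters
/// are needed to cover `total_chars`.
pub fn chunks_needed(total_chars: u64, max_chars: u64) -> u64 {
    assert!(max_chars > 0, "max_chars must be positive");
    (total_chars + max_chars - 1) / max_chars
}
```

Under these assumptions, `estimate(90, 2000)` reproduces the 180000 characters and 45000 tokens from the sample, and `chunks_needed(180000, 20000)` gives 9 chunks. The chunk size the server really uses is not stated in this README.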
🌍 Supported Platforms
| Platform | Architecture | Status |
|----------|-------------|--------|
| macOS | Apple Silicon (arm64) | ✅ Supported |
| macOS | Intel (x64) | ✅ Supported |
| Linux | x64 (Cloud Run, GCP VM) | ✅ Supported |
| Windows | x64 | ❌ Not yet |
🐳 Docker / Cloud Run
Dockerfile Example
FROM node:18-slim
# Install botrun-mcp-rust-pdf (will automatically select linux-x64)
RUN npx botrun-mcp-rust-pdf --help
# Your app setup
COPY . /app
WORKDIR /app
# Run MCP server
CMD ["npx", "botrun-mcp-rust-pdf"]
Cloud Run Deployment
# Build and deploy
gcloud run deploy rust-pdf-mcp \
  --source . \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated
🔒 Security Features
Path Traversal Protection
Local file access is restricted to the current working directory:
✅ Allowed Patterns:
- ./data/sample.pdf - Current directory
- ./reports/2024/annual.pdf - Subdirectories
❌ Blocked Patterns:
- /etc/passwd - Absolute paths
- ../secrets.pdf - Parent directory traversal
- ./../../etc/passwd - Path traversal attempts
- ~/documents/file.pdf - Home directory expansion
4-Layer Security Architecture:
- ✅ Format validation (must start with ./ or https://)
- ✅ Path content check (no .. allowed)
- ✅ Canonicalization + boundary verification
- ✅ File extension validation (.pdf only)
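The four layers above could be sketched as follows, using only the Rust standard library. This illustrates the checks as the list describes them and is not the project's actual code: `Source`, `Rejected`, and `validate_source` are hypothetical names, and applying the extension check only to local paths is an assumption (the remote URL used elsewhere in this README has no .pdf suffix).

```rust
use std::path::{Path, PathBuf};

/// A validated input: either a remote URL or a canonical local path.
#[derive(Debug, PartialEq)]
pub enum Source {
    Remote(String),
    Local(PathBuf),
}

/// Why an input was refused (hypothetical error type for this sketch).
#[derive(Debug, PartialEq)]
pub enum Rejected {
    BadFormat,       // layer 1: neither "./" nor "https://"
    ParentTraversal, // layer 2: contains ".."
    OutsideBase,     // layer 3: resolves outside the base directory
    NotPdf,          // layer 4: extension is not .pdf
    Io,              // path missing or unreadable during canonicalization
}

pub fn validate_source(input: &str, base_dir: &Path) -> Result<Source, Rejected> {
    // Layer 1: format validation.
    if input.starts_with("https://") {
        return Ok(Source::Remote(input.to_string()));
    }
    let rel = input.strip_prefix("./").ok_or(Rejected::BadFormat)?;

    // Layer 2: path content check, rejecting any ".." in the string.
    if input.contains("..") {
        return Err(Rejected::ParentTraversal);
    }

    // Layer 3: canonicalize (resolves symlinks) and verify the result
    // is still inside the canonical base directory.
    let base = base_dir.canonicalize().map_err(|_| Rejected::Io)?;
    let full = base.join(rel).canonicalize().map_err(|_| Rejected::Io)?;
    if !full.starts_with(&base) {
        return Err(Rejected::OutsideBase);
    }

    // Layer 4: file extension validation (case-insensitive here, by assumption).
    let is_pdf = full
        .extension()
        .and_then(|e| e.to_str())
        .map_or(false, |e| e.eq_ignore_ascii_case("pdf"));
    if !is_pdf {
        return Err(Rejected::NotPdf);
    }
    Ok(Source::Local(full))
}
```

The string-level check in layer 2 is cheap but can be fooled by symlinks, which is why layer 3 compares canonical paths. `Path::starts_with` compares whole path components, so a sibling directory such as `/srv/app-data` is not mistaken for a child of `/srv/app`.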
🔧 Advanced Configuration
Environment Variables
- RUST_LOG: Set log level (e.g., info, debug)
- PDF_CACHE_DIR: Custom cache directory for downloaded PDFs
Cache Management
PDFs are cached in:
- macOS: ~/Library/Caches/rust-pdf/
- Linux: ~/.cache/rust-pdf/
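One plausible way to combine the PDF_CACHE_DIR variable with these per-platform defaults is sketched below. The precedence (override first, then the platform default) and the function `resolve_cache_dir` are assumptions made for illustration; the README does not spell out how the server resolves the directory.

```rust
use std::path::PathBuf;

/// Pick the cache directory: a non-empty override wins, otherwise fall back
/// to the per-platform default listed above. Returns None on platforms with
/// no documented default. (Hypothetical helper, for illustration only.)
pub fn resolve_cache_dir(override_dir: Option<&str>, home: &str, os: &str) -> Option<PathBuf> {
    if let Some(dir) = override_dir.filter(|d| !d.is_empty()) {
        return Some(PathBuf::from(dir));
    }
    match os {
        "macos" => Some(PathBuf::from(home).join("Library/Caches/rust-pdf")),
        "linux" => Some(PathBuf::from(home).join(".cache/rust-pdf")),
        _ => None,
    }
}
```

In a real program the arguments would typically come from `std::env::var("PDF_CACHE_DIR").ok()`, the `HOME` environment variable, and `std::env::consts::OS`, which is "macos" or "linux" on the supported platforms.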
Clear cache:
rm -rf ~/.cache/rust-pdf/ # Linux
rm -rf ~/Library/Caches/rust-pdf/ # macOS
🛠️ Development
Build from Source
git clone https://github.com/bohachu/botrun-mcp-8.git
cd botrun-mcp-8
# Build binaries for all platforms
./scripts/build-cross-platform.sh
# Test locally
cd npm-packages/rust-pdf
npm link
npx rust-pdf --help
Run Tests
npm test
📊 Performance
- Binary Size: ~6-9 MB (includes all dependencies)
- Startup Time: < 100ms
- Memory Usage: 5-10 MB
- Processing Speed: 1000+ pages/sec (text extraction)
🤝 Contributing
Contributions are welcome! Please see CONTRIBUTING.md.
📝 License
MIT License - see LICENSE
🔗 Links
- GitHub: https://github.com/bohachu/botrun-mcp-8
- NPM: https://www.npmjs.com/package/botrun-mcp-rust-pdf
- Issues: https://github.com/bohachu/botrun-mcp-8/issues
- MCP Protocol: https://modelcontextprotocol.io/
📧 Support
Made with ❤️ by bohachu | Powered by Rust 🦀
