botrun-pdf-multimodal
v1.0.2
PDF multimodal conversion MCP tool for Claude Code and Gemini CLI
Botrun PDF Multimodal
A Model Context Protocol (MCP) tool that converts PDF files to Markdown text format using Google's Gemini AI for multimodal processing.
Features
- 🚀 Parallel processing of PDF pages (default: 50 concurrent)
- 📄 Converts each PDF page to Markdown format
- 🖼️ Handles both text and images with AI-powered extraction
- 📊 Preserves table structures in Markdown format
- 🔄 Automatic retries with exponential backoff on API failures
- 💻 Works as both CLI tool and MCP server
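The "50 concurrent" page limit above describes a bounded worker pool. The sketch below shows one common way to build such a pool in TypeScript; it is an illustration under assumptions, not the package's actual code. The helper name mapWithConcurrency is hypothetical, and each page is assumed to be handled by one async worker call (for example, one Gemini request per page).

```typescript
// Hypothetical sketch of bounded-concurrency page processing.
// Assumption: each page is handled by an async worker; the package's
// real internals are not documented in this README.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError(`limit must be a positive integer, got ${limit}`);
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner keeps claiming the next index until the list is exhausted.
  // JavaScript is single-threaded, so next++ cannot race between runners.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  // Start at most `limit` runners; never more runners than items.
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results; // results stay in the original page order
}
```

A fixed set of runners pulling from a shared index keeps at most limit requests in flight while preserving page order in the result, which matters when pages are written out as page_001.md, page_002.md, and so on.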
Installation
# Install globally via npm
npm install -g botrun-pdf-multimodal
# Or use directly with npx
npx botrun-pdf-multimodal ./input.pdf
Setup
1. Get Gemini API Key
- Visit Google AI Studio
- Create a new API key
- Set it as an environment variable:
# Add to your shell profile (.bashrc, .zshrc, etc.)
export GEMINI_API_KEY="your-api-key-here"
# Or create a .env file in your project
echo "GEMINI_API_KEY=your-api-key-here" > .env
2. Install for Claude Code
Add to your Claude Code configuration:
# Find your config file location:
# macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
# Windows: %APPDATA%\Claude\claude_desktop_config.json
Edit the config file and add:
{
"mcpServers": {
"botrun-pdf-multimodal": {
"command": "npx",
"args": ["-y", "botrun-pdf-multimodal"]
}
}
}
Note: The tool will automatically read the GEMINI_API_KEY from your system environment variables or .env file. Make sure you've set it up as described in step 1.
After editing, restart Claude Code for the changes to take effect.
3. Install for Gemini CLI
First, ensure your API key is set in your environment:
# Option 1: Set in your shell profile (~/.bashrc, ~/.zshrc, etc.)
export GEMINI_API_KEY="your-api-key-here"
# Option 2: Create .env file in your project root
echo "GEMINI_API_KEY=your-api-key-here" > .env
Then add to your Gemini CLI settings:
# Project-specific: .gemini/settings.json
# Global: ~/.gemini/settings.json
Edit the settings file and add:
{
"mcpServers": {
"botrun-pdf-multimodal": {
"command": "npx",
"args": ["-y", "botrun-pdf-multimodal"]
}
}
}
The tool will automatically read the API key from your environment variables or .env file.
Then restart Gemini CLI by using /quit and reopening it.
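The setup steps say the API key may come either from the real environment or from a .env file. As a minimal sketch of that lookup, assuming real environment variables override .env entries (the default behaviour of the widely used dotenv package; whether this tool follows the same rule is not stated), the hypothetical helpers parseDotEnv and resolveEnv below merge the two sources. A production tool would more likely rely on a library such as dotenv than on a hand-written parser.

```typescript
// Hypothetical sketch: parse KEY=VALUE lines from the text of a .env file.
function parseDotEnv(text: string): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=VALUE line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const first = value.charAt(0);
    if (value.length >= 2 && (first === '"' || first === "'") && value.endsWith(first)) {
      value = value.slice(1, -1);
    }
    vars[key] = value;
  }
  return vars;
}

// Assumption: variables already set in the real environment win over .env.
function resolveEnv(
  processEnv: Record<string, string | undefined>,
  dotEnvText: string,
): Record<string, string> {
  const merged = parseDotEnv(dotEnvText);
  for (const [key, value] of Object.entries(processEnv)) {
    if (value !== undefined) merged[key] = value; // real environment overrides .env
  }
  return merged;
}
```

In a CLI entry point this might be called as resolveEnv(process.env, text), where text is the contents of .env read with fs.readFileSync, or an empty string when no .env file exists.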
Usage
CLI Mode
Process a single PDF file:
# Using global installation
botrun-pdf-multimodal ./input/document.pdf
# Using npx
npx botrun-pdf-multimodal ./input/document.pdf
# Specify custom output directory
npx botrun-pdf-multimodal ./input/document.pdf ./custom-output
MCP Tool Mode
Once configured in Claude Code or Gemini CLI, you can use natural language:
In Claude Code:
Process the PDF file at /path/to/document.pdf
In Gemini CLI:
gemini "convert the PDF at ./report.pdf to markdown"
API Usage
import { processPDF } from 'botrun-pdf-multimodal';
// Process a PDF programmatically
const result = await processPDF('/path/to/document.pdf');
console.log(`Processed ${result.pageCount} pages to ${result.outputDir}`);
Output Structure
data/output/
└── document-name/
├── page_001.md
├── page_002.md
├── page_003.md
└── ...
Each page is converted to a comprehensive Markdown file with:
- Complete text extraction
- Table preservation in Markdown format
- Detailed image descriptions
- Maintained heading hierarchy
- Special formatting preservation
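The output layout shown above can be reproduced with a small path helper. This is a sketch under assumptions: the hypothetical pageOutputPath assumes the folder is named after the PDF's file name without its extension and that page numbers are zero-padded to three digits, as in page_001.md. How the package names folders for unusual file names is not documented.

```typescript
import * as path from "node:path";

// Hypothetical helper mirroring the documented layout:
//   <outputRoot>/<document-name>/page_001.md, page_002.md, ...
// Assumptions: folder = PDF base name without extension; 3-digit padding.
function pageOutputPath(pdfPath: string, pageNumber: number, outputRoot = "./data/output"): string {
  if (!Number.isInteger(pageNumber) || pageNumber < 1) {
    throw new RangeError(`page numbers start at 1, got ${pageNumber}`);
  }
  const docName = path.basename(pdfPath, path.extname(pdfPath)); // "document.pdf" -> "document"
  const fileName = `page_${String(pageNumber).padStart(3, "0")}.md`; // 1 -> "page_001.md"
  return path.join(outputRoot, docName, fileName); // uses the platform's path separator
}
```

With this approach, pages beyond 999 simply get a four-digit number, since padStart never truncates.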
Configuration
Default settings in src/config.ts:
- Model: gemini-2.5-flash-lite
- Concurrent processing: 50 pages
- Output format: Markdown
- Retry attempts: 3 with exponential backoff
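The retry setting above can be read as: when an API call fails, wait, double the wait, and try again, up to a limit. A minimal TypeScript sketch follows. The 1-second base delay, the doubling factor, and the reading of "3 attempts" as three retries after the first try are assumptions, since the README does not give the actual values in src/config.ts. The helper name withRetry is hypothetical.

```typescript
// Hypothetical sketch of "3 retry attempts with exponential backoff".
// Assumptions: 1000 ms base delay, factor 2, and `retries` counts retries
// after the initial attempt (so up to 4 calls in total with the defaults).
async function withRetry<T>(
  task: () => Promise<T>,
  retries = 3,
  baseDelayMs = 1000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (attempt >= retries) throw err; // out of retries: surface the last error
      await sleep(baseDelayMs * 2 ** attempt); // 1000, 2000, 4000 ms, ...
    }
  }
}
```

The sleep function is a parameter so the backoff schedule can be checked without real waiting. Real implementations often add random jitter so that many concurrent page requests do not retry in lockstep.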
Environment Variables
- GEMINI_API_KEY: Required. Your Google Gemini API key
- PDF_OUTPUT_PATH: Optional. Default output directory (default: ./data/output)
- PDF_CONCURRENCY: Optional. Number of pages to process in parallel (default: 50)
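To show how the CLI arguments and the environment variables above might combine, here is a hypothetical resolveConfig sketch. Two rules in it are assumptions, not confirmed by the README: an explicit output-directory argument takes precedence over PDF_OUTPUT_PATH, and a missing or invalid PDF_CONCURRENCY falls back to the default of 50.

```typescript
// Hypothetical sketch of resolving the documented settings.
interface ResolvedConfig {
  apiKey: string;
  inputPdf: string;
  outputDir: string;
  concurrency: number;
}

function resolveConfig(
  args: readonly string[], // e.g. process.argv.slice(2)
  env: Record<string, string | undefined>, // e.g. process.env
): ResolvedConfig {
  const apiKey = env.GEMINI_API_KEY;
  if (!apiKey) throw new Error("GEMINI_API_KEY is required");

  const [inputPdf, outputArg] = args; // <input.pdf> [output-dir]
  if (!inputPdf) throw new Error("usage: botrun-pdf-multimodal <input.pdf> [output-dir]");

  // Assumption: non-numeric or non-positive values fall back to 50.
  const parsed = Number.parseInt(env.PDF_CONCURRENCY ?? "", 10);
  return {
    apiKey,
    inputPdf,
    // Assumption: CLI argument > PDF_OUTPUT_PATH > documented default.
    outputDir: outputArg ?? env.PDF_OUTPUT_PATH ?? "./data/output",
    concurrency: Number.isInteger(parsed) && parsed > 0 ? parsed : 50,
  };
}
```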
Development
# Clone the repository
git clone https://github.com/bohachu/botrun-pdf-multimodal.git
cd botrun-pdf-multimodal
# Install dependencies
npm install
# Run in development mode
npm run dev
# Build TypeScript
npm run build
# Run tests
npm test
Requirements
- Node.js 18+
- Google Gemini API key
- TypeScript 5+ (for development)
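A CLI that depends on Node.js 18+ can check the runtime version at start-up. The sketch below is illustrative only: meetsMinimumNode is a hypothetical helper, and whether the package performs such a check is not stated.

```typescript
// Hypothetical sketch: compare the major version against the documented minimum.
// Accepts both "18.19.0" (process.versions.node) and "v18.19.0" (process.version).
function meetsMinimumNode(version: string, minMajor = 18): boolean {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

if (!meetsMinimumNode(process.versions.node)) {
  console.error(`Node.js 18+ required, found ${process.versions.node}`);
  process.exitCode = 1; // signal failure without abruptly killing the process
}
```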
Troubleshooting
API Key Issues
- Ensure your API key is valid and has access to gemini-2.5-flash-lite
- Check that the environment variable is properly set
- Try running with GEMINI_API_KEY=your-key npx botrun-pdf-multimodal ./test.pdf
MCP Connection Issues
- Restart Claude Code/Gemini CLI after configuration changes
- Check logs:
  - Claude Code: ~/Library/Logs/Claude/mcp*.log (macOS)
  - Gemini CLI: Check terminal output
Processing Errors
- Large PDFs may hit API rate limits; the tool retries failed requests automatically
- Ensure sufficient disk space for output files
- Check that input PDF is not corrupted
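For the "input PDF is not corrupted" check, one cheap first test is the file signature: a PDF file is expected to begin with the ASCII bytes %PDF- followed by a version (for example %PDF-1.7). The hypothetical helpers hasPdfSignature and looksLikePdf below check only that signature. They catch non-PDF or empty files but cannot detect damage further inside the file, and some PDF readers tolerate a few junk bytes before the header, which this strict check would reject.

```typescript
import { openSync, readSync, closeSync } from "node:fs";

const PDF_SIGNATURE = "%PDF-";

// Hypothetical: true if the given bytes start with the "%PDF-" signature.
function hasPdfSignature(firstBytes: Uint8Array): boolean {
  if (firstBytes.length < PDF_SIGNATURE.length) return false;
  for (let i = 0; i < PDF_SIGNATURE.length; i++) {
    if (firstBytes[i] !== PDF_SIGNATURE.charCodeAt(i)) return false;
  }
  return true;
}

// Hypothetical: read only the first few bytes of a file and test the signature.
function looksLikePdf(filePath: string): boolean {
  const fd = openSync(filePath, "r");
  try {
    const header = Buffer.alloc(PDF_SIGNATURE.length);
    const bytesRead = readSync(fd, header, 0, header.length, 0);
    return hasPdfSignature(header.subarray(0, bytesRead)); // short files fail the check
  } finally {
    closeSync(fd); // always release the file descriptor
  }
}
```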
License
MIT
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Changelog
v1.0.1
- Improved security: API keys now read from environment variables instead of config files
- Updated installation instructions for better security practices
v1.0.0
- Initial release with MCP support
- Support for Claude Code and Gemini CLI
- Comprehensive PDF to Markdown conversion
- Parallel processing with rate limiting
