botrun-pdf-multimodal

v1.0.2

PDF multimodal conversion MCP tool for Claude Code and Gemini CLI

Botrun PDF Multimodal

A Model Context Protocol (MCP) tool that converts PDF files to Markdown text format using Google's Gemini AI for multimodal processing.

Features

  • 🚀 Parallel processing of PDF pages (default: 50 pages concurrently)
  • 📄 Converts each PDF page to Markdown format
  • 🖼️ Handles both text and images with AI-powered extraction
  • 📊 Preserves table structures in Markdown format
  • 🔄 Automatic retries with exponential backoff on API failures
  • 💻 Works as both a CLI tool and an MCP server
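Parallel page processing of this kind is commonly built as a bounded worker pool. The sketch below assumes each page is converted by an independent async call; `mapWithConcurrency` and the simulated page worker are hypothetical and are not exports of this package.

```typescript
// Hypothetical bounded-concurrency helper: runs `worker` over `items` with at most
// `limit` calls in flight, and returns results in the original order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner keeps claiming the next index until the list is exhausted.
  // `next++` is safe because this code runs on JavaScript's single thread.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Usage: 120 simulated pages, at most 50 processed at a time.
let inFlight = 0;
let maxInFlight = 0;
const pageNumbers = Array.from({ length: 120 }, (_, i) => i + 1);
const converted = mapWithConcurrency(pageNumbers, 50, async (page) => {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 1)); // stand-in for a Gemini API call
  inFlight--;
  return `page ${page} done`;
});
```

Because each runner awaits one page before claiming the next, no more than `limit` requests are ever outstanding, while faster pages free a slot immediately instead of waiting for a whole batch.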

Installation

# Install globally via npm
npm install -g botrun-pdf-multimodal

# Or use directly with npx
npx botrun-pdf-multimodal ./input.pdf

Setup

1. Get Gemini API Key

  1. Visit Google AI Studio
  2. Create a new API key
  3. Set it as an environment variable:
# Add to your shell profile (.bashrc, .zshrc, etc.)
export GEMINI_API_KEY="your-api-key-here"

# Or create a .env file in your project
echo "GEMINI_API_KEY=your-api-key-here" > .env
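Both options above feed the same lookup. As a rough sketch, assuming the real environment takes precedence over `.env` (the default behavior of the widely used `dotenv` package), key resolution might look like this; `parseDotEnv` and `resolveApiKey` are hypothetical names, and the parser handles only simple `KEY=value` lines.

```typescript
// Hypothetical sketch: resolve GEMINI_API_KEY from the environment first, then .env text.
type Env = Record<string, string | undefined>;

// Parses simple KEY=value lines; skips blanks and # comments, strips an optional
// leading "export", and removes matching surrounding quotes from the value.
function parseDotEnv(text: string): Env {
  const out: Env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    const key = line.slice(0, eq).trim().replace(/^export\s+/, "");
    let value = line.slice(eq + 1).trim();
    if (value.length >= 2 && (value[0] === '"' || value[0] === "'") && value.endsWith(value[0])) {
      value = value.slice(1, -1);
    }
    out[key] = value;
  }
  return out;
}

// In Node you might call:
// resolveApiKey(process.env, existsSync(".env") ? readFileSync(".env", "utf8") : undefined)
function resolveApiKey(env: Env, dotEnvText?: string): string {
  const key = env.GEMINI_API_KEY ?? (dotEnvText ? parseDotEnv(dotEnvText).GEMINI_API_KEY : undefined);
  if (!key) {
    throw new Error("GEMINI_API_KEY is not set (checked the environment and .env)");
  }
  return key;
}
```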

2. Install for Claude Code

Add the server to your MCP client configuration:

# Claude Desktop config file location:
# macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
# Windows: %APPDATA%\Claude\claude_desktop_config.json
# Claude Code (project scope): .mcp.json in your project root

Edit the config file and add:

{
  "mcpServers": {
    "botrun-pdf-multimodal": {
      "command": "npx",
      "args": ["-y", "botrun-pdf-multimodal"]
    }
  }
}

Note: The tool will automatically read the GEMINI_API_KEY from your system environment variables or .env file. Make sure you've set it up as described in step 1.

After editing, restart Claude Code for the changes to take effect.

3. Install for Gemini CLI

First, ensure your API key is set in your environment:

# Option 1: Set in your shell profile (~/.bashrc, ~/.zshrc, etc.)
export GEMINI_API_KEY="your-api-key-here"

# Option 2: Create .env file in your project root
echo "GEMINI_API_KEY=your-api-key-here" > .env

Then add to your Gemini CLI settings:

# Project-specific: .gemini/settings.json
# Global: ~/.gemini/settings.json

Edit the settings file and add:

{
  "mcpServers": {
    "botrun-pdf-multimodal": {
      "command": "npx",
      "args": ["-y", "botrun-pdf-multimodal"]
    }
  }
}

The tool will automatically read the API key from your environment variables or .env file.

Then restart Gemini CLI by using /quit and reopening it.

Usage

CLI Mode

Process a single PDF file:

# Using global installation
botrun-pdf-multimodal ./input/document.pdf

# Using npx
npx botrun-pdf-multimodal ./input/document.pdf

# Specify custom output directory
npx botrun-pdf-multimodal ./input/document.pdf ./custom-output
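The commands above take the form `<input.pdf> [output-dir]`. A minimal sketch of that parsing, assuming an explicit output directory overrides the default from `PDF_OUTPUT_PATH` (or `./data/output`); `parseCliArgs` is a hypothetical helper, not the package's actual entry point.

```typescript
// Hypothetical CLI argument handling for: botrun-pdf-multimodal <input.pdf> [output-dir]
interface CliArgs {
  inputPdf: string;
  outputDir: string;
}

// `argv` excludes the node/script entries (process.argv.slice(2) in Node).
// `defaultOutputDir` would come from PDF_OUTPUT_PATH, falling back to ./data/output.
function parseCliArgs(argv: readonly string[], defaultOutputDir = "./data/output"): CliArgs {
  const [inputPdf, outputDir] = argv;
  if (!inputPdf) {
    throw new Error("Usage: botrun-pdf-multimodal <input.pdf> [output-dir]");
  }
  // Assumed precedence: an explicit second argument wins over the default.
  return { inputPdf, outputDir: outputDir ?? defaultOutputDir };
}

const withCustomDir = parseCliArgs(["./input/document.pdf", "./custom-output"]);
const withDefault = parseCliArgs(["./input/document.pdf"]);
```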

MCP Tool Mode

Once configured in Claude Code or Gemini CLI, you can use natural language:

In Claude Code:

Process the PDF file at /path/to/document.pdf

In Gemini CLI:

gemini "convert the PDF at ./report.pdf to markdown"
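In both clients, the natural-language request ends up as an MCP `tools/call` message written to the server's standard input, since the configuration launches the server as a local process via `npx`. The sketch below shows the JSON-RPC 2.0 shape of such a request; the tool name `convert_pdf` and its `pdf_path` argument are hypothetical placeholders, because the real names are whatever this server advertises in its `tools/list` response.

```typescript
// Illustrative shape of an MCP "tools/call" request (JSON-RPC 2.0).
// "convert_pdf" and "pdf_path" are hypothetical; check the server's tools/list output.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, pdfPath: string): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "convert_pdf", arguments: { pdf_path: pdfPath } },
  };
}

const request = buildToolCall(1, "/path/to/document.pdf");
// The stdio transport sends each message as a single line of JSON.
const wire = JSON.stringify(request);
```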

API Usage

// Top-level await requires an ES module context (e.g. "type": "module" in package.json)
import { processPDF } from 'botrun-pdf-multimodal';

// Process a PDF programmatically
const result = await processPDF('/path/to/document.pdf');
console.log(`Processed ${result.pageCount} pages to ${result.outputDir}`);

Output Structure

data/output/
└── document-name/
    ├── page_001.md
    ├── page_002.md
    ├── page_003.md
    └── ...

Each page is converted to its own Markdown file containing:

  • The full extracted text
  • Tables preserved in Markdown format
  • Detailed descriptions of images
  • The original heading hierarchy
  • Special formatting carried over from the source
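The layout above can be expressed as a small path helper. This sketch assumes the folder name is the PDF's file name without its `.pdf` extension and that page numbers are 1-based and zero-padded to three digits, as the example file names suggest; the helper names are hypothetical.

```typescript
// Hypothetical helpers mirroring data/output/<document-name>/page_NNN.md.
// Paths are joined with "/" to keep the sketch dependency-free.
function documentDir(outputRoot: string, pdfPath: string): string {
  const fileName = pdfPath.split(/[\\/]/).pop() ?? pdfPath;
  const baseName = fileName.replace(/\.pdf$/i, "");
  return `${outputRoot.replace(/\/+$/, "")}/${baseName}`;
}

// 1-based page number, zero-padded to three digits (page 1000 becomes page_1000.md).
function pageFileName(pageNumber: number): string {
  return `page_${String(pageNumber).padStart(3, "0")}.md`;
}

function pageFilePaths(outputRoot: string, pdfPath: string, pageCount: number): string[] {
  const dir = documentDir(outputRoot, pdfPath);
  return Array.from({ length: pageCount }, (_, i) => `${dir}/${pageFileName(i + 1)}`);
}

const paths = pageFilePaths("data/output/", "./input/document-name.pdf", 3);
// paths[0] === "data/output/document-name/page_001.md"
```

Three-digit padding keeps a plain alphabetical sort in page order up to page 999.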

Configuration

Default settings in src/config.ts:

  • Model: gemini-2.5-flash-lite
  • Concurrent processing: 50 pages
  • Output format: Markdown
  • Retry attempts: 3 with exponential backoff
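The retry setting can be read as a simple exponential backoff loop. The sketch assumes "3" means three attempts in total and a 1-second base delay, neither of which is specified here; `withRetry` is a hypothetical helper, and production code would usually retry only transient errors such as HTTP 429 or 5xx responses.

```typescript
// Hypothetical retry helper with exponential backoff (delays: base, 2*base, 4*base, ...).
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3, // assumed: total attempts, including the first
  baseDelayMs = 1000, // assumed base delay
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError; // all attempts failed: surface the last error
}

// Usage: a simulated API call that is rate limited twice, then succeeds.
let calls = 0;
const flaky = async (): Promise<string> => {
  calls++;
  if (calls < 3) throw new Error("429 Too Many Requests");
  return "ok";
};
const retried = withRetry(flaky, 3, 1); // 1 ms base delay keeps the demo fast
```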

Environment Variables

  • GEMINI_API_KEY - Required. Your Google Gemini API key
  • PDF_OUTPUT_PATH - Optional. Default output directory (default: ./data/output)
  • PDF_CONCURRENCY - Optional. Number of pages to process in parallel (default: 50)
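A sketch of reading the two optional variables with their documented defaults. The validation rule (a positive integer for `PDF_CONCURRENCY`) is an assumption, and `loadSettings` is a hypothetical name.

```typescript
// Hypothetical loader for the optional settings; defaults match the list above.
interface PdfSettings {
  outputPath: string;
  concurrency: number;
}

function loadSettings(env: Record<string, string | undefined>): PdfSettings {
  const rawConcurrency = env.PDF_CONCURRENCY ?? "50";
  const concurrency = Number(rawConcurrency);
  // Assumed rule: reject zero, negatives, fractions and non-numeric values.
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new Error(`PDF_CONCURRENCY must be a positive integer, got "${rawConcurrency}"`);
  }
  return {
    outputPath: env.PDF_OUTPUT_PATH ?? "./data/output",
    concurrency,
  };
}

// In Node you would pass process.env; plain objects are used here.
const defaults = loadSettings({});
const tuned = loadSettings({ PDF_CONCURRENCY: "10", PDF_OUTPUT_PATH: "./out" });
```

A lower concurrency trades throughput for fewer simultaneous Gemini requests, which can help on API keys with tight rate limits.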

Development

# Clone the repository
git clone https://github.com/bohachu/botrun-pdf-multimodal.git
cd botrun-pdf-multimodal

# Install dependencies
npm install

# Run in development mode
npm run dev

# Build TypeScript
npm run build

# Run tests
npm test

Requirements

  • Node.js 18+
  • Google Gemini API key
  • TypeScript 5+ (for development)

Troubleshooting

API Key Issues

  • Ensure your API key is valid and has access to gemini-2.5-flash-lite
  • Check that the environment variable is properly set
  • Try running with GEMINI_API_KEY=your-key npx botrun-pdf-multimodal ./test.pdf

MCP Connection Issues

  • Restart Claude Code/Gemini CLI after configuration changes
  • Check logs:
    • Claude Desktop: ~/Library/Logs/Claude/mcp*.log (macOS)
    • Gemini CLI: Check terminal output

Processing Errors

  • Large PDFs may hit API rate limits; the tool retries failed requests automatically, and lowering PDF_CONCURRENCY reduces how many requests are sent at once
  • Ensure sufficient disk space for output files
  • Check that input PDF is not corrupted

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Changelog

v1.0.1

  • Improved security: API keys now read from environment variables instead of config files
  • Updated installation instructions for better security practices

v1.0.0

  • Initial release with MCP support
  • Support for Claude Code and Gemini CLI
  • Comprehensive PDF to Markdown conversion
  • Parallel processing with rate limiting