@shtse8/pdf-reader-mcp

v0.3.9

An MCP server providing tools to read PDF files.

PDF Reader MCP Server (@shtse8/pdf-reader-mcp)

Empower your AI agents (like Cline/Claude) with the ability to read and extract information from PDF files within your project, using a single, flexible tool.

This Node.js server implements the Model Context Protocol (MCP) to provide a consolidated read_pdf tool for interacting with PDF documents (local or URL) located within a defined project root directory.


⭐ Why Use This Server?

  • 🛡️ Secure Project Root Focus:
    • All local file operations are strictly confined to the project root directory (determined by the server's launch context), preventing unauthorized access.
    • Uses relative paths for local files. Important: The server determines its project root from its own Current Working Directory (cwd) at launch. The process starting the server (e.g., your MCP host) must set the cwd to your intended project directory.
  • 🌐 URL Support: Can directly process PDFs from public URLs.
  • ⚡ Efficient PDF Processing:
    • Leverages the pdf-parse library for extracting text, metadata, and page information.
  • 🔧 Flexible & Consolidated Tool:
    • A single read_pdf tool handles various extraction needs via parameters, simplifying agent interaction.
  • 🚀 Easy Integration: Get started quickly using npx with minimal configuration.
  • 🐳 Containerized Option: Also available as a Docker image for consistent deployment environments.
  • ✅ Robust Validation: Uses Zod schemas to validate all incoming tool arguments.
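The project-root confinement described above can be illustrated with a minimal sketch. The helper name resolveInsideRoot and its error messages are hypothetical and are not taken from the server's source; the sketch only assumes that relative paths are resolved against the launch directory and that anything escaping it is rejected.

```typescript
import * as path from "node:path";

// Hypothetical helper (not the server's actual code): resolve a user-supplied
// relative path against the project root and reject anything outside it.
export function resolveInsideRoot(projectRoot: string, relativePath: string): string {
  if (path.isAbsolute(relativePath)) {
    throw new Error(`Absolute paths are not allowed: ${relativePath}`);
  }
  const root = path.resolve(projectRoot);
  const resolved = path.resolve(root, relativePath);
  // path.relative returns a result starting with ".." when `resolved` lies
  // outside `root` (or an absolute path on Windows when the drives differ).
  const rel = path.relative(root, resolved);
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`Path escapes the project root: ${relativePath}`);
  }
  return resolved;
}
```

A host-side caller could use it as resolveInsideRoot(process.cwd(), "docs/report.pdf"); a path such as "../secret.pdf" would throw instead of resolving.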

🚀 Quick Start: Usage with MCP Host (Recommended: npx)

The simplest way is via npx, configured in your MCP host (e.g., mcp_settings.json).

{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "npx",
      "args": [
        "@shtse8/pdf-reader-mcp"
      ],
      "name": "PDF Reader (npx)"
    }
  }
}

(Alternative) Using bunx:

{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "bunx",
      "args": [
        "@shtse8/pdf-reader-mcp"
      ],
      "name": "PDF Reader (bunx)"
    }
  }
}

Important: Ensure your MCP Host launches the command with the cwd set to your project's root directory for local file access.
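For hosts that start the server programmatically, the cwd requirement might look roughly like the sketch below. It uses Node's child_process.spawn with its cwd option; the function names buildLaunchSpec and launchServer are hypothetical, and the stdio layout assumes the host talks to the server over stdin/stdout, which this README does not state explicitly.

```typescript
import { spawn } from "node:child_process";
import type { ChildProcess, SpawnOptions } from "node:child_process";
import * as path from "node:path";

interface LaunchSpec {
  command: string;
  args: string[];
  options: SpawnOptions;
}

// Hypothetical helper: describe how to launch the server so that its
// working directory is the intended project root.
export function buildLaunchSpec(projectRoot: string): LaunchSpec {
  return {
    command: "npx",
    args: ["@shtse8/pdf-reader-mcp"],
    options: {
      // The server derives its project root from this directory.
      cwd: path.resolve(projectRoot),
      // Assumption: stdin/stdout carry the protocol, stderr goes to the host's log.
      stdio: ["pipe", "pipe", "inherit"],
    },
  };
}

// Starts the process; not invoked here because it would download and run the package.
export function launchServer(projectRoot: string): ChildProcess {
  const { command, args, options } = buildLaunchSpec(projectRoot);
  return spawn(command, args, options);
}
```

On Windows, spawning npx directly may additionally require the shell option, depending on the Node.js version.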


✨ The read_pdf Tool

This server provides a single, powerful tool: read_pdf.

  • Description: Reads content, metadata, or page count from a PDF file (local or URL), controlled by parameters.
  • Input: An object containing:
    • sources (array): Required. An array of source objects. Each object must contain either path (string, relative path to local PDF) or url (string, URL of PDF). Each source object can optionally include:
      • pages (string | number[], optional): Extract text only from specific pages (1-based) or ranges (e.g., [1, 3, 5] or '1,3-5,7') for this specific source. If provided, the global include_full_text flag is ignored for this source.
    • include_full_text (boolean, optional, default false): Include the full text content for each PDF. Ignored for any source that specifies pages.
    • include_metadata (boolean, optional, default true): Include metadata (info and metadata objects) for each PDF.
    • include_page_count (boolean, optional, default true): Include the total number of pages (num_pages) for each PDF.
  • Output: An object containing a results array. Each element corresponds to a source in the input sources array. Processing continues even if some sources fail. Each result object has the following structure:
    • source (string): The original path or URL provided for identification.
    • success (boolean): Indicates if processing this specific source was successful.
    • error (string, optional): Provides an error message if success is false for this source.
    • data (object, optional): Contains the extracted data if success is true for this source:
      • full_text (string, optional)
      • page_texts (array, optional): Array of { page: number, text: string }.
      • missing_pages (array, optional)
      • info (object, optional)
      • metadata (object, optional)
      • num_pages (number, optional)
      • warnings (array, optional): Non-critical warnings for this source (e.g., requested page out of bounds).
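To make the pages parameter described above concrete, here is a small sketch of how a value such as '1,3-5,7' or [1, 3, 5] could be expanded into a sorted list of 1-based page numbers. The function parsePages is hypothetical; the server's actual parsing rules (for example, how it treats whitespace or reversed ranges) may differ.

```typescript
// Hypothetical parser, not the server's implementation: expands a pages
// value into a sorted, de-duplicated list of 1-based page numbers.
export function parsePages(pages: string | number[]): number[] {
  const result = new Set<number>();
  const add = (n: number): void => {
    if (!Number.isInteger(n) || n < 1) {
      throw new Error(`Invalid page number: ${n}`);
    }
    result.add(n);
  };

  if (Array.isArray(pages)) {
    for (const n of pages) add(n);
  } else {
    for (const part of pages.split(",")) {
      const token = part.trim();
      // Accept either a single page ("7") or an inclusive range ("3-5").
      const match = /^(\d+)(?:-(\d+))?$/.exec(token);
      if (!match) throw new Error(`Invalid page range: "${token}"`);
      const start = Number(match[1]);
      const end = match[2] !== undefined ? Number(match[2]) : start;
      if (end < start) throw new Error(`Reversed page range: "${token}"`);
      for (let p = start; p <= end; p++) add(p);
    }
  }
  return Array.from(result).sort((a, b) => a - b);
}
```

With this sketch, parsePages("1,3-5,7") yields [1, 3, 4, 5, 7] and parsePages([5, 1, 3]) yields [1, 3, 5]. Pages beyond the end of a document are not checked here; per the output description, the server reports those through missing_pages or warnings.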
  • Example requests (each input object is followed by an abbreviated example output):

  1. Get metadata and page count for multiple files:

    {
      "sources": [
        { "path": "report.pdf" },
        { "url": "http://example.com/another.pdf" },
        { "path": "nonexistent.pdf" }
      ]
    }

    Example output:

    {
      "results": [
        { "source": "report.pdf", "success": true, "data": { "info": {...}, "metadata": {...}, "num_pages": 10 } },
        { "source": "http://example.com/another.pdf", "success": true, "data": { "info": {...}, "metadata": {...}, "num_pages": 5 } },
        { "source": "nonexistent.pdf", "success": false, "error": "File not found..." }
      ]
    }

  2. Get full text for one file:

    {
      "sources": [{ "url": "http://example.com/document.pdf" }],
      "include_full_text": true,
      "include_metadata": false,
      "include_page_count": false
    }

    Example output:

    {
      "results": [
        { "source": "http://example.com/document.pdf", "success": true, "data": { "full_text": "..." } }
      ]
    }

  3. Get text from different pages for different files (include_metadata and include_page_count default to true, so both are explicitly set to false here):

    {
      "sources": [
        { "path": "manual.pdf", "pages": "1-2" },
        { "url": "http://example.com/report.pdf", "pages": [5] }
      ],
      "include_metadata": false,
      "include_page_count": false
    }

    Example output:

    {
      "results": [
        { "source": "manual.pdf", "success": true, "data": { "page_texts": [...] } },
        { "source": "http://example.com/report.pdf", "success": true, "data": { "page_texts": [...] } }
      ]
    }
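Because processing continues when individual sources fail, a caller usually has to separate successes from failures. The following sketch types the documented output shape and splits it accordingly. The interface names and the splitResults helper are hypothetical, and the element types of missing_pages (numbers) and warnings (strings) are assumptions, since the description above only calls them arrays.

```typescript
interface PageText {
  page: number;
  text: string;
}

interface PdfData {
  full_text?: string;
  page_texts?: PageText[];
  missing_pages?: number[]; // assumption: page numbers
  info?: Record<string, unknown>;
  metadata?: Record<string, unknown>;
  num_pages?: number;
  warnings?: string[]; // assumption: human-readable messages
}

interface PdfResult {
  source: string;
  success: boolean;
  error?: string;
  data?: PdfData;
}

interface ReadPdfOutput {
  results: PdfResult[];
}

// Hypothetical helper: split a read_pdf response into usable data and errors.
export function splitResults(output: ReadPdfOutput): {
  succeeded: { source: string; data: PdfData }[];
  failed: { source: string; error: string }[];
} {
  const succeeded: { source: string; data: PdfData }[] = [];
  const failed: { source: string; error: string }[] = [];
  for (const result of output.results) {
    if (result.success && result.data !== undefined) {
      succeeded.push({ source: result.source, data: result.data });
    } else {
      failed.push({ source: result.source, error: result.error ?? "Unknown error" });
    }
  }
  return { succeeded, failed };
}
```

Applied to the first example above, report.pdf and the URL source would land in succeeded, while nonexistent.pdf would appear in failed with its error message.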


🐳 Alternative Usage: Docker

Configure your MCP Host to run the Docker container, mounting your project directory to /app.

{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-v",
        "/path/to/your/project:/app",
        "shtse8/pdf-reader-mcp:latest"
      ],
      "name": "PDF Reader (Docker)"
    }
  }
}

Note on Volume Mount Path: Instead of hardcoding /path/to/your/project, you can often use a shell variable that expands to the current working directory. This only works when the command is run through a shell that performs the expansion:

  • Linux/macOS: -v "$PWD:/app"
  • Windows Cmd: -v "%CD%:/app"
  • Windows PowerShell: -v "${PWD}:/app"
  • VS Code Tasks/Launch: You might be able to use ${workspaceFolder} if supported by your MCP host integration.
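When the MCP host reads a static JSON file and does not expand shell variables, one workaround is to generate the entry with the absolute path filled in. The sketch below does that in TypeScript; dockerServerEntry is a hypothetical helper, and the generated object simply mirrors the Docker configuration shown above.

```typescript
import * as path from "node:path";

interface McpServerEntry {
  command: string;
  args: string[];
  name: string;
}

// Hypothetical helper: build the Docker server entry with the project
// directory resolved to an absolute host path and mounted at /app.
export function dockerServerEntry(projectDir: string): McpServerEntry {
  const hostDir = path.resolve(projectDir);
  return {
    command: "docker",
    args: ["run", "-i", "--rm", "-v", `${hostDir}:/app`, "shtse8/pdf-reader-mcp:latest"],
    name: "PDF Reader (Docker)",
  };
}

// Print a config for the directory this script is run from.
const config = { mcpServers: { "pdf-reader-mcp": dockerServerEntry(process.cwd()) } };
console.log(JSON.stringify(config, null, 2));
```

The printed JSON can then be pasted into the host's settings file in place of the hardcoded path.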

🛠️ Other Usage Options

Local Build (For Development)

  1. Clone: git clone https://github.com/shtse8/pdf-reader-mcp.git
  2. Install: cd pdf-reader-mcp && npm install
  3. Build: npm run build
  4. Configure MCP Host:
    {
      "mcpServers": {
        "pdf-reader-mcp": {
          "command": "node",
          "args": ["/path/to/cloned/repo/pdf-reader-mcp/build/index.js"],
          "name": "PDF Reader (Local Build)"
        }
      }
    }

💻 Development

  1. Clone, npm install, npm run build.
  2. npm run watch for auto-recompile.

🚢 Publishing (via GitHub Actions)

The project uses GitHub Actions (.github/workflows/publish.yml) to publish to npm and Docker Hub on pushes to main. The workflow requires the NPM_TOKEN, DOCKERHUB_USERNAME, and DOCKERHUB_TOKEN secrets.


🙌 Contributing

Contributions welcome! Open an issue or PR.