pdf-ppt-mcp

v1.1.0

MCP server for reading PDF, Word, Excel, and PowerPoint files with context-efficient tools

Document Reader MCP Server

An MCP (Model Context Protocol) server that lets AI assistants read PDF, Word, Excel, and PowerPoint files without blowing up the context window.


Why this exists

Most AI tools try to dump an entire document into the prompt at once. This server solves that with context-efficient reading by exposing tools that:

  • Get document metadata first (page count, sheet names)
  • Read one page / slide / sheet at a time
  • Search across the entire document and return only relevant snippets

This keeps your context window clean while still giving the AI access to large documents.


Supported File Types

| Format | Extension(s) | How it's read |
|--------|--------------|---------------|
| PDF | .pdf | Page-by-page (native PDF pagination) |
| Word | .docx | Paragraph-aware chunking |
| Excel | .xlsx, .xls | Sheet-by-sheet (full CSV per sheet) |
| PowerPoint | .pptx, .ppt | Slide-by-slide (true slide extraction) |
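
To make "paragraph-aware chunking" more concrete, here is a minimal sketch of one way it could work. This is not taken from the server's source: the function name `chunkParagraphs` and the `maxChars` limit are hypothetical, and the real chunk size and splitting rules may differ.

```typescript
// Hypothetical sketch: group paragraphs into chunks without splitting a paragraph.
// `maxChars` is an assumed size limit, not a documented server setting.
function chunkParagraphs(text: string, maxChars: number): string[] {
  const paragraphs = text
    .split(/\n\s*\n/) // blank lines separate paragraphs
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (candidate.length > maxChars && current) {
      // Adding this paragraph would overflow, so close the current chunk.
      chunks.push(current);
      current = paragraph;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

In this sketch a single paragraph longer than `maxChars` becomes its own oversized chunk rather than being cut mid-sentence, which is the trade-off that "paragraph-aware" implies.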


Quick Start

Run instantly with npx (recommended)

npx -y pdf-ppt-mcp

Or install globally

npm install -g pdf-ppt-mcp
mcp-document-server

Or clone and run locally

git clone https://github.com/abdo544445/pdf-ppt-mcp.git
cd pdf-ppt-mcp
npm install
npm run build
npm start

Integration Guide

VS Code (Cline / RooCode / Copilot MCP)

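Add this to your MCP config file (usually ~/.cline/mcp_settings.json or .vscode/mcp.json). If you use VS Code's native .vscode/mcp.json, the top-level key is "servers" instead of "mcpServers":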

{
  "mcpServers": {
    "document-reader": {
      "command": "npx",
      "args": ["-y", "pdf-ppt-mcp"]
    }
  }
}

Cursor

Open Settings -> MCP and add:

{
  "mcpServers": {
    "document-reader": {
      "command": "npx",
      "args": ["-y", "pdf-ppt-mcp"]
    }
  }
}

Antigravity / Claude Desktop

In your claude_desktop_config.json:

{
  "mcpServers": {
    "document-reader": {
      "command": "npx",
      "args": ["-y", "pdf-ppt-mcp"]
    }
  }
}

Local path (if you cloned the repo)

{
  "mcpServers": {
    "document-reader": {
      "command": "node",
      "args": ["/absolute/path/to/pdf-ppt-mcp/build/index.js"]
    }
  }
}

Available MCP Tools

1. get_document_info

Always call this first. Returns metadata so the AI knows the document's size before trying to read it.

Input:

| Parameter | Type | Description |
|-----------|------|-------------|
| filePath | string | Absolute path to the document |

Example output:

report.pdf — PDF document
Total pages: 42

data.xlsx — Excel workbook
Sheets (3): "Summary", "Q1 Data", "Q2 Data"

slides.pptx — PowerPoint presentation
Total slides: 12

2. read_document_page

Read a single page, slide, chunk, or sheet, so the whole document never has to be loaded into context.

Input:

| Parameter | Type | Description |
|-----------|------|-------------|
| filePath | string | Absolute path to the document |
| pageOrSheet | string | PDF/Word/PPT: page/chunk/slide number (e.g. "3"). Excel: sheet name (e.g. "Sheet1") |

Example calls:

// Read page 5 of a PDF
{ "filePath": "/docs/report.pdf", "pageOrSheet": "5" }

// Read slide 2 of a PPTX
{ "filePath": "/docs/deck.pptx", "pageOrSheet": "2" }

// Read the "Sales" sheet of an Excel file
{ "filePath": "/docs/data.xlsx", "pageOrSheet": "Sales" }

// Read chunk 3 of a Word document
{ "filePath": "/docs/contract.docx", "pageOrSheet": "3" }
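
MCP clients normally build the protocol messages for you. For orientation only, the sketch below shows how arguments like these could be wrapped in a JSON-RPC 2.0 `tools/call` request, which is the message shape MCP uses for tool invocation. The helper `buildToolCall` and the request `id` are illustrative and not part of this package.

```typescript
// Shape of an MCP tool invocation (JSON-RPC 2.0, method "tools/call").
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in a request object.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

const readPageRequest = buildToolCall(1, "read_document_page", {
  filePath: "/docs/report.pdf",
  pageOrSheet: "5", // page numbers are passed as strings
});
console.log(JSON.stringify(readPageRequest, null, 2));
```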

3. search_document

Search the entire document for a keyword. Returns matching pages/slides/chunks with surrounding context snippets.

Input:

| Parameter | Type | Description |
|-----------|------|-------------|
| filePath | string | Absolute path to the document |
| query | string | Search term (case-insensitive) |

Example output:

Found 2 match(es) for "revenue":

[Page 7]:
...total revenue for Q3 was $4.2M, representing a 12% increase over...

[Page 23]:
...projected revenue targets were exceeded in all regions except...
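
The snippet format above can be pictured with a small sketch of case-insensitive search over already-extracted page text. It is an assumption about how such a search might work, not the server's code: `searchPages`, `SearchHit`, and the `radius` parameter are hypothetical names.

```typescript
// Hypothetical sketch of case-insensitive search with context snippets.
// `radius` (characters of context on each side) is an assumed parameter.
interface SearchHit {
  page: number; // 1-based page number
  snippet: string; // matched text plus surrounding context
}

function searchPages(pages: string[], query: string, radius = 40): SearchHit[] {
  const needle = query.toLowerCase();
  if (needle.length === 0) return [];

  const hits: SearchHit[] = [];
  pages.forEach((text, index) => {
    const at = text.toLowerCase().indexOf(needle);
    if (at === -1) return; // no match on this page
    const start = Math.max(0, at - radius);
    const end = Math.min(text.length, at + needle.length + radius);
    hits.push({ page: index + 1, snippet: `...${text.slice(start, end)}...` });
  });
  return hits;
}
```

This version reports only the first match on each page. Returning a short window instead of the full page text is what keeps the result small.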

4. list_directory

List all supported document files in a folder. Useful for discovering what documents are available.

Input:

| Parameter | Type | Description |
|-----------|------|-------------|
| directoryPath | string | Absolute path to the directory |

Example output:

Found 4 document(s):

annual_report.pdf  (2,341.2 KB)
budget_2025.xlsx   (128.5 KB)
proposal.docx      (54.8 KB)
presentation.pptx  (8,902.1 KB)
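
A listing like this needs two small pieces of logic: filtering by supported extension and formatting sizes. The sketch below shows one plausible version. The helper names are hypothetical, and the extension list is copied from the Supported File Types table rather than from the server's source.

```typescript
// Extensions taken from the "Supported File Types" table.
const SUPPORTED_EXTENSIONS = [".pdf", ".docx", ".xlsx", ".xls", ".pptx", ".ppt"];

// Hypothetical helper: true when the file name ends in a supported extension.
function isSupportedDocument(fileName: string): boolean {
  const lower = fileName.toLowerCase();
  return SUPPORTED_EXTENSIONS.some((ext) => lower.slice(-ext.length) === ext);
}

// Hypothetical helper: format a byte count as kilobytes with one decimal place.
function formatSizeKb(bytes: number): string {
  const kb = bytes / 1024;
  const formatted = kb.toLocaleString("en-US", {
    minimumFractionDigits: 1,
    maximumFractionDigits: 1,
  });
  return `${formatted} KB`;
}

console.log(isSupportedDocument("budget_2025.xlsx"), formatSizeKb(131584));
```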

5. read_full_document

Read the entire document at once. Best for small documents only.

Warning: Use read_document_page or search_document for large documents to avoid filling the context window.

Input:

| Parameter | Type | Description |
|-----------|------|-------------|
| filePath | string | Absolute path to the document |
| maxChunks | number | Max pages/chunks to read (default: 10, max: 50) |
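
The default and upper limit for maxChunks can be read as a simple clamp. The sketch below is an illustration of that rule, not the server's implementation: `resolveMaxChunks` is a hypothetical name, and the lower bound of 1 is an assumption that the documentation does not state.

```typescript
const DEFAULT_MAX_CHUNKS = 10; // documented default
const HARD_MAX_CHUNKS = 50; // documented upper limit

// Hypothetical helper: apply the default and keep the value within 1..50.
function resolveMaxChunks(requested?: number): number {
  if (typeof requested !== "number" || !isFinite(requested)) {
    return DEFAULT_MAX_CHUNKS;
  }
  // The floor of 1 is an assumption; only the default and max are documented.
  return Math.min(HARD_MAX_CHUNKS, Math.max(1, Math.floor(requested)));
}

console.log(resolveMaxChunks(), resolveMaxChunks(200), resolveMaxChunks(3));
```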


Recommended Usage Pattern for AI Assistants

When an AI assistant uses this server, the recommended flow is:

1. list_directory("/path/to/folder")                   -> discover available documents
2. get_document_info("/path/to/doc.pdf")               -> learn total pages
3. search_document("/path/to/doc.pdf", "key term")     -> find relevant pages
4. read_document_page("/path/to/doc.pdf", "7")         -> read specific page

This approach typically uses less than 5% of the context that loading the whole document would consume.
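
The hand-off from step 3 to step 4 can be sketched as follows: page numbers are pulled out of the search output and turned into read_document_page arguments. The helper `extractPageNumbers` is hypothetical, and it assumes matches are labelled "[Page N]:" as in the search_document example output. Slides or chunks may be labelled differently.

```typescript
// Hypothetical helper: pull page numbers out of search_document text output.
// Assumes matches are labelled "[Page N]:" as in the documented example.
function extractPageNumbers(searchOutput: string): number[] {
  const pattern = /^\[Page (\d+)\]:/gm;
  const pages: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(searchOutput)) !== null) {
    const page = parseInt(match[1], 10);
    if (pages.indexOf(page) === -1) pages.push(page); // skip duplicates
  }
  return pages;
}

// Sample text modelled on the documented output, shortened for the example.
const sampleOutput = [
  'Found 2 match(es) for "revenue":',
  "",
  "[Page 7]:",
  "...total revenue for Q3 was $4.2M...",
  "",
  "[Page 23]:",
  "...projected revenue targets were exceeded...",
].join("\n");

// Each page number becomes the `pageOrSheet` argument of a follow-up read.
const followUps = extractPageNumbers(sampleOutput).map((page) => ({
  tool: "read_document_page",
  arguments: { filePath: "/path/to/doc.pdf", pageOrSheet: String(page) },
}));
console.log(followUps);
```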


Architecture

The server connects to your AI assistant over MCP and routes each request to a specialized parsing service.

[ 🤖 AI Assistant ] <── MCP Protocol ──> [ 📄 Document Reader Server ]
(Claude, Cursor)                                      │
                                                      │ (routes commands)
                                                      ▼
                                                [ MCP Tools ]
                                                      │
        ┌──────────────────────┬──────────────────────┼──────────────────────┐
        ▼                      ▼                      ▼                      ▼
[ read_page ]          [ search_doc ]         [ get_info ]          [ read_full ]
        │                      │                      │                      │
        └──────────────────────┴────────┬─────────────┴──────────────────────┘
                                        │
                                        ▼
                             [ ⚙️  Parsing Services ]
                                        │
             ┌──────────────┬───────────┴───┬──────────────┐
             ▼              ▼               ▼              ▼
       PDFService      WordService    ExcelService    PptService
      (pdf-parse)       (mammoth)        (xlsx)     (officeparser)
             │              │               │              │
             └──────────────┴───────┬───────┴──────────────┘
                                    │
                                    ▼
                         [ 📁 Local File System ]
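
The routing step in the diagram can be pictured as a lookup from file extension to parsing service. The sketch below is an assumption about how such routing might look, not the project's actual code. The service names come from the diagram, while `pickService` and the lookup table are hypothetical.

```typescript
// Service names follow the architecture diagram; the mapping itself is a sketch.
type ServiceName = "PDFService" | "WordService" | "ExcelService" | "PptService";

const SERVICE_BY_EXTENSION: Record<string, ServiceName> = {
  ".pdf": "PDFService",
  ".docx": "WordService",
  ".xlsx": "ExcelService",
  ".xls": "ExcelService",
  ".pptx": "PptService",
  ".ppt": "PptService",
};

// Hypothetical router: choose a parsing service from the file extension.
function pickService(filePath: string): ServiceName {
  const dot = filePath.lastIndexOf(".");
  const extension = dot === -1 ? "" : filePath.slice(dot).toLowerCase();
  const service = SERVICE_BY_EXTENSION[extension];
  if (!service) {
    throw new Error(`Unsupported file type: ${extension || "(none)"}`);
  }
  return service;
}

console.log(pickService("/docs/report.pdf"));
```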

Folder Structure

pdf-ppt-mcp/
├── src/
│   ├── index.ts                  # MCP Server - tool definitions and routing
│   └── services/
│       ├── pdf.service.ts        # PDF parsing (pdf-parse v2)
│       ├── word.service.ts       # Word parsing (mammoth)
│       ├── excel.service.ts      # Excel parsing (xlsx)
│       └── ppt.service.ts        # PPT parsing (officeparser)
├── build/                        # Compiled output (auto-generated)
├── package.json
├── tsconfig.json
└── README.md

Libraries Used

| Library | Purpose |
|---------|---------|
| @modelcontextprotocol/sdk | MCP server/client protocol |
| pdf-parse | PDF text extraction (v2, page-by-page) |
| mammoth | Word .docx text extraction |
| xlsx | Excel .xlsx/.xls reading |
| officeparser | PowerPoint .pptx/.ppt slide extraction |


Development

# Clone the repo
git clone https://github.com/abdo544445/pdf-ppt-mcp.git
cd pdf-ppt-mcp

# Install dependencies
npm install

# Build TypeScript
npm run build

# Start the server (stdio mode for MCP clients)
npm start

Requirements

  • Node.js >= 20.16.0
  • An MCP-compatible client (VS Code with Cline/RooCode, Cursor, Antigravity, Claude Desktop, etc.)

Roadmap

  • [x] Published to npm — install with npx -y pdf-ppt-mcp
  • [ ] VS Code Extension wrapper for GUI-based document selection
  • [x] Support for .csv files (direct reading)
  • [x] Support for password-protected PDFs
  • [x] OCR for scanned/image-based PDFs

Contributing

Pull requests are welcome. For major changes, please open an issue first.


License

MIT (c) abdo544445