@shtse8/pdf-reader-mcp
v0.3.9
An MCP server providing tools to read PDF files.
PDF Reader MCP Server (@shtse8/pdf-reader-mcp)
Empower your AI agents (like Cline/Claude) with the ability to read and extract information from PDF files within your project, using a single, flexible tool.
This Node.js server implements the Model Context Protocol (MCP) to provide a consolidated read_pdf tool for interacting with PDF documents, either local files located within a defined project root directory or PDFs fetched from a URL.
⭐ Why Use This Server?
- 🛡️ Secure Project Root Focus:
  - All local file operations are strictly confined to the project root directory (determined by the server's launch context), preventing unauthorized access.
  - Uses relative paths for local files. Important: The server determines its project root from its own current working directory (cwd) at launch. The process starting the server (e.g., your MCP host) must set the cwd to your intended project directory.
- 🌐 URL Support: Can directly process PDFs from public URLs.
- ⚡ Efficient PDF Processing:
  - Leverages the pdf-parse library for extracting text, metadata, and page information.
- 🔧 Flexible & Consolidated Tool:
  - A single read_pdf tool handles various extraction needs via parameters, simplifying agent interaction.
- 🚀 Easy Integration: Get started quickly using npx with minimal configuration.
- 🐳 Containerized Option: Also available as a Docker image for consistent deployment environments.
- ✅ Robust Validation: Uses Zod schemas to validate all incoming tool arguments.
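The project-root confinement described in the list above can be illustrated with a short sketch. This is not the server's actual implementation: resolveInsideRoot is a hypothetical helper, and the assumption is simply that a relative path is resolved against the root and rejected if the result lands outside it.

```typescript
import * as path from "node:path";

// Hypothetical helper (not part of the package): resolve a relative PDF path
// against a project root and refuse anything that would escape it.
function resolveInsideRoot(projectRoot: string, relativePath: string): string {
  if (path.isAbsolute(relativePath)) {
    throw new Error(`Absolute paths are not allowed: ${relativePath}`);
  }
  const root = path.resolve(projectRoot);
  const resolved = path.resolve(root, relativePath);
  const rel = path.relative(root, resolved);
  // A relative result that climbs upward (or is absolute, e.g. another drive
  // on Windows) means the target lies outside the root.
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`Path escapes the project root: ${relativePath}`);
  }
  return resolved;
}
```

In a server like this one, the root would presumably be the process's current working directory at launch, which is why the MCP host has to set cwd correctly.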
🚀 Quick Start: Usage with MCP Host (Recommended: npx)
The simplest way is via npx, configured in your MCP host (e.g.,
mcp_settings.json).
{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "npx",
      "args": [
        "@shtse8/pdf-reader-mcp"
      ],
      "name": "PDF Reader (npx)"
    }
  }
}

(Alternative) Using bunx:
{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "bunx",
      "args": [
        "@shtse8/pdf-reader-mcp"
      ],
      "name": "PDF Reader (bunx)"
    }
  }
}

Important: Ensure your MCP Host launches the command with the cwd set to your project's root directory for local file access.
✨ The read_pdf Tool
This server provides a single, powerful tool: read_pdf.
- Description: Reads content, metadata, or page count from a PDF file (local or URL), controlled by parameters.
- Input: An object containing:
  - sources (array): Required. An array of source objects. Each object must contain either path (string, relative path to local PDF) or url (string, URL of PDF). Each source object can optionally include:
    - pages (string | number[], optional): Extract text only from specific pages (1-based) or ranges (e.g., [1, 3, 5] or '1,3-5,7') for this specific source. If provided, the global include_full_text flag is ignored for this source.
  - include_full_text (boolean, optional, default false): Include the full text content for each PDF. Ignored if pages is provided.
  - include_metadata (boolean, optional, default true): Include metadata (info and metadata objects) for each PDF.
  - include_page_count (boolean, optional, default true): Include the total number of pages (num_pages) for each PDF.
- Output: An object containing a results array. Each element corresponds to a source in the input sources array. Processing continues even if some sources fail. Each result object has the following structure:
  - source (string): The original path or URL provided for identification.
  - success (boolean): Indicates if processing this specific source was successful.
  - error (string, optional): Provides an error message if success is false for this source.
  - data (object, optional): Contains the extracted data if success is true for this source:
    - full_text (string, optional)
    - page_texts (array, optional): Array of { page: number, text: string }.
    - missing_pages (array, optional)
    - info (object, optional)
    - metadata (object, optional)
    - num_pages (number, optional)
    - warnings (array, optional): Non-critical warnings for this source (e.g., requested page out of bounds).
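To make the two accepted forms of pages concrete, here is a minimal sketch of how a value such as '1,3-5,7' or [1, 3, 5] might be normalized into a sorted list of 1-based page numbers. parsePageSpec is a hypothetical name, and the server's real parsing rules (for example, how it treats malformed ranges) may differ.

```typescript
type PageSpec = string | number[];

// Hypothetical helper: expand a page spec into sorted, de-duplicated,
// 1-based page numbers. Throws on anything it cannot interpret.
function parsePageSpec(spec: PageSpec): number[] {
  const pages = new Set<number>();
  const add = (n: number): void => {
    if (!Number.isInteger(n) || n < 1) {
      throw new Error(`Invalid page number: ${n}`);
    }
    pages.add(n);
  };

  if (Array.isArray(spec)) {
    spec.forEach((n) => add(n));
  } else {
    for (const part of spec.split(",")) {
      const trimmed = part.trim();
      // Accept either a single page ("7") or a closed range ("3-5").
      const match = /^(\d+)(?:-(\d+))?$/.exec(trimmed);
      if (!match) {
        throw new Error(`Invalid page range: "${trimmed}"`);
      }
      const start = Number(match[1]);
      const end = match[2] !== undefined ? Number(match[2]) : start;
      if (end < start) {
        throw new Error(`Range end precedes start: "${trimmed}"`);
      }
      for (let p = start; p <= end; p++) add(p);
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}
```

Under these assumptions, parsePageSpec('1,3-5,7') yields [1, 3, 4, 5, 7]. Pages beyond the end of the document are not checked here; per the output description, the server reports those through warnings.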
Get metadata and page count for multiple files:
{
  "sources": [
    { "path": "report.pdf" },
    { "url": "http://example.com/another.pdf" },
    { "path": "nonexistent.pdf" }
  ]
}
Example Output:
{
  "results": [
    { "source": "report.pdf", "success": true, "data": { "info": {...}, "metadata": {...}, "num_pages": 10 } },
    { "source": "http://example.com/another.pdf", "success": true, "data": { "info": {...}, "metadata": {...}, "num_pages": 5 } },
    { "source": "nonexistent.pdf", "success": false, "error": "File not found..." }
  ]
}

Get full text for one file:
{
  "sources": [{ "url": "http://example.com/document.pdf" }],
  "include_full_text": true,
  "include_metadata": false,
  "include_page_count": false
}
Example Output:
{
  "results": [
    { "source": "http://example.com/document.pdf", "success": true, "data": { "full_text": "..." } }
  ]
}

Get text from different pages for different files (include_metadata and include_page_count default to true, so both are explicitly set to false here):
{
  "sources": [
    { "path": "manual.pdf", "pages": "1-2" },
    { "url": "http://example.com/report.pdf", "pages": [5] }
  ],
  "include_metadata": false,
  "include_page_count": false
}
Example Output:
{
  "results": [
    { "source": "manual.pdf", "success": true, "data": { "page_texts": [...] } },
    { "source": "http://example.com/report.pdf", "success": true, "data": { "page_texts": [...] } }
  ]
}
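Because processing continues when individual sources fail, a caller has to check success per result rather than for the response as a whole. The sketch below shows one way a client might do that. The interfaces mirror the output fields listed earlier, but the element types of missing_pages and warnings are assumptions, and summarize is a hypothetical helper.

```typescript
interface PageText { page: number; text: string; }

interface ReadPdfData {
  full_text?: string;
  page_texts?: PageText[];
  missing_pages?: number[];   // element type assumed
  info?: Record<string, unknown>;
  metadata?: Record<string, unknown>;
  num_pages?: number;
  warnings?: string[];        // element type assumed
}

interface ReadPdfResult {
  source: string;
  success: boolean;
  error?: string;
  data?: ReadPdfData;
}

interface ReadPdfOutput { results: ReadPdfResult[]; }

interface Summary {
  succeeded: ReadPdfResult[];
  failed: { source: string; error: string }[];
}

// Hypothetical helper: split a response into successful and failed sources.
function summarize(output: ReadPdfOutput): Summary {
  const summary: Summary = { succeeded: [], failed: [] };
  for (const result of output.results) {
    if (result.success) {
      summary.succeeded.push(result);
    } else {
      summary.failed.push({
        source: result.source,
        error: result.error ?? "unknown error",
      });
    }
  }
  return summary;
}

// Sample shaped like the first example output above (metadata omitted).
const sample: ReadPdfOutput = {
  results: [
    { source: "report.pdf", success: true, data: { num_pages: 10 } },
    { source: "nonexistent.pdf", success: false, error: "File not found..." },
  ],
};
const sampleSummary = summarize(sample);
```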
🐳 Alternative Usage: Docker
Configure your MCP Host to run the Docker container, mounting your project
directory to /app.
{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-v",
        "/path/to/your/project:/app",
        "shtse8/pdf-reader-mcp:latest"
      ],
      "name": "PDF Reader (Docker)"
    }
  }
}

Note on Volume Mount Path: Instead of hardcoding /path/to/your/project, you can often use shell variables to automatically use the current working directory. These are expanded only if your MCP host launches the command through a shell or performs its own variable substitution; otherwise, use an absolute path.
- Linux/macOS: -v "$PWD:/app"
- Windows Cmd: -v "%CD%:/app"
- Windows PowerShell: -v "${PWD}:/app"
- VS Code Tasks/Launch: You might be able to use ${workspaceFolder} if supported by your MCP host integration.
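If your MCP host does not expand shell variables, one workaround is to generate the settings with the absolute path already filled in. The following is only a sketch: buildDockerSettings is a hypothetical helper, and the output simply reproduces the Docker configuration shown above with the mount source substituted.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
  name: string;
}

// Hypothetical helper: build the Docker server entry for a given project
// directory, mounting it at /app as described above.
function dockerServerEntry(projectDir: string): McpServerEntry {
  return {
    command: "docker",
    args: [
      "run",
      "-i",
      "--rm",
      "-v",
      `${projectDir}:/app`,
      "shtse8/pdf-reader-mcp:latest",
    ],
    name: "PDF Reader (Docker)",
  };
}

// Serialize a full settings object ready to paste into the host's config.
function buildDockerSettings(projectDir: string): string {
  const settings = {
    mcpServers: { "pdf-reader-mcp": dockerServerEntry(projectDir) },
  };
  return JSON.stringify(settings, null, 2);
}

// Example: in Node.js, pass process.cwd() to use the current directory.
// console.log(buildDockerSettings(process.cwd()));
```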
🛠️ Other Usage Options
Local Build (For Development)
- Clone: git clone https://github.com/shtse8/pdf-reader-mcp.git
- Install: cd pdf-reader-mcp && npm install
- Build: npm run build
- Configure MCP Host:
{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "node",
      "args": ["/path/to/cloned/repo/pdf-reader-mcp/build/index.js"],
      "name": "PDF Reader (Local Build)"
    }
  }
}
💻 Development
- Clone, npm install, npm run build.
- npm run watch for auto-recompile.
🚢 Publishing (via GitHub Actions)
Uses GitHub Actions (.github/workflows/publish.yml) to publish to npm and
Docker Hub on pushes to main. Requires NPM_TOKEN, DOCKERHUB_USERNAME,
DOCKERHUB_TOKEN secrets.
🙌 Contributing
Contributions welcome! Open an issue or PR.
