n8n-nodes-mopdf

v0.3.1, published a month ago

n8n community node for PDF processing - convert PDF to images, extract text and run OCR

Vulnerabilities: 0 high, 0 medium, 0 low

kayjix

n8n-community-node-package n8n-community-node n8n pdf ocr text-extraction pdf-to-image mupdf tesseract self-hosted

n8n-nodes-mopdf

MoPDF is an n8n community node for local PDF processing. It extracts selectable text, converts PDF pages to images, and runs OCR locally with MuPDF and Tesseract.js.

The project is intended for self-hosted n8n. Document processing does not depend on remote OCR services or SaaS APIs. Depending on the runtime setup, Tesseract.js language assets may need to be available locally or downloaded separately.

Installation

Requirements:

self-hosted n8n
Node.js 18+

Install from Settings -> Community Nodes -> Install:

n8n-nodes-mopdf

Operations

| Operation | Input | Output | Notes |
| --- | --- | --- | --- |
| PDF to Images | PDF binary | PNG or JPEG binaries | Supports per-page export and page selection |
| OCR | Image binary | Text or Markdown | Optional word and line coordinates |
| Extract Text | PDF binary | Text, Markdown, JSON, or HTML | Uses direct PDF text extraction only |
| Text + OCR Fallback | PDF binary | Text, Markdown, JSON, or HTML | Falls back to OCR only for pages without selectable text |
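The per-page decision behind Text + OCR Fallback can be illustrated with a small sketch. This is not the node's actual implementation: the function name `pagesNeedingOcr` is hypothetical, and it assumes that "without selectable text" means the directly extracted text is empty after trimming whitespace. The real node may use a different criterion.

```typescript
// Hypothetical helper, not part of the MoPDF API.
// Input: text obtained by direct extraction, one entry per PDF page.
// Output: 1-based numbers of the pages that would be sent to OCR.
function pagesNeedingOcr(pageTexts: string[]): number[] {
  const pages: number[] = [];
  pageTexts.forEach((text, index) => {
    // Assumption: a page has no selectable text when nothing but
    // whitespace comes back from direct extraction.
    if (text.trim().length === 0) {
      pages.push(index + 1);
    }
  });
  return pages;
}

// Pages 2 and 3 have no usable text layer, so only they go to OCR.
const ocrPages = pagesNeedingOcr(["Invoice 2024", "", "  \n ", "Total: 42"]);
console.log(ocrPages);
```

Under this assumption, pages with a usable text layer skip OCR entirely, which keeps processing time proportional to the number of scanned pages rather than the size of the whole document.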

Output formats

| Format | Description |
| --- | --- |
| Plain text | Clean extracted text without layout markup |
| Markdown | Compact, structure-aware output for humans and LLM pipelines |
| JSON | Structured extraction output with layout information |
| HTML | Raw HTML-style layout export |

Important n8n note:

In n8n Schema and JSON views, multiline strings are shown with escaped \n sequences because the UI displays serialized JSON.
The stored field value still contains real newline characters.
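The difference between the serialized view and the stored value can be reproduced outside n8n. The snippet below is a plain TypeScript illustration; the field name `text` and the sample string are made up for the example and are not taken from the node's output schema.

```typescript
// A field value as a node might store it: two lines separated by a real newline.
const extracted = "Line one\nLine two";

// Serializing to JSON (what a JSON view displays) escapes the newline
// as the two characters backslash and "n".
const serialized = JSON.stringify({ text: extracted });
console.log(serialized);

// Parsing the JSON restores the real newline, so splitting on "\n" works.
const roundTripped: string = JSON.parse(serialized).text;
const lines = roundTripped.split("\n");
console.log(lines.length);
```

Downstream nodes and expressions operate on the stored value, so there is no need to replace `\n` sequences manually before splitting or rendering the text.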

Project docs

Detailed architecture, development, publishing and community docs live in the GitHub repository.

Repository home
See docs/ for technical documentation
See .github/ for contribution and security guidance

Local development

Install dependencies once:

npm install

Common commands:

npm run build
npm run dev:docker:up
npm run dev:docker:reload
npm run dev:docker:logs
npm run fixtures:build
npm run fixtures:generate:windows

The Docker workflow mounts this repository directly into a local n8n container, so normal code iterations do not require reinstalling the package through the Community Nodes UI.

Detailed setup, environment variables, fixture generation and manual validation steps are documented in the GitHub repository under docs/ and tests/manual-test-plan.md.

npm package scope

The published npm package is intentionally minimal:

built runtime files from dist/
package metadata from package.json
this README.md
LICENSE

Repository-only assets such as source TypeScript files, Docker setup, fixtures, tests, scripts and GitHub community files stay in GitHub.

Licensing

This package depends on:

MuPDF - AGPL v3
Tesseract.js - Apache 2.0

The repository source is licensed under MIT, but the installed npm package pulls in MuPDF (AGPL v3) and Tesseract.js (Apache 2.0), so review their upstream license obligations before redistribution. The bundled LICENSE file includes the repository license text and a dependency notice.
