n8n-nodes-pdf-utils
v1.2.4
Custom n8n node for PDF inspection, splitting, and decryption
Features
🔍 Inspect Operation
- Analyzes PDF structure
- Counts pages
- Detects if PDF is encrypted (returns isEncrypted: true without failing)
- Detects if PDF is vectorial (text-based) or rasterized (image-based)
- Extracts text from first page
- Performance: Very fast (tens of milliseconds)
✂️ Split Operation
- Splits multi-page PDFs into individual pages
- Creates one output item per page
- Preserves PDF quality and structure
🔓 Decrypt Operation
- Removes password protection from encrypted PDFs
- Supports user and owner passwords
- Outputs a clean, unencrypted PDF
- Requires qpdf installed on the host (see System Requirements)
System Requirements
The Decrypt operation requires qpdf to be installed on the host running n8n:
# Linux / Docker
apt-get install qpdf
# macOS
brew install qpdf
All other operations (Inspect, Split, Inspect and Split) have no system-level dependencies; they use pure npm packages only.
Installation
Option 1: Install from npm
npm install n8n-nodes-pdf-utils
Option 2: Install locally for development
- Clone this repository
- Install dependencies:
npm install
- Build the node:
npm run build
- Link to your n8n installation:
npm link
cd ~/.n8n/nodes
npm link n8n-nodes-pdf-utils
- Restart n8n
Option 3: Install in n8n using community nodes
- Go to Settings > Community Nodes
- Click Install
- Enter: n8n-nodes-pdf-utils
- Click Install
Usage
Inspect Operation
Input: Binary data containing a PDF file
Parameters:
- Binary Property: Name of the binary property (default: "data")
- Text Threshold: Minimum text length to consider PDF as vectorial (default: 50)
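The effect of Text Threshold can be pictured with a small sketch. It assumes the node compares the extracted text length against the threshold inclusively; the README does not say whether the comparison is inclusive, and classifyByTextLength is a hypothetical helper, not part of the package.

```typescript
// Hypothetical helper: not part of n8n-nodes-pdf-utils.
// Assumption: a PDF counts as vectorial when the extracted text length
// reaches the threshold (the inclusive comparison is a guess).
function classifyByTextLength(textLength: number, textThreshold: number = 50): boolean {
  return textLength >= textThreshold;
}

// With the default threshold, 23 characters of text is treated as rasterized.
const isVectorial = classifyByTextLength(23);
console.log(isVectorial); // false
```

Raising the threshold makes the node more likely to treat a PDF with only a little embedded text (for example a scanned page with a short text overlay) as rasterized.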
Output: Single item with analysis + original PDF binary
If the PDF is not encrypted:
{
  "json": {
    "isEncrypted": false,
    "pageCount": 5,
    "isMultiPage": true,
    "isVectorial": false,
    "textLength": 23,
    "firstPageText": "Preview of first 200 characters..."
  },
  "binary": {
    "data": "<original PDF>"
  }
}
If the PDF is encrypted (no password needed to detect it):
{
  "json": {
    "isEncrypted": true
  },
  "binary": {
    "data": "<original PDF>"
  }
}
Example workflow:
HTTP Request (download PDF)
→ PDF Utils (Inspect)
→ IF (isEncrypted)
    → PDF Utils (Decrypt) → PDF Utils (Inspect again)
→ IF (isVectorial)
    → Route A (text processing with PDF)
    → Route B (OCR processing with PDF)
Inspect and Split Operation
Input: Binary data containing a PDF file
Parameters:
- Binary Property: Name of the binary property (default: "data")
- Text Threshold: Minimum text length to consider PDF as vectorial (default: 50)
- Output Binary Property: Name for output binary property (default: "data")
Output:
- If vectorial: Single item with analysis + original PDF (pass-through)
- If not vectorial: Multiple items, one per page (split)
Example workflow:
HTTP Request (download PDF)
→ PDF Utils (Inspect and Split)
→ Vectorial PDFs pass through as-is
→ Scanned PDFs split into pages automatically
Use case: Automatically handle different PDF types without manual branching:
- Text-based PDFs (vectorial) → process as whole document
- Scanned PDFs (non-vectorial) → OCR each page individually
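The pass-through versus split decision might look like the following sketch. The InspectResult fields mirror the JSON shown for the Inspect operation; planOutputItems and OutputPlan are hypothetical and only model how many items come out, not the PDF processing itself.

```typescript
// Fields taken from the Inspect output documented in this README.
interface InspectResult {
  isVectorial: boolean;
  pageCount: number;
}

// Hypothetical description of what the node would emit.
type OutputPlan =
  | { mode: "pass-through"; itemCount: 1 }
  | { mode: "split"; itemCount: number };

// Vectorial PDFs pass through as one item; others yield one item per page.
function planOutputItems(result: InspectResult): OutputPlan {
  if (result.isVectorial) {
    return { mode: "pass-through", itemCount: 1 };
  }
  return { mode: "split", itemCount: result.pageCount };
}

console.log(planOutputItems({ isVectorial: false, pageCount: 5 }));
```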
Decrypt Operation
Input: Binary data containing a password-protected PDF
Requires qpdf installed on the host (see System Requirements).
Parameters:
- Binary Property: Name of the input binary property (default: "data")
- Password: User or owner password to decrypt the PDF
- Output Binary Property: Name for the output binary property (default: "data")
Output: Single item with the decrypted PDF binary
{
  "json": {
    "decrypted": true,
    "originalFileName": "document.pdf"
  },
  "binary": {
    "data": "<decrypted PDF>"
  }
}
Example workflow:
HTTP Request (download encrypted PDF)
→ PDF Utils (Decrypt)
→ PDF Utils (Inspect or Split)
Split Operation
Input: Binary data containing a multi-page PDF
Parameters:
- Binary Property: Name of the input binary property (default: "data")
- Output Binary Property: Name for output binary property (default: "data")
Output: Multiple items, one per page
- Each item contains binary data with a single-page PDF
- JSON includes pageNumber and originalFileName
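As a rough illustration of that output shape, the sketch below builds one item per page with the two documented JSON fields. The SplitItem type and buildSplitItems helper are hypothetical, page numbers are assumed to start at 1, and Uint8Array stands in for whatever binary representation n8n actually uses.

```typescript
// Hypothetical item shape: only pageNumber and originalFileName are documented.
interface SplitItem {
  json: { pageNumber: number; originalFileName: string };
  binary: { [property: string]: Uint8Array };
}

// Wraps already-split single-page PDFs into one item per page.
// Assumption: pageNumber is 1-based.
function buildSplitItems(
  pages: Uint8Array[],
  originalFileName: string,
  outputBinaryProperty: string = "data",
): SplitItem[] {
  return pages.map((pageBytes, index) => ({
    json: { pageNumber: index + 1, originalFileName },
    binary: { [outputBinaryProperty]: pageBytes },
  }));
}

// Two placeholder byte arrays stand in for real single-page PDFs.
const items = buildSplitItems([new Uint8Array(1), new Uint8Array(1)], "document.pdf");
console.log(items.map((item) => item.json.pageNumber)); // [ 1, 2 ]
```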
Example workflow:
HTTP Request (download PDF)
→ PDF Utils (Split)
→ Loop Over Items
→ Process each page individually
Technical Details
Dependencies
- pdfjs-dist (v5.4.394): For PDF analysis and text extraction (uses legacy build for Node.js)
- pdf-lib (v1.17.1): For PDF manipulation and splitting
- qpdf (system binary): Required only for the Decrypt operation
Why These Libraries?
- pdfjs-dist: Mozilla's PDF.js library, battle-tested and used in Firefox. It runs headless, with no canvas needed. We use the legacy build (pdfjs-dist/legacy/build/pdf.mjs), which is suited to Node.js environments without DOM dependencies.
- pdf-lib: Pure JavaScript with no native dependencies; well suited to PDF manipulation
- qpdf: The gold standard for PDF decryption — handles AES-128, AES-256, and RC4 encryption. Must be installed on the host system (not bundled in npm).
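For orientation, a qpdf decryption call from Node.js might be assembled as in this sketch. The --password and --decrypt options are documented qpdf flags, but whether this package invokes qpdf this way is an assumption; buildQpdfDecryptArgs is a hypothetical helper and the file paths are placeholders.

```typescript
// Hypothetical helper: builds the argument list for the qpdf binary.
// Passing arguments as an array (for example to child_process.execFile)
// avoids shell quoting problems with passwords.
function buildQpdfDecryptArgs(password: string, inputPath: string, outputPath: string): string[] {
  return [`--password=${password}`, "--decrypt", inputPath, outputPath];
}

const args = buildQpdfDecryptArgs("s3cret", "/tmp/in.pdf", "/tmp/out.pdf");
console.log(["qpdf", ...args].join(" "));
// qpdf --password=s3cret --decrypt /tmp/in.pdf /tmp/out.pdf
```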
Performance
- Inspect: Very fast (~10-50ms for typical PDFs)
- Split: Fast, scales linearly with page count (~50-200ms per page)
- Decrypt: Depends on qpdf and PDF size (~100-500ms typical)
Development
# Install dependencies
npm install
# Build
npm run build
# Watch mode for development
npm run dev
# Lint
npm run lint
# Format code
npm run format
Troubleshooting
n8n doesn't detect the node
- Ensure n8n is restarted after installation
- Check that the node is in ~/.n8n/nodes or installed globally
- Verify that package.json has the correct n8n.nodes configuration
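One way to sanity-check that configuration is sketched below. It assumes the usual community-node layout, where package.json has an n8n object whose nodes field lists compiled .node.js files; hasN8nNodesConfig is a hypothetical helper and the example path is illustrative, not this package's actual file.

```typescript
// Hypothetical check for the n8n.nodes entry of a parsed package.json.
function hasN8nNodesConfig(pkg: unknown): boolean {
  if (typeof pkg !== "object" || pkg === null) return false;
  const n8n = (pkg as { n8n?: unknown }).n8n;
  if (typeof n8n !== "object" || n8n === null) return false;
  const nodes = (n8n as { nodes?: unknown }).nodes;
  return (
    Array.isArray(nodes) &&
    nodes.length > 0 &&
    nodes.every((entry) => typeof entry === "string" && entry.endsWith(".node.js"))
  );
}

// Illustrative package.json content; the path is an assumption.
const examplePkg = JSON.parse('{"n8n": {"nodes": ["dist/nodes/PdfUtils/PdfUtils.node.js"]}}');
console.log(hasN8nNodesConfig(examplePkg)); // true
```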
"qpdf is not installed" error
Install qpdf on the host running n8n:
apt-get install qpdf # Linux / Docker
brew install qpdf # macOS
If running n8n in Docker, add it to your Dockerfile:
RUN apt-get update && apt-get install -y qpdf && rm -rf /var/lib/apt/lists/*
"pdfjs-dist" errors
If you encounter issues with pdfjs-dist, ensure you're using Node.js 16 or higher:
node --version # Should be v16.0.0 or higher
License
MIT
Author
Roberto Michelena - INFINITEK S.A.C.
Contributing
Contributions are welcome! Please open an issue or submit a pull request.
Roadmap
- [x] Decrypt password-protected PDFs
- [ ] Add merge operation
- [ ] Add extract pages by range
- [ ] Add rotate pages operation
- [ ] Add compress PDF operation
- [ ] Add watermark operation
