n8n-nodes-pdf-utils
v1.1.0
Custom n8n node for PDF inspection and splitting using pure npm packages
Features
🔍 Inspect Operation
- Analyzes PDF structure
- Counts pages
- Detects if PDF is vectorial (text-based) or rasterized (image-based)
- Extracts text from first page
- Performance: Very fast (tens of milliseconds)
✂️ Split Operation
- Splits multi-page PDFs into individual pages
- Creates one output item per page
- Preserves PDF quality and structure
Installation
Option 1: Install from npm (when published)
npm install n8n-nodes-pdf-utils
Option 2: Install locally for development
- Clone this repository
- Install dependencies:
npm install
- Build the node:
npm run build
- Link to your n8n installation:
npm link
cd ~/.n8n/nodes
npm link n8n-nodes-pdf-utils
- Restart n8n
Option 3: Install in n8n using community nodes
- Go to Settings > Community Nodes
- Click Install
- Enter: n8n-nodes-pdf-utils
- Click Install
Usage
Inspect Operation
Input: Binary data containing a PDF file
Parameters:
- Binary Property: Name of the binary property (default: "data")
- Text Threshold: Minimum text length to consider PDF as vectorial (default: 50)
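The Text Threshold parameter can be read as a simple length check on the extracted first-page text. The sketch below illustrates that reading only: the helper name `classifyByTextLength`, the trimming step, and the `>=` comparison are assumptions, not taken from the node's source.

```typescript
// Hypothetical helper illustrating the Text Threshold idea.
// Assumptions: whitespace is trimmed before measuring, and a length
// equal to the threshold already counts as vectorial.
interface InspectResult {
  textLength: number;
  isVectorial: boolean;
}

function classifyByTextLength(firstPageText: string, textThreshold = 50): InspectResult {
  // Trim so that whitespace-only extraction results do not count as text.
  const textLength = firstPageText.trim().length;
  return { textLength, isVectorial: textLength >= textThreshold };
}

// A scanned page that yields only 23 characters falls below the default threshold.
const scanned = classifyByTextLength("x".repeat(23));
// A text-based page with 300 characters is classified as vectorial.
const textual = classifyByTextLength("x".repeat(300));
console.log(scanned, textual);
```

Raising the threshold makes the check stricter, which may help with scans that carry a short embedded text layer such as a stamp or a page number.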
Output: Single item with analysis + original PDF binary
{
"json": {
"pageCount": 5,
"isMultiPage": true,
"isVectorial": false,
"textLength": 23,
"firstPageText": "Preview of first 200 characters..."
},
"binary": {
"data": "<original PDF>"
}
}
Example workflow:
HTTP Request (download PDF)
→ PDF Utils (Inspect)
→ IF (isVectorial)
→ Route A (text processing with PDF)
→ Route B (OCR processing with PDF)
Inspect and Split Operation
Input: Binary data containing a PDF file
Parameters:
- Binary Property: Name of the binary property (default: "data")
- Text Threshold: Minimum text length to consider PDF as vectorial (default: 50)
- Output Binary Property: Name for output binary property (default: "data")
Output:
- If vectorial: Single item with analysis + original PDF (pass-through)
- If not vectorial: Multiple items, one per page (split)
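The two output cases above can be sketched as a small routing function. This is an illustration under stated assumptions, not the node's implementation: `Item` is a simplified stand-in for an n8n item, `inspectAndSplit` is a hypothetical name, the binary payloads are plain strings, and the JSON fields on the split items are guesses.

```typescript
// Simplified stand-in for an n8n item; the real node uses n8n's own item types.
interface Item {
  json: Record<string, unknown>;
  binary: Record<string, string>;
}

// Hypothetical routing helper. `pages` stands in for already-split single-page PDFs.
function inspectAndSplit(
  original: string,
  pages: string[],
  isVectorial: boolean,
  outputProperty = "data",
): Item[] {
  if (isVectorial) {
    // Pass-through: one item carrying the analysis and the untouched PDF.
    return [
      {
        json: { pageCount: pages.length, isVectorial },
        binary: { [outputProperty]: original },
      },
    ];
  }
  // Split: one item per page, numbered from 1 (assumed).
  return pages.map((page, index) => ({
    json: { pageNumber: index + 1, isVectorial },
    binary: { [outputProperty]: page },
  }));
}

const items = inspectAndSplit("scan.pdf", ["page-1", "page-2"], false);
console.log(items.length);
```

Downstream nodes then see either a single item or one item per page, so no separate IF node is needed to branch on the PDF type.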
Example workflow:
HTTP Request (download PDF)
→ PDF Utils (Inspect and Split)
→ Vectorial PDFs pass through as-is
→ Scanned PDFs split into pages automatically
Use case: Automatically handle different PDF types without manual branching:
- Text-based PDFs (vectorial) → process as whole document
- Scanned PDFs (non-vectorial) → OCR each page individually
Split Operation
Input: Binary data containing a multi-page PDF
Parameters:
- Binary Property: Name of the input binary property (default: "data")
- Output Binary Property: Name for output binary property (default: "data")
Output: Multiple items, one per page
- Each item contains binary data with a single-page PDF
- JSON includes pageNumber and originalFileName
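As a rough picture of the JSON attached to each split item, the sketch below builds the pageNumber and originalFileName fields for a given page count. The helper `describePages` is hypothetical, and 1-based page numbering is an assumption; the README does not state where numbering starts.

```typescript
// Shape of the per-page JSON described above.
interface PageItemJson {
  pageNumber: number;
  originalFileName: string;
}

// Hypothetical helper: one JSON object per page, numbered from 1 (assumed).
function describePages(originalFileName: string, pageCount: number): PageItemJson[] {
  return Array.from({ length: pageCount }, (_, index) => ({
    pageNumber: index + 1,
    originalFileName,
  }));
}

const pages = describePages("report.pdf", 3);
console.log(pages);
```

Keeping originalFileName on every item lets a later step group pages back by source document after per-page processing.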
Example workflow:
HTTP Request (download PDF)
→ PDF Utils (Split)
→ Loop Over Items
→ Process each page individually
Technical Details
Dependencies
- pdfjs-dist (v5.4.394): For PDF analysis and text extraction (uses legacy build for Node.js)
- pdf-lib (v1.17.1): For PDF manipulation and splitting
Why These Libraries?
- pdfjs-dist: Mozilla's PDF.js library, battle-tested and used in Firefox. Here it runs headless, with no canvas needed. We use the legacy build (pdfjs-dist/legacy/build/pdf.mjs), which is designed for Node.js environments without DOM dependencies.
- pdf-lib: Pure JavaScript, no native dependencies, excellent for manipulation
- 100% npm packages: No system-level dependencies (like Poppler, Ghostscript) and no canvas/native modules!
Performance
- Inspect: Very fast (~10-50ms for typical PDFs)
- Split: Fast, scales linearly with page count (~50-200ms per page)
Development
# Install dependencies
npm install
# Build
npm run build
# Watch mode for development
npm run dev
# Lint
npm run lint
# Format code
npm run format
Troubleshooting
n8n doesn't detect the node
- Ensure n8n is restarted after installation
- Check that the node is in ~/.n8n/nodes or installed globally
- Verify that package.json has the correct n8n.nodes configuration
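For the last check, a small script can flag an obviously broken n8n.nodes entry. This is a sketch under assumptions: `checkN8nNodesConfig` is a hypothetical helper, the example path is illustrative and not this package's real path, and the `.node.js` suffix test reflects the usual community-node naming convention rather than a documented rule of this package.

```typescript
// Minimal shape of the package.json fields of interest.
interface PackageJson {
  n8n?: { nodes?: unknown };
}

// Hypothetical check: returns a list of problems, empty if n8n.nodes looks usable.
function checkN8nNodesConfig(pkg: PackageJson): string[] {
  const nodes = pkg.n8n?.nodes;
  if (!Array.isArray(nodes) || nodes.length === 0) {
    return ["package.json needs a non-empty n8n.nodes array"];
  }
  // Entries are expected to point at compiled files, conventionally *.node.js.
  return nodes
    .filter((entry) => typeof entry !== "string" || !entry.endsWith(".node.js"))
    .map((entry) => `unexpected n8n.nodes entry: ${String(entry)}`);
}

const example: PackageJson = {
  n8n: { nodes: ["dist/nodes/PdfUtils/PdfUtils.node.js"] }, // hypothetical path
};
console.log(checkN8nNodesConfig(example));
```

A common cause of a failing check is an entry that points at a TypeScript source file instead of the built output, which usually means `npm run build` has not been run.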
"pdfjs-dist" errors
If you encounter issues with pdfjs-dist, ensure you're using Node.js 16 or higher:
node --version # Should be v16.0.0 or higher
License
MIT
Author
Roberto Michelena - INFINITEK S.A.C.
Contributing
Contributions are welcome! Please open an issue or submit a pull request.
Roadmap
- [ ] Add merge operation
- [ ] Add extract pages by range
- [ ] Add rotate pages operation
- [ ] Add compress PDF operation
- [ ] Add watermark operation
