n8n-nodes-pdf-utils

v1.1.0

Custom n8n node for PDF inspection and splitting using pure npm packages

Features

🔍 Inspect Operation

  • Analyzes PDF structure
  • Counts pages
  • Detects whether the PDF is vectorial (text-based) or rasterized (image-based)
  • Extracts text from the first page
  • Performance: Very fast (tens of milliseconds)

✂️ Split Operation

  • Splits multi-page PDFs into individual pages
  • Creates one output item per page
  • Preserves PDF quality and structure

Installation

Option 1: Install from npm

npm install n8n-nodes-pdf-utils

Option 2: Install locally for development

  1. Clone this repository
  2. Install dependencies:
    npm install
  3. Build the node:
    npm run build
  4. Link to your n8n installation:
    npm link
    cd ~/.n8n/nodes
    npm link n8n-nodes-pdf-utils
  5. Restart n8n

Option 3: Install in n8n using community nodes

  1. Go to Settings > Community Nodes
  2. Click Install
  3. Enter: n8n-nodes-pdf-utils
  4. Click Install

Usage

Inspect Operation

Input: Binary data containing a PDF file

Parameters:

  • Binary Property: Name of the binary property (default: "data")
  • Text Threshold: Minimum extracted text length, in characters, for the PDF to count as vectorial (default: 50)

Output: Single item with analysis + original PDF binary

{
  "json": {
    "pageCount": 5,
    "isMultiPage": true,
    "isVectorial": false,
    "textLength": 23,
    "firstPageText": "Preview of first 200 characters..."
  },
  "binary": {
    "data": "<original PDF>"
  }
}
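The relationship between the Text Threshold parameter and the output fields can be sketched roughly as follows. This is an illustration, not the node's actual source: `summarizeInspection` is a hypothetical helper, and it assumes that the comparison is inclusive, that whitespace is trimmed before measuring, and that the preview is capped at 200 characters.

```typescript
// Shape of the "json" part of the Inspect output shown above.
interface InspectSummary {
  pageCount: number;
  isMultiPage: boolean;
  isVectorial: boolean;
  textLength: number;
  firstPageText: string;
}

// Hypothetical helper: assumes the text was already extracted elsewhere.
// Assumptions: inclusive threshold (>=), trimmed text, 200-character preview.
function summarizeInspection(
  pageCount: number,
  extractedText: string,
  textThreshold: number = 50,
  previewLength: number = 200,
): InspectSummary {
  const text = extractedText.trim();
  return {
    pageCount,
    isMultiPage: pageCount > 1,
    isVectorial: text.length >= textThreshold,
    textLength: text.length,
    firstPageText: text.slice(0, previewLength),
  };
}

// A 5-page PDF whose text layer holds only 23 characters falls below
// the default threshold of 50, so it is treated as rasterized.
const scanned = summarizeInspection(5, "x".repeat(23));
```

With the default threshold, `scanned.isVectorial` is `false`, which matches the sample output above.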

Example workflow:

HTTP Request (download PDF)
  → PDF Utils (Inspect)
    → IF (isVectorial)
      → true: Route A (text processing with PDF)
      → false: Route B (OCR processing with PDF)

Inspect and Split Operation

Input: Binary data containing a PDF file

Parameters:

  • Binary Property: Name of the binary property (default: "data")
  • Text Threshold: Minimum extracted text length, in characters, for the PDF to count as vectorial (default: 50)
  • Output Binary Property: Name for output binary property (default: "data")

Output:

  • If vectorial: Single item with analysis + original PDF (pass-through)
  • If not vectorial: Multiple items, one per page (split)

Example workflow:

HTTP Request (download PDF)
  → PDF Utils (Inspect and Split)
    → Vectorial PDFs pass through as-is
    → Scanned PDFs split into pages automatically

Use case: Automatically handle different PDF types without manual branching:

  • Text-based PDFs (vectorial) → process as whole document
  • Scanned PDFs (non-vectorial) → OCR each page individually
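
The pass-through versus split decision described in this section could look something like the sketch below. `planInspectAndSplit` and the `OutputPlan` type are hypothetical names used only for illustration, and 1-based page numbers are an assumption.

```typescript
// Hypothetical description of what the operation will emit.
type OutputPlan =
  | { mode: "pass-through"; itemCount: 1 }
  | { mode: "split"; itemCount: number; pageNumbers: number[] };

// Vectorial PDFs yield one item; everything else yields one item per page.
function planInspectAndSplit(isVectorial: boolean, pageCount: number): OutputPlan {
  if (isVectorial) {
    return { mode: "pass-through", itemCount: 1 };
  }
  // Assumption: pages are numbered starting at 1.
  const pageNumbers = Array.from({ length: pageCount }, (_, i) => i + 1);
  return { mode: "split", itemCount: pageCount, pageNumbers };
}

const textPdfPlan = planInspectAndSplit(true, 12);    // one pass-through item
const scannedPdfPlan = planInspectAndSplit(false, 3); // three items, pages 1 to 3
```

Downstream nodes then see either a single whole-document item or a stream of single-page items, so no IF node is needed to separate the two cases.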

Split Operation

Input: Binary data containing a multi-page PDF

Parameters:

  • Binary Property: Name of the input binary property (default: "data")
  • Output Binary Property: Name for output binary property (default: "data")

Output: Multiple items, one per page

  • Each item contains binary data with a single-page PDF
  • JSON includes pageNumber and originalFileName
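
As a rough picture of the items this produces, the sketch below wraps already-split page buffers into one item per page. `buildSplitItems` and `SplitItem` are hypothetical names, the binary payload is simplified to raw bytes, and 1-based page numbering is an assumption.

```typescript
// Simplified item shape: one single-page PDF plus its metadata.
interface SplitItem {
  json: { pageNumber: number; originalFileName: string };
  binary: Record<string, Uint8Array>;
}

// Hypothetical helper: assumes each buffer is already a single-page PDF.
function buildSplitItems(
  pagePdfs: Uint8Array[],
  originalFileName: string,
  outputProperty: string = "data",
): SplitItem[] {
  return pagePdfs.map((bytes, index) => ({
    // Assumption: pageNumber starts at 1.
    json: { pageNumber: index + 1, originalFileName },
    // The Output Binary Property parameter decides the key name.
    binary: { [outputProperty]: bytes },
  }));
}

// Two placeholder buffers stand in for real single-page PDFs.
const splitItems = buildSplitItems(
  [new Uint8Array([1]), new Uint8Array([2])],
  "report.pdf",
);
```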

Example workflow:

HTTP Request (download PDF)
  → PDF Utils (Split)
    → Loop Over Items
      → Process each page individually

Technical Details

Dependencies

  • pdfjs-dist (v5.4.394): For PDF analysis and text extraction (uses legacy build for Node.js)
  • pdf-lib (v1.17.1): For PDF manipulation and splitting

Why These Libraries?

  1. pdfjs-dist: Mozilla's PDF.js library, battle-tested and used in Firefox. Here it runs headless, with no canvas needed. We use the legacy build (pdfjs-dist/legacy/build/pdf.mjs), which works in Node.js environments without DOM dependencies.
  2. pdf-lib: Pure JavaScript, no native dependencies, excellent for manipulation
  3. 100% npm packages: No system-level dependencies (like Poppler, Ghostscript) and no canvas/native modules!

Performance

  • Inspect: Very fast (~10-50ms for typical PDFs)
  • Split: Fast, scales linearly with page count (~50-200ms per page)
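
To turn the per-page figure into a rough budget for a whole document, the linear estimate can be written out as below. `estimateSplitMs` is a hypothetical helper, and the 50 and 200 ms defaults are simply the bounds quoted above, not measurements.

```typescript
// Linear estimate: total time = page count x per-page time.
function estimateSplitMs(
  pageCount: number,
  minPerPageMs: number = 50,
  maxPerPageMs: number = 200,
): { minMs: number; maxMs: number } {
  // Guard against negative or fractional page counts.
  const pages = Math.max(0, Math.floor(pageCount));
  return { minMs: pages * minPerPageMs, maxMs: pages * maxPerPageMs };
}

// A 20-page PDF: 20 x 50 ms = 1000 ms up to 20 x 200 ms = 4000 ms.
const splitEstimate = estimateSplitMs(20);
```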

Development

# Install dependencies
npm install

# Build
npm run build

# Watch mode for development
npm run dev

# Lint
npm run lint

# Format code
npm run format

Troubleshooting

n8n doesn't detect the node

  1. Ensure n8n is restarted after installation
  2. Check that the node is in ~/.n8n/nodes or installed globally
  3. Verify package.json has correct n8n.nodes configuration
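
For step 3, a quick sanity check of the n8n.nodes entry might look like the sketch below. `findN8nNodesProblems` is a hypothetical helper, the `.node.js` suffix check reflects the usual community-node convention rather than anything stated in this README, and the path in the example is made up for illustration.

```typescript
// Returns human-readable problems; an empty array means the entry looks plausible.
function findN8nNodesProblems(pkg: unknown): string[] {
  const n8n = (pkg as { n8n?: { nodes?: unknown } } | null)?.n8n;
  if (!n8n) {
    return ['missing top-level "n8n" section'];
  }
  const nodes = n8n.nodes;
  if (!Array.isArray(nodes) || nodes.length === 0) {
    return ['"n8n.nodes" must be a non-empty array of file paths'];
  }
  const problems: string[] = [];
  for (const entry of nodes) {
    // Assumption: entries point at compiled files named *.node.js.
    if (typeof entry !== "string" || !entry.endsWith(".node.js")) {
      problems.push(`unexpected entry: ${JSON.stringify(entry)}`);
    }
  }
  return problems;
}

// Hypothetical package.json contents; the real path may differ.
const examplePkg = {
  name: "n8n-nodes-pdf-utils",
  n8n: { nodes: ["dist/nodes/PdfUtils/PdfUtils.node.js"] },
};
const n8nProblems = findN8nNodesProblems(examplePkg);
```

An empty `n8nProblems` array suggests the configuration is at least well-formed; it does not prove that the referenced file exists on disk.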

"pdfjs-dist" errors

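If you encounter issues with pdfjs-dist, check your Node.js version first. This package expects Node.js 16 or higher, and the installed pdfjs-dist release may declare a stricter minimum in the engines field of its own package.json: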

node --version  # Should be v16.0.0 or higher
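
The same check can be done programmatically, for example in a startup script. The sketch below is one possible way to do it: `isNodeVersionAtLeast` is a hypothetical helper, and the default of 16 is taken from the requirement stated above.

```typescript
// Compares the major part of a version string such as "v16.0.0".
function isNodeVersionAtLeast(version: string, minMajor: number = 16): boolean {
  // Accept both "v16.0.0" (process.version) and "16.0.0" (process.versions.node).
  const match = /^v?(\d+)\./.exec(version.trim());
  if (!match) {
    return false; // Unparseable input is treated as unsupported.
  }
  return Number(match[1]) >= minMajor;
}

// In a Node.js script this would typically be called as
// isNodeVersionAtLeast(process.version).
const supported = isNodeVersionAtLeast("v16.0.0");
const tooOld = isNodeVersionAtLeast("v14.21.3");
```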

License

MIT

Author

Roberto Michelena - INFINITEK S.A.C.

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

Roadmap

  • [ ] Add merge operation
  • [ ] Add extract pages by range
  • [ ] Add rotate pages operation
  • [ ] Add compress PDF operation
  • [ ] Add watermark operation