n8n-nodes-doc4ai

v1.0.4

n8n community node for Doc4AI: Extract text from PDF, DOC, DOCX, XLS, XLSX, CSV, and convert text back to documents.

Downloads: 494

Developed and maintained by Jay Nguyen (Nguyễn Thiệu Toàn).

🛡️ Verified n8n Creator | 💼 CEO/Founder of GenStaff


Doc4AI is a high-performance n8n community node that converts documents in a range of formats into clean, structured Markdown, HTML, or plain text optimized for LLMs, RAG pipelines, and AI agents. It can also generate standard files (XLSX, CSV, HTML, TXT) from text or JSON data.

Features

1. Extract Text from File (Binary to Text)

Extract text from a document and convert it into one of three output formats: Markdown, HTML, or Plain Text.

  • Supported Formats: PDF (all versions), DOC, DOCX, XLS, XLSX, CSV, ODS.
  • Markdown Preservation: Converts Word tables, headings, bold/italic structures, and Excel sheets into GitHub Flavored Markdown (GFM) tables and elements, perfect for LLM ingestion.
  • Advanced Features:
    • PDF Page Range: Extract only specific pages (e.g., 1-3, 5, 8-10).
    • Excel Sheet Selection: Parse specific sheets or combine all sheets.
    • Header Control: Use the first row as columns or autogenerate Excel letter columns (A, B, C...).
    • File Size Protection: Restrict processing with a maximum file size limit in MB (default: 30 MB; larger files are rejected).
    • Metrics: Returns file byte size, raw extracted character count, and formatted/converted character count.
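
To make the page-range and header options above more concrete, here is a minimal sketch of how a range string like "1-3, 5, 8-10" could be expanded and how Excel-style column letters could be generated. The function names `parsePageRange` and `columnLetter` are hypothetical and are not taken from the Doc4AI source; the clamping behaviour for out-of-range pages is also an assumption.

```typescript
// Hypothetical helper: expand a page-range string such as "1-3, 5, 8-10"
// into a sorted list of unique 1-based page numbers.
function parsePageRange(spec: string, pageCount: number): number[] {
  const pages: number[] = [];
  for (const part of spec.split(",")) {
    const token = part.trim();
    if (token === "") continue; // tolerate stray commas
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(token);
    if (match === null) {
      throw new Error(`Invalid page range token: "${token}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${token}"`);
    }
    // Assumption: pages past the end of the document are silently dropped.
    for (let page = start; page <= Math.min(end, pageCount); page++) {
      pages.push(page);
    }
  }
  // Sort numerically, then drop adjacent duplicates.
  return pages
    .sort((a, b) => a - b)
    .filter((page, i, all) => i === 0 || page !== all[i - 1]);
}

// Hypothetical helper: convert a 0-based column index into an
// Excel-style letter label (A, B, ..., Z, AA, AB, ...).
function columnLetter(index: number): string {
  let label = "";
  for (let n = index + 1; n > 0; n = Math.floor((n - 1) / 26)) {
    label = String.fromCharCode(65 + ((n - 1) % 26)) + label;
  }
  return label;
}
```

For a 9-page PDF, `parsePageRange("1-3, 5, 8-10", 9)` yields pages 1, 2, 3, 5, 8, and 9 under these assumptions; the real node may instead reject ranges that exceed the page count.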

2. Convert Text to File (Text/JSON to Binary)

Generate binary files directly from text strings or JSON payloads.

  • Output Formats: CSV, Excel (XLSX), HTML, Plain Text (TXT).
  • Spreadsheet Generation: Auto-converts JSON arrays of objects or 2D arrays into sheet rows.
  • Customization: Configure custom Excel sheet names or CSV delimiters.
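
As a rough illustration of the spreadsheet generation described above, the sketch below normalizes either a JSON array of objects or a 2D array into rows and serializes them as CSV with a configurable delimiter. The names `toRows` and `toCsv`, the `Cell` type, and the choice to emit a header row from object keys are assumptions made for this example, not the node's actual implementation.

```typescript
type Cell = string | number | boolean | null;
type Row = Cell[];

// Hypothetical helper: accept an array of objects or a 2D array and
// return sheet rows. For objects, the first row holds the column names.
function toRows(data: Array<Record<string, Cell>> | Row[]): Row[] {
  if (data.length === 0) return [];
  if (Array.isArray(data[0])) return data as Row[];
  const records = data as Array<Record<string, Cell>>;
  // Collect keys in first-seen order so sparse objects still line up.
  const headers: string[] = [];
  for (const record of records) {
    for (const key of Object.keys(record)) {
      if (headers.indexOf(key) === -1) headers.push(key);
    }
  }
  const body = records.map((record) =>
    headers.map((key): Cell => {
      const value = record[key];
      return value === undefined ? null : value; // missing key -> empty cell
    }),
  );
  return [headers, ...body];
}

// Quote a cell when it contains the delimiter, a double quote, or a line
// break (the usual RFC 4180 convention); null becomes an empty field.
function toCsv(rows: Row[], delimiter = ","): string {
  const escapeCell = (cell: Cell): string => {
    const text = cell === null ? "" : String(cell);
    const needsQuotes =
      text.indexOf(delimiter) !== -1 || /["\r\n]/.test(text);
    return needsQuotes ? '"' + text.replace(/"/g, '""') + '"' : text;
  };
  return rows.map((row) => row.map(escapeCell).join(delimiter)).join("\r\n");
}
```

Calling `toCsv(toRows(items), ";")` would produce semicolon-delimited output; XLSX output would need a spreadsheet library and is outside the scope of this sketch.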

Why Doc4AI? (Anti-Collision Architecture)

Unlike standard document nodes (e.g., nodes relying on officeparser or unbundled loaders), Doc4AI completely prevents runtime version collision errors such as:

    PDF processing error: [OfficeParser]: The API version "5.6.205" does not match the Worker version "5.3.31".

This is achieved with a self-contained bundle, compiled with esbuild, that encapsulates all parsing engines (pdf-parse, mammoth, word-extractor, xlsx) in isolation from the host. The result is zero dependency pollution and zero runtime collisions with the main n8n core packages.
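
For readers unfamiliar with this approach, a build script along the following lines could produce such a bundle. This is only a sketch using esbuild's `build` API: the entry and output paths, the Node target, and the decision to leave `n8n-workflow` external are assumptions for illustration and are not taken from the Doc4AI repository.

```typescript
import { build } from "esbuild";

// Assumed layout: the paths and the externals list below are illustrative.
build({
  entryPoints: ["nodes/Doc4AI/Doc4AI.node.ts"], // hypothetical entry file
  outfile: "dist/nodes/Doc4AI/Doc4AI.node.js", // hypothetical output path
  bundle: true, // inline pdf-parse, mammoth, word-extractor, and xlsx
  platform: "node",
  format: "cjs",
  target: "node18", // assumed minimum Node.js version
  external: ["n8n-workflow"], // assumed to be supplied by the host n8n instance
}).catch((error: unknown) => {
  console.error(error);
});
```

Because the parsers are inlined into a single output file, the node resolves its own copies at runtime rather than whatever versions other packages in the n8n installation happen to pull in.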


Installation

Via n8n UI (Recommended)

  1. Go to Settings > Community nodes > Install.
  2. Enter the package name: n8n-nodes-doc4ai.
  3. Agree to the terms and click Install.
  4. Restart your n8n instance if self-hosting.

Via Command Line

Navigate to your n8n directory (usually ~/.n8n/) and run:

npm install n8n-nodes-doc4ai

Restart n8n to load the node.


License

This project is licensed under the MIT License.