n8n-nodes-docx-converter-enhanced

v1.0.0

Enhanced n8n community node for DOCX to text conversion with RAG capabilities, page-aware chunking, and metadata extraction. Fork of n8n-nodes-docx-converter with advanced features for AI/ML workflows.

🚀 Enhanced fork of n8n-nodes-docx-converter with advanced RAG capabilities!

This n8n community node converts DOCX files to text and adds support for RAG (Retrieval-Augmented Generation) pipelines: page-aware chunking, document metadata extraction, and structure analysis for AI/ML workflows.

✨ New Features (Enhanced Version)

  • 📄 Page-Aware Chunking: Intelligent text chunking that preserves page boundaries
  • 🧠 RAG-Ready Output: Optimized for AI/ML and RAG systems
  • 📊 Metadata Extraction: Document properties, word count, estimated pages
  • 🏗️ Structure Analysis: Heading detection and document structure mapping
  • 🔄 Multiple Output Modes: Legacy text-only, enhanced metadata, or RAG chunks
  • Backward Compatible: Works with existing workflows

n8n is a fair-code licensed workflow automation platform.

📋 Table of Contents

Installation
Operations
Enhanced Features
Credentials
Compatibility
Usage
Attribution
Resources
Version History

Installation

Follow the installation guide in the n8n community nodes documentation.

Operations

DOCX to Text (Legacy)

  • Convert DOCX file to plain text (backward compatible)

DOCX to Text Enhanced

  • Convert DOCX with metadata extraction
  • Page-aware chunking for RAG systems
  • Document structure analysis
  • Multiple output formats

Enhanced Features

🎯 Output Modes

  1. Text Only (Legacy): Simple text extraction for backward compatibility
  2. Enhanced with Metadata: Text + document metadata + structure analysis
  3. RAG-Ready Chunks: Page-aware chunks optimized for AI/ML workflows

📊 Metadata Extraction

  • Document title, author, creation/modification dates
  • Word count and estimated page count
  • Subject and description fields
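
As a rough illustration of the word count and estimated page count fields, the sketch below counts whitespace-separated words and divides by a fixed words-per-page figure. The helper names and the 250 words-per-page constant are assumptions, not the node's documented behavior; the constant is only inferred from the sample output in this README (1250 words, 5 pages).

```typescript
// Hypothetical helpers; the node's real implementation may differ.

// Assumption: 250 words per page, inferred from the README sample
// (wordCount 1250, pageCount 5). The node may use a different value.
const WORDS_PER_PAGE = 250;

// Count whitespace-separated words; an empty or blank string has none.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Round up so that any non-empty document reports at least one page.
function estimatePages(
  wordCount: number,
  wordsPerPage: number = WORDS_PER_PAGE
): number {
  return wordCount === 0 ? 0 : Math.ceil(wordCount / wordsPerPage);
}
```

A DOCX file does not store fixed page breaks for flowing text, which is presumably why the page count is described as estimated.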

🧩 Page-Aware Chunking

  • Configurable chunk size (words)
  • Overlapping chunks for context preservation
  • Page boundary preservation
  • Section and heading awareness
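
The first two bullets (chunk size in words, overlapping chunks) can be sketched as a sliding window over the document's words. This is a minimal sketch under stated assumptions, not the node's actual code: `chunkWords` and `WordChunk` are hypothetical names, `position` is treated here as word offsets, and page and section awareness are left out entirely.

```typescript
// Hypothetical shape; only a subset of the fields in the README's chunk output.
interface WordChunk {
  content: string;
  chunkIndex: number;
  // Assumption: offsets are word indices, with `end` exclusive.
  position: { start: number; end: number };
}

// Split text into chunks of `chunkSize` words, where each chunk repeats the
// last `overlap` words of the previous one.
function chunkWords(text: string, chunkSize = 300, overlap = 50): WordChunk[] {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be at least 0 and less than chunkSize");
  }

  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const step = chunkSize - overlap; // how far the window advances each time
  const chunks: WordChunk[] = [];

  for (let start = 0; start < words.length; start += step) {
    const end = Math.min(start + chunkSize, words.length);
    chunks.push({
      content: words.slice(start, end).join(" "),
      chunkIndex: chunks.length,
      position: { start, end },
    });
    if (end === words.length) break; // the last window reached the end
  }
  return chunks;
}
```

With a 10-word text, `chunkSize` 4 and `overlap` 1, this produces three chunks covering words 0-3, 3-6 and 6-9, so each boundary word appears in both neighboring chunks.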

🏗️ Structure Analysis

  • Heading detection and hierarchy
  • Section counting
  • Document outline extraction
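
One way to picture heading detection is to scan the HTML produced by the conversion step for `h1` to `h6` elements. The sketch below uses a regular expression only to stay dependency-free; it is an illustration, not the node's implementation. The version history lists cheerio as a dependency, and a real HTML parser of that kind would be more robust than a regex. `extractHeadings` and `Heading` are hypothetical names.

```typescript
interface Heading {
  level: number; // 1 for <h1> through 6 for <h6>
  text: string;  // heading text with inline tags removed
}

// Collect headings in document order from an HTML string.
// Regex-based for brevity; nested or malformed markup is not handled.
function extractHeadings(html: string): Heading[] {
  const headingPattern = /<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi;
  const headings: Heading[] = [];

  let match: RegExpExecArray | null;
  while ((match = headingPattern.exec(html)) !== null) {
    const inner = match[2] ?? "";
    // Strip inline tags such as <em> and collapse runs of whitespace.
    const text = inner.replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim();
    if (text !== "") {
      headings.push({ level: Number(match[1]), text });
    }
  }
  return headings;
}
```

The `level` values give the hierarchy, the array order gives the outline, and its length gives a simple section count.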

Credentials

No credentials are required for this node.

Compatibility

This node requires n8n version 1.0.0 or higher. It has been tested with the latest version of n8n.

Usage

Basic Usage (Legacy Mode)

  1. Add the "DOCX to Text" or "DOCX to Text Enhanced" node to your workflow
  2. Configure the input binary field containing your DOCX file
  3. Choose "Text Only (Legacy)" output mode for simple text extraction

Enhanced Usage (RAG Mode)

  1. Add the "DOCX to Text Enhanced" node
  2. Set output mode to "RAG-Ready Chunks"
  3. Configure chunk size (default: 300 words)
  4. Set chunk overlap (default: 50 words)
  5. Enable HTML conversion for better structure preservation
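
The settings in the steps above can be pictured as a small options object. The property names and mode identifiers below are hypothetical (the node's internal parameter names are not documented here); only the defaults of 300 and 50 words come from this README.

```typescript
// Hypothetical identifiers for the three output modes described in this README.
type OutputMode = "textOnly" | "enhanced" | "ragChunks";

// Hypothetical option names; the node's real parameter names may differ.
interface DocxConvertOptions {
  outputMode: OutputMode;
  chunkSize: number;      // words per chunk
  chunkOverlap: number;   // words shared between neighboring chunks
  convertToHtml: boolean; // convert to HTML first to keep structure
}

const RAG_DEFAULTS: DocxConvertOptions = {
  outputMode: "ragChunks",
  chunkSize: 300,   // default stated in the README
  chunkOverlap: 50, // default stated in the README
  convertToHtml: true,
};

// Merge user overrides onto the defaults and reject inconsistent values.
function resolveOptions(
  overrides: Partial<DocxConvertOptions> = {}
): DocxConvertOptions {
  const options: DocxConvertOptions = { ...RAG_DEFAULTS, ...overrides };
  if (!Number.isInteger(options.chunkSize) || options.chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  if (
    !Number.isInteger(options.chunkOverlap) ||
    options.chunkOverlap < 0 ||
    options.chunkOverlap >= options.chunkSize
  ) {
    throw new RangeError("chunkOverlap must be at least 0 and less than chunkSize");
  }
  return options;
}
```

Keeping the overlap strictly smaller than the chunk size matters in practice: an overlap equal to the chunk size would mean each chunk starts where the previous one did.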

Output Examples

Enhanced Mode Output:

{
  "text": "Full document text...",
  "metadata": {
    "title": "Document Title",
    "author": "Author Name",
    "wordCount": 1250,
    "pageCount": 5
  },
  "structure": {
    "headings": ["Introduction", "Methods", "Results"],
    "sections": 3,
    "estimatedPages": 5
  }
}

RAG Chunks Output:

{
  "chunks": [
    {
      "content": "Chunk text content...",
      "pageStart": 1,
      "pageEnd": 1,
      "section": "Introduction",
      "chunkIndex": 0,
      "position": { "start": 0, "end": 300 }
    }
  ],
  "metadata": { ... },
  "totalChunks": 15
}
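
For downstream code (for example an n8n Code node or an external service), the RAG output above could be typed and reshaped as in the following sketch. The interfaces mirror the field names in the sample; the `metadata` contents are elided in the sample, so they are typed loosely here. `EmbeddingRecord` and `toEmbeddingRecords` are hypothetical names for illustration and are not part of the node.

```typescript
// Field names taken from the sample RAG output in this README.
interface RagChunk {
  content: string;
  pageStart: number;
  pageEnd: number;
  section: string;
  chunkIndex: number;
  position: { start: number; end: number };
}

interface RagOutput {
  chunks: RagChunk[];
  metadata: Record<string, unknown>; // contents elided in the README sample
  totalChunks: number;
}

// Hypothetical record shape for handing chunks to an embedding or vector store step.
interface EmbeddingRecord {
  id: string;
  text: string;
  pages: string;
  section: string;
}

// Turn each chunk into a record with a stable ID and a readable page range.
function toEmbeddingRecords(docId: string, output: RagOutput): EmbeddingRecord[] {
  return output.chunks.map((chunk) => ({
    id: `${docId}#${chunk.chunkIndex}`,
    text: chunk.content,
    pages:
      chunk.pageStart === chunk.pageEnd
        ? `${chunk.pageStart}`
        : `${chunk.pageStart}-${chunk.pageEnd}`,
    section: chunk.section,
  }));
}
```

Carrying the page range and section alongside each chunk lets a retrieval step cite where in the document an answer came from.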

Attribution

🙏 This project is a fork of n8n-nodes-docx-converter by Blake Martin.

Original Repository: https://github.com/cre8tiv/n8n-docx-converter
Original Author: Blake Martin
License: MIT

We extend our gratitude to the original author for creating the foundation that made these enhancements possible.

Resources

Version History

1.0.0 (Enhanced Fork)

  • 🚀 Major Enhancement Release
  • ✨ Added RAG-ready chunking with page awareness
  • 📊 Comprehensive metadata extraction
  • 🏗️ Document structure analysis
  • 🔄 Multiple output modes (legacy, enhanced, RAG chunks)
  • 📄 Page boundary preservation in chunks
  • 🧠 Optimized for AI/ML workflows
  • ⚡ Maintained backward compatibility
  • 🛠️ Added new dependencies: jszip, cheerio
  • 📝 Enhanced documentation and examples

0.1.3 (Original)

  • Use input and output destinations

0.1.0 (Original)

  • Initial release by Blake Martin