n8n-nodes-power-document-extractor

v0.13.7

Power Document Extractor – universal local document parser for n8n

Power Document Extractor for n8n

📖 Overview

Power Document Extractor is a comprehensive n8n community node for extracting structured content from 17+ document formats, entirely locally on your server. No external APIs, no cloud services—just reliable, privacy-focused document parsing.

Perfect for document processing workflows, content analysis, data migration, AI-powered document understanding, and automated information extraction pipelines.

✨ Features

  • 🔒 100% Local Processing - All document parsing happens on your server
  • 📄 17+ Supported Formats - PDF, DOCX, DOC, XLSX, XLS, CSV, TXT, RTF, EPUB, FB2, Markdown, HTML, XML, PPT, PPTX, ODS, ODG
  • 🎯 Auto-Detection - Universal Extractor automatically identifies document format
  • 🧩 Structured Output - Extracts content as structured blocks (paragraphs, headings, tables, lists)
  • 🎚️ Flexible Detail Levels - Choose between Raw, Basic, or Full structured output
  • 📊 Rich Metadata - Extracts document metadata (author, title, page count, dates, etc.)
  • LibreOffice Integration - Optional LibreOffice server for legacy formats (DOC, RTF, PPT)

📦 Installation

Via n8n Community Nodes

  1. Go to Settings > Community Nodes in your n8n instance
  2. Click Install and enter: n8n-nodes-power-document-extractor
  3. Click Install

Via npm

npm install n8n-nodes-power-document-extractor

Manual Installation

cd ~/.n8n/nodes
git clone https://github.com/ZBlaZe/n8n-nodes-power-document-extractor.git
cd n8n-nodes-power-document-extractor
npm install
npm run build

🚀 Supported Formats

| Format | Extension | Native Support | LibreOffice Required | Status |
|--------|-----------|----------------|----------------------|--------|
| PDF | .pdf | ✅ Yes | ❌ No | ⚠️ Beta |
| Plain Text | .txt | ✅ Yes | ❌ No | ✅ Stable |
| CSV | .csv | ✅ Yes | ❌ No | ✅ Stable |
| Markdown | .md | ✅ Yes | ❌ No | ✅ Stable |
| HTML | .html, .htm | ✅ Yes | ❌ No | ✅ Stable |
| XML | .xml | ✅ Yes | ❌ No | ✅ Stable |
| Excel | .xlsx, .xls | ✅ Yes | ❌ No | ✅ Stable |
| Word (Modern) | .docx | ⚠️ Partial | ✅ Yes (recommended) | ⚠️ Beta |
| Word (Legacy) | .doc | ❌ No | ✅ Yes | ⚠️ Beta |
| RTF | .rtf | ❌ No | ✅ Yes | ⚠️ Beta |
| PowerPoint | .ppt, .pptx | ❌ No | ✅ Yes | ⚠️ Beta |
| OpenDocument | .ods, .odg | ⚠️ Partial | ✅ Yes (recommended) | ⚠️ Beta |
| EPUB | .epub | ✅ Yes | ❌ No | ⚠️ Beta |
| FictionBook | .fb2 | ✅ Yes | ❌ No | ⚠️ Beta |

Legend:

  • ✅ Stable - Fully tested and production-ready
  • ⚠️ Beta - Functional but may have edge cases
  • 🚧 In Development - Work in progress
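
If you route files before they reach the node, the "LibreOffice Required" column can be encoded as a small lookup. The following is a minimal Python sketch, not part of the node itself: the names `LIBREOFFICE_NEED` and `libreoffice_need` are hypothetical, and the values are transcribed from the table above.

```python
from pathlib import Path

# Transcribed from the "LibreOffice Required" column of the Supported Formats table.
# "no": native parser only, "recommended": partial native support,
# "required": no native parser.
LIBREOFFICE_NEED = {
    ".pdf": "no", ".txt": "no", ".csv": "no", ".md": "no",
    ".html": "no", ".htm": "no", ".xml": "no",
    ".xlsx": "no", ".xls": "no", ".epub": "no", ".fb2": "no",
    ".docx": "recommended", ".ods": "recommended", ".odg": "recommended",
    ".doc": "required", ".rtf": "required", ".ppt": "required", ".pptx": "required",
}


def libreoffice_need(file_name: str) -> str:
    """Return 'no', 'recommended', 'required', or 'unsupported' for a file name."""
    suffix = Path(file_name).suffix.lower()
    return LIBREOFFICE_NEED.get(suffix, "unsupported")


print(libreoffice_need("contract.doc"))
```

A workflow could use the result to skip or flag files marked `required` when no LibreOffice Server URL is configured.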

🎮 Usage

Basic Example

  1. Add Power Document Extractor node to your workflow
  2. Connect a node that provides binary file data (e.g., HTTP Request, Read Binary File)
  3. Configure the node:
    • Operation: Universal Extractor (auto-detects format)
    • Binary Property: data (or your binary property name)
    • Structured Level: Full (recommended)

Operations

Universal Extractor (Recommended)

Automatically detects document format and extracts content using the optimal parser.

Format-Specific Extractors

Available for all supported formats if you want explicit control:

  • Extract PDF
  • Extract DOCX
  • Extract XLSX
  • Extract TXT
  • Extract CSV
  • Extract Markdown
  • Extract HTML
  • ... and more

Structured Levels

Choose the level of detail in extracted content:

  • Raw - Single text string with minimal formatting
  • Basic - Paragraphs and basic structure
  • Full - Complete structure with headings, tables, lists, metadata (recommended)

📤 Output Format

Example Output

{
  "blocks": [
    {
      "type": "heading",
      "level": 1,
      "text": "Annual Report 2024",
      "id": "h-1",
      "page": 1
    },
    {
      "type": "paragraph",
      "text": "This report provides an overview of our company's performance...",
      "id": "p-1",
      "page": 1
    },
    {
      "type": "table",
      "headers": ["Quarter", "Revenue", "Growth"],
      "rows": [
        ["Q1", "$1.2M", "15%"],
        ["Q2", "$1.5M", "25%"],
        ["Q3", "$1.8M", "20%"],
        ["Q4", "$2.1M", "17%"]
      ],
      "id": "table-1",
      "page": 2
    }
  ],
  "metadata": {
    "fileName": "annual_report_2024.pdf",
    "fileSize": 245680,
    "fileType": "pdf",
    "mimeType": "application/pdf",
    "pageCount": 12,
    "author": "John Smith",
    "title": "Annual Report 2024",
    "creationDate": "2024-01-15",
    "modificationDate": "2024-11-20"
  }
}
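
Downstream steps often want table rows keyed by column name. The sketch below assumes a table block shaped exactly like the one in the example output (`headers` plus `rows`); the helper name `table_to_records` is hypothetical and not part of the node.

```python
def table_to_records(block: dict) -> list[dict]:
    """Turn one 'table' block into a list of {header: cell} dicts."""
    if block.get("type") != "table":
        raise ValueError("expected a block with type 'table'")
    headers = block.get("headers", [])
    # zip pairs each cell with its column header, row by row
    return [dict(zip(headers, row)) for row in block.get("rows", [])]


# Table block copied from the example output (first two rows only).
table_block = {
    "type": "table",
    "headers": ["Quarter", "Revenue", "Growth"],
    "rows": [["Q1", "$1.2M", "15%"], ["Q2", "$1.5M", "25%"]],
    "id": "table-1",
    "page": 2,
}

records = table_to_records(table_block)
print(records[0])  # {'Quarter': 'Q1', 'Revenue': '$1.2M', 'Growth': '15%'}
```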

Block Types

  • paragraph - Text paragraphs
  • heading - Document headings (with level 1-6)
  • table - Tables with headers and rows
  • list - Bulleted or numbered lists
  • image - Image references (planned for future versions)
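
One way to consume these block types is to render them back into Markdown. This is a minimal sketch under stated assumptions: heading, paragraph, and table blocks follow the example output, while the `items` field on list blocks is a guess, since this README does not show a list block. `blocks_to_markdown` is a hypothetical helper name.

```python
def blocks_to_markdown(blocks: list[dict]) -> str:
    """Render extracted blocks as Markdown text, skipping unknown block types."""
    parts = []
    for block in blocks:
        kind = block.get("type")
        if kind == "heading":
            # Clamp the level to the documented 1-6 range
            level = min(max(int(block.get("level", 1)), 1), 6)
            parts.append("#" * level + " " + block.get("text", ""))
        elif kind == "paragraph":
            parts.append(block.get("text", ""))
        elif kind == "table":
            headers = block.get("headers", [])
            lines = [
                "| " + " | ".join(str(h) for h in headers) + " |",
                "| " + " | ".join("---" for _ in headers) + " |",
            ]
            for row in block.get("rows", []):
                lines.append("| " + " | ".join(str(c) for c in row) + " |")
            parts.append("\n".join(lines))
        elif kind == "list":
            # Assumption: "items" is a hypothetical field name for list entries
            items = block.get("items", [])
            if items:
                parts.append("\n".join(f"- {item}" for item in items))
        # image blocks and anything unrecognized are ignored
    return "\n\n".join(parts)


sample = [
    {"type": "heading", "level": 1, "text": "Annual Report 2024"},
    {"type": "paragraph", "text": "This report provides an overview."},
]
print(blocks_to_markdown(sample))
```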

🐳 LibreOffice Server Setup (Optional)

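DOC, RTF, PPT, and PPTX have no native parser and need a LibreOffice server; DOCX, ODS, and ODG are only partially supported natively and also benefit from one. You can set up a LibreOffice server using Docker.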

Quick Start with Docker

docker run -d \
  --name libreoffice-server \
  -p 33101:2004 \
  ghcr.io/unoconv/unoserver-docker:latest

Configuration in n8n

In the Power Document Extractor node:

  • LibreOffice Server URL: http://localhost:33101 (or your server IP)

Important Notes

  • ⚠️ Do not expose LibreOffice port to the internet - it's not secured by default
  • 🔒 Use firewall rules or Docker networks to restrict access
  • 🚀 LibreOffice container should run on the same network as n8n for best performance

For detailed setup instructions, see unoserver documentation.


⚙️ Configuration

Node Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| Operation | Select | Universal Extractor | Extraction method (auto or format-specific) |
| Binary Property | String | data | Name of the binary property containing the file |
| Structured Level | Select | Full | Level of detail in output (Raw/Basic/Full) |
| LibreOffice Server URL | String | (empty) | Optional LibreOffice server URL for legacy formats |


🎯 Use Cases

  • 📊 Data Migration - Extract content from legacy documents for database import
  • 🤖 AI/LLM Integration - Prepare document content for AI analysis and processing
  • 🔍 Document Indexing - Build searchable document databases
  • 📝 Content Management - Automated document processing workflows
  • 📧 Email Attachment Processing - Extract and analyze attachments automatically
  • 🗄️ Archive Digitization - Convert old documents to structured data
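
For the AI/LLM and indexing use cases, a common preparation step is to group text under its nearest heading so each chunk carries its own context. The sketch below assumes blocks shaped like the example output (`type`, `text`, `page`); `chunk_by_heading` is a hypothetical helper, and it deliberately ignores tables and lists to stay short.

```python
def chunk_by_heading(blocks: list[dict]) -> list[dict]:
    """Group paragraph text under the most recent heading."""
    chunks = []
    current = {"heading": None, "page": None, "text": []}
    for block in blocks:
        kind = block.get("type")
        if kind == "heading":
            # Close the previous chunk before starting a new one
            if current["text"] or current["heading"] is not None:
                chunks.append(current)
            current = {
                "heading": block.get("text"),
                "page": block.get("page"),
                "text": [],
            }
        elif kind == "paragraph":
            if current["page"] is None:
                current["page"] = block.get("page")
            current["text"].append(block.get("text", ""))
    if current["text"] or current["heading"] is not None:
        chunks.append(current)
    return [
        {"heading": c["heading"], "page": c["page"], "text": "\n".join(c["text"])}
        for c in chunks
    ]


blocks = [
    {"type": "heading", "level": 1, "text": "Annual Report 2024", "page": 1},
    {"type": "paragraph", "text": "Overview paragraph.", "page": 1},
]
print(chunk_by_heading(blocks))
```

Each resulting chunk keeps the heading and starting page, which makes it easier to cite the source location when the text is later indexed or passed to a model.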

⚠️ Known Limitations

  • 🖼️ Image Extraction - Not yet supported (planned for a future release)
  • 🔤 Text Encoding - Some legacy documents may have encoding issues
  • 📄 Complex Layouts - Advanced page layouts may not be fully preserved
  • ⏱️ Large Files - Very large files (>100MB) may take longer to process

Development Status: This node is actively maintained and under continuous improvement. Bug reports and feature requests are welcome!


🗺️ Roadmap

Version 0.11.0-0.12.0 (Planned)

  • 🖼️ Base64 image extraction from documents
  • 🔤 Improved text encoding detection and handling
  • 🎨 Better formatting preservation for complex documents

Future Versions

  • 📊 Advanced table structure detection
  • 🔗 Hyperlink extraction
  • 📝 Document annotations and comments
  • 🌍 Multi-language OCR support
  • ⚡ Performance optimizations for large files

🐛 Troubleshooting

Common Issues

"LibreOffice conversion failed"

Solution:

  1. Ensure LibreOffice server is running: docker ps | grep libreoffice
  2. Check URL is correct: http://localhost:33101 (not https://)
  3. Verify server is accessible from n8n container
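
Steps 2 and 3 can be scripted. The following is a rough Python sketch, meant to be run from the same host or container as n8n; `check_libreoffice_url` is a hypothetical helper. It only checks the URL scheme and whether a TCP connection to the host and port succeeds. It does not confirm that the service answering is actually a LibreOffice server.

```python
import socket
from urllib.parse import urlparse


def check_libreoffice_url(url: str, timeout: float = 3.0) -> list[str]:
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "http":
        problems.append(f"scheme is {parsed.scheme!r}, expected 'http'")
    try:
        port = parsed.port
    except ValueError:
        port = None  # the port in the URL is not a valid number
    if not parsed.hostname or port is None:
        problems.append("URL must include a host and a port, e.g. http://localhost:33101")
        return problems
    try:
        # Plain TCP connect: succeeds if something is listening on host:port
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            pass
    except OSError as exc:
        problems.append(f"cannot reach {parsed.hostname}:{port} ({exc})")
    return problems


if __name__ == "__main__":
    for problem in check_libreoffice_url("http://localhost:33101"):
        print("Problem:", problem)
```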

Text appears garbled or with wrong characters

Possible causes:

  • Legacy encoding in old documents
  • Font substitution issues

Temporary workaround: try converting the document through the LibreOffice server.

Node execution timeout

For large files:

  • Increase n8n execution timeout in settings
  • Consider splitting large documents
  • Use LibreOffice server for faster processing

Empty output for supported format

Check:

  • File is not password-protected
  • File is not corrupted
  • File contains actual text (not just images)
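
To catch this case automatically, a workflow can test whether the result contains any text before passing it on. This is a minimal sketch assuming the Basic or Full output shape shown in the example output (a `blocks` array). The Raw level's shape is not shown in this README, so it is not handled. `has_extracted_text` is a hypothetical helper name.

```python
def has_extracted_text(result: dict) -> bool:
    """Return True if any block carries non-whitespace text or table cells."""
    for block in result.get("blocks", []):
        if block.get("type") == "table":
            cells = [cell for row in block.get("rows", []) for cell in row]
            if any(str(cell).strip() for cell in cells):
                return True
        elif str(block.get("text", "")).strip():
            return True
    return False


empty_result = {"blocks": [], "metadata": {"fileType": "pdf", "pageCount": 3}}
if not has_extracted_text(empty_result):
    print("No text extracted: file may be scanned, protected, or corrupted")
```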

🤝 Contributing

Contributions are welcome! If you find a bug or have a feature request:

  1. Check existing issues
  2. Create a new issue with detailed description
  3. Submit a pull request with your improvements

📜 License

Proprietary / All Rights Reserved
This project is closed-source.
The source code is not available for public viewing or modification.


Made with ❤️ for the n8n community