n8n-nodes-liteparse-local

v0.3.2

Published 6 days ago

Full-featured document parsing node for n8n — PDF, DOCX, XLSX, PPTX, images, OCR, markdown output. No native dependencies.

Vulnerabilities: 0 High, 0 Medium, 0 Low

raevon

n8n-community-node-package n8n pdf parser ocr document-parsing markdown text-extraction docx xlsx

Full-featured document parsing node for n8n. No native dependencies — works in any Docker container.

Features

Operations

| Operation | Description |
|-----------|-------------|
| Parse Document | Extract text, tables, and structure from documents |
| OCR Image | Extract text from images using Tesseract OCR |
| Extract Tables | Extract tables as structured JSON |
| Convert to Markdown | Convert document to clean markdown |
| Merge PDFs | Combine multiple PDF files into one |
| Split PDF | Split PDF into separate pages |
| Extract Entities | Extract emails, phones, URLs, dates from text |

Supported Formats

| Format | Extensions | Parse | Tables | Markdown |
|--------|-----------|-------|--------|----------|
| PDF | .pdf | ✅ | ✅ | ✅ |
| Word | .docx, .doc | ✅ | ❌ | ✅ |
| Excel | .xlsx, .xls | ✅ | ✅ | ❌ |
| PowerPoint | .pptx, .ppt | ✅ | ❌ | ❌ |
| HTML | .html, .htm | ✅ | ❌ | ✅ |
| CSV | .csv | ✅ | ❌ | ❌ |
| Images | .png, .jpg, .tiff, .bmp, .webp | ✅ OCR | ❌ | ❌ |
| Text | .txt, .md, .json, .xml | ✅ | ❌ | ❌ |

OCR Languages

English, Arabic, Chinese (Simplified/Traditional), French, German, Hindi, Japanese, Korean, Portuguese, Russian, Spanish, Urdu

Entity Extraction

Email addresses
Phone numbers
URLs
Dates
IP addresses
Currency amounts
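
As a rough illustration of how pattern-based entity extraction of this kind can work, here is a minimal TypeScript sketch covering three of the entity types listed above. The patterns, the `ENTITY_PATTERNS` map, and the `extractEntities` function are hypothetical and written for this example only; the node's actual expressions and output field names are not documented here and may differ.

```typescript
// Hypothetical patterns for illustration only; not the node's real expressions.
const ENTITY_PATTERNS: Record<string, RegExp> = {
  emails: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  urls: /https?:\/\/[^\s<>"')]+/g,
  ipAddresses: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,
};

type Entities = Record<string, string[]>;

// Run every pattern over the text and return de-duplicated matches per entity type.
function extractEntities(text: string): Entities {
  const result: Entities = {};
  for (const name of Object.keys(ENTITY_PATTERNS)) {
    const matches: string[] = text.match(ENTITY_PATTERNS[name]) || [];
    // Keep only the first occurrence of each match, preserving order.
    result[name] = matches.filter((m, i) => matches.indexOf(m) === i);
  }
  return result;
}

const sampleText =
  "Contact ops@example.com or visit https://example.com/docs from 192.168.1.10.";
console.log(extractEntities(sampleText));
```

Simple patterns like these trade precision for readability: the IP pattern, for instance, does not check that each octet is at most 255.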

Installation

In n8n, go to Settings → Community Nodes and install:

n8n-nodes-liteparse-local

Usage Examples

Parse PDF to Markdown

Read Binary File → point to PDF
DocuParse → Operation: Parse Document, Output Format: Markdown
Output: json.text contains the markdown
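
Once the markdown is available in json.text, a downstream step may want to break it into sections. The following TypeScript sketch shows one way to do that; the `ParsedItem` shape mirrors only the json.text field described above, and `splitMarkdownSections` is a hypothetical helper, not part of this package.

```typescript
// Only the field mentioned above (json.text) is assumed here.
interface ParsedItem {
  json: { text: string };
}

interface Section {
  heading: string;
  body: string;
}

// Split markdown into sections, starting a new section at every ATX heading (# to ######).
function splitMarkdownSections(markdown: string): Section[] {
  const sections: Section[] = [];
  let current: Section = { heading: "", body: "" };
  const flush = (): void => {
    // Skip the implicit leading section when it is completely empty.
    if (current.heading !== "" || current.body.trim() !== "") {
      sections.push({ heading: current.heading, body: current.body.trim() });
    }
  };
  for (const line of markdown.split(/\r?\n/)) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      flush();
      current = { heading: match[1].trim(), body: "" };
    } else {
      current.body += line + "\n";
    }
  }
  flush();
  return sections;
}

const exampleItem: ParsedItem = {
  json: { text: "# Invoice\nTotal due: 42\n\n## Notes\nPay within 30 days." },
};
console.log(splitMarkdownSections(exampleItem.json.text));
```

In an n8n Code node, the same logic could be applied to each incoming item after removing the type annotations.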

OCR a Scanned Document

Read Binary File → point to image/PDF
DocuParse → Operation: OCR Image, Language: English
Output: json.text contains extracted text

Extract Tables from Excel

Read Binary File → point to XLSX
DocuParse → Operation: Extract Tables
Output: json.tables contains array of tables
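
The exact shape of each entry in json.tables is not specified above. Assuming, purely for illustration, that a table arrives as an array of rows with the header in the first row, a follow-up step could turn it into keyed records as sketched below. The `Table` type and `tableToRecords` helper are hypothetical; check the node's real output before relying on this shape.

```typescript
// ASSUMPTION: a table is an array of rows, each row an array of cell strings,
// and the first row holds the column names. The node's real shape may differ.
type Table = string[][];

// Convert one table into an array of objects keyed by the header row.
function tableToRecords(table: Table): Record<string, string>[] {
  if (table.length === 0) {
    return [];
  }
  const header = table[0];
  const rows = table.slice(1);
  return rows.map((row) => {
    const record: Record<string, string> = {};
    header.forEach((name, i) => {
      // Short rows are padded with empty strings.
      record[name] = i < row.length ? row[i] : "";
    });
    return record;
  });
}

const exampleTable: Table = [
  ["sku", "qty"],
  ["A-100", "3"],
  ["B-200"],
];
console.log(tableToRecords(exampleTable));
```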

Merge Multiple PDFs

Read Binary File → first PDF (field: data)
Read Binary File → second PDF (field: data1)
DocuParse → Operation: Merge PDFs, Additional Fields: data1
Output: merged PDF in binary.merged_pdf

Dependencies (All Pure JavaScript)

pdfjs-dist — Mozilla's PDF.js for PDF parsing
tesseract.js — OCR via WebAssembly
mammoth — DOCX parsing
xlsx — Excel parsing
cheerio — HTML parsing
csv-parse — CSV parsing
pdf-lib — PDF manipulation (merge/split)

License

MIT
