n8n-nodes-liteparse-local
v0.3.2
Full-featured document parsing node for n8n — PDF, DOCX, XLSX, PPTX, images, OCR, markdown output. No native dependencies.
Full-featured document parsing node for n8n. It has no native dependencies, so it works in any Docker container that runs n8n, with no extra system packages to install.
Features
Operations
| Operation | Description |
|-----------|-------------|
| Parse Document | Extract text, tables, and structure from documents |
| OCR Image | Extract text from images using Tesseract OCR |
| Extract Tables | Extract tables as structured JSON |
| Convert to Markdown | Convert a document to clean markdown |
| Merge PDFs | Combine multiple PDF files into one |
| Split PDF | Split a PDF into separate pages |
| Extract Entities | Extract emails, phone numbers, URLs, dates, IP addresses, and currency amounts from text |
Supported Formats
| Format | Extensions | Parse | Tables | Markdown |
|--------|-----------|-------|--------|----------|
| PDF | .pdf | ✅ | ✅ | ✅ |
| Word | .docx, .doc | ✅ | ❌ | ✅ |
| Excel | .xlsx, .xls | ✅ | ✅ | ❌ |
| PowerPoint | .pptx, .ppt | ✅ | ❌ | ❌ |
| HTML | .html, .htm | ✅ | ❌ | ✅ |
| CSV | .csv | ✅ | ❌ | ❌ |
| Images | .png, .jpg, .tiff, .bmp, .webp | ✅ OCR | ❌ | ❌ |
| Text | .txt, .md, .json, .xml | ✅ | ❌ | ❌ |
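The Supported Formats table can also be encoded as a lookup, for example to route files in a workflow before they reach the node. The sketch below is a hypothetical helper (`capabilitiesFor` and the `Capabilities` type are illustrative names, not part of this package); it only mirrors the table above and does not call the node.

```typescript
// Hypothetical helper that mirrors the Supported Formats table.
// It is NOT part of n8n-nodes-liteparse-local; it only encodes the README table.
interface Capabilities {
  format: string;
  parse: boolean;
  tables: boolean;
  markdown: boolean;
  ocr: boolean; // true when parsing relies on OCR (images)
}

// One row per format: [format, extensions, tables, markdown, ocr]
const ROWS: Array<[string, string[], boolean, boolean, boolean]> = [
  ["PDF", [".pdf"], true, true, false],
  ["Word", [".docx", ".doc"], false, true, false],
  ["Excel", [".xlsx", ".xls"], true, false, false],
  ["PowerPoint", [".pptx", ".ppt"], false, false, false],
  ["HTML", [".html", ".htm"], false, true, false],
  ["CSV", [".csv"], false, false, false],
  ["Images", [".png", ".jpg", ".tiff", ".bmp", ".webp"], false, false, true],
  ["Text", [".txt", ".md", ".json", ".xml"], false, false, false],
];

// Build an extension -> capabilities index once.
const BY_EXTENSION: Record<string, Capabilities> = {};
for (const [format, extensions, tables, markdown, ocr] of ROWS) {
  for (const ext of extensions) {
    // Every listed format supports Parse (images via OCR).
    BY_EXTENSION[ext] = { format, parse: true, tables, markdown, ocr };
  }
}

// Returns the capabilities for a file name, or undefined if its extension is not listed.
function capabilitiesFor(fileName: string): Capabilities | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return undefined;
  return BY_EXTENSION[fileName.slice(dot).toLowerCase()];
}
```

For example, `capabilitiesFor("report.XLSX")` reports `tables: true`, while `capabilitiesFor("deck.pptx")` reports `tables: false`, so a workflow could skip Extract Tables for slide decks.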
OCR Languages
English, Arabic, Chinese (Simplified/Traditional), French, German, Hindi, Japanese, Korean, Portuguese, Russian, Spanish, Urdu
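Tesseract identifies languages by traineddata codes, and several languages can be combined with `+` (for example `eng+ara`) for mixed-language pages. The mapping below pairs each language listed above with its standard Tesseract code. Whether the node's Language option accepts these codes directly or only the display names is an assumption to check in the node UI; `tesseractLangs` is a hypothetical helper.

```typescript
// Standard Tesseract traineddata codes for the OCR languages listed in this README.
// Hypothetical helper: the node's Language option may expect display names instead.
const TESSERACT_CODES: Record<string, string> = {
  "English": "eng",
  "Arabic": "ara",
  "Chinese (Simplified)": "chi_sim",
  "Chinese (Traditional)": "chi_tra",
  "French": "fra",
  "German": "deu",
  "Hindi": "hin",
  "Japanese": "jpn",
  "Korean": "kor",
  "Portuguese": "por",
  "Russian": "rus",
  "Spanish": "spa",
  "Urdu": "urd",
};

// Joins one or more languages into a Tesseract language string such as "eng+ara".
// Throws on a name that is not in the list above.
function tesseractLangs(names: string[]): string {
  return names
    .map((name) => {
      if (!Object.prototype.hasOwnProperty.call(TESSERACT_CODES, name)) {
        throw new Error(`Unsupported OCR language: ${name}`);
      }
      return TESSERACT_CODES[name];
    })
    .join("+");
}
```

For an Urdu scan with English headings, `tesseractLangs(["Urdu", "English"])` returns `"urd+eng"`.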
Entity Extraction
- Email addresses
- Phone numbers
- URLs
- Dates
- IP addresses
- Currency amounts
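A regex pass is one common way to implement this kind of extraction. The sketch below is not the node's implementation: `extractEntities` is a hypothetical name, and the patterns are deliberately simple assumptions (North American style phone numbers, ISO or slash dates, dotted IPv4 addresses, amounts prefixed with `$`, `€`, or `£`). Real-world phone numbers and dates vary far more than these patterns cover.

```typescript
// Hypothetical regex-based entity extractor; NOT the node's actual implementation.
// Patterns are simplified assumptions and will miss many real-world formats.
interface Entities {
  emails: string[];
  phones: string[];
  urls: string[];
  dates: string[];
  ipAddresses: string[];
  currency: string[];
}

const PATTERNS = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  // North American style numbers (3-3-4 digits), optionally prefixed with a country code.
  phone: /(?:\+\d{1,3}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b/g,
  url: /https?:\/\/[^\s<>"']+/g,
  // ISO dates (2024-03-15) and slash dates (15/03/2024 or 03/15/2024).
  date: /\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}\/\d{1,2}\/\d{4}\b/g,
  // Dotted IPv4 with each octet limited to 0-255.
  ipv4: /\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b/g,
  // $, €, or £ followed by an amount with optional thousands separators and cents.
  currency: /[$€£]\s?\d+(?:,\d{3})*(?:\.\d{2})?/g,
};

// Returns every match of a global regex, de-duplicated, in order of first appearance.
function findAll(text: string, re: RegExp): string[] {
  const matches: string[] = text.match(re) || [];
  return matches.filter((m, i) => matches.indexOf(m) === i);
}

function extractEntities(text: string): Entities {
  return {
    emails: findAll(text, PATTERNS.email),
    phones: findAll(text, PATTERNS.phone),
    // Drop trailing sentence punctuation that the greedy URL pattern picks up.
    urls: findAll(text, PATTERNS.url).map((u) => u.replace(/[.,;:!?)]+$/, "")),
    dates: findAll(text, PATTERNS.date),
    ipAddresses: findAll(text, PATTERNS.ipv4),
    currency: findAll(text, PATTERNS.currency),
  };
}
```

Note that the IPv4 pattern will also match version strings such as `1.2.3.4`; tighten it if that matters for your documents.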
Installation
In n8n, go to Settings → Community Nodes and install:
n8n-nodes-liteparse-local
Usage Examples
Parse PDF to Markdown
- Read Binary File → point to PDF
- DocuParse → Operation: Parse Document, Output Format: Markdown
- Output: `json.text` contains the markdown
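Downstream, the markdown in `json.text` is a plain string, so ordinary string processing applies. As an illustration, the hypothetical function below splits it into sections at ATX headings (`#` to `######`), for example to chunk a long document. The `{ json: { text } }` item shape follows the output described above; the `splitSections` and `Section` names are assumptions.

```typescript
// Minimal shape of an n8n item as described in this README (json.text holds markdown).
interface ParsedItem {
  json: { text: string };
}

interface Section {
  heading: string; // heading text without the leading #'s ("" for text before the first heading)
  body: string;
}

// Splits markdown into sections at ATX headings (# through ######).
// Hypothetical helper for illustration; it does not skip headings inside fenced code.
function splitSections(item: ParsedItem): Section[] {
  const sections: Section[] = [];
  let heading = "";
  let lines: string[] = [];
  // Push the section collected so far, skipping an empty preamble.
  const flush = () => {
    const body = lines.join("\n").trim();
    if (heading !== "" || body !== "") sections.push({ heading, body });
  };
  for (const line of item.json.text.split(/\r?\n/)) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      flush();
      heading = match[1].trim();
      lines = [];
    } else {
      lines.push(line);
    }
  }
  flush();
  return sections;
}
```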
OCR a Scanned Document
- Read Binary File → point to image/PDF
- DocuParse → Operation: OCR Image, Language: English
- Output: `json.text` contains the extracted text
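Raw OCR output often contains words hyphenated across line breaks and ragged spacing. If that matters downstream, a small post-processing step can tidy `json.text`. The function below is a hypothetical sketch, not something the node does, and its rules are assumptions to adjust per document: re-joining `word-` + line break + `word` also merges genuinely hyphenated compounds, and `\w` only matches ASCII letters, so non-Latin scripts are left alone.

```typescript
// Hypothetical OCR post-processing; not part of the node.
function tidyOcrText(raw: string): string {
  return raw
    .replace(/\r\n?/g, "\n")               // normalise Windows/old Mac line endings
    .replace(/(\w)-\n(\w)/g, "$1$2")       // re-join (ASCII) words hyphenated across a line break
    .replace(/[ \t]+/g, " ")               // collapse runs of spaces and tabs
    .replace(/ *\n */g, "\n")              // trim spaces around line breaks
    .replace(/\n{3,}/g, "\n\n")            // keep at most one blank line between paragraphs
    .replace(/([^\n])\n(?=[^\n])/g, "$1 ") // unwrap single line breaks inside a paragraph
    .trim();
}
```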
Extract Tables from Excel
- Read Binary File → point to XLSX
- DocuParse → Operation: Extract Tables
- Output: `json.tables` contains an array of tables
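The exact shape of each entry in `json.tables` is not documented here. Assuming each table arrives as an array of rows, each row an array of cell strings, with the first row holding column headers (an assumption; inspect the node's real output first), the hypothetical helper below turns a table into one object per row, which maps naturally onto n8n items.

```typescript
// ASSUMED table shape: rows of cell strings, first row = header. Verify against real output.
type Table = string[][];

// Converts a table into one record per data row, keyed by the header cells.
// Blank headers become "column_N"; missing cells become "".
// If two headers share a name, the later column overwrites the earlier one.
function tableToRecords(table: Table): Array<Record<string, string>> {
  if (table.length === 0) return [];
  const headers = table[0].map((h, i) => (h.trim() !== "" ? h.trim() : `column_${i + 1}`));
  return table.slice(1).map((row) => {
    const record: Record<string, string> = {};
    headers.forEach((header, i) => {
      record[header] = i < row.length ? row[i] : "";
    });
    return record;
  });
}
```

Wrapping each record as `{ json: record }` produces one n8n item per spreadsheet row.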
Merge Multiple PDFs
- Read Binary File → first PDF (field: `data`)
- Read Binary File → second PDF (field: `data1`)
- DocuParse → Operation: Merge PDFs, Additional Fields: `data1` (the field holding the second PDF)
- Output: merged PDF in `binary.merged_pdf`
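A merge fails if one of the input fields does not actually hold a PDF, so it can help to check the bytes first. The PDF specification requires a file to begin with the header `%PDF-` followed by a version number. The hypothetical helpers below test raw bytes for that header and report which fields fail; how the bytes are read from the n8n item (for instance with `this.helpers.getBinaryDataBuffer(itemIndex, propertyName)` in a Code node) is outside this sketch.

```typescript
// Hypothetical pre-merge check: does this byte array start with the "%PDF-" header?
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"

function looksLikePdf(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_MAGIC.length) return false;
  for (let i = 0; i < PDF_MAGIC.length; i++) {
    if (bytes[i] !== PDF_MAGIC[i]) return false;
  }
  return true;
}

// Returns the names of fields whose contents are not PDFs, so a workflow can stop
// before calling Merge PDFs. Names such as "data" and "data1" follow the example above.
function nonPdfProperties(files: Record<string, Uint8Array>): string[] {
  return Object.keys(files).filter((name) => !looksLikePdf(files[name]));
}
```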
Dependencies (Pure JavaScript and WebAssembly, No Native Modules)
- `pdfjs-dist`: Mozilla's PDF.js for PDF parsing
- `tesseract.js`: OCR via WebAssembly
- `mammoth`: DOCX parsing
- `xlsx`: Excel parsing
- `cheerio`: HTML parsing
- `csv-parse`: CSV parsing
- `pdf-lib`: PDF manipulation (merge/split)
License
MIT
