n8n-nodes-vector-store-processor
v1.8.15
n8n node for intelligent document chunking and processing for vector store ingestion with Smart Qdrant Vector Store supporting Ollama and OpenAI embeddings
This is an n8n community node that intelligently processes and chunks documents for vector store ingestion with enhanced structure analysis and markdown support. Perfect for RAG (Retrieval-Augmented Generation) workflows and AI applications.
n8n is a fair-code licensed workflow automation platform.
Features
- Intelligent Document Chunking: Splits documents into semantically meaningful chunks optimized for vector embeddings
- Markdown Support: Parses markdown headings, lists, and structure for better organization
- Structure Analysis: Automatically detects chapters, sections, and content hierarchy
- Flexible Processing Modes:
  - Run once for all items (combine multiple documents into one knowledge base)
  - Run once for each item (process documents separately)
- Rich Metadata: Includes document title, chapter, section, content type, chunk indices, and more
- ASCII Sanitization: Ensures namespace compatibility with all vector stores
- Binary File Support: Process text from binary files (.txt, .md, .pdf) or text fields
- Configurable Chunk Size: Control the maximum size of text chunks for optimal embedding
- Global Chunk Indexing: Maintains sequential chunk numbering across entire documents
- Content Type Classification: Automatically categorizes content (examples, basics, advanced, etc.)
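The ASCII sanitization feature above maps titles and namespaces to vector-store-safe identifiers (see the `_clean` fields in the example output further down). A minimal sketch of such a slug function, assuming typical normalization rules rather than the node's exact implementation:

```javascript
// Illustrative ASCII sanitization / slug generation (assumed rules,
// not necessarily the node's exact algorithm).
function toCleanSlug(value) {
  return value
    .normalize('NFKD')                // split accented characters (é -> e + mark)
    .replace(/[\u0300-\u036f]/g, '')  // drop combining diacritical marks
    .replace(/[^a-zA-Z0-9]+/g, '-')   // collapse non-alphanumerics to hyphens
    .replace(/^-+|-+$/g, '')          // trim leading/trailing hyphens
    .toLowerCase();
}
```

For example, a document titled "My Document" would yield the namespace `my-document`, matching the sample output below.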
Installation
Follow the installation guide in the n8n community nodes documentation.
Community Nodes (Recommended)
- Go to Settings > Community Nodes
- Select Install
- Enter `n8n-nodes-vector-store-processor` in the "Enter npm package name" field
- Agree to the risks and select Install
Manual Installation
To install manually, navigate to your n8n installation directory and run:

```shell
npm install n8n-nodes-vector-store-processor
```

⚠️ Memory Management Requirements
IMPORTANT: For optimal memory management when using the Smart Qdrant Vector Store node with large documents, start n8n with the --expose-gc flag to enable manual garbage collection:
```shell
# For a systemd service (recommended)
sudo systemctl edit n8n
# Add this line under [Service]:
Environment="NODE_OPTIONS=--expose-gc"

# Or start n8n directly with:
NODE_OPTIONS="--expose-gc" n8n start

# Or for Docker:
docker run -e NODE_OPTIONS="--expose-gc" n8nio/n8n
```

Why is this needed?
- The Smart Qdrant Vector Store processes documents in batches and triggers garbage collection after each batch
- This prevents memory buildup when processing large documents or many documents
- Without `--expose-gc`, memory is still managed by Node.js, just less efficiently
- The "Clear Memory" option in the node works best with this flag enabled
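In Node.js, `global.gc` is only defined when the process is started with `--expose-gc`, so batch-level collection has to be guarded. A minimal sketch of such a guard (not the node's actual code):

```javascript
// Trigger garbage collection between batches, but only when the process
// was started with NODE_OPTIONS="--expose-gc" (otherwise global.gc is undefined).
function collectGarbageIfAvailable() {
  if (typeof global.gc === 'function') {
    global.gc(); // force a collection after a processed batch
    return true;
  }
  // Without the flag, V8 still collects automatically, just on its own schedule.
  return false;
}
```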
Operations
The Vector Store Processor node provides the following configuration options:
Mode
- Run Once for All Items: Combines all input items into a single document before processing
- Run Once for Each Item: Processes each input item as a separate document
Input Type
- Text Field: Process text from a JSON field
- Binary File: Process text from a binary file (supports .txt, .md, .pdf text extraction, etc.)
Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| Mode | Options | Run Once for Each Item | Processing mode |
| Input Type | Options | Text Field | Source of text data |
| Text Field | String | text | Name of the field containing text (when Input Type is Text Field) |
| Binary Property | String | data | Name of the binary property (when Input Type is Binary File) |
| Document Title | String | (auto-detect) | Override the document title |
| Chunk Size | Number | 2000 | Maximum characters per chunk |
| Namespace | String | (auto-generate) | Namespace for vector store organization |
| Parse Markdown | Boolean | true | Enable markdown structure parsing |
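Collected as an object, the defaults from the table look like the sketch below. The internal option keys are illustrative assumptions; only the values mirror the documented defaults:

```javascript
// Default parameter values as documented (key names are illustrative,
// not the node's internal parameter identifiers).
const defaults = {
  mode: 'runOnceForEachItem',
  inputType: 'textField',
  textField: 'text',       // JSON field read when inputType is 'textField'
  binaryProperty: 'data',  // binary property read when inputType is 'binaryFile'
  chunkSize: 2000,         // maximum characters per chunk
  parseMarkdown: true,     // enable markdown structure parsing
};
```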
Usage
Basic Example: Process a Single Document for Vector Store
1. Add a "Read Binary File" node or "HTTP Request" node to get your document
2. Add the "Vector Store Processor" node
3. Configure:
- Mode: Run Once for Each Item
- Input Type: Binary File (or Text Field if you have text)
- Parse Markdown: true
- Chunk Size: 2000
4. Connect to a Vector Store node (Pinecone, Qdrant, Supabase Vector, etc.)
5. Connect to an embeddings node (OpenAI Embeddings, etc.)

Example: Combine Multiple Documents into One Knowledge Base
1. Add a node that outputs multiple items (e.g., "Read Files From Folder")
2. Add the "Vector Store Processor" node
3. Configure:
- Mode: Run Once for All Items
- Input Type: Binary File
- Chunk Size: 2000
4. All documents will be combined and chunked together as one knowledge base
5. Connect to your vector store for ingestion

Example Output
Each chunk produces an output item with:
```json
{
  "pageContent": "This is the actual text content of the chunk...",
  "metadata": {
    "document_title": "My Document",
    "chapter": "Introduction",
    "section": "Getting Started",
    "content_type": "overview",
    "chunk_index": 0,
    "local_chunk_index": 0,
    "chapter_index": 0,
    "total_chunks": 15,
    "namespace": "my-document",
    "source_file": "document.md",
    "character_count": 1850,
    "processing_timestamp": "2025-01-15T10:30:00.000Z"
  },
  "document_title": "My Document",
  "document_title_clean": "my-document",
  "chapter": "Introduction",
  "section": "Getting Started",
  "chapter_clean": "introduction",
  "section_clean": "getting-started",
  "namespace": "my-document"
}
```

Markdown Support
When Parse Markdown is enabled, the node recognizes:
- Headings: `#`, `##`, `###`, etc. for chapter and section detection
- Structure: Automatically organizes content by heading hierarchy
- Lists: Preserves list formatting in chunks
- Code Blocks: Keeps code blocks intact when possible
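The heading-based structure detection can be sketched as follows, assuming the mapping described below under "How It Works" (H1/H2 start a chapter, H3-H6 start a section); this is an illustration, not the node's actual parser:

```javascript
// Minimal sketch of heading-based structure detection (assumed logic):
// H1/H2 open a new chapter, H3-H6 open a new section within it.
function analyzeStructure(markdown) {
  let chapter = null;
  let section = null;
  const blocks = [];
  for (const line of markdown.split('\n')) {
    const m = line.match(/^(#{1,6})\s+(.*)$/);
    if (m) {
      const level = m[1].length;
      if (level <= 2) { chapter = m[2]; section = null; }
      else { section = m[2]; }
      continue;
    }
    // Attach each non-empty content line to the current chapter/section.
    if (line.trim()) blocks.push({ chapter, section, text: line.trim() });
  }
  return blocks;
}
```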
How It Works
Title Extraction: Automatically detects document title from:
- Metadata fields (title, info.Title, metadata['dc:title'])
- File name
- First heading in markdown
- First meaningful line of text
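The fallback chain above can be sketched like this; field names such as `fileName` are illustrative assumptions, not the node's actual property names:

```javascript
// Sketch of the title-detection fallback order: metadata field,
// then file name, then first markdown heading, then first meaningful line.
function detectTitle(item) {
  if (item.metadata?.title) return item.metadata.title;            // metadata field
  if (item.fileName) return item.fileName.replace(/\.[^.]+$/, ''); // file name, extension stripped
  const heading = item.text?.match(/^#{1,6}\s+(.+)$/m);            // first markdown heading
  if (heading) return heading[1];
  const firstLine = item.text?.split('\n').find((l) => l.trim());  // first meaningful line
  return firstLine ? firstLine.trim() : 'Untitled Document';
}
```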
Structure Analysis:
- Detects chapters (H1, H2 headings or specific patterns)
- Identifies sections (H3-H6 headings or subsection patterns)
- Classifies content type (examples, basics, advanced, etc.)
Intelligent Chunking:
- Splits by paragraphs first
- Falls back to sentence splitting for long paragraphs
- Respects chunk size limits
- Filters out very short chunks
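The chunking strategy above can be sketched as follows; this is a simplified illustration of the paragraph-first approach, not the node's exact algorithm:

```javascript
// Illustrative chunker: split on blank lines, fall back to sentence
// splitting for oversized paragraphs, respect maxSize, drop tiny fragments.
function chunkText(text, maxSize = 2000, minSize = 20) {
  const pieces = [];
  for (const para of text.split(/\n\s*\n/)) {
    if (para.length <= maxSize) pieces.push(para);
    else pieces.push(...para.split(/(?<=[.!?])\s+/)); // sentence fallback
  }
  const chunks = [];
  let current = '';
  for (const piece of pieces) {
    if ((current + ' ' + piece).trim().length > maxSize && current) {
      chunks.push(current.trim()); // current chunk is full; start a new one
      current = piece;
    } else {
      current = (current ? current + ' ' : '') + piece;
    }
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks.filter((c) => c.length >= minSize); // filter very short chunks
}
```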
Metadata Enrichment:
- Global chunk indexing across entire document
- Local chunk indexing within sections
- Content type classification
- Timestamp and source tracking
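Putting the indexing together, the enrichment step can be sketched like this; the shape mirrors the example output above, but the function itself is an assumption, not the node's code:

```javascript
// Sketch of metadata enrichment: a global chunk index running across the
// whole document, plus a local index within each section.
function enrichChunks(sections, documentTitle, namespace) {
  const out = [];
  let globalIndex = 0;
  for (const sec of sections) {
    sec.chunks.forEach((text, localIndex) => {
      out.push({
        pageContent: text,
        metadata: {
          document_title: documentTitle,
          chapter: sec.chapter,
          section: sec.section,
          chunk_index: globalIndex++,      // global, document-wide
          local_chunk_index: localIndex,   // local to this section
          character_count: text.length,
          namespace,
          processing_timestamp: new Date().toISOString(),
        },
      });
    });
  }
  // total_chunks is only known once everything is collected.
  out.forEach((item) => { item.metadata.total_chunks = out.length; });
  return out;
}
```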
Compatibility
- Tested with n8n version 1.0.0+
- Works with all vector store nodes (Pinecone, Qdrant, Supabase, etc.)
- Compatible with LangChain nodes
