n8n-nodes-vector-store-processor
v1.8.15
n8n node for intelligent document chunking and processing for vector store ingestion with Smart Qdrant Vector Store supporting Ollama and OpenAI embeddings
This is an n8n community node that intelligently processes and chunks documents for vector store ingestion with enhanced structure analysis and markdown support. Perfect for RAG (Retrieval-Augmented Generation) workflows and AI applications.
n8n is a fair-code licensed workflow automation platform.
Features
- Intelligent Document Chunking: Splits documents into semantically meaningful chunks optimized for vector embeddings
- Markdown Support: Parses markdown headings, lists, and structure for better organization
- Structure Analysis: Automatically detects chapters, sections, and content hierarchy
- Flexible Processing Modes:
  - Run once for all items (combine multiple documents into one knowledge base)
  - Run once for each item (process documents separately)
- Rich Metadata: Includes document title, chapter, section, content type, chunk indices, and more
- ASCII Sanitization: Ensures namespace compatibility with all vector stores
- Binary File Support: Process text from binary files (.txt, .md, .pdf) or text fields
- Configurable Chunk Size: Control the maximum size of text chunks for optimal embedding
- Global Chunk Indexing: Maintains sequential chunk numbering across entire documents
- Content Type Classification: Automatically categorizes content (examples, basics, advanced, etc.)
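The ASCII sanitization feature above maps titles and namespaces to vector-store-safe identifiers (see the `_clean` fields in the example output further down). A minimal sketch of such a slug function, assuming typical normalization rules rather than the node's exact implementation:

```javascript
// Illustrative ASCII sanitization / slug generation (assumed rules,
// not necessarily the node's exact algorithm).
function toCleanSlug(value) {
  return value
    .normalize('NFKD')                // split accented characters (é -> e + mark)
    .replace(/[\u0300-\u036f]/g, '')  // drop combining diacritical marks
    .replace(/[^a-zA-Z0-9]+/g, '-')   // collapse non-alphanumerics to hyphens
    .replace(/^-+|-+$/g, '')          // trim leading/trailing hyphens
    .toLowerCase();
}
```

For example, a document titled "My Document" would yield the namespace `my-document`, matching the sample output below.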
Installation
Follow the installation guide in the n8n community nodes documentation.
Community Nodes (Recommended)
- Go to Settings > Community Nodes
- Select Install
- Enter `n8n-nodes-vector-store-processor` in the "Enter npm package name" field
- Agree to the risks and select Install
Manual Installation
To install manually, navigate to your n8n installation directory and run:

```shell
npm install n8n-nodes-vector-store-processor
```

⚠️ Memory Management Requirements
IMPORTANT: For optimal memory management when using the Smart Qdrant Vector Store node with large documents, start n8n with the --expose-gc flag to enable manual garbage collection:
```shell
# For a systemd service (recommended)
sudo systemctl edit n8n
# Add this line under [Service]:
Environment="NODE_OPTIONS=--expose-gc"

# Or start n8n directly with:
NODE_OPTIONS="--expose-gc" n8n start

# Or for Docker:
docker run -e NODE_OPTIONS="--expose-gc" n8nio/n8n
```

Why is this needed?
- The Smart Qdrant Vector Store processes documents in batches and triggers garbage collection after each batch
- This prevents memory buildup when processing large documents or many documents
- Without `--expose-gc`, memory is still managed by Node.js, just less efficiently
- The "Clear Memory" option in the node works best with this flag enabled
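In Node.js, `global.gc` is only defined when the process is started with `--expose-gc`, so batch-level collection has to be guarded. A minimal sketch of such a guard (not the node's actual code):

```javascript
// Trigger garbage collection between batches, but only when the process
// was started with NODE_OPTIONS="--expose-gc" (otherwise global.gc is undefined).
function collectGarbageIfAvailable() {
  if (typeof global.gc === 'function') {
    global.gc(); // force a collection after a processed batch
    return true;
  }
  // Without the flag, V8 still collects automatically, just on its own schedule.
  return false;
}
```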
Operations
The Vector Store Processor node provides the following configuration options:
Mode
- Run Once for All Items: Combines all input items into a single document before processing
- Run Once for Each Item: Processes each input item as a separate document
Input Type
- Text Field: Process text from a JSON field
- Binary File: Process text from a binary file (supports .txt, .md, .pdf text extraction, etc.)
Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| Mode | Options | Run Once for Each Item | Processing mode |
| Input Type | Options | Text Field | Source of text data |
| Text Field | String | text | Name of the field containing text (when Input Type is Text Field) |
| Binary Property | String | data | Name of the binary property (when Input Type is Binary File) |
| Document Title | String | (auto-detect) | Override the document title |
| Chunk Size | Number | 2000 | Maximum characters per chunk |
| Namespace | String | (auto-generate) | Namespace for vector store organization |
| Parse Markdown | Boolean | true | Enable markdown structure parsing |
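Collected as an object, the defaults from the table look like the sketch below. The internal option keys are illustrative assumptions; only the values mirror the documented defaults:

```javascript
// Default parameter values as documented (key names are illustrative,
// not the node's internal parameter identifiers).
const defaults = {
  mode: 'runOnceForEachItem',
  inputType: 'textField',
  textField: 'text',       // JSON field read when inputType is 'textField'
  binaryProperty: 'data',  // binary property read when inputType is 'binaryFile'
  chunkSize: 2000,         // maximum characters per chunk
  parseMarkdown: true,     // enable markdown structure parsing
};
```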
Usage
Basic Example: Process a Single Document for Vector Store
1. Add a "Read Binary File" node or "HTTP Request" node to get your document
2. Add the "Vector Store Processor" node
3. Configure:
- Mode: Run Once for Each Item
- Input Type: Binary File (or Text Field if you have text)
- Parse Markdown: true
- Chunk Size: 2000
4. Connect to a Vector Store node (Pinecone, Qdrant, Supabase Vector, etc.)
5. Connect to an embeddings node (OpenAI Embeddings, etc.)

Example: Combine Multiple Documents into One Knowledge Base
1. Add a node that outputs multiple items (e.g., "Read Files From Folder")
2. Add the "Vector Store Processor" node
3. Configure:
- Mode: Run Once for All Items
- Input Type: Binary File
- Chunk Size: 2000
4. All documents will be combined and chunked together as one knowledge base
5. Connect to your vector store for ingestion

Example Output
Each chunk produces an output item with:
```json
{
  "pageContent": "This is the actual text content of the chunk...",
  "metadata": {
    "document_title": "My Document",
    "chapter": "Introduction",
    "section": "Getting Started",
    "content_type": "overview",
    "chunk_index": 0,
    "local_chunk_index": 0,
    "chapter_index": 0,
    "total_chunks": 15,
    "namespace": "my-document",
    "source_file": "document.md",
    "character_count": 1850,
    "processing_timestamp": "2025-01-15T10:30:00.000Z"
  },
  "document_title": "My Document",
  "document_title_clean": "my-document",
  "chapter": "Introduction",
  "section": "Getting Started",
  "chapter_clean": "introduction",
  "section_clean": "getting-started",
  "namespace": "my-document"
}
```

Markdown Support
When Parse Markdown is enabled, the node recognizes:
- Headings: `#`, `##`, `###`, etc. for chapter and section detection
- Structure: Automatically organizes content by heading hierarchy
- Lists: Preserves list formatting in chunks
- Code Blocks: Keeps code blocks intact when possible
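The heading-based structure detection can be sketched as follows, assuming the mapping described below under "How It Works" (H1/H2 start a chapter, H3-H6 start a section); this is an illustration, not the node's actual parser:

```javascript
// Minimal sketch of heading-based structure detection (assumed logic):
// H1/H2 open a new chapter, H3-H6 open a new section within it.
function analyzeStructure(markdown) {
  let chapter = null;
  let section = null;
  const blocks = [];
  for (const line of markdown.split('\n')) {
    const m = line.match(/^(#{1,6})\s+(.*)$/);
    if (m) {
      const level = m[1].length;
      if (level <= 2) { chapter = m[2]; section = null; }
      else { section = m[2]; }
      continue;
    }
    // Attach each non-empty content line to the current chapter/section.
    if (line.trim()) blocks.push({ chapter, section, text: line.trim() });
  }
  return blocks;
}
```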
How It Works
Title Extraction: Automatically detects document title from:
- Metadata fields (title, info.Title, metadata['dc:title'])
- File name
- First heading in markdown
- First meaningful line of text
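The fallback chain above can be sketched like this; field names such as `fileName` are illustrative assumptions, not the node's actual property names:

```javascript
// Sketch of the title-detection fallback order: metadata field,
// then file name, then first markdown heading, then first meaningful line.
function detectTitle(item) {
  if (item.metadata?.title) return item.metadata.title;            // metadata field
  if (item.fileName) return item.fileName.replace(/\.[^.]+$/, ''); // file name, extension stripped
  const heading = item.text?.match(/^#{1,6}\s+(.+)$/m);            // first markdown heading
  if (heading) return heading[1];
  const firstLine = item.text?.split('\n').find((l) => l.trim());  // first meaningful line
  return firstLine ? firstLine.trim() : 'Untitled Document';
}
```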
Structure Analysis:
- Detects chapters (H1, H2 headings or specific patterns)
- Identifies sections (H3-H6 headings or subsection patterns)
- Classifies content type (examples, basics, advanced, etc.)
Intelligent Chunking:
- Splits by paragraphs first
- Falls back to sentence splitting for long paragraphs
- Respects chunk size limits
- Filters out very short chunks
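The chunking strategy above can be sketched as follows; this is a simplified illustration of the paragraph-first approach, not the node's exact algorithm:

```javascript
// Illustrative chunker: split on blank lines, fall back to sentence
// splitting for oversized paragraphs, respect maxSize, drop tiny fragments.
function chunkText(text, maxSize = 2000, minSize = 20) {
  const pieces = [];
  for (const para of text.split(/\n\s*\n/)) {
    if (para.length <= maxSize) pieces.push(para);
    else pieces.push(...para.split(/(?<=[.!?])\s+/)); // sentence fallback
  }
  const chunks = [];
  let current = '';
  for (const piece of pieces) {
    if ((current + ' ' + piece).trim().length > maxSize && current) {
      chunks.push(current.trim()); // current chunk is full; start a new one
      current = piece;
    } else {
      current = (current ? current + ' ' : '') + piece;
    }
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks.filter((c) => c.length >= minSize); // filter very short chunks
}
```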
Metadata Enrichment:
- Global chunk indexing across entire document
- Local chunk indexing within sections
- Content type classification
- Timestamp and source tracking
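Putting the indexing together, the enrichment step can be sketched like this; the shape mirrors the example output above, but the function itself is an assumption, not the node's code:

```javascript
// Sketch of metadata enrichment: a global chunk index running across the
// whole document, plus a local index within each section.
function enrichChunks(sections, documentTitle, namespace) {
  const out = [];
  let globalIndex = 0;
  for (const sec of sections) {
    sec.chunks.forEach((text, localIndex) => {
      out.push({
        pageContent: text,
        metadata: {
          document_title: documentTitle,
          chapter: sec.chapter,
          section: sec.section,
          chunk_index: globalIndex++,      // global, document-wide
          local_chunk_index: localIndex,   // local to this section
          character_count: text.length,
          namespace,
          processing_timestamp: new Date().toISOString(),
        },
      });
    });
  }
  // total_chunks is only known once everything is collected.
  out.forEach((item) => { item.metadata.total_chunks = out.length; });
  return out;
}
```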
Compatibility
- Tested with n8n version 1.0.0+
- Works with all vector store nodes (Pinecone, Qdrant, Supabase, etc.)
- Compatible with LangChain nodes
