rag-system-pgvector
v2.4.9
RAG System Package
A production-ready Retrieval-Augmented Generation (RAG) system package built with PostgreSQL pgvector, LangChain, and LangGraph. Supports multiple AI providers including OpenAI, Anthropic, HuggingFace, Azure, Google AI, and local models.
🚀 Features
- 📦 Easy Integration: Simple npm install and ready-to-use API
- 🤖 Multi-Provider Support: OpenAI, Anthropic, HuggingFace, Azure, Google AI, Ollama
- 📚 Multi-format Support: PDF, DOCX, TXT, HTML, Markdown, JSON
- 🔍 Vector Search: High-performance similarity search with pgvector
- 🎯 Structured Data Queries: Accept JSON data for precise, contextual responses
- 💬 Chat History Support: Full conversation memory with summarization
- ⚡ Production Ready: Error handling, connection pooling, monitoring
- 🔧 Flexible Configuration: Choose your preferred embedding and LLM providers
- 💾 Buffer Processing: Process documents directly from memory buffers
- 🌐 URL Processing: Download and process documents from web URLs
- 📊 Batch Operations: Efficient processing of multiple documents
📦 Installation
npm install rag-system-pgvector
# Choose your AI provider (one or more):
npm install @langchain/openai # For OpenAI
npm install @langchain/anthropic # For Anthropic Claude
npm install @langchain/azure-openai # For Azure OpenAI
npm install @langchain/google-genai # For Google AI
npm install @langchain/community # For HuggingFace, Ollama, etc.
🚀 Quick Start
OpenAI Provider (Traditional)
import { RAGSystem } from 'rag-system-pgvector';
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';
// Create provider instances
const embeddings = new OpenAIEmbeddings({
openAIApiKey: 'your-openai-api-key',
modelName: 'text-embedding-ada-002',
});
const llm = new ChatOpenAI({
openAIApiKey: 'your-openai-api-key',
modelName: 'gpt-4',
temperature: 0.7,
});
// Initialize RAG system
const rag = new RAGSystem({
database: {
host: 'localhost',
database: 'your_db',
username: 'postgres',
password: 'your_password'
},
embeddings: embeddings,
llm: llm,
embeddingDimensions: 1536,
});
await rag.initialize();
// Add documents and query
await rag.addDocuments(['./docs/file1.pdf', './docs/file2.txt']);
// Simple query
const result = await rag.query("What is the main topic?");
console.log(result.answer);
// Query with structured data for precise responses
const structuredResult = await rag.query("Tell me about iPhone features", {
structuredData: {
intent: "product_information",
entities: { product: "iPhone", category: "smartphone" },
constraints: ["Focus on latest features", "Include specifications"],
responseFormat: "structured_list"
}
});
console.log(structuredResult.answer);
Mixed Providers (Advanced)
import { RAGSystem } from 'rag-system-pgvector';
import { OpenAIEmbeddings } from '@langchain/openai';
import { ChatAnthropic } from '@langchain/anthropic';
// Use OpenAI for embeddings, Anthropic for chat
const embeddings = new OpenAIEmbeddings({
openAIApiKey: 'your-openai-api-key',
modelName: 'text-embedding-ada-002',
});
const llm = new ChatAnthropic({
anthropicApiKey: 'your-anthropic-api-key',
modelName: 'claude-3-haiku-20240307',
temperature: 0.7,
});
const rag = new RAGSystem({
database: { /* your config */ },
embeddings: embeddings,
llm: llm,
embeddingDimensions: 1536,
});
Local Models (Privacy-First)
import { RAGSystem } from 'rag-system-pgvector';
import { HuggingFaceTransformersEmbeddings } from '@langchain/community/embeddings/hf_transformers';
import { Ollama } from '@langchain/community/llms/ollama';
// Use local models (no API keys required)
const embeddings = new HuggingFaceTransformersEmbeddings({
modelName: 'sentence-transformers/all-MiniLM-L6-v2',
});
const llm = new Ollama({
baseUrl: 'http://localhost:11434',
model: 'llama2',
});
const rag = new RAGSystem({
database: { /* your config */ },
embeddings: embeddings,
llm: llm,
embeddingDimensions: 384, // all-MiniLM-L6-v2 dimensions
});
Buffer Processing (New in v1.1.0)
import fs from 'fs';
import { DocumentProcessor } from 'rag-system-pgvector/utils';
const processor = new DocumentProcessor();
// Process document from Buffer
const buffer = fs.readFileSync('document.pdf');
const result = await processor.processDocumentFromBuffer(
buffer,
'document.pdf',
'pdf',
{ source: 'api-upload', category: 'research' }
);
console.log(result.chunks); // Processed chunks with embeddings
URL Processing (New in v1.1.0)
import { DocumentProcessor } from 'rag-system-pgvector/utils';
const processor = new DocumentProcessor();
// Process single URL
const result = await processor.processDocumentFromUrl(
'https://example.com/document.pdf',
{ source: 'web-crawl', priority: 'high' }
);
// Process multiple URLs
const urls = [
'https://example.com/doc1.pdf',
'https://example.com/doc2.html',
'https://example.com/doc3.md'
];
const results = await processor.processDocumentsFromUrls(urls, {
source: 'batch-import',
maxConcurrent: 3
});
console.log(`Processed ${results.successful.length} documents`);
🎯 Structured Data Queries (New in v2.2.0)
The RAG system now supports structured JSON data alongside natural language queries for more precise and contextual responses.
Basic Structured Query
const result = await rag.query("Tell me about iPhone features", {
structuredData: {
intent: "product_information",
entities: {
product: "iPhone",
category: "smartphone",
brand: "Apple"
},
constraints: [
"Focus on latest model features",
"Include technical specifications"
],
context: {
userType: "potential_buyer",
priceRange: "premium"
},
responseFormat: "structured_list"
}
});
Troubleshooting Query
const result = await rag.query("My device won't connect to WiFi", {
structuredData: {
intent: "troubleshooting",
entities: {
issue_type: "connectivity",
device_category: "mobile",
problem_area: "wifi"
},
constraints: [
"Provide step-by-step solution",
"Include alternative methods"
],
responseFormat: "step_by_step_guide"
}
});
Comparison Query
const result = await rag.query("Compare iPhone vs Samsung Galaxy", {
structuredData: {
intent: "comparison",
entities: {
item1: "iPhone",
item2: "Samsung Galaxy"
},
constraints: [
"Compare key specifications",
"Highlight main differences"
],
responseFormat: "comparison_table"
}
});
Combined with Chat History
const result = await rag.query("What about the camera quality?", {
chatHistory: [
{ role: 'user', content: 'Tell me about iPhone features' },
{ role: 'assistant', content: 'The iPhone offers excellent features...' }
],
structuredData: {
intent: "follow_up_question",
entities: {
topic: "camera",
context_reference: "previous_iphone_discussion"
},
responseFormat: "detailed_explanation"
}
});
Structured Data Schema
interface StructuredData {
intent: string; // Query intent/category (required)
entities?: { // Named entities and values
[key: string]: string | number;
};
constraints?: string[]; // Requirements/constraints
context?: { // Additional context
[key: string]: string | number | boolean;
};
responseFormat?: string; // Desired response format
}
Common Intents
- product_information - Product details and specifications
- troubleshooting - Problem-solving and technical support
- comparison - Comparing multiple items
- how_to_guide - Step-by-step instructions
- explanation - Detailed explanations
- follow_up_question - Context-aware follow-ups
Response Formats
- structured_list - Organized bullet points
- step_by_step_guide - Numbered instructions
- comparison_table - Side-by-side comparison
- detailed_explanation - Comprehensive explanation
- bullet_points - Simple bullet format
- json_format - Structured JSON response
Advanced Filtering (New in v2.1.0)
import RAGSystem from 'rag-system-pgvector';
const rag = new RAGSystem(config);
await rag.initialize();
// Add documents with user/knowledgebot metadata
const documentData = await processor.processDocumentFromBuffer(
buffer,
'user-manual.pdf',
'pdf',
{
userId: 'user_123',
knowledgebotId: 'tech_support_bot',
department: 'engineering',
priority: 'high'
}
);
await rag.documentStore.saveDocument(documentData);
// Query with user filtering
const userResults = await rag.query('What technical info is available?', {
userId: 'user_123',
limit: 5
});
// Query with knowledgebot filtering
const botResults = await rag.query('Help with technical issues', {
knowledgebotId: 'tech_support_bot'
});
// Query with multiple filters
const filteredResults = await rag.query('Show important documents', {
userId: 'user_123',
filter: {
priority: 'high',
department: 'engineering'
}
});
// Direct search with filtering
const searchResults = await rag.searchDocumentsByUserId(
'documentation',
'user_123'
);
// Get all documents for a specific user
const userDocs = await rag.getDocumentsByUserId('user_123');
Chat History & Session Persistence (New in v2.3.0)
Enable multi-turn conversations with persistent chat history stored in PostgreSQL.
Basic Chat History
// First query
const result1 = await rag.query('What is machine learning?');
// Follow-up with context
const result2 = await rag.query('Can you give me examples?', {
chatHistory: result1.chatHistory
});
// Another follow-up
const result3 = await rag.query('Which one is most popular?', {
chatHistory: result2.chatHistory
});
Session Persistence
const sessionId = 'user_conversation_123';
// Query with automatic session save/load
const result = await rag.query('What is machine learning?', {
sessionId: sessionId,
persistSession: true, // Auto-save after query
userId: 'user_456',
knowledgebotId: 'tech_bot'
});
// Continue conversation (automatically loads history)
const result2 = await rag.query('Tell me more', {
sessionId: sessionId,
persistSession: true
});
// Load session manually
const session = await rag.loadSession(sessionId);
console.log(`Session has ${session.messageCount} messages`);
// Get all user sessions
const userSessions = await rag.getUserSessions('user_456');
console.log(`User has ${userSessions.length} sessions`);
// Get session statistics
const stats = await rag.getSessionStats({ userId: 'user_456' });
console.log(`Total messages: ${stats.totalMessages}`);
History Summarization
// Long conversations are automatically managed
const result = await rag.query('Complex question', {
sessionId: sessionId,
persistSession: true,
maxHistoryLength: 20 // Keeps recent 20 messages
});
Testing Chat Features
# Basic chat history
npm run test:chat:basic
# Session management
npm run test:chat:session
# History summarization
npm run test:chat:summarization
# Session persistence
npm run test:chat:persistence
📚 API Documentation
DocumentProcessor Class
The DocumentProcessor class provides powerful document processing capabilities for files, buffers, and URLs.
Buffer Processing Methods
processDocumentFromBuffer(buffer, fileName, fileType, metadata = {})
Process a document directly from a memory buffer.
import { DocumentProcessor } from 'rag-system-pgvector/utils';
const processor = new DocumentProcessor();
const buffer = Buffer.from('This is a test document', 'utf8');
const result = await processor.processDocumentFromBuffer(
buffer,
'test.txt',
'txt',
{ source: 'api', category: 'test' }
);
// Returns:
// {
// title: 'Test Document',
// content: 'This is a test document',
// chunks: [...], // Array of processed chunks with embeddings
// metadata: { ... },
// fileType: 'txt',
// filePath: 'test.txt'
// }Parameters:
buffer(Buffer): The document content as a Buffer objectfileName(string): Name of the file (used for metadata)fileType(string): File type ('pdf', 'docx', 'txt', 'html', 'md', 'json')metadata(object): Additional metadata to attach to the document
Supported Buffer Types:
- TXT: Plain text files
- HTML: HTML documents (extracts text content)
- Markdown: Markdown files
- JSON: JSON files (converts to readable text)
extractTextFromBuffer(buffer, fileType)
Extract raw text from a buffer without processing into chunks.
const text = await processor.extractTextFromBuffer(buffer, 'html');
console.log(text); // Extracted plain text
URL Processing Methods
processDocumentFromUrl(url, metadata = {})
Download and process a document from a URL.
const result = await processor.processDocumentFromUrl(
'https://example.com/document.pdf',
{
source: 'web-crawl',
priority: 'high',
category: 'research'
}
);
// Automatically detects file type from URL and content headers
// Downloads to temp directory and processes
Parameters:
- url (string): HTTP/HTTPS URL to download from
- metadata (object): Additional metadata for the document
Features:
- Automatic file type detection from URL extension and Content-Type headers
- Temporary file handling (auto-cleanup)
- Support for redirects and various HTTP response types
- Comprehensive error handling
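As a simplified, hypothetical sketch of how this kind of detection can work (the package's internal logic may differ), file type can be inferred from the URL extension with a Content-Type fallback:

```javascript
// Hypothetical sketch of file-type detection from a URL extension with a
// Content-Type header fallback (illustration only, not the package's code).
const CONTENT_TYPE_MAP = {
  'application/pdf': 'pdf',
  'text/html': 'html',
  'text/markdown': 'md',
  'text/plain': 'txt',
  'application/json': 'json',
};

function detectFileType(url, contentType = '') {
  // Try the URL path extension first.
  const ext = new URL(url).pathname.split('.').pop().toLowerCase();
  if (['pdf', 'docx', 'txt', 'html', 'md', 'json'].includes(ext)) return ext;
  // Fall back to the Content-Type header (ignoring charset parameters).
  return CONTENT_TYPE_MAP[contentType.split(';')[0].trim()] || 'txt';
}

console.log(detectFileType('https://example.com/report.pdf')); // 'pdf'
console.log(detectFileType('https://example.com/page', 'text/html; charset=utf-8')); // 'html'
```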
processDocumentsFromUrls(urls, options = {})
Process multiple URLs in parallel with concurrency control.
const urls = [
'https://site1.com/doc1.pdf',
'https://site2.com/doc2.html',
'https://site3.com/doc3.md'
];
const results = await processor.processDocumentsFromUrls(urls, {
maxConcurrent: 3, // Process up to 3 URLs simultaneously
metadata: { batch: 'import-2024' },
timeout: 30000, // 30 second timeout per URL
retries: 2 // Retry failed downloads
});
// Returns:
// {
// successful: [...], // Array of successfully processed documents
// failed: [...], // Array of failed URLs with error details
// total: 3,
// successCount: 2,
// failureCount: 1
// }
Options:
- maxConcurrent (number): Maximum concurrent downloads (default: 5)
- metadata (object): Metadata applied to all documents
- timeout (number): Timeout per URL in milliseconds
- retries (number): Number of retry attempts for failed downloads
Error Handling
All methods include comprehensive error handling:
try {
const result = await processor.processDocumentFromBuffer(buffer, 'test.pdf', 'pdf');
} catch (error) {
if (error.message.includes('Buffer is empty')) {
console.log('Empty buffer provided');
} else if (error.message.includes('Unsupported file type')) {
console.log('File type not supported for buffer processing');
} else {
console.log('Processing error:', error.message);
}
}
Integration with RAG System
Use processed documents with the RAG system:
import fs from 'fs';
import RAGSystem from 'rag-system-pgvector';
import { DocumentProcessor } from 'rag-system-pgvector/utils';
const rag = new RAGSystem(config);
const processor = new DocumentProcessor();
await rag.initialize();
// Process from buffer
const buffer = fs.readFileSync('document.pdf');
const processed = await processor.processDocumentFromBuffer(buffer, 'doc.pdf', 'pdf');
// Add to RAG system
await rag.documentStore.saveDocument(processed);
// Process from URL and add to RAG
const urlProcessed = await processor.processDocumentFromUrl('https://example.com/doc.html');
await rag.documentStore.saveDocument(urlProcessed);
// Now query across all documents
const answer = await rag.query('What information is available?');
🌐 With Web Interface
const rag = new RAGSystem({
// ... configuration
server: { port: 3000, enableWebUI: true }
});
await rag.initialize();
await rag.startServer();
// Visit http://localhost:3000
📖 Documentation
- 📚 Complete Package Documentation - Full API reference and examples
- 🔧 Integration Guide - Step-by-step integration examples
- 🎯 Examples - Ready-to-run examples
⚡ Quick Examples
Run the included examples:
# Basic usage example
npm run example:basic
# Web server example
npm run example:server
# Advanced integration example
npm run example:advanced
# Usage patterns overview
npm run example:patterns
🛠️ Development & Contributing
For local development and contributions:
Prerequisites
- Node.js v18+
- PostgreSQL v12+ with pgvector extension
- API key for your chosen AI provider (e.g., OpenAI)
Setup
# Clone and install
git clone https://github.com/yourusername/rag-system-pgvector.git
cd rag-system-pgvector
npm install
# Configure environment
cp .env.example .env
# Edit .env with your credentials
# Initialize database
npm run setup
# Start development
npm run dev
Testing
# Run examples
npm run example:basic
# Run with web interface
npm run example:server
Upload Document
curl -X POST http://localhost:3000/documents/upload \
-F "document=@path/to/your/document.pdf" \
-F "title=My Document"
Process Document from File Path
curl -X POST http://localhost:3000/documents/process \
-H "Content-Type: application/json" \
-d '{
"filePath": "/path/to/document.pdf",
"title": "My Document"
}'
Search/Query
curl -X POST http://localhost:3000/search \
-H "Content-Type: application/json" \
-d '{
"query": "What is the main topic of the document?",
"sessionId": "optional-session-id"
}'
Get All Documents
curl http://localhost:3000/documents
Get Specific Document
curl http://localhost:3000/documents/{document-id}
Delete Document
curl -X DELETE http://localhost:3000/documents/{document-id}
Command Line Tools
Process Documents from Directory
npm run process-docs /path/to/documents/folder
Interactive Search
npm run search
Single Query Search
npm run search "Your question here"
🏗️ Architecture
System Components
Document Processor (src/utils/documentProcessor.js)
- Extracts text from various file formats
- Splits documents into chunks with configurable overlap
- Generates embeddings via the configured embedding provider
Document Store (src/services/documentStore.js)
- Manages document and chunk storage in PostgreSQL
- Performs vector similarity search using pgvector
- Handles CRUD operations
RAG Workflow (src/workflows/ragWorkflow.js)
- LangGraph-based workflow orchestration
- Three-step process: Retrieve → Rerank → Generate
- Supports conversational context
API Server (src/index.js)
- Express.js REST API
- File upload handling
- Conversation session management
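For illustration only, the Retrieve → Rerank → Generate flow can be sketched with in-memory vectors and a stubbed generation step (in the real system, retrieval happens in PostgreSQL via pgvector and generation via the configured LLM):

```javascript
// Simplified stand-in for the three-step RAG pipeline (illustration only).
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Retrieve: score every chunk against the query embedding.
function retrieve(queryEmbedding, chunks) {
  return chunks.map(c => ({ ...c, score: cosineSimilarity(queryEmbedding, c.embedding) }));
}

// Rerank: keep the top-k chunks by score.
function rerank(scored, k = 2) {
  return [...scored].sort((x, y) => y.score - x.score).slice(0, k);
}

// Generate: the real system prompts the configured LLM with the top chunks
// as context; here we just join them into a string.
function generate(query, topChunks) {
  return `Q: ${query}\nContext: ${topChunks.map(c => c.content).join(' | ')}`;
}

const chunks = [
  { content: 'pgvector stores embeddings', embedding: [1, 0] },
  { content: 'bananas are yellow', embedding: [0, 1] },
  { content: 'vector search with cosine distance', embedding: [0.9, 0.1] },
];
const top = rerank(retrieve([1, 0], chunks), 2);
console.log(generate('How are embeddings stored?', top));
```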
Database Schema
-- Documents table
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
title VARCHAR(255) NOT NULL,
content TEXT NOT NULL,
file_path VARCHAR(500),
file_type VARCHAR(50),
metadata JSONB DEFAULT '{}',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Document chunks with embeddings
CREATE TABLE document_chunks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
chunk_index INTEGER NOT NULL,
content TEXT NOT NULL,
embedding vector(1536),
metadata JSONB DEFAULT '{}',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Search sessions for tracking
CREATE TABLE search_sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
query TEXT NOT NULL,
results JSONB,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Chat Sessions for conversation persistence (NEW)
CREATE TABLE chat_sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
session_id VARCHAR(255) UNIQUE NOT NULL,
user_id VARCHAR(255),
knowledgebot_id VARCHAR(255),
history JSONB DEFAULT '[]'::jsonb,
metadata JSONB DEFAULT '{}'::jsonb,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_activity TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
message_count INTEGER DEFAULT 0
);
-- Indexes for chat sessions
CREATE INDEX idx_chat_sessions_session_id ON chat_sessions(session_id);
CREATE INDEX idx_chat_sessions_user_id ON chat_sessions(user_id);
CREATE INDEX idx_chat_sessions_knowledgebot_id ON chat_sessions(knowledgebot_id);
CREATE INDEX idx_chat_sessions_last_activity ON chat_sessions(last_activity);
LangGraph Workflow
graph TD
A[Query Input] --> B[Retrieve Node]
B --> C[Rerank Node]
C --> D[Generate Node]
D --> E[Response Output]
B --> F[Vector Search]
F --> G[Similar Chunks]
C --> H[Score Ranking]
H --> I[Top Chunks]
D --> J[LLM Generation]
J --> K[Contextual Response]
🔧 Configuration
The RAG system is highly configurable. You can customize every aspect of its behavior through the constructor configuration object.
Complete Configuration Example
import RAGSystem from 'rag-system-pgvector';
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';
const rag = new RAGSystem({
// ========================================
// 1. Database Configuration (Required)
// ========================================
database: {
host: 'localhost', // Database host
port: 5432, // Database port
database: 'rag_db', // Database name
username: 'postgres', // Database user
password: 'your_password', // Database password
// Connection Pool Settings
max: 10, // Max connections in pool
min: 0, // Min connections in pool
maxUses: Infinity, // Max uses per connection
allowExitOnIdle: false, // Allow pool to close when idle
maxLifetimeSeconds: 0, // Max connection lifetime (0 = unlimited)
idleTimeoutMillis: 10000 // Idle timeout (10 seconds)
},
// ========================================
// 2. AI Provider Configuration (Required)
// ========================================
embeddings: new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY,
modelName: 'text-embedding-ada-002'
}),
llm: new ChatOpenAI({
openAIApiKey: process.env.OPENAI_API_KEY,
modelName: 'gpt-4',
temperature: 0.7
}),
// ========================================
// 3. Embedding Configuration
// ========================================
embeddingDimensions: 1536, // Dimensions for embeddings
// OpenAI ada-002: 1536
// HuggingFace MiniLM: 384
// Anthropic: varies
// ========================================
// 4. Vector Store Configuration
// ========================================
vectorStore: {
tableName: 'document_chunks_vector',
vectorColumnName: 'embedding',
contentColumnName: 'content',
metadataColumnName: 'metadata'
},
// ========================================
// 5. Document Processing Configuration
// ========================================
processing: {
chunkSize: 1000, // Characters per chunk
chunkOverlap: 200 // Overlap between chunks
},
// ========================================
// 6. Chat History Configuration (NEW)
// ========================================
chatHistory: {
enabled: true, // Enable chat history feature
maxMessages: 20, // Max messages before management kicks in
maxTokens: 3000, // Max tokens in chat history
summarizeThreshold: 30, // Trigger summarization after N messages
keepRecentCount: 10, // Recent messages to preserve
alwaysKeepFirst: true, // Always keep conversation starter
persistSessions: true, // Store sessions in database
sessionTimeout: 3600000 // Session timeout (1 hour in ms)
}
});
await rag.initialize();
Configuration Sections Explained
1. Database Configuration
Controls PostgreSQL connection and pool behavior:
database: {
host: 'localhost', // Where PostgreSQL is running
port: 5432, // PostgreSQL port (default: 5432)
database: 'rag_db', // Your database name
username: 'postgres', // Database user
password: 'your_password', // User password
// Pool Settings (Advanced)
max: 10, // Maximum concurrent connections
min: 0, // Minimum idle connections
idleTimeoutMillis: 10000 // Close idle connections after 10s
}
Best Practices:
- Use environment variables for sensitive data
- Set max based on your application's concurrency needs
- Monitor connection pool usage in production
2. AI Provider Configuration
Specify your embedding and language model providers:
OpenAI Example:
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';
embeddings: new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY,
modelName: 'text-embedding-ada-002'
}),
llm: new ChatOpenAI({
openAIApiKey: process.env.OPENAI_API_KEY,
modelName: 'gpt-4',
temperature: 0.7
})
Anthropic Example:
import { OpenAIEmbeddings } from '@langchain/openai';
import { ChatAnthropic } from '@langchain/anthropic';
embeddings: new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY,
modelName: 'text-embedding-ada-002'
}),
llm: new ChatAnthropic({
anthropicApiKey: process.env.ANTHROPIC_API_KEY,
modelName: 'claude-3-sonnet-20240229',
temperature: 0.7
})
Local Models Example:
import { HuggingFaceTransformersEmbeddings } from '@langchain/community/embeddings/hf_transformers';
import { Ollama } from '@langchain/community/llms/ollama';
embeddings: new HuggingFaceTransformersEmbeddings({
modelName: 'sentence-transformers/all-MiniLM-L6-v2'
}),
llm: new Ollama({
baseUrl: 'http://localhost:11434',
model: 'llama2'
})
3. Embedding Dimensions
Match this to your embedding model's output dimensions:
| Model | Dimensions | Provider |
|-------|------------|----------|
| text-embedding-ada-002 | 1536 | OpenAI |
| all-MiniLM-L6-v2 | 384 | HuggingFace |
| text-embedding-3-small | 1536 | OpenAI |
| text-embedding-3-large | 3072 | OpenAI |
embeddingDimensions: 1536 // Must match your embedding model
Important: If you change embedding models, you must recreate the database schema!
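To see why the dimensions must agree, consider this hypothetical guard (not part of the package API): a pgvector column declared as vector(1536) rejects vectors of any other length, so a mismatch fails at insert time.

```javascript
// Hypothetical guard (illustration only, not a package API): verify an
// embedding's length matches the configured embeddingDimensions before
// storing it, mirroring the check pgvector enforces at the column level.
function assertDimensions(embedding, expected) {
  if (embedding.length !== expected) {
    throw new Error(
      `Embedding has ${embedding.length} dimensions, expected ${expected}. ` +
      'Did you switch embedding models without recreating the schema?'
    );
  }
  return embedding;
}

assertDimensions(new Array(1536).fill(0), 1536); // ok
// assertDimensions(new Array(384).fill(0), 1536); // would throw
```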
4. Vector Store Configuration
Customize the vector store table structure:
vectorStore: {
tableName: 'document_chunks_vector', // Table name for vectors
vectorColumnName: 'embedding', // Column for embeddings
contentColumnName: 'content', // Column for text content
metadataColumnName: 'metadata' // Column for metadata
}
Most users can use the defaults.
5. Document Processing
Control how documents are chunked:
processing: {
chunkSize: 1000, // Characters per chunk (500-2000 recommended)
chunkOverlap: 200 // Overlap between chunks (10-20% of chunkSize)
}
Guidelines:
- Small chunks (500): Better precision, more chunks, higher cost
- Large chunks (2000): Better context, fewer chunks, lower cost
- Overlap: Prevents context loss at boundaries (typically 10-20%)
Examples:
// For technical documentation (needs precision)
processing: { chunkSize: 800, chunkOverlap: 150 }
// For books/long content (needs context)
processing: { chunkSize: 1500, chunkOverlap: 300 }
// For code documentation (needs structure)
processing: { chunkSize: 1000, chunkOverlap: 200 }
6. Chat History Configuration (NEW in v2.3.0)
Control conversation history management:
chatHistory: {
enabled: true, // Enable/disable chat history
maxMessages: 20, // Start management after N messages
maxTokens: 3000, // Maximum tokens in history
summarizeThreshold: 30, // Summarize after N messages
keepRecentCount: 10, // Recent messages to always keep
alwaysKeepFirst: true, // Keep conversation starter
persistSessions: true, // Store in database
sessionTimeout: 3600000 // 1 hour timeout (in milliseconds)
}
Chat History Options Explained:
- enabled: Master switch for the chat history feature
- maxMessages: Soft limit before history management activates
- maxTokens: Hard limit on token count (prevents API errors)
- summarizeThreshold: When to trigger LLM-based summarization
- keepRecentCount: Recent messages to preserve during summarization
- alwaysKeepFirst: Preserve conversation context from the beginning
- persistSessions: Save sessions to the database for persistence
- sessionTimeout: Milliseconds before a session is considered inactive
Preset Configurations:
// Minimal (cost-effective)
chatHistory: {
enabled: true,
maxMessages: 10,
maxTokens: 1500,
summarizeThreshold: 15,
keepRecentCount: 5,
persistSessions: false
}
// Balanced (recommended)
chatHistory: {
enabled: true,
maxMessages: 20,
maxTokens: 3000,
summarizeThreshold: 30,
keepRecentCount: 10,
persistSessions: true
}
// Maximum context (for complex conversations)
chatHistory: {
enabled: true,
maxMessages: 40,
maxTokens: 6000,
summarizeThreshold: 50,
keepRecentCount: 20,
persistSessions: true
}
// Disabled (for single-shot queries)
chatHistory: {
enabled: false
}
Environment Variables
Create a .env file for sensitive configuration:
# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=rag_db
DB_USER=postgres
DB_PASSWORD=your_secure_password
# OpenAI
OPENAI_API_KEY=sk-...
# Anthropic (optional)
ANTHROPIC_API_KEY=sk-ant-...
# Azure (optional)
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://...
# Processing (optional)
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
EMBEDDING_DIMENSIONS=1536
Then use in your code:
import 'dotenv/config';
const rag = new RAGSystem({
database: {
host: process.env.DB_HOST,
port: parseInt(process.env.DB_PORT),
database: process.env.DB_NAME,
username: process.env.DB_USER,
password: process.env.DB_PASSWORD
},
embeddings: new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY
}),
llm: new ChatOpenAI({
openAIApiKey: process.env.OPENAI_API_KEY
}),
embeddingDimensions: parseInt(process.env.EMBEDDING_DIMENSIONS || '1536')
});
Query-Time Configuration
You can also configure behavior at query time:
const result = await rag.query('Your question', {
// Filtering
userId: 'user_123', // Filter by user
knowledgebotId: 'bot_456', // Filter by bot
filter: { category: 'tech' }, // Custom metadata filters
// Retrieval
limit: 10, // Number of chunks to retrieve
threshold: 0.5, // Similarity threshold (0-1)
// Chat History
chatHistory: previousHistory, // Previous conversation
maxHistoryLength: 15, // Override default history length
sessionId: 'session_789', // Session identifier
persistSession: true, // Save session to database
// Context
context: additionalContext, // Extra context to include
metadata: { source: 'api' } // Custom metadata
});
Configuration Best Practices
- Security: Never hardcode API keys or passwords
- Environment-Specific: Use different configs for dev/staging/prod
- Performance: Monitor and adjust based on usage patterns
- Cost: Balance context size with API costs
- Testing: Test with different configurations to find optimal settings
📊 Performance Optimization
Database Indexes
The system creates optimized indexes:
-- For vector similarity search
CREATE INDEX idx_document_chunks_embedding
ON document_chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- For document relationships
CREATE INDEX idx_document_chunks_document_id
ON document_chunks(document_id);
Chunking Strategy
- Recursive Character Text Splitter: Preserves semantic boundaries
- Configurable overlap: Ensures context continuity
- Multiple separators: Prioritizes paragraph, sentence, then word boundaries
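For illustration only, the overlap idea can be sketched as a fixed-size chunker (the package actually uses LangChain's recursive character splitter, which additionally respects paragraph and sentence boundaries):

```javascript
// Minimal sketch of fixed-size chunking with overlap (illustration only).
// Each chunk shares `chunkOverlap` characters with its predecessor so that
// context at chunk boundaries is not lost.
function chunkText(text, chunkSize = 1000, chunkOverlap = 200) {
  const chunks = [];
  const step = chunkSize - chunkOverlap; // advance by size minus overlap
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached
  }
  return chunks;
}

const chunks = chunkText('a'.repeat(2500), 1000, 200);
console.log(chunks.length); // → 3
```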
🧪 Testing
Test Document Processing
# Create test documents directory
mkdir test-docs
# Add some test files (PDF, DOCX, TXT, etc.)
# Then process them
npm run process-docs ./test-docs
Test Search
# Interactive search
npm run search
# Or single query
npm run search "What is machine learning?"
🔍 Troubleshooting
Common Issues
pgvector extension not found
-- Install pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
OpenAI API quota exceeded
- Check your OpenAI API usage
- Consider using alternative embedding models
Large document processing fails
- Increase chunk size or reduce document size
- Check memory limits
Poor search results
- Lower similarity threshold
- Adjust chunk size and overlap
- Verify document content quality
Debug Mode
Enable verbose logging by setting:
NODE_ENV=development
🤝 Contributing
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- LangChain for the excellent AI/ML framework
- pgvector for vector similarity search
- OpenAI for embedding and language models
