@dcyfr/ai-rag
RAG (Retrieval-Augmented Generation) framework for Node.js and TypeScript
Build production-ready RAG systems with document loading, embedding, vector stores, and semantic search.
✨ Features
- 📄 Document Loaders - Load text, markdown, and HTML documents with intelligent chunking
- 🔢 Embeddings - Pluggable providers (OpenAI, Cohere, Anthropic, Ollama local)
- 🗄️ Vector Stores - In-memory + persistent (Chroma, Pinecone, Weaviate)
- 🔍 Semantic Retrieval - Find relevant documents by meaning, not just keywords
- 🎯 Metadata Filtering - Complex filters (AND/OR, nested, temporal queries)
- ⚡ Batch Processing - Efficient ingestion with progress tracking and error handling
- 🔄 Hybrid Search - Combine keyword (BM25) + semantic search for best results
- 📊 Multiple Distance Metrics - Cosine similarity, dot product, euclidean
- 🚀 Production Ready - Retry logic, monitoring hooks, comprehensive error handling
- 📚 Complete Documentation - 4 comprehensive guides + advanced examples
📦 Installation
npm install @dcyfr/ai-rag
Optional Dependencies
# For production embeddings (recommended)
npm install openai # or anthropic
# For persistent vector storage
 
npm install chromadb # or pinecone-client or weaviate-client
🚀 Quick Start
import {
TextLoader,
SimpleEmbeddingGenerator,
InMemoryVectorStore,
IngestionPipeline,
RetrievalPipeline,
} from '@dcyfr/ai-rag';
// 1. Setup components
const loader = new TextLoader();
const embedder = new SimpleEmbeddingGenerator({ dimensions: 384 });
const store = new InMemoryVectorStore({
collectionName: 'my-docs',
embeddingDimensions: 384,
});
// 2. Ingest documents
const ingestion = new IngestionPipeline(loader, embedder, store);
await ingestion.ingest(['./docs/file1.txt', './docs/file2.txt']);
// 3. Query for relevant context
const retrieval = new RetrievalPipeline(store, embedder);
const result = await retrieval.query('What is machine learning?', {
limit: 5,
threshold: 0.7,
});
console.log(result.context); // Assembled context from top results
console.log(result.results); // Ranked document chunks with scores
📚 Documentation
Comprehensive Guides
Explore our detailed documentation covering all aspects of RAG development:
Document Loaders Guide - Complete guide to loading and chunking documents
- TextLoader, MarkdownLoader, HTMLLoader
- Chunking strategies (fixed-size, sentence-aware, paragraph-based, semantic)
- Custom loaders and streaming
Embeddings Guide - Vector embedding providers and techniques
- OpenAI, Cohere, Anthropic, Ollama (local)
- Batch processing and caching
- Similarity metrics explained
Vector Stores Guide - Storage and retrieval optimization
- InMemoryVectorStore, ChromaVectorStore, PineconeVectorStore, WeaviateVectorStore
- Metadata filtering (AND/OR, nested queries)
- Performance optimization (batching, ANN search)
Pipelines Guide - End-to-end RAG workflows
- Ingestion pipeline (load → chunk → embed → store)
- Retrieval pipeline (query → search → assemble context)
- Production patterns (hybrid search, re-ranking, error handling)
Quick Reference
Document Loaders
TextLoader - Load and chunk plain text files (.txt)
import { TextLoader } from '@dcyfr/ai-rag';
const loader = new TextLoader();
const docs = await loader.load('./document.txt', {
chunkSize: 1000,
chunkOverlap: 200,
});
MarkdownLoader - Load markdown files (.md)
import { MarkdownLoader } from '@dcyfr/ai-rag';
const loader = new MarkdownLoader();
const docs = await loader.load('./README.md', {
chunkSize: 800,
chunkOverlap: 150,
});
HTMLLoader - Load HTML files (.html)
import { HTMLLoader } from '@dcyfr/ai-rag';
const loader = new HTMLLoader();
const docs = await loader.load('./page.html', {
chunkSize: 600,
chunkOverlap: 100,
});
Embedding Generators
SimpleEmbeddingGenerator - Placeholder embeddings (for development/testing)
import { SimpleEmbeddingGenerator } from '@dcyfr/ai-rag';
const embedder = new SimpleEmbeddingGenerator({ dimensions: 384 });
const embeddings = await embedder.embed(['text 1', 'text 2']);
⚠️ Production Note: Use real embedding models in production:
- OpenAI text-embedding-3-small (1536 dimensions)
- Cohere embed-english-v3.0
- Local models via Ollama
Vector Stores
InMemoryVectorStore - Fast in-memory storage
import { InMemoryVectorStore } from '@dcyfr/ai-rag';
const store = new InMemoryVectorStore({
collectionName: 'docs',
embeddingDimensions: 384,
distanceMetric: 'cosine', // 'cosine' | 'dot' | 'euclidean'
});
// Add documents
await store.addDocuments(chunks);
// Search
const results = await store.search(queryEmbedding, 10);
// Filter by metadata
const filtered = await store.search(queryEmbedding, 10, {
field: 'category',
operator: 'eq',
value: 'documentation',
});
Ingestion Pipeline
import { IngestionPipeline } from '@dcyfr/ai-rag';
const pipeline = new IngestionPipeline(loader, embedder, store);
const result = await pipeline.ingest(['./docs/'], {
batchSize: 32,
onProgress: (current, total, details) => {
console.log(`Processing ${current}/${total}`);
},
});
console.log(`Processed ${result.documentsProcessed} documents`);
console.log(`Generated ${result.chunksGenerated} chunks`);
Retrieval Pipeline
import { RetrievalPipeline } from '@dcyfr/ai-rag';
const pipeline = new RetrievalPipeline(store, embedder);
// Semantic search
const result = await pipeline.query('your question here', {
limit: 5,
threshold: 0.7,
includeMetadata: true,
});
console.log(result.context); // Assembled context
console.log(result.results); // Ranked results
console.log(result.metadata); // Query metadata
// Find similar documents
const similar = await pipeline.findSimilar('doc-id-123', { limit: 10 });
💡 Examples
Basic Examples
- Basic RAG - Simple document ingestion and retrieval workflow
- Semantic Search - Advanced search with metadata filtering
- Q&A System - Question answering with context assembly
Advanced Examples
Advanced RAG - Production-ready workflow with:
- OpenAI embeddings for semantic search
- Chroma persistent vector store
- Metadata filtering with multiple criteria
- Progress tracking and error handling
- Question answering with context
Metadata Filtering - Complex query scenarios:
- AND/OR filter combinations
- Nested complex filters
- Temporal queries (date ranges)
- Tag-based search with arrays
- Multi-field filtering
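As a sketch of how these compose, using the filter shape from the Search Optimization section below (the `or` branch mirrors the documented `and` shape; the `contains` operator for tag arrays is an assumption, not a confirmed operator name):
```ts
// Assumes a RetrievalPipeline named `pipeline`, as in the Quick Reference.
const result = await pipeline.query('vector search tuning', {
  limit: 5,
  filter: {
    operator: 'and',
    filters: [
      { field: 'category', operator: 'eq', value: 'technical' },
      { field: 'published', operator: 'gte', value: '2024-01-01' },
      {
        // Nested OR: match either tag ('contains' is an assumed array operator).
        operator: 'or',
        filters: [
          { field: 'tags', operator: 'contains', value: 'rag' },
          { field: 'tags', operator: 'contains', value: 'embeddings' },
        ],
      },
    ],
  },
});
```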
Hybrid Search - Combine keyword + semantic:
- BM25 keyword search implementation
- Weighted score fusion
- Reciprocal rank fusion (RRF)
- Performance comparisons
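Reciprocal rank fusion scores each document as the sum of 1/(k + rank) over the rankings it appears in; a minimal sketch (the `rrfFuse` helper is illustrative, not a package export):
```ts
// Reciprocal rank fusion: score(d) = Σ 1 / (k + rank_d) across rankings;
// k (commonly 60) damps the advantage of top-ranked positions.
function rrfFuse(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return scores;
}

// Example: fuse a BM25 ordering with a semantic ordering (doc IDs, best first).
const bm25Order = ['doc-2', 'doc-7', 'doc-1'];
const semanticOrder = ['doc-7', 'doc-1', 'doc-9'];
const fused = [...rrfFuse([bm25Order, semanticOrder]).entries()]
  .sort((a, b) => b[1] - a[1]); // doc-7 first: ranked highly by both lists
```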
Running Examples
# Basic examples
npm run example:basic-rag
npm run example:semantic-search
npm run example:qa-system
# Advanced examples
npm run example:advanced-rag
npm run example:metadata-filtering
npm run example:hybrid-search
🏗️ Architecture
┌─────────────┐
│ Documents │
└──────┬──────┘
│
▼
┌─────────────┐
│ Loaders │ (Text, Markdown, HTML)
└──────┬──────┘
│
▼
┌─────────────┐
│ Chunking │ (Size + overlap)
└──────┬──────┘
│
▼
┌─────────────┐
│ Embeddings │ (Vector generation)
└──────┬──────┘
│
▼
┌─────────────┐
│ Vector Store│ (In-memory or persistent)
└──────┬──────┘
│
▼
┌─────────────┐
│ Retrieval │ (Semantic search)
└──────┬──────┘
│
▼
┌─────────────┐
│ Context │ (Assembled results)
└─────────────┘
💡 Best Practices
Chunking Strategy
Choose appropriate chunk sizes:
- Technical documentation: 800-1200 characters
- Blog posts/articles: 1000-1500 characters
- Code documentation: 600-1000 characters
- Q&A pairs: 400-800 characters
Use 15-20% overlap:
const loader = new TextLoader();
const docs = await loader.load('./document.txt', {
chunkSize: 1000,
chunkOverlap: 200, // 20% overlap prevents context loss at boundaries
});
Preserve document structure:
- Use MarkdownLoader for .md files (preserves headings, code blocks)
- Use HTMLLoader for web pages (extracts main content, excludes nav/footer)
- Add rich metadata (source, category, tags, dates, author)
Embedding Selection
Development/Testing:
- SimpleEmbeddingGenerator (fast, no API costs, not for production)
Production (Recommended):
- OpenAI text-embedding-3-small (1536 dim, $0.02/1M tokens, fast, good quality)
- OpenAI text-embedding-3-large (3072 dim, best quality, higher cost)
- Cohere embed-english-v3.0 (1024 dim, multilingual support)
- Ollama local models (no API costs, data privacy, requires GPU)
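For local embeddings, a generator along these lines can be plugged in. This is a sketch: it targets Ollama's /api/embeddings REST endpoint, `nomic-embed-text` (768 dimensions) is just one common model choice, and `EmbeddingGenerator` is assumed to be the exported interface that the Production Setup section implements:
```ts
import type { EmbeddingGenerator } from '@dcyfr/ai-rag';

// Sketch: local embeddings via Ollama. Assumes Ollama is running locally
// and that `ollama pull nomic-embed-text` has been run.
class OllamaEmbeddingGenerator implements EmbeddingGenerator {
  constructor(
    private model = 'nomic-embed-text',
    private baseUrl = 'http://localhost:11434',
  ) {}

  async embed(texts: string[]): Promise<number[][]> {
    // The endpoint embeds one prompt per call, so fan out over the batch.
    return Promise.all(
      texts.map(async (prompt) => {
        const res = await fetch(`${this.baseUrl}/api/embeddings`, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ model: this.model, prompt }),
        });
        const { embedding } = (await res.json()) as { embedding: number[] };
        return embedding;
      }),
    );
  }

  getDimensions(): number {
    return 768; // nomic-embed-text output size
  }
}
```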
Critical: Use the same embedder for both documents and queries!
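Concretely: construct one embedder and pass that same instance to both pipelines, as in the Quick Start:
```ts
import {
  TextLoader,
  SimpleEmbeddingGenerator,
  InMemoryVectorStore,
  IngestionPipeline,
  RetrievalPipeline,
} from '@dcyfr/ai-rag';

const loader = new TextLoader();
const store = new InMemoryVectorStore({ collectionName: 'docs', embeddingDimensions: 384 });

// One embedder instance, reused by both pipelines, keeps document vectors
// and query vectors in the same space.
const embedder = new SimpleEmbeddingGenerator({ dimensions: 384 });
const ingestion = new IngestionPipeline(loader, embedder, store);
const retrieval = new RetrievalPipeline(store, embedder);
```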
Search Optimization
Set appropriate similarity thresholds:
const result = await pipeline.query('search query', {
limit: 10,
threshold: 0.7, // Filter results with score < 0.7 (adjust 0.6-0.8 based on needs)
});
Use metadata filtering to narrow search space:
const result = await pipeline.query('search query', {
limit: 5,
filter: {
operator: 'and',
filters: [
{ field: 'category', operator: 'eq', value: 'technical' },
{ field: 'published', operator: 'gte', value: '2024-01-01' },
],
},
});
For large collections (>100k documents):
- Use persistent vector stores (Chroma, Pinecone, Weaviate)
- Enable Approximate Nearest Neighbor (ANN) search
- Implement caching for frequent queries
🔧 Troubleshooting
Poor Search Results
Problem: Retrieved context not relevant to query
Solutions:
- Verify the same embedder is used for docs and queries
- Increase similarity threshold (0.75-0.8 for higher quality)
- Test embedding quality:
// cosineSimilarity is not shown elsewhere in this README; any standard
// implementation (dot product of normalized vectors) works here.
const [ml, ai, pizza] = await embedder.embed(['machine learning', 'artificial intelligence', 'pizza']);
const similarity = cosineSimilarity(ml, ai); // Should be >0.7
const unrelated = cosineSimilarity(ml, pizza); // Should be <0.3
- Adjust chunk size (smaller chunks = more precise, larger = more context)
- Add metadata filters to narrow search space
High API Costs
Problem: Embedding API costs too high
Solutions:
- Implement caching for frequent queries:
import { LRUCache } from 'lru-cache';

const cache = new LRUCache<string, number[]>({ max: 10000, ttl: 1000 * 60 * 60 });

async function embedWithCache(text: string): Promise<number[]> {
  const cached = cache.get(text);
  if (cached) return cached;
  const [embedding] = await embedder.embed([text]);
  cache.set(text, embedding);
  return embedding;
}
- Use smaller embedding dimensions (OpenAI supports 512, 1024, 1536)
- Switch to local models (Ollama) for development/testing
- Batch process documents (100+ at a time) to reduce API calls
Slow Performance
Problem: Search or ingestion too slow
Solutions:
For ingestion:
- Increase batch size: { batchSize: 100 }
- Process files in parallel (use Promise.all with batches; see the sketch below)
- Use streaming loader for huge files
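A sketch of batched parallelism with Promise.all, assuming an IngestionPipeline named `pipeline` as shown earlier (file paths are illustrative):
```ts
// Split the file list into groups and ingest the files in each group
// concurrently, bounding parallelism at `concurrency`.
const files = ['./docs/a.txt', './docs/b.txt', './docs/c.txt', './docs/d.txt'];
const concurrency = 2;

for (let i = 0; i < files.length; i += concurrency) {
  const group = files.slice(i, i + concurrency);
  await Promise.all(
    group.map((file) => pipeline.ingest([file], { batchSize: 100 })),
  );
}
```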
For search:
- Reduce result limit: { limit: 5 } instead of 50
- Use metadata filters to narrow search space
- Enable ANN search for collections >100k:
const store = new InMemoryVectorStore({
  useApproximateSearch: true,
  approximationParams: { nprobe: 10, nlist: 100 },
});
- Use persistent vector stores with indexing (Pinecone, Weaviate)
Memory Issues
Problem: Application crashes with large document collections
Solutions:
- Use persistent vector stores instead of in-memory
- Set maxDocuments limit with LRU eviction:
const store = new InMemoryVectorStore({
  maxDocuments: 100000,
  evictionPolicy: 'lru',
});
- Process documents in smaller batches
- Use streaming loader for large files
🧪 Development
# Install dependencies
npm install
# Build
npm run build
# Test
npm run test:run
# Watch mode
npm run test:watch
# Coverage
npm run test:coverage
# Lint
npm run lint
🔧 Production Setup
1. Use Real Embedding Models
import OpenAI from 'openai';
import type { EmbeddingGenerator } from '@dcyfr/ai-rag';

class OpenAIEmbeddingGenerator implements EmbeddingGenerator {
private client: OpenAI;
constructor(apiKey: string) {
this.client = new OpenAI({ apiKey });
}
async embed(texts: string[]): Promise<number[][]> {
const response = await this.client.embeddings.create({
model: 'text-embedding-3-small',
input: texts,
});
return response.data.map((d) => d.embedding);
}
getDimensions(): number {
return 1536;
}
}
2. Use Persistent Vector Stores
import { ChromaClient } from 'chromadb';
// Initialize Chroma for persistent storage
const client = new ChromaClient({ path: './chroma-data' });
3. Add Production Monitoring
const result = await ingestion.ingest(files, {
onProgress: (current, total, details) => {
// Send metrics to monitoring service
metrics.gauge('rag.ingestion.progress', current / total);
logger.info({ current, total, details }, 'Ingestion progress');
},
});
🗺️ Roadmap
v1.1 (Planned)
- [ ] Additional vector stores (Qdrant, Milvus)
- [ ] Streaming ingestion pipeline
- [ ] Built-in caching layer
- [ ] Query expansion and synonyms
- [ ] Document versioning and updates
v1.2 (Planned)
- [ ] Hybrid search (keyword + semantic) built-in
- [ ] Re-ranking strategies (cross-encoder models)
- [ ] Multi-query retrieval
- [ ] Sparse + dense vector support
- [ ] Advanced chunking (recursive, semantic)
v2.0 (Future)
- [ ] Distributed vector search
- [ ] Graph RAG (knowledge graphs + vectors)
- [ ] Multi-modal embeddings (text + images)
- [ ] Real-time indexing
- [ ] Auto-tuning (chunk size, thresholds)
See our GitHub Issues for feature requests and progress.
📄 License
MIT © DCYFR
🤝 Contributing
See CONTRIBUTING.md for contribution guidelines.
Built with ❤️ by the DCYFR team
