rag-foundation-stone
v0.3.0
Production-grade RAG foundation with pluggable stores, text splitting, and high-level pipeline
RAG Foundation
Production-grade Retrieval-Augmented Generation foundation designed for extension into SaaS products.
What This Is
A reference implementation of RAG architecture with strict separation of concerns, deterministic behavior, and zero tolerance for hallucination. This is not a demo or a chatbot toy; it is a foundation meant to be extended into production systems.
Who This Is For
- Engineering teams building knowledge bases or document search products
- Architects designing RAG systems that need to scale
- Teams requiring deterministic, auditable AI responses
- Organizations that need cloud-agnostic, vendor-neutral infrastructure
Problems This Solves
1. Hallucination Prevention
Strict architectural boundaries ensure the LLM cannot fabricate information:
- Retrieval returns references only, never full content
- Empty retrieval triggers immediate refusal
- Temperature locked at 0
- No retry logic that could introduce non-determinism
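The refusal rule can be sketched in a few lines. This is an illustrative sketch, not the package's actual code; `RetrievedRef`, `REFUSAL_MESSAGE`, and `answerOrRefuse` are hypothetical names.

```typescript
// Hypothetical sketch of the "empty retrieval triggers refusal" rule.
interface RetrievedRef {
  contentHash: string;
  score: number;
}

const REFUSAL_MESSAGE = "I don't have enough information to answer that.";

function answerOrRefuse(
  refs: RetrievedRef[],
  generate: (refs: RetrievedRef[]) => string
): string {
  // Empty retrieval refuses immediately: the LLM is never called,
  // so it never gets a chance to fabricate an answer.
  if (refs.length === 0) {
    return REFUSAL_MESSAGE;
  }
  return generate(refs);
}
```

Because the guard sits in front of the generation call, no prompt engineering is needed to enforce the refusal; it is structural.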
2. Content Storage Decoupling
Vector databases should never store full content:
- Pluggable content stores (S3, Postgres, filesystem)
- Content fetched separately using content_hash as the contract
- Vector store contains only embeddings and metadata
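The shape of that decoupling might look like the following. The interface and class names here are assumptions for illustration, not the package's exported API: the vector store yields only a content hash, and full text is fetched from a separate store keyed by that hash.

```typescript
// Hypothetical content store keyed by content hash.
interface ContentStore {
  put(hash: string, content: string): Promise<void>;
  get(hash: string): Promise<string | undefined>;
}

// Minimal in-memory adapter; S3, Postgres, or filesystem adapters
// would implement the same two methods.
class InMemoryContentStore implements ContentStore {
  private items = new Map<string, string>();
  async put(hash: string, content: string): Promise<void> {
    this.items.set(hash, content);
  }
  async get(hash: string): Promise<string | undefined> {
    return this.items.get(hash);
  }
}
```

Because retrieval hands back only hashes, swapping the content backend never touches the retrieval layer.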
3. Incremental Ingestion
Production systems must handle updates efficiently:
- Content-hash based deduplication
- Idempotent ingestion pipeline
- Re-running ingestion skips unchanged content
- No unnecessary re-embedding
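The deduplication step can be sketched with Node's standard crypto module. `contentHash` and `planIngestion` are illustrative names, assumed for this sketch rather than taken from the package.

```typescript
import { createHash } from "node:crypto";

// Hash the chunk text; identical content always produces the same hash.
function contentHash(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Only chunks whose hash is unseen need embedding; the rest are skipped,
// which is what makes re-running ingestion idempotent.
function planIngestion(chunks: string[], alreadyStored: Set<string>): string[] {
  return chunks.filter((chunk) => !alreadyStored.has(contentHash(chunk)));
}
```

Running the same batch twice therefore embeds nothing the second time, since every hash is already in the stored set.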
4. Cloud Agnostic Design
Avoid vendor lock-in from day one:
- Abstract interfaces for all external dependencies
- Bring your own embedding provider
- Bring your own LLM
- Bring your own vector database
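A plausible shape for the provider abstractions is sketched below. The interface members shown are assumptions based on the README's description, not the package's exact signatures; the deterministic mock is the kind of stand-in the Implementation Checklist says to replace before production.

```typescript
// Hypothetical provider interfaces: swapping vendors means writing
// a new adapter, not touching the pipeline.
interface EmbeddingProvider {
  embed(texts: string[]): Promise<number[][]>;
}

interface LLMProvider {
  complete(prompt: string, opts: { temperature: number }): Promise<string>;
}

// Deterministic mock useful for tests and local development.
class MockEmbeddingProvider implements EmbeddingProvider {
  constructor(private dim: number) {}
  async embed(texts: string[]): Promise<number[][]> {
    // Derive a fixed-dimension vector from character codes -- stable across runs.
    return texts.map((t) =>
      Array.from(
        { length: this.dim },
        (_, i) => (t.charCodeAt(i % t.length) || 0) / 255
      )
    );
  }
}
```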
Architecture
See ARCHITECTURE.md for detailed design documentation.
Quick Start
```typescript
import {
  IngestionPipeline,
  Retriever,
  AnswerGenerator,
  FilesystemContentStore,
  MemoryVectorStore,
} from 'rag-foundation';

// Configure stores
const contentStore = new FilesystemContentStore('./data/content');
const vectorStore = new MemoryVectorStore();

// Initialize pipeline
const ingestion = new IngestionPipeline(
  contentStore,
  vectorStore,
  embeddingProvider // Implement EmbeddingProvider interface
);

// Ingest documents
await ingestion.ingest(chunks);

// Query
const retriever = new Retriever(vectorStore, embeddingProvider, { topK: 5 });
const queryResult = await retriever.retrieve('What is X?');

// Generate answer
const generator = new AnswerGenerator(contentStore, llmProvider);
const answer = await generator.generate('What is X?', queryResult);
```
Project Structure
src/
├── core/
│ ├── interfaces/ # Abstract interfaces
│ └── types/ # Core types
├── stores/
│ ├── content/ # Content storage adapters
│ └── vector/ # Vector storage adapters
├── ingestion/ # Ingestion pipeline
├── retrieval/ # Retrieval layer
├── answer/ # Answer generation
└── utils/ # Utilities
Extending for SaaS
This foundation is designed to be extended with:
- Authentication: Add user context to metadata, filter by user_id
- Multi-tenancy: Add tenant_id to metadata, filter in retrieval
- Access control: Implement permission checks in retrieval layer
- Usage tracking: Wrap components with metering decorators
- Rate limiting: Add rate limit middleware around public APIs
- Audit logging: Emit events at ingestion and retrieval boundaries
- Observability: Instrument with OpenTelemetry or similar
- API layer: Wrap with REST/GraphQL endpoints
- Async ingestion: Replace direct calls with queue-based workers
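The multi-tenancy extension above amounts to carrying a tenant identifier in chunk metadata and filtering on it at retrieval time. A minimal sketch, with `ChunkMeta` and `filterByTenant` as hypothetical names:

```typescript
// Hypothetical metadata shape: tenant_id travels alongside the content hash.
interface ChunkMeta {
  contentHash: string;
  tenantId: string;
}

// Applied inside the retrieval layer, this guarantees one tenant's query
// can never surface another tenant's documents.
function filterByTenant(hits: ChunkMeta[], tenantId: string): ChunkMeta[] {
  return hits.filter((hit) => hit.tenantId === tenantId);
}
```

The same pattern covers access control: replace the tenant equality check with a permission lookup.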
See the TODO markers in the code for specific extension points.
Implementation Checklist
Before going to production:
- [ ] Replace mock embedding provider with actual implementation
- [ ] Replace mock LLM provider with actual implementation
- [ ] Choose and configure production vector store
- [ ] Choose and configure production content store
- [ ] Implement chunking strategy for your document types
- [ ] Configure embedding dimensions and model
- [ ] Tune retrieval parameters (topK, scoreThreshold)
- [ ] Add error handling and retries where appropriate
- [ ] Add observability and metrics
- [ ] Add integration tests
- [ ] Load test ingestion and retrieval paths
- [ ] Document runbooks for common operations
Design Principles
- Retrieval returns references, not content
- Content storage is fully decoupled
- Vector database stores embeddings and metadata only
- Deterministic by design (temperature=0, no retries)
- Empty retrieval must cause refusal
- Incremental ingestion via content hashing
- Cloud-agnostic interfaces
License
MIT
