rag-foundation-stone
v0.3.0
Production-grade RAG foundation with pluggable stores, text splitting, and high-level pipeline
RAG Foundation
Production-grade Retrieval-Augmented Generation foundation designed for extension into SaaS products.
What This Is
A reference implementation of RAG architecture with strict separation of concerns, deterministic behavior, and zero tolerance for hallucination. This is not a demo or a chatbot toy; it is a foundation meant to be extended into production systems.
Who This Is For
- Engineering teams building knowledge bases or document search products
- Architects designing RAG systems that need to scale
- Teams requiring deterministic, auditable AI responses
- Organizations that need cloud-agnostic, vendor-neutral infrastructure
Problems This Solves
1. Hallucination Prevention
Strict architectural boundaries ensure the LLM cannot fabricate information:
- Retrieval returns references only, never full content
- Empty retrieval triggers immediate refusal
- Temperature locked at 0
- No retry logic that could introduce non-determinism
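The refusal rule can be sketched in a few lines. This is an illustrative sketch, not the package's actual code; `RetrievedRef`, `REFUSAL_MESSAGE`, and `answerOrRefuse` are hypothetical names.

```typescript
// Hypothetical sketch of the "empty retrieval triggers refusal" rule.
interface RetrievedRef {
  contentHash: string;
  score: number;
}

const REFUSAL_MESSAGE = "I don't have enough information to answer that.";

function answerOrRefuse(
  refs: RetrievedRef[],
  generate: (refs: RetrievedRef[]) => string
): string {
  // Empty retrieval refuses immediately: the LLM is never called,
  // so it never gets a chance to fabricate an answer.
  if (refs.length === 0) {
    return REFUSAL_MESSAGE;
  }
  return generate(refs);
}
```

Because the guard sits in front of the generation call, no prompt engineering is needed to enforce the refusal; it is structural.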
2. Content Storage Decoupling
Vector databases should never store full content:
- Pluggable content stores (S3, Postgres, filesystem)
- Content fetched separately using content_hash as the contract
- Vector store contains only embeddings and metadata
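The shape of that decoupling might look like the following. The interface and class names here are assumptions for illustration, not the package's exported API: the vector store yields only a content hash, and full text is fetched from a separate store keyed by that hash.

```typescript
// Hypothetical content store keyed by content hash.
interface ContentStore {
  put(hash: string, content: string): Promise<void>;
  get(hash: string): Promise<string | undefined>;
}

// Minimal in-memory adapter; S3, Postgres, or filesystem adapters
// would implement the same two methods.
class InMemoryContentStore implements ContentStore {
  private items = new Map<string, string>();
  async put(hash: string, content: string): Promise<void> {
    this.items.set(hash, content);
  }
  async get(hash: string): Promise<string | undefined> {
    return this.items.get(hash);
  }
}
```

Because retrieval hands back only hashes, swapping the content backend never touches the retrieval layer.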
3. Incremental Ingestion
Production systems must handle updates efficiently:
- Content-hash based deduplication
- Idempotent ingestion pipeline
- Re-running ingestion skips unchanged content
- No unnecessary re-embedding
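The deduplication step can be sketched with Node's standard crypto module. `contentHash` and `planIngestion` are illustrative names, assumed for this sketch rather than taken from the package.

```typescript
import { createHash } from "node:crypto";

// Hash the chunk text; identical content always produces the same hash.
function contentHash(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Only chunks whose hash is unseen need embedding; the rest are skipped,
// which is what makes re-running ingestion idempotent.
function planIngestion(chunks: string[], alreadyStored: Set<string>): string[] {
  return chunks.filter((chunk) => !alreadyStored.has(contentHash(chunk)));
}
```

Running the same batch twice therefore embeds nothing the second time, since every hash is already in the stored set.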
4. Cloud Agnostic Design
Avoid vendor lock-in from day one:
- Abstract interfaces for all external dependencies
- Bring your own embedding provider
- Bring your own LLM
- Bring your own vector database
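A plausible shape for the provider abstractions is sketched below. The interface members shown are assumptions based on the README's description, not the package's exact signatures; the deterministic mock is the kind of stand-in the Implementation Checklist says to replace before production.

```typescript
// Hypothetical provider interfaces: swapping vendors means writing
// a new adapter, not touching the pipeline.
interface EmbeddingProvider {
  embed(texts: string[]): Promise<number[][]>;
}

interface LLMProvider {
  complete(prompt: string, opts: { temperature: number }): Promise<string>;
}

// Deterministic mock useful for tests and local development.
class MockEmbeddingProvider implements EmbeddingProvider {
  constructor(private dim: number) {}
  async embed(texts: string[]): Promise<number[][]> {
    // Derive a fixed-dimension vector from character codes -- stable across runs.
    return texts.map((t) =>
      Array.from(
        { length: this.dim },
        (_, i) => (t.charCodeAt(i % t.length) || 0) / 255
      )
    );
  }
}
```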
Architecture
See ARCHITECTURE.md for detailed design documentation.
Quick Start
```typescript
import {
  IngestionPipeline,
  Retriever,
  AnswerGenerator,
  FilesystemContentStore,
  MemoryVectorStore,
} from 'rag-foundation';

// Configure stores
const contentStore = new FilesystemContentStore('./data/content');
const vectorStore = new MemoryVectorStore();

// Initialize pipeline
const ingestion = new IngestionPipeline(
  contentStore,
  vectorStore,
  embeddingProvider // Implement EmbeddingProvider interface
);

// Ingest documents
await ingestion.ingest(chunks);

// Query
const retriever = new Retriever(vectorStore, embeddingProvider, { topK: 5 });
const queryResult = await retriever.retrieve('What is X?');

// Generate answer
const generator = new AnswerGenerator(contentStore, llmProvider);
const answer = await generator.generate('What is X?', queryResult);
```
Project Structure
src/
├── core/
│ ├── interfaces/ # Abstract interfaces
│ └── types/ # Core types
├── stores/
│ ├── content/ # Content storage adapters
│ └── vector/ # Vector storage adapters
├── ingestion/ # Ingestion pipeline
├── retrieval/ # Retrieval layer
├── answer/ # Answer generation
└── utils/ # Utilities
Extending for SaaS
This foundation is designed to be extended with:
- Authentication: Add user context to metadata, filter by user_id
- Multi-tenancy: Add tenant_id to metadata, filter in retrieval
- Access control: Implement permission checks in retrieval layer
- Usage tracking: Wrap components with metering decorators
- Rate limiting: Add rate limit middleware around public APIs
- Audit logging: Emit events at ingestion and retrieval boundaries
- Observability: Instrument with OpenTelemetry or similar
- API layer: Wrap with REST/GraphQL endpoints
- Async ingestion: Replace direct calls with queue-based workers
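The multi-tenancy extension above amounts to carrying a tenant identifier in chunk metadata and filtering on it at retrieval time. A minimal sketch, with `ChunkMeta` and `filterByTenant` as hypothetical names:

```typescript
// Hypothetical metadata shape: tenant_id travels alongside the content hash.
interface ChunkMeta {
  contentHash: string;
  tenantId: string;
}

// Applied inside the retrieval layer, this guarantees one tenant's query
// can never surface another tenant's documents.
function filterByTenant(hits: ChunkMeta[], tenantId: string): ChunkMeta[] {
  return hits.filter((hit) => hit.tenantId === tenantId);
}
```

The same pattern covers access control: replace the tenant equality check with a permission lookup.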
See the TODO markers in the code for specific extension points.
Implementation Checklist
Before going to production:
- [ ] Replace mock embedding provider with actual implementation
- [ ] Replace mock LLM provider with actual implementation
- [ ] Choose and configure production vector store
- [ ] Choose and configure production content store
- [ ] Implement chunking strategy for your document types
- [ ] Configure embedding dimensions and model
- [ ] Tune retrieval parameters (topK, scoreThreshold)
- [ ] Add error handling and retries where appropriate
- [ ] Add observability and metrics
- [ ] Add integration tests
- [ ] Load test ingestion and retrieval paths
- [ ] Document runbooks for common operations
Design Principles
- Retrieval returns references, not content
- Content storage is fully decoupled
- Vector database stores embeddings and metadata only
- Deterministic by design (temperature=0, no retries)
- Empty retrieval must cause refusal
- Incremental ingestion via content hashing
- Cloud-agnostic interfaces
License
MIT
