rag-guard
A lightweight, deterministic backend guard for Retrieval-Augmented Generation (RAG) pipelines.
rag-guard acts as a runtime safety and quality gate for production AI systems. It evaluates the relationship between your user's query, the retrieved context, and the LLM's answer to ensure reliability.
It is a pure npm package with no LLM calls, no network requests, and no UI.
What problems does it solve?
RAG pipelines often fail silently:
- Irrelevant Context: The retriever finds documents, but they don't answer the specific question.
- Hallucinations: The LLM ignores the context and makes up an answer.
- Generic Answers: The model politely refuses to answer ("I don't know") but the system doesn't detect this as a retrieval failure.
- Empty Retrievals: Pipelines sometimes pass empty or malformed strings to the model.
Key characteristics
- Deterministic: Same input always yields same output. No "AI judging AI".
- Fast: Runs in milliseconds using lexical overlap and set theory (Jaccard similarity, asymmetric coverage); see the sketch after this list.
- Privacy-First: Data never leaves your server.
- Inspectable: Returns clear, human-readable reasons for failure.
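To make the set-theory idea concrete, here is a minimal, illustrative sketch of Jaccard similarity and asymmetric coverage over token sets. It is not rag-guard's internal code; the tokenize helper and its normalization rules are assumptions.
// Illustrative only — simplified versions of the lexical set measures
// described above, not rag-guard's actual implementation.

// Hypothetical tokenizer: lowercase, strip punctuation, split on whitespace.
function tokenize(text: string): Set<string> {
  return new Set(
    text.toLowerCase().replace(/[^a-z0-9\s]/g, " ").split(/\s+/).filter(Boolean)
  );
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| — symmetric overlap of two token sets.
function jaccard(a: Set<string>, b: Set<string>): number {
  const intersection = [...a].filter((t) => b.has(t)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : intersection / union;
}

// Asymmetric coverage: the fraction of A's tokens that also appear in B,
// e.g. "how much of the query is covered by the context".
function coverage(a: Set<string>, b: Set<string>): number {
  if (a.size === 0) return 0;
  return [...a].filter((t) => b.has(t)).length / a.size;
}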
Installation
npm install rag-guard
# or
yarn add rag-guard
# or
pnpm add rag-guard
Quick Start
import { guardRAG, RagGuardInput } from 'rag-guard';
const input: RagGuardInput = {
  query: "What is the battery life of the X200?",
  context: "The X200 features a 12-hour battery life under normal usage conditions.",
  answer: "The X200 lasts for about 12 hours."
};

const result = guardRAG(input);

if (result.isSafe) {
  console.log("Safe to return to user:", result.confidence);
} else {
  console.error("Unsafe response:", result.reasons);
}
Core API
Input
interface RagGuardInput {
  query: string;
  context: string | string[];
  answer: string;
}
Output
interface RagGuardEvaluation {
  isSafe: boolean;
  confidence: number;
  reasons: string[];
  metrics: {
    contextRelevance: number;
    answerGrounding: number;
    contextCoverage: number;
    infoDensity: number;
    contextRedundancy?: number;
  };
}
How evaluation works (high-level)
- Text is cleaned and vectorized using bag-of-words.
- Context Relevance: Cosine similarity between query and context.
- Answer Grounding: Cosine similarity between the answer and the context (highest-scoring chunk when context is an array).
- Context Coverage: Overlap between answer terms and full context.
- Additional heuristics: Length checks, info density, redundancy, generic phrases.
- Zero dependencies · Pure TypeScript · < 5ms runtime
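For intuition, here is a minimal sketch of a bag-of-words cosine similarity like the one described above. The bagOfWords and cosineSimilarity names are illustrative, not rag-guard exports, and the vectorization details are assumptions.
// Illustrative sketch only — not rag-guard's internal code.
function bagOfWords(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  return counts;
}

function cosineSimilarity(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  for (const [token, countA] of a) dot += countA * (b.get(token) ?? 0);
  const norm = (v: Map<string, number>) =>
    Math.sqrt([...v.values()].reduce((sum, c) => sum + c * c, 0));
  const denominator = norm(a) * norm(b);
  return denominator === 0 ? 0 : dot / denominator;
}

// e.g. contextRelevance ≈ cosineSimilarity(bagOfWords(query), bagOfWords(context))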
Configuration Options
You can customize strictness by passing an options object:
import { guardRAG } from 'rag-guard';
const result = guardRAG(input, {
  relevanceThreshold: 0.25,   // Min context relevance
  groundingThreshold: 0.4,    // Min answer grounding
  coverageThreshold: 0.2,     // Min query coverage in context
  minContextLength: 50,       // Min chars for context
  maxContextLength: 10000,    // Max chars for context
  genericAnswerPhrases: [     // Custom refusal phrases
    "i don't know",
    "not sufficient information",
    "cannot provide"
  ]
});
Production Use Cases
Safety gate example
Block answers that aren't supported by your internal data to prevent hallucinations in customer support chat bots.
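As an illustration, a support-bot handler might gate each generated answer before it reaches the user. The retrieveDocs and generateAnswer functions below are hypothetical placeholders for your own pipeline, and the fallback message is just an example.
import { guardRAG } from 'rag-guard';

// Hypothetical pipeline functions — replace with your own retriever and LLM call.
declare function retrieveDocs(query: string): Promise<string[]>;
declare function generateAnswer(query: string, context: string[]): Promise<string>;

async function answerSupportQuery(query: string): Promise<string> {
  const context = await retrieveDocs(query);
  const answer = await generateAnswer(query, context);

  const verdict = guardRAG({ query, context, answer });
  if (!verdict.isSafe) {
    // Fall back instead of risking a hallucinated reply.
    return "I couldn't find a reliable answer in our documentation. Let me connect you with a human agent.";
  }
  return answer;
}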
Monitoring & observability example
Log the metrics object for every RAG interaction. Over time, visualize contextRelevance to measure the quality of your vector retriever, and answerGrounding to measure how closely your LLM's answers stay grounded in the retrieved context.
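A minimal sketch of what that logging might look like; logEvaluation is a hypothetical helper, and the structured-log shape is an assumption, not part of rag-guard's API.
import { guardRAG, RagGuardInput } from 'rag-guard';

// Hypothetical helper — `input` comes from your own pipeline.
function logEvaluation(input: RagGuardInput): void {
  const evaluation = guardRAG(input);

  // One structured log line per interaction; feed these into your dashboards
  // to track retriever quality (contextRelevance) and grounding over time.
  console.log(JSON.stringify({
    event: "rag_guard_evaluation",
    isSafe: evaluation.isSafe,
    confidence: evaluation.confidence,
    ...evaluation.metrics,
  }));
}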
Performance
rag-guard is designed to add negligible latency (< 5ms) to your pipeline. It uses efficient set operations and string manipulation, avoiding heavy embedding models or external API calls.
Where rag-guard fits
User Query
     │
     ▼
[Retriever] ──► (Documents)
     │
     ▼
[LLM] ──► (Answer)
     │
     ▼
[rag-guard] ◄── (Query, Docs, Answer)
     │
     ├──► ✅ Safe: Return to User
     └──► ❌ Unsafe: Return fallback / Retry
Why use rag-guard?
Using an LLM to evaluate another LLM (LLM-as-a-judge) is slow, expensive, and non-deterministic. rag-guard provides a baseline logic layer that catches 80% of common RAG failures instantly and cheaply.
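For completeness, here is a sketch of the fallback/retry branch from the diagram above, assuming hypothetical retrieve() and generate() helpers in your own pipeline: on an unsafe verdict it retries once with wider retrieval, then falls back.
import { guardRAG } from 'rag-guard';

// Hypothetical helpers standing in for your retriever and LLM.
declare function retrieve(query: string, topK: number): Promise<string[]>;
declare function generate(query: string, context: string[]): Promise<string>;

const FALLBACK = "Sorry, I don't have enough information to answer that reliably.";

async function answerWithRetry(query: string): Promise<string> {
  for (const topK of [4, 8]) {             // second pass widens retrieval
    const context = await retrieve(query, topK);
    const answer = await generate(query, context);
    if (guardRAG({ query, context, answer }).isSafe) return answer;
  }
  return FALLBACK;                          // unsafe after retry: return fallback
}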
Contributing
Contributions are welcome! Please ensure all logic remains deterministic and dependency-free.
Author
Divesh Sarkar
License
MIT © 2026 Divesh Sarkar
