rag-guard
A lightweight, deterministic backend guard for Retrieval-Augmented Generation (RAG) pipelines.
rag-guard acts as a runtime safety and quality gate for production AI systems. It evaluates the relationship between your user's query, the retrieved context, and the LLM's answer to ensure reliability.
It is a pure npm package with no LLM calls, no network requests, and no UI.
What problems does it solve?
RAG pipelines often fail silently:
- Irrelevant Context: The retriever finds documents, but they don't answer the specific question.
- Hallucinations: The LLM ignores the context and makes up an answer.
- Generic Answers: The model politely refuses to answer ("I don't know") but the system doesn't detect this as a retrieval failure.
- Empty Retrievals: Pipelines sometimes pass empty or malformed strings to the model.
Key characteristics
- Deterministic: Same input always yields same output. No "AI judging AI".
- Fast: Runs in milliseconds using lexical overlap and set theory (Jaccard similarity, asymmetric coverage); see the sketch after this list.
- Privacy-First: Data never leaves your server.
- Inspectable: Returns clear, human-readable reasons for failure.
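To make the set-theory idea concrete, here is a minimal, illustrative sketch of Jaccard similarity and asymmetric coverage over token sets. It is not rag-guard's internal code; the tokenize helper and its normalization rules are assumptions.
// Illustrative only — simplified versions of the lexical set measures
// described above, not rag-guard's actual implementation.

// Hypothetical tokenizer: lowercase, strip punctuation, split on whitespace.
function tokenize(text: string): Set<string> {
  return new Set(
    text.toLowerCase().replace(/[^a-z0-9\s]/g, " ").split(/\s+/).filter(Boolean)
  );
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| — symmetric overlap of two token sets.
function jaccard(a: Set<string>, b: Set<string>): number {
  const intersection = [...a].filter((t) => b.has(t)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : intersection / union;
}

// Asymmetric coverage: the fraction of A's tokens that also appear in B,
// e.g. "how much of the query is covered by the context".
function coverage(a: Set<string>, b: Set<string>): number {
  if (a.size === 0) return 0;
  return [...a].filter((t) => b.has(t)).length / a.size;
}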
Installation
npm install rag-guard
# or
yarn add rag-guard
# or
pnpm add rag-guard
Quick Start
import { guardRAG, RagGuardInput } from 'rag-guard';
const input: RagGuardInput = {
  query: "What is the battery life of the X200?",
  context: "The X200 features a 12-hour battery life under normal usage conditions.",
  answer: "The X200 lasts for about 12 hours."
};

const result = guardRAG(input);

if (result.isSafe) {
  console.log("Safe to return to user:", result.confidence);
} else {
  console.error("Unsafe response:", result.reasons);
}
Core API
Input
interface RagGuardInput {
  query: string;
  context: string | string[];
  answer: string;
}
Output
interface RagGuardEvaluation {
  isSafe: boolean;
  confidence: number;
  reasons: string[];
  metrics: {
    contextRelevance: number;
    answerGrounding: number;
    contextCoverage: number;
    infoDensity: number;
    contextRedundancy?: number;
  };
}
How evaluation works (high-level)
- Text is cleaned and vectorized using bag-of-words.
- Context Relevance: Cosine similarity between query and context.
- Answer Grounding: Cosine similarity between the answer and the context (highest-scoring chunk when context is an array).
- Context Coverage: Overlap between answer terms and full context.
- Additional heuristics: Length checks, info density, redundancy, generic phrases.
- Zero dependencies · Pure TypeScript · < 5ms runtime
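For intuition, here is a minimal sketch of a bag-of-words cosine similarity like the one described above. The bagOfWords and cosineSimilarity names are illustrative, not rag-guard exports, and the vectorization details are assumptions.
// Illustrative sketch only — not rag-guard's internal code.
function bagOfWords(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  return counts;
}

function cosineSimilarity(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  for (const [token, countA] of a) dot += countA * (b.get(token) ?? 0);
  const norm = (v: Map<string, number>) =>
    Math.sqrt([...v.values()].reduce((sum, c) => sum + c * c, 0));
  const denominator = norm(a) * norm(b);
  return denominator === 0 ? 0 : dot / denominator;
}

// e.g. contextRelevance ≈ cosineSimilarity(bagOfWords(query), bagOfWords(context))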
Configuration Options
You can customize strictness by passing an options object:
import { guardRAG } from 'rag-guard';
const result = guardRAG(input, {
  relevanceThreshold: 0.25,   // Min context relevance
  groundingThreshold: 0.4,    // Min answer grounding
  coverageThreshold: 0.2,     // Min query coverage in context
  minContextLength: 50,       // Min chars for context
  maxContextLength: 10000,    // Max chars for context
  genericAnswerPhrases: [     // Custom refusal phrases
    "i don't know",
    "not sufficient information",
    "cannot provide"
  ]
});
Production Use Cases
Safety gate example
Block answers that aren't supported by your internal data to prevent hallucinations in customer support chat bots.
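As an illustration, a support-bot handler might gate each generated answer before it reaches the user. The retrieveDocs and generateAnswer functions below are hypothetical placeholders for your own pipeline, and the fallback message is just an example.
import { guardRAG } from 'rag-guard';

// Hypothetical pipeline functions — replace with your own retriever and LLM call.
declare function retrieveDocs(query: string): Promise<string[]>;
declare function generateAnswer(query: string, context: string[]): Promise<string>;

async function answerSupportQuery(query: string): Promise<string> {
  const context = await retrieveDocs(query);
  const answer = await generateAnswer(query, context);

  const verdict = guardRAG({ query, context, answer });
  if (!verdict.isSafe) {
    // Fall back instead of risking a hallucinated reply.
    return "I couldn't find a reliable answer in our documentation. Let me connect you with a human agent.";
  }
  return answer;
}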
Monitoring & observability example
Log the metrics object for every RAG interaction. Over time, visualize contextRelevance to measure the quality of your vector retriever, and answerGrounding to measure how closely your LLM's answers stay grounded in the retrieved context.
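A minimal sketch of what that logging might look like; logEvaluation is a hypothetical helper, and the structured-log shape is an assumption, not part of rag-guard's API.
import { guardRAG, RagGuardInput } from 'rag-guard';

// Hypothetical helper — `input` comes from your own pipeline.
function logEvaluation(input: RagGuardInput): void {
  const evaluation = guardRAG(input);

  // One structured log line per interaction; feed these into your dashboards
  // to track retriever quality (contextRelevance) and grounding over time.
  console.log(JSON.stringify({
    event: "rag_guard_evaluation",
    isSafe: evaluation.isSafe,
    confidence: evaluation.confidence,
    ...evaluation.metrics,
  }));
}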
Performance
rag-guard is designed to add negligible latency (< 5ms) to your pipeline. It uses efficient set operations and string manipulation, avoiding heavy embedding models or external API calls.
Where rag-guard fits
User Query
     │
     ▼
[Retriever] ──► (Documents)
     │
     ▼
[LLM] ──► (Answer)
     │
     ▼
[rag-guard] ◄── (Query, Docs, Answer)
     │
     ├──► ✅ Safe: Return to User
     └──► ❌ Unsafe: Return fallback / Retry
Why use rag-guard?
Using an LLM to evaluate another LLM (LLM-as-a-judge) is slow, expensive, and non-deterministic. rag-guard provides a baseline logic layer that catches 80% of common RAG failures instantly and cheaply.
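For completeness, here is a sketch of the fallback/retry branch from the diagram above, assuming hypothetical retrieve() and generate() helpers in your own pipeline: on an unsafe verdict it retries once with wider retrieval, then falls back.
import { guardRAG } from 'rag-guard';

// Hypothetical helpers standing in for your retriever and LLM.
declare function retrieve(query: string, topK: number): Promise<string[]>;
declare function generate(query: string, context: string[]): Promise<string>;

const FALLBACK = "Sorry, I don't have enough information to answer that reliably.";

async function answerWithRetry(query: string): Promise<string> {
  for (const topK of [4, 8]) {             // second pass widens retrieval
    const context = await retrieve(query, topK);
    const answer = await generate(query, context);
    if (guardRAG({ query, context, answer }).isSafe) return answer;
  }
  return FALLBACK;                          // unsafe after retry: return fallback
}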
Contributing
Contributions are welcome! Please ensure all logic remains deterministic and dependency-free.
Author
Divesh Sarkar
License
MIT © 2026 Divesh Sarkar
