@reaatech/rag-eval-metrics

v0.1.0

RAG evaluation metric scorers: faithfulness, relevance, context precision/recall

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Heuristic metric scorers for RAG evaluation. Provides four independent scorers — faithfulness, relevance, context precision, and context recall — plus a MetricsEngine orchestrator that runs them in parallel with configurable concurrency.

Installation

```bash
npm install @reaatech/rag-eval-metrics
# or
pnpm add @reaatech/rag-eval-metrics
```

Feature Overview

  • Faithfulness — measures factual grounding of the answer in retrieved context (statement-level decomposition)
  • Relevance — measures semantic alignment between query and answer (intent decomposition + cosine similarity)
  • Context Precision — measures retrieval ranking quality via MAP (Mean Average Precision) and NDCG
  • Context Recall — measures ground truth coverage by decomposing facts and checking context overlap
  • Parallel execution — MetricsEngine runs all configured scorers concurrently with configurable parallelJobs
  • Heuristic-first — no LLM calls required; all scorers use NLP libraries (compromise, natural)

Quick Start

```typescript
import {
  FaithfulnessScorer,
  RelevanceScorer,
  ContextPrecisionScorer,
  ContextRecallScorer,
  MetricsEngine,
} from "@reaatech/rag-eval-metrics";

const engine = new MetricsEngine({ parallelJobs: 4 });

const result = await engine.evaluateSample(
  {
    query: "What is the refund policy?",
    context: [
      "Refunds are processed within 14 days of purchase.",
      "Contact [email protected] for refund requests.",
    ],
    ground_truth: "Refunds must be requested within 14 days by contacting support.",
    generated_answer: "You can request a refund within 14 days by emailing support.",
  },
  { metrics: ["faithfulness", "relevance", "context_precision", "context_recall"] },
  0 // index of this sample within the evaluation run
);

console.log(result.faithfulness?.score); // ~0.95
console.log(result.relevance?.score);    // ~0.88
```

API Reference

FaithfulnessScorer

Decomposes the generated answer into atomic statements and verifies each against the provided context.

```typescript
import { FaithfulnessScorer } from "@reaatech/rag-eval-metrics";

const scorer = new FaithfulnessScorer();
const result = await scorer.score(sample);
// → { score: 0.89, statements: [...], supported_count: 8, total_count: 9 }
```

| Property | Type | Description |
|----------|------|-------------|
| score | number | Ratio of supported statements to total (0–1) |
| statements | string[] | Decomposed atomic statements from the answer |
| supported_count | number | Number of statements supported by context |
| total_count | number | Total number of extracted statements |
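To make the ratio concrete, here is a minimal sketch of how a supported-over-total score could be computed once statements have been extracted. It is not the package's implementation: `tokenize`, `isSupported`, `faithfulnessRatio`, and the 0.6 overlap threshold are hypothetical names and values chosen for illustration, and plain token overlap stands in for whatever verification the scorer actually performs.

```typescript
// Hypothetical helper: lowercase word tokens, dropping words of one or two characters.
function tokenize(text: string): string[] {
  const words: string[] = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return words.filter((w) => w.length > 2);
}

// A statement counts as "supported" when enough of its tokens appear in the context.
// The 0.6 threshold is an assumption for illustration only.
function isSupported(statement: string, context: string[], threshold = 0.6): boolean {
  const contextTokens = new Set(tokenize(context.join(" ")));
  const tokens = tokenize(statement);
  if (tokens.length === 0) return false;
  const hits = tokens.filter((t) => contextTokens.has(t)).length;
  return hits / tokens.length >= threshold;
}

// Mirrors the documented result fields: score = supported_count / total_count.
function faithfulnessRatio(statements: string[], context: string[]) {
  const supported_count = statements.filter((s) => isSupported(s, context)).length;
  const total_count = statements.length;
  const score = total_count === 0 ? 0 : supported_count / total_count;
  return { score, supported_count, total_count };
}

const demo = faithfulnessRatio(
  ["Refunds are processed within 14 days.", "Shipping is free worldwide."],
  ["Refunds are processed within 14 days of purchase."],
);
console.log(demo); // { score: 0.5, supported_count: 1, total_count: 2 }
```

The first statement is fully covered by the context and the second shares no tokens with it, so one of two statements is supported.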

RelevanceScorer

Decomposes the query into intents and checks how well the answer addresses each intent using semantic similarity.

```typescript
import { RelevanceScorer } from "@reaatech/rag-eval-metrics";

const scorer = new RelevanceScorer();
const result = await scorer.score(sample);
// → { score: 0.88, intents: [...], similarity: 0.82 }
```

| Property | Type | Description |
|----------|------|-------------|
| score | number | Composite relevance score (0–1) |
| intents | string[] | Decomposed query intents |
| similarity | number | Cosine similarity between intent and answer embeddings |
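For readers unfamiliar with the similarity field, the sketch below shows cosine similarity over simple term-frequency vectors. This is an assumption made for illustration: the README does not say how the scorer builds its embeddings, and `termFrequencies`, `norm`, and `cosineSimilarity` are hypothetical helpers, not package exports.

```typescript
// Build a bag-of-words vector: term -> occurrence count.
function termFrequencies(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  const tokens: string[] = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  tokens.forEach((token) => counts.set(token, (counts.get(token) ?? 0) + 1));
  return counts;
}

// Euclidean length of a sparse vector.
function norm(vector: Map<string, number>): number {
  let sumOfSquares = 0;
  vector.forEach((weight) => {
    sumOfSquares += weight * weight;
  });
  return Math.sqrt(sumOfSquares);
}

// cos(a, b) = (a · b) / (|a| |b|); returns 0 when either vector is empty.
function cosineSimilarity(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  a.forEach((weight, term) => {
    dot += weight * (b.get(term) ?? 0);
  });
  const denominator = norm(a) * norm(b);
  return denominator === 0 ? 0 : dot / denominator;
}

const similarity = cosineSimilarity(
  termFrequencies("refund policy"),
  termFrequencies("refund within 14 days"),
);
console.log(similarity.toFixed(2)); // "0.35"
```

The two texts share one term out of vectors of length √2 and 2, giving 1 / (2√2) ≈ 0.35. Dense embeddings would use the same formula over fixed-length arrays.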

ContextPrecisionScorer

Evaluates how well the retrieval system ranks relevant context chunks. Computes MAP and NDCG against the ground truth.

```typescript
import { ContextPrecisionScorer } from "@reaatech/rag-eval-metrics";

const scorer = new ContextPrecisionScorer();
const result = await scorer.score(sample);
// → { score: 0.75, map: 0.72, ndcg: 0.78, relevant_ranks: [1, 3] }
```

| Property | Type | Description |
|----------|------|-------------|
| score | number | Average of MAP and NDCG |
| map | number | Mean Average Precision |
| ndcg | number | Normalized Discounted Cumulative Gain |
| relevant_ranks | number[] | Rank positions of relevant chunks (1-indexed) |
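The two ranking metrics can be illustrated with the textbook formulas for a single ranked list with binary relevance. This is a sketch under those assumptions, not the scorer's code: how the package decides which chunks are relevant, and whether it uses graded relevance, is not stated here. `averagePrecision` and `ndcg` are hypothetical helper names. For one sample, MAP reduces to the average precision of that sample's ranking.

```typescript
// relevance[i] is true when the chunk at rank i + 1 is relevant.
function averagePrecision(relevance: boolean[]): number {
  let hits = 0;
  let precisionSum = 0;
  relevance.forEach((isRelevant, i) => {
    if (isRelevant) {
      hits += 1;
      precisionSum += hits / (i + 1); // precision measured at this rank
    }
  });
  return hits === 0 ? 0 : precisionSum / hits;
}

// Binary-gain NDCG: DCG of the actual order divided by DCG of the ideal order.
function ndcg(relevance: boolean[]): number {
  const dcg = (flags: boolean[]): number =>
    flags.reduce((total, isRelevant, i) => total + (isRelevant ? 1 / Math.log2(i + 2) : 0), 0);
  const ideal = [...relevance].sort((a, b) => Number(b) - Number(a)); // relevant chunks first
  const idealDcg = dcg(ideal);
  return idealDcg === 0 ? 0 : dcg(relevance) / idealDcg;
}

const ranking = [true, false, true]; // relevant chunks at ranks 1 and 3
const ap = averagePrecision(ranking);
const gain = ndcg(ranking);
console.log(ap.toFixed(3), gain.toFixed(3)); // 0.833 0.920
console.log(((ap + gain) / 2).toFixed(3));   // combined as an average, per the table above
```

With relevant chunks at ranks 1 and 3, average precision is (1/1 + 2/3) / 2 ≈ 0.833, and NDCG is 1.5 / (1 + 1/log2 3) ≈ 0.920. Moving the second relevant chunk up to rank 2 would raise both to 1.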

ContextRecallScorer

Decomposes the ground truth into individual facts and measures how many are covered by the retrieved context.

```typescript
import { ContextRecallScorer } from "@reaatech/rag-eval-metrics";

const scorer = new ContextRecallScorer();
const result = await scorer.score(sample);
// → { score: 0.80, total_facts: 5, covered_facts: 4 }
```

| Property | Type | Description |
|----------|------|-------------|
| score | number | Ratio of covered facts to total (0–1) |
| total_facts | number | Number of facts extracted from ground truth |
| covered_facts | number | Number of facts found in retrieved context |
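As a rough illustration of the decompose-then-check idea, the sketch below splits a ground truth string into clause-level facts and counts how many have all their keywords present in the context. The splitting rule, the keyword filter, and the helper names (`splitFacts`, `keywords`, `contextRecall`) are assumptions for this example; the package's NLP-based decomposition is likely more sophisticated.

```typescript
// Hypothetical decomposition: one "fact" per sentence or "and"-joined clause.
function splitFacts(groundTruth: string): string[] {
  return groundTruth
    .split(/[.;!?]|\band\b/i)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Keep only longer words so that filler such as "are" or "by" is ignored.
function keywords(text: string): string[] {
  const words: string[] = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return words.filter((w) => w.length > 3);
}

// A fact is "covered" when every keyword occurs somewhere in the context
// (substring match, which is deliberately crude).
function contextRecall(groundTruth: string, context: string[]) {
  const haystack = context.join(" ").toLowerCase();
  const facts = splitFacts(groundTruth);
  const covered_facts = facts.filter((fact) => {
    const words = keywords(fact);
    return words.length > 0 && words.every((w) => haystack.includes(w));
  }).length;
  const total_facts = facts.length;
  const score = total_facts === 0 ? 0 : covered_facts / total_facts;
  return { score, total_facts, covered_facts };
}

const recall = contextRecall(
  "Refunds are processed within 14 days and support handles requests by email.",
  ["Refunds are processed within 14 days of purchase."],
);
console.log(recall); // { score: 0.5, total_facts: 2, covered_facts: 1 }
```

Only the first clause is covered by the retrieved chunk, which points at a retrieval gap: the context never mentions how to contact support.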

MetricsEngine

Orchestrates parallel metric computation.

```typescript
import { MetricsEngine } from "@reaatech/rag-eval-metrics";

const engine = new MetricsEngine({ parallelJobs: 5 });

// Evaluate a single sample
const result = await engine.evaluateSample(sample, config, index);

// Aggregate results across all samples
const aggregated = engine.aggregateResults(sampleResults);
// → { overall_score, avg_faithfulness, avg_relevance, ..., std_dev: { ... } }
```

Constructor Options

| Property | Type | Default | Description |
|----------|------|---------|-------------|
| parallelJobs | number | 5 | Maximum concurrent metric evaluations |
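The README does not describe how parallelJobs is enforced internally. One common way to cap concurrency in TypeScript is a small worker pool, sketched below; `mapWithConcurrency` is a hypothetical helper written for this example and is not part of the package.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all runners

  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claim an item synchronously, before awaiting
      results[index] = await worker(items[index] as T, index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Demo: track how many workers are active at the same time.
let inFlight = 0;
let peak = 0;
const doubled = mapWithConcurrency([1, 2, 3, 4, 5, 6], 2, async (n) => {
  inFlight += 1;
  peak = Math.max(peak, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 10));
  inFlight -= 1;
  return n * 2;
});
doubled.then((values) => console.log(values, "peak:", peak)); // [ 2, 4, 6, 8, 10, 12 ] peak: 2
```

Because each runner claims its next index before awaiting, no item is processed twice, and the number of in-flight calls never exceeds the limit.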

Usage Patterns

Individual Scorer

```typescript
import { FaithfulnessScorer } from "@reaatech/rag-eval-metrics";

const scorer = new FaithfulnessScorer();

const result = await scorer.score({
  query: "What is the refund policy?",
  context: ["Refunds are processed within 14 days."],
  ground_truth: "Refunds within 14 days.",
  generated_answer: "You have 14 days to request a refund.",
});

if (result.score < 0.85) {
  console.warn("Answer may contain hallucinations");
}
```

Batch Evaluation with Aggregation

```typescript
import { MetricsEngine } from "@reaatech/rag-eval-metrics";
import type { EvaluationSample, EvalSuiteConfig } from "@reaatech/rag-eval-core";

// Populate with your own evaluation samples.
const samples: EvaluationSample[] = [];

const engine = new MetricsEngine({ parallelJobs: 8 });
const config: EvalSuiteConfig = {
  metrics: ["faithfulness", "relevance", "context_precision", "context_recall"],
};

// Evaluate every sample, passing its position in the array as the index.
const results = await Promise.all(
  samples.map((sample, i) => engine.evaluateSample(sample, config, i))
);

const aggregated = engine.aggregateResults(results);
console.log("Overall score:", aggregated.overall_score);
console.log("Faithfulness:", aggregated.avg_faithfulness);
```
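For intuition about what the aggregated averages and std_dev values represent, the sketch below computes a per-metric mean and standard deviation over a list of per-sample scores. The `SampleScores` shape and the `mean`, `stdDev`, and `summarize` helpers are hypothetical, and the choice of population (rather than sample) standard deviation is an assumption; the package may compute these differently.

```typescript
// Hypothetical per-sample shape: a metric is absent when it was not configured.
type SampleScores = { faithfulness?: number; relevance?: number };

function mean(values: number[]): number {
  return values.length === 0 ? 0 : values.reduce((a, b) => a + b, 0) / values.length;
}

// Population standard deviation; dividing by n - 1 instead is an equally valid choice.
function stdDev(values: number[]): number {
  if (values.length === 0) return 0;
  const mu = mean(values);
  return Math.sqrt(mean(values.map((v) => (v - mu) * (v - mu))));
}

// Summarize one metric, skipping samples where it is missing.
function summarize(results: SampleScores[], metric: keyof SampleScores) {
  const values = results
    .map((r) => r[metric])
    .filter((v): v is number => typeof v === "number");
  return { avg: mean(values), std_dev: stdDev(values), count: values.length };
}

const summary = summarize(
  [{ faithfulness: 0.9 }, { faithfulness: 0.7 }, { relevance: 0.8 }],
  "faithfulness",
);
console.log(summary.avg.toFixed(2), summary.std_dev.toFixed(2), summary.count); // 0.80 0.10 2
```

Skipping missing values rather than treating them as zero keeps a metric's average from being dragged down by samples where it was never computed.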

License

MIT