@reaatech/rag-eval-dataset

v0.1.0

Dataset loading, validation, generation, and versioning for RAG evals

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Dataset management utilities for RAG evaluation. Loads evaluation samples from JSONL, JSON, and YAML files; validates samples against Zod schemas; generates synthetic datasets from templates; and tracks dataset versioning.

Installation

npm install @reaatech/rag-eval-dataset
# or
pnpm add @reaatech/rag-eval-dataset

Feature Overview

  • Multi-format loading — read evaluation samples from JSONL, JSON array, and YAML files
  • Schema validation — validate every sample against EvaluationSampleSchema with detailed error reporting
  • Duplicate detection — identify duplicate queries and context sets across samples
  • Synthetic generation — generate evaluation datasets from templates with configurable difficulty
  • Version tracking — maintain dataset changelogs and version identifiers
  • Config loading — load evaluation suite configurations from YAML or JSON files

Quick Start

import { DatasetLoader, DatasetValidator } from "@reaatech/rag-eval-dataset";

const loader = new DatasetLoader();

// Load samples from any supported format
const samples = await loader.load("datasets/eval-samples.jsonl");
console.log(`Loaded ${samples.length} samples`);

// Validate the dataset
const validator = new DatasetValidator();
const result = validator.validate(samples);

if (!result.valid) {
  for (const error of result.errors) {
    console.error(`[${error.field}] ${error.message}`);
  }
}

API Reference

DatasetLoader

Loads evaluation datasets from files or strings.

import { DatasetLoader } from "@reaatech/rag-eval-dataset";

const loader = new DatasetLoader();

Loading Methods

| Method | Returns | Description |
|--------|---------|-------------|
| load(path: string) | Promise<EvaluationSample[]> | Auto-detect format from file extension and load |
| loadFromString(content, format) | Promise<EvaluationSample[]> | Parse content string in specified format ("jsonl" \| "json") |

Supported Formats

| Format | Extension | Structure |
|--------|-----------|-----------|
| JSONL | .jsonl | One JSON object per line |
| JSON | .json | Array of sample objects |
| YAML | .yaml, .yml | Array of sample objects |
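
The extension-to-format mapping in the table can be sketched as a small lookup. This is only an illustration of the idea behind auto-detection; `detectFormat`, `DatasetFormat`, and the error message are hypothetical names, not part of the package's documented API.

```typescript
// Hypothetical helper for illustration; not exported by @reaatech/rag-eval-dataset.
type DatasetFormat = "jsonl" | "json" | "yaml";

// Extensions taken from the Supported Formats table.
const EXTENSION_TO_FORMAT: Record<string, DatasetFormat> = {
  ".jsonl": "jsonl",
  ".json": "json",
  ".yaml": "yaml",
  ".yml": "yaml",
};

function detectFormat(path: string): DatasetFormat {
  // Take everything from the last dot onward, case-insensitively.
  const dot = path.lastIndexOf(".");
  const ext = dot >= 0 ? path.slice(dot).toLowerCase() : "";
  const format = EXTENSION_TO_FORMAT[ext];
  if (format === undefined) {
    // Assumption: unknown extensions are rejected rather than guessed.
    throw new Error(`Unsupported dataset extension: "${ext}"`);
  }
  return format;
}
```

Under this sketch, `detectFormat("datasets/eval-samples.jsonl")` yields `"jsonl"`, and a path such as `samples.csv` throws.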

Each sample is validated against EvaluationSampleSchema from @reaatech/rag-eval-core. Invalid lines in JSONL files are skipped with a warning.
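
To make the skip-with-warning behaviour for JSONL concrete, here is a minimal sketch of a tolerant line-by-line parser. It is not the package's implementation: the real loader validates with the Zod-based EvaluationSampleSchema, whereas this sketch uses a hand-written type guard, and the `SketchSample` field types (strings and a string array) are assumptions.

```typescript
// Hypothetical stand-in for EvaluationSample; field types are assumed.
interface SketchSample {
  query: string;
  context: string[];
  ground_truth: string;
  generated_answer: string;
}

// Hand-written guard standing in for schema validation.
function isSketchSample(value: unknown): value is SketchSample {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const context = v["context"];
  return (
    typeof v["query"] === "string" &&
    Array.isArray(context) &&
    context.every((chunk) => typeof chunk === "string") &&
    typeof v["ground_truth"] === "string" &&
    typeof v["generated_answer"] === "string"
  );
}

function parseJsonl(content: string): { samples: SketchSample[]; warnings: string[] } {
  const samples: SketchSample[] = [];
  const warnings: string[] = [];
  content.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return; // blank lines are ignored silently
    try {
      const parsed: unknown = JSON.parse(line);
      if (isSketchSample(parsed)) {
        samples.push(parsed);
      } else {
        warnings.push(`line ${index + 1}: does not match the sample shape`);
      }
    } catch {
      // Malformed JSON is skipped and recorded, not thrown.
      warnings.push(`line ${index + 1}: invalid JSON`);
    }
  });
  return { samples, warnings };
}
```

Collecting warnings instead of throwing lets one bad line be reported without discarding the rest of the file.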

Config Loading

import { loadEvalConfig } from "@reaatech/rag-eval-dataset";

const config = await loadEvalConfig("eval-config.yaml");
// → EvalSuiteConfig with metrics, judge, cost, gates, execution

| Export | Description |
|--------|-------------|
| loadEvalConfig(path) | Load and validate an EvalSuiteConfig from YAML or JSON |

DatasetValidator

Validates datasets for structural correctness and quality issues.

import { DatasetValidator } from "@reaatech/rag-eval-dataset";

const validator = new DatasetValidator();
const result = validator.validate(samples);

ValidationResult

| Property | Type | Description |
|----------|------|-------------|
| valid | boolean | Whether the dataset passed all checks |
| errors | ValidationError[] | Errors found (empty if valid) |
| warnings | ValidationWarning[] | Non-blocking warnings |

ValidationError

| Property | Type | Description |
|----------|------|-------------|
| field | string | Field name or sample index |
| message | string | Human-readable error description |

Validations Performed

  • Schema compliance — every sample matches EvaluationSampleSchema
  • Required fields — query, context, ground_truth, generated_answer
  • Non-empty context — context arrays must contain at least one chunk
  • Non-empty dataset — dataset must contain at least one sample
  • Duplicate detection — identical queries with matching context trigger a warning
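
The checks listed above can be approximated without any schema library. The following is a rough sketch under stated assumptions, not the package's validator: `validateSamples`, `SketchIssue`, and `SketchResult` are hypothetical names, warnings are assumed to share the error shape, and two contexts only count as matching when their chunks are identical and in the same order.

```typescript
// Hypothetical shapes mirroring the ValidationResult and ValidationError tables.
interface SketchIssue { field: string; message: string; }
interface SketchResult { valid: boolean; errors: SketchIssue[]; warnings: SketchIssue[]; }

const REQUIRED_FIELDS = ["query", "context", "ground_truth", "generated_answer"] as const;

function validateSamples(samples: Array<Record<string, unknown>>): SketchResult {
  const errors: SketchIssue[] = [];
  const warnings: SketchIssue[] = [];

  // Non-empty dataset
  if (samples.length === 0) {
    errors.push({ field: "dataset", message: "dataset must contain at least one sample" });
  }

  const firstSeen = new Map<string, number>();
  samples.forEach((sample, i) => {
    // Required fields
    for (const key of REQUIRED_FIELDS) {
      if (sample[key] === undefined || sample[key] === null) {
        errors.push({ field: `[${i}].${key}`, message: `missing required field "${key}"` });
      }
    }
    // Non-empty context
    const context = sample["context"];
    if (Array.isArray(context) && context.length === 0) {
      errors.push({ field: `[${i}].context`, message: "context must contain at least one chunk" });
    }
    // Duplicate detection: same query and same context is a warning, not an error
    const fingerprint = JSON.stringify([sample["query"], context]);
    const earlier = firstSeen.get(fingerprint);
    if (earlier !== undefined) {
      warnings.push({ field: `[${i}]`, message: `duplicate of sample ${earlier}` });
    } else {
      firstSeen.set(fingerprint, i);
    }
  });

  return { valid: errors.length === 0, errors, warnings };
}
```

Because duplicates only produce warnings, a dataset with repeated samples still reports `valid: true` in this sketch.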

DatasetGenerator

Generates synthetic evaluation datasets from templates.

import { DatasetGenerator } from "@reaatech/rag-eval-dataset";

const generator = new DatasetGenerator();
// myTemplates is a DatasetTemplate[] defined elsewhere
const samples = generator.generate({
  templates: myTemplates,
  count: 100,
  difficulty: "medium",
  domain: "customer-support",
});

GeneratorConfig

| Property | Type | Default | Description |
|----------|------|---------|-------------|
| templates | DatasetTemplate[] | (required) | Templates for sample generation |
| count | number | 10 | Number of samples to generate |
| difficulty | "easy" \| "medium" \| "hard" | "medium" | Difficulty level |
| domain | string | — | Domain label for metadata |
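
As an illustration of how the defaults in this table might be applied, the sketch below resolves a partial config and cycles through templates round-robin. The shape of DatasetTemplate is not documented here, so `SketchTemplate` (a query/answer pair), `resolveGeneratorConfig`, and `generateSketch` are all hypothetical, and the round-robin strategy is an assumption rather than the generator's actual behaviour.

```typescript
type SketchDifficulty = "easy" | "medium" | "hard";

// Hypothetical template shape; the real DatasetTemplate may differ.
interface SketchTemplate { query: string; answer: string; }

interface SketchGeneratorConfig {
  templates: SketchTemplate[];
  count?: number;
  difficulty?: SketchDifficulty;
  domain?: string;
}

function resolveGeneratorConfig(config: SketchGeneratorConfig) {
  // Assumption: an empty template list is treated as a configuration error.
  if (config.templates.length === 0) {
    throw new Error("templates is required and must not be empty");
  }
  return {
    templates: config.templates,
    count: config.count ?? 10, // default from the table
    difficulty: config.difficulty ?? ("medium" as SketchDifficulty), // default from the table
    domain: config.domain, // no default
  };
}

function generateSketch(config: SketchGeneratorConfig) {
  const resolved = resolveGeneratorConfig(config);
  return Array.from({ length: resolved.count }, (_, i) => {
    // Cycle through the templates in order.
    const template = resolved.templates[i % resolved.templates.length] as SketchTemplate;
    return {
      query: template.query,
      ground_truth: template.answer,
      metadata: { difficulty: resolved.difficulty, domain: resolved.domain },
    };
  });
}
```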

DatasetVersioning

Tracks dataset version history and changelogs.

import { DatasetVersioning } from "@reaatech/rag-eval-dataset";

const versioning = new DatasetVersioning();

// Record a new version
versioning.addVersion({
  version: "v1.1.0",
  description: "Added 50 new e-commerce samples",
  timestamp: new Date().toISOString(),
});

// Get version history
const history = versioning.getHistory();
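
For readers curious about the underlying data structure, a version history like this can be modelled as an append-only list. This is a minimal sketch, not the package's class: `InMemoryVersionLog` and its `latest()` method are hypothetical, and rejecting a repeated version string is an assumption.

```typescript
// Entry fields match the addVersion call shown above.
interface VersionEntry {
  version: string;
  description: string;
  timestamp: string;
}

// Hypothetical in-memory stand-in for a version history.
class InMemoryVersionLog {
  private readonly entries: VersionEntry[] = [];

  addVersion(entry: VersionEntry): void {
    // Assumption: each version identifier may be recorded only once.
    if (this.entries.some((e) => e.version === entry.version)) {
      throw new Error(`Version ${entry.version} is already recorded`);
    }
    this.entries.push({ ...entry });
  }

  getHistory(): VersionEntry[] {
    // Return copies so callers cannot mutate the internal log.
    return this.entries.map((e) => ({ ...e }));
  }

  latest(): VersionEntry | undefined {
    return this.entries[this.entries.length - 1];
  }
}
```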

Usage Patterns

Loading and Validating a Dataset

import { DatasetLoader, DatasetValidator } from "@reaatech/rag-eval-dataset";

const loader = new DatasetLoader();
const validator = new DatasetValidator();

try {
  const samples = await loader.load("eval-dataset.jsonl");
  const result = validator.validate(samples);

  if (!result.valid) {
    console.error("Dataset validation failed:");
    for (const error of result.errors) {
      console.error(`  - ${error.message}`);
    }
    process.exit(1);
  }

  console.log(`Ready to evaluate ${samples.length} samples`);
} catch (err) {
  console.error("Failed to load dataset:", err);
  process.exit(1);
}

Loading Config from YAML

# eval-config.yaml
metrics:
  - faithfulness
  - relevance
  - context_precision
  - context_recall

judge:
  model: claude-opus
  enabled: true

cost:
  budget_limit: 10.00

gates:
  - name: min-faithfulness
    type: threshold
    metric: avg_faithfulness
    operator: ">="
    threshold: 0.85

import { loadEvalConfig } from "@reaatech/rag-eval-dataset";

const config = await loadEvalConfig("eval-config.yaml");
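
To show what a threshold gate like `min-faithfulness` could mean once metrics are available, here is a sketch of gate evaluation. It does not describe how the evaluation suite actually applies gates: `evaluateGates` and `ThresholdGate` are hypothetical names, only the `">="` operator appears in the YAML above (the other operators are assumptions), and a missing metric is assumed to fail the gate.

```typescript
// Only ">=" is shown in the YAML example; the remaining operators are assumed.
type GateOperator = ">=" | ">" | "<=" | "<" | "==";

// Hypothetical type mirroring the gate fields in eval-config.yaml.
interface ThresholdGate {
  name: string;
  type: "threshold";
  metric: string;
  operator: GateOperator;
  threshold: number;
}

function compareMetric(value: number, operator: GateOperator, threshold: number): boolean {
  switch (operator) {
    case ">=": return value >= threshold;
    case ">": return value > threshold;
    case "<=": return value <= threshold;
    case "<": return value < threshold;
    case "==": return value === threshold;
  }
}

function evaluateGates(gates: ThresholdGate[], metrics: Record<string, number | undefined>) {
  return gates.map((gate) => {
    const value = metrics[gate.metric];
    // Assumption: a metric that was never computed fails its gate.
    const passed = value !== undefined && compareMetric(value, gate.operator, gate.threshold);
    return { name: gate.name, passed, value };
  });
}
```

With the gate from the YAML example, an `avg_faithfulness` of 0.9 passes and 0.8 fails in this sketch.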

License

MIT