
evalguard

v0.3.9


EvalGuard-JS

A plug-and-play TypeScript library that turns any LLM-powered function into a self-testing unit.

Features

With one wrapper you can:

  • Record the model's current answers as snapshots.
  • Re-run and compare those answers on every commit, using opt-in metrics such as exact diff, embedding-based semantic similarity, Claude/OpenAI rubric grading, graph isomorphism, latency, and more.
  • Fail or pass the build according to fully-configurable thresholds, weights, and cost budgets.

All engines (metrics, LLM providers, reporters, snapshot stores) are discoverable plugins, so teams can swap models, disable features, or add new checks without touching core code. The result: every PR shows at a glance whether your AI output stayed the same, got better, or silently regressed.

Installation

npm install evalguard

Quick Start

import { Guard } from "evalguard";

// Wrap your LLM-powered function with the Guard decorator.
// probeSearch is your own function, defined or imported elsewhere.
export const guardedSearch = Guard({
  id: "search_cli",
  metrics: ["exact", "semantic>=0.92", "judge>=8"],
  store: "yaml",                 // or "sqlite", or your own
  provider: "claude",            // any registered adapter
  model: "claude-3-haiku-embed",
  judgeModel: "claude-3-sonnet",
  temperature: 0,
  concurrency: 3                // eval in parallel
})(probeSearch);

// Use it like the original function
const result = await guardedSearch("query", "context");

For frameworks that dislike decorators, you can use the imperative Runner:

import { Runner } from "evalguard";

const run = new Runner({ config: ".evalguardrc.yaml" });
const { pass, results } = await run.test("search_cli", () => probeSearch("ranking", "."));
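
Because run.test resolves to a plain result object, it drops into any test runner. A minimal sketch using Vitest (the suite wiring and the probeSearch import are assumptions for illustration):

import { Runner } from "evalguard";
import { test, expect } from "vitest";
import { probeSearch } from "./probe"; // hypothetical: your own LLM-powered function

const run = new Runner({ config: ".evalguardrc.yaml" });

test("search_cli stays within its thresholds", async () => {
  const { pass } = await run.test("search_cli", () => probeSearch("ranking", "."));
  // pass reflects the configured thresholds, weights, and budget
  expect(pass).toBe(true);
});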

Configuration

EvalGuard can be configured using a YAML or JSON configuration file:

# .evalguardrc.yaml
defaults:
  provider: claude
  model: claude-3-sonnet-2025-05-10
  temperature: 0
  metrics: [exact, semantic>=0.92]

metrics:
  exact:
    enabled: true

  semantic:
    provider: claude-embed
    min: 0.92

  judge:
    enabled: false          # disabled by default; opt in to enable
    provider: openai
    model: gpt-4o-mini
    rubricFile: .eval/rubrics/default.md
    min: 8

snapshots:
  backend: yaml
  dir: .evalguard/snapshots

reporters:
  - console
  - github-check

budgetUsd: 2
concurrency: 3
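
Since JSON is also accepted, the same defaults can be written as .evalguardrc.json (the filename is an assumption mirroring the YAML one; an abridged sketch, not the full schema):

{
  "defaults": {
    "provider": "claude",
    "model": "claude-3-sonnet-2025-05-10",
    "temperature": 0,
    "metrics": ["exact", "semantic>=0.92"]
  },
  "budgetUsd": 2,
  "concurrency": 3
}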

Environment Variables

EvalGuard can be configured using environment variables, which is especially useful when integrating with other tools like the probe project:

# LLM Provider API Keys
export ANTHROPIC_API_KEY=your_anthropic_api_key    # For Claude models
export OPENAI_API_KEY=your_openai_api_key          # For OpenAI models

# API URLs (optional, for proxies or enterprise deployments)
export ANTHROPIC_API_URL=https://your-anthropic-proxy.com
export OPENAI_API_URL=https://your-openai-proxy.com/v1

# Model Selection
export MODEL_NAME=claude-3-opus-20240229           # Override the default model

# EvalGuard Configuration
export EVALGUARD_BUDGET_USD=5                      # Set budget limit in USD
export EVALGUARD_CONCURRENCY=4                     # Set concurrency level

When both ANTHROPIC_API_KEY and OPENAI_API_KEY are provided, EvalGuard will prioritize Claude models by default.

CLI Commands

EvalGuard comes with a CLI for managing snapshots and running tests:

# Initialize a new project
evalguard init

# Record or update snapshots
evalguard snapshot
evalguard snapshot -u  # update existing snapshots

# Run tests
evalguard run

# Show differences between snapshots and current results
evalguard diff

# List available metrics, providers, and reporters
evalguard list

Workflow Examples

# Run only exact + semantic (skip judge) for quick local check
evalguard run --metrics exact,semantic

# CI Stage 1: cheap metrics
evalguard run --failOn any --metrics exact,semantic

# CI Stage 2 (nightly): full judge pass
evalguard run --metrics judge --budget 10 --concurrency 1
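
The two CI stages map naturally onto a CI workflow, with the github-check reporter from the configuration above posting results to the PR. A hypothetical sketch for GitHub Actions (job names, triggers, and secret names are assumptions):

# .github/workflows/evalguard.yml (hypothetical)
name: evalguard
on: [pull_request]
jobs:
  cheap-metrics:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx evalguard run --failOn any --metrics exact,semantic
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}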

Plugin System

EvalGuard uses a plugin system for metrics, providers, reporters, and snapshot stores. Plugins are discovered via the evalguard.plugins field in package.json.
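
As a hypothetical sketch, a plugin package might list its entry modules like this (the array-of-module-paths shape is an assumption):

{
  "name": "evalguard-plugin-example",
  "main": "dist/index.js",
  "evalguard": {
    "plugins": ["./dist/metrics/my-metric.js", "./dist/providers/my-provider.js"]
  }
}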

Creating a Metric Plugin

import { Metric } from "evalguard";

export const myMetric: Metric = {
  name: "my-metric",
  init(opts, ctx) {
    // One-time setup: read options, prepare clients or caches
  },
  score(gold, fresh, prompt) {
    // Compare the stored snapshot (gold) against the fresh model output
    // and return a numeric score for the configured thresholds
    return 0.95;
  }
};

Creating a Provider Plugin

import { LLMProvider } from "evalguard";

export const myProvider: LLMProvider = {
  name: "my-provider",
  generate(opts) {
    // Call your model and resolve with the completion text
    return Promise.resolve("Generated text");
  },
  embed(texts) {
    // Resolve with one embedding vector per input text
    // (placeholder: random 1536-dimensional vectors)
    return Promise.resolve(texts.map(() => Array(1536).fill(0).map(() => Math.random())));
  }
};
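
Once discovered, custom plugins are referenced by name just like the built-ins. A sketch, assuming the same name>=threshold string syntax used in the Quick Start:

import { Guard } from "evalguard";

// myFunction stands in for your own LLM-powered function (hypothetical)
async function myFunction(input: string): Promise<string> {
  return `answer for ${input}`;
}

export const guarded = Guard({
  id: "my_case",
  metrics: ["my-metric>=0.9"],   // custom metric, same threshold syntax as built-ins
  provider: "my-provider",       // custom provider adapter
})(myFunction);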

License

MIT