
@pauly4010/evalai-sdk

v1.9.1


AI Evaluation Platform SDK - Complete API Coverage with Performance Optimizations



One-command CI for AI evaluation. Complete pipeline: discover → manifest → impact → run → diff → PR summary.

Zero to production CI in 60 seconds. No infra. No lock-in. Remove anytime.


Quick Start (60 seconds)

Add this to your .github/workflows/evalai.yml:

name: EvalAI CI
on: [push, pull_request]
jobs:
  evalai:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx @pauly4010/evalai-sdk ci --format github --write-results --base main
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: evalai-results
          path: .evalai/

Create eval/your-spec.spec.ts:

import { defineEval } from "@pauly4010/evalai-sdk";

defineEval({
  name: "Basic Math Operations",
  description: "Test fundamental arithmetic",
  prompt: "Test: 1+1=2, string concatenation, array includes",
  expected: "All tests should pass",
  tags: ["basic", "math"],
  category: "unit-test"
});

Then commit and push:

git add .github/workflows/evalai.yml eval/
git commit -m "feat: add EvalAI CI pipeline"
git push

That's it! Your CI now:

  • ✅ Discovers evaluation specs automatically
  • ✅ Runs only impacted specs (smart caching)
  • ✅ Compares results against base branch
  • ✅ Posts rich summary in PR with regressions
  • ✅ Exits with proper codes (0=clean, 1=regressions, 2=config)

🚀 New in v1.9.0: One-Command CI

evalai ci - Complete CI Pipeline

npx @pauly4010/evalai-sdk ci --format github --write-results --base main

What it does:

  1. Discover - Finds all evaluation specs automatically
  2. Manifest - Builds stable manifest if missing
  3. Impact Analysis - Runs only specs impacted by changes (optional)
  4. Run - Executes evaluations with artifact retention
  5. Diff - Compares results against base branch
  6. PR Summary - Posts rich markdown summary to GitHub
  7. Debug Flow - Prints copy/paste next step on failure
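The "stable manifest" in step 2 can be sketched as a content hash per discovered spec, so unchanged files produce identical entries across runs. The shape below (path plus SHA-256) is an illustration, not the SDK's actual manifest format:

```typescript
import { createHash } from "node:crypto";

// Hypothetical manifest entry: one content hash per spec file.
interface ManifestEntry {
  path: string;
  sha256: string;
}

// Build a deterministic manifest: sort by path so the output is stable
// regardless of discovery order, then hash each spec's source.
function buildManifest(specs: Record<string, string>): ManifestEntry[] {
  return Object.entries(specs)
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([path, source]) => ({
      path,
      sha256: createHash("sha256").update(source).digest("hex"),
    }));
}

const manifest = buildManifest({
  "eval/math.spec.ts": 'defineEval({ name: "Basic Math Operations" })',
});
```

Because entries only change when a spec's contents change, a manifest like this is what makes incremental impact analysis possible.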

Advanced Options:

npx @pauly4010/evalai-sdk ci --base main --impacted-only    # Run only impacted specs
npx @pauly4010/evalai-sdk ci --format json --write-results   # JSON output for automation
npx @pauly4010/evalai-sdk ci --base develop                  # Custom base branch

Smart Diffing & GitHub Integration

npx @pauly4010/evalai-sdk diff --base main --head last --format github

Features:

  • 📊 Pass rate delta and score changes
  • 🚨 Regression detection with classifications
  • 📈 Improvements and new specs
  • 📁 Artifact links and technical details
  • 🎯 Exit codes: 0=clean, 1=regressions, 2=config
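The diff step's core logic can be sketched as a comparison of per-spec results between two runs. The result shape and classification rules below are assumptions for illustration; the SDK's real artifact format and regression heuristics may differ:

```typescript
// Hypothetical per-spec result; the SDK's artifact schema may differ.
interface SpecResult { name: string; passed: boolean; score: number }

type DiffExit = 0 | 1; // 0 = clean, 1 = regressions (2 = config error, not modeled here)

function diffRuns(base: SpecResult[], head: SpecResult[]) {
  const baseByName = new Map(base.map((r) => [r.name, r]));
  const regressions: string[] = [];
  const improvements: string[] = [];
  const added: string[] = [];
  for (const h of head) {
    const b = baseByName.get(h.name);
    if (!b) { added.push(h.name); continue; }       // new spec, not a regression
    if (b.passed && !h.passed) regressions.push(h.name);
    else if (!b.passed && h.passed) improvements.push(h.name);
  }
  // Pass-rate delta between head and base, e.g. -0.25 for a 25-point drop.
  const passRateDelta =
    head.filter((r) => r.passed).length / head.length -
    base.filter((r) => r.passed).length / base.length;
  const exit: DiffExit = regressions.length > 0 ? 1 : 0;
  return { regressions, improvements, added, passRateDelta, exit };
}
```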

Self-Documenting Failures

Every failure prints a clear next step:

🔧 Next step for debugging:
   Download base artifact and run: evalai diff --base .evalai/base-run.json --head .evalai/last-run.json
   Artifacts: .evalai/runs/

CLI Commands

🚀 One-Command CI (v1.9.0)

| Command | Description |
|---------|-------------|
| npx evalai ci | Complete CI pipeline: discover → manifest → impact → run → diff → PR summary |
| npx evalai ci --base main | Run CI with diff against main branch |
| npx evalai ci --impacted-only | Run only specs impacted by changes |
| npx evalai ci --format github | GitHub Step Summary with rich markdown |
| npx evalai ci --format json | JSON output for automation |

Discovery & Manifest

| Command | Description |
|---------|-------------|
| npx evalai discover | Find and analyze evaluation specs |
| npx evalai discover --manifest | Generate stable manifest for incremental analysis |

Impact Analysis

| Command | Description |
|---------|-------------|
| npx evalai impact-analysis --base main | Analyze impact of changes |
| npx evalai impact-analysis --changed-files file1.ts,file2.ts | Analyze specific changed files |
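One way to think about impact analysis: a spec is "impacted" when any changed file is reachable from it through the import graph. The sketch below makes that idea concrete; the function name and graph representation are hypothetical, not the SDK's internals:

```typescript
// Hypothetical impact selection: walk each spec's import graph and keep the
// spec if it can reach a changed file. Cycles are guarded by the `seen` set.
function impactedSpecs(
  specs: string[],
  imports: Record<string, string[]>, // file -> files it imports
  changed: Set<string>,
): string[] {
  const reaches = (file: string, seen = new Set<string>()): boolean => {
    if (changed.has(file)) return true;
    if (seen.has(file)) return false;
    seen.add(file);
    return (imports[file] ?? []).some((dep) => reaches(dep, seen));
  };
  return specs.filter((spec) => reaches(spec));
}
```

With this selection in place, a CI run with --impacted-only only pays for the specs whose inputs actually changed.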

Run & Diff

| Command | Description |
|---------|-------------|
| npx evalai run | Run evaluation specifications |
| npx evalai run --write-results | Run with artifact retention |
| npx evalai diff --base main | Compare results against base branch |
| npx evalai diff --base last --head last | Compare the last two runs |
| npx evalai diff --format github | GitHub Step Summary with regressions |

Legacy Regression Gate (local, no account needed)

| Command | Description |
|---------|-------------|
| npx evalai init | Full project scaffolder — creates everything you need |
| npx evalai gate | Run regression gate locally |
| npx evalai gate --format json | Machine-readable JSON output |
| npx evalai gate --format github | GitHub Step Summary with delta table |
| npx evalai baseline init | Create starter evals/baseline.json |
| npx evalai baseline update | Re-run tests and update baseline with real scores |
| npx evalai upgrade --full | Upgrade from Tier 1 (built-in) to Tier 2 (full gate) |

API Gate (requires account)

| Command | Description |
|---------|-------------|
| npx evalai check | Gate on quality score from dashboard |
| npx evalai share | Create share link for a run |

Debugging & Diagnostics

| Command | Description |
|---------|-------------|
| npx evalai doctor | Comprehensive preflight checklist — verifies config, baseline, auth, API, CI wiring |
| npx evalai explain | Offline report explainer — top failures, root-cause classification, suggested fixes |
| npx evalai print-config | Show resolved config with source-of-truth annotations (file/env/default/arg) |
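The "source-of-truth annotations" idea behind print-config can be sketched as resolving each setting through a precedence chain while recording where the winning value came from. The precedence order (arg > env > file > default) and function shape here are assumptions, not the SDK's documented behavior:

```typescript
type Source = "arg" | "env" | "file" | "default";

// Hypothetical resolver: first defined layer wins, and the result carries a
// provenance tag so a config printer can annotate each value.
function resolveSetting<T>(
  key: string,
  layers: { arg?: T; env?: T; file?: T; default: T },
): { key: string; value: T; source: Source } {
  if (layers.arg !== undefined) return { key, value: layers.arg, source: "arg" };
  if (layers.env !== undefined) return { key, value: layers.env, source: "env" };
  if (layers.file !== undefined) return { key, value: layers.file, source: "file" };
  return { key, value: layers.default, source: "default" };
}
```

Carrying the source alongside the value is what lets a single command answer "why is my base branch develop?" without grepping through files and environment variables.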

Migration Tools

| Command | Description |
|---------|-------------|
| npx evalai migrate config --in evalai.config.json --out eval/migrated.spec.ts | Convert legacy config to DSL |

Guided failure flow:

evalai ci  →  fails  →  "Next: evalai explain --report .evalai/last-run.json"
                              ↓
                   evalai explain  →  root causes + fixes

GitHub Actions step summary — CI result at a glance with regressions and artifacts:


evalai explain terminal output — root causes + fix commands:


All commands automatically write artifacts so explain works with zero flags.

Gate Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Pass — no regression |
| 1 | Regression detected |
| 2 | Infra error (baseline missing, tests crashed) |

Check Exit Codes (API mode)

| Code | Meaning |
|------|---------|
| 0 | Pass |
| 1 | Score below threshold |
| 2 | Regression failure |
| 3 | Policy violation |
| 4 | API error |
| 5 | Bad arguments |
| 6 | Low test count |
| 7 | Weak evidence |
| 8 | Warn (soft regression) |

Doctor Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Ready — all checks passed |
| 2 | Not ready — one or more checks failed |
| 3 | Infrastructure error |


How the Gate Works

Built-in mode (any Node project, no config needed):

  • Runs <pm> test (your package manager's test command) and captures the exit code + test count
  • Compares against evals/baseline.json
  • Writes evals/regression-report.json
  • Fails CI if tests regress
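The built-in gate decision can be sketched as a comparison between the current run and the stored baseline, mapped onto the gate exit codes (0 pass, 1 regression, 2 infra error). The snapshot shape ({ exitCode, testCount }) is an assumption about what evals/baseline.json records:

```typescript
// Hypothetical snapshot of a test run; the real baseline.json may hold more.
interface RunSnapshot { exitCode: number; testCount: number }

// Gate sketch: missing baseline is an infra error (2); failing tests or a
// shrinking test count is a regression (1); otherwise pass (0).
function gate(baseline: RunSnapshot | null, current: RunSnapshot): 0 | 1 | 2 {
  if (baseline === null) return 2;                      // baseline missing
  if (current.exitCode !== 0) return 1;                 // tests now fail
  if (current.testCount < baseline.testCount) return 1; // tests disappeared
  return 0;                                             // pass, no regression
}
```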

Project mode (advanced, for full regression gate):

  • If eval:regression-gate script exists in package.json, delegates to it
  • Supports golden eval scores, confidence tests, p95 latency, cost tracking
  • Full delta table with tolerances
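A delta table with tolerances can be sketched as a per-metric check where "worse" depends on the metric's direction (higher is better for scores, lower is better for latency and cost). Metric names and tolerance values below are illustrative, not the SDK's defaults:

```typescript
// One row of a hypothetical delta table.
interface MetricDelta {
  metric: string;
  base: number;
  head: number;
  tolerance: number;      // how much worse is still acceptable
  higherIsBetter: boolean; // true for scores, false for latency/cost
}

// A metric regresses only when it moves in the "worse" direction by more
// than its tolerance; small fluctuations are absorbed.
function withinTolerance(d: MetricDelta): boolean {
  const worseBy = d.higherIsBetter ? d.base - d.head : d.head - d.base;
  return worseBy <= d.tolerance;
}
```

Tolerances are what keep a gate like this from flaking on normal run-to-run noise in scores or p95 latency.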

Run a Regression Test Locally (no account)

npm install @pauly4010/evalai-sdk openai

import { openAIChatEval } from "@pauly4010/evalai-sdk";

await openAIChatEval({
  name: "chat-regression",
  cases: [
    { input: "Hello", expectedOutput: "greeting" },
    { input: "2 + 2 = ?", expectedOutput: "4" },
  ],
});

Output: PASS 2/2 (score: 100). No account needed. Just a score.

Vitest Integration

import { openAIChatEval, extendExpectWithToPassGate } from "@pauly4010/evalai-sdk";
import { expect, it } from "vitest";

extendExpectWithToPassGate(expect);

it("passes gate", async () => {
  const result = await openAIChatEval({
    name: "chat-regression",
    cases: [
      { input: "Hello", expectedOutput: "greeting" },
      { input: "2 + 2 = ?", expectedOutput: "4" },
    ],
  });
  expect(result).toPassGate();
});

SDK Exports

Regression Gate Constants

import {
  GATE_EXIT,           // { PASS: 0, REGRESSION: 1, INFRA_ERROR: 2, ... }
  GATE_CATEGORY,       // { PASS: "pass", REGRESSION: "regression", INFRA_ERROR: "infra_error" }
  REPORT_SCHEMA_VERSION,
  ARTIFACTS,           // { BASELINE, REGRESSION_REPORT, CONFIDENCE_SUMMARY, LATENCY_BENCHMARK }
} from "@pauly4010/evalai-sdk";

// Or tree-shakeable:
import { GATE_EXIT } from "@pauly4010/evalai-sdk/regression";

Types

import type {
  RegressionReport,
  RegressionDelta,
  Baseline,
  BaselineTolerance,
  GateExitCode,
  GateCategory,
} from "@pauly4010/evalai-sdk/regression";

Platform Client

import { AIEvalClient } from "@pauly4010/evalai-sdk";

const client = AIEvalClient.init(); // reads the EVALAI_API_KEY env var
// or, with explicit credentials:
// const client = new AIEvalClient({ apiKey: "...", organizationId: 123 });

Framework Integrations

import { traceOpenAI } from "@pauly4010/evalai-sdk/integrations/openai";
import { traceAnthropic } from "@pauly4010/evalai-sdk/integrations/anthropic";

Installation

npm install @pauly4010/evalai-sdk
# or
yarn add @pauly4010/evalai-sdk
# or
pnpm add @pauly4010/evalai-sdk

Add openai as a peer dependency if using openAIChatEval:

npm install openai

Environment Support

| Feature | Node.js | Browser |
|---------|---------|---------|
| Platform APIs (Traces, Evaluations, LLM Judge) | ✅ | ✅ |
| Assertions, Test Suites, Error Handling | ✅ | ✅ |
| CJS/ESM | ✅ | ✅ |
| CLI, Snapshots, File Export | ✅ | — |
| Context Propagation | ✅ Full | ⚠️ Basic |

No Lock-in

rm evalai.config.json

Your local openAIChatEval runs continue to work. No account cancellation. No data export required.

Changelog

See CHANGELOG.md for the full release history.

v1.8.0 — evalai doctor rewrite (9-check checklist), evalai explain command, guided failure flow, CI template with doctor preflight

v1.7.0 — evalai init scaffolder, evalai upgrade --full, detectRunner(), machine-readable gate output, init test matrix

v1.6.0 — evalai gate, evalai baseline, regression gate constants & types

v1.5.8 — secureRoute fix, test infra fixes, 304 handling fix

v1.5.5 — PASS/WARN/FAIL semantics, flake intelligence, golden regression suite

v1.5.0 — GitHub annotations, --onFail import, evalai doctor

License

MIT

Support

  • Docs: https://v0-ai-evaluation-platform-nu.vercel.app/documentation
  • Issues: https://github.com/pauly7610/ai-evaluation-platform/issues