@reaatech/agent-eval-harness-types

v0.1.0

Shared domain types and Zod schemas for agent-eval-harness

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Canonical TypeScript domain types, Zod schemas, and interfaces for the agent-eval-harness ecosystem. This package is the foundational dependency of every other package in the monorepo.

Installation

npm install @reaatech/agent-eval-harness-types
# or
pnpm add @reaatech/agent-eval-harness-types

Feature Overview

  • 19 domain type interfaces: Turn, Trajectory, EvalResult, JudgeScore, CostBreakdown, LatencyBudget, GoldenTrajectory, RegressionGate, and more
  • 20 Zod schemas — runtime validation for every domain type with full type inference via z.infer
  • Zero runtime dependencies beyond zod
  • Dual ESM/CJS output — works with import and require
  • Golden trajectory markers: golden, expected, and quality_notes fields on every turn
  • CI gate types — threshold, baseline-comparison, and distribution gates with regression tracking
  • Suite runner types — configuration, run status, comparison, and metric regression interfaces

Quick Start

import { TurnSchema, type Trajectory, type EvalResult } from '@reaatech/agent-eval-harness-types';

const turn = TurnSchema.parse({
  turn_id: 1,
  role: 'user',
  content: 'Hello',
  timestamp: '2026-04-15T00:00:00Z',
});

const trajectory: Trajectory = { turns: [turn], metadata: { total_turns: 1 } };

API Reference

Core Types

| Name | Type | Description |
|------|------|-------------|
| Turn | interface | Single turn in a trajectory with role, content, timestamp, and optional tool calls, latency, and cost |
| ToolCall | interface | Tool invocation with name, arguments, and optional result |
| CostData | interface | Token usage and cost for a single turn |
| Trajectory | interface | Complete agent execution with turns array and optional metadata |
| EvalResult | interface | Evaluation result with overall score, per-metric scores, and issues |
| EvalIssue | interface | Issue found during evaluation with type, severity, and description |

Judge Types

| Name | Type | Description |
|------|------|-------------|
| JudgeScore | interface | LLM judge scoring result with score, explanation, confidence, and calibration status |

Cost Types

| Name | Type | Description |
|------|------|-------------|
| CostBreakdown | interface | Full cost breakdown for a trajectory with LLM, tool, and per-turn costs |
| TurnCost | interface | Cost breakdown for a single turn with token counts |
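To make the relationship between per-turn costs and a trajectory-level breakdown concrete, here is a minimal sketch. It does not import the package: `TurnCostSketch`, `CostBreakdownSketch`, their field names, and `summarizeCost` are assumptions made for illustration, and the real `TurnCost` and `CostBreakdown` shapes may differ.

```typescript
// Hypothetical local shapes; the package's real TurnCost and CostBreakdown fields may differ.
interface TurnCostSketch {
  turn_id: number;
  input_tokens: number;
  output_tokens: number;
  cost_usd: number;
}

interface CostBreakdownSketch {
  total_cost_usd: number;
  total_tokens: number;
  per_turn: TurnCostSketch[];
}

// Sum per-turn token counts and costs into a trajectory-level breakdown.
function summarizeCost(turnCosts: TurnCostSketch[]): CostBreakdownSketch {
  let totalCost = 0;
  let totalTokens = 0;
  for (const turnCost of turnCosts) {
    totalCost += turnCost.cost_usd;
    totalTokens += turnCost.input_tokens + turnCost.output_tokens;
  }
  return { total_cost_usd: totalCost, total_tokens: totalTokens, per_turn: turnCosts };
}

const costBreakdown = summarizeCost([
  { turn_id: 1, input_tokens: 120, output_tokens: 30, cost_usd: 0.25 },
  { turn_id: 2, input_tokens: 200, output_tokens: 50, cost_usd: 0.5 },
]);
console.log(costBreakdown.total_tokens, costBreakdown.total_cost_usd);
```

Keeping the per-turn entries alongside the totals lets a report drill down from a trajectory to the turn that dominated its cost.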

Latency Types

| Name | Type | Description |
|------|------|-------------|
| LatencyBudget | interface | Latency SLA budget with P50, P90, P99 thresholds and component breakdowns |
| LatencyResult | interface | Latency measurement result with percentiles, violations, and SLA status |
| LatencyViolation | interface | SLA violation record with turn ID, actual vs threshold values |
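As a rough illustration of how a percentile budget of this kind could be checked, the sketch below computes nearest-rank percentiles over a list of latencies and compares them to thresholds. The shapes `LatencyBudgetSketch` and `LatencyCheckSketch`, their field names, and both functions are assumptions for this example, not the package's actual definitions. Nearest-rank is only one of several percentile conventions.

```typescript
// Hypothetical shapes; the real LatencyBudget and LatencyResult fields may differ.
interface LatencyBudgetSketch {
  p50_ms: number;
  p90_ms: number;
  p99_ms: number;
}

interface LatencyCheckSketch {
  p50_ms: number;
  p90_ms: number;
  p99_ms: number;
  violations: { percentile: string; actual_ms: number; threshold_ms: number }[];
  within_sla: boolean;
}

// Nearest-rank percentile over an ascending-sorted, non-empty array.
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) throw new Error("at least one sample is required");
  const rank = Math.max(1, Math.ceil((p * sortedAsc.length) / 100));
  return sortedAsc[rank - 1];
}

// Measure P50/P90/P99 and record every percentile that exceeds its threshold.
function checkLatency(samplesMs: number[], budget: LatencyBudgetSketch): LatencyCheckSketch {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const measured = {
    p50_ms: percentile(sorted, 50),
    p90_ms: percentile(sorted, 90),
    p99_ms: percentile(sorted, 99),
  };
  const violations = (["p50_ms", "p90_ms", "p99_ms"] as const)
    .filter((key) => measured[key] > budget[key])
    .map((key) => ({ percentile: key, actual_ms: measured[key], threshold_ms: budget[key] }));
  return { ...measured, violations, within_sla: violations.length === 0 };
}

const latencyCheck = checkLatency(
  [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
  { p50_ms: 600, p90_ms: 950, p99_ms: 950 },
);
console.log(latencyCheck.within_sla, latencyCheck.violations);
```

In this run only the P99 value (1000 ms against a 950 ms threshold) exceeds its budget, so the check reports a single violation.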

Golden Types

| Name | Type | Description |
|------|------|-------------|
| GoldenTrajectory | interface | Golden reference trajectory with versioning and quality markers |

Gate Types

| Name | Type | Description |
|------|------|-------------|
| RegressionGate | interface | Gate definition with threshold, baseline-comparison, or distribution types |
| GateResult | interface | Single gate evaluation result with pass/fail and actual vs expected values |
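The simplest of the three gate kinds, a threshold gate, can be pictured with the sketch below. It is dependency-free and purely illustrative: `ThresholdGateSketch`, `GateResultSketch`, the `operator` field, and `evaluateThresholdGate` are hypothetical names, and the package's `RegressionGate` and `GateResult` interfaces may be shaped differently.

```typescript
// Hypothetical shapes; the real RegressionGate and GateResult fields may differ.
interface ThresholdGateSketch {
  gate_name: string;
  metric: string;
  operator: "gte" | "lte"; // "gte": metric must be at least the threshold
  threshold: number;
}

interface GateResultSketch {
  gate_name: string;
  passed: boolean;
  actual: number;
  expected: number;
}

// Compare one metric against its threshold and report pass/fail with both values.
function evaluateThresholdGate(
  gate: ThresholdGateSketch,
  metrics: Record<string, number>,
): GateResultSketch {
  const actual = metrics[gate.metric];
  if (actual === undefined) {
    throw new Error(`metric not found: ${gate.metric}`);
  }
  const passed = gate.operator === "gte" ? actual >= gate.threshold : actual <= gate.threshold;
  return { gate_name: gate.gate_name, passed, actual, expected: gate.threshold };
}

const runMetrics = { overall_score: 0.82, p90_latency_ms: 1400 };

const scoreGate = evaluateThresholdGate(
  { gate_name: "min-score", metric: "overall_score", operator: "gte", threshold: 0.8 },
  runMetrics,
);
const latencyGate = evaluateThresholdGate(
  { gate_name: "max-p90", metric: "p90_latency_ms", operator: "lte", threshold: 1200 },
  runMetrics,
);
console.log(scoreGate.passed, latencyGate.passed);
```

Returning the actual and expected values next to the pass/fail flag makes a failing CI run self-explanatory without re-running the evaluation.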

Suite Types

| Name | Type | Description |
|------|------|-------------|
| EvalSuiteConfig | interface | Suite configuration with metrics, judge model, budgets, gates, and parallelism |
| EvalRunStatus | interface | Suite run progress with status, completion counts, and timing |
| RunComparison | interface | Comparison of two evaluation runs with metric diffs and significance testing |
| MetricRegression | interface | Single regression with baseline and candidate values and change percentage |
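The change percentage carried by a regression record is, presumably, the relative difference between candidate and baseline. The sketch below shows that arithmetic under stated assumptions: `MetricRegressionSketch`, its field names, and `compareMetric` are hypothetical, and the package may define the percentage or its sign convention differently.

```typescript
// Hypothetical shape; the real MetricRegression fields may differ.
interface MetricRegressionSketch {
  metric: string;
  baseline: number;
  candidate: number;
  change_pct: number; // positive when the candidate value is higher than the baseline
}

// Relative change of the candidate against the baseline, in percent.
function compareMetric(metric: string, baseline: number, candidate: number): MetricRegressionSketch {
  if (baseline === 0) {
    throw new Error("baseline must be nonzero to compute a percentage change");
  }
  const changePct = ((candidate - baseline) / baseline) * 100;
  return { metric, baseline, candidate, change_pct: changePct };
}

const latencyRegression = compareMetric("p90_latency_ms", 200, 250);
console.log(latencyRegression.change_pct);
```

Whether a positive change counts as a regression depends on the metric: higher latency is worse, while a higher score is better, so a consumer of such a record needs to know the metric's direction.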

Schemas

| Name | Type | Description |
|------|------|-------------|
| ToolCallSchema | ZodObject | Validates tool invocation structure |
| CostDataSchema | ZodObject | Validates token counts and cost data |
| TurnSchema | ZodObject | Validates turn structure with optional tool calls, latency, and golden markers |
| TrajectoryMetadataSchema | ZodObject | Validates trajectory metadata |
| TrajectorySchema | ZodObject | Validates complete trajectory (minimum one turn, optional metadata) |
| EvalIssueSchema | ZodObject | Validates evaluation issue records |
| EvalResultSchema | ZodObject | Validates evaluation results with metrics and issues |
| JudgeScoreSchema | ZodObject | Validates judge scoring output |
| CostBreakdownSchema | ZodObject | Validates cost breakdowns with per-turn cost arrays |
| LatencyBudgetSchema | ZodObject | Validates latency budget configuration |
| LatencyViolationSchema | ZodObject | Validates latency SLA violations |
| LatencyResultSchema | ZodObject | Validates latency measurement results |
| QualityMarkersSchema | ZodObject | Validates golden trajectory quality markers |
| GoldenTrajectorySchema | ZodObject | Validates golden trajectories with nested trajectory and quality markers |
| RegressionGateSchema | ZodObject | Validates regression gate definitions |
| GateResultSchema | ZodObject | Validates gate evaluation results |
| EvalSuiteConfigSchema | ZodObject | Validates suite configuration with nested latency budget and gates |
| EvalRunStatusSchema | ZodObject | Validates suite run status |
| MetricRegressionSchema | ZodObject | Validates metric regression records |
| RunComparisonSchema | ZodObject | Validates run comparison results with statistical significance arrays |

Related Packages

| Package | Description |
|---------|-------------|
| @reaatech/agent-eval-harness-types | Shared domain types and Zod schemas |
| @reaatech/agent-eval-harness-trajectory | Trajectory loading, evaluation, and golden comparison |
| @reaatech/agent-eval-harness-tool-use | Tool-use validation and schema compliance |
| @reaatech/agent-eval-harness-cost | Cost tracking, budgets, and reporting |
| @reaatech/agent-eval-harness-latency | Latency monitoring, SLA enforcement, and optimization |
| @reaatech/agent-eval-harness-judge | LLM-as-judge with calibration and consensus |
| @reaatech/agent-eval-harness-golden | Golden trajectory management and curation |
| @reaatech/agent-eval-harness-suite | Suite runner, results aggregation, and comparison |
| @reaatech/agent-eval-harness-gate | CI regression gates with JUnit and GitHub output |
| @reaatech/agent-eval-harness-mcp-server | MCP server with three-layer tool architecture |
| @reaatech/agent-eval-harness-cli | Command-line interface |
| @reaatech/agent-eval-harness-observability | OTel tracing, metrics, structured logging, and dashboards |

License

MIT