@halo-sdk/eval

v2.0.0

Published 2 days ago

Cache cost benchmark + behavioral-eval seam for Halo SDK — measure the prefix-cache moat, plug in promptfoo/vitest for quality

Vulnerabilities: 0 High, 0 Medium, 0 Low

maplecity0512

ai benchmark eval llm prefix-cache

@halo-sdk/eval

Two things, deliberately scoped:

Cache cost benchmark: the evidence for Halo's prefix-cache moat. benchmarkCache(agent, inputs) drives an agent through a multi-turn scenario and reports hit rate, token split, estimated spend, and an A–F grade. compareCache(scenario, a, b) runs the same scenario through two agents, for example to show that SummarizeAppendStrategy retains its hit rate where naive truncation collapses it.
Behavioral-eval seam: a thin runEvalCases(agent, cases) harness. For real behavioral or quality evaluation (LLM-as-judge, datasets, regression gates), point promptfoo or vitest at your agent. Halo does not reimplement generic eval.
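To make the reported metrics concrete, here is a minimal sketch of how a hit rate, an estimated spend, and a letter grade could be derived from per-turn token counts. Everything in it is an assumption for illustration: the TurnUsage, Pricing, and CacheSummary shapes, the summarizeCache function, and the grade cutoffs are hypothetical and are not taken from @halo-sdk/eval.

```typescript
// Hypothetical shapes for illustration only; not the package's real types.
interface TurnUsage {
  cachedInputTokens: number;   // prompt tokens served from the prefix cache
  uncachedInputTokens: number; // prompt tokens processed at full price
  outputTokens: number;        // tokens generated by the model
}

interface Pricing {
  inputPerMTok: number;       // USD per million uncached input tokens
  cachedInputPerMTok: number; // USD per million cached input tokens
  outputPerMTok: number;      // USD per million output tokens
}

interface CacheSummary {
  hitRate: number; // cached input tokens / all input tokens, in [0, 1]
  estimatedUsd: number;
  grade: "A" | "B" | "C" | "D" | "F";
}

function summarizeCache(turns: TurnUsage[], pricing: Pricing): CacheSummary {
  let cached = 0;
  let uncached = 0;
  let output = 0;
  for (const t of turns) {
    cached += t.cachedInputTokens;
    uncached += t.uncachedInputTokens;
    output += t.outputTokens;
  }

  const totalInput = cached + uncached;
  const hitRate = totalInput === 0 ? 0 : cached / totalInput;

  const estimatedUsd =
    (uncached * pricing.inputPerMTok +
      cached * pricing.cachedInputPerMTok +
      output * pricing.outputPerMTok) /
    1_000_000;

  // Assumed cutoffs, chosen only to show the idea of an A–F scale.
  const grade: CacheSummary["grade"] =
    hitRate >= 0.8 ? "A" :
    hitRate >= 0.6 ? "B" :
    hitRate >= 0.4 ? "C" :
    hitRate >= 0.2 ? "D" : "F";

  return { hitRate, estimatedUsd, grade };
}
```

In a multi-turn scenario the first turn is typically all uncached, so the hit rate depends on how much of each later prompt repeats the earlier prefix unchanged.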

Usage

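import { benchmarkCache, compareCache, runEvalCases } from "@halo-sdk/eval";

// `agent`, `a`, `b`, and `scenario` are assumed to be created elsewhere in your app.

// Drive one agent through a multi-turn scenario and print its cache metrics.
const report = await benchmarkCache(agent, [
  "Summarize the cache design.",
  "Now expand on breakpoints.",
  "And how does it compare to OpenAI?",
]);
console.log(report.hitRate, report.grade, report.estimatedUsd);

// Run the same scenario through two agents to compare their cache behavior.
const cmp = await compareCache(
  scenario,
  { name: "truncate", agent: a },
  { name: "summarize", agent: b },
);

// Run simple pass/fail cases; each `assert` receives the agent's output.
const evalReport = await runEvalCases(agent, [
  { name: "greets", input: "say hi", assert: (out) => out.toLowerCase().includes("hi") },
]);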
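
For intuition about what a "thin" case harness involves, the following sketch runs a list of cases against an agent and tallies passes and failures. It is not the implementation of runEvalCases: the AgentFn type (an agent modeled as a plain async function), the runCases name, and the result shape are all hypothetical; only the case fields name, input, and assert mirror the usage above.

```typescript
// Hypothetical: an agent modeled as a plain async string-to-string function.
type AgentFn = (input: string) => Promise<string>;

// Case fields mirror the usage example; the rest is assumed.
interface EvalCase {
  name: string;
  input: string;
  assert: (output: string) => boolean;
}

interface CaseResult {
  name: string;
  passed: boolean;
  error?: string; // set only when the agent or the assertion threw
}

interface RunSummary {
  passed: number;
  failed: number;
  results: CaseResult[];
}

async function runCases(agent: AgentFn, cases: EvalCase[]): Promise<RunSummary> {
  const results: CaseResult[] = [];
  // Run cases sequentially so one case's failure cannot affect another.
  for (const c of cases) {
    try {
      const output = await agent(c.input);
      results.push({ name: c.name, passed: c.assert(output) });
    } catch (err) {
      // A thrown error counts as a failed case rather than aborting the run.
      const message = err instanceof Error ? err.message : String(err);
      results.push({ name: c.name, passed: false, error: message });
    }
  }
  const passed = results.filter((r) => r.passed).length;
  return { passed, failed: results.length - passed, results };
}
```

A harness this small only checks boolean assertions on a single output. Graded judgments, datasets, and regression gates are the part the README hands off to promptfoo or vitest.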
