@brainst0rm/eval
v0.13.0
Capability probes (7 dimensions), eval runner, and scorer
@brainst0rm/eval
Capability evaluation system — probes, runner, scorer, and scorecard.
Key Exports
EvalRunner: Execute capability probes against models
Scorer: Score model responses against rubrics
Scorecard: Aggregate eval results into a capability profile
Purpose
Eval results feed into the capability routing strategy, enabling data-driven model selection based on measured performance rather than assumptions.
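As an illustration of what routing on measured performance could look like, here is a minimal sketch. It does not use the package's API: the ModelProfile shape, the pickModel helper, the model names, and the dimension names ("reasoning", "tool-use") are all hypothetical, chosen only to show the idea of selecting the model with the best measured score for a given dimension.

```typescript
// Hypothetical shape: the package's real Scorecard type may look different.
interface ModelProfile {
  model: string;
  // Dimension name -> measured score (higher is better); a dimension may be absent.
  scores: { [dimension: string]: number | undefined };
}

// Return the model with the highest measured score for one dimension,
// or undefined when no profile has a score for that dimension.
function pickModel(profiles: ModelProfile[], dimension: string): string | undefined {
  let bestModel: string | undefined;
  let bestScore = -Infinity;
  for (const profile of profiles) {
    const score = profile.scores[dimension];
    if (score !== undefined && score > bestScore) {
      bestScore = score;
      bestModel = profile.model;
    }
  }
  return bestModel;
}

// Made-up example data, not real eval results.
const profiles: ModelProfile[] = [
  { model: "model-a", scores: { reasoning: 0.82, "tool-use": 0.4 } },
  { model: "model-b", scores: { reasoning: 0.61, "tool-use": 0.9 } },
];

const reasoningChoice = pickModel(profiles, "reasoning");
const toolUseChoice = pickModel(profiles, "tool-use");
```

In this sketch a dimension with no measurements yields undefined, so a caller would need its own fallback rather than guessing a model.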
Results are stored as JSONL in ~/.brainstorm/evals/ and in the SQLite database.
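Because JSONL is one JSON object per line, the stored results can be inspected with a few lines of code. The sketch below parses JSONL text and averages scores per model. The EvalRecord fields (model, dimension, score) are an assumption made for illustration; the fields the package actually writes are not documented in this README, and the sample data is invented.

```typescript
// Hypothetical record shape; the real JSONL fields may differ.
interface EvalRecord {
  model: string;
  dimension: string;
  score: number;
}

// Parse JSONL text: one JSON object per non-empty line.
function parseJsonl(text: string): EvalRecord[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as EvalRecord);
}

// Compute the mean score per model across all parsed records.
function meanScoreByModel(records: EvalRecord[]): Map<string, number> {
  const sums = new Map<string, { total: number; count: number }>();
  for (const record of records) {
    const acc = sums.get(record.model) ?? { total: 0, count: 0 };
    acc.total += record.score;
    acc.count += 1;
    sums.set(record.model, acc);
  }
  const means = new Map<string, number>();
  sums.forEach((acc, model) => means.set(model, acc.total / acc.count));
  return means;
}

// Invented sample standing in for the contents of one results file.
const sample = [
  '{"model":"model-a","dimension":"reasoning","score":0.5}',
  '{"model":"model-a","dimension":"tool-use","score":1}',
  "",
  '{"model":"model-b","dimension":"reasoning","score":0.25}',
].join("\n");

const records = parseJsonl(sample);
const means = meanScoreByModel(records);
```

To run this against real output, the text of a file under ~/.brainstorm/evals/ would be read first (for example with Node's fs module) and passed to parseJsonl.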
