@they-juanreina/compost-evals
v0.2.1
Eval surfaces: versioned LLM-as-judge rubric, eval-grader loop, skill golden-set runner.
evals
Three eval surfaces, all stored locally in .compost/evals.sqlite:
- Skill evals: golden-set examples per skill in golden/<skill>/, scored on coverage, faithfulness, and schema conformance. CI-friendly.
- AI-suggestion evals: live, per-event LLM-as-judge runs from the eval-grader loop.
- End-to-end harness evals: "complete seed" fixtures that gate major releases.
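The skill-eval surface above can be pictured as a small golden-set runner. What follows is a minimal Python sketch under stated assumptions, not this package's implementation. The `GoldenExample` shape, the substring-based coverage metric, the type-check "schema", the `skill_eval_results` table, and the 0.8 threshold are all hypothetical. Faithfulness is left out because it usually needs a judge model, which is the LLM-as-judge surface's job.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical shape of one golden example; the real golden/<skill>/ file format is an assumption.
@dataclass
class GoldenExample:
    example_id: str
    input_text: str
    required_facts: list[str]  # phrases the skill output should mention (coverage)
    required_fields: dict[str, type] = field(default_factory=dict)  # minimal schema: field -> type


def coverage(output: dict, required_facts: list[str]) -> float:
    """Fraction of required facts found (case-insensitive substring) in the output's values."""
    if not required_facts:
        return 1.0
    text = " ".join(str(v) for v in output.values()).lower()
    hits = sum(1 for fact in required_facts if fact.lower() in text)
    return hits / len(required_facts)


def conforms(output: dict, required_fields: dict[str, type]) -> bool:
    """True if every required field is present and has the expected Python type."""
    return all(
        name in output and isinstance(output[name], typ)
        for name, typ in required_fields.items()
    )


def run_golden_set(
    conn: sqlite3.Connection,
    skill: str,
    examples: list[GoldenExample],
    run_skill: Callable[[str], dict],  # hypothetical hook that invokes the skill under test
    min_coverage: float = 0.8,  # hypothetical pass threshold
) -> bool:
    """Score every example, store one row per example, and return a CI pass/fail verdict."""
    # Hypothetical table; the real layout of .compost/evals.sqlite is not documented here.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS skill_eval_results ("
        "skill TEXT, example_id TEXT, coverage REAL, schema_ok INTEGER)"
    )
    passed = True
    for ex in examples:
        output = run_skill(ex.input_text)
        cov = coverage(output, ex.required_facts)
        ok = conforms(output, ex.required_fields)
        conn.execute(
            "INSERT INTO skill_eval_results VALUES (?, ?, ?, ?)",
            (skill, ex.example_id, cov, int(ok)),
        )
        # An example passes only if it both conforms to the schema and meets the coverage bar.
        passed = passed and ok and cov >= min_coverage
    conn.commit()
    return passed
```

A CI wrapper could open the evals database with `sqlite3.connect(".compost/evals.sqlite")`, call `run_golden_set` once per skill, and exit non-zero when any call returns `False`. Keeping one row per example, rather than only an aggregate, lets later runs be compared example by example.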
See ROADMAP.md § Evals.
