@artale/pi-eval
v1.3.0
Agent evaluation harness. Judges coding sessions on success, tool usage, efficiency, and methodology. Inspired by opencc.
Install
npm install -g @artale/pi-eval
Tools
- eval_judge: Scores a session on its tool calls, errors, and completion
- eval_handoff: Validates the confidence calibration of agent handoffs
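The README does not document how either tool computes its result, so the following is only a minimal sketch, under stated assumptions, of what scoring these inputs could look like. For eval_judge it takes a weighted mix of task completion, tool-call error rate, and call count against a budget. For eval_handoff it swaps in the Brier score, a standard calibration measure, to compare a handoff's stated confidence with its actual outcome. The `Session`, `ToolCall`, and `Handoff` types, the 0.5/0.3/0.2 weights, and the `maxCalls` budget are all hypothetical and are not pi-eval's API.

```typescript
// Hypothetical shape of one tool call in an agent session (not pi-eval's real schema).
interface ToolCall {
  name: string;
  isError: boolean;
}

// Hypothetical session summary: the tool calls made and whether the task was finished.
interface Session {
  toolCalls: ToolCall[];
  completed: boolean;
}

// Illustrative judge score in [0, 1]; the weights are assumptions, not pi-eval's rubric.
function scoreSession(s: Session, maxCalls = 20): number {
  const calls = s.toolCalls.length;
  const errors = s.toolCalls.filter((c) => c.isError).length;
  // Fraction of tool calls that did not error (1 when no calls were made).
  const reliability = calls === 0 ? 1 : 1 - errors / calls;
  // Full efficiency within the call budget, then a linear decay floored at 0.
  const efficiency =
    calls <= maxCalls ? 1 : Math.max(0, 1 - (calls - maxCalls) / maxCalls);
  const success = s.completed ? 1 : 0;
  return 0.5 * success + 0.3 * reliability + 0.2 * efficiency;
}

// Hypothetical handoff record: stated confidence in [0, 1] and whether the work held up.
interface Handoff {
  confidence: number;
  succeeded: boolean;
}

// Brier score: mean squared gap between stated confidence and outcome (0 is perfect).
function brierScore(handoffs: Handoff[]): number {
  if (handoffs.length === 0) throw new Error("no handoffs to score");
  const total = handoffs.reduce((sum, h) => {
    const outcome = h.succeeded ? 1 : 0;
    return sum + (h.confidence - outcome) ** 2;
  }, 0);
  return total / handoffs.length;
}

// Usage: one clean call, one failed call, task completed.
const example: Session = {
  toolCalls: [
    { name: "read", isError: false },
    { name: "bash", isError: true },
  ],
  completed: true,
};
console.log(scoreSession(example).toFixed(2));
console.log(
  brierScore([
    { confidence: 0.9, succeeded: true },
    { confidence: 0.8, succeeded: false },
  ]).toFixed(3),
);
```

A lower Brier score means better-calibrated handoffs: an agent that claims 0.8 confidence and then fails is penalized far more than one that claims 0.9 and succeeds.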
Commands
/eval: Evaluation utilities
