@auggy/evals
v0.4.3
Eval suites for auggy agents — security (red-team prompts), auto-save (fact extraction fixtures), layered-memory (multi-session recall), plus the harness primitives all three build on.
@auggy/evals
Eval suites for auggy agents.
| Suite | What it tests | Cost |
|---|---|---|
| security | Red-team prompts (jailbreak, prompt injection, identity leak, instruction override) | LLM-judged; ~$0.07/run on Haiku |
| auto-save | Layered-memory fact extraction fixtures (peer isolation, retention class, false-extract) | Free in --dry-run; ~$0.005/run live |
| layered-memory | Multi-session recall + cross-identity promotion + cost-overhead graders | Structural (no LLM) |
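To budget a batch of runs, the per-run figures in the table can be multiplied out. The following is a minimal sketch, not part of this package: `estimateCostUsd` and the constant names are hypothetical, the dollar figures are the approximate ones from the table, and layered-memory is assumed to cost $0 because it is graded without an LLM.

```typescript
// Hypothetical helper, not exported by @auggy/evals.
type Suite = "security" | "auto-save" | "layered-memory";

// Approximate USD cost per live run, copied from the table above.
// Assumption: layered-memory costs $0 because its graders are structural (no LLM).
// auto-save is free when run with --dry-run; the figure here is for live runs.
const APPROX_COST_PER_RUN_USD: Record<Suite, number> = {
  security: 0.07,
  "auto-save": 0.005,
  "layered-memory": 0,
};

const SUITES: Suite[] = ["security", "auto-save", "layered-memory"];

// Sum the estimated cost for a given number of live runs per suite.
function estimateCostUsd(runs: Partial<Record<Suite, number>>): number {
  let total = 0;
  for (const suite of SUITES) {
    total += APPROX_COST_PER_RUN_USD[suite] * (runs[suite] ?? 0);
  }
  // Round to a tenth of a cent to hide floating-point noise.
  return Math.round(total * 1000) / 1000;
}
```

For example, ten security runs plus one hundred live auto-save runs come to roughly $1.20 under these figures. Actual spend depends on the model and the prompts used.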
Install
npm i -g @auggy/evals
Then run via the auggy CLI:
auggy eval # default fixture, security suite
auggy eval my-agent # registered agent
auggy eval auto-save --dry-run # fixture validation only
auggy eval my-agent --suite security-only
You don't normally import from this package; auggy eval resolves it lazily at command-run time.
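Lazy resolution of this kind is commonly done with a dynamic import that runs only when the command is invoked. The sketch below illustrates the general pattern under that assumption; it is not auggy's actual implementation, and `loadEvals` and `Importer` are hypothetical names.

```typescript
// Hypothetical sketch of lazy package resolution; none of these names come from auggy.
type Importer = (specifier: string) => Promise<unknown>;

// The importer is injectable so the function can be exercised without the
// real package installed. By default it uses a dynamic import().
async function loadEvals(
  importer: Importer = (specifier) => import(specifier),
): Promise<unknown> {
  try {
    // Deferred until the eval command actually runs, so other commands
    // neither load nor require the eval package.
    return await importer("@auggy/evals");
  } catch {
    // Simplification: any load failure is reported as "not installed".
    throw new Error(
      "@auggy/evals is not installed. Install it with: npm i -g @auggy/evals",
    );
  }
}
```

The benefit of this pattern is that a missing optional package surfaces as an actionable install hint at the moment the command needs it, rather than as a startup failure.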
Why a separate package
Eval fixtures and graders are about 1 MB of test infrastructure (prompts, expected-output specs, scoring rubrics). Shipping them in auggy core would inflate every install for the sake of the small slice of users who run evals. The split mirrors @auggy/anthropic, @auggy/openai, and similar packages: each is opt-in via npm.
License
Apache-2.0 — see LICENSE.
