@gnsx/genesys.agent.eval
v1.3.2
Agent evaluation harness for benchmarking AI agents against test suites
Local Testing
# Run in this folder in the monorepo
pnpm dev --tests ./examples/basic-tests.yaml -a "tsx ../cli/src/launcher.ts"

References:
- https://www.chanl.ai/blog/how-to-evaluate-ai-agents-build-eval-framework
- https://platform.claude.com/docs/en/test-and-evaluate/develop-tests
