@wix/evalforge-evaluator
v0.129.0
EvalForge Evaluator
CLI tool that executes AI agent evaluations. It fetches an eval run configuration from the backend, runs each scenario against a Claude Code agent, streams trace events, runs assertions, and reports results.
How It Works
evaluator <project-id> <eval-run-id>
- Load configuration from environment variables (server URL, AI Gateway credentials, etc.)
- Fetch evaluation data from the backend API: eval run, scenarios, agent config, skills, MCPs, sub-agents, rules, and templates
- For each scenario:
  - Prepare a working directory (download and extract template)
  - Write skills to .claude/skills/<name>/SKILL.md
  - Write MCPs to .mcp.json
  - Write sub-agents to .claude/agents/<name>.md
  - Write rules to CLAUDE.md, AGENTS.md, or .cursor/rules/<name>.md based on rule type
  - Launch the Claude Code agent with the scenario's trigger prompt via @anthropic-ai/claude-agent-sdk
  - Stream trace events back to the backend
  - Run assertions on the agent's output
  - Report the scenario result
- Finalize: set eval run status to COMPLETED or FAILED
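The per-scenario file layout described above can be sketched as a pure function that maps a scenario's assets to the relative paths they would be written to. This is a minimal illustration, not the evaluator's actual code: the `ScenarioAssets` shape, the rule type names (`claude`, `agents`, `cursor`), and the `planScenarioFiles` and `rulePath` helpers are all hypothetical, and the `mcpServers` wrapper assumes the project-scoped `.mcp.json` layout that Claude Code reads.

```typescript
// Hypothetical shapes; the real types live in @wix/evalforge-types and may differ.
type RuleType = "claude" | "agents" | "cursor";

interface ScenarioAssets {
  skills: { name: string; content: string }[];
  mcps: Record<string, unknown>; // server name -> server config
  subAgents: { name: string; content: string }[];
  rules: { name: string; type: RuleType; content: string }[];
}

interface PlannedFile {
  path: string; // relative to the scenario working directory
  content: string;
}

// Pick the target file for a rule based on its (assumed) type name.
function rulePath(rule: { name: string; type: RuleType }): string {
  switch (rule.type) {
    case "claude":
      return "CLAUDE.md";
    case "agents":
      return "AGENTS.md";
    case "cursor":
      return `.cursor/rules/${rule.name}.md`;
  }
}

// Compute every file to write, in the order listed above, without touching the disk.
function planScenarioFiles(assets: ScenarioAssets): PlannedFile[] {
  const files: PlannedFile[] = [];
  for (const skill of assets.skills) {
    files.push({ path: `.claude/skills/${skill.name}/SKILL.md`, content: skill.content });
  }
  if (Object.keys(assets.mcps).length > 0) {
    const body = JSON.stringify({ mcpServers: assets.mcps }, null, 2);
    files.push({ path: ".mcp.json", content: body });
  }
  for (const agent of assets.subAgents) {
    files.push({ path: `.claude/agents/${agent.name}.md`, content: agent.content });
  }
  for (const rule of assets.rules) {
    files.push({ path: rulePath(rule), content: rule.content });
  }
  return files;
}
```

Keeping the planning step free of I/O makes the layout easy to unit test. Note that several rules of the same type would target the same file in this sketch; how the evaluator merges them is not stated here.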
Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| EVAL_SERVER_URL | Yes | Backend server URL for fetching data and reporting results |
| AI_GATEWAY_URL | Yes | AI Gateway base URL for LLM calls |
| AI_GATEWAY_HEADERS | No | Custom headers for AI Gateway (newline-separated key:value pairs) |
| EVAL_API_PREFIX | No | API path prefix (e.g., /api/v1) |
| EVALUATIONS_DIR | No | Directory for evaluation working directories |
| TRACE_PUSH_URL | No | URL for pushing trace events (remote job execution) |
| EVAL_ROUTE_HEADER | No | x-wix-route header for deploy preview routing |
| EVAL_AUTH_TOKEN | No | Bearer token for public endpoint authentication |
The evaluator is typically launched by the backend (locally or on a remote Dev Machine) with these variables pre-configured.
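As an illustration of the table above, the configuration could be read from the environment roughly as follows. This is a sketch under assumptions rather than the package's actual loader: `loadConfig`, `requireEnv`, `parseGatewayHeaders`, and the `EvaluatorConfig` shape are hypothetical names, and splitting each header line on its first colon is only one plausible reading of "newline-separated key:value pairs".

```typescript
// Hypothetical config shape; field names are illustrative only.
interface EvaluatorConfig {
  serverUrl: string;
  aiGatewayUrl: string;
  aiGatewayHeaders: Record<string, string>;
  apiPrefix: string | undefined;
  evaluationsDir: string | undefined;
  tracePushUrl: string | undefined;
  routeHeader: string | undefined;
  authToken: string | undefined;
}

type Env = Record<string, string | undefined>;

// Parse "key:value" lines, splitting on the first colon so values may contain colons.
function parseGatewayHeaders(raw: string | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const line of (raw ?? "").split("\n")) {
    const idx = line.indexOf(":");
    if (idx <= 0) continue; // skip blank lines and lines without a key
    headers[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return headers;
}

// Read a required variable or fail fast with a clear message.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Build the config: two required variables, the rest optional.
function loadConfig(env: Env): EvaluatorConfig {
  return {
    serverUrl: requireEnv(env, "EVAL_SERVER_URL"),
    aiGatewayUrl: requireEnv(env, "AI_GATEWAY_URL"),
    aiGatewayHeaders: parseGatewayHeaders(env["AI_GATEWAY_HEADERS"]),
    apiPrefix: env["EVAL_API_PREFIX"],
    evaluationsDir: env["EVALUATIONS_DIR"],
    tracePushUrl: env["TRACE_PUSH_URL"],
    routeHeader: env["EVAL_ROUTE_HEADER"],
    authToken: env["EVAL_AUTH_TOKEN"],
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function testable without mutating global state.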
Scripts
yarn build # Build CJS + ESM + type declarations
yarn test # Run tests
yarn lint # Run ESLint
yarn clean # Remove build artifacts
Dependencies
- @wix/evalforge-types: shared type definitions
- @wix/eval-assertions: assertion evaluation framework
- @wix/evalforge-github-client: GitHub API client for fetching skill files
- @anthropic-ai/claude-agent-sdk: Claude Code agent SDK
