eqho-eval v0.5.3 — CLI bridge between Eqho AI platform and promptfoo evaluations
eqho-eval

CLI + backend for evaluating Eqho agents with promptfoo. Pulls live campaign config from the Eqho API, assembles prompts the same way production does, and routes all LLM calls through a shared Vercel proxy — no local API keys required.
```sh
eqho-eval auth --key <api-key>    # authenticate + register with backend
eqho-eval init --campaign <id>    # scaffold eval project
eqho-eval eval                    # run evals (routed through proxy)
eqho-eval view                    # view results in browser
```

Backend: evals.eqho-solutions.dev
Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│ Developer machine                                                │
│                                                                  │
│ eqho-eval CLI ──→ promptfoo ──→ evals.eqho-solutions.dev         │
│       │                              │                           │
│ .env has JWT token,                  │  Vercel backend:          │
│ not raw API keys                     │  ├─ /api/v1/chat/*        │
│                                      │  │   OpenAI direct proxy  │
│                                      │  │   Anthropic/Google via │
│                                      │  │   Vercel AI Gateway    │
│                                      │  ├─ /api/eqho/*           │
│                                      │  │   Eqho API proxy       │
│                                      │  └─ /api/auth/*           │
│                                      │      JWT issuance         │
└──────────────────────────────────────┴───────────────────────────┘
```

When you run `eqho-eval auth`, the CLI registers with the backend and receives a JWT. All subsequent eval runs route through the proxy — OpenAI, Anthropic, and Google models are all available without configuring provider keys locally. The backend holds the real API keys.
For OpenAI models, the proxy does a direct passthrough to api.openai.com, preserving full request fidelity including tools, tool_choice, response_format, and streaming. Non-OpenAI models route through the Vercel AI Gateway.
Install

```sh
npm i -g eqho-eval
```

Requires Node.js 20+ and promptfoo (`npm i -g promptfoo`).
From source (contributors)

```sh
git clone https://github.com/Eqho-Solutions-Engineering/promptfoo-evals.git
cd promptfoo-evals
npm run setup   # install, build, link globally
```

Quickstart
Interactive

```sh
eqho-eval start
```

Walks through authentication, campaign selection, and project generation. Offers to run your first eval immediately.
Manual

```sh
eqho-eval auth --key <your-eqho-api-key>
eqho-eval init --campaign <campaign-id> -o ./my-eval
cd my-eval
eqho-eval eval
eqho-eval view
```

CI / non-interactive
```sh
export EQHO_API_KEY=your-key
eqho-eval start --yes --campaign <id>
```

Check your setup
```
$ eqho-eval doctor
✓ Node.js v22.22.0 (>=20 required)
✓ eqho-eval v0.5.0
✓ Eqho API key configured (abcd1234...)
✓ Eqho API reachable (15+ campaigns)
✓ Backend proxy connected (https://evals.eqho-solutions.dev)
✓ promptfoo installed via local (v0.120.25)
✓ OpenAI API key set in environment
✗ Project config — no config found
  → eqho-eval init --campaign <id>

7/8 checks passed
```

Using with Claude Code
eqho-eval works well as a tool inside Claude Code sessions. Add context about your eval project and let Claude iterate on test cases.
```sh
# In a Claude Code session, after scaffolding:
cd my-eval

# Claude can inspect the generated config
cat promptfooconfig.yaml

# Edit tests, run evals, and iterate
eqho-eval eval --no-cache
eqho-eval view
```

Useful patterns with Claude Code:
- Ask Claude to read `promptfooconfig.yaml` and `prompts/*.json` to understand the agent's system prompt, then write targeted test cases
- Run `eqho-eval render` to preview the assembled prompt and let Claude analyze coverage gaps
- Use `eqho-eval eval` results as feedback — paste failures and ask Claude to fix test assertions or identify agent prompt issues
- Ask Claude to generate edge-case tests: non-English callers, prompt injection attempts, emotional tones
Since all LLM calls route through the proxy, Claude Code doesn't need access to any API keys — just the eqho-eval CLI.
Using with Cursor
In Cursor's terminal or agent mode, eqho-eval integrates naturally:
```sh
# Scaffold a new eval directly from Cursor's terminal
eqho-eval init --campaign <id>

# Open the generated files in Cursor to edit tests
# promptfooconfig.yaml is the main file to modify

# Run evals from the integrated terminal
eqho-eval eval --no-cache

# View results
eqho-eval view
```

Cursor agent mode tips:
- Open `promptfooconfig.yaml` and ask the agent to add tests for specific scenarios
- After running evals, ask the agent to analyze `output/eval-results.json` and suggest improvements
- Use `eqho-eval postcall-eval` and `eqho-eval action-eval` to generate specialized eval configs, then ask the agent to refine them
- The agent can run `eqho-eval doctor` to diagnose any environment issues
Using with other AI coding tools
The same workflow applies to any AI coding assistant (Windsurf, Aider, Cline, etc.):
- Scaffold — `eqho-eval init --campaign <id>` generates all files
- Edit — modify `promptfooconfig.yaml` tests (the assistant can help)
- Run — `eqho-eval eval` executes through the proxy
- Analyze — results in `output/eval-results.json` and `output/eval-report.html`
- Iterate — refine tests based on results
No API key configuration needed on the developer's machine. The proxy handles all model access.
Writing evals
The generated promptfooconfig.yaml ships with starter tests. Replace or extend them with cases that matter for your agent.
Assertion types
Use the cheapest, most deterministic assertion that proves the point. See promptfoo's assertion docs for the full list.
Programmatic (fast, free, deterministic):

```yaml
assert:
  - type: icontains
    value: Sophia
  - type: not-icontains
    value: system prompt
  - type: javascript
    value: output.split(/[.!?]+/).filter(s => s.trim()).length <= 4
```

Tool call validation (deterministic, validates agent behavior):
```yaml
assert:
  - type: is-valid-openai-tools-call
  - type: tool-call-f1
    value: [create_appointment]
  - type: javascript
    value: |
      const calls = JSON.parse(output);
      return calls.some(c => c.function?.name === 'create_appointment'
        && c.function?.arguments?.start);
```

LLM-graded (slower, costs tokens, handles subjective criteria):
```yaml
assert:
  - type: llm-rubric
    value: >-
      The agent should acknowledge the prospect's budget concern
      with empathy. Should mention affordable starting points.
      Must not be pushy or dismissive.
```

Combine them for defense in depth:
```yaml
assert:
  - type: icontains
    value: Kyle
  - type: is-valid-openai-tools-call
  - type: not-icontains
    value: system prompt
  - type: llm-rubric
    value: Response is warm and concise
```

What to test
| Category | What to test | Assertion style |
|----------|-------------|-----------------|
| Identity | Correct name, company, role | icontains + not-icontains |
| Qualification | Follows discovery flow, asks right questions | llm-rubric |
| Tool usage | Calls correct tools with valid args | tool-call-f1 + javascript |
| Objection handling | Empathy, persistence vs. respect for hard no | llm-rubric |
| Security | Prompt injection, impersonation | not-icontains + llm-rubric |
| Edge cases | Wrong number, non-English, emotional callers | llm-rubric |
| Postcall actions | Data extraction accuracy from transcripts | postcall-eval command |
| Dispositions | Correct call outcome categorization | postcall-eval --disposition |
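Deterministic `javascript` assertions like the ones above are plain predicates over the model output, so they can be prototyped and unit-tested outside promptfoo before being wired into `promptfooconfig.yaml`. A sketch (the helper names are illustrative; promptfoo evaluates the inline `javascript:` bodies itself):

```typescript
// Sentence cap: mirrors the `output.split(/[.!?]+/)...` assertion above.
function withinSentenceCap(output: string, max = 4): boolean {
  return output.split(/[.!?]+/).filter((s) => s.trim()).length <= max;
}

// Tool-call check: tool-call outputs arrive as a JSON array of
// OpenAI-style call objects.
interface ToolCall {
  function?: { name?: string; arguments?: unknown };
}

function calledTool(output: string, name: string): boolean {
  const calls: ToolCall[] = JSON.parse(output);
  return calls.some((c) => c.function?.name === name);
}

console.log(withinSentenceCap("Hi! I'm Sophia. How can I help?")); // true
console.log(calledTool('[{"function":{"name":"create_appointment"}}]', "create_appointment")); // true
```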
Multi-model comparison
All providers route through the proxy. The default config tests across three models:
```yaml
providers:
  - id: openai:chat:gpt-4.1-mini
    label: GPT-4.1-mini
    config:
      temperature: 0.7
      apiBaseUrl: https://evals.eqho-solutions.dev/api/v1
      apiKey: <jwt-token>
      tools: file://tools/sophia.json
  - id: openai:chat:gpt-4.1
    label: GPT-4.1
  - id: openai:chat:o4-mini
    label: o4-mini
```

Multi-turn conversations

```sh
eqho-eval init --campaign <id> --multi-turn
```

Generates a `promptfoo:simulated-user` config for testing full conversation flows.
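In promptfoo's simulated-user pattern, a second model plays the caller so the agent is exercised over several turns. An illustrative fragment of what such a config can look like (the `instructions` text and `maxTurns` value are made-up examples, and the exact schema `init --multi-turn` emits may differ):

```yaml
defaultTest:
  provider:
    id: promptfoo:simulated-user
    config:
      maxTurns: 6
      instructions: >-
        You are a skeptical prospect. Ask about pricing twice
        before agreeing to book an appointment.
```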
Action lifecycle testing
Eqho agents have a full call lifecycle:

```
Pre-Call → On-Call-Start → Live Actions → Postcall Actions → Disposition → Post-Call Tasks
```

Live action eval

Test whether the agent calls the right tools during conversation:

```sh
eqho-eval action-eval --campaign <id>
cd action-eval && npx promptfoo eval
```

Postcall action eval

Test data extraction from transcripts:

```sh
eqho-eval postcall-eval --campaign <id> --calls 25
cd postcall-eval && npx promptfoo eval
```

Disposition eval

Test call outcome categorization:

```sh
eqho-eval postcall-eval --campaign <id> --disposition --calls 50
cd disposition-eval && npx promptfoo eval
```

All generated configs include proxy settings automatically.
Commands
Getting started
| Command | Description |
|---------|-------------|
| eqho-eval start | Interactive setup wizard |
| eqho-eval doctor | Check environment, API keys, backend connectivity |
| eqho-eval status | Show current project state |
Core workflow
| Command | Description |
|---------|-------------|
| eqho-eval auth --key <key> | Authenticate + register with backend proxy |
| eqho-eval auth --backend <url> | Use a custom backend (default: evals.eqho-solutions.dev) |
| eqho-eval auth --logout | Remove stored credentials |
| eqho-eval init --campaign <id> | Scaffold eval project from a campaign |
| eqho-eval sync | Re-fetch latest config from Eqho (preserves tests) |
| eqho-eval eval | Run evaluations |
| eqho-eval eval --watch | Re-run on file changes |
| eqho-eval view | Open results in browser |
Eval generation
| Command | Description |
|---------|-------------|
| eqho-eval postcall-eval | Generate postcall action eval config |
| eqho-eval postcall-eval --disposition | Generate disposition accuracy eval |
| eqho-eval action-eval | Generate live action/tool usage eval |
| eqho-eval scenarios <file> | Generate tests from CSV/JSON dataset |
| eqho-eval render | Preview assembled system prompt and tools |
| eqho-eval diff <baseline> <candidate> | Compare two eval result sets |
Exploration
| Command | Description |
|---------|-------------|
| eqho-eval list campaigns | Browse campaigns |
| eqho-eval list agents | Browse agents |
| eqho-eval list calls | Browse recent calls |
| eqho-eval mentions | List available template variables |
| eqho-eval conversations --last 50 | Pull real calls as test cases |
Global flags
| Flag | Description |
|------|-------------|
| --json | Machine-readable output (suppresses colors/spinners) |
| --no-cache | Skip API response cache |
| --verbose | Show stack traces on errors |
Generated project structure
```
my-eval/
├── promptfooconfig.yaml      # main config — edit tests here
├── prompts/
│   └── <agent-slug>.json     # assembled system prompt + chat messages
├── tools/
│   └── <agent-slug>.json     # OpenAI tool definitions from Eqho actions
├── eqho.config.json          # campaign/agent IDs for sync
├── .env                      # proxy token + base URL (auto-generated)
├── tests/                    # custom test case files
└── output/                   # eval results (after running)
    ├── eval-results.json
    └── eval-report.html
```

When the proxy is configured, `.env` contains:

```sh
OPENAI_API_KEY=eyJ...   # JWT token (not a real OpenAI key)
OPENAI_BASE_URL=https://evals.eqho-solutions.dev/api/v1
```

This routes all LLM calls (both eval providers and grading assertions) through the backend.
Backend (Vercel)
The backend lives in web/ and deploys to Vercel. It provides three API surfaces:
| Endpoint | Purpose |
|----------|---------|
| POST /api/auth/token | Validate Eqho API key, issue JWT (7-day expiry) |
| POST /api/auth/validate | Verify an API key is valid |
| POST /api/v1/chat/completions | OpenAI-compatible completions proxy |
| ALL /api/eqho/* | Transparent proxy to Eqho REST API |
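The tokens issued by `/api/auth/token` follow the standard JWT shape. A hedged sketch of HS256 signing and verification using only `node:crypto` (the real `lib/jwt.ts` likely uses a JWT library, and the claim names `sub` and `exp` here are illustrative):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// base64url without padding, as required by the JWT spec.
const b64url = (buf: Buffer): string =>
  buf.toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");

function signJwt(payload: object, secret: string): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })));
  const body = b64url(Buffer.from(JSON.stringify(payload)));
  const sig = b64url(createHmac("sha256", secret).update(`${header}.${body}`).digest());
  return `${header}.${body}.${sig}`;
}

function verifyJwt(token: string, secret: string): boolean {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const [header, body, sig] = parts;
  const expected = b64url(createHmac("sha256", secret).update(`${header}.${body}`).digest());
  return sig.length === expected.length &&
    timingSafeEqual(Buffer.from(sig), Buffer.from(expected));
}

// A 7-day token, matching the expiry in the table above.
const token = signJwt({ sub: "cli-user", exp: Math.floor(Date.now() / 1000) + 7 * 86400 }, "dev-secret");
console.log(verifyJwt(token, "dev-secret"));   // true
console.log(verifyJwt(token, "wrong-secret")); // false
```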
Model routing
| Provider prefix | Routing | Tool support |
|----------------|---------|--------------|
| openai/* | Direct passthrough to api.openai.com | Full (tools, tool_choice, streaming) |
| anthropic/* | Vercel AI Gateway | Text only |
| google/* | Vercel AI Gateway | Text only |
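The observable routing behavior in the table is prefix-based. A minimal sketch of that rule (the type and function names are illustrative, not the backend's actual dispatch code):

```typescript
interface Routing {
  path: "openai-direct" | "ai-gateway";
  toolSupport: boolean;
}

function routeForModel(model: string): Routing {
  // openai/* passes through to api.openai.com with full tool support;
  // anthropic/* and google/* go via the Vercel AI Gateway, text only.
  if (model.startsWith("openai/")) return { path: "openai-direct", toolSupport: true };
  return { path: "ai-gateway", toolSupport: false };
}

console.log(routeForModel("openai/gpt-4.1"));       // direct passthrough, tools supported
console.log(routeForModel("anthropic/claude-x"));   // gateway, text only
```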
Environment variables (Vercel)
| Variable | Required | Purpose |
|----------|----------|---------|
| JWT_SECRET | Yes | Signs/verifies JWT tokens |
| OPENAI_API_KEY | Yes | Forwarded to OpenAI for direct passthrough |
| AI_GATEWAY_API_KEY | Yes | Vercel AI Gateway authentication |
| EQHO_API_URL | No | Override Eqho API base URL |
How it works
Prompt assembly
Replicates eqho-ai's PromptBuilder chain:

```
buildScripts()      → format script lines, render templates     → {{agent.scripts}}
buildActions()      → "slug:\ninstructions" per action          → {{agent.actions}}
buildRoles()        → join role descriptions, render templates  → {{agent.roles}}
buildSystemPrompt() → combine sections, final template pass     → system prompt
buildTools()        → actions → OpenAI tool definitions         → tools JSON
```

Template variables ({{lead.first_name}}, {{time.today}}, etc.) are rendered with nunjucks for Jinja2 compatibility.
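A minimal stand-in for the variable-substitution step (the real rendering uses nunjucks and supports full Jinja2 syntax; this sketch only handles plain `{{ path.to.value }}` lookups, for illustration):

```typescript
// Replace {{dotted.path}} placeholders with values from a nested context.
function renderTemplate(tpl: string, ctx: Record<string, unknown>): string {
  return tpl.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_, path: string) => {
    const value = path
      .split(".")
      .reduce<unknown>((obj, key) => (obj as Record<string, unknown> | undefined)?.[key], ctx);
    return value === undefined ? "" : String(value);
  });
}

const out = renderTemplate("Hi {{lead.first_name}}, today is {{time.today}}.", {
  lead: { first_name: "Kyle" },
  time: { today: "2025-01-15" },
});
console.log(out); // "Hi Kyle, today is 2025-01-15."
```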
Action to tool conversion
| Action type | Tool parameters |
|-------------|----------------|
| gcal_appointment_schedule | start (ISO 8601) |
| gcal_get_free_slots | start, end |
| data_extraction | From settings.fields |
| webhook, http_request | From settings.ai_params |
| call_transfer, terminate_call | None |
| set_lead_email | email |
| set_lead_names | first_name, last_name |
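The conversion produces OpenAI-style function tool definitions. A hedged sketch of the mapping for two of the rows above (the `Action` shape and the helper are illustrative, not the CLI's real types):

```typescript
interface Action {
  type: string;
  slug: string;
  instructions?: string;
}

function toToolDefinition(action: Action) {
  // Parameters per the conversion table; only two rows are sketched here.
  const parameterMap: Record<string, Record<string, unknown>> = {
    gcal_appointment_schedule: {
      start: { type: "string", description: "ISO 8601 start time" },
    },
    set_lead_names: {
      first_name: { type: "string" },
      last_name: { type: "string" },
    },
  };
  const properties = parameterMap[action.type] ?? {}; // e.g. terminate_call → no params
  return {
    type: "function",
    function: {
      name: action.slug,
      description: action.instructions ?? "",
      parameters: { type: "object", properties, required: Object.keys(properties) },
    },
  };
}

const tool = toToolDefinition({ type: "gcal_appointment_schedule", slug: "create_appointment" });
console.log(tool.function.parameters.required); // ["start"]
```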
Proxy config injection
When the backend is configured, all eval builders (init, start, postcall-eval, action-eval, disposition-eval) automatically inject proxy settings into every provider config:
```yaml
config:
  apiBaseUrl: https://evals.eqho-solutions.dev/api/v1
  apiKey: <jwt-token>
```

This is handled by the injectProxy utility in provider-mapper.ts.
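Conceptually the injection is a map over provider entries. A hedged sketch (the real `injectProxy` in `provider-mapper.ts` may differ in signature and shape):

```typescript
interface Provider {
  id: string;
  label?: string;
  config?: Record<string, unknown>;
}

function injectProxy(providers: Provider[], baseUrl: string, jwt: string): Provider[] {
  // Preserve whatever config the builder already set (temperature, tools, ...)
  // and overlay the proxy base URL + JWT.
  return providers.map((p) => ({
    ...p,
    config: { ...p.config, apiBaseUrl: baseUrl, apiKey: jwt },
  }));
}

const patched = injectProxy(
  [{ id: "openai:chat:gpt-4.1-mini", config: { temperature: 0.7 } }],
  "https://evals.eqho-solutions.dev/api/v1",
  "eyJ...",
);
console.log(patched[0]?.config); // temperature preserved, apiBaseUrl + apiKey added
```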
Development
```sh
npm install
npm test              # unit tests (vitest)
npm run dev -- <cmd>  # run CLI without building
npm run build         # compile TypeScript
npm run lint          # type-check
```

Source layout
```
src/
├── cli/
│   ├── index.ts             # CLI entry point (commander.js)
│   ├── banner.ts            # ASCII logo + version display
│   ├── auth-store.ts        # credential storage (~/.eqho-eval/config.json)
│   └── commands/            # one file per command
├── core/
│   ├── eqho-client.ts       # Eqho REST API client
│   ├── prompt-assembler.ts  # PromptBuilder chain port
│   ├── tools-builder.ts     # action → tool definitions
│   ├── config-generator.ts  # generates promptfoo YAML + .env
│   ├── provider-mapper.ts   # proxy config injection (injectProxy)
│   ├── promptfoo-runner.ts  # resolve + spawn promptfoo
│   └── ...builders          # postcall, disposition, action eval builders
├── types/
│   ├── eqho.ts              # Eqho API models
│   └── config.ts            # internal config types
web/
├── app/
│   ├── page.tsx             # landing page
│   └── api/
│       ├── auth/            # JWT token issuance + validation
│       ├── v1/chat/         # OpenAI-compatible completions proxy
│       └── eqho/            # Eqho API transparent proxy
├── lib/
│   ├── auth.ts              # withAuth middleware
│   └── jwt.ts               # JWT sign/verify
```

Programmatic usage
```typescript
import {
  EqhoClient,
  assemblePrompt,
  buildToolsByExecutionType,
  buildDispositionTool,
} from "eqho-eval";

const client = new EqhoClient({ apiKey: process.env.EQHO_API_KEY });
const campaign = await client.getCampaign("campaign-id");
const agent = await client.getAgent("agent-id");
const details = await client.getAgentDetails("agent-id");

const { systemPrompt, tools } = assemblePrompt({
  agent,
  campaign,
  roles: details.roles,
  actions: details.actions,
  scripts: details.scripts,
  systemPromptSections: campaign.system_prompt?.sections || [],
});

const liveTools = buildToolsByExecutionType(details.actions, "live");
const postcallTools = buildToolsByExecutionType(details.actions, "postcall");
const dispoTool = buildDispositionTool(campaign.dispositions || []);
```