eqho-eval v0.5.3 — CLI bridge between Eqho AI platform and promptfoo evaluations
eqho-eval

CLI + backend for evaluating Eqho agents with promptfoo. Pulls live campaign config from the Eqho API, assembles prompts the same way production does, and routes all LLM calls through a shared Vercel proxy — no local API keys required.
```sh
eqho-eval auth --key <api-key>    # authenticate + register with backend
eqho-eval init --campaign <id>    # scaffold eval project
eqho-eval eval                    # run evals (routed through proxy)
eqho-eval view                    # view results in browser
```

Backend: evals.eqho-solutions.dev
Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│ Developer machine                                                │
│                                                                  │
│ eqho-eval CLI ──→ promptfoo ──→ evals.eqho-solutions.dev         │
│       │                              │                           │
│ .env has JWT token,                  │  Vercel backend:          │
│ not raw API keys                     │  ├─ /api/v1/chat/*        │
│                                      │  │   OpenAI direct proxy  │
│                                      │  │   Anthropic/Google via │
│                                      │  │   Vercel AI Gateway    │
│                                      │  ├─ /api/eqho/*           │
│                                      │  │   Eqho API proxy       │
│                                      │  └─ /api/auth/*           │
│                                      │      JWT issuance         │
└──────────────────────────────────────┴───────────────────────────┘
```

When you run `eqho-eval auth`, the CLI registers with the backend and receives a JWT. All subsequent eval runs route through the proxy — OpenAI, Anthropic, and Google models are all available without configuring provider keys locally. The backend holds the real API keys.
For OpenAI models, the proxy does a direct passthrough to api.openai.com, preserving full request fidelity including tools, tool_choice, response_format, and streaming. Non-OpenAI models route through the Vercel AI Gateway.
Install

```sh
npm i -g eqho-eval
```

Requires Node.js 20+ and promptfoo (`npm i -g promptfoo`).
From source (contributors)

```sh
git clone https://github.com/Eqho-Solutions-Engineering/promptfoo-evals.git
cd promptfoo-evals
npm run setup   # install, build, link globally
```

Quickstart
Interactive

```sh
eqho-eval start
```

Walks through authentication, campaign selection, and project generation. Offers to run your first eval immediately.
Manual

```sh
eqho-eval auth --key <your-eqho-api-key>
eqho-eval init --campaign <campaign-id> -o ./my-eval
cd my-eval
eqho-eval eval
eqho-eval view
```

CI / non-interactive
```sh
export EQHO_API_KEY=your-key
eqho-eval start --yes --campaign <id>
```

Check your setup
```
$ eqho-eval doctor
✓ Node.js v22.22.0 (>=20 required)
✓ eqho-eval v0.5.0
✓ Eqho API key configured (abcd1234...)
✓ Eqho API reachable (15+ campaigns)
✓ Backend proxy connected (https://evals.eqho-solutions.dev)
✓ promptfoo installed via local (v0.120.25)
✓ OpenAI API key set in environment
✗ Project config — no config found
  → eqho-eval init --campaign <id>

7/8 checks passed
```

Using with Claude Code
eqho-eval works well as a tool inside Claude Code sessions. Add context about your eval project and let Claude iterate on test cases.
```sh
# In a Claude Code session, after scaffolding:
cd my-eval

# Claude can inspect the generated config
cat promptfooconfig.yaml

# Edit tests, run evals, and iterate
eqho-eval eval --no-cache
eqho-eval view
```

Useful patterns with Claude Code:
- Ask Claude to read `promptfooconfig.yaml` and `prompts/*.json` to understand the agent's system prompt, then write targeted test cases
- Run `eqho-eval render` to preview the assembled prompt and let Claude analyze coverage gaps
- Use `eqho-eval eval` results as feedback — paste failures and ask Claude to fix test assertions or identify agent prompt issues
- Ask Claude to generate edge-case tests: non-English callers, prompt injection attempts, emotional tones
Since all LLM calls route through the proxy, Claude Code doesn't need access to any API keys — just the eqho-eval CLI.
Using with Cursor
In Cursor's terminal or agent mode, eqho-eval integrates naturally:
```sh
# Scaffold a new eval directly from Cursor's terminal
eqho-eval init --campaign <id>

# Open the generated files in Cursor to edit tests
# promptfooconfig.yaml is the main file to modify

# Run evals from the integrated terminal
eqho-eval eval --no-cache

# View results
eqho-eval view
```

Cursor agent mode tips:
- Open `promptfooconfig.yaml` and ask the agent to add tests for specific scenarios
- After running evals, ask the agent to analyze `output/eval-results.json` and suggest improvements
- Use `eqho-eval postcall-eval` and `eqho-eval action-eval` to generate specialized eval configs, then ask the agent to refine them
- The agent can run `eqho-eval doctor` to diagnose any environment issues
Using with other AI coding tools
The same workflow applies to any AI coding assistant (Windsurf, Aider, Cline, etc.):
- Scaffold — `eqho-eval init --campaign <id>` generates all files
- Edit — modify `promptfooconfig.yaml` tests (the assistant can help)
- Run — `eqho-eval eval` executes through the proxy
- Analyze — results in `output/eval-results.json` and `output/eval-report.html`
- Iterate — refine tests based on results
No API key configuration needed on the developer's machine. The proxy handles all model access.
Writing evals
The generated promptfooconfig.yaml ships with starter tests. Replace or extend them with cases that matter for your agent.
Assertion types
Use the cheapest, most deterministic assertion that proves the point. See promptfoo's assertion docs for the full list.
Programmatic (fast, free, deterministic):

```yaml
assert:
  - type: icontains
    value: Sophia
  - type: not-icontains
    value: system prompt
  - type: javascript
    value: output.split(/[.!?]+/).filter(s => s.trim()).length <= 4
```

Tool call validation (deterministic, validates agent behavior):
```yaml
assert:
  - type: is-valid-openai-tools-call
  - type: tool-call-f1
    value: [create_appointment]
  - type: javascript
    value: |
      const calls = JSON.parse(output);
      return calls.some(c => c.function?.name === 'create_appointment'
        && c.function?.arguments?.start);
```

LLM-graded (slower, costs tokens, handles subjective criteria):
```yaml
assert:
  - type: llm-rubric
    value: >-
      The agent should acknowledge the prospect's budget concern
      with empathy. Should mention affordable starting points.
      Must not be pushy or dismissive.
```

Combine them for defense in depth:
```yaml
assert:
  - type: icontains
    value: Kyle
  - type: is-valid-openai-tools-call
  - type: not-icontains
    value: system prompt
  - type: llm-rubric
    value: Response is warm and concise
```

What to test
| Category | What to test | Assertion style |
|----------|-------------|-----------------|
| Identity | Correct name, company, role | icontains + not-icontains |
| Qualification | Follows discovery flow, asks right questions | llm-rubric |
| Tool usage | Calls correct tools with valid args | tool-call-f1 + javascript |
| Objection handling | Empathy, persistence vs. respect for hard no | llm-rubric |
| Security | Prompt injection, impersonation | not-icontains + llm-rubric |
| Edge cases | Wrong number, non-English, emotional callers | llm-rubric |
| Postcall actions | Data extraction accuracy from transcripts | postcall-eval command |
| Dispositions | Correct call outcome categorization | postcall-eval --disposition |
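Deterministic `javascript` assertions like the ones above are plain predicates over the model output, so they can be prototyped and unit-tested outside promptfoo before being wired into `promptfooconfig.yaml`. A sketch (the helper names are illustrative; promptfoo evaluates the inline `javascript:` bodies itself):

```typescript
// Sentence cap: mirrors the `output.split(/[.!?]+/)...` assertion above.
function withinSentenceCap(output: string, max = 4): boolean {
  return output.split(/[.!?]+/).filter((s) => s.trim()).length <= max;
}

// Tool-call check: tool-call outputs arrive as a JSON array of
// OpenAI-style call objects.
interface ToolCall {
  function?: { name?: string; arguments?: unknown };
}

function calledTool(output: string, name: string): boolean {
  const calls: ToolCall[] = JSON.parse(output);
  return calls.some((c) => c.function?.name === name);
}

console.log(withinSentenceCap("Hi! I'm Sophia. How can I help?")); // true
console.log(calledTool('[{"function":{"name":"create_appointment"}}]', "create_appointment")); // true
```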
Multi-model comparison
All providers route through the proxy. The default config tests across three models:
```yaml
providers:
  - id: openai:chat:gpt-4.1-mini
    label: GPT-4.1-mini
    config:
      temperature: 0.7
      apiBaseUrl: https://evals.eqho-solutions.dev/api/v1
      apiKey: <jwt-token>
      tools: file://tools/sophia.json
  - id: openai:chat:gpt-4.1
    label: GPT-4.1
  - id: openai:chat:o4-mini
    label: o4-mini
```

Multi-turn conversations

```sh
eqho-eval init --campaign <id> --multi-turn
```

Generates a `promptfoo:simulated-user` config for testing full conversation flows.
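In promptfoo's simulated-user pattern, a second model plays the caller so the agent is exercised over several turns. An illustrative fragment of what such a config can look like (the `instructions` text and `maxTurns` value are made-up examples, and the exact schema `init --multi-turn` emits may differ):

```yaml
defaultTest:
  provider:
    id: promptfoo:simulated-user
    config:
      maxTurns: 6
      instructions: >-
        You are a skeptical prospect. Ask about pricing twice
        before agreeing to book an appointment.
```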
Action lifecycle testing
Eqho agents have a full call lifecycle:

```
Pre-Call → On-Call-Start → Live Actions → Postcall Actions → Disposition → Post-Call Tasks
```

Live action eval

Test whether the agent calls the right tools during conversation:

```sh
eqho-eval action-eval --campaign <id>
cd action-eval && npx promptfoo eval
```

Postcall action eval

Test data extraction from transcripts:

```sh
eqho-eval postcall-eval --campaign <id> --calls 25
cd postcall-eval && npx promptfoo eval
```

Disposition eval

Test call outcome categorization:

```sh
eqho-eval postcall-eval --campaign <id> --disposition --calls 50
cd disposition-eval && npx promptfoo eval
```

All generated configs include proxy settings automatically.
Commands
Getting started
| Command | Description |
|---------|-------------|
| eqho-eval start | Interactive setup wizard |
| eqho-eval doctor | Check environment, API keys, backend connectivity |
| eqho-eval status | Show current project state |
Core workflow
| Command | Description |
|---------|-------------|
| eqho-eval auth --key <key> | Authenticate + register with backend proxy |
| eqho-eval auth --backend <url> | Use a custom backend (default: evals.eqho-solutions.dev) |
| eqho-eval auth --logout | Remove stored credentials |
| eqho-eval init --campaign <id> | Scaffold eval project from a campaign |
| eqho-eval sync | Re-fetch latest config from Eqho (preserves tests) |
| eqho-eval eval | Run evaluations |
| eqho-eval eval --watch | Re-run on file changes |
| eqho-eval view | Open results in browser |
Eval generation
| Command | Description |
|---------|-------------|
| eqho-eval postcall-eval | Generate postcall action eval config |
| eqho-eval postcall-eval --disposition | Generate disposition accuracy eval |
| eqho-eval action-eval | Generate live action/tool usage eval |
| eqho-eval scenarios <file> | Generate tests from CSV/JSON dataset |
| eqho-eval render | Preview assembled system prompt and tools |
| eqho-eval diff <baseline> <candidate> | Compare two eval result sets |
Exploration
| Command | Description |
|---------|-------------|
| eqho-eval list campaigns | Browse campaigns |
| eqho-eval list agents | Browse agents |
| eqho-eval list calls | Browse recent calls |
| eqho-eval mentions | List available template variables |
| eqho-eval conversations --last 50 | Pull real calls as test cases |
Global flags
| Flag | Description |
|------|-------------|
| --json | Machine-readable output (suppresses colors/spinners) |
| --no-cache | Skip API response cache |
| --verbose | Show stack traces on errors |
Generated project structure
```
my-eval/
├── promptfooconfig.yaml      # main config — edit tests here
├── prompts/
│   └── <agent-slug>.json     # assembled system prompt + chat messages
├── tools/
│   └── <agent-slug>.json     # OpenAI tool definitions from Eqho actions
├── eqho.config.json          # campaign/agent IDs for sync
├── .env                      # proxy token + base URL (auto-generated)
├── tests/                    # custom test case files
└── output/                   # eval results (after running)
    ├── eval-results.json
    └── eval-report.html
```

When the proxy is configured, `.env` contains:

```sh
OPENAI_API_KEY=eyJ...   # JWT token (not a real OpenAI key)
OPENAI_BASE_URL=https://evals.eqho-solutions.dev/api/v1
```

This routes all LLM calls (both eval providers and grading assertions) through the backend.
Backend (Vercel)
The backend lives in web/ and deploys to Vercel. It provides three API surfaces:
| Endpoint | Purpose |
|----------|---------|
| POST /api/auth/token | Validate Eqho API key, issue JWT (7-day expiry) |
| POST /api/auth/validate | Verify an API key is valid |
| POST /api/v1/chat/completions | OpenAI-compatible completions proxy |
| ALL /api/eqho/* | Transparent proxy to Eqho REST API |
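The tokens issued by `/api/auth/token` follow the standard JWT shape. A hedged sketch of HS256 signing and verification using only `node:crypto` (the real `lib/jwt.ts` likely uses a JWT library, and the claim names `sub` and `exp` here are illustrative):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// base64url without padding, as required by the JWT spec.
const b64url = (buf: Buffer): string =>
  buf.toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");

function signJwt(payload: object, secret: string): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })));
  const body = b64url(Buffer.from(JSON.stringify(payload)));
  const sig = b64url(createHmac("sha256", secret).update(`${header}.${body}`).digest());
  return `${header}.${body}.${sig}`;
}

function verifyJwt(token: string, secret: string): boolean {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const [header, body, sig] = parts;
  const expected = b64url(createHmac("sha256", secret).update(`${header}.${body}`).digest());
  return sig.length === expected.length &&
    timingSafeEqual(Buffer.from(sig), Buffer.from(expected));
}

// A 7-day token, matching the expiry in the table above.
const token = signJwt({ sub: "cli-user", exp: Math.floor(Date.now() / 1000) + 7 * 86400 }, "dev-secret");
console.log(verifyJwt(token, "dev-secret"));   // true
console.log(verifyJwt(token, "wrong-secret")); // false
```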
Model routing
| Provider prefix | Routing | Tool support |
|----------------|---------|--------------|
| openai/* | Direct passthrough to api.openai.com | Full (tools, tool_choice, streaming) |
| anthropic/* | Vercel AI Gateway | Text only |
| google/* | Vercel AI Gateway | Text only |
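The observable routing behavior in the table is prefix-based. A minimal sketch of that rule (the type and function names are illustrative, not the backend's actual dispatch code):

```typescript
interface Routing {
  path: "openai-direct" | "ai-gateway";
  toolSupport: boolean;
}

function routeForModel(model: string): Routing {
  // openai/* passes through to api.openai.com with full tool support;
  // anthropic/* and google/* go via the Vercel AI Gateway, text only.
  if (model.startsWith("openai/")) return { path: "openai-direct", toolSupport: true };
  return { path: "ai-gateway", toolSupport: false };
}

console.log(routeForModel("openai/gpt-4.1"));       // direct passthrough, tools supported
console.log(routeForModel("anthropic/claude-x"));   // gateway, text only
```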
Environment variables (Vercel)
| Variable | Required | Purpose |
|----------|----------|---------|
| JWT_SECRET | Yes | Signs/verifies JWT tokens |
| OPENAI_API_KEY | Yes | Forwarded to OpenAI for direct passthrough |
| AI_GATEWAY_API_KEY | Yes | Vercel AI Gateway authentication |
| EQHO_API_URL | No | Override Eqho API base URL |
How it works
Prompt assembly
Replicates eqho-ai's PromptBuilder chain:

```
buildScripts()      → format script lines, render templates     → {{agent.scripts}}
buildActions()      → "slug:\ninstructions" per action          → {{agent.actions}}
buildRoles()        → join role descriptions, render templates  → {{agent.roles}}
buildSystemPrompt() → combine sections, final template pass     → system prompt
buildTools()        → actions → OpenAI tool definitions         → tools JSON
```

Template variables ({{lead.first_name}}, {{time.today}}, etc.) are rendered with nunjucks for Jinja2 compatibility.
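A minimal stand-in for the variable-substitution step (the real rendering uses nunjucks and supports full Jinja2 syntax; this sketch only handles plain `{{ path.to.value }}` lookups, for illustration):

```typescript
// Replace {{dotted.path}} placeholders with values from a nested context.
function renderTemplate(tpl: string, ctx: Record<string, unknown>): string {
  return tpl.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_, path: string) => {
    const value = path
      .split(".")
      .reduce<unknown>((obj, key) => (obj as Record<string, unknown> | undefined)?.[key], ctx);
    return value === undefined ? "" : String(value);
  });
}

const out = renderTemplate("Hi {{lead.first_name}}, today is {{time.today}}.", {
  lead: { first_name: "Kyle" },
  time: { today: "2025-01-15" },
});
console.log(out); // "Hi Kyle, today is 2025-01-15."
```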
Action to tool conversion
| Action type | Tool parameters |
|-------------|----------------|
| gcal_appointment_schedule | start (ISO 8601) |
| gcal_get_free_slots | start, end |
| data_extraction | From settings.fields |
| webhook, http_request | From settings.ai_params |
| call_transfer, terminate_call | None |
| set_lead_email | email |
| set_lead_names | first_name, last_name |
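The conversion produces OpenAI-style function tool definitions. A hedged sketch of the mapping for two of the rows above (the `Action` shape and the helper are illustrative, not the CLI's real types):

```typescript
interface Action {
  type: string;
  slug: string;
  instructions?: string;
}

function toToolDefinition(action: Action) {
  // Parameters per the conversion table; only two rows are sketched here.
  const parameterMap: Record<string, Record<string, unknown>> = {
    gcal_appointment_schedule: {
      start: { type: "string", description: "ISO 8601 start time" },
    },
    set_lead_names: {
      first_name: { type: "string" },
      last_name: { type: "string" },
    },
  };
  const properties = parameterMap[action.type] ?? {}; // e.g. terminate_call → no params
  return {
    type: "function",
    function: {
      name: action.slug,
      description: action.instructions ?? "",
      parameters: { type: "object", properties, required: Object.keys(properties) },
    },
  };
}

const tool = toToolDefinition({ type: "gcal_appointment_schedule", slug: "create_appointment" });
console.log(tool.function.parameters.required); // ["start"]
```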
Proxy config injection
When the backend is configured, all eval builders (init, start, postcall-eval, action-eval, disposition-eval) automatically inject proxy settings into every provider config:
```yaml
config:
  apiBaseUrl: https://evals.eqho-solutions.dev/api/v1
  apiKey: <jwt-token>
```

This is handled by the injectProxy utility in provider-mapper.ts.
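Conceptually the injection is a map over provider entries. A hedged sketch (the real `injectProxy` in `provider-mapper.ts` may differ in signature and shape):

```typescript
interface Provider {
  id: string;
  label?: string;
  config?: Record<string, unknown>;
}

function injectProxy(providers: Provider[], baseUrl: string, jwt: string): Provider[] {
  // Preserve whatever config the builder already set (temperature, tools, ...)
  // and overlay the proxy base URL + JWT.
  return providers.map((p) => ({
    ...p,
    config: { ...p.config, apiBaseUrl: baseUrl, apiKey: jwt },
  }));
}

const patched = injectProxy(
  [{ id: "openai:chat:gpt-4.1-mini", config: { temperature: 0.7 } }],
  "https://evals.eqho-solutions.dev/api/v1",
  "eyJ...",
);
console.log(patched[0]?.config); // temperature preserved, apiBaseUrl + apiKey added
```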
Development
```sh
npm install
npm test              # unit tests (vitest)
npm run dev -- <cmd>  # run CLI without building
npm run build         # compile TypeScript
npm run lint          # type-check
```

Source layout
```
src/
├── cli/
│   ├── index.ts             # CLI entry point (commander.js)
│   ├── banner.ts            # ASCII logo + version display
│   ├── auth-store.ts        # credential storage (~/.eqho-eval/config.json)
│   └── commands/            # one file per command
├── core/
│   ├── eqho-client.ts       # Eqho REST API client
│   ├── prompt-assembler.ts  # PromptBuilder chain port
│   ├── tools-builder.ts     # action → tool definitions
│   ├── config-generator.ts  # generates promptfoo YAML + .env
│   ├── provider-mapper.ts   # proxy config injection (injectProxy)
│   ├── promptfoo-runner.ts  # resolve + spawn promptfoo
│   └── ...builders          # postcall, disposition, action eval builders
├── types/
│   ├── eqho.ts              # Eqho API models
│   └── config.ts            # internal config types
web/
├── app/
│   ├── page.tsx             # landing page
│   └── api/
│       ├── auth/            # JWT token issuance + validation
│       ├── v1/chat/         # OpenAI-compatible completions proxy
│       └── eqho/            # Eqho API transparent proxy
├── lib/
│   ├── auth.ts              # withAuth middleware
│   └── jwt.ts               # JWT sign/verify
```

Programmatic usage
```typescript
import {
  EqhoClient,
  assemblePrompt,
  buildToolsByExecutionType,
  buildDispositionTool,
} from "eqho-eval";

const client = new EqhoClient({ apiKey: process.env.EQHO_API_KEY });
const campaign = await client.getCampaign("campaign-id");
const agent = await client.getAgent("agent-id");
const details = await client.getAgentDetails("agent-id");

const { systemPrompt, tools } = assemblePrompt({
  agent,
  campaign,
  roles: details.roles,
  actions: details.actions,
  scripts: details.scripts,
  systemPromptSections: campaign.system_prompt?.sections || [],
});

const liveTools = buildToolsByExecutionType(details.actions, "live");
const postcallTools = buildToolsByExecutionType(details.actions, "postcall");
const dispoTool = buildDispositionTool(campaign.dispositions || []);
```