@dutchmanlabs/evalstudio

v0.1.4, published 3 months ago

Local-first CLI for Dutchman Labs Eval Studio

Maintainers: rtsarsour, riyadsarsour

Keywords: evals, agents, cli, openai, testing

Eval Studio CLI

Local-first CLI for scanning AI agents, generating eval suites through Dutchman Labs, running them locally, and uploading results back to Eval Studio.

Install and Run

For zero-install usage, run the public wrapper package directly with npx:

npx evalstudio-cli login
npx evalstudio-cli init
npx evalstudio-cli detect
npx evalstudio-cli generate
npx evalstudio-cli run

If you want the bare evalstudio command, install the package globally once:

npm install -g evalstudio-cli
evalstudio login
evalstudio init
evalstudio detect
evalstudio generate
evalstudio run

Advanced fallback: the underlying implementation package is still @dutchmanlabs/evalstudio.

From the monorepo during development:

npm run build:cli
node packages/cli/dist/index.js --help
node packages/cli/dist/index.js login

For plain Python agents, the run command no longer requires a local HTTP server. If your selected candidate is Python, Eval Studio defaults to calling a Python entrypoint such as agent:run directly.
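
As a rough illustration, a plain Python agent that matches an agent:run entrypoint might look like the sketch below. The file name agent.py and the callable name run come from the text above. The signature (one prompt string in, one string out) is an assumption, since this README does not specify what Eval Studio passes to the callable.

```python
# agent.py: hypothetical minimal agent.
# ASSUMPTION: run() takes a single prompt string and returns a string reply.

SYSTEM_PROMPT = "You are a support assistant."


def run(prompt: str) -> str:
    """Return a reply for one prompt (placeholder logic, no model call)."""
    cleaned = prompt.strip()
    if not cleaned:
        return "Please provide a question."
    return f"[{SYSTEM_PROMPT}] You asked: {cleaned}"


if __name__ == "__main__":
    print(run("Where is my refund?"))
```

A real agent would replace the placeholder body with its model call. The point is only that the module exposes a top-level callable.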

Commands

evalstudio login
evalstudio init
evalstudio detect
evalstudio scan (alias for detect)
evalstudio generate
evalstudio run
evalstudio status
evalstudio export

Detection Notes

detect recognizes:

OpenAI
Anthropic
LangChain
LangGraph
LlamaIndex
Next.js / FastAPI / Express routes
Plain Python or TypeScript agent files with a run / respond style entrypoint, messages array, tools, or system prompt

You can bias detection manually with:

evalstudio detect --framework openai

If detection finds more than one candidate, Eval Studio will show a ranked list and let you pick one. If the top result is low-confidence, it will say so and suggest using --framework.
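
The ranking behavior described above can be sketched roughly as follows. This is not Eval Studio's implementation. The confidence field matches the scan cache schema shown later in this README, while the 0.6 cutoff and both function names are hypothetical.

```python
# ASSUMPTION: the real low-confidence threshold is not documented; 0.6 is illustrative.
LOW_CONFIDENCE = 0.6


def rank_candidates(candidates):
    """Return candidates sorted from highest to lowest confidence."""
    return sorted(candidates, key=lambda c: c["confidence"], reverse=True)


def needs_framework_hint(candidates, threshold=LOW_CONFIDENCE):
    """True when nothing was found or the best candidate is below the cutoff."""
    ranked = rank_candidates(candidates)
    return not ranked or ranked[0]["confidence"] < threshold


if __name__ == "__main__":
    found = [
        {"path": "agent.py", "confidence": 0.5},
        {"path": "app/chat.py", "confidence": 0.4},
    ]
    for cand in rank_candidates(found):
        print(f'{cand["confidence"]:.2f}  {cand["path"]}')
    if needs_framework_hint(found):
        print("Top result is low-confidence; consider passing --framework.")
```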

Run Modes

Defaults:

login prompts for an es_live_... key and stores it in ~/.evalstudio/config.json
init expects to run inside a git repository and writes .evalstudio/config.json
generate uses the selected candidate and asks the hosted backend for 24 tests unless you pass --count
run defaults to Python function mode for Python candidates and HTTP mode otherwise
export writes all three formats (jsonl, csv, pytest) unless you pass --format

For Python candidates, the default happy path is:

evalstudio run --entrypoint agent:run

Or, when the selected candidate lives in a file like agent.py and exposes a callable such as run, simply:

evalstudio run

Use --entrypoint when Eval Studio cannot infer the right callable or when your function lives deeper in the repo:

evalstudio run --entrypoint app.agents.refund_agent:run_agent
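
Entrypoint strings such as app.agents.refund_agent:run_agent follow the common module:callable convention. The sketch below shows one way such a string can be resolved with the standard library. Whether Eval Studio resolves entrypoints this way is an assumption, and resolve_entrypoint is a hypothetical helper name.

```python
import importlib
from typing import Callable


def resolve_entrypoint(spec: str) -> Callable:
    """Resolve a 'package.module:callable' string to the callable it names."""
    module_path, sep, attr = spec.partition(":")
    if not sep or not module_path or not attr:
        raise ValueError(f"expected 'module:callable', got {spec!r}")
    module = importlib.import_module(module_path)
    target = getattr(module, attr)
    if not callable(target):
        raise TypeError(f"{spec!r} does not name a callable")
    return target


if __name__ == "__main__":
    # Standard-library target so the sketch runs without a project on disk.
    dumps = resolve_entrypoint("json:dumps")
    print(dumps({"ok": True}))
```

Running a check like this from the repo root is a quick way to confirm that a dotted module path imports cleanly before passing it to --entrypoint.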

For teams already running a local web server, HTTP mode still works:

evalstudio run --mode http --url http://127.0.0.1:3000/api/chat

Use --url when your local server's URL is not already saved in .evalstudio/config.json, and use --payload only when the request body is not the default prompt-only shape.
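
For experimenting with HTTP mode, a throwaway local endpoint can be sketched with the standard library alone. The /api/chat path and port 3000 mirror the URL above. The request shape ({"prompt": "..."}) and response shape ({"response": "..."}) are assumptions, because this README does not spell out the default prompt-only body.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_reply(raw_body: bytes) -> tuple[int, dict]:
    """Turn a raw request body into (status code, JSON-serializable reply)."""
    try:
        payload = json.loads(raw_body or b"{}")
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, {"error": "body must be JSON"}
    # ASSUMPTION: the prompt arrives under a top-level "prompt" key.
    prompt = payload.get("prompt") if isinstance(payload, dict) else None
    if not isinstance(prompt, str):
        return 400, {"error": "missing 'prompt' string"}
    return 200, {"response": f"You asked: {prompt}"}


class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/api/chat":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, reply = build_reply(self.rfile.read(length))
        data = json.dumps(reply).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host: str = "127.0.0.1", port: int = 3000) -> None:
    """Block and serve requests until interrupted."""
    HTTPServer((host, port), ChatHandler).serve_forever()
```

Calling serve() starts the listener. If your real agent uses different field names, that is the case where --payload applies.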

Local Files

The CLI writes state in the current repo under .evalstudio/:

.evalstudio/config.json
.evalstudio/scan-results.json
.evalstudio/latest-suite.json
.evalstudio/latest-run.json
.evalstudio/exports/

Global auth is stored in ~/.evalstudio/config.json.

generate writes the current hosted suite to .evalstudio/latest-suite.json.

run executes the suite locally, saves the result set to .evalstudio/latest-run.json, and then uploads those results to the Dutchman Labs dashboard.

export is local-only. It transforms .evalstudio/latest-run.json into JSONL, CSV, or pytest artifacts under .evalstudio/exports/.

Manual Scan Cache Schema

Power users can pre-populate .evalstudio/scan-results.json. The minimum supported shape is:

{
  "projectId": "proj_123",
  "scannedAt": "2026-04-03T00:00:00.000Z",
  "candidates": [
    {
      "id": "cand_123",
      "path": "agent.py",
      "language": "python",
      "framework_guess": "openai",
      "entrypoint_guess": "run",
      "route_guess": null,
      "tool_names": [],
      "prompt_snippets": ["You are a support assistant."],
      "confidence": 0.5,
      "why_detected": ["manual"]
    }
  ]
}

If the file is malformed, generate now returns a clear schema error instead of crashing.
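
Before running generate against a hand-written cache, a local pre-check along these lines can catch missing keys early. It is a sketch, not the validator the CLI uses. The key names come from the example above, while the type rules (including which fields may be null) are assumptions.

```python
import json
from pathlib import Path

# Key names are taken from the example; the expected types are assumptions.
NULLABLE_STR = (str, type(None))
TOP_LEVEL_KEYS = {"projectId": str, "scannedAt": str, "candidates": list}
CANDIDATE_KEYS = {
    "id": str,
    "path": str,
    "language": str,
    "framework_guess": NULLABLE_STR,
    "entrypoint_guess": NULLABLE_STR,
    "route_guess": NULLABLE_STR,
    "tool_names": list,
    "prompt_snippets": list,
    "confidence": (int, float),
    "why_detected": list,
}


def _check(obj, spec, where):
    """Return error strings for keys that are missing or have the wrong type."""
    if not isinstance(obj, dict):
        return [f"{where}: expected an object"]
    errors = []
    for key, expected in spec.items():
        if key not in obj:
            errors.append(f"{where}: missing '{key}'")
        elif not isinstance(obj[key], expected):
            errors.append(f"{where}: '{key}' has unexpected type")
    return errors


def validate_scan_cache(data) -> list[str]:
    """Validate the top-level object and every candidate; empty list means OK."""
    errors = _check(data, TOP_LEVEL_KEYS, "root")
    if isinstance(data, dict) and isinstance(data.get("candidates"), list):
        for i, cand in enumerate(data["candidates"]):
            errors += _check(cand, CANDIDATE_KEYS, f"candidates[{i}]")
    return errors


if __name__ == "__main__":
    path = Path(".evalstudio/scan-results.json")
    if path.exists():
        problems = validate_scan_cache(json.loads(path.read_text()))
        print("\n".join(problems) or "scan cache looks well-formed")
    else:
        print(f"{path} not found")
```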

Help

evalstudio --help
evalstudio help
npx evalstudio-cli --help
evalstudio generate --help
evalstudio help run

Demo Target

The canonical sibling demo repo used during validation is:

/Users/riyadsarsour/Desktop/dutchman/testagent

That demo agent listens on:

http://127.0.0.1:3000/api/chat

Typical demo flow:

cd /Users/riyadsarsour/Desktop/dutchman/testagent
node /Users/riyadsarsour/Desktop/dutchman/dutchmanlabs/packages/cli/dist/index.js init
node /Users/riyadsarsour/Desktop/dutchman/dutchmanlabs/packages/cli/dist/index.js detect
node /Users/riyadsarsour/Desktop/dutchman/dutchmanlabs/packages/cli/dist/index.js generate
node /Users/riyadsarsour/Desktop/dutchman/dutchmanlabs/packages/cli/dist/index.js run --url http://127.0.0.1:3000/api/chat
node /Users/riyadsarsour/Desktop/dutchman/dutchmanlabs/packages/cli/dist/index.js export

Artifacts to inspect after the demo:

.evalstudio/scan-results.json
.evalstudio/latest-suite.json
.evalstudio/latest-run.json
.evalstudio/exports/
