@dutchmanlabs/evalstudio
v0.1.4
Local-first CLI for Dutchman Labs Eval Studio
Eval Studio CLI
Local-first CLI for scanning AI agents, generating eval suites through Dutchman Labs, running them locally, and uploading results back to Eval Studio.
Install and Run
For zero-install usage, run the public wrapper package directly with npx:
npx evalstudio-cli login
npx evalstudio-cli init
npx evalstudio-cli detect
npx evalstudio-cli generate
npx evalstudio-cli run
If you want the bare evalstudio command, install the package globally once:
npm install -g evalstudio-cli
evalstudio login
evalstudio init
evalstudio detect
evalstudio generate
evalstudio run
Advanced fallback: the underlying implementation package is still @dutchmanlabs/evalstudio.
From the monorepo during development:
npm run build:cli
node packages/cli/dist/index.js --help
node packages/cli/dist/index.js login
For plain Python agents, run no longer requires a local HTTP server. If your selected candidate is Python, Eval Studio defaults to calling a Python entrypoint such as agent:run directly.
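To make the agent:run entrypoint concrete, here is a minimal sketch of what such a module might look like. The file name agent.py and the callable name run come from the text above; the signature (one prompt string in, one reply string out) is an assumption, since this README does not document what Eval Studio passes to the callable.

```python
# agent.py: hypothetical minimal agent module (not taken from Eval Studio docs).


def run(prompt: str) -> str:
    """Return the agent's reply for one prompt.

    Assumption: a single string in, a single string out. Check the CLI's
    own help output for the actual calling convention.
    """
    # Placeholder logic; a real agent would call a model or tool chain here.
    return f"echo: {prompt.strip()}"


if __name__ == "__main__":
    print(run("Where is my refund?"))
```

Running the file directly is a quick way to confirm the callable works before pointing the CLI at it.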
Commands
- evalstudio login
- evalstudio init
- evalstudio detect
- evalstudio scan (alias)
- evalstudio generate
- evalstudio run
- evalstudio status
- evalstudio export
Detection Notes
detect recognizes:
- OpenAI
- Anthropic
- LangChain
- LangGraph
- LlamaIndex
- Next.js / FastAPI / Express routes
- Plain Python or TypeScript agent files with a run/respond-style entrypoint, messages array, tools, or system prompt
You can bias detection manually with:
evalstudio detect --framework openai
If detection finds more than one candidate, Eval Studio will show a ranked list and let you pick one. If the top result is low-confidence, it will say so and suggest using --framework.
Run Modes
Defaults:
- login prompts for an es_live_... key and stores it in ~/.evalstudio/config.json
- init expects to run inside a git repository and writes .evalstudio/config.json
- generate uses the selected candidate and asks the hosted backend for 24 tests unless you pass --count
- run defaults to Python function mode for Python candidates and HTTP mode otherwise
- export writes all three formats (jsonl, csv, pytest) unless you pass --format
For Python candidates, the default happy path is:
evalstudio run --entrypoint agent:run
Or just:
evalstudio run
if the selected candidate lives in a file like agent.py and exposes a callable such as run.
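An entrypoint string such as agent:run follows the common module:callable convention. The sketch below shows one way such a string could be resolved in Python with importlib. It is not Eval Studio's actual resolution logic, and the helper names split_entrypoint and load_entrypoint are hypothetical.

```python
import importlib
from typing import Callable, Tuple


def split_entrypoint(spec: str) -> Tuple[str, str]:
    """Split 'package.module:callable' into (module_path, callable_name)."""
    module_path, sep, name = spec.partition(":")
    if not sep or not module_path or not name:
        raise ValueError(f"expected 'module:callable', got {spec!r}")
    return module_path, name


def load_entrypoint(spec: str) -> Callable:
    """Import the module part and return the named callable from it."""
    module_path, name = split_entrypoint(spec)
    module = importlib.import_module(module_path)
    target = getattr(module, name)
    if not callable(target):
        raise TypeError(f"{spec!r} does not refer to a callable")
    return target


if __name__ == "__main__":
    # Uses a standard-library function so the demo runs anywhere.
    dumps = load_entrypoint("json:dumps")
    print(dumps({"ok": True}))
```

Under this convention the module part must be importable from the directory where the command runs, which is one reason a callable nested deeper in a repo needs its full dotted path.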
Use --entrypoint when Eval Studio cannot infer the right callable or when your function lives deeper in the repo:
evalstudio run --entrypoint app.agents.refund_agent:run_agent
For teams already running a local web server, HTTP mode still works:
evalstudio run --mode http --url http://127.0.0.1:3000/api/chat
Use --url when your local server is not already saved in .evalstudio/config.json, and use --payload only when the request body is not the default prompt-only shape.
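For HTTP mode, a local target can be as small as the standard-library sketch below. The host, port, and path match the URL shown above. The JSON field names are guesses: this README does not spell out the prompt-only request shape or the expected response, so the "prompt" and "reply" keys here are assumptions to adjust against your real agent.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_reply(raw_body: bytes) -> bytes:
    """Turn a JSON request body into a JSON response body.

    Assumption: the request looks like {"prompt": "..."} and the response
    like {"reply": "..."}; neither key is confirmed by the README.
    """
    payload = json.loads(raw_body or b"{}")
    prompt = str(payload.get("prompt", ""))
    return json.dumps({"reply": f"echo: {prompt}"}).encode("utf-8")


class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Only the demo path from the README is served.
        if self.path != "/api/chat":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = build_reply(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 3000) -> None:
    """Block and serve requests until interrupted."""
    HTTPServer((host, port), ChatHandler).serve_forever()
```

Calling serve() starts the listener on 127.0.0.1:3000. Keeping the request handling in build_reply makes it easy to test without opening a socket.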
Local Files
The CLI writes state in the current repo under .evalstudio/:
- .evalstudio/config.json
- .evalstudio/scan-results.json
- .evalstudio/latest-suite.json
- .evalstudio/latest-run.json
- .evalstudio/exports/
Global auth is stored in ~/.evalstudio/config.json.
generate writes the current hosted suite to .evalstudio/latest-suite.json.
run executes the suite locally, saves the result set to .evalstudio/latest-run.json, and then uploads those results to the Dutchman Labs dashboard.
export is local-only. It transforms .evalstudio/latest-run.json into JSONL, CSV, or pytest artifacts under .evalstudio/exports/.
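Since each command leaves a file behind, checking which state files exist is a quick way to see how far a repo has progressed through the flow. The sketch below does only that; the file names come from the list above, while the helper name state_status is hypothetical and nothing here reads the files' contents.

```python
from pathlib import Path
from typing import Dict

# Entry names taken from the README's list of local state.
STATE_ENTRIES = (
    "config.json",
    "scan-results.json",
    "latest-suite.json",
    "latest-run.json",
    "exports",
)


def state_status(repo_root: str = ".") -> Dict[str, bool]:
    """Map each expected .evalstudio/ entry to whether it exists on disk."""
    state_dir = Path(repo_root) / ".evalstudio"
    return {name: (state_dir / name).exists() for name in STATE_ENTRIES}


if __name__ == "__main__":
    for name, present in state_status().items():
        label = "ok" if present else "missing"
        print(f"{label:8} .evalstudio/{name}")
```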
Manual Scan Cache Schema
Power users can pre-populate .evalstudio/scan-results.json. The minimum supported shape is:
{
  "projectId": "proj_123",
  "scannedAt": "2026-04-03T00:00:00.000Z",
  "candidates": [
    {
      "id": "cand_123",
      "path": "agent.py",
      "language": "python",
      "framework_guess": "openai",
      "entrypoint_guess": "run",
      "route_guess": null,
      "tool_names": [],
      "prompt_snippets": ["You are a support assistant."],
      "confidence": 0.5,
      "why_detected": ["manual"]
    }
  ]
}
If the file is malformed, generate now returns a clear schema error instead of crashing.
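Before handing a hand-written scan cache to generate, it can help to check it locally. The sketch below tests only for the presence of the keys shown in the example above. Treating every one of those keys as required is an assumption, the helper name find_problems is hypothetical, and the CLI's own validation may be stricter or looser.

```python
import json
from typing import Any, List

# Keys copied from the README's minimum example; assumed to all be required.
TOP_LEVEL_KEYS = ("projectId", "scannedAt", "candidates")
CANDIDATE_KEYS = (
    "id", "path", "language", "framework_guess", "entrypoint_guess",
    "route_guess", "tool_names", "prompt_snippets", "confidence", "why_detected",
)


def find_problems(data: Any) -> List[str]:
    """Return human-readable problems; an empty list means the shape looks right."""
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]
    problems = [f"missing key: {k}" for k in TOP_LEVEL_KEYS if k not in data]
    candidates = data.get("candidates", [])
    if not isinstance(candidates, list):
        return problems + ["candidates must be a list"]
    for i, cand in enumerate(candidates):
        if not isinstance(cand, dict):
            problems.append(f"candidates[{i}] must be an object")
            continue
        problems += [
            f"candidates[{i}] missing key: {k}" for k in CANDIDATE_KEYS if k not in cand
        ]
    return problems


if __name__ == "__main__":
    with open(".evalstudio/scan-results.json", encoding="utf-8") as fh:
        for line in find_problems(json.load(fh)) or ["no problems found"]:
            print(line)
```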
Help
evalstudio --help
evalstudio help
npx evalstudio-cli --help
evalstudio generate --help
evalstudio help run
Demo Target
The canonical sibling demo repo used during validation is:
/Users/riyadsarsour/Desktop/dutchman/testagent
That demo agent listens on:
http://127.0.0.1:3000/api/chat
Typical demo flow:
cd /Users/riyadsarsour/Desktop/dutchman/testagent
node /Users/riyadsarsour/Desktop/dutchman/dutchmanlabs/packages/cli/dist/index.js init
node /Users/riyadsarsour/Desktop/dutchman/dutchmanlabs/packages/cli/dist/index.js detect
node /Users/riyadsarsour/Desktop/dutchman/dutchmanlabs/packages/cli/dist/index.js generate
node /Users/riyadsarsour/Desktop/dutchman/dutchmanlabs/packages/cli/dist/index.js run --url http://127.0.0.1:3000/api/chat
node /Users/riyadsarsour/Desktop/dutchman/dutchmanlabs/packages/cli/dist/index.js export
Artifacts to inspect after the demo:
- .evalstudio/scan-results.json
- .evalstudio/latest-suite.json
- .evalstudio/latest-run.json
- .evalstudio/exports/
