@dutchmanlabs/evalstudio-cli
v0.3.0
Local-first CLI for Dutchman Labs Eval Studio
Eval Studio CLI
Local-first CLI for detecting AI agents in a codebase, generating eval suites, running them locally, and syncing results back to Dutchman Labs when the hosted backend is available.
Install and Run
Zero-install:
npx evalstudio-cli login
npx evalstudio-cli init
npx evalstudio-cli detect
npx evalstudio-cli generate
npx evalstudio-cli run
Create an API key at https://dutchmanlabs.com/dashboard/settings. If you are not signed in, Dutchman Labs routes you through signup and returns you to the API key page.
Global install:
npm install -g evalstudio-cli
evalstudio-cli login
evalstudio-cli init
evalstudio-cli detect
evalstudio-cli generate
evalstudio-cli run
From the monorepo during development:
npm run build:cli
node packages/cli/dist/index.js --help
node packages/cli/dist/index.js login
login is still the best first step for hosted generation and dashboard sync, but the CLI now stays useful without it:
- init can create a local project config and sync it later
- detect always runs locally and only uploads when credentials are available
- generate creates up to 3 local sample evals when no API key is saved yet
- generate falls back to a full local synthetic suite when the backend is unavailable but you do have a key
- run always executes locally and only uploads when a hosted suite and valid credentials are available
Commands
- evalstudio-cli login
- evalstudio-cli init
- evalstudio-cli detect
- evalstudio-cli scan (alias)
- evalstudio-cli generate
- evalstudio-cli run
- evalstudio-cli sandbox run
- evalstudio-cli sandbox doctor
- evalstudio-cli sandbox latest
- evalstudio-cli status
- evalstudio-cli export
Detection
detect scans the local repo and recognizes patterns such as:
- OpenAI
- Anthropic / Claude
- Vertex AI / Gemini
- Azure AI
- LangChain
- LangGraph
- LlamaIndex
- Next.js, FastAPI, and Express handlers
- Plain JavaScript, TypeScript, or Python agent files with callable entrypoints, tool usage, messages arrays, or system prompts
Bias detection manually when you know the framework:
evalstudio-cli detect --framework langchain
If detection finds more than one candidate, Eval Studio prints a ranked list and lets you choose one. If your local .evalstudio/scan-results.json file is malformed, the CLI warns and falls back to automatic detection instead of crashing.
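The ranked list can be pictured with a small sketch. This is only an illustration: the README does not document how the CLI ranks candidates, so the ordering rule here (a framework match first, then higher confidence) and the helper name `rank_candidates` are assumptions. The field names `framework_guess` and `confidence` are borrowed from the scan cache schema.

```python
from typing import Optional


def rank_candidates(candidates: list[dict], framework: Optional[str] = None) -> list[dict]:
    """Return candidates ordered best-first.

    Assumption: a candidate whose framework_guess matches the requested
    framework is listed ahead of the rest, and ties are broken by higher
    confidence. The real CLI's ranking rules may differ.
    """
    def sort_key(candidate: dict) -> tuple:
        matches = framework is not None and candidate.get("framework_guess") == framework
        # False sorts before True, so matching candidates come first.
        return (not matches, -float(candidate.get("confidence", 0.0)))

    return sorted(candidates, key=sort_key)


# Hypothetical candidates, shaped like entries in scan-results.json.
candidates = [
    {"path": "src/agent.ts", "framework_guess": "openai", "confidence": 0.7},
    {"path": "app/chain.py", "framework_guess": "langchain", "confidence": 0.6},
    {"path": "app/bot.py", "framework_guess": "openai", "confidence": 0.9},
]

for position, candidate in enumerate(rank_candidates(candidates, framework="langchain"), start=1):
    print(position, candidate["path"], candidate["confidence"])
```

With `framework="langchain"` the LangChain candidate is listed first even though its confidence is lower, which is the effect the `--framework` bias is meant to have in this sketch.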
Generate
generate prefers the hosted backend.
- If you are logged in and the backend is reachable, Eval Studio generates the full hosted suite.
- If you are not logged in yet, Eval Studio creates up to 3 local sample evals and points you to sign up for a free account.
- If you are logged in but the backend is temporarily unavailable, Eval Studio falls back to a full local synthetic suite and still writes .evalstudio/latest-suite.json.
evalstudio-cli generate
evalstudio-cli generate --count 12
When hosted generation succeeds, the CLI prints your remaining daily generation quota. When generation falls back locally, the CLI tells you whether you are seeing a 3-eval sample because you are not logged in yet, or a full local fallback because the backend is temporarily unavailable.
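The three generation paths described above amount to a small decision table. The sketch below restates them in Python for clarity. The names `GenerationMode`, `choose_generation_mode`, and `planned_eval_count` are hypothetical and are not part of the CLI.

```python
from enum import Enum


class GenerationMode(Enum):
    HOSTED = "hosted"                  # logged in, backend reachable
    LOCAL_SAMPLE = "local_sample"      # no API key saved yet
    LOCAL_FALLBACK = "local_fallback"  # API key saved, backend unavailable


SAMPLE_LIMIT = 3  # "up to 3 local sample evals" per the README


def choose_generation_mode(has_api_key: bool, backend_reachable: bool) -> GenerationMode:
    """Map login and backend state to the generation path the README describes."""
    if not has_api_key:
        return GenerationMode.LOCAL_SAMPLE
    if backend_reachable:
        return GenerationMode.HOSTED
    return GenerationMode.LOCAL_FALLBACK


def planned_eval_count(mode: GenerationMode, requested: int) -> int:
    """Cap the count only in sample mode; other modes honor the request."""
    if mode is GenerationMode.LOCAL_SAMPLE:
        return min(requested, SAMPLE_LIMIT)
    return requested


mode = choose_generation_mode(has_api_key=False, backend_reachable=True)
print(mode.value, planned_eval_count(mode, requested=12))
```

How `--count` interacts with the sample cap is an assumption here; the README only states that sample mode produces up to 3 evals.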
Run
run has a single default path now: call the detected local function entrypoint directly.
- Python candidates default to module:function entrypoints such as agent:run
- JavaScript and TypeScript candidates default to path#exportName entrypoints such as src/agent.ts#run
- HTTP is only used when you explicitly pass --url
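To make the two entrypoint notations concrete, here is a minimal parser sketch. It is not the CLI's implementation: the `Entrypoint` type and `parse_entrypoint` helper are hypothetical, and the rule of checking for `#` before `:` is an assumption chosen so that file paths containing a colon are still treated as JavaScript or TypeScript targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entrypoint:
    kind: str    # "python" or "javascript" (the latter also covers TypeScript)
    target: str  # dotted module path or file path
    symbol: str  # function name or export name


def parse_entrypoint(spec: str) -> Entrypoint:
    """Split a module:function or path#exportName spec into its parts."""
    if "#" in spec:
        target, _, symbol = spec.rpartition("#")
        kind = "javascript"
    elif ":" in spec:
        target, _, symbol = spec.rpartition(":")
        kind = "python"
    else:
        raise ValueError(f"expected module:function or path#exportName, got {spec!r}")
    if not target or not symbol:
        raise ValueError(f"entrypoint is missing a target or symbol: {spec!r}")
    return Entrypoint(kind=kind, target=target, symbol=symbol)


for spec in ("agent:run", "src/agent.ts#run", "app.agents.refund_agent:run_agent"):
    print(parse_entrypoint(spec))
```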
Examples:
evalstudio-cli run
evalstudio-cli run --entrypoint src/agent.ts#run
evalstudio-cli run --entrypoint app.agents.refund_agent:run_agent
evalstudio-cli run --url http://127.0.0.1:3000/api/chat
evalstudio-cli run --payload '{"input":"{{prompt}}"}' --url http://127.0.0.1:3000/api/chat
If a hosted run cannot be created or synced, or the CLI is operating without an API key, Eval Studio still saves .evalstudio/latest-run.json locally so you can inspect or export the results.
Browser Sandbox Runs
Use sandbox run for browser-executing agents. It loads trajectory JSON, creates an isolated browser context per trajectory, replays the steps, scores expected URL/text/selectors/tool calls, and writes trace/replay artifacts.
evalstudio-cli sandbox run \
--eval-set ./evals/browser-trajectories.json \
--backend local \
--url http://127.0.0.1:3000 \
--parallel 2 \
--timeout 300 \
--export json
Check local setup before a run:
evalstudio-cli sandbox doctor --eval-set ./evals/browser-trajectories.json
Print the latest sandbox summary and artifact paths:
evalstudio-cli sandbox latest
Trajectory files can be a top-level array, { "trajectories": [...] }, or an Eval Studio { "evals": [...] } suite.
{
  "trajectories": [
    {
      "id": "checkout-flow",
      "name": "Checkout under $50",
      "start_url": "http://127.0.0.1:3000",
      "steps": [
        {
          "step": 1,
          "input": { "user_message": "Buy the blue widget under $50" },
          "expected_tool_calls": ["search_products", "add_to_cart"],
          "expected_dom_state": {
            "url_pattern": "/cart",
            "element_text": "Proceed to checkout"
          }
        }
      ],
      "metadata": { "domain": "ecommerce", "risk_level": "high" }
    }
  ]
}
Artifacts are written under .evalstudio/sandbox-runs/<run-id>/:
- summary.json
- trace.ndjson
- replay.html
- screenshots/
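For inspecting a trace by hand, a reader like the following may be enough. It assumes trace.ndjson follows the usual newline-delimited JSON convention (one JSON value per non-empty line), which the file extension suggests but the README does not state. The event fields in the demo are invented for illustration, and `read_trace` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def read_trace(path: str) -> list:
    """Parse a newline-delimited JSON file into a list, skipping blank lines."""
    events = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                events.append(json.loads(line))
    return events


# Demo against a throwaway file; these event fields are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    trace_path = Path(tmp) / "trace.ndjson"
    trace_path.write_text('{"step": 1, "ok": true}\n\n{"step": 2, "ok": false}\n', encoding="utf-8")
    for event in read_trace(str(trace_path)):
        print(event)
```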
If you are logged in, initialized, and have selected a hosted candidate with detect, the sandbox summary, trace, and replay HTML also sync to the dashboard as browser sandbox artifacts. Screenshot files stay local in the current MVP.
Local mode auto-detects common Chrome, Chromium, Edge, and Brave installs. If your browser is in a custom location, set PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH=/path/to/chrome.
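The three trajectory file shapes accepted by sandbox run (a top-level array, a `trajectories` wrapper, or an `evals` suite) can be normalized to a single list before processing. The sketch below shows one way to do that. `normalize_trajectories` is a hypothetical helper, and preferring `trajectories` over `evals` when both keys are present is an assumption.

```python
import json


def normalize_trajectories(data) -> list:
    """Return the list of trajectories from any of the three accepted shapes."""
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        # Assumption: "trajectories" wins if a file somehow carries both keys.
        for key in ("trajectories", "evals"):
            value = data.get(key)
            if isinstance(value, list):
                return value
    raise ValueError("expected a list, or an object with a 'trajectories' or 'evals' list")


raw = json.loads('{"trajectories": [{"id": "checkout-flow", "steps": []}]}')
for trajectory in normalize_trajectories(raw):
    print(trajectory["id"])
```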
Local Files
Per-project state lives under .evalstudio/:
- .evalstudio/config.json
- .evalstudio/scan-results.json
- .evalstudio/latest-suite.json
- .evalstudio/latest-run.json
- .evalstudio/exports/
- .evalstudio/sandbox-runs/
Global auth lives in ~/.evalstudio/config.json.
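A quick way to see which per-project state files exist is to check the paths listed above. The helper below is a hypothetical convenience script, not part of the CLI, and it only tests for presence without reading any file contents.

```python
import tempfile
from pathlib import Path

# Per-project entries under .evalstudio/, as listed in this README.
PROJECT_STATE = (
    "config.json",
    "scan-results.json",
    "latest-suite.json",
    "latest-run.json",
    "exports",
    "sandbox-runs",
)


def local_state(project_dir: str = ".") -> dict[str, bool]:
    """Report which .evalstudio entries exist under the given project directory."""
    root = Path(project_dir) / ".evalstudio"
    return {name: (root / name).exists() for name in PROJECT_STATE}


# Demo against a throwaway directory that only has a config file.
with tempfile.TemporaryDirectory() as tmp:
    state_dir = Path(tmp) / ".evalstudio"
    state_dir.mkdir()
    (state_dir / "config.json").write_text("{}", encoding="utf-8")
    for name, present in local_state(tmp).items():
        print(f"{name}: {'present' if present else 'missing'}")
```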
Anonymous CLI telemetry is enabled by default to help us understand command usage and funnel dropoff. It does not block CLI execution, and you can opt out with:
evalstudio-cli --no-telemetry
or:
EVALSTUDIO_NO_TELEMETRY=1 evalstudio-cli detect
Status
status is the quickest way to see what Eval Studio knows about the current repo.
evalstudio-cli status
It shows:
- current project ID
- selected candidate
- latest suite ID and run ID when cached locally
- hosted usage and reset time when you are logged in
- local-only state when you are not logged in yet
Manual Scan Cache Schema
Power users can pre-populate .evalstudio/scan-results.json. The minimum supported shape is:
{
  "projectId": "proj_123",
  "scannedAt": "2026-04-04T00:00:00.000Z",
  "candidates": [
    {
      "path": "src/agent.ts",
      "exportName": "run",
      "language": "typescript",
      "framework_guess": "openai",
      "tool_names": ["lookup_order"],
      "prompt_snippets": ["You are a support assistant."],
      "confidence": 0.7
    }
  ]
}
Unknown fields are ignored. Invalid candidates are skipped with a warning. If the whole file cannot be used, Eval Studio falls back to automatic detection.
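The tolerance rules above (ignore unknown fields, skip invalid candidates with a warning, give up on the whole file only when it is unusable) can be sketched as follows. This is a rough illustration rather than the CLI's validator: the README does not say which fields are required, so treating a string `path` as the only requirement is an assumption, and `load_scan_cache` is a hypothetical name.

```python
import json

# Candidate fields shown in the README's minimum shape.
KNOWN_FIELDS = (
    "path", "exportName", "language", "framework_guess",
    "tool_names", "prompt_snippets", "confidence",
)


def load_scan_cache(text: str):
    """Return (candidates, warnings); candidates is None if the file is unusable."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return None, [f"scan cache is not valid JSON: {exc.msg}"]
    if not isinstance(data, dict) or not isinstance(data.get("candidates"), list):
        return None, ["scan cache has no 'candidates' list"]

    kept, warnings = [], []
    for index, raw in enumerate(data["candidates"]):
        # Assumption: a candidate needs at least a string path to be usable.
        if not isinstance(raw, dict) or not isinstance(raw.get("path"), str):
            warnings.append(f"skipping candidate {index}: missing string 'path'")
            continue
        # Unknown fields are dropped rather than rejected.
        kept.append({key: raw[key] for key in KNOWN_FIELDS if key in raw})
    return kept, warnings


sample = '{"candidates": [{"path": "src/agent.ts", "extra": 1}, {"confidence": 0.4}]}'
candidates, warnings = load_scan_cache(sample)
print(candidates)
print(warnings)
```

A caller would treat a `None` result as the signal to run automatic detection instead.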
Help
evalstudio-cli --help
evalstudio-cli help
npx evalstudio-cli --help
evalstudio-cli run --help