@dutchmanlabs/evalstudio-cli

v0.3.0

Local-first CLI for Dutchman Labs Eval Studio

Eval Studio CLI

Local-first CLI for detecting AI agents in a codebase, generating eval suites, running them locally, and syncing results back to Dutchman Labs when the hosted backend is available.

Install and Run

Zero-install:

npx evalstudio-cli login
npx evalstudio-cli init
npx evalstudio-cli detect
npx evalstudio-cli generate
npx evalstudio-cli run

Create an API key at https://dutchmanlabs.com/dashboard/settings. If you are not signed in, Dutchman Labs routes you through signup and returns you to the API key page.

Global install:

npm install -g evalstudio-cli
evalstudio-cli login
evalstudio-cli init
evalstudio-cli detect
evalstudio-cli generate
evalstudio-cli run

From the monorepo during development:

npm run build:cli
node packages/cli/dist/index.js --help
node packages/cli/dist/index.js login

login is still the best first step for hosted generation and dashboard sync, but the CLI now stays useful without it:

  • init can create a local project config and sync it later
  • detect always runs locally and only uploads when credentials are available
  • generate creates up to 3 local sample evals when no API key is saved yet
  • generate falls back to a full local synthetic suite when the backend is unavailable but you do have a key
  • run always executes locally and only uploads when a hosted suite and valid credentials are available

Commands

  • evalstudio-cli login
  • evalstudio-cli init
  • evalstudio-cli detect
  • evalstudio-cli scan (alias)
  • evalstudio-cli generate
  • evalstudio-cli run
  • evalstudio-cli sandbox run
  • evalstudio-cli sandbox doctor
  • evalstudio-cli sandbox latest
  • evalstudio-cli status
  • evalstudio-cli export

Detection

detect scans the local repo and recognizes patterns such as:

  • OpenAI
  • Anthropic / Claude
  • Vertex AI / Gemini
  • Azure AI
  • LangChain
  • LangGraph
  • LlamaIndex
  • Next.js, FastAPI, and Express handlers
  • Plain JavaScript, TypeScript, or Python agent files with callable entrypoints, tool usage, messages arrays, or system prompts

If you already know the framework, you can bias detection toward it manually:

evalstudio-cli detect --framework langchain

If detection finds more than one candidate, Eval Studio prints a ranked list and lets you choose one. If your local .evalstudio/scan-results.json file is malformed, the CLI warns and falls back to automatic detection instead of crashing.
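
The ranked list and the effect of --framework can be pictured roughly as follows. This is an illustrative sketch only, not the CLI's actual scoring: the field names framework_guess and confidence come from the scan cache schema in this README, while rank_candidates and the 0.2 bias value are hypothetical.

```python
from typing import Optional


def rank_candidates(candidates: list[dict], framework: Optional[str] = None,
                    bias: float = 0.2) -> list[dict]:
    """Sort candidates by confidence, nudging matches for `framework` upward.

    `bias` is a made-up boost for illustration; the real CLI may weigh
    the --framework hint differently.
    """
    def score(candidate: dict) -> float:
        base = float(candidate.get("confidence", 0.0))
        if framework and candidate.get("framework_guess") == framework:
            base += bias
        return base

    # Highest score first; sorted() is stable, so ties keep their input order.
    return sorted(candidates, key=score, reverse=True)


candidates = [
    {"path": "src/agent.ts", "framework_guess": "openai", "confidence": 0.7},
    {"path": "app/chain.py", "framework_guess": "langchain", "confidence": 0.6},
]
ranked = rank_candidates(candidates, framework="langchain")
print([c["path"] for c in ranked])
```

Without a framework hint the same call orders candidates by confidence alone.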

Generate

generate prefers the hosted backend.

  • If you are logged in and the backend is reachable, Eval Studio generates the full hosted suite.
  • If you are not logged in yet, Eval Studio creates up to 3 local sample evals and points you to sign up for a free account.
  • If you are logged in but the backend is temporarily unavailable, Eval Studio falls back to a full local synthetic suite and still writes .evalstudio/latest-suite.json.

evalstudio-cli generate
evalstudio-cli generate --count 12

When hosted generation succeeds, the CLI prints your remaining daily generation quota. When generation falls back locally, the CLI tells you whether you are seeing a 3-eval sample because you are not logged in yet, or a full local fallback because the backend is temporarily unavailable.
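
The three generation paths described above reduce to a small decision table. The sketch below restates that table in Python; GenerationMode and choose_generation_mode are hypothetical names used for illustration and are not part of the CLI.

```python
from enum import Enum


class GenerationMode(Enum):
    HOSTED = "hosted"                  # full suite from the hosted backend
    LOCAL_SAMPLE = "local_sample"      # up to 3 sample evals, no API key saved yet
    LOCAL_FALLBACK = "local_fallback"  # full local synthetic suite


def choose_generation_mode(has_api_key: bool, backend_reachable: bool) -> GenerationMode:
    """Mirror the README's rules for which kind of suite gets generated."""
    if not has_api_key:
        return GenerationMode.LOCAL_SAMPLE
    if backend_reachable:
        return GenerationMode.HOSTED
    return GenerationMode.LOCAL_FALLBACK


print(choose_generation_mode(has_api_key=True, backend_reachable=False).value)
```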

Run

run has a single default path now: call the detected local function entrypoint directly.

  • Python candidates default to module:function entrypoints such as agent:run
  • JavaScript and TypeScript candidates default to path#exportName entrypoints such as src/agent.ts#run
  • HTTP is only used when you explicitly pass --url
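
To make the two entrypoint formats concrete, here is a minimal parsing sketch. It assumes that a `#` always marks the path#exportName form and a `:` the module:function form; Entrypoint and parse_entrypoint are hypothetical names, and the CLI's own parser may accept more variants.

```python
from dataclasses import dataclass


@dataclass
class Entrypoint:
    kind: str    # "python" or "js" (JavaScript and TypeScript share one form)
    target: str  # module path (python) or file path (js/ts)
    symbol: str  # function name or export name


def parse_entrypoint(spec: str) -> Entrypoint:
    """Split an entrypoint string into its target and symbol parts."""
    # path#exportName, e.g. src/agent.ts#run. Checked first so that a file
    # path containing ':' (a Windows drive letter) is not misread.
    if "#" in spec:
        path, _, export_name = spec.partition("#")
        if path and export_name:
            return Entrypoint("js", path, export_name)
    # module:function, e.g. app.agents.refund_agent:run_agent
    elif ":" in spec:
        module, _, function = spec.rpartition(":")
        if module and function:
            return Entrypoint("python", module, function)
    raise ValueError(f"unrecognized entrypoint: {spec!r}")


print(parse_entrypoint("src/agent.ts#run"))
print(parse_entrypoint("app.agents.refund_agent:run_agent"))
```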

Examples:

evalstudio-cli run
evalstudio-cli run --entrypoint src/agent.ts#run
evalstudio-cli run --entrypoint app.agents.refund_agent:run_agent
evalstudio-cli run --url http://127.0.0.1:3000/api/chat
evalstudio-cli run --payload '{"input":"{{prompt}}"}' --url http://127.0.0.1:3000/api/chat
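
The --payload example suggests that {{prompt}} is a placeholder filled in for each eval. How the CLI performs that substitution is not documented here, so the following is only a sketch of one safe approach: parse the template as JSON and substitute inside string values, so that quotes in the prompt cannot break the request body. render_payload is a hypothetical helper.

```python
import json


def render_payload(template: str, prompt: str, placeholder: str = "{{prompt}}") -> str:
    """Fill the placeholder inside a JSON payload template.

    The template is parsed first and substituted value by value, so quotes or
    newlines in the prompt cannot produce invalid JSON.
    """
    def fill(node):
        if isinstance(node, str):
            return node.replace(placeholder, prompt)
        if isinstance(node, list):
            return [fill(item) for item in node]
        if isinstance(node, dict):
            return {key: fill(value) for key, value in node.items()}
        return node  # numbers, booleans, and null pass through unchanged

    return json.dumps(fill(json.loads(template)))


body = render_payload('{"input":"{{prompt}}"}', 'Where is my "blue" order?')
print(body)
```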

If a hosted run cannot be created or synced, or the CLI is operating without an API key, Eval Studio still saves .evalstudio/latest-run.json locally so you can inspect or export the results.

Browser Sandbox Runs

Use sandbox run for browser-executing agents. It loads trajectory JSON, creates an isolated browser context per trajectory, replays the steps, scores expected URL/text/selectors/tool calls, and writes trace/replay artifacts.

evalstudio-cli sandbox run \
  --eval-set ./evals/browser-trajectories.json \
  --backend local \
  --url http://127.0.0.1:3000 \
  --parallel 2 \
  --timeout 300 \
  --export json
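
The scoring of a replayed step can be sketched as below. The field names (expected_tool_calls, expected_dom_state, url_pattern, element_text) are taken from the trajectory example in this README; treating url_pattern and element_text as plain substring matches is an assumption, and score_step is a hypothetical helper rather than the CLI's scorer.

```python
def score_step(step: dict, final_url: str, page_text: str, tool_calls: list[str]) -> dict:
    """Compare what the browser ended up with against a step's expectations."""
    expected_dom = step.get("expected_dom_state", {})
    checks = {}
    if "url_pattern" in expected_dom:
        # Assumption: substring match; the real CLI might use a regex or glob.
        checks["url"] = expected_dom["url_pattern"] in final_url
    if "element_text" in expected_dom:
        checks["text"] = expected_dom["element_text"] in page_text
    if "expected_tool_calls" in step:
        checks["tools"] = all(name in tool_calls for name in step["expected_tool_calls"])
    # A step with no expectations passes trivially.
    return {"passed": all(checks.values()), "checks": checks}


step = {
    "expected_tool_calls": ["search_products", "add_to_cart"],
    "expected_dom_state": {"url_pattern": "/cart", "element_text": "Proceed to checkout"},
}
result = score_step(step, "http://127.0.0.1:3000/cart", "Proceed to checkout",
                    ["search_products", "add_to_cart"])
print(result["passed"])
```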

Check local setup before a run:

evalstudio-cli sandbox doctor --eval-set ./evals/browser-trajectories.json

Print the latest sandbox summary and artifact paths:

evalstudio-cli sandbox latest

Trajectory files can be a top-level array, { "trajectories": [...] }, or an Eval Studio { "evals": [...] } suite.

{
  "trajectories": [
    {
      "id": "checkout-flow",
      "name": "Checkout under $50",
      "start_url": "http://127.0.0.1:3000",
      "steps": [
        {
          "step": 1,
          "input": { "user_message": "Buy the blue widget under $50" },
          "expected_tool_calls": ["search_products", "add_to_cart"],
          "expected_dom_state": {
            "url_pattern": "/cart",
            "element_text": "Proceed to checkout"
          }
        }
      ],
      "metadata": { "domain": "ecommerce", "risk_level": "high" }
    }
  ]
}
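
A loader that accepts all three file shapes might look like the sketch below. extract_trajectories is a hypothetical helper; it only unwraps the list and assumes entries under "evals" can be treated like trajectories, which the CLI may handle differently.

```python
def extract_trajectories(document) -> list:
    """Return the trajectory list from any of the three accepted shapes."""
    if isinstance(document, list):
        return document  # top-level array
    if isinstance(document, dict):
        # { "trajectories": [...] } takes precedence over { "evals": [...] }.
        for key in ("trajectories", "evals"):
            value = document.get(key)
            if isinstance(value, list):
                return value
    raise ValueError("expected a list, or an object with 'trajectories' or 'evals'")


print(len(extract_trajectories({"trajectories": [{"id": "checkout-flow"}]})))
```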

Artifacts are written under .evalstudio/sandbox-runs/<run-id>/:

  • summary.json
  • trace.ndjson
  • replay.html
  • screenshots/
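
For inspecting these artifacts from a script, a small reader is enough. The sketch below assumes only what the file names imply: run directories sit under .evalstudio/sandbox-runs/ and trace.ndjson holds one JSON object per line. The record fields are not documented here, so the code does not interpret them; latest_run_dir and read_trace are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Optional


def latest_run_dir(root: Path) -> Optional[Path]:
    """Pick the most recently modified run directory, or None if there is none."""
    runs = [p for p in root.iterdir() if p.is_dir()] if root.is_dir() else []
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


def read_trace(run_dir: Path) -> list[dict]:
    """Parse trace.ndjson: one JSON object per non-empty line."""
    with (run_dir / "trace.ndjson").open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


run_dir = latest_run_dir(Path(".evalstudio") / "sandbox-runs")
if run_dir is not None and (run_dir / "trace.ndjson").exists():
    print(f"{run_dir.name}: {len(read_trace(run_dir))} trace records")
```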

If you are logged in, initialized, and have selected a hosted candidate with detect, the sandbox summary, trace, and replay HTML also sync to the dashboard as browser sandbox artifacts. Screenshot files stay local in the current MVP.

Local mode auto-detects common Chrome, Chromium, Edge, and Brave installs. If your browser is in a custom location, set PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH=/path/to/chrome.
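
The lookup order implied above (explicit override first, then auto-detection) can be sketched as follows. Only PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH comes from this README; resolve_browser and the command names in COMMON_BROWSER_COMMANDS are assumptions for illustration, since the CLI's real search list is not documented here.

```python
import os
import shutil
from typing import Mapping, Optional

# Hypothetical search list of common Linux command names.
COMMON_BROWSER_COMMANDS = ("google-chrome", "chromium", "chromium-browser",
                           "microsoft-edge", "brave-browser")


def resolve_browser(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a browser executable path, preferring the explicit override."""
    env = os.environ if env is None else env
    override = env.get("PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH")
    if override:
        return override
    for command in COMMON_BROWSER_COMMANDS:
        found = shutil.which(command)  # searches the current PATH
        if found:
            return found
    return None


print(resolve_browser({"PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH": "/path/to/chrome"}))
```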

Local Files

Per-project state lives under .evalstudio/:

  • .evalstudio/config.json
  • .evalstudio/scan-results.json
  • .evalstudio/latest-suite.json
  • .evalstudio/latest-run.json
  • .evalstudio/exports/
  • .evalstudio/sandbox-runs/

Global auth lives in ~/.evalstudio/config.json.
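
A quick way to see which of these files exist in a checkout is sketched below. The file and directory names are the ones listed above; describe_state and global_auth_path are hypothetical helpers that only check for presence and do not read any contents.

```python
from pathlib import Path

# Per-project entries under .evalstudio/, as listed in this README.
PROJECT_STATE = ("config.json", "scan-results.json", "latest-suite.json",
                 "latest-run.json", "exports", "sandbox-runs")


def describe_state(project_root: Path) -> dict[str, bool]:
    """Map each expected state entry to whether it currently exists."""
    state_dir = project_root / ".evalstudio"
    return {name: (state_dir / name).exists() for name in PROJECT_STATE}


def global_auth_path() -> Path:
    """Location of the global auth file: ~/.evalstudio/config.json."""
    return Path.home() / ".evalstudio" / "config.json"


print(describe_state(Path.cwd()))
```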

Anonymous CLI telemetry is enabled by default to help us understand command usage and funnel drop-off. It never blocks CLI execution, and you can opt out with:

evalstudio-cli --no-telemetry

or:

EVALSTUDIO_NO_TELEMETRY=1 evalstudio-cli detect
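
The two opt-out switches amount to a check like the one sketched below. telemetry_enabled is a hypothetical function, and whether values other than 1 also disable telemetry is not stated in this README, so the sketch only honors the documented value.

```python
def telemetry_enabled(argv: list[str], env: dict[str, str]) -> bool:
    """Telemetry is on by default and off if either opt-out is present."""
    if "--no-telemetry" in argv:
        return False
    # Assumption: only the documented value "1" is treated as an opt-out.
    if env.get("EVALSTUDIO_NO_TELEMETRY") == "1":
        return False
    return True


print(telemetry_enabled(["detect"], {"EVALSTUDIO_NO_TELEMETRY": "1"}))
```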

Status

status is the quickest way to see what Eval Studio knows about the current repo.

evalstudio-cli status

It shows:

  • current project ID
  • selected candidate
  • latest suite ID and run ID when cached locally
  • hosted usage and reset time when you are logged in
  • local-only state when you are not logged in yet

Manual Scan Cache Schema

Power users can pre-populate .evalstudio/scan-results.json. The minimum supported shape is:

{
  "projectId": "proj_123",
  "scannedAt": "2026-04-04T00:00:00.000Z",
  "candidates": [
    {
      "path": "src/agent.ts",
      "exportName": "run",
      "language": "typescript",
      "framework_guess": "openai",
      "tool_names": ["lookup_order"],
      "prompt_snippets": ["You are a support assistant."],
      "confidence": 0.7
    }
  ]
}

Unknown fields are ignored. Invalid candidates are skipped with a warning. If the whole file cannot be used, Eval Studio falls back to automatic detection.
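
The tolerant loading behavior described above can be sketched as follows. Which fields the CLI actually requires per candidate is not stated here; this sketch assumes only a string path is mandatory. load_scan_cache is a hypothetical helper, and returning None stands in for "fall back to automatic detection".

```python
import json
import warnings
from pathlib import Path
from typing import Optional


def load_scan_cache(path: Path) -> Optional[dict]:
    """Return a cleaned scan cache, or None to signal automatic detection."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError) as error:  # missing file, bad encoding, bad JSON
        warnings.warn(f"ignoring scan cache: {error}")
        return None
    if not isinstance(data, dict) or not isinstance(data.get("candidates"), list):
        warnings.warn("ignoring scan cache: missing 'candidates' list")
        return None
    valid = []
    for index, candidate in enumerate(data["candidates"]):
        # Assumption: a candidate needs at least a string "path" to be usable.
        if isinstance(candidate, dict) and isinstance(candidate.get("path"), str):
            valid.append(candidate)
        else:
            warnings.warn(f"skipping invalid candidate #{index}")
    if not valid:
        return None
    # Unknown top-level and per-candidate fields are carried along untouched.
    return {**data, "candidates": valid}
```

A caller would run automatic detection whenever the function returns None and otherwise use the returned candidates directly.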

Help

evalstudio-cli --help
evalstudio-cli help
npx evalstudio-cli --help
evalstudio-cli run --help