
@cliwatch/cli-bench

v0.7.1


LLM CLI agent testing framework — benchmark how well AI models use your CLI tool. Runs tasks directly on the host (no Docker required), has models execute commands via tool-calling, and validates results with assertions.

Quick start

# 1. Scaffold a config file
npx @cliwatch/cli-bench init

# 2. Edit cli-bench.yaml (define your CLI, providers, tasks)

# 3. Run (locally or in CI)
npx @cliwatch/cli-bench

Config file (cli-bench.yaml)

cli: docker
version_command: "docker --version"

providers:
  - anthropic/claude-sonnet-4-20250514
  - openai/gpt-4o

tasks:
  - id: pull-image
    intent: "Pull the latest nginx image"
    assert:
      - ran: "docker pull.*nginx"
      - verify:
          run: "docker images nginx --format '{{.Repository}}'"
          output_contains: "nginx"

  - id: create-project
    intent: "Create a new project called my-app"
    setup:
      - "mkdir -p /tmp/bench-workspace"
    assert:
      - ran: "mycli create.*my-app"
      - exit_code: 0
      - file_exists: "/tmp/bench-workspace/my-app/package.json"
      - verify:
          run: "mycli list --json"
          output_contains: "my-app"

Split tasks across files

cli: docker
providers: [anthropic/claude-sonnet-4-20250514]
tasks:
  - file://tasks/basics.yaml
  - file://tasks/advanced/*.yaml
  - file://tasks/**/*.yaml          # recursive glob

Each referenced file is a plain array of tasks:

# tasks/basics.yaml
- id: list-containers
  intent: "List all running containers"
  assert:
    - ran: "docker ps"
    - exit_code: 0

Config fields

| Field | Required | Description |
|-------|----------|-------------|
| cli | Yes | CLI name (must be in PATH) |
| version_command | No | e.g. "mycli --version", for tracking |
| providers | No | Model IDs (default: claude-sonnet-4) |
| help_modes | No | injected, discoverable, none (default: [injected]) |
| concurrency | No | Max concurrent API calls (default: 3) |
| workdir | No | Working directory (default: temp dir per task) |
| upload | No | auto, always, never (default: auto) |
| repeat | No | Run all tasks N times (default: 1, range: 1-100) |
| system_prompt | No | Custom prompt appended to the default agent system message |
| thresholds | No | Pass rate thresholds (see docs) |
| env | No | Environment variables for all tasks (supports {{workdir}}) |
| setup | No | Commands to run before each task (supports {{workdir}}) |
| cleanup | No | Commands to run after each task (supports {{workdir}}) |
| tasks | Yes | Array of tasks or file:// references |
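To illustrate how the optional fields fit together, here is a hypothetical config combining several of them (the mycli tool and all values shown are illustrative, not defaults):

```yaml
cli: mycli
version_command: "mycli --version"

providers:
  - anthropic/claude-sonnet-4-20250514

concurrency: 2        # at most 2 concurrent API calls
repeat: 3             # run the whole suite 3 times

env:
  MYCLI_HOME: "{{workdir}}/home"   # {{workdir}} expands per task

setup:
  - "mkdir -p {{workdir}}/home"    # runs before each task
cleanup:
  - "rm -rf {{workdir}}/home"      # runs after each task

tasks:
  - file://tasks/*.yaml
```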

Assertion types

| Assertion | Example | Description |
|-----------|---------|-------------|
| ran | ran: "docker ps" | Agent ran a command matching regex |
| not_ran | not_ran: "rm -rf" | No command matched regex |
| run_count | run_count: {pattern: "curl", min: 1, max: 3} | Count of matching commands |
| output_contains | output_contains: "hello" | Last command stdout contains |
| output_equals | output_equals: "ok" | Last command stdout exact match |
| error_contains | error_contains: "warning" | Last command stderr contains |
| exit_code | exit_code: 0 | Last command exit code |
| file_exists | file_exists: "./my-app/package.json" | File exists |
| file_contains | file_contains: {path: "...", text: "..."} | File content check |
| verify | verify: {run: "cmd", output_contains: "ok"} | Run post-agent command, check output |

verify is the universal escape hatch — runs any command after the agent finishes.
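For instance, a single task can layer several assertion types, with verify as the final end-to-end check (a sketch; the mycli commands and file paths are hypothetical):

```yaml
- id: export-report
  intent: "Export the current report as JSON"
  assert:
    - ran: "mycli export.*--format json"              # agent used the expected flag
    - not_ran: "rm -rf"                               # nothing destructive along the way
    - run_count: {pattern: "mycli export", min: 1, max: 2}
    - exit_code: 0
    - file_exists: "report.json"
    - verify:
        run: "cat report.json"
        output_contains: "\"status\""
```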

GitHub Actions

steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-node@v4
    with: { node-version: 22 }
  - run: npm install -g my-cli
  - run: npx @cliwatch/cli-bench
    env:
      AI_GATEWAY_API_KEY: ${{ secrets.AI_GATEWAY_API_KEY }}
      CLIWATCH_API_KEY: ${{ secrets.CLIWATCH_API_KEY }}  # optional, uploads to dashboard

No Docker required. Commands run directly on the CI runner.

Environment variables

| Variable | Description |
|----------|-------------|
| AI_GATEWAY_API_KEY | Vercel AI Gateway key — provides access to all models |
| CLIWATCH_API_KEY | API key from app.cliwatch.com for uploading results |

Uploading results

Results upload automatically when CLIWATCH_API_KEY is set (default upload: auto). Override with upload: always or upload: never in your config, or pass --upload on the CLI.
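For example, to keep results local even when an API key is present, a config sketch:

```yaml
upload: never   # results stay local even if CLIWATCH_API_KEY is set
```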

Available models

| Model ID | Provider |
|----------|----------|
| anthropic/claude-sonnet-4-20250514 | Anthropic |
| anthropic/claude-haiku-4-5-20251001 | Anthropic |
| openai/gpt-4o | OpenAI |
| google/gemini-2.5-pro | Google |

Any model supported by the Vercel AI SDK gateway can be used — just pass the full provider/model-id.

Changelog

0.5.0

  • system_prompt config field for custom agent instructions

0.4.0

  • Repeat support, threshold checks, conversation traces, task suite hashing

0.3.0

  • Config file mode, file references with globs, CI metadata, dashboard uploads

See CHANGELOG.md for full history.

License

MIT