
probeai v0.2.2

CLI tool for testing and evaluating AI coding agents

ProbeAI

Test and evaluate AI coding agents with YAML scenarios.

ProbeAI runs your AI agent, captures its output, and scores it using rule-based checks and LLM judges. Get a pass/fail verdict with detailed reports.

Install

npm install -g probeai

Or run directly:

npx probeai run my-scenario.yaml

Quick Start

1. Create a scenario file (test-my-agent.yaml):

id: hello-test
name: "Hello World Test"
description: "Check that my agent can echo hello"

agent:
  type: command
  command: "echo 'Hello from agent'"

steps:
  - action: check_output
    expect: "Hello"

evaluate:
  method: rules
  passThreshold: 100
  rules:
    - type: contains
      target: stdout
      value: "Hello from agent"
      weight: 1
    - type: exit_code
      target: exit
      value: "0"
      weight: 1

2. Run it:

probeai run test-my-agent.yaml

3. See results:

Running 1 scenario(s)...

  [hello-test] Hello World Test
    PASS score=100/100

All 1 scenario(s) passed!

Usage

# Run one or more scenarios
probeai run scenario.yaml
probeai run tests/*.yaml

# Verbose output (shows stdout, stderr, timing)
probeai run scenario.yaml -v

# Generate markdown report
probeai run scenario.yaml --md

# Custom output directory
probeai run scenario.yaml -o ./my-results

# Validate scenario files without running
probeai validate scenario.yaml

Scenario Format

A scenario is a YAML file with four parts: agent, steps, evaluate, and a few top-level options.

agent

What to run. Currently supports command type (runs a shell command).

agent:
  type: command
  command: "my-agent --task 'do something'"
  env:
    API_KEY: "test-key"

steps

Actions to perform during the run.

steps:
  - action: send           # Send input to stdin
    input: "hello"
  - action: wait           # Wait for N ms
    duration: 2000
  - action: check_output   # Mark output for evaluation
    expect: "success"
  - action: check_file     # Mark file for evaluation
    path: "./output.txt"

evaluate

How to score the result. Three methods:

Rules only — deterministic checks:

evaluate:
  method: rules
  passThreshold: 80
  rules:
    - type: contains        # stdout/stderr contains string
      target: stdout
      value: "success"
    - type: regex           # regex match (case-insensitive)
      target: stdout
      value: "(ok|done|success)"
    - type: exit_code       # process exit code
      target: exit
      value: "0"
    - type: file_exists     # file was created
      target: file
      value: "./output.txt"
    - type: json_match      # JSON key-value match
      target: stdout
      value: '{"status":"ok"}'
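For intuition, here is a minimal sketch of how a weighted rule score could be computed from the weight fields and compared against passThreshold. The formula is an assumption made for illustration; it is not ProbeAI's actual implementation.

```typescript
// Hypothetical sketch of weighted rule scoring (assumed, not ProbeAI source).
interface RuleResult {
  passed: boolean; // did the check succeed?
  weight: number;  // the rule's `weight` field
}

// Score is the weighted fraction of passing rules, scaled to 0-100.
// A scenario would pass when score >= passThreshold.
function ruleScore(results: RuleResult[]): number {
  const total = results.reduce((sum, r) => sum + r.weight, 0);
  if (total === 0) return 0;
  const earned = results
    .filter((r) => r.passed)
    .reduce((sum, r) => sum + r.weight, 0);
  return Math.round((earned / total) * 100);
}
```

Under this sketch, two weight-1 rules with one failure score 50, which would fail a passThreshold of 80.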

LLM only — Ollama judges the output:

evaluate:
  method: llm
  model: "qwen2.5-coder:14b"
  rubric: |
    Did the agent complete the task correctly?
    Score 0-100 based on correctness and completeness.

Hybrid — average of rules + LLM:

evaluate:
  method: hybrid
  passThreshold: 60
  model: "qwen2.5-coder:14b"
  rubric: "Evaluate the output quality."
  rules:
    - type: exit_code
      target: exit
      value: "0"
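Reading "average of rules + LLM" literally, the combined hybrid score could be sketched as below; again, the exact formula is assumed for illustration rather than taken from ProbeAI's source:

```typescript
// Hypothetical sketch of hybrid scoring ("average of rules + LLM").
// Assumed formula, not ProbeAI's actual implementation.
function hybridScore(ruleScore: number, llmScore: number): number {
  return Math.round((ruleScore + llmScore) / 2);
}
```

For example, rules scoring 100 and an LLM judge scoring 40 would average to 70, clearing the passThreshold of 60 in the snippet above.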

Other options

id: unique-id              # Required
name: "Human readable"     # Required
description: "What this tests"
timeout: 120               # Seconds (default: 120)

Templates

Starter templates in templates/:

  • cli-smoke.yaml — Quick check that a CLI runs without crashing
  • agent-task.yaml — Full AI agent task with hybrid evaluation
  • api-health.yaml — HTTP endpoint health check

Copy and edit:

cp node_modules/probeai/templates/cli-smoke.yaml my-test.yaml

Reports

ProbeAI generates JSON reports by default. Add --md for markdown.

probeai run tests/*.yaml --md -o ./reports

Reports include:

  • Pass/fail per scenario
  • Score breakdown (rule results, LLM reasoning)
  • Execution duration
  • Overall summary
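As a rough illustration of those bullet points, a per-scenario entry in a JSON report might look like the following. The field names here are assumptions for illustration only, not ProbeAI's actual schema:

```json
{
  "id": "hello-test",
  "name": "Hello World Test",
  "passed": true,
  "score": 100,
  "durationMs": 42,
  "rules": [
    { "type": "contains", "target": "stdout", "passed": true }
  ]
}
```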

GitHub Actions

Run ProbeAI automatically on every PR and push.

1. Add scenarios to your repo (e.g. tests/probeai/build-check.yaml):

id: build-check
name: "Build Check"
description: "Verify the project compiles"

agent:
  type: command
  command: "npx tsc --noEmit 2>&1"

steps:
  - action: check_output
    expect: ""

evaluate:
  method: rules
  passThreshold: 100
  rules:
    - type: exit_code
      target: exit
      value: "0"
      weight: 1

More scenario ideas (each belongs in its own YAML file):

# Lint check
id: lint-check
name: "Lint Check"
agent:
  type: command
  command: "npx biome check src/ 2>&1"
steps:
  - action: check_output
    expect: ""
evaluate:
  method: rules
  passThreshold: 100
  rules:
    - type: exit_code
      target: exit
      value: "0"
      weight: 1

# Test check
id: test-check
name: "Test Check"
agent:
  type: command
  command: "npm test 2>&1"
steps:
  - action: check_output
    expect: ""
evaluate:
  method: rules
  passThreshold: 100
  rules:
    - type: exit_code
      target: exit
      value: "0"
      weight: 1

2. Create workflow (.github/workflows/probeai.yml):

name: ProbeAI

on:
  pull_request:
  push:

jobs:
  verify:
    name: ProbeAI Verification
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20

      - run: npm install

      - name: Run ProbeAI scenarios
        run: npx probeai run tests/probeai/*.yaml -v

3. Push and check — ProbeAI results appear as a GitHub check on your PRs.
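If you also want to keep the generated reports, an optional extra step can upload them as workflow artifacts. This uses the standard actions/upload-artifact action and is a sketch, not part of ProbeAI itself:

```yaml
      - name: Run ProbeAI scenarios
        run: npx probeai run tests/probeai/*.yaml --md -o ./reports

      - uses: actions/upload-artifact@v4
        if: always()   # upload reports even when scenarios fail
        with:
          name: probeai-reports
          path: ./reports
```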

Programmatic Usage

Use ProbeAI as a library in your Node.js code:

import { probe, loadScenarios } from "probeai";

const scenarios = loadScenarios(["tests/build.yaml", "tests/lint.yaml"]);
const results = await probe(scenarios, {
  outputDir: "./results",
  markdown: false,
  verbose: false,
});

const failed = results.filter((r) => !r.evaluation.passed);
if (failed.length > 0) {
  console.log(`${failed.length} scenario(s) failed`);
  process.exit(1);
}

Requirements

  • Node.js 20+
  • For LLM evaluation (the llm and hybrid methods): Ollama running locally, with the judge model already available, e.g. via ollama pull qwen2.5-coder:14b

License

MIT