
@razroo/iso-eval

v0.4.0

Behavioral eval runner for AI coding agents — snapshot a workspace, hand it to a runner with a task prompt, score the resulting filesystem/git state.

agentmd lints prompt structure, isolint lints prompt prose, iso-harness fans out the compiled source into every harness file layout. None of them answer the next question: did the agent actually do the task? That's what @razroo/iso-eval scores.

You give it a suite of tasks — each with a baseline workspace, a prompt, and a set of checks — and it snapshots the workspace per trial, hands it to a runner, then verifies the resulting filesystem / command state against your checks.
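Per-trial isolation is just a copy of the baseline into a fresh tmpdir. A minimal sketch in Node (names here are illustrative, not the package's internals):

```typescript
import { cpSync, mkdtempSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Copy the baseline workspace into a fresh tmpdir for one trial,
// so the agent can mutate files without touching the baseline.
function snapshotWorkspace(baselineDir: string, trialId: string): string {
  const dest = mkdtempSync(join(tmpdir(), `iso-eval-${trialId}-`));
  cpSync(baselineDir, dest, { recursive: true });
  return dest;
}
```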

Built-in runners today:

  • fake — deterministic CI/offline runner that executes $ ... lines from the prompt as shell in the snapshotted workspace.
  • codex — real-agent runner that shells out to codex exec in the per-trial workspace and captures the final assistant message.
  • claude-code — real-agent runner that shells out to claude -p in the per-trial workspace.
  • cursor — real-agent runner that shells out to cursor-agent --print in the per-trial workspace.
  • opencode — real-agent runner that shells out to opencode run in the per-trial workspace.

The library API still accepts any RunnerFn, so you can plug in other harnesses without waiting on a packaged runner.
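The fake runner's contract above (execute each `$ `-prefixed prompt line as shell) implies a small extraction step. A sketch, under the assumption that a leading `$ ` is what marks a command:

```typescript
// Pull shell commands out of a task prompt: any line whose trimmed
// form starts with "$ " is treated as a command to run in the
// snapshotted workspace. (Assumed format, not the package's parser.)
function extractCommands(prompt: string): string[] {
  return prompt
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("$ "))
    .map((line) => line.slice(2));
}
```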

Install

npm install -D @razroo/iso-eval

Suite shape

# eval.yml
suite: refactor-basic
runner: fake              # fake | codex | claude-code | cursor | opencode
timeoutMs: 120000
harness:
  source: ../dist         # optional: stage generated harness files into each trial

tasks:
  - id: write-greeting
    prompt: tasks/write-greeting.md    # path (relative to eval.yml) or inline
    workspace: workspace/              # baseline dir, copied per-trial into tmpdir
    trials: 1
    checks:
      - { type: file_exists,       path: greeting.txt }
      - { type: file_contains,     path: greeting.txt, value: "hello" }
      - { type: file_not_contains, path: greeting.txt, value: "TODO"  }
      - { type: command, run: "test -f greeting.txt", expectExit: 0 }
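The file checks in a suite like this boil down to plain filesystem assertions against the trial workspace. A hedged sketch of how one might evaluate them (illustrative, not iso-eval's implementation):

```typescript
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";

type FileCheck =
  | { type: "file_exists"; path: string }
  | { type: "file_contains"; path: string; value: string }
  | { type: "file_not_contains"; path: string; value: string };

// Evaluate one check against a trial workspace directory.
// In this sketch a missing file fails every check, including
// file_not_contains.
function runFileCheck(workspaceDir: string, check: FileCheck): boolean {
  const full = join(workspaceDir, check.path);
  if (!existsSync(full)) return false;
  if (check.type === "file_exists") return true;
  const text = readFileSync(full, "utf8");
  return check.type === "file_contains"
    ? text.includes(check.value)
    : !text.includes(check.value);
}
```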

Supported checks

| type | asserts |
| --- | --- |
| command | shell command exits with expectExit (default 0); optional stdout contains/matches |
| file_exists | file at path exists in the workspace |
| file_contains | file at path contains the literal substring value |
| file_not_contains | file at path does NOT contain value |
| file_matches | file at path matches the regex matches |
| llm_judge | a user-supplied JudgeFn answers yes to prompt against runner stdout/stderr |
| agentmd_adherence | per-rule pass rate from agentmd test meets minPassRate; optional ruleId filter |

agentmd_adherence

- type: agentmd_adherence
  promptFile: ../agent.md         # path to agentmd source (relative to eval.yml)
  fixtures: ../fixtures.yml       # path to agentmd fixture file
  ruleId: H3                      # optional — score only this rule
  minPassRate: 0.9                # required — pass rate floor in [0, 1]
  via: claude-code                # optional — default claude-code (api | claude-code | fake)
  model: claude-haiku-4-5         # optional — forwarded as --model
  timeoutMs: 180000               # optional — subprocess timeout

Shells out to the agentmd CLI (bundled as a runtime dependency) via agentmd test <promptFile> --fixtures <fixtures> --format json, parses the per-rule check outcomes, computes the pass rate for ruleId (or overall if omitted), and fails the check when the rate is below minPassRate. Tests can inject a fake subprocess runner via the library API (AgentmdSpawnFn) so CI doesn't need an API key.
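The pass-rate arithmetic itself is simple: scope the outcomes to the requested rule (or keep them all), divide passes by total, and compare against the floor. A sketch (field names are assumptions, not agentmd's JSON schema):

```typescript
interface RuleOutcome {
  ruleId: string;   // assumed field name for the rule identifier
  passed: boolean;  // assumed field name for the check outcome
}

// Pass rate over outcomes, optionally filtered to one rule,
// compared against the minPassRate floor in [0, 1].
function adherencePasses(
  outcomes: RuleOutcome[],
  minPassRate: number,
  ruleId?: string,
): boolean {
  const scoped = ruleId ? outcomes.filter((o) => o.ruleId === ruleId) : outcomes;
  if (scoped.length === 0) return false; // no matching checks: fail loudly
  const rate = scoped.filter((o) => o.passed).length / scoped.length;
  return rate >= minPassRate;
}
```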

CLI

iso-eval run  examples/suites/echo-basic/eval.yml
iso-eval plan examples/suites/echo-basic/eval.yml

iso-eval run eval.yml --filter write-greeting --concurrency 2 --json
iso-eval run eval.yml --runner claude-code --harness-source ../dist
iso-eval run eval.yml --runner cursor --harness-source ../dist
iso-eval run eval.yml --runner opencode --harness-source ../dist
iso-eval run eval.yml --keep-workspaces           # skip tmpdir cleanup for debugging

run exits 0 on all-pass, 1 on any failure, 2 on invalid invocation.

--runner and --harness-source let you replay the same suite through a different packaged harness without rewriting the suite's checks.

Real runners and harness staging

Set runner: in YAML, or override it at the CLI with --runner. harness.source is optional; when present, iso-eval stages the generated harness files you want the runner to see into each snapshotted workspace.

codex

suite: refactor-basic
runner: codex
timeoutMs: 180000
harness:
  source: ../dist

Accepted harness.source shapes:

  • a project directory containing AGENTS.md and/or .codex/
  • a direct AGENTS.md path
  • a direct .codex/config.toml path
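Staging dispatches on the shape of harness.source: a directory contributes its harness entries, a single file lands at its conventional spot. A sketch for the codex shapes above (illustrative, not iso-eval's code):

```typescript
import { cpSync, mkdirSync, statSync } from "node:fs";
import { basename, join } from "node:path";

// Stage a codex-style harness source into a trial workspace:
// directories contribute AGENTS.md / .codex/ if present; single
// files are copied to their conventional destination.
function stageCodexHarness(source: string, workspaceDir: string): void {
  if (statSync(source).isDirectory()) {
    for (const entry of ["AGENTS.md", ".codex"]) {
      try {
        cpSync(join(source, entry), join(workspaceDir, entry), { recursive: true });
      } catch {
        // entry absent in the source dir: nothing to stage
      }
    }
  } else if (basename(source) === "config.toml") {
    // direct .codex/config.toml path: recreate the conventional location
    mkdirSync(join(workspaceDir, ".codex"), { recursive: true });
    cpSync(source, join(workspaceDir, ".codex", "config.toml"));
  } else {
    // direct AGENTS.md path (or similar single file)
    cpSync(source, join(workspaceDir, basename(source)));
  }
}
```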

claude-code

Accepted harness.source shapes:

  • a project directory containing CLAUDE.md, .claude/, and/or .mcp.json
  • a direct CLAUDE.md path
  • a direct .claude/ path
  • a direct .claude/settings.json path
  • a direct .mcp.json path

The runner shells out to claude -p --no-session-persistence and passes .mcp.json through --mcp-config when present.

opencode

Accepted harness.source shapes:

  • a project directory containing AGENTS.md, opencode.json, and/or .opencode/
  • a direct AGENTS.md path
  • a direct opencode.json path
  • a direct .opencode/ path

The runner shells out to opencode run --dir <workspace> and defaults to --pure so each trial stays self-contained.

cursor

Accepted harness.source shapes:

  • a project directory containing .cursor/, AGENTS.md, and/or CLAUDE.md
  • a direct .cursor/ path
  • a direct .cursor/rules/ path
  • a direct .cursor/rules/*.mdc path
  • a direct .cursor/mcp.json path
  • a direct AGENTS.md path
  • a direct CLAUDE.md path

The runner shells out to cursor-agent --print --output-format text --workspace <workspace> and stages any Cursor harness files you exported with iso-harness into the per-trial workspace first.

This lets one suite exported from iso-trace be replayed across the packaged runners with the same task prompt and checks.

Library API

import { loadSuite, run, formatReport, fakeRunner } from "@razroo/iso-eval";

const suite = loadSuite("./eval.yml");
const report = await run(suite, {
  runner: fakeRunner,
  concurrency: 2,
  onTaskComplete: (t) => console.log(t.id, t.passed ? "✓" : "✗"),
});
console.log(formatReport(report));
process.exit(report.passed ? 0 : 1);

Bring your own runner

The YAML runner: field selects from shipped runners; the library accepts any RunnerFn:

import type { RunnerFn } from "@razroo/iso-eval";

const myRunner: RunnerFn = async ({ workspaceDir, taskPrompt, timeoutMs, harnessSource }) => {
  const startedAt = Date.now();
  // optionally stage files from harnessSource into workspaceDir first
  // spawn your agent (claude -p / codex exec / …) with cwd = workspaceDir;
  // spawnMyAgent here stands in for your own child_process wrapper
  const { exitCode, stdout, stderr } = await spawnMyAgent(taskPrompt, { cwd: workspaceDir, timeoutMs });
  return { exitCode, stdout, stderr, durationMs: Date.now() - startedAt };
};

Bring your own judge (for llm_judge checks)

import type { JudgeFn } from "@razroo/iso-eval";

const judge: JudgeFn = async (prompt, output) => {
  // call your model with the judge prompt and the runner output;
  // askMyModel stands in for your own client. Return true when the
  // model answers "yes" (the rule was followed).
  const answer = await askMyModel(prompt, output);
  return /^yes/i.test(answer.trim());
};

await run(suite, { runner: fakeRunner, judge });

How this fits the rest of the pipeline

agent.md  →  agentmd lint  →  agentmd render  →  isolint lint  →  iso-harness build
                                                                         │
                                                                         ▼
                                                          project w/ CLAUDE.md etc.
                                                                         │  iso-eval run
                                                                         ▼
                                                                per-task pass / fail
  • @razroo/agentmd measures per-rule adherence on text output (input string → output string → check).
  • @razroo/iso-eval measures task success on a real workspace (snapshot dir → agent acts → filesystem state → check).

The two compose: an iso-eval suite can include llm_judge checks that reuse the same judge convention (yes = rule followed), plus agentmd_adherence checks that fold a fixture-level adherence score into the task report.
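For instance, one task could carry both kinds of check. The fragment below is hypothetical: the llm_judge field names are an assumption extrapolated from the checks table, not a documented shape:

```yaml
tasks:
  - id: polite-refactor
    prompt: tasks/polite-refactor.md
    workspace: workspace/
    checks:
      - { type: file_exists, path: src/refactored.ts }
      - type: llm_judge                  # field names assumed, see checks table
        prompt: "Did the agent explain its changes?"
      - type: agentmd_adherence
        promptFile: ../agent.md
        fixtures: ../fixtures.yml
        minPassRate: 0.9
```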

License

MIT — see LICENSE.