npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

agents-harness

v0.3.2

Published

Multi-agent orchestrator for autonomous software development

Readme

agents-harness

A multi-agent orchestrator for autonomous software development. Three AI agents — Planner, Generator, and Evaluator — work together in a loop to turn your feature spec into working code.

Built on the architecture described in Anthropic's engineering blog post: Harness Design for Long-Running Apps. The core idea: separate generation from evaluation (like a GAN), reset context between agent invocations to prevent degradation, and use file-based handoffs so each agent starts fresh.

How It Works

You write a spec
     |
     v
 [Planner]  -----> Expands spec, breaks it into sprints, writes contracts
     |
     v
 [Generator] ----> Implements the sprint contract (reads/writes/edits code)
     |
     v
 [Evaluator] ----> Critically tests the implementation against the contract
     |
   PASS? ----yes--> Next sprint (or done)
     |
    no
     |
     v
 [Generator] ----> Tries again with evaluator feedback
     |
     v
   (loop up to max attempts)

Each agent gets a fresh context on every invocation — no accumulated confusion from long conversations. State is passed between agents via files in the .harness/ directory, not conversation history.

Quick Start

1. Install

npm install -g agents-harness

2. Set your API key

# Option A: Environment variable
export ANTHROPIC_API_KEY=sk-ant-...

# Option B: Global config
agents-harness config set api-key sk-ant-...

3. Run

cd your-project
agents-harness run "Add user authentication with email/password login and JWT tokens"

That's it. The harness will plan, implement, and test the feature across multiple sprints.

Commands

run — Start a new run

agents-harness run "<spec>"

Give it a feature description and it handles the rest.

Options:

| Flag | Description | Default | |------|-------------|---------| | -s, --scope <workspaces...> | Limit to specific workspaces (monorepo) | All | | --max-attempts <n> | Max retry attempts per sprint | 3 | | --max-budget <n> | Max total spend in USD | 50 | | --no-dashboard | Disable the live web dashboard | On | | --port <n> | Dashboard port | 3117 | | --model <model> | Claude model for all agents | — | | --planner-model <model> | Claude model for the planner agent | opus | | --generator-model <model> | Claude model for the generator agent | opus | | --evaluator-model <model> | Claude model for the evaluator agent | sonnet |

Available models: opus, sonnet, haiku (or full model IDs like claude-opus-4-6)

Examples:

# Simple feature (dashboard opens at http://localhost:3117)
agents-harness run "Add a /health endpoint that returns 200 OK"

# With budget limit
agents-harness run "Refactor the auth module to use OAuth2" --max-budget 20

# Monorepo — only touch the backend
agents-harness run "Add pagination to the users API" --scope packages/api

# Disable the dashboard for CI or headless environments
agents-harness run "Build a notification system" --no-dashboard

# Use Sonnet for all agents (cheaper)
agents-harness run "Add a /health endpoint" --model sonnet

# Mix models — Opus for generation, Sonnet for planning/evaluation
agents-harness run "Build auth system" --planner-model sonnet --generator-model opus

init — Initialize project config (optional)

agents-harness init

Creates a .harness/ directory with:

  • config.yaml — agent models, budget limits, attempt limits
  • criteria.md — custom evaluation criteria template

The harness works without init — it auto-detects your stack. Only run this if you want to customize settings.

Example output:

Detected project:
  Repository type: single
  Workspace: .
    Language: typescript
    Framework: next.js
    Test runner: vitest
    Test command: npx vitest run
  CLAUDE.md: found

Created .harness/config.yaml
Created .harness/criteria.md

status — Check run progress

agents-harness status

Shows the current state of a run — which sprint you're on, pass/fail status, and cost.

Example output:

Status: RUNNING
Spec: Add user authentication with email/password login...
Started: 2025-03-28T10:30:00.000Z
Phase: evaluate
Cost: $2.45 / $50.00

Sprints: 2 / 3
  [PASS] Sprint 1 — 1 attempt, $0.85
  [....] Sprint 2 — 2 attempts, $1.60
  [    ] Sprint 3

resume — Resume a stopped run

agents-harness resume

Picks up where a stopped or failed run left off. Skips completed sprints.

Options:

| Flag | Description | Default | |------|-------------|---------| | --max-budget <n> | Max total spend in USD | 50 | | --no-dashboard | Disable the live web dashboard | On | | --port <n> | Dashboard port | 3117 | | --model <model> | Claude model for all agents | — | | --planner-model <model> | Claude model for the planner agent | opus | | --generator-model <model> | Claude model for the generator agent | opus | | --evaluator-model <model> | Claude model for the evaluator agent | sonnet |

Example:

# Hit Ctrl+C during a run, then later:
agents-harness resume

# Resume with a higher budget
agents-harness resume --max-budget 100

# Resume without dashboard
agents-harness resume --no-dashboard

# Resume with different models
agents-harness resume --model sonnet

config — Manage global settings

agents-harness config set <key> <value>
agents-harness config get <key>

Examples:

# Save your API key globally
agents-harness config set api-key sk-ant-api03-...

# Check what's set
agents-harness config get api-key

Config is stored at ~/.agents-harness/config.yaml.

Configuration

Zero-config (default)

The harness auto-detects your project:

  • Language — TypeScript, Python, Rust, Go
  • Framework — Next.js, Django, etc.
  • Test runner — vitest, jest, pytest, cargo test, go test
  • Repo type — single repo or monorepo (npm workspaces, pnpm, lerna)
  • CLAUDE.md — reads project conventions if present

Custom config (optional)

Run agents-harness init, then edit .harness/config.yaml:

agents:
  planner:
    model: sonnet
  generator:
    model: opus
    maxTurns: 100
  evaluator:
    model: sonnet
max_attempts_per_sprint: 3
max_budget_per_sprint_usd: 5
max_total_budget_usd: 50

Available models: opus, sonnet, haiku

Custom evaluation criteria

Edit .harness/criteria.md to add project-specific rules:

# Custom Evaluation Criteria

- All API endpoints must return proper HTTP status codes
- Database migrations must be reversible
- All user-facing strings must be internationalized

These are checked in addition to the built-in defaults (correctness, testing, code quality, integration).

Live Dashboard

The dashboard starts automatically at http://localhost:3117 on every run. It provides a split-panel UI for monitoring the entire harness lifecycle.

# Dashboard is on by default
agents-harness run "Build a feature"
# Dashboard: http://localhost:3117

# Disable for CI or headless environments
agents-harness run "Build a feature" --no-dashboard

Left panel:

  • Phase pipeline (Plan → Decompose → Contract → Generate → Evaluate → Handoff) with active/done states
  • Sprint cards with status, attempt count, cost, and evaluation criteria

Right panel:

  • File viewer with tabs (Spec, Sprints, Contract, Evaluation, Handoff)
  • Live file updates via WebSocket as agents write to .harness/ files
  • Auto-switches to the relevant tab when the phase changes

Bottom:

  • Collapsible activity stream (every file read, edit, bash command)
  • Cost tracking with budget progress bar
  • Auto-reconnects if the connection drops

Programmatic API

Use agents-harness as a library in your own tools:

import { Harness } from "agents-harness";

const harness = new Harness({
  apiKey: process.env.ANTHROPIC_API_KEY!,
  root: "/path/to/project",
  maxTotalBudgetUsd: 20,
});

harness.on("phase:start", (data) => {
  console.log(`Phase: ${data.phase}, Sprint: ${data.sprint}`);
});

harness.on("evaluation", (data) => {
  console.log(`Sprint ${data.sprint}: ${data.result.passed ? "PASS" : "FAIL"}`);
});

harness.on("run:complete", (data) => {
  console.log(`Done — ${data.status}, cost: $${data.totalCostUsd.toFixed(2)}`);
});

await harness.run("Add a REST API for managing todos");

Exported classes and functions

| Export | Description | |--------|-------------| | Harness | Main orchestrator class | | ContextManager | Wraps Agent SDK with fresh context per call | | FileProtocol | Manages .harness/ directory state | | DashboardServer | HTTP + WebSocket dashboard server | | buildProjectContext | Auto-detect project stack and config | | detectStack | Detect language, framework, test runner | | buildSystemPrompt | Build agent system prompts | | DEFAULT_CRITERIA | Built-in evaluation criteria |

The Three Agents

| Agent | Model | Role | Tools | |-------|-------|------|-------| | Planner | Opus | Writes specs, decomposes into sprints, writes contracts | Read, Write | | Generator | Opus | Implements code based on the contract | Read, Edit, Write, Bash, Glob, Grep | | Evaluator | Sonnet | Critically tests implementation against contract | Read, Bash, Grep, Glob |

Key design principle from the Anthropic article: the generator never evaluates its own work. A separate evaluator with fresh context provides unbiased assessment.

Requirements

  • Node.js 18+
  • An Anthropic API key
  • The @anthropic-ai/claude-agent-sdk package (peer dependency)

Credits

This project is built on the harness architecture described in Anthropic's engineering article: Harness Design for Long-Running Apps. The article introduces the pattern of separating generation from evaluation, using fresh context windows per agent invocation, and file-based state handoffs to enable reliable multi-hour autonomous coding sessions.

License

ISC