npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

harness-bench

v1.1.1

Published

How AI-Native is your dev environment? Benchmark your Claude Code harness across 8 axes.

Readme

harness-bench

npm version license: MIT Node ≥18

How AI-Native is your dev environment? A CLI that benchmarks your Claude Code harness across 8 axes — in under 5 seconds, with zero data leaving your machine.

npx harness-bench
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Harness Benchmark                                v1.1.0
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Adoption Depth   ███████████░░░  8/10
  Automation       ███████████░░░  8/10
  Context Eff.     ███████████░░░  8/10
  Tool Maker       ██████████████ 10/10
  Safety Guards    ██████████████ 10/10
  Multi-Agent      ██████████████ 10/10
  Portability      ██████████████ 10/10
  Learning Speed   ██████████████ 10/10

──────────────────────────────────────────────────────
  TOTAL            74 / 80
  LABEL            🛠  Tool Maker

  You don't use AI. You build environments where AI thrives.
──────────────────────────────────────────────────────

Why this exists

Every Claude Code user has a setup — some hooks here, an MCP server there, a CLAUDE.md file somewhere. But how do you know if that setup is actually advanced, or just busy? Is it production-grade or cargo-cult?

harness-bench gives you an objective score across 8 axes, calibrated against:

  • CMM/CMMI maturity levels (L1 Ad-hoc → L5 Optimizing)
  • Anthropic Claude Code documentation recommendations
  • Public AI-Native engineering writeups (Geoffrey Huntley, Simon Willison, swyx)
  • DORA metrics analogy (deployment frequency / lead time tiers)

It is opinionated, but the thresholds are written down in src/scoring/thresholds.ts — disagree and PR.

The 8 axes

| Axis | What it measures | Where the data comes from | |---|---|---| | Adoption Depth | MCP servers + skills + agents | ~/.claude.json, ~/.claude/skills/, ~/.claude/agents/ | | Automation | Tool calls per assistant message | ~/.claude/projects/*.jsonl (counts only) | | Context Efficiency | Avg session length + compaction usage | session metadata | | Tool Maker | Public repos matching AI-infra keywords | GitHub public API | | Safety Guards | Hook count + event diversity | ~/.claude/settings.json | | Multi-Agent | Task tool calls per 100 messages | session metadata | | Portability | Global CLAUDE.md + dotfiles repo + sync script | filesystem | | Learning Speed | Startup count × component breadth | ~/.claude.json |

The 6 character labels

Your 8 scores → one personality:

  • 🛠 Tool Maker — high Tool Maker + Safety. You build environments where AI thrives.
  • Speed Demon — high Automation + Multi-Agent. The bottleneck is no longer typing.
  • 🧙 Solo Wizard — broad mastery but low Portability. A castle built for one.
  • 🌊 Vibe Coder — high Automation, low Safety. Ships fast — and breaks fast.
  • 🔬 Tinkerer — adopting fast, components growing. Day 47 of trying every new MCP.
  • 📦 Cargo Culter — installed but unused. Configured but not internalized.

Privacy

harness-bench reads counts and configuration only. It never:

  • reads message content, prompts, or tool inputs/outputs
  • reads source code in your repos
  • transmits anything to a server (v0.1 — fully offline)

The only network call is to the public GitHub REST API to count your public repos (optional, for the Tool Maker axis — set HARNESS_BENCH_GITHUB_USER to override).

Usage

Three modes

# 1. Classic (heuristic only — fast, deterministic, no network)
npx harness-bench

# 2. LLM-augmented (BYO API key — judgment on top of heuristic)
ANTHROPIC_API_KEY=sk-... npx harness-bench --analyze

# 3. MCP server (no API key — your agent's LLM does the analysis)
npx harness-bench install     # one-shot: detects Claude Code, registers, sets env vars
# Then in Claude Code (new session): "Run harness-bench scan and analyze my environment"

The install command auto-detects Claude Code, fills env vars, backs up your config, and prints copy-paste instructions for Cursor / Codex / Gemini CLI / Aider users.

Why three? The classic heuristic is calibrated against author's environment (n=1) and miss assets in non-standard locations. The LLM-augmented modes inspect your actual file layout and adjust the score per-axis.

Other options

npx harness-bench --json              # Raw JSON
npx harness-bench --raw               # Per-axis raw metrics
npx harness-bench --svg               # 1200x630 SVG share card
npx harness-bench --svg=./my.svg
npx harness-bench --debug             # Tool histogram + subagent counts

Env vars (override autodetection)

HARNESS_BENCH_GITHUB_USER=yourname        # Override github user
HARNESS_BENCH_SYNC_SCRIPT=/path/to/sync   # Non-standard sync script location
HARNESS_BENCH_DOTFILES_DIR=/path/to/dot   # Non-standard dotfiles dir
HARNESS_BENCH_MODEL=claude-sonnet-4-6     # Model for --analyze (default: haiku)

Requires Node.js ≥ 18. Works on macOS, Linux, and Windows (Git Bash / PowerShell / cmd).

Reference benchmark

The author's own setup, as of release:

| Axis | Score | Raw | |---|---:|---| | Adoption Depth | 8/10 | 8 MCP + 10 skills + 8 agents = 26 | | Automation | 8/10 | 11,699 tool uses / 18,402 msgs = 63.6% | | Context Efficiency | 8/10 | avg 102 lines/session + compaction events present | | Tool Maker | 10/10 | 10 infra repos | | Safety Guards | 10/10 | 12 hooks across 9 event kinds | | Multi-Agent | 10/10 | 455 multi-agent calls + 978/1022 subagent files (95.5%) | | Portability | 10/10 | CLAUDE.md + dotfiles + sync.py (env var required for non-standard location) | | Learning Speed | 10/10 | 390 startups + 10 skills + 8 MCPs | | TOTAL | 74/80 | 🛠 Tool Maker |

Note: 74/80 requires env vars set (HARNESS_BENCH_SYNC_SCRIPT, HARNESS_BENCH_DOTFILES_DIR). Without env vars the heuristic alone scores 67/80 — this gap is intentional and explains why v1.0+ added --mcp / --analyze modes.

The author scores 74, not 80, on purpose. The thresholds aren't calibrated to make the maintainer look good — they're calibrated to the literature. If you score higher, ship a PR with your reference benchmark.

Roadmap

  • v0.2--svg share card, better Multi-Agent detection (SendMessage + subagent files), --debug mode ✅
  • v0.3 — PNG output (resvg-js), anonymous global percentile, Cursor adapter
  • v0.4 — Codex and Aider adapters, time series ("your AI-Native score over 6 months")
  • v0.5 — team mode (org-level aggregates)

Contributing

Disagreements about thresholds are the whole point. If you think Tool Maker shouldn't weigh keyword-match so heavily, or that Multi-Agent should account for parallel-agent patterns beyond the Task tool — open an issue or PR against src/scoring/.

License

MIT — see LICENSE.

Related

  • context-forge — bootstrap a fully context-engineered repo from a 5-minute discussion. The harness this benchmark measures is the one context-forge helps you build.

한국어 README는 README.ko.md 참고.