@twaldin/agentelo

v0.3.0

Local benchmarking CLI for AI coding agents — Bradley-Terry rankings against a frozen baseline of 148 agents on real GitHub bug-fix challenges

agentelo

Local benchmarking tool for AI coding agents. Run your agent against real GitHub bug-fix challenges, get a Bradley-Terry score, and see where it would slot into the snapshot of 148 baseline agents I ran across 6 harnesses.

Public leaderboard is closed. I'm not running a hosted submission server anymore, because Stanford / Laude Institute's Terminal-Bench 2.0 + Harbor cover the public-leaderboard problem at a scale a solo student can't match. What's left is still useful: the harness adapters, the challenge corpus, and the baseline snapshot. The CLI now runs everything locally (register, run challenges, score, and rank your agent against the bundled baseline) without contacting any hosted AgentElo server.

What it does

  • Runs your agent (any harness/model combo supported by @twaldin/harness-ts) on real merged-PR bug fixes from click, fastify, flask, jinja, koa, marshmallow, qs
  • Scores each run with the original PR's test suite — pass/fail per test, no rubric judgment
  • Compares your scores pairwise against the 148 bundled baseline agents using Bradley-Terry MLE, then reports an inferred ELO and which baselines your agent would beat

Use it to A/B your own prompt changes, your own harness configs, or a model you suspect is under- or over-rated by the baseline.
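To make the scoring idea above concrete, here is a minimal Python sketch of how per-challenge scores could be turned into pairwise outcomes and then fitted with a Bradley-Terry model. It is an illustration, not the CLI's actual code: the function names, the rule that ties are skipped, the MM (Zermelo) update used for the fit, and the `1500 + 400 * log10(strength)` mapping to an ELO-style number are all assumptions made for this example.

```python
import math
from collections import Counter


def pairwise_outcomes(scores):
    """Turn {agent: {challenge: fraction_of_tests_passed}} into (winner, loser) pairs.

    Assumption: the higher pass fraction on a shared challenge wins; ties are skipped.
    """
    outcomes = []
    agents = sorted(scores)
    for i, a in enumerate(agents):
        for b in agents[i + 1:]:
            for challenge in scores[a].keys() & scores[b].keys():
                sa, sb = scores[a][challenge], scores[b][challenge]
                if sa > sb:
                    outcomes.append((a, b))
                elif sb > sa:
                    outcomes.append((b, a))
    return outcomes


def fit_bradley_terry(outcomes, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs via MM updates.

    Strengths are rescaled each round so their geometric mean is 1.
    """
    players = sorted({p for pair in outcomes for p in pair})
    if not players:
        return {}
    wins = Counter(winner for winner, _ in outcomes)
    games = Counter(frozenset(pair) for pair in outcomes)
    strength = {p: 1.0 for p in players}
    for _ in range(iterations):
        updated = {}
        for p in players:
            denom = sum(
                games[frozenset((p, q))] / (strength[p] + strength[q])
                for q in players
                if q != p
            )
            # Crude guard (assumption): floor zero-win agents so log() stays defined.
            updated[p] = max(wins[p], 1e-6) / denom if denom else strength[p]
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {p: s / math.exp(log_mean) for p, s in updated.items()}
    return strength


def to_elo(strength, base=1500.0, scale=400.0):
    """Map a strength to an ELO-style number (base and scale are assumptions)."""
    return base + scale * math.log10(strength)


# Hypothetical data: one local agent and one baseline on four challenges.
example_scores = {
    "my-agent": {"c1": 1.0, "c2": 0.5, "c3": 1.0, "c4": 0.0},
    "baseline": {"c1": 0.5, "c2": 0.0, "c3": 0.5, "c4": 1.0},
}
strengths = fit_bradley_terry(pairwise_outcomes(example_scores))
for name, value in sorted(strengths.items()):
    print(name, round(to_elo(value)))
```

With only two agents the fit reduces to the observed win ratio (here 3 wins to 1), which makes the sketch easy to check by hand before applying it to a larger pool.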

Install

npm i -g @twaldin/agentelo

Quickstart

# register a local agent (no network call — just saves identity to ~/.agentelo)
agentelo register --name my-agent --harness opencode --model gpt-5.4

# run a ranked match against a randomly picked challenge from the bundled corpus
agentelo play

# show your local results + inferred ranking against the baseline snapshot
agentelo leaderboard

The first play clones the challenge repo into ~/.agentelo/challenges/. After that, runs are offline.

Baseline snapshot (2026-04-15)

These rankings ship with the CLI and are what your local runs are scored against.

  • 148 agents ranked
  • 41 challenges across 7 repos
  • 6 harnesses: claude-code, codex, aider, swe-agent, opencode, gemini
  • Bradley-Terry ELO over all pairwise outcomes from ~3.5K verified runs

| Rank | Agent | ELO | Win Rate |
|-----:|-------|----:|---------:|
| 1 | swe-agent-glm-5 | 1887 | 85% |
| 2 | opencode-glm-5 | 1882 | 85% |
| 3 | opencode-gpt-5.4 | 1873 | 85% |
| 4 | opencode-gpt-5.3-codex | 1861 | 84% |
| 5 | gemini-gemini-3-flash-preview | 1856 | 84% |
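As a rough way to read gaps between ratings like these, the sketch below applies the standard Elo expected-score formula to two ratings. Whether the snapshot's ELO column uses the conventional 400-point logistic scale is an assumption here, so treat the result as an illustration of the formula rather than a statement about how agentelo computes its numbers.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Standard Elo expectation that A beats B.

    The 400-point scale is the usual chess convention and is assumed here.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Ratings taken from the top-5 table: rank 1 (1887) versus rank 5 (1856).
p = expected_score(1887, 1856)
print(f"rank 1 vs rank 5: {p:.3f}")
```

Under that assumption a 31-point gap is only a small edge over a coin flip, so the top five agents are close to each other in head-to-head terms.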

Full rankings, match logs, and the SQLite database are in this repo. Browse the snapshot at tim.waldin.net/agentelo — read-only, no submission.

Where the related work lives

  • Multi-CLI harness abstraction: harness (Python + TypeScript libraries, 13 adapters)
  • Fleet orchestration: flt (multi-agent, multi-CLI orchestrator)
  • Prompt/agent optimization: hone (uses harness as mutator backend)
  • Harness benchmarking: harness-bench (hold the model fixed, vary the scaffold)

License

MIT