agent-booster-pack-proof

v2.0.1


Proof-first Pi extension that uses a red-green-refactor cycle when behavior should be specified in tests first, with built-in parsing for popular test frameworks. Renamed from pi-proof; old name deprecated.


A proof-first extension for Pi, the terminal coding agent.

It nudges the agent into a red-green-refactor loop when the next change needs a test. It stays out of the way for docs, config, and exploration.


Renamed from pi-proof. The old npm name is deprecated; migrate to agent-booster-pack-proof for ongoing updates. This package is one of four sibling packages in Agent Booster Pack; install the meta-package agent-booster-pack for the full bundle.

Install

Install Pi:

npm install -g @mariozechner/pi-coding-agent
pi

Install agent-booster-pack-proof:

pi install npm:agent-booster-pack-proof

If Pi is already running, run /reload.

Use it

Ask the agent to change behavior:

Fix the off-by-one error in pagination

The agent decides if proof mode fits. If it does, the agent writes a failing test, makes it pass, refactors, and finishes.
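Here is a minimal sketch of that loop for the pagination prompt. It assumes a hypothetical 1-based `paginate` helper. A real project would put the test in a `*.test.ts` file and run it through its own runner, but a plain assertion stands in here so the block runs on its own. The test is written first and fails against the buggy `page * pageSize` start index. The implementation shown is the version that turns it green.

```typescript
// Hypothetical helper under test. Pages are 1-based.
function paginate<T>(items: T[], page: number, pageSize: number): T[] {
  // The off-by-one version used `page * pageSize` here, which makes
  // page 1 start at index pageSize and silently skips the first page.
  const start = (page - 1) * pageSize;
  return items.slice(start, start + pageSize);
}

// SPECIFYING: this test is written first and fails against the buggy start index.
function testFirstPageStartsAtFirstItem(): void {
  const got = paginate([1, 2, 3, 4, 5], 1, 2);
  if (JSON.stringify(got) !== JSON.stringify([1, 2])) {
    throw new Error(`expected [1,2], got ${JSON.stringify(got)}`);
  }
}

// IMPLEMENTING is done once this passes; REFACTORING must keep it passing.
testFirstPageStartsAtFirstItem();
```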

Toggle by hand:

/proof

When proof mode helps

Reach for it when:

  • A bug has a clear failing case.
  • A feature adds or changes observable behavior.
  • A business rule needs to be locked down before code.

Skip it when:

  • You are editing docs, config, manifests, or lockfiles.
  • You are scaffolding plumbing.
  • You are exploring and the behavior is not settled.

By default the extension is advisory. It tells the agent that proof mode is available. The agent decides. Once on, the loop is strict.

Why this works

Tests give the agent ground truth. Without it, the agent guesses.

The research backs this up. TDFlow (2025) found that human-written acceptance criteria improve agent accuracy by 12–46 points. AlphaCodium (2024) raised GPT-4's pass@5 on the CodeContests benchmark from 19% to 44% with a test-execute-fix loop. Reflexion (NeurIPS 2023) reached 91% pass@1 on HumanEval, up from GPT-4's 80%.

Tests document, too. They show the next reader, human or agent, how the system actually behaves.

Without test discipline, agents tend to:

  • Implement before specifying behavior.
  • Change too much at once.
  • Mix features with refactors.
  • Declare success from plausibility, not proof.

How it works

Three phases. Each phase tells the agent what is allowed.

stateDiagram-v2
    OFF --> SPECIFYING : /proof or proof_start
    SPECIFYING --> IMPLEMENTING : test fails
    IMPLEMENTING --> REFACTORING : tests pass
    REFACTORING --> SPECIFYING : new turn
    REFACTORING --> OFF : /proof or proof_done

SPECIFYING. The agent writes a failing test. Production write and edit calls are blocked. Test files and config files pass through. A failing test advances to IMPLEMENTING. If the test fails because a module cannot be imported, the agent gets a one-shot allowance to create a minimal stub so the test can load — the allowance clears after the next run.
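The SPECIFYING write rule can be sketched as a small guard. This is an illustration and not the extension's source. `FileKind`, `guardSpecifyingWrite`, and `afterTestRun` are hypothetical names. The sketch also assumes each test run resets the stub allowance based on that run's outcome.

```typescript
// Hypothetical classification of a write target.
type FileKind = "test" | "config" | "production";

interface GuardState {
  // True when the last test run failed because a module could not be imported.
  stubAllowance: boolean;
}

type Decision = "allow" | "allow-stub" | "block";

// Decide whether a write or edit call may proceed during SPECIFYING.
function guardSpecifyingWrite(kind: FileKind, state: GuardState): Decision {
  if (kind === "test" || kind === "config") return "allow"; // pass through
  if (state.stubAllowance) return "allow-stub"; // minimal stub so the test can load
  return "block"; // production code waits for a failing test
}

// Each run replaces the previous allowance, so an unused allowance clears
// after the next run and is granted again only on another import failure.
function afterTestRun(failedOnImport: boolean): GuardState {
  return { stubAllowance: failedOnImport };
}
```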

IMPLEMENTING. The agent writes the smallest code that makes the test pass. A passing test advances to REFACTORING.

REFACTORING. The agent restructures. Failing tests tell the agent to revert. No new behavior here.

A new turn, not proof_done, closes the cycle and returns to SPECIFYING. The cycle counter increments at that point.

Phase transitions ride on test results. The extension runs tests after every file write and parses the output. SPECIFYING only advances after it sees a test file written or a manual test run; unrelated failures do not push the phase forward.
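This sketch models the phase machine as a pure transition function. The event names (`start`, `exit`, `testRun`, `newTurn`) are hypothetical stand-ins for the commands, tool calls, and test runs described above. The sketch also assumes an exit is honored from any active phase, although the state diagram draws that edge only from REFACTORING.

```typescript
type Phase = "OFF" | "SPECIFYING" | "IMPLEMENTING" | "REFACTORING";

// Hypothetical events standing in for commands, tool calls, and test runs.
type ProofEvent =
  | { kind: "start" } // /proof or proof_start while OFF
  | { kind: "exit" } // /proof or proof_done while active
  | { kind: "testRun"; passed: number; failed: number; sawTestWriteOrManualRun: boolean }
  | { kind: "newTurn" };

interface ProofState {
  phase: Phase;
  cycles: number;
}

function step(s: ProofState, e: ProofEvent): ProofState {
  switch (e.kind) {
    case "start":
      return s.phase === "OFF" ? { ...s, phase: "SPECIFYING" } : s;
    case "exit":
      return { ...s, phase: "OFF" };
    case "testRun":
      // Only a red run tied to a test-file write or manual run counts as the new spec,
      // so a pre-existing unrelated failure does not push SPECIFYING forward.
      if (s.phase === "SPECIFYING" && e.failed > 0 && e.sawTestWriteOrManualRun) {
        return { ...s, phase: "IMPLEMENTING" };
      }
      if (s.phase === "IMPLEMENTING" && e.failed === 0 && e.passed > 0) {
        return { ...s, phase: "REFACTORING" };
      }
      // A red run in REFACTORING keeps the phase; the agent is told to revert.
      return s;
    case "newTurn":
      // A new turn closes the cycle and bumps the counter.
      return s.phase === "REFACTORING" ? { phase: "SPECIFYING", cycles: s.cycles + 1 } : s;
    default:
      return s;
  }
}
```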

Some files skip the loop: configs, lockfiles, docs, scaffolding. The extension recognizes them by path.
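Path-based exemption might look like the following. The patterns are illustrative assumptions and not the extension's actual list. Scaffolding is left out because what counts as scaffolding varies by project.

```typescript
// Illustrative patterns only; the extension's real list may differ.
const EXEMPT_PATTERNS: RegExp[] = [
  /(^|\/)(package-lock\.json|yarn\.lock|pnpm-lock\.yaml|Cargo\.lock|go\.sum)$/, // lockfiles
  /\.(md|mdx|txt|rst)$/i, // docs by extension
  /(^|\/)docs\//, // docs directory
  /\.(ya?ml|toml|ini)$/i, // config formats
  /(^|\/)[^/]+\.config\.[cm]?[jt]s$/, // vite.config.ts, jest.config.js, ...
  /(^|\/)(tsconfig|biome)\.json$/, // tool configs
];

// True when a write to `path` should bypass the proof loop.
function skipsProofLoop(path: string): boolean {
  const p = path.replace(/\\/g, "/"); // normalize Windows separators
  return EXEMPT_PATTERNS.some((re) => re.test(p));
}
```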

Test integration

The extension finds your test command from what it sees in the project:

| Detected | Runs |
|----------|------|
| package.json with test script | npm test |
| Cargo.toml | cargo test |
| go.mod | go test ./... |
| pytest.ini or pyproject.toml | pytest |

If it cannot tell, it asks once.
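Detection can be read as an ordered lookup over marker files. This sketch assumes the first match in table order wins and that the caller has already parsed package.json. `detectTestCommand` is a hypothetical name. A `null` result is the point where the extension would ask.

```typescript
// Hypothetical detector mirroring the table; first match in table order wins (assumed).
function detectTestCommand(
  files: ReadonlySet<string>,
  packageJson?: { scripts?: Record<string, string> },
): string | null {
  if (files.has("package.json") && packageJson?.scripts?.test) return "npm test";
  if (files.has("Cargo.toml")) return "cargo test";
  if (files.has("go.mod")) return "go test ./...";
  if (files.has("pytest.ini") || files.has("pyproject.toml")) return "pytest";
  return null; // undetermined: ask the user once
}
```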

It recognizes test files by name: *.test.*, *.spec.*, *_test.*, *_spec.*, plus files under __tests__/ or test/.
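These naming rules translate to a few regular expressions. In this sketch, `isTestFile` is a hypothetical helper. Because `tests/` (plural) is not among the listed rules, it does not match here.

```typescript
// Hypothetical matcher for the naming rules listed above.
function isTestFile(path: string): boolean {
  const p = path.replace(/\\/g, "/"); // normalize Windows separators
  const base = p.slice(p.lastIndexOf("/") + 1);
  // *.test.*, *.spec.*, *_test.*, *_spec.*
  if (/\.(test|spec)\.[^/]+$/.test(base) || /_(test|spec)\.[^/]+$/.test(base)) return true;
  // Anything under a __tests__/ or test/ directory segment.
  return /(^|\/)(__tests__|test)\//.test(p);
}
```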

It parses output from:

| Language | Frameworks |
|----------|-----------|
| JS/TS | Jest, Vitest, Mocha, Bun, AVA |
| Python | pytest, unittest |
| Go | go test |
| Rust | cargo test |
| Ruby | RSpec, Minitest |
| Java/Kotlin | Gradle; JUnit/Maven (summary) |
| C# | dotnet test |
| Swift | XCTest, Swift Testing |
| PHP | PHPUnit, Pest |
| Elixir | ExUnit |
| Universal | TAP |

When per-test lines aren't found, the parser falls back to summary regex. Parsed results show in the tool result and in the HUD.
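The per-line parsing with a summary fallback can be sketched using TAP, whose per-test lines take the form `ok N - name` and `not ok N - name`. The summary regex is an assumption modeled on pytest's closing line (`1 failed, 3 passed in 0.12s`). The extension's real regexes are not shown in this README.

```typescript
interface TestResult { name: string; passed: boolean; }
interface ParsedRun { passed: number; failed: number; tests: TestResult[]; }

// Per-test lines in TAP form: "ok 1 - name" / "not ok 2 - name".
const TAP_LINE = /^(not )?ok \d+(?: - (.*))?$/;

// Fallback for pytest-style summaries such as "1 failed, 3 passed in 0.12s" (illustrative).
function parseSummary(output: string): ParsedRun | null {
  const passed = /(\d+) passed/.exec(output);
  const failed = /(\d+) failed/.exec(output);
  if (!passed && !failed) return null;
  return { passed: passed ? Number(passed[1]) : 0, failed: failed ? Number(failed[1]) : 0, tests: [] };
}

function parseOutput(output: string): ParsedRun | null {
  const tests: TestResult[] = [];
  for (const raw of output.split(/\r?\n/)) {
    const m = TAP_LINE.exec(raw.trim());
    if (m) tests.push({ name: m[2] ?? "(unnamed)", passed: m[1] === undefined });
  }
  if (tests.length > 0) {
    const failed = tests.filter((t) => !t.passed).length;
    return { passed: tests.length - failed, failed, tests };
  }
  return parseSummary(output); // no per-test lines: fall back to the summary
}
```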

HUD

When proof mode is on, a widget shows:

  • The phase, color-coded.
  • The cycle count.
  • Passed, failed, duration.
  • Up to seven test results, with an overflow indicator.

It updates after each test run.
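This is a text-only sketch of the HUD body. It uses hypothetical names and leaves out color. It caps the list at seven entries and collapses the rest into an overflow line.

```typescript
interface TestResult { name: string; passed: boolean; }

const MAX_SHOWN = 7; // the README states up to seven results are listed

// Hypothetical renderer: returns the widget's lines, top to bottom.
function renderHud(phase: string, cycles: number, results: TestResult[], durationMs: number): string[] {
  const failed = results.filter((r) => !r.passed).length;
  const lines = [
    `${phase} · cycle ${cycles}`,
    `${results.length - failed} passed, ${failed} failed, ${(durationMs / 1000).toFixed(1)}s`,
  ];
  for (const r of results.slice(0, MAX_SHOWN)) {
    lines.push(`${r.passed ? "✓" : "✗"} ${r.name}`);
  }
  const hidden = results.length - MAX_SHOWN;
  if (hidden > 0) lines.push(`… +${hidden} more`); // overflow indicator
  return lines;
}
```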

Tools

| Interface | What it does |
|-----------|--------------|
| proof_start | Agent tool. Enters proof mode. |
| proof_done | Agent tool. Exits proof mode. |
| /proof | Slash command. Manual toggle. |

The legacy tdd_start, tdd_done, and /tdd still work.

Limits

This extension enforces the loop, not the quality of the tests.

  • Shallow user stories give shallow confidence.
  • Proof mode is opt-in per task. The extension does not force it on every change.
  • Only SPECIFYING blocks writes. IMPLEMENTING and REFACTORING steer through prompts.
  • Shell-based production writes during SPECIFYING are warned, not blocked.
  • The import-only stub allowance lets SPECIFYING produce a minimal production stub when the test file cannot load.
  • A new turn closes the cycle, not proof_done. A long turn can stay in REFACTORING across many writes.
  • No state between sessions.
  • No LLM review. The extension trusts the test runner.

Development

git clone git@github.com:kreek/agent-booster-pack.git
cd agent-booster-pack/agent-booster-pack-proof
npm install
npm run install-hooks
npm test

The pre-commit hook runs biome check --staged.

src/
  index.ts        Extension entry, phase machine, HUD, tools
  parsers.ts      Test output parsers (13 frameworks)
test/
  parsers.test.ts Parser tests

To add a parser, append a TestLineParser to defaultParsers in src/parsers.ts. For development installs from a local checkout, npm run install-ext symlinks the repo into ~/.pi/agent/extensions/agent-booster-pack-proof.
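The README does not show the `TestLineParser` type, so the shape below is an assumption. Each parser turns one output line into a result or returns `null`, and dispatch is first-match-wins. Check `src/parsers.ts` for the real interface before copying this. The `PASS`/`FAIL` format is hypothetical.

```typescript
// Assumed shape; the real TestLineParser in src/parsers.ts may differ.
interface TestResult { name: string; passed: boolean; }
interface TestLineParser {
  name: string;
  // Return a result if this line is a per-test line for this framework, else null.
  parseLine(line: string): TestResult | null;
}

const defaultParsers: TestLineParser[] = [];

// Hypothetical parser for a runner that prints "PASS <name>" / "FAIL <name>".
defaultParsers.push({
  name: "pass-fail-prefix",
  parseLine(line) {
    const m = /^(PASS|FAIL)\s+(.+)$/.exec(line.trim());
    return m ? { name: m[2], passed: m[1] === "PASS" } : null;
  },
});

// Assumed dispatch: the first parser that recognizes a line wins.
function parseLines(output: string, parsers: TestLineParser[]): TestResult[] {
  const results: TestResult[] = [];
  for (const line of output.split(/\r?\n/)) {
    for (const p of parsers) {
      const r = p.parseLine(line);
      if (r) {
        results.push(r);
        break;
      }
    }
  }
  return results;
}
```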

Eval

The extension ships with an eval harness built on pi-do-eval. It runs Pi with agent-booster-pack-proof loaded against small coding projects and scores proof-first compliance, test quality, and correctness.

cd eval
npm install
npm run eval -- list                                          # list trials, variants, suites
npm run eval -- run small                                     # fast regression
npm run eval -- run --trial temp-api --variant typescript-vitest
npm run view                                                  # http://localhost:3333
npm run eval -- regress small                                 # compare against previous run

small is for day-to-day changes. full is for releases.

Suites run serially. --concurrency opts into parallel runs, but the harness refuses values above 1 when the worker or judge provider is subscription-backed.
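The concurrency rule amounts to a validation step before any trials start. This sketch uses hypothetical option names. The harness's real flags and config fields may differ.

```typescript
// Hypothetical config shape for the worker and judge providers.
interface EvalProviders {
  workerSubscriptionBacked: boolean;
  judgeSubscriptionBacked: boolean;
}

// Returns the accepted concurrency or throws, mirroring "refuses values above 1".
function validateConcurrency(requested: number, providers: EvalProviders): number {
  if (!Number.isInteger(requested) || requested < 1) {
    throw new Error(`--concurrency must be a positive integer, got ${requested}`);
  }
  const subscription = providers.workerSubscriptionBacked || providers.judgeSubscriptionBacked;
  if (subscription && requested > 1) {
    throw new Error("--concurrency above 1 is refused for subscription-backed providers");
  }
  return requested;
}
```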

License

MIT