pi-evaluate

v0.1.3

Adversarial post-execute evaluation skill for pi — verifies implementation against contract, GAN-inspired

pi-evaluate

An adversarial post-execute evaluation skill for pi.

After a complex execution, you're staring at a large diff and don't know where to look. pi-evaluate reads your contract (what you asked for) and your outputs (what was built), then tells you exactly where to focus — and what you can safely skip.

Inspired by the GAN discriminator pattern: a second agent that sees only the contract and the output, never the implementation plan, and returns a structured verdict.


What it does

pi-evaluate acts as an adversarial discriminator:

  • Reads your contract — brief + specs (reespec), or freeform text you paste in
  • Reads your actual outputs — files, test results, documents
  • Returns a structured verdict per capability: ✅ SATISFIED / ⚠️ PARTIAL / ❌ UNSATISFIED / ❓ UNCLEAR
  • Produces a triage summary: safe to skip, worth a look, human call
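The verdict-plus-triage shape can be modeled as plain data. The TypeScript below is a minimal sketch and is not pi-evaluate's own format. The `Verdict`, `CapabilityResult`, `TriageBucket` and `triage` names are hypothetical. The sketch assumes SATISFIED maps to "safe to skip", PARTIAL to "worth a look" and UNCLEAR to "human call", as in the example outputs. It also assumes UNSATISFIED goes to "worth a look", because no example shows where it lands.

```typescript
// Hypothetical model of an evaluation result; names are illustrative only.
type Verdict = "SATISFIED" | "PARTIAL" | "UNSATISFIED" | "UNCLEAR";

interface CapabilityResult {
  capability: string; // e.g. "change-email"
  verdict: Verdict;
  reason: string;
  focus?: string; // where to look, when there is something to look at
}

type TriageBucket = "safeToSkip" | "worthALook" | "humanCall";

// Assumed mapping; UNSATISFIED is grouped with PARTIAL as "worth a look".
const BUCKET_FOR: Record<Verdict, TriageBucket> = {
  SATISFIED: "safeToSkip",
  PARTIAL: "worthALook",
  UNSATISFIED: "worthALook",
  UNCLEAR: "humanCall",
};

// Group capability names by triage bucket, keeping the input order.
function triage(results: CapabilityResult[]): Record<TriageBucket, string[]> {
  const summary: Record<TriageBucket, string[]> = {
    safeToSkip: [],
    worthALook: [],
    humanCall: [],
  };
  for (const r of results) {
    summary[BUCKET_FOR[r.verdict]].push(r.capability);
  }
  return summary;
}

const summary = triage([
  { capability: "change-email", verdict: "SATISFIED", reason: "email form found" },
  {
    capability: "change-password",
    verdict: "PARTIAL",
    reason: "no confirmation dialog",
    focus: "src/settings/",
  },
  { capability: "mobile-friendly", verdict: "UNCLEAR", reason: "underspecified" },
]);
console.log(summary);
```

With the standalone-mode inputs, the sketch puts change-email under `safeToSkip`, change-password under `worthALook` and mobile-friendly under `humanCall`. This is the same split as that example's Triage section.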

It does NOT read tasks.md, design.md, or any implementation intent. It is blind to the "how" — it only judges whether the "what" was delivered.
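One way to picture this blindness is as an allowlist on the request files the evaluator may read. The following is a hypothetical TypeScript sketch. It assumes a reespec request directory where the contract is `brief.md` plus everything under `specs/`, with `tasks.md` and `design.md` alongside. The function names and the path handling are illustrative and are not pi-evaluate's code.

```typescript
// Hypothetical allowlist: which files in a reespec request directory count
// as the contract. Everything else (tasks.md, design.md, notes, ...) is
// treated as implementation intent and never shown to the evaluator.
function isContractFile(relativePath: string): boolean {
  // Normalize Windows separators so "specs\\a.md" and "specs/a.md" match.
  const p = relativePath.replace(/\\/g, "/");
  return p === "brief.md" || p.startsWith("specs/");
}

function selectContractFiles(requestFiles: string[]): string[] {
  return requestFiles.filter(isContractFile);
}

const files = [
  "brief.md",
  "design.md",
  "tasks.md",
  "specs/user-auth-capability.md",
  "specs/error-handling-capability.md",
];
console.log(selectContractFiles(files));
```

The sketch uses an allowlist rather than blocking `tasks.md` and `design.md` by name. That way, any intent file added to the request later stays hidden by default.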

It does NOT fix gaps. It reports them. You decide what to do.


Installation

npm install pi-evaluate

Then restart pi or run /reload. The evaluate skill will appear in your available skills.


Reespec mode

If you use reespec, pi-evaluate detects your project automatically.

After completing an execute phase, invoke the skill:

/skill:evaluate

The evaluator will:

  1. Detect your active reespec request
  2. Silently load brief.md and specs/ as the contract
  3. Scan your outputs
  4. Return a verdict per spec capability + triage summary

Example output:

Evaluating request: my-feature

### user-auth-capability
verdict:  ⚠️ PARTIAL
reason:   brief says "support OAuth and password login" — found OAuth handler,
          no password login handler found in src/auth/
focus:    src/auth/ — password login handler is missing

### error-handling-capability
verdict:  ✅ SATISFIED
reason:   all error paths covered in tests/errors.test.mjs

## Triage
✅ Safe to skip:   error-handling, logging
⚠️  Worth a look:  user-auth (password login missing)
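Each capability block above follows a fixed layout. The `verdict:`, `reason:` and `focus:` labels sit in a 10-character column, and wrapped reason lines are indented to that same column. The TypeScript below is a minimal sketch of rendering one block in that layout. It is not taken from pi-evaluate's source. The `CapabilityResult` shape, `renderCapability` and the sample data are hypothetical.

```typescript
// Hypothetical result shape; not taken from pi-evaluate's source.
type Verdict = "SATISFIED" | "PARTIAL" | "UNSATISFIED" | "UNCLEAR";

interface CapabilityResult {
  capability: string;
  verdict: Verdict;
  reasonLines: string[]; // already wrapped, one entry per output line
  focus?: string;
}

const ICON: Record<Verdict, string> = {
  SATISFIED: "✅",
  PARTIAL: "⚠️",
  UNSATISFIED: "❌",
  UNCLEAR: "❓",
};

// Labels sit in a 10-character column; continuation lines reuse that indent.
const COL = 10;
const field = (label: string, value: string): string => label.padEnd(COL) + value;

function renderCapability(r: CapabilityResult): string {
  const lines = [`### ${r.capability}`, field("verdict:", `${ICON[r.verdict]} ${r.verdict}`)];
  r.reasonLines.forEach((text, i) => {
    lines.push(i === 0 ? field("reason:", text) : " ".repeat(COL) + text);
  });
  if (r.focus) lines.push(field("focus:", r.focus));
  return lines.join("\n");
}

const block = renderCapability({
  capability: "user-auth-capability",
  verdict: "PARTIAL",
  reasonLines: ["found OAuth handler,", "no password login handler found in src/auth/"],
  focus: "src/auth/ (password login handler is missing)",
});
console.log(block);
```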

Standalone mode

No reespec? No problem. The skill works with any project.

Invoke it:

/skill:evaluate

You'll be asked:

"What's the contract? Paste your original ask, acceptance criteria, or whatever defines done."

Paste anything — a paragraph, a bullet list, a copied ticket, a Slack message. No structure required.
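Because the contract can be any shape, something has to turn prose into separately checkable requirements. How pi-evaluate does this is not described here. The TypeScript below is a deliberately naive, hypothetical heuristic that only illustrates the idea. It strips pasted quote markers and bullets, rejoins wrapped lines, and splits the text into sentences. It cannot regroup requirements by meaning. For example, change-email and change-password in the example below both come from a single sentence.

```typescript
// Hypothetical, deliberately naive splitter. It only illustrates turning
// unstructured text into candidate requirements; it is not pi-evaluate's logic.
const QUOTE = /^\s*>\s?/; // leading "> " from pasted quotes
const BULLET = /^\s*(?:[-*•]|\d+[.)])\s+/; // "-", "*", "•", "1.", "1)"

function splitContract(text: string): string[] {
  // Pass 1: group lines into segments. Bullets start a new segment,
  // blank lines end one, and other lines continue a wrapped paragraph.
  const segments: string[] = [];
  let current = "";
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.replace(QUOTE, "");
    if (line.trim() === "") {
      if (current) segments.push(current);
      current = "";
    } else if (BULLET.test(line)) {
      if (current) segments.push(current);
      current = line.replace(BULLET, "").trim();
    } else {
      current = current ? `${current} ${line.trim()}` : line.trim();
    }
  }
  if (current) segments.push(current);

  // Pass 2: split each segment into sentences ending in ".", "!" or "?".
  // Abbreviations such as "e.g." get split too; this is only a sketch.
  const requirements: string[] = [];
  for (const segment of segments) {
    for (const sentence of segment.match(/[^.!?]+[.!?]*/g) ?? []) {
      const trimmed = sentence.trim();
      if (trimmed) requirements.push(trimmed);
    }
  }
  return requirements;
}

const contract = `> Build a user settings page. It should let users change their email and password.
> There should be a confirmation dialog before saving. Mobile-friendly. No external
> auth libraries.`;

console.log(splitContract(contract));
```

On the contract in the code, this yields five candidates, from "Build a user settings page." through "No external auth libraries.". A bulleted list without trailing periods is split one item per bullet.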

Example:

What's the contract?

> Build a user settings page. It should let users change their email and password.
> There should be a confirmation dialog before saving. Mobile-friendly. No external
> auth libraries.

(contract: user-supplied)

### change-email
verdict:  ✅ SATISFIED
reason:   src/settings/email.tsx exists, email change form found with validation

### change-password
verdict:  ⚠️ PARTIAL
reason:   password field found but no confirmation dialog present in src/settings/
focus:    src/settings/ — confirmation dialog before save is missing

### mobile-friendly
verdict:  ❓ UNCLEAR
reason:   contract says "mobile-friendly" but no breakpoints or responsive tests defined —
          cannot verify without clearer criteria
focus:    human call — define what mobile-friendly means for this project

## Triage
✅ Safe to skip:   change-email
⚠️  Worth a look:  change-password (missing confirmation dialog)
❓  Human call:    mobile-friendly (underspecified)

The GAN idea

GANs (Generative Adversarial Networks) pit two neural networks against each other: a generator that creates fake data, and a discriminator that judges whether the data is real or fake. The discriminator never sees how the generator made the data — it only sees the output and the training data (what "real" looks like).

pi-evaluate borrows this pattern:

| GAN | pi-evaluate |
|---|---|
| Generator | Your agent (execute phase) |
| Discriminator | The evaluator skill |
| Training data ("real") | The contract (brief + specs) |
| Generated output ("fake") | The implementation |
| "Is this real?" | "Does this satisfy the contract?" |

The key insight: the discriminator is blind to implementation intent. It can't be charitable about what the generator "meant to do"; it only sees what exists. This is what makes it useful. A self-review by the same agent that built the thing is biased toward what it intended to build. A blind discriminator has no such intent to fall back on.


Verdicts

| Label | Meaning |
|---|---|
| ✅ SATISFIED | All requirements for this capability are clearly present |
| ⚠️ PARTIAL | Some requirements present, some missing |
| ❌ UNSATISFIED | No evidence of this capability in the outputs |
| ❓ UNCLEAR | Contract is too underspecified to judge; flag for a human |
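The verdict table can be read as a decision rule. The sketch below is hypothetical and is not pi-evaluate's implementation. It only makes the boundaries between the four labels explicit. It assumes each capability has already been broken into requirements, each marked with whether evidence was found in the outputs. It also takes a flag saying whether the contract is specific enough to judge. `RequirementCheck` and `decideVerdict` are illustrative names.

```typescript
type Verdict = "SATISFIED" | "PARTIAL" | "UNSATISFIED" | "UNCLEAR";

// Hypothetical per-requirement finding for one capability.
interface RequirementCheck {
  requirement: string;
  evidenceFound: boolean;
}

// Underspecified contract (or nothing to check) -> UNCLEAR; otherwise the
// verdict depends on how many requirements have evidence in the outputs.
function decideVerdict(checks: RequirementCheck[], contractIsSpecific: boolean): Verdict {
  if (!contractIsSpecific || checks.length === 0) return "UNCLEAR";
  const found = checks.filter((c) => c.evidenceFound).length;
  if (found === checks.length) return "SATISFIED";
  if (found === 0) return "UNSATISFIED";
  return "PARTIAL";
}

// Mirrors the user-auth example: OAuth present, password login missing.
const auth = decideVerdict(
  [
    { requirement: "OAuth login", evidenceFound: true },
    { requirement: "password login", evidenceFound: false },
  ],
  true,
);
console.log(auth); // PARTIAL
```

Treating an empty list of requirements as UNCLEAR is an assumption. It follows the idea that a contract with nothing checkable is a human call, not a failure. Missing evidence always counts against the capability, in line with "absence of evidence is flagged".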


Philosophy

  • Optional — never a hard gate. You decide what to do with the verdict.
  • Adversarial — looks for gaps, not confirmation. Absence of evidence is flagged.
  • Focused — the triage summary is the primary output. The human reads this first.
  • Honest about uncertainty — UNCLEAR is not failure. It means your contract needs more detail.

License

MIT


Made with reespec and ♥ in EU