@tryinget/pi-evalset-lab

v0.2.0

pi extension for fixed-task-set eval runs and prompt/system comparisons

summary: "Overview and quickstart for @tryinget/pi-evalset-lab."
read_when:
  - "Starting work in this package workspace."
  - "Using /evalset run or /evalset compare."
system4d:
  container: "Monorepo package for a pi fixed-task-set evaluation extension."
  compass: "Keep prompt/system comparisons small, reproducible, and easy to inspect."
  engine: "Define dataset -> run or compare variants -> export JSON/HTML report -> review deltas."
  fog: "Model/provider nondeterminism can make brittle checks noisy."

@tryinget/pi-evalset-lab

Monorepo package for fixed-task-set eval workflows in Pi (/evalset run|compare) with reproducible JSON reports and static HTML export.

  • Workspace path: packages/pi-evalset-lab
  • Release component key: pi-evalset-lab
  • Former standalone source (legacy): ~/programming/pi-extensions/pi-evalset-lab
  • Canonical package status: canonicalized here; the legacy repo was archived to ~/programming/pi-extensions/pi-evalset-lab-final-archive.tar.gz and removed after validation.
  • Session-history migration: no legacy Pi session-history directory existed for the old path, so relocation was recorded as skip-no-history.

Primary category fit: Model & Prompt Management, Review & Quality Loops, UX & Observability, Safety & Governance.

Runtime dependencies and packaged files

This package expects Pi host runtime APIs and declares them as peerDependencies:

  • @mariozechner/pi-coding-agent
  • @mariozechner/pi-ai

The npm package uses a files whitelist so required runtime artifacts are explicitly included:

  • extensions/evalset.ts
  • prompts/
  • examples/ (sample datasets + sample report UI)
  • scripts/export-evalset-report-html.mjs

Quickstart

Install package dependencies for local validation:

cd packages/pi-evalset-lab
npm install
npm run check

Install into Pi from the package directory containing package.json:

pi install /absolute/path/to/pi-extensions/packages/pi-evalset-lab
# then in Pi: /reload

For ad hoc source testing from this package directory:

pi -e ./extensions/evalset.ts

evalset command

/evalset help
/evalset init [dataset-path] [--force]
/evalset run <dataset.json> [--system-file <path>] [--system-text <text>] [--variant <name>] [--max-cases <n>] [--temperature <n>] [--out <report.json>]
/evalset compare <dataset.json> <baseline-system.txt> <candidate-system.txt> [--baseline-name <name>] [--candidate-name <name>] [--max-cases <n>] [--temperature <n>] [--out <report.json>]

/evalset is a Pi slash command, not a shell executable.
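To make the flag grammar above concrete, here is a minimal sketch of how the /evalset run arguments could be parsed. It is not the extension's actual parser: the RunOptions type, the parseRunArgs function, and the assumption that the arguments arrive already split into tokens are all hypothetical. Only the flag names come from the usage lines above.

```typescript
// Hypothetical result shape; field names mirror the documented flags.
interface RunOptions {
  dataset: string;
  systemFile?: string;
  systemText?: string;
  variant?: string;
  maxCases?: number;
  temperature?: number;
  out?: string;
}

// Flags documented for `/evalset run`; anything else is rejected.
const RUN_FLAGS = ["system-file", "system-text", "variant", "max-cases", "temperature", "out"];

// Assumption: `tokens` is everything after "/evalset run", already split on whitespace.
function parseRunArgs(tokens: string[]): RunOptions {
  const positionals: string[] = [];
  const flags = new Map<string, string>();

  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    if (!token.startsWith("--")) {
      positionals.push(token);
      continue;
    }
    const name = token.slice(2);
    if (!RUN_FLAGS.includes(name)) {
      throw new Error(`Unknown flag: ${token}`);
    }
    const value = tokens[i + 1];
    if (value === undefined || value.startsWith("--")) {
      throw new Error(`Missing value for ${token}`);
    }
    flags.set(name, value);
    i++; // skip the value that was just consumed
  }

  if (positionals.length !== 1) {
    throw new Error("Expected exactly one <dataset.json> argument");
  }

  // Convert a numeric flag, failing loudly on non-numeric input.
  const numberFlag = (name: string): number | undefined => {
    const raw = flags.get(name);
    if (raw === undefined) return undefined;
    const parsed = Number(raw);
    if (Number.isNaN(parsed)) {
      throw new Error(`--${name} expects a number, got "${raw}"`);
    }
    return parsed;
  };

  return {
    dataset: positionals[0],
    systemFile: flags.get("system-file"),
    systemText: flags.get("system-text"),
    variant: flags.get("variant"),
    maxCases: numberFlag("max-cases"),
    temperature: numberFlag("temperature"),
    out: flags.get("out"),
  };
}
```

A real parser would also need to handle quoted values, for example a --system-text value containing spaces, which this sketch leaves out.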

Interactive mode:

pi -e ./extensions/evalset.ts
# then inside Pi:
/evalset compare examples/fixed-task-set.json examples/system-baseline.txt examples/system-candidate.txt

Non-interactive mode:

pi -e ./extensions/evalset.ts -p "/evalset compare examples/fixed-task-set.json examples/system-baseline.txt examples/system-candidate.txt"
# or, if installed/enabled:
pi -p "/evalset compare examples/fixed-task-set.json examples/system-baseline.txt examples/system-candidate.txt"

Interactive sessions use Pi UI hooks (ctx.ui) for status/notify updates. In non-interactive -p mode, ctx.hasUI === false and those UI calls are skipped.
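The guard described above can be sketched as follows. This is illustrative only: the README names ctx.ui and ctx.hasUI, but the UiHooks and CommandContext interfaces, the setStatus/notify method signatures, and the reportProgress helper are assumptions made for this example, not the Pi host API. The fallback to a plain logger is also an addition; the README only says the UI calls are skipped.

```typescript
// Hypothetical shape of the UI hooks; method names and signatures are assumed.
interface UiHooks {
  setStatus(key: string, text: string): void;
  notify(message: string, level: "info" | "error"): void;
}

// Hypothetical command context exposing only what the README mentions.
interface CommandContext {
  hasUI: boolean;
  ui: UiHooks;
}

// Send progress to the UI when one exists; otherwise use a plain logger.
// Returns which channel was used so callers (and tests) can check it.
function reportProgress(
  ctx: CommandContext,
  message: string,
  log: (line: string) => void = console.log,
): "ui" | "log" {
  if (ctx.hasUI) {
    ctx.ui.setStatus("evalset", message);
    return "ui";
  }
  // Non-interactive (-p) run: make no UI calls at all.
  log(`[evalset] ${message}`);
  return "log";
}
```

Routing every status update through one helper like this keeps the hasUI check in a single place instead of repeating it at each call site.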

Included datasets and sample output

  • examples/fixed-task-set.json — tiny smoke set (3 cases)
  • examples/fixed-task-set-v2.json — larger first pass set
  • examples/fixed-task-set-v3.json — less brittle checks (recommended)
  • examples/evalset-compare-sample-embedded.html — self-contained report UI with embedded compare JSON
  • examples/evalset-compare-sample.png — screenshot preview of that HTML report
  • examples/system-baseline.txt and examples/system-candidate.txt — compare inputs

Preview: [image: Evalset compare sample screenshot]

Reports are written to the path given by --out <path> when it is provided; otherwise they are written to .evalset/reports/*.json under the current project directory.
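The output-path rule can be sketched as a small function. The default file name pattern (mode, dataset stem, compact timestamp) is inferred from the example file names in the export commands in this README, and the use of UTC is an assumption. resolveReportPath and compactTimestamp are hypothetical names, not functions exported by the package.

```typescript
// Format a Date as YYYYMMDDTHHMMSS. UTC is an assumption of this sketch.
function compactTimestamp(date: Date): string {
  // toISOString() looks like 2026-01-02T03:04:05.678Z; keep the first 19 chars.
  return date.toISOString().slice(0, 19).replace(/[-:]/g, "");
}

// Pick the report path: an explicit --out wins, otherwise use the default directory.
function resolveReportPath(
  mode: "run" | "compare",
  datasetPath: string,
  outFlag: string | undefined,
  now: Date = new Date(),
): string {
  if (outFlag !== undefined && outFlag.length > 0) {
    return outFlag;
  }
  // Dataset file name without directories or the .json extension.
  const fileName = datasetPath.split(/[\\/]/).pop() ?? datasetPath;
  const stem = fileName.replace(/\.json$/i, "");
  return `.evalset/reports/${mode}-${stem}-${compactTimestamp(now)}.json`;
}
```

Passing the clock in as a parameter keeps the function deterministic, which makes the default path easy to test.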

Each report includes run identity metadata (runId, datasetHash, casesHash, and variant hashes). Session messages keep lightweight report metadata only, not full report bodies.
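As an illustration of why separate dataset, case, and variant hashes are useful, here is a minimal sketch that derives them from canonicalized input. The package's real hashing scheme is not documented here. This example uses a small non-cryptographic FNV-1a hash only to stay dependency-free, where a real implementation would more likely use something like SHA-256. The EvalCase and Dataset shapes and all function names are hypothetical.

```typescript
// Serialize with sorted object keys so equal data always yields the same string.
function canonicalJson(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalJson(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const keys = Object.keys(record).sort();
    const body = keys.map((key) => `${JSON.stringify(key)}:${canonicalJson(record[key])}`);
    return `{${body.join(",")}}`;
  }
  return JSON.stringify(value);
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical dataset shape, for illustration only.
interface EvalCase {
  id: string;
  input: string;
}
interface Dataset {
  name: string;
  cases: EvalCase[];
}

// Derive identity fields similar in spirit to datasetHash, casesHash, and a variant hash.
function identityFor(dataset: Dataset, systemPrompt: string) {
  return {
    datasetHash: fnv1a(canonicalJson(dataset)),
    casesHash: fnv1a(canonicalJson(dataset.cases)),
    variantHash: fnv1a(systemPrompt),
  };
}
```

With hashes like these, two reports can be checked for comparability: matching casesHash values mean the same cases were run, and differing variant hashes confirm that the system prompts actually differed.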

Export report JSON to static HTML

npm run evalset:export-html -- --in .evalset/reports/compare-your-dataset-YYYYMMDDTHHMMSS.json
# optional:
npm run evalset:export-html -- --in .evalset/reports/run-your-dataset-YYYYMMDDTHHMMSS.json --out .evalset/reports/run-your-dataset.html --title "Evalset run report"

Script: scripts/export-evalset-report-html.mjs

Validation and release checks

Package-local validation:

npm run check
npm run release:check:quick

Monorepo-scoped validation:

cd ../..
bash ./scripts/package-quality-gate.sh ci packages/pi-evalset-lab
node ./scripts/release-components.mjs validate

Release metadata is root-managed through x-pi-template.releaseConfigMode=component and component key pi-evalset-lab.

The scoped package @tryinget/pi-evalset-lab is the canonical npm identity for future releases. The old unscoped pi-evalset-lab package remains as historical registry state and is not the canonical development target.

Optional core hooks (future, not required)

This extension works today without Pi core changes. Optional hardening could include stable agent-level lineage IDs, explicit reproducibility metadata in pi-ai, shared provider payload hashing, or a headless agent-eval API for tool-heavy/full agent-loop benchmark runs.