
@aliou/pi-evals

v0.3.0

Eval framework for the pi coding agent.

Installation

pnpm add @aliou/pi-evals

Quick Start

Create an eval file in evals/:

// evals/hello.eval.ts
import { evaluate, Scorers } from "@aliou/pi-evals";

evaluate("Create hello file", {
  config: {
    model: "claude-sonnet-4-20250514",
    provider: "anthropic",
  },
  data: [
    {
      input: 'Create a file called hello.txt containing "Hello World"',
      expected: { files: { "hello.txt": "Hello World" } },
    },
  ],
  scorers: [Scorers.files()],
});

Run evals:

npx pi-evals

Configuration

Create pi-evals.config.ts:

import { defineConfig } from "@aliou/pi-evals";

export default defineConfig({
  defaults: {
    model: "claude-sonnet-4-20250514",
    provider: "anthropic",
  },
  evalsDir: "./evals",
  delayBetweenTests: 500,
  timeout: 60_000,
  warnTestCount: 30,
});

CLI Options

pi-evals [options]

Options:
  -h, --help              Show help
  -f, --filter <pattern>  Filter evals by name
  -t, --threshold <pct>   Minimum pass percentage to exit 0
  -c, --config <path>     Config file path
  -m, --model <model>     Override model
  -p, --provider <name>   Override provider
  -v, --verbose           Verbose output
  --json                  Output results as JSON

Environment Variables:
  PI_EVAL_MODEL           Override model (lower priority than -m)
  PI_EVAL_PROVIDER        Override provider (lower priority than -p)

Examples:

pi-evals                                # Run all evals
pi-evals -p github-models -m gpt-4o     # Use GitHub Models
PI_EVAL_PROVIDER=github-models pi-evals # Via env var

Built-in Scorers

Scorers.files()

Checks that expected files exist with expected content.

{
  expected: { files: { "hello.txt": "Hello World" } },
  scorers: [Scorers.files()],
}

Scorers.outputContains()

Checks that the agent's output contains the expected substring.

{
  expected: { output: "created file" },
  scorers: [Scorers.outputContains()],
}

Scorers.outputMatches(pattern)

Checks that the agent's output matches a regex.

{
  scorers: [Scorers.outputMatches(/function \w+\(/)],
}
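For instance, the pattern above matches any output that contains a function declaration, but not an arrow-function equivalent:

```typescript
// The same regex as in the example above, applied to sample agent outputs.
const pattern = /function \w+\(/;

console.log(pattern.test("function add(a, b) { return a + b }")); // true
console.log(pattern.test("const add = (a, b) => a + b"));         // false
```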

Scorers.bash(command, options?)

Runs a command and checks the exit code.

{
  scorers: [Scorers.bash("npm test")],
}

Options:

  • exitCode: Expected exit code (default: 0)
  • timeout: Command timeout in ms (default: 30000)
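To illustrate the pass/fail semantics (this is a sketch, not the library's actual implementation), a command scorer reduces to running the command and comparing its exit code against the expected value:

```typescript
import { spawnSync } from "node:child_process";

// Illustrative only: mimics Scorers.bash semantics by running a shell
// command and scoring 1 when the exit code matches the expected value.
function scoreCommand(command: string, expectedExitCode = 0): number {
  const result = spawnSync(command, { shell: true, timeout: 30_000 });
  const exitCode = result.status ?? -1; // null status means the process was killed
  return exitCode === expectedExitCode ? 1 : 0;
}

console.log(scoreCommand("true"));     // 1: `true` exits 0, matching the default
console.log(scoreCommand("false", 1)); // 1: `false` exits 1, matching expected 1
```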

Scorers.llmJudge(options)

Uses an LLM to evaluate the output.

{
  scorers: [
    Scorers.llmJudge({
      criteria: "The response correctly explains the solution",
      model: "gpt-4o-mini", // optional
      provider: "openai", // optional
    }),
  ],
}

Test Case Options

{
  input: "Create a file",
  expected: { files: { "file.txt": "content" } },
  setup: {
    files: { "existing.txt": "existing content" },
    commands: ["npm init -y"],
  },
  timeout: 30_000,
  only: false, // Run only this test
  skip: false, // Skip this test
}
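The only and skip flags behave like the equivalents in typical test runners: when any case sets only: true, just those cases run, and skip: true excludes a case. A minimal sketch of that selection logic (assumed semantics, based on the descriptions above rather than the package's source):

```typescript
interface TestCase { input: string; only?: boolean; skip?: boolean }

// Assumed runner semantics: "only" cases win when any are present,
// then "skip" cases are dropped from whichever pool remains.
function selectCases(cases: TestCase[]): TestCase[] {
  const only = cases.filter((c) => c.only);
  const pool = only.length > 0 ? only : cases;
  return pool.filter((c) => !c.skip);
}

console.log(selectCases([
  { input: "a" },
  { input: "b", skip: true },
]).map((c) => c.input)); // ["a"]
```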

Custom Scorers

import { promises as fs } from "node:fs";
import path from "node:path";
import type { Scorer } from "@aliou/pi-evals";

const customScorer: Scorer = {
  name: "custom",
  score: async (ctx) => {
    // ctx.cwd is the working directory the agent ran in.
    const fileExists = await fs.access(path.join(ctx.cwd, "output.txt"))
      .then(() => true)
      .catch(() => false);

    return {
      name: "custom",
      score: fileExists ? 1 : 0,
      reason: fileExists ? "File exists" : "File not found",
    };
  },
};
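A custom scorer is just an object with a name and an async score function, so it can be exercised directly against a mock context. The sketch below is self-contained for illustration: the ScorerContext and ScorerResult shapes are inferred from the example above and may differ from the types the package actually exports.

```typescript
import { promises as fs } from "node:fs";
import path from "node:path";
import os from "node:os";

// Shapes inferred from the README example; the package's exported
// types may differ.
interface ScorerResult { name: string; score: number; reason: string }
interface ScorerContext { cwd: string }
interface Scorer { name: string; score(ctx: ScorerContext): Promise<ScorerResult> }

const fileScorer: Scorer = {
  name: "output-file",
  score: async (ctx) => {
    const exists = await fs.access(path.join(ctx.cwd, "output.txt"))
      .then(() => true)
      .catch(() => false);
    return {
      name: "output-file",
      score: exists ? 1 : 0,
      reason: exists ? "File exists" : "File not found",
    };
  },
};

// Exercise the scorer against a temporary working directory.
async function main() {
  const cwd = await fs.mkdtemp(path.join(os.tmpdir(), "scorer-"));
  console.log((await fileScorer.score({ cwd })).score); // 0: no file yet
  await fs.writeFile(path.join(cwd, "output.txt"), "done");
  console.log((await fileScorer.score({ cwd })).score); // 1: file now present
}
main();
```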

CI Integration

GitHub Models (built in)

github-models support is built in. No repo-local extension file required.

export default defineConfig({
  defaults: {
    provider: "github-models",
    model: "gpt-4o",
  },
});

For GitHub Actions, grant model permission and pass GITHUB_TOKEN:

permissions:
  contents: read
  models: read

- name: Run evals
  env:
    PI_EVAL_PROVIDER: github-models
    PI_EVAL_MODEL: gpt-4o
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: npx pi-evals --json > results.json

Reusable workflow (for other repos)

Use this repo's reusable workflow:

jobs:
  evals:
    uses: aliou/pi-evals/.github/workflows/pi-evals.yml@main
    permissions:
      contents: read
      models: read
    secrets: inherit
    with:
      package-manager: npm
      install-command: npm ci
      eval-command: npx pi-evals --json

For pnpm projects:

with:
  package-manager: pnpm
  install-command: pnpm install --frozen-lockfile
  build-command: pnpm build
  eval-command: npx pi-evals --json

GitHub Action (uses: aliou/pi-evals@...)

You can also call the composite action directly:

jobs:
  evals:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      models: read
    steps:
      - uses: actions/checkout@v4

      - uses: aliou/pi-evals@vX.Y.Z
        with:
          package-manager: npm
          install-command: npm ci
          eval-command: npx pi-evals --json
          github-token: ${{ secrets.GITHUB_TOKEN }}

Replace vX.Y.Z with the package version you want to pin.

For pnpm projects, set package-manager: pnpm and install-command: pnpm install --frozen-lockfile.

License

MIT