@latent-variable/pi-terminal-bench

v1.0.7

Self-contained benchmark suite for Pi. Runs QuixBugs and other coding tasks locally — no Docker, no Python frameworks, no external dependencies.

pi-terminal-bench

68 coding tasks for pi. No Docker, no frameworks, no API keys — just the pi CLI and Python 3.10+. You watch the agent work in real time.

Install

pi install /path/to/pi-terminal-bench

Then restart pi or run /reload.

Requirements

  • pi CLI
  • Python 3.10+ and bash
  • Optional: numpy, pandas, sympy, word2number. A handful of ported Terminal-Bench tasks use these. The verify scripts install them on demand, so you usually don't need to install anything yourself.

Runs against any model pi has configured — local (OMLX, LM Studio, Ollama) or remote (Anthropic, OpenAI). Defaults to your active model; append provider/model to any command to override.

Commands

| Command | What it does |
|---|---|
| /bench-list [filter] | List tasks. filter matches name, category, or tag |
| /bench-run <task\|category\|all> | Run one task, a whole category, or everything |
| /bench-results [N] | Recent runs. With N, per-task detail for run N |
| /bench-doctor | Check prerequisites |
| /bench-cleanup | Kill stray benchmark processes |

Tasks — 68 across 11 categories

| Category | Count | Command | What it tests |
|---|---|---|---|
| QuixBugs | 40 | /bench-run quixbugs | Single-line Python bug fixes (upstream) |
| Terminal-Bench ports | 8 | /bench-run terminal-bench | Tasks ported from Terminal-Bench, Docker-free |
| Hard | 7 | /bench-run hard | Multi-step algorithms, parsing, concurrency |
| Long Context | 6 | /bench-run long-context | Multi-file refactors, test generation, API migrations |
| Code Generation | 3 | /bench-run codegen | Build CLIs, REST APIs, state machines from a spec |
| Performance | 2 | /bench-run perf | Optimize O(n²) code |
| Security | 2 | /bench-run security | Fix SQL injection and path traversal |
| File Operations | 2 | /bench-run file-operations | Read/write/transform files |
| Mathematics | 2 | /bench-run math | Symbolic math, arithmetic puzzles |
| Games | 2 | /bench-run games | Game-logic and puzzle solvers |
| Data Science | 1 | /bench-run data-science | pandas ETL |
| Debugging | 1 | /bench-run debugging | Fix a diverging ML training loop |

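Run /bench-list <category> to see individual task names. The Terminal-Bench ports row appears to overlap with other rows: File Operations, Mathematics, Games, Data Science, and Debugging also total 8 tasks, and the remaining 11 rows add up to 68.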

Example

/bench-run quixbugs-python-bitcount                         # one task
/bench-run hard                                             # one category
/bench-run quixbugs anthropic/claude-sonnet-4-20250514      # override model
/bench-run all                                              # everything
/bench-results                                              # past runs
/bench-results 1                                            # per-task detail

Results are written as JSON to ~/.pi/agent/pi-terminal-bench/results/.
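If you want to inspect past runs outside of pi, the result files can be read with a few lines of Python. This is a minimal sketch: only the directory comes from this README. The `*.json` naming and the content of each file are assumptions, so the script prints just the top-level shape and relies on no field names. `load_results` is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path

# Directory stated in the README; the file layout inside it is an assumption.
RESULTS_DIR = Path.home() / ".pi" / "agent" / "pi-terminal-bench" / "results"


def load_results(results_dir: Path = RESULTS_DIR) -> list[tuple[str, object]]:
    """Return (filename, parsed JSON) pairs, oldest file first by mtime."""
    if not results_dir.is_dir():
        return []
    files = sorted(results_dir.glob("*.json"), key=lambda p: p.stat().st_mtime)
    loaded = []
    for path in files:
        try:
            loaded.append((path.name, json.loads(path.read_text(encoding="utf-8"))))
        except (OSError, json.JSONDecodeError):
            # Skip unreadable or partially written files instead of crashing.
            continue
    return loaded


if __name__ == "__main__":
    for name, data in load_results():
        # Print only the top-level shape; no result field names are assumed.
        shape = sorted(data) if isinstance(data, dict) else type(data).__name__
        print(name, shape)
```

For everyday use, /bench-results remains the supported way to read these files.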

Adding tasks

Drop a JSON file in tasks/:

{
  "name": "my-task",
  "description": "What this tests",
  "instruction": "What the agent sees",
  "setup_files": { "buggy.py": "...", "test.py": "..." },
  "verify": "cd $BENCH_WORK_DIR && python3 test.py",
  "timeout": 180000,
  "tags": ["custom"]
}

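$BENCH_WORK_DIR is replaced with the path of the task's workspace. The verify command passes if and only if it exits with code 0. The timeout of 180000 in the example matches the 180 s default described under Timeouts, so the value reads as milliseconds. Keep verify commands fast (under 30 s), deterministic, and scoped to the workspace.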
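Before handing a new task to an agent, it can help to check that its verify command behaves as expected on the untouched setup files. The sketch below imitates the rules described above: it writes setup_files into a scratch directory, substitutes $BENCH_WORK_DIR, and treats exit code 0 as a pass. It is not the runner's actual implementation. `dry_run_verify` is a hypothetical helper, and plain textual substitution of the placeholder is an assumption.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path


def dry_run_verify(task: dict, limit_s: float = 30.0) -> bool:
    """Materialize setup_files in a scratch dir and run the verify command there."""
    with tempfile.TemporaryDirectory(prefix="verify-check.") as work_dir:
        for rel_name, content in task.get("setup_files", {}).items():
            target = Path(work_dir) / rel_name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content, encoding="utf-8")
        # Assumption: the placeholder is replaced textually with the workspace path.
        command = task["verify"].replace("$BENCH_WORK_DIR", shlex.quote(work_dir))
        try:
            done = subprocess.run(["bash", "-c", command], cwd=work_dir, timeout=limit_s)
        except subprocess.TimeoutExpired:
            # A verify that exceeds the limit counts as a failure here.
            return False
        return done.returncode == 0
```

For a bug-fix task you would typically expect False on the original setup_files and True once the reference fix is written in; a verify that already passes on the buggy input cannot distinguish a working agent from an idle one.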

Safety

Every task runs in an isolated temp directory with a pi-bench. prefix ($TMPDIR/pi-bench.XXXXXX). After each task — pass, fail, or abort — the runner kills lingering processes (including descendants reparented to launchd) and removes the workspace. Active workspaces are persisted to ~/.pi/agent/pi-terminal-bench/active-workdirs.txt, so /bench-cleanup can sweep orphans from crashed sessions.

Every cleanup path is scoped strictly to paths matching the pi-bench. prefix, so temp files belonging to Homebrew, Xcode, git, or any other tool are never touched.
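The prefix-scoped cleanup idea can be illustrated in a few lines. This is a sketch of the general technique, not the package's code: it creates a workspace with the pi-bench. prefix and refuses to delete anything that is not a real directory with that prefix directly under the temp dir. The function names are hypothetical, and process cleanup is left out entirely.

```python
import shutil
import tempfile
from pathlib import Path

PREFIX = "pi-bench."  # prefix named in the README


def make_workspace() -> Path:
    """Create an isolated workspace directory under the system temp dir."""
    return Path(tempfile.mkdtemp(prefix=PREFIX))


def is_bench_workspace(path: Path) -> bool:
    """True only for a real directory directly under the temp dir with the prefix."""
    resolved = path.resolve()
    tmp_root = Path(tempfile.gettempdir()).resolve()
    return (resolved.parent == tmp_root
            and resolved.name.startswith(PREFIX)
            and resolved.is_dir()
            and not path.is_symlink())


def remove_workspace(path: Path) -> bool:
    """Delete the workspace if it passes the guard; refuse anything else."""
    if not is_bench_workspace(path):
        return False
    shutil.rmtree(path, ignore_errors=True)
    return True
```

Putting the guard in front of every delete, rather than trusting the caller, is what keeps a stale or corrupted path list from removing unrelated directories.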

Timeouts

Each task has a timeout (default 180s; harder tasks use 240s or 360s). If a command hangs, the agent receives a steer message explaining the timeout and gets a 2× extended window to recover. If the agent makes no file changes, the task is recorded as FAIL, never as a false PASS.
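One way to picture the "no file changes means FAIL" rule is a before/after snapshot of the workspace. The sketch below is an illustration under assumptions, not the runner's implementation: it hashes every file, and a run counts as PASS only when something changed and verify succeeded. `snapshot` and `final_status` are hypothetical names.

```python
import hashlib
from pathlib import Path


def snapshot(work_dir: Path) -> dict[str, str]:
    """Map each file's relative path to a SHA-256 digest of its contents."""
    return {
        str(p.relative_to(work_dir)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(work_dir.rglob("*")) if p.is_file()
    }


def final_status(before: dict[str, str], after: dict[str, str],
                 verify_passed: bool) -> str:
    """PASS needs a passing verify and at least one changed, added, or removed file."""
    if before == after:
        # The agent did nothing, so a passing verify would be a false positive.
        return "FAIL"
    return "PASS" if verify_passed else "FAIL"
```

Comparing content hashes rather than modification times means a file that is rewritten with identical contents still counts as unchanged.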

Contributing

PRs welcome. Terminal-Bench has 241 Docker-based tasks; we've ported a subset that runs without Docker and will expand over time.