npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

cwv-harness

v0.2.0

Published

Rigorous experimentation harness for Core Web Vitals optimization, operable by an AI agent in any React/Remix/Next.js repo

Readme

cwv-harness

A rigorous experimentation harness for improving Core Web Vitals in any React / Remix / Next.js (or any buildable+servable) web repo, designed to be operated by an AI agent (or a human) running many experiments over time — with machinery that makes it hard to self-deceive about whether a change is actually an improvement.

Instantiated from the reusable harness template (docs/REUSABLE_HARNESS_PROMPT.md in this workspace), which was itself distilled from a 48-experiment BTC trading research program. The founding domain analysis is PHASE0.md; the methodology is PROTOCOL.md. The short version: lab CWV measurements are noisy, optimization pressure manufactures false wins by default, and the counters are paired interleaved measurement, a frozen protocol, an append-only trial ledger feeding a multiple-testing bar, guardrails on everything the primary metric doesn't see, and a sealed verdict tier (held-out pages + post-ship field data).

Architecture: built once, instantiated thinly per repo

  • The package (this directory) — measurement engine, statistics, gates, ledgers, CLI, local scheduler, protocol docs, agent operating prompt. Shared by every target repo; never reimplemented per repo.
  • Per target repo (generated by cwv-harness init)cwv.config.json (build/ serve commands, page set, field source, budgets), harness/FOUNDING.md (repo-specific Phase-0 answers), harness/CLAUDE.md (agent operating instructions), and that repo's own ledgers and experiment directories. Ledgers are per-repo by design — each site is its own population (see PHASE0.md "One ledger per repo").

Local-first by design: everything — measurement, scheduling, integrity — runs on a developer machine with no cloud infrastructure. Field snapshots are scheduled with cwv-harness schedule install (launchd on macOS, systemd/cron on Linux); integrity is cwv-harness verify-integrity before commits. GitHub workflows for both exist as an opt-in extra (cwv-harness init --github-workflows) but nothing depends on them — real RUM sources usually need auth that CI can't have anyway.

Install into a target repo

# from the target repo root — ALWAYS as a devDependency:
npm i -D cwv-harness   # or yarn add -D / pnpm add -D — lighthouse and
                       # chrome-launcher are bundled as dependencies
npx cwv-harness init          # local bin — resolves to this repo's node_modules
npx cwv-harness doctor        # verifies Chrome, build, serve, field source, scheduler

Chrome/Chromium itself is the one system requirement (chrome-launcher finds an installed browser; it doesn't download one) — cwv-harness doctor checks.

Invocation note: the command is cwv-harness, same as the package — there is deliberately no short cwv bin (an unrelated npm package owns that name, and the mismatch confused operators). Install as a devDependency and invoke via your package runner (npx cwv-harness / yarn cwv-harness / pnpm exec cwv-harness): running unpinned from the remote npx cache means the harness version isn't locked by your repo, so the frozen protocol can drift between sessions — cwv-harness doctor detects and explains this.

init detects the framework (Next/Remix/Vite/CRA, overridable), scaffolds cwv.config.json with TODOs the operator must fill (page set, baseline ref, field source), splits pages into measured/held-out, copies the protocol docs and agent prompt, and points you at cwv-harness schedule install to start the prospective field record immediately.

Serving real production apps

Generic preview-server defaults don't survive contact with a clustered prod server. The config gives each reality a first-class seam instead of forcing everything into one serve.command string:

  • build.installCommand: null auto-detects the package manager from the lockfile (yarn/pnpm/npm) for the baseline worktree.
  • serve.envFile — KEY=VALUE file loaded into the env of every harness-run command (install, build, prepare, serve, smoke).
  • serve.prepare — command run in BOTH the candidate root and the baseline cache worktree after build, for gitignored-but-required runtime artifacts (generated locales, env copies, seed data) so both variants reach runtime parity.
  • {port}/$PORT plus {port2}/$PORT2 — two guaranteed-free ports per server, so an auxiliary listener (metrics, websockets) doesn't collide when baseline and candidate run concurrently. Derive any further ports from $PORT.
  • serve.readyTimeoutMs defaults to 240 s — clustered prod servers cold-boot slowly, and a ready server is never slowed by a high ceiling.
  • smoke.command must be self-serving: it runs after the battery's servers are torn down.

The CLI

| command | tier | what it does | |---|---|---| | cwv-harness init [--github-workflows] | — | instantiate the harness in this repo (workflows opt-in; local-first otherwise) | | cwv-harness doctor | — | verify Chrome/build/serve/field source/scheduler work here | | cwv-harness measure [--pages ...] [--rounds N] [--save runs.jsonl] | exploration | free paired measurement with absolute numbers + deltas; never citable | | cwv-harness profile --page /x [--variant V] [--save lhr.json] | exploration | one throttled run: long tasks, per-script CPU, shipped transfer bytes | | cwv-harness screen --name slug | exploration | scaffold a pre-registered GO/NO-GO analysis doc in harness/analysis/ | | cwv-harness new --name NNNN-slug | — | scaffold an experiment from the template | | cwv-harness run --experiment NNNN | confirmation | the frozen battery; appends a trial to the ledger | | cwv-harness regrade --experiment NNNN --profile P | free | re-grade stored raw runs under another profile (new record, not a replacement) | | cwv-harness leaderboard | — | all ledgered trials, gates, verdicts | | cwv-harness field snapshot [--dry-run] | prospective | append current field p75s (RUM/CrUX) to the field ledger (dry-run: validate without polluting it) | | cwv-harness schedule install [--at HH:MM] \| status \| uninstall | prospective | local daily snapshot scheduling: launchd / systemd user timer / cron | | cwv-harness verdict --experiment NNNN --i-accept-the-cost | verdict | held-out pages / field window; PROMOTE only; ledgered | | cwv-harness verify-integrity | — | append-only ledgers, immutable page partition, unweakened gates, report/ledger consistency | | cwv-harness selftest | — | null-runner battery (A vs A): any detected "edge" is a harness exploit |

How a measurement works

  1. Baseline ref (pinned in config) is built in a cached git worktree; the candidate (working tree) is built alongside. serve.prepare then brings both to runtime parity; orphaned servers from a previous crashed run are reaped first; free ports are allocated.
  2. Both are served concurrently on two local ports.
  3. Each Lighthouse run executes in its own subprocess that exits when done — orchestrator memory stays flat across an arbitrarily long battery (Lighthouse caches parsed sourcemaps in module state; on heavyweight apps this OOMs any in-process loop). Fresh Chrome profile per run. Battery runs compute only the metric audits — the sourcemap-based diagnostics the full performance category drags in (most of the per-run time on heavy apps) are skipped; the trace is captured during load either way, so the measured numbers are unchanged. Diagnostics live in cwv-harness profile, which keeps the full category; cwv-harness run --keep-lhr stores full per-run reports for one-off forensics. Under simulated throttling, the two variants of a round run concurrently (same machine-drift window, half the wall clock); under throttlingMethod: "devtools" they run sequentially (real CPU pinning can't be shared). Rounds are ABBA-interleaved either way. runner.parallelArms: true (opt-in) additionally runs the perturbation arms alongside the main arm — about half the battery wall-clock again, at the cost of pair variance on a busy machine (never bias: pairs stay simultaneous). Quiet machine only; protocol-relevant.
  4. Raw per-round results are written to the experiment's raw/ dir before grading — a crash in the stats loses nothing (cwv-harness regrade recovers).
  5. Statistics are computed on paired differences only: per-page median of per-round %Δ, aggregated as the median across pages, significance by a bootstrap that respects page clustering, deflated by the ledgered trial count (Bonferroni).
  6. The full panel — LCP, TBT (INP lab proxy), CLS, FCP, TTFB, LCP element identity, shipped JS bytes, console errors — is recorded on every run regardless of the experiment's primary metric, and the gate battery (validity / objective profile / ruin) produces PROMOTE or ITERATE.

The cloaking check tolerates request-scoped server state (session UUIDs, trace ids, serialized __remixContext/__NEXT_DATA__ blobs): it first establishes the page's own determinism with two same-UA fetches, then compares exactly when deterministic or structurally (tag sequence) when not. Extra app-specific patterns go in cloaking.stripPatterns.

What's tested vs what doctor verifies

The statistical core, ledger append-only enforcement, gates, config validation, and the full battery against a deterministic fake runner are covered by node --test in this package. The Lighthouse/Chrome/serve path cannot be exercised in every environment — that's what cwv-harness doctor validates inside each target repo before the first real measurement.

Changelog

0.2.0 (2026-06-12)

Hardening from the first real campaign (clustered Express/Remix prod app, 16 GB machine):

  • Lighthouse runs in a subprocess per run — fixes the orchestrator OOM on sourcemap-heavy apps (the in-process module cache grew unbounded). One automatic retry per run for transient Chrome failures.
  • Raw runs persist before grading; reruns write raw-rerun-N/ instead of overwriting the original.
  • Server log buffers are detached once ready; orphaned servers from crashed runs are reaped on the next start; ports are allocated free instead of hardcoded, with {port2}/$PORT2 for auxiliary listeners.
  • Paired runs execute concurrently under simulated throttling (runner.concurrentPairs: "auto") — ~half the battery wall-clock.
  • Cloaking check rebuilt for real servers: UUID/traceparent/hex-id and framework-state normalization, determinism probe, structural fallback, cloaking.stripPatterns for app-specific state.
  • Local-first: cwv-harness schedule install|status|uninstall (launchd/systemd/ cron) replaces the GitHub field-snapshot workflow as the default capture path; workflows are opt-in via cwv-harness init --github-workflows. Doctor checks the scheduler is installed and the ledger is still accruing.
  • serve.envFile, serve.prepare, lockfile-aware install detection, 240 s default ready timeout.
  • New exploration tools: cwv-harness profile (long tasks, script CPU, shipped transfer bytes — never sourcemap byte sums), cwv-harness screen (pre-registered GO/NO-GO scaffold), cwv-harness measure absolute numbers + --save, cwv-harness field snapshot --dry-run, field.windowDays.
  • Doctor: live rumCommand validation, npx-cache detection, package-runner- aware install hints.
  • The command is now cwv-harness, matching the package name; the cwv bin is removed. An unrelated npm package owns the name cwv (remote npx cwv fetched it), and the package-vs-command mismatch confused operators. All docs, templates, prompts, and generated scripts say cwv-harness.
  • lighthouse and chrome-launcher are now regular dependencies (were optional peers) — one npm i -D cwv-harness install, no resolution failures, and the repo's lockfile pins the measuring instrument's version. Chrome itself remains the only system requirement.
  • The instrument is part of the frozen protocol: the harness package version is recorded in experiment.json at pre-registration and cwv-harness run refuses on mismatch — upgrading or hot-patching the harness mid-experiment now fails loudly instead of silently changing the measurement (0.1.0-era experiments without the field are exempt).
  • Battery runs trimmed to metric audits only (onlyAudits): no sourcemap fetching/parsing per run — a large cut in per-run time and memory on sourcemap-heavy apps, with identical measured numbers (the trace is captured before those audits would run). Side effect fixed in passing: errors-in-console is a best-practices audit and never actually ran under onlyCategories: ["performance"] — the console-errors ruin gate saw an empty list; it now genuinely executes. cwv-harness profile keeps the full performance category (and now reports the unused-JS estimate); cwv-harness run --keep-lhr stores full per-run LHRs under raw*/lhr/ for forensics.
  • runner.parallelArms (opt-in, default false): perturbation arms run concurrently with the main arm — roughly halves battery wall-clock again (≈21 → ≈11 pair-slots), at up to 6 concurrent Chromes. No bias (pairs stay simultaneous; contention is common-mode), but added pair variance — use on a quiet machine. Protocol-relevant: set before pre-registering.
  • cwv-harness init now also installs a Claude Code skill (.claude/skills/cwv-harness/SKILL.md) — the on-demand operational playbook (commands, flows, sharp edges); harness/CLAUDE.md stays the short always-loaded rules.

Upgrade note: new runner defaults (throttlingMethod, concurrentPairs) change protocolHash — in-flight pre-registered experiments will refuse to run (by design: the measurement process changed). Finish open experiments on 0.1.0 or re-scaffold them under the new protocol.

0.1.0 (2026-06-11)

Initial release, built inside the btc-research workspace and distilled from a 48-experiment trading-research program.