@wifo/factory-harness

v0.0.14

Scenario runner for software-factory specs — runs `test:` satisfaction lines and scores `judge:` lines via LLM

The scenario runner. Executes test: and judge: lines against a parsed spec; produces a typed HarnessReport.

@wifo/factory-harness powers the runtime's validatePhase. Given a parsed Spec (from @wifo/factory-core), it walks each scenario's Satisfaction: block, runs bun test for test: lines and dispatches to an LLM judge for judge: lines, and returns a typed report. You usually don't reach for this package directly — the runtime does.

For AI agents: start at AGENTS.md (top-level). This README is the detailed reference.

Install

pnpm add @wifo/factory-harness

Pre-installed via factory init (the scaffold's runtime depends on it).

When to reach for it

  • Programmatically run a spec's scenarios without going through the full runtime. Use runHarness({ spec, ... }) to get a HarnessReport.
  • Build your own validate phase. Compose runTestSatisfaction + a custom judge client to define a domain-specific validatePhase.
  • Implement a custom judge client. The exported JudgeClient interface is what the runtime + spec-reviewer + dodPhase all consume. Provide your own (e.g., a different LLM provider) and pass it in.
  • Parse a test: line manually. parseTestLine strips the locked syntax (file path + optional "name" filter) and tolerates stray backticks.

What's inside

CLI

factory-harness run <spec-path> [flags]

| Flag | Default | Notes |
|---|---|---|
| --scenario <ids> | all | Comma-separated scenario id filter (e.g., S-1,S-2,H-1). |
| --visible | off | Only visible scenarios (skip holdouts). |
| --holdouts | off | Only holdout scenarios. |
| --no-judge | off | Skip judge: lines (status skipped). |
| --model <name> | claude-haiku-4-5 | Override judge model. |
| --timeout-ms <n> | 60000 | Per-judge timeout. |

The CLI is mostly used in tests + ad-hoc inspection. Production code reaches for runHarness() programmatically (or — more likely — uses the runtime's validatePhase).

Public API

import { runHarness, runTestSatisfaction, parseTestLine, formatReport }
  from '@wifo/factory-harness';

import type {
  HarnessReport, HarnessScenarioResult, HarnessSatisfactionResult,
  HarnessOptions, JudgeClient, Judgment,
  TestRunnerOptions, ParsedTestLine, ReporterKind,
} from '@wifo/factory-harness';

Concepts

Two satisfaction kinds.

  • test: <path> "<name>" — spawns bun test <path> [-t "<name>"]. Pass/fail from exit code. The harness strips a leading + trailing backtick from both the path and the name (since v0.0.6) — bare paths are canonical but legacy backticked paths still work.
  • judge: "<criterion>" — calls a JudgeClient (default: Anthropic Claude via @anthropic-ai/sdk with tool-use for structured pass/score/reasoning output). The reviewer + the runtime's validatePhase and dodPhase all reuse this client interface.
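As a rough illustration of the test: shape described above, the sketch below splits a line into a path and an optional name filter, strips one leading and one trailing backtick from each, and builds the argument list for bun test. The names parseTestLineSketch, toBunArgs, and ParsedTestLineSketch are hypothetical and exist only for this example; the package's own parseTestLine and ParsedTestLine may have a different signature and shape.

```typescript
// Hypothetical shape for illustration; the real ParsedTestLine type may differ.
interface ParsedTestLineSketch {
  path: string;
  testName?: string;
}

// Strip one leading and one trailing backtick, if present.
function stripBackticks(s: string): string {
  return s.replace(/^`/, "").replace(/`$/, "");
}

// Assumes the path contains no whitespace and the name is wrapped in ASCII double quotes.
function parseTestLineSketch(line: string): ParsedTestLineSketch {
  const body = line.trim().replace(/^test:\s*/, "");
  const m = body.match(/^(\S+)(?:\s+"(.*)")?\s*$/);
  if (!m) throw new Error(`unparseable test line: ${line}`);
  const path = stripBackticks(m[1] ?? "");
  const rawName = m[2];
  return rawName === undefined
    ? { path }
    : { path, testName: stripBackticks(rawName) };
}

// Build the argv that would follow the `bun` executable.
function toBunArgs(parsed: ParsedTestLineSketch): string[] {
  return parsed.testName === undefined
    ? ["test", parsed.path]
    : ["test", parsed.path, "-t", parsed.testName];
}
```

For example, parseTestLineSketch('test: `src/foo.test.ts` "adds two numbers"') yields the bare path src/foo.test.ts, and toBunArgs turns it into the arguments for bun test src/foo.test.ts -t "adds two numbers".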

Coverage trip detection (v0.0.13+). Per-scenario bun test --test-name-pattern <name> runs exercise only a slice of a file, so a host repo's bunfig.toml coverage threshold trips on the slice and bun exits non-zero even though every scenario assertion passed. The harness parses bun's output: when bun exits non-zero AND the output contains both `0 fail` and the canonical `coverage threshold of <n> not met` marker, the satisfaction is classified as pass rather than fail, with the detail prefix `harness/coverage-threshold-tripped: <marker>; <existing tail>`. The match is conservative and requires both signals: a non-zero exit without the marker is still classified as fail. Coverage is a holistic property, meaningful only when the whole suite runs; the host's coverage gate runs separately at DoD time on the full suite. (v0.0.12 attempted the carve-out via --coverage=false, but bun 1.3.x rejects that flag, so v0.0.13 ships the stdout-parse path instead.)
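The two-signal rule can be sketched as a small pure function. This is an illustration under assumptions, not the harness's code: classifyCoverageTrip, BunRunSketch, and both regular expressions are hypothetical, and the marker pattern is inferred from the wording above rather than from bun's source.

```typescript
interface BunRunSketch {
  exitCode: number;
  output: string; // combined stdout + stderr captured from `bun test`
}

interface CoverageOutcomeSketch {
  status: "pass" | "fail";
  detail: string;
}

// ASSUMPTION: marker wording taken from the README prose; bun's exact text may differ.
const COVERAGE_MARKER = /coverage threshold of [\d.]+ not met/i;
// `\b` keeps "10 fail" from being read as "0 fail".
const ZERO_FAIL = /\b0 fail\b/;

function classifyCoverageTrip(run: BunRunSketch): CoverageOutcomeSketch {
  if (run.exitCode === 0) return { status: "pass", detail: "" };

  const tail = `exit code ${run.exitCode}`;
  const marker = run.output.match(COVERAGE_MARKER);

  // Both signals are required: the marker AND zero failing tests.
  if (marker && ZERO_FAIL.test(run.output)) {
    return {
      status: "pass",
      detail: `harness/coverage-threshold-tripped: ${marker[0]}; ${tail}`,
    };
  }
  // A non-zero exit without both signals stays a failure.
  return { status: "fail", detail: tail };
}
```

Requiring both signals keeps the carve-out narrow: a run with real assertion failures that also happens to miss the coverage threshold still reports fail.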

Quote-char normalization in test-name patterns (v0.0.12+). Stylistic apostrophes drift between a spec's test: line (e.g. "v0.0.10's hash") and the test's actual it() name (e.g. 'v0.0.10s hash', auto-stylized during implementation), so an exact substring match fails to match correct work. The harness now normalizes quote-like characters (ASCII + curly apostrophes, ASCII + curly double-quotes, backticks) on the pattern before passing -t to bun. The companion factory spec lint rule spec/test-name-quote-chars catches non-ASCII quote chars at scoping time so authors can rewrite cleanly.

Test-name regex matching (v0.0.14). The v0.0.12 strip-everything carve-out caused the opposite bug: a spec that genuinely uses "slug's log" was stripped to "slugs log" before -t, while the actual test in the file kept the apostrophe. Bun's regex then matched 0 tests, which produced 5 phantom no-converge iterations on the v0.0.13 BASELINE. v0.0.14 narrows the strip set:

  • Apostrophes (ASCII ' and curly ‘’): preserved as literal characters on both sides of the comparison. Modern Claude reliably emits apostrophes in it() names; the strip caused false negatives.
  • Curly double-quotes (“”): still converted to ASCII " (helpful when authors paste from rich-text editors).
  • Backticks: still stripped (existing behavior).
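The narrowed strip set above can be sketched in a few lines. normalizeTestNamePattern is a hypothetical name used only for this illustration; the harness's internal helper is not shown in this README and may be structured differently.

```typescript
// Sketch of the v0.0.14 rules as described above (hypothetical helper name).
function normalizeTestNamePattern(pattern: string): string {
  return pattern
    .replace(/[\u201C\u201D]/g, '"') // curly double-quotes become ASCII "
    .replace(/`/g, "");              // backticks are stripped
  // Apostrophes (ASCII ' and curly \u2018 \u2019) are deliberately left untouched.
}
```

With this rule, "slug's log" reaches bun unchanged, so it can still match an it() name that contains the apostrophe.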

A complementary safety net: when bun reports `regex "<pattern>" matched 0 tests` and exits non-zero, the satisfaction is classified as status: 'error' (NOT fail) with the detail prefix `harness/test-name-regex-no-match: <marker> in <file>; <existing tail>`. The runtime treats error as a tooling-mismatch halting condition rather than re-running the implement phase trying to fix non-existent assertion failures. The detector mirrors the v0.0.13 coverage-trip shape; the coverage-trip path takes precedence when both signals appear.
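A minimal sketch of how the two detectors might be ordered, assuming bun's combined output is available as a string. classifyBunExit, ExitOutcomeSketch, and both patterns are hypothetical; the marker wording is inferred from the prose above, not taken from bun or from the harness source.

```typescript
interface ExitOutcomeSketch {
  status: "pass" | "fail" | "error";
  detail: string;
}

// ASSUMPTION: both marker shapes are inferred from the README prose.
const COVERAGE_TRIP = /coverage threshold of [\d.]+ not met/i;
const REGEX_NO_MATCH = /regex ".*" matched 0 tests/;

function classifyBunExit(exitCode: number, output: string, file: string): ExitOutcomeSketch {
  if (exitCode === 0) return { status: "pass", detail: "" };
  const tail = `exit code ${exitCode}`;

  // 1. Coverage trip is checked first, so it wins when both signals appear.
  const coverage = output.match(COVERAGE_TRIP);
  if (coverage && /\b0 fail\b/.test(output)) {
    return { status: "pass", detail: `harness/coverage-threshold-tripped: ${coverage[0]}; ${tail}` };
  }

  // 2. A pattern that matched nothing is a tooling mismatch, not an assertion failure.
  const noMatch = output.match(REGEX_NO_MATCH);
  if (noMatch) {
    return { status: "error", detail: `harness/test-name-regex-no-match: ${noMatch[0]} in ${file}; ${tail}` };
  }

  // 3. Anything else is an ordinary failure.
  return { status: "fail", detail: tail };
}
```

Returning error instead of fail is what lets a caller halt on a spec/test naming mismatch rather than retrying an implementation that has nothing to fix.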

JudgeClient interface. A single method judge(args) that takes { criterion, scenario, artifact, model, timeoutMs } and returns { pass, score, reasoning }. The runtime ships claudeCliJudgeClient (subprocess-based) in @wifo/factory-spec-review; you can implement your own (e.g., for a different LLM provider).
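To make the interface concrete, here is a sketch of a deterministic stand-in judge that could be useful in tests where no LLM call is wanted. The types are declared locally and are assumptions: the field names come from the description above, but their exact types (for instance, whether scenario is a string or a structured object) and the async return are guesses. In real code, import JudgeClient and Judgment from @wifo/factory-harness instead.

```typescript
// ASSUMPTION: local stand-ins for the package's JudgeClient / Judgment types.
interface JudgeArgsSketch {
  criterion: string;
  scenario: string; // the real type may be a structured scenario object
  artifact: string;
  model: string;
  timeoutMs: number;
}

interface JudgmentSketch {
  pass: boolean;
  score: number;
  reasoning: string;
}

interface JudgeClientSketch {
  judge(args: JudgeArgsSketch): Promise<JudgmentSketch>;
}

// Score = fraction of criterion keywords (longer than 3 chars) found in the artifact.
function scoreByKeywords(criterion: string, artifact: string): JudgmentSketch {
  const haystack = artifact.toLowerCase();
  const words = criterion.toLowerCase().split(/\W+/).filter((w) => w.length > 3);
  const hits = words.filter((w) => haystack.indexOf(w) !== -1);
  const score = words.length === 0 ? 0 : hits.length / words.length;
  return {
    pass: score >= 0.5,
    score,
    reasoning: `${hits.length}/${words.length} criterion keywords found in artifact`,
  };
}

// A judge client that ignores model and timeoutMs and never touches the network.
const keywordJudge: JudgeClientSketch = {
  async judge({ criterion, artifact }) {
    return scoreByKeywords(criterion, artifact);
  },
};
```

A client like this would be passed wherever a judge client is accepted (for example the judgeClient option shown commented out in the worked example), assuming its shape matches the exported JudgeClient type.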

Status enum. Each scenario's satisfaction lines aggregate into one of pass, fail, error, skipped. runHarness aggregates per-scenario results into the report.
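The README does not spell out how line statuses roll up, so the sketch below is only one plausible reading. It assumes error outranks fail, fail outranks pass, and a scenario counts as skipped only when no line produced another status. aggregateScenario and summarize are hypothetical names; the summary shape mirrors the one printed in the worked example.

```typescript
type StatusSketch = "pass" | "fail" | "error" | "skipped";

// ASSUMPTION: precedence error > fail > pass > skipped. The real rule may differ.
function aggregateScenario(lines: StatusSketch[]): StatusSketch {
  if (lines.some((s) => s === "error")) return "error";
  if (lines.some((s) => s === "fail")) return "fail";
  if (lines.some((s) => s === "pass")) return "pass";
  return "skipped"; // every line skipped, or no lines at all
}

// Count scenario statuses into a { pass, fail, error, skipped } summary.
function summarize(scenarios: StatusSketch[]): Record<StatusSketch, number> {
  const summary: Record<StatusSketch, number> = { pass: 0, fail: 0, error: 0, skipped: 0 };
  for (const s of scenarios) summary[s] += 1;
  return summary;
}
```

Under these assumptions, a scenario with one passing test: line and one skipped judge: line (as with --no-judge) would aggregate to pass.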

Worked example

import { runHarness } from '@wifo/factory-harness';
import { parseSpec } from '@wifo/factory-core';

const spec = parseSpec(await Bun.file('docs/specs/foo.md').text());

const report = await runHarness({
  spec,
  cwd: process.cwd(),
  noJudge: false,
  // optional: provide a custom judge client
  // judgeClient: myCustomJudgeClient,
});

console.log(report.summary); // { pass: 3, fail: 0, error: 0, skipped: 0 }
for (const scenario of report.scenarios) {
  console.log(scenario.id, scenario.status);
}

CLI:

$ pnpm exec factory-harness run docs/specs/foo.md --no-judge
spec=foo  scenarios=3
  S-1: pass
  S-2: pass
  S-3: pass
summary: 3 pass, 0 fail, 0 error, 0 skipped

Status

Pre-alpha. APIs may break in point releases until v0.1.0.