@wifo/factory-spec-review

v0.0.14

LLM-judged spec quality reviewer — runs subscription-paid `claude -p` judges against software-factory specs and emits findings in factory-spec-lint's output format

Downloads

1,273

Readme

@wifo/factory-spec-review

The LLM-judged spec quality reviewer. 8 claude -p-backed judges score specs for quality issues lint can't catch.

@wifo/factory-spec-review is the spec-side analog of the harness. Harness runs judge: lines against scenarios at runtime; the reviewer runs structured prompts against the spec itself before any agent token is spent. Output mirrors factory spec lint's shape (file:line severity code message), in the review/... namespace. Cache-backed by content-addressable spec hash + judge rule-set hash, so re-runs on unchanged specs cost zero claude spawns.

For AI agents: start at AGENTS.md (top-level). This README is detailed reference.

Install

pnpm add -D @wifo/factory-spec-review @wifo/factory-core

Pre-installed via factory init. Invoked via factory spec review (dispatched from @wifo/factory-core).

When to reach for it

  • Score spec quality before running. factory spec review docs/specs/<id>.md — runs all 8 enabled judges. Subscription-paid via claude -p.
  • Score a directory. factory spec review docs/specs/ — recurses; one finding stream per file.
  • Restrict to specific judges. --judges <a,b,c> runs only those (e.g., --judges dod-precision for a quick DoD sanity check).
  • Programmatically review. Import runReview({ spec, judgeClient, ... }) to get a typed ReviewFinding[] array.
  • Build a custom judge. Implement the JudgeDef interface, register it via your own loadJudgeRegistry wrapper, run it via runReview. The default-enabled list is configurable.

What's inside

CLI

factory spec review <path> [flags]                # dispatched from factory-core

| Flag | Default | Notes |
|---|---|---|
| --cache-dir <path> | .factory-spec-review-cache | Per-spec-bytes cache. Re-runs on unchanged specs are free. |
| --no-cache | off | Disable cache (always run every judge). |
| --judges <a,b,c> | all 8 | Comma-separated subset (e.g. internal-consistency,dod-precision). |
| --claude-bin <path> | claude on PATH | Override (test injection). |
| --technical-plan <path> | auto-resolved | Override path to paired technical-plan. |
| --timeout-ms <n> | 60000 | Per-judge timeout. |

Auto-resolution of paired technical-plan: docs/specs/<id>.md → docs/technical-plans/<id>.md (and done/ subdirs).

Auto-loading of depends-on deps (v0.0.7+): when reviewing a spec with non-empty depends-on, the CLI walks <projectRoot>/docs/specs/<dep-id>.md and <projectRoot>/docs/specs/done/<dep-id>.md to load each dep's body, then threads it through to the judges that consume JudgePromptCtx.deps (currently cross-doc-consistency and internal-consistency).
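The dep lookup described above can be sketched as a pure function over the two documented locations. This is a minimal illustration, not the package's code: `depCandidatePaths` and `resolveDep` are hypothetical names, and the lookup order (active spec before done/) is an assumption.

```typescript
// Hypothetical helper: the two documented locations for a dependency spec.
// Assumption: the active spec is checked before the done/ copy.
function depCandidatePaths(projectRoot: string, depId: string): string[] {
  const root = projectRoot.replace(/\/+$/, ''); // drop trailing slashes
  return [
    `${root}/docs/specs/${depId}.md`,
    `${root}/docs/specs/done/${depId}.md`,
  ];
}

// Returns the first candidate that exists, or undefined when neither does.
// The existence check is injected so the sketch stays free of filesystem calls.
function resolveDep(
  projectRoot: string,
  depId: string,
  exists: (path: string) => boolean,
): string | undefined {
  return depCandidatePaths(projectRoot, depId).find(exists);
}
```

An `undefined` result corresponds to the case the README reports as review/dep-not-found.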

Exit codes: 0 (clean or warnings only), 1 (errors found).
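As a rough sketch of the exit-code rule, assuming a simplified finding shape: the `Finding` type, `exitCodeFor`, and `formatLine` below are illustrative names, not package exports, and the real ReviewFinding type may carry different fields.

```typescript
// Hypothetical, simplified finding shape for illustration only.
type Severity = 'info' | 'warning' | 'error';

interface Finding {
  file: string;
  line?: number;
  severity: Severity;
  code: string;
  message: string;
}

// 0 when the run is clean or has only info/warning findings; 1 when any error is present.
function exitCodeFor(findings: Finding[]): 0 | 1 {
  return findings.some((f) => f.severity === 'error') ? 1 : 0;
}

// Render one finding in the `file:line severity code message` shape.
function formatLine(f: Finding): string {
  return `${f.file}:${f.line ?? '?'}  ${f.severity}  ${f.code}  ${f.message}`;
}
```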

Public API (10 exports)

import { runReview, formatFindings, loadJudgeRegistry, claudeCliJudgeClient }
  from '@wifo/factory-spec-review';

import type {
  RunReviewOptions, ReviewFinding, ReviewCode, ReviewSeverity,
  JudgeDef, ClaudeCliJudgeClientOptions,
} from '@wifo/factory-spec-review';

The 8 judges (v0.0.10)

All ship at severity: 'warning' by default, so their findings never escalate the exit code. Promotion to 'error' happens per-judge in point releases, post-calibration.

| Code | Catches | Notes |
|---|---|---|
| review/internal-consistency | Constraints reference deps not declared; scenarios reference test files outside cwd. | v0.0.4. Dep-aware since v0.0.9 (loads depends-on deps' Constraints). |
| review/judge-parity | Asymmetric satisfaction kinds across same-category scenarios. | v0.0.4. |
| review/dod-precision | Vague DoD checks ("X validates Y" without operator). | v0.0.4. |
| review/holdout-distinctness | Holdouts that overlap with visible scenarios (overfit risk). | v0.0.4. |
| review/cross-doc-consistency | Spec ↔ technical-plan disagreement on names, defaults, deferral list. | v0.0.4. Dep-aware since v0.0.7. |
| review/api-surface-drift | Public API names in spec Constraints don't appear in tech-plan §4 (or vice versa). | v0.0.10. Applies only when paired technical-plan is present. |
| review/feasibility | Subtask LOC estimates that don't match file-path counts. | v0.0.10. Applies when Subtasks contain LOC numbers. |
| review/scope-creep | Subtasks naming future-version work; missing anti-goals in DEEP specs. | v0.0.10. Always applies. |

Plus three meta-codes:

  • review/judge-failed — judge subprocess errored (severity: error). Pipeline continues with other judges.
  • review/section-missing — judge skipped because target section absent (severity: info).
  • review/dep-not-found — declared depends-on dep file missing during CLI dep-load (severity: warning).

Concepts

Cache. cacheKey = sha256(specBytes : ruleSetHash : sortedJudges). ruleSetHash covers each judge's static prompt content, so editing a judge's CRITERION text invalidates the cached entries. The cache stores both success and failure findings; if a judge errored because of a flaky network, re-run with --no-cache once the problem is fixed.
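The key derivation can be sketched with Node's built-in crypto module. This is an approximation under stated assumptions: the README gives only the formula, so the ':' separator bytes, the comma join of judge names, and the hex encoding are guesses, and `cacheKey` here is a hypothetical function rather than a package export.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical sketch of sha256(specBytes : ruleSetHash : sortedJudges).
// Assumptions: ':' as a literal separator, judges joined with ',', hex digest.
function cacheKey(specBytes: Uint8Array, ruleSetHash: string, judges: string[]): string {
  // Sort a copy so the key does not depend on the order judges were requested in.
  const sortedJudges = [...judges].sort().join(',');
  return createHash('sha256')
    .update(specBytes)
    .update(':')
    .update(ruleSetHash)
    .update(':')
    .update(sortedJudges)
    .digest('hex');
}
```

Because the spec bytes and the rule-set hash both feed the digest, changing either one yields a different key, which is what makes re-runs on unchanged specs free and edits to a judge's prompt invalidate old entries.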

JudgeDef shape. Each judge is { code, defaultSeverity, applies(spec, ctx), buildPrompt(spec, sliced, ctx) }. applies() decides whether the judge runs at all (gates on hasTechnicalPlan, hasDod, depsCount); buildPrompt() produces a { criterion, artifact } pair fed to the LLM via JudgeClient.judge.
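To make the shape concrete, here is a sketch of a judge object using local stand-in types. Everything in it is hypothetical: `SpecLike`, `CtxLike`, and `JudgeLike` only approximate the fields named above, the type of `sliced` is assumed to be a string, and the `review/example-dod-operator` code is invented for illustration. Consult the package's exported JudgeDef type for the real signatures.

```typescript
// Local stand-ins; the package's real spec, ctx, and JudgeDef types are not shown here.
interface SpecLike { body: string }
interface CtxLike { hasTechnicalPlan: boolean; hasDod: boolean; depsCount: number }
interface PromptPair { criterion: string; artifact: string }

interface JudgeLike {
  code: string;
  defaultSeverity: 'info' | 'warning' | 'error';
  applies(spec: SpecLike, ctx: CtxLike): boolean;
  // Assumption: `sliced` is the text of the section the judge targets.
  buildPrompt(spec: SpecLike, sliced: string, ctx: CtxLike): PromptPair;
}

// Hypothetical judge: runs only when the spec has a DoD section.
const exampleJudge: JudgeLike = {
  code: 'review/example-dod-operator',
  defaultSeverity: 'warning',
  applies: (_spec, ctx) => ctx.hasDod,
  buildPrompt: (_spec, sliced) => ({
    criterion: 'Every DoD check names a concrete operator or threshold.',
    artifact: sliced,
  }),
};
```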

Subscription auth path. The default claudeCliJudgeClient spawns claude -p --allowedTools '[]' --output-format json per judge. Strict-JSON-in-text parsing with regex-extract fallback for prefixed prose. No ANTHROPIC_API_KEY required — auth comes from the claude CLI's active subscription session.
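The strict-parse-then-extract strategy can be illustrated as follows. This is a sketch of the general technique, not the package's parser: `parseJudgeReply` is a hypothetical name, the regex is one plausible choice, and the `pass`/`reason` fields in the usage note are invented.

```typescript
// Try strict JSON first; on failure, fall back to the span from the first '{'
// to the last '}' in the text, which tolerates prose before or after the object.
function parseJudgeReply(text: string): unknown {
  try {
    return JSON.parse(text.trim());
  } catch {
    const match = text.match(/\{[\s\S]*\}/);
    if (!match) {
      throw new Error('no JSON object found in judge reply');
    }
    // May still throw if the extracted span is not valid JSON.
    return JSON.parse(match[0]);
  }
}
```

For example, a reply such as `Here is my verdict: {"pass": false, "reason": "vague DoD"}` fails the strict parse but succeeds through the fallback.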

Dep-aware judges. cross-doc-consistency (v0.0.7+) and internal-consistency (v0.0.9+) both consume JudgePromptCtx.deps — when scoring spec N with non-empty depends-on, they get each dep's body as available context. Closes the false-positive on /scope-project's shared-constraints-in-first-spec pattern.

Worked example

# Lint first (fast, free); review only on lint-clean specs
pnpm exec factory spec lint docs/specs/foo.md && \
  pnpm exec factory spec review docs/specs/foo.md

# Subset run for a quick sanity check
pnpm exec factory spec review docs/specs/foo.md \
  --judges internal-consistency,dod-precision

# Re-run with cache disabled (e.g., after editing a judge's CRITERION)
pnpm exec factory spec review docs/specs/foo.md --no-cache

Programmatic:

import { runReview, claudeCliJudgeClient } from '@wifo/factory-spec-review';
import { parseSpec } from '@wifo/factory-core';

const spec = parseSpec(await Bun.file('docs/specs/foo.md').text());
const judgeClient = claudeCliJudgeClient();

const findings = await runReview({
  specPath: 'docs/specs/foo.md',
  spec,
  judgeClient,
  cacheDir: './.factory-spec-review-cache',
});

for (const f of findings) {
  console.log(`${f.file}:${f.line ?? '?'}  ${f.severity}  ${f.code}  ${f.message}`);
}

Status

Pre-alpha. The reviewer's exit-1 condition is dormant — all 8 judges ship at severity: 'warning'. APIs may break in point releases until v0.1.0.