ushman-corpus-runner
v0.4.0
Build, run, and score synthetic fixture corpora that calibrate the ushman pipeline. Sister package to ushman-corpus (the data).
This package is the runner; `ushman-corpus` is the read-only fixture dataset.
Runtime
- Bun is required at runtime. The package uses `Bun.file`, `Bun.write`, and `Bun.YAML`.
- The standalone CLI is useful for `init`, `build`, `report`, `validate`, `pin`, `prune`, and `xv-check`. `measure` is intentionally headless: it only works when a consumer injects a `PipelineRunner`.
Install
```sh
bun add ushman-corpus-runner
```

The npm `bin` entry (`ushman-corpus`) is a launcher inside this package; despite the shared name, it is not a path to the `ushman-corpus` dataset. Fixture bytes live in the separate `ushman-corpus` git repo.
Corpus checkout
This package does not bundle fixtures. Resolve the dataset in one of these ways:
- Published default: if you omit `ushmanCorpus` in the workspace `package.json`, commands fetch `https://github.com/ragaeeb/ushman-corpus.git` at branch `v0.5.0` into `~/.cache/ushman/corpus/`.
- Explicit init: `ushman-corpus init <workspace> --pin=v0.5.0` writes a pin and populates the cache (add `--corpus=<path>` for a local clone).
- Monorepo dev: with `../ushman-corpus` beside the workspace or this package, a sibling checkout is detected automatically.
- Env override: set `USHMAN_CORPUS_DIR=/absolute/path/to/ushman-corpus` for catalogue validation and default pin resolution.
Quick Start
```sh
ushman-corpus init ./calibration-workspace
ushman-corpus build ./calibration-workspace --fixtures=f001-synthetic
```

Local clone instead of the published git default:

```sh
ushman-corpus init <workspace> --corpus=/absolute/path/to/ushman-corpus --pin=local-2026-05-08
ushman-corpus build <workspace> --fixtures=f001-synthetic
ushman-corpus report <workspace> --format=md
ushman-corpus validate --catalogue=/absolute/path/to/ushman-corpus --retros-dir=/absolute/path/to/ushman/reviews/retros --format=md --strict
ushman-corpus xv-check <workspace>
ushman-corpus prune --keep-latest=2
```

CLI
```sh
ushman-corpus init <workspace> [--corpus=<path>] [--pin=<commit>] [--fetch=git|tarball] [--force]
ushman-corpus build <workspace> [--fixture=<id>] [--fixtures=<id,id>] [--tier=<T0,T1>] [--parallel=<n>]
ushman-corpus measure <workspace> [--fixture=<id>] [--fixtures=<id,id>] [--pipeline=<stage>] [--mode=jaccard|tree-edit] [--tier=<T0,T1>] [--parallel=<n>] [--no-stability-check] [--include-held-out|--xv-allow] [--xv-justification=<reason>]
ushman-corpus report <workspace> [--format=md|html|json] [--baseline=<path>] [--xv-allow] [--xv-justification=<reason>]
ushman-corpus validate [--catalogue=<path>] [--retros-dir=<path>] [--format=md|json|both] [--evidence-threshold=<n>] [--baseline=<path>] [--strict]
ushman-corpus pin <workspace> --commit=<sha-or-label> [--fetch=git|tarball] [--repo=<path>]
ushman-corpus xv-check <workspace>
ushman-corpus prune [--keep-latest=<n>] [--keep-tags=<tag,tag>]
```

`--fixture` and `--fixtures` are both accepted for compatibility with the extraction docs. `--format` is the canonical report flag; `--out` is still accepted as an alias.
build and measure both accept --parallel=<n>. Measurement work fans out across fixtures, but the returned and printed summaries stay in fixture selection order. --parallel must be an integer between 1 and the runtime cap derived from local CPU parallelism.
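The two guarantees above (bounded fan-out, output in selection order) can be sketched as follows. This is an illustration, not the package's code: the helper names are hypothetical, and using `os.availableParallelism()` as the cap is an assumption about how "local CPU parallelism" is derived.

```typescript
import { availableParallelism } from "node:os"

/** Parse a --parallel value; the default cap is an assumed stand-in for the runner's. */
function parseParallelFlag(raw: string, cap: number = availableParallelism()): number {
  const n = Number(raw)
  if (!Number.isInteger(n) || n < 1 || n > cap) {
    throw new RangeError(`--parallel must be an integer between 1 and ${cap}, got "${raw}"`)
  }
  return n
}

/** Run `worker` over `items` with at most `limit` in flight; results keep input order. */
async function mapInSelectionOrder<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length)
  let next = 0
  // Each lane claims the next unprocessed index synchronously, then awaits it.
  const lane = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++
      // Writing by index (not push) keeps selection order regardless of finish order.
      results[index] = await worker(items[index], index)
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, lane))
  return results
}
```

With `--parallel=3`, a slow first fixture still lands first in the returned array even if later fixtures finish earlier.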
Catalogue Validation
- `validate` compares the published v4 catalogue artifacts against every retro in `reviews/retros/*.md`.
- `--catalogue` accepts a single artifact file, a family directory such as `sdk-catalogue/`, or the `ushman-corpus` repo root. The latest versioned artifact in each family is resolved automatically.
- When validating `sdk-catalogue` in isolation, the runner still resolves sibling `runtime-fingerprint` rules for runtime-gap suppression. If no sibling runtime catalogue can be found, validation fails fast instead of silently reporting inaccurate SDK gaps.
- `--retros-dir` should point at the `retros/` directory itself. The validator reads `*.md` files and reports missing `## 3` / `## 5` sections as warnings.
- `runtime-fingerprint` rules can declare `evidenceRetros` per rule. The validator counts those citations, reports stale evidence, and treats runtime-family evidence gaps the same way it treats `sdk-catalogue` and `vendor-globals`.
- `--format=both` is the default. It prints a markdown report followed by a JSON payload. Use `--format=json` for CI or other machine readers.
- `--evidence-threshold` defaults to `2`. Entries below that distinct-retro count are reported as low evidence, but the command still exits `0` as long as the inputs are readable and schema-valid.
- `--strict` returns a non-zero exit code whenever the report status is not `pass`.
- `--baseline` compares the current JSON report against a previous JSON report and returns non-zero if counts regress or matched coverage drops.
- Default path resolution is: explicit flag, then `USHMAN_CORPUS_DIR` / `USHMAN_RETROS_DIR` (or `USHMAN_REPO_ROOT/reviews/retros` for retros), then the local sibling checkout layout if it exists.
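A minimal sketch of the retros-directory precedence and the exit-code rules listed above. Both helpers are hypothetical; the sibling path is passed in because the real sibling layout is not documented here, and the `warn`/`fail` status names and the value `1` for "non-zero" are assumptions.

```typescript
import { existsSync } from "node:fs"
import { join } from "node:path"

/** Explicit flag, then USHMAN_RETROS_DIR, then USHMAN_REPO_ROOT/reviews/retros, then a sibling checkout. */
function resolveRetrosDir(
  flag: string | undefined,
  env: Record<string, string | undefined>,
  siblingCandidate: string, // hypothetical: caller supplies the sibling layout
  exists: (path: string) => boolean = existsSync,
): string | undefined {
  if (flag) return flag
  if (env.USHMAN_RETROS_DIR) return env.USHMAN_RETROS_DIR
  if (env.USHMAN_REPO_ROOT) return join(env.USHMAN_REPO_ROOT, "reviews", "retros")
  return exists(siblingCandidate) ? siblingCandidate : undefined
}

type ReportStatus = "pass" | "warn" | "fail" // assumed status values
type ExitInputs = { status: ReportStatus; inputsValid: boolean; strict: boolean; baselineRegressed: boolean }

function validateExitCode({ status, inputsValid, strict, baselineRegressed }: ExitInputs): number {
  if (!inputsValid) return 1 // unreadable or schema-invalid inputs
  if (baselineRegressed) return 1 // --baseline: counts regressed or matched coverage dropped
  if (strict && status !== "pass") return 1 // --strict: anything other than pass fails
  return 0 // low-evidence entries alone do not change the exit code
}
```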
Held-Out Discipline
- Held-out fixtures are never scored implicitly.
- To score or report on held-out fixtures, pass both `--xv-allow` and `--xv-justification=<reason>`.
- Measurement runs persist that justification in `measurements/<fixture>/xv-override.json` and append an audit line to `measurements/.xv-allow.log`.
- Under parallel measurement runs, the shared `.xv-allow.log` is appended after worker completion, and the recorded entries preserve fixture selection order.
- `xv-check` validates the registry and manifests and scans workspace sources, docs, prompts, and briefs for held-out references.
PipelineRunner
measure requires an injected PipelineRunner:
```typescript
// `PipelineStage` names a pipeline stage; `PIPELINE_STAGES` lists the stages.
type PipelineRunner = {
  readonly run: (input: {
    fixtureRoot: string
    outputRoot: string
    timeoutMs?: number
  }) => Promise<{
    cleanedWorkspaceRoot: string
    elapsedMs: number
    perStageWallClockMs?: Partial<Record<PipelineStage, number>>
  }>
}
```

The runner must write stage outputs beneath:
```
<outputRoot>/stages/01-seed
<outputRoot>/stages/02-vendor-extract
<outputRoot>/stages/03-clean
```

Fixtures that declare `vendorExpectations` are also scored against:

```
<outputRoot>/stages/02-vendor-extract/vendor-boundaries.json
```

If that artifact is missing, malformed, or omits an expected vendor, the measurement is recorded as a scoring failure.
When measure --parallel is greater than 1, the injected runner must tolerate concurrent run() calls for different fixtures.
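A do-nothing runner that satisfies the contract is handy for wiring tests. This sketch mirrors the documented type locally so it runs on its own; the stage identifiers are guesses modeled on the `stage-02-vendor-extract` name used under Vendor Expectations, and treating `03-clean` as `cleanedWorkspaceRoot` is an assumption. In real code, import the package's `PipelineStage` type and `PIPELINE_STAGE_OUTPUT_DIRECTORIES` instead.

```typescript
import { mkdir } from "node:fs/promises"
import { join } from "node:path"

// Hypothetical local mirrors of the package's types and stage directories.
type PipelineStage = "stage-01-seed" | "stage-02-vendor-extract" | "stage-03-clean"
type PipelineRunner = {
  readonly run: (input: { fixtureRoot: string; outputRoot: string; timeoutMs?: number }) => Promise<{
    cleanedWorkspaceRoot: string
    elapsedMs: number
    perStageWallClockMs?: Partial<Record<PipelineStage, number>>
  }>
}

const STAGE_OUTPUT_DIRS: Record<PipelineStage, string> = {
  "stage-01-seed": "stages/01-seed",
  "stage-02-vendor-extract": "stages/02-vendor-extract",
  "stage-03-clean": "stages/03-clean",
}

/** Creates the expected stage directories and nothing else. */
const stubRunner: PipelineRunner = {
  run: async ({ outputRoot }) => {
    const started = Date.now()
    const perStageWallClockMs: Partial<Record<PipelineStage, number>> = {}
    for (const stage of Object.keys(STAGE_OUTPUT_DIRS) as PipelineStage[]) {
      const t0 = Date.now()
      // Writes stay under this call's outputRoot, so concurrent run() calls
      // for different fixtures never touch the same paths.
      await mkdir(join(outputRoot, STAGE_OUTPUT_DIRS[stage]), { recursive: true })
      perStageWallClockMs[stage] = Date.now() - t0
    }
    return {
      cleanedWorkspaceRoot: join(outputRoot, STAGE_OUTPUT_DIRS["stage-03-clean"]),
      elapsedMs: Date.now() - started,
      perStageWallClockMs,
    }
  },
}
```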
Execution Results
- `buildFixture`, `runFixture`, `runCorpusMeasurement`, and `runCatalogueValidation` all return discriminated unions with stable `issues[].code` values plus human-readable `issues[].message` text.
- `runCorpusMeasurement` returns `measurement-run-aborted` only for setup/preflight problems. A completed run returns `measurement-run-complete` and then reports per-fixture outcomes under `outcomes`.
- Score failures that still write `measurements/<fixture>/calibration-result.json` are represented only in `failures`, with the saved artifact exposed on `persistedResult`. They do not also appear in `measurements`.
- `runFixture` distinguishes setup problems (`run-setup-failed`) from pipeline execution crashes (`run-pipeline-failed`).
- `runCorpusMeasurement(...).ok` means "no fixture failures occurred." Skipped fixtures are tracked separately under `skipped`.
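Consuming the measurement result is a plain exhaustive `switch`. The local type below is an assumption built from the names in the list above: the discriminant field (`kind`), placing `measurements`, `failures`, and `skipped` under `outcomes`, and the element shapes are guesses, so check them against the package's `RunCorpusMeasurementResult` before reuse.

```typescript
// Hypothetical local shape; only the tag strings and field names come from this README.
type Issue = { code: string; message: string }
type MeasurementResult =
  | { kind: "measurement-run-aborted"; ok: false; issues: Issue[] }
  | {
      kind: "measurement-run-complete"
      ok: boolean
      outcomes: {
        measurements: { fixtureId: string }[]
        failures: { fixtureId: string; issues: Issue[]; persistedResult?: unknown }[]
        skipped: { fixtureId: string }[]
      }
    }

function summarizeMeasurement(result: MeasurementResult): string {
  switch (result.kind) {
    case "measurement-run-aborted":
      // Setup/preflight problem: no fixture outcomes exist.
      return `aborted: ${result.issues.map((issue) => issue.code).join(", ")}`
    case "measurement-run-complete": {
      const { measurements, failures, skipped } = result.outcomes
      // A score failure with a saved artifact appears only in `failures`,
      // so these three counts never include the same fixture twice.
      return `${result.ok ? "ok" : "failed"}: ${measurements.length} measured, ${failures.length} failed, ${skipped.length} skipped`
    }
    default: {
      const unreachable: never = result
      return unreachable
    }
  }
}
```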
Primary issue-code families:
- Build: `build-command-failed`, `build-io-error`, `build-lockfile-missing`, `build-stability-hash-mismatch`
- Run: `run-fixture-locked`, `run-setup-failed`, `run-pipeline-failed`
- Measure: `measure-held-out-requires-override`, `measure-pipeline-runner-missing`, `measure-setup-failed`
- Score: `score-unexpected-error`, `score-vendor-missing`, `score-vendor-artifact-invalid`, `score-vendor-import-plan-missing`
- Validate: `validate-input-invalid`, `validate-runtime-rules-missing`, `validate-baseline-regression`, `validate-strict-status`, `validate-report-build-failed`
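Every code listed above begins with its family name and a hyphen, so a consumer can group issues by prefix. The constant and helpers below are a local sketch, not package exports; the codes are copied from the list.

```typescript
const ISSUE_CODES = [
  "build-command-failed", "build-io-error", "build-lockfile-missing", "build-stability-hash-mismatch",
  "run-fixture-locked", "run-setup-failed", "run-pipeline-failed",
  "measure-held-out-requires-override", "measure-pipeline-runner-missing", "measure-setup-failed",
  "score-unexpected-error", "score-vendor-missing", "score-vendor-artifact-invalid", "score-vendor-import-plan-missing",
  "validate-input-invalid", "validate-runtime-rules-missing", "validate-baseline-regression",
  "validate-strict-status", "validate-report-build-failed",
] as const

type IssueCode = (typeof ISSUE_CODES)[number]
type IssueFamily = "build" | "run" | "measure" | "score" | "validate"

/** Narrow an arbitrary string (e.g. from JSON output) to a known code. */
function isKnownIssueCode(code: string): code is IssueCode {
  return (ISSUE_CODES as readonly string[]).includes(code)
}

/** The family is everything before the first hyphen. */
function issueFamily(code: IssueCode): IssueFamily {
  return code.slice(0, code.indexOf("-")) as IssueFamily
}
```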
Public API
```typescript
runCorpusMeasurement({
  corpusRoot,
  corpusCommit,
  stages,
  pipelineRunner,
  fixtureIds,
  noStabilityCheck,
  parallel,
  tiers,
  xvAllowJustification
}): Promise<RunCorpusMeasurementResult>
runCatalogueValidation({
  cataloguePath,
  retrosDir,
  baselineReport,
  evidenceThreshold,
  strict
}): Promise<CatalogueValidationResult>
scoreFixture(opts): Promise<ScoreFixtureResult>
loadCorpusManifest(corpusRoot): Promise<CorpusManifest>
loadFixtureManifest(fixtureDir): Promise<FixtureManifest>
listFixtures(corpusRoot, filters?): Promise<readonly FixtureManifest[]>
buildCalibrationAggregate(opts): CalibrationAggregate
renderCalibrationAggregate(aggregate, format): string
validateCatalogues(opts): Promise<CatalogueValidationReport>
findCatalogueValidationRegressions(current, baseline): CatalogueValidationRegression[]
renderCatalogueValidationReport(report, format): string
PIPELINE_STAGES
PIPELINE_STAGE_OUTPUT_DIRECTORIES
```

`runCorpusMeasurement()` defaults `noStabilityCheck` to `false` and validates `parallel` against the same runtime cap as the CLI.
Scorers
Identifier Match
- Input: reference source tree and recovered source tree, optionally with a precomputed file-structure mapping.
- Output: `0..1`, plus per-file alignment counts and type-only exclusions.
- Known failure modes: parse failures degrade to empty token streams; anonymous default exports align approximately.
File Structure Match
- Input: reference source tree and recovered source tree.
- Output: `0..1`, plus matched file mapping and unmapped file lists.
- Known failure modes: split/merged files are matched heuristically with a fixed `0.5` similarity threshold.
Semantic Distance
- Input: reference source tree and recovered source tree, optionally with a precomputed file-structure result.
- Output: `0..1` similarity score in `jaccard` or `tree-edit` mode.
- Known failure modes: syntax errors degrade to empty ASTs; `tree-edit` is an AST-sequence approximation rather than a full tree-edit algorithm.
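For intuition about the two modes, here are the textbook measures over flattened AST node-kind sequences: set Jaccard similarity, and a normalized Levenshtein distance as a sequence stand-in for tree edit distance. This is a sketch, not the scorer's implementation; its tokenization is not documented here, and scoring two empty inputs as `1` is an assumed convention.

```typescript
/** |A ∩ B| / |A ∪ B| over distinct node kinds (1 when both are empty, by assumption). */
function jaccardSimilarity(a: readonly string[], b: readonly string[]): number {
  const setA = new Set(a)
  const setB = new Set(b)
  if (setA.size === 0 && setB.size === 0) return 1
  let intersection = 0
  for (const kind of setA) if (setB.has(kind)) intersection++
  return intersection / (setA.size + setB.size - intersection)
}

/** 1 - levenshtein(a, b) / max(|a|, |b|) over node-kind sequences. */
function sequenceEditSimilarity(a: readonly string[], b: readonly string[]): number {
  const longest = Math.max(a.length, b.length)
  if (longest === 0) return 1
  // Single-row dynamic programming: prev[j] is the distance between a[0..i) and b[0..j).
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
    }
    prev = curr
  }
  return 1 - prev[b.length] / longest
}
```

Because both inputs are flat sequences, reordering siblings or moving a subtree costs several edits here, which is one reason a sequence approximation diverges from a true tree edit distance.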
Vendor Expectations
- Input: fixture-manifest `vendorExpectations` plus `stages/02-vendor-extract/vendor-boundaries.json`.
- Output: pass/fail validation metadata listing expected vendors, detected vendors, and any missing vendor boundaries.
- Known failure modes: a missing or malformed `vendor-boundaries.json` fails every declared expectation; the check is skipped when the `stage-02-vendor-extract` stage is not part of the run.
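The check above reduces to a set difference between expected and detected vendors. In this sketch, the `{ vendors: [{ name }] }` shape for `vendor-boundaries.json` is a hypothetical stand-in for the real schema, and the function and result names are illustrative only.

```typescript
type VendorCheck =
  | { status: "skipped" }
  | { status: "pass" | "fail"; expected: string[]; detected: string[]; missing: string[] }

function checkVendorExpectations(
  expected: readonly string[],
  stagesInRun: readonly string[],
  artifactText: string | undefined, // undefined when the file does not exist
): VendorCheck {
  if (!stagesInRun.includes("stage-02-vendor-extract")) return { status: "skipped" }
  let detected: string[] = []
  try {
    // Hypothetical schema: { vendors: [{ name: string }] }
    const parsed = JSON.parse(artifactText ?? "") as { vendors?: { name?: unknown }[] }
    if (!Array.isArray(parsed.vendors)) throw new Error("no vendors array")
    detected = parsed.vendors.flatMap((v) => (typeof v?.name === "string" ? [v.name] : []))
  } catch {
    // Missing or malformed artifact: nothing is detected, so every expectation fails.
    detected = []
  }
  const found = new Set(detected)
  const missing = expected.filter((name) => !found.has(name))
  return { status: missing.length === 0 ? "pass" : "fail", expected: [...expected], detected, missing }
}
```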
Aggregate Report
- Input: `CalibrationResult` fixtures plus an optional baseline aggregate.
- Output: per-tier percentiles, pass counts, regressions, improvements, and any vendor-expectation failures captured in the fixture results.
- Known failure modes: `passed` uses a report-only threshold of `0.8` across all three latest-stage scores.
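A sketch of the report-only `passed` flag and a per-tier summary. Only the `0.8` threshold and the "all three latest-stage scores" rule come from this README; the score field names, the inclusive `>=` comparison, and the nearest-rank percentile method are assumptions.

```typescript
type LatestStageScores = { identifierMatch: number; fileStructureMatch: number; semanticSimilarity: number }

const REPORT_PASS_THRESHOLD = 0.8

/** Passed only when every one of the three scores clears the threshold. */
function reportPassed(s: LatestStageScores): boolean {
  return [s.identifierMatch, s.fileStructureMatch, s.semanticSimilarity].every((v) => v >= REPORT_PASS_THRESHOLD)
}

/** Nearest-rank percentile, p in (0, 100], of a non-empty list. */
function percentile(values: readonly number[], p: number): number {
  if (values.length === 0) throw new RangeError("percentile of an empty list")
  const sorted = [...values].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length)
  return sorted[Math.max(rank, 1) - 1]
}

function passCountsByTier(
  results: readonly { tier: string; scores: LatestStageScores }[],
): Map<string, { passed: number; total: number }> {
  const byTier = new Map<string, { passed: number; total: number }>()
  for (const { tier, scores } of results) {
    const entry = byTier.get(tier) ?? { passed: 0, total: 0 }
    entry.total++
    if (reportPassed(scores)) entry.passed++
    byTier.set(tier, entry)
  }
  return byTier
}
```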
Where This Fits
| Aspect | Details |
|---|---|
| Sister to | ushman-corpus |
| Headless by design | The orchestrator owns the actual capture session and implements PipelineRunner |
| Scope | Fixture build orchestration, pipeline measurement, scoring, and aggregate reporting |
