# context-probe

CLI for evidence-based design measurement.
## AI-Assisted Design Measurement Platform
Japanese version: README.ja.md
This is a docs-first repository for measuring design quality with evidence, using AI-assisted extraction together with deterministic analyzers. The current implementation covers domain-design and architecture-design evaluation, and the documentation is split so future packs can extend the same measurement model without changing the reading model.
Japanese documents are the primary source of truth. English documents mirror the same structure and document roles.
## Start Reading
- docs/README.md
- docs/guides/user-guide.md
- docs/guides/repo-apply-playbook.md
- docs/concepts/measurement-model.md
- docs/reference/domain-design-metrics.md
- docs/reference/architecture-design-metrics.md
- docs/implementation/runtime-and-commands.md
## Documentation Map
- docs/README.md: documentation index
- docs/guides/user-guide.md: quickest path for first-time CLI use
- docs/guides/repo-apply-playbook.md: practical path for applying `context-probe` to an existing repo
- docs/concepts/: conceptual specifications and measurement model
- docs/reference/: how to interpret metrics and summary scores
- docs/implementation/: how the current CLI computes and reports the metrics
- docs/operations/: policy, CI, and collector operation guidance
- docs/roadmap/: phased rollout and experimental notes
## First Commands To Learn
`score.compute`, `report.generate`, `gate.evaluate`, `review.list_unknowns`
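Of these, `gate.evaluate` is the only one without a dedicated example later in this README. A minimal sketch, assuming it accepts the same `--repo`/`--model`/`--policy`/`--domain` flags as `score.compute` (verify the actual surface with `--help`):

```bash
# hedged sketch: flags assumed to mirror score.compute
npm run dev -- gate.evaluate \
  --repo . \
  --model config/self-measurement/domain-model.yaml \
  --policy fixtures/policies/default.yaml \
  --domain domain_design
```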
## Core Principles
- Use AI as an evidence extractor and ambiguity reducer, not as the scorer.
- Compute scores through fixed formulas and deterministic analysis.
- Attach `evidence`, `confidence`, `unknowns`, and `provenance` to every metric.
- Prefer candidate comparison and time-series comparison over cross-organization ranking.
- Add new evaluation domains as packs on top of a shared measurement foundation.
## Current Implementation
- TypeScript / Node CLI implementation is available.
- Phase 1 capabilities include dependency analysis, boundary-leak detection, evolutionary locality, score computation, reporting, and gate evaluation.
- Phase 2 entry points include external CLI extractors for `doc.extract_*`, evidence-backed term links via `trace.*`, and review-log support via `review.resolve`.
- Pack boundaries for `domain_design` and `architecture_design` are already present for future expansion.
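Of those Phase 2 entry points, only `doc.extract_*` is demonstrated below. A hedged sketch of the trace surface, assuming `trace.link_terms` takes the same `--repo` and `--docs-root` flags as the extract commands (check `--help` for the real flag set):

```bash
# hedged sketch: flag names assumed from the other commands in this README
npm run dev -- trace.link_terms \
  --repo . \
  --docs-root docs
```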
## Quick Start
Choose the shortest path for your goal:
- First-time CLI use: docs/guides/user-guide.md
- Apply `context-probe` to an existing repo: docs/guides/repo-apply-playbook.md
- Run and maintain this repository's self-measurement: docs/operations/self-measurement-runbook.md
```bash
npm install
npm run dev -- --help
```

If you want to try the published package entry point:

```bash
npx context-probe --help
```

If you want the compiled CLI as well:

```bash
npm run build
node dist/src/cli.js --help
```

## Scaffold Inputs First
If you do not have a reviewed `--model` or `--constraints` file yet, scaffold one first and inspect `result.yaml`.
```bash
npm run dev -- model.scaffold \
  --repo . \
  --docs-root docs
```

```bash
npm run dev -- constraints.scaffold \
  --repo .
```

`constraints.scaffold` also returns reviewable starter drafts for architecture inputs in `result.drafts`: `scenarioCatalog`, `topologyModel`, and `boundaryMap`. Use those drafts as the starting point for docs-first repos when you need architecture input files before the first scoring run.
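For example, to pull one draft out for review (a sketch, assuming a yq v4 binary and that the scaffold response was saved as result.yaml; inspect the file first, since the exact nesting under `result.drafts` may differ):

```bash
# hypothetical output path; keep drafts separate from curated inputs until reviewed
yq '.result.drafts.scenarioCatalog' result.yaml \
  > config/self-measurement/architecture-scenarios.draft.yaml
```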
Treat every scaffold output as a review-first draft, not as an authoritative input. If you are applying `context-probe` to a new repo, the stable path is (sketched in commands below):

- scaffold `model` and `constraints`
- curate the YAML you actually want to keep
- run a starter `score.compute`
- add observation snapshots where proxy-heavy metrics matter
- save an assessment note alongside the curated inputs
The full repo-application workflow is documented in docs/guides/repo-apply-playbook.md.
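A condensed, command-level sketch of that path (`path/to/curated-model.yaml` is a placeholder for wherever you keep the curated scaffold output; the observation-snapshot and assessment-note steps are manual review work):

```bash
# steps 1-2: scaffold review-first drafts, then curate the YAML you want to keep
npm run dev -- model.scaffold --repo . --docs-root docs
npm run dev -- constraints.scaffold --repo .

# step 3: starter score against the curated model
npm run dev -- score.compute \
  --repo . \
  --model path/to/curated-model.yaml \
  --policy fixtures/policies/default.yaml \
  --domain domain_design
```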
## Measure Domain Design
```bash
npm run dev -- score.compute \
  --repo . \
  --model config/self-measurement/domain-model.yaml \
  --policy fixtures/policies/default.yaml \
  --domain domain_design
```

Add `--docs-root docs` when you want document-derived metrics included in the run.
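For example, the same run with document-derived metrics included:

```bash
npm run dev -- score.compute \
  --repo . \
  --docs-root docs \
  --model config/self-measurement/domain-model.yaml \
  --policy fixtures/policies/default.yaml \
  --domain domain_design
```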
For large repos, `domain_design` is an authoritative run: keep the full model and docs inputs, let it finish, and read the final status, result, unknowns, diagnostics, and provenance instead of switching to a reduced profile just to shorten the wall time.
When you run the CLI in a TTY, or set `CONTEXT_PROBE_PROGRESS=1`, the scorer emits stage progress lines to stderr while it works. That output is advisory; the final JSON response remains the source of truth.
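For example, to capture progress separately in a non-TTY run (a sketch; it assumes the JSON response is written to stdout as described above, and uses npm's `--silent` to keep npm's own banner out of the captured output):

```bash
CONTEXT_PROBE_PROGRESS=1 npm run --silent dev -- score.compute \
  --repo . \
  --model config/self-measurement/domain-model.yaml \
  --policy fixtures/policies/default.yaml \
  --domain domain_design \
  > result.json 2> progress.log   # advisory progress lines land in progress.log
```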
## Generate a Markdown Report
```bash
npm run dev -- report.generate \
  --repo . \
  --model config/self-measurement/domain-model.yaml \
  --policy fixtures/policies/default.yaml \
  --domain domain_design \
  --format md
```

## Measure Architecture Design
```bash
npm run dev -- score.compute \
  --repo . \
  --constraints config/self-measurement/architecture-constraints.yaml \
  --policy fixtures/policies/default.yaml \
  --domain architecture_design
```

For architecture runs, `--constraints` is required instead of `--model`.
If you want self-measurement with fewer proxy fallbacks in QSF, TIS, OAS, and EES, pass the reviewed supporting inputs from config/self-measurement/ as well.
Refresh the measured and derived architecture snapshots before a self-measurement run:
```bash
npm run self:architecture:refresh
```

Capture the intentional IPS contract baseline separately:

```bash
npm run self:architecture:baseline
```

Audit freshness drift without rewriting snapshots:

```bash
npm run self:architecture:audit
```

Run the CI-shaped local check that combines the advisory freshness audit with a score smoke:

```bash
npm run self:architecture:check
```

For release-time validation and packaging, use docs/operations/release-preflight.md.
```bash
npm run dev -- score.compute \
  --domain architecture_design \
  --repo . \
  --constraints config/self-measurement/architecture-constraints.yaml \
  --complexity-export config/self-measurement/architecture-complexity-export.yaml \
  --boundary-map config/self-measurement/architecture-boundary-map.yaml \
  --contract-baseline config/self-measurement/architecture-contract-baseline.yaml \
  --scenario-catalog config/self-measurement/architecture-scenarios.yaml \
  --scenario-observations config/self-measurement/architecture-scenario-observations.yaml \
  --topology-model config/self-measurement/architecture-topology.yaml \
  --runtime-observations config/self-measurement/architecture-runtime-observations.yaml \
  --telemetry-observations config/self-measurement/architecture-telemetry-observations.yaml \
  --pattern-runtime-observations config/self-measurement/architecture-pattern-runtime-observations.yaml \
  --delivery-observations config/self-measurement/architecture-delivery-observations.yaml \
  --policy fixtures/policies/default.yaml
```

These are reviewable snapshots, not live collectors. `scenario-observations` comes from local benchmarks; the telemetry, pattern-runtime, and delivery observations, plus the raw architecture-complexity-snapshot.yaml, remain curated observation inputs; `complexity-export` is a derived artifact built from that raw complexity snapshot.

- `npm run self:architecture:refresh` refreshes the measured scenario observations and the derived boundary map.
- `npm run self:architecture:complexity` regenerates architecture-complexity-export.yaml from the curated complexity snapshot.
- `npm run self:architecture:baseline` captures the current contract surface into a reviewable IPS baseline; it intentionally stays outside refresh so that baseline deltas remain meaningful.
- `npm run self:architecture:audit` is the CI-friendly advisory check for freshness drift.
- `npm run self:architecture:check` is the local/CI operational check that runs that audit plus a score smoke.
The expected maintenance loop is:
```bash
npm run self:architecture:refresh
npm run self:architecture:complexity
npm run self:architecture:baseline   # only when you intentionally want a new IPS comparison point
npm run self:architecture:check
```

Coverage is also part of the quality gate now:

```bash
npm run test:coverage
```

The operational sequence is summarized in docs/operations/self-measurement-runbook.md.
For this repository specifically, some architecture unknowns are still expected limitations of a small CLI codebase: ALR, FCC, SICR, and SLA remain evidence-limited, and PCS remains a proxy composite. Treat those as self-measurement caveats, not immediate defects.
## Ingest Brownfield Evidence Through Source Config
```bash
npm run dev -- score.compute \
  --domain architecture_design \
  --repo fixtures/validation/scoring/qsf/repo \
  --constraints fixtures/validation/scoring/qsf/constraints.yaml \
  --policy fixtures/policies/default.yaml \
  --contract-baseline-source fixtures/examples/architecture-sources/contract-baseline-source.file.yaml \
  --scenario-catalog fixtures/validation/scoring/qsf/scenarios.yaml \
  --scenario-observation-source fixtures/examples/architecture-sources/scenario-observation-source.command.yaml \
  --telemetry-source fixtures/examples/architecture-sources/telemetry-source.command.yaml \
  --telemetry-normalization-profile fixtures/validation/scoring/oas/raw-normalization-profile.yaml \
  --complexity-source fixtures/examples/architecture-sources/complexity-source.command.yaml \
  --profile layered
```

Collector and source-config details are documented in docs/operations/architecture-source-collectors.md.
## List Unknowns That Need Human Review
```bash
npm run dev -- review.list_unknowns \
  --repo . \
  --model config/self-measurement/domain-model.yaml \
  --policy fixtures/policies/default.yaml \
  --domain domain_design
```

## Advanced: Extract Glossary Terms With Codex CLI
```bash
npm run dev -- doc.extract_glossary \
  --docs-root docs \
  --extractor cli \
  --provider codex
```

## Advanced: Re-run Extraction With a Review Log
```bash
npm run dev -- doc.extract_glossary \
  --docs-root docs \
  --extractor cli \
  --provider claude \
  --review-log path/to/review-log.json \
  --apply-review-log
```

## Measure This Repository
Minimal self-measurement definitions are stored in config/self-measurement/domain-model.yaml, config/self-measurement/architecture-constraints.yaml, config/self-measurement/architecture-complexity-snapshot.yaml, and config/self-measurement/architecture-complexity-export.yaml.
### 1. Enable Git History
ELS reads Git history. In an uninitialized environment it will fall back to warnings and low confidence. If you have not initialized Git locally yet, run:
```bash
git init
git add .
git -c user.name="Context Probe" -c user.email="[email protected]" commit -m "chore: initialize context-probe"
```

### 2. Compute Domain-Design Score
```bash
npm run dev -- score.compute \
  --domain domain_design \
  --repo . \
  --model config/self-measurement/domain-model.yaml \
  --policy fixtures/policies/default.yaml
```

### 3. Compute Architecture-Design Score
```bash
npm run dev -- score.compute \
  --domain architecture_design \
  --repo . \
  --constraints config/self-measurement/architecture-constraints.yaml \
  --complexity-export config/self-measurement/architecture-complexity-export.yaml \
  --boundary-map config/self-measurement/architecture-boundary-map.yaml \
  --scenario-catalog config/self-measurement/architecture-scenarios.yaml \
  --scenario-observations config/self-measurement/architecture-scenario-observations.yaml \
  --topology-model config/self-measurement/architecture-topology.yaml \
  --runtime-observations config/self-measurement/architecture-runtime-observations.yaml \
  --telemetry-observations config/self-measurement/architecture-telemetry-observations.yaml \
  --pattern-runtime-observations config/self-measurement/architecture-pattern-runtime-observations.yaml \
  --delivery-observations config/self-measurement/architecture-delivery-observations.yaml \
  --policy fixtures/policies/default.yaml
```

Without those supporting files, architecture self-measurement will still run, but many metrics fall back to neutral or proxy behavior.
### 4. Generate a Markdown Report
```bash
npm run dev -- report.generate \
  --domain domain_design \
  --repo . \
  --model config/self-measurement/domain-model.yaml \
  --policy fixtures/policies/default.yaml \
  --format md
```

## Verification
```bash
npm run check
npm test
```

`npm test` includes extraction-quality regression checks backed by the curated golden corpus under fixtures/validation/extraction/. Those checks exercise the existing CLI commands such as `doc.extract_*`, `trace.link_terms`, and `review.list_unknowns` with `must_include`, `must_exclude`, `must_link_to_code`, and `max_review_items`.
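To iterate on just those extraction-quality checks, a path filter is usually enough (a sketch; whether the filter is honored depends on the test runner this repo uses, so adjust to its conventions):

```bash
# assumption: the runner accepts positional path filters passed through npm
npm test -- fixtures/validation/extraction
```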
