goldenanalysis
v0.1.0
Read-only cross-cutting analysis, metrics, and reporting for the Golden Suite. TypeScript port of the goldenanalysis Python library.
goldenanalysis (TypeScript)
Read-only cross-cutting analysis, metrics, and reporting for the Golden Suite — the
TypeScript port of the Python goldenanalysis package.
Phase 3a ships the generic frame path with cross-surface parity; Phase 3b adds the cross-run layer (ReportHistory, regression detection, narrative, and the trend/regressions CLI); Phase 3c adds the suite analyzers (match.rates / cluster.distribution / quality.rollup) plus the analyzeMatch / analyzePipeline entry points.
Quickstart
import { analyze, toMarkdown } from "goldenanalysis";
const rows = [
{ name: "Alice", email: "[email protected]", age: 30 },
{ name: "Alice", email: "[email protected]", age: 30 },
{ name: "Bob", email: null, age: 41 },
];
const report = analyze(rows, ["frame.summary"], { dataset: "customers" });
console.log(toMarkdown(report));
report.metrics; // row_count, column_count, null_ratio_mean, duplicate_row_ratio, memory_bytes
CLI:
goldenanalysis-js report customers.json --format markdown # or a .csv
goldenanalysis-js report customers.csv --analyzers frame.summary
Suite analyzers
Beyond the generic frame path, three analyzers read the artifacts other Golden Suite
stages produce. They consume the same snake_case artifact keys the Python sibling
reads (scored_pairs / match_stats / clusters / findings / manifest /
recall_certificate), so a serialized Python PipeResult.artifacts feeds the TS
analyzers identically.
| Analyzer | Consumes | Emits |
|---|---|---|
| match.rates | scored_pairs, match_stats (+ match_threshold, recall_certificate) | match.pair_count / match_rate / threshold / recall_estimate / recall_safe_bound / mean_pair_score + a score_histogram |
| cluster.distribution | clusters (+ match_stats) | cluster.count / record_count / singleton_ratio / size_p50 / size_p95 / size_max / reduction_ratio + a cluster_size_histogram |
| quality.rollup | findings, manifest (+ profile) | quality.findings_total / columns_with_findings / score + flow.rows_changed / rules_fired + a findings_by_class |
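The cluster.distribution row can be made concrete with a small, self-contained sketch. It is not the library's implementation: clusterDistribution is a hypothetical helper that takes each cluster as a list of record ids, and the definitions it uses (nearest-rank percentiles, singleton_ratio as singleton clusters over all clusters, reduction_ratio = 1 - clusters / records) are assumptions.

```typescript
// Hypothetical sketch, NOT the goldenanalysis implementation. The metric
// definitions (nearest-rank percentiles, singleton_ratio over clusters,
// reduction_ratio = 1 - clusters / records) are assumptions.
interface ClusterDistribution {
  count: number;
  record_count: number;
  singleton_ratio: number;
  size_p50: number;
  size_p95: number;
  size_max: number;
  reduction_ratio: number;
  cluster_size_histogram: Record<number, number>; // cluster size -> number of clusters
}

// Nearest-rank percentile over an ascending-sorted array (assumed definition).
function nearestRank(sorted: number[], p: number): number {
  if (sorted.length === 0) return 0;
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Summarize clusters, each given as the list of its member record ids.
function clusterDistribution(clusters: string[][]): ClusterDistribution {
  const sizes = clusters.map((c) => c.length).sort((a, b) => a - b);
  const count = sizes.length;
  const recordCount = sizes.reduce((sum, s) => sum + s, 0);

  // Histogram of cluster sizes; histogram[1] is the number of singletons.
  const histogram: Record<number, number> = {};
  for (const s of sizes) {
    histogram[s] = (histogram[s] ?? 0) + 1;
  }

  return {
    count,
    record_count: recordCount,
    singleton_ratio: count === 0 ? 0 : (histogram[1] ?? 0) / count,
    size_p50: nearestRank(sizes, 50),
    size_p95: nearestRank(sizes, 95),
    size_max: count === 0 ? 0 : sizes[count - 1],
    reduction_ratio: recordCount === 0 ? 0 : 1 - count / recordCount,
    cluster_size_histogram: histogram,
  };
}
```

For four clusters of sizes 1, 1, 2 and 3 (seven records), this sketch reports singleton_ratio 0.5, size_p50 1, size_p95 3, size_max 3 and reduction_ratio 1 - 4/7, about 0.43.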
Two suite entry points assemble a report directly from a producer's result object (duck-typed — no goldenmatch/goldenpipe import):
import { analyzeMatch, analyzePipeline } from "goldenanalysis";
// A GoldenMatch DedupeResult-like object -> match.rates + cluster.distribution.
const report = analyzeMatch(dedupeResult, {
dataset: "customers",
certificate: { estimate: 0.94, safe_bound: 0.89 }, // optional recall cert
});
// A GoldenPipe PipeResult-like object -> fans out to every analyzer whose
// consumed artifacts are present in result.artifacts (frame.summary is skipped —
// a PipeResult exposes no `frame`).
const pipeReport = analyzePipeline(pipeResult);
The match/flow/check/pipe artifact adapters (matchArtifacts / flowArtifacts /
checkArtifacts / pipeArtifacts) are also exported for building an AnalyzerInput
by hand. (The goldencheck load(df) adapter variant is deferred — TS has no
goldencheck dependency yet.)
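The fan-out rule described for analyzePipeline can be sketched as a filter over artifact keys. This is an illustration under assumptions, not the package's code: selectAnalyzers and REQUIRED_ARTIFACTS are hypothetical names, the key lists copy the non-parenthesized entries of the Consumes column in the table above, and treating every one of them as required is a guess. frame.summary is left out because a PipeResult exposes no frame.

```typescript
// Hypothetical sketch of the analyzePipeline fan-out: pick every analyzer whose
// consumed artifacts are all present. Key lists mirror the README's "Consumes"
// column; requiring ALL listed keys is an assumption.
const REQUIRED_ARTIFACTS: Record<string, readonly string[]> = {
  "match.rates": ["scored_pairs", "match_stats"],
  "cluster.distribution": ["clusters"],
  "quality.rollup": ["findings", "manifest"],
};

// A key counts as present when it exists and is not null/undefined
// (an empty array or object still counts as present).
function selectAnalyzers(artifacts: Record<string, unknown>): string[] {
  return Object.entries(REQUIRED_ARTIFACTS)
    .filter(([, keys]) => keys.every((k) => artifacts[k] !== undefined && artifacts[k] !== null))
    .map(([name]) => name);
}
```

A serialized PipeResult.artifacts carrying scored_pairs, match_stats and clusters but no findings would select match.rates and cluster.distribution only.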
Cross-run (trend + regressions)
ReportHistory is an append-only JSONL log of AnalysisReports, keyed by
(analysisName, dataset, runId); when a key repeats, the last entry wins. It powers trend series and
direction-aware regression detection across runs. The pure decision logic
(detectRegressions / buildTrend / buildNarrative + the models) is edge-safe in
goldenanalysis/core; the file-backed store needs node:fs and lives in
goldenanalysis/node.
import { ReportHistory } from "goldenanalysis/node";
const hist = new ReportHistory({ path: ".golden/analysis.jsonl" });
hist.append(report); // after each run
// A per-metric 2% gate on recall catches a drop a global 10% gate would miss.
const flagged = hist.detectRegressions("customers", {
baseline: "rolling_median", // or "previous" / "last_known_good" / a pinned runId
policy: { defaultPct: 10, perMetric: { "match.recall_safe_bound": 2 } },
});
const series = hist.trend("cluster.singleton_ratio", "customers", { lastN: 30 });
Regression flags are direction-aware: a higher_better metric flags only on a
drop, a lower_better metric only on a rise, and a neutral metric in either direction. A rolling_median
baseline is robust to a single noisy night, whereas a previous baseline would flag on the noisy run and un-flag on the next.
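Direction-aware flagging against a rolling-median baseline can be sketched as follows. isRegression is a hypothetical helper, not the library's detectRegressions: the percent-change formula, the use of the median of all earlier runs as the window, and skipping a zero baseline are assumptions; only the direction labels and the { defaultPct, perMetric } policy shape come from this README.

```typescript
// Hypothetical sketch, not goldenanalysis' detectRegressions. Direction labels
// and the policy shape follow the README; the % change formula and the
// "median of all earlier runs" baseline window are assumptions.
type Direction = "higher_better" | "lower_better" | "neutral";

interface Policy {
  defaultPct: number;
  perMetric?: Record<string, number>;
}

function median(values: number[]): number {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// True when `latest` moved past the allowed % change, in the bad direction,
// relative to the rolling-median baseline of earlier runs.
function isRegression(
  metric: string,
  history: number[],
  latest: number,
  direction: Direction,
  policy: Policy,
): boolean {
  if (history.length === 0) return false; // nothing to compare against
  const baseline = median(history);
  if (baseline === 0) return false; // % change undefined; skipped in this sketch
  const changePct = ((latest - baseline) / Math.abs(baseline)) * 100;
  const limit = policy.perMetric?.[metric] ?? policy.defaultPct;
  switch (direction) {
    case "higher_better":
      return changePct < -limit; // only a drop counts
    case "lower_better":
      return changePct > limit; // only a rise counts
    case "neutral":
      return Math.abs(changePct) > limit; // either direction counts
  }
}
```

With earlier recall values of 0.90, 0.91 and 0.90 and a latest value of 0.87 (about a 3.3% drop from the 0.90 median), the 2% per-metric gate flags the run while a 10% default alone would not.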
CLI:
# trend of one metric across the run history
goldenanalysis-js trend cluster.singleton_ratio --history .golden/analysis.jsonl --dataset customers
# detect regressions in the latest run vs history (exit 1 on any flag, for CI gating)
goldenanalysis-js regressions --history .golden/analysis.jsonl --dataset customers \
--policy "match.recall_safe_bound=2,*=10" --fail-on-regression
Storage is JSONL only. Node 20 has no stable built-in SQLite (node:sqlite is experimental in 22+); the Python sibling's optional SQLite backend is a documented follow-up. Python's default backend is also JSONL, so the surface parity holds.
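Reading such a log with last-wins semantics is pure string parsing, so it can sit apart from the file I/O, in the spirit of the core/node split described above. In the sketch below, loadLastWins and the analysis_name / dataset / run_id field names are assumptions, not the real wire format; a Node caller would pass in the file contents (for example from readFileSync).

```typescript
// Hypothetical sketch of last-wins reading of an append-only JSONL history.
// The record shape (analysis_name / dataset / run_id) is an assumption.
interface HistoryEntry {
  analysis_name: string;
  dataset: string;
  run_id: string;
  [field: string]: unknown;
}

function loadLastWins(jsonl: string): HistoryEntry[] {
  const byKey = new Map<string, HistoryEntry>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank and trailing lines
    const entry = JSON.parse(line) as HistoryEntry;
    // JSON-encode the composite key so field contents can't collide via a separator.
    const key = JSON.stringify([entry.analysis_name, entry.dataset, entry.run_id]);
    // A later line with the same key overwrites the earlier one (last wins);
    // the Map keeps the position where the key was first seen.
    byKey.set(key, entry);
  }
  return [...byKey.values()];
}
```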
Cross-surface parity
The AnalysisReport / Metric / AnalysisTable wire types use snake_case (the
documented exception in packages/typescript/CLAUDE.md) so reports cross the JSON
wire between the Python and TypeScript surfaces without remapping.
tests/parity/frameSummary.parity.test.ts asserts the TS frame.summary report is
byte-identical to the Python-locked report_frame_summary.json on the
engine-independent metrics: frame.row_count, frame.column_count,
frame.null_ratio_mean, and frame.duplicate_row_ratio, plus the column, null_ratio, and n_unique
columns of the per_column table.
Out of the parity contract (engine-specific, emitted but not asserted):
frame.memory_bytes (the Python sibling uses polars estimated_size()) and the
per_column dtype column (polars dtype names). tests/fixtures/report_frame_summary.json
is a byte-identical copy of the Python fixture and must stay in sync.
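To make the engine-independent metrics concrete, here is a sketch that computes them from plain row objects like those in the Quickstart. frameSummary is hypothetical, and its definitions are assumptions rather than the parity-locked ones: missing keys count as null, n_unique counts null as a value, and duplicate_row_ratio is the share of rows that repeat an earlier row. The parity test against report_frame_summary.json remains the authority on the real values.

```typescript
// Hypothetical sketch of the engine-independent frame.summary metrics.
// Assumed definitions: missing keys count as null, n_unique counts null as a
// value, duplicate_row_ratio = rows repeating an earlier row / row count.
type Row = Record<string, unknown>;

interface ColumnSummary {
  column: string;
  null_ratio: number;
  n_unique: number;
}

function frameSummary(rows: Row[]) {
  // Union of keys in first-seen order, so ragged rows still yield every column.
  const columns: string[] = [];
  for (const row of rows) {
    for (const k of Object.keys(row)) {
      if (!columns.includes(k)) columns.push(k);
    }
  }

  const n = rows.length;
  const perColumn: ColumnSummary[] = columns.map((column) => {
    const values = rows.map((r) => r[column] ?? null); // undefined -> null
    const nulls = values.filter((v) => v === null).length;
    // JSON-encode so 30 and "30" stay distinct values.
    const distinct = new Set(values.map((v) => JSON.stringify(v)));
    return { column, null_ratio: n === 0 ? 0 : nulls / n, n_unique: distinct.size };
  });

  // Serialize each row in a fixed column order so object key order doesn't matter.
  const seen = new Set<string>();
  let duplicates = 0;
  for (const row of rows) {
    const key = JSON.stringify(columns.map((c) => row[c] ?? null));
    if (seen.has(key)) duplicates++;
    else seen.add(key);
  }

  const mean = (xs: number[]) => (xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length);
  return {
    "frame.row_count": n,
    "frame.column_count": columns.length,
    "frame.null_ratio_mean": mean(perColumn.map((c) => c.null_ratio)),
    "frame.duplicate_row_ratio": n === 0 ? 0 : duplicates / n,
    per_column: perColumn,
  };
}
```

On the three Quickstart rows these assumed definitions give frame.null_ratio_mean = 1/9 (only email has a null, in one of three rows) and frame.duplicate_row_ratio = 1/3.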
GoldenCheck vs GoldenAnalysis
GoldenAnalysis is read-only and cross-cutting — it consumes any stage's outputs (including GoldenCheck's) and reports across them. It depends on other packages' types, never the reverse; it does not replace GoldenCheck's ingest-time profiling.
Develop
npm run build # tsup -> dist/ (ESM + CJS + .d.ts)
npm test # vitest
npm run typecheck # tsc --noEmit
License
MIT.
