goldenpipe
v0.2.0
Golden Suite orchestrator — chains GoldenCheck, GoldenFlow, and GoldenMatch into one adaptive pipeline. TypeScript port of the goldenpipe Python library.
goldenpipe
Golden Suite orchestrator for TypeScript — chains GoldenCheck → GoldenFlow → GoldenMatch into one adaptive, pluggable pipeline. TypeScript port of the goldenpipe Python library.
It composes the edge-safe cores of the three sibling packages:
- goldencheck: data-quality scan (scanData)
- goldenflow: transforms / standardization (TransformEngine)
- goldenmatch: dedupe / entity resolution (dedupe)
Data flows through the pipeline as Row[] (arrays of plain objects).
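The package's exported Row definition is not reproduced in this README. As a minimal sketch, assuming a row is a plain object keyed by column name, the shape can be pictured as below; `columnsOf` and `sampleRows` are hypothetical names used only for illustration.

```typescript
// Hypothetical stand-in for the package's Row type: a plain object keyed by
// column name. The real exported definition may be narrower.
type Row = Record<string, unknown>;

// Collect the union of column names across rows, in first-seen order.
function columnsOf(rows: Row[]): string[] {
  const seen = new Set<string>();
  for (const row of rows) {
    for (const key of Object.keys(row)) seen.add(key);
  }
  return Array.from(seen);
}

// Rows need not share the same keys; missing columns are simply absent.
const sampleRows: Row[] = [
  { first_name: "John", last_name: "Smith" },
  { first_name: "Jane", email: null },
];

console.log(columnsOf(sampleRows)); // ["first_name", "last_name", "email"]
```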
Install
npm install goldenpipe
# the three siblings come along as dependencies
yaml is an optional peer dependency, needed only for YAML config loading:
npm install yaml
Quick start
import { runDf } from "goldenpipe";
const rows = [
{ first_name: "John", last_name: "Smith", email: "[email protected]" },
{ first_name: "Jon", last_name: "Smith", email: "[email protected]" },
{ first_name: "Jane", last_name: "Doe", email: "[email protected]" },
];
// Zero-config: runs goldencheck.scan -> goldenflow.transform -> goldenmatch.dedupe
const result = await runDf(rows);
console.log(result.status); // "success"
console.log(result.inputRows); // 3
console.log(result.artifacts.golden); // golden (canonical) records
console.log(result.artifacts.unique); // distinct records
Async: the runner is async because GoldenMatch's dedupe is async. runDf, runStages, Pipeline.run, and the node run(source) all return promises.
From a CSV file (Node)
import { run } from "goldenpipe/node";
const result = await run("people.csv"); // zero-config
const result2 = await run("people.csv", { config: "pipeline.yml" });
Custom pipeline config
import { runDf, makePipelineConfig, makeStageSpec } from "goldenpipe";
const config = makePipelineConfig({
pipeline: "check-and-dedupe",
stages: [
"goldencheck.scan",
makeStageSpec({ use: "goldenmatch.dedupe", config: { threshold: 0.9 } }),
// omit goldenflow.transform to skip transformation
],
});
const result = await runDf(rows, config);
Programmatic stages
import { runStages, stage, StageStatus } from "goldenpipe";
const myStage = stage(
{ name: "tagger", produces: ["tag"], consumes: ["df"] },
(ctx) => {
ctx.artifacts.tag = (ctx.df ?? []).length;
return { status: StageStatus.SUCCESS };
},
);
const result = await runStages([myStage], rows);
CLI
goldenpipe-js run people.csv [-c pipeline.yml] [-v] # run the chain on a CSV
goldenpipe-js stages # list registered stages
goldenpipe-js validate -c pipeline.yml # dry-run wiring validation
goldenpipe-js init [-d .] # scaffold a goldenpipe.yml
goldenpipe-js mcp-serve # run the MCP server (stdio)
goldenpipe-js agent-serve [-p 8250] # run the A2A agent server (HTTP)
goldenpipe-js serve [-p 8000] # run the REST API server (HTTP)
Servers (MCP / A2A / REST)
GoldenPipe ships three server surfaces, each exposing the same 4 operations as
the Python sibling — list_stages, validate_pipeline, run_pipeline,
explain_pipeline:
- MCP (stdio, JSON-RPC 2.0): goldenpipe-js mcp-serve or the goldenpipe-mcp bin.
- A2A (HTTP, port 8250): goldenpipe-js agent-serve, with the agent card at /.well-known/agent.json and skill dispatch at POST /tasks.
- REST (HTTP, port 8000): goldenpipe-js serve, exposing GET /stages, POST /validate, POST /run.
Wire the MCP server into a client (e.g. Claude Desktop):
{ "mcpServers": { "goldenpipe": { "command": "goldenpipe-mcp" } } }
Architecture
flowchart LR
L[load] --> C[goldencheck.scan]
C --> F[goldenflow.transform]
F --> M[goldenmatch.dedupe]
| Stage | Wraps | Produces |
|-------|-------|----------|
| load | built-in | df |
| goldencheck.scan | scanData(TabularData) | findings, profile, column_contexts |
| goldenflow.transform | new TransformEngine(cfg).transformDf(rows) | df, manifest |
| goldenmatch.dedupe | await dedupe(rows, { config }) | clusters, golden, unique, dupes, match_stats, scored_pairs |
The engine layer mirrors the Python design:
- registry: a static registry (buildDefaultRegistry()) replacing Python's entry-point discovery.
- resolver: builds an ExecutionPlan, auto-prepends load, validates consumes/produces wiring.
- router: applies a stage's Decision (skip / insert / abort) to the remaining plan.
- runner: async stage execution with per-stage error handling + skipIf gating.
- reporter: assembles the PipeResult (status, stages, artifacts, errors, reasoning, timing).
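The resolver's two jobs, prepending load and checking consumes/produces wiring, can be sketched in a few lines. This is an illustration under assumptions, not the package's resolver: `StageInfo`, `validateWiring`, and `withLoad` are hypothetical names, and the consumes/produces lists for goldenmatch.dedupe are abbreviated.

```typescript
// Hypothetical stage descriptor, modeled on the name/consumes/produces fields
// passed to stage() in the "Programmatic stages" example. Not the real types.
interface StageInfo {
  name: string;
  consumes: string[];
  produces: string[];
}

// Walk the plan in order and report every input no earlier stage produced.
function validateWiring(plan: StageInfo[]): string[] {
  const available = new Set<string>();
  const errors: string[] = [];
  for (const s of plan) {
    for (const need of s.consumes) {
      if (!available.has(need)) {
        errors.push(`${s.name} consumes "${need}" but no earlier stage produces it`);
      }
    }
    for (const out of s.produces) available.add(out);
  }
  return errors;
}

// Prepend a load stage when the plan does not already start with one.
function withLoad(plan: StageInfo[]): StageInfo[] {
  const load: StageInfo = { name: "load", consumes: [], produces: ["df"] };
  return plan.length > 0 && plan[0].name === "load" ? plan : [load, ...plan];
}

const dedupeStage: StageInfo = {
  name: "goldenmatch.dedupe",
  consumes: ["df"],
  produces: ["golden", "unique"],
};
const wiringWithoutLoad = validateWiring([dedupeStage]);        // df is missing
const wiringWithLoad = validateWiring(withLoad([dedupeStage])); // df comes from load
console.log(wiringWithoutLoad.length, wiringWithLoad.length); // 1 0
```

Validating against a running set of produced artifacts keeps the check order-sensitive: a stage that consumes an artifact produced only by a later stage is still reported.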
A column-context pipeline carries semantic metadata across stages: GoldenCheck builds ColumnContexts (name-regex classification + IQR cardinality banding + identifier inference), GoldenFlow enriches them (date transforms confirm date type), and GoldenMatch consumes them to build a targeted dedupe config (buildConfigFromContexts) instead of re-profiling.
Decisions (adaptive routing)
severityGate, piiRouter, and rowCountGate are ported. They are not wired into the default chain — add them to a custom runner / stage that returns their Decision.
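How a returned Decision reshapes the remaining plan can be sketched as follows. The `Decision` shape, `tinyInputGate`, and `applyDecision` are hypothetical stand-ins written for this example; they are not the ported rowCountGate or the package's router, whose signatures are not shown in this README.

```typescript
// Hypothetical Decision shape covering the skip / insert / abort outcomes.
type Decision =
  | { action: "continue" }
  | { action: "skip"; stages: string[]; reason: string }
  | { action: "insert"; stages: string[]; reason: string }
  | { action: "abort"; reason: string };

// Illustrative gate: skip dedupe when there are too few rows to compare.
function tinyInputGate(rowCount: number, minRows = 2): Decision {
  return rowCount < minRows
    ? { action: "skip", stages: ["goldenmatch.dedupe"], reason: `only ${rowCount} row(s)` }
    : { action: "continue" };
}

// Apply a decision to the stages that have not run yet.
function applyDecision(remaining: string[], decision: Decision): string[] {
  switch (decision.action) {
    case "skip": {
      const skipped = decision.stages;
      return remaining.filter((name) => skipped.indexOf(name) === -1);
    }
    case "insert":
      // Inserted stages run next, ahead of what was already planned.
      return [...decision.stages, ...remaining];
    case "abort":
      return [];
    default:
      return remaining;
  }
}

const remainingPlan = ["goldenflow.transform", "goldenmatch.dedupe"];
console.log(applyDecision(remainingPlan, tinyInputGate(1))); // ["goldenflow.transform"]
console.log(applyDecision(remainingPlan, tinyInputGate(3))); // both stages kept
```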
TS sibling skew: GoldenCheck-JS Finding.severity is a numeric enum (INFO/WARNING/ERROR) with no "critical" level, and there is no "pii_detection" check. So severityGate and piiRouter are effectively no-ops against current GoldenCheck-JS output. They exist for structural parity, and so that custom stages emitting those findings still route.
Deferred (not in this v1 port)
- identity_resolve stage: GoldenMatch-JS Identity Graph wiring through the pipeline. The edge-safe InMemoryIdentityStore exists in goldenmatch, but the pipeline-driven resolveClusters population is not yet exposed.
- infer_schema stage: InferMap-based schema inference is not ported.
- Textual TUI: the Python Textual TUI is not ported. (The MCP, A2A, and REST servers are ported; see above.)
Sibling version-skew artifacts
The TS siblings are version-skewed from the Python ones, so some artifacts the Python pipeline surfaces are shaped differently or absent here:
- The golden artifact maps to GoldenMatch-JS DedupeResult.goldenRecords (the Python sibling exposes .golden).
- scored_pairs is GoldenMatch-JS result.scoredPairs (camelCase).
- matchkey_used is derived from the built config's first matchkey, because the JS DedupeResult does not carry the resolved matchkey list back (the Python result does after auto-config).
- The Python goldencheck.scan adapter calls scan_file(path), so the in-memory run_df path fails that stage. GoldenCheck-JS's scanData operates on rows, so the TS adapter's scan succeeds in both the in-memory (runDf) and file (run) paths.
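The first three mappings above can be pictured as a small adapter. This is a sketch under assumptions: `DedupeResultLike`, `MatchConfigLike`, and `toArtifacts` are hypothetical names, the element types are placeholders, and the config's matchkeys field with a name property is assumed for illustration rather than taken from GoldenMatch-JS.

```typescript
// Hypothetical subset of a GoldenMatch-JS DedupeResult, limited to the two
// camelCase fields named above. Element types are placeholders.
interface DedupeResultLike {
  goldenRecords: Record<string, unknown>[];
  scoredPairs: unknown[];
}

// Hypothetical config shape; only the first matchkey's name is read.
interface MatchConfigLike {
  matchkeys: { name: string }[];
}

// Map the JS result onto the snake_case artifact names the pipeline exposes.
function toArtifacts(result: DedupeResultLike, config: MatchConfigLike) {
  return {
    golden: result.goldenRecords,
    scored_pairs: result.scoredPairs,
    // The result carries no matchkey list, so fall back to the built config.
    matchkey_used: config.matchkeys.length > 0 ? config.matchkeys[0].name : null,
  };
}

const artifacts = toArtifacts(
  { goldenRecords: [{ email: "a@example.com" }], scoredPairs: [] },
  { matchkeys: [{ name: "exact_email" }] },
);
console.log(artifacts.matchkey_used); // "exact_email"
```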
Cross-language parity
tests/parity/pipe-parity.test.ts asserts skew-robust invariants (status, input_rows, ordered per-stage status/skip sequence, final golden/unique counts) against Python-generated goldens in tests/fixtures/pipe_parity.json. Regenerate the goldens with:
uv run --project packages/python/goldenpipe python \
packages/python/goldenpipe/scripts/emit_ts_parity_fixtures.py
License
MIT
