goldenmatch

v0.13.0

Entity resolution toolkit — deduplicate, match, and create golden records

GoldenMatch (TypeScript)

Entity resolution toolkit for Node.js and edge runtimes. Deduplicate, match, and create golden records — in TypeScript.

npm install goldenmatch

Why this port?

  • Edge-safe core — the matching engine runs in browsers, Workers, Vercel Edge Runtime, Deno
  • Pure TypeScript — no native dependencies required; peer deps unlock performance (hnswlib, ONNX, piscina)
  • Feature parity with Python goldenmatch — same scorers, same clustering, same YAML configs
  • 590 tests, strict TypeScript (noUncheckedIndexedAccess, exactOptionalPropertyTypes)

Quick Start

import { dedupe } from "goldenmatch";

const rows = [
  { id: 1, name: "John Smith", email: "[email protected]", zip: "12345" },
  { id: 2, name: "Jon Smith",  email: "[email protected]", zip: "12345" },
  { id: 3, name: "Jane Doe",   email: "[email protected]", zip: "54321" },
];

const result = dedupe(rows, {
  fuzzy: { name: 0.85 },
  blocking: ["zip"],
  threshold: 0.85,
});

console.log(result.stats);
// { totalRecords: 3, totalClusters: 2, matchRate: 0.67, ... }

for (const record of result.goldenRecords) {
  console.log(record);
}

Auto-Config Verification (v0.3)

Auto-generated configs are now checked both before the pipeline runs and after scoring finishes, so you get actionable diagnostics instead of silent failures on edge-case data.

Preflight — six static checks

When you call autoConfigureRows(rows), the returned config ships with a _preflightReport summarising six config-time checks:

  1. missing_column — matchkey/blocking references a column not in the data
  2. cardinality_high — a column is near-unique (poor blocking signal)
  3. cardinality_low — a column has too few distinct values to discriminate
  4. block_size — a blocking key would produce oversized blocks
  5. remote_asset — a scorer requires a model download (gated offline)
  6. weight_confidence — a weighted matchkey's weights look unbalanced
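
The column-level checks above (missing_column, cardinality_high, cardinality_low, block_size) come down to counting distinct values per column. Below is a minimal sketch of that idea, not goldenmatch's implementation: the function name profileColumn, the Finding shape, and the cutoffs (distinct ratio above 0.95, fewer than 2 distinct values, blocks over 1000 rows) are all hypothetical.

```typescript
// Hypothetical sketch: names and cutoffs are illustrative, not goldenmatch's defaults.
type Row = Record<string, unknown>;

interface Finding {
  check: "missing_column" | "cardinality_high" | "cardinality_low" | "block_size";
  subject: string;
  message: string;
}

function profileColumn(rows: Row[], column: string, maxBlock = 1000): Finding[] {
  // missing_column: no row carries this field at all
  if (!rows.some((r) => column in r)) {
    return [{ check: "missing_column", subject: column, message: "column not found in any row" }];
  }
  // Rows per distinct value = the block sizes this column would produce as a blocking key
  const counts = new Map<string, number>();
  for (const r of rows) {
    const v = r[column];
    if (v === null || v === undefined || v === "") continue; // ignore missing values
    const key = String(v);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  const findings: Finding[] = [];
  const distinctRatio = counts.size / rows.length; // rows.length > 0 here: some() found a row
  if (distinctRatio > 0.95) {
    findings.push({ check: "cardinality_high", subject: column, message: `distinct ratio ${distinctRatio.toFixed(2)}` });
  }
  if (counts.size < 2) {
    findings.push({ check: "cardinality_low", subject: column, message: `${counts.size} distinct value(s)` });
  }
  const largestBlock = Math.max(0, ...Array.from(counts.values()));
  if (largestBlock > maxBlock) {
    findings.push({ check: "block_size", subject: column, message: `largest block has ${largestBlock} rows` });
  }
  return findings;
}
```

On the Quick Start rows, id is flagged cardinality_high (every value is distinct) while zip produces no findings, which matches the intuition that zip is the better blocking key.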

Many findings trigger auto-repairs (field dropped, scorer swapped, weight clamped). Errors that cannot be repaired set hasErrors === true and raise a ConfigValidationError with the full report attached.

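import { autoConfigureRows, ConfigValidationError } from "goldenmatch";

try {
  const cfg = autoConfigureRows(rows); // `rows` as in the Quick Start
  for (const f of cfg._preflightReport?.findings ?? []) {
    console.log(`[${f.severity}] ${f.check}/${f.subject}: ${f.message}`);
  }
} catch (err) {
  // Raised when preflight finds errors it cannot auto-repair
  if (!(err instanceof ConfigValidationError)) throw err;
  console.error(err);
}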

Defaults are offline-safe: remote-asset scorers (cross-encoder, remote embeddings) are dropped unless you opt in with allowRemoteAssets: true.

Postflight — four runtime signals

Inside dedupe() / match(), after scoring but before clustering, the pipeline computes four signals attached as result.postflightReport:

  1. scoreHistogram — 100-bin pair-score distribution
  2. blockSizePercentiles + preliminaryClusterSizes — p50/p95/p99/max
  3. thresholdOverlapPct — fraction of pairs near the current threshold
  4. oversizedClusters — components above size limit, with bottleneck pair
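
To make the postflight signals concrete, here is a sketch of how a 100-bin histogram, nearest-rank percentiles, and a threshold-overlap fraction can be computed from raw pair scores in [0, 1]. The function names and the 0.05 band used for "near the threshold" are assumptions for illustration, not goldenmatch's API or defaults.

```typescript
// Illustrative postflight-style statistics over pair scores in [0, 1]; not goldenmatch's code.

/** 100-bin (by default) histogram; a score of exactly 1.0 lands in the last bin. */
function scoreHistogram(scores: number[], bins = 100): number[] {
  const hist = new Array<number>(bins).fill(0);
  for (const s of scores) {
    const idx = Math.min(bins - 1, Math.max(0, Math.floor(s * bins)));
    hist[idx] = (hist[idx] ?? 0) + 1;
  }
  return hist;
}

/** Nearest-rank percentile, e.g. p = 50, 95, 99 (p = 100 gives the max). */
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.min(sorted.length, Math.max(1, Math.ceil((p / 100) * sorted.length)));
  return sorted[rank - 1] ?? 0;
}

/** Fraction of pairs whose score is within `band` of the threshold (band is an assumed value). */
function thresholdOverlapFraction(scores: number[], threshold: number, band = 0.05): number {
  if (scores.length === 0) return 0;
  const near = scores.filter((s) => Math.abs(s - threshold) <= band).length;
  return near / scores.length;
}
```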

If the score distribution is clearly bimodal, postflight proposes a threshold adjustment. In strict mode (autoConfigureRows(rows, { strict: true }), or _strictAutoconfig: true set manually), the signals are still emitted but the threshold is never touched. Use strict mode for reproducible CI pipelines.
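
One standard way to place a threshold in a bimodal score distribution is Otsu's method, which picks the histogram cut that maximises the variance between the low-score and high-score groups. The README does not say which rule postflight uses, so treat this as a swapped-in technique; otsuThreshold is a hypothetical name.

```typescript
// Otsu's method on a score histogram (bins cover [0, 1] evenly). A stand-in technique:
// the README does not specify goldenmatch's threshold-adjustment rule.
function otsuThreshold(hist: number[]): number | null {
  const bins = hist.length;
  const total = hist.reduce((a, b) => a + b, 0);
  const centre = (i: number): number => (i + 0.5) / bins; // representative score of bin i
  let sumAll = 0;
  for (let i = 0; i < bins; i++) sumAll += centre(i) * (hist[i] ?? 0);

  let bestCut = 0;
  let bestBetween = -1;
  let w0 = 0; // pair count below the cut
  let sum0 = 0; // score mass below the cut
  for (let t = 1; t < bins; t++) {
    const c = hist[t - 1] ?? 0;
    w0 += c;
    sum0 += centre(t - 1) * c;
    const w1 = total - w0;
    if (w0 === 0 || w1 === 0) continue; // both groups must be non-empty
    const mu0 = sum0 / w0;
    const mu1 = (sumAll - sum0) / w1;
    const between = w0 * w1 * (mu0 - mu1) ** 2; // between-group variance (unnormalised)
    if (between > bestBetween) {
      bestBetween = between;
      bestCut = t;
    }
  }
  // No valid split (e.g. all scores in one bin): leave the threshold alone.
  return bestCut === 0 ? null : bestCut / bins;
}
```

For a histogram with mass only in bins 10 and 90, every cut between them separates the two groups equally well, and this version returns the lowest such cut, 0.11.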

See examples/verificationInspection.ts and examples/strictModeParity.ts for runnable demos.

Three entrypoints

import { dedupe, match, scoreStrings } from "goldenmatch";         // edge-safe core
import { readFile, writeCsv } from "goldenmatch/node";              // Node-only file I/O
// CLI: `npx goldenmatch-js dedupe data.csv --output golden.csv`

Feature matrix

Scoring algorithms

  • Exact, Jaro-Winkler, Levenshtein, Token-Sort, Soundex, Dice, Jaccard, Ensemble
  • Probabilistic (Fellegi-Sunter with Splink-style EM)
  • LLM scorer (OpenAI/Anthropic via fetch — edge-safe)
  • Cross-encoder reranking (via @huggingface/transformers)
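
Jaro-Winkler, one of the scorers listed above, rewards strings that share characters in nearly the same order and boosts pairs with a common prefix. A minimal sketch from the textbook definition (prefix scale 0.1, prefix capped at 4 characters, no boost threshold) follows; goldenmatch's own scorer may normalise case, whitespace, or the prefix rule differently.

```typescript
// Jaro-Winkler from the textbook definition; a sketch, not goldenmatch's scorer.
function jaro(a: string, b: string): number {
  if (a === b) return 1;
  const la = a.length;
  const lb = b.length;
  if (la === 0 || lb === 0) return 0;
  // Characters match if equal and no further apart than this many positions
  const range = Math.max(0, Math.floor(Math.max(la, lb) / 2) - 1);
  const aMatched = new Array<boolean>(la).fill(false);
  const bMatched = new Array<boolean>(lb).fill(false);
  let matches = 0;
  for (let i = 0; i < la; i++) {
    const lo = Math.max(0, i - range);
    const hi = Math.min(lb - 1, i + range);
    for (let j = lo; j <= hi; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // Transpositions: matched characters that appear in a different order, counted in halves
  let k = 0;
  let halfTranspositions = 0;
  for (let i = 0; i < la; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) halfTranspositions++;
    k++;
  }
  const t = halfTranspositions / 2;
  return (matches / la + matches / lb + (matches - t) / matches) / 3;
}

/** Winkler boost: up to 4 shared leading characters, scaled by p (0.1 is the usual value). */
function jaroWinkler(a: string, b: string, p = 0.1): number {
  const j = jaro(a, b);
  const maxPrefix = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < maxPrefix && a[prefix] === b[prefix]) prefix++;
  return j + prefix * p * (1 - j);
}
```

jaroWinkler("MARTHA", "MARHTA") gives about 0.961, the classic worked example: 6 matching characters, 1 transposition, and a 3-character shared prefix.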

Blocking strategies

  • Static, multi-pass, sorted-neighborhood, adaptive
  • ANN (approximate nearest neighbor via hnswlib-node peer dep or brute-force)
  • Canopy (TF-IDF)
  • Learned (data-driven predicate selection)
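
Blocking limits which pairs get scored at all. The sketch below shows two of the listed strategies in their textbook form: static blocking (compare only rows with the same key value, presumably what blocking: ["zip"] requests in the Quick Start) and sorted neighbourhood (sort on a key and compare rows within a sliding window). Function names and the windowSize parameter are hypothetical, not goldenmatch's API.

```typescript
// Textbook versions of two blocking strategies; names and signatures are hypothetical.
type Row = Record<string, unknown>;
type Pair = [number, number]; // row indices, smaller index first

/** Static blocking: only rows with the same key value are compared. */
function staticBlockPairs(rows: Row[], key: string): Pair[] {
  const blocks = new Map<string, number[]>();
  rows.forEach((row, i) => {
    const v = row[key];
    if (v === null || v === undefined) return; // rows without a key are never compared
    const members = blocks.get(String(v)) ?? [];
    members.push(i);
    blocks.set(String(v), members);
  });
  const pairs: Pair[] = [];
  blocks.forEach((members) => {
    for (let x = 0; x < members.length; x++) {
      for (let y = x + 1; y < members.length; y++) {
        pairs.push([members[x]!, members[y]!]);
      }
    }
  });
  return pairs;
}

/** Sorted neighbourhood: sort on the key, then pair each row with the next (windowSize - 1) rows. */
function sortedNeighborhoodPairs(rows: Row[], key: string, windowSize = 3): Pair[] {
  const order = rows
    .map((row, i) => ({ i, k: String(row[key] ?? "") }))
    .sort((a, b) => (a.k < b.k ? -1 : a.k > b.k ? 1 : 0));
  const pairs: Pair[] = [];
  for (let x = 0; x < order.length; x++) {
    for (let y = x + 1; y < Math.min(order.length, x + windowSize); y++) {
      const a = order[x]!.i;
      const b = order[y]!.i;
      pairs.push(a < b ? [a, b] : [b, a]);
    }
  }
  return pairs;
}
```

On the three Quick Start rows, static blocking on zip yields the single pair [0, 1], so Jane Doe is never scored against either Smith.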

Golden record strategies

  • most_complete, majority_vote, source_priority, most_recent, first_non_null
  • Full provenance tracking
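
Golden record strategies decide which value survives for each field of a cluster. Here is a hedged sketch of two of them, first_non_null and majority_vote, that also keeps simple provenance (the indices of the cluster members that supplied the value). The Survivor shape, the tie-breaking rule, and treating empty strings as missing are assumptions, not goldenmatch's behaviour.

```typescript
// Sketch of two survivorship rules with simple provenance; shapes and tie-breaking are assumptions.
type Row = Record<string, unknown>;

interface Survivor {
  value: unknown;
  sources: number[]; // provenance: indices of the cluster members that supplied this value
}

// Treat null, undefined, and "" as missing (an assumption for this sketch).
const isFilled = (v: unknown): boolean => v !== null && v !== undefined && v !== "";

/** first_non_null: the first filled value in cluster order. */
function firstNonNull(cluster: Row[], field: string): Survivor | null {
  for (let i = 0; i < cluster.length; i++) {
    const v = cluster[i]?.[field];
    if (isFilled(v)) return { value: v, sources: [i] };
  }
  return null;
}

/** majority_vote: the most frequent filled value; ties go to the value seen first. */
function majorityVote(cluster: Row[], field: string): Survivor | null {
  const votes = new Map<string, Survivor>(); // keyed by JSON form so equal values group together
  cluster.forEach((row, i) => {
    const v = row[field];
    if (!isFilled(v)) return;
    const key = JSON.stringify(v);
    const entry: Survivor = votes.get(key) ?? { value: v, sources: [] };
    entry.sources.push(i);
    votes.set(key, entry);
  });
  let best: Survivor | null = null;
  for (const p of Array.from(votes.values())) {
    if (best === null || p.sources.length > best.sources.length) best = p;
  }
  return best;
}
```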

Pipeline features

  • PPRL (privacy-preserving record linkage, 3 security levels with HMAC-SHA256)
  • Graph ER (multi-table entity resolution with evidence propagation)
  • Sensitivity analysis (parameter sweep with CCMS/TWI)
  • Streaming (incremental single-record matching)
  • Memory (persistent corrections + threshold learning)
  • Review queue (human-in-the-loop)
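
The simplest form of privacy-preserving record linkage replaces each normalised field value with a keyed hash, so parties can compare tokens without exchanging raw values. The sketch below uses HMAC-SHA256 from node:crypto for exact-match tokens; the normalisation rule and the shared-secret setup are assumptions, and goldenmatch's three security levels are not reproduced here. Because it imports node:crypto it is Node-only; an edge runtime would need Web Crypto instead.

```typescript
import { createHmac } from "node:crypto";

// Sketch of exact-match PPRL tokens; normalisation and key handling are assumptions.
/** Normalise a value, then replace it with HMAC-SHA256(secret, value) as hex. */
function pprlToken(secret: string, value: string): string {
  const normalised = value.trim().toLowerCase().replace(/\s+/g, " ");
  return createHmac("sha256", secret).update(normalised, "utf8").digest("hex");
}

/**
 * Link two lists by token equality. In a real deployment each party would hash its own
 * data and share only the tokens; both sides run in one process here for demonstration.
 */
function linkByToken(secret: string, left: string[], right: string[]): Array<[number, number]> {
  const index = new Map<string, number[]>();
  right.forEach((value, j) => {
    const token = pprlToken(secret, value);
    index.set(token, [...(index.get(token) ?? []), j]);
  });
  const links: Array<[number, number]> = [];
  left.forEach((value, i) => {
    for (const j of index.get(pprlToken(secret, value)) ?? []) links.push([i, j]);
  });
  return links;
}
```

Exact-match tokens break on any typo, which is why PPRL schemes often hash character n-grams into Bloom filters instead; whether goldenmatch's higher security levels do something similar is not stated in this README.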

Optional peer deps

Zero-dep install works. These unlock advanced paths:

| Peer dep | What it enables |
|---|---|
| yaml | YAML config file loading |
| hnswlib-node | True sub-linear ANN blocking (vs brute-force) |
| @huggingface/transformers | ONNX cross-encoder reranking (MiniLM) |
| piscina | Worker-thread parallel block scoring |
| ink + react | Interactive terminal UI |
| ink-table, ink-select-input, ink-text-input, ink-spinner, ink-gradient | Richer TUI widgets |
| pg | Postgres connector + sync |
| @duckdb/node-api | DuckDB connector |
| snowflake-sdk, @google-cloud/bigquery, @databricks/sql | Cloud warehouse connectors |
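
Optional peer dependencies are usually detected at runtime with a dynamic import that falls back when the package is missing. A generic sketch of that pattern follows; loadOptional and parseConfig are hypothetical helpers, and how goldenmatch itself probes for its peer deps is not documented here.

```typescript
// Try to load an optional package; resolve to null instead of throwing if it is absent.
async function loadOptional<T>(specifier: string): Promise<T | null> {
  try {
    return (await import(specifier)) as T;
  } catch {
    return null; // not installed (or failed to load): the caller takes the fallback path
  }
}

// Example: parse YAML only when the optional `yaml` package is installed.
async function parseConfig(source: string): Promise<unknown> {
  const yaml = await loadOptional<{ parse(src: string): unknown }>("yaml");
  if (yaml === null) {
    // Fallback path for this sketch: accept JSON, which YAML 1.2 parsers also accept.
    return JSON.parse(source);
  }
  return yaml.parse(source);
}
```

The design keeps the dependency truly optional: nothing is imported at module load time, so a plain install never fails because a peer dep is missing.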

Servers

# MCP server (for Claude Desktop / Code)
npx goldenmatch-js mcp-serve

# REST API
npx goldenmatch-js serve --port 8000

# A2A agent server
npx goldenmatch-js agent-serve --port 8200

# Interactive TUI
npx goldenmatch-js tui data.csv

CLI commands

goldenmatch-js dedupe <files...>    Deduplicate records
goldenmatch-js match <target> <ref> Match target against reference
goldenmatch-js score <a> <b>        Score similarity between two strings
goldenmatch-js info                 Show scorers, strategies, transforms
goldenmatch-js profile <file>       Profile a dataset
goldenmatch-js demo                 Run a quick demo on synthetic data
goldenmatch-js mcp-serve            Start MCP server (stdio)
goldenmatch-js serve                Start REST API
goldenmatch-js agent-serve          Start A2A agent
goldenmatch-js tui                  Interactive terminal UI

Examples

See examples/ for 10+ full examples covering basic dedupe, CSV pipelines, probabilistic matching (Fellegi-Sunter), PPRL, streaming, LLM scoring, explanations, and evaluation.

Documentation

Full docs: https://benseverndev-oss.github.io/goldenmatch/typescript

License

MIT. See LICENSE.