infermap
v0.4.0
Schema-to-schema field mapping engine. TypeScript port of the infermap Python library.
infermap
Map messy source columns to a known target schema — accurately, explainably, with zero config.
npm install infermap
infermap is a schema-mapping engine: give it any two field collections (records, CSVs, database tables) and it figures out which source field corresponds to which target field, with confidence scores and human-readable reasoning. Built as a faithful TypeScript port of the Python infermap package, with mapping decisions verified bit-for-bit by a shared golden-test parity suite.
- 🪶 Zero runtime dependencies in the core entrypoint
- ⚡ Edge-runtime compatible — Next.js Server Components, Route Handlers, Edge Functions
- 🧠 Six built-in scorers — exact name, alias, semantic-type regex, statistical profile, fuzzy name, LLM (pluggable)
- ⚖️ Optimal 1:1 assignment via vendored O(n³) Hungarian algorithm
- 🗄️ Optional Node DB providers — SQLite, Postgres, DuckDB
- 🛠️ CLI: map, apply, inspect, validate
- 📐 Strict TypeScript: noUncheckedIndexedAccess, exactOptionalPropertyTypes, full .d.ts
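To illustrate why an optimal 1:1 assignment matters, here is a minimal sketch that compares a greedy pick against an exhaustive search on a tiny score matrix. It is not infermap's vendored Hungarian solver: the brute-force search, the function names, and the scores are all made up for illustration, and it only scales to very small square inputs.

```typescript
// Illustrative only: brute-force search over every 1:1 assignment.
// This is NOT the Hungarian algorithm; it just shows how a globally
// optimal assignment can beat a greedy one on the same scores.

type Assignment = { pairs: Array<[number, number]>; total: number };

// Greedy: each source row takes its best still-unused target column, in row order.
function greedyAssign(scores: number[][]): Assignment {
  const used = new Set<number>();
  const pairs: Array<[number, number]> = [];
  let total = 0;
  scores.forEach((row, i) => {
    let best = -1;
    row.forEach((s, j) => {
      if (!used.has(j) && (best === -1 || s > row[best]!)) best = j;
    });
    if (best !== -1) {
      used.add(best);
      pairs.push([i, best]);
      total += row[best]!;
    }
  });
  return { pairs, total };
}

// Exhaustive: try every permutation of columns (square matrices only).
function optimalAssign(scores: number[][]): Assignment {
  const n = scores.length;
  let best: Assignment = { pairs: [], total: -Infinity };
  const cols: number[] = [];
  const used = new Array<boolean>(n).fill(false);
  const recurse = (i: number, total: number): void => {
    if (i === n) {
      if (total > best.total) {
        best = { pairs: cols.map((j, r) => [r, j] as [number, number]), total };
      }
      return;
    }
    for (let j = 0; j < n; j++) {
      if (used[j]) continue;
      used[j] = true;
      cols.push(j);
      recurse(i + 1, total + scores[i]![j]!);
      cols.pop();
      used[j] = false;
    }
  };
  recurse(0, 0);
  return best;
}

// Rows are source fields, columns are target fields. Scores are invented.
const scores = [
  [0.75, 0.5],
  [0.5, 0.0],
];
const greedy = greedyAssign(scores);   // row 0 takes column 0, leaving row 1 with column 1
const optimal = optimalAssign(scores); // row 0 takes column 1 so row 1 can have column 0
console.log(greedy.total, optimal.total);
```

On this matrix the greedy pass locks the first source onto its best target and strands the second, while the exhaustive search finds the pairing with the higher combined score.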
Table of contents
- What it does
- Install
- Quick start
- Inputs
- Next.js usage
- Database sources
- Config
- Custom scorers
- CLI
- Parity with Python
- Exports
- Links
What it does
You have data with messy column names. You want it mapped to a clean canonical schema. Without infermap:
// 50 lines of brittle if/else, hardcoded synonyms, and regret
if (col === "fname" || col === "first_nm") canonical[i] = "first_name";
else if (col === "email_addr" || col === "e_mail" || col === "mail") canonical[i] = "email";
// ...
With infermap:
import { map } from "infermap";
const result = map(
  { records: [{ fname: "John", lname: "Doe", email_addr: "[email protected]", tel: "555-0100" }] },
  { records: [{ first_name: "", last_name: "", email: "", phone: "" }] }
);
for (const m of result.mappings) {
  console.log(`${m.source} → ${m.target} (${m.confidence.toFixed(2)})`);
}
// fname → first_name (0.44)
// lname → last_name (0.48)
// email_addr → email (0.69)
// tel → phone (0.39)
Each mapping comes with a per-scorer confidence breakdown, so when something goes wrong you can see exactly which signal contributed.
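Once you have mappings with confidence scores, a common next step is to rename record keys only where the score clears a threshold. The sketch below assumes nothing beyond the source, target, and confidence fields printed above. The renameRecord helper and the 0.4 cutoff are hypothetical and not part of infermap.

```typescript
// Only the three fields printed in the example above are assumed here.
interface FieldMapping {
  source: string;
  target: string;
  confidence: number;
}

// Hypothetical helper (not an infermap export): rename keys whose mapping
// meets the threshold and leave every other key untouched.
function renameRecord(
  record: Record<string, unknown>,
  mappings: FieldMapping[],
  minConfidence = 0.4,
): Record<string, unknown> {
  const rename = new Map<string, string>();
  for (const m of mappings) {
    if (m.confidence >= minConfidence) rename.set(m.source, m.target);
  }
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(record)) {
    out[rename.get(key) ?? key] = value;
  }
  return out;
}

const mappings: FieldMapping[] = [
  { source: "fname", target: "first_name", confidence: 0.44 },
  { source: "lname", target: "last_name", confidence: 0.48 },
  { source: "tel", target: "phone", confidence: 0.39 },
];

// tel keeps its original name because 0.39 is below the 0.4 threshold.
const renamed = renameRecord({ fname: "John", lname: "Doe", tel: "555-0100" }, mappings);
console.log(renamed);
```

Low-confidence fields stay under their original names, which makes them easy to spot and review by hand.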
Install
npm install infermap
# or
pnpm add infermap
# or
yarn add infermap
Requires Node ≥ 20. The default entrypoint is edge-runtime compatible.
Quick start
import { map } from "infermap";
const crm = [
  { fname: "John", lname: "Doe", email_addr: "[email protected]", signup_dt: "2024-01-15" },
  { fname: "Jane", lname: "Smith", email_addr: "[email protected]", signup_dt: "2024-02-20" },
];
const canonical = [
  { first_name: "", last_name: "", email: "", created_at: "" },
];
const result = map({ records: crm }, { records: canonical });
console.log(result.mappings);
// [
//   { source: "fname", target: "first_name", confidence: 0.44, breakdown: {...}, reasoning: "..." },
//   { source: "lname", target: "last_name", confidence: 0.48, breakdown: {...}, reasoning: "..." },
//   { source: "email_addr", target: "email", confidence: 0.69, breakdown: {...}, reasoning: "..." },
//   { source: "signup_dt", target: "created_at", confidence: 0.41, breakdown: {...}, reasoning: "..." },
// ]
Inputs
map() accepts any of these shapes for both source and target:
type MapInput =
  | SchemaInfo                                                // pre-extracted
  | { records: Array<Record<string, unknown>>; sourceName? }  // plain records
  | { csvText: string; sourceName? }                          // CSV as string
  | { jsonText: string; sourceName? }                         // JSON array as string
  | { schemaDefinition: string | object; sourceName? };       // JSON schema file
Node users can read files directly:
import { extractSchemaFromFile } from "infermap/node";
import { MapEngine } from "infermap";
const src = await extractSchemaFromFile("./crm.csv");
const tgt = await extractSchemaFromFile("./canonical.json");
const result = new MapEngine().mapSchemas(src, tgt);
Next.js usage
Works in any Next.js context: Server Components, Route Handlers, Server Actions, Edge Functions. The default entrypoint imports no Node built-ins, so it runs on the Edge Runtime without any special config.
// app/api/infer/route.ts
import { map, mapResultToReport } from "infermap";
export const runtime = "edge"; // remove if you need Node APIs
export async function POST(req: Request) {
  const { sourceCsv, targetCsv } = await req.json();
  const result = map(
    { csvText: sourceCsv },
    { csvText: targetCsv }
  );
  return Response.json(mapResultToReport(result));
}
For filesystem or database access, switch to Node runtime and import from infermap/node.
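The route handler above passes the parsed request body straight to map(). In practice you may want to check the payload first. The sketch below is one possible type guard for the { sourceCsv, targetCsv } body used in that example. The interface and function names are hypothetical and not part of infermap.

```typescript
// Hypothetical payload shape, mirroring the destructuring in the route handler above.
interface InferRequestBody {
  sourceCsv: string;
  targetCsv: string;
}

// Type guard: true only when both fields are non-empty strings.
function isInferRequestBody(value: unknown): value is InferRequestBody {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const source = v["sourceCsv"];
  const target = v["targetCsv"];
  return (
    typeof source === "string" && source.length > 0 &&
    typeof target === "string" && target.length > 0
  );
}

const okBody: unknown = { sourceCsv: "fname,lname\nJohn,Doe", targetCsv: "first_name,last_name\n," };
const badBody: unknown = { sourceCsv: 42 };
console.log(isInferRequestBody(okBody), isInferRequestBody(badBody));
```

Inside the handler, a failed check could return a 400 response before map() is ever called. The guard uses no Node built-ins, so it does not affect edge compatibility.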
Database sources
Optional Node-only providers. Install only the driver you need:
npm install better-sqlite3 # for sqlite://
npm install pg # for postgresql://
npm install @duckdb/node-api # for duckdb://
import { extractDbSchema } from "infermap/node";
const schema = await extractDbSchema(
  "postgresql://user:pass@host/mydb",
  { table: "customers" }
);
Config
Reweight scorers and extend the alias table via a JSON config object:
import { map } from "infermap";
const result = map(source, target, {
  config: {
    scorers: {
      LLMScorer: { enabled: false },
      FuzzyNameScorer: { weight: 0.3 },
    },
    aliases: {
      order_id: ["order_num", "ord_no"],
      customer_id: ["cust_id", "customer_number"],
    },
  },
});
You can also persist a computed mapping and reload it:
import { mapResultToConfigJson, fromConfig } from "infermap";
import { writeFile, readFile } from "node:fs/promises";
await writeFile("mapping.json", mapResultToConfigJson(result));
// later:
const restored = fromConfig(await readFile("mapping.json", "utf8"));
Custom scorers
import { MapEngine, defaultScorers, defineScorer, makeScorerResult } from "infermap";
const domainScorer = defineScorer(
  "DomainMatcher",
  (source, target) => {
    // return null to abstain, or a ScorerResult in [0, 1]
    if (source.name.startsWith("cust_") && target.name.startsWith("customer_")) {
      return makeScorerResult(0.9, "shared customer prefix");
    }
    return null;
  },
  0.6 // weight
);
const engine = new MapEngine({
  scorers: [...defaultScorers(), domainScorer],
});
CLI
npx infermap map ./crm.csv ./canonical.csv
npx infermap inspect ./crm.csv
npx infermap map ./crm.csv ./canonical.csv --format json -o mapping.json
npx infermap apply ./crm.csv --config mapping.json --output renamed.csv
npx infermap validate ./crm.csv --config mapping.json --required email,id --strict
The CLI parses arguments with parseArgs from node:util only, so it adds no extra runtime deps.
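For readers unfamiliar with Node's built-in argument parser, here is a rough sketch of how a map-style invocation could be parsed with parseArgs. This is not infermap's actual CLI source: the parseMapArgs function is hypothetical, and treating -o as the short form of --output is an assumption based on the examples above.

```typescript
import { parseArgs } from "node:util";

// Hypothetical parser for a `map`-style subcommand. Flag names mirror the
// CLI examples above; pairing -o with --output is an assumption.
function parseMapArgs(argv: string[]) {
  const { values, positionals } = parseArgs({
    args: argv,
    options: {
      format: { type: "string" },
      output: { type: "string", short: "o" },
    },
    allowPositionals: true,
  });
  // First positional is the subcommand, then the source and target paths.
  const [command, source, target] = positionals;
  return { command, source, target, format: values.format, output: values.output };
}

const parsed = parseMapArgs([
  "map", "./crm.csv", "./canonical.csv", "--format", "json", "-o", "mapping.json",
]);
console.log(parsed);
```

By default parseArgs runs in strict mode, so an unknown flag throws rather than being silently ignored.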
Parity with Python
This package is a faithful port of infermap on PyPI. Mapping decisions, confidence scores, and unmapped lists are verified to agree with the Python engine to 4 decimal places via shared golden tests that run on every CI build.
If a Python scorer changes, the golden generator must be re-run and the TS parity tests must pass before anything merges. You can't accidentally ship drift. If you find a parity bug, please file an issue with both inputs and both outputs.
See the Python vs TypeScript wiki page for a feature parity matrix and migration guide.
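As a rough illustration of what a golden-test parity check can look like, the sketch below compares two mapping lists and tolerates confidence differences smaller than half a unit in the fourth decimal place. The types, the function name, the tolerance, and the sample values are assumptions for illustration. They are not the project's actual test harness or golden file format.

```typescript
interface ScoredMapping {
  source: string;
  target: string;
  confidence: number;
}

// Assumed reading of "agree to 4 decimal places": absolute difference below 0.00005.
const TOLERANCE = 0.5e-4;

// Returns a human-readable list of differences; an empty list means parity.
function findParityDiffs(golden: ScoredMapping[], actual: ScoredMapping[]): string[] {
  const diffs: string[] = [];
  const bySource = new Map(actual.map((m) => [m.source, m] as const));
  for (const g of golden) {
    const a = bySource.get(g.source);
    if (!a) {
      diffs.push(`${g.source}: missing from actual output`);
    } else if (a.target !== g.target) {
      diffs.push(`${g.source}: target ${a.target} != ${g.target}`);
    } else if (Math.abs(a.confidence - g.confidence) >= TOLERANCE) {
      diffs.push(`${g.source}: confidence ${a.confidence} != ${g.confidence}`);
    }
    bySource.delete(g.source);
  }
  // Anything left over was produced by the port but is absent from the golden file.
  for (const extra of bySource.keys()) diffs.push(`${extra}: not in golden output`);
  return diffs;
}

// Invented sample values.
const golden: ScoredMapping[] = [{ source: "fname", target: "first_name", confidence: 0.4412 }];
const close: ScoredMapping[] = [{ source: "fname", target: "first_name", confidence: 0.44123 }];
const drifted: ScoredMapping[] = [{ source: "fname", target: "first_name", confidence: 0.442 }];

console.log(findParityDiffs(golden, close));   // no differences reported
console.log(findParityDiffs(golden, drifted)); // one confidence difference reported
```

Attaching this kind of diff output, along with both inputs, makes a parity bug report much easier to triage.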
Exports
| Path | Contents | Runtime |
|------|----------|---------|
| infermap / infermap/core | Types, engine, all 6 scorers, Hungarian assignment, in-memory / CSV / JSON / schema-file providers, JSON config loader, map() | edge-safe |
| infermap/node | Filesystem file reader, DB providers (SQLite / Postgres / DuckDB) | Node only |
Links
- 📦 npm package
- 📘 TypeScript API reference
- 🔄 Python vs TypeScript migration guide
- 🧪 Runnable examples
- 🐍 Python sister package
- 🐛 Issue tracker
- 💬 Discussions
