npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

entity-resolve

v0.2.0

Published

Deduplicate and merge entity mentions across documents

Readme

entity-resolve

Deduplicate and merge entity mentions across documents using multi-method similarity scoring.

npm version npm downloads license node


Description

entity-resolve is a zero-dependency entity resolution library for TypeScript and JavaScript. It takes a set of entity mentions -- names with types, aliases, and metadata extracted from one or more documents -- identifies which mentions refer to the same real-world entity, and merges them into canonical entities with consolidated aliases, properties, and provenance.

The resolution pipeline consists of six stages: name normalization, blocking/candidate generation, pairwise multi-method similarity scoring, confidence-based match classification, transitive closure, and entity merging. All similarity algorithms (Jaro-Winkler, Levenshtein, Dice coefficient, Soundex, Metaphone, exact match, abbreviation detection) are implemented in pure TypeScript with no runtime dependencies.

Use cases:

  • Knowledge graph construction -- deduplicate entities extracted from multiple documents before building graph nodes
  • CRM data cleanup -- identify and merge duplicate company or contact records
  • Cross-document coreference -- consolidate "Walt Disney", "Disney", and "The Walt Disney Company" into a single entity
  • Agent memory deduplication -- prevent AI agents from storing redundant entity records across conversations
  • Document processing pipelines -- match incoming entity mentions against a canonical registry

Installation

npm install entity-resolve

Requires Node.js 18 or later. Zero runtime dependencies.


Quick Start

import { resolve } from 'entity-resolve';

const mentions = [
  { name: 'IBM', type: 'organization' },
  { name: 'International Business Machines', type: 'organization' },
  { name: 'Google', type: 'organization' },
  { name: 'Dr. Jane Smith', type: 'person' },
  { name: 'Jane Smith', type: 'person' },
];

const result = resolve(mentions);

console.log(result.entities.length);
// 3 -- IBM + International Business Machines merged, Jane Smith merged, Google standalone

console.log(result.stats);
// { totalMentions: 5, canonicalEntities: 3, mentionsMerged: 2, ... }

console.log(result.mergeMap);
// { "IBM": "IBM", "International Business Machines": "IBM", ... }

Features

  • Multi-method similarity scoring -- Combines six string similarity algorithms plus abbreviation detection into a single weighted score per entity pair.
  • Configurable thresholds -- Separate thresholds for auto-merge (high confidence) and review (medium confidence) classifications.
  • Blocking strategies -- Reduce O(n^2) comparisons using prefix, phonetic, or type-based blocking.
  • Type-aware resolution -- Entities of incompatible types are never merged, even if names are identical. Supports type hierarchies.
  • Name normalization -- Strips honorifics (Dr., Mr., Prof.), suffixes (Jr, Sr, Inc, Corp, LLC), collapses whitespace, and applies Unicode NFC normalization before comparison.
  • Alias matching -- Each entity can carry aliases that participate in pairwise comparison, enabling matches like "USA" against "United States".
  • Abbreviation detection -- Automatically detects acronyms ("IBM" matches "International Business Machines").
  • Transitive closure -- If A matches B and B matches C, all three are merged into a single canonical entity.
  • Configurable merge strategies -- Control canonical name selection (longest, most frequent, first seen) and property merging (union, first-wins).
  • Stateful resolver -- The createResolver factory returns a persistent resolver that supports incremental addEntity() calls alongside batch resolution.
  • Zero runtime dependencies -- All algorithms are self-contained TypeScript. No native modules, no external services.
  • Full TypeScript support -- Ships with declaration files and source maps.

API Reference

resolve(entities, options?)

Perform batch entity resolution on an array of entity mentions.

function resolve(
  entities: EntityMention[],
  options?: ResolverOptions
): ResolutionResult;

Parameters:

| Parameter | Type | Description | |-----------|------|-------------| | entities | EntityMention[] | Array of entity mentions to resolve | | options | ResolverOptions | Optional configuration (see Configuration) |

Returns: ResolutionResult

const result = resolve([
  { name: 'Barack Obama', type: 'person' },
  { name: 'Barack Obama', type: 'person' },
  { name: 'Google', type: 'organization' },
]);

result.entities;       // CanonicalEntity[] -- deduplicated entities
result.matches;        // MatchPair[] -- all evaluated pairs with scores
result.unresolved;     // EntityMention[] -- mentions that could not be resolved
result.mergeMap;       // Record<string, string> -- mention name -> canonical name
result.stats;          // ResolutionStats -- timing and counts

similarity(a, b, options?)

Compute the composite similarity score between two entity mentions without performing full resolution.

function similarity(
  a: EntityMention,
  b: EntityMention,
  options?: ResolverOptions
): SimilarityResult;

Parameters:

| Parameter | Type | Description | |-----------|------|-------------| | a | EntityMention | First entity mention | | b | EntityMention | Second entity mention | | options | ResolverOptions | Optional configuration |

Returns: SimilarityResult

const result = similarity(
  { name: 'Amazon', type: 'organization' },
  { name: 'Amazone', type: 'organization' }
);

result.score;            // number between 0.0 and 1.0
result.methodScores;     // { jaroWinkler: 0.96, levenshtein: 0.83, ... }
result.typesCompatible;  // true

If the two entities have incompatible types, typesCompatible is false and score is 0.


createResolver(config?)

Create a stateful resolver instance with pre-configured options. Supports both batch resolution and incremental entity addition.

function createResolver(config?: ResolverOptions): EntityResolver;

Returns: EntityResolver

const resolver = createResolver({ autoMergeThreshold: 0.85 });

// Incremental addition
const r1 = resolver.addEntity({ name: 'Tesla', type: 'organization' });
// { action: 'added', canonicalEntity: {...}, similarity: 0 }

const r2 = resolver.addEntity({ name: 'Tesla Inc', type: 'organization' });
// { action: 'merged', canonicalEntity: {...}, similarity: 0.92 }

// Query state
resolver.size;                    // 1
resolver.getEntities();           // CanonicalEntity[]
resolver.getEntity('Tesla');      // CanonicalEntity | undefined

// Pairwise similarity
resolver.similarity(entityA, entityB);  // SimilarityResult

// Batch resolution (resets internal state)
const result = resolver.resolve(mentions);

// Reset
resolver.reset();

Types

EntityMention

Input type representing a single entity reference from a document or data source.

interface EntityMention {
  name: string;                          // Entity name as it appears in the source
  type: string;                          // Entity type (e.g., "person", "organization", "location")
  aliases?: string[];                    // Known alternative names
  properties?: Record<string, unknown>;  // Arbitrary metadata
  source?: string;                       // Provenance identifier
}

CanonicalEntity

Output type representing a resolved, deduplicated entity produced by merging one or more mentions.

interface CanonicalEntity {
  name: string;                          // Canonical name (selected by nameStrategy)
  type: string;                          // Entity type
  aliases: string[];                     // All merged surface forms
  properties: Record<string, unknown>;   // Merged metadata
  mentions: EntityMention[];             // Original mentions that were merged
  mentionCount: number;                  // Total number of merged mentions
}

MatchPair

Describes an evaluated pair of entities with their similarity score and classification.

interface MatchPair {
  entityA: string;                          // Name of first entity
  entityB: string;                          // Name of second entity
  similarity: number;                       // Composite similarity score (0.0 to 1.0)
  classification: 'same' | 'possible' | 'different';  // Match verdict
  methodScores: Record<string, number>;     // Per-method scores
}

Classifications:

  • 'same' -- score >= autoMergeThreshold (default 0.90). Entities are automatically merged.
  • 'possible' -- score >= reviewThreshold (default 0.70) but below auto-merge. Flagged for review.
  • 'different' -- score below review threshold or types are incompatible.

ResolutionResult

The complete output of a resolve() call.

interface ResolutionResult {
  entities: CanonicalEntity[];           // Deduplicated canonical entities
  matches: MatchPair[];                  // All evaluated pairs
  unresolved: EntityMention[];           // Mentions that could not be resolved
  mergeMap: Record<string, string>;      // mention name -> canonical entity name
  stats: ResolutionStats;               // Performance and summary statistics
}

ResolutionStats

Summary statistics from a resolution run.

interface ResolutionStats {
  totalMentions: number;     // Number of input mentions
  canonicalEntities: number; // Number of output canonical entities
  mentionsMerged: number;    // Number of mentions merged into existing entities
  candidatePairs: number;    // Number of candidate pairs evaluated
  sameCount: number;         // Number of pairs classified as 'same'
  possibleCount: number;     // Number of pairs classified as 'possible'
  durationMs: number;        // Wall-clock time in milliseconds
}

SimilarityResult

Output of a pairwise similarity computation.

interface SimilarityResult {
  score: number;                         // Weighted composite score (0.0 to 1.0)
  methodScores: Record<string, number>;  // Individual method scores
  typesCompatible: boolean;              // Whether entity types are compatible
}

ResolverOptions

Configuration object accepted by resolve(), similarity(), and createResolver().

interface ResolverOptions {
  autoMergeThreshold?: number;   // Minimum score for automatic merge (default: 0.90)
  reviewThreshold?: number;      // Minimum score for 'possible' classification (default: 0.70)
  methods?: {
    exactMatch?: { weight?: number };     // default weight: 2.0
    jaroWinkler?: { weight?: number };    // default weight: 1.5
    levenshtein?: { weight?: number };    // default weight: 1.0
    dice?: { weight?: number };           // default weight: 1.0
    soundex?: { weight?: number };        // default weight: 0.5
    metaphone?: { weight?: number };      // default weight: 0.5
  };
  blocking?: ('prefix' | 'phonetic' | 'type')[];  // Blocking strategies
  nameStrategy?: 'longest' | 'mostFrequent' | 'firstSeen';  // Canonical name selection (default: 'firstSeen')
  propertyMerge?: 'union' | 'firstWins';  // Property merge strategy (default: 'union')
  typeHierarchy?: Record<string, string>; // Type parent mapping (e.g., { company: 'organization' })
}

EntityResolver

Interface returned by createResolver().

interface EntityResolver {
  resolve(entities: EntityMention[]): ResolutionResult;
  addEntity(entity: EntityMention): {
    action: 'merged' | 'added';
    canonicalEntity: CanonicalEntity;
    similarity: number;
  };
  similarity(a: EntityMention, b: EntityMention): SimilarityResult;
  getEntities(): CanonicalEntity[];
  getEntity(name: string): CanonicalEntity | undefined;
  readonly size: number;
  reset(): void;
}

Configuration

Thresholds

Control how match pairs are classified:

resolve(entities, {
  autoMergeThreshold: 0.85,  // Lower threshold = more aggressive merging
  reviewThreshold: 0.60,     // Lower threshold = fewer 'different' classifications
});

Method Weights

Adjust the contribution of each similarity method to the composite score. Set weight to 0 to effectively disable a method:

resolve(entities, {
  methods: {
    exactMatch: { weight: 3.0 },     // Heavily favor exact matches
    jaroWinkler: { weight: 1.5 },
    levenshtein: { weight: 1.0 },
    dice: { weight: 1.0 },
    soundex: { weight: 0.0 },        // Disable soundex
    metaphone: { weight: 0.0 },      // Disable metaphone
  },
});

Default weights:

| Method | Default Weight | |--------|---------------| | exactMatch | 2.0 | | jaroWinkler | 1.5 | | levenshtein | 1.0 | | dice | 1.0 | | soundex | 0.5 | | metaphone | 0.5 |

Blocking Strategies

Blocking reduces the number of pairwise comparisons from O(n^2) by grouping entities into blocks and only comparing within blocks. When no blocking strategies are specified, all pairs are compared.

resolve(entities, {
  blocking: ['prefix', 'type'],
});

| Strategy | Grouping Key | Description | |----------|-------------|-------------| | 'prefix' | First 3 characters of normalized name | Groups entities sharing a name prefix | | 'phonetic' | Soundex code of first word | Groups entities that sound alike | | 'type' | Entity type string | Only compares entities of the same type |

Multiple strategies can be combined. A pair is evaluated if it appears in any block (union of all strategies).

Type Hierarchy

Define parent-child relationships between entity types. By default, entities of different types are never merged:

resolve(entities, {
  typeHierarchy: {
    company: 'organization',
    startup: 'company',
    university: 'organization',
  },
});
// Now "company" and "organization" types are considered compatible

Merge Strategies

Control how canonical entities are constructed when mentions are merged:

resolve(entities, {
  nameStrategy: 'longest',      // Pick the longest name as canonical
  propertyMerge: 'firstWins',   // Earlier entity's properties take precedence
});

Name strategies:

| Strategy | Behavior | |----------|----------| | 'firstSeen' | Keep the name of the first mention encountered (default) | | 'longest' | Pick the longest name among all merged mentions | | 'mostFrequent' | Pick the name that appears most often across mentions |

Property merge strategies:

| Strategy | Behavior | |----------|----------| | 'union' | Merge all properties from all mentions; canonical entity's values win on key conflict (default) | | 'firstWins' | Only keep the canonical (first-seen) entity's properties; new properties from later mentions are ignored |


Error Handling

entity-resolve is designed to handle edge cases gracefully without throwing:

  • Empty input -- resolve([]) returns a valid ResolutionResult with empty arrays and zero-valued stats.
  • Single entity -- resolve([entity]) returns one canonical entity with no merges.
  • Incompatible types -- Entities with different types (and no type hierarchy mapping) receive a similarity score of 0 and are classified as 'different'.
  • Empty names -- Similarity algorithms return 0.0 for empty string inputs.
  • Missing optional fields -- All optional fields on EntityMention (aliases, properties, source) default to safe empty values internally.

Advanced Usage

Incremental Resolution

Use createResolver() to build a canonical entity set incrementally as new mentions arrive:

import { createResolver } from 'entity-resolve';

const resolver = createResolver({
  autoMergeThreshold: 0.90,
  nameStrategy: 'longest',
});

// Process mentions one at a time
const mentions = [
  { name: 'Dr. Albert Einstein', type: 'person' },
  { name: 'Einstein', type: 'person' },
  { name: 'A. Einstein', type: 'person', properties: { field: 'physics' } },
  { name: 'Google', type: 'organization' },
];

for (const mention of mentions) {
  const { action, canonicalEntity } = resolver.addEntity(mention);
  console.log(`${mention.name}: ${action} -> ${canonicalEntity.name}`);
}

console.log(resolver.size); // 2 (Einstein group + Google)

Custom Scoring Profiles

Tune weights for domain-specific resolution. For person names, phonetic matching is more useful; for organizations, exact match and abbreviation detection matter more:

// Person-optimized profile
const personResult = resolve(personMentions, {
  methods: {
    exactMatch: { weight: 1.0 },
    jaroWinkler: { weight: 2.0 },
    levenshtein: { weight: 1.5 },
    dice: { weight: 1.0 },
    soundex: { weight: 1.5 },
    metaphone: { weight: 1.5 },
  },
});

// Organization-optimized profile
const orgResult = resolve(orgMentions, {
  methods: {
    exactMatch: { weight: 3.0 },
    jaroWinkler: { weight: 1.0 },
    levenshtein: { weight: 0.5 },
    dice: { weight: 0.5 },
    soundex: { weight: 0.0 },
    metaphone: { weight: 0.0 },
  },
});

Alias-Driven Resolution

Provide aliases on entity mentions to improve matching for known equivalences:

const result = resolve([
  { name: 'United States', type: 'location', aliases: ['USA', 'US', 'United States of America'] },
  { name: 'USA', type: 'location' },
  { name: 'U.S.', type: 'location' },
]);

console.log(result.entities.length); // 1

Inspecting Match Details

The matches array in the resolution result provides full transparency into every evaluated pair:

const result = resolve(entities);

for (const match of result.matches) {
  if (match.classification === 'possible') {
    console.log(
      `Review: "${match.entityA}" vs "${match.entityB}" ` +
      `(score: ${match.similarity.toFixed(3)})`,
      match.methodScores
    );
  }
}

Scaling with Blocking

For large entity sets, blocking is essential. Without it, every pair is compared (O(n^2)). With blocking, only entities in the same block are compared:

// 10,000 entities -- use blocking to avoid 50M comparisons
const result = resolve(largeEntitySet, {
  blocking: ['prefix', 'phonetic', 'type'],
});

console.log(result.stats.candidatePairs); // Much less than n*(n-1)/2

TypeScript

entity-resolve is written in TypeScript and ships with declaration files and source maps. All types are exported from the package entry point:

import { resolve, similarity, createResolver } from 'entity-resolve';
import type {
  EntityMention,
  CanonicalEntity,
  MatchPair,
  ResolutionResult,
  ResolutionStats,
  SimilarityResult,
  ResolverOptions,
  EntityResolver,
} from 'entity-resolve';

The package targets ES2022 and uses CommonJS module output.


License

MIT