entity-predictor
v1.3.1
Entity Predictor
A lightweight, zero-dependency Node.js library for entity name prediction and normalization.
It uses fuzzy matching to identify entities from messy input, supporting:
- Aliases & Acronyms (e.g., "SBI" -> "STATE BANK OF INDIA")
- Confidence Scoring ("Trustable", "High Confidence", etc.)
- Top-N Matches (Get the top 3 best guesses)
- Configurable Stop Words (Ignore "The", "Inc", etc.)
Features
- Fuzzy Matching: Matches inputs to entities even with typos or partial names.
- Alias Support: Handles acronyms (e.g., "SBI" -> "STATE BANK OF INDIA") and alternative names.
- Confidence Scoring: Returns a confidence score and a human-readable trust level ("Trustable", "High", "Moderate").
- Normalization: Automatically normalizes input to ignore case and special characters.
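As a rough illustration of how normalization and fuzzy scoring can fit together, the sketch below lowercases the input, strips special characters, and scores it against each candidate with a Levenshtein-based similarity ratio. The helper names (`normalize`, `levenshtein`, `similarity`, `bestMatch`) and the choice of metric are assumptions made for illustration. The library's internal algorithm is not described in this README, so its scores may differ from what this sketch produces.

```javascript
// Hypothetical helpers for illustration; not part of entity-predictor's API.

// Lowercase, drop anything that is not a letter, digit, or space, collapse spaces.
function normalize(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Classic dynamic-programming edit distance (insert, delete, substitute).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]; 1 means identical after normalization.
function similarity(a, b) {
  const x = normalize(a);
  const y = normalize(b);
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

// Return the candidate with the highest similarity to the input.
function bestMatch(input, candidates) {
  let best = { entity: null, confidence: -1 };
  for (const entity of candidates) {
    const confidence = similarity(input, entity);
    if (confidence > best.confidence) best = { entity, confidence };
  }
  return best;
}

console.log(bestMatch("icici bk", ["ICICI BANK", "AXIS BANK"]));
```

A ratio like this rewards near-identical strings but does not handle aliases or acronyms on its own; those would need to be scored as extra candidates that map back to the canonical name.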
Installation
npm install entity-predictor
Usage
1. Import and Initialize
You can initialize the predictor with a list of entities. Entities can be simple strings or objects defining aliases.
import { EntityPredictor } from "entity-predictor";
const entities = [
  // Simple string entity
  "ICICI BANK",
  "AXIS BANK",
  // Entity with aliases
  {
    name: "STATE BANK OF INDIA",
    aliases: ["SBI", "State Bank", "S.B.I."],
  },
  {
    name: "HDFC BANK",
    aliases: ["HDFC", "Housing Development Finance Corporation"],
  },
];
const predictor = new EntityPredictor(entities);
2. Predict Entities
Use the predict() method to find the best match for an input string.
const result = predictor.predict("sbi");
console.log(result);
/*
Output:
{
  entity: "STATE BANK OF INDIA",
  confidence: 1,
  confidenceLevel: "Trustable",
  input: "sbi"
}
*/
Handling Typos
const result = predictor.predict("icici bk");
console.log(result);
/*
Output:
{
  entity: "ICICI BANK",
  confidence: 0.71,
  confidenceLevel: "Moderate Confidence"
}
*/
3. Top-N Matches
Get a list of best matches instead of just one.
const results = predictor.predictTop("Apple", 3);
// Returns array of matches: [{ entity: "Apple Inc", ... }, ...]
4. Handling Ambiguity (isAmbiguous)
Sometimes, an input matches multiple entities with the exact same confidence score. For example, "UCO" could match "UCO Bank", "Union Commercial Bank", etc.
The result object includes an isAmbiguous flag to warn you.
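One way to think about the flag: a result is ambiguous when the best score is shared by more than one distinct entity. The sketch below applies that check to a list of already scored candidates. `isTie`, the `epsilon` tolerance, and the sample scores are hypothetical and only illustrate the idea; they are not taken from the library's internals.

```javascript
// Hypothetical helper: true when two or more distinct entities share the top score.
function isTie(scored, epsilon = 1e-9) {
  if (scored.length < 2) return false;
  const top = Math.max(...scored.map((s) => s.confidence));
  const leaders = new Set(
    scored.filter((s) => top - s.confidence <= epsilon).map((s) => s.entity)
  );
  return leaders.size > 1;
}

// Made-up scores for illustration only.
const scored = [
  { entity: "UCO BANK", confidence: 0.75 },
  { entity: "UNION COMMERCIAL BANK", confidence: 0.75 },
  { entity: "AXIS BANK", confidence: 0.4 },
];

console.log(isTie(scored)); // true
```

Comparing with a small tolerance rather than strict equality avoids missing a tie because of floating-point rounding.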
const result = predictor.predict("uco");
if (result.isAmbiguous) {
  console.warn("Ambiguous input! Found multiple candidates.");
  // Use predictTop to show options to the user
  const options = predictor.predictTop("uco", 5);
  console.log(options);
} else {
  console.log("Found:", result.entity);
}
5. Stop Words Filtering
Automatically remove noise words like "The", "Inc", "Ltd". Disabled by default.
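To make the effect concrete, here is a minimal sketch of what stop-word filtering might do to a string before matching. `removeStopWords` and its three-word default list are assumptions for illustration; the library's internal default list is not shown in this README.

```javascript
// Hypothetical helper: drop stop words from a string before matching.
function removeStopWords(text, stopWords = ["the", "inc", "ltd"]) {
  const blocked = new Set(stopWords.map((w) => w.toLowerCase()));
  return text
    .split(/\s+/)
    .filter((token) => {
      if (token === "") return false;
      // Compare on a lowercase, punctuation-free copy so "Inc." matches "inc".
      const key = token.toLowerCase().replace(/[^a-z0-9]/g, "");
      return !blocked.has(key);
    })
    .join(" ");
}

console.log(removeStopWords("The Acme Trading Co. Ltd")); // "Acme Trading Co."
console.log(removeStopWords("Apple Inc.", ["inc", "co", "corp"])); // "Apple"
```

Filtering like this keeps generic suffixes from inflating the similarity between otherwise unrelated names.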
// Enable with the default list
const defaultListPredictor = new EntityPredictor(entities, { ignoreStopWords: true });
// Enable with a custom list (a separate variable name avoids redeclaring a const)
const customListPredictor = new EntityPredictor(entities, {
  ignoreStopWords: true,
  stopWords: ["inc", "co", "corp"],
});
6. Custom Normalization
Pass a custom normalizer to clean data your way.
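A normalizer is simply a function from string to string, matching the `(text: string) => string` shape in the API reference. The sketch below is one possible normalizer that goes a little further than a single case conversion; `stripAccentsAndPunctuation` is a hypothetical name, and the exact cleaning rules are an assumption to adapt to your own data.

```javascript
// Hypothetical normalizer: uppercase, strip diacritics and punctuation, collapse whitespace.
function stripAccentsAndPunctuation(text) {
  return text
    .normalize("NFD") // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toUpperCase()
    .replace(/[^A-Z0-9\s]/g, " ") // punctuation becomes a space
    .replace(/\s+/g, " ")
    .trim();
}

console.log(stripAccentsAndPunctuation("  Société-Générale, S.A. ")); // "SOCIETE GENERALE S A"
```

A function like this could be supplied as the `normalizer` option. Note that turning punctuation into spaces splits dotted acronyms ("S.A." becomes "S A"), which may or may not be what you want for alias matching.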
const predictor = new EntityPredictor(entities, {
  normalizer: (text) => text.toUpperCase(),
});
7. Redis Datasets Support
Load entities directly from a Redis source (requires your own redis client).
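For the hash layout used in this section's example, each field appears to be an entity name and each value either a JSON array of aliases or a single alias as a plain string. The sketch below shows how such a hash could be converted into the `{ name, aliases }` shape by hand. `hashToEntities` is a hypothetical helper, and this reading of the value format is an assumption based on the example comment, not a description of what `loadFromRedis` does internally.

```javascript
// Hypothetical helper: convert an HGETALL-style object into entity definitions.
// Assumption: each value is a JSON array of aliases or one alias as a plain string.
function hashToEntities(hash) {
  return Object.entries(hash).map(([name, raw]) => {
    let aliases;
    try {
      const parsed = JSON.parse(raw);
      // Valid JSON that is not an array (e.g. a number) is kept as a single alias.
      aliases = Array.isArray(parsed) ? parsed.map(String) : [String(raw)];
    } catch {
      aliases = [raw]; // not valid JSON, so treat the value as one alias
    }
    return { name, aliases };
  });
}

console.log(hashToEntities({ Amazon: '["AWS"]', Netflix: "FLIX" }));
```

A conversion like this can be handy for inspecting or validating Redis data before loading it.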
import Redis from "ioredis"; // or any redis client
import { EntityPredictor } from "entity-predictor";
const redis = new Redis();
const predictor = new EntityPredictor([]); // Start empty (pass an array; non-arrays throw TypeError) or with some local entities
// Load from a Redis String (JSON)
// Key content: '["Apple", {"name": "Google", "aliases": ["Alphabet"]}]'
await predictor.loadFromRedis(redis, { key: "my_entities", type: "json" });
// Load from a Redis Set
// Key content: SMEMBERS -> ["Tesla", "SpaceX"]
await predictor.loadFromRedis(redis, { key: "my_set_key", type: "set" });
// Load from a Redis Hash
// Key content: HGETALL -> { "Amazon": '["AWS"]', "Netflix": "FLIX" }
await predictor.loadFromRedis(redis, { key: "my_hash_key", type: "hash" });
8. Add Entities Dynamically
You can add new entities to an existing predictor instance.
predictor.addEntity("PUNJAB NATIONAL BANK", ["PNB"]);
API Reference
new EntityPredictor(entities, options)
- entities: Array of strings or objects { name: string, aliases: string[] }.
- options: (Optional)
  - ignoreStopWords: boolean (default false)
  - stopWords: string[] (optional, defaults to internal list)
  - normalizer: (text: string) => string
- Throws: TypeError if entities is not an array.
predict(input, threshold)
- input: String to search for.
- threshold: (Optional) Minimum confidence score (default 0.6).
- Returns: Best match object { entity: string, confidence: number, input: string, isAmbiguous: boolean, ... }, { entity: "UNKNOWN", ... } if no match is found, or null if input is invalid.
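Because predict() can return three different shapes, calling code usually needs to branch on them. The sketch below shows one way to do that with hand-written objects shaped like the documented return values. `describeResult` and the sample objects are hypothetical; fields elided with "..." above are left out rather than guessed.

```javascript
// Hypothetical helper: turn a predict() result into a message,
// covering the three documented shapes (null, "UNKNOWN", and a real match).
function describeResult(result) {
  if (result === null) return "Invalid input";
  if (result.entity === "UNKNOWN") return "No match found";
  const suffix = result.isAmbiguous ? " (ambiguous)" : "";
  return `${result.entity} @ ${result.confidence}${suffix}`;
}

// Hand-written objects for illustration, not real library output.
console.log(describeResult(null)); // "Invalid input"
console.log(describeResult({ entity: "UNKNOWN" })); // "No match found"
console.log(
  describeResult({ entity: "STATE BANK OF INDIA", confidence: 1, input: "sbi", isAmbiguous: false })
); // "STATE BANK OF INDIA @ 1"
```

Checking for null first matters, since reading `.entity` on a null result would throw.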
predictTop(input, limit, threshold)
- limit: Max number of results (default 5).
- Returns: Array of match objects.
TypeScript Support
Includes index.d.ts for full TypeScript support.
