@rbxts/fuzzy-search
v1.0.4
Published
Fuzzy search allows you to find the results even with the spelling mistaces or approximate search words
Readme
FuzzySearch
A comprehensive package providing fuzzy search algorithms and text processing utilities.
NPM
Namespaces
FuzzySearch.Scores: Core implementations of comparison algorithms (raw values)FuzzySearch.Similarities: Normalized implementations (0-1 similarity scores)FuzzySearch.Transform: Text preprocessing functionsFuzzySearch.Tokenization: String splitting utilitiesFuzzySearch.Sorting: Ranking functions based on similarity metrics
Algorithms
CosineTextSimilarity
Calculates cosine similarity between tokenized strings. Ideal for document comparison.
Input:"Hello World" vs "Hello Earth"
Tokenization:["Hello", "World"] vs ["Hello", "Earth"]
Similarity: ~0.5 (shared term "Hello", different terms "World"/"Earth")
Declarations:
FuzzySearch.Scores.CosineTextSimilarity(tokenized_term: string[], tokenized_query: string[]): number;
/**higher - better */
FuzzySearch.Sorting.CosineTextSimilarity(output_terms: string[], tokenized_terms: string[][], tokenized_query: string[]): [number, string][]
//tokenization
const tokenized_term = FuzzySearch.Tokenization.Word(term);
//Hello Wor'ld-and_this is128 big!1.0;
//["hello", "wor", "ld", "and", "this", "is128", "big", "1", "0"]Example:
// Comparing product descriptions
const desc1 = FuzzySearch.Tokenization.Word(
"Wireless Bluetooth Headphones Noise-Canceling",
);
const desc2 = FuzzySearch.Tokenization.Word(
"Bluetooth 5.0 Wireless Earbuds with ANC",
);
const similarity = FuzzySearch.Scores.CosineTextSimilarity(desc1, desc2); // ~0.34Damerau-Levenshtein
Measures edit operations including transpositions. Great for OCR error correction.
Input:"Thursday" vs "Thrusday"
Distance: 1 (single transposition r↔u)
Operations: Transpose 'r' and 'u'
Declarations:
// Basic implementation
FuzzySearch.Scores.DamerauLevenshteinDistance(term: string, query: string): number;
// Optimized for large strings
FuzzySearch.Scores.LargeStringsDamerauLevenshteinDistance(term: string, query: string): number;
// Similarity versions (0-1)
FuzzySearch.Similarities.DamerauLevenshteinDistance(term: string, query: string): number;
FuzzySearch.Similarities.LargeStringsDamerauLevenshteinDistance(term: string, query: string): number;
/**lower - better */
FuzzySearch.Sorting.DamerauLevenshteinDistance(terms: string[], query: string, output_terms?: string[] = terms): [number, string][];
/**lower - better */
FuzzySearch.Sorting.LargeStringsDamerauLevenshteinDistance(terms: string[], query: string, output_terms?: string[] = terms): [number, string][];
/**higher - better */
FuzzySearch.Sorting.DamerauLevenshteinDistanceSimilarity(terms: string[], query: string, output_terms?: string[] = terms): [number, string][];
/**higher - better */
FuzzySearch.Sorting.LargeStringsDamerauLevenshteinDistanceSimilarity(terms: string[], query: string, output_terms?: string[] = terms): [number, string][];Example:
FuzzySearch.Scores.DamerauLevenshteinDistance("apple", "appel"); // 1 (transposition)
FuzzySearch.Similarities.DamerauLevenshteinDistance("clasp", "claps"); // 0.8 (1 edit)Fuzzy Score
Default implementation for general-purpose fuzzy matching.
Input:"fuzzywuzzy" vs "fuzzy"
Score: High (exact initial match + continuation)
Use Case: Partial match scoring in search boxes
Declarations:
FuzzySearch.Scores.FuzzyScore(term: string, query: string): number;
/**higher - better */
FuzzySearch.Sorting.FuzzyScore(terms: string[], query: string, output_terms?: string[] = terms): [number, string][];Example:
// Search box implementation
const results = FuzzySearch.Sorting.FuzzyScore(
["Astrophysics", "Particle Physics", "Quantum Mechanics"],
"Phys",
);Hamming Distance
Compares same-length strings. Ideal for error-correcting codes.
Input:"karolin" vs "kathrin" (equal length)
Distance: 3
Differences: Positions 3(r≠t), 4(o≠h), 6(i≠r)
Declarations:
FuzzySearch.Scores.HammingDistance(term: string, query: string): number;
FuzzySearch.Similarities.HammingDistance(term: string, query: string): number;
FuzzySearch.Similarities.HammingLevenshteinHybrid(term: string, query: string): number;
/**lower - better */
FuzzySearch.Sorting.HammingDistance(terms: string[], query: string, output_terms?: string[] = terms): [number, string][];
/**higher - better */
FuzzySearch.Sorting.HammingDistanceSimilarity(terms: string[], query: string, output_terms?: string[] = terms): [number, string][];
/**higher - better */
FuzzySearch.Sorting.HammingLevenshteinHybridSimilarity(terms: string[], query: string, output_terms?: string[] = terms): [number, string][];Example:
// DNA sequence comparison
FuzzySearch.Scores.HammingDistance("GATACA", "GATTACA"); // 4 mismatches
FuzzySearch.Scores.HammingDistance("AGCT", "AGTT"); // 1 mismatchJaroWinkler
Optimized for name matching and short strings.
Input:"Martha" vs "Marhta"
Similarity: 0.961 (vs 0.944 in plain Jaro)
Benefit: Prefix bonus for "Mar" match
Declarations:
FuzzySearch.Scores.JaroWinkler(term: string, query: string, case_sensitive: boolean = true): number;
/**higher - better */
FuzzySearch.Sorting.JaroWinkler(terms: string[], query: string, output_terms?: string[] = terms): [number, string][];Examples:
FuzzySearch.Scores.JaroWinkler("shackleford", "shackelford"); // 0.98
FuzzySearch.Scores.JaroWinkler("garbage", "gabrage"); // 0.96 (transposition handling)Levenshtein Distance
Classic edit distance algorithm. Good for general typo detection.
Input:"kitten" vs "sitting"
Operations: 3 edits
k→s (substitution), e→i (substitution), +g (insertion)
Declarations:
FuzzySearch.Scores.LevenshteinDistance(term: string, query: string): number;
FuzzySearch.Similarities.LevenshteinDistance(term: string, query: string): number;
/**lower - better */
FuzzySearch.Sorting.LevenshteinDistance(terms: string[], query: string, output_terms?: string[] = terms): [number, string][];
/**higher - better */
FuzzySearch.Sorting.LevenshteinDistanceSimilarity(terms: string[], query: string, output_terms?: string[] = terms): [number, string][];Examples:
FuzzySearch.Similarities.LevenshteinDistance("kitten", "sitting"); // 0.571 (3/7 edits)
FuzzySearch.Sorting.LevenshteinDistance(["color", "colour", "colr"], "colour");
// [ [0, "colour"], [1, "color"], [2, "colr"] ]NGramCosine
Calculates cosine similarity using n-gram frequency vectors. Ideal for comparing documents by content structure.
Input:"The quick" vs "Quick brown" (n=3)
Trigrams:["The", "he ", "e q", " qu", ...] vs ["Qui", "uic", "ick", ...]
Similarity: Low - different n-gram sets
Declarations:
FuzzySearch.Scores.NGramCosine(n_gram_counts_tokenized_term: Map<string, number>, n_gram_counts_tokenized_query: Map<string, number>): number;
/**higher - better */
FuzzySearch.Sorting.NGramCosineSorting(output_terms: string[], n_gram_count_tokenized_terms: Map<string, number>[], n_gram_count_tokenized_query: Map<string, number>): [number, string][];
//tokenization
const n_gram_counts_tokenized_term = FuzzySearch.Tokenization.NGramCountsTokenization(term, n);
//Hello World! n = 3
//["Hel", "ell", "llo", "lo ", "o W", " Wo", "Wor", "orl", "rld", "ld!"];Examples:
const n = 2;
const term1 = FuzzySearch.Tokenization.NGramCounts("hello", n);
const term2 = FuzzySearch.Tokenization.NGramCounts("holla", n);
FuzzySearch.Scores.NGramCosine(term1, term2); // 0.25 (shared "ll")NGramJaccard
Calculates similarity using n-gram set intersections. Effective for quick presence/absence comparisons.
Input:"night" vs "nacht" (n=2)
Bigrams:{"ni", "ig", "gh", "ht"} vs {"na", "ac", "ch", "ht"}
Similarity: 0.14 (1 shared "ht" / 7 total)
Declarations:
FuzzySearch.Scores.NGramJaccard(n_gram_set_tokenized_term: Set<string>, n_gram_set_tokenized_query: Set<string>): number;
/**higher - better */
FuzzySearch.Sorting.NGramJaccard(output_terms: string[], n_gram_set_tokenized_terms: Set<string>[], n_gram_set_tokenized_query: Set<string>): [number, string][] ;
//tokenization
const n_gram_counts_tokenized_term = FuzzySearch.Tokenization.NGramSetTokenization(term, n);
//Hello World! n = 3
//["Hel", "ell", "llo", "lo ", "o W", " Wo", "Wor", "orl", "rld", "ld!"];
Examples:
const n = 3;
const set1 = FuzzySearch.Tokenization.NGramSet("Microsoft", n);
const set2 = FuzzySearch.Tokenization.NGramSet("Microsystems", n);
FuzzySearch.Scores.NGramJaccard(set1, set2); // 0.31 (shared "Mic", "cro", etc)Tokenization
NGram
Declarations:
//Hello World! n = 3
//["Hel", "ell", "llo", "lo ", "o W", " Wo", "Wor", "orl", "rld", "ld!"];
FuzzySearch.Tokenization.NGram(text: string, n: number): string[];
//used for NGramJaccard
FuzzySearch.Tokenization.NGramSet(str: string, n: number): Set<string>
//used for NGramCosine
FuzzySearch.Tokenization.NGramCounts(str: string, n: number): Map<string, number>;Word
Tokenizes text into lowercase words, ignoring punctuation. Has various options
Declaration:
interface IWordTokenizationOptions {
PreserveAccents?: boolean;
PreserveHyphens?: boolean;
}
FuzzySearch.Tokenization.WordTo(text: string, options?: IWordTokenizationOptions): string[]
//Hello Wor'ld-and_this is128 big!1.0;
//["hello", "wor", "ld", "and", "this", "is128", "big", "1", "0"]Transform
Soundex
Phonetic encoding for name matching:
Declarations:
FuzzySearch.Transform.SoundexTransform(input: string): string;
//Robert = Rupert -> R163
//Ashcraft = Ashcroft -> A261
//Howard = Hayward = Hardy -> H630