quick-fuzzy

v1.0.3

Published

8 days ago

A high‑performance fuzzy string matching library written in TypeScript

0High
0Medium
0Low

amice13

fuzzy match fuzzy strings strings comparison

Quick Fuzzy

A high‑performance fuzzy string matching library written in TypeScript.

This is NOT a search library.
It performs approximate string matching, not indexing, ranking, or full‑text search.

Designed for:

deduplication
typo‑tolerant matching
fuzzy equality checks
names, regions, addresses
constrained candidate comparison

Core Benefits

Sub‑linear candidate evaluation via length‑based bucketing
Nilsimsa hashing for fast similarity pruning
Jaro‑Winkler scoring for final precision
LRU caching of query hashes
Unicode‑safe (Cyrillic, Ukrainian, diacritics)
Pluggable normalization pipeline
Zero runtime dependencies
Deterministic results (no randomness at match time)
Low memory retention (GC‑friendly)

This makes the library especially suitable for:

humanitarian registries
address / region normalization
beneficiary deduplication
data quality pipelines
offline or embedded environments

How It Works (High Level)

Normalization (optional)
Length bucketing to reduce candidate space
Nilsimsa hash comparison to prune unlikely matches
Adaptive tolerance based on entropy and string length
Jaro‑Winkler scoring for final selection

This hybrid approach combines speed and accuracy without maintaining a global index.

Installation

npm install @amice13/quick-fuzzy

Basic Usage

import createSearchInstance from 'quick-fuzzy'

const matcher = createSearchInstance({
  mode: 'static',
  ignoreCase: true,
  removeDiacritics: true,
  normalizeWhitespace: true,
  maxQueryCache: 1000,
  data: [
    'Київська область',
    'Львівська область',
    'Харківська область'
  ]
})

matcher.search('Київска область')
// ['Київська область']

matcher.search('Неіснуюча область')
// null

const dynamicMatcher = createSearchInstance()
dynamicMatcher.search('тринадцять', ['Мені', 'тринадцятий', 'минало', 'Я', 'пас', 'ягнята', 'за', 'селом'])
// ['тринадцятий']

Static vs Dynamic Mode

static — hashes are computed once, no new strings added
dynamic — hashes are computed lazily and cached

createSearchInstance({ mode: 'static' })

Static mode is recommended for fixed datasets and maximum predictability.

Other options

| Option | Type | Description | Lower Bound | Upper Bound | Recommended | Notes | | ----------------------- | --------------------------------------------------- | -------------------------------------- | ----------- | ----------- | ------------ | --------------------------------- | | mode | 'static' \| 'dynamic' | Hash materialization strategy | — | — | static | static = fastest & stable heap | | data | string[] \| undefined | Initial dataset | — | — | — | Ignored if hashMap is provided | | hashMap | Map<number, Map<string, Uint8Array>> \| undefined | Precomputed length-bucketed hash index | — | — | — | Enables zero-cost initialization | | maxQueryCache | number | LRU size for query hash cache | 0 | 50_000 | 500–10_000 | Each entry stores a 32-byte hash | | ignoreCase | boolean | Lowercase normalization | — | — | true | Improves recall | | ignoreSymbols | boolean | Strip punctuation | — | — | true | Useful for names & regions | | removeDiacritics | boolean | Accent removal | — | — | true | Critical for Ukrainian text | | normalizeWhitespace | boolean | Collapse whitespace | — | — | true | Prevents formatting mismatches | | disableNormalization | boolean | Disable all normalization | — | — | false | Overrides all normalization flags | | stringLengthTolerance | number | Length-based pruning ratio | 0.05 | 0.6 | 0.15–0.30 | Lower = faster | | hashMinTolerance | number | Minimum allowed hash delta | 0 | 16 | 3–6 | Guards weak matches | | hashBaseTolerance | number | Base hash similarity threshold | 5 | 32 | 12–20 | Main precision/recall control | | hashLengthPenalty | number | Length penalty factor | 0 | 3 | 0.5–1.5 | Penalizes long strings | | hashEntropyBoost | number | Low-entropy similarity boost | 0 | 12 | 2–6 | Helps repetitive/short strings |

Hot start

You can use getCache function to get the fully materialized internal cache and configuration of the search instance.

This function exposes the effective Options object, including the generated hash index (hashMap), allowing you to:

reuse a precomputed index
persist it between runs
create multiple instances without re-hashing
perform apples-to-apples benchmarks
warm-start production services

import createSearchInstance from 'quick-fuzzy'

const matcher = createSearchInstance({
  mode: 'static',
  ignoreCase: true,
  removeDiacritics: true,
  normalizeWhitespace: true,
  maxQueryCache: 1000,
  data: [
    'Київська область',
    'Львівська область',
    'Харківська область'
  ]
})

const savedOptions = matcher.getCache()
const anotherInstance = createSearchInstance(savedOptions)

Benchmark

Scenario

Dataset: 10,000 randomly generated strings
Queries: 1,000 fuzzy queries
Environment: Node.js (single thread)

Check ./benchmark folder

Environment

Date: 2026-01
OS Name: Ubuntu-22.04
Processor: 12th Gen Intel(R) Core(TM) i7-12700H, 2300 Mhz, 14 Core(s), 20 Logical Processor(s)
Installed RAM: 32.0 GB
System type 64-bit operating system, x64-based processor
Node: v24.1.0

Results

┌────────────────┬───────────┐
│ (index)        │ Values    │
├────────────────┼───────────┤
│ initTimeMs     │ '41.77'   │
│ benchTimeMs    │ '7127.10' │
│ peakHeapMB     │ '15.49'   │
│ retainedHeapMB │ '0.12'    │
│ truePositives  │ 941       │
│ gcTimeMs       │ '1.98'    │
│ baselineHeapMB │ '9.69'    │
│ libraryHeapMB  │ '0.43'    │
└────────────────┴───────────┘

Interpretation

High recall (94.1% true positives)
Minimal retained memory
Predictable GC behavior
No index build overhead

Code style

The code is written with JavaScript Standard Style.

License

MIT