@stll/fuzzy-search
v1.1.0
Published
Approximate substring matching for Node.js and Bun via a Rust Myers engine exposed through NAPI-RS.
Maintainers
Readme
@stll/fuzzy-search
NAPI-RS approximate substring matching for Node.js and Bun. Finds near-matches within edit distance k with stable UTF-16 offsets, replace-safe match ranges, and optional diacritics normalization.
Built on Myers' bit-parallel algorithm (1999), implemented in Rust and exposed to JavaScript via NAPI-RS.
Install
npm install @stll/fuzzy-search
# or
bun add @stll/fuzzy-searchThe companion @stll/fuzzy-search-wasm package is
available for browser builds.
If you use the browser package with Vite, import the bundled plugin so the generated WASM loader is not pre-bundled into broken asset URLs:
import { defineConfig } from "vite";
import stllFuzzySearchWasm from "@stll/fuzzy-search-wasm/vite";
export default defineConfig({
plugins: [stllFuzzySearchWasm()],
});GitHub releases include npm tarballs, an SBOM, and third-party notices.
Prebuilts are available for:
| Platform | Architecture | | ------------- | ------------ | | macOS | x64, arm64 | | Linux (glibc) | x64, arm64 | | WASM | browser |
Usage
import { FuzzySearch } from "@stll/fuzzy-search";
const fs = new FuzzySearch(
[
{ pattern: "Gaislerová", distance: 1 },
{ pattern: "Novák", distance: 1 },
{ pattern: "Příbram", distance: 2 },
],
{
normalizeDiacritics: true,
wholeWords: true,
},
);
fs.findIter("Smlouva s Gais1erová v Pribram");
// [
// { pattern: 0, start: 10, end: 20,
// text: "Gais1erová", distance: 1 },
// { pattern: 2, start: 23, end: 30,
// text: "Pribram", distance: 0 },
// ]Patterns
Patterns can be strings (default distance 1) or objects with explicit distance and optional name:
const fs = new FuzzySearch([
"simple", // distance 1
{ pattern: "named", name: "entity" }, // distance 1
{ pattern: "precise", distance: 2 }, // distance 2
]);Distance must be less than pattern length.
Options
const fs = new FuzzySearch(patterns, {
// Strip diacritics before matching (NFD + remove
// combining marks). "Příbram" matches "Pribram"
// at distance 0.
normalizeDiacritics: true, // default: false
// Only match whole words. Uses Unicode
// is_alphanumeric() for boundary detection.
// CJK characters always pass (no inter-word
// spaces in CJK).
wholeWords: true, // default: true
// Case-insensitive matching (Unicode-aware).
caseInsensitive: true, // default: false
// Unicode word boundaries (reserved for future
// UAX#29 segmentation support).
unicodeBoundaries: true, // default: true
// Drop matches whose score is below threshold.
// Score = 1 - distance / pattern.length.
// Inclusive (score >= minScore keeps the match).
minScore: 0.7,
// Return only the top k matches by score, across
// all patterns. Tie-broken by start, then pattern.
kBest: 5,
});Scored output
Every match carries a normalized score in [0, 1],
computed as 1 - distance / pattern.length and
clamped at 0. Pair it with minScore and kBest for
top-N ranking without a follow-up sort:
const fs = new FuzzySearch(
[
{ pattern: "Novák", distance: 2 },
{ pattern: "Gaislerová", distance: 2 },
],
{ wholeWords: true, minScore: 0.7, kBest: 3 },
);
fs.findIter("Nowák a Gais1erova");
// [
// { pattern: 0, text: "Nowák", distance: 1, score: 0.8, ... },
// { pattern: 1, text: "Gais1erova", distance: 2, score: 0.8, ... },
// ]replaceAll always replaces every distance-qualified
match and ignores minScore / kBest, so the
replacements-by-pattern contract stays
deterministic.
Replace
fs.replaceAll("Smlouva s Gais1erová", [
"[REDACTED]",
"[REDACTED]",
"[REDACTED]",
]);
// "Smlouva s [REDACTED]"replacements[i] replaces pattern i.
Distance helper
import { distance } from "@stll/fuzzy-search";
distance("kitten", "sitting"); // 3
distance("abcd", "abdc", "damerau-levenshtein"); // 1Benchmarks
The repository includes a checked-in benchmark harness for synthetic and corpus-based searches. The inputs are public and the scripts are reproducible from the repo. Run them locally:
bun run bench:install
bun run bench:download
bun run bench:speed
bun run bench:correctnessThe speed harness compares practical JS ecosystem
alternatives, but not every comparator implements the
same exact semantics. @stll/fuzzy-search is solving
approximate substring search with offsets and
replacement-friendly match ranges; tools like
fuse.js and fuzzball are included as reference
points, not as exact drop-in equivalents. The
headline comparisons in this repo are the
substring-mode rows against sliding-window
Levenshtein baselines.
Representative baseline from the checked-in public harness on this machine:
- runtime: Bun
1.3.12 - platform: macOS
26.4.1(Darwin arm64)
| Scenario | @stll/fuzzy-search | Sliding-window JS baseline | Relative |
| -------------------------------- | -------------------- | -------------------------- | -------- |
| Czech legal, 64 KB, 5 names | 2.41 ms | 80.78 ms | 33.5x |
| Bible, 4.0 MB, 5 names | 239.91 ms | 3903.26 ms | 16.3x |
| Czech news, 4.8 MB, 5 names | 262.39 ms | 4350.52 ms | 16.6x |
| German news, 5.5 MB, 5 names | 405.72 ms | 6816.03 ms | 16.8x |
These rows are substring mode (wholeWords: false)
with edit distance 1-2, which is the core workload
this package is designed for.
- fastest-levenshtein + sliding window — fastest JS Levenshtein distance
- fuse.js — fuzzy search (scoring, not substring matching)
- fuzzball — Python rapidfuzz port
- naive JS — O(nm) Levenshtein per window position
Correctness
Correctness is covered by example-based tests and property tests. The property suite verifies distance bounds, oracle agreement, whole-word boundaries, UTF-16 offset stability, normalization behavior, and mixed option combinations over randomized inputs.
API
| Method | Returns | Description |
| ------------------------------------- | -------------- | ------------------------ |
| new FuzzySearch(patterns, options?) | instance | Build matcher |
| .findIter(haystack) | FuzzyMatch[] | Non-overlapping matches |
| .isMatch(haystack) | boolean | Any pattern matches? |
| .replaceAll(haystack, replacements) | string | Replace matched patterns |
| .patternCount | number | Number of patterns |
Types
type PatternEntry =
| string
| { pattern: string; distance?: number; name?: string };
type Options = {
normalizeDiacritics?: boolean; // default: false
wholeWords?: boolean; // default: true
caseInsensitive?: boolean; // default: false
unicodeBoundaries?: boolean; // default: true
minScore?: number; // drop matches below threshold
kBest?: number; // top-k by score, ties by start
};
type FuzzyMatch = {
pattern: number; // index into patterns array
start: number; // UTF-16 code unit offset
end: number; // exclusive
text: string; // matched substring
distance: number; // actual Levenshtein distance
score: number; // 1 - distance/pattern.length
name?: string; // pattern name (if provided)
};Match offsets are UTF-16 code unit indices,
compatible with String.prototype.slice().
Error handling
- Constructor throws if a pattern is empty, longer than 64 characters, or has distance >= pattern length.
replaceAllthrows ifreplacements.lengthdoes not equalpatternCount.
How it works
Myers' bit-parallel algorithm scans the text in O(n) per pattern for patterns up to 64 characters. No DFA construction, no state explosion at higher distances.
Start position recovery via small-window Levenshtein: for each match end position from Myers, a window of [m-k, m+k] characters is evaluated to find the exact start and distance.
Diacritics normalization: NFD decomposition + combining mark stripping (Unicode General Category M via
unicode-normalizationcrate). Covers all scripts.UTF-16 offset translation: character-level matching with incremental char→UTF-16 mapping for JS string compatibility.
Limitations
- Pattern length capped at 64 characters. Myers uses a single u64 bit-vector per pattern. Longer patterns would need multi-word vectors (not yet implemented).
- No streaming API. The full haystack must be in
memory. For chunked processing, use
@stll/aho-corasick'sStreamMatcherfor exact prefiltering and fuzzy-search on flagged regions. - WASM requires
SharedArrayBuffer. Browser builds needCross-Origin-Opener-Policy: same-originandCross-Origin-Embedder-Policy: require-corpheaders.
Development
bun install
bun run build # native module (requires Rust)
bun test # 36 unit tests
bun run test:props # 36 property tests × 1000 runs
bun run bench:install # benchmark dependencies
bun run bench:download # download corpora
bun run bench:speed # speed comparison
bun run bench:correctness # oracle verification
bun run lint # oxlint
bun run format # oxfmt + rustfmt