@3leaps/string-metrics-wasm
v0.3.8
High-performance string similarity metrics via WASM bindings to rapidfuzz-rs
string-metrics-wasm
High-performance string similarity and fuzzy matching via WASM bindings to rapidfuzz-rs.
Description
This library provides blazing-fast string similarity metrics through WASM bindings to the Rust rapidfuzz-rs library, plus TypeScript implementations of advanced fuzzy matching algorithms. It combines the performance of compiled Rust/WASM with the flexibility of TypeScript for a comprehensive text similarity toolkit.
Features:
- WASM-powered distance metrics: Levenshtein, Damerau-Levenshtein, OSA, Jaro, Jaro-Winkler, Indel, LCS
- Fuzzy matching: Token-based comparison (order-insensitive, set-based)
- Process helpers: Find best matches from arrays with configurable scoring
- Unified API: Consistent interface across all metrics
- TypeScript extensions: Substring similarity, normalization presets, suggestions API
- Multi-runtime: Node.js, Bun, Deno support
Prerequisites
- Rust toolchain via rustup
- wasm-pack (pinned to the version we build against)
Install wasm-pack once per machine:
cargo install wasm-pack --version 0.13.1
Installation
npm install string-metrics-wasm
Quick Start
import { levenshtein, ratio, tokenSortRatio, extractOne, score } from 'string-metrics-wasm';
// Basic edit distance
const dist = levenshtein('kitten', 'sitting');
console.log(dist); // 3
// Fuzzy matching (0-100 scale)
const fuzzy = ratio('hello', 'hallo');
console.log(fuzzy); // 80.0
// Order-insensitive comparison
const tokens = tokenSortRatio('new york mets', 'mets york new');
console.log(tokens); // 100.0
// Find best match from array
const choices = ['Atlanta Falcons', 'New York Jets', 'Dallas Cowboys'];
const best = extractOne('new york', choices);
console.log(best); // { choice: 'New York Jets', score: 57.14, index: 1 }
// Unified scoring API (0-1 scale)
const similarity = score('hello', 'world', 'jaroWinkler');
console.log(similarity); // 0.4666...
API Documentation
Compatibility: All examples use camelCase option names and metric identifiers. For ecosystems that standardize on snake_case (e.g., Fulmen/Crucible fixtures), the same snake_case names are accepted as aliases and normalized internally.
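As an illustration of the alias handling described above, a snake_case name can be mapped to its camelCase form with a small helper. This is only a sketch: `toCamelCase` and `normalizeOptionKeys` are hypothetical names, and the library's actual normalization code is not shown in this README.

```typescript
// Hypothetical helper: converts a snake_case identifier to camelCase.
// Names that are already camelCase pass through unchanged.
function toCamelCase(identifier: string): string {
  return identifier.replace(/_([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

// Hypothetical option normalizer: rewrites only the top-level keys.
function normalizeOptionKeys(options: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(options)) {
    out[toCamelCase(key)] = value;
  }
  return out;
}

console.log(toCamelCase('jaro_winkler')); // jaroWinkler
console.log(normalizeOptionKeys({ score_cutoff: 30 })); // { scoreCutoff: 30 }
```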
Distance Metrics (WASM)
Edit distance metrics return raw integer distances (lower = more similar):
levenshtein(a: string, b: string): number
Minimum edits (insertions, deletions, substitutions) to transform a into b.
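For readers who want to see what is being counted, here is a minimal pure-TypeScript sketch of the classic two-row dynamic program for Levenshtein distance. It is a reference illustration only; the library itself delegates to rapidfuzz-rs through WASM, and `levenshteinSketch` is a hypothetical name.

```typescript
// Reference sketch (not the library's implementation): two-row Levenshtein DP.
// Strings are compared by Unicode code point.
function levenshteinSketch(a: string, b: string): number {
  const s = Array.from(a);
  const t = Array.from(b);
  // prev[j] = distance between the empty prefix of s and the first j chars of t
  let prev: number[] = Array.from({ length: t.length + 1 }, (_unused, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[t.length];
}

console.log(levenshteinSketch('kitten', 'sitting')); // 3
```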
levenshtein('kitten', 'sitting'); // 3
damerau_levenshtein(a: string, b: string): number
Levenshtein + transpositions (unrestricted).
damerau_levenshtein('abcd', 'abdc'); // 1
osa_distance(a: string, b: string): number
Optimal String Alignment (restricted Damerau-Levenshtein).
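The "restricted" qualifier means a transposed pair cannot be edited again afterwards. The sketch below is a textbook OSA dynamic program in plain TypeScript, offered as an illustration rather than the WASM implementation; `osaSketch` is a hypothetical name.

```typescript
// Reference sketch (not the library's implementation): Optimal String Alignment.
function osaSketch(a: string, b: string): number {
  const s = Array.from(a);
  const t = Array.from(b);
  // d[i][j] = distance between the first i chars of s and the first j chars of t
  const d: number[][] = Array.from({ length: s.length + 1 }, (_row, i) =>
    Array.from({ length: t.length + 1 }, (_col, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= s.length; i++) {
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
      // Adjacent transposition, allowed only on an untouched pair.
      if (i > 1 && j > 1 && s[i - 1] === t[j - 2] && s[i - 2] === t[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[s.length][t.length];
}

console.log(osaSketch('abcd', 'abdc')); // 1
console.log(osaSketch('ca', 'abc')); // 3
```

The second call shows where the two variants diverge: the unrestricted Damerau-Levenshtein distance for 'ca' and 'abc' is 2, because it may transpose and then insert between the swapped characters.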
osa_distance('abcd', 'abdc'); // 1
indel_distance(a: string, b: string): number
Insertions and deletions only (no substitutions).
indel_distance('hello', 'hallo'); // 2
lcs_seq_distance(a: string, b: string): number
Longest Common Subsequence distance.
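Both the indel and LCS distances can be derived from the length of the longest common subsequence. The sketch below assumes the rapidfuzz conventions (indel = |a| + |b| - 2·LCS, LCS distance = max(|a|, |b|) - LCS), which agree with the example values in this section; the function names are hypothetical.

```typescript
// Reference sketch (not the library's implementation): LCS length via a two-row DP.
function lcsLength(a: string, b: string): number {
  const s = Array.from(a);
  const t = Array.from(b);
  let prev: number[] = new Array<number>(t.length + 1).fill(0);
  for (let i = 1; i <= s.length; i++) {
    const curr: number[] = [0];
    for (let j = 1; j <= t.length; j++) {
      curr[j] = s[i - 1] === t[j - 1] ? prev[j - 1] + 1 : Math.max(prev[j], curr[j - 1]);
    }
    prev = curr;
  }
  return prev[t.length];
}

// Assumed convention: characters of the longer string not covered by the LCS.
function lcsSeqDistanceSketch(a: string, b: string): number {
  return Math.max(Array.from(a).length, Array.from(b).length) - lcsLength(a, b);
}

// Assumed convention: every character outside the LCS is deleted or inserted.
function indelDistanceSketch(a: string, b: string): number {
  return Array.from(a).length + Array.from(b).length - 2 * lcsLength(a, b);
}

console.log(lcsSeqDistanceSketch('AGGTAB', 'GXTXAYB')); // 3
console.log(indelDistanceSketch('hello', 'hallo')); // 2
```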
lcs_seq_distance('AGGTAB', 'GXTXAYB'); // 3
Similarity Metrics (WASM)
Normalized similarity scores (0.0-1.0 scale, higher = more similar):
normalized_levenshtein(a: string, b: string): number
Normalized Levenshtein similarity.
normalized_levenshtein('kitten', 'sitting'); // 0.5714
jaro(a: string, b: string): number
Jaro similarity.
jaro('kitten', 'sitting'); // 0.7460
jaro_winkler(a: string, b: string): number
Jaro-Winkler similarity (boosts prefix matches).
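To show how the prefix boost works, here is a plain-TypeScript sketch of textbook Jaro and Jaro-Winkler. It assumes the common defaults (prefix weight 0.1, at most 4 prefix characters) and applies no boost threshold; the WASM implementation may differ in such details, and the function names are hypothetical.

```typescript
// Reference sketch (not the library's implementation): textbook Jaro similarity.
function jaroSketch(a: string, b: string): number {
  const s = Array.from(a);
  const t = Array.from(b);
  if (s.length === 0 && t.length === 0) return 1;
  if (s.length === 0 || t.length === 0) return 0;

  // Characters match only if they are equal and at most `range` positions apart.
  const range = Math.max(0, Math.floor(Math.max(s.length, t.length) / 2) - 1);
  const sMatched: boolean[] = new Array<boolean>(s.length).fill(false);
  const tMatched: boolean[] = new Array<boolean>(t.length).fill(false);

  let matches = 0;
  for (let i = 0; i < s.length; i++) {
    const lo = Math.max(0, i - range);
    const hi = Math.min(t.length - 1, i + range);
    for (let j = lo; j <= hi; j++) {
      if (!tMatched[j] && s[i] === t[j]) {
        sMatched[i] = true;
        tMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Count matched characters that appear in a different order in the two strings.
  let outOfOrder = 0;
  let k = 0;
  for (let i = 0; i < s.length; i++) {
    if (!sMatched[i]) continue;
    while (!tMatched[k]) k++;
    if (s[i] !== t[k]) outOfOrder++;
    k++;
  }
  const transpositions = outOfOrder / 2;
  return (matches / s.length + matches / t.length + (matches - transpositions) / matches) / 3;
}

// Jaro-Winkler: raise the Jaro score in proportion to the shared prefix (max 4 chars).
function jaroWinklerSketch(a: string, b: string, prefixWeight = 0.1): number {
  const jaro = jaroSketch(a, b);
  const s = Array.from(a);
  const t = Array.from(b);
  let prefix = 0;
  while (prefix < 4 && prefix < s.length && prefix < t.length && s[prefix] === t[prefix]) {
    prefix++;
  }
  return jaro + prefix * prefixWeight * (1 - jaro);
}

console.log(jaroSketch('kitten', 'sitting').toFixed(4)); // 0.7460
console.log(jaroWinklerSketch('MARTHA', 'MARHTA').toFixed(4)); // 0.9611
```

'kitten' and 'sitting' share no prefix, which is why their Jaro and Jaro-Winkler scores are identical.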
jaro_winkler('kitten', 'sitting'); // 0.7460
indel_normalized_similarity(a: string, b: string): number
Normalized indel similarity.
indel_normalized_similarity('hello', 'hallo'); // 0.8
lcs_seq_normalized_similarity(a: string, b: string): number
Normalized LCS similarity.
lcs_seq_normalized_similarity('AGGTAB', 'GXTXAYB'); // 0.5714
Fuzzy Matching (WASM + TypeScript)
Fuzzy string comparison metrics (0-100 scale):
ratio(a: string, b: string): number (WASM)
Basic fuzzy similarity using Indel distance.
ratio('kitten', 'sitting'); // 61.54
partialRatio(a: string, b: string): number (TypeScript)
Best matching substring using sliding window.
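The sketch below illustrates both ideas in this section: an Indel-based ratio on a 0-100 scale, and a simplified sliding window that scores the shorter string against every equally long slice of the longer one. It is an assumption-laden illustration; the library's partialRatio may choose windows differently, and all names here are hypothetical.

```typescript
// Reference sketch (not the library's implementation).
// LCS length over arrays of code points.
function lcsLen(s: string[], t: string[]): number {
  let prev: number[] = new Array<number>(t.length + 1).fill(0);
  for (let i = 1; i <= s.length; i++) {
    const curr: number[] = [0];
    for (let j = 1; j <= t.length; j++) {
      curr[j] = s[i - 1] === t[j - 1] ? prev[j - 1] + 1 : Math.max(prev[j], curr[j - 1]);
    }
    prev = curr;
  }
  return prev[t.length];
}

// Indel-based ratio: 100 * (1 - indel / (|s| + |t|)) = 200 * LCS / (|s| + |t|).
function ratioOf(s: string[], t: string[]): number {
  const total = s.length + t.length;
  return total === 0 ? 100 : (200 * lcsLen(s, t)) / total;
}

function ratioSketch(a: string, b: string): number {
  return ratioOf(Array.from(a), Array.from(b));
}

// Simplified sliding window: best ratio of the shorter string against
// each slice of the longer string that has the same length.
function partialRatioSketch(a: string, b: string): number {
  let shorter = Array.from(a);
  let longer = Array.from(b);
  if (shorter.length > longer.length) [shorter, longer] = [longer, shorter];
  if (shorter.length === 0) return longer.length === 0 ? 100 : 0;
  let best = 0;
  for (let start = 0; start + shorter.length <= longer.length; start++) {
    best = Math.max(best, ratioOf(shorter, longer.slice(start, start + shorter.length)));
  }
  return best;
}

console.log(ratioSketch('hello', 'hallo')); // 80
console.log(partialRatioSketch('fuzzy', 'fuzzy wuzzy was a bear')); // 100
```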
partialRatio('fuzzy', 'fuzzy wuzzy was a bear'); // 100.0
tokenSortRatio(a: string, b: string): number (TypeScript)
Order-insensitive token comparison (sorts tokens first).
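The token-sorting step can be sketched independently of the scorer it feeds. The snippet below assumes whitespace tokenization and plain lexicographic sorting; `sortTokens`, `tokenSortWith`, and `exactMatch` are hypothetical names, and the library may also apply normalization that is not shown here.

```typescript
type Scorer = (a: string, b: string) => number;

// Split on whitespace, drop empty tokens, sort, and rejoin with single spaces.
function sortTokens(text: string): string {
  return text
    .split(/\s+/)
    .filter((tok) => tok.length > 0)
    .sort()
    .join(' ');
}

// Apply any scorer to the token-sorted forms of both inputs.
function tokenSortWith(scorer: Scorer, a: string, b: string): number {
  return scorer(sortTokens(a), sortTokens(b));
}

// Toy scorer used only to keep the example self-contained.
const exactMatch: Scorer = (a, b) => (a === b ? 100 : 0);

console.log(sortTokens('new york mets')); // mets new york
console.log(tokenSortWith(exactMatch, 'new york mets', 'mets york new')); // 100
```

Swapping `exactMatch` for a fuzzy scorer such as an Indel-based ratio gives the order-insensitive behavior described above.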
tokenSortRatio('new york mets', 'mets york new'); // 100.0
tokenSetRatio(a: string, b: string): number (TypeScript)
Set-based token comparison (handles duplicates and order).
tokenSetRatio('hello world world', 'world hello'); // 100.0
Process Helpers (TypeScript)
Find best matches from arrays:
extractOne(query: string, choices: string[], options?): ExtractResult | null
Find the single best match.
Options:
- scorer?: (a: string, b: string) => number - Scoring function (default: ratio)
- processor?: (str: string) => string - Preprocessing function
- scoreCutoff?: number - Minimum score threshold (default: 0)
const choices = ['Atlanta Falcons', 'New York Jets', 'Dallas Cowboys'];
const best = extractOne('jets', choices, { scoreCutoff: 30 });
// { choice: 'New York Jets', score: 35.29, index: 1 }
extract(query: string, choices: string[], options?): ExtractResult[]
Find top N matches (sorted by score).
Options:
- scorer?: (a: string, b: string) => number - Scoring function
- processor?: (str: string) => string - Preprocessing function
- scoreCutoff?: number - Minimum score threshold
- limit?: number - Maximum results to return
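How these options might interact can be sketched in a few lines: preprocess, score every choice, drop results below the cutoff, sort by descending score, and truncate. This is an illustrative reimplementation under assumptions (inclusive cutoff, ties broken by original index, a required scorer), not the library's code; `extractSketch` and `prefixScore` are hypothetical.

```typescript
interface ExtractResultSketch {
  choice: string;
  score: number;
  index: number;
}

interface ExtractOptionsSketch {
  scorer: (a: string, b: string) => number;
  processor?: (str: string) => string;
  scoreCutoff?: number;
  limit?: number;
}

function extractSketch(
  query: string,
  choices: string[],
  options: ExtractOptionsSketch,
): ExtractResultSketch[] {
  const { scorer, processor = (s: string) => s, scoreCutoff = 0, limit = choices.length } = options;
  const processedQuery = processor(query);
  return choices
    .map((choice, index) => ({ choice, score: scorer(processedQuery, processor(choice)), index }))
    .filter((r) => r.score >= scoreCutoff) // assumption: the cutoff is inclusive
    .sort((x, y) => y.score - x.score || x.index - y.index) // best first, stable on ties
    .slice(0, limit);
}

// Toy scorer for the example: shared-prefix length as a percentage of the longer string.
function prefixScore(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 100;
  let n = 0;
  while (n < a.length && n < b.length && a[n] === b[n]) n++;
  return (100 * n) / longest;
}

const teamsSketch = ['Atlanta Falcons', 'New York Jets', 'New York Giants', 'Dallas Cowboys'];
const topTwo = extractSketch('new york', teamsSketch, {
  scorer: prefixScore,
  processor: (s) => s.toLowerCase(),
  scoreCutoff: 40,
  limit: 2,
});
console.log(topTwo.map((r) => r.choice)); // [ 'New York Jets', 'New York Giants' ]
```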
const choices = ['Atlanta Falcons', 'New York Jets', 'New York Giants', 'Dallas Cowboys'];
const results = extract('new york', choices, { limit: 2, scoreCutoff: 40 });
// [
// { choice: 'New York Jets', score: 57.14, index: 1 },
// { choice: 'New York Giants', score: 52.17, index: 2 }
// ]
Unified API (TypeScript)
Metric-selectable interface with consistent scales:
distance(a: string, b: string, metric?: DistanceMetric): number
Calculate edit distance using any metric (returns raw distance).
Supported metrics: 'levenshtein' (default), 'damerauLevenshtein', 'osa', 'indel',
'lcsSeq'
distance('hello', 'world'); // 4 (default: levenshtein)
distance('hello', 'world', 'indel'); // 8
score(a: string, b: string, metric?: SimilarityMetric): number
Calculate similarity using any metric (returns 0-1 normalized score).
Supported metrics: 'jaroWinkler' (default), 'levenshtein', 'damerauLevenshtein', 'osa',
'jaro', 'indel', 'lcsSeq', 'ratio', 'partialRatio', 'tokenSortRatio', 'tokenSetRatio'
score('hello', 'world'); // 0.4666... (default: jaroWinkler)
score('new york mets', 'mets york new', 'tokenSortRatio'); // 1.0
// Fulmen/Crucible users: override default metric if needed
score('hello', 'world', 'levenshtein'); // 0.2 (edit distance-based: 1 - 4/5)
Normalization & Suggestions
normalize(input: string, preset?: NormalizationPreset, locale?: NormalizationLocale): string
Normalize text for comparison with optional locale-specific case folding.
Presets: 'none', 'minimal', 'default', 'aggressive'
Locales: 'tr' (Turkish), 'az' (Azerbaijani), 'lt' (Lithuanian), or undefined (default
Unicode casefold)
normalize('Naïve Café', 'default'); // 'naïve café'
// Turkish/Azerbaijani: dotted/dotless I handling
normalize('İstanbul', 'default', 'tr'); // 'istanbul' (İ→i)
normalize('IĞDIR', 'default', 'tr'); // 'ığdır' (I→ı dotless)
// Default Unicode casefold (no locale)
normalize('İstanbul', 'default'); // 'i̇stanbul' (İ→i + combining dot)
Note: Most applications don't need locale-specific normalization. Only use when processing Turkish, Azerbaijani, or Lithuanian text where dotted/dotless I distinction matters.
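The dotted/dotless I behavior can be reproduced with the built-in string methods, which helps when checking whether a locale is needed at all. The sketch below uses `toLocaleLowerCase` as a stand-in for the library's case folding and assumes a runtime with Intl/ICU support (the default for Node.js, Bun, and Deno); `lowerCaseSketch` is a hypothetical name and covers none of the other preset behavior.

```typescript
// Approximation only: locale-aware lower-casing via built-in String methods.
function lowerCaseSketch(input: string, locale?: 'tr' | 'az' | 'lt'): string {
  return locale ? input.toLocaleLowerCase(locale) : input.toLowerCase();
}

// '\u0130' is LATIN CAPITAL LETTER I WITH DOT ABOVE, '\u011E' is G WITH BREVE.
console.log(lowerCaseSketch('\u0130stanbul', 'tr')); // istanbul
console.log(lowerCaseSketch('I\u011EDIR', 'tr')); // ığdır (dotless ı, U+0131)

// Without a locale, U+0130 lowercases to 'i' followed by U+0307 (combining dot above).
console.log(lowerCaseSketch('\u0130stanbul') === 'i\u0307stanbul'); // true
```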
suggest(query: string, candidates: string[], options?): Suggestion[]
Get ranked suggestions with detailed scoring.
const suggestions = suggest('pythn', ['python', 'java', 'javascript'], {
metric: 'jaroWinkler',
minScore: 0.6,
maxSuggestions: 3,
});
// [
// { value: 'python', score: 0.9555, ... },
// ...
// ]
See Suggestions API docs for full details.
Implementation Details
WASM vs TypeScript
This library uses a hybrid approach for optimal performance and flexibility:
WASM Implementations (fastest):
- Core distance metrics: levenshtein, damerau_levenshtein, osa_distance, jaro, jaro_winkler
- RapidFuzz metrics: ratio, indel_*, lcs_seq_*
TypeScript Implementations (flexible):
- Token-based fuzzy matching: partialRatio, tokenSortRatio, tokenSetRatio
- Process helpers: extractOne, extract
- Unified API: distance(), score()
- Suggestions and normalization
Token-based metrics benefit from TypeScript's array operations and avoid WASM serialization overhead. The unified API provides a convenient abstraction over both WASM and TypeScript implementations.
Supported Runtimes
- Node.js 16+ (ESM and CommonJS)
- Bun (native ESM support)
- Deno (use npm: specifier)
Building from Source
- Install dependencies and tooling: make bootstrap
- Build WASM: npm run build:wasm or make build
- Build TS: npm run build:ts
Development
This project uses a Makefile for common tasks:
make help # Show all available targets
make build # Build WASM and TypeScript (with version check)
make test # Run tests
make clean # Remove build artifacts
# Code quality
make quality # Run all quality checks (format-check, lint, rust checks)
make format # Format all code (Biome + Prettier + rustfmt)
make format-check # Check formatting without changes
make lint # Lint TypeScript code with Biome
make lint-fix # Lint and auto-fix TypeScript code
# Version management
make version-check # Verify package.json and Cargo.toml versions match
make bump-patch # Bump patch version (0.1.0 -> 0.1.1)
make bump-minor # Bump minor version (0.1.0 -> 0.2.0)
make bump-major # Bump major version (0.1.0 -> 1.0.0)
make set-version VERSION=x.y.z # Set explicit version
Explore the rest of the documentation under docs/. Start with the high-level overview or jump straight to the contributor guide in docs/development.md.
Code Quality Tools
This project uses modern, fast tooling for code quality:
- TypeScript/JavaScript: Biome for linting and formatting
- JSON/YAML/Markdown: Prettier for formatting
- Rust: rustfmt for formatting, clippy for linting
Run make quality before committing to ensure all checks pass.
Version Management
This project maintains version sync between package.json (npm) and Cargo.toml (Rust). The
Makefile provides targets to bump versions and keep them in sync. Additionally, the test suite
includes a version consistency check that will fail if versions drift.
Important: Always use make bump-* or make set-version commands to update versions. This
ensures both files stay synchronized.
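A version consistency check of this kind can be quite small. The sketch below is one possible shape, not the project's actual test: it works on the file contents as strings, reads `version` from package.json and from the `[package]` table of Cargo.toml, and compares them. All function names are hypothetical.

```typescript
// Read the "version" field from package.json text.
function packageJsonVersion(text: string): string {
  const parsed = JSON.parse(text) as { version?: unknown };
  if (typeof parsed.version !== 'string') {
    throw new Error('package.json has no string "version" field');
  }
  return parsed.version;
}

// Read version = "..." from the [package] table only, ignoring dependency tables.
function cargoTomlVersion(text: string): string {
  let inPackage = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith('[')) {
      inPackage = line === '[package]';
      continue;
    }
    if (!inPackage) continue;
    const m = /^version\s*=\s*"([^"]+)"/.exec(line);
    if (m) return m[1];
  }
  throw new Error('Cargo.toml has no version under [package]');
}

function versionsInSync(packageJsonText: string, cargoTomlText: string): boolean {
  return packageJsonVersion(packageJsonText) === cargoTomlVersion(cargoTomlText);
}

// Inline sample contents, for illustration only.
const samplePackageJson = '{ "name": "example", "version": "1.2.3" }';
const sampleCargoToml = '[package]\nname = "example"\nversion = "1.2.3"\n\n[dependencies.foo]\nversion = "9.9.9"\n';
console.log(versionsInSync(samplePackageJson, sampleCargoToml)); // true
```

In a real test the two strings would come from reading the files on disk, and a mismatch would fail the suite.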
Performance
All string comparison operations complete in < 1ms:
- WASM metrics: 0.0003-0.0005ms per operation
- Token-based metrics: 0.0003-0.0017ms per operation
- Process helpers: 0.0008-0.001ms per operation
- Unified API: minimal dispatch overhead
Run node benchmark-phase1b.js for detailed benchmarks.
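For a rough per-operation figure on your own inputs, a simple timing loop is usually enough. The sketch below is not the project's benchmark script; `benchmarkSketch` is a hypothetical helper that relies on the global `performance.now()` available in Node.js 16+, Bun, and Deno, and the workload shown is a placeholder for a real metric call.

```typescript
// Average wall-clock milliseconds per call, after a short warm-up.
function benchmarkSketch(fn: () => unknown, iterations = 100000): number {
  for (let i = 0; i < 1000; i++) fn(); // warm-up so the JIT can optimize
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  return (performance.now() - start) / iterations;
}

// Placeholder workload; substitute e.g. () => levenshtein('kitten', 'sitting').
const msPerOp = benchmarkSketch(() => 'kitten'.localeCompare('sitting'));
console.log(`${msPerOp.toFixed(6)} ms per operation`);
```

Results from a loop like this vary with hardware, runtime, and input length, so treat them as indicative only.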
Testing
This project includes comprehensive test coverage:
- 119 unit tests covering all functions
- 80 YAML fixture test cases for reproducibility
- 100% regression-free across all releases
Run tests with npm test or make test.
Related Projects
- rapidfuzz-rs - Rust implementation of RapidFuzz
- rapidfuzz - Original Python implementation
- strsim-rs - String similarity metrics (deprecated in favor of rapidfuzz-rs)
Versioning
This project follows Semantic Versioning. Version history is maintained in CHANGELOG.md.
Current Status: See latest release for the current version and changes.
License
This project is licensed under the MIT License.
Contributing
Contributions welcome! Please see our contributing guidelines:
- Development setup: docs/development.md
- Release workflow (maintainers): docs/publishing.md
Governance
- Authoritative policies repository: https://github.com/3leaps/oss-policies/
- Code of Conduct: https://github.com/3leaps/oss-policies/blob/main/CODE_OF_CONDUCT.md
- Security Policy: https://github.com/3leaps/oss-policies/blob/main/SECURITY.md
- Contributing Guide: https://github.com/3leaps/oss-policies/blob/main/CONTRIBUTING.md
⚡ Fast Strings. Accurate Matches. ⚡
High-performance text similarity for modern TypeScript applications
Built with ⚡ by the 3 Leaps team
String Metrics • Fuzzy Matching • WASM Performance
