string-best-match
v1.1.1
Published
Finds the longest matching substring (or sequence) between two arrays of strings.
Downloads
18
Maintainers
Readme
string-best-match
Finds the longest matching substring (or sequence) between two arrays of strings using a fast Rolling Hash (Rabin-Karp) algorithm with binary search. Useful whenever you need to align or highlight shared runs of tokens.
Installation
npm install string-best-matchRequirements
- Node.js: 14.0.0+ (ES2020 or later)
- TypeScript: 4.0+ (if using TypeScript)
This library uses BigInt for precise hash calculations, which requires ES2020 or later. BigInt is supported in:
- Node.js 10.4.0+
- Chrome 67+
- Firefox 68+
- Safari 14+
- Edge 79+
Usage
import { findBestMatch } from 'string-best-match';
const source = ['A', 'B', 'C', 'D', 'E'];
const target = ['X', 'B', 'C', 'D', 'Y'];
const result = findBestMatch(source, target);
console.log(result); // { startIndex: 1, sequenceLength: 3 }Use Cases
- Text highlighting (e.g., showing best match between user selection and document text)
- Comparing DNA or code sequences
- Diff visualization or fuzzy search improvements
API
| Parameter | Type | Description |
| --- | --- | --- |
| sourceArray | string[] | Array of tokens you want to match against. |
| targetArray | string[] | Array to scan for the longest contiguous sequence also present in the source. |
| Returns | Type | Description |
| --- | --- | --- |
| BestMatchResult | { startIndex: number; sequenceLength: number; } | Start index of the best match inside targetArray and the length of that sequence (-1, 0 when no match is found). |
Performance
⭐ Algorithm: Rolling Hash (Rabin-Karp) + Binary Search
This is the fastest approach for large arrays (100k–1M+ elements).
Time Complexity: O((N + M) × log(min(N, M)))
- N = length of source array
- M = length of target array
- Significantly faster than naive O(N × M) for large datasets
Space Complexity: O(N + M)
- Uses hash maps to store rolling hashes of substrings
How it works:
- Binary search is used to find the optimal substring length L
- For each candidate length L, compute rolling hashes of all substrings of length L in both arrays
- Check if any hashes match between source and target
- Verify that matches are genuine (not hash collisions)
- Converge to the longest matching substring
This approach is ideal for:
- Large text processing (100k+ tokens)
- DNA sequence comparison
- Code diff algorithms
- Real-time text analysis
Performance Example:
For arrays with 50,000 elements each:
- Naive O(N × M): ~2.5 billion operations
- Rolling Hash O((N+M) log min(N,M)): ~1.6 million operations
- Speed improvement: ~1,500x faster
Real-world test results (100k total elements):
✓ performs well with very large arrays (108ms)Examples
See the examples/usage.ts file for comprehensive examples including:
- Basic usage with simple arrays
- Text token matching (highlighting user selections)
- Code diff comparison
- DNA sequence alignment
- Handling no matches
- Performance tests with large arrays (32k elements)
- Multiple matches of the same length
- Finding longest among partial overlaps
Run the examples:
npx tsx examples/usage.tsScripts
npm run build— Type-checks and emits ESM output plus type declarations todist/npm run test— Executes the Vitest suite intests/
License
MIT © Valentyn
