string-similarity-plus
v1.0.0
String similarity calculation with enhanced special character normalization
A robust string similarity calculator that handles various special characters and Unicode variations.
Features
- Calculate similarity percentage between two strings
- Normalize special characters (quotes, dashes, spaces, etc.)
- Find similar strings in an array based on a threshold
- Works with multilingual text including CJK characters
Installation
npm install string-similarity-plus
Usage
const { calculateStringSimilarity, findSimilarStrings } = require('string-similarity-plus');
// Calculate similarity between two strings
const str1 = "<h2>2. 無限供應肉類火鍋放題 - 牛摩</h2>";
const str2 = "<h2>2. 無限供應肉類火鍋放題 – 牛摩</h2>";
const similarity = calculateStringSimilarity(str1, str2);
console.log(`Similarity: ${similarity.toFixed(2)}%`); // Should show very high similarity
// Find similar strings in an array
const content = [
"<h2>2. 無限供應肉類火鍋放題 – 牛摩</h2>",
"<h2>Some other content</h2>",
"<h2>無限供應火鍋放題牛摩</h2>",
];
const searchString = "<h2>2. 無限供應肉類火鍋放題 - 牛摩</h2>";
const SIMILARITY_THRESHOLD = 80; // Set your desired similarity threshold
const matches = findSimilarStrings(searchString, content, SIMILARITY_THRESHOLD);
console.log(matches); // Will show matching items
API
calculateStringSimilarity(str1, str2)
Calculates the similarity percentage between two strings.
- Parameters:
  - str1 (string): First string to compare
  - str2 (string): Second string to compare
- Returns: Number between 0-100 representing similarity percentage
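This README does not say which metric produces the percentage. As a rough illustration of how a 0 to 100 score can be derived from two strings, the sketch below uses Levenshtein edit distance scaled by the longer string's length. The names `levenshtein` and `similarityPercent` are hypothetical and are not part of this package; the package may compute its score differently.

```javascript
// Hypothetical sketch: NOT the package's implementation.
// Edit distance over code points, so CJK and astral characters count as one unit.
function levenshtein(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  // prev[j] holds the distance between the first i-1 chars of s and the first j chars of t.
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[t.length];
}

// Scale the distance into a percentage: 100 means identical, 0 means nothing in common.
function similarityPercent(a, b) {
  const longest = Math.max(Array.from(a).length, Array.from(b).length);
  if (longest === 0) return 100; // two empty strings are treated as identical
  return (1 - levenshtein(a, b) / longest) * 100;
}

console.log(similarityPercent('abcd', 'abcf')); // 75
```

Dividing by the longer length keeps the result within 0 to 100 regardless of which string is passed first.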
findSimilarStrings(searchString, contentArray, threshold)
Finds strings in an array that are similar to the search string.
- Parameters:
  - searchString (string): String to search for
  - contentArray (array): Array of strings to search in
  - threshold (number, optional): Similarity threshold percentage (default: 80)
- Returns: Array of matching strings
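Conceptually, a threshold search like this is a filter over the array using a scoring function. The sketch below shows that shape under stated assumptions: `filterByThreshold` and `positionalScore` are hypothetical names, the toy scorer is only a stand-in, and whether the real function treats the threshold as inclusive is not stated in this README (the sketch assumes `>=`).

```javascript
// Hypothetical sketch: NOT the package's implementation.
// scoreFn(a, b) is assumed to return a number from 0 to 100.
function filterByThreshold(searchString, contentArray, scoreFn, threshold = 80) {
  return contentArray.filter((item) => scoreFn(searchString, item) >= threshold);
}

// Toy scorer for demonstration only: percentage of positions holding the same character.
function positionalScore(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  const longest = Math.max(s.length, t.length);
  if (longest === 0) return 100;
  let same = 0;
  for (let i = 0; i < Math.min(s.length, t.length); i++) {
    if (s[i] === t[i]) same++;
  }
  return (same / longest) * 100;
}

const hits = filterByThreshold(
  'hello world',
  ['hello world', 'hello w0rld', 'goodbye'],
  positionalScore
);
console.log(hits); // [ 'hello world', 'hello w0rld' ]
```

Passing the scorer as a parameter keeps the filtering step independent of whichever similarity metric is used.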
Special Character Handling
This library normalizes various special characters including:
- Different types of quotes and apostrophes
- Various dashes and hyphens
- Different space characters
- Various brackets and parentheses
- Different types of dots, ellipses, and slashes
- And more...
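To make the idea concrete, here is a minimal sketch of what this kind of normalization can look like before two strings are compared. The function name `normalizeSpecialChars` and the specific character ranges are assumptions chosen for illustration; the package's actual mapping table is not listed in this README and may cover more or different characters.

```javascript
// Hypothetical sketch: NOT the package's actual mapping table.
function normalizeSpecialChars(input) {
  return input
    .replace(/[\u2018\u2019\u201A\u201B]/g, "'")   // curly single quotes and apostrophes
    .replace(/[\u201C\u201D\u201E\u201F]/g, '"')   // curly double quotes
    .replace(/[\u2010-\u2015\u2212]/g, '-')        // hyphens, en/em dashes, minus sign
    .replace(/[\u00A0\u2000-\u200A\u202F\u205F\u3000]/g, ' ') // non-breaking and wide spaces
    .replace(/\u2026/g, '...')                     // horizontal ellipsis
    .replace(/\s+/g, ' ')                          // collapse runs of whitespace
    .trim();
}

// The two headings from the usage example differ only in the dash character.
const withHyphen = '<h2>2. 無限供應肉類火鍋放題 - 牛摩</h2>';
const withEnDash = '<h2>2. 無限供應肉類火鍋放題 \u2013 牛摩</h2>';
console.log(normalizeSpecialChars(withHyphen) === normalizeSpecialChars(withEnDash)); // true
```

After a step like this, strings that differ only in typographic variants compare as equal, which is why the usage example above is expected to report a very high similarity.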
License
MIT
