simtext

v0.1.7

Published

2 years ago

A lightweight, rule-based text similarity calculator that selects the most appropriate comparison algorithm based on input string lengths.

anandarizki

SimText - Lightweight Text Similarity Calculator

SimText is a minimal, lightweight text similarity calculator designed for efficiency and ease of use. It exposes three similarity measures and a helper function that chooses among them based on the input strings.

Features

🪶 Lightweight: Crafted with performance in mind, SimText ensures fast calculations without bogging down your applications.
🔍 Multiple Algorithms:
- Levenshtein Distance: Ideal for single, short words, offering a precise measure of character-level differences.
- Jaccard Similarity: Computes similarity between sets of words, making it great for longer texts.
- N-gram Similarity: Versatile and adaptable, it breaks down text into overlapping chunks for a nuanced similarity measure.
🎯 Contextual Selection: Based on the length and word count of your text inputs, SimText chooses the algorithm best suited to them.

Installation


npm install simtext --save

Usage

The package exports four functions for measuring the similarity between two strings: Levenshtein similarity, Jaccard similarity, n-gram similarity, and a general text comparison function that chooses among the three.

1. levenshteinSimilarity(a: string, b: string): number

Compares two strings and returns a similarity score based on the Levenshtein distance.

Parameters:
- a: First string.
- b: Second string.
Return: Similarity score between 0 and 1. A score of 1 means the strings are identical.

import {levenshteinSimilarity} from 'simtext';

const score = levenshteinSimilarity("apples", "apple");
console.log(score);  // 0.8333333333333334
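
For readers curious how a score like this can arise, here is a minimal sketch, not SimText's actual source. It assumes the similarity is 1 minus the edit distance divided by the length of the longer string, which is consistent with the 0.8333 shown above (one edit over six characters). The names levenshteinDistance and levenshteinSimilaritySketch are hypothetical.

```typescript
// Hypothetical re-implementation for illustration; not taken from SimText.
function levenshteinDistance(a: string, b: string): number {
  // prev[j] is the edit distance between the first i-1 characters of a
  // and the first j characters of b (the previous row of the DP table).
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i]; // turning i characters into "" takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

function levenshteinSimilaritySketch(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  // Assumption: two empty strings count as identical.
  if (longest === 0) return 1;
  return 1 - levenshteinDistance(a, b) / longest;
}

console.log(levenshteinSimilaritySketch("apples", "apple"));
```

Normalizing by the longer string keeps the score in the range 0 to 1, since the edit distance can never exceed that length.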

2. jaccardSimilarity(str1: string, str2: string): number

Calculates the Jaccard similarity between two strings, comparing the unique words in each string.

Parameters:
- str1: First string.
- str2: Second string.
Return: Similarity score between 0 and 1.

import {jaccardSimilarity} from 'simtext';

const score = jaccardSimilarity("apple pie", "apple crumble pie");
console.log(score);  // 0.6666666666666666
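
The calculation behind this score can be sketched as follows. This is an illustrative sketch rather than the library's code: it assumes words are split on whitespace with no case or punctuation normalization, and the name jaccardSimilaritySketch is hypothetical. Under those assumptions the example above has 2 shared words out of 3 distinct words, giving 2/3.

```typescript
// Hypothetical re-implementation for illustration; not taken from SimText.
function jaccardSimilaritySketch(str1: string, str2: string): number {
  // Assumption: tokens are whitespace-separated; the library may tokenize differently.
  const uniqueWords = (s: string): Set<string> =>
    new Set(s.split(/\s+/).filter((w) => w.length > 0));

  const set1 = uniqueWords(str1);
  const set2 = uniqueWords(str2);
  // Assumption: two inputs with no words at all count as identical.
  if (set1.size === 0 && set2.size === 0) return 1;

  // |A ∩ B| / |A ∪ B|, where |A ∪ B| = |A| + |B| - |A ∩ B|
  const shared = Array.from(set1).filter((w) => set2.has(w)).length;
  return shared / (set1.size + set2.size - shared);
}

console.log(jaccardSimilaritySketch("apple pie", "apple crumble pie"));
```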

3. ngramSimilarity(str1: string, str2: string, n?: number): number

Computes the n-gram similarity between two strings. It splits each string into overlapping substrings of n consecutive characters (n-grams) and then compares the n-grams of the two strings.

Parameters:
- str1: First string.
- str2: Second string.
- n: (Optional) Number of characters for the n-gram. Default is 2.
Return: Similarity score between 0 and 1.

import {ngramSimilarity} from 'simtext';

const score = ngramSimilarity("Roses are red, violets are blue", "Roses are red and the sky is blue", 2);
console.log(score);  // 0.4166666666666667
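
To make the n-gram idea concrete, here is a minimal sketch under stated assumptions. The README does not say how the n-grams are compared, so this version assumes a Jaccard ratio over the sets of distinct character n-grams; the library may use a different ratio, and its scores (including the one above) may not match this sketch. The names ngrams and ngramSimilaritySketch are hypothetical.

```typescript
// Hypothetical re-implementation for illustration; not taken from SimText.
function ngrams(text: string, n: number): Set<string> {
  // Slide a window of n characters across the text, one character at a time.
  const grams = new Set<string>();
  for (let i = 0; i + n <= text.length; i++) {
    grams.add(text.slice(i, i + n));
  }
  return grams;
}

function ngramSimilaritySketch(str1: string, str2: string, n: number = 2): number {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  const a = ngrams(str1, n);
  const b = ngrams(str2, n);
  // Both strings shorter than n: fall back to exact comparison (assumption).
  if (a.size === 0 && b.size === 0) return str1 === str2 ? 1 : 0;

  // Assumption: shared n-grams divided by all distinct n-grams (Jaccard ratio).
  const shared = Array.from(a).filter((g) => b.has(g)).length;
  return shared / (a.size + b.size - shared);
}

// "night" -> ni, ig, gh, ht; "nacht" -> na, ac, ch, ht; only "ht" is shared.
console.log(ngramSimilaritySketch("night", "nacht")); // 1 shared of 7 distinct bigrams
```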

4. compareText(str1: string, str2: string): number

A comprehensive function that determines the most appropriate similarity method based on the nature of the input strings.

Parameters:
- str1: First string.
- str2: Second string.
Return: Similarity score between 0 and 1, using the method deemed best for the input strings.

import {compareText} from 'simtext';

const score = compareText("apple", "appel");
console.log(score);  // 0.6

Note: The compareText function uses heuristics to choose the similarity method. If both strings are single words of fewer than 10 characters, it uses levenshteinSimilarity. If the combined character count of the two strings is above 200, it uses jaccardSimilarity. Otherwise, it uses ngramSimilarity.
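
The selection rule in the note can be sketched roughly as below. This is one reading of the note, not the library's code: it assumes "single word" means no whitespace after trimming, that the 10-character limit applies to each string separately, and that the 200-character threshold uses raw string lengths. The name chooseMethod and the Method type are hypothetical, and the function only reports which method would be picked.

```typescript
// Hypothetical helper for illustration; SimText does not export this.
type Method = "levenshtein" | "jaccard" | "ngram";

function chooseMethod(str1: string, str2: string): Method {
  // Assumption: a "single word" has no whitespace once trimmed,
  // and the under-10-characters limit applies to each string on its own.
  const isShortSingleWord = (s: string): boolean => {
    const t = s.trim();
    return t.length < 10 && !/\s/.test(t);
  };

  if (isShortSingleWord(str1) && isShortSingleWord(str2)) return "levenshtein";
  // Assumption: the combined count is the sum of the raw string lengths.
  if (str1.length + str2.length > 200) return "jaccard";
  return "ngram";
}

console.log(chooseMethod("apple", "appel"));                 // "levenshtein"
console.log(chooseMethod("apple pie", "apple crumble pie")); // "ngram"
```

Checking the short-word case first means a pair such as "apple" and "appel" is handled by the Levenshtein measure, which matches the 0.6 shown in the compareText example (two edits over five characters).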
