@mauriciolobo/semantic-comparer
v0.1.0
semantic-comparer
Semantic text comparison using ML embeddings. Compare texts by meaning, not just characters.
Installation
npm install semantic-comparer
Optional: Pre-download Model
On first use, the package downloads an embedding model (~80MB). To download it ahead of time, for example right after installing:
npx semantic-comparer download-model
Benefits:
- No network delay on first use
- Works offline after pre-download
- Faster startup time
Check if model is cached:
npx semantic-comparer download-model --verify
Suppress postinstall tip:
SEMANTIC_COMPARER_SILENT=1 npm install semantic-comparer
Note: The model is cached in node_modules/@xenova/transformers/.cache/. If you skip the pre-download, the model is downloaded automatically on first use.
CLI Usage
Compare two texts directly from the command line:
npx semantic-comparer "The cat sat on the mat" "A feline rested on the rug"
# Output: 0.7234
Special characters and quotes:
# Use single quotes for texts with double quotes
npx semantic-comparer 'He said "hello"' 'She replied "hi"'
# Or escape double quotes
npx semantic-comparer "He said \"hello\"" "She replied \"hi\""
Script usage:
SCORE=$(npx semantic-comparer "text one" "text two")
echo "Similarity: $SCORE"
Requirements
- Node.js >= 18
Usage
import { SemanticComparer } from 'semantic-comparer';
const comparer = await SemanticComparer.create();
// Compare two texts
const result = await comparer.compare(
"The cat sat on the mat",
"A feline was resting on the rug"
);
console.log(result.score); // ~0.75 (0-1 scale, higher = more similar)
API
SemanticComparer.create(config?)
Creates a new comparer instance (async factory).
Options:
config.model - Model identifier (default: "Xenova/all-MiniLM-L6-v2")
Returns: Promise<SemanticComparer>
Example:
// Use the default model
const comparer = await SemanticComparer.create();
// Or name the model explicitly (only the default is listed under Supported Models)
const explicitComparer = await SemanticComparer.create({
  model: 'Xenova/all-MiniLM-L6-v2'
});
comparer.compare(text1, text2)
Compares two texts semantically.
Parameters:
text1 (string) - First text to compare
text2 (string) - Second text to compare
Returns: Promise<{ score: number }> where score is between 0 (unrelated) and 1 (identical meaning).
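Because compare resolves to an object with a numeric score, ranking several candidates against one query only needs a loop and a sort. The sketch below is not part of the package: rankCandidates is a hypothetical helper, and it accepts any object with an async compare(text1, text2) method, so it can be exercised with a stub instead of the real model.

```javascript
/**
 * Hypothetical helper (not exported by semantic-comparer): scores every
 * candidate against `query` and returns them sorted from most to least similar.
 * `comparer` is any object with an async compare(text1, text2) -> { score }.
 */
async function rankCandidates(comparer, query, candidates) {
  const ranked = [];
  for (const text of candidates) {
    // One comparison at a time, so a single model instance is never overlapped
    const { score } = await comparer.compare(query, text);
    ranked.push({ text, score });
  }
  // Highest score (most similar) first
  return ranked.sort((a, b) => b.score - a.score);
}
```

With the package itself, the instance returned by SemanticComparer.create() can be passed in directly, e.g. await rankCandidates(comparer, query, candidates). The comparisons run sequentially; whether concurrent calls are safe is not documented, so this sketch does not assume it.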
Example:
const result = await comparer.compare(
"Machine learning is transforming technology",
"AI is revolutionizing tech"
);
console.log(result.score); // ~0.72
SEMANTIC_THRESHOLD
Exported constant with the default similarity threshold (default: 0.7).
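A typical use of the threshold is turning a score into a yes/no decision. The README does not say whether a score exactly equal to the threshold counts as a match, so the sketch below assumes >=; isSemanticMatch is a hypothetical helper, and its 0.7 default only mirrors the documented default of SEMANTIC_THRESHOLD.

```javascript
// Mirrors the documented default; the package exports its own SEMANTIC_THRESHOLD.
const DEFAULT_THRESHOLD = 0.7;

/**
 * Hypothetical helper: treats a score at or above the threshold as a match.
 * The >= comparison is an assumption, not documented package behavior.
 */
function isSemanticMatch(score, threshold = DEFAULT_THRESHOLD) {
  if (typeof score !== 'number' || Number.isNaN(score)) {
    throw new TypeError('score must be a number');
  }
  return score >= threshold;
}
```

In application code you would pass the imported constant, e.g. isSemanticMatch(result.score, SEMANTIC_THRESHOLD).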
import { SEMANTIC_THRESHOLD } from 'semantic-comparer';
console.log(SEMANTIC_THRESHOLD); // 0.7
Environment Variables
- MODEL - Override the default embedding model
- SEMANTIC_THRESHOLD - Default similarity threshold (default: 0.7)
- SEMANTIC_COMPARER_SILENT - Set to 1 to suppress the postinstall tip message
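The README does not describe how SEMANTIC_THRESHOLD is parsed. The sketch below shows one defensive way to read such an override; resolveThreshold is a hypothetical helper, and falling back to 0.7 when the value is missing, non-numeric, or outside 0 to 1 is an assumption rather than the package's documented behavior.

```javascript
/**
 * Hypothetical helper: reads a similarity threshold from an environment-like
 * object, falling back when the value is absent or not a number in [0, 1].
 */
function resolveThreshold(env = process.env, fallback = 0.7) {
  const raw = env.SEMANTIC_THRESHOLD;
  // Unset or blank: use the fallback
  if (raw === undefined || raw.trim() === '') return fallback;
  const value = Number(raw);
  // Reject non-numeric values and anything outside the 0-1 score range
  if (!Number.isFinite(value) || value < 0 || value > 1) return fallback;
  return value;
}
```

Taking the environment as a parameter keeps the helper testable: resolveThreshold({ SEMANTIC_THRESHOLD: '0.8' }) returns 0.8.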
How It Works
- Embeddings: Converts text into high-dimensional vectors (384 dimensions) using @xenova/transformers
- Normalization: Vectors are normalized to unit length during embedding generation
- Similarity: Computes the dot product of the two vectors, which equals their cosine similarity because both have unit length
- Score: Returns a value from 0 (unrelated) to 1 (identical meaning)
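Performance: Because the embeddings are already unit length, the similarity step is a single dot product rather than a full cosine computation, which the package reports as ~40% faster than standard cosine similarity. The speedup applies to the similarity calculation only; embedding generation is unchanged.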
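The equivalence behind the dot-product shortcut can be checked with plain arrays. This sketch uses toy 3-dimensional vectors instead of the 384-dimensional model output; normalize, dot, and cosineSimilarity are illustrative helpers, not functions exported by the package. Cosine similarity divides the dot product by both vector lengths, so once both vectors are unit length that divisor is 1 and can be dropped.

```javascript
// Dot product of two equal-length vectors
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Scale a vector to unit (L2) length
function normalize(v) {
  const norm = Math.sqrt(dot(v, v));
  return v.map((x) => x / norm);
}

// Standard cosine similarity: computes both norms on every call
function cosineSimilarity(a, b) {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

const a = normalize([3, 4, 0]);
const b = normalize([4, 3, 0]);
// For unit-length vectors both values are approximately 0.96
console.log(cosineSimilarity(a, b), dot(a, b));
```

Per comparison, cosineSimilarity makes three passes over the vectors plus two square roots, while the normalized path makes one pass; normalizing once at embedding time moves that cost out of every comparison.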
Supported Models
Currently supports:
Xenova/all-MiniLM-L6-v2 (default, ~80MB)
License
MIT
