@mauriciolobo/semantic-comparer

v0.1.0, published 6 days ago

Semantic text comparison using ML embeddings


mauriciolobo

semantic-comparer

Semantic text comparison using ML embeddings. Compare texts by meaning, not just characters.

Installation

npm install semantic-comparer

Optional: Pre-download Model

On first use, the package downloads an embedding model (~80MB). To download it ahead of time, for example right after installation:

npx semantic-comparer download-model

Benefits:

No network delay on first use
Works offline after pre-download
Faster startup time

Check if model is cached:

npx semantic-comparer download-model --verify

Suppress postinstall tip:

SEMANTIC_COMPARER_SILENT=1 npm install semantic-comparer

Note: The model is cached in node_modules/@xenova/transformers/.cache/. If you skip the pre-download, it is fetched automatically on first use.

CLI Usage

Compare two texts directly from the command line:

npx semantic-comparer "The cat sat on the mat" "A feline rested on the rug"
# Output: 0.7234

Special characters and quotes:

# Use single quotes for texts with double quotes
npx semantic-comparer 'He said "hello"' 'She replied "hi"'

# Or escape double quotes
npx semantic-comparer "He said \"hello\"" "She replied \"hi\""

Script usage:

SCORE=$(npx semantic-comparer "text one" "text two")
echo "Similarity: $SCORE"

Requirements

Node.js >= 18

Usage

import { SemanticComparer } from 'semantic-comparer';

// Top-level await requires an ES module (a .mjs file or "type": "module" in package.json)
const comparer = await SemanticComparer.create();

// Compare two texts
const result = await comparer.compare(
  "The cat sat on the mat",
  "A feline was resting on the rug"
);

console.log(result.score); // ~0.75 (0-1 scale, higher = more similar)

API

`SemanticComparer.create(config?)`

Creates a new comparer instance (async factory).

Options:

config.model - Model identifier (default: "Xenova/all-MiniLM-L6-v2")

Returns: Promise<SemanticComparer>

Example:

// Use the default model
const comparer = await SemanticComparer.create();

// Or pass a model identifier explicitly (only Xenova/all-MiniLM-L6-v2 is currently supported)
const explicitComparer = await SemanticComparer.create({
  model: 'Xenova/all-MiniLM-L6-v2'
});

`comparer.compare(text1, text2)`

Compares two texts semantically.

Parameters:

text1 (string) - First text to compare
text2 (string) - Second text to compare

Returns: Promise<{ score: number }> where score is between 0 (unrelated) and 1 (identical meaning).

Example:

const result = await comparer.compare(
  "Machine learning is transforming technology",
  "AI is revolutionizing tech"
);
console.log(result.score); // ~0.72
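Because `compare` scores one pair at a time, picking the closest of several candidate texts means calling it once per candidate. A minimal sketch, assuming only the documented `compare(text1, text2)` → `{ score }` contract; `findBestMatch` is a hypothetical helper, not part of the package:

```javascript
// Hypothetical helper (not part of the package): pick the candidate closest in
// meaning to `query`. `comparer` is any object with an async
// compare(text1, text2) method resolving to { score }, such as the instance
// returned by SemanticComparer.create().
async function findBestMatch(comparer, query, candidates) {
  if (candidates.length === 0) {
    throw new Error("candidates must not be empty");
  }
  let best = null;
  // Compare sequentially so only one embedding call runs at a time.
  for (const text of candidates) {
    const { score } = await comparer.compare(query, text);
    // Strict ">" keeps the earliest candidate when scores tie.
    if (best === null || score > best.score) {
      best = { text, score };
    }
  }
  return best;
}
```

With a real instance, `await findBestMatch(comparer, "How do I reset my password?", candidates)` resolves to the closest candidate together with its score.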

`SEMANTIC_THRESHOLD`

Exported constant holding the default similarity threshold (0.7).

import { SEMANTIC_THRESHOLD } from 'semantic-comparer';
console.log(SEMANTIC_THRESHOLD); // 0.7
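The threshold turns a score into a yes/no decision. A minimal sketch of that check; `isSemanticMatch` is hypothetical, and treating a score exactly equal to the threshold as a match (`>=`) is an assumption, since the package does not document which comparison it uses:

```javascript
// Hypothetical helper: a pair of texts counts as a semantic match when the
// score reaches the threshold. ">=" (rather than ">") is an assumption.
const DEFAULT_THRESHOLD = 0.7; // mirrors the documented default of SEMANTIC_THRESHOLD

async function isSemanticMatch(comparer, text1, text2, threshold = DEFAULT_THRESHOLD) {
  // `comparer` is any object whose compare() resolves to { score }.
  const { score } = await comparer.compare(text1, text2);
  return score >= threshold;
}
```

In real code, pass the imported `SEMANTIC_THRESHOLD` as the last argument so the check follows the package's own default.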

Environment Variables

MODEL - Override the default embedding model
SEMANTIC_THRESHOLD - Override the default similarity threshold (0.7)
SEMANTIC_COMPARER_SILENT - Set to 1 to suppress postinstall tip message
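If your own scripts need the same settings (for example to log which model and threshold are in effect), they can be resolved from the environment with the documented defaults. A minimal sketch; `resolveConfig` and its validation rules (rejecting non-numeric or out-of-range thresholds) are hypothetical and are not the package's internal logic:

```javascript
// Hypothetical sketch: resolve the documented environment variables in your
// own code, falling back to the documented defaults.
const DEFAULTS = {
  model: "Xenova/all-MiniLM-L6-v2",
  threshold: 0.7,
};

function resolveConfig(env = process.env) {
  const model =
    env.MODEL && env.MODEL.trim() !== "" ? env.MODEL.trim() : DEFAULTS.model;
  // parseFloat(undefined) is NaN, so a missing variable falls back as well.
  const parsed = Number.parseFloat(env.SEMANTIC_THRESHOLD);
  // Assumption: values outside the 0-1 score scale are treated as invalid.
  const threshold =
    Number.isFinite(parsed) && parsed >= 0 && parsed <= 1 ? parsed : DEFAULTS.threshold;
  const silent = env.SEMANTIC_COMPARER_SILENT === "1";
  return { model, threshold, silent };
}
```

The resolved `model` can then be passed to `SemanticComparer.create({ model })`.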

How It Works

Embeddings: Converts text into high-dimensional vectors (384 dimensions) using @xenova/transformers
Normalization: Vectors are normalized to unit length during embedding generation
Similarity: Computes the dot product of the two vectors, which equals cosine similarity for unit-length vectors
Score: Returns a value from 0 (unrelated) to 1 (identical meaning)

Performance: Because the embeddings are already unit length, scoring skips the two vector-norm computations that standard cosine similarity requires, making it ~40% faster.
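The equivalence between the dot product and cosine similarity for normalized vectors can be checked with plain arrays. This is an illustrative sketch, not the package's source: the helper names are hypothetical and the 3-dimensional vectors stand in for the real 384-dimensional embeddings. A raw dot product of unit vectors lies in [-1, 1]; the sketch does not map it onto the 0 to 1 scale the package reports.

```javascript
// Illustrative sketch (not the package's source).
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function norm(v) {
  return Math.sqrt(dot(v, v));
}

// Standard cosine similarity: dot product divided by both vector lengths.
function cosineSimilarity(a, b) {
  return dot(a, b) / (norm(a) * norm(b));
}

// Scale a vector to unit length (done once per text, at embedding time).
function normalize(v) {
  const n = norm(v);
  return v.map((x) => x / n);
}

const a = normalize([3, 4, 0]);
const b = normalize([1, 2, 2]);
// For unit vectors the norms are 1, so both values agree up to rounding.
console.log(cosineSimilarity(a, b), dot(a, b));
```

Normalizing once per text moves the square roots out of every comparison, which is where the saving comes from when one embedding is compared many times.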

Supported Models

Currently supports:

Xenova/all-MiniLM-L6-v2 (default, ~80MB)

License

MIT
