cosine-similarity-threshold

v1.0.1

Utility library for calculating cosine similarity thresholds

Cosine Similarity Threshold

A library for calculating cosine similarity thresholds dynamically.

Installation

Install cosine-similarity-threshold with npm:

  npm install cosine-similarity-threshold

Usage/Examples

const { cosineSimilarity, cosineThreshold, kMeansThreshold } = require('cosine-similarity-threshold');

// Calculate cosine similarity between two vectors
const a = [1, 2, 3];
const b = [4, 5, 6];
const similarity = cosineSimilarity(a, b);
console.log(similarity); // Output: 0.9746318461970762

// Calculate cosine similarity threshold
const dataPoints = [
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9],
  [10, 11, 12],
  [13, 14, 15]
];
const threshold = cosineThreshold(dataPoints);
console.log(threshold); // Output: 0.9999999999999999

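// Calculate cosine similarity threshold using K-means clustering.
// The result is awaited, so the call is wrapped in an async function
// (top-level await is not available in CommonJS modules).
(async () => {
  const kMeansResult = await kMeansThreshold(dataPoints, 5);
  console.log(kMeansResult); // Output: 0.9999999999999999
})().catch(console.error);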

API

cosineSimilarity(a, b)

Calculates the cosine similarity between two vectors a and b.

  • a: An array representing the first vector.
  • b: An array representing the second vector.
  • Returns: A number representing the cosine similarity between the vectors.
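
For reference, cosine similarity is the dot product of the two vectors divided by the product of their lengths. The following is a minimal sketch of that formula, not the package's source code; the function name cosineSimilaritySketch and the choice to return 0 for a zero vector are assumptions made for illustration.

```javascript
// Illustrative sketch only; not the package's implementation.
function cosineSimilaritySketch(a, b) {
  if (a.length !== b.length) {
    throw new Error('The arrays must have the same length.');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; returning 0 here is an assumption of this sketch.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilaritySketch([1, 2, 3], [4, 5, 6])); // approximately 0.9746
console.log(cosineSimilaritySketch([1, 0], [0, 1]));       // 0 (orthogonal vectors)
```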

cosineThreshold(dataPoints, options)

Calculates the cosine similarity threshold for a set of data points.

  • dataPoints: An array of data points, where each data point is represented by an array of numbers.
  • options (optional): An object specifying additional options:
    • normalize: A boolean indicating whether to normalize the data points. Default: false.
    • type: The type of threshold to calculate. Possible values: 'median' (default), 'mean', 'min', 'max', 'percentile', 'deviation'.
    • percentile: The percentile to use when calculating the threshold. Only used when type is 'percentile'.
    • deviation: The number of standard deviations to use when calculating the threshold. Only used when type is 'deviation'.
  • Returns: A number representing the calculated cosine similarity threshold.

The calculation of the cosine similarity threshold in the cosineThreshold function involves the following steps:

1. Calculate the pairwise cosine similarity between all pairs of data points.
2. Store the similarity values in an array.
3. Apply the specified threshold calculation type (median, mean, min, max, percentile, or deviation) to the similarity array.
4. Return the calculated threshold value.
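
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the package's code: the helper names (cosine, pairwiseSimilarities, reduceSimilarities) are hypothetical, and only the median, mean, min, and max types are shown because the README does not define how percentile and deviation are computed.

```javascript
// Illustrative sketch only; helper names are hypothetical, not the package's internals.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Steps 1 and 2: collect the similarity of every unordered pair of points.
function pairwiseSimilarities(points) {
  const sims = [];
  for (let i = 0; i < points.length; i++) {
    for (let j = i + 1; j < points.length; j++) {
      sims.push(cosine(points[i], points[j]));
    }
  }
  return sims;
}

// Step 3: reduce the similarity array to a single threshold value.
function reduceSimilarities(sims, type = 'median') {
  const sorted = [...sims].sort((x, y) => x - y);
  switch (type) {
    case 'min': return sorted[0];
    case 'max': return sorted[sorted.length - 1];
    case 'mean': return sorted.reduce((s, v) => s + v, 0) / sorted.length;
    case 'median': {
      const mid = Math.floor(sorted.length / 2);
      return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }
    default: throw new Error('Invalid threshold type.');
  }
}

const points = [[1, 0], [0, 1], [1, 1]];
const sims = pairwiseSimilarities(points);
console.log(sims.length);                     // 3 pairs for 3 points
console.log(reduceSimilarities(sims, 'min')); // 0, the two axis vectors are orthogonal
```

For n data points this produces n(n - 1)/2 similarity values, so the cost grows quadratically with the size of the input.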

kMeansThreshold(dataPoints, clusterCount, options)

Calculates the cosine similarity threshold using K-means clustering.

  • dataPoints: An array of data points, where each data point is represented by an array of numbers.
  • clusterCount: The number of clusters to form. Default value is 5.
  • options (optional): An object specifying additional options:
    • normalize: A boolean indicating whether to normalize the data points. Default: false.
    • type: The type of threshold to calculate. Possible values: 'median' (default), 'mean', 'min', 'max', 'percentile', 'deviation'.
    • percentile: The percentile to use when calculating the threshold. Only used when type is 'percentile'.
    • deviation: The number of standard deviations to use when calculating the threshold. Only used when type is 'deviation'.
  • Returns: A number representing the calculated cosine similarity threshold.

The calculation of the cosine similarity threshold in the kMeansThreshold function is similar to cosineThreshold, but it involves additional steps due to the K-means clustering process:

1. If the normalize option is true, normalize the data points using a normalization function.
2. Perform K-means clustering on the data points with the specified clusterCount.
3. Calculate the pairwise cosine similarity between the cluster centroids.
4. Store the similarity values in an array.
5. Apply the specified threshold calculation type (median, mean, min, max, percentile, or deviation) to the similarity array.
6. Return the calculated threshold value.
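
As a rough illustration of these steps, the sketch below runs a plain Lloyd-style K-means and then takes the median similarity between centroids. Everything in it is an assumption made for the example: the README does not say which K-means implementation, initialization, distance measure, or normalization the package uses. Here the first k points seed the centroids (for determinism), normalization scales each point to unit length, and at least two clusters are required.

```javascript
// Illustrative sketch only; the package's clustering and normalization may differ.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Step 1 (optional): scale each point to unit length (assumed normalization).
function toUnitLength(p) {
  const norm = Math.sqrt(p.reduce((s, v) => s + v * v, 0));
  return norm === 0 ? p.slice() : p.map((v) => v / norm);
}

function squaredDistance(a, b) {
  return a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0);
}

// Step 2: Lloyd's algorithm, seeded with the first k points.
function kMeansCentroids(points, k, iterations = 20) {
  let centroids = points.slice(0, k).map((p) => p.slice());
  for (let it = 0; it < iterations; it++) {
    const sums = centroids.map((c) => c.map(() => 0));
    const counts = new Array(k).fill(0);
    for (const p of points) {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (squaredDistance(p, centroids[c]) < squaredDistance(p, centroids[best])) best = c;
      }
      counts[best]++;
      p.forEach((v, d) => { sums[best][d] += v; });
    }
    // Keep the previous centroid when a cluster ends up empty.
    centroids = centroids.map((c, i) =>
      counts[i] === 0 ? c : sums[i].map((s) => s / counts[i]));
  }
  return centroids;
}

// Steps 3 to 6: pairwise similarity between centroids, reduced with the median.
function centroidThreshold(points, k, normalize = false) {
  const data = normalize ? points.map(toUnitLength) : points;
  const centroids = kMeansCentroids(data, k);
  const sims = [];
  for (let i = 0; i < centroids.length; i++) {
    for (let j = i + 1; j < centroids.length; j++) {
      sims.push(cosine(centroids[i], centroids[j]));
    }
  }
  sims.sort((x, y) => x - y);
  const mid = Math.floor(sims.length / 2);
  return sims.length % 2 ? sims[mid] : (sims[mid - 1] + sims[mid]) / 2;
}

const sample = [[1, 0], [0, 1], [2, 0], [0, 2]];
console.log(centroidThreshold(sample, 2)); // 0, the two centroids lie on different axes
```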

Error Handling

The library throws errors with the following messages:

  • The parameter must be an array.
  • The array must not be empty.
  • Array elements must be numbers.
  • The arrays must have the same length.
  • Invalid cluster count. Expected a positive number.
  • Each data point must be an array.
  • All the elements must have the same dimension.
  • Similarities must be an array.
  • Similarities array must not be empty.
  • Percentile must be a number between 0 and 100.
  • Deviation must be a positive number.
  • Invalid threshold type.

Please handle these errors appropriately when using the library.
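
One way to handle these errors is to wrap calls in try/catch and inspect the message. The sketch below uses stand-in validation functions (validateVector, validatePair, and describeFailure are hypothetical names) that throw a few of the messages listed above, so it runs without the package installed. The checks the package actually performs, and their order, are not shown in this README.

```javascript
// Illustrative sketch only; stand-ins for the package's own validation.
function validateVector(v) {
  if (!Array.isArray(v)) throw new Error('The parameter must be an array.');
  if (v.length === 0) throw new Error('The array must not be empty.');
  if (!v.every((x) => typeof x === 'number' && !Number.isNaN(x))) {
    throw new Error('Array elements must be numbers.');
  }
}

function validatePair(a, b) {
  validateVector(a);
  validateVector(b);
  if (a.length !== b.length) throw new Error('The arrays must have the same length.');
}

// Callers can catch the error and decide how to report or recover from it.
function describeFailure(a, b) {
  try {
    validatePair(a, b);
    return 'ok';
  } catch (err) {
    return err.message;
  }
}

console.log(describeFailure([1, 2], [3, 4])); // ok
console.log(describeFailure([1, 2], [3]));    // The arrays must have the same length.
console.log(describeFailure('oops', [3]));    // The parameter must be an array.
```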

License

MIT