k-means-plus

v1.0.0

Published

4 years ago

K-means++ clustering a classification of data, so that points assigned to the same cluster are similar (in some sense). It is identical to the K-means algorithm, except for the selection of initial conditions.

0High
0Medium
0Low

goldensunliu

major-colors

K-means++ clustering a classification of data, so that points assigned to the same cluster are similar (in some sense). It is identical to the K-means algorithm, except for the careful selection of initial conditions. It is a way of avoiding the sometimes poor clusterings found by the standard k-means algorithm. k-means++ variance address this major theoretic shortcoming and guarantee an approximation ratio O(log k) in expectation (over the randomness of the algorithm), where k is the number of clusters used.

Install

npm i k-means-plus
yarn add k-means-plus

Usage

import KMeansCluster from 'k-means-plus';
const cluster = new KMeansCluster({ distance, maximumIterations, convergedFn });
const results = cluster.cluster(vectors, k);

import KMeansCluster from 'k-means-plus';

const data = [
  {'company': 'Microsoft' , 'size': 91259, 'revenue': 60420},
  {'company': 'IBM' , 'size': 400000, 'revenue': 98787},
  {'company': 'Skype' , 'size': 700, 'revenue': 716},
  {'company': 'SAP' , 'size': 48000, 'revenue': 11567},
  {'company': 'Yahoo!' , 'size': 14000 , 'revenue': 6426 },
  {'company': 'eBay' , 'size': 15000, 'revenue': 8700},
];

// Create the data 2D-array (vectors) describing the data
let vectors = [];
for (let i = 0 ; i < data.length ; i++) {
  vectors.push([ data[i]['size'] , data[i]['revenue']]);
}

const k = 4;

const cluster = new KMeansCluster();

const {
  model: { observations, centroids, assignments },
  iterations,
  durationMs
} = cluster.cluster(vectors, k);

Constructor

distance - (vector1: [number], vector2: [number]) => number distance function used for clustering between two vectors.
Default: euclidean distance
maximumIterations - number maximum iterations of clustering.
Default: 200
convergedFn - (centroids1: [[number]], centroids2: [[number]]) => boolean determine if two consecutive set of centroids are converged.
Default: the clusters are converged if the cluster assignments are the same.

Inputs

vectors - [[number]] point vectors to cluster
k - number of final clusters

Outputs

type result = {
  model: {
    observations: [[number]], // the original vectors to cluster
    centroids: [[number]], // vectors of the centers of cluster
    assignments: [number] // mapping from index of original vector to the index of cluter center it belongs to
  },
  iterations: number, // number of iterations ran before converging
  durationMs: number // the duration of the algorithm
}

Published

Vulnerabilities

Links

Maintainers

Keywords