clustering-tfjs

v0.3.1

Published

7 months ago

High-performance TypeScript clustering algorithms (K-Means, Spectral, Agglomerative) with TensorFlow.js acceleration and scikit-learn compatibility

crjfisher

clustering machine-learning kmeans spectral-clustering agglomerative-clustering tensorflow tensorflowjs data-science unsupervised-learning scikit-learn typescript browser nodejs

clustering-tfjs

Native TypeScript implementation of clustering algorithms powered by TensorFlow.js with full browser and Node.js support.

Features

✅ Pure TypeScript/JavaScript (no Python required)
✅ Multiple clustering algorithms (K-Means, Spectral, Agglomerative)
✅ Powered by TensorFlow.js for performance
✅ Works in both Node.js and browsers
✅ Platform-optimized bundles (49KB for browser, 163KB for Node.js)
✅ TypeScript support with full type definitions
✅ GPU acceleration available (WebGL in browser, CUDA in Node.js)
✅ Automatic backend selection
✅ Extensively tested for parity with scikit-learn

Quick Start

Install

# For Node.js with acceleration
npm install clustering-tfjs @tensorflow/tfjs-node

# For Node.js with GPU support
npm install clustering-tfjs @tensorflow/tfjs-node-gpu

# For browser usage (TensorFlow.js loaded separately)
npm install clustering-tfjs

Note: For Windows users or if you encounter native binding issues, see our Windows Compatibility Guide.

Basic Usage

Browser

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/[email protected]/dist/tf.min.js"></script>
<script src="https://unpkg.com/clustering-tfjs/dist/clustering.browser.js"></script>

<script>
async function demo() {
  // Initialize the library
  await ClusteringTFJS.Clustering.init({ backend: 'webgl' });
  
  // Use algorithms
  const kmeans = new ClusteringTFJS.KMeans({ nClusters: 3 });
  const data = [[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]];
  const labels = await kmeans.fitPredict(data);
  console.log(labels); // e.g. [0, 0, 1, 1, 0, 2]; label numbering can vary between runs
}
demo();
</script>

Node.js

import { Clustering } from 'clustering-tfjs';

// Initialize (optional - auto-detects best backend)
await Clustering.init();

// Use algorithms
const kmeans = new Clustering.KMeans({ nClusters: 3 });
const data = [[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]];
const labels = await kmeans.fitPredict(data);
console.log(labels); // e.g. [0, 0, 1, 1, 0, 2]; label numbering can vary between runs

Installation

For Node.js

# Basic installation (pure JavaScript backend)
npm install clustering-tfjs

# Recommended: With native acceleration
npm install clustering-tfjs @tensorflow/tfjs-node

# Optional: With GPU support
npm install clustering-tfjs @tensorflow/tfjs-node-gpu

For Browser

The browser bundle is available via CDN:

<!-- Load TensorFlow.js -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/[email protected]/dist/tf.min.js"></script>

<!-- Load clustering-tfjs -->
<script src="https://unpkg.com/clustering-tfjs/dist/clustering.browser.js"></script>

Or install via npm and use with a bundler:

npm install clustering-tfjs @tensorflow/tfjs

Algorithms

K-Means Clustering

Classic centroid-based clustering
Supports custom initialization methods
K-Means++ initialization by default
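
To make the default initialization concrete, here is a minimal sketch of plain K-Means++ seeding in standalone TypeScript. It is not the library's implementation: `makeRandom`, `squaredDistance`, and `kMeansPlusPlusInit` are hypothetical names introduced for illustration, and the seeded generator exists only to keep the example reproducible.

```typescript
// Tiny linear congruential generator so runs are reproducible (illustrative only).
function makeRandom(seed: number): () => number {
  let state = Math.abs(Math.floor(seed)) % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // value in [0, 1)
  };
}

function squaredDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Hypothetical helper, not the library's API: choose k starting centroids.
function kMeansPlusPlusInit(data: number[][], k: number, random: () => number): number[][] {
  if (k < 1 || k > data.length) throw new Error('k must be in [1, data.length]');

  // First centroid: drawn uniformly at random.
  const first = data[Math.floor(random() * data.length)];
  const centroids: number[][] = [first.slice()];

  // nearest[i] = squared distance from point i to its closest centroid so far.
  const nearest = data.map((p) => squaredDistance(p, first));

  while (centroids.length < k) {
    let total = 0;
    for (let i = 0; i < nearest.length; i++) total += nearest[i];

    // Draw the next centroid with probability proportional to nearest[i].
    let target = random() * total;
    let chosen = 0;
    for (let i = 0; i < data.length; i++) {
      if (nearest[i] <= 0) continue; // already a centroid, or a duplicate of one
      chosen = i;
      target -= nearest[i];
      if (target <= 0) break;
    }

    centroids.push(data[chosen].slice());
    for (let i = 0; i < data.length; i++) {
      nearest[i] = Math.min(nearest[i], squaredDistance(data[i], data[chosen]));
    }
  }
  return centroids;
}
```

Points far from every existing centroid are more likely to be drawn next, which is why this seeding tends to spread the starting centroids across the data.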

Spectral Clustering

Graph-based clustering using eigendecomposition
Ideal for non-convex clusters
Supports custom affinity functions
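
As a rough illustration of the graph that spectral clustering starts from, the sketch below builds an RBF (Gaussian) affinity matrix, where the similarity between two points is exp(-gamma * squared distance). `rbfAffinity` is a hypothetical standalone helper, not a function exported by the library; `gamma` plays the same role as the `gamma` option of `SpectralClustering`.

```typescript
// Hypothetical helper: dense RBF affinity matrix, A[i][j] = exp(-gamma * ||x_i - x_j||^2).
function rbfAffinity(data: number[][], gamma: number): number[][] {
  return data.map((a) =>
    data.map((b) => {
      let squared = 0;
      for (let d = 0; d < a.length; d++) {
        const diff = a[d] - b[d];
        squared += diff * diff;
      }
      return Math.exp(-gamma * squared);
    })
  );
}
```

The matrix is symmetric with ones on the diagonal. Larger `gamma` values make the similarity fall off faster with distance, so only very close points stay strongly connected.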

Agglomerative Clustering

Hierarchical bottom-up clustering
Multiple linkage criteria (ward, complete, average, single)
Memory-efficient implementation
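
The linkage criterion decides how the distance between two clusters is measured before the closest pair is merged. The sketch below shows three of the four criteria as a standalone function; `linkageDistance` and `euclidean` are hypothetical names, and Ward linkage, which minimizes the increase in within-cluster variance, is left out to keep the example short.

```typescript
type LinkageSketch = 'single' | 'complete' | 'average';

function euclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let d = 0; d < a.length; d++) sum += (a[d] - b[d]) * (a[d] - b[d]);
  return Math.sqrt(sum);
}

// Hypothetical helper: distance between two clusters under a given linkage.
function linkageDistance(a: number[][], b: number[][], linkage: LinkageSketch): number {
  let min = Infinity;
  let max = 0;
  let sum = 0;
  for (const p of a) {
    for (const q of b) {
      const dist = euclidean(p, q);
      min = Math.min(min, dist);
      max = Math.max(max, dist);
      sum += dist;
    }
  }
  if (linkage === 'single') return min;   // closest pair of points
  if (linkage === 'complete') return max; // farthest pair of points
  return sum / (a.length * b.length);     // mean over all cross-cluster pairs
}
```

Single linkage can follow elongated, chain-like shapes, while complete linkage tends to favour compact clusters; average linkage sits between the two.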

Validation Metrics

The library includes three validation metrics to evaluate clustering quality and optimize the number of clusters:

Silhouette Score

Measures how similar an object is to its own cluster compared to other clusters. Range: [-1, 1], higher is better.
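
For intuition, the definition can be written out directly. The following is a standalone sketch of the standard silhouette formula using Euclidean distance, not the library's `silhouetteScore`; `silhouetteSketch` and `pointDistance` are hypothetical names. For each point, `a` is the mean distance to the other members of its own cluster, `b` is the smallest mean distance to any other cluster, and the point's score is (b - a) / max(a, b).

```typescript
function pointDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let d = 0; d < a.length; d++) sum += (a[d] - b[d]) * (a[d] - b[d]);
  return Math.sqrt(sum);
}

// Hypothetical helper: mean silhouette over all points.
function silhouetteSketch(X: number[][], labels: number[]): number {
  // Group row indices by cluster label.
  const members: { [label: number]: number[] } = {};
  labels.forEach((label, i) => {
    if (!members[label]) members[label] = [];
    members[label].push(i);
  });
  const clusterLabels = Object.keys(members).map(Number);
  if (clusterLabels.length < 2) throw new Error('silhouette needs at least 2 clusters');

  // Mean distance from point i to the given rows, skipping i itself.
  const meanDistance = (i: number, rows: number[]): number => {
    let sum = 0;
    let count = 0;
    for (const j of rows) {
      if (j === i) continue;
      sum += pointDistance(X[i], X[j]);
      count++;
    }
    return sum / count;
  };

  let total = 0;
  for (let i = 0; i < X.length; i++) {
    const own = members[labels[i]];
    if (own.length === 1) continue; // singleton clusters score 0 by convention
    const a = meanDistance(i, own);
    let b = Infinity;
    for (const other of clusterLabels) {
      if (other !== labels[i]) b = Math.min(b, meanDistance(i, members[other]));
    }
    const denom = Math.max(a, b);
    if (denom > 0) total += (b - a) / denom;
  }
  return total / X.length;
}
```

This direct version computes all pairwise distances, so its cost grows quadratically with the number of points.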

Davies-Bouldin Index

Averages, over all clusters, each cluster's worst-case ratio of within-cluster scatter to the distance between cluster centroids. Range: [0, ∞), lower is better.
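
A minimal standalone sketch of the usual Davies-Bouldin definition follows. It is not the library's `daviesBouldin`; `daviesBouldinSketch` is a hypothetical name. Each cluster's scatter is the mean distance of its points to the centroid, and the index averages, per cluster, the largest value of (scatter_i + scatter_j) / distance(centroid_i, centroid_j).

```typescript
// Hypothetical helper: Davies-Bouldin index with Euclidean distance.
function daviesBouldinSketch(X: number[][], labels: number[]): number {
  const dist = (a: number[], b: number[]): number => {
    let sum = 0;
    for (let d = 0; d < a.length; d++) sum += (a[d] - b[d]) * (a[d] - b[d]);
    return Math.sqrt(sum);
  };

  // Accumulate per-cluster coordinate sums and sizes.
  const sums: { [label: number]: number[] } = {};
  const counts: { [label: number]: number } = {};
  labels.forEach((label, i) => {
    if (!sums[label]) {
      sums[label] = X[0].map(() => 0);
      counts[label] = 0;
    }
    counts[label] += 1;
    X[i].forEach((v, d) => { sums[label][d] += v; });
  });

  const ids = Object.keys(sums).map(Number);
  if (ids.length < 2) throw new Error('Davies-Bouldin needs at least 2 clusters');
  const centroids = ids.map((id) => sums[id].map((v) => v / counts[id]));

  // scatter[c] = mean distance from cluster c's points to its centroid.
  const scatter = ids.map(() => 0);
  labels.forEach((label, i) => {
    const c = ids.indexOf(label);
    scatter[c] += dist(X[i], centroids[c]) / counts[label];
  });

  let total = 0;
  for (let i = 0; i < ids.length; i++) {
    let worst = 0;
    for (let j = 0; j < ids.length; j++) {
      if (i === j) continue;
      worst = Math.max(worst, (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j]));
    }
    total += worst;
  }
  return total / ids.length;
}
```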

Calinski-Harabasz Index

Ratio of between-cluster to within-cluster dispersion. Range: [0, ∞), higher is better.
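
Written out, the index is [B / (k - 1)] / [W / (n - k)], where B is the size-weighted squared distance of each centroid from the overall mean, W is the total squared distance of points from their own centroid, k is the number of clusters, and n is the number of points. The sketch below follows that formula in standalone TypeScript; `calinskiHarabaszSketch` is a hypothetical name, not the library's `calinskiHarabasz`.

```typescript
// Hypothetical helper: Calinski-Harabasz index (variance ratio criterion).
function calinskiHarabaszSketch(X: number[][], labels: number[]): number {
  const n = X.length;
  const zero = (): number[] => X[0].map(() => 0);
  const sqDist = (a: number[], b: number[]): number => {
    let sum = 0;
    for (let d = 0; d < a.length; d++) sum += (a[d] - b[d]) * (a[d] - b[d]);
    return sum;
  };

  // Overall mean plus per-cluster coordinate sums and sizes.
  const overall = zero();
  const sums: { [label: number]: number[] } = {};
  const counts: { [label: number]: number } = {};
  labels.forEach((label, i) => {
    if (!sums[label]) {
      sums[label] = zero();
      counts[label] = 0;
    }
    counts[label] += 1;
    X[i].forEach((v, d) => {
      sums[label][d] += v;
      overall[d] += v / n;
    });
  });

  const ids = Object.keys(sums).map(Number);
  const k = ids.length;
  if (k < 2 || k >= n) throw new Error('need 2 <= k < n');

  let between = 0; // B: dispersion of centroids around the overall mean
  ids.forEach((id) => {
    const centroid = sums[id].map((v) => v / counts[id]);
    between += counts[id] * sqDist(centroid, overall);
  });

  let within = 0; // W: dispersion of points around their own centroid
  labels.forEach((label, i) => {
    const centroid = sums[label].map((v) => v / counts[label]);
    within += sqDist(X[i], centroid);
  });

  return (between / (k - 1)) / (within / (n - k));
}
```

If every point coincides with its centroid, W is zero and this sketch returns Infinity.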

Finding Optimal Number of Clusters

The library includes a built-in findOptimalClusters function that automatically determines the optimal number of clusters:

import { findOptimalClusters } from 'clustering-tfjs';

// Find optimal k between 2 and 10 clusters
const result = await findOptimalClusters(data, {
  minClusters: 2,
  maxClusters: 10,
  algorithm: 'kmeans'  // or 'spectral', 'agglomerative'
});

console.log(`Optimal number of clusters: ${result.optimal.k}`);
console.log(`Silhouette score: ${result.optimal.silhouette}`);
console.log(`All evaluations:`, result.evaluations);

// Advanced usage with custom scoring
const customResult = await findOptimalClusters(data, {
  maxClusters: 8,
  algorithm: 'spectral',
  algorithmParams: { affinity: 'nearest_neighbors' },
  metrics: ['silhouette', 'calinskiHarabasz'],  // Skip Davies-Bouldin
  scoringFunction: (evaluation) => evaluation.silhouette * 2 + evaluation.calinskiHarabasz
});
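
One thing to watch with a custom scoring function: the Calinski-Harabasz index is unbounded while the silhouette score stays within [-1, 1], so in a raw sum the former tends to dominate. The sketch below illustrates this with a standalone selection routine. `EvaluationSketch` is an assumed shape for one entry of `result.evaluations`, and `pickBest`, `minMax`, and `pickBestNormalized` are hypothetical helpers; the sketch also assumes that the highest score wins, which may not match the library's behaviour exactly.

```typescript
// Assumed shape of one entry in `result.evaluations`; the real type may differ.
interface EvaluationSketch {
  k: number;
  silhouette: number;
  calinskiHarabasz: number;
}

// Return the evaluation with the highest score (assumes higher is better).
function pickBest(
  evaluations: EvaluationSketch[],
  score: (e: EvaluationSketch) => number
): EvaluationSketch {
  if (evaluations.length === 0) throw new Error('no evaluations');
  let best = evaluations[0];
  for (const e of evaluations) {
    if (score(e) > score(best)) best = e;
  }
  return best;
}

// Rescale values to [0, 1] across all candidates so no metric dominates a sum.
function minMax(values: number[]): number[] {
  const lo = Math.min.apply(null, values);
  const hi = Math.max.apply(null, values);
  return values.map((v) => (hi === lo ? 0 : (v - lo) / (hi - lo)));
}

// Weighted sum of the two metrics after min-max normalization.
function pickBestNormalized(
  evaluations: EvaluationSketch[],
  silhouetteWeight: number,
  calinskiWeight: number
): EvaluationSketch {
  if (evaluations.length === 0) throw new Error('no evaluations');
  const sil = minMax(evaluations.map((e) => e.silhouette));
  const ch = minMax(evaluations.map((e) => e.calinskiHarabasz));
  let bestIndex = 0;
  let bestScore = -Infinity;
  evaluations.forEach((_e, i) => {
    const s = silhouetteWeight * sil[i] + calinskiWeight * ch[i];
    if (s > bestScore) {
      bestScore = s;
      bestIndex = i;
    }
  });
  return evaluations[bestIndex];
}
```

With candidates `{ k: 2, silhouette: 0.9, calinskiHarabasz: 50 }` and `{ k: 3, silhouette: 0.4, calinskiHarabasz: 60 }`, the raw sum `silhouette * 2 + calinskiHarabasz` picks k = 3 despite the much lower silhouette, whereas the normalized version with the same weights picks k = 2.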

Platform Detection & Backend Selection

The library automatically detects your environment and selects the best backend:

import { Clustering } from 'clustering-tfjs';

// Check current platform
console.log('Platform:', Clustering.platform); // 'browser' or 'node'

// Check available features
console.log('Features:', Clustering.features);
// {
//   gpuAcceleration: true,
//   wasmSimd: false,
//   nodeBindings: true,
//   webgl: false
// }

// Manually select a backend (use the line that matches your environment)
await Clustering.init({ backend: 'webgl' }); // Browser
await Clustering.init({ backend: 'tensorflow' }); // Node.js

Available Backends

| Backend | Environment | Use Case | Performance |
|---------|-------------|----------|-------------|
| cpu | Both | Pure JS fallback | Baseline |
| webgl | Browser | GPU acceleration | 5-10x faster |
| wasm | Browser | CPU optimization | 2-3x faster |
| tensorflow | Node.js | Native bindings | 10-20x faster |

The library automatically selects the best available backend if not specified.
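
The README does not spell out the selection order, but one plausible reading of the table is "fastest available backend first, `cpu` as the fallback". The sketch below encodes that reading as a standalone function. The preference lists and the names `chooseBackend`, `PlatformName`, and `BackendName` are assumptions made for illustration, not the library's documented logic.

```typescript
type PlatformName = 'browser' | 'node';
type BackendName = 'cpu' | 'webgl' | 'wasm' | 'tensorflow';

// Preference order inferred from the "Performance" column above (an assumption).
const BACKEND_PREFERENCE: { [P in PlatformName]: BackendName[] } = {
  browser: ['webgl', 'wasm', 'cpu'],
  node: ['tensorflow', 'cpu'],
};

// Hypothetical helper: first preferred backend that is actually available.
function chooseBackend(platform: PlatformName, available: BackendName[]): BackendName {
  for (const candidate of BACKEND_PREFERENCE[platform]) {
    if (available.indexOf(candidate) !== -1) return candidate;
  }
  return 'cpu'; // the pure JS fallback is listed for both environments
}
```

For example, a browser without WebGL but with WebAssembly support would end up on `wasm` under this ordering, and a Node.js process without the native bindings would fall back to `cpu`.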

API Reference

Common Interface

All algorithms implement the same interface:

interface ClusteringAlgorithm {
  fit(X: Tensor2D | number[][]): Promise<void>;
  predict(X: Tensor2D | number[][]): Promise<number[]>;
  fitPredict(X: Tensor2D | number[][]): Promise<number[]>;
}
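
Because `fitPredict` resolves to one label per input row, the result lines up index by index with the data that was passed in. A small standalone helper such as the hypothetical `groupByLabel` below (not part of the library) turns that label array into per-cluster groups of rows:

```typescript
// Hypothetical helper: collect the rows of `data` under their cluster label.
function groupByLabel(data: number[][], labels: number[]): { [label: number]: number[][] } {
  if (data.length !== labels.length) {
    throw new Error('data and labels must have the same length');
  }
  const groups: { [label: number]: number[][] } = {};
  labels.forEach((label, i) => {
    if (!groups[label]) groups[label] = [];
    groups[label].push(data[i]);
  });
  return groups;
}
```

Given the Quick Start data and the labels `[0, 0, 1, 1, 0, 2]`, cluster 0 would hold three rows, cluster 1 two rows, and cluster 2 the single row `[9, 11]`.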

KMeans

new KMeans({
  nClusters: number;
  init?: 'k-means++' | 'random' | number[][];
  nInit?: number;
  maxIter?: number;
  tol?: number;
  // backend selection coming in future version
})

SpectralClustering

new SpectralClustering({
  nClusters: number;
  affinity?: 'rbf' | 'nearest_neighbors';
  gamma?: number;
  nNeighbors?: number;
  // backend selection coming in future version
})

AgglomerativeClustering

new AgglomerativeClustering({
  nClusters: number;
  linkage?: 'ward' | 'complete' | 'average' | 'single';
  // backend selection coming in future version
})

Validation Metrics

// Silhouette Score: [-1, 1], higher is better
silhouetteScore(X: Tensor2D | number[][], labels: number[]): Promise<number>

// Davies-Bouldin Index: [0, ∞), lower is better  
daviesBouldin(X: Tensor2D | number[][], labels: number[]): Promise<number>

// Calinski-Harabasz Index: [0, ∞), higher is better
calinskiHarabasz(X: Tensor2D | number[][], labels: number[]): Promise<number>

Examples

Coming soon: Example notebooks and CodePen demos

Performance

Based on our benchmarks:

K-Means: 0.5ms - 200ms depending on dataset size
Spectral: 10ms - 2s (includes eigendecomposition)
Agglomerative: 5ms - 500ms

See benchmarks/ for detailed performance data.

Migration from scikit-learn

# scikit-learn
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3)
labels = kmeans.fit_predict(X)

// clustering-tfjs
import { KMeans } from 'clustering-tfjs';
const kmeans = new KMeans({ nClusters: 3 });
const labels = await kmeans.fitPredict(X);
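
The comparison above shows `n_clusters` becoming `nClusters`. If the remaining options follow the same snake_case to camelCase pattern (an assumption; the README only demonstrates it for this one parameter), a small conversion helper can ease porting configuration objects. `toCamelCaseOptions` is a hypothetical helper and only renames keys; it does not check that an option exists or translate its value.

```typescript
// Hypothetical helper: rename snake_case option keys to camelCase.
function toCamelCaseOptions(options: { [key: string]: unknown }): { [key: string]: unknown } {
  const result: { [key: string]: unknown } = {};
  Object.keys(options).forEach((key) => {
    const camel = key.replace(/_([a-z])/g, (_match: string, letter: string) => letter.toUpperCase());
    result[camel] = options[key];
  });
  return result;
}
```

For example, `toCamelCaseOptions({ n_clusters: 3, max_iter: 300 })` yields `{ nClusters: 3, maxIter: 300 }`.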

Scikit-learn Compatibility

This library has been extensively tested for numerical parity with scikit-learn. Our test suite includes:

Step-by-step comparisons with sklearn implementations
Identical results for standard datasets
Matching behavior for edge cases

See tools/sklearn_comparison/ for detailed comparison scripts and test/ for parity tests.

Contributing

See CONTRIBUTING.md for guidelines on contributing to this project.

License

MIT

Note: This library is under active development. APIs may change in future versions.
