hdbscan-wasm

v0.1.0

HDBSCAN clustering in Rust/WASM — ported from scikit-learn, optimized for 2D Euclidean distance
HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) clustering algorithm, implemented in Rust and compiled to WebAssembly.

Origin

This is a direct port of scikit-learn's HDBSCAN implementation to Rust/WASM. The core algorithm was ported from:

  • sklearn/cluster/_hdbscan/_tree.pyx — condensed tree construction, cluster stability, Excess of Mass (EOM) selection, Union-Find label assignment
  • sklearn/cluster/_hdbscan/_reachability.pyx — core distance and mutual reachability distance computation

The porting was done by reading the scikit-learn source code line by line and translating each step to Rust. This is not a reimplementation from the paper — it follows scikit-learn's specific algorithmic choices and data structures.

What it does

HDBSCAN finds clusters of varying density in a point cloud. Unlike DBSCAN, it does not require an eps parameter — it automatically discovers clusters at multiple density scales and selects the most stable ones.

Algorithm steps:

  1. Core distances — For each point, compute the distance to its k-th nearest neighbor (KD-tree, O(n log n))
  2. Mutual reachability distance — max(core_dist[a], core_dist[b], euclidean(a, b))
  3. Minimum spanning tree — Prim's algorithm on the mutual reachability graph (O(n²) time, O(n) memory — no distance matrix)
  4. Condensed tree — Walk the single-linkage dendrogram, splitting at nodes with fewer than min_cluster_size points
  5. Cluster stability — Compute stability scores via excess of mass (EOM)
  6. Label assignment — Union-Find over the condensed tree to assign each point to a cluster or noise (-1)
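Steps 1–2 above can be sketched in plain TypeScript. This is a brute-force illustration only (no KD-tree, so O(n²) per core distance); the package computes these in Rust, and the exact neighbor-counting convention here (excluding the point itself) is an assumption:

```typescript
type Point = { x: number; y: number };

const euclidean = (a: Point, b: Point): number =>
  Math.hypot(a.x - b.x, a.y - b.y);

// Core distance: distance to the k-th nearest neighbor (k = min_samples).
// Convention here: the point itself is excluded from its own neighbors.
function coreDistance(points: Point[], i: number, k: number): number {
  const dists = points
    .map((p, j) => (j === i ? Infinity : euclidean(points[i], p)))
    .sort((a, b) => a - b);
  return dists[k - 1];
}

// Mutual reachability distance between points a and b (step 2).
function mutualReachability(points: Point[], a: number, b: number, k: number): number {
  return Math.max(
    coreDistance(points, a, k),
    coreDistance(points, b, k),
    euclidean(points[a], points[b]),
  );
}
```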

Installation

npm install hdbscan-wasm

Usage

Basic

import init, { hdbscan } from 'hdbscan-wasm';

// Initialize WASM (call once)
await init();

const xs = new Float64Array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2]);
const ys = new Float64Array([0.0, 0.1, 0.0, 3.0, 3.1, 3.0]);

const labels = hdbscan(xs, ys, 3, 3);
// Int32Array: [0, 0, 0, 1, 1, 1]
// -1 means noise
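A small helper like the following (not part of the package API, just an illustration) can summarize the returned label array:

```typescript
// Count distinct cluster ids and noise points (-1) in an HDBSCAN label array.
function summarizeLabels(labels: Int32Array): { clusters: number; noise: number } {
  const ids = new Set<number>();
  let noise = 0;
  for (const label of labels) {
    if (label === -1) noise++;
    else ids.add(label);
  }
  return { clusters: ids.size, noise };
}
```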

Web Worker (recommended for large datasets)

// worker.ts
import init, { hdbscan } from 'hdbscan-wasm';

self.onmessage = async (e) => {
  await init();
  const { xs, ys, minClusterSize, minSamples } = e.data;
  const labels = hdbscan(
    new Float64Array(xs),
    new Float64Array(ys),
    minClusterSize,
    minSamples,
  );
  self.postMessage({ labels: Array.from(labels) });
};

Next.js / Vite / webpack

Most modern bundlers handle WASM automatically. For Next.js, ensure the component using hdbscan-wasm is loaded with dynamic(() => import(...), { ssr: false }) since WASM cannot run during server-side rendering.

API

Single function, 2D Euclidean distance only:

function hdbscan(
  xs: Float64Array,       // X coordinates (length n)
  ys: Float64Array,       // Y coordinates (length n)
  min_cluster_size: number,  // minimum points to form a cluster
  min_samples: number,       // k for core distance (controls density smoothing)
): Int32Array               // cluster labels (length n, -1 = noise)
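To fail fast on malformed input before crossing the WASM boundary, a validation helper can mirror the contract above. This sketch is illustrative, not part of the package; the minimum parameter values are assumptions based on common HDBSCAN conventions:

```typescript
// Validate hdbscan() arguments against the documented signature.
function validateHdbscanArgs(
  xs: Float64Array,
  ys: Float64Array,
  minClusterSize: number,
  minSamples: number,
): void {
  if (xs.length !== ys.length)
    throw new Error("xs and ys must be the same length");
  if (!Number.isInteger(minClusterSize) || minClusterSize < 2)
    throw new Error("min_cluster_size must be an integer >= 2");
  if (!Number.isInteger(minSamples) || minSamples < 1)
    throw new Error("min_samples must be an integer >= 1");
}
```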

Parameters

| Parameter | Description | Typical range |
|---|---|---|
| min_cluster_size | Minimum number of points to form a cluster. Larger values produce fewer, bigger clusters. | 5–50 |
| min_samples | Number of neighbors for core distance estimation. Larger values are more conservative (more noise). | 1–20 |

Relationship: min_samples controls how "dense" a region must be to be considered a core point. min_cluster_size controls how many core points are needed to form a cluster. They are independent parameters.

Performance

  • Core distance: O(n log n) via KD-tree k-NN queries
  • MST: O(n²) via Prim's algorithm — computed on-the-fly without building a distance matrix (O(n) memory)
  • Condensed tree + labelling: O(n)
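The O(n) memory figure comes from Prim's algorithm needing only a per-point "cheapest edge into the tree" array, with distances computed on the fly. A minimal sketch (using plain Euclidean distance for brevity; the actual implementation runs Prim on mutual reachability distances):

```typescript
// Prim's MST in O(n) memory: no distance matrix, just a best-edge array.
// Returns the total MST weight over points (xs[i], ys[i]).
function primMstWeight(xs: number[], ys: number[]): number {
  const n = xs.length;
  const inTree = new Array<boolean>(n).fill(false);
  const best = new Array<number>(n).fill(Infinity); // cheapest edge to tree
  best[0] = 0;
  let total = 0;
  for (let iter = 0; iter < n; iter++) {
    // Pick the unvisited point with the cheapest connecting edge.
    let u = -1;
    for (let v = 0; v < n; v++) {
      if (!inTree[v] && (u === -1 || best[v] < best[u])) u = v;
    }
    inTree[u] = true;
    total += best[u];
    // Relax edges from u, computing distances on the fly.
    for (let v = 0; v < n; v++) {
      if (!inTree[v]) {
        const d = Math.hypot(xs[u] - xs[v], ys[u] - ys[v]);
        if (d < best[v]) best[v] = d;
      }
    }
  }
  return total;
}
```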

Approximate benchmarks (single-threaded WASM):

| Points | Time |
|---|---|
| 1,000 | ~50ms |
| 5,000 | ~500ms |
| 10,000 | ~2s |

The MST step is the bottleneck (O(n²)). For >20,000 points, consider downsampling (e.g., grid coarsening).
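Grid coarsening can be as simple as snapping points to cells and keeping one centroid per occupied cell. A sketch of that idea (function and parameter names are illustrative, not part of the package):

```typescript
// Downsample a point cloud by averaging all points within each grid cell
// of side length cellSize, keeping one representative per occupied cell.
function gridCoarsen(
  xs: Float64Array,
  ys: Float64Array,
  cellSize: number,
): { xs: Float64Array; ys: Float64Array } {
  const cells = new Map<string, { sx: number; sy: number; n: number }>();
  for (let i = 0; i < xs.length; i++) {
    const key = `${Math.floor(xs[i] / cellSize)},${Math.floor(ys[i] / cellSize)}`;
    const c = cells.get(key) ?? { sx: 0, sy: 0, n: 0 };
    c.sx += xs[i];
    c.sy += ys[i];
    c.n += 1;
    cells.set(key, c);
  }
  const outX = new Float64Array(cells.size);
  const outY = new Float64Array(cells.size);
  let i = 0;
  for (const c of cells.values()) {
    outX[i] = c.sx / c.n; // cell centroid
    outY[i] = c.sy / c.n;
    i++;
  }
  return { xs: outX, ys: outY };
}
```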

Limitations

  • 2D Euclidean only. The distance function is hardcoded as sqrt((x1-x2)² + (y1-y2)²). No support for higher dimensions, Haversine, or custom metrics.
  • No sample weights. Every point has equal weight. Uniform grids (e.g., 250m population grids) may produce fragmented clusters because density is measured by point count, not by an associated value like population.
  • No prediction. Batch algorithm only — no approximate_predict() for new points.
  • O(n²) MST. The Prim MST step scales quadratically. Practical limit is ~15,000 points in interactive use.
  • Tie-breaking. When multiple MST edges have identical weights, the order may differ from scikit-learn (which uses a different MST algorithm). This can cause minor differences in cluster boundaries for ambiguous points, but the overall clustering structure is equivalent.
  • No soft clustering. Returns hard labels only, no probability scores.
  • EOM only. Only Excess of Mass cluster selection is supported (no leaf clustering).
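As a stopgap for the missing sample-weight support noted above, points can be replicated in proportion to their weight before clustering. This sketch is illustrative only; note that rounding loses precision and the point count grows with the total weight, which collides with the O(n²) MST limit:

```typescript
// Approximate sample weights by repeating each point round(weight) times
// (at least once). Useful for small integer-ish weights only.
function replicateByWeight(
  xs: number[],
  ys: number[],
  weights: number[],
): { xs: Float64Array; ys: Float64Array } {
  const rx: number[] = [];
  const ry: number[] = [];
  for (let i = 0; i < xs.length; i++) {
    const copies = Math.max(1, Math.round(weights[i]));
    for (let c = 0; c < copies; c++) {
      rx.push(xs[i]);
      ry.push(ys[i]);
    }
  }
  return { xs: Float64Array.from(rx), ys: Float64Array.from(ry) };
}
```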

Comparison with scikit-learn

| Feature | scikit-learn | hdbscan-wasm |
|---|---|---|
| Dimensions | arbitrary | 2D only |
| Distance metrics | many (euclidean, manhattan, ...) | euclidean only |
| MST algorithm | dual-tree Boruvka (O(n log n) amortized) | Prim (O(n²)) |
| Sample weights | no | no |
| Soft clustering | yes (probabilities) | no |
| Prediction | yes (approximate_predict) | no |
| Cluster selection | EOM + leaf | EOM only |
| Runtime | Python + Cython | WASM (runs in browser) |

Building from source

Prerequisites: Rust toolchain + wasm-pack

# Install wasm-pack
cargo install wasm-pack

# Build WASM
wasm-pack build --target bundler --release

# Run Rust tests
cargo test --release

Tests

5 test cases included in src/lib.rs:

  1. two_moons — Classic two interleaving half-circles (100 points)
  2. two_blobs_noise — Two Gaussian blobs with uniform noise (110 points)
  3. grid_clusters — Three separated grid clusters (108 points)
  4. small_debug — Minimal 10-point debug case
  5. gangnam_noise_in_hull — Real-world test with projected coordinates, verifying no noise points appear inside cluster convex hulls

Run them with cargo test --release.

License

BSD 3-Clause — compatible with scikit-learn's license, as this is a derived work.