clusternova

v1.5.4

HDBSCAN clustering algorithm implementation in TypeScript

Clusternova 🚀

Simple data clustering library in TS

Discover Hidden Patterns in Your Data

Drop in millions of recent social media posts and surface trending topics. Feed in your support tickets and have related issues clustered together.

This TypeScript implementation easily brings clustering to your JavaScript ecosystem, whether you're running in Node.js (or another backend JS runtime) or directly in the browser. Built for real-world applications:

  • 📱 Social Media Intelligence: Surface trending topics from millions of posts in real-time
  • 🎯 Customer Insights: Transform raw feedback into actionable patterns
  • 🤖 AI/ML Pipelines: Cluster high-dimensional embeddings
  • 📊 Real-time Analytics: Process streaming data to detect emerging patterns
  • 🔍 Anomaly Detection: Automatically identify outliers and unusual behaviors

This TypeScript implementation of HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) is based on the paper "Density-Based Clustering Based on Hierarchical Density Estimates" by Ricardo J.G.B. Campello, Davoud Moulavi, and Joerg Sander.

This implementation draws inspiration from:

Why HDBSCAN?

HDBSCAN offers several advantages over traditional clustering algorithms:

  • No need to specify the number of clusters: Unlike k-means, HDBSCAN determines the number of clusters from the density structure of the data
  • Handles variable density clusters: Can find clusters of different shapes and densities
  • Noise handling: Automatically identifies and handles noise points
  • Hierarchical clustering: Provides insights into the hierarchical structure of your data
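The variable-density and hierarchy advantages come from how HDBSCAN measures distance. Below is a rough sketch of the first steps described in the Campello et al. paper: core distances, mutual reachability distances, and a minimum spanning tree over them. The helper names (`coreDistances`, `mutualReachability`, `mstEdges`) are hypothetical, the tree is built with a naive O(n²) Prim's algorithm, and the later condensing and cluster-selection steps are omitted. This is not clusternova's internal code.

```typescript
// Sketch of early HDBSCAN steps; NOT clusternova's implementation.
type Metric = (a: number[], b: number[]) => number;

const euclid: Metric = (a, b) =>
  Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));

// Core distance: distance to the k-th nearest neighbour, counting the point
// itself as its own first neighbour (one common convention; assumes k >= 1).
function coreDistances(points: number[][], k: number, dist: Metric): number[] {
  return points.map((p) => {
    const ds = points.map((q) => dist(p, q)).sort((x, y) => x - y);
    return ds[Math.min(k, ds.length) - 1];
  });
}

// Mutual reachability distance: max(core(a), core(b), d(a, b)).
// Points in sparse regions get pushed away from everything else.
function mutualReachability(
  i: number, j: number, points: number[][], core: number[], dist: Metric
): number {
  return Math.max(core[i], core[j], dist(points[i], points[j]));
}

// Prim's algorithm over the complete mutual-reachability graph.
// Returns MST edges as [from, to, weight].
function mstEdges(points: number[][], k: number, dist: Metric): [number, number, number][] {
  const n = points.length;
  if (n === 0) return [];
  const core = coreDistances(points, k, dist);
  const inTree = new Array<boolean>(n).fill(false);
  const best = new Array<number>(n).fill(Infinity);
  const parent = new Array<number>(n).fill(-1);
  const edges: [number, number, number][] = [];
  best[0] = 0;
  for (let step = 0; step < n; step++) {
    // Pick the cheapest vertex not yet in the tree.
    let u = -1;
    for (let v = 0; v < n; v++) {
      if (!inTree[v] && (u === -1 || best[v] < best[u])) u = v;
    }
    inTree[u] = true;
    if (parent[u] !== -1) edges.push([parent[u], u, best[u]]);
    // Update the cheapest known connection for every remaining vertex.
    for (let v = 0; v < n; v++) {
      if (inTree[v]) continue;
      const w = mutualReachability(u, v, points, core, dist);
      if (w < best[v]) {
        best[v] = w;
        parent[v] = u;
      }
    }
  }
  return edges;
}
```

For two tight groups such as `[[0,0],[0,1],[1,0],[10,10],[10,11],[11,10]]` with `k = 2`, the tree has four edges of weight 1 inside the groups and one bridge of about 13.45 between them. Cutting edges from heaviest to lightest produces the cluster hierarchy that HDBSCAN then condenses and selects from.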

Performance

Built from scratch as a single TypeScript file with zero dependencies, this implementation includes several optimizations, though not all of those used in the Python hdbscan library. In my personal testing, it has shown comparable or better performance than the Python implementation.

  • ✨ Zero dependencies
  • 🚀 Pure TypeScript implementation
  • ⚡️ Optimized for JS runtimes
  • 🔒 Type-safe API
  • 📦 Small bundle size

Setup

npm install clusternova

Usage

import HDBSCAN, { findCentralElements, cosine, VectorPoint } from "clusternova";
// manhattan, euclidean are also available imports

// Extend VectorPoint with your additional fields
interface MyDataPoint extends VectorPoint {
  title: string;        // Your own extra fields
  timestamp: Date;      // (mark them with `?` to make them optional)
}

const data: MyDataPoint[] = [
  {
    id: "1",
    vector: [1, 2, 3],
    title: "First point",
    timestamp: new Date()
  },
  {
    id: "2",
    vector: [4, 5, 6],
    title: "Second point",
    timestamp: new Date()
  }
  // ... more points
];

const hdbscan = new HDBSCAN(data, 3, cosine); // minimum points = 3, cosine distance metric
const { clusters, outliers } = hdbscan.run();
// Types of returned data:
// clusters: MyDataPoint[][] - Array of clusters, each containing array of your data points
// outliers: MyDataPoint[] - Array of outlier points

// Example output (illustrative; timestamps are the Date objects you passed in):
// {
//   clusters: [
//     [
//       { id: "1", vector: [1, 2, 3], title: "First point", timestamp: "2024-01-01..." },
//       { id: "2", vector: [4, 5, 6], title: "Second point", timestamp: "2024-01-01..." }
//     ],
//     // ... more clusters
//   ],
//   outliers: [
//     { id: "7", vector: [7, 8, 9], title: "Outlier point", timestamp: "2024-01-01..." }
//     // ... more outliers
//   ]
// }

// Find central elements of each cluster - returns array of elements with distance field (distance from the center of the cluster)
clusters.forEach(cluster => {
  const centralElements = findCentralElements(cluster, 3, cosine);
  // Returns: (MyDataPoint & { distance: number })[]
  // Example:
  // [
  //   { id: "1", vector: [1,2,3], title: "First point", distance: 0.2 },
  //   { id: "2", vector: [4,5,6], title: "Second point", distance: 0.3 }
  // ]
});
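To make the `distance` field on central elements more concrete, here is a hypothetical sketch of ranking cluster members by their distance to a cluster center. It assumes "center" means the arithmetic mean (centroid) of the member vectors; clusternova's `findCentralElements` may define the center or its arguments differently, and `centralElementsSketch` is an illustrative name, not part of the library.

```typescript
// Hypothetical sketch; assumes the cluster "center" is the mean vector.
interface Point {
  id: string;
  vector: number[];
}
type Metric = (a: number[], b: number[]) => number;

const euclidean: Metric = (a, b) =>
  Math.sqrt(a.reduce((s, ai, i) => s + (ai - b[i]) ** 2, 0));

function centralElementsSketch<T extends Point>(
  cluster: T[], n: number, metric: Metric
): (T & { distance: number })[] {
  if (cluster.length === 0) return [];
  const dim = cluster[0].vector.length;
  // Centroid: arithmetic mean of all member vectors.
  const centroid = new Array<number>(dim).fill(0);
  for (const p of cluster) {
    for (let d = 0; d < dim; d++) centroid[d] += p.vector[d] / cluster.length;
  }
  // Attach each member's distance to the centroid, closest first, keep n.
  return cluster
    .map((p) => ({ ...p, distance: metric(p.vector, centroid) }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, n);
}
```

The most central members are often good representatives for labelling or summarizing a cluster, since they sit closest to its average.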

Example Projects

We have included an examples directory with web apps that demonstrate how to use Clusternova in real-world scenarios. One of them, social-media-clustering, is deployed and available for you to try (bring your own API key):

Social Media Clustering Demo: https://astonishing-donut-b3686d.netlify.app

This demo clusters the text embeddings of social media posts (tweets) and summarizes each cluster using GPT-3.5.

You can also clone the repo and run it yourself.

Example Use Cases

HDBSCAN is particularly valuable in:

Text Analysis

Despite the theoretical limitations of clustering in high dimensions (pairwise distances tend to become similar as dimensionality grows), I've found HDBSCAN works exceptionally well with text embeddings using cosine distance, making it particularly useful for NLP applications.

Examples:

  • Clustering social media posts to identify trending topics
  • Grouping customer support tickets to identify common issues
  • Organizing document collections by theme
  • Identifying similar product reviews or feedback
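Cosine distance compares the direction of two vectors and ignores their length, which is why it suits embeddings, where direction usually carries the meaning. The sketch below computes it as 1 minus cosine similarity. clusternova exports its own `cosine` metric; this standalone `cosineDistance` is a hypothetical illustration and may differ from the library's version, for example in how it treats zero vectors.

```typescript
// Hypothetical standalone cosine distance: 1 - (a . b) / (|a| |b|).
// Range is [0, 2]: 0 = same direction, 1 = orthogonal, 2 = opposite.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Assumption: treat a zero vector as orthogonal rather than dividing by zero.
  if (normA === 0 || normB === 0) return 1;
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

For instance, `[1, 2, 3]` and `[2, 4, 6]` point the same way, so their cosine distance is (up to rounding) 0, even though their Euclidean distance is about 3.74.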

Image Processing

  • Grouping similar images in large collections
  • Identifying distinct objects in computer vision applications
  • Clustering visual features for object recognition

Scientific Applications

  • Gene expression clustering

Business Intelligence

  • Market segmentation
  • Anomaly detection in transactions
  • Customer behavior pattern identification

License information

This project is licensed under the MIT License.

Copyright (c) 2025 Kevin Ma

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.