npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

clusterkw

v0.3.2

Published

A package for clustering keywords using OpenAI embeddings

Downloads

10

Readme

ClusterKW

A Node.js package for clustering keywords using OpenAI embeddings. This package can be used to cluster Google Ads keywords, AI chat topics, project tasks, or any other text-based items into semantically similar groups.

Features

  • Generate embeddings for keywords using OpenAI API
  • Calculate distances between keywords
  • Create clusters of semantically similar keywords
  • Automatically generate cluster names and descriptions using OpenAI API
  • Multiple clustering algorithms:
    • Simple: Basic distance-based clustering
    • K-means: Centroid-based clustering with configurable number of clusters
    • Hierarchical: Agglomerative clustering with different linkage methods
    • Direct: LLM-based clustering without embeddings (sends keywords directly to the model)

Installation

npm install clusterkw

Usage

import { KeywordClusterer } from 'clusterkw';

// Initialize with your OpenAI API key
const clusterer = new KeywordClusterer({
  apiKey: 'your-openai-api-key',
  // Optional configuration
  embeddingModel: 'text-embedding-3-small',
  completionModel: 'gpt-4o-mini-2024-07-18',
  minClusterSize: 3,
  distanceThreshold: 0.2,
  algorithm: 'direct',
  context: 'AI chat topics'
});

// Example keywords
const keywords = [
  'machine learning algorithms',
  'deep learning frameworks',
  'neural networks',
  'data visualization tools',
  'business intelligence software',
  'data analytics platforms',
  'artificial intelligence applications',
  'natural language processing',
  'computer vision systems',
  'reinforcement learning'
];

// Cluster the keywords
const clusters = await clusterer.clusterKeywords(keywords);

// Output the clusters with their names and descriptions
console.log(clusters);

API Reference

KeywordClusterer

The main class for clustering keywords.

Constructor Options

  • apiKey (string, required): Your OpenAI API key
  • embeddingModel (string, optional): The OpenAI model to use for embeddings. Default: 'text-embedding-3-small'
  • completionModel (string, optional): The OpenAI model to use for generating cluster names and descriptions. Default: 'gpt-3.5-turbo'
  • minClusterSize (number, optional): Minimum number of items to form a cluster. Default: 2
  • distanceThreshold (number, optional): Maximum cosine distance for items to be considered similar. Default: 0.3
  • algorithm (string, optional): Clustering algorithm to use ('simple', 'kmeans', or 'hierarchical'). Default: 'simple'
  • k (number, optional): Number of clusters for k-means algorithm. Default: auto-calculated based on dataset size
  • maxIterations (number, optional): Maximum iterations for k-means algorithm. Default: 100
  • linkage (string, optional): Linkage method for hierarchical clustering ('single', 'complete', or 'average'). Default: 'average'

Methods

  • clusterKeywords(keywords: string[]): Promise<Cluster[]>: Clusters the provided keywords and returns the clusters with names and descriptions.
  • getEmbeddings(texts: string[]): Promise<number[][]>: Gets embeddings for the provided texts.
  • calculateDistances(embeddings: number[][]): number[][]: Calculates the cosine distances between all embeddings.
  • generateClusters(keywords: string[], distances: number[][], embeddings?: number[][]): Cluster[]: Generates clusters based on the selected algorithm.
  • nameAndDescribeClusters(clusters: Cluster[]): Promise<Cluster[]>: Generates names and descriptions for the clusters.

Command Line Interface (CLI)

The package includes a CLI tool for clustering keywords directly from the command line:

# Install globally
npm install -g clusterkw

# Basic usage with a file
clusterkw --file keywords.txt

# Using a CSV file with a specific column
clusterkw --file keywords.csv --column keyword

# Specify output file
clusterkw --file keywords.txt --output clusters.json

# Customize clustering parameters
clusterkw --file keywords.txt --min-cluster-size 3 --distance 0.25

# Use different clustering algorithms
clusterkw --file keywords.txt --algorithm kmeans --clusters 5
clusterkw --file keywords.txt --algorithm hierarchical --linkage complete
clusterkw --file keywords.txt --algorithm direct --gpt-model gpt-4o-mini-2024-07-18

# Use context to guide clustering
clusterkw --file keywords.txt --context "AI chat topics"
clusterkw --file keywords.txt --algorithm direct --context "Google Ads keywords"

# Provide API key directly (alternatively, set OPENAI_API_KEY or OPENAI_KEY environment variable)
clusterkw --file keywords.txt --api-key your-openai-api-key

CLI Options

| Option | Alias | Description | Default | |--------|-------|-------------|---------| | --file | -f | Path to file containing keywords (supports txt, csv, json) | (required) | | --column | -c | Column name containing keywords (for CSV files) | keyword | | --output | -o | Output file path (supports json, csv) | (none) | | --api-key | | OpenAI API key (overrides environment variables) | (from env) | | --min-cluster-size | -m | Minimum cluster size | 2 | | --distance | -d | Maximum distance threshold | 0.3 | | --embedding-model | -e | OpenAI embedding model | text-embedding-3-small | | --gpt-model | -g | OpenAI completion model | gpt-4o-mini-2024-07-18 | | --algorithm | -a | Clustering algorithm (simple, kmeans, hierarchical, direct) | simple | | --clusters | -k | Number of clusters for k-means algorithm | 5 | | --max-iterations | | Maximum iterations for k-means algorithm | 100 | | --linkage | | Linkage method for hierarchical clustering | average | | --context | | Context description to guide clustering | (none) | | --delimiter | | CSV delimiter | , | | --no-header | | CSV file has no header row | (false) |

Version Management

This package follows Semantic Versioning (SemVer):

  • MAJOR: Incompatible API changes
  • MINOR: Add functionality (backward-compatible)
  • PATCH: Bug fixes (backward-compatible)

Updating the Version

You can update the version using npm scripts:

# Increment patch version (1.0.0 -> 1.0.1)
npm run version:patch

# Increment minor version (1.0.0 -> 1.1.0)
npm run version:minor

# Increment major version (1.0.0 -> 2.0.0)
npm run version:major

Automatic Publishing

This package uses GitHub Actions to automatically publish to npm when:

  1. A new version is pushed to the main branch (package.json changes)
  2. A new GitHub Release is created
  3. The publish workflow is manually triggered

License

MIT