newskit-mcp-server

v1.0.0

MCP server for intelligent news article categorization using embeddings and clustering

NewsKit MCP Server

An MCP server for intelligent news article categorization using embeddings and clustering. Automatically groups similar articles together and generates human-readable category names.

Features

  • Semantic Categorization: Uses sentence-transformers to generate embeddings and DBSCAN clustering to group similar articles
  • ChromaDB Integration: Stores article embeddings for fast semantic search
  • Automatic Category Naming: Uses TF-IDF to extract keywords and generate descriptive category names
  • Configurable Parameters: Adjust similarity thresholds and minimum cluster sizes to fine-tune results
  • Search Capability: Find semantically similar articles using natural language queries

Tools

categorize_articles

Run the full categorization pipeline on a TSV file of news articles.

Parameters:

  • inputPath (required): Path to TSV file with article_id and title columns
  • outputPath (optional): Output JSON file path (default: categories.json)
  • minClusterSize (optional): Minimum articles per category (default: 2)
  • similarityThreshold (optional): Cosine similarity threshold between 0 and 1 (default: 0.75)
  • persistDir (optional): ChromaDB storage directory (default: ./chroma_db)

Example:

{
  "inputPath": "/path/to/articles.tsv",
  "outputPath": "/path/to/categories.json",
  "similarityThreshold": 0.8,
  "minClusterSize": 3
}
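
For readers wiring the tool into their own client, here is a minimal sketch of how the arguments above might be wrapped in an MCP `tools/call` request. The envelope follows the MCP JSON-RPC convention; the helper name `build_tool_call` and the request id are illustrative assumptions, not part of this package.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Wrap tool arguments in a JSON-RPC 2.0 `tools/call` request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    1,
    "categorize_articles",
    {
        "inputPath": "/path/to/articles.tsv",
        "outputPath": "/path/to/categories.json",
        "similarityThreshold": 0.8,
        "minClusterSize": 3,
    },
)

# The stdio transport exchanges newline-delimited JSON messages,
# so the request is serialized onto a single line.
wire_message = json.dumps(request) + "\n"
print(wire_message, end="")
```

Most MCP clients build this envelope for you; the sketch only shows what travels over stdio.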

load_articles

Preview articles from a TSV file without categorizing.

Parameters:

  • inputPath (required): Path to TSV file
  • limit (optional): Maximum articles to return (default: 50)

search_similar

Search for semantically similar articles using natural language queries.

Parameters:

  • query (required): Search query text
  • persistDir (optional): ChromaDB directory (default: ./chroma_db)
  • nResults (optional): Number of results (default: 5, max: 20)
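
The ranking behind a semantic search can be sketched roughly as follows. This is not the package's code (the real lookup is delegated to ChromaDB); it is a self-contained illustration that assumes nResults is clamped to the documented maximum of 20. The toy 2-D vectors stand in for real embeddings, and the ID `ghi789` is made up.

```python
import math

MAX_RESULTS = 20  # upper bound documented for nResults


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_similar(query_vec, stored, n_results=5):
    """Return up to n_results (article_id, score) pairs, best match first."""
    n = max(1, min(n_results, MAX_RESULTS))  # assumed clamping behaviour
    scored = [(cosine_similarity(query_vec, vec), article_id)
              for article_id, vec in stored.items()]
    # Highest score first; ties broken by article_id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(article_id, score) for score, article_id in scored[:n]]


stored = {
    "abc123": [1.0, 0.0],
    "def456": [0.0, 1.0],
    "ghi789": [0.9, 0.1],  # hypothetical article
}
results = search_similar([1.0, 0.0], stored, n_results=2)
print(results)
```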

get_categories

Display categorized results from a JSON output file.

Parameters:

  • resultsPath (required): Path to categories.json file

Installation

Prerequisites

  • Node.js 18 or higher
  • Python 3.8 or higher
  • Python dependencies: pip install chromadb sentence-transformers pandas numpy scikit-learn

From npm

npm install -g newskit-mcp-server

From Source

git clone https://github.com/CodeAKrome/newskit-mcp-server.git
cd newskit-mcp-server
npm install
npm run build

Configuration

Add to your MCP settings file:

{
  "mcpServers": {
    "newskit": {
      "command": "node",
      "args": ["/path/to/newskit-mcp-server/build/index.js"],
      "disabled": false,
      "alwaysAllow": [],
      "disabledTools": []
    }
  }
}

Or if installed via npm:

{
  "mcpServers": {
    "newskit": {
      "command": "npx",
      "args": ["newskit-mcp-server"],
      "disabled": false
    }
  }
}

Input Format

The input TSV file must start with a header row and contain two tab-separated columns:

  • article_id: Unique identifier for the article
  • title: Article title text

Example:

article_id	title
abc123	Venezuela releases over 100 political prisoners
def456	Seahawks advance to Super Bowl with thrilling win
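
If you want to check a file before handing it to the server, a reader along these lines may help. It is a sketch using only the Python standard library, not the package's own loader; skipping blank and duplicate rows is an assumption about sensible behaviour, and `read_articles` is a hypothetical name.

```python
import csv
import io

REQUIRED_COLUMNS = ("article_id", "title")


def read_articles(handle, limit=None):
    """Parse a TSV stream into a list of {"article_id", "title"} dicts."""
    # QUOTE_NONE keeps quotation marks inside headlines as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")

    articles = []
    seen = set()
    for row in reader:
        article_id = (row["article_id"] or "").strip()
        title = (row["title"] or "").strip()
        # Assumed policy: skip incomplete rows and repeated IDs.
        if not article_id or not title or article_id in seen:
            continue
        seen.add(article_id)
        articles.append({"article_id": article_id, "title": title})
        if limit is not None and len(articles) >= limit:
            break
    return articles


sample = (
    "article_id\ttitle\n"
    "abc123\tVenezuela releases over 100 political prisoners\n"
    "def456\tSeahawks advance to Super Bowl with thrilling win\n"
)
articles = read_articles(io.StringIO(sample))
print(articles)
```

For a file on disk, pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` wrapper.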

Output Format

The output JSON file contains:

{
  "categories": [
    {
      "category_id": 1,
      "category_name": "Venezuela / Prisoners",
      "article_count": 3,
      "articles": [
        {"article_id": "abc123", "title": "Venezuela releases..."}
      ]
    }
  ],
  "uncategorized": [
    {"article_id": "xyz789", "title": "Unique article..."}
  ]
}
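
The structure above can be assembled from per-article cluster labels roughly as shown below. This sketch assumes the DBSCAN convention that noise points carry the label -1 and that category IDs are numbered from 1 in label order; the package may do this differently. The function name `build_output` and the article `hyp001` are hypothetical.

```python
import json


def build_output(articles, labels, names):
    """Group articles by cluster label; label -1 (DBSCAN noise) goes to "uncategorized"."""
    grouped = {}
    uncategorized = []
    for article, label in zip(articles, labels):
        if label == -1:
            uncategorized.append(article)
        else:
            grouped.setdefault(label, []).append(article)

    categories = []
    # Assumed numbering: category_id starts at 1 and follows sorted label order.
    for category_id, label in enumerate(sorted(grouped), start=1):
        members = grouped[label]
        categories.append({
            "category_id": category_id,
            "category_name": names.get(label, f"Category {category_id}"),
            "article_count": len(members),
            "articles": members,
        })
    return {"categories": categories, "uncategorized": uncategorized}


articles = [
    {"article_id": "abc123", "title": "Venezuela releases..."},
    {"article_id": "xyz789", "title": "Unique article..."},
    {"article_id": "hyp001", "title": "Hypothetical related article"},
]
output = build_output(articles, labels=[0, -1, 0], names={0: "Venezuela / Prisoners"})
print(json.dumps(output, indent=2))
```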

Tuning Guide

| Goal | Parameter Adjustment |
|------|---------------------|
| More categories (looser) | Lower similarityThreshold (try 0.65) |
| Fewer, tighter categories | Raise similarityThreshold (try 0.85) |
| Only major categories | Raise minClusterSize (try 5) |
| Include smaller clusters | Lower minClusterSize (try 2) |
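
To build intuition for how the two parameters interact, the sketch below counts each article's neighbours at two thresholds. It assumes the server maps similarityThreshold to a DBSCAN radius of `eps = 1 - similarityThreshold` on cosine distance and minClusterSize to the minimum neighbourhood size (counting the point itself, as scikit-learn's `min_samples` does). That mapping is a guess, not confirmed by this README, and the vectors are toy 2-D stand-ins for embeddings.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def neighbour_counts(vectors, similarity_threshold):
    """For each vector, count the vectors (itself included) within the radius."""
    eps = 1.0 - similarity_threshold  # assumed threshold-to-radius mapping
    return [sum(1 for other in vectors if cosine_distance(v, other) <= eps)
            for v in vectors]


def unit(degrees):
    """Unit vector at the given angle; the angle between two vectors sets their similarity."""
    rad = math.radians(degrees)
    return (math.cos(rad), math.sin(rad))


vectors = [unit(0), unit(30), unit(40), unit(90)]
min_cluster_size = 3

for threshold in (0.85, 0.65):
    counts = neighbour_counts(vectors, threshold)
    core_points = sum(1 for c in counts if c >= min_cluster_size)
    print(f"threshold={threshold}: neighbours={counts}, core points={core_points}")
```

With these toy vectors and a minimum size of 3, only one point has a dense enough neighbourhood at 0.85, while three do at 0.65. The vector at 90 degrees has no neighbours at either setting and would end up uncategorized.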

Architecture

  • TypeScript MCP Server: Provides the tool interface via stdio transport
  • Python Bridge: Interfaces with ML libraries (sentence-transformers, scikit-learn)
  • ChromaDB: Vector database for embedding storage and similarity search
  • Sentence-Transformers: all-MiniLM-L6-v2 model for generating embeddings
  • DBSCAN: Clustering algorithm for grouping similar articles
  • TF-IDF: Keyword extraction for automatic category naming
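
The category-naming step can be illustrated with a small pure-Python TF-IDF sketch. The real bridge presumably relies on scikit-learn; the tokenizer, stop-word list, smoothed IDF formula, and " / " joining below are assumptions chosen to resemble the sample output, and every title except the first and last is invented for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "for", "and", "with", "over", "after"}


def tokenize(text):
    """Lowercase alphabetic tokens, minus stop words and very short words."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOPWORDS and len(t) > 2]


def name_cluster(cluster_titles, all_titles, top_k=2):
    """Name a cluster after its top_k highest-scoring TF-IDF terms."""
    doc_tokens = [set(tokenize(title)) for title in all_titles]
    n_docs = len(doc_tokens)
    doc_freq = Counter(tok for toks in doc_tokens for tok in toks)
    term_freq = Counter(tok for title in cluster_titles for tok in tokenize(title))
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1
    scores = {tok: count * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
              for tok, count in term_freq.items()}
    # sorted() is stable, so tied terms keep their first-seen order.
    ranked = sorted(term_freq, key=lambda tok: -scores[tok])
    return " / ".join(tok.capitalize() for tok in ranked[:top_k])


cluster = [
    "Venezuela releases over 100 political prisoners",
    "Venezuela frees more prisoners after talks",      # invented title
    "Families greet released prisoners in Venezuela",  # invented title
]
everything = cluster + ["Seahawks advance to Super Bowl with thrilling win"]
name = name_cluster(cluster, everything)
print(name)
```

Terms that recur across the cluster's titles outrank one-off words, which is why the repeated subject words surface as the category name.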

License

MIT License - See LICENSE file for details

Contributing

Contributions welcome! Please read CONTRIBUTING.md for guidelines.