
@phoenixaihub/tokenwaste

v1.0.0

Information-theoretic context selection for AI coding agents. TF-IDF + AST call graph for 50-70x context reduction.

tokenwaste

Information-theoretic context selection for AI coding agents.

Uses TF-IDF and an AST call graph to compute mutual information between your query and code chunks, and only includes chunks above a configurable information threshold. Targets 50-70x context reduction.

Why?

AI coding agents stuff entire codebases into context windows. Most of it is noise. TokenWaste uses information theory to select only the code that actually matters for your query.

Install

npm install @phoenixaihub/tokenwaste

# Or globally for CLI
npm install -g @phoenixaihub/tokenwaste

CLI Usage

Select relevant context

tokenwaste select ./src --query "authentication middleware" --threshold 0.3

# With token budget
tokenwaste select ./src --query "database connection" --threshold 0.5 --max-tokens 8000

# JSON output for piping
tokenwaste select ./src --query "error handling" --json

Analyze all chunks

tokenwaste analyze ./src --query "API routes" --top 5

Score specific files

tokenwaste score --query "auth" --files src/auth.ts src/middleware.ts

Programmatic API

selectContext(dir, options)

Main entry point. Analyzes a directory, scores all chunks, and returns only the relevant ones.

import { selectContext } from '@phoenixaihub/tokenwaste';

const result = await selectContext('./src', {
  query: 'authentication middleware',
  threshold: 0.5,        // minimum bits of mutual information
  maxTokens: 8000,       // optional token budget
  useCallGraph: true,    // include AST call graph analysis
});

console.log(`Selected ${result.selectedChunks}/${result.totalChunks} chunks`);
console.log(`Compression: ${result.compressionRatio}x`);

for (const chunk of result.chunks) {
  console.log(`${chunk.file}:${chunk.startLine} — ${chunk.mutualInformation} bits`);
  console.log(chunk.content);
}

analyzeChunks(dir, query, options?)

Returns all chunks, scored but unfiltered. Useful for exploration.

import { analyzeChunks } from '@phoenixaihub/tokenwaste';

const scored = await analyzeChunks('./src', 'database connection');
for (const chunk of scored.slice(0, 10)) {
  console.log(`${chunk.file}: ${chunk.mutualInformation} bits`);
}

scoreRelevance(contents, query)

Score arbitrary code content against a query without reading from disk.

import { scoreRelevance } from '@phoenixaihub/tokenwaste';

const scores = scoreRelevance([
  { id: 'auth.ts', content: 'function authenticate() { ... }' },
  { id: 'utils.ts', content: 'function formatDate() { ... }' },
], 'authentication');

// scores[0].score — mutual information in bits
// scores[0].matchedTerms — which query terms matched

How It Works

  1. Chunking: Splits source files into ~50-line chunks with overlap
  2. TF-IDF: Computes term frequency–inverse document frequency between query terms and chunk content
  3. Call Graph: Extracts function definitions and call relationships using regex-based AST patterns (JS/TS/Python)
  4. Mutual Information: Combines TF-IDF similarity (converted to bits) with call graph connectivity boosting
  5. Threshold: Only returns chunks above the configurable MI threshold
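
The chunking and TF-IDF steps above can be sketched in TypeScript. This is an illustrative sketch only, not the package's implementation; `chunkLines`, `termFreqs`, and `cosineSimilarity` are hypothetical names, and the real chunker and tokenizer will differ:

```typescript
// Illustrative sketch of steps 1-2 (chunking + TF-IDF cosine similarity).
// All names here are hypothetical, not the package's real internals.

type Chunk = { startLine: number; content: string };

// Step 1: split text into ~maxLines-line chunks with a few lines of overlap.
function chunkLines(text: string, maxLines = 50, overlap = 5): Chunk[] {
  const lines = text.split("\n");
  const chunks: Chunk[] = [];
  for (let i = 0; i < lines.length; i += maxLines - overlap) {
    chunks.push({
      startLine: i + 1,
      content: lines.slice(i, i + maxLines).join("\n"),
    });
    if (i + maxLines >= lines.length) break; // last window reached the end
  }
  return chunks;
}

// Step 2: raw term frequencies for one piece of text.
function termFreqs(text: string): Map<string, number> {
  const tf = new Map<string, number>();
  for (const t of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    tf.set(t, (tf.get(t) ?? 0) + 1);
  }
  return tf;
}

// Cosine similarity between query and chunk, with each term's frequency
// scaled by its IDF weight (unknown terms default to an IDF of 1).
function cosineSimilarity(
  query: string,
  chunk: string,
  idf: Map<string, number>,
): number {
  const q = termFreqs(query);
  const c = termFreqs(chunk);
  const weight = (tf: number, term: string) => tf * (idf.get(term) ?? 1);
  let dot = 0, qNorm = 0, cNorm = 0;
  for (const [term, tf] of q) {
    const w = weight(tf, term);
    qNorm += w * w;
    dot += w * weight(c.get(term) ?? 0, term);
  }
  for (const [term, tf] of c) {
    const w = weight(tf, term);
    cNorm += w * w;
  }
  return qNorm && cNorm ? dot / Math.sqrt(qNorm * cNorm) : 0;
}
```

With the defaults (50-line windows, 5-line overlap), consecutive chunks start 45 lines apart, so every chunk boundary is covered twice.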

Mutual Information Formula

MI(query; chunk) = 0.7 × TF-IDF_bits + 0.3 × Graph_bits

TF-IDF_bits = -log₂(1 - cosine_similarity)
Graph_bits  = log₂(1 + connected_relevant_functions)
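Plugging made-up numbers into the formula shows how the two signals combine. This is a direct transcription of the published formula with its default weights; the input values are invented for illustration:

```typescript
// Worked example of the MI formula with the default 0.7 / 0.3 weights.
// (In practice the cosine similarity would need clamping below 1.0 to keep
// TF-IDF_bits finite — an assumption, not documented behavior.)
function mutualInformation(
  cosineSim: number,
  connectedRelevantFns: number,
  tfidfWeight = 0.7,
  graphWeight = 0.3,
): number {
  const tfidfBits = -Math.log2(1 - cosineSim);           // TF-IDF_bits
  const graphBits = Math.log2(1 + connectedRelevantFns); // Graph_bits
  return tfidfWeight * tfidfBits + graphWeight * graphBits;
}

// cosine 0.75 → -log2(0.25) = 2 bits; 3 connected functions → log2(4) = 2 bits
// MI = 0.7 × 2 + 0.3 × 2 = 2 bits, comfortably above the 0.5-bit default threshold
```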

Supported Languages

| Language             | TF-IDF | Call Graph         |
|----------------------|--------|--------------------|
| JavaScript           | ✅     | ✅                 |
| TypeScript           | ✅     | ✅                 |
| Python               | ✅     | ✅                 |
| Go, Rust, Java, etc. | ✅     | ❌ (TF-IDF only)   |

Configuration

interface SelectContextOptions {
  query: string;           // Search query (required)
  threshold?: number;      // MI threshold in bits (default: 0.5)
  maxTokens?: number;      // Token budget (default: unlimited)
  useCallGraph?: boolean;  // Use AST analysis (default: true)
  maxLines?: number;       // Max lines per chunk (default: 50)
  overlap?: number;        // Chunk overlap lines (default: 5)
  tfidfWeight?: number;    // TF-IDF weight (default: 0.7)
  graphWeight?: number;    // Graph weight (default: 0.3)
  maxGraphDepth?: number;  // Transitive dep depth (default: 3)
}
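
As a usage sketch of these options: a hypothetical tuning that leans harder on call-graph connectivity, e.g. for refactoring-style queries. The interface is restated locally so the snippet stands alone; the specific values are illustrative, not recommendations:

```typescript
// SelectContextOptions restated from the interface above so this compiles
// standalone; in real use you would import types from the package instead.
interface SelectContextOptions {
  query: string;
  threshold?: number;
  maxTokens?: number;
  useCallGraph?: boolean;
  maxLines?: number;
  overlap?: number;
  tfidfWeight?: number;
  graphWeight?: number;
  maxGraphDepth?: number;
}

// Hypothetical graph-heavy tuning: weight connectivity as much as term
// overlap, and keep the two weights summing to 1 as the defaults do.
const graphHeavy: SelectContextOptions = {
  query: "payment retry logic",
  threshold: 0.4,     // slightly looser than the 0.5-bit default
  tfidfWeight: 0.5,   // down from the 0.7 default
  graphWeight: 0.5,   // up from the 0.3 default
  maxGraphDepth: 2,   // shallower transitive walk than the default 3
};
```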

License

MIT