@phoenixaihub/tokenwaste
v1.0.0
tokenwaste
Information-theoretic context selection for AI coding agents.
TF-IDF + AST call graph to compute mutual information between your query and code chunks. Only includes chunks above a configurable information threshold. Targets 50-70x context reduction.
Why?
AI coding agents stuff entire codebases into context windows. Most of it is noise. TokenWaste uses information theory to select only the code that actually matters for your query.
Install
npm install @phoenixaihub/tokenwaste
# Or globally for CLI
npm install -g @phoenixaihub/tokenwaste
CLI Usage
Select relevant context
tokenwaste select ./src --query "authentication middleware" --threshold 0.3
# With token budget
tokenwaste select ./src --query "database connection" --threshold 0.5 --max-tokens 8000
# JSON output for piping
tokenwaste select ./src --query "error handling" --json
Analyze all chunks
tokenwaste analyze ./src --query "API routes" --top 5
Score specific files
tokenwaste score --query "auth" --files src/auth.ts src/middleware.ts
Programmatic API
selectContext(dir, options)
Main entry point. Analyzes a directory, scores all chunks, and returns only those above the threshold.
import { selectContext } from '@phoenixaihub/tokenwaste';
const result = await selectContext('./src', {
query: 'authentication middleware',
threshold: 0.5, // minimum bits of mutual information
maxTokens: 8000, // optional token budget
useCallGraph: true, // include AST call graph analysis
});
console.log(`Selected ${result.selectedChunks}/${result.totalChunks} chunks`);
console.log(`Compression: ${result.compressionRatio}x`);
for (const chunk of result.chunks) {
console.log(`${chunk.file}:${chunk.startLine} — ${chunk.mutualInformation} bits`);
console.log(chunk.content);
}
analyzeChunks(dir, query, options?)
Returns all chunks scored but unfiltered. Useful for exploration.
import { analyzeChunks } from '@phoenixaihub/tokenwaste';
const scored = await analyzeChunks('./src', 'database connection');
for (const chunk of scored.slice(0, 10)) {
console.log(`${chunk.file}: ${chunk.mutualInformation} bits`);
}
scoreRelevance(contents, query)
Score arbitrary code content against a query without reading from disk.
import { scoreRelevance } from '@phoenixaihub/tokenwaste';
const scores = scoreRelevance([
{ id: 'auth.ts', content: 'function authenticate() { ... }' },
{ id: 'utils.ts', content: 'function formatDate() { ... }' },
], 'authentication');
// scores[0].score — mutual information in bits
// scores[0].matchedTerms — which query terms matched
How It Works
- Chunking: Splits source files into ~50-line chunks with overlap
- TF-IDF: Computes term frequency–inverse document frequency between query terms and chunk content
- Call Graph: Extracts function definitions and call relationships using regex-based AST patterns (JS/TS/Python)
- Mutual Information: Combines TF-IDF similarity (converted to bits) with call graph connectivity boosting
- Threshold: Only returns chunks above the configurable MI threshold
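As an illustration of the TF-IDF and mutual-information steps above, here is a minimal, self-contained sketch in TypeScript. The helper names (`tfidfVectors`, `cosine`, `mutualInformation`) are hypothetical, not the package's actual internals:

```typescript
// Illustrative sketch of the TF-IDF scoring and bit conversion described
// above. Helper names are hypothetical; the package's internals may differ.

type Doc = { id: string; content: string };

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z_][a-z0-9_]*/g) ?? [];
}

// Build a TF-IDF weight vector for each document (the query counts as a doc).
function tfidfVectors(docs: Doc[]): Map<string, Map<string, number>> {
  const tokened = docs.map((d) => ({ id: d.id, terms: tokenize(d.content) }));
  const df = new Map<string, number>();
  for (const { terms } of tokened) {
    for (const t of new Set(terms)) df.set(t, (df.get(t) ?? 0) + 1);
  }
  const vectors = new Map<string, Map<string, number>>();
  for (const { id, terms } of tokened) {
    const tf = new Map<string, number>();
    for (const t of terms) tf.set(t, (tf.get(t) ?? 0) + 1);
    const vec = new Map<string, number>();
    for (const [t, f] of tf) {
      const idf = Math.log(1 + docs.length / (df.get(t) ?? 1));
      vec.set(t, (f / terms.length) * idf);
    }
    vectors.set(id, vec);
  }
  return vectors;
}

// Cosine similarity between two sparse term-weight vectors.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, na = 0, nb = 0;
  for (const [t, w] of a) { na += w * w; dot += w * (b.get(t) ?? 0); }
  for (const w of b.values()) nb += w * w;
  return na > 0 && nb > 0 ? dot / Math.sqrt(na * nb) : 0;
}

// Combine per the formula below: MI = 0.7 * TF-IDF bits + 0.3 * graph bits.
function mutualInformation(sim: number, connectedRelevantFns: number): number {
  const tfidfBits = -Math.log2(1 - Math.min(sim, 0.999999)); // cap avoids Infinity
  const graphBits = Math.log2(1 + connectedRelevantFns);
  return 0.7 * tfidfBits + 0.3 * graphBits;
}
```

Chunks whose score clears the threshold would then be kept, highest-MI first, until the optional token budget is spent.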
Mutual Information Formula
MI(query; chunk) = 0.7 × TF-IDF_bits + 0.3 × Graph_bits
TF-IDF_bits = -log₂(1 - cosine_similarity)
Graph_bits = log₂(1 + connected_relevant_functions)
For example, a chunk with cosine similarity 0.5 to the query and one connected relevant function scores TF-IDF_bits = -log₂(0.5) = 1 and Graph_bits = log₂(2) = 1, so MI = 0.7 × 1 + 0.3 × 1 = 1.0 bits.
Supported Languages
| Language | TF-IDF | Call Graph |
|------------|--------|------------|
| JavaScript | ✅ | ✅ |
| TypeScript | ✅ | ✅ |
| Python | ✅ | ✅ |
| Go, Rust, Java, etc. | ✅ | ❌ (TF-IDF only) |
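For JavaScript/TypeScript, the regex-based call-graph extraction from the "Call Graph" step can be sketched like this. This is a simplified illustration covering only `function` declarations; the package's actual patterns are assumed to handle more definition styles:

```typescript
// Hypothetical sketch of regex-based call-graph extraction for JS/TS.
// Maps each defined function to the known functions it calls.
type CallGraph = Map<string, Set<string>>;

function extractCallGraph(source: string): CallGraph {
  // Find `function name(` declarations and where each one starts.
  const defRe = /function\s+([A-Za-z_$][\w$]*)\s*\(/g;
  const defs: { name: string; start: number }[] = [];
  let m: RegExpExecArray | null;
  while ((m = defRe.exec(source)) !== null) {
    defs.push({ name: m[1], start: m.index });
  }

  const names = new Set(defs.map((d) => d.name));
  const graph: CallGraph = new Map(defs.map((d) => [d.name, new Set<string>()]));

  // Within each definition's span (up to the next definition), record
  // calls to other known functions. Self-references are skipped, so
  // direct recursion is not captured in this simplified version.
  defs.forEach((d, i) => {
    const body = source.slice(d.start, defs[i + 1]?.start ?? source.length);
    const callRe = /([A-Za-z_$][\w$]*)\s*\(/g;
    let c: RegExpExecArray | null;
    while ((c = callRe.exec(body)) !== null) {
      const callee = c[1];
      if (callee !== d.name && names.has(callee)) graph.get(d.name)!.add(callee);
    }
  });
  return graph;
}
```

Connectivity in this graph is what feeds the Graph_bits term: a chunk whose functions call (or are called by) already-relevant functions gets boosted.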
Configuration
interface SelectContextOptions {
query: string; // Search query (required)
threshold?: number; // MI threshold in bits (default: 0.5)
maxTokens?: number; // Token budget (default: unlimited)
useCallGraph?: boolean; // Use AST analysis (default: true)
maxLines?: number; // Max lines per chunk (default: 50)
overlap?: number; // Chunk overlap lines (default: 5)
tfidfWeight?: number; // TF-IDF weight (default: 0.7)
graphWeight?: number; // Graph weight (default: 0.3)
maxGraphDepth?: number; // Transitive dep depth (default: 3)
}
License
MIT
