

@grapine.ai/contextprune

Garbage collection for LLM context windows.

Sits between your application and the LLM API. Analyzes your messages[] array, removes dead weight — stale tool outputs, resolved errors, superseded reasoning — and returns a leaner version. Every API call costs less. The model stays focused on what actually matters.

100% local. No data sent anywhere. No LLM calls during compression.

npm install @grapine.ai/contextprune

The problem

Long LLM sessions fill up fast:

Turn  1  ████░░░░░░░░░░░░░░░░░░░░░░░░░░   12%   4,100 tokens
Turn  5  ████████████░░░░░░░░░░░░░░░░░░   38%  12,800 tokens
Turn 10  ████████████████████░░░░░░░░░░   58%  19,400 tokens
Turn 15  ████████████████████████████░░   78%  26,100 tokens  ← quality degrades here
Turn 20  ██████████████████████████████   91%  30,600 tokens  ← coherence cliff

Around 65–75% utilization, model behavior suddenly gets worse — the model loses track of earlier constraints, repeats itself, makes mistakes it wouldn't make with a clean context. Most developers hit this, get confused, and manually clear context — losing all the good state too.

With contextprune:

Turn  1  ████░░░░░░░░░░░░░░░░░░░░░░░░░░   12%   4,100 tokens    —
Turn  5  ████████████░░░░░░░░░░░░░░░░░░   38%  12,800 tokens    —
Turn  6  ████░░░░░░░░░░░░░░░░░░░░░░░░░░   11%   3,700 tokens  ← compressed, 71% saved
Turn 10  ██████████░░░░░░░░░░░░░░░░░░░░   28%   9,500 tokens    —
Turn 11  ████░░░░░░░░░░░░░░░░░░░░░░░░░░   10%   3,200 tokens  ← compressed, 66% saved
Turn 20  ████████████░░░░░░░░░░░░░░░░░░   34%  11,600 tokens    ← never exceeds 40%

Quick start

import { ContextPrune } from '@grapine.ai/contextprune';

const cp = new ContextPrune({ model: 'claude-sonnet-4-5' });

const result = await cp.compress(messages);
// result.messages is a drop-in replacement for messages
// result.summary.tokensSaved — tokens recovered
// result.summary.savingsPercent — e.g. 0.47 = 47% saved

A one-line change to your existing code:

// Before
const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-5',
  messages,          // ← growing unbounded
  max_tokens: 8096,
});

// After
const { messages: lean } = await cp.compress(messages);
const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-5',
  messages: lean,    // ← compressed
  max_tokens: 8096,
});

Installation

npm install @grapine.ai/contextprune

Requires Node 18+. No mandatory peer dependencies — tiktoken is used for token counting when available, otherwise falls back to a character estimate.
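The character-estimate fallback mentioned above could look like the sketch below. This is a hypothetical illustration using the common ~4 characters-per-token rule of thumb, not the package's actual estimator; `estimateTokens` and `countMessageTokens` are names invented here.

```typescript
// Hypothetical token-count fallback: when tiktoken is not installed,
// estimate tokens from character count. The 4 chars/token ratio is a
// widely used heuristic, not contextprune's documented internals.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function countMessageTokens(messages: { role: string; content: string }[]): number {
  // Sum per-message content; real counters also add per-message overhead.
  return messages.reduce((total, m) => total + estimateTokens(m.content), 0);
}
```

A rough estimate like this is enough for utilization thresholds, since the decision to compress keys off percentages rather than exact counts.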


CLI

No code required — run directly with npx, no install step needed.

analyze — understand what's in your context

npx @grapine.ai/contextprune analyze ./session.json
npx @grapine.ai/contextprune analyze ./session.jsonl   # Claude Code session transcripts too

─── ContextPrune Analysis ──────────────────────────────────────────────────
Model: claude-sonnet-4-5  |  Capacity: 200,000 tokens

  ████████████████░░░░░░░░░░░░░░  56%  used  ·  112,266 / 200,000 tokens

  [SUGGESTED] Context is 56% full. Compression available but not urgent.
  Projected savings: 48,100 tokens (43%)  →  64,166 tokens after

Classification Breakdown:
  Outdated Tool Result    82 msgs   53,099 tokens  ████████████░  47%
  Chat / Filler           54 msgs   24,446 tokens  ████████░░░░░  22%
  Tool Result (active)    86 msgs   23,528 tokens  ████████░░░░░  21%
  Final Answer             1 msgs   11,406 tokens  ████░░░░░░░░░  10%

Compression Strategies:
  Keep                 141 msgs   64,166 tokens
  Remove                69 msgs   37,814 tokens  ← will be dropped
  Trim to Key Output     8 msgs    8,320 tokens  ← key output preserved
  Collapse to 1 Line     1 msgs    1,966 tokens  ← collapsed to marker

Top Token Consumers:
  #32  Final Answer             11,406 tokens   Preserved    no opportunity
  #55  Outdated Tool Result      6,801 tokens   Remove       high opportunity
  #48  Outdated Tool Result      4,992 tokens   Remove       high opportunity
  #61  Tool Result (active)      4,210 tokens   Trim         medium opportunity

# Also print a session brief — a compact handoff prompt for starting a new session
npx @grapine.ai/contextprune analyze ./session.jsonl --brief

compress — compress a messages file

npx @grapine.ai/contextprune compress ./session.json -o compressed.json
✔ Compressed  112,266 → 64,166 tokens  (43% saved, 48,100 tokens recovered)

Decisions:
  Removed   69 messages  (Outdated Tool Result, Chat/Filler)
  Trimmed    8 messages  (Tool Result — key output preserved)
  Collapsed  1 message   (Reasoning chain → 1-line marker)
  Kept     141 messages  (constraints, active errors, final answers)

Output is a standard JSON messages array — drop it straight into an API call:

const messages = JSON.parse(fs.readFileSync('compressed.json', 'utf-8'));
await anthropic.messages.create({ model: 'claude-sonnet-4-5', messages, max_tokens: 8096 });

watch — live dashboard in your browser

npx @grapine.ai/contextprune watch

Discovers all Claude Code sessions in ~/.claude/projects/ and opens an interactive picker:

  Select a Claude project to monitor:

  › labs/contextprune  #b6c62a11  just now  ● active
    labs/my-app        #a1d3f920  2h ago
    work/api-service   #cc8801ab  1d ago

  ↑↓ to navigate · Enter to select · Ctrl+C to cancel

Opens a browser tab and starts live monitoring. The dashboard updates every time the session file changes.

# Or point directly at a file
npx @grapine.ai/contextprune watch --follow ~/.claude/projects/my-project/session.jsonl

# Use a different port
npx @grapine.ai/contextprune watch --port 8080

Dashboard

A live browser dashboard that monitors your Claude Code sessions in real time. No configuration — run npx @grapine.ai/contextprune watch and it opens automatically.

[Screenshot: Healthy Context Dashboard]

[Screenshot: Context Compression Recommendation Dashboard]

What the dashboard shows:

Context Window — utilization bar with colour-coded status (green → yellow → red). Switches to Compression Suggested / Compress Now badges as context fills up.

Session Cost — cost per API call with input/output/cache breakdown, grouped by calendar day with proportional bars.

Classification Breakdown — how your context is distributed across message types (Outdated Tool Result, Active Tool Result, Chat/Filler, Final Answer, etc.) with token counts and percentages.

Compression Strategies — what contextprune would do right now: Keep / Remove / Trim / Collapse counts.

Compression Projection — before/after utilization bars showing exactly how much would be recovered if you compressed now. Hidden when context is healthy.

Top Consumers — the largest individual messages ranked by token count, with their classification and compression opportunity.

Session Brief — auto-generated handoff prompt that appears at 65%+ utilization. One click copies a compact context summary you can paste into a new session to continue without losing state.

Desktop notifications — opt-in alerts at 65% utilization, then every 5% increment until you compress.

Push data from your own process (no file watching needed):

npx @grapine.ai/contextprune watch &

curl -X POST http://localhost:4242/analyze \
  -H 'Content-Type: application/json' \
  -d '{ "messages": [...], "model": "gpt-4o" }'

Works with any provider — Anthropic, OpenAI, OpenRouter, Groq, or any messages array you construct yourself.


Three ways to use it

1. compress(messages) — explicit, you decide when

const result = await cp.compress(messages);

console.log(result.summary.tokensSaved);       // 48100
console.log(result.summary.savingsPercent);    // 0.43
console.log(result.messages.length);           // fewer messages

Compresses unconditionally every time you call it. Use this when you explicitly decide compression is warranted — after a tool-heavy phase, every N turns, or as part of a LangGraph compress node.
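One way to realize "every N turns" is a fixed cadence check in your loop. This is an illustrative sketch; `isCompressionTurn` and `compressEvery` are names made up here, not library options.

```typescript
// Hypothetical cadence gate: compress on every Nth turn of an agent loop.
function isCompressionTurn(turn: number, compressEvery = 5): boolean {
  return turn > 0 && turn % compressEvery === 0;
}

// In the loop (cp is a ContextPrune instance):
// if (isCompressionTurn(turn)) {
//   const { messages: lean } = await cp.compress(messages);
//   messages.splice(0, messages.length, ...lean);
// }
```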

2. watch(client) — automatic, zero changes to call sites

// Wrap once at startup
const watched = cp.watch(anthropic);

// Use exactly as before — compression fires automatically when context > 65%
const response = await watched.messages.create({
  model: 'claude-sonnet-4-5',
  messages,
  max_tokens: 8096,
});

Works with Anthropic, OpenAI, and any OpenAI-compatible provider:

// OpenRouter
const client = new OpenAI({ baseURL: 'https://openrouter.ai/api/v1', apiKey: '...' });
const watched = cp.watch(client);
await watched.chat.completions.create({ model: 'meta-llama/llama-3.3-70b-instruct', messages });

// Groq
const watched = cp.watch(new Groq());
await watched.chat.completions.create({ model: 'llama3-70b-8192', messages });

3. analyze(messages) — read-only inspection

const analysis = await cp.analyze(messages);

analysis.recommendation.urgency             // 'none' | 'suggested' | 'recommended' | 'critical'
analysis.recommendation.projectedSavings    // tokens that would be saved
analysis.sessionState.tokenBudget.utilizationPercent  // 0.56
analysis.sessionBrief                       // markdown handoff prompt for context continuation

Never compresses — use this to build dashboards, gate on urgency, or log opportunities.
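Gating on urgency could be as simple as the sketch below. The urgency values mirror the ones listed above; the threshold choice and the `shouldCompress` helper are assumptions made for illustration.

```typescript
// Hypothetical urgency gate built on analyze() output: compress only at
// 'recommended' or worse, leaving 'suggested' alone.
type Urgency = 'none' | 'suggested' | 'recommended' | 'critical';

function shouldCompress(urgency: Urgency): boolean {
  return urgency === 'recommended' || urgency === 'critical';
}

// Usage sketch (cp is a ContextPrune instance):
// const analysis = await cp.analyze(messages);
// if (shouldCompress(analysis.recommendation.urgency)) {
//   messages = (await cp.compress(messages)).messages;
// }
```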


LangGraph

In a LangGraph agent, state["messages"] accumulates every tool result and intermediate step across all graph iterations. By call 20, a typical coding agent has 30–50k tokens of stale tool outputs.

Wrap the client — zero changes inside the graph:

import { ContextPrune } from '@grapine.ai/contextprune';
import Anthropic from '@anthropic-ai/sdk';

const client = new ContextPrune({ model: 'claude-sonnet-4-5' }).watch(new Anthropic());

// Every node compresses automatically, only when context > 65%
function callModel(state: MessagesState) {
  return client.messages.create({         // ← unchanged
    model: 'claude-sonnet-4-5',
    messages: state.messages,
    max_tokens: 8096,
  });
}

Or add a dedicated compress node:

const cp = new ContextPrune({ model: 'claude-sonnet-4-5' });

async function compressNode(state: MessagesState) {
  const result = await cp.compress(state.messages);
  return { messages: result.messages };
}

builder
  .addNode('compress', compressNode)
  .addEdge('tools', 'compress')   // compress after every tool cycle
  .addEdge('compress', 'agent');

When it helps (and when it doesn't)

The core prerequisite: there must be a growing messages[] array that gets passed to an LLM repeatedly.

✓ It helps: single-agent accumulating loops

// ReAct / tool-calling loop — context grows with every iteration
const messages: LLMMessage[] = [{ role: 'system', content: systemPrompt }];

while (!done) {
  const response = await llm.invoke(messages);
  messages.push({ role: 'assistant', content: response.content });
  const toolResult = await runTool(response);
  messages.push({ role: 'user', content: toolResult });

  // ← contextprune here: stale tool results removed before next call
  const { messages: lean } = await cp.compress(messages);
  messages.splice(0, messages.length, ...lean);
}

By call 30, a typical agent has accumulated file reads, bash outputs, error traces, and intermediate reasoning that will never be referenced again. Every call pays for all of it. contextprune removes it.

✗ It doesn't help: parallel stateless fan-out

// Each agent call is 2–3 messages built fresh, discarded after
const [strategy, calendar, copy] = await Promise.all([
  orchestrator.invoke([{ role: 'user', content: strategyPrompt }]),
  strategist.invoke([{ role: 'user', content: calendarPrompt }]),
  copywriter.invoke([{ role: 'user', content: copyPrompt }]),
]);

Each call is constructed fresh and discarded. There is no accumulating history. Nothing to prune.

The diagnostic question:

After N agent calls, is there a single messages[] array that is longer than it was at call 1?

If yes — contextprune helps. If no — each call starts fresh, and contextprune has no leverage point.


Compression modes

| Mode | When compression runs | Default for |
|------|----------------------|-------------|
| manual | Always, unconditionally | compress() |
| auto | Only when utilization ≥ warningThreshold | watch() |
| suggest-only | Never — analysis only | analyze() |

const cp = new ContextPrune({
  model: 'claude-sonnet-4-5',
  options: {
    warningThreshold:  0.65,   // start compressing at 65% full (default)
    criticalThreshold: 0.80,   // compress aggressively at 80% (default)
    compressionMode:   'auto', // only compress when needed
  }
});

What gets compressed

| Message type | Strategy | Why |
|---|---|---|
| Outdated Tool Result | Remove | Not referenced in subsequent turns |
| Fixed Error | Remove | Stack trace no longer needed |
| Chain of Thought | Collapse to 1 line | Conclusion already in context |
| Status Update | Collapse to 1 line | Acknowledged, no longer active |
| Tool Result (active) | Trim to key output | Keep answer, drop verbose body |
| Chat / Filler | Remove | Low relevance to current task |

Always preserved: system prompts, user corrections, active errors, session goals, final answers.

The classifier assigns one of 11 types to each message. Classification confidence gates compression aggressiveness — if the classifier is uncertain, the message is always preserved.
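Confidence gating could be sketched as below. The shape of the classification record, the 0.8 threshold, and the `effectiveStrategy` helper are all assumptions for illustration, not the package's internals.

```typescript
// Hypothetical confidence gate: a message is only eligible for its assigned
// compression strategy when classifier confidence clears a threshold;
// uncertain messages fall back to 'keep', matching the always-preserve rule.
interface Classified {
  type: string;
  confidence: number; // 0..1
  strategy: 'keep' | 'remove' | 'trim' | 'collapse';
}

function effectiveStrategy(msg: Classified, minConfidence = 0.8): Classified['strategy'] {
  return msg.confidence >= minConfidence ? msg.strategy : 'keep';
}
```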


Supported providers and models

Token budgets are pre-configured for:

| Provider | Models |
|---|---|
| Anthropic | Claude 4.x, Claude 3.x (all variants) |
| OpenAI | GPT-4o, GPT-4.1, GPT-4-turbo, GPT-3.5, o1, o3 series |
| Google | Gemini 2.5 Pro/Flash, Gemini 2.0, Gemini 1.5 |
| Meta | Llama 3.3 / 3.1 (70B, 8B) |
| Mistral | Mistral Large/Medium/Small, Mixtral, Codestral |
| DeepSeek | DeepSeek Chat, DeepSeek Reasoner |
| Cohere | Command R, Command R+ |
| OpenRouter | All provider/model prefixed names |
| Groq | Llama3, Mixtral, Gemma hosted models |

Any unrecognized model string falls back to a 128k token budget.
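The lookup-with-fallback behavior can be sketched as below. The two table entries are illustrative (the 200k Claude capacity appears in the analyze output above); the package ships its own registry.

```typescript
// Hypothetical budget registry with the documented 128k fallback for
// unrecognized model strings.
const TOKEN_BUDGETS: Record<string, number> = {
  'claude-sonnet-4-5': 200_000,
  'gpt-4o': 128_000,
};

function tokenBudget(model: string): number {
  // Unrecognized models fall back to a conservative 128k budget.
  return TOKEN_BUDGETS[model] ?? 128_000;
}
```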