
@mukundakatta/agentfit

v0.1.1


agentfit

Fit your messages into the LLM context window. Token-aware truncation with three strategies (drop-oldest, drop-middle, priority), per-model estimators, pluggable tokenizers (so you can wrap tiktoken if you need exact counts). Zero runtime dependencies.

npm install @mukundakatta/agentfit

import { fit, count } from '@mukundakatta/agentfit';

// Estimate tokens
count('hello world', { model: 'claude-sonnet-4-6' }); // → 4

// Fit a long chat history into your model's budget
const result = fit(longHistory, {
  maxTokens: 100_000,
  model: 'claude-sonnet-4-6',
  preserveLastN: 5,        // never drop the last 5 turns
  preserveSystem: true,    // never drop the system prompt (default true)
  strategy: 'drop-oldest', // drop the oldest user/assistant pairs first
});

// result = { messages, dropped, tokens: { before, after, budget }, fit: true }
await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  messages: result.messages,
});

If the budget can't be reached even after dropping all non-protected messages, OverBudgetError is thrown by default with the partial result attached. Pass onOverBudget: 'return-partial' if you'd rather inspect and decide.
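
A minimal sketch of the 'return-partial' path, using only the documented options and result fields:

const partial = fit(longHistory, {
  maxTokens: 100_000,
  model: 'claude-sonnet-4-6',
  onOverBudget: 'return-partial', // don't throw; hand back the best-effort result
});

if (!partial.fit) {
  // Still over budget after dropping everything droppable; decide what to do
  console.warn(`over budget: ${partial.tokens.after}/${partial.tokens.budget}`);
}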

TypeScript types ship in the box.

See it in action

git clone https://github.com/MukundaKatta/agentfit && cd agentfit
node examples/demo-trim.js

The demo runs the same input through all three strategies side by side, so you can see what each preserves.

Why

Every long-context agent eventually hits this:

  • The chat history grows past the context window
  • A retrieved doc is too long for the budget you have left
  • You want to swap models and the new one has half the context window
  • You want to drop low-value chatter but keep the important facts

Hand-rolled truncation is everywhere in agent codebases, and it almost always amounts to dropping the first N messages, which silently discards the system prompt or the user's most recent question. agentfit is a small, focused primitive that does this right: protect what matters, drop what doesn't, and return a structured result with token counts before and after.

API

count(input, opts?) → number

Estimate tokens in a string or chat-message array.

count('hello world');                                  // 3 (default chars/4)
count('hello world', { model: 'claude-sonnet-4-6' });   // 4 (claude estimator)
count(messages, { model: 'gpt-5' });                    // sums per-message + overhead

Built-in estimator families: openai, anthropic, google, llama, default. The model string is matched fuzzily ('gpt-5' → openai, 'claude-haiku-4-5' → anthropic, etc.).
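
Because the families estimate differently, it can be worth comparing counts before a model swap; a quick sketch using the mappings documented above:

// Same history, two estimator families
const asClaude = count(history, { model: 'claude-sonnet-4-6' }); // anthropic estimator
const asGpt = count(history, { model: 'gpt-5' });                // openai estimator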

For exact counts, plug in your own tokenizer:

import { encode } from 'gpt-tokenizer'; // or any other
count('hello world', { tokenizer: (s) => encode(s).length });

fit(messages, opts) → FitResult

Drop messages from the input array until the total is under maxTokens.

const result = fit(messages, {
  maxTokens: 50_000,
  model: 'claude-sonnet-4-6',
  preserveSystem: true,    // default true
  preserveFirstN: 0,       // default 0
  preserveLastN: 0,        // default 0
  strategy: 'drop-oldest', // 'drop-oldest' | 'drop-middle' | 'priority'
  onOverBudget: 'throw',   // 'throw' (default) | 'return-partial'
  tokenizer: undefined,    // optional override
  overhead: undefined,     // optional override
});

// result = {
//   messages: Message[],   // the messages that survived
//   dropped: Message[],    // the ones removed
//   tokens: { before, after, budget },
//   fit: true | false,
// }

Strategies:

  • drop-oldest (default): the oldest non-protected message gets dropped first. Best for chat histories where recency matters.
  • drop-middle: keep the head and tail; drop from the middle outward. Best when both the early system context and the recent turns matter, but the middle is filler.
  • priority: drop messages with the lowest priority field first (default 0; protected messages effectively have priority +∞). Best for tagged content where you've manually marked importance.

Protection precedence: preserveSystem → preserveFirstN → preserveLastN. A message protected by any of these is never dropped.
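
A short sketch of drop-middle combined with protection, using the documented options (middleTurns is a hypothetical array of filler messages):

const result = fit(
  [
    { role: 'system', content: 'You are a helpful agent.' }, // kept: preserveSystem (default true)
    ...middleTurns,                                          // hypothetical filler; dropped middle-outward
    { role: 'user', content: 'latest question' },            // kept: preserveLastN
  ],
  {
    maxTokens: 8_000,
    strategy: 'drop-middle',
    preserveFirstN: 2, // keep the earliest turns (early context)
    preserveLastN: 4,  // keep the most recent turns
  }
);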

OverBudgetError

Thrown by fit() when the budget can't be reached even after dropping all non-protected messages. Carries the partial result.

import { OverBudgetError } from '@mukundakatta/agentfit';

try {
  fit(messages, { maxTokens: 100, preserveSystem: true });
} catch (err) {
  if (err instanceof OverBudgetError) {
    console.error(`still over budget: ${err.tokens.after}/${err.tokens.budget}`);
    console.error(`dropped ${err.dropped.length} messages, kept ${err.messages.length}`);
    // err.messages is the partial result if you want to use it anyway
  }
}

estimators

The built-in estimator table, exported in case you want to compose:

import { estimators } from '@mukundakatta/agentfit';
estimators.anthropic('hello world'); // 4
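
One way to compose, as a sketch: pad a built-in estimate to get a conservative counter, then pass it as the tokenizer override:

import { estimators, fit } from '@mukundakatta/agentfit';

// Conservative counting: pad the built-in anthropic estimate by 10%
const padded = (s) => Math.ceil(estimators.anthropic(s) * 1.1);

fit(messages, { maxTokens: 50_000, tokenizer: padded });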

Recipes

Chat agent with a Sonnet budget

const fitted = fit(history, {
  maxTokens: 180_000, // leave room for the response in claude-sonnet-4-6's 200k
  model: 'claude-sonnet-4-6',
  preserveLastN: 6,
});
const r = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 4096,
  messages: fitted.messages,
});

RAG with priority-tagged retrieved chunks

const messages = [
  { role: 'system', content: SYSTEM_PROMPT },
  ...retrieved.map((chunk, i) => ({
    role: 'user',
    content: `[doc ${i}]\n${chunk.text}`,
    priority: chunk.score, // higher score = harder to drop
  })),
  { role: 'user', content: userQuestion },
];

const fitted = fit(messages, {
  maxTokens: 100_000,
  model: 'gpt-5',
  preserveLastN: 1,        // always keep the user's question
  strategy: 'priority',
});

Exact counts via tiktoken

import { encode } from 'gpt-tokenizer';

const exactCounter = (s) => encode(s).length;

count(text, { tokenizer: exactCounter });
fit(messages, { maxTokens: 50_000, tokenizer: exactCounter });

CLI

@mukundakatta/agentfit ships an agentfit binary for one-liners and CI use:

# Token count for a literal string or for a JSON message array
npx -p @mukundakatta/agentfit agentfit count "hello world" --model claude-sonnet-4-6
# → {"tokens":4,"model":"claude-sonnet-4-6"}

# Fit a chat history under a budget; reads stdin with '-'
cat history.json | npx -p @mukundakatta/agentfit agentfit fit - \
  --max-tokens 100000 --model claude-sonnet-4-6 --preserve-last-n 5 --pretty

Pass - to read from stdin, or any file path to read from disk. Output is JSON to stdout (use --pretty for indented output). Exit code is 0 on success, 1 when the budget can't be reached, 2 on usage errors. Run agentfit --help for the full subcommand reference.
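
For reference, a hypothetical history.json; this assumes the CLI accepts the same message-array shape the library does:

[
  { "role": "system", "content": "You are a helpful agent." },
  { "role": "user", "content": "First question..." },
  { "role": "assistant", "content": "First answer..." },
  { "role": "user", "content": "Latest question" }
]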

What this is not

  • Not a tokenizer. Its estimates are fast and pluggable; for exact counts, wrap your favourite tokenizer.
  • Not a summarizer. It drops; it doesn't summarize. Pair it with an LLM call if you want compaction-by-summarization (see the sketch after this list).
  • Not a context manager. No retrieval, no chunking, no embeddings. For RAG, see MukundaKatta/context-forge; use this to fit the output of your RAG pipeline into the budget.
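
A sketch of that pairing, where summarize is a hypothetical helper that makes one LLM call over the dropped turns:

const result = fit(history, {
  maxTokens: 100_000,
  model: 'claude-sonnet-4-6',
  preserveLastN: 5,
});

if (result.dropped.length > 0) {
  const summary = await summarize(result.dropped); // hypothetical one-call summarizer
  // Re-insert a compact summary just after the system prompt; the summary
  // itself costs tokens, so leave headroom in maxTokens
  result.messages.splice(1, 0, { role: 'user', content: `Summary of earlier turns: ${summary}` });
}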

Sibling libraries

Part of the agent reliability stack, all @mukundakatta/* scoped, all zero-dep.

Natural pipeline: fit → guard → snap → vet → cast.

Status

v0.1.0 — initial release. Core API stable. TypeScript types included. 27/27 tests, CI on Node 20/22/24.

v0.2 plans (pending real-world feedback):

  • Built-in tiktoken adapter as an optional separate package
  • Per-tool token attribution (so you can blame which RAG chunk used the budget)
  • Streaming-aware truncation (trim while streaming retrieved docs in)

License

MIT