
snug-ai v0.1.2

Fit the right context into your LLM's window

Downloads: 286

snug

Intelligent context window packing for LLMs.

npm install snug-ai

snug decides what goes into your LLM's context window, where it's placed, and what gets cut — so you don't have to.

import { ContextOptimizer } from 'snug-ai';

const optimizer = new ContextOptimizer({
  model: 'claude-sonnet-4-20250514',
  contextWindow: 200_000,
  reserveOutput: 8_192,
});

optimizer.add('system', systemPrompt, { priority: 'required', position: 'beginning' });
optimizer.add('tools', toolDefinitions, { priority: 'high' });
optimizer.add('history', messages, { priority: 'high', keepLast: 2, dropStrategy: 'oldest', position: 'end', groupBy: 'turn' });
optimizer.add('memory', memoryResults, { priority: 'medium' });
optimizer.add('rag', ragChunks, { priority: 'medium' });

const result = optimizer.pack('Update the auth middleware to use JWT');

result.items     // Ordered context, ready to send
result.stats     // Token counts, cost, per-source breakdown
result.warnings  // Actionable alerts
result.dropped   // What was excluded and why

Zero dependencies. Works with any provider.


The problem

Here's what happens when you naively pack an LLM context — system prompt, 12 tools, 30-message history, and 15 RAG chunks — into a 1,600-token budget:

                             NAIVE                SNUG
──────────────────────────── ──────────────────── ──────────────────
Recent history preserved?    No (truncated)       Yes (atomic turns)
High-relevance RAG included  0/3                  3/3
Items in attention dead zone 9                    0 high-value
Tool count                   12                   12
Placement strategy           Sequential           Edges-first (U-curve)
Drop strategy                Cut at end           Score-based

The naive approach fills the window top-to-bottom and cuts when full. The most recent messages — the ones the model needs most — are the first to go. High-relevance RAG chunks never make it in. History messages get buried in the middle of the context where the model barely attends to them.
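That cut-at-the-end behavior is easy to reproduce. A minimal sketch (not snug's code; the item names and token counts are made up):

```typescript
type Item = { name: string; tokens: number };

// Fill top-to-bottom and stop when the budget runs out: whatever comes
// last in the array (usually the newest history) is the first casualty.
function naivePack(items: Item[], budget: number): Item[] {
  const packed: Item[] = [];
  let used = 0;
  for (const item of items) {
    if (used + item.tokens > budget) break; // cut at the end
    packed.push(item);
    used += item.tokens;
  }
  return packed;
}

const context: Item[] = [
  { name: 'system', tokens: 400 },
  { name: 'tools', tokens: 600 },
  { name: 'old-history', tokens: 500 },
  { name: 'recent-history', tokens: 300 }, // needed most, dropped first
];

naivePack(context, 1_600).map(i => i.name);
// → ['system', 'tools', 'old-history']
```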

This isn't a theoretical problem. Research quantifies it:

  • Lost in the Middle (Liu et al., TACL 2024) — LLMs follow a U-shaped attention curve. The middle of context is effectively ignored. Performance degrades 30%+ based purely on position.
  • Context Distraction (Gemini 2.5 tech report) — Beyond ~100K tokens, models over-focus on context and neglect their training knowledge.
  • Tool Overload (Berkeley Function-Calling Leaderboard) — Every model performs worse with more tools. Llama 3.1 8B failed with 46 tools, succeeded with 19.
  • Context Clash (Microsoft/Salesforce) — Multi-turn context caused a 39% average performance drop. o3 went from 98.1 to 64.1.

snug fixes this by scoring every item, packing by priority, and placing high-value content at the edges of the context window where attention is strongest.

Run bun examples/10-before-after.ts to see the full comparison, or bun examples/11-visualize-context.ts to see the U-shaped attention map for your context.


How it works

Every pack() call runs five stages:

Measure — Count tokens per item. Built-in heuristic or bring your own tokenizer.
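The README only describes the built-in heuristic as conservative (~10-15% over), so here is an illustrative estimator in the same spirit: roughly four characters per token, padded so it over-counts rather than under-counts. The constants are assumptions, not snug's actual formula.

```typescript
// Illustrative token estimate: ~4 chars per token for English text,
// inflated 12% so the count errs high (assumed constants, not snug's).
function estimateTokens(text: string): number {
  return Math.ceil((text.length / 4) * 1.12);
}

// Anything with this shape can be passed as the `tokenizer` option:
const tokenizer = { count: (text: string) => estimateTokens(text) };

estimateTokens('Update the auth middleware to use JWT'); // 37 chars → 11
```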

Score — Rank items by priority tier, recency, and optional custom scoring.
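A score can be sketched as a tier weight times a recency factor. The weights and the 10% floor below are illustrative assumptions, chosen to mirror the dropStrategy: 'oldest' decay described under Features:

```typescript
type Scored = { priority: 'required' | 'high' | 'medium' | 'low'; index: number };

// Assumed tier weights -- snug's real values may differ.
const TIER_WEIGHT: Record<'high' | 'medium' | 'low', number> = { high: 3, medium: 2, low: 1 };

// Oldest item decays to 10% of its tier weight; the newest keeps 100%.
function score(item: Scored, total: number): number {
  if (item.priority === 'required') return Infinity; // never competes
  const recency = total > 1 ? 0.1 + 0.9 * (item.index / (total - 1)) : 1;
  return TIER_WEIGHT[item.priority] * recency;
}

score({ priority: 'high', index: 4 }, 5); // newest of 5 → 3
```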

Pack — Greedy knapsack. Required items always go in. Remaining budget fills by score. Everything that doesn't fit is tracked with reasons.
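The packing stage reduces to a plain greedy loop. A simplified sketch (item shapes and drop reasons are made up, not snug's internals):

```typescript
type Candidate = { name: string; tokens: number; score: number; required?: boolean };

// Required items go in unconditionally; the rest compete by score.
// Anything that doesn't fit is tracked with a reason, never silently lost.
function greedyPack(items: Candidate[], budget: number) {
  const packed: Candidate[] = [];
  const dropped: { name: string; reason: string }[] = [];
  let used = 0;

  for (const item of items.filter(i => i.required)) {
    packed.push(item);
    used += item.tokens;
  }
  const byScore = items.filter(i => !i.required).sort((a, b) => b.score - a.score);
  for (const item of byScore) {
    if (used + item.tokens <= budget) {
      packed.push(item);
      used += item.tokens;
    } else {
      dropped.push({ name: item.name, reason: 'over budget' });
    }
  }
  return { packed, dropped, used };
}
```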

Place — Rearrange based on the U-shaped attention curve. High-value items land at the edges where attention is strongest. Low-value items go in the middle:

Attention
100%|█                                  █      ← system prompt, recent history, query
    |████                            ████
    |████████                    ████████
    |██████████████      ████████████████
 30%|████████████████████████████████████      ← low-priority items here
    +------------------------------------
    START              MID              END
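
One way to realize that curve is alternating head/tail insertion, walking items from highest to lowest score (an illustration of edges-first ordering, not necessarily snug's exact algorithm):

```typescript
// Best item goes to the front, second-best to the back, and so on,
// so the lowest scores end up in the low-attention middle.
function placeEdgesFirst<T extends { score: number }>(items: T[]): T[] {
  const sorted = [...items].sort((a, b) => b.score - a.score);
  const head: T[] = [];
  const tail: T[] = [];
  sorted.forEach((item, i) => (i % 2 === 0 ? head.push(item) : tail.unshift(item)));
  return [...head, ...tail];
}

placeEdgesFirst([{ score: 1 }, { score: 5 }, { score: 3 }, { score: 2 }]).map(i => i.score);
// → [5, 2, 1, 3]  (5 and 3 at the edges; 2 and 1 in the middle)
```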

Report — Full breakdown of what happened:

result.stats.totalTokens     // 47,832
result.stats.budget          // 191,808
result.stats.utilization     // 0.249
result.stats.estimatedCost   // { input: '$0.1435', provider: 'anthropic' }
result.stats.breakdown       // per-source: tokens, items included, items dropped

Features

  • Priority tiers — required > high > medium > low. Required items always make it in. Everything else competes for remaining budget.
  • Recency bias — dropStrategy: 'oldest' decays old messages to 10% of their score. Recent conversation survives. Old context drops first.
  • Turn grouping — groupBy: 'turn' packs conversation history as atomic turns. No orphaned tool calls. No split assistant responses. keepLast counts turns, not messages.
  • Role preservation — role is extracted from input objects and carried through to output. Map directly to API messages without guessing.
  • Lost-in-the-middle placement — Pin items to beginning or end. Floating items are arranged edges-first by score.
  • Dependency constraints — requires: { 'tools_search': 'examples_search_demo' } — if the example can't fit, the tool is removed instead of shipping without context.
  • Custom scoring — Plug in embedding similarity, BM25, or any scoring function.
  • Warnings — Detects budget overflows, lost-in-the-middle placement issues, tool overload (>10 tools), high drop rates, and low utilization.
  • Cost estimation — Per-call cost estimates when you provide pricing.
  • Custom tokenizer — Built-in heuristic is conservative (~10-15% over). Swap in tiktoken or any { count(text): number } for exact counts.
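
A custom scorer only has to map (item, query) to a number, matching the scorer option's shape. A keyword-overlap stand-in for embedding similarity or BM25 (the { text } item shape here is an assumption for illustration):

```typescript
// Fraction of query words that appear in the item's text.
// Any (item, query) => number function can serve as a scorer.
function keywordOverlap(item: { text: string }, query: string): number {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (words.length === 0) return 0;
  const haystack = item.text.toLowerCase();
  return words.filter(w => haystack.includes(w)).length / words.length;
}

keywordOverlap(
  { text: 'JWT middleware for auth routes' },
  'Update the auth middleware to use JWT',
);
```

Pass it per source, e.g. optimizer.add('rag', ragChunks, { priority: 'medium', scorer: keywordOverlap }).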

Packing conversation history

The thing most context managers get wrong: conversations aren't flat arrays. A user message, the assistant's response, its tool calls, and the tool results are one logical unit. Dropping the tool result but keeping the tool call breaks the conversation.

optimizer.add('history', [
  { role: 'user', content: 'Search for the auth bug' },
  { role: 'assistant', content: 'Let me search...' },
  { role: 'assistant', content: '[tool_use: search]' },
  { role: 'tool', content: '[result: found in session.ts]' },
  { role: 'assistant', content: 'Found it in session.ts' },
  { role: 'user', content: 'Fix it' },
  { role: 'assistant', content: 'Done. Here is the patch...' },
], {
  priority: 'high',
  keepLast: 1,           // last turn is always included
  dropStrategy: 'oldest',
  position: 'end',
  groupBy: 'turn',       // pack as atomic turns
});

This produces two turns. Turn 0 has 5 messages (user through final assistant). Turn 1 has 2 messages. They're packed and dropped as units. keepLast: 1 means the last turn is required, not the last message.
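The turn boundaries above follow a simple rule: a new turn starts at each user message. A sketch of that grouping (an illustration, not snug's internals):

```typescript
type Msg = { role: 'user' | 'assistant' | 'tool'; content: string };

// A turn starts at each user message and runs until the next one, so a
// tool call and its result always travel with the turn that made them.
function groupByTurn(messages: Msg[]): Msg[][] {
  const turns: Msg[][] = [];
  for (const msg of messages) {
    if (msg.role === 'user' || turns.length === 0) turns.push([]);
    turns[turns.length - 1].push(msg);
  }
  return turns;
}

const turns = groupByTurn([
  { role: 'user', content: 'Search for the auth bug' },
  { role: 'assistant', content: 'Let me search...' },
  { role: 'assistant', content: '[tool_use: search]' },
  { role: 'tool', content: '[result: found in session.ts]' },
  { role: 'assistant', content: 'Found it in session.ts' },
  { role: 'user', content: 'Fix it' },
  { role: 'assistant', content: 'Done. Here is the patch...' },
]);
turns.map(t => t.length); // → [5, 2]
```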

To reconstruct API messages from turns:

for (const item of result.items.filter(i => i.source === 'history')) {
  for (const msg of item.value as any[]) {
    apiMessages.push({ role: msg.role, content: msg.content });
  }
}

Sending to your LLM

Anthropic

const result = optimizer.pack('Refactor the auth module');

const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 8192,
  system: result.items.filter(i => i.source === 'system').map(i => i.content).join('\n'),
  messages: result.items
    .filter(i => i.source !== 'system')
    .map(i => ({
      role: i.role === 'assistant' ? 'assistant' as const : 'user' as const,
      content: i.content,
    })),
});

OpenAI

const result = optimizer.pack('Refactor the auth module');

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: result.items.map(i => ({
    role: i.role ?? (i.source === 'system' ? 'system' as const : 'user' as const),
    content: i.content,
  })),
});

snug is provider-agnostic. Use source, role, placement, and value on each item to build whatever format your provider expects.


API reference

new ContextOptimizer(config)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| model | string | required | Model identifier (used for cost estimation) |
| contextWindow | number | required | Total context window size in tokens |
| reserveOutput | number | 4096 | Tokens reserved for model output |
| tokenizer | { count(text: string): number } | built-in | Custom tokenizer |
| pricing | { inputPer1M: number } | — | Enable cost estimation |

optimizer.add(source, content, options)

Register a context source. Arrays become independently-scored items. Objects are JSON-stringified. Re-adding the same source name replaces it.

| Option | Type | Description |
|--------|------|-------------|
| priority | 'required' \| 'high' \| 'medium' \| 'low' | Priority tier. Required items are always included. |
| position | 'beginning' \| 'end' | Pin to start or end. Unpinned items float. |
| keepLast | number | Promote last N items (or turns) to required. |
| dropStrategy | 'relevance' \| 'oldest' \| 'none' | How to handle items that don't fit. |
| groupBy | 'turn' | Group into conversation turns. Packed/dropped atomically. |
| scorer | (item, query) => number | Custom scoring function. |
| requires | Record<string, string> | Dependency constraints between items. |

optimizer.pack(query?)

Returns { items, stats, warnings, dropped }. Query is included as a required item at the end.

optimizer.remove(source) / optimizer.clear()

Remove one source or all sources.


License

MIT