pi-context-cap

A tiny pi extension that caps model contextWindow values so pi's built-in auto-compaction triggers earlier than the model's native limit. Zero-config defaults for 1M-window Claude models; fully configurable for anything else.

What it does

Pi's auto-compaction trigger is:

contextTokens > contextWindow - reserveTokens

For a Claude model with a native 1,000,000-token window and the default reserveTokens = 16384, that means compaction doesn't fire until ~983,616 tokens — which is probably not what you want for day-to-day use. Sessions that actually approach 1M are slow per turn, carry a lot of noise the model has to attend to, and cost a lot each time they round-trip.
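
To make the arithmetic concrete (names mirror the formula above; illustrative TypeScript, not pi's source):

const contextWindow = 1_000_000;  // native Claude window
const reserveTokens = 16_384;     // pi's default reserve
const triggerPoint = contextWindow - reserveTokens;  // 983,616
// Auto-compaction fires only once contextTokens exceeds triggerPoint.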

This extension caps contextWindow in pi's in-memory model registry at session start, so compaction fires at a user-chosen ceiling (default 200,000) instead. Everything else in pi's compaction machinery — the summarizer model, the prompt, the recovery flow, /compact, session_before_compact hooks — is unchanged.

On Opus 4.7 or Sonnet 4.6 you'll see:

Context: 182,411 / 200,000 (91%)

…and compaction kicks in at the normal time, as if you were on a natively 200k model.

Install

# From npm (recommended)
pi install npm:pi-context-cap

# Or directly from git
pi install git:github.com/AlexWootton/pi-context-cap

# Or local clone for development
git clone https://github.com/AlexWootton/pi-context-cap
pi install ./pi-context-cap

Default behavior: any model whose id contains "anthropic" or "claude" and whose native contextWindow > 200_000 is capped at exactly 200_000. All other models are left alone.
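
Written out as a predicate, the default rule looks roughly like this (a TypeScript restatement of the behavior above, not the extension's actual source):

interface Model {
  id: string;
  contextWindow: number;
}

function shouldCap(model: Model): boolean {
  const patterns = ["anthropic", "claude"];  // default matchPatterns
  const appliesOver = 200_000;               // default appliesOver
  const id = model.id.toLowerCase();
  return patterns.some((p) => id.includes(p)) && model.contextWindow > appliesOver;
}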

Why you might want this

  • Shorter working memory per turn. Every turn pays for every token currently in context. Capping at 200k instead of 1M means each turn is billed against a smaller working set, and pi summarizes older history rather than carrying it at full fidelity.
  • Honest /context meter. A meter that fills toward 1M tells you very little; a meter that fills toward the ceiling you chose actually tells you when compaction is coming.
  • Predictable pacing. You picked the ceiling, so you know the upper bound on what a full-context turn costs, and you won't be surprised by a 900k-token turn because you forgot how large the window was.
  • No server-side equivalent for "Opus 4.7 capped at 200k." Anthropic's API doesn't expose a wire-level "serve this model in 200k mode" toggle — the model identifier determines the mode. If you want to stay on 4.7/4.6 but use less of its window, this extension does that client-side.

What this is not

  • Not a pricing-tier change. Current 1M-context Claude models (Opus 4.6, Opus 4.7, Sonnet 4.6) are billed at standard rates across the full window. Capping doesn't move you off any tier.
  • Not a serving-mode switch. There is no wire-level negotiation that routes a capped request to a different serving path. The model identifier determines the mode; a client-side cap only shrinks what you send.
  • Not a latency guarantee. Any speed benefit is strictly downstream of sending fewer tokens per turn.

If you want a same-family model that is natively 200k (different serving characteristics, not just a smaller client-side window), look at the 4.5 generation: claude-opus-4-5, claude-sonnet-4-5, claude-haiku-4-5. That's a model-selection choice, orthogonal to this extension.

Configure

Drop a JSON file at either path:

| Location | Scope |
|---|---|
| ~/.pi/agent/extensions/context-cap.json | Global |
| <project>/.pi/extensions/context-cap.json | Project (overrides global) |

Schema

{
  "cap": 200000,                               // Target contextWindow for affected models.
  "appliesOver": 200000,                       // Only cap models whose native window exceeds this.
  "matchPatterns": ["anthropic", "claude"],    // id-substring match (case-insensitive). Use "*" to match all.
  "models": {                                  // Per-model-id overrides. Always win over pattern matching.
    "claude-opus-4-7": 180000
  }
}

All keys are optional. Values shown are the defaults.
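
One way to picture how the two layers combine, in TypeScript (field names come from the schema above; the actual merge code may differ):

interface CapConfig {
  cap?: number;
  appliesOver?: number;
  matchPatterns?: string[];
  models?: Record<string, number>;
}

function resolveConfig(globalCfg: CapConfig, projectCfg: CapConfig): Required<CapConfig> {
  return {
    cap: projectCfg.cap ?? globalCfg.cap ?? 200_000,
    appliesOver: projectCfg.appliesOver ?? globalCfg.appliesOver ?? 200_000,
    matchPatterns: projectCfg.matchPatterns ?? globalCfg.matchPatterns ?? ["anthropic", "claude"],
    // Per-model entries always win over pattern matching downstream.
    models: { ...globalCfg.models, ...projectCfg.models },
  };
}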

Examples

More conservative buffer below 200k:

{ "cap": 180000 }

Extend the default Anthropic cap to also cap Gemini at 500k:

{
  "cap": 200000,
  "matchPatterns": ["anthropic", "claude"],
  "models": {
    "google/gemini-2-5-pro": 500000,
    "google/gemini-2-5-flash": 500000
  }
}

Only cap a specific model, leave everything else alone:

{
  "matchPatterns": [],
  "models": {
    "us.anthropic.claude-opus-4-7": 200000
  }
}

Apply the same cap to every model in the registry (aggressive):

{
  "cap": 150000,
  "appliesOver": 150000,
  "matchPatterns": ["*"]
}

Model IDs match model.id exactly; run pi --list-models to see them. Unknown IDs in models are silently ignored.

Other use cases

The mechanism is general:

  • Per-model tuning — different models summarize context differently. Set "claude-opus-4-7": 200000 and "claude-sonnet-4-6": 150000 if you want more headroom on one than the other.
  • Long-window non-Anthropic models — a Gemini or Grok model advertising a 1M/2M window can be capped to something you actually want to pay for per turn.
  • Testing and dev — force compaction at a predictable point without burning through real tokens.

All of these are one-file config changes.

What it does and doesn't do

Does:

  • Cap contextWindow on matching models so pi's built-in auto-compaction fires at the cap point.
  • Emit a "capped N model(s)" notification once at session start.
  • Work with all of pi's compaction machinery (including session_before_compact hooks, manual /compact, and compaction error recovery) without modification.
  • Apply project config on top of global config.

Does not:

  • Replace or duplicate pi's compaction logic.
  • Touch token billing, API requests, or the messages array.
  • Cap any model if matchPatterns is empty and models has no entries (you've told it to do nothing).
  • Prevent a single turn from crossing the cap if that turn's new content exceeds the reserve buffer — see Caveats.

Caveats

Pi's compaction trigger checks the previous assistant turn's reported input-token usage. So if one turn adds more than reserveTokens (default ~16k tokens) of fresh content — say, three large file reads plus a long bash dump — the next request may be sent with more input tokens than the cap despite this extension being active.
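
Illustrative numbers (the reserve default comes from the trigger formula above; the turn sizes are made up):

const cap = 200_000;
const reserveTokens = 16_384;
const threshold = cap - reserveTokens;            // 183,616: compaction trigger point
const lastReported = 180_000;                     // below threshold, so no compaction yet
const freshContent = 40_000;                      // e.g. several large file reads in one turn
const nextRequest = lastReported + freshContent;  // 220,000, past the cap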

For typical conversational coding, this is rare. For stricter guarantees:

  • Set cap below your actual ceiling (e.g. 180000 to stay well under 200k).
  • Or bump compaction.reserveTokens in ~/.pi/agent/settings.json (affects all models, not just the capped ones).

See also

  • pi-custom-compaction — swaps pi's compaction model, template, and trigger point. Its trigger.maxTokens option overlaps with this extension's core function. Choose pi-custom-compaction if you also want to swap the summarizer model or get per-project compaction-policy control; choose pi-context-cap if you only want per-model trigger caps with zero-config defaults and /context that honestly reflects your working ceiling.
  • pi-model-aware-compaction — per-model percent-based compaction thresholds using a different mechanism (inflating reported token counts to trigger pi's compaction). Good when you think in percentages; this extension is better when you think in absolute tokens.
  • pi-budget-guard — tracks dollar spend per session and blocks tool calls at a $ threshold. Complementary (dollars ≠ tokens); safe to run alongside.

How it works

Pi's ModelRegistry.getAll() returns a live array of Model objects. The extension mutates model.contextWindow on each matching entry at session_start before any LLM request is built. Pi's shouldCompact() reads this value directly:

export function shouldCompact(contextTokens, contextWindow, settings) {
  if (!settings.enabled) return false;
  return contextTokens > contextWindow - settings.reserveTokens;
}

So the cap flows through to every existing compaction code path automatically. The extension itself is under 50 lines of logic.
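
A condensed sketch of that mechanism (session_start and getAll() come from the text above; the helper names and exact shapes are assumptions, not the published source):

interface Model {
  id: string;
  contextWindow: number;
}

// Resolve the target cap for one model, or undefined to leave it alone.
function capFor(model: Model, cap: number, appliesOver: number,
                patterns: string[], overrides: Record<string, number>): number | undefined {
  if (overrides[model.id] !== undefined) return overrides[model.id];  // exact ids win
  const id = model.id.toLowerCase();
  const matched = patterns.some((p) => p === "*" || id.includes(p));
  return matched && model.contextWindow > appliesOver ? cap : undefined;
}

// Run once at session_start: mutate the live registry entries in place.
function applyCaps(models: Model[], cap: number, appliesOver: number,
                   patterns: string[], overrides: Record<string, number>): number {
  let capped = 0;
  for (const model of models) {
    const target = capFor(model, cap, appliesOver, patterns, overrides);
    if (target !== undefined && model.contextWindow > target) {
      model.contextWindow = target;  // shouldCompact() reads this value directly
      capped++;
    }
  }
  return capped;
}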

A note on extension load order

Extensions are loaded in this order:

  1. Installed packages (from settings.json's packages array)
  2. Ad-hoc extensions passed via --extension / -e

Each extension's session_start handler fires in the same order. If another extension reads contextWindow in its own session_start handler and runs before this one (for example, an installed package when this extension is passed via -e), it will see the pre-cap value. Mitigations:

  • Read contextWindow in before_agent_start or later — by then the cap is applied.
  • Or install both extensions as packages (order within packages is settings-file order).
  • Or pass this one first when using -e: pi -e path/to/context-cap.ts -e path/to/other.ts.

For typical single-extension usage this is a non-issue.

Uninstall

pi remove npm:pi-context-cap

Fully reversible. Pi's ModelRegistry is rebuilt on each launch from pi-ai's canonical model list, so removing the extension restores every affected model's native window on the next startup.

License

MIT. See LICENSE.