pi-model-aware-compaction

v0.1.5

Published

2 months ago

Per-model context-usage thresholds for Pi's built-in auto-compaction, so models with different context windows and performance profiles compact at the right time

Vulnerabilities: 0 High, 0 Medium, 0 Low

w-winter

pi-package pi pi-coding-agent compaction context-window

Model-Aware Compaction for Pi (`pi-model-aware-compaction`)

Per-model context-usage thresholds for Pi's compaction pipeline, because different models have different context windows and different performance profiles near their context window limits.

This extension nudges Pi's native compaction pipeline at configurable percent-used thresholds, preserving the full built-in UX (loader, queued-message flush, and whichever compaction summary implementation ultimately handles session_before_compact).

Install

From npm:

pi install npm:pi-model-aware-compaction

From the dot314 git bundle (filtered install):

{
  "packages": [
    {
      "source": "git:github.com/w-winter/dot314",
      "extensions": ["extensions/model-aware-compaction/index.ts"],
      "skills": [],
      "themes": [],
      "prompts": []
    }
  ]
}

Requirements

Pi auto-compaction must be enabled in ~/.pi/agent/settings.json:

{ "compaction": { "enabled": true } }

This package is compatible with compaction-summary extensions that hook session_before_compact, because it triggers Pi's normal compaction pipeline rather than calling ctx.compact() directly. Put differently, this package decides when compaction starts; stock Pi or your summary extension decides what summary gets written.

Configuration

Copy config.json.example to config.json in the extension's directory and edit:

{
  "global": 70,
  "models": {
    "claude-opus-4-6": 85,
    "gpt-5.2*": 75
  }
}

| Key | Purpose |
|-----|---------|
| global | Default threshold (percent used) for models without a specific override |
| models | Per-model overrides keyed by model ID; supports * wildcards |

Compaction triggers when used% >= threshold.
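As a rough illustration of the rule above, here is a minimal sketch of how a threshold could be resolved for a model ID and compared against usage. The names `ThresholdConfig`, `patternToRegExp`, `resolveThreshold`, and `shouldCompact` are hypothetical and not taken from the extension's source, and the precedence (exact key first, then the first matching wildcard, then `global`) is an assumption.

```typescript
// Hypothetical shape mirroring config.json; not the extension's actual types.
interface ThresholdConfig {
  global: number;
  models: Record<string, number>;
}

// Turn a key such as "gpt-5.2*" into an anchored RegExp where * matches any run of characters.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// Assumed precedence: exact key, then first matching wildcard key, then the global default.
function resolveThreshold(config: ThresholdConfig, modelId: string): number {
  if (Object.prototype.hasOwnProperty.call(config.models, modelId)) {
    return config.models[modelId];
  }
  for (const [pattern, threshold] of Object.entries(config.models)) {
    if (pattern.includes("*") && patternToRegExp(pattern).test(modelId)) {
      return threshold;
    }
  }
  return config.global;
}

// Compaction is due when used% >= threshold.
function shouldCompact(
  config: ThresholdConfig,
  modelId: string,
  usedTokens: number,
  contextWindow: number,
): boolean {
  const usedPercent = (usedTokens / contextWindow) * 100;
  return usedPercent >= resolveThreshold(config, modelId);
}
```

With the example config above, a model ID such as `gpt-5.2-codex` would resolve to 75 under these assumptions, while an unlisted model would fall back to the global value of 70.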

Tuning `reserveTokens`

Pi's own auto-compaction triggers when usedTokens > contextWindow - reserveTokens. If that fires before your model-aware threshold, Pi compacts first. To let model-aware thresholds take priority, lower reserveTokens:

{
  "compaction": {
    "enabled": true,
    "reserveTokens": 9000,
    "keepRecentTokens": 15000
  }
}
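To judge whether a given threshold can fire before Pi's built-in check, it helps to convert reserveTokens into a percentage of the context window. The sketch below does that arithmetic; the function names and the 200,000-token window are illustrative assumptions, not values from Pi or this extension.

```typescript
// Percent of the window at which Pi's built-in check (usedTokens > contextWindow - reserveTokens)
// starts to pass. Multiplying before dividing keeps the arithmetic exact for typical integer inputs.
function nativeTriggerPercent(contextWindow: number, reserveTokens: number): number {
  return ((contextWindow - reserveTokens) * 100) / contextWindow;
}

// A model-aware threshold takes priority when it sits below the built-in trigger point.
function thresholdTakesPriority(
  thresholdPercent: number,
  contextWindow: number,
  reserveTokens: number,
): boolean {
  return thresholdPercent < nativeTriggerPercent(contextWindow, reserveTokens);
}

// Hypothetical 200,000-token window with reserveTokens set to 9000, as in the settings above.
console.log(nativeTriggerPercent(200_000, 9_000)); // 95.5
console.log(thresholdTakesPriority(85, 200_000, 9_000)); // true
```

In this hypothetical case the built-in check only fires above 95.5% usage, so an 85% model-aware threshold would be reached first. A larger reserveTokens value moves the built-in trigger point down and can preempt the configured threshold.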

How it works

After each agent run, the extension checks context usage against the model-specific threshold. When usage meets or exceeds the threshold, it inflates the last assistant message's usage.totalTokens past the context window size, causing Pi's _checkCompaction() to run its normal pipeline. The inflated value is ephemeral: compaction rebuilds messages from the session file.

That normal pipeline still prepares compaction the usual way, then either stock Pi or any installed session_before_compact override produces the actual summary entry.

This approach preserves the full native compaction UX (loader, summary, queued-message flush) that would be lost by calling ctx.compact() directly.
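The inflation step described in this section can be sketched as a small function over a simplified message shape. The `Usage` and `AssistantMessage` interfaces and the `inflateUsage` helper are hypothetical stand-ins, not Pi's real types or the extension's actual code, and the choice of `contextWindow + 1` is an assumption about what "past the context window size" means.

```typescript
// Hypothetical minimal shapes; Pi's real message and usage types carry more fields.
interface Usage {
  totalTokens: number;
}

interface AssistantMessage {
  role: "assistant";
  usage: Usage;
}

// Raise the reported total past the window so that a check of the form
// usedTokens > contextWindow - reserveTokens passes for any non-negative reserveTokens.
// The change is made in memory only; per the description above, compaction
// rebuilds messages from the session file afterwards.
function inflateUsage(message: AssistantMessage, contextWindow: number): void {
  message.usage.totalTokens = Math.max(message.usage.totalTokens, contextWindow + 1);
}

// Example: 150,000 tokens used in a hypothetical 200,000-token window.
const last: AssistantMessage = { role: "assistant", usage: { totalTokens: 150_000 } };
inflateUsage(last, 200_000);
console.log(last.usage.totalTokens); // 200001
```

Because only the reported token count changes, the decision about what summary gets written stays with stock Pi or whichever session_before_compact handler is installed.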
