pi-model-aware-compaction
v0.1.5
Per-model context-usage thresholds for Pi's built-in auto-compaction, so models with different context windows and performance profiles compact at the right time
Model-Aware Compaction for Pi (pi-model-aware-compaction)
Per-model context-usage thresholds for Pi's compaction pipeline, because different models have different context windows and different performance profiles near their context window limits.
This extension nudges Pi's native compaction pipeline at configurable percent-used thresholds, preserving the full built-in UX (loader, queued-message flush, and whichever compaction summary implementation ultimately handles session_before_compact).
Install
From npm:
pi install npm:pi-model-aware-compaction
From the dot314 git bundle (filtered install):
{
"packages": [
{
"source": "git:github.com/w-winter/dot314",
"extensions": ["extensions/model-aware-compaction/index.ts"],
"skills": [],
"themes": [],
"prompts": []
}
]
}
Requirements
Pi auto-compaction must be enabled in ~/.pi/agent/settings.json:
{ "compaction": { "enabled": true } }
The extension is compatible with compaction-summary extensions that hook session_before_compact, because it triggers Pi's normal compaction pipeline rather than calling ctx.compact() directly. In other words, this package decides when compaction starts; stock Pi or your summary extension decides what summary gets written.
Configuration
Copy config.json.example to config.json in the extension's directory and edit:
{
"global": 70,
"models": {
"claude-opus-4-6": 85,
"gpt-5.2*": 75
}
}
| Key | Purpose |
|-----|---------|
| global | Default threshold (percent used) for models without a specific override |
| models | Per-model overrides keyed by model ID; supports * wildcards |
Compaction triggers when used% >= threshold.
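The threshold lookup and trigger rule above can be sketched as follows. This is a minimal TypeScript illustration, not the extension's code: CompactionConfig, wildcardToRegExp, resolveThreshold, and shouldCompact are hypothetical names, and both the matching precedence (exact ID first, then the first matching wildcard in key order, then global) and the choice that * matches any run of characters, including none, are assumptions the README does not specify.

```typescript
// Hypothetical shape of config.json, mirroring the example above.
interface CompactionConfig {
  global: number; // default threshold, percent of the context window used
  models?: Record<string, number>; // per-model overrides keyed by model ID or * pattern
}

// Turn a pattern like "gpt-5.2*" into an anchored RegExp where * matches any run of characters.
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")) // escape regex metacharacters
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// Assumed precedence: exact ID match, then first matching wildcard in key order, then global.
function resolveThreshold(config: CompactionConfig, modelId: string): number {
  const models = config.models ?? {};
  if (Object.prototype.hasOwnProperty.call(models, modelId)) return models[modelId];
  for (const [pattern, threshold] of Object.entries(models)) {
    if (pattern.includes("*") && wildcardToRegExp(pattern).test(modelId)) return threshold;
  }
  return config.global;
}

// Trigger rule from the README: used% >= threshold, compared as
// usedTokens * 100 >= threshold * contextWindow to avoid floating-point rounding.
function shouldCompact(usedTokens: number, contextWindow: number, threshold: number): boolean {
  return usedTokens * 100 >= threshold * contextWindow;
}

const config: CompactionConfig = { global: 70, models: { "claude-opus-4-6": 85, "gpt-5.2*": 75 } };
console.log(resolveThreshold(config, "claude-opus-4-6")); // 85 (exact match)
console.log(resolveThreshold(config, "gpt-5.2-example")); // 75 (hypothetical ID matched by "gpt-5.2*")
console.log(resolveThreshold(config, "some-other-model")); // 70 (falls back to global)
console.log(shouldCompact(150000, 200000, 75)); // true: exactly 75% used
```

Escaping everything except * keeps the dot in "gpt-5.2" literal, so a pattern matches only IDs with that exact prefix.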
Tuning reserveTokens
Pi's own auto-compaction triggers when usedTokens > contextWindow - reserveTokens. If that condition is met before your model-aware threshold is reached, Pi compacts first. To let model-aware thresholds take priority, lower reserveTokens in ~/.pi/agent/settings.json:
{
"compaction": {
"enabled": true,
"reserveTokens": 9000,
"keepRecentTokens": 15000
}
}
How it works
After each agent run, the extension checks context usage against the model-specific threshold. When usage reaches the threshold, it inflates the last assistant message's usage.totalTokens past the context window size, which causes Pi's _checkCompaction() to fire its normal pipeline. The inflated value is ephemeral: compaction rebuilds messages from the session file.
That pipeline still prepares compaction the usual way; then either stock Pi or any installed session_before_compact override produces the actual summary entry.
This approach preserves the full native compaction UX (loader, summary, queued-message flush) that would be lost by calling ctx.compact() directly.
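To make the mechanism concrete, here is a minimal TypeScript sketch under simplifying assumptions: Message and AssistantUsage are hypothetical stand-ins for Pi's real message types, piWouldCompact and nudge are illustrative names, and Pi is assumed to read usedTokens from the last assistant message's usage.totalTokens, as the description above implies. It does not reproduce Pi's event hooks or _checkCompaction() itself. It also shows why reserveTokens matters: with a 200000-token window and reserveTokens of 9000, Pi's own rule fires only above 191000 tokens, so an 85% threshold (170000 tokens) is reached first.

```typescript
// Hypothetical, simplified stand-ins for Pi's message types (assumption, not Pi's API).
interface AssistantUsage {
  totalTokens: number;
}
interface Message {
  role: "user" | "assistant";
  usage?: AssistantUsage;
}

// Pi's built-in rule as stated in this README: usedTokens > contextWindow - reserveTokens.
function piWouldCompact(usedTokens: number, contextWindow: number, reserveTokens: number): boolean {
  return usedTokens > contextWindow - reserveTokens;
}

// Sketch of the nudge: once used% >= threshold, push the last assistant message's
// totalTokens past the context window so Pi's own check fires.
function nudge(messages: Message[], contextWindow: number, thresholdPercent: number): boolean {
  // Assumption: Pi reads usedTokens from the most recent assistant message with usage data.
  const last = messages
    .slice()
    .reverse()
    .find((m) => m.role === "assistant" && m.usage !== undefined);
  if (!last || !last.usage) return false;
  // used% >= threshold, compared without division to avoid floating-point rounding
  if (last.usage.totalTokens * 100 < thresholdPercent * contextWindow) return false;
  // Mutates the shared message object; per the README this is ephemeral because
  // compaction rebuilds messages from the session file.
  last.usage.totalTokens = contextWindow + 1;
  return true;
}

const contextWindow = 200000;
const reserveTokens = 9000; // value from the settings example above
const messages: Message[] = [
  { role: "user" },
  { role: "assistant", usage: { totalTokens: 172000 } },
];

console.log(piWouldCompact(172000, contextWindow, reserveTokens)); // false: 172000 is not above 191000
console.log(nudge(messages, contextWindow, 85)); // true: 172000 tokens is 86% of the window
console.log(piWouldCompact(messages[1].usage!.totalTokens, contextWindow, reserveTokens)); // true
```

Setting totalTokens to contextWindow + 1 satisfies Pi's condition for any non-negative reserveTokens, which is why inflating past the window, rather than to some tuned value, is enough to hand control back to the native pipeline.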
