pi-ultra-compact

v0.9.1

Published 5 days ago

Advanced compaction extension and skill for Pi with automatic threshold-based compaction and critical context preservation

Vulnerabilities: 0 High, 0 Medium, 0 Low

realvendex

pi-package compaction compression ultra-compact context-management token-savings

pi-ultra-compact

Advanced compaction extension and skill for Pi with automatic threshold-based compaction and support for 70+ models across 15+ providers.

Features

/ultracompact command for manual compaction
Auto-adapts threshold to model's context window (60-80% of max)
70+ models supported - OpenAI, Anthropic, Google, DeepSeek, Meta, Mistral, Qwen, Kimi, MiniMax, GLM, xAI, NVIDIA, Xiaomi, and more
Smart model ID normalization - automatically strips provider prefixes (opencode/, openai/, etc.), suffixes (-free, -latest, -preview), and date stamps
Runtime context window - uses Pi's authoritative context window when available, falls back to detection
Graduated Eviction (4 levels) — strips reasoning, bulk outputs, artifacts, then messages
Generational Compaction — micro (fast, no LLM) at 60-90%, full at 90%+
Preemptive Trigger — fires before next turn, never pays latency during user turns
Cache-Aware Compaction — immutable summary blocks keep prompt cache warm
Circuit Breaker — 3 strikes → lossy truncation fallback, session never dies
Hierarchical summarization with entropy-based information extraction
Critical context preservation - goals, decisions, errors, file paths
Extension + Skill - works as both a Pi extension and a skill
Smart model switching - remembers per-model thresholds and preserves custom settings
Conversation structure detection - identifies turns, phases, and progress
Multi-pass summarization — progressive compression with quality scoring
LLM-based summarization — optional AI-powered compression (useLLM config)
Content-aware token counting — dynamic ratios for code, prose, and whitespace
Compact section templates — shorter headers, condensed formatting, saves 10-15% more tokens

Installation

pi install npm:pi-ultra-compact

Quick Start

After installing the package and restarting Pi, run:

/ultracompact

This triggers manual ultra-compact compaction.

Auto-compaction runs when context usage exceeds the threshold for your model, which is set automatically to 60-80% of the model's context window.

Supported Models

| Provider | Models | Context Window |
|----------|--------|----------------|
| OpenAI | GPT-5/5.1/5.2/5.4/5.5, GPT-4.1, GPT-4o, O1/O3/O4 | 128K - 1.1M tokens |
| Anthropic | Claude Opus/Sonnet/Haiku 4.x, 3.x | 200K - 1M tokens |
| Google | Gemini 3.5/3.1/2.5/2.0/1.5, Gemma 3/2 | 8K - 2M tokens |
| DeepSeek | V4 Pro/Flash, V3, V2.5, R1 | 64K - 1M tokens |
| Meta | Llama 4 Maverick/Scout, 3.3, 3.1 | 128K - 1M tokens |
| Mistral | Medium 3.5, Large 3, Codestral | 128K - 256K tokens |
| Qwen | Qwen 3.6/3.5/3, Qwen 2.5 Coder | 131K - 262K tokens |
| Kimi/Moonshot | Kimi K2.6/K2.5/K2, Moonshot V1 | 128K - 262K tokens |
| MiniMax | M2.7, M2.5 | 204K tokens |
| GLM (Zhipu) | GLM 5.1/5/4-Plus | 128K - 204K tokens |
| xAI | Grok Build, Grok 3/2 | 131K - 256K tokens |
| NVIDIA | Nemotron 3 Super, Nemotron 4 | 128K - 204K tokens |
| Xiaomi | MiMo V2.5 Pro/V2.5 | 1M tokens |
| OpenCode | big-pickle | 200K tokens |

Provider-prefixed model IDs are handled automatically (e.g. opencode/deepseek-v4-flash-free → deepseek-v4-flash).
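
The normalization described above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the function name normalizeModelId, the suffix list, and the date-stamp pattern are all assumptions chosen to match the examples in this README.

```typescript
// Suffixes named in this README; the real list may be longer (assumption).
const KNOWN_SUFFIXES: string[] = ["-free", "-latest", "-preview"];

// Assumed date-stamp shapes at the end of an ID: -YYYYMMDD or -YYYY-MM-DD.
const TRAILING_DATE = /-\d{4}-?\d{2}-?\d{2}$/;

/** Reduce a raw model ID to a bare model name (hypothetical helper). */
function normalizeModelId(rawId: string): string {
  const lower = rawId.trim().toLowerCase();
  // Drop the provider prefix: everything up to and including the last "/".
  let id = lower.slice(lower.lastIndexOf("/") + 1);

  // Strip suffixes and date stamps repeatedly, since they can be stacked.
  let changed = true;
  while (changed) {
    changed = false;
    for (const suffix of KNOWN_SUFFIXES) {
      if (id.endsWith(suffix)) {
        id = id.slice(0, id.length - suffix.length);
        changed = true;
      }
    }
    const withoutDate = id.replace(TRAILING_DATE, "");
    if (withoutDate !== id) {
      id = withoutDate;
      changed = true;
    }
  }
  return id;
}

console.log(normalizeModelId("opencode/deepseek-v4-flash-free")); // deepseek-v4-flash
console.log(normalizeModelId("opencode-go/kimi-k2.5")); // kimi-k2.5
```

Looping until nothing changes means the order of suffixes and date stamps in the raw ID does not matter.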

How It Works

Three-Tier System

Preemptive check (every turn): Projects next turn's token usage. If projected > 60% of context, triggers micro-compaction.
Micro-compaction (60-90% usage): Strips reasoning blocks + bulk tool outputs. No LLM call. Runs in microseconds.
Full compaction (90%+ usage): Graduated eviction preconditions the input, then structured summarization produces the final compacted context.
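
As a rough sketch of the tier decision above: the names projectNextTurn and pickTier are hypothetical, and the projection rule (current usage plus the average size of recent turns) is an assumption, since this README does not say how the next turn is projected. Only the 60% and 90% boundaries come from the text.

```typescript
type Tier = "none" | "micro" | "full";

/** Assumed projection: current usage plus the average size of recent turns. */
function projectNextTurn(currentTokens: number, recentTurnTokens: number[]): number {
  if (recentTurnTokens.length === 0) return currentTokens;
  const total = recentTurnTokens.reduce((sum, t) => sum + t, 0);
  return currentTokens + Math.ceil(total / recentTurnTokens.length);
}

/** Map projected usage (as a fraction of the context window) to a tier. */
function pickTier(projectedTokens: number, contextWindow: number): Tier {
  const usage = projectedTokens / contextWindow;
  if (usage >= 0.9) return "full"; // 90%+: graduated eviction, then summarization
  if (usage > 0.6) return "micro"; // 60-90%: strip reasoning and bulk tool output, no LLM
  return "none";
}

const projected = projectNextTurn(120_000, [8_000, 12_000]);
console.log(projected, pickTier(projected, 200_000)); // 130000 micro
```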

Eviction Levels

| Level | What it strips | When |
|-------|---------------|------|
| 1 | Assistant thinking/reasoning blocks | Always (harmless removal) |
| 2 | Bulk tool outputs (>100 lines, >5K chars) | Most sessions |
| 3 | All non-error tool results | Heavy sessions |
| 4 | Oldest non-protected messages | Only when necessary |
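
One way to picture the escalation through these levels is the sketch below. It is illustrative only: the Msg shape, the 4-characters-per-token estimate, treating either bulk limit as sufficient, and treating system and user messages as "protected" are all assumptions rather than the extension's documented behaviour.

```typescript
interface Msg {
  role: "system" | "user" | "assistant" | "tool";
  kind: "text" | "reasoning" | "tool_result";
  content: string;
  isError?: boolean;
}

// Rough estimate (assumption: about 4 characters per token).
function estimateTokens(msgs: Msg[]): number {
  return Math.ceil(msgs.reduce((n, m) => n + m.content.length, 0) / 4);
}

// Level 2 target: tool output over 100 lines or over 5K characters.
function isBulk(m: Msg): boolean {
  return m.kind === "tool_result" &&
    (m.content.length > 5000 || m.content.split("\n").length > 100);
}

/** Apply eviction levels in order until the budget is met (hypothetical helper). */
function evict(msgs: Msg[], budget: number, maxLevel: number = 4): { kept: Msg[]; level: number } {
  // Level 1 is always applied; filter() also gives us a fresh array to work on.
  let kept = msgs.filter((m) => m.kind !== "reasoning");
  let level = 1;
  const over = (): boolean => estimateTokens(kept) > budget;

  if (over() && maxLevel >= 2) {
    kept = kept.filter((m) => !isBulk(m));
    level = 2;
  }
  if (over() && maxLevel >= 3) {
    kept = kept.filter((m) => m.kind !== "tool_result" || m.isError === true);
    level = 3;
  }
  if (over() && maxLevel >= 4) {
    level = 4;
    // Drop the oldest unprotected message until we fit or nothing is left to drop.
    while (over()) {
      const i = kept.findIndex((m) => m.role !== "system" && m.role !== "user");
      if (i === -1) break;
      kept.splice(i, 1);
    }
  }
  return { kept, level };
}
```

The function returns the level it reached, which makes it easy to log how aggressive a given compaction had to be.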

Safety Systems

Snapshot-rollback: Messages are deep-copied before compaction. If anything fails, the original is preserved.
Circuit breaker: After 3 consecutive failures, falls back to lossy truncation (keep system + last 10 turns).
User messages inviolable: Never stripped regardless of token pressure.
Cache-aware mode: Previous summaries stay immutable — only new content pays prefill cost.
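
The snapshot-rollback and circuit-breaker behaviour above might look something like this. It is a sketch under stated assumptions: CompactionGuard and Turn are hypothetical names, each message is treated as one turn, and the breaker stays open once tripped, which this README does not specify.

```typescript
interface Turn {
  role: string;
  content: string;
}

/** Wraps a compaction function with rollback and a failure counter (hypothetical). */
class CompactionGuard {
  private failures = 0;
  private readonly maxFailures: number;
  private readonly keepTurns: number;

  constructor(maxFailures: number = 3, keepTurns: number = 10) {
    this.maxFailures = maxFailures;
    this.keepTurns = keepTurns;
  }

  run(messages: Turn[], compact: (snapshot: Turn[]) => Turn[]): Turn[] {
    // Breaker already open: skip compaction and truncate lossily.
    if (this.failures >= this.maxFailures) return this.truncate(messages);

    // Snapshot: the compactor only ever sees a copy of each message.
    const snapshot: Turn[] = messages.map((m) => ({ ...m }));
    try {
      const result = compact(snapshot);
      this.failures = 0; // a success resets the consecutive-failure count
      return result;
    } catch {
      this.failures++;
      // Rollback: return the untouched original unless the breaker just tripped.
      return this.failures >= this.maxFailures ? this.truncate(messages) : messages;
    }
  }

  // Lossy fallback: keep system messages plus the last N other messages.
  private truncate(messages: Turn[]): Turn[] {
    const system = messages.filter((m) => m.role === "system");
    const rest = messages.filter((m) => m.role !== "system");
    return [...system, ...rest.slice(-this.keepTurns)];
  }
}
```

Because the compactor works on a copy, a thrown error leaves the caller holding exactly the messages it started with.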

Configuration

Default settings work out of the box. The extension auto-detects your model and sets appropriate thresholds.

Default Settings

| Setting | Default | Description |
|---------|---------|-------------|
| thresholdTokens | Auto (60-80% of context) | When to trigger compaction |
| keepPercentage | 30% | Percentage of context to keep |
| maxKeepTokens | 30,000 | Maximum tokens to keep |
| autoCompact | true | Enable automatic compaction |
| cacheAware | false | Immutable summary blocks (saves API costs) |
| maxEvictionLevel | FULL_REMOVAL | Max eviction aggressiveness |
| outputHeadroom | 4,096 | Tokens reserved for LLM response |
| circuitBreakerMaxFailures | 3 | Failures before lossy truncation |
| preemptiveWatermark | 0.70 | Preemptive trigger level |
| hardWatermark | 0.95 | Reactive fallback level |
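
For reference, the defaults above can be written as a typed object. The interface name, the "auto" sentinel, and the keepBudget helper are assumptions made for illustration; in particular, how keepPercentage and maxKeepTokens interact is not stated here, so the sketch assumes the smaller of the two limits wins. How settings are actually supplied to the extension is not covered by this sketch.

```typescript
interface UltraCompactSettings {
  thresholdTokens: number | "auto";
  keepPercentage: number; // percent, 0-100
  maxKeepTokens: number;
  autoCompact: boolean;
  cacheAware: boolean;
  maxEvictionLevel: string;
  outputHeadroom: number;
  circuitBreakerMaxFailures: number;
  preemptiveWatermark: number; // fraction of the context window
  hardWatermark: number; // fraction of the context window
}

// Values copied from the Default Settings table.
const DEFAULT_SETTINGS: UltraCompactSettings = {
  thresholdTokens: "auto",
  keepPercentage: 30,
  maxKeepTokens: 30000,
  autoCompact: true,
  cacheAware: false,
  maxEvictionLevel: "FULL_REMOVAL",
  outputHeadroom: 4096,
  circuitBreakerMaxFailures: 3,
  preemptiveWatermark: 0.7,
  hardWatermark: 0.95,
};

/** Assumed rule: keep keepPercentage of the current context, capped at maxKeepTokens. */
function keepBudget(currentTokens: number, s: UltraCompactSettings = DEFAULT_SETTINGS): number {
  return Math.min(Math.floor((currentTokens * s.keepPercentage) / 100), s.maxKeepTokens);
}

console.log(keepBudget(50000)); // 15000
console.log(keepBudget(160000)); // 30000 (capped by maxKeepTokens)
```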

Commands

| Command | Description |
|---------|-------------|
| /ultracompact | Trigger manual ultra-compact compaction |

Model Examples

# Works with any model - threshold auto-adapts
# Claude Sonnet 4: 160,000 tokens (80% of 200K)
# GPT-5.4 Pro: 880,000 tokens (80% of 1.1M)
# Gemini 2.5 Pro: 800,000 tokens (80% of 1M)
# DeepSeek V4 Flash: 160,000 tokens (80% of 200K)
# Kimi K2.5: 209,680 tokens (80% of 262K)

# Provider-prefixed IDs work too:
# opencode/deepseek-v4-flash-free → DeepSeek, 200K
# opencode-go/kimi-k2.5 → Moonshot, 262K
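
The arithmetic behind the 80% figures above is simple enough to check by hand. The helper name autoThreshold is hypothetical, and the sketch assumes a plain percentage of the context window with no other adjustments.

```typescript
/** Threshold in tokens as a percentage of the context window (hypothetical helper). */
function autoThreshold(contextWindow: number, percent: number = 80): number {
  // Integer arithmetic avoids floating-point drift from multiplying by 0.8.
  return Math.floor((contextWindow * percent) / 100);
}

console.log(autoThreshold(200_000)); // 160000
console.log(autoThreshold(1_000_000)); // 800000
console.log(autoThreshold(1_100_000)); // 880000
console.log(autoThreshold(200_000, 60)); // 120000, the low end of the 60-80% range
```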

Compatibility

Works with any Pi-compatible model
Compatible with gentle-engram (Engram memory backup)
Compatible with gentle-pi (SDD/OpenSpec)
No conflicts with Pi's default compaction

Changelog

See CHANGELOG.md for full version history.

v0.9.0 - Model Detection Overhaul

Smart model ID normalization — strips provider prefixes, suffixes (-free, -latest), and date stamps
70+ models — added Qwen, Kimi, MiniMax, GLM, Grok, Nemotron, MiMo, OpenCode models
20+ model families — expanded from 6 to detect alibaba, moonshot, minimax, zhipu, xai, nvidia, xiaomi, opencode
Runtime context window — uses Pi's authoritative ctx.model.contextWindow when available
195 tests, 100% pass rate — 17 new detection tests

v0.8.0 - Generational Compaction + Safety Systems

Graduated Eviction — 4-level content stripping (reasoning → bulk → artifacts → full)
Generational Compaction — micro (60-90%, no LLM) + full (90%+) tiers
Preemptive Trigger — fires at 70% watermark by projecting next turn
Cache-Aware Mode — immutable summary blocks preserve prompt cache
Snapshot-Rollback + Circuit Breaker — session never dies from bad compaction
66 tests, 100% pass rate — zero regressions

v0.7.0 - Compact Templates & LLM Summarization

Compact section templates - shorter headers save 10-15% tokens across all conversations
LLM-based summarization - optional LLM-powered semantic compression
Content-aware token estimation - dynamic ratios for code/prose/whitespace
66 tests, 100% pass rate - including 13 new effectiveness benchmarks
generateSummary is now async - supports LLM callback integration

v0.6.0 - Algorithm Enhancement Release

Major improvements to compaction quality and performance:

Smart model switching - per-model threshold memory, preserves custom settings
Conversation structure detection - identifies turns, phases, progress
Enhanced critical extraction - progress indicators, questions, user preferences
Multi-pass summarization - 3-pass compression with quality scoring
Token estimation cache - LRU cache for 3x faster performance
100% test pass rate - 43 unit tests + 17 performance benchmarks

v0.5.0 - Audit & Stability Release

This release fixes 18 issues found via a comprehensive 5-agent audit:

3 Critical regex bugs fixed - \b word boundaries on all patterns, no more false matches
Startup model detection fixed - correct threshold from boot
Custom thresholds preserved - across model switches
Null safety - guards on all message-consuming methods
53-test Jest suite - comprehensive coverage
Dead code removed - 329-line .disabled file deleted, unused typebox dep removed

Troubleshooting

Extension not loading

Restart Pi after installation
Check that pi install npm:pi-ultra-compact completed successfully

Wrong threshold detected

The extension auto-detects your model from Pi config
Ensure your model is in the supported list (70+ models)
Run /ultracompact manually to see detected model and threshold in the logs

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'feat: add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Pi - The AI coding agent
