@arvoretech/pi-smart-context

v0.2.0

Published

2 hours ago

Intelligent model routing and prompt compression for Pi/Kiro

Intelligent model routing and retrieval-augmented prompt compression for Pi/Kiro.

Features

Model Routing

Uses Haiku (fast, cheap) to classify task complexity from the full conversation context, not just the current message. A terse reply such as "bora" (Portuguese for "let's go") after a complex architecture discussion therefore still routes to Opus.

| Classification | Model | When |
|---|---|---|
| trivial | claude-haiku-4-5 | Greetings, meta-conversation, no pending task |
| simple | claude-sonnet-4-6 | Single-file fixes, quick questions |
| medium | claude-sonnet-4-6 | Standard multi-file work (deterministic baseline) |
| complex | claude-opus-4-8 | Architecture, large refactors, security audits |
| Large context (>500K) | claude-sonnet-4-6 | 1M window needed |
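The table above can be read as a lookup plus one override. The following is a minimal sketch, not the extension's actual router: `pickModel` and the constant names are hypothetical, the 500K threshold is assumed to be measured in tokens, and the large-context rule is assumed to take precedence over the classification.

```typescript
// Labels and model IDs come from the routing table; everything else is illustrative.
type Complexity = "trivial" | "simple" | "medium" | "complex";

const MODEL_BY_COMPLEXITY: Record<Complexity, string> = {
  trivial: "claude-haiku-4-5",
  simple: "claude-sonnet-4-6",
  medium: "claude-sonnet-4-6",
  complex: "claude-opus-4-8",
};

// Assumption: the ">500K" row refers to context size in tokens.
const LARGE_CONTEXT_TOKENS = 500_000;

function pickModel(complexity: Complexity, contextTokens: number): string {
  // Assumption: a large context overrides the classification,
  // because only the 1M-window model can hold it.
  if (contextTokens > LARGE_CONTEXT_TOKENS) {
    return "claude-sonnet-4-6";
  }
  return MODEL_BY_COMPLEXITY[complexity];
}
```

Under these assumptions, a "complex" task in a 600K-token conversation would go to Sonnet rather than Opus, since the context would not fit otherwise.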

Retrieval-Augmented Compression

The core principle (from the prompt-compression literature): the model never loses access to information — it just pays less to carry it by default.

Compressed or dropped content is replaced by a summary plus a recover_context("id") hint. The original is kept in an in-memory store; if the model needs the detail, it calls the recover_context tool to pull back the full text. Because nothing is permanently discarded, compression can be aggressive with little risk to quality.
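To make the stub-and-recover idea concrete, here is a small sketch of an in-memory store. The class name `ContextStore`, its methods, the `ctx-N` id scheme, and the stub wording are all assumptions for illustration; the real `store.ts` may differ.

```typescript
// Hypothetical in-memory store backing a recover_context-style tool.
class ContextStore {
  private readonly originals = new Map<string, string>();
  private nextId = 1;

  // Keeps the original text and returns the stub that replaces it in the prompt.
  stash(original: string, summary: string): string {
    const id = `ctx-${this.nextId++}`;
    this.originals.set(id, original);
    return `${summary}\n[compressed: call recover_context("${id}") for the full text]`;
  }

  // What the tool handler would return when the model asks for the detail.
  recover(id: string): string {
    return this.originals.get(id) ?? `No stored content for id "${id}"`;
  }
}
```

The model only ever sees the stub unless it explicitly asks for more, which is what keeps the default prompt small.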

Pipeline

| Stage | Technique | Safety |
|---|---|---|
| Tool output (structural) | Log folding, n-gram dedup, JSON tabularize, cross-turn delta | Lossless / near-lossless |
| BM25 relevance | Score old messages vs current query | N/A |
| Haiku summarization | Summarize old messages preserving load-bearing facts, cached by hash | Lossy but recoverable |
| Retrieval drop | Replace low-relevance content with stub + recover hint | Recoverable |
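The BM25 relevance stage scores each old message against the current query. The sketch below implements the standard Okapi BM25 formula with the common defaults `k1 = 1.2` and `b = 0.75`; the tokenizer, function names, and parameter values are assumptions and may not match `bm25.ts`.

```typescript
// Naive tokenizer (assumption): lowercase alphanumeric runs.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9_]+/g) ?? [];
}

// Returns one BM25 score per document; higher means more relevant to the query.
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const avgLen =
    tokenized.reduce((sum, d) => sum + d.length, 0) / Math.max(docs.length, 1);
  const queryTerms = Array.from(new Set(tokenize(query)));

  // Document frequency: in how many documents each query term occurs.
  const df = new Map<string, number>();
  for (const term of queryTerms) {
    df.set(term, tokenized.filter((d) => d.includes(term)).length);
  }

  return tokenized.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      const n = df.get(term) ?? 0;
      const idf = Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
      // Length normalization: long documents need more hits for the same score.
      const norm = tf + k1 * (1 - b + (b * doc.length) / avgLen);
      score += (idf * tf * (k1 + 1)) / norm;
    }
    return score;
  });
}
```

Messages with low scores would be the candidates for summarization or a retrieval drop; messages sharing rare terms with the query stay intact.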

Cache-aware (critical)

Anthropic/Kiro use prompt caching keyed by prefix. Compression that rewrites the context differently each turn would break the cache and increase cost.

Two protections:

Runtime cache detection — the extension inspects the last assistant message's cacheRead/cacheWrite. If the provider is actively caching, lossy compression of the prefix is disabled (only safe structural compression of new tool output runs). No cache break, ever.
Stable/monotonic compression: when cache is off, once a message is compressed the identical compressed form is reused on every subsequent turn. The prefix therefore changes at most once per message and then stays stable, so a cache that becomes active later rebuilds once instead of breaking on every turn.

Note: at the time of writing, the Kiro provider reports cacheRead: 0 / cacheWrite: 0 across sessions — caching is effectively off, so compression is pure savings (the full context is re-billed every turn with no cache to break). The cache-detection path future-proofs the extension for when Kiro enables caching.
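Both protections can be sketched in a few lines. This is an illustration under stated assumptions, not the extension's code: the `usage` field layout on the assistant message is hypothetical (the README only names `cacheRead` and `cacheWrite`), and `StableCompressor` is an invented name for a memo keyed by the original text.

```typescript
// Hypothetical message shape; only cacheRead/cacheWrite are named in the README.
interface CacheUsage {
  cacheRead?: number;
  cacheWrite?: number;
}
interface AssistantMessage {
  role: "assistant";
  usage?: CacheUsage;
}

// Protection 1: treat any non-zero cache counter as "provider is caching".
function providerIsCaching(lastAssistant: AssistantMessage | undefined): boolean {
  const usage = lastAssistant?.usage;
  return (usage?.cacheRead ?? 0) > 0 || (usage?.cacheWrite ?? 0) > 0;
}

// Protection 2: the same original always maps to the same compressed form.
class StableCompressor {
  private readonly memo = new Map<string, string>();
  private readonly compress: (text: string) => string;

  constructor(compress: (text: string) => string) {
    this.compress = compress;
  }

  apply(original: string): string {
    const cached = this.memo.get(original);
    if (cached !== undefined) return cached;
    const result = this.compress(original);
    this.memo.set(original, result);
    return result;
  }
}
```

The memo matters most when the compressor is non-deterministic, as an LLM summarizer is: without it, each turn could produce a slightly different summary and change the prompt prefix again.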

Aggressive quality gate

Last 4 turns never compressed (active working set)
Compression only applied if it saves >15%
Haiku summary only used if it beats the original by >15%; otherwise falls back to a recoverable stub
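The three rules above combine into a single decision per message. The sketch below is one possible reading: `gate`, `savings`, and the constants are hypothetical names, savings are assumed to be measured in characters, and `turnsFromEnd` is assumed to be 0 for the most recent turn.

```typescript
const PROTECTED_TURNS = 4; // last 4 turns are never compressed
const MIN_SAVINGS = 0.15;  // a candidate must save more than 15%

// Fraction of characters saved by replacing `original` with `candidate`.
function savings(original: string, candidate: string): number {
  return original.length === 0 ? 0 : 1 - candidate.length / original.length;
}

// Returns the text that should appear in the prompt for this message.
function gate(
  original: string,
  turnsFromEnd: number,
  summary: string | null,
  stub: string,
): string {
  if (turnsFromEnd < PROTECTED_TURNS) return original; // active working set
  if (summary !== null && savings(original, summary) > MIN_SAVINGS) return summary;
  if (savings(original, stub) > MIN_SAVINGS) return stub; // recoverable fallback
  return original; // not worth compressing
}
```

With a 100-character original, a 90-character summary saves only 10% and is rejected, so the gate falls back to the stub if the stub clears the threshold.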

Commands

/smart-context — Stats: chars saved, avg ratio, Haiku calls/cache hits, recoverable items

Architecture

src/
├── index.ts                      # Hooks + recover_context tool
├── router.ts                     # Haiku-based complexity classification
└── compression/
    ├── pipeline.ts               # Orchestrates stages, cache-stable, retrieval-augmented
    ├── store.ts                  # Content store for recover_context
    ├── haiku-summarize.ts        # Haiku summarizer with hash cache
    ├── types.ts
    └── stages/
        ├── bm25.ts               # BM25 relevance scoring
        ├── dedup.ts              # N-gram line deduplication
        ├── log-fold.ts           # Log error extraction + folding
        ├── json-compact.ts       # JSON array tabularization
        └── delta.ts              # Cross-turn delta compression
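As an example of what a structural stage such as json-compact.ts might do, the sketch below turns a JSON array of same-shaped objects into a header line plus one row per object, so the keys are written once instead of once per element. The function name, the tab-separated output format, and the "return unchanged on any doubt" policy are assumptions.

```typescript
type Row = Record<string, unknown>;

function isPlainObject(value: unknown): value is Row {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a tab-separated table, or the input unchanged if it does not qualify.
function tabularize(jsonText: string): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(jsonText);
  } catch {
    return jsonText; // not JSON: leave untouched
  }
  if (!Array.isArray(parsed) || parsed.length < 2) return jsonText;
  if (!parsed.every(isPlainObject)) return jsonText;

  const rows = parsed as Row[];
  const columns = Object.keys(rows[0]);
  // Only tabularize when every object has exactly the same keys.
  const sameShape = rows.every((row) => {
    const keys = Object.keys(row);
    return keys.length === columns.length && columns.every((c) => c in row);
  });
  if (!sameShape) return jsonText;

  // Cells stay JSON-encoded, so strings, numbers, and nulls remain distinguishable.
  const lines = rows.map((row) =>
    columns.map((c) => JSON.stringify(row[c])).join("\t"),
  );
  return [columns.join("\t"), ...lines].join("\n");
}
```

Arrays with mixed shapes are returned as-is, which keeps the stage on the conservative side at the cost of missing some savings.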

Usage

cd arvore-pi-extensions && pnpm install
cd packages/smart-context && pnpm build

Add to your Pi packages. The extension hooks into before_agent_start (routing), context (compression), and tool_result (structural tool-output compression).
