pi-sift

v0.4.1

Published

17 hours ago

Model-scored compression of large tool results for Pi Coding Agent

eengad

pi-package pi pi-coding-agent extension context summarization

A Pi Coding Agent extension that prevents large and unnecessary tool results from polluting the context. The model scores large results for relevance and replaces low-value content with concise summaries, optionally preserving critical line ranges verbatim (keepLines).

How it works

When a tool result exceeds a size threshold, pi-sift injects a scoring instruction as a separate user message asking the model to decide: keep or summarize.
On summarize, the model can specify keepLines — line ranges to preserve verbatim while compressing the rest.
Before each API call, the context hook replaces scored content with the summary + kept lines.
Heuristic dismiss auto-removes stale reads when files are re-read or edited, but preserves summarize+keepLines decisions.
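The replacement step can be sketched as follows. This is a minimal illustration, not pi-sift's actual code: the `ScoringDecision` shape, the `applyDecision` helper, the 1-indexed inclusive line ranges, and the `[summary]` / `[lines a-b]` markers are all assumptions made for the example.

```typescript
// Hypothetical decision shape; pi-sift's real format may differ.
type LineRange = [number, number]; // assumed 1-indexed, inclusive

interface ScoringDecision {
  action: "keep" | "summarize";
  summary?: string;
  keepLines?: LineRange[];
}

// Replace a large tool result with its summary plus any verbatim line ranges.
function applyDecision(content: string, decision: ScoringDecision): string {
  // "keep" leaves the tool result untouched.
  if (decision.action === "keep") return content;

  const lines = content.split("\n");
  const parts: string[] = [`[summary] ${decision.summary ?? ""}`];

  for (const [start, end] of decision.keepLines ?? []) {
    // Clamp each range to the lines that actually exist.
    const from = Math.max(1, start);
    const to = Math.min(lines.length, end);
    if (from > to) continue; // skip empty or out-of-bounds ranges
    parts.push(`[lines ${from}-${to}]`);
    parts.push(...lines.slice(from - 1, to));
  }
  return parts.join("\n");
}
```

With this sketch, a four-line result scored as `summarize` with `keepLines: [[2, 3]]` collapses to the summary followed by lines 2 and 3 verbatim, while everything else is dropped from the context.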

Install

pi install pi-sift

Or from source:

pi install https://github.com/eengad/pi-sift

Benchmark

An A/B benchmark script is included for evaluating pi-sift on SWE-ReBench tasks. See A/B benchmark below for usage. Early results with Claude Opus 4.6 show token reductions of 17–59% on tasks where the model makes scoring decisions, though single-run variance is high and more data is needed.

Local development

npm install
npm run build
npm test

A/B benchmark

Run baseline vs extension on SWE-ReBench tasks with Docker verification:

npm run benchmark:swe-pipeline-ab

Override defaults with env vars:

PI_BENCH_TASKS=0,1,2 \
PI_BENCH_CONFIGS=extension \
PI_BENCH_KEEP_WORKDIR=1 \
npm run benchmark:swe-pipeline-ab

Analyse session logs after a run:

npm run analyse-session -- /tmp/tmp.XXX/task_0/extension_run1/sessions/*.jsonl

Model compatibility

Claude Opus 4.6 — works well. The model follows scoring instructions reliably and uses keepLines effectively.
OpenAI Codex 5.3 (xhigh thinking) — partially works. The model sees the scoring instruction (confirmed via debug logging) but only follows it ~33% of the time, skipping scoring and emitting tool calls instead. When it does follow, it produces valid summarize decisions. Tasks still resolve but with higher token usage than Opus.

Known issues

Streaming flash of `<context_lens>` blocks

During streaming, `<context_lens>` blocks are briefly visible in the TUI before message_end strips them. Fixing this in message_update is unsafe for two reasons: the pi agent may rebuild message content from the stream buffer on each update (undoing mutations), and stripping before message_end would remove blocks before decision parsing runs. The issue is cosmetic only and disappears when streaming completes.
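The ordering constraint (parse first, strip second) can be illustrated with a small sketch. It assumes blocks are delimited by paired `<context_lens>…</context_lens>` tags, which the text above does not confirm, and `extractAndStrip` is a hypothetical helper rather than part of pi-sift or the Pi API.

```typescript
// Assumption: blocks are wrapped in paired <context_lens>...</context_lens> tags.
const CONTEXT_LENS_BLOCK = /<context_lens>([\s\S]*?)<\/context_lens>\s*/g;

// Collect block bodies for decision parsing, then return the text without them.
// Doing both in one pass guarantees nothing is stripped before it is captured.
function extractAndStrip(text: string): { visible: string; blocks: string[] } {
  const blocks: string[] = [];
  const visible = text.replace(CONTEXT_LENS_BLOCK, (_match: string, body: string) => {
    blocks.push(body.trim());
    return "";
  });
  return { visible, blocks };
}
```

Running something like this once, when the message is complete, avoids both problems described above: there is no partial block to mis-parse mid-stream, and the captured bodies are available to the decision parser before the visible text is rewritten.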
