npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@monotykamary/pi-vcc

v0.6.0

Published

Algorithmic conversation compactor for pi - transcript-preserving structured summaries, no LLM calls

Readme

🗜️ pi-vcc

Algorithmic conversation compactor for pi

No LLM calls — 35-99% token reduction via extraction and formatting. Same input = same output, always.

pi extension license


Inspired by VCC (View-oriented Conversation Compiler).

Demo

pi-vcc demo

Why pi-vcc

| | Pi default | pi-vcc | |---|---|---| | Method | LLM-generated summary | Algorithmic extraction, no LLM | | Determinism | Non-deterministic, can hallucinate | Same input = same output, always | | Token reduction | Varies | 35-99% on real sessions (higher on longer sessions) | | Compaction latency | Waits for LLM call | 30-470ms, no API calls | | History after compaction | Gone — agent only sees summary | Active lineage searchable via vcc_recall (scope:"all" available) | | Repeated compactions | Each rewrite risks losing more | Sections merge and accumulate | | Cost | Burns tokens on summarization call | Zero — no API calls | | Structure | Free-form prose | Brief transcript + 7 semantic sections + priority tags + metadata footer | | Code awareness | None (summarizes text only) | Symbol-annotated files, type catalog, deep error extraction |

Real session metrics

Measured on real session JSONLs under ~/.pi/agent/sessions (chars = rendered message text).

| Session | Messages | Before | After | Reduction | Time | |---|---|---|---|---|---| | Session A | 2,943 | 997,162 | 7,959 | 99.2% | 64ms | | Session B | 1,703 | 428,334 | 7,762 | 98.2% | 29ms | | Session C | 1,657 | 424,183 | 9,577 | 97.7% | 54ms | | Session D | 1,004 | 2,258,477 | 4,439 | 99.8% | 30ms | | Session E | 486 | 295,006 | 11,163 | 96.2% | 30ms | | Session F | 46 | 5,234 | 3,364 | 35.7% | 5ms | | Session G | 27 | 8,595 | 2,489 | 71.0% | 2ms |

Compaction Deep Dive

pi-vcc is one of four compaction approaches in the AI coding-agent ecosystem. Here is how they compare.

Pi Default Harness

Based in @earendil-works/pi-coding-agent/dist/core/compaction/compaction.js

Architecture: LLM-based structured summarization via a summarization model.

Flow:

  1. shouldCompact() — checks if contextTokens > contextWindow - reserveTokens (16k)
  2. prepareCompaction() — walks branch entries, finds previous compaction boundary, calculates cut point by walking newest→oldest accumulating estimated message sizes until hitting keepRecentTokens (20k default)
  3. compact()generateSummary() — serializes conversation to plain text (not LLM messages, to prevent the model from continuing it), calls LLM with structured summarization prompt
  4. Two prompt variants: initial SUMMARIZATION_PROMPT (first time) or UPDATE_SUMMARIZATION_PROMPT (merges into existing summary)
  5. Output format: ## Goal / ## Constraints & Preferences / ## Progress / ## Key Decisions / ## Next Steps / ## Critical Context
  6. Detects mid-turn splits — when the cut falls mid-turn, generates a separate turn prefix summary in parallel and merges both
  7. Tracks file operations (read/write/edit from tool calls) and appends <read-files> / <modified-files> XML tags to each summary

Key characteristics:

  • Pure LLM — every compaction costs a model call
  • Token-budget backwalk keeps a configurable tail (20k recent tokens)
  • Turn-aware: isSplitTurn preserves incomplete assistant turns
  • Previous-summary merging via update prompt (incremental)
  • Non-deterministic — different runs produce different summaries

Claude Code

Based in claude-code/src/services/compact/

Architecture: Three-tier compaction — proactive/manual (LLM), session memory (LLM-free), and micro-compaction (cache-editing).

Flow (Main Compaction — compactConversation()) :

  1. shouldAutoCompact()getAutoCompactThreshold() = context window minus reserved output minus buffer (13k)
  2. PreCompact hooks execute (SDK extensions can inject custom instructions)
  3. getCompactPrompt() builds a prompt with a NO_TOOLS_PREAMBLE, a detailed 9-section template, and a trailer rejecting tool calls
  4. streamCompactSummary() first tries a cache-sharing fork path (piggybacks on the main thread's prompt-cache prefix with a forked agent), then falls back to a direct streaming path with only FileReadTool + ToolSearchTool
  5. Strips images/documents from messages before sending to the compact API (replaces with [image] / [document] markers)
  6. PTL (Prompt Too Long) retry: truncateHeadForPTLRetry() drops oldest API-round groups and retries (up to 3)
  7. After summary generation: creates post-compact file attachments (re-reads recently accessed files), plan attachments, skill attachments, delta tool announcements
  8. Executes SessionStart hooks and PostCompact hooks
  9. Returns CompactionResult { boundaryMarker, summaryMessages, attachments, hookResults, messagesToKeep }

Flow (Session Memory Compact — trySessionMemoryCompaction()) :

  1. Feature-gated: tengu_session_memory + tengu_sm_compact flags
  2. Waits for in-progress session memory extraction to finish
  3. calculateMessagesToKeepIndex() starts from lastSummarizedMessageId, expands backwards to meet minTokens (10k) and minTextBlockMessages (5), capped at maxTokens (40k)
  4. adjustIndexToPreserveAPIInvariants() ensures tool_use/tool_result pairs are not split (handles streaming message fragmentation)
  5. No LLM call — uses already-extracted session memory content as the summary
  6. Truncates oversized sections via truncateSessionMemoryForCompact()
  7. Falls back to legacy compact if session memory is empty or boundary can't be found

Flow (Micro Compact — microcompactMessages()) :

  1. Time-based trigger: if the gap since the last main-loop assistant message exceeds the threshold (cold server cache), content-clear old tool results to shrink what gets rewritten
  2. Cached microcompact (experimental, CACHED_MICROCOMPACT feature): tracks tool results per message, queues cache_edits blocks for the API layer — removes tool results from the server-side cached prompt without mutating local messages and without invalidating the cached prefix
  3. Legacy microcompact (content-clear) fully replaced by the cache-editing approach

Key characteristics:

  • Three compaction tiers: full LLM / session memory (LLM-free) / micro (cache-edit only)
  • Cache-aware: cache-sharing fork path, cache-editing microcompact, PTL retry
  • Heavy hook system: 3 hook sets (PreCompact → SessionStart → PostCompact)
  • File restoration: re-attaches recently read files post-compact
  • Circuit breaker: 3 consecutive failures stops retrying
  • Partial compact: supports up_to (summarize before, keep prefix) / from (summarize after, keep suffix) directions
  • Analytics: tengu_compact events with full token breakdowns, analyzeContext() walks every content block

Codex (OpenAI)

Based in codex/codex-rs/core/src/compact.rs, compact_remote.rs, compact_remote_v2.rs

Architecture: Rust-based, three concurrent compaction paths — inline (local LLM), remote (server-side), and remote v2 (streaming).

Flow:

  1. Decision: should_use_remote_compact_task() checks whether the provider supports remote compaction
  2. Three parallel implementations:

Inline (local) Path (compact.rs):

  1. Pre-hooks → LLM call with a compact prompt → Post-hooks
  2. Uses ContextCompactionItem — a first-class protocol item embedded in conversation history (not a hack)
  3. COMPACT_USER_MESSAGE_MAX_TOKENS = 20k token cap
  4. InitialContextInjection controls when system context is re-injected:
    • DoNotInject — for pre-turn/manual compaction (next regular turn handles reinjection)
    • BeforeLastUserMessage — for mid-turn compaction (injects above the last real user message)
  5. Summarization prompt (from templates/compact/prompt.md):
    • "Context checkpoint compaction" handoff summary
    • Key sections: progress, decisions, constraints, remaining work
  6. Summary prefix (templates/compact/summary_prefix.md): "Another language model started to solve this problem..."
  7. trim_function_call_history_to_fit_context_window() — truncates oversized call histories before compact
  8. Event-driven: emits TurnStarted, stream events, TurnCompleted
  9. Backoff retry via codex_util::backoff

Remote Path (compact_remote.rs):

  1. Delegates compaction to the codex-backend server via the Responses API Compact endpoint
  2. Server-side compaction uses OpenAI's own compact infrastructure
  3. Client sends history, server returns a CompactedItem
  4. process_compacted_history() replaces conversation items with the compacted version
  5. Same hook system (PreCompact → PostCompact) and analytics tracking
  6. Logs request/response data via build_compact_request_log_data()

Remote v2 Path (compact_remote_v2.rs):

  1. Uses Responses API streaming compact — same endpoint as v1 but leverages the existing ModelClientSession for streaming
  2. Feature-gated: Feature::RemoteCompactionV2 (under development, disabled by default)
  3. Reuses process_compacted_history() and trim_function_call_history_to_fit_context_window()
  4. Rollout-trace aware: CompactionCheckpointTracePayload for end-to-end observability

Key characteristics:

  • Three parallel compaction implementations: inline / remote / remote-v2
  • Server-side compaction can delegate to OpenAI's backend (token savings on the client)
  • Rust async with cancellation tokens throughout
  • ContextCompactionItem is a first-class protocol type, not a synthetic message
  • Fine-grained InitialContextInjection control over system context reinjection
  • Event-driven architecture: full turn lifecycle for compaction (start → stream → complete → error)
  • CompactionAnalyticsAttempt tracks every phase, status, and implementation

Comparison Summary

| Aspect | Pi Default | pi-vcc | Claude Code | Codex | |--------|-----------|--------|-------------|-------| | Language | TypeScript (compiled) | TypeScript (extension) | TypeScript (source) | Rust | | LLM dependency | Always required | None | Optional (session memory bypass) | Always (inline) / server-offloaded | | Cut strategy | Token-budget backwalk (20k recent) | Keep last user message | Min tokens (10k) + min text messages (5) | Context window trim | | Summary format | Markdown structured sections ## Goal etc. | Bracket-tagged sections [Session Goal] + [Anchors] + [Earlier Turns] | <analysis> scratchpad + 9-section <summary> | Markdown handoff | | Merge with prev | Update prompt (LLM merges) | Header-by-header deterministic dedup | Via session memory (LLM-free) or prompt | Replaces (no merge) | | File tracking | <read-files> / <modified-files> XML tags | [Files And Changes] with symbol annotations | Post-compact file re-attachment (re-reads recent files) | Via server (server-managed) | | Turn splitting | Yes (isSplitTurn with parallel prefix summary) | Task-boundary-aware (pushes back on mid-flight turns) | Via preservedSegment metadata | Via InitialContextInjection | | Cache awareness | None | Section ordering (stable first for prompt cache) | Cache-sharing fork path, cache-editing microcompact, PTL retry | Server-side cache (remote path) | | Hook system | 2 hooks (session_before_compact, session_compact) | 5 hooks (session_before_compact, session_compact, agent_end, model_select, session_start) | 3 hooks (PreCompact, SessionStart, PostCompact) | 2 hooks (PreCompact, PostCompact) | | Micro compaction | None | None | Yes (cache-editing + time-based content clear) | None | | Partial compact | None | None | Yes (up_to / from directions) | None | | Error handling | Basic | Orphan recovery, resolution detection ([RESOLVED] tag) | PTL retry (3x), circuit breaker (3 failures) | Backoff retry | | Token estimation | chars/4 heuristic | chars/4 heuristic | roughTokenCountEstimation + 4/3 padding | approx_token_count | | Determinism | Non-deterministic (LLM) | Deterministic (no LLM) | Non-deterministic (LLM) / deterministic (SM) | Non-deterministic (LLM) / deterministic (server) | | Latency | LLM call time | 2–64ms | LLM call time (or instant with SM/micro) | LLM call time (or server-offloaded) | | Cost | Per-compact LLM tokens | Zero | Per-compact LLM tokens or zero (SM/micro) | Per-compact LLM tokens or server-side | | Debugging | Basic | /tmp/pi-vcc-debug.json snapshots | logForDebugging, analytics events | Rollout trace, compaction analytics |

Features

  • No LLM — purely algorithmic, zero extra API cost
  • Brief transcript — chronological conversation flow, each tool call collapsed to a one-liner with (#N) refs, text truncated to keep it compact
  • 8 semantic sections — session goal, files & changes, type catalog, commits, outstanding context, earlier turns, anchors, user preferences
  • Bounded merge — rolling sections re-capped after merge instead of growing unbounded
  • Lossless recallvcc_recall reads raw session JSONL, so active-lineage history stays searchable across compactions
  • Scoped recall — default search is active lineage; use scope:"all" for all lineages, or scope:"compaction:N" / scope:"compaction:latest" to search within a specific compaction segment's original messages
  • Priority error tags — outstanding context items tagged [ERROR], [WARN], [INFO], [RESOLVED] for urgency at a glance
  • Metadata footer — each compaction summary ends with timestamp, compression ratio, and message range
  • Cache-friendly ordering — stable sections (goal, preferences, files, commits, anchors) come first; volatile sections (outstanding context, earlier turns, current status) come last, maximizing prompt-cacheable prefix across compactions
  • Adaptive recall view — search results grouped by conversation segments (turns) with match indicators (>) and context preservation, so the agent sees the conversational structure around each match
  • Regex searchvcc_recall supports regex patterns (hook|inject, fail.*build) and OR-ranked multi-word queries
  • Result ranking — search results ranked by BM25 term relevance, rare terms weighted higher than common ones
  • /pi-vcc-recall — slash command to search history directly, results shown as collapsible message and auto-fed to agent as context
  • Fallback cut — still works when Pi core returns nothing to summarize
  • /pi-vcc — manual compaction on demand
  • Multi-resolution transcript — three-zone brief: [Earlier Turns] (one-liner per conversational turn, heaviest compression), brief transcript (tool calls collapsed, medium compression), and the kept tail (uncompressed). Eliminates the information cliff where older turns vanish entirely.
  • Error resolution detection — tsc errors in [Outstanding Context] are tagged [RESOLVED] when the file they reference was subsequently edited, letting the model skip stale errors.
  • Task-boundary-aware cut — compaction splits at complete conversational turns, not mid-tool-call. If the assistant's response is in-flight (unmatched tool calls), the cut pushes back to keep the whole turn in the tail.
  • Structured anchors[Anchors] section lists commit hashes, error IDs, and key file paths for zero-tool-call recall. The model can find references at a glance instead of calling vcc_recall.
  • Per-model and global compaction thresholds — configure different reserveTokens or compactPercent per model and globally, so models with different context windows compact at the right time. Proactive triggering on agent_end and model_select events compacts earlier for small-context models. Applies to both pi-vcc and pi-core compaction.

Install

pi install npm:@monotykamary/pi-vcc

Or install from GitHub:

pi install https://github.com/monotykamary/pi-vcc@tom

Or try without installing:

pi -e https://github.com/monotykamary/pi-vcc@tom

Usage

Once installed, pi-vcc registers a session_before_compact hook.

  • Run /pi-vcc to trigger pi-vcc compaction manually.
  • By default, pi-vcc handles all compaction paths (/compact, auto-threshold, /pi-vcc). Set overrideDefaultCompaction: false in the config to fall back to pi core's LLM-based compaction for /compact and auto-threshold.
  • To search older active-lineage history after compaction, use vcc_recall.
  • To intentionally search across all lineages, pass scope:"all" to vcc_recall or run /pi-vcc-recall <query> scope:all.
  • To search and feed results to agent yourself, run /pi-vcc-recall <query> [page:N].
    • Tip: type /recall and Pi will autocomplete to /pi-vcc-recall.

How compaction works

Pi splits the conversation at the last user message. Everything after — the kept tail — stays intact and untouched. pi-vcc only summarizes the older portion before that cut point.

Compacted message structure

[Session Goal]
- Fix the authentication bug in login flow
- [Scope change]
- Also update the session token refresh logic

[Files And Changes]
- Modified: src/auth/session.ts (refreshToken, verifyToken, Session)
- Read: src/types.ts (User, AuthPayload)
- Created: tests/auth-refresh.test.ts

[Type Catalog]
- src/auth/session.ts [modified]:
  export function refreshToken(token: string): Promise<Session>
  export function verifyToken(token: string): Promise<User>
  export interface Session {
- src/types.ts [read]:
  export interface User {
  export type AuthPayload = {

[Commits]
- a1b2c3d: fix(auth): refresh token after password reset

[Anchors]
- commits: a1b2c3d
- errors: TS2304
- files: src/auth/session.ts, src/types.ts, tests/auth-refresh.test.ts

[Outstanding Context]
- [RESOLVED] [tsc] src/session.ts(5,18): error TS2304: Cannot find name 'authenticateUser'
- [ERROR] [bash:exit 1] bun test tests/auth.test.ts → 3 tests failed
- [WARN]  [tests] FAIL auth.test.ts > refresh token should work
- [INFO]  [no matches] grep "verifyCredentials"

[Earlier Turns]
- Set up the project structure → read package.json, tsconfig.json
- Install auth dependencies → ran bun add, edited package.json
- Configure the test runner → edited bunfig.toml, ran bun test

[Current Status]
- Working on: fix the auth bug, users can't log in after password reset
- Last action: Edit "src/auth/session.ts"
- Next: need to add the refreshToken function signature

---

[user]
Fix the auth bug, users can't log in after password reset

[assistant]
Root cause is a missing token refresh after password reset...
* Read "src/auth/session.ts" (#3)
* Read "src/types.ts" (#5)
* Edit "src/auth/session.ts" (#7)
* bash "bun test tests/auth.test.ts" (#9)
...(28 earlier lines omitted)

---

---
Compaction at 2026-05-18T14:32:00Z — 47 msgs → 23k tok (12x) | tail: 3 msgs ~5.2k tok (range: [#0, #43])

Use `vcc_recall` to search for prior work, decisions, and context from before this summary.
Do not redo work already completed.

Sections appear only when relevant — a session with no git commits won't have [Commits].

Sections:

| Section | Description | |---|---| | [Session Goal] | Initial goal + scope changes (regex-based extraction) | | [Files And Changes] | Modified/created/read files from tool calls, annotated with exported symbol names (capped, paths trimmed to common root) | | [Type Catalog] | Exported signature lines from modified and read files — the public API surface the model needs for continuation | | [Commits] | Git commits made during the session (last 8, hash + first line) | | [Anchors] | Structured reference points — commit hashes, error IDs, key file paths — for zero-tool-call recall | | [Outstanding Context] | Unresolved items — error exit codes, test failures, tsc errors, empty search results, pending questions — tagged [ERROR]/[WARN]/[INFO]/[RESOLVED] by severity | | [Earlier Turns] | Per-turn one-liner summaries for every conversational turn — heaviest compression layer covering turns that would otherwise fall off the brief transcript | | [Current Status] | Current focus, last file-modifying action, and next steps — extracted from the conversation tail | | [User Preferences] | Regex-extracted from user messages (always, never, prefer...) | | Brief transcript | Chronological conversation flow — rolling window of ~120 recent lines, tool calls collapsed to one-liners with (#N) refs |

Merge policy:

  • Session Goal, User Preferences: concise sticky sections
  • Session Goal, User Preferences, Earlier Turns: sticky sections that accumulate across compactions (capped)
  • Outstanding Context, Type Catalog, Current Status, Anchors: volatile (replaced each compaction)
  • Files And Changes, Commits: unique union across compactions
  • Brief transcript: rolling window, older lines drop off

Deep error extraction

[Outstanding Context] goes beyond keyword matching. It captures:

| Signal | Format | Example | |---|---|---| | Bash non-zero exit code | [bash:exit N] | [bash:exit 1] npm test → 3 tests failed | | TypeScript compiler error | [tsc] | [tsc] src/auth.ts(12,5): error TS2322: Type 'string' is not... | | Test failure | [tests] | [tests] FAIL auth.test.ts > login should work | | Empty grep/glob | [no matches] | [no matches] Grep "verifyCredentials" | | Tool error result | [tool] | [bash] Command not found | | Blocker text | [user] or plain | [user] The build is still failing with... |

Items tagged [RESOLVED] when the file they reference was subsequently edited — the model can skip them:

- [RESOLVED] [tsc] src/auth.ts(5,18): error TS2304: Cannot find name 'authenticateUser'
- [ERROR] [bash:exit 1] bun test tests/api.test.ts → 2 tests failed

All items are deduplicated — the same error won't appear twice.

Symbol-level file annotations

[Files And Changes] annotates file paths with exported symbol names extracted from tool call arguments and results:

- Modified: src/auth.ts (login, verifyToken, Session)
- Read: src/types.ts (User, AuthPayload)

Supported languages: TypeScript/JavaScript (export function/class/type/interface), Python (def/class), Go (func, exported only), Rust (pub fn/struct/enum/trait).

Type catalog

[Type Catalog] captures the exact exported signature lines from modified and read files. This gives the compacted model the type signatures it needs to continue coding — without re-reading files.

Modified files appear first, read files second. Entries are capped at 8 signatures per file and 12 files total.

Recall (Lossless History)

Pi's default compaction discards old messages permanently. After compaction, the agent only sees the summary.

vcc_recall bypasses this by reading the raw session JSONL file directly. By default it searches only the active conversation lineage, regardless of how many compactions have happened. Use scope:"all" only when you intentionally want to include off-lineage branches.

Adaptive View (Structure-Preserving Search Results)

Search results are grouped by conversation segments (turns) instead of showing flat ranked entries. Each segment starts at a user or bash message and includes all subsequent assistant responses, tool calls, and tool results.

Matched entries are marked with >, non-matched entries within the same segment are shown for context:

vcc_recall({ query: "auth bug" })

Returns:

Found 4 matches for "auth bug" — 2 matches across 1 segment

--- #12-#17 (2/6 entries match) ---
> #12 [user] I found an auth bug in the login flow
  #13 [assistant] Let me check the auth module...
  #14 [tool_call] Read src/auth.ts
  #15 [tool_result] export function login...
> #16 [assistant] The bug is in refreshToken
  #17 [tool_result] Edit src/auth.ts (success)

When matches span multiple segments, adjacent non-matching turns are shown with a (context) tag:

Found 3 matches for "cache" — 2 matches across 2 segments

--- #5-#8 (1/4 entries match) ---
  #5 [user] add caching to the API layer
  #6 [assistant] I'll set up Redis...
  #7 [tool_call] Edit src/cache.ts
> #8 [tool_result] Redis connected successfully

--- #20-#23 (1/4 entries match) ---
> #20 [user] the cache eviction policy is wrong
  #21 [assistant] Let me check the TTL config...
  #22 [tool_call] Read src/cache.ts
  #23 [tool_result] export const TTL = 3600

--- #9-#19 (context) ---
  #9 [user] also fix the error handling
  #10 [assistant] Added try/catch around cache calls

This format preserves the conversational structure around matches, so the agent can understand where in the conversation flow each match occurred and what context surrounds it.

Search

Queries support regex and multi-word OR logic ranked by relevance:

vcc_recall({ query: "auth token" })                                    // active-lineage OR search, ranked
vcc_recall({ query: "auth token", page: 2 })                           // paginated (5 results/page)
vcc_recall({ query: "hook|inject" })                                    // regex pattern
vcc_recall({ query: "fail.*build" })                                    // regex pattern
vcc_recall({ query: "auth token", scope: "all" })                      // search all lineages
vcc_recall({ query: "race condition", scope: "compaction:2" })         // search within compaction #2's segment
vcc_recall({ query: "design rationale", scope: "compaction:latest" })  // search most recent compaction segment

Compaction-scoped search targets only the original messages that were summarized by that compaction cycle. This lets you drill into specific conversation segments without sifting through unrelated chat.

Manual slash command:

/pi-vcc-recall auth token scope:all
/pi-vcc-recall race condition scope:compaction:latest

Browse

Without a query, returns the last 25 entries as brief summaries:

vcc_recall()
vcc_recall({ scope: "all" })  // browse recent entries across all lineages

Expand

Returns full untruncated content for specific indices found via search:

vcc_recall({ expand: [41, 42] })                 // active-lineage expand
vcc_recall({ expand: [41, 42], scope: "all" })   // expand across all lineages

Typical workflow: search → find relevant entry indices → expand those indices for full content.

Some tool results are truncated by Pi core at save time. expand returns everything in the JSONL but can't recover what Pi already cut.

Performance

pi-vcc processes 3.7 MB sessions (2,600 messages, 3,000 blocks) in ~31 ms — no LLM calls, no I/O waits beyond reading the session JSONL. Below are the optimizations that got us there.

Pipeline profile (3.7 MB session)

| Stage | Time | % of total | |---|---|---| | normalize | 4 ms | 13% | | filterNoise | <1 ms | <1% | | buildToolResultIndex | <1 ms | <1% | | extractFileAndSymbolData | 23 ms | 74% | | Other extractors | <1 ms | <1% | | buildBriefSections | 1 ms | 4% | | formatSummary + merge | ~2 ms | 7% | | Total | ~31 ms | |

Optimizations

Catastrophic backtracking fix (C_FUNC_RE)

The C/C++ function-declaration regex used a repeated group (?:\w+(?:\s*[*&]+\s*)?)+ that triggered exponential backtracking on long non-matching identifiers (e.g. createAssistantMessageEventStream). A single pathological line took 1.2 s; a session with many such lines could stall compaction for seconds.

Replaced with a lazy-quantifier pattern \w[\w:*&\s]*? and a negative lookahead to skip Go func lines. The same line now takes <0.1 ms — a >1000× speedup. This was the root cause of the original "slow compaction" report on 170k-token sessions.

Unified symbol extraction (extractFileAndSymbolData)

Previously, three independent extractors (extractFiles, extractSymbolChanges, extractTypeCatalog) each scanned the same tool results with overlapping regex patterns — a triple-redundant parse. The unified extractFileAndSymbolData() in shared-symbols.ts does it once and feeds all three consumers from a single pass.

Also added ToolResultIndex and buildToolResultIndex() to pre-compute the tool_call → tool_result look-ahead map once, shared across all extractors instead of each scanning forward independently.

DECL_SCREEN_RE pre-filter

Each line was tested against a 15-regex cascade to find declaration names. ~60% of lines in a real session are body code, comments, or blank — none can match, yet every line ran all 15 tests.

DECL_SCREEN_RE is a single anchored regex that rejects non-declaration lines in one test. Matching lines then fall through to the full cascade. Measured at 2.6× faster for the parseDeclName stage.

eachLine() generator replaces split().slice()

extractSymbolsFromText used text.split("\n").slice(0, N) to read the first N lines — allocating a full temporary string array every call. Over 600+ tool results in a large session, this added up to ~18 ms of allocation overhead.

Replaced with an eachLine() generator using indexOf("\n") + slice() — zero intermediate array allocation. Produces identical iteration behavior.

Set-based dedup replaces Array.includes()

Symbol dedup used Array.includes() on value arrays that grew to 200+ entries per file — O(n) per check. A parallel Map<string, Set<string>> makes dedup O(1). Measured at 5.7× faster for dedup operations.

Intl.Segmenter → regex word split

brief.ts used Intl.Segmenter for token-aware truncation, which allocated granular objects per word. Replaced with \p{L}[\p{L}\p{N}]*|\p{N}+ regex — identical output, ~2× faster, zero object allocation.

convertToLlm() elimination

The before-compact hook called convertToLlm() to transform messages into an LLM message format before processing. Since pi-vcc processes messages algorithmically via normalize() (which already handles user, assistant, toolResult, and bashExecution directly), this conversion was both lossy (flattened bash command/output/exitCode into plain text) and wasteful. Removed entirely.

Missing read in FILE_READ_TOOLS

pi's built-in file-read tool uses the lowercase read tool name, but FILE_READ_TOOLS only contained Read. All read operations were invisible to file-activity and symbol extraction — a correctness fix, not strictly a performance fix, but it meant the symbol extractor was silently skipping data it should have processed.

Summary

| Optimization | Impact | |---|---| | C_FUNC_RE backtracking fix | 1.2 s → <0.1 ms per line (>1000×) | | Unified symbol extraction | 3× fewer redundant scans | | DECL_SCREEN_RE pre-filter | 2.6× faster parseDeclName | | eachLine() generator | ~18 ms saved on large sessions | | Set-based dedup | 5.7× faster symbol dedup | | Regex word split | 2× faster token truncation | | convertToLlm() removal | Eliminated redundant message conversion |

Pipeline

  1. Normalize — raw Pi messages → uniform blocks (user, assistant, tool_call, tool_result, thinking, bash)
  2. Filter noise — strip system messages, empty blocks, noise tools (TodoWrite, etc.)
  3. Build sections — extract goal, file paths + symbols, type catalog, blockers (exit codes, tsc, tests, empty grep), preferences
  4. Brief transcript — chronological conversation flow, tool calls collapsed to one-liners, text truncated
  5. Format — render into bracketed sections + transcript, with cache-friendly ordering (stable sections first, volatile last)
  6. Merge — if previous summary exists: sticky sections merge, volatile sections replace, transcript rolls
  7. Footer — append timestamp, compression ratio, message range, and recall note

Config

Config lives at ~/.pi/agent/pi-vcc-config.json (auto-scaffolded on first load with safe defaults):

{
  "overrideDefaultCompaction": true,
  "debug": false,
  "modelThresholds": {
    "neuralwatt/zai-org/GLM-5.1-FP8": { "reserveTokens": 32768 },
    "neuralwatt/moonshotai/Kimi-K2.6": { "compactPercent": 65 },
    "neuralwatt/neuralwatt/glm-5.1-long": { "compactPercent": 80 }
  },
  "globalThreshold": { "compactPercent": 70 }
}
  • overrideDefaultCompaction (default true): when true (default), pi-vcc handles all compaction paths (/compact, auto-threshold, /pi-vcc). Set false to let pi core handle /compact and auto-threshold compactions via its default LLM-based compaction.
  • debug (default false): when true, each compaction writes detailed info to /tmp/pi-vcc-debug.json — message counts, cut boundary, summary preview, sections.
  • modelThresholds (default: none): per-model compaction thresholds. Keys match against "provider/modelId" (e.g., "neuralwatt/zai-org/GLM-5.1-FP8") or just "modelId" (e.g., "GLM-5.1" — matched only when provider/modelId doesn't). Each value has:
    • reserveTokens: tokens to reserve for the LLM response. Overrides pi-core's global compaction.reserveTokens for matching models. Controls when compaction triggers: contextTokens > contextWindow − reserveTokens. A higher value compacts earlier (more conservative); a lower value lets context grow larger. Takes precedence over compactPercent when both are set.
    • compactPercent: compaction trigger as a percentage of context window (1–99). Compaction fires when contextTokens > contextWindow × compactPercent / 100. E.g. 65 means "compact when context is 65% full". Ignored when reserveTokens is also set.
    • keepRecentTokens (optional): advisory token budget for pi-core's default compaction. Pi-vcc's own buildOwnCut uses task-boundary heuristics, so this only affects pi-core's cut when overrideDefaultCompaction is false.
  • globalThreshold (default: none): global threshold applied to all models not matched by modelThresholds. Uses compactPercent or reserveTokens (compactPercent is easier — e.g. 65 means "compact at 65% full"). If omitted, pi-core's global compaction.reserveTokens applies (no override).
  • defaultThreshold (default: none, deprecated): use globalThreshold instead. Backward compatible — still works.

How compaction thresholds work

Pi-core's auto-compaction triggers when contextTokens > contextWindow − reserveTokens. The global reserveTokens (default 16384) is one-size-fits-all — but different models have very different context windows and cost profiles.

Pi-vcc's thresholds provide proactive compaction at both the per-model and global level:

| Direction | How it works | |---|---| | Compact earlier (model needs compaction sooner) | agent_end and model_select proactively trigger compaction when context exceeds the model's threshold but hasn't hit the global threshold yet. The globalThreshold also proactively triggers for unmatched models. |

Previously, a "compact later" direction was implemented by cancelling compaction in session_before_compact when context was below the per-model threshold. This guard was removed because session_before_compact carries no reason field — manual /compact and auto-compaction are indistinguishable (both have customInstructions: undefined), so the guard was blocking explicit user compaction requests.

The proactive trigger handles the "compact earlier" direction. If pi-core's global threshold fires before the per-model threshold is crossed, the compaction proceeds — slightly premature from the per-model threshold's perspective, but preferable to blocking an explicit user action.

Key matching order: exact "provider/modelId""modelId"globalThreshold → pi-core's global setting.

Explicit /pi-vcc commands bypass threshold checks — if you ask for compaction, you get it.

Related Work

  • VCC — the original transcript-preserving conversation compiler
  • Pi — the AI coding agent this extension is built for
  • DeepSeek-V4 — hybrid attention architecture that directly inspired pi-vcc's multi-resolution transcript, resolution detection, task-boundary cut, and anchors
  • Mastra — Observational Memory patterns that inspired Current Status, priority error tags, cache-friendly ordering, and compaction-scoped recall
  • Claude Code — three-tier compaction architecture (LLM / session memory / micro-compact) that influenced cache-friendly ordering and compaction-scoped recall design
  • Codex (OpenAI) — Rust-based three-path compaction that inspired the handoff preamble and first-class structured compaction output
  • VCC Paper — adaptive view concept that inspired structure-preserving search results and thinking content surfacing in recall

Inspirations & Attribution

This fork builds on the upstream sting8k/pi-vcc with novel features inspired by five external projects. Below is a comprehensive mapping of each inspiration source to the features it produced.

DeepSeek-V4 — Hybrid Attention Architecture

Inspired by DeepSeek-V4's CSA/HCA/SWA attention architecture, Lightning Indexer, Attention Sink, Quick Instruction, and contextual parallelism.

| DeepSeek-V4 Technique | pi-vcc Equivalent | Shared Principle | |---|---|---| | CSA (light compression, m=4) | [Files And Changes], [Type Catalog] | Medium-fidelity: keeps structure but drops full content | | HCA (heavy compression, m'=128) | [Earlier Turns] | Heaviest compression: one-liner per conversational turn | | Sliding Window Attention (n_win=128) | Brief transcript rolling window + [Current Status] | Uncompressed recent context for local fidelity | | Lightning Indexer (top-k sparse selection) | vcc_recall (BM25 + regex) | Selective, not exhaustive, access to compressed memory | | Attention Sink (near-zero on stale entries) | [RESOLVED] tag on fixed errors | Let the consumer gracefully ignore stale compressed context | | On-disk KV cache (prefix reuse) | Raw JSONL recall via vcc_recall | Lossless cold store alongside compressed hot context | | Quick Instruction (cache reuse for aux tasks) | [Anchors] (zero-tool-call recall) | Self-serve lookups from already-present context | | Contextual Parallelism (boundary alignment) | Task-boundary-aware cut | Compression segments align to meaningful units, not arbitrary positions | | Hybrid precision (BF16+FP8, cache-aligned) | Cache-friendly section ordering | Stable prefix survives across compactions for prompt caching | | Interleaved thinking preservation | Brief transcript preservation | Discarding intermediate reasoning forces reconstruction from scratch |

Features delivered:

  1. Multi-resolution transcript — three-zone brief: [Earlier Turns] (one-liner/turn), brief transcript (medium compression), kept tail (uncompressed)
  2. Error resolution detection — tsc errors tagged [RESOLVED] when the file they reference was subsequently edited
  3. Task-boundary-aware cut — cut point detects mid-flight turns and pushes back to keep the whole turn in the tail
  4. Structured anchors[Anchors] section with commit hashes, error IDs, key file paths for zero-tool-call recall

Mastra — Observational Memory

Inspired by Mastra's OM patterns — treating compaction output as a structured observation layer rather than free-form prose.

| Mastra OM Pattern | pi-vcc Equivalent | |---|---| | Observational memory summary | [Current Status] section — auto-extracted focus, last action, next steps | | Priority-tagged observations | [ERROR]/[WARN]/[INFO]/[RESOLVED] tags on Outstanding Context | | Stable-first observation ordering | Cache-friendly section ordering (stable sections first, volatile last) | | Per-observation metadata | Timestamp + compression-ratio metadata footer | | Scoped observation retrieval | Compaction-scoped vcc_recall (scope:'compaction:N') |

Claude Code — Three-Tier Compaction

Claude Code's three-tier architecture (full LLM, session memory = LLM-free, micro-compact = cache-editing) influenced pragmatic design choices:

| Claude Code Technique | pi-vcc Influence | |---|---| | Cache-sharing fork path | Cache-friendly section ordering — stable prefix survives across compactions for prompt cache hits | | lastSummarizedMessageId boundary tracking | Compaction-scoped recall — scope:'compaction:N' drills into specific segments | | Session memory (deterministic, LLM-free) | Validates the zero-LLM approach; pi-vcc achieves similar determinism via extraction instead of a separate memory pipeline |

Codex (OpenAI) — Three-Path Compaction

Codex's Rust-based compaction (inline/remote/remote-v2) inspired higher-level design decisions:

| Codex Technique | pi-vcc Influence | |---|---| | summary_prefix.md continuation directive | Handoff preamble — continuation directive prepended to every compaction summary | | ContextCompactionItem as first-class protocol type | Structured bracket-tagged sections act as a first-class compaction artifact, not free-form prose | | InitialContextInjection control over system context | Task-boundary-aware cut ensures meaningful boundaries, not arbitrary splits |

VCC Paper — Adaptive View

The original VCC paper (arxiv.org/abs/2603.29678) introduced the adaptive view concept — preserving conversation structure and role tags in search projections.

| VCC Paper Concept | pi-vcc Equivalent | |---|---| | Adaptive view (structure-preserving projection) | Structure-preserving search results — grouped by conversation segments (turns) with > match indicators | | Role tag preservation | Thinking content surfacing — thinkingOf() extracts model reasoning for recall display and search indexing |

Original Novel Work

Features with no external inspiration — original engineering contributions unique to this fork:

  1. Deep error extraction — captures bash exit codes [bash:exit N], tsc errors [tsc], test failures [tests], empty grep/glob [no matches] with structured tags and dedup
  2. Symbol-annotated files — file paths annotated with exported symbol names extracted from tool call arguments and results
  3. Type catalog[Type Catalog] section with exact exported signature lines from modified/read files
  4. Multi-language symbol extraction — Rust, Java, C/C++, Zig, Ruby, Elixir symbol detection with language-specific regex patterns
  5. Performance optimization suite — catastrophic backtracking fix (>1000×), unified extraction (3×), DECL_SCREEN_RE pre-filter (2.6×), eachLine() generator, Set-based dedup (5.7×), Intl.Segmenter replacement (2×), convertToLlm() elimination
  6. Entry-ID-based message range — stores entry IDs instead of branch-relative indices for correct cross-branch resolution
  7. Neuralwatt-MCR interop — signals compaction override so MCR models don't discard pi-vcc's summary
  8. Supply-chain hardening — pinned deps, npm-shrinkwrap, audit fixes

License

MIT