@phoenixaihub/compaction-guard
v0.1.0
CompactionGuard
Testing framework that verifies safety-critical instructions survive context window compaction.
The pytest for context compaction safety. Define instruction invariants. Simulate compaction. Assert preservation.
The Problem
Every major AI agent framework compacts context windows when they get too long. None of them verify that safety-critical instructions survive the compaction.
Incidents and context:
- Meta's AI Safety Director lost 200+ emails when compaction removed her "confirm before acting" directive
- Five Eyes government guidance explicitly calls for instruction preservation in agentic AI
- Claude Code, Codex CLI, OpenCode — all handle compaction differently, none verify preservation
CompactionGuard fills this gap. Define what MUST survive. Test it. Catch failures before production.
Quick Start
# Initialize with sample invariants
npx @phoenixaihub/compaction-guard init
# Run tests
npx @phoenixaihub/compaction-guard test
How It Works
1. Define Invariants
Create invariants.yaml with instructions that must survive compaction:
invariants:
  - id: safety-confirm
    instruction: "Always ask for confirmation before deleting files"
    severity: critical
    match: semantic
    threshold: 0.85
  - id: no-external-requests
    instruction: "Never make HTTP requests to external services without approval"
    severity: critical
    match: semantic
    threshold: 0.90
  - id: output-format
    instruction: "Always respond in JSON format when the user requests structured output"
    severity: warning
    match: regex
    pattern: "respond.*JSON|JSON.*format|output.*JSON"
2. Run Tests
compaction-guard test
Output:
╔══════════════════════════════════════════════════════════╗
║ CompactionGuard Test Report ║
╚══════════════════════════════════════════════════════════╝
File: invariants.yaml
Invariants: 5
Strategies: 6
✅ Strategy: token-budget (100% pass rate)
  ✓ 🔴 [safety-confirm] score=0.912 threshold=0.85
  ✓ 🔴 [no-external-requests] score=0.945 threshold=0.90
❌ Strategy: truncate-front (20% pass rate)
  ✗ 🔴 [safety-confirm] score=0.022 threshold=0.85
    Lost at 90% context size
Total: 30 | Passed: 22 | Failed: 8 | Critical failures: 4
Result: ❌ FAILED (critical instructions lost)
3. Integrate in CI
compaction-guard ci --file invariants.yaml
# Exit code 0 = all critical invariants preserved
# Exit code 1 = critical instruction loss detected
CLI Commands
| Command | Description |
|---------|-------------|
| compaction-guard init | Create sample invariants.yaml |
| compaction-guard test | Run all invariant tests |
| compaction-guard test --file custom.yaml | Test specific file |
| compaction-guard report --format json | Generate report (JSON/JUnit/SARIF) |
| compaction-guard simulate | Interactive compaction simulation |
| compaction-guard ci | CI mode (exit 0/1, minimal output) |
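The exit-code contract of the `ci` command can be illustrated with a minimal sketch. The `InvariantResult` shape and the `ciExitCode` helper are hypothetical names chosen for this example, not part of the package's documented API. The only assumption taken from the text above is that a lost critical invariant yields exit code 1 and everything else yields 0.

```typescript
// Hypothetical result shape; CompactionGuard's real report fields may differ.
type Severity = "critical" | "warning";

interface InvariantResult {
  invariantId: string;
  strategy: string;
  severity: Severity;
  passed: boolean;
}

// Returns 1 only when a critical invariant was lost; warning-level failures stay at 0.
function ciExitCode(results: InvariantResult[]): 0 | 1 {
  const criticalLost = results.some((r) => !r.passed && r.severity === "critical");
  return criticalLost ? 1 : 0;
}

const exampleResults: InvariantResult[] = [
  { invariantId: "safety-confirm", strategy: "truncate-front", severity: "critical", passed: false },
  { invariantId: "output-format", strategy: "truncate-front", severity: "warning", passed: false },
];

// 1, because the critical "safety-confirm" invariant failed.
const exitCode = ciExitCode(exampleResults);
```

Keeping warning-level failures out of the exit code lets a pipeline block merges on critical losses while still surfacing softer regressions in the report.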
Compaction Strategies
CompactionGuard tests your invariants against 6 compaction strategies:
| Strategy | Description |
|----------|-------------|
| truncate-front | Remove tokens from the beginning |
| truncate-back | Remove tokens from the end |
| truncate-middle | Keep front and back, remove middle |
| sliding-window | Keep most recent N tokens |
| summary-based | Replace middle with summary marker |
| token-budget | Smart selection based on instruction importance |
Each strategy is tested at multiple context sizes: 90%, 75%, 50%, 25%, 10%.
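To make the truncation strategies concrete, here is a minimal sketch of four of them operating on a token array. It is an illustration under stated assumptions, not the package's implementation: `tokenize` and `compactTokens` are hypothetical helpers, tokens are whitespace-separated words, and `sliding-window` is treated as equivalent to `truncate-front` for a single compaction pass. The `summary-based` and `token-budget` strategies are omitted because the README does not describe their mechanics.

```typescript
// Hypothetical sketch; names and tokenization are assumptions for illustration.
type SimpleStrategy = "truncate-front" | "truncate-back" | "truncate-middle" | "sliding-window";

// Naive whitespace tokenizer; a real tool may count model tokens instead.
function tokenize(text: string): string[] {
  return text.split(/\s+/).filter((t) => t.length > 0);
}

// Keep roughly `percent`% of the tokens according to the chosen strategy.
function compactTokens(tokens: string[], strategy: SimpleStrategy, percent: number): string[] {
  const keep = Math.max(0, Math.min(tokens.length, Math.floor((tokens.length * percent) / 100)));
  switch (strategy) {
    case "truncate-front":
    case "sliding-window":
      // Drop the oldest tokens and keep the most recent `keep`.
      return tokens.slice(tokens.length - keep);
    case "truncate-back":
      // Keep the first `keep` tokens and drop the rest.
      return tokens.slice(0, keep);
    case "truncate-middle": {
      // Split the budget between the head and the tail of the context.
      const head = Math.ceil(keep / 2);
      const tail = keep - head;
      return tokens.slice(0, head).concat(tokens.slice(tokens.length - tail));
    }
  }
}

// The size levels listed above, as percentages of the original context.
const SIZE_LEVELS = [90, 75, 50, 25, 10];

const demoTokens = tokenize("a b c d e f g h i j");
const frontCut = compactTokens(demoTokens, "truncate-front", 50).join(" "); // "f g h i j"
const middleCut = compactTokens(demoTokens, "truncate-middle", 50).join(" "); // "a b c i j"
```

An instruction placed at the start of the context survives `truncate-back` but is the first thing lost under `truncate-front`, which is why testing the same invariant under several strategies is informative.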
Match Types
| Type | Description | Use When |
|------|-------------|----------|
| exact | Substring match | Instruction must appear verbatim |
| regex | Pattern match | Flexible keyword matching |
| semantic | TF-IDF cosine similarity | Meaning preservation (default) |
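The three match types can be sketched as a single scoring function. This is a minimal illustration, not the package's scorer: `Invariant`, `scoreInvariant`, `tfidfCosine`, and `passes` are hypothetical names, the IDF uses a smoothed formula over a two-document corpus, and the semantic score is taken as the best match over sentences of the compacted text. All three choices are assumptions made for this example.

```typescript
// Hypothetical shapes and helpers; the real scorer may differ in every detail.
type MatchType = "exact" | "regex" | "semantic";

interface Invariant {
  id: string;
  instruction: string;
  match: MatchType;
  pattern?: string;
  threshold?: number;
}

// Lowercased alphanumeric word tokens.
function words(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Term counts in a prototype-free object so words like "constructor" are safe keys.
function termCounts(tokens: string[]): Record<string, number> {
  const counts: Record<string, number> = Object.create(null);
  for (const t of tokens) counts[t] = (counts[t] || 0) + 1;
  return counts;
}

// Cosine similarity of TF-IDF vectors, treating `a` and `b` as a two-document corpus.
function tfidfCosine(a: string, b: string): number {
  const ta = termCounts(words(a));
  const tb = termCounts(words(b));
  const vocab = Object.keys(ta).concat(Object.keys(tb).filter((t) => !(t in ta)));
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const term of vocab) {
    const df = (term in ta ? 1 : 0) + (term in tb ? 1 : 0);
    const idf = Math.log(3 / (1 + df)) + 1; // smoothed IDF with N = 2 documents
    const wa = (ta[term] || 0) * idf;
    const wb = (tb[term] || 0) * idf;
    dot += wa * wb;
    normA += wa * wa;
    normB += wb * wb;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Score in [0, 1]: exact and regex are all-or-nothing, semantic is graded.
function scoreInvariant(inv: Invariant, compacted: string): number {
  switch (inv.match) {
    case "exact":
      return compacted.indexOf(inv.instruction) !== -1 ? 1 : 0;
    case "regex":
      return inv.pattern !== undefined && new RegExp(inv.pattern).test(compacted) ? 1 : 0;
    case "semantic": {
      // Assumption: compare against each sentence and keep the best score.
      let best = 0;
      for (const sentence of compacted.split(/[.!?\n]+/)) {
        best = Math.max(best, tfidfCosine(inv.instruction, sentence));
      }
      return best;
    }
  }
}

// Assumption: a missing threshold means the score must be a full 1.
function passes(inv: Invariant, compacted: string): boolean {
  return scoreInvariant(inv, compacted) >= (inv.threshold === undefined ? 1 : inv.threshold);
}
```

Because TF-IDF cosine similarity measures shared vocabulary rather than meaning, a summary that rewords an instruction with entirely different terms would score low under this sketch even if the intent is preserved.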
Report Formats
# JSON (default for report command)
compaction-guard report --format json
# JUnit XML (CI integration)
compaction-guard report --format junit -o report.xml
# SARIF (GitHub Code Scanning)
compaction-guard report --format sarif -o report.sarif
# Human-readable text
compaction-guard test --format text
Architecture
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│ Invariant YAML  │────▶│   Compaction     │────▶│  Preservation   │
│    Parser       │     │   Simulator      │     │    Scorer       │
└─────────────────┘     │  (6 strategies)  │     │  (TF-IDF/exact/ │
                        │  (5 size levels) │     │   regex)        │
                        └──────────────────┘     └────────┬────────┘
                                                          │
                                                 ┌────────▼────────┐
                                                 │    Reporter     │
                                                 │  (JSON/JUnit/   │
                                                 │   SARIF/text)   │
                                                 └─────────────────┘
Key design decisions:
- Zero LLM dependency — TF-IDF cosine similarity for semantic matching
- Zero external API calls — Everything runs locally
- Framework-agnostic — Test any compaction strategy
- CI-native — JUnit XML, SARIF, exit codes
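The parser, simulator, scorer, and reporter stages can be sketched as a small test matrix. This is a sketch under assumptions, not the package's internals: all type and function names are hypothetical, each (strategy, invariant) pair is counted as one test, and a pair fails at the largest context size where its score drops below the threshold. That reading is one plausible interpretation of the sample report, where 5 invariants and 6 strategies give a total of 30. The compactor and scorer are passed in as functions, which reflects the framework-agnostic design goal.

```typescript
// Hypothetical types for illustration; not CompactionGuard's actual API.
type Severity = "critical" | "warning";

interface Invariant {
  id: string;
  instruction: string;
  severity: Severity;
  threshold: number;
}

interface PairResult {
  strategy: string;
  invariantId: string;
  severity: Severity;
  passed: boolean;
  lostAt: number | null; // largest context size (%) at which the invariant was lost
}

type Compactor = (context: string, strategy: string, percent: number) => string;
type Scorer = (invariant: Invariant, compacted: string) => number;

// One test per (strategy, invariant) pair, checked at each size from largest to smallest.
function runMatrix(
  context: string,
  invariants: Invariant[],
  strategies: string[],
  sizes: number[],
  compact: Compactor,
  score: Scorer
): PairResult[] {
  const results: PairResult[] = [];
  for (const strategy of strategies) {
    for (const inv of invariants) {
      let lostAt: number | null = null;
      for (const percent of sizes) {
        if (score(inv, compact(context, strategy, percent)) < inv.threshold) {
          lostAt = percent;
          break;
        }
      }
      results.push({ strategy, invariantId: inv.id, severity: inv.severity, passed: lostAt === null, lostAt });
    }
  }
  return results;
}

// Totals in the style of the report footer.
function summarize(results: PairResult[]) {
  const failed = results.filter((r) => !r.passed);
  return {
    total: results.length,
    passed: results.length - failed.length,
    failed: failed.length,
    criticalFailures: failed.filter((r) => r.severity === "critical").length,
  };
}

// Stand-in compactor (character-based) and scorer (substring check) for the demo.
const charCompactor: Compactor = (context, strategy, percent) => {
  const keep = Math.floor((context.length * percent) / 100);
  return strategy === "truncate-front" ? context.slice(context.length - keep) : context.slice(0, keep);
};
const substringScorer: Scorer = (inv, compacted) => (compacted.indexOf(inv.instruction) !== -1 ? 1 : 0);

// 100-character context with the instruction at the very front.
const demoContext = "CONFIRM" + new Array(94).join("x");
const demoResults = runMatrix(
  demoContext,
  [{ id: "safety-confirm", instruction: "CONFIRM", severity: "critical", threshold: 1 }],
  ["truncate-front", "truncate-back"],
  [90, 75, 50, 25, 10],
  charCompactor,
  substringScorer
);
const demoSummary = summarize(demoResults);
```

In this demo the instruction is lost under `truncate-front` at the 90% level and survives `truncate-back` at every level, so the summary reports one pass and one critical failure.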
Programmatic API
import { runTests, formatReport, compact, scorePreservation } from '@phoenixaihub/compaction-guard';
// Run full test suite
const report = runTests({ file: 'invariants.yaml' });
console.log(formatReport(report, 'json'));
// Test individual compaction
const result = compact(myContext, 'token-budget', 50);
// Score preservation
const score = scorePreservation(myInvariant, result.compactedText);
Why Not Just Use an LLM?
- Deterministic — Same input always produces same result
- Fast — Milliseconds, not seconds
- Free — No API costs
- Offline — Works without internet
- CI-friendly — No API keys in CI environment
Comparison
| Feature | CompactionGuard | Manual Testing | LLM-based Check |
|---------|:-:|:-:|:-:|
| Automated | ✅ | ❌ | ✅ |
| Deterministic | ✅ | ❌ | ❌ |
| CI Integration | ✅ | ❌ | ⚠️ |
| Zero API Cost | ✅ | ✅ | ❌ |
| Multiple Strategies | ✅ | ❌ | ❌ |
| SARIF/JUnit Output | ✅ | ❌ | ❌ |
| Offline | ✅ | ✅ | ❌ |
Roadmap
- [ ] Framework adapters (Claude Code, LangChain, OpenClaw)
- [ ] Embedding-based semantic matching (sentence-transformers)
- [ ] Custom compaction strategy plugins
- [ ] GitHub Action
- [ ] VS Code extension
Contributing
See CONTRIBUTING.md for guidelines.
License
MIT — see LICENSE for details.
