Compakt Plugin
Compakt is an OpenClaw plugin that provides efficient context compaction for local LLMs by chunking conversation history and summarizing each chunk. It builds on the original jasper-context-compactor project, extending it with the new CompactionProvider API and proper message removal handling.
Model Selection Guide
| Context Window | Peak VRAM | Best For |
|----------------|-----------|----------|
| 32K (recommended) | ~8.6 GB | Preserving detailed context, long conversations |
| 8K | ~7.5 GB | Memory-constrained GPUs (12GB), general conversation continuity |
| 4K+ | Lower | Only if VRAM is extremely tight, will lose significant detail |
Trade-off: Smaller context windows save VRAM but lose fine-grained details. Use 32K if you need the model to recall specific detailed information.
Sizing Your Compaction Model
A good rule of thumb: the compaction model's num_ctx can be ~4x smaller than your main model's num_ctx without significant context loss.
| Main Model Context | Recommended Compaction Model |
|--------------------|------------------------------|
| 128K (e.g., glm-5.1:cloud) | 32K |
| 32K (default) | 8K |
| 16K | 4K |
| 8K | 2K (may lose detail) |
Why 4x works: The compaction model summarizes conversation history into compact summaries. It doesn't need to hold the entire context at once — it processes chunks and accumulates summaries. A 4x smaller context window still captures enough detail for effective summarization.
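As a rough illustration of the rule (the helper name and size list below are hypothetical, not part of the plugin):

```ts
// Illustrative helper (not part of Compakt): pick a compaction num_ctx
// roughly 4x smaller than the main model's num_ctx, rounded down to a
// common Ollama context size.
const COMMON_CTX_SIZES = [2048, 4096, 8192, 16384, 32768];

function suggestCompactionCtx(mainCtx: number): number {
  const target = Math.floor(mainCtx / 4);
  const fits = COMMON_CTX_SIZES.filter((size) => size <= target);
  return fits.length > 0 ? fits[fits.length - 1] : COMMON_CTX_SIZES[0];
}

console.log(suggestCompactionCtx(131072)); // 128K main model -> 32768
console.log(suggestCompactionCtx(32768));  // 32K main model  -> 8192
```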
Performance
Compakt dramatically reduces VRAM usage during context compaction:
| Metric | Before Compakt | With Compakt |
|--------|----------------|--------------|
| Peak VRAM | ~13.9 GB | ~8.6 GB |
| Compaction time | 6+ minutes | ~20 seconds |
| VRAM reduction | — | 38% |
The VRAM spike during Compakt compaction is essentially just the compaction model loading — chunked processing adds near-zero overhead.
Measured on RTX 4060 Ti (16GB VRAM)
| Component | VRAM |
|-----------|------|
| System + browser baseline | ~2.1 GB |
| Compaction model (qwen3.5:4b-32k) | ~6.5 GB |
| Compakt peak during compaction | ~8.6 GB |
Note: Peak VRAM varies based on your system baseline (browser tabs, other apps). Measurements above were taken with a ~2.1 GB baseline. Your baseline may be higher if you have more applications running, but the relative savings remain consistent.
Quick Start
1. Create a Compaction Model
Compakt needs a model configured for summarization. Create one with a sufficient context window:
```bash
# Example: Create a 32K context model from qwen3.5:4b
ollama create qwen3.5-compaction-32k -f - <<EOF
FROM qwen3.5:4b
PARAMETER num_ctx 32768
PARAMETER temperature 0.3
PARAMETER top_k 20
PARAMETER top_p 0.95
PARAMETER presence_penalty 1.5
EOF
```

Important: The num_ctx must match or exceed your chunkContextWindow config (default: 8192).
2. Install the Plugin
```bash
# Install from npm
npm install compakt

# Or install from ClawHub (recommended for OpenClaw users)
openclaw plugins install compakt
```

3. Configure
Edit ~/.openclaw/openclaw.json:
```json
{
  "plugins": {
    "entries": {
      "compakt": {
        "config": {
          "summaryModel": "ollama/qwen3.5-compaction-32k",
          "chunkContextWindow": 32768
        }
      }
    }
  }
}
```

Configuration
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| enabled | boolean | true | Enables or disables the plugin. |
| summaryModel | string | "" | Required. Model used for summarization (e.g., ollama/qwen3.5-compaction-32k). |
| ollamaBaseUrl | string | http://127.0.0.1:11434 | Ollama API URL. Can also set OLLAMA_BASE_URL env var (takes precedence). |
| summaryMaxTokens | number | 1000 | Maximum tokens for the summarizer output (100-32000). |
| charsPerToken | number | 4 | Characters per token for estimation. Use 4 for English, 2-3 for code. |
| chunkContextWindow | number | 8192 | Context window used for chunking. Must not exceed your compaction model's num_ctx. |
| chunkOverlap | number | 500 | Overlapping tokens between chunks for continuity. |
| logLevel | enum | info | Logging verbosity (debug, info, warn, error). |
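As a rough TypeScript view of the same options (an illustration of the table above, not the plugin's actual source):

```ts
// Illustrative typing of the Compakt config table (not the plugin's source code).
type LogLevel = "debug" | "info" | "warn" | "error";

interface CompaktConfig {
  enabled?: boolean;            // default: true
  summaryModel: string;         // required, e.g. "ollama/qwen3.5-compaction-32k"
  ollamaBaseUrl?: string;       // default: "http://127.0.0.1:11434"; OLLAMA_BASE_URL takes precedence
  summaryMaxTokens?: number;    // default: 1000 (100-32000)
  charsPerToken?: number;       // default: 4 (use 2-3 for code-heavy history)
  chunkContextWindow?: number;  // default: 8192; must not exceed the model's num_ctx
  chunkOverlap?: number;        // default: 500 tokens of overlap between chunks
  logLevel?: LogLevel;          // default: "info"
}
```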
Requirements
- OpenClaw gateway running locally
- Ollama instance at `http://127.0.0.1:11434` (configurable via `ollamaBaseUrl` or `OLLAMA_BASE_URL`)
- Compaction model available locally (created in Quick Start step 1, e.g. `qwen3.5-compaction-32k`)
Provider Support
Currently supports Ollama providers only. For other providers (Anthropic, OpenAI, etc.), Compakt logs a warning and skips compaction.
Context Window Matching
Critical: chunkContextWindow must be ≤ your compaction model's num_ctx.
| Model num_ctx | Recommended chunkContextWindow |
|---------------|--------------------------------|
| 8192 (8K) | 8192 |
| 16384 (16K) | 16384 |
| 32768 (32K) | 32768 |
If you set chunkContextWindow higher than num_ctx, summarization will fail.
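If you want to check the constraint programmatically, one option is to read the model's parameters from Ollama's /api/show endpoint and compare. This is a sketch, not part of Compakt; the response shape and request field names can vary across Ollama versions.

```ts
// Sketch: verify that chunkContextWindow does not exceed the model's num_ctx
// by reading the Modelfile parameters from Ollama's /api/show endpoint.
async function checkCtx(
  model: string,
  chunkContextWindow: number,
  baseUrl = "http://127.0.0.1:11434",
): Promise<number | undefined> {
  const res = await fetch(`${baseUrl}/api/show`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model }),
  });
  const info = (await res.json()) as { parameters?: string };
  // "parameters" is a newline-separated Modelfile dump, e.g. "num_ctx 32768".
  const match = info.parameters?.match(/num_ctx\s+(\d+)/);
  const numCtx = match ? Number(match[1]) : undefined;
  if (numCtx !== undefined && chunkContextWindow > numCtx) {
    console.warn(`chunkContextWindow (${chunkContextWindow}) exceeds num_ctx (${numCtx})`);
  }
  return numCtx;
}

checkCtx("qwen3.5-compaction-32k", 32768).then((n) => console.log("num_ctx:", n));
```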
Example Setups
8K Model (most common):
```json
{
  "summaryModel": "ollama/qwen3.5-compaction",
  "chunkContextWindow": 8192,
  "chunkOverlap": 500
}
```

32K Model (large contexts):

```json
{
  "summaryModel": "ollama/qwen3.5-compaction-32k",
  "chunkContextWindow": 32768,
  "chunkOverlap": 1000
}
```

Usage
The plugin registers a /context-stats command:
```
/user: /context-stats
assistant: ⚙️ Compakt stats:
- Model: ollama/qwen3.5-compaction-32k
- Estimated tokens (all messages): 1234
- Chunk count: 3
```

How It Works
- Token Estimation: Counts tokens using a character heuristic (chars ÷ charsPerToken)
- Chunking: Splits messages into chunks that fit within `chunkContextWindow` (see the sketch below)
- Overlap: Each chunk overlaps by `chunkOverlap` tokens for continuity
- Summarization: Each chunk is summarized by the configured model
- Fallback: If summarization fails, raw message snippets are preserved with a `[FALLBACK]` prefix
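The estimation and chunking steps can be pictured roughly as follows. This is a simplified sketch assuming the character heuristic and a greedy split with trailing-message overlap; names and constants are illustrative, not the plugin's actual source.

```ts
// Simplified sketch of Compakt-style estimation + chunking (illustrative only).
interface Message { role: string; content: string; }

const CHARS_PER_TOKEN = 4;   // charsPerToken config
const CHUNK_CTX = 8192;      // chunkContextWindow config
const CHUNK_OVERLAP = 500;   // chunkOverlap config

// 1. Token estimation: chars ÷ charsPerToken
const estimateTokens = (m: Message) => Math.ceil(m.content.length / CHARS_PER_TOKEN);

// 2-3. Chunking with overlap: greedily fill chunks up to CHUNK_CTX tokens,
// then carry trailing messages worth ~CHUNK_OVERLAP tokens into the next chunk.
function chunkMessages(messages: Message[]): Message[][] {
  const chunks: Message[][] = [];
  let current: Message[] = [];
  let tokens = 0;
  for (const msg of messages) {
    const t = estimateTokens(msg);
    if (tokens + t > CHUNK_CTX && current.length > 0) {
      chunks.push(current);
      // Seed the next chunk with trailing messages for continuity.
      const overlap: Message[] = [];
      let overlapTokens = 0;
      for (let i = current.length - 1; i >= 0 && overlapTokens < CHUNK_OVERLAP; i--) {
        overlap.unshift(current[i]);
        overlapTokens += estimateTokens(current[i]);
      }
      current = [...overlap];
      tokens = overlapTokens;
    }
    current.push(msg);
    tokens += t;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Each resulting chunk is then summarized by the configured model, and the summaries are accumulated into the compacted context.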
Troubleshooting
Compakt not activating
Symptom: Compaction uses the fallback LLM instead of Compakt.
Cause: The provider field must be in ~/.openclaw/openclaw.json, not config.yaml.
Fix: Add to openclaw.json:
```json
{
  "agents": {
    "defaults": {
      "compaction": {
        "provider": "compakt",
        "model": "ollama/qwen3.5-compaction-32k"
      }
    }
  }
}
```

Verify Compakt is active
Check logs for:
```
[Compakt.summarize] Called with summaryModel=ollama/qwen3.5-compaction-32k
[Compakt.summarize] Complete: outputTokens=X, compression=Y%
```

If you see these logs, Compakt is working. If not, check:

- `provider: "compakt"` in `openclaw.json` under `agents.defaults.compaction`
- `summaryModel` is set correctly
- Ollama is running and the model is installed
Compression shows negative percentage
This means the summary is larger than the input — expected when summarizing small contexts. Compakt still preserves context continuity via overlap.
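Assuming compression is reported as the relative reduction from input tokens to summary tokens (an assumption about the log format, not confirmed above), a quick calculation shows how it goes negative on small inputs:

```ts
// Assumption: compression % = (1 - outputTokens / inputTokens) * 100.
const compression = (inputTokens: number, outputTokens: number) =>
  (1 - outputTokens / inputTokens) * 100;

console.log(compression(1200, 300)); //  75  -> summary is much smaller than the input
console.log(compression(150, 220));  // ~-47 -> small input, summary ends up larger
```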
Attribution
Based on jasper-context-compactor by E.x.O. Entertainment Studios Inc. https://github.com/E-x-O-Entertainment-Studios-Inc/openclaw-context-compactor
License
MIT License. See LICENSE file for details.
Differences from Original
- Implements the new CompactionProvider API
- Handles message removal correctly
- Configurable chunking for models with smaller context windows
- `/context-stats` command for debugging
- AbortSignal support for cancellation
- `[FALLBACK]` prefix for degraded output visibility
