Compakt Plugin
Compakt is an OpenClaw plugin that provides efficient context compaction for local LLMs by chunking conversation history and summarizing each chunk. It builds on the original jasper-context-compactor project, extending it with the new CompactionProvider API and proper message removal handling.
Model Selection Guide
| Context Window | Peak VRAM | Best For |
|----------------|-----------|----------|
| 32K (recommended) | ~8.6 GB | Preserving detailed context, long conversations |
| 8K | ~7.5 GB | Memory-constrained GPUs (12GB), general conversation continuity |
| 4K+ | Lower | Only if VRAM is extremely tight, will lose significant detail |
Trade-off: Smaller context windows save VRAM but lose fine-grained details. Use 32K if you need the model to recall specific detailed information.
Sizing Your Compaction Model
A good rule of thumb: the compaction model's num_ctx can be ~4x smaller than your main model's num_ctx without significant context loss.
| Main Model Context | Recommended Compaction Model |
|--------------------|------------------------------|
| 128K (e.g., glm-5.1:cloud) | 32K |
| 32K (default) | 8K |
| 16K | 4K |
| 8K | 2K (may lose detail) |
Why 4x works: The compaction model summarizes conversation history into compact summaries. It doesn't need to hold the entire context at once — it processes chunks and accumulates summaries. A 4x smaller context window still captures enough detail for effective summarization.
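As a rough illustration of the rule (the helper name and size list below are hypothetical, not part of the plugin):

```ts
// Illustrative helper (not part of Compakt): pick a compaction num_ctx
// roughly 4x smaller than the main model's num_ctx, rounded down to a
// common Ollama context size.
const COMMON_CTX_SIZES = [2048, 4096, 8192, 16384, 32768];

function suggestCompactionCtx(mainCtx: number): number {
  const target = Math.floor(mainCtx / 4);
  const fits = COMMON_CTX_SIZES.filter((size) => size <= target);
  return fits.length > 0 ? fits[fits.length - 1] : COMMON_CTX_SIZES[0];
}

console.log(suggestCompactionCtx(131072)); // 128K main model -> 32768
console.log(suggestCompactionCtx(32768));  // 32K main model  -> 8192
```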
Performance
Compakt dramatically reduces VRAM usage during context compaction:
| Metric | Before Compakt | With Compakt |
|--------|----------------|--------------|
| Peak VRAM | ~13.9 GB | ~8.6 GB |
| Compaction time | 6+ minutes | ~20 seconds |
| VRAM reduction | — | 38% |
The VRAM spike during Compakt compaction is essentially just the compaction model loading — chunked processing adds near-zero overhead.
Measured on RTX 4060 Ti (16GB VRAM)
| Component | VRAM |
|-----------|------|
| System + browser baseline | ~2.1 GB |
| Compaction model (qwen3.5:4b-32k) | ~6.5 GB |
| Compakt peak during compaction | ~8.6 GB |
Note: Peak VRAM varies based on your system baseline (browser tabs, other apps). Measurements above were taken with a ~2.1 GB baseline. Your baseline may be higher if you have more applications running, but the relative savings remain consistent.
Quick Start
1. Create a Compaction Model
Compakt needs a model configured for summarization. Create one with a sufficient context window:
```bash
# Example: Create a 32K context model from qwen3.5:4b
ollama create qwen3.5-compaction-32k -f - <<EOF
FROM qwen3.5:4b
PARAMETER num_ctx 32768
PARAMETER temperature 0.3
PARAMETER top_k 20
PARAMETER top_p 0.95
PARAMETER presence_penalty 1.5
EOF
```

Important: The num_ctx must match or exceed your chunkContextWindow config (default: 8192).
2. Install the Plugin
```bash
# Install from npm
npm install compakt

# Or install from ClawHub (recommended for OpenClaw users)
openclaw plugins install compakt
```

3. Configure
Edit ~/.openclaw/openclaw.json:
```json
{
  "plugins": {
    "entries": {
      "compakt": {
        "config": {
          "summaryModel": "ollama/qwen3.5-compaction-32k",
          "chunkContextWindow": 32768
        }
      }
    }
  }
}
```

Configuration
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| enabled | boolean | true | Enables or disables the plugin. |
| summaryModel | string | "" | Required. Model used for summarization (e.g., ollama/qwen3.5-compaction-32k). |
| ollamaBaseUrl | string | http://127.0.0.1:11434 | Ollama API URL. Can also set OLLAMA_BASE_URL env var (takes precedence). |
| summaryMaxTokens | number | 1000 | Maximum tokens for the summarizer output (100-32000). |
| charsPerToken | number | 4 | Characters per token for estimation. Use 4 for English, 2-3 for code. |
| chunkContextWindow | number | 8192 | Context window used for chunking. Must not exceed your compaction model's num_ctx. |
| chunkOverlap | number | 500 | Overlapping tokens between chunks for continuity. |
| logLevel | enum | info | Logging verbosity (debug, info, warn, error). |
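As a rough TypeScript view of the same options (an illustration of the table above, not the plugin's actual source):

```ts
// Illustrative typing of the Compakt config table (not the plugin's source code).
type LogLevel = "debug" | "info" | "warn" | "error";

interface CompaktConfig {
  enabled?: boolean;            // default: true
  summaryModel: string;         // required, e.g. "ollama/qwen3.5-compaction-32k"
  ollamaBaseUrl?: string;       // default: "http://127.0.0.1:11434"; OLLAMA_BASE_URL takes precedence
  summaryMaxTokens?: number;    // default: 1000 (100-32000)
  charsPerToken?: number;       // default: 4 (use 2-3 for code-heavy history)
  chunkContextWindow?: number;  // default: 8192; must not exceed the model's num_ctx
  chunkOverlap?: number;        // default: 500 tokens of overlap between chunks
  logLevel?: LogLevel;          // default: "info"
}
```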
Requirements
- OpenClaw gateway running locally
- Ollama instance at `http://127.0.0.1:11434` (configurable via `ollamaBaseUrl` or `OLLAMA_BASE_URL`)
- Compaction model available locally (created in Quick Start step 1, e.g. `qwen3.5-compaction-32k`)
Provider Support
Currently supports Ollama providers only. For other providers (Anthropic, OpenAI, etc.), Compakt logs a warning and skips compaction.
Context Window Matching
Critical: chunkContextWindow must be ≤ your compaction model's num_ctx.
| Model num_ctx | Recommended chunkContextWindow |
|---------------|--------------------------------|
| 8192 (8K) | 8192 |
| 16384 (16K) | 16384 |
| 32768 (32K) | 32768 |
If you set chunkContextWindow higher than num_ctx, summarization will fail.
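If you want to check the constraint programmatically, one option is to read the model's parameters from Ollama's /api/show endpoint and compare. This is a sketch, not part of Compakt; the response shape and request field names can vary across Ollama versions.

```ts
// Sketch: verify that chunkContextWindow does not exceed the model's num_ctx
// by reading the Modelfile parameters from Ollama's /api/show endpoint.
async function checkCtx(
  model: string,
  chunkContextWindow: number,
  baseUrl = "http://127.0.0.1:11434",
): Promise<number | undefined> {
  const res = await fetch(`${baseUrl}/api/show`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model }),
  });
  const info = (await res.json()) as { parameters?: string };
  // "parameters" is a newline-separated Modelfile dump, e.g. "num_ctx 32768".
  const match = info.parameters?.match(/num_ctx\s+(\d+)/);
  const numCtx = match ? Number(match[1]) : undefined;
  if (numCtx !== undefined && chunkContextWindow > numCtx) {
    console.warn(`chunkContextWindow (${chunkContextWindow}) exceeds num_ctx (${numCtx})`);
  }
  return numCtx;
}

checkCtx("qwen3.5-compaction-32k", 32768).then((n) => console.log("num_ctx:", n));
```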
Example Setups
8K Model (most common):
```json
{
  "summaryModel": "ollama/qwen3.5-compaction",
  "chunkContextWindow": 8192,
  "chunkOverlap": 500
}
```

32K Model (large contexts):

```json
{
  "summaryModel": "ollama/qwen3.5-compaction-32k",
  "chunkContextWindow": 32768,
  "chunkOverlap": 1000
}
```

Usage
The plugin registers a /context-stats command:
```
/user: /context-stats
assistant: ⚙️ Compakt stats:
- Model: ollama/qwen3.5-compaction-32k
- Estimated tokens (all messages): 1234
- Chunk count: 3
```

How It Works
- Token Estimation: Counts tokens using a character heuristic (chars ÷ charsPerToken)
- Chunking: Splits messages into chunks that fit within `chunkContextWindow` (see the sketch below)
- Overlap: Each chunk overlaps by `chunkOverlap` tokens for continuity
- Summarization: Each chunk is summarized by the configured model
- Fallback: If summarization fails, raw message snippets are preserved with a `[FALLBACK]` prefix
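The estimation and chunking steps can be pictured roughly as follows. This is a simplified sketch assuming the character heuristic and a greedy split with trailing-message overlap; names and constants are illustrative, not the plugin's actual source.

```ts
// Simplified sketch of Compakt-style estimation + chunking (illustrative only).
interface Message { role: string; content: string; }

const CHARS_PER_TOKEN = 4;   // charsPerToken config
const CHUNK_CTX = 8192;      // chunkContextWindow config
const CHUNK_OVERLAP = 500;   // chunkOverlap config

// 1. Token estimation: chars ÷ charsPerToken
const estimateTokens = (m: Message) => Math.ceil(m.content.length / CHARS_PER_TOKEN);

// 2-3. Chunking with overlap: greedily fill chunks up to CHUNK_CTX tokens,
// then carry trailing messages worth ~CHUNK_OVERLAP tokens into the next chunk.
function chunkMessages(messages: Message[]): Message[][] {
  const chunks: Message[][] = [];
  let current: Message[] = [];
  let tokens = 0;
  for (const msg of messages) {
    const t = estimateTokens(msg);
    if (tokens + t > CHUNK_CTX && current.length > 0) {
      chunks.push(current);
      // Seed the next chunk with trailing messages for continuity.
      const overlap: Message[] = [];
      let overlapTokens = 0;
      for (let i = current.length - 1; i >= 0 && overlapTokens < CHUNK_OVERLAP; i--) {
        overlap.unshift(current[i]);
        overlapTokens += estimateTokens(current[i]);
      }
      current = [...overlap];
      tokens = overlapTokens;
    }
    current.push(msg);
    tokens += t;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Each resulting chunk is then summarized by the configured model, and the summaries are accumulated into the compacted context.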
Troubleshooting
Compakt not activating
Symptom: Compaction uses the fallback LLM instead of Compakt.
Cause: The provider field must be in ~/.openclaw/openclaw.json, not config.yaml.
Fix: Add to openclaw.json:
```json
{
  "agents": {
    "defaults": {
      "compaction": {
        "provider": "compakt",
        "model": "ollama/qwen3.5-compaction-32k"
      }
    }
  }
}
```

Verify Compakt is active
Check logs for:
```
[Compakt.summarize] Called with summaryModel=ollama/qwen3.5-compaction-32k
[Compakt.summarize] Complete: outputTokens=X, compression=Y%
```

If you see these logs, Compakt is working. If not, check:

- `provider: "compakt"` in `openclaw.json` under `agents.defaults.compaction`
- `summaryModel` is set correctly
- Ollama is running and the model is installed
Compression shows negative percentage
This means the summary is larger than the input — expected when summarizing small contexts. Compakt still preserves context continuity via overlap.
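Assuming compression is reported as the relative reduction from input tokens to summary tokens (an assumption about the log format, not confirmed above), a quick calculation shows how it goes negative on small inputs:

```ts
// Assumption: compression % = (1 - outputTokens / inputTokens) * 100.
const compression = (inputTokens: number, outputTokens: number) =>
  (1 - outputTokens / inputTokens) * 100;

console.log(compression(1200, 300)); //  75  -> summary is much smaller than the input
console.log(compression(150, 220));  // ~-47 -> small input, summary ends up larger
```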
Attribution
Based on jasper-context-compactor by E.x.O. Entertainment Studios Inc. https://github.com/E-x-O-Entertainment-Studios-Inc/openclaw-context-compactor
License
MIT License. See LICENSE file for details.
Differences from Original
- Implements the new CompactionProvider API
- Handles message removal correctly
- Configurable chunking for models with smaller context windows
- `/context-stats` command for debugging
- AbortSignal support for cancellation
- `[FALLBACK]` prefix for degraded output visibility
