openclaw-plugin-tokenranger v2026.3.2
openclaw-plugin-tokenranger
TokenRanger is a community plugin for OpenClaw that compresses session context through a local SLM (via Ollama) before sending to cloud LLMs — reducing input token costs by 50–80%.
Table of contents
- How it works
- Requirements
- Install
- Configuration
- Commands
- Upgrading
- Uninstalling
- Performance
- Graceful degradation
- Contributing
How it works
```
User message → OpenClaw gateway
  → before_agent_start hook
  → Turn 1? Skip (full fidelity for first message)
  → Turn 2+: strip code blocks, send history to localhost:8100/compress
  → FastAPI sidecar runs LangChain LCEL chain (Ollama)
  → Compressed summary returned as prependContext
  → Cloud LLM receives compressed context instead of full history
```

Inference strategy is auto-selected based on GPU availability:
| Strategy | Trigger | Model | Approach |
|---|---|---|---|
| full | GPU available | mistral:7b | Deep semantic summarization |
| light | CPU only | phi3.5:3b | Extractive bullet points |
| passthrough | Ollama unreachable | — | Truncate to last 20 lines, no compression |
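The selection rule in the table is simple enough to state directly. A minimal sketch, assuming the three strategy names from the table; the function and parameter names here are illustrative, not part of the plugin's actual API:

```python
def choose_strategy(ollama_reachable: bool, gpu_available: bool) -> str:
    """Mirror the auto-selection table: passthrough when Ollama is down,
    deep summarization ("full") on GPU, extractive "light" mode on CPU.
    Illustrative sketch only -- not the plugin's real internals."""
    if not ollama_reachable:
        return "passthrough"
    return "full" if gpu_available else "light"
```

Setting `compressionStrategy` to anything other than `auto` (see Configuration below) bypasses this selection entirely.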
Requirements
- OpenClaw ≥ 2026.2.0 (install guide)
- Ollama installed and running locally (ollama.com)
- Python 3.10+ (for the FastAPI compression sidecar)
Install
1. Install the plugin
```
openclaw plugins install openclaw-plugin-tokenranger
```

To pin an exact version (recommended for production):

```
openclaw plugins install openclaw-plugin-tokenranger@2026.3.2 --pin
```
2. Run first-time setup
```
openclaw tokenranger setup
```

`setup` does the following automatically:

- Pulls the required Ollama models (`mistral:7b` + `phi3.5:3b`)
- Creates a Python virtualenv and installs FastAPI/LangChain deps
- Registers the TokenRanger sidecar as a system service:
  - Linux: systemd user unit (`tokenranger.service`)
  - macOS: launchd agent (`com.peterjohannmedina.tokenranger.plist`)
- Starts the sidecar on `localhost:8100`
3. Restart the gateway

```
openclaw gateway restart
```

4. Verify

```
openclaw tokenranger
```

You should see your current settings and sidecar status (reachable/unreachable).
Manual sidecar start (if needed)
If the system service didn't register, you can start the sidecar directly:
```
# Linux / macOS
~/.openclaw/extensions/tokenranger/service/start.sh
```

Configuration
After install, configure under plugins.entries.tokenranger.config in your openclaw.json
(edit via openclaw config set plugins.entries.tokenranger.config.<key> <value>):
| Key | Default | Description |
|---|---|---|
| serviceUrl | http://127.0.0.1:8100 | TokenRanger FastAPI sidecar URL |
| timeoutMs | 10000 | Max wait per request before fallthrough |
| minPromptLength | 500 | Min chars of history before compressing |
| ollamaUrl | http://127.0.0.1:11434 | Ollama API base URL |
| preferredModel | mistral:7b | Model used in full GPU strategy |
| compressionStrategy | auto | auto / full / light / passthrough |
| inferenceMode | auto | auto / cpu / gpu / remote |
Example — force CPU-only light mode:
```
openclaw config set plugins.entries.tokenranger.config.compressionStrategy light
openclaw config set plugins.entries.tokenranger.config.inferenceMode cpu
openclaw gateway restart
```

Commands
| Command | Description |
|---|---|
| /tokenranger | Show current settings and sidecar health |
| /tokenranger mode gpu | Force GPU (full) compression strategy |
| /tokenranger mode cpu | Force CPU (light) compression strategy |
| /tokenranger mode off | Disable compression (passthrough) |
| /tokenranger model | List available Ollama models |
| /tokenranger toggle | Enable / disable the plugin |
Upgrading
TokenRanger follows calendar versioning (YYYY.M.D[-patch.N]), matching the OpenClaw release cadence.
Check for updates
```
openclaw plugins update tokenranger --dry-run
```

This shows the available version without applying anything.
Apply an update
```
openclaw plugins update tokenranger
openclaw tokenranger setup    # re-runs sidecar setup if service files changed
openclaw gateway restart
```

Note: `setup` is idempotent — it only pulls new models or reinstalls deps if versions have changed. It will not wipe your existing config.
Pin to a specific version
If you want to lock to a known-good release:
```
openclaw plugins install openclaw-plugin-tokenranger@<version> --pin
openclaw tokenranger setup
openclaw gateway restart
```

To see all published versions:

```
npm view openclaw-plugin-tokenranger versions --json
```

After major OpenClaw upgrades
Check CHANGELOG.md in this repo for any breaking config key renames or sidecar API changes before upgrading TokenRanger across a major OpenClaw version bump.
Uninstalling
```
openclaw plugins uninstall tokenranger
openclaw gateway restart
```

This removes the plugin from config and deletes its install directory. The Python sidecar and system service are left in place — to fully remove:
```
# Linux
systemctl --user stop tokenranger && systemctl --user disable tokenranger
rm ~/.config/systemd/user/tokenranger.service

# macOS
launchctl unload ~/Library/LaunchAgents/com.peterjohannmedina.tokenranger.plist
rm ~/Library/LaunchAgents/com.peterjohannmedina.tokenranger.plist
```

Performance
5-turn Discord conversation benchmark (GPU, mistral:7b-instruct):
| Turn | Input tokens | Compressed | Reduction | Latency |
|---|---|---|---|---|
| 2 | 732 | 125 | 82.9% | 1,086 ms |
| 3 | 1,180 | 150 | 87.3% | 1,375 ms |
| 4 | 1,685 | 212 | 87.4% | 1,960 ms |
| 5 | 2,028 | 277 | 86.3% | 2,420 ms |
Cumulative: 5,866 → 885 tokens (84.9% reduction), ~1.6s avg/turn.
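The reduction percentages follow directly from the token counts; a quick sanity check (the cumulative 5,866 total presumably also counts the full-fidelity first turn, which the per-turn table omits):

```python
# Token counts from the benchmark table above: turn -> (input, compressed).
turns = {2: (732, 125), 3: (1180, 150), 4: (1685, 212), 5: (2028, 277)}

for turn, (raw, compressed) in turns.items():
    reduction = (1 - compressed / raw) * 100
    print(f"turn {turn}: {reduction:.1f}% reduction")

# The stated cumulative figures (5,866 -> 885 tokens) give the overall number:
overall = (1 - 885 / 5866) * 100
print(f"overall: {overall:.1f}%")   # 84.9%
```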
CPU (phi3.5:3b-mini) benchmarks TBD.
Graceful degradation
TokenRanger never breaks your chat. At every failure point there's a silent fallthrough:
- Sidecar unreachable → passthrough (message sent to cloud LLM uncompressed)
- Ollama timeout → passthrough
- Compression returns empty string → original message used
- Plugin disabled → zero overhead, standard OpenClaw routing
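The fallthrough points above amount to one rule: any failure returns the original history. A minimal sketch of that wrapper, assuming a `/compress` endpoint that returns a JSON body with a `summary` field (both are assumptions about the sidecar API, not documented contract):

```python
import json
import urllib.error
import urllib.request

def compress_or_passthrough(history: str,
                            url: str = "http://127.0.0.1:8100/compress",
                            timeout_s: float = 10.0) -> str:
    """Return the compressed summary, or the original history on any
    failure (sidecar down, timeout, empty result) -- never raise into
    the chat path. Hypothetical sketch of the plugin's fallthrough."""
    try:
        req = urllib.request.Request(
            url,
            data=json.dumps({"history": history}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            summary = json.loads(resp.read()).get("summary", "")
        return summary or history      # empty compression -> original used
    except (OSError, ValueError):      # URLError is an OSError subclass
        return history                 # unreachable / timeout -> passthrough
```

Because every exception path returns `history` unchanged, a dead sidecar degrades to exactly the uncompressed behavior you would have without the plugin.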
Contributing
Issues and PRs welcome: https://github.com/peterjohannmedina/openclaw-plugin-tokenranger
For discussion, find us in the OpenClaw Discord.
Release process (maintainers)
See CONTRIBUTING.md for the full release and versioning workflow.
License
MIT — see LICENSE
