@aionis/openclaw-adapter
v0.1.3
Standalone OpenClaw adapter for Aionis execution control.
Aionis OpenClaw Adapter
Bring execution control to OpenClaw.
@aionis/openclaw-adapter connects OpenClaw to Aionis so agent runs stop acting like an unbounded ReAct loop and start behaving like a controlled execution system.
What Aionis adds on top of OpenClaw:
- externalized context so each run starts with the right task state instead of rediscovering it
- policy gating so broad search, broad test, and repeated no-progress tool paths get suppressed
- replay dispatch so repeatable work can escape into a known path instead of starting over
- handoff fallback so failed or interrupted runs preserve a usable continuation point
- loop control so tool churn, duplicate observations, and no-progress streaks get stopped before they burn more time and tokens
Runtime safety notes on the current release line:
- Aionis transport failures on hot hooks now degrade open instead of aborting the host run
- enabled=false is a real off switch for loop-control behavior
- deny-only policy outcomes now go through the same controlled replay/handoff stop path as other loop-control stops
This is not a generic memory plugin. It is an execution-control adapter for OpenClaw.
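The degrade-open behavior and the off switch described in the runtime safety notes can be pictured with a small wrapper. This is a minimal sketch, not the adapter's code: `HookResult`, `LoopControlOptions`, and `runHookFailOpen` are hypothetical names chosen for illustration, and only the `enabled` flag comes from this README.

```typescript
// Hypothetical shapes: none of these names come from the adapter's real API.
type HookResult = { action: "allow" } | { action: "block"; reason: string };

interface LoopControlOptions {
  enabled: boolean; // mirrors the documented enabled=false off switch
}

// Runs a hook so that a transport failure never aborts the host run.
async function runHookFailOpen(
  options: LoopControlOptions,
  hook: () => Promise<HookResult>,
): Promise<HookResult> {
  if (!options.enabled) {
    // Off switch: skip loop control entirely and let the call through.
    return { action: "allow" };
  }
  try {
    return await hook();
  } catch (err) {
    // Degrade open: report the failure and let the host continue.
    console.warn("aionis hook failed, continuing without gating:", err);
    return { action: "allow" };
  }
}
```

The point of the sketch is the asymmetry: a policy decision can still block a call, but a failure to reach the policy service cannot.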
Why It Matters
OpenClaw is powerful, but on complex tasks it can still fail in predictable ways:
- too many repeated tool calls
- broad repo scans when a focused path would do
- broad test runs when a targeted validation is enough
- no-progress retry loops that keep burning tokens
- interrupted runs that lose the exact execution state needed to continue
Aionis changes that operating model.
Instead of letting each run improvise from scratch, the adapter gives OpenClaw:
- a compact execution context at run start
- a policy layer before expensive tool calls
- feedback and evidence capture after each tool call
- structured escape hatches through replay or handoff
What Is Proven Today
Current benchmark evidence supports five concrete claims:
- Tool-loop churn goes down
- Token burn goes down on benchmarked slices
- Completion goes up on current replay, focused-repo, handoff-resume, and one-prompt multi-agent slices
- Reviewer-ready completion goes up on the current realistic workflow scenario
- The adapter is active on real OpenClaw runtime paths, not just mock harnesses
Headline results:
- Live-task A/B: average executed steps dropped from 7.33 to 3, and broad tool calls dropped from 1.33 to 0
- GLM-5 semi-live token benchmark: average total tokens dropped from 1893 to 865.33
- Hard-stop / replay token slice: average total tokens dropped from 1659 to 1267, with controlled_stop_rate = 1
- Completion benchmark: baseline completed_rate = 0, treatment completed_rate = 1 on the current benchmark slices
- One-prompt multi-agent A/B:
  - issue #10864: baseline completed_rate = 0, treatment completed_rate = 1
  - dashboard auth drift: baseline completed_rate = 0, treatment completed_rate = 1
  - markdown fallback: baseline completed_rate = 0.3333, treatment completed_rate = 1 (supporting slice)
- Repeated Google runtime-backed A/B: baseline completed_rate = 0, treatment completed_rate = 0.8
- Real workflow scenario v1 (3 repeats):
  - dashboard auth drift with real Lite: baseline reviewer_ready_rate = 0.6667, treatment reviewer_ready_rate = 1
  - pairing / approval recovery with real Lite: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 1
  - service token drift repair with real Lite: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 0.6667
  - markdown parser fallback with real Lite: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 0.6667 (supporting slice)
- Execution continuity validation on the real Lite path (single-run checks):
  - dashboard auth drift with recovered execution_packet_v1: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 1
  - pairing / approval recovery with recovered execution_packet_v1: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 1
  - service token drift repair with recovered execution_packet_v1: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 1
  - markdown parser fallback with recovered execution_packet_v1: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 1
- Phase 2 state-first context revalidation on the real Lite path (3 repeats):
  - dashboard auth drift: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 0.6667, with lower average total tokens from 24005.33 to 21859 and lower wall-clock from 98846.67ms to 74957.33ms
  - pairing / approval recovery: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 1, with lower average total tokens from 19066.33 to 16862.67 and lower wall-clock from 76917ms to 55271.67ms
  - service token drift repair: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 1, but with higher average total tokens from 17245.67 to 25099.67 and higher wall-clock from 66586ms to 78718.67ms
- Phase 2 handoff-transition single-run revalidation on the real Lite path:
  - dashboard auth drift: baseline reviewer_ready_rate = 1, treatment reviewer_ready_rate = 1, while treatment lowers total tokens from 23870 to 17533 and wall-clock from 96660ms to 58810ms
- Phase 2 handoff-transition repeated revalidation on the real Lite path (3 repeats):
  - dashboard auth drift: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 1, with lower average total tokens from 24717.67 to 21235.67 and lower wall-clock from 101635.33ms to 68210ms
- Phase 2 tools/select state-aware repeated revalidation on the real Lite path (3 repeats):
  - dashboard auth drift: baseline reviewer_ready_rate = 0.6667, treatment reviewer_ready_rate = 1, but with higher average total tokens from 18936.67 to 28186.67 and higher wall-clock from 83000.67ms to 94629ms
  - pairing / approval recovery: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 1, but with higher average total tokens from 18493 to 22063.33 and higher wall-clock from 78439.33ms to 80485.33ms
  - service token drift repair: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 0.3333, but with higher average total tokens from 14376.67 to 23844.33 and higher wall-clock from 55854.67ms to 76087ms (supporting completion slice)
- Repeated continuity A/B on the real Lite path (legacy vs execution_packet_v1):
  - dashboard auth drift: completion stays 1 -> 1, while packet continuity lowers average total tokens from 24750.67 to 22974
  - pairing / approval recovery: completion stays 1 -> 1, while packet continuity lowers average total tokens from 22704 to 22091.33
  - service token drift repair: completion stays 1 -> 1, while packet continuity lowers average total tokens from 24974.67 to 23043
  - markdown parser fallback: baseline reviewer_ready_rate = 0.6667, packet continuity reviewer_ready_rate = 1, but with higher average total tokens from 20920.67 to 30203 (supporting completion slice, not a core efficiency slice)
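To compare slices with different absolute scales, the before/after pairs in the headline results can be converted to relative reductions. The helper below is an illustrative sketch (the function name `percentReduction` is not part of the package); the input figures are copied from the headline results.

```typescript
// Relative reduction from a baseline to a treatment measurement, in percent.
// A negative result means the treatment value is higher than the baseline.
function percentReduction(baseline: number, treatment: number): number {
  if (baseline <= 0) throw new RangeError("baseline must be positive");
  return ((baseline - treatment) / baseline) * 100;
}

// Figures copied from the headline results.
const headline = [
  { name: "live-task executed steps", baseline: 7.33, treatment: 3 },
  { name: "GLM-5 semi-live total tokens", baseline: 1893, treatment: 865.33 },
  { name: "hard-stop / replay total tokens", baseline: 1659, treatment: 1267 },
];

for (const row of headline) {
  const pct = percentReduction(row.baseline, row.treatment).toFixed(1);
  console.log(`${row.name}: ${pct}% lower`);
}
```

On these three slices the reductions come out at roughly 59%, 54%, and 24%. The same helper returns a negative number for slices where the treatment used more tokens, such as the state-first service token drift repair slice.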
Supporting docs:
- Benchmark Evidence Overview
- Benchmark Summary
- Completion Benchmark
- One-Prompt Multi-Agent Benchmark
- One-Prompt Multi-Agent Case Study
- Loader-Backed Semi-Live Token Benchmark
- Google Runtime Benchmark
- Google Runtime Case Study
- Real Workflow Scenario v1
- Execution Continuity Validation
- Execution Continuity A/B
- Controlled Nightly Validation
- Launchd Setup
Public evidence files:
- Evidence Index
- Live-task benchmark summary
- GLM-5 semi-live token summary
- Hard-stop / replay token summary
- Completion benchmark summary
- One-prompt multi-agent summary: issue #10864
- One-prompt multi-agent summary: dashboard auth drift
- One-prompt multi-agent summary: markdown fallback
- One-prompt multi-agent case study
- Repeated Google runtime summary
- Real workflow scenario summary: dashboard auth drift (real Lite)
- Real workflow scenario summary: pairing / approval recovery (real Lite)
- Real workflow scenario summary: service token drift repair (real Lite)
- Real workflow scenario summary: markdown parser fallback (real Lite)
- Real workflow continuity validation: dashboard auth drift (real Lite)
- Phase 2 state-first context revalidation: dashboard auth drift (real Lite)
- Phase 2 state-first context revalidation: pairing / approval recovery (real Lite)
- Phase 2 state-first context revalidation: service token drift repair (real Lite)
- Phase 2 handoff-transition single-run revalidation: dashboard auth drift (real Lite)
- Phase 2 handoff-transition repeated revalidation: dashboard auth drift (real Lite)
- Phase 2 tools/select state-aware repeated revalidation: dashboard auth drift (real Lite)
- Phase 2 tools/select state-aware repeated revalidation: pairing / approval recovery (real Lite)
- Phase 2 tools/select state-aware repeated revalidation: service token drift repair (real Lite, supporting completion slice)
- Real workflow continuity validation: pairing / approval recovery (real Lite)
- Real workflow continuity validation: service token drift repair (real Lite)
- Real workflow continuity validation: markdown parser fallback (real Lite)
- Post-main-merge revalidation: dashboard auth drift (real Lite)
- Post-main-merge revalidation: pairing / approval recovery (real Lite)
- Post-main-merge revalidation: service token drift repair (real Lite)
- Repeated continuity A/B: dashboard auth drift (real Lite)
- Repeated continuity A/B: pairing / approval recovery (real Lite)
- Repeated continuity A/B: service token drift repair (real Lite)
- Repeated continuity A/B: markdown parser fallback (real Lite, supporting completion slice)
5-Minute Quickstart
1. Start Aionis Lite
npx @aionis/[email protected] dev
npx @aionis/[email protected] health
Expected Aionis base URL:
http://127.0.0.1:3321
2. Install the Adapter into OpenClaw
openclaw plugins install @aionis/openclaw-adapter
openclaw plugins info openclaw-adapter --json
You should see:
- plugin id: openclaw-adapter
- status: loaded
3. Add the Minimal OpenClaw Config
Reference examples:
{
"plugins": {
"allow": ["openclaw-adapter"],
"entries": {
"openclaw-adapter": {
"enabled": true,
"config": {
"baseUrl": "http://127.0.0.1:3321",
"tenantId": "default",
"actor": "openclaw",
"scopeMode": "project",
"strictToolBlocking": true,
"replayDispatchEnabled": true,
"handoffFallbackEnabled": true
}
}
}
}
}
Use examples/openclaw.json first.
Do not tune the threshold knobs on first install:
- maxSteps
- maxSameToolStreak
- maxDuplicateObservationStreak
- maxNoProgressStreak
- maxEstimatedTokenBurn
- maxBroadTestInvocations
- maxBroadScanInvocations
Those are advanced controls for later slice-specific tuning, not required install-time setup.
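For readers who keep their config in typed code, the minimal config and the advanced knobs can be modelled as below. The field names come from the JSON example and the knob list in this README; the TypeScript types, the assumption that every knob is an optional number, and the `tunedKnobs` helper are illustrative only.

```typescript
// Field names are from this README; the types are assumptions for illustration.
interface AdapterConfig {
  baseUrl: string;
  tenantId: string;
  actor: string;
  scopeMode: string;
  strictToolBlocking: boolean;
  replayDispatchEnabled: boolean;
  handoffFallbackEnabled: boolean;
  // Advanced threshold knobs, assumed here to be optional numbers.
  maxSteps?: number;
  maxSameToolStreak?: number;
  maxDuplicateObservationStreak?: number;
  maxNoProgressStreak?: number;
  maxEstimatedTokenBurn?: number;
  maxBroadTestInvocations?: number;
  maxBroadScanInvocations?: number;
}

const THRESHOLD_KNOBS = [
  "maxSteps",
  "maxSameToolStreak",
  "maxDuplicateObservationStreak",
  "maxNoProgressStreak",
  "maxEstimatedTokenBurn",
  "maxBroadTestInvocations",
  "maxBroadScanInvocations",
] as const;

// Lists which advanced knobs a config overrides (hypothetical helper).
function tunedKnobs(config: AdapterConfig): string[] {
  return THRESHOLD_KNOBS.filter((knob) => config[knob] !== undefined);
}

// The minimal config from the quickstart leaves every knob untouched.
const minimalConfig: AdapterConfig = {
  baseUrl: "http://127.0.0.1:3321",
  tenantId: "default",
  actor: "openclaw",
  scopeMode: "project",
  strictToolBlocking: true,
  replayDispatchEnabled: true,
  handoffFallbackEnabled: true,
};

console.log(tunedKnobs(minimalConfig)); // prints [] on a first install
```

A check like this can flag a first-install config that already overrides thresholds before any slice-specific tuning has been done.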
4. Run a First Turn
openclaw agent --local --message "Inspect the task, avoid broad scans, and proceed carefully." --json
How Aionis Changes an OpenClaw Run
Before the run
Aionis assembles a compact execution context so the model starts from the right task state instead of re-reading the same surface area.
Before a tool call
Aionis applies policy gating. This is where the adapter can suppress:
- repeated calls to the same tool
- broad repo search when a focused query is enough
- broad test runs when a targeted test is available
- obviously no-progress paths that should stop or reroute
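The suppression cases listed above can be sketched as a pure decision function over the call history. This is an assumption-heavy illustration, not the adapter's policy engine: `ToolCall`, `GateLimits`, `gateToolCall`, and the `maxBroadInvocations` limit are hypothetical, and the real adapter evaluates policy through Aionis rather than locally.

```typescript
// Hypothetical shapes for illustration only.
type ToolCall = { tool: string; scope: "focused" | "broad" };
type GateDecision = { allow: true } | { allow: false; reason: string };

interface GateLimits {
  maxSameToolStreak: number;   // name borrowed from the README's knob list
  maxBroadInvocations: number; // hypothetical combined budget for broad scans and tests
}

function gateToolCall(history: ToolCall[], next: ToolCall, limits: GateLimits): GateDecision {
  // Count how many times the proposed tool was just called back to back.
  let streak = 0;
  for (let i = history.length - 1; i >= 0 && history[i]?.tool === next.tool; i--) {
    streak++;
  }
  if (streak >= limits.maxSameToolStreak) {
    return { allow: false, reason: `same-tool streak on ${next.tool}` };
  }

  // Broad calls are allowed only while the broad budget has room left.
  const broadSoFar = history.filter((call) => call.scope === "broad").length;
  if (next.scope === "broad" && broadSoFar >= limits.maxBroadInvocations) {
    return { allow: false, reason: "broad invocation budget exhausted" };
  }

  return { allow: true };
}
```

Keeping the gate a pure function of history and limits makes each suppression reason easy to test in isolation.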
After a tool call
Aionis writes back:
- tool feedback
- evidence
- loop state updates
That lets later steps reason from actual execution history, not just the transient conversation buffer.
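One way to picture the loop state updates is as an immutable record that is advanced after every tool call. The sketch below is hypothetical: `LoopState`, `ToolOutcome`, and `recordOutcome` are not names from the adapter, and the verbatim string comparison used to detect duplicate observations is a simplifying assumption.

```typescript
// Hypothetical loop-state record; the streak names echo the README's threshold knobs.
interface LoopState {
  steps: number;
  duplicateObservationStreak: number;
  noProgressStreak: number;
  lastObservation: string | null;
}

interface ToolOutcome {
  observation: string;   // raw tool output, compared verbatim in this sketch
  madeProgress: boolean; // assumed to be judged elsewhere
}

const initialLoopState: LoopState = {
  steps: 0,
  duplicateObservationStreak: 0,
  noProgressStreak: 0,
  lastObservation: null,
};

// Returns the next state without mutating the previous one.
function recordOutcome(state: LoopState, outcome: ToolOutcome): LoopState {
  const duplicate = outcome.observation === state.lastObservation;
  return {
    steps: state.steps + 1,
    duplicateObservationStreak: duplicate ? state.duplicateObservationStreak + 1 : 0,
    noProgressStreak: outcome.madeProgress ? 0 : state.noProgressStreak + 1,
    lastObservation: outcome.observation,
  };
}
```

Because each streak resets as soon as the run produces a new observation or makes progress, only sustained churn accumulates toward a stop.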
When the run degrades
The adapter can escape through:
- replay dispatch when the task matches a reusable path
- handoff when the right behavior is to preserve a structured continuation point
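The choice between the two escape routes can be sketched as a lookup followed by a fallback. Everything named here (`EscapeRoute`, `DegradedRun`, `chooseEscape`, the idea of a task signature keyed to a playbook id) is a hypothetical illustration of the description above, not the adapter's actual dispatch logic.

```typescript
// Hypothetical types for illustration only.
type EscapeRoute =
  | { kind: "replay"; playbookId: string }
  | { kind: "handoff"; summary: string; nextStep: string };

interface DegradedRun {
  taskSignature: string;    // assumed key for matching reusable paths
  completedSteps: string[];
  pendingStep: string;
}

// Prefer replay when a known path matches; otherwise preserve a continuation point.
function chooseEscape(run: DegradedRun, knownPaths: Map<string, string>): EscapeRoute {
  const playbookId = knownPaths.get(run.taskSignature);
  if (playbookId !== undefined) {
    return { kind: "replay", playbookId };
  }
  return {
    kind: "handoff",
    summary: `completed: ${run.completedSteps.join(", ") || "nothing yet"}`,
    nextStep: run.pendingStep,
  };
}
```

The handoff branch carries both what was done and what comes next, which is the minimum a later run needs to continue instead of starting over.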
What This Product Is
This package gives you:
- a reusable AionisLoopControlAdapter
- an OpenClaw host binding
- policy and loop heuristics for expensive tool paths
- replay and handoff orchestration around OpenClaw runs
Current hook coverage:
- session_start
- session_end
- before_agent_start
- before_tool_call
- after_tool_call
- agent_end
- tool_result_persist
- before_message_write
What It Does Not Claim
This adapter currently controls the tool-loop boundary.
It does not claim to:
- control planner-internal reasoning steps that never emit a tool call
- solve every OpenClaw failure mode
- guarantee token wins on every provider and every task shape
The current evidence is strong on:
- tool-loop control
- token reduction on benchmarked slices
- completion uplift on current benchmark slices, including one-prompt multi-agent workflows
- real OpenClaw runtime activity
Verification and Benchmark Commands
Core checks:
- npm test
- npm run smoke:openclaw-load
- npm run smoke:adapter-activity
Benchmarks:
- npm run bench:openclaw-ab
- npm run bench:live-task
- npm run bench:semi-live-token
- npm run bench:loader-backed-semi-live-token
- npm run bench:completion
- npm run bench:google-runtime
- npm run bench:google-runtime-ab
- npm run bench:real-workflow
