@aionis/openclaw-adapter

v0.1.3

Published

2 months ago

Standalone OpenClaw adapter for Aionis execution control.

fielddd

aionis openclaw adapter loop-control policy replay handoff

Aionis OpenClaw Adapter

Bring execution control to OpenClaw.

@aionis/openclaw-adapter connects OpenClaw to Aionis so agent runs stop acting like an unbounded ReAct loop and start behaving like a controlled execution system.

What Aionis adds on top of OpenClaw:

externalized context so each run starts with the right task state instead of rediscovering it
policy gating so broad search, broad test, and repeated no-progress tool paths get suppressed
replay dispatch so repeatable work can escape into a known path instead of starting over
handoff fallback so failed or interrupted runs preserve a usable continuation point
loop control so tool churn, duplicate observations, and no-progress streaks get stopped before they burn more time and tokens

Runtime safety notes on the current release line:

Aionis transport failures on hot hooks now fail open (the host run continues) instead of aborting the host run
enabled=false is a real off switch for loop-control behavior
deny-only policy outcomes now go through the same controlled replay/handoff stop path as other loop-control stops
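
A minimal sketch of the first two safety notes, written as a generic wrapper rather than the adapter's actual code. `HookResult`, `LoopControlOptions`, and `runHotHook` are hypothetical names introduced only for illustration; the point is that a transport error or a disabled adapter both resolve to "allow", while a real deny decision passes through untouched.

```typescript
// Hypothetical types: these are NOT the adapter's real API.
type HookResult = { block: boolean; reason?: string };

type LoopControlOptions = {
  enabled: boolean; // mirrors the documented enabled=false off switch
};

// Result used whenever the adapter should stay out of the host's way.
const ALLOW: HookResult = { block: false };

async function runHotHook(
  options: LoopControlOptions,
  callAionis: () => Promise<HookResult>,
): Promise<HookResult> {
  // enabled=false: never contact Aionis, never block.
  if (!options.enabled) return ALLOW;
  try {
    // Normal path: Aionis decides whether the step is allowed.
    return await callAionis();
  } catch {
    // Transport failure: fail open so the host run keeps going.
    return ALLOW;
  }
}
```

With this shape the host never sees an exception from the hook: the worst case is an uncontrolled step, not an aborted run.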

This is not a generic memory plugin. It is an execution-control adapter for OpenClaw.

Why It Matters

OpenClaw is powerful, but on complex tasks it can still fail in predictable ways:

too many repeated tool calls
broad repo scans when a focused path would do
broad test runs when a targeted validation is enough
no-progress retry loops that keep burning tokens
interrupted runs that lose the exact execution state needed to continue

Aionis changes that operating model.

Instead of letting each run improvise from scratch, the adapter gives OpenClaw:

a compact execution context at run start
a policy layer before expensive tool calls
feedback and evidence capture after each tool call
structured escape hatches through replay or handoff

What Is Proven Today

Current benchmark evidence supports five concrete claims:

Tool-loop churn goes down
Token burn goes down on benchmarked slices
Completion goes up on current replay, focused-repo, handoff-resume, and one-prompt multi-agent slices
Reviewer-ready completion goes up on the current realistic workflow scenario
The adapter is active on real OpenClaw runtime paths, not just mock harnesses

Headline results:

Live-task A/B: average executed steps dropped from 7.33 to 3, and broad tool calls dropped from 1.33 to 0
GLM-5 semi-live token benchmark: average total tokens dropped from 1893 to 865.33
Hard-stop / replay token slice: average total tokens dropped from 1659 to 1267, with controlled_stop_rate = 1
Completion benchmark: baseline completed_rate = 0, treatment completed_rate = 1 on the current benchmark slices
One-prompt multi-agent A/B:
- issue #10864: baseline completed_rate = 0, treatment completed_rate = 1
- dashboard auth drift: baseline completed_rate = 0, treatment completed_rate = 1
- markdown fallback: baseline completed_rate = 0.3333, treatment completed_rate = 1 (supporting slice)
Repeated Google runtime-backed A/B: baseline completed_rate = 0, treatment completed_rate = 0.8
Real workflow scenario v1 (3 repeats):
- dashboard auth drift with real Lite: baseline reviewer_ready_rate = 0.6667, treatment reviewer_ready_rate = 1
- pairing / approval recovery with real Lite: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 1
- service token drift repair with real Lite: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 0.6667
- markdown parser fallback with real Lite: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 0.6667 (supporting slice)
Execution continuity validation on the real Lite path (single-run checks):
- dashboard auth drift with recovered execution_packet_v1: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 1
- pairing / approval recovery with recovered execution_packet_v1: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 1
- service token drift repair with recovered execution_packet_v1: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 1
- markdown parser fallback with recovered execution_packet_v1: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 1
Phase 2 state-first context revalidation on the real Lite path (3 repeats):
- dashboard auth drift: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 0.6667, with lower average total tokens from 24005.33 to 21859 and lower wall-clock from 98846.67ms to 74957.33ms
- pairing / approval recovery: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 1, with lower average total tokens from 19066.33 to 16862.67 and lower wall-clock from 76917ms to 55271.67ms
- service token drift repair: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 1, but with higher average total tokens from 17245.67 to 25099.67 and higher wall-clock from 66586ms to 78718.67ms
Phase 2 handoff-transition single-run revalidation on the real Lite path:
- dashboard auth drift: baseline reviewer_ready_rate = 1, treatment reviewer_ready_rate = 1, while treatment lowers total tokens from 23870 to 17533 and wall-clock from 96660ms to 58810ms
Phase 2 handoff-transition repeated revalidation on the real Lite path (3 repeats):
- dashboard auth drift: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 1, with lower average total tokens from 24717.67 to 21235.67 and lower wall-clock from 101635.33ms to 68210ms
Phase 2 tools/select state-aware repeated revalidation on the real Lite path (3 repeats):
- dashboard auth drift: baseline reviewer_ready_rate = 0.6667, treatment reviewer_ready_rate = 1, but with higher average total tokens from 18936.67 to 28186.67 and higher wall-clock from 83000.67ms to 94629ms
- pairing / approval recovery: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 1, but with higher average total tokens from 18493 to 22063.33 and higher wall-clock from 78439.33ms to 80485.33ms
- service token drift repair: baseline reviewer_ready_rate = 0, treatment reviewer_ready_rate = 0.3333, but with higher average total tokens from 14376.67 to 23844.33 and higher wall-clock from 55854.67ms to 76087ms (supporting completion slice)
Repeated continuity A/B on the real Lite path (legacy vs execution_packet_v1):
- dashboard auth drift: completion stays 1 -> 1, while packet continuity lowers average total tokens from 24750.67 to 22974
- pairing / approval recovery: completion stays 1 -> 1, while packet continuity lowers average total tokens from 22704 to 22091.33
- service token drift repair: completion stays 1 -> 1, while packet continuity lowers average total tokens from 24974.67 to 23043
- markdown parser fallback: baseline reviewer_ready_rate = 0.6667, packet continuity reviewer_ready_rate = 1, but with higher average total tokens from 20920.67 to 30203 (supporting completion slice, not a core efficiency slice)
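
To compare slices on a common scale, the raw figures above can be turned into relative reductions. The helper below is only an arithmetic aid, not part of the adapter; the inputs are copied from the headline results, and a negative value means the treatment used more than the baseline.

```typescript
// Percentage reduction from baseline to treatment (negative = increase).
function relativeReduction(baseline: number, treatment: number): number {
  return ((baseline - treatment) / baseline) * 100;
}

// Figures copied from the headline results listed above.
const slices = [
  { name: "Live-task executed steps", baseline: 7.33, treatment: 3 },
  { name: "GLM-5 semi-live tokens", baseline: 1893, treatment: 865.33 },
  { name: "Hard-stop / replay tokens", baseline: 1659, treatment: 1267 },
  { name: "Phase 2 service token drift tokens", baseline: 17245.67, treatment: 25099.67 },
];

const summary = slices.map(
  (s) => `${s.name}: ${relativeReduction(s.baseline, s.treatment).toFixed(1)}%`,
);

console.log(summary.join("\n"));
// Live-task executed steps: 59.1%
// GLM-5 semi-live tokens: 54.3%
// Hard-stop / replay tokens: 23.6%
// Phase 2 service token drift tokens: -45.5%
```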

5-Minute Quickstart

1. Start Aionis Lite

npx @aionis/[email protected] dev
npx @aionis/[email protected] health

Expected Aionis base URL:

http://127.0.0.1:3321

2. Install the Adapter into OpenClaw

openclaw plugins install @aionis/openclaw-adapter
openclaw plugins info openclaw-adapter --json

You should see:

plugin id: openclaw-adapter
status: loaded

3. Add the Minimal OpenClaw Config

Reference examples:

{
  "plugins": {
    "allow": ["openclaw-adapter"],
    "entries": {
      "openclaw-adapter": {
        "enabled": true,
        "config": {
          "baseUrl": "http://127.0.0.1:3321",
          "tenantId": "default",
          "actor": "openclaw",
          "scopeMode": "project",
          "strictToolBlocking": true,
          "replayDispatchEnabled": true,
          "handoffFallbackEnabled": true
        }
      }
    }
  }
}

Use examples/openclaw.json first.
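
For a quick sanity check before starting OpenClaw, the `config` object above can be validated with a few lines of TypeScript. This is a sketch under assumptions: the field names come from the example, but the types and the rules (non-empty strings, an http or https `baseUrl`) are guesses, and `validateConfig` is a hypothetical helper, not something the package exports.

```typescript
// Field names come from the example config; the types are assumed.
type OpenClawAdapterConfig = {
  baseUrl: string;
  tenantId: string;
  actor: string;
  scopeMode: string;
  strictToolBlocking: boolean;
  replayDispatchEnabled: boolean;
  handoffFallbackEnabled: boolean;
};

const STRING_KEYS = ["baseUrl", "tenantId", "actor", "scopeMode"];
const BOOLEAN_KEYS = ["strictToolBlocking", "replayDispatchEnabled", "handoffFallbackEnabled"];

// Returns a list of human-readable problems; an empty list means "looks fine".
function validateConfig(raw: { [key: string]: unknown }): string[] {
  const problems: string[] = [];
  for (const key of STRING_KEYS) {
    const value = raw[key];
    if (typeof value !== "string" || value === "") {
      problems.push(`${key} must be a non-empty string`);
    }
  }
  for (const key of BOOLEAN_KEYS) {
    if (typeof raw[key] !== "boolean") {
      problems.push(`${key} must be a boolean`);
    }
  }
  const baseUrl = raw["baseUrl"];
  if (typeof baseUrl === "string" && baseUrl !== "" && !/^https?:\/\//.test(baseUrl)) {
    problems.push("baseUrl must start with http:// or https://");
  }
  return problems;
}

// The minimal config from this section, expressed as a typed value.
const minimalConfig: OpenClawAdapterConfig = {
  baseUrl: "http://127.0.0.1:3321",
  tenantId: "default",
  actor: "openclaw",
  scopeMode: "project",
  strictToolBlocking: true,
  replayDispatchEnabled: true,
  handoffFallbackEnabled: true,
};
```

Calling `validateConfig(minimalConfig)` returns an empty array for the config shown above.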

Do not tune the threshold knobs on first install:

maxSteps
maxSameToolStreak
maxDuplicateObservationStreak
maxNoProgressStreak
maxEstimatedTokenBurn
maxBroadTestInvocations
maxBroadScanInvocations

Those are advanced controls for later slice-specific tuning, not required install-time setup.
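
For orientation only, here is one way such thresholds could be read against per-run counters. The knob names are the ones listed above; everything else (the counter names, the strict "greater than" comparison, and the `firstExceeded` helper) is an assumption made for illustration and may not match how the adapter evaluates them.

```typescript
// Knob names from the list above; all are optional in this sketch.
type LoopThresholds = {
  maxSteps?: number;
  maxSameToolStreak?: number;
  maxDuplicateObservationStreak?: number;
  maxNoProgressStreak?: number;
  maxEstimatedTokenBurn?: number;
  maxBroadTestInvocations?: number;
  maxBroadScanInvocations?: number;
};

// Hypothetical per-run counters, one per knob.
type LoopCounters = {
  steps: number;
  sameToolStreak: number;
  duplicateObservationStreak: number;
  noProgressStreak: number;
  estimatedTokenBurn: number;
  broadTestInvocations: number;
  broadScanInvocations: number;
};

const PAIRS: Array<[keyof LoopThresholds, keyof LoopCounters]> = [
  ["maxSteps", "steps"],
  ["maxSameToolStreak", "sameToolStreak"],
  ["maxDuplicateObservationStreak", "duplicateObservationStreak"],
  ["maxNoProgressStreak", "noProgressStreak"],
  ["maxEstimatedTokenBurn", "estimatedTokenBurn"],
  ["maxBroadTestInvocations", "broadTestInvocations"],
  ["maxBroadScanInvocations", "broadScanInvocations"],
];

// Returns the first knob whose counter is above its limit, or null.
// Unset knobs are skipped, so an empty thresholds object never triggers.
function firstExceeded(
  counters: LoopCounters,
  thresholds: LoopThresholds,
): keyof LoopThresholds | null {
  for (const [knob, counter] of PAIRS) {
    const limit = thresholds[knob];
    if (limit !== undefined && counters[counter] > limit) return knob;
  }
  return null;
}
```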

4. Run a First Turn

openclaw agent --local --message "Inspect the task, avoid broad scans, and proceed carefully." --json

How Aionis Changes an OpenClaw Run

Before the run

Aionis assembles a compact execution context so the model starts from the right task state instead of re-reading the same surface area.

Before a tool call

Aionis applies policy gating. This is where the adapter can suppress:

repeated calls to the same tool
broad repo search when a focused query is enough
broad test runs when a targeted test is available
obviously no-progress paths that should stop or reroute
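
The gating step can be pictured as a pure function that looks at the next tool call and the recent history and returns allow or deny. The sketch below is illustrative only: `ToolCall`, `Decision`, `gateToolCall`, the `kind` and `scope` tags, and the default streak limit of 3 are all hypothetical, and the real adapter may classify calls quite differently.

```typescript
// Hypothetical description of a tool call; the tags are assumed to be
// supplied by whatever classifies calls upstream.
type ToolCall = {
  tool: string;
  kind: "search" | "test" | "other";
  scope: "broad" | "focused";
};

type Decision = { action: "allow" } | { action: "deny"; reason: string };

function gateToolCall(
  next: ToolCall,
  recent: ToolCall[],
  maxSameToolStreak: number = 3, // illustrative default, not the adapter's
): Decision {
  if (next.scope === "broad" && next.kind === "search") {
    return { action: "deny", reason: "broad repo search: use a focused query" };
  }
  if (next.scope === "broad" && next.kind === "test") {
    return { action: "deny", reason: "broad test run: use a targeted test" };
  }
  // Count how many of the most recent calls used the same tool in a row.
  let streak = 0;
  for (let i = recent.length - 1; i >= 0 && recent[i]?.tool === next.tool; i--) {
    streak += 1;
  }
  if (streak >= maxSameToolStreak) {
    return { action: "deny", reason: `same tool called ${streak} times in a row` };
  }
  return { action: "allow" };
}
```

Keeping the gate free of side effects makes it straightforward to unit test each suppression rule on its own.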

After a tool call

Aionis writes back:

tool feedback
evidence
loop state updates

That lets later steps reason from actual execution history, not just the transient conversation buffer.
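
As an illustration of what a write-back could track, the sketch below updates a small loop state after each tool call and counts consecutive identical observations. `LoopState` and `afterToolCall` are hypothetical names, and comparing raw observation strings is a simplification chosen for the example, not a description of how Aionis stores evidence.

```typescript
// Hypothetical loop state kept across tool calls.
type LoopState = {
  steps: number;
  lastObservation: string | null;
  duplicateObservationStreak: number;
  evidence: string[];
};

const initialState: LoopState = {
  steps: 0,
  lastObservation: null,
  duplicateObservationStreak: 0,
  evidence: [],
};

// Returns a new state; the input state is left untouched.
function afterToolCall(state: LoopState, tool: string, observation: string): LoopState {
  const duplicate = observation === state.lastObservation;
  return {
    steps: state.steps + 1,
    lastObservation: observation,
    // A repeated observation extends the streak; a new one resets it.
    duplicateObservationStreak: duplicate ? state.duplicateObservationStreak + 1 : 0,
    // Only new observations are recorded as evidence.
    evidence: duplicate ? state.evidence : [...state.evidence, `${tool}: ${observation}`],
  };
}
```

A later gating step could then read `duplicateObservationStreak` from this state instead of re-deriving it from the conversation buffer.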

When the run degrades

The adapter can escape through:

replay dispatch when the task matches a reusable path
handoff when the right behavior is to preserve a structured continuation point
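
One way to picture the choice between the two escapes is a small decision function: prefer replay when a known path matches, otherwise fall back to a handoff that records where to continue. The flag names `replayDispatchEnabled` and `handoffFallbackEnabled` come from the config above; `RunSnapshot`, `Escape`, `chooseEscape`, the exact-match lookup, and the continuation fields are assumptions for illustration and do not describe the real packet format.

```typescript
type EscapeOptions = {
  replayDispatchEnabled: boolean;
  handoffFallbackEnabled: boolean;
};

// Hypothetical snapshot of a degraded run.
type RunSnapshot = {
  taskSignature: string;
  completedSteps: string[];
  nextStep: string;
};

type Escape =
  | { kind: "replay"; pathId: string }
  | { kind: "handoff"; continuation: { completedSteps: string[]; nextStep: string } }
  | { kind: "none" };

function chooseEscape(
  run: RunSnapshot,
  knownPaths: Map<string, string>, // task signature -> reusable path id (assumed)
  options: EscapeOptions,
): Escape {
  const pathId = knownPaths.get(run.taskSignature);
  // Replay wins when it is enabled and the task matches a reusable path.
  if (options.replayDispatchEnabled && pathId !== undefined) {
    return { kind: "replay", pathId };
  }
  // Otherwise preserve a structured continuation point, if allowed.
  if (options.handoffFallbackEnabled) {
    return {
      kind: "handoff",
      continuation: { completedSteps: [...run.completedSteps], nextStep: run.nextStep },
    };
  }
  return { kind: "none" };
}
```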

What This Product Is

This package gives you:

a reusable AionisLoopControlAdapter
an OpenClaw host binding
policy and loop heuristics for expensive tool paths
replay and handoff orchestration around OpenClaw runs

Current hook coverage:

session_start
session_end
before_agent_start
before_tool_call
after_tool_call
agent_end
tool_result_persist
before_message_write
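
The hook names above can be captured as a TypeScript union so that a typo in a hook name fails at compile time. The registry below is a generic sketch and not OpenClaw's plugin API: `HookRegistry`, `HookHandler`, and the untyped `payload` are hypothetical, and only the eight hook names are taken from this document.

```typescript
// Hook names as listed under "Current hook coverage".
const HOOK_NAMES = [
  "session_start",
  "session_end",
  "before_agent_start",
  "before_tool_call",
  "after_tool_call",
  "agent_end",
  "tool_result_persist",
  "before_message_write",
] as const;

type HookName = (typeof HOOK_NAMES)[number];
type HookHandler = (payload: unknown) => void;

// Minimal in-memory registry, for illustration only.
class HookRegistry {
  private readonly handlers = new Map<HookName, HookHandler[]>();

  on(name: HookName, handler: HookHandler): void {
    const list = this.handlers.get(name) ?? [];
    list.push(handler);
    this.handlers.set(name, list);
  }

  // Runs every handler registered for the hook and returns how many ran.
  emit(name: HookName, payload: unknown): number {
    const list = this.handlers.get(name) ?? [];
    for (const handler of list) handler(payload);
    return list.length;
  }
}
```

With this typing, `registry.on("before_tool_cal", ...)` is rejected by the compiler because the misspelled name is not a member of `HookName`.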

What It Does Not Claim

This adapter currently controls the tool-loop boundary.

It does not claim to:

control planner-internal reasoning steps that never emit a tool call
solve every OpenClaw failure mode
guarantee token wins on every provider and every task shape

The current evidence is strong on:

tool-loop control
token reduction on benchmarked slices
completion uplift on current benchmark slices, including one-prompt multi-agent workflows
real OpenClaw runtime activity

Verification and Benchmark Commands

Core checks:

npm test
npm run smoke:openclaw-load
npm run smoke:adapter-activity

Benchmarks:

npm run bench:openclaw-ab
npm run bench:live-task
npm run bench:semi-live-token
npm run bench:loader-backed-semi-live-token
npm run bench:completion
npm run bench:google-runtime
npm run bench:google-runtime-ab
npm run bench:real-workflow