@riddledc/openclaw-riddle-proof
v0.4.134
Published
OpenClaw wrapper for Riddle Proof evidence-backed agent changes.
OpenClaw wrapper for Riddle Proof: evidence-backed workflows for agent-authored changes.
This package is intentionally separate from @riddledc/openclaw-riddledc.
The browser automation plugin stays focused on hosted browser tools while this
package owns the Riddle Proof OpenClaw install surface.
Status
Initial wrapper scaffold plus the first engine-harness wiring point. In default
mode it normalizes OpenClaw tool parameters through
@riddledc/riddle-proof/openclaw, creates the shared run envelope, and returns
a blocked result until execution is explicitly configured.
When configured with executionMode: "engine", the wrapper calls the reusable
engine harness in @riddledc/riddle-proof.
That harness drives the packaged proof-run checkpoint engine in
@riddledc/riddle-proof directly.
Riddle preview deployment, script execution, job polling, and artifact retrieval
also live in @riddledc/riddle-proof/riddle-client; this package re-exports
that client for OpenClaw integrations but does not carry a second copy of the
artifact logic.
By default it still stops at concrete blockers when an agent adapter is not
configured. When agentMode: "local_exec" is explicitly set, the wrapper uses a
configured local CLI adapter for recon judgment, proof packet authoring,
implementation, and proof judgment.
Set proofReviewMode: "main_agent" when the local CLI adapter should still
handle recon, proof authoring, and implementation, but final proof judgment
should pause for the current OpenClaw agent. In that mode the run blocks at
main_agent_proof_review_required with a proof-review packet containing the
request, before/after image URLs, visual delta metadata, and a review rubric.
The OpenClaw agent can inspect the screenshot evidence in its own conversation
context and then resume the same run with riddle_proof_review.
The ready verdict is intentionally strict: for visual polish, screenshots must
prove a visible reviewer-scale change, not just a code or CSS difference.
For smoke/debug runs where the effective ship mode is none, the wrapper can
auto-advance this proof-review checkpoint only when the inspection packet already
marks the evidence as a ready-to-ship candidate. That removes a handoff loop
without allowing a merge, ready mark, or ship action. Set
autoReviewShipModeNone: false to force manual review even in non-shipping
runs.
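The auto-advance rule above can be sketched as a small predicate. This is an illustration only, not the wrapper's actual code: the AutoReviewInput shape and the function name are hypothetical, and the default of true for autoReviewShipModeNone is an assumption inferred from the text.

```typescript
// Hypothetical input shape; field names mirror the README where possible.
interface AutoReviewInput {
  effectiveShipMode: string;        // e.g. "none" or "ship"
  readyToShipCandidate: boolean;    // ready-to-ship flag from the inspection packet
  autoReviewShipModeNone?: boolean; // config flag; assumed to default to true
}

// Returns true only for non-shipping runs whose evidence is already
// marked as a ready-to-ship candidate, and only if auto-review is enabled.
function shouldAutoAdvanceReview(input: AutoReviewInput): boolean {
  const autoReviewEnabled = input.autoReviewShipModeNone ?? true;
  return (
    autoReviewEnabled &&
    input.effectiveShipMode === "none" &&
    input.readyToShipCandidate
  );
}
```

Under this reading, a shipping run never auto-advances, whatever the inspection packet says.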
This keeps the currently working OpenClaw/Discord proof flow on the public
riddle_proof_change path rather than a private skill/plugin prototype.
OC Session Routing
For direct OC harnesses and benchmark tooling that need fresh isolation per run,
do not assume openclaw agent --agent ... --session-id ... is sufficient.
The package now exports helpers that make the routing decision explicit:
- buildOpenClawAgentSessionKey(agentId, sessionId)
- buildOpenClawAgentInvocationPlan(request, routingMode)
Supported routing modes:
- agent_session_id
- gateway_session_key
Current recommendation for isolation-sensitive harnesses:
- prefer gateway_session_key
- use agent_session_id only when you intentionally want current CLI semantics
This is not yet a recommendation for a global default. The decision should be based on harness data, not intuition.
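A harness could make the routing choice explicit with something like the sketch below. The two mode names come from this README; the HarnessRun type and chooseRoutingMode function are hypothetical and are not exports of this package. The request shape for buildOpenClawAgentInvocationPlan is not documented here, so the call itself is not shown.

```typescript
// The two routing modes named in this README.
type RoutingMode = "agent_session_id" | "gateway_session_key";

// Hypothetical harness-side description of a run.
interface HarnessRun {
  needsFreshIsolation: boolean;
}

// Follows the current recommendation: isolation-sensitive runs use
// gateway_session_key; everything else keeps current CLI semantics.
function chooseRoutingMode(run: HarnessRun): RoutingMode {
  return run.needsFreshIsolation ? "gateway_session_key" : "agent_session_id";
}
```

The chosen value would then be passed as the routingMode argument when building the invocation plan.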
Product Boundary
This package is meant to become the OpenClaw entry point for the full Riddle Proof harness, not just a skill prompt. The valuable path is:
idea -> workspace setup -> agent implementation -> server-backed proof capture
-> proof judgment -> PR creation -> CI evidence -> integration update
The wrapper owns the OpenClaw tool contract and integration metadata. The reusable harness behind it owns the hard workflow pieces: configured agent execution, Riddle server usage, proof assessment, ship gates, and notifications. The current wrapper owns the public OpenClaw path; private instance repos should only provide deployment-specific defaults and credentials.
OpenClaw participation is expressed separately from the base Riddle Proof loop. The same durable proof run should support three wrapper modes:
- interactive: OpenClaw is the visible driver, runs or delegates the agent work, and reports meaningful stage/checkpoint progress back to Discord or the active chat surface.
- background_pr: OpenClaw starts the same loop, keeps routine iteration quiet, and wakes the user when a PR is ready for review or a concrete blocker needs a decision. This is the default product mode.
- continuous: OpenClaw acts as a guarded long-running development manager that assigns work, requires Riddle Proof evidence, reviews PR/proof/CI, and may merge only under explicit policy controls.
These are OpenClaw UX/policy modes, not separate proof engines. The reusable
@riddledc/riddle-proof package remains responsible for stages, checkpoint
contracts, evidence recovery, profile/audit runs, Riddle API calls, run cards,
proof assessment, and ship gates.
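As a rough illustration of treating the three modes as a closed set with background_pr as the default, a caller might validate the value before starting a run. The parseWorkflowMode helper is hypothetical, and rejecting unknown values (rather than silently defaulting) is an assumption, not documented wrapper behavior.

```typescript
// The three wrapper modes named in this README.
const WORKFLOW_MODES = ["interactive", "background_pr", "continuous"] as const;
type WorkflowMode = (typeof WORKFLOW_MODES)[number];

// background_pr is described above as the default product mode.
const DEFAULT_WORKFLOW_MODE: WorkflowMode = "background_pr";

// Hypothetical helper: a missing value falls back to the default,
// and an unrecognized value is rejected (an assumption).
function parseWorkflowMode(value?: string): WorkflowMode {
  if (value === undefined) return DEFAULT_WORKFLOW_MODE;
  const match = WORKFLOW_MODES.find((mode) => mode === value);
  if (match === undefined) {
    throw new Error(`Unknown workflow_mode: ${value}`);
  }
  return match;
}
```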
Tool
- riddle_proof_change
- riddle_proof_status
- riddle_proof_inspect
- riddle_proof_sync
- riddle_proof_review
riddle_proof_change accepts proofed-change-style params such as repo,
branch, change_request, verification_mode, assertions_json, and Discord
routing metadata. The default ship path should open or update a draft PR, prove
the exact commit, wait for CI, and mark the PR ready; leave_draft: true is an
explicit escape hatch for debug or intentionally draft-only runs. It returns a
RiddleProofRunResult.
Use workflow_mode to select how OpenClaw participates in the proof loop:
interactive, background_pr, or continuous. This is distinct from
run_mode, which only controls whether the current tool call blocks or returns
immediately while the run continues in the background.
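A riddle_proof_change call might carry parameters along these lines. This is a sketch under assumptions: the parameter names come from this README, but the interface is not the package's published type, the "background" literal for run_mode is inferred, the assertion object shape is invented for illustration, and the repo and request values are placeholders.

```typescript
// Illustrative parameter shape; not the package's actual type definition.
interface RiddleProofChangeParams {
  repo: string;
  branch: string;
  change_request: string;
  verification_mode?: string;   // accepted values are not listed in this README
  assertions_json?: string;     // a JSON-encoded string, not an object
  leave_draft?: boolean;        // escape hatch for draft-only runs
  workflow_mode?: "interactive" | "background_pr" | "continuous";
  run_mode?: "background" | "blocking"; // "background" literal is assumed
}

// Hypothetical assertion shape, serialized because the tool takes a string.
const assertions = [{ description: "Hero heading is visibly larger" }];

const params: RiddleProofChangeParams = {
  repo: "example-org/example-site", // placeholder
  branch: "main",
  change_request: "Increase the hero heading size on the landing page",
  assertions_json: JSON.stringify(assertions),
  workflow_mode: "background_pr", // how OpenClaw participates in the loop
  run_mode: "blocking",           // whether this one tool call waits
};
```

The last two fields are independent: workflow_mode shapes the whole proof loop, while run_mode only affects the current tool call.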
For chat surfaces that should not keep one long tool reply open, background mode
is the default. Pass run_mode: "blocking" only for deliberate synchronous
debugging, or configure defaultRunMode: "blocking" in runtimes that really
need the old behavior. In background mode, the tool
writes the wrapper state immediately, returns status: "running" with a
state_path, and continues the proof in the gateway process. Any OC interface
can poll riddle_proof_status, call riddle_proof_inspect when review evidence
is ready, and resume with riddle_proof_review. This is intentionally
channel-agnostic: Discord, Telegram, iMessage bridges, and CLIs all consume the
same state contract instead of relying on a fragile transport-specific timeout.
The normal UX should keep polling out of the main chat turn: use an OC
sessions_spawn monitor or host worker when available, and otherwise poll at
the returned recommended_poll_after_ms cadence.
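A monitor outside the main chat turn could decide its next step from each riddle_proof_status snapshot roughly as follows. The snapshot fields used here are a simplified, assumed subset; only status: "running", recommended_poll_after_ms, and the main_agent_proof_review_required blocker are named in this README. The fallback interval is an arbitrary placeholder.

```typescript
// Simplified, assumed subset of a riddle_proof_status snapshot.
interface StatusSnapshot {
  status: string;                      // "running" is documented; other values are assumed
  blocker?: string | null;
  recommended_poll_after_ms?: number;
}

type MonitorAction =
  | { kind: "wait"; delayMs: number } // poll riddle_proof_status again later
  | { kind: "inspect" }               // call riddle_proof_inspect
  | { kind: "stop" };                 // run settled; stop monitoring

// Placeholder used only when the snapshot carries no cadence hint.
const FALLBACK_POLL_MS = 15000;

function nextMonitorAction(snapshot: StatusSnapshot): MonitorAction {
  if (snapshot.blocker === "main_agent_proof_review_required") {
    return { kind: "inspect" };
  }
  if (snapshot.status === "running") {
    return {
      kind: "wait",
      delayMs: snapshot.recommended_poll_after_ms ?? FALLBACK_POLL_MS,
    };
  }
  return { kind: "stop" };
}
```

Keeping the decision a pure function lets any transport (a spawned session, a host worker, a CLI loop) own the actual timer.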
When a background run settles, the wrapper appends a durable
run.wake.requested event with the final status, blocker if any, and suggested
next tools. Host integrations can watch that event and re-enter the originating
OC session without this package knowing which chat transport is in use.
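A host integration watching for the wake event might scan the run's event log like this. Only the event name run.wake.requested comes from this README; the RunEvent field names and the in-memory array are assumptions made for the sketch.

```typescript
// Assumed event shape; only the event name is taken from the README.
interface RunEvent {
  type: string;
  status?: string;
  blocker?: string | null;
  suggested_next_tools?: string[];
}

// Returns the most recent wake request, if the run has settled.
function findWakeRequest(events: RunEvent[]): RunEvent | undefined {
  for (let i = events.length - 1; i >= 0; i--) {
    const event = events[i];
    if (event && event.type === "run.wake.requested") {
      return event;
    }
  }
  return undefined;
}
```

A host would typically use the returned status and suggested tools to decide how to re-enter the originating session.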
For pages behind login, pass generic browser auth as JSON strings:
auth_localStorage_json, auth_cookies_json, or auth_headers_json. These
are forwarded to the proof runtime so previews and script captures can exercise
authenticated pages without depending on a site-specific OpenClaw helper.
use_auth: true remains available only for deployments that have explicitly
configured their own auth helper.
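Because the auth parameters are JSON strings rather than objects, a caller needs to serialize them before the tool call. The sketch below shows one way to do that; the three parameter names come from this README, while the helper and the cookie object shape are assumptions.

```typescript
// Parameter names from the README; values are JSON-encoded strings.
interface AuthParams {
  auth_localStorage_json?: string;
  auth_cookies_json?: string;
  auth_headers_json?: string;
}

// Assumed input shapes for illustration only.
interface AuthInput {
  localStorage?: Record<string, string>;
  cookies?: Array<{ name: string; value: string; domain?: string }>;
  headers?: Record<string, string>;
}

// Hypothetical helper: serializes only the pieces that were provided.
function buildAuthParams(input: AuthInput): AuthParams {
  const params: AuthParams = {};
  if (input.localStorage) {
    params.auth_localStorage_json = JSON.stringify(input.localStorage);
  }
  if (input.cookies) {
    params.auth_cookies_json = JSON.stringify(input.cookies);
  }
  if (input.headers) {
    params.auth_headers_json = JSON.stringify(input.headers);
  }
  return params;
}
```

The result can be spread into the riddle_proof_change parameters alongside repo, branch, and change_request.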
riddle_proof_status accepts a wrapper state_path returned by
riddle_proof_change and returns a cheap status snapshot with run id, stage,
elapsed time, blocker, worktree path, and latest event. Engine-backed background
runs also include active_substep, phase_elapsed_ms, engine_latest_event,
engine_runtime_event_count, recommended_poll_after_ms, and a wake_strategy
hint so agents and host surfaces can monitor long proof runs without noisy main
conversation polling.
If the supplied path exists but is not the wrapper run state, the not-found
response includes diagnostics that distinguish a missing file from accidentally
passing an underlying engine state path.
riddle_proof_inspect accepts the same wrapper state_path and returns a
proof-native review packet: route match, repo profile usage, artifact URLs,
visual delta, structured proof evidence, semantic anchors, visible text samples,
and a concrete next action for the supervising agent. Use it when a run pauses
for proof review and the reviewer needs one compact packet instead of stitching
together raw state, screenshots, and side inspection tools.
riddle_proof_sync accepts the same wrapper state_path and asks the configured
engine to reconcile PR lifecycle state. It is the explicit path for "the PR was
merged, update the run": check the PR, record merged/closed/open state, fetch the
base branch when configured, safely fast-forward a clean local base checkout
when configured, and clean isolated proof worktrees after merge. The sync result
includes cleanup_report.base_checkout so operators can see the base worktree,
branch, clean state, local/remote heads, and whether the fast-forward ran,
skipped, or failed.
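An operator-facing summary of cleanup_report.base_checkout could look like the sketch below. The README lists what the report covers but not its field names, so every field name in BaseCheckoutReport is hypothetical.

```typescript
// Hypothetical field names for the information the README says is reported.
interface BaseCheckoutReport {
  worktree: string;
  branch: string;
  clean: boolean;
  local_head: string;
  remote_head: string;
  fast_forward: "ran" | "skipped" | "failed";
}

// Builds a one-line summary for logs or chat output.
function summarizeBaseCheckout(report: BaseCheckoutReport): string {
  const sync =
    report.local_head === report.remote_head
      ? "in sync with remote"
      : `local ${report.local_head} differs from remote ${report.remote_head}`;
  const state = report.clean ? "clean" : "dirty";
  return `${report.branch} (${report.worktree}): fast-forward ${report.fast_forward}, ${state}, ${sync}`;
}
```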
riddle_proof_review accepts the wrapper state_path plus a structured
main-agent proof verdict. It is intended for runs that stopped at
main_agent_proof_review_required; the submitted judgment is passed back to the
underlying engine as proof_assessment_json so the workflow can ship, iterate,
or escalate without losing run state.
If proofReviewMode: "main_agent" and the run is non-shipping
(ship_mode: "none" or defaultShipMode: "none"), the wrapper may auto-submit
a conservative ready_to_ship proof assessment when riddle_proof_inspect
would already return ready_to_ship_candidate: true. This is only a
quality-of-life path for held proof runs; shipping modes still require explicit proof review.
Runtime Boundary
The wrapper depends on @riddledc/riddle-proof for contracts and normalization.
It does not invoke another OpenClaw plugin and does not supply a coding agent.
The reusable engine harness and the local Codex exec adapter are wired behind
explicit config. Agent execution remains an adapter boundary: the package does
not publish a hosted coding agent or secrets. The configured runtime must supply
the local codex CLI environment, repository access, Riddle engine module, and
any service credentials needed by that runtime before the wrapper can drive all
the way to a PR.
The package should call configured services and credentials at runtime; it must not publish Riddle server secrets, Discord credentials, GitHub tokens, or OpenClaw-instance-specific configuration.
Local Exec Adapter
The optional adapter lives in @riddledc/riddle-proof/local-agent; this
OpenClaw package only wires it into the OC tool surface. It can be enabled with
config like:
{
  "executionMode": "engine",
  "agentMode": "local_exec",
  "codexHome": "/root/.codex",
  "codexSandbox": "workspace-write",
  "proofReviewMode": "main_agent",
  "defaultShipMode": "ship"
}
The current local adapter implementation runs codex exec in the isolated
after-worktree supplied by the Riddle Proof engine. It writes no package-time
secrets and removes inherited OPENAI_API_KEY from the child process
environment so a configured CODEX_HOME login is used unless the host wraps
the command differently.
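The environment handling described above can be pictured as a pure function over the parent environment. This is a sketch, not the adapter's code: the function name is hypothetical, and mapping the codexHome config value onto the CODEX_HOME variable is an assumption.

```typescript
type Env = Record<string, string | undefined>;

// Copies the parent environment, drops the inherited API key so the
// CODEX_HOME login is used, and points CODEX_HOME at the configured path.
function buildCodexChildEnv(parentEnv: Env, codexHome: string): Env {
  const childEnv: Env = { ...parentEnv };
  delete childEnv["OPENAI_API_KEY"];
  childEnv["CODEX_HOME"] = codexHome; // assumed mapping from the codexHome config
  return childEnv;
}
```

Working on a copy leaves the parent process environment untouched, so only the spawned child loses the key.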
With proofReviewMode: "main_agent", codex exec is not asked to make the
final proof judgment. It implements the change and captures proof, then the
wrapper returns a review packet for the main OpenClaw agent to judge using the
visible screenshots and evidence bundle.
