@riddledc/openclaw-riddle-proof

v0.4.159

Published

a month ago

OpenClaw wrapper for Riddle Proof evidence-backed agent changes.


joeddjd

@riddledc/openclaw-riddle-proof

OpenClaw wrapper for Riddle Proof: evidence-backed workflows for agent-authored changes.

This package is intentionally separate from @riddledc/openclaw-riddledc. The browser automation plugin stays focused on hosted browser tools while this package owns the Riddle Proof OpenClaw install surface.

Status

Initial wrapper scaffold plus the first engine-harness wiring point. In default mode it normalizes OpenClaw tool parameters through @riddledc/riddle-proof/openclaw, creates the shared run envelope, and returns a blocked result until execution is explicitly configured.

When configured with executionMode: "engine", the wrapper calls the reusable engine harness in @riddledc/riddle-proof. That harness drives the packaged proof-run checkpoint engine in @riddledc/riddle-proof directly. Riddle preview deployment, script execution, job polling, and artifact retrieval also live in @riddledc/riddle-proof/riddle-client; this package re-exports that client for OpenClaw integrations but does not carry a second copy of the artifact logic. By default it still stops at concrete blockers when an agent adapter is not configured. When agentMode: "local_exec" is explicitly set, the wrapper uses a configured local CLI adapter for recon judgment, proof packet authoring, implementation, and proof judgment.

Set proofReviewMode: "main_agent" when the local CLI adapter should still handle recon, proof authoring, and implementation, but final proof judgment should pause for the current OpenClaw agent. In that mode the run blocks at main_agent_proof_review_required with a proof-review packet containing the request, before/after image URLs, visual delta metadata, and a review rubric. The OpenClaw agent can inspect the screenshot evidence in its own conversation context and then resume the same run with riddle_proof_review. The ready verdict is intentionally strict: for visual polish, screenshots must prove a visible reviewer-scale change, not just a code or CSS difference.

For smoke/debug runs where the effective ship mode is none, the wrapper can auto-advance this proof-review checkpoint only when the inspection packet already marks the evidence as a ready-to-ship candidate. That removes a handoff loop without allowing a merge, ready mark, or ship action. Set autoReviewShipModeNone: false to force manual review even in non-shipping runs.

This keeps the currently working OpenClaw/Discord proof flow on the public riddle_proof_change path rather than a private skill/plugin prototype.

OC Session Routing

For direct OC harnesses and benchmark tooling that need fresh isolation per run, do not assume openclaw agent --agent ... --session-id ... is sufficient. The package now exports helpers that make the routing decision explicit:

buildOpenClawAgentSessionKey(agentId, sessionId)
buildOpenClawAgentInvocationPlan(request, routingMode)

Supported routing modes:

agent_session_id
gateway_session_key

Current recommendation for isolation-sensitive harnesses:

prefer gateway_session_key
use agent_session_id only when you intentionally want current CLI semantics

This is not yet a claim about what the global default should be. Base the decision on harness data, not intuition.
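
The recommendation above can be sketched as a small selection function. This is only an illustration: HarnessOptions and chooseRoutingMode are hypothetical names, not exports of this package, and the request shape expected by buildOpenClawAgentInvocationPlan is not documented here, so the sketch stops at choosing the mode.

```typescript
// The two routing modes named in this README.
type RoutingMode = "agent_session_id" | "gateway_session_key";

// Hypothetical harness options (assumption; not part of the package API).
interface HarnessOptions {
  // True when each run must start from a fresh, isolated session.
  needsFreshIsolation: boolean;
}

// Follow the stated recommendation: isolation-sensitive harnesses prefer
// gateway_session_key; everything else keeps current CLI semantics.
function chooseRoutingMode(opts: HarnessOptions): RoutingMode {
  return opts.needsFreshIsolation ? "gateway_session_key" : "agent_session_id";
}

const benchmarkMode = chooseRoutingMode({ needsFreshIsolation: true });
console.log(benchmarkMode);
```

The chosen mode would then be passed as the routingMode argument of buildOpenClawAgentInvocationPlan, alongside whatever request object the harness already builds.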

Product Boundary

This package is meant to become the OpenClaw entry point for the full Riddle Proof harness, not just a skill prompt. The valuable path is:

idea -> workspace setup -> agent implementation -> server-backed proof capture
     -> proof judgment -> PR creation -> CI evidence -> integration update

The wrapper owns the OpenClaw tool contract and integration metadata. The reusable harness behind it owns the hard workflow pieces: configured agent execution, Riddle server usage, proof assessment, ship gates, and notifications. The current wrapper owns the public OpenClaw path; private instance repos should only provide deployment-specific defaults and credentials.

OpenClaw participation is expressed separately from the base Riddle Proof loop. The same durable proof run should support three wrapper modes:

interactive: OpenClaw is the visible driver, runs or delegates the agent work, and reports meaningful stage/checkpoint progress back to Discord or the active chat surface.
background_pr: OpenClaw starts the same loop, keeps routine iteration quiet, and wakes the user when a PR is ready for review or a concrete blocker needs a decision. This is the default product mode.
continuous: OpenClaw acts as a guarded long-running development manager that assigns work, requires Riddle Proof evidence, reviews PR/proof/CI, and may merge only under explicit policy controls.

These are OpenClaw UX/policy modes, not separate proof engines. The reusable @riddledc/riddle-proof package remains responsible for stages, checkpoint contracts, evidence recovery, profile/audit runs, Riddle API calls, run cards, proof assessment, and ship gates.

Tool

riddle_proof_change
riddle_proof_status
riddle_proof_inspect
riddle_proof_sync
riddle_proof_review

riddle_proof_change accepts proofed-change-style params such as repo, branch, change_request, verification_mode, assertions_json, and Discord routing metadata. The default ship path should open or update a draft PR, prove the exact commit, wait for CI, and mark the PR ready; leave_draft: true is an explicit escape hatch for debug or intentionally draft-only runs. It returns a RiddleProofRunResult.

Use workflow_mode to select how OpenClaw participates in the proof loop: interactive, background_pr, or continuous. This is distinct from run_mode, which only controls whether the current tool call blocks or returns immediately while the run continues in the background.
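
As a rough illustration of how workflow_mode and run_mode sit side by side in a riddle_proof_change call, the sketch below builds a parameter object. The parameter names come from this README; the ChangeParams interface, the buildChangeParams helper, the repo value, and the shape of the assertion objects are assumptions made for the example.

```typescript
// Workflow modes named in this README.
type WorkflowMode = "interactive" | "background_pr" | "continuous";

// Partial, assumed view of the riddle_proof_change parameters.
interface ChangeParams {
  repo: string;
  branch: string;
  change_request: string;
  assertions_json?: string;      // JSON string, per the *_json naming
  workflow_mode?: WorkflowMode;  // how OpenClaw participates in the loop
  run_mode?: "blocking";         // omit to keep the background default
  leave_draft?: boolean;         // escape hatch for draft-only runs
}

// Hypothetical helper: serializes assertions and applies the default
// product mode (background_pr) unless the caller overrides it.
function buildChangeParams(
  repo: string,
  branch: string,
  changeRequest: string,
  assertions: unknown[],
  workflowMode: WorkflowMode = "background_pr",
): ChangeParams {
  return {
    repo,
    branch,
    change_request: changeRequest,
    assertions_json: JSON.stringify(assertions),
    workflow_mode: workflowMode,
  };
}

const changeParams = buildChangeParams(
  "example-org/example-app", // hypothetical repository
  "main",
  "Increase contrast on the pricing page call-to-action",
  [{ text_visible: "Start free trial" }], // hypothetical assertion shape
);
console.log(changeParams.workflow_mode);
```

Leaving run_mode unset keeps the call in background mode; adding run_mode: "blocking" changes only how the tool call returns, not which workflow mode the run uses.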

For chat surfaces that should not keep one long tool reply open, background mode is the default. Pass run_mode: "blocking" only for deliberate synchronous debugging, or configure defaultRunMode: "blocking" in runtimes that really need the old behavior. In background mode, the tool writes the wrapper state immediately, returns status: "running" with a state_path, and continues the proof in the gateway process.

Any OC interface can poll riddle_proof_status, call riddle_proof_inspect when review evidence is ready, and resume with riddle_proof_review. This is intentionally channel-agnostic: Discord, Telegram, iMessage bridges, and CLIs all consume the same state contract instead of relying on a fragile transport-specific timeout. The normal UX should keep polling out of the main chat turn: use an OC sessions_spawn monitor or host worker when available, and otherwise poll at the returned recommended_poll_after_ms cadence.

When a background run settles, the wrapper appends a durable run.wake.requested event with the final status, blocker if any, and suggested next tools. Host integrations can watch that event and re-enter the originating OC session without this package knowing which chat transport is in use.

For pages behind login, pass generic browser auth as JSON strings: auth_localStorage_json, auth_cookies_json, or auth_headers_json. These are forwarded to the proof runtime so previews and script captures can exercise authenticated pages without depending on a site-specific OpenClaw helper. use_auth: true remains available only for deployments that have explicitly configured their own auth helper.
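
A minimal sketch of preparing those JSON-string parameters is shown below. The three parameter names come from this README; the BrowserAuth interface, the toAuthParams helper, and the cookie object fields are assumptions for illustration and may not match what the proof runtime expects.

```typescript
// Hypothetical in-memory auth description (assumed shape).
interface BrowserAuth {
  localStorage?: Record<string, string>;
  cookies?: Array<{ name: string; value: string; domain?: string }>;
  headers?: Record<string, string>;
}

// Serialize each provided auth source into its *_json tool parameter.
// Sources that are not provided are simply left out of the result.
function toAuthParams(auth: BrowserAuth): Record<string, string> {
  const params: Record<string, string> = {};
  if (auth.localStorage) {
    params.auth_localStorage_json = JSON.stringify(auth.localStorage);
  }
  if (auth.cookies) {
    params.auth_cookies_json = JSON.stringify(auth.cookies);
  }
  if (auth.headers) {
    params.auth_headers_json = JSON.stringify(auth.headers);
  }
  return params;
}

const authParams = toAuthParams({
  cookies: [{ name: "session", value: "placeholder-token" }],
  headers: { Authorization: "Bearer placeholder-token" },
});
console.log(Object.keys(authParams));
```

The resulting object can be spread into the riddle_proof_change parameters. Real tokens should come from the runtime's own credential store rather than being written into source.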

riddle_proof_status accepts a wrapper state_path returned by riddle_proof_change and returns a cheap status snapshot with run id, stage, elapsed time, blocker, worktree path, and latest event. Engine-backed background runs also include active_substep, phase_elapsed_ms, engine_latest_event, engine_runtime_event_count, recommended_poll_after_ms, and a wake_strategy hint so agents and host surfaces can monitor long proof runs without noisy main conversation polling. If the supplied path exists but is not the wrapper run state, the not-found response includes diagnostics that distinguish a missing file from accidentally passing an underlying engine state path.
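
The polling cadence described for background runs can be sketched as a small loop. This is an illustration under stated assumptions: ToolCaller stands in for however the host invokes OpenClaw tools, the snapshot is assumed to expose a status field that reads "running" until the run settles, and the 5000 ms fallback delay is arbitrary.

```typescript
// Assumed subset of the riddle_proof_status snapshot.
interface StatusSnapshot {
  status: string;
  blocker?: string;
  recommended_poll_after_ms?: number;
}

// Hypothetical host hook for invoking an OpenClaw tool by name.
type ToolCaller = (
  tool: string,
  params: Record<string, unknown>,
) => Promise<StatusSnapshot>;

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the run is no longer "running", waiting for the cadence the
// wrapper recommends between calls. Gives up after maxPolls attempts.
async function pollUntilSettled(
  callTool: ToolCaller,
  statePath: string,
  maxPolls = 100,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<StatusSnapshot> {
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const snapshot = await callTool("riddle_proof_status", {
      state_path: statePath,
    });
    if (snapshot.status !== "running") {
      return snapshot;
    }
    await sleep(snapshot.recommended_poll_after_ms ?? 5000);
  }
  throw new Error(`run at ${statePath} did not settle after ${maxPolls} polls`);
}
```

A loop like this belongs in a monitor or host worker rather than in the main chat turn, in line with the guidance on keeping polling out of the conversation.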

riddle_proof_inspect accepts the same wrapper state_path and returns a proof-native review packet: route match, repo profile usage, artifact URLs, visual delta, structured proof evidence, semantic anchors, visible text samples, and a concrete next action for the supervising agent. Use it when a run pauses for proof review and the reviewer needs one compact packet instead of stitching together raw state, screenshots, and side inspection tools.

riddle_proof_sync accepts the same wrapper state_path and asks the configured engine to reconcile PR lifecycle state. It is the explicit path for "the PR was merged, update the run": check the PR, record merged/closed/open state, fetch the base branch when configured, safely fast-forward a clean local base checkout when configured, and clean isolated proof worktrees after merge. The sync result includes cleanup_report.base_checkout so operators can see the base worktree, branch, clean state, local/remote heads, and whether the fast-forward ran, skipped, or failed.

riddle_proof_review accepts the wrapper state_path plus a structured main-agent proof verdict. It is intended for runs that stopped at main_agent_proof_review_required; the submitted judgment is passed back to the underlying engine as proof_assessment_json so the workflow can ship, iterate, or escalate without losing run state. If proofReviewMode: "main_agent" and the run is non-shipping (ship_mode: "none" or defaultShipMode: "none"), the wrapper may auto-submit a conservative ready_to_ship proof assessment when riddle_proof_inspect would already return ready_to_ship_candidate: true. This is only a QOL path for held proof runs; shipping modes still require explicit proof review.
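
The auto-review conditions described above can be written as a single predicate. This is a sketch of the stated rules, not the wrapper's actual implementation: ReviewContext and canAutoAdvanceReview are hypothetical names, and treating an unset autoReviewShipModeNone as enabled is inferred from the instruction to set it to false to force manual review.

```typescript
// Hypothetical summary of the inputs the rules above depend on.
interface ReviewContext {
  proofReviewMode: string;           // e.g. "main_agent"
  effectiveShipMode: string;         // resolved from ship_mode or defaultShipMode
  readyToShipCandidate: boolean;     // ready_to_ship_candidate from riddle_proof_inspect
  autoReviewShipModeNone?: boolean;  // false forces manual review
}

// True only when every documented condition for auto-advancing holds:
// main-agent review, a non-shipping run, evidence already marked as a
// ready-to-ship candidate, and auto-review not explicitly disabled.
function canAutoAdvanceReview(ctx: ReviewContext): boolean {
  return (
    ctx.proofReviewMode === "main_agent" &&
    ctx.effectiveShipMode === "none" &&
    ctx.readyToShipCandidate &&
    ctx.autoReviewShipModeNone !== false
  );
}

const smokeRun: ReviewContext = {
  proofReviewMode: "main_agent",
  effectiveShipMode: "none",
  readyToShipCandidate: true,
};
console.log(canAutoAdvanceReview(smokeRun));
```

Any shipping mode makes the predicate false, which matches the rule that shipping runs always require an explicit riddle_proof_review verdict.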

Runtime Boundary

The wrapper depends on @riddledc/riddle-proof for contracts and normalization. It does not invoke another OpenClaw plugin and does not supply a coding agent. The reusable engine harness and the local Codex exec adapter are wired behind explicit config. Agent execution remains an adapter boundary: the package does not publish a hosted coding agent or secrets. The configured runtime must supply the local codex CLI environment, repository access, Riddle engine module, and any service credentials needed by that runtime before the wrapper can drive all the way to a PR.

The package should call configured services and credentials at runtime; it must not publish Riddle server secrets, Discord credentials, GitHub tokens, or OpenClaw-instance-specific configuration.

OpenClaw Host Updates

On OpenClaw hosts, do not assume the visible extension root is the only loaded copy of this package. The gateway may load tools from a managed npm project under:

$OPENCLAW_HOME/npm/projects/riddledc-openclaw-riddle-proof-*/node_modules/@riddledc/openclaw-riddle-proof

while an extension root also exists at:

$OPENCLAW_HOME/extensions/openclaw-riddle-proof

Update both locations before judging riddle_proof_status package metadata. Running npm install @riddledc/openclaw-riddle-proof@... inside the extension root is not enough; it nests the wrapper as a dependency instead of replacing the extension root package.

From this repository, run the host-side helper on the OpenClaw machine:

scripts/update-openclaw-riddle-proof-host.sh 0.4.152 0.8.40

The first argument is the wrapper version. The optional second argument updates the shared OpenClaw npm install of @riddledc/riddle-proof. The script replaces the extension root from the npm tarball, updates any managed npm project whose name matches riddledc-openclaw-riddle-proof-*, restarts openclaw-gateway.service, and prints package readback for both install surfaces.

After deployment, verify with a live riddle_proof_status tool call. Disk readback alone is insufficient because a stale managed npm project can continue serving older tool metadata.

Local Exec Adapter

The optional adapter lives in @riddledc/riddle-proof/local-agent; this OpenClaw package only wires it into the OC tool surface. It can be enabled with config like:

{
  "executionMode": "engine",
  "agentMode": "local_exec",
  "codexHome": "/root/.codex",
  "codexSandbox": "workspace-write",
  "proofReviewMode": "main_agent",
  "defaultShipMode": "ship"
}

The current local adapter implementation runs codex exec in the isolated after-worktree supplied by the Riddle Proof engine. It writes no package-time secrets and removes inherited OPENAI_API_KEY from the child process environment so a configured CODEX_HOME login is used unless the host wraps the command differently.

With proofReviewMode: "main_agent", codex exec is not asked to make the final proof judgment. It implements the change and captures proof, then the wrapper returns a review packet for the main OpenClaw agent to judge using the visible screenshots and evidence bundle.
