@zhihand/openclaw

v0.9.15

Published

2 months ago

OpenClaw host adapter for the ZhiHand control model

0High
0Medium
0Low

zhihand

zhihand openclaw plugin android phone-control

ZhiHand OpenClaw Adapter

This package provides the public OpenClaw-side adapter for ZhiHand.

It is a thin plugin layer on top of the shared ZhiHand control-plane contract.

What It Does

registers one OpenClaw host instance with the deployment control plane
creates QR-based pairing sessions for the ZhiHand mobile app
stores pairing state under the OpenClaw state directory
fetches the latest uploaded phone screen snapshot
sends control commands and waits for command ACK status

Install And First Run

The shortest working setup on a fresh OpenClaw host is:

openclaw plugins install @zhihand/openclaw
openclaw config set plugins.allow '["openclaw"]' --strict-json
openclaw config set tools.allow '["openclaw"]' --strict-json
openclaw doctor --generate-gateway-token
export ZHIHAND_GATEWAY_TOKEN="$(python3 - <<'PY'
import json
from pathlib import Path
config = json.loads((Path.home() / '.openclaw' / 'openclaw.json').read_text())
print(config['gateway']['auth']['token'])
PY
)"
openclaw config set gateway.http.endpoints.responses.enabled true --strict-json
openclaw config set plugins.entries.openclaw.config.gatewayAuthToken "\"$ZHIHAND_GATEWAY_TOKEN\"" --strict-json

Then restart or reload OpenClaw if your deployment requires it.

Why these steps matter:

plugins.allow trusts the plugin id so OpenClaw will load the extension without the fresh-install warning.
tools.allow enables ZhiHand's optional plugin tools for the agent runtime. Without it, mobile chat can answer text but cannot call zhihand_status or zhihand_control.
gateway.http.endpoints.responses.enabled turns on the local OpenClaw POST /v1/responses route. Without it, the plugin can load and pair, but mobile prompts fail with OpenClaw /v1/responses returned 404.
gatewayAuthToken is required for the plugin's native relay into the local OpenClaw POST /v1/responses endpoint.
without gatewayAuthToken, the plugin loads but logs ZhiHand prompt relay disabled... gatewayAuthToken and mobile prompts do not reach the local runtime.

If you already know your gateway token, you can set it directly:

openclaw config set plugins.entries.openclaw.config.gatewayAuthToken '"your-gateway-token"' --strict-json

If you prefer pinned installs for supply-chain stability on a first install, or after deleting the existing extension directory for a reinstall, install an exact published version:

openclaw plugins install @zhihand/openclaw@<version>

Development fallback from a local checkout:

openclaw plugins install --link /path/to/zhihand/packages/host-adapters/openclaw

Recommended discovery paths after npm publication:

package README
OpenClawDir or another community plugin directory
external catalogs when the host deployment supports them

Do not assume a first-party plugin store UI is the only distribution path.

Expected Warnings

These warnings are normal during setup and tell you what is still missing:

plugins.allow is empty Run openclaw config set plugins.allow '["openclaw"]' --strict-json.
ZhiHand optional tools are not enabled for OpenClaw agent Run openclaw config set tools.allow '["openclaw"]' --strict-json, or add tools.allow: ["openclaw"] to the dedicated mobile agent in agents.list.
OpenClaw /v1/responses returned 404 Run openclaw config set gateway.http.endpoints.responses.enabled true --strict-json, then restart the gateway.
ZhiHand prompt relay disabled ... gatewayAuthToken Set plugins.entries.openclaw.config.gatewayAuthToken to your current OpenClaw gateway token.

These are OpenClaw deployment warnings, not ZhiHand plugin install failures:

gateway.trusted_proxies_missing
origin not allowed
Control UI browser pairing prompts

Minimal plugin config example:

{
  "plugins": {
    "allow": ["openclaw"],
    "entries": {
      "openclaw": {
        "enabled": true,
        "config": {
          "gatewayAuthToken": "set-this-to-your-openclaw-gateway-token"
        }
      }
    }
  }
}

What is not a plugin prerequisite:

Control UI auth mode choices such as password vs token
gateway.controlUi.allowedOrigins
browser device pairing / Control UI login

Those belong to the OpenClaw gateway deployment itself. ZhiHand only needs the current gateway token value for plugins.entries.openclaw.config.gatewayAuthToken; it does not require you to set up the Control UI, browser pairing, or allowed origins before the plugin can load and relay prompts.

OpenClaw Plugin Config

The plugin reads its config from:

plugins.entries.openclaw.config

Supported fields:

controlPlaneEndpoint
originListener
displayName
stableIdentity
pairingTTLSeconds
appDownloadURL
gatewayResponsesEndpoint
gatewayAuthToken
mobileAgentId
updateCheckEnabled
updateCheckIntervalHours
requestedScopes

Normal hosted deployments can leave most fields empty.

Recommended minimum:

only gatewayAuthToken

Example:

{
  "plugins": {
    "allow": ["openclaw"],
    "entries": {
      "openclaw": {
        "enabled": true,
        "config": {
          "gatewayAuthToken": "set-this-in-deployment"
        }
      }
    }
  }
}

CLI equivalent for the allowlist and plugin token steps:

openclaw config set plugins.allow '["openclaw"]' --strict-json
openclaw config set tools.allow '["openclaw"]' --strict-json
openclaw config set plugins.entries.openclaw.config.gatewayAuthToken '"your-gateway-token"' --strict-json

Advanced self-host example:

{
  "plugins": {
    "allow": ["openclaw"],
    "entries": {
      "openclaw": {
        "enabled": true,
        "config": {
          "controlPlaneEndpoint": "https://api.example.com",
          "originListener": "https://host.example.zhihand.com",
          "displayName": "ZhiHand @ example-host",
          "stableIdentity": "openclaw-zhihand:example-host",
          "pairingTTLSeconds": 600,
          "appDownloadURL": "https://zhihand.com/download",
          "gatewayResponsesEndpoint": "http://127.0.0.1:18789/v1/responses",
          "gatewayAuthToken": "set-this-in-deployment",
          "mobileAgentId": "zhihand-mobile",
          "requestedScopes": [
            "observe",
            "session.control",
            "screen.read",
            "screen.capture",
            "ble.control"
          ]
        }
      }
    }
  }
}

Defaults:

controlPlaneEndpoint: https://api.zhihand.com
pairingTTLSeconds: 600
appDownloadURL: https://zhihand.com/download
gatewayResponsesEndpoint: http://127.0.0.1:18789/v1/responses
mobileAgentId: zhihand-mobile
updateCheckEnabled: true
updateCheckIntervalHours: 24
requestedScopes: recommended ZhiHand defaults
stableIdentity: auto-generated from hostname
originListener: optional; the control plane can fill a default host metadata value

Do not store secrets in this package or this public repository.

Best Practice

Use a dedicated OpenClaw agent/runtime path for ZhiHand mobile prompts.

normal chat and phone-operation requests should use the same OpenClaw agent
the plugin should stay thin and only provide pairing, tools, and relay glue
zhihand_* tools should be registered as optional and enabled only for the dedicated mobile agent
do not reintroduce a plugin-owned planner loop or direct codex exec orchestration inside this public plugin

Recommended deployment shape:

{
  "agents": {
    "list": [
      {
        "id": "zhihand-mobile",
        "model": "openai-codex/gpt-5.4",
        "tools": {
          "allow": ["openclaw"]
        }
      }
    ]
  }
}

Why this is the preferred path:

official OpenClaw plugin docs expect tools to be exposed to the agent runtime
official OpenClaw CLI backend docs treat codex-cli/* as text-only fallback paths where tools are disabled
keeping the planner inside the native runtime preserves gateway policy, auditability, and tool scoping

Deployment requirements for the native runtime path:

the OpenClaw gateway must expose local POST /v1/responses
the deployment must provide a gateway bearer token to the plugin
the dedicated ZhiHand mobile agent must use a tool-capable provider model such as openai-codex/gpt-5.4, not codex-cli/*
if these native-runtime prerequisites are missing, the prompt relay stays disabled and logs the configuration error during startup

OpenAI Computer Tool Status

openai-codex/gpt-5.4 is still the recommended model for the ZhiHand mobile agent, but the current OpenClaw relay path does not expose OpenAI's native tools: [{ "type": "computer" }] workflow.

Current behavior:

ZhiHand sends mobile prompts into local OpenClaw POST /v1/responses
OpenClaw's hosted-tool surface currently accepts function tools only
the mobile agent therefore uses zhihand_screen_read and zhihand_control, not OpenAI computer_call / computer_call_output

Implication:

you can use openai-codex/gpt-5.4 for better reasoning and screenshot understanding
you cannot assume OpenClaw will automatically switch to OpenAI's native computer tool loop

Using the GA OpenAI computer tool would require either:

upstream OpenClaw support for computer / computer_call_output, or
a separate direct-to-OpenAI harness that bypasses local OpenClaw /v1/responses

That direct harness is intentionally not the public ZhiHand/OpenClaw contract today.

Release Shape

Recommended first public release:

mobile app
hosted pair.zhihand.com and api.zhihand.com
npm-published OpenClaw plugin

For non-OpenClaw hosts, publish additional thin adapters on top of the same control-plane contract instead of growing this package into a multi-host shell.

Slash Commands

/zhihand pair
/zhihand status
/zhihand unpair
/zhihand update
/zhihand update check

/zhihand pair returns a browser-first pairing summary:

app download URL
QR URL

Open the QR URL in a browser to display the actual scannable QR page.

Plugin update behavior:

on startup, the plugin checks npm for a newer published version by default
/zhihand update check forces a fresh version lookup and prints the result
/zhihand update prints the recommended host-side update command
the preferred host-side update command is openclaw plugins update openclaw

Recommended host-side update command:

openclaw plugins update openclaw

For an installed plugin, upgrade with openclaw plugins update openclaw. Reserve openclaw plugins install @zhihand/openclaw@<version> for a first install or a reinstall after removal.

The current hosted control path is:

HTTP requests for pairing, uploads, acknowledgements, and control writes
SSE downlink for prompt, reply, and command events
per-device profile snapshots so the host can adapt behavior by runtime family

Tools

zhihand_pair
zhihand_status
zhihand_screen_read
zhihand_control

zhihand_control supports:

click
long_click
move
move_to
swipe
back
home
enter
input_text
open_app
set_clipboard
start_live_capture
stop_live_capture

Coordinate rules:

click, long_click, and move_to use xRatio and yRatio in [0,1] from the latest screenshot.
swipe uses x1Ratio, y1Ratio, x2Ratio, and y2Ratio in [0,1].
move uses dxRatio and dyRatio in [-1,1] for relative pointer deltas.
Do not send raw screenshot pixel coordinates through the public tool API.
zhihand_screen_read should be treated as fresh-only visual state. If the latest uploaded snapshot is stale, the tool fails instead of letting the agent click from an old frame.
When a keyboard is visible and the goal is to submit search, send, or confirm text, prefer enter over clicking the IME action button.
input_text supports mode:
- auto: current default, resolved on the mobile runtime as paste
- paste: clipboard-first plus HID paste shortcut
- type: raw HID keyboard typing, reserved for sensitive fields or when paste fails
input_text also supports submit=true to send Enter immediately after the text input completes.
auto and paste overwrite the mobile runtime clipboard as part of the reliability trade-off. Use type for sensitive fields or when clipboard mutation is not acceptable.

State Files

Relative to the OpenClaw state directory:

plugins/openclaw/state.json stored pairing state for the host instance
plugins/openclaw/latest-screen.jpg last fetched screen snapshot cache

The adapter may automatically advance local pairing state to the latest claimed session for the same host edge when the stored pairing becomes stale. This is a host-side recovery path and does not change the public QR claim flow.

Pairing Flow

The host registers itself against the control plane.
The plugin creates a pairing session and pair URL.
The pair URL is the canonical QR landing page; browsers render a scannable HTML page, while the mobile app resolves the same URL in JSON mode.
The mobile app scans the QR code and claims the pairing session.
The control plane returns a long-lived mobile credential.
OpenClaw can then use zhihand_status, zhihand_screen_read, and zhihand_control.
If the phone later claims a newer pairing session for the same host edge, the adapter can recover forward to that latest claimed session instead of staying pinned to an older local credential.

Mobile Prompt Path

The supported runtime path is:

The mobile app uploads a prompt to the control plane.
The mobile app may also upload prompt attachments before the prompt itself.
The OpenClaw plugin polls pending prompts.
The plugin downloads any prompt attachments from the control plane.
The plugin prepares multimodal native-agent input:
- images become input_image
- supported documents become input_file
- audio attachments are transcribed into text context
- video attachments stay limited context and may use preview images
The plugin forwards the prepared prompt to the local OpenClaw POST /v1/responses endpoint for the dedicated mobile agent.
The dedicated mobile agent decides whether to answer directly or call zhihand_status, zhihand_screen_read, and zhihand_control.
The plugin writes the final assistant reply back to the control plane.

Task cancellation also uses this same path:

If the mobile app marks the active prompt as cancelled, the plugin aborts the in-flight native mobile-agent run.
The final reply for that prompt becomes a system message indicating that the user stopped the task.

Capture Constraint

zhihand_screen_read returns the latest uploaded snapshot, not a live video stream.

start_live_capture may return a permission-required result until the mobile app app already has an active screen-capture session.

Attachment Best Practice

Preferred handling:

images and documents remain raw attachments
voice notes remain raw audio attachments and are transcribed on the host
The mobile app should not treat app-local speech-to-text as the canonical contract
video support is intentionally conservative and should be treated as limited context until the deployment adds explicit video understanding