tauri-plugin-agent-control

v0.1.0

Published

a month ago

Dev-only HTTP bridge for observing and controlling Tauri webviews — like Chrome DevTools Protocol, but for Tauri.


palanikannan1437

tauri agent devtools testing automation e2e

tauri-plugin-agent-control

Dev-only HTTP bridge for Tauri webviews — like Chrome DevTools Protocol, but for Tauri.

What it does

tauri-plugin-agent-control exposes your Tauri webview over a local HTTP API at http://localhost:9876. You can snapshot the DOM, click buttons, fill inputs, evaluate JavaScript, intercept network requests, take screenshots, and more — all via simple curl commands or any HTTP client. The server only runs in debug builds, so it's completely stripped from production.

Installation

1. Add the Rust dependency

From crates.io:

[dependencies]
tauri-plugin-agent-control = "0.1"

Or from git:

[dependencies]
tauri-plugin-agent-control = { git = "https://github.com/palanik1/tauri-plugin-agent-control" }

2. Register the plugin

In your src-tauri/src/main.rs (or lib.rs):

fn main() {
    tauri::Builder::default()
        .plugin(tauri_plugin_agent_control::init())
        .run(tauri::generate_context!())
        .expect("error while running tauri application");
}

3. Add capability permissions

In your src-tauri/capabilities/default.json (create it if it doesn't exist):

{
  "identifier": "default",
  "windows": ["main"],
  "permissions": [
    "agent-control:default"
  ]
}

agent-control:default grants allow-agent-control-respond, the permission for agent_control_respond, which is the only Tauri command the plugin registers. It allows the JS shim to send eval results back to the Rust HTTP server.

4. Run in dev mode

cargo tauri dev

The HTTP bridge starts automatically on http://localhost:9876 in debug builds. Verify it's running:

curl -s http://localhost:9876/health
# {"ok":true}

Production builds: The server is completely stripped — it's gated behind #[cfg(debug_assertions)]. No code, no port, no surface area.
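When scripting against the bridge, it can help to wait until /health responds before sending any commands, since the app may still be compiling or starting. The following is a minimal sketch, assuming bash and curl; the function name wait_for_bridge, the BRIDGE_URL variable, and the retry defaults are illustrative and not part of the plugin.

```shell
#!/usr/bin/env bash
# Base URL of the bridge; the variable name is an assumption for this sketch.
BRIDGE_URL="${BRIDGE_URL:-http://localhost:9876}"

# Hypothetical helper: poll /health until it answers or the attempts run out.
#   $1 = maximum number of attempts (default 30)
#   $2 = seconds to sleep between attempts (default 1)
wait_for_bridge() {
  local attempts="${1:-30}" delay="${2:-1}" i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s hides progress output, -f turns HTTP errors into a non-zero exit code
    if curl -sf -o /dev/null "$BRIDGE_URL/health"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A script could then start with `wait_for_bridge 60 1 || exit 1` before issuing snapshot or click requests.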

AI Agent Setup

Install the Amp/AI agent skill

Copy the SKILL.md from this package into your project's skill directory:

# Project-level (recommended)
mkdir -p .agents/skills/tauri-agent-control
cp node_modules/tauri-plugin-agent-control/SKILL.md .agents/skills/tauri-agent-control/SKILL.md

# Or user-level (available across all projects)
mkdir -p ~/.config/agents/skills/tauri-agent-control
cp node_modules/tauri-plugin-agent-control/SKILL.md ~/.config/agents/skills/tauri-agent-control/SKILL.md

If the package is not in node_modules yet, install it from npm first, then copy:

npm install tauri-plugin-agent-control
cp node_modules/tauri-plugin-agent-control/SKILL.md .agents/skills/tauri-agent-control/SKILL.md

Once installed, your AI agent (Amp, Claude Code, etc.) can interact with your running Tauri app via natural language:

"Take a screenshot of the app"
"Click the Submit button"
"Fill in the email field with user@example.com"
"Check if the sidebar is visible"

The agent uses the skill to translate these into the appropriate curl commands to the HTTP bridge.

Quick Start

1. Snapshot the page to see what's on screen:

curl 'http://localhost:9876/snapshot?format=compact'

[page] My App — http://localhost:1420/
  @e1 [button] "Submit" role="button"
  @e2 [input type="text"] placeholder="Enter name" name="username"
  @e3 [a] "Home" href="/"

2. Interact with elements using their refs:

# Fill an input
curl -X POST http://localhost:9876/fill \
  -d '{"ref":"@e2","text":"hello world"}'

# Click a button
curl -X POST http://localhost:9876/click \
  -d '{"ref":"@e1"}'

3. Re-snapshot to see the updated state:

curl 'http://localhost:9876/snapshot?format=compact'

4. Evaluate arbitrary JavaScript:

curl -X POST http://localhost:9876/eval \
  -d '{"code":"document.title"}'
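Hand-written JSON bodies break as soon as the JavaScript passed to /eval contains double quotes or backslashes. A small sketch of one way to handle that, assuming bash, curl, and sed; json_escape and eval_js are hypothetical helper names, and the escaping covers only backslashes and double quotes (not newlines or other control characters).

```shell
#!/usr/bin/env bash
# Base URL of the bridge; the variable name is an assumption for this sketch.
BRIDGE_URL="${BRIDGE_URL:-http://localhost:9876}"

# Hypothetical helper: escape a string for use inside a JSON string literal.
# Backslashes are escaped first so the quotes escaped afterwards stay intact.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: send a one-line JavaScript expression to /eval.
eval_js() {
  curl -s -X POST "$BRIDGE_URL/eval" \
    -d "{\"code\":\"$(json_escape "$1")\"}"
}
```

For example, `eval_js 'document.querySelector(".x").textContent'` sends the selector with its inner quotes escaped.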

API Reference

Snapshot

| Method | Path | Query / Body | Description |
|--------|------|-------------|-------------|
| GET | /snapshot | ?format=compact\|json &scope=<selector> &depth=<n> &interactive=true\|false | Snapshot visible elements. compact returns a text tree, json returns full element data. interactive=false includes all elements, not just interactive ones. |

Interactions

| Method | Path | Body | Description |
|--------|------|------|-------------|
| POST | /click | {"ref":"@e1"} | Click an element |
| POST | /dblclick | {"ref":"@e1"} | Double-click an element |
| POST | /hover | {"ref":"@e1"} | Hover over an element (mouseenter + mouseover) |
| POST | /focus | {"ref":"@e1"} | Focus an element |
| POST | /fill | {"ref":"@e1","text":"value"} | Set input/textarea value directly (fires input + change) |
| POST | /type | {"ref":"@e1","text":"value"} | Type text character-by-character (fires keydown/keypress/input/keyup per char) |
| POST | /press | {"key":"Enter"} | Press a key. Supports modifiers: "Control+a", "Meta+Shift+z" |
| POST | /check | {"ref":"@e1"} | Check a checkbox |
| POST | /uncheck | {"ref":"@e1"} | Uncheck a checkbox |
| POST | /select | {"ref":"@e1","values":["opt1","opt2"]} | Select options in a <select> element |
| POST | /scroll | {"direction":"down","amount":500} | Scroll the page. Optional "selector" to scroll a container. Directions: up, down, left, right |
| POST | /scrollintoview | {"ref":"@e1"} | Scroll an element into view (smooth, centered) |
| POST | /drag | {"from":"@e1","to":"@e2"} | Drag and drop between two elements |
| POST | /upload | {"ref":"@e1","dataUrl":"data:..."} | Upload a file to a file input via base64 data URL |

Getters

| Method | Path | Description |
|--------|------|-------------|
| GET | /get/title | Get document.title |
| GET | /get/url | Get location.href |
| GET | /get/text/{ref} | Get textContent of an element |
| GET | /get/html/{ref} | Get innerHTML of an element |
| GET | /get/value/{ref} | Get value of an input/textarea |
| GET | /get/attr/{ref}/{attr} | Get an attribute value |
| GET | /get/styles/{ref} | Get computed styles (font, color, display, position, etc.) |
| GET | /get/box/{ref} | Get bounding box {x, y, width, height} |
| GET | /get/count/{selector} | Count elements matching a CSS selector (URL-encode the selector) |
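Because /get/count/{selector} takes the selector as a path segment, characters such as spaces, brackets, and quotes need percent-encoding. Here is a minimal sketch of doing that in pure bash (it relies on bash-specific syntax); urlencode and count_elements are hypothetical helper names, and the encoder is only meant for ASCII selectors.

```shell
#!/usr/bin/env bash
# Base URL of the bridge; the variable name is an assumption for this sketch.
BRIDGE_URL="${BRIDGE_URL:-http://localhost:9876}"

# Hypothetical helper: percent-encode everything except RFC 3986 unreserved
# characters. Intended for ASCII input only.
urlencode() {
  local LC_ALL=C s="$1" out="" c i
  for ((i = 0; i < ${#s}; i++)); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      # A leading single quote makes printf emit the character's numeric code
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: count elements matching a CSS selector.
count_elements() {
  curl -s "$BRIDGE_URL/get/count/$(urlencode "$1")"
}
```

With this, `count_elements 'ul > li.item'` requests /get/count/ul%20%3E%20li.item.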

State Checks

| Method | Path | Description |
|--------|------|-------------|
| GET | /is/visible/{ref} | Check if element is visible |
| GET | /is/enabled/{ref} | Check if element is enabled (not disabled) |
| GET | /is/checked/{ref} | Check if checkbox/radio is checked |

Wait

| Method | Path | Body | Description |
|--------|------|------|-------------|
| POST | /wait | {"ms":1000} | Wait for a fixed delay |
| POST | /wait | {"selector":".loaded"} | Wait for a CSS selector to appear |
| POST | /wait | {"text":"Success"} | Wait for text to appear on the page |
| POST | /wait | {"url":"**/dashboard"} | Wait for URL to match (glob patterns supported) |
| POST | /wait | {"fn":"document.querySelector('.x')"} | Wait for a JS expression to be truthy |
| POST | /wait | {"selector":".x","timeout":10000} | Custom timeout (default: 5000ms) |

Eval

| Method | Path | Body | Description |
|--------|------|------|-------------|
| POST | /eval | {"code":"document.title"} | Evaluate arbitrary JavaScript and return the result |

Console / Errors

| Method | Path | Description |
|--------|------|-------------|
| GET | /console | Get and drain captured console logs (log, warn, error, info, debug) |
| GET | /errors | Get and drain captured console errors only |

Cookies

| Method | Path | Body | Description |
|--------|------|------|-------------|
| GET | /cookies | (none) | Get all cookies as {name: value} |
| POST | /cookies/set | {"name":"x","value":"y"} | Set a cookie |
| POST | /cookies/clear | (none) | Clear all cookies |

Storage

| Method | Path | Body | Description |
|--------|------|------|-------------|
| GET | /storage/local | (none) | Get all localStorage entries |
| GET | /storage/session | (none) | Get all sessionStorage entries |
| GET | /storage/local/{key} | (none) | Get a single localStorage key |
| GET | /storage/session/{key} | (none) | Get a single sessionStorage key |
| POST | /storage/set | {"type":"local","key":"k","value":"v"} | Set a storage entry. type is "local" or "session" |
| POST | /storage/clear | {"type":"local"} | Clear all entries for a storage type |

Navigation

| Method | Path | Description |
|--------|------|-------------|
| POST | /back | Navigate back (history.back()) |
| POST | /forward | Navigate forward (history.forward()) |
| POST | /reload | Reload the page (location.reload()) |

Find (Semantic Locators)

| Method | Path | Body | Description |
|--------|------|------|-------------|
| POST | /find | {"by":"role","value":"button","name":"Submit"} | Find by ARIA role with optional accessible name |
| POST | /find | {"by":"text","value":"Hello"} | Find by visible text content |
| POST | /find | {"by":"label","value":"Email"} | Find by associated label text or aria-label |
| POST | /find | {"by":"placeholder","value":"Search..."} | Find by placeholder attribute |
| POST | /find | {"by":"testid","value":"submit-btn"} | Find by data-testid |
| POST | /find | {"by":"alt","value":"Logo"} | Find by alt attribute |
| POST | /find | {"by":"title","value":"Close"} | Find by title attribute |
| POST | /find | {"by":"first","value":"button"} | First element matching selector |
| POST | /find | {"by":"last","value":".item"} | Last element matching selector |
| POST | /find | {"by":"nth","value":"li","index":2} | Nth element matching selector (0-based) |

All /find calls return {"ref":"@eN","tag":"...","text":"..."} and accept optional "action" (click, fill, type, hover, focus), "text" (for fill/type), and "exact":true for exact text matching.

Mouse / Keyboard

| Method | Path | Body | Description |
|--------|------|------|-------------|
| POST | /mouse | {"action":"move","x":100,"y":200} | Move mouse to coordinates |
| POST | /mouse | {"action":"down","x":100,"y":200,"button":"left"} | Mouse button down |
| POST | /mouse | {"action":"up","x":100,"y":200} | Mouse button up |
| POST | /mouse | {"action":"wheel","deltaY":100} | Mouse wheel scroll |
| POST | /keydown | {"key":"Shift"} | Dispatch keydown event |
| POST | /keyup | {"key":"Shift"} | Dispatch keyup event |

Highlight

| Method | Path | Body | Description |
|--------|------|------|-------------|
| POST | /highlight | {"ref":"@e1"} | Visually highlight an element with a red overlay (2s duration) |

Dialogs

| Method | Path | Body | Description |
|--------|------|------|-------------|
| POST | /dialog | {"action":"accept"} | Auto-accept future alert/confirm/prompt dialogs |
| POST | /dialog | {"action":"accept","text":"input"} | Accept with custom prompt text |
| POST | /dialog | {"action":"dismiss"} | Auto-dismiss future dialogs |
| GET | /dialogs | (none) | Get and drain captured dialog events |

Viewport

| Method | Path | Body | Description |
|--------|------|------|-------------|
| POST | /viewport | {"width":375,"height":812} | Resize the webview window |

State

| Method | Path | Body | Description |
|--------|------|------|-------------|
| POST | /state/save | (none) | Save current localStorage, sessionStorage, and cookies |
| POST | /state/load | {"localStorage":{...},"sessionStorage":{...},"cookies":{...}} | Restore previously saved state |
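A saved state can be kept in a file and restored later, for example to skip a login flow between test runs. This is a sketch only, assuming bash and curl, and assuming that the /state/save response already has the {"localStorage":...,"sessionStorage":...,"cookies":...} shape that /state/load accepts; save_state and load_state are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Base URL of the bridge; the variable name is an assumption for this sketch.
BRIDGE_URL="${BRIDGE_URL:-http://localhost:9876}"

# Hypothetical helper: write the /state/save response body to the file in $1.
save_state() {
  curl -s -X POST "$BRIDGE_URL/state/save" -o "$1"
}

# Hypothetical helper: post the file in $1 back to /state/load.
# --data-binary sends the file contents unchanged.
load_state() {
  curl -s -X POST "$BRIDGE_URL/state/load" --data-binary @"$1"
}
```

Typical use would be `save_state /tmp/logged-in.json` after signing in once, then `load_state /tmp/logged-in.json` followed by a POST to /reload in later runs.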

Screenshot

| Method | Path | Query | Description |
|--------|------|-------|-------------|
| GET | /screenshot | ?path=/tmp/shot.png &full=true | Take a screenshot. Default path: /tmp/agent-control-screenshot-<ts>.png. full=true resizes to full page dimensions first. |

Recording

| Method | Path | Description |
|--------|------|-------------|
| POST | /record/start | Start screen recording (macOS screencapture -v). Returns {pid, path} |
| POST | /record/stop | Stop recording (sends SIGINT to screencapture process). Returns {path} |

Download

| Method | Path | Body | Description |
|--------|------|------|-------------|
| POST | /download | {"url":"https://...","path":"/tmp/file.zip"} | Download a file using curl |

Network

| Method | Path | Body | Description |
|--------|------|------|-------------|
| POST | /network/intercept | (none) | Start intercepting fetch/XHR requests |
| POST | /network/reset | (none) | Stop intercepting and restore original fetch/XHR |
| GET | /network/requests | (none) | Get captured network request log |
| POST | /network/route | {"url":"/api/data","body":"{\"mock\":true}"} | Mock responses for matching URLs |
| POST | /network/route | {"url":"/api/data","abort":true} | Block requests to matching URLs |

Geolocation

| Method | Path | Body | Description |
|--------|------|------|-------------|
| POST | /geo | {"lat":37.7749,"lng":-122.4194} | Override navigator.geolocation |

Offline

| Method | Path | Body | Description |
|--------|------|------|-------------|
| POST | /offline | {"enabled":true} | Simulate offline mode (navigator.onLine → false) |

Headers

| Method | Path | Body | Description |
|--------|------|------|-------------|
| POST | /headers | {"headers":{"Authorization":"Bearer token"}} | Inject custom headers into all future fetch requests |

Trace

| Method | Path | Description |
|--------|------|-------------|
| POST | /trace/start | Start a PerformanceObserver trace (resource, paint, LCP, layout-shift, etc.) |
| POST | /trace/stop | Stop tracing and return collected performance entries |

Health

| Method | Path | Description |
|--------|------|-------------|
| GET | /health | Returns {"ok":true}; useful for checking whether the agent-control server is running |

Ref Lifecycle

Every call to /snapshot or /find assigns elements a ref like @e1, @e2, etc. These refs are invalidated whenever you take a new snapshot — the ref map is cleared and rebuilt from scratch. If the page navigates or the DOM changes significantly, take a fresh snapshot before interacting.

# 1. Snapshot → get refs
curl 'http://localhost:9876/snapshot?format=compact'
# @e1 [button] "Save"

# 2. Use the ref
curl -X POST http://localhost:9876/click -d '{"ref":"@e1"}'

# 3. After DOM changes, snapshot again for fresh refs
curl 'http://localhost:9876/snapshot?format=compact'
# @e1 [button] "Saved ✓"   ← refs are reassigned
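Since refs are rebuilt on every snapshot, one defensive pattern is to resolve a ref from a fresh snapshot immediately before each action rather than caching it. A minimal sketch of that idea, assuming bash, curl, and grep, and assuming the compact format shown earlier where each element line contains an @eN ref; ref_for and click_by_text are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Base URL of the bridge; the variable name is an assumption for this sketch.
BRIDGE_URL="${BRIDGE_URL:-http://localhost:9876}"

# Hypothetical helper: read a compact snapshot on stdin and print the ref of
# the first line that contains the literal text in $1.
ref_for() {
  grep -F -- "$1" | grep -o '@e[0-9][0-9]*' | head -n 1
}

# Hypothetical helper: take a fresh snapshot, look up the ref, then click it.
click_by_text() {
  local ref
  ref=$(curl -s "$BRIDGE_URL/snapshot?format=compact" | ref_for "$1")
  [ -n "$ref" ] || { echo "no element matching: $1" >&2; return 1; }
  curl -s -X POST "$BRIDGE_URL/click" -d "{\"ref\":\"$ref\"}"
}
```

Calling `click_by_text 'Save'` twice in a row takes two snapshots, so the second click never reuses a stale ref from the first.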

Semantic Locators (Find)

The /find endpoint lets you locate elements without needing a snapshot first. It assigns a new ref to the found element and optionally performs an action in one call.

# Find a button by role and click it
curl -X POST http://localhost:9876/find \
  -d '{"by":"role","value":"button","name":"Submit","action":"click"}'

# Find an input by label and fill it
curl -X POST http://localhost:9876/find \
  -d '{"by":"label","value":"Email","action":"fill","text":"user@example.com"}'

# Find by test ID
curl -X POST http://localhost:9876/find \
  -d '{"by":"testid","value":"search-input","action":"type","text":"query"}'

# Find the 3rd list item (0-based index)
curl -X POST http://localhost:9876/find \
  -d '{"by":"nth","value":"li","index":2}'

Supported locator strategies: role, text, label, placeholder, testid, alt, title, first, last, nth.

Use "exact":true for exact text/name matching instead of substring matching.
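When /find is called without an action, the returned ref can be captured and reused in later calls. The sketch below assumes bash, curl, and sed, and assumes the response arrives as compact one-line JSON in the {"ref":"@eN","tag":"...","text":"..."} shape described in the API reference; extract_ref and find_button_ref are hypothetical helper names, and a real JSON parser would be more robust than the sed pattern used here.

```shell
#!/usr/bin/env bash
# Base URL of the bridge; the variable name is an assumption for this sketch.
BRIDGE_URL="${BRIDGE_URL:-http://localhost:9876}"

# Hypothetical helper: pull the "ref" value out of a /find response on stdin.
# Prints nothing when the response has no ref field.
extract_ref() {
  sed -n 's/.*"ref":"\(@e[0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: find a button by exact accessible name and print its ref.
# $1 must not contain double quotes or backslashes, since it is not escaped.
find_button_ref() {
  curl -s -X POST "$BRIDGE_URL/find" \
    -d "{\"by\":\"role\",\"value\":\"button\",\"name\":\"$1\",\"exact\":true}" |
    extract_ref
}
```

For instance, `ref=$(find_button_ref Submit)` could be followed by a POST to /highlight or /click with that ref.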

Network Interception

Intercept, mock, and block network requests made by the webview:

# Start intercepting
curl -X POST http://localhost:9876/network/intercept

# Mock an API response
curl -X POST http://localhost:9876/network/route \
  -d '{"url":"/api/users","body":"{\"users\":[{\"name\":\"Alice\"}]}"}'

# Block requests to analytics
curl -X POST http://localhost:9876/network/route \
  -d '{"url":"analytics.example.com","abort":true}'

# View captured requests
curl http://localhost:9876/network/requests

# Reset (restore original fetch/XHR)
curl -X POST http://localhost:9876/network/reset

The interceptor patches window.fetch and XMLHttpRequest. Tauri IPC calls (ipc:// URLs) are automatically excluded to avoid breaking invoke().
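Because the interceptor stays active until /network/reset is called, a script that fails halfway can leave mocks in place. One way to guard against that is a wrapper that always resets afterwards. This is a sketch under stated assumptions (bash and curl); with_mock is a hypothetical helper name, and its second argument must already be escaped for embedding in a JSON string.

```shell
#!/usr/bin/env bash
# Base URL of the bridge; the variable name is an assumption for this sketch.
BRIDGE_URL="${BRIDGE_URL:-http://localhost:9876}"

# Hypothetical helper: mock one URL, run a command, then restore fetch/XHR.
#   $1 = URL pattern to mock
#   $2 = response body, pre-escaped for use inside a JSON string
#   remaining arguments = command to run while the mock is active
with_mock() {
  local url="$1" body="$2" status=0
  shift 2
  curl -s -X POST "$BRIDGE_URL/network/intercept" >/dev/null
  curl -s -X POST "$BRIDGE_URL/network/route" \
    -d "{\"url\":\"$url\",\"body\":\"$body\"}" >/dev/null
  # Remember the command's exit code but keep going so the reset always runs
  "$@" || status=$?
  curl -s -X POST "$BRIDGE_URL/network/reset" >/dev/null
  return "$status"
}
```

A call such as `with_mock /api/users '{\"users\":[]}' ./run-ui-checks.sh` (the script name is a placeholder) runs the checks against the mocked endpoint and propagates their exit code.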

Security

- Debug-only: The HTTP server is gated behind #[cfg(debug_assertions)], so it is compiled out of release builds unless the release profile explicitly re-enables debug assertions.
- Localhost-only: The server listens on localhost:9876. It is not accessible from other machines.
- Not for production: This plugin gives full control over the webview (eval, DOM manipulation, cookie/storage access). Never ship it in a release build.

Platform Notes

| Feature | Platform | Dependency |
|---------|----------|------------|
| Screenshot (/screenshot) | macOS only | screencapture (built-in) |
| Recording (/record/start, /record/stop) | macOS only | screencapture -v (built-in) |
| Download (/download) | Cross-platform | curl (must be on PATH) |

Architecture

The plugin is composed of three parts:

1. Rust HTTP server: A raw TCP server built with tokio::net::TcpListener that manually parses HTTP/1.1 requests. It runs as a spawned async task during plugin setup (debug builds only).
2. JS shim (guest-js/index.js): Embedded at compile time via include_str!("../guest-js/index.js"). It's injected into the webview on every navigation using Tauri's on_navigation hook, with a 200ms delay to let the new page settle. The shim exposes window.__DEVTOOLS__ with all DOM interaction methods.
3. Eval bridge: The communication channel between Rust and JS:
   - Rust generates a unique 16-hex-char request ID
   - Rust calls webview.eval() with wrapped JS that executes the expression
   - The JS result is sent back via window.__TAURI_INTERNALS__.invoke('plugin:agent-control|agent_control_respond', {requestId, data})
   - Rust receives the result through a tokio::sync::oneshot channel
   - A configurable timeout (default 5s) cleans up pending requests

┌──────────┐   HTTP    ┌────────────┐  eval()   ┌───────────┐
│  Client  │ ───────── │ Rust TCP   │ ────────── │  Webview  │
│ (curl)   │           │ Server     │            │  (JS)     │
└──────────┘           └────────────┘  invoke()  └───────────┘
                             ▲ ──────────────────────┘
                             │  agent_control_respond(id, data)
                             │  via oneshot channel

License

MIT
