web-task-api

v0.10.0

Published

21 days ago

Browser-task runtime for MCP and HTTP automation with runs, sessions, and recipes.

Downloads

1,417


richjojo

mcp browser-automation playwright web-tasks structured-output claude-code opencode

Web Task API

web-task-api is a generalized browser-task runtime for projects that want to treat websites like programmable APIs.

For MCP discovery and host UI, the human-facing title should be Web Task. The package and registry IDs stay web-task-api for compatibility because the same package ships both the HTTP API and the MCP server.[^2]

It exposes one API for:

starting from a URL + goal
letting an agent drive a real browser
validating structured output against a schema
reusing persistent browser profiles and promoted recipes
storing traces and artifacts for replay/debugging

The same runtime is exposed through two surfaces:

HTTP API for application-to-application integration
MCP server for Claude Code, OpenCode, and other MCP clients[^1]

Environment

Node runtime: see .nvmrc
npm toolchain: see package.json#packageManager
TypeScript 6.0.x
tracked repo-local git hook entrypoints in .git-hooks/

This release intentionally treats the Node 24 line as the minimum supported, tested version.

Standard commands:

```
./scripts/bootstrap.sh
./scripts/check.sh
./scripts/fix.sh
./scripts/test.sh
```

./scripts/bootstrap.sh installs dependencies and Playwright.

Why this exists

Instead of building one brittle adapter per website, this project uses an agent-first browser runtime:

default path: goal-driven browser control
optimization path: reusable recipes for common flows
escape hatch: login profiles and artifacts for debugging failures

That gives a more future-proof foundation for “API for any site” style automation.

Implemented MVP

Fastify HTTP API
stdio MCP server
Playwright browser runtime
PI-engine-backed planner path for freeform browser control in the local stack
Legacy CLIProxy and OpenCode planner adapters kept for fallback compatibility
Auto planner mode that prefers PI first, then falls back to legacy adapters when the PI engine is unavailable
Mock agent for local deterministic demos/tests
Recipe registry and matching
Persistent browser profile reuse
Run artifacts and step traces
Local demo and end-to-end tests

Initial workflow targets

generic search/form workflows
Dexscreener token/pair reading starter recipe
GMGN token/wallet reading starter recipe
competitive-programming scouting recipes for solved.ac, BOJ, Codeforces, AtCoder, QOJ, and Jungol

For Dexscreener/GMGN, treat the shipped recipes as starter recipes, not guaranteed turnkey integrations yet. A warmed persistent browser source is often required because fresh headless sessions can hit Cloudflare or similar anti-bot checks. The runtime now fails fast for these protected-site recipes unless you provide one of:

request.profile
BROWSER_USER_DATA_DIR
sessionId for a warmed session that already preserves browser storage across tasks
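
The fail-fast rule above can be pictured as a small pre-flight check. This is only a sketch: `findContinuitySource` and its types are hypothetical names, the precedence order is an assumption, and the real runtime may decide differently. A non-null result only means a source was supplied, not that it has actually been warmed.

```typescript
// Hypothetical request shape: only `profile` and `sessionId` come from the README.
interface ContinuityRequest {
  profile?: string;
  sessionId?: string;
}

type ContinuityEnv = Record<string, string | undefined>;

type ContinuitySource = "profile" | "BROWSER_USER_DATA_DIR" | "sessionId";

/**
 * Returns the first continuity source that is present, or null when a
 * protected-site recipe would have nothing to reuse.
 * The order of the checks is an assumption made for illustration.
 */
function findContinuitySource(
  request: ContinuityRequest,
  env: ContinuityEnv,
): ContinuitySource | null {
  if (request.profile) return "profile";
  if (env.BROWSER_USER_DATA_DIR) return "BROWSER_USER_DATA_DIR";
  if (request.sessionId) return "sessionId";
  return null;
}
```

A caller could run this before submitting a Dexscreener or GMGN task and stop early when it returns null, instead of waiting for the runtime to reject the request.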

Quick start

Install dependencies:
```
./scripts/bootstrap.sh
```
Start the HTTP API:
```
npm run dev
```
Or start the MCP server:
```
npm run dev:mcp
```
Run the demo flow:
```
npm run demo
```

MCP

The package now exposes the MCP server binary directly:

```
npx -y web-task-api
```

That launches the stdio MCP server.

The HTTP runtime remains available separately:

```
npx -y -p web-task-api web-task-api-http
```

MCP tools

webtask_run — start a new browser task from a goal and optional URL
webtask_get_task — inspect one persisted task run
webtask_list_tasks — discover recent persisted task runs before fetching one in detail
webtask_list_models — inspect the catalog-backed planner models and supported variants
webtask_list_recipes — list reusable starter recipes before a run
webtask_create_session — create continuity for related tasks
webtask_list_sessions — list saved continuity sessions
webtask_get_session — inspect one saved session and its recent history
webtask_update_session — update session metadata like name, notes, or defaults
webtask_health — inspect runtime readiness, planner availability, and inventory counts

Tool-selection guidance follows MCP best practice: use human-readable titles/descriptions, make the “when should I use this tool?” boundary explicit, and publish accurate behavior hints instead of vague marketing copy.[^3]

MCP config examples

Claude Code: examples/claude.mcp.json
OpenCode: examples/opencode.json
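
For orientation, a Claude Code project config for a stdio server generally has the shape built below. This is a sketch, not a copy of examples/claude.mcp.json, which may differ: the server key `web-task` and the helper name `buildClaudeMcpConfig` are hypothetical, and only the `npx -y web-task-api` command comes from this README.

```typescript
// Builds an object in the `.mcp.json` shape Claude Code reads for stdio servers.
// "web-task" is a hypothetical server key chosen to match the human-facing title.
function buildClaudeMcpConfig(env: Record<string, string> = {}) {
  const server: { command: string; args: string[]; env?: Record<string, string> } = {
    command: "npx",
    args: ["-y", "web-task-api"],
  };
  // Only emit an `env` block when the caller passes overrides,
  // for example WEB_TASK_API_DATA_DIR.
  if (Object.keys(env).length > 0) {
    server.env = env;
  }
  return { mcpServers: { "web-task": server } };
}

// JSON text that could be written to a project-level .mcp.json file.
const claudeMcpJson = JSON.stringify(
  buildClaudeMcpConfig({ WEB_TASK_API_DATA_DIR: "/srv/web-task-data" }),
  null,
  2,
);
```

Prefer the bundled example files when they are available; this sketch only shows where the launch command and optional environment overrides would go.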

API

`POST /v1/tasks/run`

Runs a browser task synchronously and returns structured results.

Example request is in examples/demo-task.json.
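
Without the bundled client, the same call can be made with plain `fetch`. The sketch below assumes the HTTP body uses the same field names as the TypeScript client's `runTask` call (`goal`, `startUrl`, `sessionId`, `profile`, `agent`) and that the endpoint accepts JSON; `buildRunTaskRequest` and `RunTaskBody` are hypothetical names, and examples/demo-task.json remains the authoritative request shape.

```typescript
// Assumed body shape, mirrored from the client example in this README.
interface RunTaskBody {
  goal: string;
  startUrl?: string;
  sessionId?: string;
  profile?: string;
  agent?: {
    kind: "auto" | "pi" | "cliproxy" | "opencode";
    model?: string;
    variant?: string;
  };
}

interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

/** Prepares the URL and fetch options for POST /v1/tasks/run without sending anything. */
function buildRunTaskRequest(baseUrl: string, body: RunTaskBody): PreparedRequest {
  const root = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${root}/v1/tasks/run`,
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Usage against a running server (not executed here):
//   const { url, init } = buildRunTaskRequest("http://127.0.0.1:4317", {
//     goal: "Extract token name and price",
//     startUrl: "https://example.com",
//     agent: { kind: "auto" },
//   });
//   const result = await (await fetch(url, init)).json();
```

Because the call is synchronous, a long browser task keeps the HTTP request open until it finishes, so callers may want a generous client-side timeout.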

`GET /v1/tasks/:taskId`

Returns the persisted run record with step trace and artifact paths.

`GET /v1/tasks`

Lists recent persisted task summaries with optional status, sessionId, and limit filters.
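
A small helper can make the optional filters explicit. This sketch assumes the filters are passed as query parameters named exactly `status`, `sessionId`, and `limit`; `buildListTasksUrl` is a hypothetical name, and the accepted `status` values are not documented here, so the type is left as a plain string.

```typescript
interface ListTasksFilter {
  status?: string;
  sessionId?: string;
  limit?: number;
}

/** Builds the GET /v1/tasks URL, adding only the filters that were provided. */
function buildListTasksUrl(baseUrl: string, filter: ListTasksFilter = {}): string {
  const params = new URLSearchParams();
  if (filter.status !== undefined) params.set("status", filter.status);
  if (filter.sessionId !== undefined) params.set("sessionId", filter.sessionId);
  if (filter.limit !== undefined) params.set("limit", String(filter.limit));

  const root = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  const query = params.toString();
  return query ? `${root}/v1/tasks?${query}` : `${root}/v1/tasks`;
}

// Five most recent tasks for one session:
const recentTasksUrl = buildListTasksUrl("http://127.0.0.1:4317", {
  sessionId: "abc",
  limit: 5,
});
```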

`GET /v1/recipes`

Lists registered recipes.

`GET /v1/models`

Lists the catalog-backed planner models and supported thinking variants. The runtime prefers the shared work-atlas catalog when one is present on the host and falls back to the bundled snapshot otherwise.

`POST /v1/sessions`

Creates a reusable session for connected tasks. Sessions can carry:

guest vs profile mode
default start URL
default planner config
notes
compact task history

`GET /v1/sessions`

Lists saved sessions.

`GET /v1/sessions/:sessionId`

Returns session metadata and recent task history.

`PATCH /v1/sessions/:sessionId`

Updates session metadata like notes, default start URL, or the bound profile for an existing profile-mode session. Guest sessions cannot be rebound into named profiles by patch.
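
The rebinding rule can be mirrored client-side before sending a patch. This is a sketch under assumptions: the field names `mode`, `profile`, `name`, and `notes` are taken from the client and MCP descriptions in this README, `checkSessionPatch` is a hypothetical helper, and the server remains the source of truth for what it rejects.

```typescript
interface SessionSummary {
  id: string;
  mode: "guest" | "profile";
  profile?: string;
}

// Assumed subset of patchable fields.
interface SessionPatch {
  name?: string;
  notes?: string;
  profile?: string;
}

/**
 * Returns a human-readable problem when the patch would try to bind a guest
 * session to a named profile, or null when the patch looks acceptable.
 */
function checkSessionPatch(session: SessionSummary, patch: SessionPatch): string | null {
  if (patch.profile !== undefined && session.mode === "guest") {
    return "guest sessions cannot be rebound to a named profile; create a profile-mode session instead";
  }
  return null;
}
```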

`GET /health`

Returns a readiness snapshot with planner availability, storage roots, defaults, and recipe/session/task counts.

TypeScript client

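Applications can use the bundled TypeScript client:
```
import { WebTaskApiClient } from "web-task-api";

// Point the client at a running HTTP API (started with `npm run dev`).
const client = new WebTaskApiClient({ baseUrl: "http://127.0.0.1:4317" });

// Create a profile-mode session so later tasks share one browser identity.
const session = await client.createSession({
  name: "axiom trader",
  mode: "profile",
  profile: "axiom",
  notes: "Authenticated Axiom trading session",
});

// Run a task inside that session and let the runtime pick the planner.
const result = await client.runTask({
  goal: "Extract token name and price",
  startUrl: "https://example.com",
  sessionId: session.id,
  agent: { kind: "auto" },
});

// List the five most recent tasks for the session, then check readiness.
const recent = await client.listTasks({ sessionId: session.id, limit: 5 });
const health = await client.health();
```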

Connected tasks with sessions

Sessions let related web tasks share:

browser/profile identity
guest-session cookies and local storage across tasks
default start URL
planner defaults
recent task context

Example pattern:

1. Create a session for the axiom profile.
2. Run a login/manual warmup task once.
3. Run later research/action tasks with the same sessionId.
4. Inspect the session history to see what the agent already found.

Guest sessions also work: create a mode: "guest" session and repeated tasks will preserve browser storage between runs under that session ID.

For protected recipes, that guest session still needs to be warmed first before you rely on it as a continuity source.
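
The session pattern above can be sketched over the HTTP endpoints. Everything named here is illustrative: `runConnectedTasks`, `fetchTransport`, and `JsonRequest` are hypothetical, the request bodies assume the HTTP API takes the same fields as the TypeScript client, and the created session is assumed to come back with an `id` field as in the client example.

```typescript
// Minimal JSON transport so the flow can run against fetch or a test stub.
type JsonRequest = (method: "GET" | "POST", path: string, body?: unknown) => Promise<any>;

/** Creates one profile-mode session, runs each goal in order, then reads the session back. */
async function runConnectedTasks(
  request: JsonRequest,
  profile: string,
  goals: string[],
): Promise<unknown> {
  const session = await request("POST", "/v1/sessions", {
    name: `${profile} session`,
    mode: "profile",
    profile,
  });
  // Tasks run sequentially so each one sees the state left by the previous one.
  for (const goal of goals) {
    await request("POST", "/v1/tasks/run", {
      goal,
      sessionId: session.id,
      agent: { kind: "auto" },
    });
  }
  // Session metadata plus recent task history.
  return request("GET", `/v1/sessions/${encodeURIComponent(session.id)}`);
}

/** One possible transport, backed by the global fetch available in current Node releases. */
function fetchTransport(baseUrl: string): JsonRequest {
  return async (method, path, body) => {
    const response = await fetch(`${baseUrl}${path}`, {
      method,
      headers: body === undefined ? undefined : { "content-type": "application/json" },
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    if (!response.ok) {
      throw new Error(`${method} ${path} failed with HTTP ${response.status}`);
    }
    return response.json();
  };
}
```

With a server running, `runConnectedTasks(fetchTransport("http://127.0.0.1:4317"), "axiom", ["Log in and warm up", "Extract token name and price"])` would follow the four steps in order. Injecting the transport keeps the flow testable without a browser.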

Browser profiles

To create a reusable login profile:

```
npm run profile:login -- --id my-profile --url https://example.com/login
```

This opens a real persistent browser profile. Log in manually or solve bot challenges, then press Enter in the terminal. The runtime saves a reusable Chromium user-data directory at <data-root>/profiles/<id>/user-data-dir and later tasks can use "profile": "my-profile".

This matters for sites like Dexscreener or GMGN that may block fresh headless sessions behind Cloudflare or similar anti-bot checks.

If you want the runtime to behave as closely as possible to your normal local Chrome, you can also point it at an existing browser profile:

```
BROWSER_USER_DATA_DIR=/path/to/your/chrome/profile
```

That is the closest match to “it works in my Chrome already”.

Runtime storage roots

By default the runtime keeps mutable data out of the ambient working directory:

Linux: ~/.local/share/web-task-api
macOS: ~/Library/Application Support/web-task-api
Windows: %LOCALAPPDATA%\web-task-api

Under that data root the runtime writes:

profiles/<id>/user-data-dir
runs/<taskId>/...
sessions/<sessionId>.json

Bundled starter recipes are read from the installed package's recipes/ directory, not from the current shell cwd.

Useful overrides:

WEB_TASK_API_DATA_DIR — custom mutable data root
WEB_TASK_API_RECIPES_DIR — custom recipes directory
WEB_TASK_API_TEMP_DIR — custom temp root used when the incoming temp env points at your home/cwd
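
The storage layout above can be summarized as a pair of pure functions. This is a sketch rather than the runtime's actual resolution code: `resolveDataRoot` and `storagePaths` are hypothetical names, the Windows fallback to `<home>\AppData\Local` when `LOCALAPPDATA` is unset is an assumption, and any other platform handling is left out.

```typescript
type Platform = "linux" | "darwin" | "win32";
type StorageEnv = Record<string, string | undefined>;

/** Picks the mutable data root: the explicit override first, then the per-platform default. */
function resolveDataRoot(platform: Platform, home: string, env: StorageEnv): string {
  if (env.WEB_TASK_API_DATA_DIR) return env.WEB_TASK_API_DATA_DIR;
  switch (platform) {
    case "linux":
      return `${home}/.local/share/web-task-api`;
    case "darwin":
      return `${home}/Library/Application Support/web-task-api`;
    case "win32": {
      // Assumed fallback when LOCALAPPDATA is not set.
      const localAppData = env.LOCALAPPDATA ?? `${home}\\AppData\\Local`;
      return `${localAppData}\\web-task-api`;
    }
  }
}

/** Paths the runtime is documented to write under the data root. */
function storagePaths(root: string, separator: "/" | "\\" = "/") {
  const join = (...parts: string[]) => [root, ...parts].join(separator);
  return {
    profileUserDataDir: (id: string) => join("profiles", id, "user-data-dir"),
    runDir: (taskId: string) => join("runs", taskId),
    sessionFile: (sessionId: string) => join("sessions", `${sessionId}.json`),
  };
}
```

For example, `storagePaths(resolveDataRoot("linux", "/home/me", {})).profileUserDataDir("my-profile")` yields `/home/me/.local/share/web-task-api/profiles/my-profile/user-data-dir`, which matches the location described in the Browser profiles section.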

Planner backends

Recommended: PI engine

This is the default non-mock path for the local stack.

Useful environment variables:

WEB_TASK_API_MODEL — shared default planner model, default deepseek/deepseek-v4-flash
WEB_TASK_API_VARIANT — shared default planner thinking variant, default high
WEB_TASK_API_PI_URL — default http://127.0.0.1:8793
WEB_TASK_API_PI_MODEL — optional PI-specific model override
WEB_TASK_API_PI_VARIANT — optional PI-specific thinking variant override
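
One way to read these variables is as a two-level fallback: PI-specific overrides first, then the shared defaults. The sketch below encodes that reading; `resolvePiSettings` is a hypothetical helper, and treating an empty string as "set" is a simplification of this example, not documented behavior.

```typescript
type PlannerEnv = Record<string, string | undefined>;

interface PiPlannerSettings {
  url: string;
  model: string;
  variant: string;
}

/** Resolves PI planner settings from the environment using the documented defaults. */
function resolvePiSettings(env: PlannerEnv): PiPlannerSettings {
  // Shared defaults that apply to every planner backend.
  const sharedModel = env.WEB_TASK_API_MODEL ?? "deepseek/deepseek-v4-flash";
  const sharedVariant = env.WEB_TASK_API_VARIANT ?? "high";
  return {
    url: env.WEB_TASK_API_PI_URL ?? "http://127.0.0.1:8793",
    // PI-specific overrides win over the shared defaults.
    model: env.WEB_TASK_API_PI_MODEL ?? sharedModel,
    variant: env.WEB_TASK_API_PI_VARIANT ?? sharedVariant,
  };
}
```

Calling `resolvePiSettings(process.env)` in a Node process would show which model and endpoint a PI-backed run is expected to use before any task is started.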

Example:

```
{
  "agent": {
    "kind": "pi"
  }
}
```

Compatibility fallback: CLIProxyAPI

CLIProxyAPI is treated as a multi-provider router, not a single-provider API key wrapper. You can point this product at any model alias/provider path exposed by your CLIProxy setup.

Useful environment variables:

CLIPROXY_BASE_URL — default http://127.0.0.1:8317/v1
CLIPROXY_AUTH_TOKEN — optional client token if your CLIProxy instance requires one
CLIPROXY_MODEL — planner model alias/name exposed by your proxy, for example whatever provider/model mapping you configured there

Example:

```
{
  "agent": {
    "kind": "cliproxy"
  }
}
```

Easiest local path right now: `auto`

If you want the runtime to pick the best available planner automatically, use:

```
{
  "agent": {
    "kind": "auto"
  }
}
```

auto probes the local PI engine first. If that lane is unavailable, it falls back to CLIProxy when reachable and model-configured, then to OpenCode as the last compatibility path. Catalog-backed provider/model selections stay on the PI/OpenCode lane instead of being reinterpreted as CLIProxy aliases. This path is verified locally against the fixture flow; for real sites, treat it as the recommended runtime path, not a guarantee that every protected site will work without profile warmup.
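
The fallback order described above can be written as a small decision function. This is a sketch of the documented order only: `pickAutoPlanner` and the `PlannerProbe` fields are hypothetical names, how each lane is actually probed is not shown, and returning null when nothing is available is an assumption about the failure case.

```typescript
type PlannerKind = "pi" | "cliproxy" | "opencode";

// Hypothetical probe results gathered before a run.
interface PlannerProbe {
  piAvailable: boolean;
  cliproxyReachable: boolean;
  cliproxyModelConfigured: boolean;
  opencodeAvailable: boolean;
  /** True when the requested provider/model came from the shared catalog. */
  catalogBackedSelection: boolean;
}

/** Applies the documented order: PI first, then CLIProxy, then OpenCode. */
function pickAutoPlanner(probe: PlannerProbe): PlannerKind | null {
  if (probe.piAvailable) return "pi";
  // Catalog-backed selections stay on the PI/OpenCode lane and skip CLIProxy.
  const cliproxyUsable =
    probe.cliproxyReachable && probe.cliproxyModelConfigured && !probe.catalogBackedSelection;
  if (cliproxyUsable) return "cliproxy";
  if (probe.opencodeAvailable) return "opencode";
  return null; // assumed outcome when no planner lane is usable
}
```

Keeping the decision pure makes it straightforward to log why a run ended up on a fallback lane.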

Optional: OpenCode

If you already run OpenCode headless and want to reuse that stack, the project also supports an OpenCode planner adapter.

Useful variables:

OPENCODE_BASE_URL
OPENCODE_MODEL — optional OpenCode-specific model override, defaulting to WEB_TASK_API_MODEL

Then use:

```
{
  "agent": {
    "kind": "opencode"
  }
}
```

Per-call overrides stay available through the API, client, and MCP tools. Example:

```
{
  "agent": {
    "kind": "opencode",
    "model": "openai/gpt-5.4",
    "variant": "high"
  }
}
```

Main files

docs/design.md — architecture, decisions, and implementation plan
docs/releasing.md — tag-driven release flow and MCP registry packaging notes
src/ — server, runtime, agent, browser, and storage code
tests/ — end-to-end verification with a local fixture site
scripts/ — demo runner and profile bootstrap

References

[^1]: docs/design.md for the detailed system design, tradeoffs, and roadmap.

[^2]: server.json is the MCP registry metadata source of truth; package.json carries the npm package and executable metadata.

[^3]: Model Context Protocol, "Tools" specification and tool-annotation guidance — titles, descriptions, JSON Schema field descriptions, and accurate hints improve host UX and tool selection.
