agbrowse

v0.1.18

Published

5 days ago

Standalone Chrome/CDP browser automation and web-ai workflow skills for AI agents.

Standalone Chrome/CDP browser automation and web-ai CLI for AI agents. It turns browser work into small, inspectable terminal commands: observe the page, act by stable references, collect screenshots/console/network evidence, and run ChatGPT, Gemini, or Grok web UI sessions without paying an MCP token tax.

agbrowse is a serverless extraction of the cli-jaw / 30_browser browser workflow. It gives an agent a small CLI surface for:

DOM/ref based browser control
screenshots and coordinate clicks
console/network/DOM diagnostics
adaptive reading for one candidate URL via agbrowse fetch
structured web-ai prompt rendering
live ChatGPT, Gemini, and Grok web UI execution
file upload and context-package upload for implemented providers
ChatGPT code-mode zip generation and later artifact re-extraction

It does not require a long-running MCP server. Each command is a short-lived Node process that reconnects to the same Chrome DevTools Protocol endpoint.

What's New in 0.1.17

GPT-5.6 Chat contract: select Chat families with --family gpt-5.6-sol|gpt-5.5|gpt-5.4|gpt-5.3|o3 and use canonical --effort medium|high|xhigh values. The runtime understands the current flat Instant (5.5) / Medium / High / Extra High / Pro Intelligence picker.
ChatGPT Work surface v1: submit through the dedicated agbrowse web-ai work send --prompt "..." --power 1..6 command or MCP web_ai_work_send. Chat send/query/poll/watch and web_ai_submit_prompt fail closed on an active Work surface.
Long-run recovery: ChatGPT Pro polls receive a 5400-second default deadline, while Grok Heavy and Deep Research keep independent 3600-second tiers. Saved sessions retain their original deadline across shell exits.
Search and extraction: the modular search skill now separates discovery from original-page proof, and agbrowse extract maps tables or JSON-LD to a supplied schema with fail-closed validation.
Agent-first distribution and QA routing: skills install ships browser, web-ai, search, and vision-click; Playwright/browser-QA intent routes to agbrowse for ad-hoc inspection while preserving maintained project E2E suites.
GitHub Pages redesign: the docs landing page now presents browser control, web-AI, search, and evidence as full-screen product lanes with reduced-motion support.

Provider UI automation remains beta because provider DOM and account state can change. Schema-bound CLI extraction remains experimental; see the capability truth table for exact labels.

Public Surface

| Surface | Link | Status |
| --- | --- | --- |
| npm package | agbrowse | public package metadata |
| Repository | lidge-jun/agbrowse | public source |
| Docs landing page | https://lidge-jun.github.io/agbrowse/ | live GitHub Pages site |
| Developer docs | docs/dev/index.html / docs/dev/ko/index.html | English and Korean V1 docs |
| Architecture source | structure/INDEX.md | capability and release truth source |
| Production notes | docs/production-readiness.md | verification and risk checklist |

GitHub Pages publishes the repository /docs directory at https://lidge-jun.github.io/agbrowse/. Developer documentation is available in English and Korean under docs/dev/.

Quick Start

npm install -g agbrowse
agbrowse --help
agbrowse skills get core --full
agbrowse start
agbrowse navigate "https://chatgpt.com/"
agbrowse snapshot --interactive --max-nodes 120

For web-ai smoke tests after logging in to the provider:

agbrowse web-ai query \
  --vendor chatgpt \
  --url https://chatgpt.com/ \
  --model pro \
  --inline-only \
  --allow-copy-markdown-fallback \
  --prompt "Reply exactly AGBROWSE_OK"

For long Pro / Deep Think runs that should survive shell exit:

SID=$(agbrowse web-ai send --vendor chatgpt --model pro --inline-only \
        --prompt "..." --json | jq -r .sessionId)
agbrowse web-ai poll --vendor chatgpt --session "$SID"
agbrowse web-ai work send --prompt "Analyze this repository" --power 4

Agent rule: observe before acting. Use status, tabs, snapshot --interactive, and web-ai status before mutating a page. Set AGBROWSE_JSON_ERRORS=1 for parseable failure envelopes.

Standalone Search

# Full pipeline: query rewrite → fetch original pages → evidence score
agbrowse search "Next.js 15 app router migration" --json

# Verify a single URL's readability and content
agbrowse search --verify "https://nextjs.org/docs/app" --json

# Pipe your CLI's built-in search results for deep verification
echo '[{"url":"...","title":"..."}]' | agbrowse search "query" --stdin-results --json

# Escalate to web-ai when evidence is insufficient
agbrowse search "서울시 2026 청년 지원금" --deep --vendor grok --json

Research Subcommands

The research command family breaks a search problem into smaller, inspectable planning and enrichment steps. None of them execute a live web search on their own.

agbrowse research plan --query "Next.js 15 app router migration" --json
agbrowse research normalize-results --file results.json --backend exa --json
agbrowse research enrich-fetch --plan plan.json --results results.json --json
agbrowse research browse-plan --plan plan.json --enrichment enrichment.json --json

plan decomposes a query into constraints, source hints, and focused sub-queries. normalize-results converts provider-specific rows (Exa, Tavily, Perplexity, Brave, browser SERP) into a search-results-v1 candidate list. enrich-fetch reads those candidates through the adaptive fetch ladder and produces a research-fetch-enrichment-v1 evidence ledger. browse-plan converts remaining weak candidates into a reasoned browser command plan without mutating browser state.

Structured Extraction

agbrowse extract pulls structured data from a URL or local HTML file using a JSON schema, without calling an LLM. When no structure matches the schema, it returns a fail-closed no_mappable_structure verdict instead of silent partial data. Tier 2 web-ai escalation is available as an explicit opt-in.

# Extract table data matching a schema (Tier 1, LLM-free)
agbrowse extract "https://example.com/products" --schema products.json --json

# Extract from a local HTML file
agbrowse extract --from-file page.html --schema products.json --json

# Tier 2: escalate to web-ai on Tier 1 failure
agbrowse extract "https://example.com/products" --schema products.json --escalate-web-ai --vendor grok

What It Is Good For

Browser automation for agents: navigate, snapshot, click refs, type, capture screenshots, inspect console/network, and keep the active CDP target stable across commands.
Web-AI execution: submit and poll ChatGPT, Gemini, and Grok sessions with provider-specific model selection and fail-closed capability checks.
ChatGPT code artifacts: ask ChatGPT to build a small project, package it as /mnt/data/*.zip, retrieve the zip headlessly, and re-extract it later from the same saved conversation.
Evidence-heavy research: require source audits, save answer artifacts, and preserve traces without relying on hidden browser state.
Standalone deep search: agbrowse search — any CLI agent can pipe its web search results for original-page verification and evidence scoring, or escalate to web-ai for deep synthesis.
Standalone skill distribution: install bundled browser, web-ai, search, and vision-click skills into cli-jaw or Codex skill roots.

Architecture Snapshot

agent shell
  -> agbrowse bin
  -> browser skill runtime
  -> Chrome DevTools Protocol
  -> target tab / provider web UI

web-ai command
  -> provider adapter
  -> tab/session guard
  -> prompt renderer
  -> poller + artifact writer
  -> source/trace/policy gates

The runtime keeps browser state under BROWSER_AGENT_HOME and stores durable web-ai sessions separately from the shell process, so a later poll can resume the same provider tab by session id.

Verification Policy

Use the smallest gate that matches the changed surface:

npm run typecheck
npm run test:release-gates
npm run smoke:bins
npm run test:mcp
npm run test:source-audit
npm run test:trace-policy
npm run gate:all

Current remote CI signal: the scheduled Contract Drift Check workflow is passing on main. Release publishing is dispatched through release.yml.

Safety Model

Provider DOMs are treated as untrusted and can drift.
Unsupported vendors, unsupported model aliases, missing composers, and unsafe context-package paths fail closed.
~/.browser-agent contains browser profile/session state and must not be committed or shared.
CAPTCHA bypass, stealth, credential stuffing, and guaranteed provider account entitlement checks are out of scope.
Adaptive fetch uses DNS pinning (curl --resolve) and per-hop validated redirects to mitigate SSRF and DNS rebinding; see the Adaptive URL Fetch section for details.
The package metadata says MIT; this repository currently has no standalone root LICENSE file, so downstream users should rely on package metadata until one is added.

Status

This repository is packaged as a standalone skill/runtime.

Source structure as of 2026-07-11:

| Path | Files | Lines | Role |
| --- | ---: | ---: | --- |
| skills/browser/ | 55 | 16 320 | Chrome lifecycle, CDP, refs, tabs, adaptive fetch v2, search, extract, Runway |
| skills/search/ | 5 | 896 | proof-first search skill hub and modular references |
| web-ai/ | 113 | 27 441 | provider automation, sessions, MCP, eval, policy, trace |
| test/unit/ | 141 | 17 628 | deterministic module tests |
| test/integration/ | 21 | 3 165 | CLI, MCP, policy, provider fixture tests |
| scripts/ | 10 | 1 621 | release gates, eval runner, strict-baseline checks |
| docs/ | 41 | 3 540 | adoption, trace, production-readiness, developer docs |

The source of truth for architecture and release claims lives in structure/INDEX.md, and the Phase 11+ truth table lives in structure/phase_status.md. Update the structure/ folder when CLI, web-ai, MCP, eval, or release-gate behavior changes.

Ready surfaces:

agbrowse CLI bin
persistent Chrome profile under BROWSER_AGENT_HOME
stable default CDP port 9222
explicit --port / CDP_PORT override
active tab persistence via CDP target id
browser primitive tests
web-ai contract tests
source-audit and answer-artifact gates for research workflows
narrow MCP bridge surface: web_ai_*, browser_snapshot, and browser_click_ref with strict input schemas
offline DOM churn eval fixtures
trace and safety-policy schemas
benchmark trajectory schema and offline bundle writer
standalone search (agbrowse search) with query rewrite, adaptive fetch, and evidence scoring
research subcommands (plan, normalize-results, enrich-fetch, browse-plan) for decomposed search planning

Beta surfaces:

ChatGPT, Gemini, and Grok live web-ai send/poll/query flows
provider model and reasoning-effort selection
provider source/citation quality checks
ChatGPT code mode (web-ai code) and later artifact extraction (web-ai code-extract)

Experimental or deferred surfaces:

adaptive URL fetch (agbrowse fetch <url>) as a URL reader, not search
adaptive fetch 203.x modules: TLS impersonation, yt-dlp media reader, Camoufox stealth lane, feed parser, BM25 reranker, structured extractor, lane-classified candidate discovery
web-ai capability registry, interstitial detector, freshness gate, diagnostics stage taxonomy, and provider lifecycle adapter
hosted/cloud browser operation
remote external-cdp provider mode
broader MCP production bridge beyond the listed tools
leaderboard or competitor benchmark score claims

What remains intentionally out of scope for the standalone runtime:

cli-jaw server APIs
root cli-jaw watcher/notification dashboards
guaranteed provider account access
captcha or Cloudflare bypass
billing/subscription entitlement checks

Provider UIs change frequently. Live web-ai flows are smoke-tested behavior, not a contractual API from the providers.

Install CLI

From npm:

npm install -g agbrowse

Older global installs may print a short stderr-only update notice when npm has a newer agbrowse version. Agents should tell the user before changing the global CLI install:

npm install -g agbrowse@latest

Set AGBROWSE_UPDATE_CHECK=0 to hide the notice. The check is skipped for JSON output, MCP stdio, CI, and help commands.

From this repository:

git clone https://github.com/lidge-jun/agbrowse.git
cd agbrowse
npm install
npm link

Direct local usage without linking:

node skills/browser/browser.mjs status
node skills/browser/browser.mjs fetch https://example.com --json --trace
node skills/browser/browser.mjs web-ai render --vendor chatgpt --prompt "hello"

Adaptive URL Fetch (v2)

agbrowse fetch <url> reads one candidate URL through a 6-phase adaptive escalation ladder and returns evidence. It is useful after a search tool or user has produced a URL.

agbrowse fetch "https://example.com/article"
agbrowse fetch "https://example.com/article" --json --trace
agbrowse fetch "https://example.com/article" --browser never
agbrowse fetch "https://example.com/article" --no-browser
agbrowse fetch "https://example.com/article" --browser required
agbrowse fetch "https://example.com/article" --allow-third-party-reader
agbrowse fetch "https://example.com/article" --browser-session user
agbrowse fetch "https://example.com/article" --browser-session interactive
agbrowse fetch "https://example.com/article" --identity chrome

Escalation ladder (code execution order): public endpoints + direct fetch with identity headers → third-party readers (opt-in) → isolated Chrome render + network API discovery → user session (opt-in) → human-in-the-loop (interactive). Content scoring runs after each phase to decide whether to escalate.
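The escalate-then-score loop described above can be pictured with a small sketch. It is illustrative only: the rung names, the `score_content` heuristic, and the 0.5 threshold are assumptions made for the example, not agbrowse internals.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rung:
    name: str
    fetch: Callable[[str], Optional[str]]  # returns page text, or None on failure
    opt_in: bool = False  # rung runs only when the caller enabled it


def score_content(text: Optional[str]) -> float:
    """Toy score: fraction of a 200-word target reached (hypothetical heuristic)."""
    if not text:
        return 0.0
    return min(len(text.split()) / 200.0, 1.0)


def run_ladder(url, rungs, enabled=frozenset(), threshold=0.5):
    """Try rungs in order; stop at the first whose content clears the threshold."""
    trace = []
    for rung in rungs:
        if rung.opt_in and rung.name not in enabled:
            trace.append((rung.name, "skipped"))
            continue
        text = rung.fetch(url)
        score = score_content(text)
        trace.append((rung.name, round(score, 2)))
        if score >= threshold:
            return {"ok": True, "rung": rung.name, "content": text, "trace": trace}
    return {"ok": False, "rung": None, "content": None, "trace": trace}
```

Opt-in rungs are recorded as skipped rather than silently dropped, so a trace still shows which options were available but not enabled.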

Fetch Modules (203.x)

The v2 ladder includes specialized modules that run at specific escalation rungs:

| Module | Purpose |
| --- | --- |
| 203.1 TLS impersonation | JA3 fingerprint spoofing via curl-impersonate on 403/429/challenge, inserted before browser escalation |
| 203.2 yt-dlp media reader | Extracts metadata and transcripts from video URLs via yt-dlp |
| 203.3 Camoufox browser lane | Optional Firefox-based browser session via Camoufox for alternate rendering |
| 203.4 Feed parser | RSS, Atom, and JSON Feed detection and parsing into structured evidence |
| 203.5 BM25 lexical reranker | Content-relevance scoring using BM25 term weighting |
| 203.6 Structured extractor | Table and heading extraction from HTML into structured records |
| 203.7 Candidate discovery | Lane-classified candidate URL discovery from page content |

WAF Profile Detection

Before escalation decisions, the fetch ladder fingerprints known WAF/bot-management systems from HTTP response headers and body markers: Cloudflare (managed challenge + Turnstile), Akamai Bot Manager, AWS WAF, Imperva/Incapsula, DataDome, and PerimeterX. Detection results feed into the escalation decision (e.g. TLS impersonation or browser escalation).

SSRF Mitigation and Redirect Safety

All HTTP fetches pin resolved DNS addresses via curl --resolve to close the TOCTOU window between DNS resolution and connection (R4-SSRF). The legacy curl -L redirect behavior is replaced by a per-hop validated redirect loop that re-validates each hop's target against the SSRF allowlist before following it (R4).
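A minimal sketch of the pin-then-revalidate idea, assuming a simple "globally routable addresses only" policy. The real allowlist rules are not documented here, and `resolve` and `fetch_once` are hypothetical injected stand-ins for DNS resolution and a single non-following HTTP request.

```python
import ipaddress
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def is_public_ip(addr: str) -> bool:
    """True only for globally routable addresses (assumed policy)."""
    return ipaddress.ip_address(addr).is_global


def pinned_resolve_arg(url: str, addr: str) -> str:
    """Build the host:port:address value for curl --resolve after validating addr."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url}")
    if not is_public_ip(addr):
        raise ValueError(f"blocked non-public address: {addr}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    host_addr = f"[{addr}]" if ":" in addr else addr  # bracket IPv6 literals
    return f"{parts.hostname}:{port}:{host_addr}"


def follow_redirects(url, resolve, fetch_once, max_hops=5):
    """Re-validate every hop's resolved address before requesting it."""
    for _ in range(max_hops + 1):
        pin = pinned_resolve_arg(url, resolve(urlsplit(url).hostname))
        status, location = fetch_once(url, pin)
        if status not in REDIRECT_CODES or not location:
            return url, status
        url = urljoin(url, location)  # relative Location headers are allowed
    raise ValueError("too many redirects")
```

Because the address that was validated is the same one handed to the connection, a second DNS answer cannot swap in an internal address between the check and the request.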

In --json mode the selected content field is bounded before serialization so stdout remains parseable even when a public endpoint returns a large JSON document. Results include contentBytes, contentLimitBytes, and contentTruncated; truncation means only the CLI output was compacted, not that the source was rejected. --max-bytes remains the per-attempt read limit.
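The bounding step can be sketched as follows. The three field names come from the text above; treating contentBytes as the pre-truncation size and cutting on a UTF-8 byte boundary are assumptions made for illustration.

```python
import json


def bound_content(content: str, limit_bytes: int) -> dict:
    """Cap the serialized content field at limit_bytes of UTF-8."""
    raw = content.encode("utf-8")
    truncated = len(raw) > limit_bytes
    if truncated:
        # errors="ignore" drops a multi-byte character cut in half at the boundary.
        content = raw[:limit_bytes].decode("utf-8", errors="ignore")
    return {
        "content": content,
        "contentBytes": len(raw),  # assumed: size before truncation
        "contentLimitBytes": limit_bytes,
        "contentTruncated": truncated,
    }


if __name__ == "__main__":
    print(json.dumps(bound_content("héllo wörld", 4)))
```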

Automated CAPTCHA solving, credential stuffing, and stealth are forbidden. Human assistance (browser-grade headers, user session, human resolves) is allowed with explicit opt-in flags (--browser-session user|interactive). Built-in public endpoint candidates include GitHub, Reddit, Hacker News, Wikipedia, npm, PyPI, arXiv, Bluesky, Mastodon-compatible statuses, Stack Exchange, dev.to, DOI/CrossRef, OpenLibrary, Wayback CDX, YouTube oEmbed, X/Twitter oEmbed, HN Algolia, V2EX, Lobsters, and generic oEmbed discovery.

Requirements

Node.js 18+
Google Chrome, Chromium, or Brave
playwright-core
Codex CLI only if you use vision-click

On macOS and desktop Linux, headed Chrome is recommended for web-ai provider sites because provider anti-bot checks often reject headless sessions.

Browser Lifecycle

Default runtime state:

| Setting | Default |
| --- | --- |
| data dir | ~/.browser-agent |
| profile dir | ~/.browser-agent/browser-profile |
| CDP port | 9222 |
| screenshot dir | ~/.browser-agent/screenshots |
| state file | ~/.browser-agent/browser-state.json |

The default port does not fluctuate. It stays 9222 unless you pass --port or set CDP_PORT.

agbrowse start
agbrowse status
agbrowse stop

Use a custom home and port when running multiple isolated instances:

BROWSER_AGENT_HOME="$HOME/.browser-agent-work" CDP_PORT=9333 agbrowse start
BROWSER_AGENT_HOME="$HOME/.browser-agent-work" CDP_PORT=9333 agbrowse web-ai status --vendor chatgpt

If Chrome is already listening on the selected CDP port and responds to /json/version, agbrowse reuses it and emits a stderr warning when the running CDP endpoint appears to differ from agbrowse's persisted browser state (no prior state, port mismatch, or startedAt more than an hour old). If another non-CDP process owns the port, startup fails instead of silently choosing a different port.
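The three warning conditions can be expressed as a small check. This is a sketch under assumptions: the state file is modeled as a dict with a `port` key and an ISO-8601 `startedAt` string, which may not match the real browser-state.json layout.

```python
from datetime import datetime, timedelta, timezone


def drift_reasons(state, port, now, max_age=timedelta(hours=1)):
    """Return the reasons a reused CDP endpoint may not match persisted state."""
    if not state:
        return ["no prior state"]
    reasons = []
    if state.get("port") != port:
        reasons.append("port mismatch")
    started_raw = state.get("startedAt")
    # A missing timestamp is treated as stale (assumption for this sketch).
    if started_raw is None or now - datetime.fromisoformat(started_raw) > max_age:
        reasons.append("startedAt more than an hour old")
    return reasons


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(drift_reasons(None, 9222, now))
```

An empty list means the running endpoint looks consistent with the persisted state, so no warning is needed.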

First Login

Provider web-ai flows need a logged-in browser profile. Do this once:

agbrowse start
agbrowse navigate "https://chatgpt.com/"
agbrowse navigate "https://gemini.google.com/app"
agbrowse navigate "https://grok.com/"

Complete login manually in the headed Chrome window. The profile is reused for later commands.

Do not commit or share ~/.browser-agent; it contains browser session state.

Install Bundled Skills

npm install -g agbrowse installs the agbrowse and agbrowse-vision-click commands immediately. It does not automatically mutate any agent runtime. To register the bundled skills, choose the target skill root explicitly:

agbrowse skills install --target ~/.cli-jaw-3460/skills

For Codex:

agbrowse skills install --target ~/.codex/skills

The default mode copies the bundled browser, web-ai, search, and vision-click skill directories. Use --json when another agent will parse the result:

agbrowse skills install --target ~/.cli-jaw-3460/skills --json

Use --link if you want the target skill directories to track the globally installed npm package:

agbrowse skills install --target ~/.cli-jaw-3460/skills --link

Existing target skills are preserved by default. Replace them explicitly with:

agbrowse skills install --target ~/.cli-jaw-3460/skills --force

Core Browser Commands

agbrowse start [--port 9222] [--headless] [--chrome-path /path/to/chrome]
agbrowse stop
agbrowse status
agbrowse reset --force

Observe:

agbrowse snapshot --interactive --max-nodes 80
agbrowse screenshot --json
agbrowse screenshot --full-page
agbrowse text
agbrowse text --format html
agbrowse get-dom --selector "main" --max-chars 4000
agbrowse console --clear --reload --duration 3000
agbrowse network --reload --duration 2000 --filter api

Act:

agbrowse click e3
agbrowse type e5 "hello" --submit
agbrowse press Enter
agbrowse hover e7
agbrowse mouse-click 400 300
agbrowse resize 1440 900
agbrowse evaluate "document.title"

Tab Management (Phase 9.1)

Multi-tab support isolates each web-ai session in its own browser tab.

agbrowse tabs                          # list all tabs
agbrowse tab-switch 2                  # switch by index
agbrowse tab-switch <targetId>         # switch by CDP target id
agbrowse new-tab <url>                 # create a new tab
agbrowse tab-close <targetId>          # close a tab
agbrowse tab-cleanup                   # close idle tabs and enforce max-tabs
agbrowse tab-cleanup --include-untracked --idle-after 10m

Web-ai tab behavior:

# Default: new tab per send/query (Phase 9.1)
agbrowse web-ai send --vendor chatgpt --inline-only --prompt "hello"

# Legacy: reuse the existing active tab
agbrowse web-ai send --vendor chatgpt --reuse-tab --inline-only --prompt "hello"
export AGBROWSE_REUSE_TAB=1            # global legacy mode

Session-to-tab binding is strong: poll and stop with --session resolve the session's bound tab, not the globally active tab. If the tab was closed, the runtime auto-recovers by creating a new tab and navigating to the saved conversationUrl.

Tab limits:

| Setting | Default | Env var |
| --- | --- | --- |
| Max tabs | 20 | AGBROWSE_MAX_TABS |
| Idle timeout | 30 min | AGBROWSE_TAB_IDLE |

send and query run tab cleanup before opening another tab. Cleanup never closes tabs pinned in the current process or tabs bound to active web-ai sessions. Use agbrowse tabs --json to inspect lastActiveAt, idleForMs, and pinned state before manual cleanup.
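One way to reason about the cleanup rules is the selection sketch below. The order of operations (idle tabs first, then the longest-idle extras until the count fits the limit) is an assumption, and `boundToActiveSession` is a hypothetical field name; `targetId`, `idleForMs`, and `pinned` follow the names used above.

```python
def tabs_to_close(tabs, max_tabs=20, idle_after_ms=30 * 60 * 1000):
    """Pick target ids to close without touching pinned or session-bound tabs."""
    closable = [
        t for t in tabs
        if not t.get("pinned") and not t.get("boundToActiveSession")
    ]
    # Longest-idle tabs are considered first.
    closable.sort(key=lambda t: t["idleForMs"], reverse=True)
    to_close = [t["targetId"] for t in closable if t["idleForMs"] >= idle_after_ms]
    remaining = len(tabs) - len(to_close)
    for t in closable:
        if remaining <= max_tabs:
            break
        if t["targetId"] not in to_close:
            to_close.append(t["targetId"])
            remaining -= 1
    return to_close
```

If protected tabs alone exceed the limit, the sketch simply closes nothing further rather than breaking the protection rule.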

Recommended loop:

snapshot --interactive -> act -> snapshot -> verify

Refs are scoped to the latest snapshot. Re-run snapshot --interactive after navigation, reload, tab switch, or any major page mutation.

Vision Click

Use vision-click only when a target is visible in a screenshot but has no usable DOM/ref target, such as canvas/WebGL-heavy UIs. The normal order is ref click first, coordinate click last. Vision results are treated as bbox candidates with confidence; low-confidence or legacy point-only results require verification instead of clicking directly.

agbrowse screenshot --json
agbrowse-vision-click "the visible Submit button"

agbrowse observe-bundle --screenshot --boxes --json > /tmp/bundle.json
agbrowse-vision-click "Submit button" --bundle /tmp/bundle.json --verify-before-click

The vision path handles device-pixel-ratio correction and clip-origin evidence before sending page.mouse.click() coordinates.
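The coordinate correction can be illustrated with a short sketch. It assumes the bbox is given in screenshot pixels, the clip origin is in CSS pixels, and 0.8 is the confidence cutoff; all three are assumptions for the example rather than documented agbrowse behavior.

```python
def bbox_to_click_point(bbox, dpr, clip_origin=(0.0, 0.0)):
    """Convert a screenshot-pixel bbox (x, y, width, height) to a CSS-pixel click point."""
    if dpr <= 0:
        raise ValueError("devicePixelRatio must be positive")
    x, y, w, h = bbox
    # Centre of the box in screenshot pixels, scaled down to CSS pixels,
    # then shifted by the origin of the clipped screenshot region.
    cx = (x + w / 2) / dpr + clip_origin[0]
    cy = (y + h / 2) / dpr + clip_origin[1]
    return cx, cy


def should_click_directly(confidence, has_bbox, min_confidence=0.8):
    """Point-only or low-confidence results go to verification instead of a click."""
    return has_bbox and confidence >= min_confidence
```

On a 2x display, a box found at screenshot pixel (200, 100) sits at CSS pixel (100, 50), which is why skipping the division makes clicks land well away from the target.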

Web AI

The web-ai command drives ChatGPT / Gemini / Grok web UIs through the same Chrome that agbrowse start spawns. It treats provider DOM as untrusted and fails closed when required selectors, models, or capabilities are not observed.

Commands:

agbrowse web-ai render            # render the prompt envelope only
agbrowse web-ai status            # check active tab + composer
agbrowse web-ai send              # submit and return a sessionId
agbrowse web-ai poll              # wait for completion
agbrowse web-ai query             # send + poll
agbrowse web-ai stop              # press Escape on the active tab
agbrowse web-ai project-sources   # list/add ChatGPT Project Sources
agbrowse web-ai code              # generate + retrieve ChatGPT code zip artifacts
agbrowse web-ai code-extract      # re-retrieve zip artifacts from an old conversation
agbrowse web-ai context-dry-run   # preview a context package
agbrowse web-ai context-render    # render full prompt + context text

Provider matrix:

| Provider | Inline | File upload | Context package | Model select | Copy fallback |
| --- | ---: | ---: | ---: | ---: | ---: |
| ChatGPT | yes | yes | yes | yes | yes |
| Gemini | yes | yes | yes | yes | yes |
| Grok | yes | yes | fail-closed (see Context Packages) | yes | yes |

Unsupported vendors and unsupported model aliases fail closed before any browser mutation.

Every prompt automatically appends an [INSTRUCTIONS] block telling the model to use web search and cite sources inline. Run web-ai render to inspect the exact text that is typed into the composer.

Web-AI Runtime Capabilities (201.x / 203.8)

Recent releases added several cross-cutting runtime features to the web-ai layer:

| ID | Capability | Summary |
| --- | --- | --- |
| 201#1-8 | Capability registry | Declarative per-vendor capability lookup via lookupCapability; fail-closed when a feature is unsupported |
| 201#3, #5 | Annotated screenshots | Screenshots with element annotations and read-only product-surface metadata |
| 201#4 | Interstitial detector | Unified interstitial, popup, and overlay detection across providers |
| 201#6 | Diagnostics stage taxonomy | Richer failure diagnostics with typed stage values in error envelopes |
| 201#7 | Provider lifecycle adapter | Contract for provider lifecycle management (startup, teardown, health) |
| 201#9 | Freshness gate | Docs-first freshness gate that checks source-of-truth currency before answering |
| 203.8 | Live-status report | Typed standalone status struct for ongoing web-ai sessions |

Polling Timeouts

web-ai poll / query / watch accept --timeout <seconds>. Timeout resolution is explicit timeout → stored session deadline remainder → tier default → vendor fallback. A resumed poll keeps the deadline created by the original submit unless the caller explicitly overrides it.

| Long-running tier | Default --timeout | Roughly |
| --- | ---: | --- |
| chatgpt-pro | 5400 | 90 minutes |
| grok-heavy | 3600 | 60 minutes |
| deep-research | 3600 | 60 minutes |

| Vendor fallback when the tier is unknown | Default --timeout | Roughly |
| --- | ---: | --- |
| ChatGPT | 1200 | 20 minutes |
| Gemini | 1200 | 20 minutes |
| Grok | 600 | 10 minutes |

Do not equate ChatGPT's UI-side reasoning budget with the agbrowse poll deadline. The roughly 40-minute Pro budget is a user report and was not present in the 2026-07-10 DOM. The 5400-second (90-minute) chatgpt-pro default is agbrowse polling headroom; grok-heavy and deep-research remain independent 3600-second tiers. The provider tab and agbrowse Chrome process remain open when polling times out.
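The resolution order can be written out as a short function. The numbers are taken from the tables in this section; the function name, argument names, and the lowercase vendor keys are hypothetical.

```python
TIER_DEFAULTS = {"chatgpt-pro": 5400, "grok-heavy": 3600, "deep-research": 3600}
VENDOR_FALLBACKS = {"chatgpt": 1200, "gemini": 1200, "grok": 600}


def resolve_timeout(vendor, explicit=None, deadline_remaining=None, tier=None):
    """Return the poll timeout in seconds.

    Order: explicit --timeout, then the remainder of the stored session
    deadline, then the tier default, then the vendor fallback.
    """
    if explicit is not None:
        return explicit
    if deadline_remaining is not None:
        return max(0, deadline_remaining)  # an expired deadline leaves no time
    if tier in TIER_DEFAULTS:
        return TIER_DEFAULTS[tier]
    return VENDOR_FALLBACKS[vendor]


if __name__ == "__main__":
    print(resolve_timeout("chatgpt", tier="chatgpt-pro"))
```

A resumed poll would pass the remaining seconds of the original deadline, which is why it keeps the submit-time budget unless the caller overrides it.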

Sessions

web-ai send returns a 26-char ULID sessionId that survives shell exit, OS sleep, and Bash timeouts. Sessions persist at $BROWSER_AGENT_HOME/web-ai-sessions.json (default ~/.browser-agent).

# Long Pro / Deep Think run — fire-and-forget from one shell, resume from another.
SID=$(agbrowse web-ai send --vendor chatgpt --model pro --inline-only \
        --prompt "long Pro prompt..." --json | jq -r .sessionId)

# Later, in any shell, on the same machine:
agbrowse web-ai poll --vendor chatgpt --session "$SID"

poll resolves the session in priority order: --session <id> > active target id > vendor latest > legacy baseline. On a shared CDP port, session-less poll/stop auto-bind only when exactly one active provider session exists; two or more active sessions fail closed with session.target-ambiguous and candidate sessionId/targetId evidence. Each completion / timeout updates the session record with status, conversationUrl, and answer. Completed sessions also expose local artifact descriptors in agbrowse web-ai sessions show <id> when transcript, report, or image artifacts were saved.
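The fail-closed auto-bind rule can be sketched like this. Only the explicit-id and single-active-session cases are modeled; the later fallbacks (vendor latest, legacy baseline) are left out, and the record shape with a `status` value of "active" is an assumption.

```python
class SessionAmbiguous(Exception):
    """Raised when a session-less command matches more than one active session."""

    def __init__(self, candidates):
        super().__init__("session.target-ambiguous")
        self.candidates = candidates  # (sessionId, targetId) evidence pairs


def resolve_session(sessions, vendor, session_id=None):
    """Prefer an explicit id; otherwise bind only when exactly one session is active."""
    if session_id is not None:
        for s in sessions:
            if s["sessionId"] == session_id:
                return s
        raise KeyError(session_id)
    active = [s for s in sessions if s["vendor"] == vendor and s["status"] == "active"]
    if len(active) == 1:
        return active[0]
    if len(active) > 1:
        raise SessionAmbiguous([(s["sessionId"], s["targetId"]) for s in active])
    return None  # no active session; later fallbacks are not modeled here
```

Passing --session explicitly avoids the ambiguity path entirely, which is the safer habit on a shared CDP port.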

Session-to-tab binding (Phase 9.1): every session owns its own tab. The record stores targetId, tabId, and tabState (createdAt, lastActiveAt, recoveryCount, closeCount). stop --session <id> resolves that bound tab and sends Escape as an interrupt without taking over the running poll's target lease. If the bound tab is closed mid-operation, the runtime auto-recovers once by creating a new tab and navigating to the saved conversationUrl where the command permits navigation.

Temporary Chat sessions are never archived, including when archive mode is forced, because they are not durable ChatGPT conversations.

Add --deadline <iso> to override the default deadline (now + --timeout). Add --navigate to allow sessions resume to switch tabs when the saved conversationUrl differs from the current tab.

Durable session recovery

Session recovery is target-bound. poll --session, watch --session, sessions resume, and sessions reattach resolve the session's stored target first, then recover/navigate only when the command permits it. Use agbrowse web-ai sessions doctor <id> --json when a shell was interrupted or a provider tab outlived a local timeout.

Failure envelope

Set AGBROWSE_JSON_ERRORS=1 (or pass --json) for machine-readable failures. Every error becomes:

{
  "ok": false,
  "status": "error",
  "error": {
    "name": "WebAiError",
    "errorCode": "cdp.target-mismatch",
    "stage": "connect",
    "message": "active tab is not ChatGPT: https://example.com/",
    "retryHint": "tab-switch",
    "vendor": "chatgpt",
    "mutationAllowed": false,
    "selectorsTried": [],
    "evidence": { "url": "https://example.com/" }
  }
}

Poll-stage target drift is returned as a command result, not just an error envelope. The result includes ok: false, status: "target-mismatch", expectedTargetId, actualTargetId, port, a targetMismatch object, and a recovery command such as:

agbrowse web-ai poll --vendor chatgpt --session "$SID" --navigate --json

Initial errorCode catalog:

cdp.unreachable, cdp.target-mismatch
provider.composer-not-visible, provider.model-mismatch, provider.attachment-preflight, provider.attachment-evidence-missing, provider.commit-not-verified, provider.poll-timeout, provider.runtime-disabled
capability.unsupported
session.target-ambiguous
context.over-budget, context.symlink-rejected
grok.context-pack-not-allowed
internal.unhandled

Exit code is 1 on every failure; --json always lands a single parseable envelope on stderr (no double-printing).
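A calling agent might consume the envelope along these lines. This is a sketch: the field names match the example envelope above, while the summary shape and the `family` split on the first dot are choices made for the example. Poll-stage target-mismatch results have no error object, so this helper returns None for them.

```python
import json


def parse_failure(stderr_text):
    """Summarize a failure envelope, or return None if stderr is not one."""
    try:
        env = json.loads(stderr_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(env, dict) or env.get("ok") is not False:
        return None
    err = env.get("error")
    if not isinstance(err, dict):
        return None  # e.g. a command result without an error object
    code = err.get("errorCode", "internal.unhandled")
    return {
        "code": code,
        "family": code.split(".", 1)[0],  # cdp, provider, session, context, ...
        "stage": err.get("stage"),
        "retryHint": err.get("retryHint"),
        "mutationAllowed": bool(err.get("mutationAllowed")),
    }
```

Branching on `family` and `retryHint` keeps the caller from string-matching the human-readable message, which is free to change.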

Render First

agbrowse web-ai render \
  --vendor chatgpt \
  --project "agbrowse" \
  --goal "review the upload flow" \
  --prompt "Find the riskiest edge case."

The envelope is structured and stable:

[SYSTEM]
...

[USER]
## Project
...

## Goal
...

## Question
...

ChatGPT

agbrowse web-ai query \
  --vendor chatgpt \
  --url https://chatgpt.com/ \
  --model pro \
  --inline-only \
  --allow-copy-markdown-fallback \
  --prompt "Reply exactly CHATGPT_OK"

Model aliases:

| Input | Current resolution |
| --- | --- |
| instant, fast | GPT-5.5 Instant; no reasoning effort |
| thinking, think | selected family + thinking tier; defaults to medium |
| pro | selected family + flat Pro row; omit effort |
| --effort medium\|high\|xhigh | Medium / High / Extra High |
| --family gpt-5.6-sol\|gpt-5.5\|gpt-5.4\|gpt-5.3\|o3 | Chat family aliases; omit to preserve current UI family |

Legacy effort normalization: light|standard → medium, extended → high (one stderr warning), heavy → xhigh. Legacy Pro effort resolves to flat Pro and emits one no-selection stderr warning. gpt-5.3 is no longer a synonym for instant; use --family gpt-5.3.
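The legacy mapping can be captured in a small lookup. This sketch follows the sentence above literally, so only extended produces a warning; rejecting unknown values with an error is an assumption in line with the fail-closed behavior described elsewhere, and the warning text is invented for the example.

```python
CANONICAL_EFFORTS = ("medium", "high", "xhigh")
LEGACY_EFFORTS = {
    "light": "medium",
    "standard": "medium",
    "extended": "high",
    "heavy": "xhigh",
}


def normalize_effort(value):
    """Return (canonical_effort, warnings) for an --effort value."""
    if value in CANONICAL_EFFORTS:
        return value, []
    if value in LEGACY_EFFORTS:
        canonical = LEGACY_EFFORTS[value]
        warnings = []
        if value == "extended":
            # Hypothetical wording; the real stderr message is not documented here.
            warnings.append(f"legacy effort '{value}' normalized to '{canonical}'")
        return canonical, warnings
    raise ValueError(f"unsupported effort: {value}")
```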

Legacy UI (before 2026-07-10)

Before 2026-07-10, the simplified Intelligence picker exposed Instant, Medium, High, Extra High, and Pro Extended. Legacy model-switcher-* rows and composer-pill fallbacks remained supported.

ChatGPT Work

Use the dedicated Work entrypoint; Chat send/query/poll/watch and web_ai_submit_prompt reject an active Work surface.

agbrowse web-ai work send --prompt "Analyze this repository" --power 4

--power is an integer from 1 through 6. For MCP clients, use web_ai_work_send with prompt and power. Do not add a surface field to web_ai_submit_prompt.

ChatGPT Code Mode

web-ai code is a ChatGPT-only beta for generating small codebases through the visible ChatGPT web UI, then retrieving the resulting zip without clicking a download button. The prompt contract asks ChatGPT to:

create a durable PLAN.md or 00_plan.md checklist inside the generated artifact
use turn_plan.update_turn_plan only when that tool is actually available during the response
keep visible todo/checklist tools to 8 or fewer top-level items, and put extra detailed stage instructions in the plan file
treat the visible todo UI as transient after completion
implement, self-check, and package the project
answer with both a human-clickable sandbox link and a machine-readable plain path

Each web-ai code call automatically uploads skills/web-ai/modules/gpt-dev-agent-context.zip as the first attachment so ChatGPT has the Linux sandbox and serial-agent build rules before the user build spec.

Single zip:

agbrowse web-ai code \
  --vendor chatgpt \
  --model thinking \
  --effort medium \
  --prompt "Create a Flask hello-world MVP." \
  --output-zip ./result.zip

If --output-zip is omitted, agbrowse saves under the current working directory as code-artifact-<conversation>.zip.

Several named zips:

agbrowse web-ai code \
  --vendor chatgpt \
  --model thinking \
  --effort medium \
  --multi-zip \
  --output-dir ./artifacts \
  --prompt "Create backend.zip and frontend.zip as separate deliverables."

--multi-zip cannot be combined with --output-zip; use --output-dir. If --output-dir is omitted, agbrowse saves under the current working directory as code-artifacts-<conversation>/.

Later extraction from an existing conversation:

agbrowse web-ai code-extract \
  --vendor chatgpt \
  --url "https://chatgpt.com/c/<conversation-id>" \
  --output-zip ./result.zip

If the ChatGPT conversation tab is already open, --url can be omitted. If the conversation was created by agbrowse and the session record still exists, pass --session <sessionId> instead. Multiple old zips can be recovered with --multi-zip --output-dir ./artifacts.

The extractor does not send a follow-up prompt. It scans the saved conversation JSON for /mnt/data/*.zip, mints the provider download URL, fetches the cookie-bound payload inside the page, validates the zip, and writes it locally. The original conversation URL/session/current tab plus a logged-in ChatGPT browser profile are still required; a copied /mnt/data/result.zip line alone is not enough.

Expected final assistant answer shape:

DOWNLOAD: [result.zip](sandbox:/mnt/data/result.zip)
MACHINE: /mnt/data/result.zip

For multi-zip output, ChatGPT repeats the same two-line block for each zip. The DOWNLOAD: line is for humans in the ChatGPT UI; the MACHINE: line is for agbrowse and other automation.
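
For automation that only has the final answer text, the MACHINE: lines can be pulled out with sed. This is a sketch of consuming the answer shape, not how agbrowse itself recovers zips (the extractor scans the saved conversation JSON); machine_paths is a hypothetical name.

```shell
# Read an assistant answer on stdin and print each MACHINE: path, one per line.
machine_paths() {
  sed -n 's/^MACHINE:[[:space:]]*//p'
}

# Usage:
#   machine_paths < answer.txt
```

Lines that do not start with MACHINE: are ignored, so the human-facing DOWNLOAD: lines never leak into the path list.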

New code-mode runs fail closed if the recovered code zip does not contain PLAN.md or 00_plan.md. code-extract can still recover old conversations, but old artifacts may predate the plan-file contract. Do not treat the disappearance of the visible todo UI after the response finishes as a failure; the zip-root plan file is the durable checklist. Small generated projects should keep the top-level checklist to 8 items or fewer. Complex projects should add detailed stage instructions as text in the plan file instead of expanding the visible todo/checklist beyond 8 items. Completed items in the zip-root plan file should be marked [x] before final packaging.

Verify recovered archives locally when correctness matters:

unzip -t ./result.zip
unzip -l ./result.zip
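
The plan-file contract can also be checked locally. The sketch below reads entry names on stdin so it can be fed from unzip -Z1, which lists names only. has_root_plan is a hypothetical helper, and it assumes the plan file sits at the archive root with no leading directory.

```shell
# Succeed only if PLAN.md or 00_plan.md appears as a root-level entry.
# Input: archive entry names, one per line, on stdin.
has_root_plan() {
  grep -qxE 'PLAN\.md|00_plan\.md'
}

# Usage:
#   unzip -Z1 ./result.zip | has_root_plan || echo "plan file missing" >&2
```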

Gemini

agbrowse web-ai query \
  --vendor gemini \
  --url https://gemini.google.com/app \
  --model deepthink \
  --inline-only \
  --prompt "Reply exactly GEMINI_OK"

Model aliases:

flash-lite, fast, gemini-fast
flash, gemini-flash
pro, gemini-pro
thinking, think, gemini-thinking are legacy compatibility aliases for pro

Versioned UI labels such as Gemini 3.n Pro are normalized internally; prefer the stable aliases above.

Tool aliases:

deepthink, deep-think, deep_think, deep think

Gemini deepthink activates the visible Deep think tool before submitting the prompt. It is intentionally separate from the thinking model alias.

Grok

agbrowse web-ai query \
  --vendor grok \
  --url https://grok.com/ \
  --model expert \
  --inline-only \
  --prompt "Reply exactly GROK_OK"

Model aliases:

auto, automatic
fast, quick
expert, thinking, think
grok-4.3, grok43, grok-43, beta
heavy

File Upload

agbrowse web-ai query \
  --vendor gemini \
  --url https://gemini.google.com/app \
  --model fast \
  --file /path/to/user-requested-file.pdf \
  --prompt "Summarize the attached file."

Use --file only when the user explicitly wants that single file uploaded. For source/project context, use context packaging instead of creating a temporary .txt/.md file.

Upload success is not judged from the file input alone. The runtime verifies visible attachment evidence before send and, where the provider DOM exposes it, sent-turn evidence after send.

--max-upload-file-size <bytes> sets the per-file cap for live provider uploads through --file. This is intentionally separate from context package selection: --max-context-file-size <bytes> is the preferred context budget flag, while --max-file-size <bytes> remains a legacy alias for that context budget.

Generated Images

ChatGPT generated-image output is beta and opt-in:

agbrowse web-ai query \
  --vendor chatgpt \
  --url https://chatgpt.com/ \
  --inline-only \
  --output-image ./out.png \
  --prompt "Create an image of a small robot holding a banana."

When ChatGPT returns multiple images for one --output-image ./out.png request, agbrowse writes sibling files as out.png, out-2.png, out-3.png. Explicit image output is fail-closed: if no generated image can be detected or saved, the command returns provider.image-output instead of silently succeeding.
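
A script that post-processes the images can predict the sibling names from the requested path. This is a sketch of the naming pattern described above, assuming the path has a file extension; image_output_names is a hypothetical helper, not an agbrowse command.

```shell
# Print the file names used when COUNT images are saved for one --output-image PATH.
# The first image keeps PATH; later ones get -2, -3, ... before the extension.
image_output_names() {
  path="$1"; count="$2"
  base="${path%.*}"   # path without the final extension
  ext="${path##*.}"   # final extension without the dot
  i=1
  while [ "$i" -le "$count" ]; do
    if [ "$i" -eq 1 ]; then
      echo "$path"
    else
      echo "$base-$i.$ext"
    fi
    i=$((i + 1))
  done
}

# Usage:
#   image_output_names ./out.png 3
```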

Image input remains regular upload:

agbrowse web-ai query \
  --vendor chatgpt \
  --file ./input.png \
  --prompt "Describe this image."

Batch Follow-Ups

ChatGPT batch follow-ups are explicit and sequential in the same command:

agbrowse web-ai query \
  --vendor chatgpt \
  --inline-only \
  --prompt "Analyze this design." \
  --follow-up "Summarize the risks." \
  --follow-up "List the next three actions."

This is an in-command batch mode. For a later follow-up in the same saved conversation window, use query --session <id> --prompt <text>:

agbrowse web-ai query \
  --vendor chatgpt \
  --session "$SID" \
  --inline-only \
  --output-image ./next.png \
  --prompt "Create another image in this same conversation."

--follow-up is ChatGPT-only and cannot be combined with --research deep.

Deep Research

--research deep activates ChatGPT Deep Research mode as an experimental beta:

agbrowse web-ai query \
  --vendor chatgpt \
  --inline-only \
  --research deep \
  --timeout 1800 \
  --prompt "Research the current official status and cite sources."

Deep Research saves a report artifact when available, records researchMode: "deep" in the session, and skips auto archive. It also auto-confirms ChatGPT's post-submit Deep Research plan card when its iframe-rendered, time-limited Start button appears (observed live as a roughly 60-second countdown). Account blocks or missing provider UI surfaces are reported explicitly; do not treat this as a ready cross-provider capability.

ChatGPT Project Sources

Project Sources are append-only and require an explicit ChatGPT project URL:

agbrowse web-ai project-sources list \
  --chatgpt-url https://chatgpt.com/g/project_123 --json

agbrowse web-ai project-sources add \
  --chatgpt-url https://chatgpt.com/g/project_123 \
  --file ./docs/context.md \
  --dry-run summary

--dry-run validates the project URL and local files without browser mutation. Live add waits for upload evidence before reporting uploaded: true. Delete, replace, and clear operations are intentionally unsupported.

Context Packages

Use context packages when the prompt plus files would be too large or when you want untrusted file content separated from the main instruction block.

Upload transport writes one web-ai-context-package-<id>.zip archive. The archive contains CONTEXT_PACKAGE.md plus the selected source files; do not create a temporary .txt or .md file yourself for source context.

Use ChatGPT or Gemini for context packaging. Grok context packages fail closed by default: web-ai send/query --vendor grok with --context-from-files / --context-file / --context-transport upload throws with stage: 'grok-context-pack-not-allowed'. Pass --allow-grok-context-pack to override deliberately; the runtime still emits grok-context-pack-not-recommended when the override is used.

Dry run:

agbrowse web-ai context-dry-run \
  --vendor chatgpt \
  --prompt "Review these files" \
  --context-from-files "web-ai/*.mjs" \
  --json

Live upload:

agbrowse web-ai query \
  --vendor chatgpt \
  --url https://chatgpt.com/ \
  --context-from-files "web-ai/*.mjs" \
  --context-transport upload \
  --prompt "Reply exactly CONTEXT_OK if the package contains question.mjs."

Inline context:

agbrowse web-ai query \
  --vendor chatgpt \
  --inline-only \
  --context-from-files "web-ai/question.mjs" \
  --context-transport inline \
  --prompt "Review this file."

Copy Markdown Fallback

--allow-copy-markdown-fallback asks the runtime to use the provider Copy button after the DOM response completes. The implementation intercepts the page's navigator.clipboard.writeText/write call and does not read the OS clipboard. The flag is the explicit policy opt-in for this capture path; do not pair it with --unsafe-allow in normal CLI use.

agbrowse web-ai query \
  --vendor chatgpt \
  --model pro \
  --inline-only \
  --allow-copy-markdown-fallback \
  --prompt "Return a markdown table."

The fallback is opt-in because provider copy buttons are UI details and can change. A custom policy can still disable it with allowClipboardWrite: false, or allow MCP/server-side copy capture with allowClipboardWrite: true.

Source Audit

Use --require-source-audit on poll or query when a research answer must carry inline sources next to factual claims. The audit checks completed answerText locally and fails closed when claims are unsourced.

agbrowse web-ai query \
  --vendor grok \
  --model expert \
  --inline-only \
  --require-source-audit \
  --source-audit-scope "official product docs and release notes" \
  --source-audit-date "2026-05-05" \
  --prompt "Summarize the latest official product changes with sources."

Absence claims such as "no official response was found" require --source-audit-scope and --source-audit-date. Use --source-audit-ratio <0..1> only when partial sourcing is deliberate; the default requires every detected claim to carry an inline source.
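
One way to read the ratio is as the minimum fraction of detected claims that must carry a source. The sketch below assumes that reading. It is not the audit's actual code: audit_passes and its count arguments are hypothetical, and treating zero detected claims as a pass is an assumption.

```shell
# Hypothetical check: pass when sourced/total meets the required ratio.
# Args: SOURCED TOTAL [RATIO]; RATIO defaults to 1 (every claim sourced).
audit_passes() {
  sourced="$1"; total="$2"; ratio="${3:-1}"
  awk -v s="$sourced" -v t="$total" -v r="$ratio" \
    'BEGIN { exit !(t == 0 || s / t >= r) }'
}

# Usage:
#   audit_passes 3 4 0.75 && echo "audit ok"
```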

Active Tab Safety

tab-switch stores a CDP target id, and mutating commands resolve the active page by that target id before falling back to page order.

agbrowse tabs
agbrowse tab-switch 0DD58EC9517DB9514D37AE74AC21829F
agbrowse web-ai status --vendor gemini

For live web-ai work, prefer passing --url so the provider runtime can verify the target host before mutation.

Environment Variables

| Variable | Default | Purpose |
| --- | --- | --- |
| BROWSER_AGENT_HOME | ~/.browser-agent | profile, screenshots, state, web-ai-sessions.json session store |
| CDP_PORT | 9222 | default DevTools port |
| AGBROWSE_JSON_ERRORS | unset | set 1 to force JSON failure envelopes regardless of --json |
| AGBROWSE_UPDATE_CHECK | enabled outside CI | set 0 to hide update notices, 1 to force the check |
| AGBROWSE_UPDATE_CHECK_TTL | 24h | cache TTL for npm latest-version checks |
| AGBROWSE_UPDATE_CHECK_LATEST | unset | override latest version for tests/diagnosis |
| CHROME_HEADLESS | unset | set 1 for headless startup |
| CHROME_NO_SANDBOX | unset | set 1 only in Docker/CI if needed |
| CHROME_BINARY_PATH | auto-detect | custom Chrome executable |
| BROWSER_SCRIPT | bundled browser script | used by vision-click |

Troubleshooting

| Symptom | Likely cause | Action |
| --- | --- | --- |
| CDP connection failed | Chrome is not running on the selected port | agbrowse start |
| port in use but not CDP | another process owns 9222 | choose CDP_PORT=9333 or stop the process |
| provider says sign in | profile is not logged in | open the provider URL and log in manually |
| wrong tab was used | stale active target | run tabs, then tab-switch <targetId> |
| upload never appears | provider UI changed | run snapshot, get-dom, and update provider selectors |
| Cloudflare/human check | provider anti-bot page | complete the check manually in headed Chrome |
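
To tell "Chrome is not running" apart from "port in use but not CDP", it can help to probe Chrome's /json/version HTTP endpoint directly. The sketch below only builds the URL from CDP_PORT, falling back to the documented default of 9222. cdp_version_url is a hypothetical helper, and the 127.0.0.1 host is an assumption about a local setup.

```shell
# Build the DevTools version-probe URL for the configured port.
cdp_version_url() {
  echo "http://127.0.0.1:${CDP_PORT:-9222}/json/version"
}

# Usage (curl exits non-zero if nothing answers or the reply is an HTTP error):
#   curl -fsS "$(cdp_version_url)" || echo "no CDP endpoint on this port" >&2
```

A real CDP endpoint answers with JSON that includes a webSocketDebuggerUrl field; any other process on the port will not.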

Development

npm install
npm test
npm run test:unit
npm run test:integration

Useful focused checks:

npx vitest run test/unit/browser-active-tab.test.mjs --reporter=verbose
npx vitest run test/integration/web-ai-cli-contract.test.mjs --reporter=verbose

Release

agbrowse publishes through npm Trusted Publishing from GitHub Actions. The local release scripts prepare the version commit, push main, dispatch .github/workflows/release.yml, and watch the run; they do not run a real local npm publish.

Run release commands from a clean main checkout:

npm run release                    # bump patch, dry-run through release.yml
npm run release -- minor           # bump minor, dry-run
npm run release -- 0.2.0           # explicit version, dry-run
npm run release -- 0.2.0 --publish # real publish through GitHub Actions OIDC
npm run release -- watch           # watch latest release.yml run

Preview releases keep the existing preview version shape <base>-preview.<timestamp> and use npm dist-tag preview:

npm run release:preview
npm run release:preview -- 0.2.0
npm run release:preview -- --publish
PREID=rc STAMP=20260621040500 npm run release:preview -- 0.2.0 --publish
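
The preview version shape can be illustrated with a short function. This is a guess at how the pieces combine, not the release script itself. It assumes PREID replaces the preview label, that STAMP replaces the timestamp, and that the default stamp is a UTC YYYYMMDDHHMMSS value like the one in the example above. preview_version is a hypothetical name.

```shell
# Compose <base>-<preid>.<stamp>, defaulting to the "preview" label
# and a UTC timestamp when PREID or STAMP are not set.
preview_version() {
  base="$1"
  preid="${PREID:-preview}"
  stamp="${STAMP:-$(date -u +%Y%m%d%H%M%S)}"
  echo "$base-$preid.$stamp"
}

# Usage:
#   PREID=rc STAMP=20260621040500 preview_version 0.2.0
```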

The release workflow validates the requested version against package.json, requires main, runs the release gates, performs npm publish --dry-run by default, and only creates the git tag plus GitHub Release after a successful real npm publish. The npm package's trusted publisher must be configured as:

Repository: lidge-jun/agbrowse
Workflow:   release.yml
Action:     npm publish

The release path includes named claim gates for MCP, source audit, trace/policy, structure drift, fixture evals, package dry-run, and high-severity dependency audit. Use npm run test:mcp, npm run test:source-audit, and npm run test:release-gates when checking those surfaces directly.

Phase 22 also wires single-name release gates that fold those checks into one runner (scripts/release-gates.mjs):

npm run gate:all                                  # run every named gate
npm run gate:typecheck                            # node --check + structure drift
npm run gate:tests                                # unit + MCP + source-audit + trace-policy
npm run gate:truth-table-fresh                    # CAPABILITY_TRUTH_TABLE.md ≤ 7 days old
npm run gate:mcp-scope-frozen                     # only the 2 frozen browser_* tools
npm run gate:no-experimental-in-readme-ready-section

The capability/claim truth table for both agbrowse and the cli-jaw mirror lives at structure/CAPABILITY_TRUTH_TABLE.md; update that file in the same commit as any capability or claim change.

Strict-migration baseline checks shipped alongside the gates:

npm run check:strict-baseline    # JSDoc opt-in regression guard
npm run check:module-graph       # module dependency graph regression
npm run smoke:bins               # published bin entrypoints boot
npm run typecheck                # tsc --noEmit on the strict surface

Security Notes

Do not expose the CDP port to untrusted networks.
Do not commit BROWSER_AGENT_HOME.
evaluate executes arbitrary page JavaScript and should only be used by a trusted local agent.
Provider accounts, subscriptions, and generated content remain the user's responsibility.

License

MIT
