browser-agent-mcp-farm

v0.7.0

Published

10 days ago

Local browser research farm (MCP server) with a cite-or-fail verification floor: every cited claim must re-match hash-registered bytes, exported as tamper-evident, offline-re-verifiable evidence bundles.

Vulnerabilities: 0 High, 0 Medium, 0 Low

ezboom1111

mcp model-context-protocol playwright browser-automation evidence web-research anti-hallucination claim-gate agent

Browser-Agent MCP Farm

Local browser research farm exposed as an MCP stdio server — a verification floor for agent web research: it captures what the browser actually saw and fails any cited claim that doesn't re-match hash-registered bytes (cite-or-fail), then exports tamper-evident bundles a second agent re-verifies offline. It proves your citations are grounded in the captured bytes — not that the page equals live-origin truth (see docs/THREAT_MODEL.md).

Removed (2026-06-10): the source-navigation selector/calibration subsystem. Per-site selector recipes, their calibration loops, and promotion machinery were excised — model vision + consented-browser capture solved the problem they were built for, and selector recipes rot permanently. The deterministic core (hash registration, cite-or-fail claim gate, caged judge, Merkle bundles, acquisition-tier routing) is unaffected; destination triage and tier routing survive as libraries. Plan + what was kept: docs/SELECTOR_STACK_EXCISION.md; the selector-era README is archived at docs/archive/README-selector-era.md.

Install & 60-second Quickstart

Requirements: Node 24+. Install dependencies and the Chromium browser once:

npm ci
npx playwright install --with-deps chromium
npm run build

Team onboarding shortcut (from a cloned repo): ./install.ps1 (Windows) or sh install.sh (macOS/Linux) runs the steps above and register-all in one go. After registration, serve auto-installs Chromium on first run if it is missing (opt out with FARM_SKIP_BROWSER_AUTOINSTALL=1).

node ./dist/cli.js register-all

This registers an absolute path to your local build (the right choice for a git-clone dev install). When the farm is distributed as a published npm package, register a portable, package-manager-upgradable invocation instead — the host config then carries no build directory and upgrades flow through the package manager (no path re-register):

node ./dist/cli.js register-all --npx
# or pin / use a private scope:  --npx --package-spec @your-org/<package>@<version>

Because the Claude skill is installed as a copy, serve self-heals a stale skill snapshot after an upgrade (re-copies it when the installed version marker differs from the running package). Re-run register-all after an upgrade to also refresh the Codex guidance block.

Run one auditable, claim-gated evidence capture from the CLI:

node ./dist/cli.js evidence-run --url https://example.com/ --no-frames --wait-ms 0 --timeout-ms 10000

From an agent, call the MCP tool mcp__browser-agent-mcp-farm__farm_evidence_run with { "url": "https://example.com/" }, then read the result with farm_read_report using the returned reportPath. The full agent workflow is in skills/browser-agent-mcp-farm/SKILL.md.
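For hosts that speak raw MCP instead of exposing a named tool binding, the call above corresponds to a JSON-RPC `tools/call` request. The sketch below assumes the server-side tool name is `farm_evidence_run` (the `mcp__browser-agent-mcp-farm__` prefix being a host naming convention); `buildToolCall` is a hypothetical helper, not part of the farm.

```typescript
// JSON-RPC 2.0 envelope that MCP uses for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper for illustration only.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "farm_evidence_run", { url: "https://example.com/" });
console.log(JSON.stringify(request));
```

The response carries the reportPath that farm_read_report expects; its exact shape is not shown here because the source does not document it.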

Run the quality gate (build, typecheck, dependency-boundary guard, browser guard, tests + coverage, smokes, audit, STATUS):

npm run verify

Build/test status is tracked in STATUS.md (generated by the gate).

Scope

This package implements the local capture-and-verify slice:

Playwright BrowserContext per lease
lease ownership, TTL, heartbeat, max page, and domain checks
read-only page open/capture
read-write browser actions except payment-like pages
storage-state and persistent-profile modes
profile lock to prevent concurrent writes to the same saved login state
proxy and fingerprint options per lease
artifact bundle writer with hashes, including image-like media artifacts and media indexes
structured transcript artifacts parsed from legitimately captured WebVTT files
MCP stdio server wrapper
MCP farm_evidence_run workflow tool
Codex and Claude MCP auto-registration
wait, selector wait, scroll, and capture-after-idle tools
timestamped browser-visible frame sampling for media elements
typed evidence kinds, claim types, and verification levels
final claim-gate checks for visual frame, transcript cue, and audio transcription evidence
optional OCR pass over sampled frames with timestamp, language, confidence, word-box, script, and price-like text-profile metadata when tesseract.js is installed
dense frame sampling windows around browser-exposed transcript cue hits, OCR text hits, and browser-visible scene-change hits, with typed per-source diagnostics in run assessments
explicit-credential official API metadata attempts and per-run API cache artifacts
source strategy classification for search, map, blog, portal/news, travel booking, commerce, video/social, and generic web sources
source coverage registry for category/locale/top-slot planning, including explicit ko-KR, en-US, ja-JP search, and global representative slots, support tiers, AI derivative evidence, and private-network capture policy
(removed 2026-06-10) the per-site source-navigation selector/recipe/calibration subsystem: selector recipes rot, and a consented browser + model vision reads portal pages without them. The selector-era scope chronicle is preserved in docs/archive/README-selector-era.md; the excision plan and what was kept (destination triage + acquisition-tier routing as libraries) in docs/SELECTOR_STACK_EXCISION.md.
browser-visible obstruction classification for login walls, app interstitials, bot blocks, region gates, age gates, and unavailable media pages
cautious browser overlay dismissal before evidence capture for ordinary close/not-now/reject/necessary-only surfaces, without clicking login, CAPTCHA, age-gate, payment, or app-open actions
local HTTP queue for evidence-run jobs
package metadata and GitHub Actions verification workflow
unit and smoke tests

Out of scope:

payment actions
DRM bypass or raw platform video download
production remote multi-user server
published npm distribution

Commands

npm ci
npm test
npm run test:ocr-integration
npm run test:official-api
npm run build
npm run verify
node .\dist\cli.js serve
node .\dist\cli.js serve-http --port 3333
node .\dist\cli.js smoke
node .\dist\cli.js smoke-web --timeout-ms 10000
node .\dist\cli.js smoke-media
node .\dist\cli.js smoke-proxy
node .\dist\cli.js claim-gate --run-dir <path> --mode final --min-claims 1
node .\dist\cli.js html-preview --run-dir <path>
node .\dist\cli.js critique-next --queue <path>
node .\dist\cli.js critique-complete --queue <path> --task-id MEDIA-CRIT-01
node .\dist\cli.js platform-capabilities --url https://www.youtube.com/watch?v=dQw4w9WgXcQ
node .\dist\cli.js official-api-readiness --url https://www.youtube.com/watch?v=dQw4w9WgXcQ --youtube-api-key-env FARM_YOUTUBE_API_KEY
node .\dist\cli.js source-registry --category search --locale ko-KR
node .\dist\cli.js destination-recovery-plan --run-dir <evidence-run-dir> --format commands
node .\dist\cli.js evidence-run --url https://www.youtube.com/watch?v=dQw4w9WgXcQ --timestamps-sec 0,10 --dense-sampling
node .\dist\cli.js evidence-run --url https://example.com/ --profile my-site --headed --official-api
node .\dist\cli.js auth-login --profile my-site --url https://example.com/login --wait-ms 120000 --chrome
node .\dist\cli.js auth-cdp-launch --profile my-site --url https://accounts.google.com/ --port 9222
node .\dist\cli.js auth-cdp-import --profile my-site --cdp-url http://127.0.0.1:9222 --save-now --cookie-domains google.com,youtube.com
node .\dist\cli.js profile-list
node .\dist\cli.js register-all

claim-gate exits non-zero when a claim cites missing or unregistered evidence. In --mode final, it also fails zero-claim reports by default.
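The exit behaviour described above can be pictured with a small sketch. The `Claim` and `GateResult` shapes and the `runGate` function are assumptions made for illustration; the real gate checks more than evidence IDs.

```typescript
interface Claim {
  text: string;
  evidenceIds: string[];
}

interface GateResult {
  ok: boolean;
  failures: string[];
}

// registered: IDs of artifacts present in the run's ledger (assumed shape).
function runGate(
  claims: Claim[],
  registered: string[],
  finalMode: boolean,
  minClaims = 1,
): GateResult {
  const failures: string[] = [];
  // Final mode refuses a report that makes too few claims.
  if (finalMode && claims.length < minClaims) {
    failures.push(`expected at least ${minClaims} claim(s), found ${claims.length}`);
  }
  // Every cited evidence ID must be registered.
  for (const claim of claims) {
    for (const id of claim.evidenceIds) {
      if (!registered.includes(id)) {
        failures.push(`claim cites missing or unregistered evidence: ${id}`);
      }
    }
  }
  return { ok: failures.length === 0, failures };
}

// Process exit convention: 0 on pass, non-zero on any failure.
function exitCode(result: GateResult): number {
  return result.ok ? 0 : 1;
}
```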

smoke-media serves a local page with PNG, SVG, poster, VTT, and video resources. Image-like resources and VTT files are written under media/; captured VTT files are also parsed into structured/*.transcripts/*.json. Video/audio/stream resources are indexed in structured/*.media-index.json unless a legitimate byte source is captured without bypassing platform limits.

html-preview writes html/farm-evidence-preview.html with screenshot thumbnails and links to raw artifacts.

critique-next prints exactly one next media critical review task. It does not mutate the queue. critique-complete advances the queue only when that task's configured output file exists and is non-empty, so a 10-round review cannot be collapsed into one untracked response.

platform-capabilities prints a static, source-linked capability map for YouTube, Instagram, TikTok, or a generic browser fallback. It does not fetch the URL; it labels each evidence path as available, unavailable, or not_attempted with credential and legal constraints.

official-api-readiness --url <url> checks official API credential readiness without calling provider APIs. It reports which supported lookups exist for the URL, which credential env var references were supplied, and whether those env vars are set, while never printing raw token values. Search, hashtag, profile, or listing URLs on supported platforms report missing_media_id until a direct media/item URL or destination follow-up provides a stable media ID. Add --fail-not-ready in automation when any missing reference/env/media ID should exit non-zero.
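A minimal sketch of the "report whether the env var is set, never its value" idea follows. `checkCredentialRefs` and `CredentialStatus` are hypothetical names; the environment is passed in explicitly so the function stays testable (in real use it would be `process.env`).

```typescript
interface CredentialStatus {
  envVar: string;
  isSet: boolean;
}

// Only the variable name and a boolean leave this function; the token never does.
function checkCredentialRefs(
  refs: string[],
  env: Record<string, string | undefined>,
): CredentialStatus[] {
  return refs.map((envVar) => {
    const value = env[envVar];
    return { envVar, isSet: typeof value === "string" && value.length > 0 };
  });
}

const report = checkCredentialRefs(
  ["FARM_YOUTUBE_API_KEY", "FARM_OTHER_KEY"],
  { FARM_YOUTUBE_API_KEY: "secret-token" },
);
console.log(JSON.stringify(report));
```

`FARM_OTHER_KEY` is a made-up reference used only to show the unset case.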

evidence-run is the first-class workflow wrapper: it writes platform capability artifacts, writes a source strategy artifact, attempts a browser page capture, samples timestamped browser-visible frames unless --no-frames is set, writes an assessment report, optionally runs OCR over sampled frames, optionally collects credentials-gated official API metadata, classifies browser-visible obstructions, adds typed claim/citation ledgers, and runs the final claim gate.

Dense sampling can trigger from browser-exposed transcript cues, OCR text hits, and browser-visible scene changes detected through a small canvas fingerprint of sampled video frames. Run assessments preserve dense sampling events with the trigger source, hit timestamps, planned timestamps, captured timestamps, caps, and scene-change distances when available. Scene-change diagnostics also include threshold recommendations when the sampled distance distribution suggests keeping, lowering, raising, or manually reviewing the current threshold, and scene-change hit expansion can be capped independently from the dense frame capture budget. When --dense-sampling and --ocr are both enabled, verified OCR text hits can trigger additional browser-visible frame sampling around the hit timestamp, followed by OCR over those dense frames.

Before capture, the workflow also attempts a bounded dismissal pass for ordinary overlays such as close buttons, not-now prompts, newsletter modals, and reject/necessary-only cookie banners. It records that pass as browser_overlay_dismissal evidence when an action occurs, and it intentionally skips login, CAPTCHA, age-gate, payment, accept-all, and app-open buttons. The pass is configurable through overlayDismissal in MCP/HTTP input and through CLI flags.

Login walls, app-open interstitials, bot blocks, region/age gates, and unavailable-media pages are recorded as structured browser_obstruction evidence instead of being treated as successful content access.

Audio and transcript understanding remain marked unverified unless an authorized caption body, transcript cue, or audio transcription artifact exists in the run. The farm performs no audio capture or speech-to-text (an explicit non-goal): audio_transcription is a lawful provider/operator-supplied transcript only, and the autonomous pipeline emits captured captions as transcript_cue, never a fabricated transcription.

The workflow also reports stage timings for setup, browser open/capture/frame sampling, official API, OCR, OCR-hit dense sampling, scene-change dense sampling, overlay dismissal, obstruction classification, claim gate, and final report generation.

Every source-registry entry carries a legalBasis recording the lawful access posture under which the farm reads it: public_browser_visible (public pages a human can view without auth, robots-respecting, no bypass), official_api (the provider's official API under the operator's own credentials), user_provided (an operator-supplied authenticated session the user owns), derivative_citation (AI/aggregator output used only to point at primary sources), or planning_only. It records the INTENDED basis, not a license, and pairs with the hard non-goals — no login/CAPTCHA/paywall/age-gate bypass, no raw audio/video stream download, no payment/booking/account actions.

Source-navigation recipe execution was removed 2026-06-10 with the selector subsystem (docs/SELECTOR_STACK_EXCISION.md): per-site selector recipes rot, and a consented browser + model vision navigates portal pages without them. Navigate with your own (consented) browser or the manual farm tools, then capture/register the bytes; the claim gate still validates the historical source_navigation_* / destination_* evidence kinds in old runs, including their destination-provenance citation chain.

destination-recovery-plan --run-dir <evidence-run-dir> is a read-only handoff inspector for destination_triage artifacts. It extracts blockedChildRecoveryAdvice from a completed evidence run and can print JSON, Markdown, all commands, setup-only commands, retry-only commands, or a preflight check without opening a browser. Use this after a run reports blocked child recovery candidates to copy the Chrome persistent-profile setup and headed evidence-run commands from the artifact bundle instead of re-opening raw triage JSON. If an older triage artifact only contains blockedChildRecoveryCandidates, the CLI synthesizes equivalent profile/headed recovery advice from those candidates, and it tolerates UTF-8 BOM-prefixed JSON artifacts written by Windows handoff scripts. JSON and Markdown output mark whether each item came from original artifact advice or synthesized recovery candidates. Markdown output includes the same preflight summary, and --check-profiles adds saved-profile readiness to that handoff. Add --format check --check-profiles --fail-check when QA should confirm the deterministic recovery profile already exists before retry execution, or --only-check-ok when rendering only passing recovery commands.
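Tolerating a UTF-8 BOM, as mentioned above, usually comes down to dropping a leading U+FEFF before parsing. This is a generic sketch of that step, not the farm's own code; `parseJsonTolerant` is a hypothetical name.

```typescript
// A UTF-8 BOM decodes to the single code unit U+FEFF at the start of the string.
function parseJsonTolerant(text: string): unknown {
  const withoutBom = text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
  return JSON.parse(withoutBom);
}

// JSON.parse alone would throw on the BOM-prefixed input below.
const parsed = parseJsonTolerant('\uFEFF{"blockedChildRecoveryAdvice":[]}');
console.log(parsed);
```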

Useful evidence-run options:

--profile <name> reuses a saved profile from auth-login.
--persistent-profile uses a full Chromium user data directory.
--headed opens a visible Chromium window for CLI debugging.
--http-fetch tries a browserless HTTP GET first (tier-0) and escalates to the browser if it declines; no screenshot/frames on the tier-0 path.
--auto-capture like --http-fetch, but escalates on any decline (client-rendered shell / non-HTML / off-domain / bot-block), so it is never a worse capture than the browser. A server-rendered page is captured without launching Chromium and labelled http_fetch.
--text-only text capture profile: blocks image/media/font + ad-host subrequests and skips the page screenshot (faster text/structure-only runs).
--capture-cache opt-in replay: reuse a fresh (≤ 1 h) prior bare-ephemeral capture by content hash instead of launching the browser; the page claim is labelled cached_capture with its staleness age.
--no-overlay-dismissal disables the cautious pre-capture overlay dismissal pass.
--overlay-dismissal-max-actions <0-10> changes how many ordinary overlay dismissals can happen before capture; the default is 3.
--ocr runs bounded OCR over sampled frame screenshots. The tesseract.js engine auto-installs as an optional dependency, so it is normally present; if a lean/offline install skipped it, OCR records an OCR-unavailable artifact (run npm install tesseract.js to enable). Empty text and low-confidence text are recorded as partial status so they do not become verified OCR evidence. OCR text-profile metadata distinguishes price, percent/discount, map/local, travel/commerce, rating, distance, hours, contact/address, reservation, menu, and commerce policy text.
--ocr-language <lang> passes a language code such as eng or eng+kor to tesseract.js.
--ocr-min-confidence <0-100> marks OCR text partial when reported confidence is below the threshold.
--dense-sampling captures additional frame windows around browser-exposed transcript cue hits, browser-visible scene changes, and, when OCR is enabled and available, OCR text hits.
--dense-scene-threshold <1-64> sets the 8x8 visual fingerprint hamming distance needed to treat adjacent sampled frames as a scene-change hit.
--dense-scene-max-hits <1-120> caps how many scene-change midpoints are expanded before the dense frame cap is applied; by default it follows --dense-max-frames.
--no-dense-scene-change disables scene-change dense sampling while leaving transcript/OCR dense sampling enabled.
--official-api attempts supported platform APIs only through explicit env var credential references such as --youtube-api-key-env YOUTUBE_API_KEY.
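To make `--dense-scene-threshold` concrete: an 8x8 fingerprint is 64 bits, and the Hamming distance counts how many of those bits differ between two frames. The sketch below stores a fingerprint as 8 bytes and assumes a hit means "distance at or above the threshold"; how the farm actually derives and compares fingerprints is not specified in this README.

```typescript
// Count the set bits in one byte.
function popcount8(byte: number): number {
  let count = 0;
  for (let v = byte & 0xff; v !== 0; v >>= 1) count += v & 1;
  return count;
}

// Hamming distance between two 64-bit fingerprints stored as 8 bytes each.
function hammingDistance(a: number[], b: number[]): number {
  if (a.length !== 8 || b.length !== 8) throw new Error("expected 8-byte fingerprints");
  let distance = 0;
  for (let i = 0; i < 8; i++) distance += popcount8(a[i] ^ b[i]);
  return distance;
}

// Assumption: adjacent frames count as a scene change at distance >= threshold.
function isSceneChange(a: number[], b: number[], threshold: number): boolean {
  return hammingDistance(a, b) >= threshold;
}

const frameA = [0xff, 0, 0, 0, 0, 0, 0, 0];
const frameB = [0x00, 0, 0, 0, 0, 0, 0, 0];
console.log(hammingDistance(frameA, frameB)); // 8 bits differ
```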

See docs/OFFICIAL_API.md for YouTube, Instagram, TikTok credential setup and the opt-in live integration harness. Normal npm test runs do not call live official APIs.

See docs/OCR.md for optional tesseract.js setup and the opt-in OCR integration harness. Normal npm test does not require the OCR engine.

See docs/DOCUMENTATION_MAP.md for the complete development documentation map, including the recommended Codex/Claude reading order, QA/QC process, release notes, and the current verification caveat. See docs/CLAUDE_HANDOFF.md for a copyable handoff prompt for Claude.

See docs/SOURCE_STRATEGY.md for the generic plan for Naver Map, Naver Blog, Google Search/Maps, Agoda, Trip.com, Booking.com, Expedia, and similar sources. See docs/INFORMATION_SOURCE_TAXONOMY.md for the category/locale coverage registry that prioritizes top platform slots across search, social, community, content, news, review, map/local, commerce, knowledge, private network, recommendation, and AI-agent sources.

serve-http starts a local JSON queue. Use --concurrency <n> to run multiple evidence jobs and --max-terminal-jobs <n> to bound retained completed, failed, and canceled jobs. Endpoints:

GET /health
POST /evidence-run
GET /jobs
GET /jobs?status=queued|running|completed|failed|canceled
GET /jobs/:id
POST /jobs/:id/cancel
DELETE /jobs/:id
POST /jobs/prune

Queued jobs are canceled immediately. Running jobs receive an abort signal and unwind at workflow and BrowserPool abort checkpoints; owned browser pools are released/shut down during cleanup. This server is intended for local orchestration, not a production shared service. Job responses include lifecycle diagnostics such as startedAt, finishedAt, queueDurationMs, runDurationMs, totalDurationMs, and abortLatencyMs when applicable.
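The two cancellation paths can be sketched with a standard `AbortController`. The `Job` shape, `cancelJob`, and `throwIfAborted` are illustrative assumptions; only the status names come from the endpoint list above.

```typescript
type JobStatus = "queued" | "running" | "completed" | "failed" | "canceled";

interface Job {
  id: string;
  status: JobStatus;
  controller: AbortController;
}

function cancelJob(job: Job): JobStatus {
  if (job.status === "queued") {
    // Never started, so there is nothing to unwind.
    job.status = "canceled";
  } else if (job.status === "running") {
    // The worker notices the signal at its next checkpoint and cleans up itself.
    job.controller.abort();
  }
  return job.status;
}

// Checkpoint a worker could call between workflow stages.
function throwIfAborted(signal: AbortSignal): void {
  if (signal.aborted) throw new Error("job aborted");
}
```

In this sketch a running job stays `running` until its worker reaches a checkpoint, which is one way to account for a measurable abortLatencyMs.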

auth-login opens a visible browser and saves storage state under ~/.gstack/browser-profiles/<profile>/storage-state.json. Use it for normal service login flows: the site opens its login/consent popup, the user finishes login manually, then the saved profile can be reused by farm leases. Add --persistent-profile when the site needs a full Chromium user data directory instead of storage-state only.

The saved profile directory is created owner-only (POSIX chmod 0700; Windows icacls). On Windows you can additionally encrypt storage-state.json at rest with DPAPI by setting FARM_ENCRYPT_STORAGE_STATE=1: the file is stored as an opaque DPAPI wrapper and transparently decrypted in memory when a lease uses it (no plaintext temp is ever written). Encryption is opt-in and best-effort: off Windows, or if it fails, the file stays plaintext under the owner-only directory. Decryption of an existing wrapper is always attempted, even with the flag unset. DPAPI CurrentUser protects an at-rest/offline copy of the file, not against code already running as the logged-in user. The persistent-profile user-data directory is Chromium's own (already DPAPI-encrypted) store.

Add --chrome or --browser-channel chrome to auth-login to use the installed Chrome channel instead of the bundled Playwright Chromium for sites that reject automation-oriented browser builds.

auth-cdp-launch opens a user-controlled Chrome window with a local DevTools port and a farm profile user-data directory. auth-cdp-import then attaches to that Chrome session and saves cookies/storage state into the farm profile without reading passwords. Use this pair when a platform rejects direct Playwright login: launch Chrome, complete login in that Chrome window, then run auth-cdp-import --profile <name> --cdp-url http://127.0.0.1:9222. Add --save-now to skip the Enter prompt and --cookie-domains <a,b> to save only the target platform's cookies/origins from the attached Chrome session.

Only one active lease may use a given saved profile at a time. This prevents two browser workers from overwriting the same cookies, localStorage, or IndexedDB snapshot.
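The one-lease-per-profile rule can be sketched as an in-memory lock table. This is an assumption-laden illustration: `ProfileLocks` is a hypothetical class, and the real lock may well live on disk so that it also holds across processes.

```typescript
class ProfileLocks {
  // Maps profile name to the lease that currently owns it.
  private owners = new Map<string, string>();

  // Returns true when the lease now holds the profile, false when another lease does.
  acquire(profile: string, leaseId: string): boolean {
    const owner = this.owners.get(profile);
    if (owner !== undefined && owner !== leaseId) return false;
    this.owners.set(profile, leaseId);
    return true;
  }

  // Only the owning lease can release the profile.
  release(profile: string, leaseId: string): void {
    if (this.owners.get(profile) === leaseId) this.owners.delete(profile);
  }
}
```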

Payment pages remain blocked for write actions.

register-all installs the MCP server into the local Codex and Claude user configs and creates timestamped backups before editing config files.

GStack Upgrade Safety

This farm is an independent local package. It runs from the absolute path registered in Codex/Claude config. A normal gstack skill upgrade updates ~/.codex/skills/gstack*; it should not overwrite this local package or the MCP config marker block.

After any gstack or agent-host upgrade, run:

npm run verify
node .\dist\cli.js register-all
claude mcp get browser-agent-mcp-farm

If Codex does not expose mcp__browser-agent-mcp-farm__* tools after an upgrade, restart Codex once and run register-all again.

MCP Write Tools

Write tools require a lease with capability: "read-write":

farm_click
farm_fill
farm_press
farm_select_option

Read/navigation helpers are available for slower dynamic pages and long-scroll research pages:

farm_evidence_run
farm_wait
farm_wait_for_selector
farm_scroll
farm_capture_after_idle
farm_sample_frames

farm_sample_frames seeks a browser-visible media element to timestamped positions and writes one screenshot bundle per frame. It does not download raw video bytes. Each frame metadata includes timestamp, seek result, active caption cues when the page exposes them, and a small browser-visible visual fingerprint when the page allows canvas reads. It also records available <track> elements and text-track metadata in the summary artifact.

farm_evidence_run exposes the same evidence workflow through MCP. It uses the server BrowserPool lifecycle, so visible headed debugging remains a CLI-only option.

Trust, Verification & Structured Tools

Read-only tools to inspect and re-verify evidence (no browser):

farm_list_runs — discover prior run directories (artifact/claim counts).
farm_read_report — read a run's Markdown report by reportPath.
farm_list_artifacts — list a run's artifact ledger (optional evidenceKind).
farm_read_artifact — read one artifact's bytes (text/base64), re-hashing on read to flag tampering.
farm_run_claim_gate — re-validate a run's claims against its artifacts.
farm_capabilities — server identity, evidence kinds, non-goals, optional deps.
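Re-hashing on read, as farm_read_artifact is described as doing, amounts to comparing a fresh digest with the one recorded at registration. The sketch assumes SHA-256 and uses hypothetical names (`readWithRehash`, `ReadResult`); the README does not state which hash function the ledger uses.

```typescript
import { createHash } from "node:crypto";

function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

interface ReadResult {
  bytes: Uint8Array;
  tampered: boolean;
}

// Returns the bytes with a flag instead of throwing, so the caller decides how to react.
function readWithRehash(bytes: Uint8Array, registeredSha256: string): ReadResult {
  return { bytes, tampered: sha256Hex(bytes) !== registeredSha256 };
}

const original = Buffer.from("captured page bytes");
const registeredHash = sha256Hex(original);
console.log(readWithRehash(original, registeredHash).tampered);
```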

Cite-or-fail authoring (so the gate covers your own answer, not just runner output):

farm_register_evidence — register the bytes you saw as a hash-verified artifact.
farm_add_claim — author a claim citing an artifact with an anchor; the gate rejects a claim whose quoted text is not present in the cited bytes.
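The core of cite-or-fail is that a quote must literally occur in the bytes it cites. Below is a minimal sketch assuming UTF-8 artifacts and exact substring matching; the real gate's anchor handling and any normalization are not described here, and `quoteIsGrounded` is a hypothetical name.

```typescript
interface Artifact {
  id: string;
  bytes: Uint8Array;
}

interface QuotedClaim {
  artifactId: string;
  quote: string;
}

function quoteIsGrounded(claim: QuotedClaim, artifacts: Artifact[]): boolean {
  const artifact = artifacts.find((a) => a.id === claim.artifactId);
  if (!artifact) return false; // citing an unknown artifact fails outright
  const text = new TextDecoder("utf-8").decode(artifact.bytes);
  // An empty quote would match anything, so it is rejected too.
  return claim.quote.length > 0 && text.includes(claim.quote);
}

const page: Artifact = {
  id: "page-1",
  bytes: new TextEncoder().encode("Plan price: $10 per month"),
};
console.log(quoteIsGrounded({ artifactId: "page-1", quote: "$10 per month" }, [page]));
```

A near-miss such as "$12 per month" fails the same check, which is the behaviour the fuzz corpus mentioned later in this README is meant to exercise.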

Portable attestation and structured derivatives:

farm_export_bundle / farm_verify_bundle — a Merkle-rooted (optionally Ed25519-signed) manifest a second agent can re-verify offline; also available as the export-bundle / verify-bundle CLI commands. Add --archive-file <bundle.evb> to produce/verify a self-contained .evb that embeds the artifact bytes and verifies with no access to the original run directory.
farm_extract_structured — deterministic JSON-LD / Open Graph / typed price+rating extraction from captured HTML (a site claim — cross-check it).

Worked agent-to-agent verifiable exchange

The .evb archive lets one agent trust another's evidence by trusting hashes, not the producer:

# Agent A (captured the evidence, holds a private signing key)
node dist/cli.js export-bundle --run-dir <A-run> --archive-file bundle.evb \
  --private-key-env A_SIGNING_KEY

# Agent B receives only bundle.evb + A's PUBLIC key — no run dir, no browser
node dist/cli.js verify-bundle --archive-file bundle.evb --public-key-env A_PUBLIC_KEY
# -> { ok: true, complete: true, merkleMatches: true, signatureValid: true }

B re-hashes the embedded bytes, recomputes the Merkle root, and checks A's signature fully offline. It rejects a bundle whose bytes were altered in transit (tamperedArtifacts), and a bundle signed by an impostor key (signatureValid: false). The success criterion is "one cooperating second agent can verify", not "the world converges" — see tests/evidence-exchange.test.ts for the worked A→B example.
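The offline check B performs can be approximated in a few lines with Node's built-in crypto. The tree construction (pairwise hashing of hex digests, unpaired node carried up) and `verifyBundle` are assumptions for illustration; the actual .evb manifest layout is defined by the farm, not by this sketch.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

function sha256Hex(data: string | Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Assumed tree shape: hash each artifact, then hash adjacent pairs upward.
function merkleRoot(artifacts: Uint8Array[]): string {
  let level = artifacts.map((bytes) => sha256Hex(bytes));
  if (level.length === 0) return sha256Hex("");
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      next.push(i + 1 < level.length ? sha256Hex(level[i] + level[i + 1]) : level[i]);
    }
    level = next;
  }
  return level[0];
}

// Agent B: recompute the root from received bytes, then check A's signature over it.
function verifyBundle(received: Uint8Array[], claimedRoot: string, sig: Uint8Array, publicKey: KeyObject) {
  return {
    merkleMatches: merkleRoot(received) === claimedRoot,
    signatureValid: verify(null, Buffer.from(claimedRoot, "hex"), publicKey, sig),
  };
}

// Agent A: compute and sign the root (Ed25519 takes a null digest algorithm).
const artifacts = [Buffer.from("captured html"), Buffer.from("screenshot bytes")];
const agentA = generateKeyPairSync("ed25519");
const root = merkleRoot(artifacts);
const signature = sign(null, Buffer.from(root, "hex"), agentA.privateKey);

console.log(verifyBundle(artifacts, root, signature, agentA.publicKey));
```

Altering any artifact changes the recomputed root, and a signature made with a different key fails verification against A's public key, which mirrors the two rejection cases above.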

See docs/THREAT_MODEL.md for exactly what the claim gate and the bundle prove and do not prove.

The cite-or-fail boundary is regression-tested by a versioned, seed-deterministic property-based fuzz corpus (scripts/fuzz-corpus.json, run by npm run qa:fuzz and the qa CI workflow): it currently holds 0 hallucination leaks across 1,200 fabrication / near-miss / recombination trials over 8 corpus seeds. A new hard case is added by appending a seed, never removing one.

The payment guard blocks write actions on URLs, selectors, and target element text/attributes containing payment-like terms such as checkout, payment, billing, credit-card, card number, cvv, pay now, or 결제.
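A term-based guard of this kind can be sketched as a case-insensitive substring check over every field a write action touches. The term list is the one quoted above (the README says "such as", so the real list may be longer), and `looksPaymentLike` is a hypothetical name.

```typescript
const PAYMENT_TERMS = [
  "checkout",
  "payment",
  "billing",
  "credit-card",
  "card number",
  "cvv",
  "pay now",
  "결제",
];

// fields: URL, selector, and target element text/attributes of the pending action.
function looksPaymentLike(...fields: string[]): boolean {
  return fields.some((field) => {
    const haystack = field.toLowerCase();
    return PAYMENT_TERMS.some((term) => haystack.includes(term));
  });
}

console.log(looksPaymentLike("https://shop.example/cart", "#submit", "Pay now"));
```

Substring matching errs on the side of blocking: a harmless page that merely mentions "billing" would also be refused, which seems consistent with payment actions being out of scope.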
