@mercuryo-ai/agentbrowse

v0.2.63

Published

20 days ago

Browser automation primitives library for AI agents

Vulnerabilities: 0 High, 0 Medium, 0 Low

Maintainer: xor777

Keywords: agentbrowse browser automation ai-agent browser-sdk playwright cdp form-filling

@mercuryo-ai/agentbrowse

Give your AI agent a real browser — and a deterministic way to fill fields on it.

AgentBrowse is the browser runtime for agent systems that need to work with real web pages, plus a small set of data-plane primitives for deciding which value belongs in which observed field without routing the values through the LLM prompt. Your app keeps full control of orchestration and business logic; AgentBrowse handles the page and the deterministic apply step.

The package has two main parts:

Browser runtime — launch or attach a browser, observe(...) the page, act(...) on targets, extract(...) structured data, close(...) when done.
Data-plane primitives — match(subject, { from }) picks a candidate for an observed target or form, resolve(plan, { with }) turns a plan into a ready value or artifact through a caller-supplied adapter, and fill(session, subject, plan, { resolver? }) applies the result to the browser deterministically. Default match results never expose raw values to serializable output.

Typical workflow:

open a browser (or attach to an existing one) and get a session;
ask AgentBrowse what's on the page with observe(...);
either act(...) directly, or match(...) → fill(...) when you want a deterministic field-value decision;
use extract(...) when you need structured data instead of an action;
close the session when you are done.

That shape fits naturally into a worker, backend service, CLI, or agent runtime you already have.

Key Terms

Three terms come up repeatedly in the API:

session — the handle AgentBrowse returns from launch(...) or attach(...). You pass it into every later call. The session carries browser identity, runtime state, and sticky-owner metadata so healthy commands reuse one browser owner instead of opening a fresh root CDP attach for every call. If you persist a session across process runs, the next command may repair that owner while the underlying browser is still alive; otherwise the session fails closed and you should launch(...) or attach(...) again.
ref (also targetRef, scopeRef, fillRef) — a stable reference returned by observe(...). You act on references, never on raw CSS selectors. Refs are valid for the page state that produced them, not forever. Navigation, route changes, or a major DOM re-render invalidate them — call observe(...) again to get fresh refs.
CDP — the Chrome DevTools Protocol. Chrome, Chromium, and Playwright all speak it. If a browser exposes a CDP WebSocket URL, AgentBrowse can attach to it.
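
Because refs expire with the page state that produced them, callers often wrap the observe-then-act pair so that a failed action triggers one fresh observation before giving up. The sketch below shows that pattern with stand-in types. `ObservedTarget`, its `label` field, `actByLabel`, and the injected `observe` and `act` functions are illustrative assumptions, not AgentBrowse's real types or signatures.

```typescript
// Stand-in shapes for illustration only; the real AgentBrowse types are richer.
type ObservedTarget = { ref: string; label: string };
type ObserveFn = () => Promise<ObservedTarget[]>;
type ActFn = (ref: string) => Promise<{ success: boolean }>;

// Look up a target by label and act on it. If the action fails (for example
// because the ref went stale after a re-render), observe once more and retry
// with the fresh ref.
async function actByLabel(
  label: string,
  observe: ObserveFn,
  act: ActFn,
): Promise<boolean> {
  for (let attempt = 0; attempt < 2; attempt++) {
    const targets = await observe(); // fresh refs for the current page state
    const target = targets.find((t) => t.label === label);
    if (!target) return false; // nothing on the page matches the label
    const result = await act(target.ref);
    if (result.success) return true;
  }
  return false;
}
```

In a real integration the two injected functions would be thin closures over observe(session) and act(session, ref, 'click'), so the helper never holds a ref across a page change.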

Optionally, AgentBrowse can call an LLM to understand pages at a higher level. That layer is called the assistive runtime and is only required for extract(...) and goal-driven observe(session, goal).

Install

npm i @mercuryo-ai/agentbrowse

If you want the operator-facing CLI command, install the separate global CLI package:

npm i -g @mercuryo-ai/agentbrowse-cli@latest

@mercuryo-ai/agentbrowse is the library package for imports. It does not install the agentbrowse shell command.

Quick Start

This is the normal managed-browser flow. It does not require LLM setup.

import {
  act,
  close,
  launch,
  navigate,
  observe,
  screenshot,
  status,
} from '@mercuryo-ai/agentbrowse';

const launchResult = await launch('https://example.com');
if (!launchResult.success) {
  throw new Error(launchResult.reason ?? launchResult.message);
}

const { session } = launchResult;

try {
  const observeResult = await observe(session);
  if (!observeResult.success) {
    throw new Error(observeResult.reason ?? observeResult.message);
  }

  const firstActionableTarget = observeResult.targets.find((target) => typeof target.ref === 'string');

  if (firstActionableTarget?.ref) {
    const actResult = await act(session, firstActionableTarget.ref, 'click');
    if (!actResult.success) {
      throw new Error(actResult.reason ?? actResult.message);
    }
  }

  const navigateResult = await navigate(session, 'https://example.com/checkout');
  if (!navigateResult.success) {
    throw new Error(navigateResult.reason ?? navigateResult.message);
  }

  const screenshotResult = await screenshot(session, '/tmp/checkout.png');
  if (!screenshotResult.success) {
    throw new Error(screenshotResult.reason ?? screenshotResult.message);
  }

  const statusResult = await status(session);
  if (!statusResult.alive) {
    throw new Error('Browser is no longer reachable.');
  }
} finally {
  await close(session);
}

Runnable examples live in examples/. When executing them from this repo, run npm run build first.

npx tsx examples/basic.ts
npx tsx examples/attach.ts
npx tsx examples/extract.ts
npx tsx examples/match-resolve-fill.ts

The library entrypoint does not load .env files. Environment loading only happens in the CLI entrypoint.

Both launch(...) and attach(...) bootstrap the same sticky-owner lifecycle. That owner may live in-process or in an internal detached host, but it is not a user-managed daemon contract.

Match, Resolve, Fill

Once observe(...) has returned a target or a fillable form, three primitives cover the end-to-end decision of putting a value into a field:

match decides which candidate value fits the observed target or fillable form. Pure and local.
resolve turns a needs_resolution plan into a ready value by calling a caller-supplied adapter. This is the only step that may reach a network, a vault, or an approval UI.
fill applies the matched value to the browser through the same deterministic path as act(...).

Simplest case — the value is already in memory:

import { match, resolve, fill } from '@mercuryo-ai/agentbrowse';

const matched = await match(emailTarget, {
  from: { email: 'user@example.com' }, // placeholder address
});
// matched.kind === 'ready'
await fill(session, emailTarget, matched);

When the value lives behind a lookup, the candidate carries a resolve plan and your adapter handles the fetch:

const pending = await match(passwordTarget, {
  from: [
    {
      fieldKey: 'password',
      type: 'secret',
      resolve: { kind: 'vault_lookup', key: 'login.password' },
    },
  ],
});
// pending.kind === 'needs_resolution'

const ready = await resolve(pending, { with: vaultResolver });
await fill(session, passwordTarget, ready);

Core rules:

No raw values in public results. match returns plans and refs, not secrets; dereferencing happens inside fill(...) through a non-enumerable accessor. Serialising a ready result across a process boundary drops the value — ship the needs_resolution plan instead and let the downstream side run resolve → fill.
One resolver slot, two accepted shapes. The main AgentbrowseMatchResolver has a required resolve plus optional resolveBatch and optional fill; the narrow AgentbrowseGroupFillHandler has just fill. Both are accepted in the same { resolver } slot on fill(...); resolve(plan, { with }) only accepts the main interface. Use the narrow handler when the caller already has the grouped artifact in hand and only needs to apply it.
Determinism first. Resolved value/artifact refs are stable (value:${candidateRef}:resolved), so the same plan produces the same refs across calls — safe to use in snapshot tests and trace correlation.
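
The "no raw values" rule rests on standard JavaScript behaviour: non-enumerable properties are skipped by JSON.stringify and Object.keys. The sketch below demonstrates that general mechanism on a made-up object. `makeReadyResult`, `readValue`, the `valueRef` field, and the hidden property name are hypothetical and do not describe AgentBrowse's internal layout; only the ref format is borrowed from the determinism rule above.

```typescript
// Hypothetical result shape: public fields are enumerable, the raw value is not.
type ReadyResult = { kind: 'ready'; valueRef: string };

const HIDDEN = '__value'; // illustrative property name, not the library's

function makeReadyResult(candidateRef: string, rawValue: string): ReadyResult {
  const result: ReadyResult = {
    kind: 'ready',
    valueRef: `value:${candidateRef}:resolved`, // same input, same ref
  };
  // Non-enumerable: invisible to JSON.stringify, Object.keys, and object spread.
  Object.defineProperty(result, HIDDEN, { value: rawValue, enumerable: false });
  return result;
}

// In-process code can still dereference the value explicitly.
function readValue(result: ReadyResult): string | undefined {
  return (result as unknown as Record<string, string | undefined>)[HIDDEN];
}

const ready = makeReadyResult('email-1', 'secret@example.com');
const wire = JSON.stringify(ready); // payload carries kind and valueRef only
const revived = JSON.parse(wire) as ReadyResult; // raw value is gone after the round trip
```

This is why a serialised ready result cannot be filled on the other side of a process boundary, while a needs_resolution plan can: the plan carries the lookup instructions rather than the value.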

See:

Match / Resolve / Fill Guide for the full mental model, walk-throughs, and design rules.
Testing Guide for fixture builders (createFixtureMatchStore, createFixtureResolver, …) used in unit tests for wrappers around these primitives.

Attach To An Existing Browser

If you already have a browser that exposes a CDP WebSocket URL, use attach(...) instead of launch(...).

Common sources of a CDP URL:

a local Chrome or Chromium started with the --remote-debugging-port flag;
a managed cloud browser (Browserbase, Browserless, and similar) that hands you a WebSocket URL;
any other browser runtime Playwright can reach through CDP.

import { attach, observe } from '@mercuryo-ai/agentbrowse';

const attached = await attach('ws://127.0.0.1:9222/devtools/browser/browser-id');
if (!attached.success) {
  throw new Error(attached.reason ?? attached.message);
}

const observeResult = await observe(attached.session);
if (!observeResult.success) {
  throw new Error(observeResult.reason ?? observeResult.message);
}
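
For the local Chrome or Chromium case, the browser-level WebSocket URL can be read from the DevTools HTTP endpoint /json/version, which returns a JSON document containing a webSocketDebuggerUrl field. A minimal sketch, assuming the default host 127.0.0.1, port 9222, and a caller-supplied fetch function; `discoverCdpUrl` and `FetchLike` are hypothetical helper names, not part of AgentBrowse.

```typescript
// Minimal structural type so the helper accepts the global fetch (Node 18+)
// or a test double.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Ask a browser started with --remote-debugging-port for its browser-level
// CDP WebSocket URL via the DevTools HTTP endpoint.
async function discoverCdpUrl(
  fetchImpl: FetchLike,
  host = '127.0.0.1',
  port = 9222,
): Promise<string> {
  const response = await fetchImpl(`http://${host}:${port}/json/version`);
  if (!response.ok) {
    throw new Error(`DevTools endpoint answered with HTTP ${response.status}`);
  }
  const body = (await response.json()) as { webSocketDebuggerUrl?: unknown };
  if (typeof body.webSocketDebuggerUrl !== 'string') {
    throw new Error('Response did not contain webSocketDebuggerUrl');
  }
  return body.webSocketDebuggerUrl;
}
```

The returned string can be passed straight to attach(...), for example attach(await discoverCdpUrl(fetch)).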

If your provider gives you a labeled remote session, you can carry that label in the session handle:

const attached = await attach(remoteCdpUrl, {
  provider: 'browserbase',
});

attach(...) is not a separate reconnect mode. It is the second bootstrap path into the same sticky-owner execution model as launch(...). After attach succeeds, later browser commands reuse or repair that owner instead of performing a fresh provider-level root attach on every healthy command.

What Each Main API Does

| API | Use it when | Typical result |
| --- | --- | --- |
| launch(url?, options?) | You need a new browser session | session, current url, current title |
| attach(cdpUrl, options?) | You already have a running browser that exposes CDP | session, current url, current title |
| observe(session, goal?) | You want to understand the page | targets, scopes, signals, fillable forms |
| act(session, targetRef, action, value?) | You want to click, type, select, fill, or press | action result and target metadata |
| match(subject, { from }) | You want to pick a deterministic candidate for an observed target or fillable form | match result (plan or ready ref; no raw value) |
| resolve(plan, { with }) | You want to turn one or many needs_resolution plans into ready refs through a caller-supplied adapter | match result(s) marked as ready |
| fill(session, subject, plan, { resolver? }) | You want to apply a match result to the browser through the deterministic path | browser action result or structured failure |
| navigate(session, url) | You want to move to another page | page metadata after navigation |
| extract(session, schema, scopeRef?) | You want structured JSON from the page | data that matches your schema |
| screenshot(session, path?) | You want a screenshot artifact | saved path and page metadata |
| status(session) | You want to know whether the session is still healthy | liveness, page info, runtime summary |
| close(session) | You are done with the browser | close result |

observe(...) can be called in two ways:

observe(session) gives you a general inventory of the page.
observe(session, goal) narrows that inventory to a single control you want to find. Goals shaped as "find <target> in <surface>" ("find the email field", "find May 5 in the open calendar") give the strongest match. For multi-step work, run one goal per observe call.
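
The "find <target> in <surface>" shape is easy to generate rather than hand-write. The helper below is a small sketch of that idea; `buildGoal` is a hypothetical name, and the only thing taken from the text above is the goal wording itself.

```typescript
// Build a goal string in the "find <target> in <surface>" shape.
// The surface part is optional, matching the "find the email field" example.
function buildGoal(target: string, surface?: string): string {
  return surface ? `find ${target} in ${surface}` : `find ${target}`;
}

// Multi-step work becomes a list of single-control goals, one per observe call.
const goals: string[] = [
  buildGoal('the email field'),
  buildGoal('May 5', 'the open calendar'),
];
```

Each entry in goals would then be passed to its own observe(session, goal) call, in order, rather than being merged into one compound goal.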

All main APIs return the same broad result shape:

success path: { success: true, ... }
failure path: { success: false, error, message, reason, ... }
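
Since every API shares this shape, the repeated "check success, else throw" blocks from the Quick Start can be folded into one helper. The sketch below models the shape as a discriminated union. The type names, the `unwrap` helper, and the assumption that error, message, and reason are optional strings are illustrative, not the library's exported types.

```typescript
// Result shape as described above; the payload fields vary per API.
type Success<T> = { success: true } & T;
type Failure = { success: false; error?: string; message?: string; reason?: string };
type Result<T> = Success<T> | Failure;

// Return the success payload, or throw with the most specific text available.
function unwrap<T>(result: Result<T>, label: string): Success<T> {
  if (result.success) return result;
  const detail = result.reason ?? result.message ?? result.error ?? 'unknown error';
  throw new Error(`${label} failed: ${detail}`);
}
```

With such a helper, a call site shrinks to something like const { session } = unwrap(await launch(url), 'launch'), assuming the real result types are structurally compatible.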

When You Need An Assistive Runtime

You only need an assistive runtime when AgentBrowse should call an LLM.

In practice, that mainly means:

extract(...)
higher-quality goal-based observe(session, goal)

The runtime interface is intentionally small: you provide an object that can create an OpenAI-compatible chat-completions client.

// Pseudocode shape only. For a runnable fetch-based adapter, see
// `examples/extract.ts` and `docs/assistive-runtime.md`.
import { createAgentbrowseClient } from '@mercuryo-ai/agentbrowse';

const client = createAgentbrowseClient({
  assistiveRuntime: createMyFetchBackedRuntime(),
});

The same pattern works for OpenRouter and other OpenAI-compatible backends.

See:

Assistive Runtime Guide

Session Persistence, Proxy, And Diagnostics

Normal usage is explicit-session based:

call launch(...) or attach(...)
keep the returned session
pass that session into later calls

If you want to restore a session across process runs, use the optional store helpers:

import {
  createBrowserSessionStore,
  loadBrowserSession,
  saveBrowserSession,
} from '@mercuryo-ai/agentbrowse';

saveBrowserSession(session);
const restored = loadBrowserSession();

const store = createBrowserSessionStore({
  rootDir: '/tmp/my-app/browser-state',
});

store.save(session);
const restoredFromCustomRoot = store.load();

Persisted session files contain versioned sticky-owner metadata, not a live Playwright connection. loadBrowserSession() and custom stores intentionally return null for incompatible reconnect-era records or incomplete owner metadata instead of auto-migrating them. After loading a session, call status(restored) or let the next browser command verify or repair ownership. If the underlying browser is gone, the session fails closed and you start fresh with launch(...) or attach(...).
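
The restore rules above amount to a short decision: load, verify, and fall back to a fresh browser. The sketch below expresses that flow with injected functions so it stays independent of the real types. `Session`, `Deps`, and `restoreOrLaunch` are hypothetical names, and the sketch assumes the status check reports a dead browser as alive: false rather than throwing.

```typescript
// Stand-in types; the real session and result objects carry more fields.
type Session = { id: string };
type Deps = {
  load: () => Session | null; // e.g. loadBrowserSession
  status: (session: Session) => Promise<{ alive: boolean }>; // e.g. status
  launch: () => Promise<Session>; // e.g. a wrapper around launch(...)
};

// Reuse a persisted session when it is still healthy; otherwise start fresh.
async function restoreOrLaunch(
  deps: Deps,
): Promise<{ session: Session; restored: boolean }> {
  const saved = deps.load(); // null for missing or incompatible records
  if (saved) {
    const health = await deps.status(saved);
    if (health.alive) return { session: saved, restored: true };
  }
  // No usable record, or the browser is gone: fail closed and relaunch.
  return { session: await deps.launch(), restored: false };
}
```

Keeping the fallback explicit makes the fail-closed behaviour visible at the call site instead of hiding it behind automatic retries.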

There is no separate daemon API to supervise. close(session) is the public lifecycle boundary for shutting down the internal owner host and, when applicable, the managed browser itself.

If you want to use a proxy, pass it directly to launch(...):

const launchResult = await launch('https://example.com', {
  useProxy: true,
  proxy: 'http://user:password@proxy.example.com:8080', // placeholder credentials and host
});

Diagnostics are optional. If you need tracing or custom logging, use a client:

import { createAgentbrowseClient } from '@mercuryo-ai/agentbrowse';

const client = createAgentbrowseClient({
  diagnostics: {
    startStep() {
      return {
        finish() {},
      };
    },
  },
});
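
A diagnostics object with the startStep()/finish() shape shown above can do more than no-ops, for example record how long each step took. The sketch below is one possible implementation. The arguments AgentBrowse passes to startStep and finish are not shown in this README, so the sketch ignores them; `createTimingDiagnostics` and `StepRecord` are hypothetical names.

```typescript
type StepRecord = { index: number; durationMs: number };

// Build a diagnostics object that records the duration of every step.
// The clock is injectable so the behaviour can be tested deterministically.
function createTimingDiagnostics(now: () => number = Date.now) {
  const steps: StepRecord[] = [];
  return {
    steps,
    diagnostics: {
      startStep() {
        const startedAt = now();
        return {
          finish() {
            // Steps are numbered in the order they finish.
            steps.push({ index: steps.length, durationMs: now() - startedAt });
          },
        };
      },
    },
  };
}
```

The diagnostics member would be passed as the diagnostics option of createAgentbrowseClient(...), and steps can be inspected or logged after a run.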

See:

Configuration Guide

Testing Wrappers Around AgentBrowse

If your package wraps AgentBrowse and you want a stable test helper for the assistive runtime, use the dedicated testing subpath:

import {
  installFetchBackedTestAssistiveRuntime,
  uninstallTestAssistiveRuntime,
} from '@mercuryo-ai/agentbrowse/testing';

See:

Testing Guide

Protected Fill

Protected fill is for cases where your application already has sensitive values and wants AgentBrowse to apply them to a previously observed form through a guarded browser execution path.

Import it separately:

import { fillProtectedForm } from '@mercuryo-ai/agentbrowse/protected-fill';

See:

Protected Fill Guide
