@nuanu-ai/agentbrowse

v0.2.47

Published

17 hours ago

Browser automation CLI for AI agents: control a CDP browser, observe UI surfaces, act on refs, extract data, capture screenshots, complete protected fills, and solve captchas

xor777

agentbrowse browser automation ai-agent cdp

Building agent payments?
We’re building a system that helps AI agents complete payments safely.
This package is currently published mainly because we’re still testing it ourselves, so if you’re reading this, you’re probably earlier than we expected.
If you’re building a new agent system, or improving an existing one, we’d be glad to talk and can offer early access.
Telegram: @albertkai

Browser automation CLI for AI agents.

agentbrowse controls a CDP-reachable browser for external agents that need explicit browser primitives. It navigates pages, observes UI surfaces and target refs, performs actions, extracts structured data, captures screenshots, supports protected stored-secret fill flows, and can solve captchas when the active session supports them.

This package publishes the public agentbrowse CLI.

Install

Run without installing globally:

npx @nuanu-ai/agentbrowse launch https://example.com

Or install globally:

npm i -g @nuanu-ai/agentbrowse
agentbrowse launch https://example.com

Requirements

Node.js 18+
a Chrome-compatible browser runtime available on the machine
an AgentPay API key for CLI startup and browsing commands

Commands

Configure AgentPay gateway access:

agentbrowse init ap_...
agentbrowse init ap_... --api-url https://your-project.supabase.co/functions/v1/api

Launch browser and optionally navigate:

agentbrowse launch https://example.com
agentbrowse launch https://example.com --headless

By default, launch runs in headful mode. Use --headless only when you intentionally want a hidden browser:

agentbrowse launch https://example.com --headless

If you want to state the default mode explicitly in scripts, use --headful.

Navigate current session:

agentbrowse navigate https://example.com/checkout

Observe available actions/elements:

agentbrowse observe
agentbrowse observe "open checkout and find the pay button"

Act on a previously observed target:

agentbrowse act t12 click
agentbrowse act t15 fill "user@example.com"

Extract structured data:

agentbrowse extract '{"productName":"string","price":"number"}'
agentbrowse extract '{"shipping":"string"}' t21

observe is the tool for visible interactive controls and refs. extract is for bounded structured page data; do not use it to enumerate buttons, links, inputs, or refs when observe already provides that inventory.
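
The extract schemas shown above are small JSON objects, which are awkward to quote by hand in a shell. A minimal sketch of a helper that builds one from NAME:TYPE pairs follows. make_schema is a hypothetical name, not part of agentbrowse, and the sketch assumes field names are plain identifiers that need no JSON escaping and that the type names match the ones used in the examples above.

```shell
# make_schema NAME:TYPE [NAME:TYPE...]
# Hypothetical helper (not part of agentbrowse): prints a flat JSON object
# such as {"productName":"string","price":"number"}.
# Assumes names and types contain no quotes, backslashes, or colons.
make_schema() {
  schema='{'
  sep=''
  for pair in "$@"; do
    name=${pair%%:*}   # text before the first colon
    type=${pair#*:}    # text after the first colon
    schema="$schema$sep\"$name\":\"$type\""
    sep=','
  done
  printf '%s}\n' "$schema"
}
```

It could then be used as agentbrowse extract "$(make_schema productName:string price:number)".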

Capture a screenshot:

agentbrowse screenshot
agentbrowse screenshot --path ./checkout.png

Inspect or close the current session:

agentbrowse status
agentbrowse close

Refresh stored-secret metadata for the current page or a specific URL:

agentbrowse get-secrets-catalog
agentbrowse get-secrets-catalog https://example.com/checkout

Create and complete a protected stored-secret fill:

agentbrowse create-intent f12 ss_card_visa
agentbrowse poll-intent intent_123
agentbrowse fill-secret f12 intent_123

Solve captcha when the active session supports it:

agentbrowse solve-captcha --timeout 90

Configure

agentbrowse uses the AgentPay backend for browser reasoning and gateway-backed operations:

agentbrowse init ap_...

Use AGENTPAY_API_KEY and AGENTPAY_API_URL only as explicit runtime overrides.
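
One way to keep the environment variables as explicit, per-command overrides is to set them for a single invocation only, leaving the configuration persisted by agentbrowse init untouched. The sketch below uses the standard env utility; with_agentpay_override is a hypothetical wrapper name, and the key and URL passed to it are placeholders.

```shell
# with_agentpay_override API_KEY API_URL COMMAND [ARGS...]
# Hypothetical wrapper: runs COMMAND with AGENTPAY_API_KEY and
# AGENTPAY_API_URL set for that one process only. The calling shell's
# environment is left unchanged.
with_agentpay_override() {
  _ap_key=$1
  _ap_url=$2
  shift 2
  env AGENTPAY_API_KEY="$_ap_key" AGENTPAY_API_URL="$_ap_url" "$@"
}
```

For example, with_agentpay_override ap_... https://your-project.supabase.co/functions/v1/api agentbrowse status would apply the override to that one status call.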

Runtime model

agentbrowse persists the active browser session under ~/.agentpay
agentbrowse init persists AgentPay gateway configuration for future runs
mock stored secrets for live/manual runs come from one canonical file: ~/.agentpay/mock-stored-secrets.json
repo JSON fixtures under src/secrets/ are fallback seeds and test inputs, not an additional runtime config source
all commands require AgentPay gateway configuration; prefer agentbrowse init and use env vars only as runtime overrides
the external AI agent remains the orchestration owner
agentbrowse is a single-step browser toolset, not an internal reactive form loop
runtime may enrich observe output with semantic hints and validation evidence, but it should not silently auto-submit, auto-retry, or maintain hidden durable field-state machines on behalf of the agent
protected fills use an explicit intent flow:
- get-secrets-catalog(url?)
- create-intent(fillRef, storedSecretRef)
- poll-intent(intentId)
- fill-secret(fillRef, intentId)
main workflow is:
- observe(goal?)
- act(targetRef, action, value?)
- extract(schema, scopeRef?)
screenshots are explicit via screenshot [--path <file>]
solve-captcha requires both:
- a session with captcha-solving capability
- AgentPay gateway configuration
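
The protected-fill intent flow listed above maps onto four CLI commands in a fixed order. The sketch below only prints that sequence for review; it does not run anything. print_fill_plan is a hypothetical helper, and the intent ID it takes is a placeholder: in a real run the ID would come from the create-intent result, whose output format is not described here.

```shell
# print_fill_plan FILL_REF STORED_SECRET_REF INTENT_ID [URL]
# Hypothetical helper: prints (does not execute) the protected-fill commands
# in the documented order. INTENT_ID is a placeholder for the value that
# create-intent would return in a real run.
print_fill_plan() {
  if [ -n "${4:-}" ]; then
    echo "agentbrowse get-secrets-catalog $4"
  else
    echo "agentbrowse get-secrets-catalog"
  fi
  echo "agentbrowse create-intent $1 $2"
  echo "agentbrowse poll-intent $3"
  echo "agentbrowse fill-secret $1 $3"
}
```

Calling print_fill_plan f12 ss_card_visa intent_123 prints the same three intent commands shown in the Commands section, preceded by the catalog refresh.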

Human-readable progress is written to stderr. Command results are written to stdout.

When launch detects a newer npm release, it prints the reminder to stderr without changing the JSON result written to stdout.
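
Because results and progress go to different streams, a calling script can capture them separately with ordinary shell redirection. The following is a minimal sketch; run_step is a hypothetical helper name and the file names are up to the caller.

```shell
# run_step OUT_FILE LOG_FILE COMMAND [ARGS...]
# Hypothetical helper: writes the command's stdout (the result) to OUT_FILE
# and appends its stderr (human-readable progress, reminders) to LOG_FILE.
# Returns the exit status of COMMAND.
run_step() {
  _out_file=$1
  _log_file=$2
  shift 2
  "$@" >"$_out_file" 2>>"$_log_file"
}
```

For example, run_step launch.json progress.log agentbrowse launch https://example.com would leave only the command result in launch.json, with any update reminder ending up in progress.log.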

observe() and extract() responses expose where runtime work came from:

top-level source shows the winning execution source, for example dom or stagehand
resolvedBy shows the concrete path within that source
degraded: true, accompanied by degradationReason, means a bounded degraded path was used
each returned target includes source, for example dom or stagehand
each returned scope may include extractScopeLifetime:
- snapshot = usable only for the current observed page state
- durable = intended to survive later rebinding
reusing an old snapshot scope after a later step can fail with expired_extract_scope; the correct recovery is a fresh observe
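
A calling script may want to react to these fields. The sketch below uses plain text matching with grep, which is crude; a real JSON parser would be more robust. Both function names are hypothetical. The sketch assumes that the degraded flag is serialized as "degraded": true and that the expired_extract_scope code appears somewhere in the text passed in; where exactly the CLI reports that code is not specified here.

```shell
# is_degraded: reads a response on stdin and succeeds when it contains
# "degraded": true (any whitespace around the colon).
is_degraded() {
  grep -Eq '"degraded"[[:space:]]*:[[:space:]]*true'
}

# scope_expired: reads command output on stdin and succeeds when it
# mentions expired_extract_scope, the signal to run a fresh observe.
scope_expired() {
  grep -q 'expired_extract_scope'
}
```

If scope_expired succeeds for the output of an extract call that reused an older scope ref, the recovery described above is to run agentbrowse observe again and use the newly returned refs.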

Default verification for this package is no longer unit-only:

pnpm --filter @nuanu-ai/agentbrowse test includes local fixture browser integration coverage for runtime and extract flows
the default suite also covers one explicit Stagehand-assisted observe fallback plus launch/session smoke at the command boundary
the e2e portion binds local fixture servers on 127.0.0.1; sandboxed runners that forbid loopback listen() calls can fail early with listen EPERM, so rerun the same command outside the sandbox before treating it as a product regression
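
To tell a sandbox limitation apart from a genuine failure, a wrapper script could check the saved test output for the listen EPERM message before raising an alarm. This is a minimal sketch; hit_sandbox_listen_limit is a hypothetical name, and it assumes the test output was saved to a log file by the caller.

```shell
# hit_sandbox_listen_limit LOG_FILE
# Hypothetical helper: succeeds when the saved test log contains
# "listen EPERM", suggesting the runner forbids loopback listen() calls.
hit_sandbox_listen_limit() {
  grep -q 'listen EPERM' "$1"
}
```

If a run of pnpm --filter @nuanu-ai/agentbrowse test fails and this check succeeds on its log, rerunning the same command outside the sandbox is the suggested next step.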

See package docs:

Releases are published automatically from main after checks pass. @nuanu-ai/agentbrowse is the only CLI package published by the current npm release workflow.
