@nuanu-ai/agentbrowse
v0.2.47
Browser automation CLI for AI agents: control a CDP browser, observe UI surfaces, act on refs, extract data, capture screenshots, complete protected fills, and solve captchas
Building agent payments?
We’re building a system that helps AI agents complete payments safely.
This package is currently published mainly because we’re still testing it ourselves, so if you’re reading this, you’re probably earlier than we expected.
If you’re building a new agent system, or improving an existing one, we’d be glad to talk and can offer early access.
Telegram: @albertkai
Browser automation CLI for AI agents.
agentbrowse controls a CDP-reachable browser for external agents that need
explicit browser primitives. It navigates pages, observes UI surfaces and
target refs, performs actions, extracts structured data, captures screenshots,
supports protected stored-secret fill flows, and can solve captchas when the
active session supports them.
This package publishes the public agentbrowse CLI.
Install
Run without installing globally:
npx @nuanu-ai/agentbrowse launch https://example.com
Or install globally:
npm i -g @nuanu-ai/agentbrowse
agentbrowse launch https://example.com
Requirements
- Node.js 18+
- a Chrome-compatible browser runtime available on the machine
- an AgentPay API key for CLI startup and browsing commands
Commands
Configure AgentPay gateway access:
agentbrowse init ap_...
agentbrowse init ap_... --api-url https://your-project.supabase.co/functions/v1/api
Launch browser and optionally navigate:
agentbrowse launch https://example.com
agentbrowse launch https://example.com --headless
By default, launch runs in headful mode. Use --headless only when you
intentionally want a hidden browser:
agentbrowse launch https://example.com --headless
If you want to state the default mode explicitly in scripts, use
--headful.
Navigate current session:
agentbrowse navigate https://example.com/checkout
Observe available actions/elements:
agentbrowse observe
agentbrowse observe "open checkout and find the pay button"
Act on a previously observed target:
agentbrowse act t12 click
agentbrowse act t15 fill "[email protected]"
Extract structured data:
agentbrowse extract '{"productName":"string","price":"number"}'
agentbrowse extract '{"shipping":"string"}' t21
observe is the tool for visible interactive controls and refs.
extract is for bounded structured page data; do not use it to enumerate
buttons, links, inputs, or refs when observe already provides that inventory.
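The observe, act, and extract invocations above can be assembled programmatically. The following is a minimal sketch, assuming an agent that builds argument arrays before spawning the CLI; the helper names (observeArgs, actArgs, extractArgs) are hypothetical and not part of the package. Only the subcommands and argument order are taken from the examples above.

```typescript
// Hypothetical helpers: they only build argv arrays and never call the CLI.
// The schema shape mirrors the README examples ({"field": "string" | "number"});
// other value types are not documented here, so plain strings are accepted.
type ExtractSchema = Record<string, string>;

// observe takes an optional natural-language goal.
function observeArgs(goal?: string): string[] {
  return goal === undefined ? ["observe"] : ["observe", goal];
}

// act takes a previously observed target ref, an action, and an optional value.
function actArgs(targetRef: string, action: string, value?: string): string[] {
  const args = ["act", targetRef, action];
  if (value !== undefined) args.push(value);
  return args;
}

// extract takes a JSON schema string and an optional scope ref.
// JSON.stringify avoids hand-written shell quoting of the schema.
function extractArgs(schema: ExtractSchema, scopeRef?: string): string[] {
  const args = ["extract", JSON.stringify(schema)];
  if (scopeRef !== undefined) args.push(scopeRef);
  return args;
}
```

Passing the resulting array to a process-spawning API (rather than concatenating a shell string) keeps the JSON schema intact as a single argument.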
Capture a screenshot:
agentbrowse screenshot
agentbrowse screenshot --path ./checkout.png
Inspect or close the current session:
agentbrowse status
agentbrowse close
Refresh stored-secret metadata for the current page or a specific URL:
agentbrowse get-secrets-catalog
agentbrowse get-secrets-catalog https://example.com/checkout
Create and complete a protected stored-secret fill:
agentbrowse create-intent f12 ss_card_visa
agentbrowse poll-intent intent_123
agentbrowse fill-secret f12 intent_123
Solve captcha when the active session supports it:
agentbrowse solve-captcha --timeout 90
Configure
agentbrowse uses the AgentPay backend for browser reasoning and gateway-backed
operations:
agentbrowse init ap_...
Use AGENTPAY_API_KEY and AGENTPAY_API_URL only as explicit runtime
overrides.
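One plausible reading of "runtime overrides" is that the environment variables take precedence over whatever agentbrowse init persisted. The sketch below illustrates that precedence only; the GatewayConfig shape and the resolveGatewayConfig function are hypothetical, and the CLI's actual resolution logic is not shown in this README.

```typescript
// Hypothetical config shape; the real persisted format is not documented here.
interface GatewayConfig {
  apiKey: string;
  apiUrl?: string;
}

// Assumed precedence: a non-empty env var wins, otherwise the persisted value is used.
function resolveGatewayConfig(
  persisted: GatewayConfig | undefined,
  env: Record<string, string | undefined>,
): GatewayConfig {
  const apiKey = env["AGENTPAY_API_KEY"] || persisted?.apiKey;
  const apiUrl = env["AGENTPAY_API_URL"] || persisted?.apiUrl;
  if (!apiKey) {
    throw new Error("No AgentPay API key found; run `agentbrowse init ap_...` first");
  }
  return apiUrl === undefined ? { apiKey } : { apiKey, apiUrl };
}
```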
Runtime model
- agentbrowse persists the active browser session under ~/.agentpay
- agentbrowse init persists AgentPay gateway configuration for future runs
- mock stored secrets for live/manual runs come from one canonical file: ~/.agentpay/mock-stored-secrets.json
- repo JSON fixtures under src/secrets/ are fallback seeds and test inputs, not an additional runtime config source
- all commands require AgentPay gateway configuration; prefer agentbrowse init and use env vars only as runtime overrides
- the external AI agent remains the orchestration owner
- agentbrowse is a single-step browser toolset, not an internal reactive form loop
- runtime may enrich observe output with semantic hints and validation evidence, but it should not silently auto-submit, auto-retry, or maintain hidden durable field-state machines on behalf of the agent
- protected fills use an explicit intent flow:
  - get-secrets-catalog(url?)
  - create-intent(fillRef, storedSecretRef)
  - poll-intent(intentId)
  - fill-secret(fillRef, intentId)
- main workflow is:
  - observe(goal?)
  - act(targetRef, action, value?)
  - extract(schema, scopeRef?)
- screenshots are explicit via screenshot [--path <file>]
- solve-captcha requires both:
  - a session with captcha-solving capability
  - AgentPay gateway configuration
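Because the agent owns orchestration, the protected-fill intent flow listed above can be tracked as a small agent-side state machine that emits one CLI step at a time. This is a sketch under assumptions: the FillState phases and the nextFillCommand function are hypothetical bookkeeping on the agent side, and how the agent decides that an intent is ready from poll-intent output is not specified in this README.

```typescript
// Hypothetical agent-side phases for one protected fill.
type FillState =
  | { phase: "start" }
  | { phase: "waiting"; intentId: string } // intent created, not yet ready
  | { phase: "ready"; intentId: string }; // agent judged the intent ready to fill

// Returns the argv for the single next CLI step; it never runs anything itself.
function nextFillCommand(
  state: FillState,
  fillRef: string,
  storedSecretRef: string,
): string[] {
  switch (state.phase) {
    case "start":
      return ["create-intent", fillRef, storedSecretRef];
    case "waiting":
      return ["poll-intent", state.intentId];
    case "ready":
      return ["fill-secret", fillRef, state.intentId];
  }
}
```

Keeping the state outside the CLI matches the single-step model: each call maps to exactly one agentbrowse command, and the agent decides when to advance.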
Human-readable progress is written to stderr. Command results are written to
stdout.
When launch detects a newer npm release, it prints the reminder to stderr
without changing the JSON result written to stdout.
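A caller can rely on this split by parsing only stdout and treating stderr as log lines. The sketch below assumes the caller has already captured both streams as strings and that stdout holds a single JSON document; the CommandOutput type, the parseCommandOutput function, and the `ok` field in the usage example are hypothetical.

```typescript
// Captured output of one CLI invocation (hypothetical shape for this sketch).
interface CommandOutput {
  stdout: string;
  stderr: string;
}

// Parses stdout as the JSON result and keeps stderr as human-readable progress lines.
// Throws a SyntaxError if stdout is not valid JSON.
function parseCommandOutput(out: CommandOutput): { result: unknown; progress: string[] } {
  const result: unknown = JSON.parse(out.stdout);
  const progress = out.stderr
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return { result, progress };
}

// Usage: reminders such as an update notice land in `progress`, not in `result`.
const parsed = parseCommandOutput({
  stdout: '{"ok":true}\n',
  stderr: "Launching browser\nA newer release is available\n",
});
```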
observe() and extract() responses expose where runtime work came from:
- top-level source shows the winning execution source, for example dom or stagehand
- resolvedBy shows the concrete path within that source
- degraded: true + degradationReason means a bounded degraded path was used
- each returned target includes source, for example dom or stagehand
- each returned scope may include extractScopeLifetime:
  - snapshot = usable only for the current observed page state
  - durable = intended to survive later rebinding
- reusing an old snapshot scope after a later step can fail with expired_extract_scope; the correct recovery is a fresh observe
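The recovery rule for expired scopes can be sketched on the agent side as follows. The field names source, resolvedBy, degraded, and degradationReason come from the list above; the `error` and `data` fields, the callback signatures, and the extractWithRecovery function are assumptions made for illustration, and the callbacks are synchronous only to keep the sketch short.

```typescript
// Partial response shape: `error` and `data` are hypothetical field names.
interface ExtractResponse {
  error?: string;
  data?: unknown;
  source?: string;
  resolvedBy?: string;
  degraded?: boolean;
  degradationReason?: string;
}

// Tries an extract against a scope ref; on expired_extract_scope, performs one
// fresh observe (which yields a new scope ref) and retries exactly once.
function extractWithRecovery(
  extract: (scopeRef: string) => ExtractResponse,
  reobserve: () => string,
  scopeRef: string,
): ExtractResponse {
  const first = extract(scopeRef);
  if (first.error !== "expired_extract_scope") {
    return first;
  }
  const freshRef = reobserve();
  return extract(freshRef);
}
```

The single retry is deliberate: the agent stays in control of orchestration, and any other failure is returned unchanged for the agent to handle.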
Default verification for this package is no longer unit-only:
- pnpm --filter @nuanu-ai/agentbrowse test includes local fixture browser integration coverage for runtime and extract flows
- the default suite also covers one explicit Stagehand-assisted observe fallback plus launch/session smoke at the command boundary
- the e2e portion binds local fixture servers on 127.0.0.1; sandboxed runners that forbid loopback listen() calls can fail early with listen EPERM, so rerun the same command outside the sandbox before treating it as a product regression
See package docs:
Releases are published automatically from main after checks pass. @nuanu-ai/agentbrowse is the only CLI package published by the current npm release workflow.
