agentic-browser-cli

v1.0.0

Published

2 months ago

AI-powered browser automation CLI — automate the web with natural language using Ollama, Anthropic, OpenAI, Azure, AWS Bedrock, Google Vertex AI, or Groq


aisdet

ai browser automation cli ollama anthropic openai azure bedrock vertexai groq langchain playwright mcp llm agent web-automation natural-language

AI Browser CLI

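An AI-powered browser automation CLI that accepts natural-language queries and executes web tasks autonomously. With Ollama it runs fully locally and needs no cloud API; Anthropic, OpenAI, Azure OpenAI, AWS Bedrock, Google Vertex AI, and Groq are supported as cloud providers.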

How it works

User Query (CLI)
      │
      ▼
Provider Selection  ←  ollama | anthropic | openai | azure | bedrock | vertexai | groq
      │
      ▼
LangChain ReAct Agent
      │
      ├── LLM Layer (chosen provider)
      │     ├── Ollama          (local, no API key)
      │     ├── Anthropic Claude
      │     ├── OpenAI
      │     ├── Azure OpenAI
      │     ├── AWS Bedrock
      │     ├── Google Vertex AI
      │     └── Groq
      │
      ├── Tools Layer
      │     ├── Primary  : @playwright/mcp  (MCP subprocess)
      │     └── Fallback : Direct Playwright (in-process)
      │
      └── Session Memory       ← optional context carry-over

Prerequisites

| Requirement | Version | Notes |
|-------------|---------|-------|
| Node.js | ≥ 18 | nodejs.org |
| At least one LLM provider (choose any): | | |
| Ollama (local) | any | ollama serve + ollama pull llama3 |
| Anthropic Claude | — | ANTHROPIC_API_KEY in .env |
| OpenAI | — | OPENAI_API_KEY in .env |
| Azure OpenAI | — | AZURE_OPENAI_API_KEY + AZURE_OPENAI_ENDPOINT |
| AWS Bedrock | — | AWS credentials in .env or IAM role |
| Google Vertex AI | — | GOOGLE_CLOUD_PROJECT + gcloud auth |
| Groq | — | GROQ_API_KEY in .env |

Installation

# 1. Install npm dependencies
npm install

# 2. Install the Chromium browser binary used by Playwright
npm run install:browsers

# 3. Copy and edit environment variables
cp .env.example .env

Optional: global install

npm link
# then call: ai-browser "…"

Configuration (`.env`)

Copy .env.example to .env and fill in the values for the provider(s) you want to use.

Common settings

| Variable | Default | Description |
|----------|---------|-------------|
| DEFAULT_PROVIDER | ollama | Provider used when --provider is omitted |
| HEADLESS | false | Set true to run the browser without a visible window |
| BROWSER_TIMEOUT | 30000 | Timeout (ms) for each browser action |
| MAX_ITERATIONS | 50 | Maximum agent reasoning steps per query |
| AGENT_TIMEOUT | 300000 | Hard timeout (ms) for the full agent run |
| MEMORY_DIR | .memory | Folder where session data is stored |
| DEBUG | false | Set true to enable verbose output |
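The defaults above could be resolved along the following lines. This is a minimal sketch, not the project's actual loader: the function name `loadCommonSettings` and the returned property names are hypothetical, and only the variable names and default values come from the table.

```javascript
// Hypothetical helper: reads the common settings from an env-like object
// and falls back to the documented defaults when a variable is unset.
function loadCommonSettings(env = process.env) {
  // Treat only the string "true" (any case) as true; unset keeps the fallback.
  const toBool = (value, fallback) =>
    value === undefined ? fallback : String(value).toLowerCase() === "true";

  // Parse a base-10 integer; unset or non-numeric values keep the fallback.
  const toInt = (value, fallback) => {
    const parsed = Number.parseInt(value, 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };

  return {
    defaultProvider: env.DEFAULT_PROVIDER || "ollama",
    headless: toBool(env.HEADLESS, false),
    browserTimeout: toInt(env.BROWSER_TIMEOUT, 30000),
    maxIterations: toInt(env.MAX_ITERATIONS, 50),
    agentTimeout: toInt(env.AGENT_TIMEOUT, 300000),
    memoryDir: env.MEMORY_DIR || ".memory",
    debug: toBool(env.DEBUG, false),
  };
}
```

Passing the environment as an argument keeps the helper easy to test, for example `loadCommonSettings({ HEADLESS: "true" })`.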

Ollama (local)

| Variable | Default | Description |
|----------|---------|-------------|
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama server endpoint |
| DEFAULT_MODEL | llama3 | Default model when --model is omitted |

Anthropic Claude

| Variable | Description |
|----------|-------------|
| ANTHROPIC_API_KEY | API key from console.anthropic.com |

OpenAI

| Variable | Description |
|----------|-------------|
| OPENAI_API_KEY | API key from platform.openai.com |

Azure OpenAI

| Variable | Description |
|----------|-------------|
| AZURE_OPENAI_API_KEY | Azure resource API key |
| AZURE_OPENAI_ENDPOINT | https://<resource>.openai.azure.com/ |
| AZURE_OPENAI_DEPLOYMENT | Deployment / model name |
| AZURE_OPENAI_API_VERSION | API version (default 2024-10-21) |

AWS Bedrock

| Variable | Description |
|----------|-------------|
| AWS_ACCESS_KEY_ID | AWS access key (or use IAM role) |
| AWS_SECRET_ACCESS_KEY | AWS secret key |
| AWS_SESSION_TOKEN | Optional session token |
| AWS_REGION | Region (default us-east-1) |

Google Vertex AI

| Variable | Description |
|----------|-------------|
| GOOGLE_CLOUD_PROJECT | GCP project ID |
| GOOGLE_CLOUD_LOCATION | Region (default us-central1) |

Run gcloud auth application-default login before using Vertex AI.

Groq

| Variable | Description |
|----------|-------------|
| GROQ_API_KEY | API key from console.groq.com |

Usage

node src/cli.js [options] <query>

Options:
  -P, --provider <provider>    LLM provider to use       (default: ollama)
                               ollama | anthropic | openai | azure | bedrock | vertexai | groq
  -m, --model <model>          Model / deployment name   (skips the picker)
  -H, --headless               Run browser headlessly
  -v, --verbose                Print tool calls and debug info
  -s, --screenshot             Auto-save screenshots
  --no-memory                  Disable session memory for this run
  --max-iterations <number>    Cap agent reasoning steps  (default: 50)
  --timeout <ms>               Agent timeout              (default: 300000)
  -V, --version                Show version
  -h, --help                   Show help

Built-in subcommands

# Check connection status for ALL configured providers
node src/cli.js status

# List models for a specific provider
node src/cli.js models                    # Ollama (default)
node src/cli.js models --provider anthropic
node src/cli.js models --provider groq

Examples

Using Ollama (local)

node src/cli.js --provider ollama "Search best JavaScript frameworks in 2025"
node src/cli.js --provider ollama --model mistral "Find iPhone 16 price on Amazon"

Using Anthropic Claude

node src/cli.js --provider anthropic "Go to news.ycombinator.com and list the top 5 stories"
node src/cli.js --provider anthropic --model claude-3-opus-20240229 "Extract the main headline from bbc.com"

Using OpenAI

node src/cli.js --provider openai "Fill the contact form on example.com with name 'Jane Doe'"
node src/cli.js --provider openai --model gpt-4-turbo --verbose "Go to MDN and summarise the Fetch API page"

Using Azure OpenAI

node src/cli.js --provider azure "Go to github.com/trending and take a screenshot"

Using AWS Bedrock

node src/cli.js --provider bedrock --model anthropic.claude-3-5-sonnet-20241022-v2:0 "Search for Node.js tutorials"

Using Google Vertex AI

node src/cli.js --provider vertexai "Go to google.com/maps and search for coffee near me"

Using Groq

node src/cli.js --provider groq --model llama3-70b-8192 "Summarise the front page of reuters.com"

Interactive mode (no --provider flag)

node src/cli.js
# → shows a provider picker, then a model picker, then a task prompt

Headless + screenshot

node src/cli.js --provider openai --headless --screenshot "Go to github.com/trending"

Project structure

ai-browser-cli/
│
├── src/
│   ├── cli.js                 ← Entry point  (Commander CLI + provider selection)
│   │
│   ├── agent/
│   │   ├── agent.js           ← LangGraph ReAct agent + streaming
│   │   ├── tools.js           ← MCP tools (primary) + direct Playwright (fallback)
│   │   └── prompts.js         ← System prompt + task-planning template
│   │
│   ├── llm/
│   │   ├── providers.js       ← Multi-provider LLM factory (all 7 providers)
│   │   └── ollama.js          ← Ollama-specific helpers (health-check, pull)
│   │
│   ├── browser/
│   │   └── playwright.js      ← BrowserController (chromium singleton)
│   │
│   ├── memory/
│   │   └── memory.js          ← SessionMemory + LongTermMemory
│   │
│   └── utils/
│       ├── logger.js          ← Coloured logger factory
│       └── retry.js           ← withRetry / withTimeout helpers
│
├── .env.example
├── .gitignore
├── package.json
└── README.md
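The tree lists `src/utils/retry.js` as providing withRetry / withTimeout helpers, but their signatures are not documented here. The following is only a sketch of what such helpers commonly look like; the parameter names, the options object, and its defaults are assumptions, not the project's code.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
// Note: this does not cancel the underlying work, it only stops waiting.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Clear the timer whichever side wins so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Call `fn` up to `retries + 1` times, pausing `delayMs` between attempts.
// The last error is rethrown if every attempt fails.
async function withRetry(fn, { retries = 2, delayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

The two compose naturally, for instance `withRetry(() => withTimeout(doStep(), 30000, "browser action"))`, where `doStep` stands in for any promise-returning browser call.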

Tool system

MCP tools (via `@playwright/mcp`)

When available, the agent runs @playwright/mcp as a subprocess and loads its tools over MCP. The tools include the following, though exact names can vary with the installed @playwright/mcp version:

browser_navigate · browser_click · browser_fill · browser_snapshot · browser_screenshot · browser_press_key · browser_scroll · browser_wait_for

Direct Playwright tools (fallback)

If the MCP server cannot start, the agent falls back to Playwright running in-process:

| Tool | Description |
|------|-------------|
| open_url | Navigate to a URL |
| click_element | Click a CSS-selected element |
| type_text | Fill an input field |
| press_key | Send a keyboard key |
| extract_content | Read page text |
| get_page_info | Current URL + title |
| scroll_page | Scroll the viewport |
| wait_for_element | Wait for an element |
| wait | Fixed-time pause |
| take_screenshot | Capture a screenshot |
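The primary/fallback arrangement can be pictured roughly as follows. This is a sketch under assumptions: `loadTools`, `loadMcpTools`, and `loadDirectTools` are hypothetical names, and the two loaders are injected so the example runs without Playwright or an MCP server installed.

```javascript
// Try the MCP-backed tools first; fall back to in-process tools on failure.
// `loadMcpTools` and `loadDirectTools` are hypothetical async loaders that
// each resolve to an array of tool definitions.
async function loadTools({ loadMcpTools, loadDirectTools, log = () => {} }) {
  try {
    const tools = await loadMcpTools();
    if (Array.isArray(tools) && tools.length > 0) {
      return { mode: "mcp", tools };
    }
    log("MCP server returned no tools; using direct Playwright");
  } catch (err) {
    log(`MCP server failed to start: ${err.message}`);
  }
  return { mode: "direct", tools: await loadDirectTools() };
}
```

Returning the active `mode` alongside the tools makes it straightforward to report which path was taken when verbose output is enabled.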

Memory system

Session memory (default on)

Stores the last 10 query/answer pairs in .memory/session.json
Injected as context into the next run's system prompt
Disable for a single run: --no-memory
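A rolling window of query/answer pairs like the one described above might be kept as sketched below. The on-disk layout (a JSON array of `{ query, answer }` objects) and the function names are assumptions for illustration; the actual format of session.json is not documented here.

```javascript
const fs = require("node:fs");
const path = require("node:path");

const MAX_ENTRIES = 10; // keep only the most recent exchanges

// Read the stored exchanges; a missing or invalid file yields an empty list.
function loadSession(file) {
  try {
    const parsed = JSON.parse(fs.readFileSync(file, "utf8"));
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return [];
  }
}

// Append one exchange, trim to the newest MAX_ENTRIES, and write back.
function rememberExchange(file, query, answer) {
  const entries = [...loadSession(file), { query, answer }].slice(-MAX_ENTRIES);
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, JSON.stringify(entries, null, 2));
  return entries;
}

// Render the stored exchanges as text suitable for a system prompt.
function toPromptContext(entries) {
  return entries
    .map((e, i) => `${i + 1}. Q: ${e.query}\n   A: ${e.answer}`)
    .join("\n");
}
```

A caller would run `rememberExchange(".memory/session.json", query, answer)` after each task and pass `toPromptContext(loadSession(...))` into the next prompt.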

Long-term memory (optional)

Key/value JSON store in .memory/long-term.json
Enable via ENABLE_LONG_TERM_MEMORY=true in .env

Troubleshooting

Ollama not connecting

ollama serve                          # start the server
curl http://localhost:11434/api/tags  # verify it responds
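The same check can be made from Node, which is handy when scripting around the CLI. The sketch below assumes Node 18 or later (for the global `fetch`) and relies on Ollama's `/api/tags` response containing a `models` array whose items have a `name` field; `listOllamaModels` is a hypothetical helper name, and the `fetchImpl` parameter exists only so the function can be exercised without a running server.

```javascript
// Return the names of locally installed Ollama models.
// Throws if the server is unreachable or answers with a non-2xx status.
async function listOllamaModels(
  baseUrl = "http://localhost:11434",
  fetchImpl = fetch
) {
  const res = await fetchImpl(new URL("/api/tags", baseUrl));
  if (!res.ok) {
    throw new Error(`Ollama responded with HTTP ${res.status}`);
  }
  const body = await res.json();
  return (body.models ?? []).map((model) => model.name);
}
```

A rejected promise from `listOllamaModels()` usually means `ollama serve` is not running or OLLAMA_BASE_URL points at the wrong host.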

Model not installed (Ollama)

ollama pull llama3
node src/cli.js models --provider ollama   # confirm it appears

Cloud provider credentials missing

The CLI prints a setup hint with the exact variables to add. Copy them into your .env file and re-run. You can also verify all providers at once:

node src/cli.js status
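For a sense of what such a credentials check involves, here is a minimal sketch built from the variables named in the Prerequisites table. The `REQUIRED_ENV` map and the `missingEnvVars` function are hypothetical; the real `status` command may test more than this, such as live connectivity.

```javascript
// Variables each provider needs, per the Prerequisites table.
const REQUIRED_ENV = {
  ollama: [], // local server, no API key
  anthropic: ["ANTHROPIC_API_KEY"],
  openai: ["OPENAI_API_KEY"],
  azure: ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
  bedrock: [], // assumption: keys are optional when an IAM role supplies credentials
  vertexai: ["GOOGLE_CLOUD_PROJECT"],
  groq: ["GROQ_API_KEY"],
};

// Return the names of required variables that are unset or blank.
function missingEnvVars(provider, env = process.env) {
  if (!Object.hasOwn(REQUIRED_ENV, provider)) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  return REQUIRED_ENV[provider].filter(
    (name) => !env[name] || env[name].trim() === ""
  );
}
```

An empty array means the provider's documented variables are present; it does not prove that the key itself is valid.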

Vertex AI authentication error

gcloud auth application-default login

AWS Bedrock access denied

Ensure the IAM policy attached to your credentials includes bedrock:InvokeModel for the target model ARN.

Playwright / browser not found

npm run install:browsers   # installs Chromium
npx playwright install     # installs all browsers

MCP server fails to start

The agent automatically falls back to direct Playwright. Use --verbose to confirm which mode is active.

Security notes

Never pass real passwords as CLI arguments (they appear in shell history).
Store credentials in .env (excluded from version control via .gitignore).
The agent will not retry login more than twice to avoid account lockouts.
No browser cookies or credentials are persisted between runs.

License

MIT