browser-use

v0.3.0

Published

11 days ago

A TypeScript-first library for programmatic browser control, designed for building AI-powered web agents.


TypeScript port of the popular Python browser-use library — with a native Node.js experience, full type safety, and first-class support for all major LLM providers.

✨ Features

🤖 Autonomous Browser Control — AI-driven navigation, clicking, typing, form filling, scrolling, and tab management
🧠 10+ LLM Providers — OpenAI, Anthropic, Google Gemini, Azure, AWS Bedrock, Groq, Ollama, DeepSeek, OpenRouter, Mistral, Cerebras, and custom providers
👁️ Vision Support — Screenshot-based understanding for visual web interactions
🔧 45+ Built-in Actions — Navigation, element interaction, scrolling, forms, tabs, content extraction, file I/O, and more
🧩 Custom Actions — Extensible registry with Zod schema validation, domain restrictions, and page filters
🔌 MCP Server — Model Context Protocol support for Claude Desktop and MCP-compatible clients
⌨️ CLI Tool — Interactive and one-shot modes for quick browser tasks
🔒 Security First — Sensitive data masking, domain restrictions, and Chromium sandboxing
📊 Observability — Event system, telemetry, performance tracing, and session recording (GIF)
🐳 Docker Ready — Configurable for containerized and CI/CD environments

🚀 Quick Start

Installation

npm install browser-use
# Playwright browsers are installed automatically via postinstall

Set Up Your API Key

export OPENAI_API_KEY=sk-your-api-key
# or ANTHROPIC_API_KEY, GOOGLE_API_KEY, etc.

Run Your First Agent

import { Agent } from 'browser-use';
import { ChatOpenAI } from 'browser-use/llm/openai';

const agent = new Agent({
  task: 'Go to google.com and search for "TypeScript tutorials"',
  llm: new ChatOpenAI({
    model: 'gpt-4o',
    apiKey: process.env.OPENAI_API_KEY,
  }),
});

const history = await agent.run();
console.log('Result:', history.final_result());
console.log('Success:', history.is_successful());

npx tsx example.ts

Use the CLI

# Interactive mode
npx browser-use

# One-shot task
npx browser-use "Go to example.com and extract the page title"

# With specific model
npx browser-use --model claude-sonnet-4-20250514 -p "Search for AI news"

# Headless mode
npx browser-use --headless -p "Check the weather"

# MCP server mode
npx browser-use --mcp

🏗️ Architecture

┌─────────────────────────────────────────────────────┐
│                    Browser-Use                       │
├─────────────────────────────────────────────────────┤
│  Agent ← MessageManager ← LLM Providers            │
│    ↓                                                 │
│  Controller → Action Registry → BrowserSession      │
│                                      ↓               │
│                                  DomService          │
└─────────────────────────────────────────────────────┘

| Component | Description |
| --- | --- |
| Agent | Central orchestrator that runs the observe → think → act loop |
| Controller | Manages action registration and execution via Registry |
| BrowserSession | Playwright wrapper for browser lifecycle, tab management, and screenshots |
| DomService | Extracts interactive elements with indexed mapping for LLM consumption |
| MessageManager | Manages LLM conversation history with token optimization |
| LLM Providers | Unified BaseChatModel interface across 10+ providers |

How It Works

1. Agent receives a natural language task
2. DomService extracts the current page state (interactive elements + optional screenshot)
3. LLM analyzes the state and returns actions to take
4. Controller validates and executes actions through the Registry
5. Results feed back to the LLM for the next step
6. The loop continues until the LLM returns a done action or max_steps is reached
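
The steps above can be sketched as a small loop. This is a simplified illustration, not the library's implementation: `PageState`, `Action`, `Policy`, and `runLoop` are hypothetical names, the LLM is replaced by a plain function, and everything is synchronous to keep the control flow visible.

```typescript
// Hypothetical types for illustration; these are not the library's real interfaces.
type PageState = { url: string; elements: string[] };
type Action =
  | { name: 'click'; index: number }
  | { name: 'done'; result: string };

// Stand-in for an LLM call: given the state and earlier results, choose the next actions.
type Policy = (state: PageState, feedback: string[]) => Action[];

interface LoopResult {
  steps: number;
  finalResult: string | null;
}

function runLoop(
  observe: () => PageState,
  decide: Policy,
  execute: (action: Action) => string,
  maxSteps: number,
): LoopResult {
  const feedback: string[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const state = observe(); // observe: read the current page state
    const actions = decide(state, feedback); // think: ask the policy what to do
    for (const action of actions) {
      // act: stop on "done", otherwise execute and record the result
      if (action.name === 'done') {
        return { steps: step, finalResult: action.result };
      }
      feedback.push(execute(action));
    }
  }
  // Step budget exhausted without a done action.
  return { steps: maxSteps, finalResult: null };
}

// Toy run: the fake policy clicks twice, then reports completion.
const demo = runLoop(
  () => ({ url: 'https://example.com', elements: ['[0] <a>More</a>'] }),
  (_state, feedback) =>
    feedback.length < 2
      ? [{ name: 'click', index: 0 }]
      : [{ name: 'done', result: `clicked ${feedback.length} times` }],
  (action) => `executed ${action.name}`,
  10,
);
```

Here `demo.finalResult` is only non-null because the policy eventually returns a done action; a policy that never does so ends the run when the step budget is spent.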

🔌 LLM Providers

| Provider | Import | Vision | Notes |
| --- | --- | --- | --- |
| OpenAI | browser-use/llm/openai | ✅ | Default provider, reasoning models (o1/o3/o4) |
| Anthropic | browser-use/llm/anthropic | ✅ | Prompt caching support |
| Google Gemini | browser-use/llm/google | ✅ | Extended thinking support |
| Azure OpenAI | browser-use/llm/azure | ✅ | Enterprise deployment |
| AWS Bedrock | browser-use/llm/aws | ✅ | Claude via AWS |
| Groq | browser-use/llm/groq | ❌ | Fastest inference |
| Ollama | browser-use/llm/ollama | ❌ | Local/self-hosted models |
| DeepSeek | browser-use/llm/deepseek | ❌ | Cost-effective |
| OpenRouter | browser-use/llm/openrouter | Varies | Multi-model routing |
| Mistral | browser-use/llm/mistral | Varies | Mistral models |
| Cerebras | browser-use/llm/cerebras | ❌ | Fast inference |

// The snippets below are alternatives; use one `llm` declaration per module.

// OpenAI
import { ChatOpenAI } from 'browser-use/llm/openai';
const llm = new ChatOpenAI({
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY,
});

// Anthropic
import { ChatAnthropic } from 'browser-use/llm/anthropic';
const llm = new ChatAnthropic({
  model: 'claude-sonnet-4-20250514',
  apiKey: process.env.ANTHROPIC_API_KEY,
});

// Google Gemini
import { ChatGoogle } from 'browser-use/llm/google';
const llm = new ChatGoogle('gemini-2.5-flash');

// Ollama (local)
import { ChatOllama } from 'browser-use/llm/ollama';
const llm = new ChatOllama('llama3', 'http://localhost:11434');

// OpenAI Reasoning Models
const llm = new ChatOpenAI({ model: 'o3-mini', reasoningEffort: 'medium' });
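
When the same script should run against whichever key is configured, the provider choice can be derived from the environment. The sketch below is a hypothetical helper, not part of browser-use: the variable names come from this README, while `pickProvider`, `ProviderName`, and the priority order are assumptions made for illustration.

```typescript
type ProviderName = 'openai' | 'anthropic' | 'google';

// Env var names are taken from this README; the priority order is an assumption.
const PROVIDER_KEYS: Array<[ProviderName, string]> = [
  ['openai', 'OPENAI_API_KEY'],
  ['anthropic', 'ANTHROPIC_API_KEY'],
  ['google', 'GOOGLE_API_KEY'],
];

// Returns the first provider whose API key is present and non-empty.
function pickProvider(env: Record<string, string | undefined>): ProviderName {
  for (const [provider, key] of PROVIDER_KEYS) {
    const value = env[key];
    if (value !== undefined && value.trim() !== '') {
      return provider;
    }
  }
  throw new Error(
    'No LLM API key found; set OPENAI_API_KEY, ANTHROPIC_API_KEY, or GOOGLE_API_KEY',
  );
}
```

Calling `pickProvider(process.env)` at startup and switching on the result keeps the provider-specific constructor calls shown above in one place.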

🎯 Code Examples

Data Extraction

const agent = new Agent({
  task: `Go to amazon.com, search for "wireless keyboard",
         extract the name, price, and rating of the first 5 products as JSON`,
  llm,
  use_vision: true,
});

const history = await agent.run(30); // Max 30 steps
console.log(history.final_result());

Form Filling with Sensitive Data

import { Agent, BrowserProfile, BrowserSession } from 'browser-use';

const agent = new Agent({
  task: 'Login to the dashboard',
  llm,
  sensitive_data: {
    '*.example.com': {
      username: process.env.SITE_USERNAME!,
      password: process.env.SITE_PASSWORD!,
    },
  },
  browser_session: new BrowserSession({
    browser_profile: new BrowserProfile({
      allowed_domains: ['*.example.com'],
    }),
  }),
});
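
To make the idea of masking concrete, here is a minimal sketch of how secret values could be swapped for placeholders before text reaches logs or the LLM. It is not the library's code: `maskSecrets` is a hypothetical helper, and the `<secret>name</secret>` placeholder format is an assumption.

```typescript
type Secrets = Record<string, string>;

// Replaces every occurrence of each secret value with a named placeholder.
function maskSecrets(text: string, secrets: Secrets): string {
  // Mask longer values first so a secret that contains another is replaced whole.
  const entries = Object.entries(secrets)
    .filter(([, value]) => value.length > 0)
    .sort((a, b) => b[1].length - a[1].length);

  let masked = text;
  for (const [name, value] of entries) {
    // split/join performs a literal (non-regex) global replacement.
    masked = masked.split(value).join(`<secret>${name}</secret>`);
  }
  return masked;
}
```

With this approach the model only ever sees the placeholder name, and the real value would be substituted back at the moment an action types into the page.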

Custom Actions

import fs from 'node:fs';
import { Agent, Controller, ActionResult } from 'browser-use';
import { z } from 'zod';

const controller = new Controller();

controller.registry.action('Save screenshot to file', {
  param_model: z.object({
    filename: z.string().describe('Output filename'),
  }),
})(async function save_screenshot(params, ctx) {
  const screenshot = await ctx.page.screenshot();
  fs.writeFileSync(`./screenshots/${params.filename}`, screenshot);
  return new ActionResult({
    extracted_content: `Screenshot saved as ${params.filename}`,
  });
});

const agent = new Agent({ task: '...', llm, controller });

Vision Mode & Session Recording

const agent = new Agent({
  task: 'Navigate to hacker news and summarize the top stories',
  llm,
  use_vision: true,
  vision_detail_level: 'high', // 'auto' | 'low' | 'high'
  generate_gif: './session.gif',
});

Multi-Tab Workflows

const agent = new Agent({
  task: `Compare "Sony WH-1000XM5" prices:
    1. Open amazon.com and search for the product
    2. Open bestbuy.com in a new tab and search
    3. Provide a comparison summary`,
  llm,
  use_vision: true,
});

Event System

const agent = new Agent({ task: '...', llm });

agent.eventbus.on('CreateAgentStepEvent', (event) => {
  console.log('Step completed:', event.step_id);
});

await agent.run();

⚙️ Configuration

Agent Options

const agent = new Agent({
  task: 'Your task',
  llm,
  use_vision: true, // Enable screenshot analysis
  max_actions_per_step: 5, // Actions per LLM call
  max_failures: 3, // Max retries on failure
  generate_gif: './recording.gif', // Session recording
  validate_output: true, // Strict output validation
  use_thinking: true, // Extended thinking prompts
  llm_timeout: 60, // LLM call timeout (seconds)
  step_timeout: 180, // Step timeout (seconds)
  extend_system_message: 'Be concise', // Custom prompt additions
});

const history = await agent.run(50); // Max 50 steps

Browser Profile

import { BrowserProfile, BrowserSession } from 'browser-use';

const profile = new BrowserProfile({
  headless: true,
  viewport: { width: 1920, height: 1080 },
  user_data_dir: './my-profile', // Persistent sessions
  allowed_domains: ['*.example.com'], // Domain restrictions
  highlight_elements: true, // Visual debugging
  proxy: { server: 'http://proxy:8080' },
});

const session = new BrowserSession({ browser_profile: profile });
const agent = new Agent({ task: '...', llm, browser_session: session });

Environment Variables

| Variable | Description |
| --- | --- |
| OPENAI_API_KEY | OpenAI API key |
| ANTHROPIC_API_KEY | Anthropic API key |
| GOOGLE_API_KEY | Google API key |
| BROWSER_USE_HEADLESS | Run browser headlessly (true/false) |
| BROWSER_USE_LOGGING_LEVEL | Log level: debug, info, warning, error |
| BROWSER_USE_ALLOWED_DOMAINS | Comma-separated domain allowlist |
| ANONYMIZED_TELEMETRY | Enable/disable anonymous telemetry |
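
As an illustration of the value formats in the table, the following sketch parses three of the variables into typed settings. It is not how browser-use reads its configuration: `readSettings` and `EnvSettings` are hypothetical, and the fallback to `info` for a missing or unrecognized log level is an assumption.

```typescript
interface EnvSettings {
  headless: boolean | undefined; // undefined when unset or not "true"/"false"
  logLevel: 'debug' | 'info' | 'warning' | 'error';
  allowedDomains: string[];
}

const LOG_LEVELS = ['debug', 'info', 'warning', 'error'] as const;

function readSettings(env: Record<string, string | undefined>): EnvSettings {
  const rawHeadless = env['BROWSER_USE_HEADLESS']?.trim().toLowerCase();
  const headless =
    rawHeadless === 'true' ? true : rawHeadless === 'false' ? false : undefined;

  // Assumption: fall back to "info" when the level is missing or unrecognized.
  const rawLevel = env['BROWSER_USE_LOGGING_LEVEL']?.trim().toLowerCase();
  const logLevel = LOG_LEVELS.find((level) => level === rawLevel) ?? 'info';

  // Comma-separated allowlist; blank entries and surrounding spaces are dropped.
  const allowedDomains = (env['BROWSER_USE_ALLOWED_DOMAINS'] ?? '')
    .split(',')
    .map((domain) => domain.trim())
    .filter((domain) => domain.length > 0);

  return { headless, logLevel, allowedDomains };
}
```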

See Configuration Guide for the full list.

🔌 MCP Server (Claude Desktop)

Browser-Use can run as an MCP server, exposing browser automation as tools for Claude Desktop:

npx browser-use --mcp

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "browser-use": {
      "command": "npx",
      "args": ["browser-use", "--mcp"],
      "env": {
        "OPENAI_API_KEY": "your-api-key"
      }
    }
  }
}

Available MCP tools: browser_run_task, browser_navigate, browser_click, browser_type, browser_scroll, browser_get_state, browser_extract, browser_screenshot, browser_close.
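
For clients other than Claude Desktop, it may help to see what a tool invocation looks like on the wire. MCP uses JSON-RPC 2.0, and tools are invoked with the `tools/call` method. The sketch below builds such a request; `buildToolCall` is a hypothetical helper, and the `url` argument name is a guess, so check each tool's input schema (for example via `tools/list`) before relying on it.

```typescript
// Tool names as listed in this README.
const MCP_TOOLS = [
  'browser_run_task', 'browser_navigate', 'browser_click', 'browser_type',
  'browser_scroll', 'browser_get_state', 'browser_extract',
  'browser_screenshot', 'browser_close',
] as const;
type McpTool = (typeof MCP_TOOLS)[number];

// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: McpTool; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: McpTool,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name, arguments: args } };
}

// Assumption: browser_navigate accepts a `url` argument.
const request = buildToolCall(1, 'browser_navigate', { url: 'https://example.com' });
```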

See MCP Server Guide for more details.

🔒 Security

Sensitive Data Masking — Credentials are automatically masked in logs and LLM context
Domain Restrictions — Lock browser navigation to trusted domains
Domain-scoped Secrets — Credentials are only injected on matching domains
Hard Safety Gate — sensitive_data requires allowed_domains by default
Chromium Sandbox — Enabled by default for production security

const agent = new Agent({
  task: 'Login and fetch invoices',
  llm,
  sensitive_data: {
    '*.example.com': {
      username: process.env.SITE_USERNAME!,
      password: process.env.SITE_PASSWORD!,
    },
  },
  browser_session: new BrowserSession({
    browser_profile: new BrowserProfile({
      allowed_domains: ['*.example.com'],
    }),
  }),
});
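
The interaction between `allowed_domains` patterns and domain-scoped secrets can be pictured with a small matcher. This is a sketch under stated assumptions, not the library's matching logic: `matchesDomain` and `secretsForHost` are hypothetical, and treating `*.example.com` as also covering the bare `example.com` is a choice made here for illustration.

```typescript
// Returns true when `hostname` is covered by `pattern` (exact host or "*." wildcard).
function matchesDomain(pattern: string, hostname: string): boolean {
  const p = pattern.toLowerCase();
  const h = hostname.toLowerCase();
  if (p.startsWith('*.')) {
    const base = p.slice(2);
    // Assumption: the wildcard covers the bare domain and any subdomain.
    // The leading dot in the suffix check keeps "evilexample.com" from matching.
    return h === base || h.endsWith(`.${base}`);
  }
  return h === p;
}

// Collects only the secrets whose domain pattern matches the current host.
function secretsForHost(
  hostname: string,
  scoped: Record<string, Record<string, string>>,
): Record<string, string> {
  const result: Record<string, string> = {};
  for (const [pattern, secrets] of Object.entries(scoped)) {
    if (matchesDomain(pattern, hostname)) {
      Object.assign(result, secrets);
    }
  }
  return result;
}
```

Anchoring the comparison on a full label boundary is the important design point: a plain substring or suffix test would let look-alike hosts receive credentials.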

See Security Guide for production deployment best practices.

📚 Documentation

| Document | Description |
| --- | --- |
| Quick Start | Get started in 5 minutes |
| Architecture | System design and component overview |
| API Reference | Complete API documentation |
| Configuration | All configuration options |
| LLM Providers | Provider setup and comparison |
| Actions | Built-in and custom actions |
| MCP Server | MCP integration guide |
| Security | Security best practices |
| Examples | More code examples |
| Contributing | Contribution guidelines |

🛠️ Development

# Install dependencies
npm install

# Build
npm run build

# Run tests
npm test

# Lint & format
npm run lint
npm run prettier

# Type checking
npm run typecheck

# Run an example
npx tsx examples/simple-search.ts

Requirements

Node.js >= 18.0.0
LLM API Key — At least one supported provider
Playwright — Installed automatically as a dependency