browser-use

v0.3.0

A TypeScript-first library for programmatic browser control, designed for building AI-powered web agents.

TypeScript port of the popular Python browser-use library — with a native Node.js experience, full type safety, and first-class support for all major LLM providers.

✨ Features

  • 🤖 Autonomous Browser Control — AI-driven navigation, clicking, typing, form filling, scrolling, and tab management
  • 🧠 10+ LLM Providers — OpenAI, Anthropic, Google Gemini, Azure, AWS Bedrock, Groq, Ollama, DeepSeek, OpenRouter, Mistral, Cerebras, and custom providers
  • 👁️ Vision Support — Screenshot-based understanding for visual web interactions
  • 🔧 45+ Built-in Actions — Navigation, element interaction, scrolling, forms, tabs, content extraction, file I/O, and more
  • 🧩 Custom Actions — Extensible registry with Zod schema validation, domain restrictions, and page filters
  • 🔌 MCP Server — Model Context Protocol support for Claude Desktop and MCP-compatible clients
  • ⌨️ CLI Tool — Interactive and one-shot modes for quick browser tasks
  • 🔒 Security First — Sensitive data masking, domain restrictions, and Chromium sandboxing
  • 📊 Observability — Event system, telemetry, performance tracing, and session recording (GIF)
  • 🐳 Docker Ready — Configurable for containerized and CI/CD environments

🚀 Quick Start

Installation

npm install browser-use
# Playwright browsers are installed automatically via postinstall

Set Up Your API Key

export OPENAI_API_KEY=sk-your-api-key
# or ANTHROPIC_API_KEY, GOOGLE_API_KEY, etc.

Run Your First Agent

import { Agent } from 'browser-use';
import { ChatOpenAI } from 'browser-use/llm/openai';

const agent = new Agent({
  task: 'Go to google.com and search for "TypeScript tutorials"',
  llm: new ChatOpenAI({
    model: 'gpt-4o',
    apiKey: process.env.OPENAI_API_KEY,
  }),
});

const history = await agent.run();
console.log('Result:', history.final_result());
console.log('Success:', history.is_successful());

# Save the code above as example.ts, then run it
npx tsx example.ts

Use the CLI

# Interactive mode
npx browser-use

# One-shot task
npx browser-use "Go to example.com and extract the page title"

# With specific model
npx browser-use --model claude-sonnet-4-20250514 -p "Search for AI news"

# Headless mode
npx browser-use --headless -p "Check the weather"

# MCP server mode
npx browser-use --mcp

🏗️ Architecture

┌─────────────────────────────────────────────────────┐
│                    Browser-Use                       │
├─────────────────────────────────────────────────────┤
│  Agent ← MessageManager ← LLM Providers            │
│    ↓                                                 │
│  Controller → Action Registry → BrowserSession      │
│                                      ↓               │
│                                  DomService          │
└─────────────────────────────────────────────────────┘

| Component | Description |
| ------------------ | ---------------------------------------------------------------------- |
| Agent | Central orchestrator — runs the observe → think → act loop |
| Controller | Manages action registration and execution via Registry |
| BrowserSession | Playwright wrapper — browser lifecycle, tab management, screenshots |
| DomService | Extracts interactive elements with indexed mapping for LLM consumption |
| MessageManager | Manages LLM conversation history with token optimization |
| LLM Providers | Unified BaseChatModel interface across 10+ providers |

How It Works

  1. Agent receives a natural language task
  2. DomService extracts the current page state (interactive elements + optional screenshot)
  3. LLM analyzes the state and returns actions to take
  4. Controller validates and executes actions through the Registry
  5. Results feed back to the LLM for the next step
  6. The loop continues until the LLM returns a done action or max_steps is reached
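
The steps above can be sketched as a small loop. This is only an illustration of the control flow, not browser-use's implementation: the types `Action`, `PageState`, `StepResult`, `LoopDeps` and the function `runLoop` are hypothetical, and the sketch is synchronous for readability even though real LLM and browser calls are asynchronous.

```typescript
// Hypothetical types for illustration; they are not browser-use's real classes.
type Action = { name: string; params: Record<string, unknown> };
type PageState = { url: string; elements: string[] };
type StepResult = { action: string; output: string };

interface LoopDeps {
  observe: () => PageState; // stands in for DomService
  think: (state: PageState, history: StepResult[]) => Action[]; // stands in for the LLM
  act: (action: Action) => string; // stands in for Controller + Registry
}

// Real LLM and browser calls are asynchronous; this sketch stays synchronous
// so the observe -> think -> act flow is easy to follow.
function runLoop(deps: LoopDeps, maxSteps: number): StepResult[] {
  const history: StepResult[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const state = deps.observe();
    const actions = deps.think(state, history);
    for (const action of actions) {
      if (action.name === 'done') {
        history.push({ action: 'done', output: String(action.params['text'] ?? '') });
        return history; // the LLM decided the task is finished
      }
      history.push({ action: action.name, output: deps.act(action) });
    }
  }
  return history; // maxSteps reached without a done action
}

// Usage with stubbed dependencies: navigate once, then report done.
let page = 'about:blank';
const history = runLoop(
  {
    observe: () => ({ url: page, elements: [] }),
    think: (state) =>
      state.url === 'about:blank'
        ? [{ name: 'navigate', params: { url: 'https://example.com' } }]
        : [{ name: 'done', params: { text: 'Arrived at ' + state.url } }],
    act: (action) => {
      page = String(action.params['url']);
      return 'navigated';
    },
  },
  5,
);
```

Passing the three roles in as plain functions keeps the loop testable without a browser or an API key.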

🔌 LLM Providers

| Provider | Import | Vision | Notes |
| ----------------- | ---------------------------- | ------ | --------------------------------------------- |
| OpenAI | browser-use/llm/openai | ✅ | Default provider, reasoning models (o1/o3/o4) |
| Anthropic | browser-use/llm/anthropic | ✅ | Prompt caching support |
| Google Gemini | browser-use/llm/google | ✅ | Extended thinking support |
| Azure OpenAI | browser-use/llm/azure | ✅ | Enterprise deployment |
| AWS Bedrock | browser-use/llm/aws | ✅ | Claude via AWS |
| Groq | browser-use/llm/groq | ❌ | Fastest inference |
| Ollama | browser-use/llm/ollama | ❌ | Local/self-hosted models |
| DeepSeek | browser-use/llm/deepseek | ❌ | Cost-effective |
| OpenRouter | browser-use/llm/openrouter | Varies | Multi-model routing |
| Mistral | browser-use/llm/mistral | Varies | Mistral models |
| Cerebras | browser-use/llm/cerebras | ❌ | Fast inference |

// Each snippet below is a standalone alternative; declare llm only once in your own code.

// OpenAI
import { ChatOpenAI } from 'browser-use/llm/openai';
const llm = new ChatOpenAI({
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY,
});

// Anthropic
import { ChatAnthropic } from 'browser-use/llm/anthropic';
const llm = new ChatAnthropic({
  model: 'claude-sonnet-4-20250514',
  apiKey: process.env.ANTHROPIC_API_KEY,
});

// Google Gemini
import { ChatGoogle } from 'browser-use/llm/google';
const llm = new ChatGoogle('gemini-2.5-flash');

// Ollama (local)
import { ChatOllama } from 'browser-use/llm/ollama';
const llm = new ChatOllama('llama3', 'http://localhost:11434');

// OpenAI Reasoning Models
const llm = new ChatOpenAI({ model: 'o3-mini', reasoningEffort: 'medium' });
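
If several API keys may be present, it can help to decide on a provider in one place. The sketch below is an assumption-heavy illustration: `pickProvider` is a hypothetical helper (not part of browser-use), and the priority order is arbitrary. Only the environment variable names and import paths come from this README.

```typescript
type Env = Record<string, string | undefined>;

// Env var names and import paths are taken from this README;
// the helper and its priority order are hypothetical.
const PROVIDERS = [
  { envVar: 'OPENAI_API_KEY', importPath: 'browser-use/llm/openai' },
  { envVar: 'ANTHROPIC_API_KEY', importPath: 'browser-use/llm/anthropic' },
  { envVar: 'GOOGLE_API_KEY', importPath: 'browser-use/llm/google' },
] as const;

// Returns the import path of the first provider whose key is set, or null.
function pickProvider(env: Env): string | null {
  for (const provider of PROVIDERS) {
    const key = env[provider.envVar];
    if (key !== undefined && key.trim() !== '') return provider.importPath;
  }
  return null;
}
```

In a real script you would call `pickProvider(process.env)` and then import the matching chat class shown above.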

🎯 Code Examples

Data Extraction

const agent = new Agent({
  task: `Go to amazon.com, search for "wireless keyboard",
         extract the name, price, and rating of the first 5 products as JSON`,
  llm,
  use_vision: true,
});

const history = await agent.run(30);
console.log(history.final_result());

Form Filling with Sensitive Data

const agent = new Agent({
  task: 'Login to the dashboard',
  llm,
  sensitive_data: {
    '*.example.com': {
      username: process.env.SITE_USERNAME!,
      password: process.env.SITE_PASSWORD!,
    },
  },
  browser_session: new BrowserSession({
    browser_profile: new BrowserProfile({
      allowed_domains: ['*.example.com'],
    }),
  }),
});
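
To make the idea of domain-scoped credentials concrete, here is a minimal sketch of wildcard matching. It is not the library's matcher: `matchesDomain` and `secretsFor` are hypothetical names, and the rule that `*.example.com` also covers the bare `example.com` is an assumption that may differ from browser-use's actual behavior.

```typescript
// Hypothetical matcher; browser-use's real matching rules may differ.
function matchesDomain(hostname: string, pattern: string): boolean {
  const host = hostname.toLowerCase();
  const pat = pattern.toLowerCase();
  if (pat.startsWith('*.')) {
    const base = pat.slice(2);
    // Assumption: "*.example.com" covers subdomains and the bare domain.
    // The leading dot in the suffix check rejects "evil-example.com".
    return host === base || host.endsWith('.' + base);
  }
  return host === pat;
}

// Collects only the secrets whose domain pattern matches the current hostname.
function secretsFor(
  hostname: string,
  sensitiveData: Record<string, Record<string, string>>,
): Record<string, string> {
  const result: Record<string, string> = {};
  for (const [pattern, secrets] of Object.entries(sensitiveData)) {
    if (matchesDomain(hostname, pattern)) Object.assign(result, secrets);
  }
  return result;
}
```

With the configuration above, `secretsFor('app.example.com', ...)` would return both credentials, while any other hostname would get an empty object.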

Custom Actions

import * as fs from 'node:fs';
import { Controller, ActionResult } from 'browser-use';
import { z } from 'zod';

const controller = new Controller();

controller.registry.action('Save screenshot to file', {
  param_model: z.object({
    filename: z.string().describe('Output filename'),
  }),
})(async function save_screenshot(params, ctx) {
  const screenshot = await ctx.page.screenshot();
  fs.writeFileSync(`./screenshots/${params.filename}`, screenshot);
  return new ActionResult({
    extracted_content: `Screenshot saved as ${params.filename}`,
  });
});

const agent = new Agent({ task: '...', llm, controller });
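
Conceptually, a registry validates parameters before it runs an action. The sketch below shows that validate-then-execute pattern in isolation, under stated assumptions: `MiniRegistry`, `Validator`, and `Handler` are hypothetical, and a hand-written validator stands in for the Zod schema so the example has no dependencies.

```typescript
// Hypothetical mini-registry; a hand-written validator stands in for a Zod schema.
type Validator<P> = (raw: unknown) => P; // throws on invalid input
type Handler<P> = (params: P) => string;

class MiniRegistry {
  private actions = new Map<string, { validate: Validator<any>; run: Handler<any> }>();

  register<P>(name: string, validate: Validator<P>, run: Handler<P>): void {
    this.actions.set(name, { validate, run });
  }

  execute(name: string, raw: unknown): string {
    const entry = this.actions.get(name);
    if (!entry) throw new Error(`Unknown action: ${name}`);
    // Validate first so the handler only ever sees well-formed params.
    return entry.run(entry.validate(raw));
  }
}

const registry = new MiniRegistry();
registry.register(
  'save_screenshot',
  (raw: unknown): { filename: string } => {
    const filename = (raw as { filename?: unknown } | null)?.filename;
    if (typeof filename !== 'string' || filename === '') {
      throw new Error('filename must be a non-empty string');
    }
    return { filename };
  },
  (params) => `Screenshot saved as ${params.filename}`,
);
```

Because LLM output is untrusted input, rejecting malformed parameters at the registry boundary keeps action handlers simple.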

Vision Mode & Session Recording

const agent = new Agent({
  task: 'Navigate to hacker news and summarize the top stories',
  llm,
  use_vision: true,
  vision_detail_level: 'high', // 'auto' | 'low' | 'high'
  generate_gif: './session.gif',
});

Multi-Tab Workflows

const agent = new Agent({
  task: `Compare "Sony WH-1000XM5" prices:
    1. Open amazon.com and search for the product
    2. Open bestbuy.com in a new tab and search
    3. Provide a comparison summary`,
  llm,
  use_vision: true,
});

Event System

const agent = new Agent({ task: '...', llm });

agent.eventbus.on('CreateAgentStepEvent', (event) => {
  console.log('Step completed:', event.step_id);
});

await agent.run();

⚙️ Configuration

Agent Options

const agent = new Agent({
  task: 'Your task',
  llm,
  use_vision: true, // Enable screenshot analysis
  max_actions_per_step: 5, // Actions per LLM call
  max_failures: 3, // Max retries on failure
  generate_gif: './recording.gif', // Session recording
  validate_output: true, // Strict output validation
  use_thinking: true, // Extended thinking prompts
  llm_timeout: 60, // LLM call timeout (seconds)
  step_timeout: 180, // Step timeout (seconds)
  extend_system_message: 'Be concise', // Custom prompt additions
});

const history = await agent.run(50); // Max 50 steps

Browser Profile

import { BrowserProfile, BrowserSession } from 'browser-use';

const profile = new BrowserProfile({
  headless: true,
  viewport: { width: 1920, height: 1080 },
  user_data_dir: './my-profile', // Persistent sessions
  allowed_domains: ['*.example.com'], // Domain restrictions
  highlight_elements: true, // Visual debugging
  proxy: { server: 'http://proxy:8080' },
});

const session = new BrowserSession({ browser_profile: profile });
const agent = new Agent({ task: '...', llm, browser_session: session });

Environment Variables

| Variable | Description |
| ----------------------------- | ---------------------------------------------- |
| OPENAI_API_KEY | OpenAI API key |
| ANTHROPIC_API_KEY | Anthropic API key |
| GOOGLE_API_KEY | Google API key |
| BROWSER_USE_HEADLESS | Run browser headlessly (true/false) |
| BROWSER_USE_LOGGING_LEVEL | Log level: debug, info, warning, error |
| BROWSER_USE_ALLOWED_DOMAINS | Comma-separated domain allowlist |
| ANONYMIZED_TELEMETRY | Enable/disable anonymous telemetry |
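
As a rough illustration of how these variables could be interpreted, the sketch below parses three of them into a typed object. `readEnvConfig` and `EnvConfig` are hypothetical, and the fallback values (non-"true" means false, unknown log levels become "info") are assumptions rather than the library's documented defaults.

```typescript
type Env = Record<string, string | undefined>;

const LEVELS = ['debug', 'info', 'warning', 'error'] as const;
type Level = (typeof LEVELS)[number];

interface EnvConfig {
  headless: boolean;
  loggingLevel: Level;
  allowedDomains: string[];
}

// Hypothetical parser; defaults are assumptions, not browser-use's behavior.
function readEnvConfig(env: Env): EnvConfig {
  const rawLevel = (env['BROWSER_USE_LOGGING_LEVEL'] ?? '').toLowerCase();
  const level = LEVELS.find((candidate) => candidate === rawLevel);
  return {
    // Assumption: anything other than "true" counts as false.
    headless: (env['BROWSER_USE_HEADLESS'] ?? '').toLowerCase() === 'true',
    // Assumption: unknown or missing levels fall back to "info".
    loggingLevel: level ?? 'info',
    // Comma-separated list; surrounding whitespace and empty entries are dropped.
    allowedDomains: (env['BROWSER_USE_ALLOWED_DOMAINS'] ?? '')
      .split(',')
      .map((domain) => domain.trim())
      .filter((domain) => domain.length > 0),
  };
}
```

Taking the environment as a parameter (instead of reading `process.env` directly) makes the parser easy to test.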

See Configuration Guide for the full list.

🔌 MCP Server (Claude Desktop)

Browser-Use can run as an MCP server, exposing browser automation as tools for Claude Desktop:

npx browser-use --mcp

Add to your Claude Desktop config (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "browser-use": {
      "command": "npx",
      "args": ["browser-use", "--mcp"],
      "env": {
        "OPENAI_API_KEY": "your-api-key"
      }
    }
  }
}

Available MCP tools: browser_run_task, browser_navigate, browser_click, browser_type, browser_scroll, browser_get_state, browser_extract, browser_screenshot, browser_close.
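
MCP clients invoke tools with JSON-RPC 2.0 `tools/call` requests once the initialize handshake has completed. The sketch below builds such a request for `browser_navigate`. The helper `toolCall` is hypothetical, and the `url` argument name is an assumption, since this README does not list the tools' input schemas.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Hypothetical helper that builds a request with a fresh id.
function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: '2.0',
    id: nextId++,
    method: 'tools/call',
    params: { name, arguments: args },
  };
}

// Assumption: browser_navigate accepts a "url" argument.
const request = toolCall('browser_navigate', { url: 'https://example.com' });
const payload = JSON.stringify(request);
```

Claude Desktop builds and sends these messages for you; the sketch is mainly useful when writing or debugging a custom MCP client.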

See MCP Server Guide for more details.

🔒 Security

  • Sensitive Data Masking — Credentials are automatically masked in logs and LLM context
  • Domain Restrictions — Lock browser navigation to trusted domains
  • Domain-scoped Secrets — Credentials are only injected on matching domains
  • Hard Safety Gate — sensitive_data requires allowed_domains by default
  • Chromium Sandbox — Enabled by default for production security

const agent = new Agent({
  task: 'Login and fetch invoices',
  llm,
  sensitive_data: {
    '*.example.com': {
      // SITE_* names avoid clashing with the OS-level USERNAME variable (set by Windows)
      username: process.env.SITE_USERNAME!,
      password: process.env.SITE_PASSWORD!,
    },
  },
  browser_session: new BrowserSession({
    browser_profile: new BrowserProfile({
      allowed_domains: ['*.example.com'],
    }),
  }),
});
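
Masking can be pictured as replacing each secret value with a named placeholder before text reaches logs or the LLM. The sketch below is a simplified illustration, not the library's implementation: `maskSecrets` is hypothetical, and the `<secret>name</secret>` placeholder format is an assumption.

```typescript
// Hypothetical helper: replaces every secret value with a named placeholder.
function maskSecrets(text: string, secrets: Record<string, string>): string {
  // Longest values first, so a short secret that is a substring of a longer
  // one cannot leave part of the longer secret exposed.
  const entries = Object.entries(secrets)
    .filter(([, value]) => value.length > 0)
    .sort((a, b) => b[1].length - a[1].length);

  let masked = text;
  for (const [name, value] of entries) {
    // split/join replaces all occurrences without regex escaping concerns.
    masked = masked.split(value).join(`<secret>${name}</secret>`);
  }
  return masked;
}

const logLine = maskSecrets('Typing hunter2 into #password as alice', {
  username: 'alice',
  password: 'hunter2',
});
```

The model then only ever sees the placeholder names, and the real values are substituted back at the moment an action types them into the page.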

See Security Guide for production deployment best practices.

📚 Documentation

| Document | Description |
| ---------------------------------------- | ------------------------------------ |
| Quick Start | Get started in 5 minutes |
| Architecture | System design and component overview |
| API Reference | Complete API documentation |
| Configuration | All configuration options |
| LLM Providers | Provider setup and comparison |
| Actions | Built-in and custom actions |
| MCP Server | MCP integration guide |
| Security | Security best practices |
| Examples | More code examples |
| Contributing | Contribution guidelines |

🛠️ Development

# Install dependencies
npm install

# Build
npm run build

# Run tests
npm test

# Lint & format
npm run lint
npm run prettier

# Type checking
npm run typecheck

# Run an example
npx tsx examples/simple-search.ts

Requirements

  • Node.js >= 18.0.0
  • LLM API Key — At least one supported provider
  • Playwright — Installed automatically as a dependency

📄 License

MIT © Web LLM