claude-web-client

v1.0.0

Published

3 months ago

Browser automation for AI assistants using Playwright's accessibility tree - giving Claude local browsing capabilities without vision models

noodleofdeath

claude ai accessibility playwright puppeteer browser-automation chromium web-scraping ai-agent llm anthropic claude-code a11y semantic-web

Claude Web Client

A browser automation tool that gives Claude local web browsing capabilities through code rather than vision. Built on Playwright and Puppeteer, it provides both a CLI for AI assistants and a programmatic API for JavaScript/TypeScript applications.

Key Innovation: Uses Playwright's accessibility tree to bridge the gap between visual UIs and text-based AI - giving Claude a semantic understanding of web pages without needing vision capabilities!

⚠️ Important: What This Is (And Isn't)

✅ What This IS:

Cost-effective web interaction - Uses local compute + text-based APIs instead of expensive vision models
Code-native browsing - Playwright returns semantic page structure as code/JSON that Claude natively understands
Semantic layout understanding - Claude can understand element roles, hierarchy, and interactions through accessibility tree
Text and structure extraction - Perfect for scraping, form filling, navigation, data extraction
Professional automation - Built on battle-tested tools (Playwright/Puppeteer) used by thousands of companies

❌ What This ISN'T:

NOT multimodal vision browsing - Claude cannot "see" images, charts, or visual layouts
NOT for visual content - Cannot describe what images look like or understand visual design
NOT pixel-perfect understanding - Works with semantic structure, not visual appearance
NOT a replacement for vision models - For visual tasks, you still need Claude with vision capabilities

🎯 Best Use Cases:

Web scraping and data extraction
Form automation and testing
Navigation and interaction with web apps
Reading text content from pages
Monitoring websites for changes
E2E testing without visual validation

💡 Cost Reduction Strategy:

Instead of: Screenshot → Vision Model → $$$
We use: Accessibility Tree → Text Tokens → 10-100x cheaper

The goal is cost-effective web automation: established tools return text and structured data, which Claude handles natively.

Features

🤖 CLI for AI assistants - Simple commands Claude can execute
📦 Programmatic API - Import and use in your own scripts
🧠 Accessibility Tree - Playwright's semantic page representation for LLM understanding
🎭 Dual Engine Support - Use Playwright (recommended) or Puppeteer
🔒 Type-safe - Full TypeScript support with type definitions
🚀 Easy to use - Works with both JavaScript and TypeScript
🌐 Full browser control - Navigate, interact, execute JS, take screenshots
📸 Screenshot support - Capture full pages or viewports
🎯 Element interaction - Click, type, and wait for elements
📄 Content extraction - Get HTML, text, or structured accessibility data

Quick Start

New to using this with Claude? → Read the Claude Usage Guide

Installation

Local Installation

npm install

Global Installation

npm install -g .

After global installation, you can use the claude-web-client command from anywhere:

claude-web-client launch
claude-web-client navigate https://example.com
claude-web-client text

Usage

claude-web-client can be used in two ways:

CLI Mode - For AI assistants like Claude to execute browser commands
Programmatic Mode - Import as a library in your JavaScript/TypeScript projects

🤖 CLI Usage (For Claude AI)

The CLI provides various commands to interact with the browser:

Launch Browser

node bin/cli.js launch          # Launch in headless mode
node bin/cli.js launch --no-headless  # Launch with visible browser

Navigate to URL

node bin/cli.js navigate https://example.com

Take Screenshot

node bin/cli.js screenshot screenshot.png
node bin/cli.js screenshot --type jpeg output.jpg

Get Page Content

node bin/cli.js content   # Get HTML content
node bin/cli.js text      # Get text content only

Execute JavaScript

node bin/cli.js execute "document.title"
node bin/cli.js execute "document.querySelectorAll('a').length"

Interact with Elements

node bin/cli.js click "#submit-button"
node bin/cli.js type "#search-input" "search query"
node bin/cli.js wait ".loading-complete"

Get Page Info

node bin/cli.js info

Close Browser

node bin/cli.js close

Example: Claude Browsing the Web

When I (Claude) need to browse the web, I can use these commands:

# Launch browser
claude-web-client launch

# Navigate to a page
claude-web-client navigate https://news.ycombinator.com

# Get the page text to read it
claude-web-client text

# Execute JavaScript to extract specific data
claude-web-client execute "Array.from(document.querySelectorAll('.titleline > a')).slice(0,5).map(a => a.textContent)"

# Take a screenshot
claude-web-client screenshot hn-screenshot.png

# Close when done
claude-web-client close

Or run the demo script:

bash examples/claude-demo.sh
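
The same command sequence can also be scripted from Node. The sketch below is illustrative only: `runSession` and the `run` callback are hypothetical names, and it assumes that each CLI command prints its result to stdout and that the browser session persists between separate invocations, as the commands above imply. The runner is injected so the flow can be tried without a browser.

```javascript
// Drive one browsing session through an injected runner.
// `run(args)` receives the CLI arguments (e.g. ['navigate', url]) and
// returns whatever the command printed. `close` is always attempted.
function runSession(url, run) {
  const outputs = {};
  run(['launch']);
  try {
    run(['navigate', url]);
    outputs.text = run(['text']);
    outputs.title = run(['execute', 'document.title']);
  } finally {
    run(['close']);
  }
  return outputs;
}

// Dry run: print the commands instead of executing them.
const dryRun = (args) => {
  console.log('claude-web-client', args.join(' '));
  return '';
};
runSession('https://example.com', dryRun);

// A real runner might shell out to the CLI (assumption: it is on PATH):
//   const { execFileSync } = require('node:child_process');
//   const run = (args) =>
//     execFileSync('claude-web-client', args, { encoding: 'utf8' });
```

Keeping `close` in a `finally` block means a failed navigation does not leave a browser process behind.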

📦 Programmatic Usage (For Scripts)

Import UnifiedBrowser (Playwright or Puppeteer engine) or the legacy BrowserService (Puppeteer only) into your JavaScript or TypeScript projects:

JavaScript Example (Recommended: Playwright)

import { UnifiedBrowser } from 'claude-web-client';

const browser = new UnifiedBrowser('playwright'); // or 'puppeteer'

// Launch and navigate
await browser.launch({ headless: true });
await browser.navigate('https://example.com');

// Get accessibility tree - semantic page representation for AI!
const { tree } = await browser.getAccessibilityTree();
console.log('Page structure:', tree);

// Get page content
const { text } = await browser.getText();
console.log(text);

// Execute JavaScript
const result = await browser.execute('document.title');
console.log(result.result);

// Take screenshot
await browser.screenshot('./screenshot.png');

// Clean up
await browser.close();
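
If a step in the example above throws, `close()` is never reached. One way to guard against that is a small wrapper; `withBrowser` is a hypothetical helper, not part of the package, and it only assumes the `launch`, `navigate` and `close` methods shown above.

```javascript
// Launch, navigate, run `task`, and always close the browser afterwards.
// `browser` is any object exposing launch(), navigate() and close(),
// such as a UnifiedBrowser instance.
async function withBrowser(browser, url, task) {
  try {
    await browser.launch({ headless: true });
    await browser.navigate(url);
    return await task(browser);
  } finally {
    await browser.close();
  }
}

// Usage with the class from the example above:
//   const { text } = await withBrowser(
//     new UnifiedBrowser('playwright'),
//     'https://example.com',
//     (b) => b.getText()
//   );
```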

Legacy Puppeteer Example

import { BrowserService } from 'claude-web-client';

const browser = new BrowserService();
await browser.launch();
// ... same API as above (but no accessibility tree)
await browser.close();

TypeScript Example

import { BrowserService, NavigateResult } from 'claude-web-client';

async function scrapeWebsite(url: string): Promise<string> {
  const browser = new BrowserService();

  try {
    await browser.launch({ headless: true });

    const navResult: NavigateResult = await browser.navigate(url);

    if (navResult.success) {
      const { text } = await browser.getText();
      return text || '';
    }

    throw new Error(navResult.error);
  } finally {
    await browser.close();
  }
}

Run the Examples

# Basic usage example (Puppeteer)
node examples/basic-usage.js

# Form interaction example (Puppeteer)
node examples/form-interaction.js "Claude AI"

# 🌟 Playwright with accessibility tree - THE GAME CHANGER
node examples/playwright-accessibility.js

# TypeScript example (requires ts-node)
npx ts-node examples/typescript-example.ts
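
The example files themselves are not reproduced here. As a rough idea of what a form flow could look like with the documented `type`, `click` and `waitForSelector` methods, here is a sketch; `fillAndSubmit` and all selectors are hypothetical, and it assumes each ActionResult carries `success` and `error` fields like the NavigateResult in the TypeScript example.

```javascript
// Type into several inputs, click a submit control, then wait for a
// selector that signals the result has rendered.
// `fields` maps CSS selectors to the text to type into them.
async function fillAndSubmit(browser, fields, submitSelector, doneSelector) {
  for (const [selector, value] of Object.entries(fields)) {
    const typed = await browser.type(selector, value);
    if (!typed.success) throw new Error(`type ${selector}: ${typed.error}`);
  }
  const clicked = await browser.click(submitSelector);
  if (!clicked.success) {
    throw new Error(`click ${submitSelector}: ${clicked.error}`);
  }
  return browser.waitForSelector(doneSelector);
}

// Usage (selectors are placeholders):
//   await fillAndSubmit(browser, { '#search-input': 'Claude AI' },
//     '#submit-button', '.loading-complete');
```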

🧠 Why Accessibility Tree Matters for AI

Traditional browser automation requires the AI to do one of the following:

See screenshots (requires vision models, expensive, slow)
Parse raw HTML (messy, includes styles/scripts, hard to understand)
Rely on selectors (brittle, requires knowing page structure)

Playwright's Accessibility Tree provides a better way:

✅ Semantic structure - Clean representation of what's actually on the page
✅ Text-based - Perfect for LLMs like Claude that work with text/code
✅ Fast - No image processing needed
✅ Reliable - Based on ARIA and semantic HTML
✅ Actionable - Includes roles, names, and values of interactive elements

What Claude CAN do with the accessibility tree:

✅ Understand semantic structure (headings, buttons, links, forms)
✅ Read all text content and labels
✅ Interpret element roles and relationships
✅ Find interactive elements by purpose
✅ Navigate and interact with forms
✅ Understand general page layout hierarchy
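
To make "find interactive elements by purpose" concrete, here is a minimal sketch of a role-based search. It assumes nodes are plain objects with `role`, `name` and optional `children`; `findByRole` and the sample page are hypothetical and not part of the package.

```javascript
// Depth-first search for nodes with a given role and, optionally,
// a case-insensitive substring of their accessible name.
function findByRole(node, role, nameIncludes = '') {
  const matches = [];
  const needle = nameIncludes.toLowerCase();
  const visit = (current) => {
    const name = (current.name ?? '').toLowerCase();
    if (current.role === role && name.includes(needle)) matches.push(current);
    (current.children ?? []).forEach((child) => visit(child));
  };
  visit(node);
  return matches;
}

// Hand-written sample tree for a sign-in page.
const page = {
  role: 'WebArea',
  name: 'Sign in',
  children: [
    { role: 'textbox', name: 'Email' },
    { role: 'textbox', name: 'Password' },
    { role: 'button', name: 'Sign in' },
    { role: 'link', name: 'Forgot password?' },
  ],
};

console.log(findByRole(page, 'textbox').map((n) => n.name)); // [ 'Email', 'Password' ]
console.log(findByRole(page, 'button', 'sign').length); // 1
```

Searching by role and name is what lets the model ask for "the Sign in button" rather than a CSS selector.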

What Claude CANNOT do:

❌ See images or describe visual content
❌ Understand colors, fonts, or visual styling
❌ Detect visual layouts or spatial positioning
❌ Read text embedded in images
❌ Understand charts, graphs, or diagrams visually

The key insight: Claude understands the semantic meaning of page elements, not their visual appearance.

Example accessibility tree output:

{
  "role": "WebArea",
  "name": "Example Page",
  "children": [
    {
      "role": "heading",
      "name": "Welcome",
      "level": 1
    },
    {
      "role": "button",
      "name": "Click me",
      "focused": false
    }
  ]
}

Claude can read this and understand "there's a heading that says Welcome and a button I can click"!
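
A tree like this can be condensed further before it is placed in a prompt. The sketch below renders it as an indented outline; `renderOutline` is a hypothetical helper that reads only the `role`, `name`, `level` and `children` fields shown in the example.

```javascript
// Render an accessibility-tree node as indented outline lines.
function renderOutline(node, depth = 0) {
  const indent = '  '.repeat(depth);
  const name = node.name ? ` "${node.name}"` : '';
  const level = node.level !== undefined ? ` (level ${node.level})` : '';
  const lines = [`${indent}${node.role}${name}${level}`];
  for (const child of node.children ?? []) {
    lines.push(...renderOutline(child, depth + 1));
  }
  return lines;
}

// The example tree from above.
const tree = {
  role: 'WebArea',
  name: 'Example Page',
  children: [
    { role: 'heading', name: 'Welcome', level: 1 },
    { role: 'button', name: 'Click me', focused: false },
  ],
};

console.log(renderOutline(tree).join('\n'));
// WebArea "Example Page"
//   heading "Welcome" (level 1)
//   button "Click me"
```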

📚 API Reference

BrowserService Methods

All methods return promises with result objects:

| Method | Parameters | Returns | Description |
|--------|-----------|---------|-------------|
| launch(options?) | LaunchOptions | ActionResult | Launch browser instance |
| navigate(url) | string | NavigateResult | Navigate to URL |
| screenshot(path?, options?) | string?, ScreenshotOptions? | ScreenshotResult | Take screenshot |
| getContent() | - | ContentResult | Get page HTML |
| getText() | - | TextResult | Get page text |
| execute(script) | string | ExecuteResult | Execute JavaScript |
| click(selector) | string | ActionResult | Click element |
| type(selector, text) | string, string | ActionResult | Type into input |
| waitForSelector(selector, timeout?) | string, number? | ActionResult | Wait for element |
| getPageInfo() | - | PageInfo | Get page title/URL |
| close() | - | ActionResult | Close browser |

See src/browser-service.d.ts for full TypeScript definitions.
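
Because methods resolve with result objects, a failed call may go unnoticed unless the result is inspected. The sketch below shows one way to turn failures into exceptions. `unwrap` is a hypothetical helper, and it assumes every result exposes `success` and, on failure, `error`, which the source only shows for NavigateResult; check the type definitions (PageInfo in particular) before relying on it.

```javascript
// Return the result if it reports success; otherwise throw with context.
function unwrap(result, label) {
  if (!result || result.success !== true) {
    const reason = result && result.error ? result.error : 'unknown error';
    throw new Error(`${label} failed: ${reason}`);
  }
  return result;
}

// Stand-in results for illustration; real ones come from the browser methods.
const ok = unwrap({ success: true, text: 'Hello' }, 'getText');
console.log(ok.text); // Hello

try {
  unwrap({ success: false, error: 'timeout' }, 'navigate');
} catch (err) {
  console.log(err.message); // navigate failed: timeout
}
```

In real use this would wrap a call directly, for example `unwrap(await browser.navigate(url), 'navigate')`.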

🏗 Architecture

claude-web-client/
├── src/
│   ├── browser-service.js      # Core Puppeteer wrapper
│   ├── browser-service.d.ts    # TypeScript definitions
│   └── index.js                # Public API exports
├── bin/
│   └── cli.js                  # CLI interface
└── examples/
    ├── basic-usage.js          # Basic JS example
    ├── form-interaction.js     # Form interaction example
    ├── typescript-example.ts   # TypeScript example
    └── claude-demo.sh          # CLI demo for Claude

💰 Cost Analysis: Why This Matters

Running Costs Comparison

| Solution | Where It Runs | Cost per 1000 Interactions | Speed |
|----------|--------------|---------------------------|-------|
| Anthropic Web Search | Cloud | ~$5-10 (vision model calls) | Moderate |
| Vision-based browsing | Cloud | ~$10-20 (screenshot uploads) | Slow |
| This Tool (Local) | Your machine | $0.10-0.50 (text tokens only) | Fast |
| This Tool (EC2 t3.micro) | AWS | ~$0.01/hour + tokens | Fast |

The Economics

Traditional approach:

Screenshot: 1000+ image tokens (~$0.01-0.02 per screenshot)
Vision processing: Slow (upload + process)
Data leaves your machine

Our approach:

Accessibility tree: ~100 text tokens (~$0.0001)
Text processing: Fast (local compute)
Data stays local

Real-world example:

100 web interactions/day
Traditional: ~$500-1000/month
This tool: $5-10/month (100x cheaper!)
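
For readers who want to plug in their own numbers, here is a small back-of-the-envelope calculator. It uses the per-interaction figures quoted under "The Economics" and assumes exactly one screenshot or one accessibility tree per interaction over a 30-day month; `monthlyCost` is a hypothetical helper.

```javascript
// Monthly cost = interactions per day x days x cost per interaction (USD).
function monthlyCost(perDay, costPerInteraction, days = 30) {
  return perDay * days * costPerInteraction;
}

const perDay = 100;
const screenshotLow = monthlyCost(perDay, 0.01);
const screenshotHigh = monthlyCost(perDay, 0.02);
const treeCost = monthlyCost(perDay, 0.0001);

console.log(`screenshots: $${screenshotLow.toFixed(2)}-$${screenshotHigh.toFixed(2)} per month`);
console.log(`accessibility tree: $${treeCost.toFixed(2)} per month`);
console.log(`ratio: ${Math.round(screenshotLow / treeCost)}x-${Math.round(screenshotHigh / treeCost)}x`);
```

With a single model call per interaction this gives roughly $30-60 versus $0.30 per month. The larger monthly totals quoted above would correspond to workloads with many model calls per interaction, so treat all of these figures as rough estimates and rerun the calculation with your own usage.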

Deployment Options

Local (Best for development)
- Install on your machine
- Zero compute cost
- Full privacy

EC2 Instance (Best for production)
- t3.micro (~$7/month)
- Run 24/7 for your team
- Still 10-50x cheaper than cloud browsing

Docker Container
- Deploy anywhere
- Scale as needed
- Predictable costs

Community-Driven Battle Testing

This is open source because:

Community contributions make it better
Battle-tested by real users
Faster bug fixes and features
Cost savings benefit everyone

As more people use and contribute:

More edge cases handled
Better reliability
More examples and use cases
Becomes the standard for AI web automation

The Vision: Make web browsing for AI assistants accessible and affordable for everyone, not just those who can afford expensive cloud solutions.

🚀 Future Enhancements

[ ] Multiple page/tab support
[ ] Session persistence
[ ] Cookie management
[ ] Network request interception
[ ] PDF generation
[ ] Mobile device emulation
[ ] Performance metrics collection
[ ] Proxy support
[ ] Custom headers and user agents

License

MIT