claude-web-client
v1.0.0
Published
Browser automation for AI assistants using Playwright's accessibility tree - giving Claude local browsing capabilities without vision models
Claude Web Client
A professional browser automation tool that gives Claude AI local web browsing capabilities through code, not vision. Built on Playwright and Puppeteer, it provides both a CLI for AI assistants and a programmatic API for JavaScript/TypeScript applications.
Key Innovation: Uses Playwright's accessibility tree to bridge the gap between visual UIs and text-based AI - giving Claude a semantic understanding of web pages without needing vision capabilities!
⚠️ Important: What This Is (And Isn't)
✅ What This IS:
- Cost-effective web interaction - Uses local compute + text-based APIs instead of expensive vision models
- Code-native browsing - Playwright returns semantic page structure as code/JSON that Claude natively understands
- Semantic layout understanding - Claude can understand element roles, hierarchy, and interactions through the accessibility tree
- Text and structure extraction - Perfect for scraping, form filling, navigation, data extraction
- Professional automation - Built on battle-tested tools (Playwright/Puppeteer) used by thousands of companies
❌ What This ISN'T:
- NOT multimodal vision browsing - Claude cannot "see" images, charts, or visual layouts
- NOT for visual content - Cannot describe what images look like or understand visual design
- NOT pixel-perfect understanding - Works with semantic structure, not visual appearance
- NOT a replacement for vision models - For visual tasks, you still need Claude with vision capabilities
🎯 Best Use Cases:
- Web scraping and data extraction
- Form automation and testing
- Navigation and interaction with web apps
- Reading text content from pages
- Monitoring websites for changes
- E2E testing without visual validation
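For the monitoring use case, one option is to save the output of the `text` command on each run and compare it with the previous snapshot. The sketch below is a minimal line-level comparison. `diffSnapshots` is a hypothetical helper, not part of claude-web-client, and it treats each snapshot as a set of lines, so reordered or duplicated lines are not reported as changes.

```javascript
// Hypothetical helper (not part of claude-web-client): compares two plain-text
// page snapshots, e.g. the output of `claude-web-client text` from two runs.
function diffSnapshots(previousText, currentText) {
  // Normalize: split into lines, trim whitespace, drop empty lines.
  const toLines = (text) =>
    text.split('\n').map((line) => line.trim()).filter((line) => line.length > 0);

  const before = new Set(toLines(previousText));
  const after = new Set(toLines(currentText));

  // Lines only in the new snapshot were added; lines only in the old one were removed.
  const added = [...after].filter((line) => !before.has(line));
  const removed = [...before].filter((line) => !after.has(line));

  return { changed: added.length > 0 || removed.length > 0, added, removed };
}

// Example with made-up snapshots.
const report = diffSnapshots('Price: $10\nIn stock', 'Price: $12\nIn stock');
console.log(report.changed, report.added, report.removed);
```

A set-based comparison is deliberately coarse; a real monitor would probably want a proper diff algorithm and some filtering of timestamps or other noisy content.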
💡 Cost Reduction Strategy:
Instead of: Screenshot → Vision Model → $$$
We use: Accessibility Tree → Text Tokens → 10-100x cheaper
This is about making web automation cost-effective by using professional tools that return code, Claude's native language.
Features
- 🤖 CLI for AI assistants - Simple commands Claude can execute
- 📦 Programmatic API - Import and use in your own scripts
- 🧠 Accessibility Tree - Playwright's semantic page representation for LLM understanding
- 🎭 Dual Engine Support - Use Playwright (recommended) or Puppeteer
- 🔒 Type-safe - Full TypeScript support with type definitions
- 🚀 Easy to use - Works with both JavaScript and TypeScript
- 🌐 Full browser control - Navigate, interact, execute JS, take screenshots
- 📸 Screenshot support - Capture full pages or viewports
- 🎯 Element interaction - Click, type, and wait for elements
- 📄 Content extraction - Get HTML, text, or structured accessibility data
Quick Start
New to using this with Claude? → Read the Claude Usage Guide
Installation
Local Installation
npm install
Global Installation
npm install -g .
After global installation, you can use the claude-web-client command from anywhere:
claude-web-client launch
claude-web-client navigate https://example.com
claude-web-client text
Usage
claude-web-client can be used in two ways:
- CLI Mode - For AI assistants like Claude to execute browser commands
- Programmatic Mode - Import as a library in your JavaScript/TypeScript projects
🤖 CLI Usage (For Claude AI)
The CLI provides various commands to interact with the browser:
Launch Browser
node bin/cli.js launch # Launch in headless mode
node bin/cli.js launch --no-headless # Launch with visible browser
Navigate to URL
node bin/cli.js navigate https://example.com
Take Screenshot
node bin/cli.js screenshot screenshot.png
node bin/cli.js screenshot --type jpeg output.jpg
Get Page Content
node bin/cli.js content # Get HTML content
node bin/cli.js text # Get text content only
Execute JavaScript
node bin/cli.js execute "document.title"
node bin/cli.js execute "document.querySelectorAll('a').length"
Interact with Elements
node bin/cli.js click "#submit-button"
node bin/cli.js type "#search-input" "search query"
node bin/cli.js wait ".loading-complete"
Get Page Info
node bin/cli.js info
Close Browser
node bin/cli.js close
Example: Claude Browsing the Web
When I (Claude) need to browse the web, I can use these commands:
# Launch browser
claude-web-client launch
# Navigate to a page
claude-web-client navigate https://news.ycombinator.com
# Get the page text to read it
claude-web-client text
# Execute JavaScript to extract specific data
claude-web-client execute "Array.from(document.querySelectorAll('.titleline > a')).slice(0,5).map(a => a.textContent)"
# Take a screenshot
claude-web-client screenshot hn-screenshot.png
# Close when done
claude-web-client close
Or run the demo script:
bash examples/claude-demo.sh
📦 Programmatic Usage (For Scripts)
Import UnifiedBrowser (or the Puppeteer-only BrowserService) into your JavaScript or TypeScript projects:
JavaScript Example (Recommended: Playwright)
import { UnifiedBrowser } from 'claude-web-client';
const browser = new UnifiedBrowser('playwright'); // or 'puppeteer'
// Launch and navigate
await browser.launch({ headless: true });
await browser.navigate('https://example.com');
// Get accessibility tree - semantic page representation for AI!
const { tree } = await browser.getAccessibilityTree();
console.log('Page structure:', tree);
// Get page content
const { text } = await browser.getText();
console.log(text);
// Execute JavaScript
const result = await browser.execute('document.title');
console.log(result.result);
// Take screenshot
await browser.screenshot('./screenshot.png');
// Clean up
await browser.close();
Legacy Puppeteer Example
import { BrowserService } from 'claude-web-client';
const browser = new BrowserService();
await browser.launch();
// ... same API as above (but no accessibility tree)
await browser.close();
TypeScript Example
import { BrowserService, NavigateResult } from 'claude-web-client';
async function scrapeWebsite(url: string): Promise<string> {
  const browser = new BrowserService();
  try {
    await browser.launch({ headless: true });
    const navResult: NavigateResult = await browser.navigate(url);
    if (navResult.success) {
      const { text } = await browser.getText();
      return text || '';
    }
    throw new Error(navResult.error);
  } finally {
    await browser.close();
  }
}
Run the Examples
# Basic usage example (Puppeteer)
node examples/basic-usage.js
# Form interaction example (Puppeteer)
node examples/form-interaction.js "Claude AI"
# 🌟 Playwright with accessibility tree - THE GAME CHANGER
node examples/playwright-accessibility.js
# TypeScript example (requires ts-node)
npx ts-node examples/typescript-example.ts
🧠 Why Accessibility Tree Matters for AI
Traditional browser automation requires AI to either:
- See screenshots (requires vision models, expensive, slow)
- Parse raw HTML (messy, includes styles/scripts, hard to understand)
- Rely on selectors (brittle, requires knowing page structure)
Playwright's Accessibility Tree provides a better way:
- ✅ Semantic structure - Clean representation of what's actually on the page
- ✅ Text-based - Perfect for LLMs like Claude that work with text/code
- ✅ Fast - No image processing needed
- ✅ Reliable - Based on ARIA and semantic HTML
- ✅ Actionable - Includes roles, names, and values of interactive elements
What Claude CAN do with the accessibility tree:
- ✅ Understand semantic structure (headings, buttons, links, forms)
- ✅ Read all text content and labels
- ✅ Interpret element roles and relationships
- ✅ Find interactive elements by purpose
- ✅ Navigate and interact with forms
- ✅ Understand general page layout hierarchy
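Finding interactive elements by purpose can be sketched as a recursive search over the tree. The code below assumes nodes shaped like the example output in this README (`role`, `name`, and an optional `children` array). `findByRole` is a hypothetical helper for illustration, not part of the claude-web-client API.

```javascript
// Hypothetical helper: collect every node with the given role, optionally
// filtered by a case-insensitive substring of its accessible name.
function findByRole(node, role, nameContains = '') {
  const matches = [];
  const needle = nameContains.toLowerCase();

  const visit = (current) => {
    if (!current) return;
    const name = (current.name || '').toLowerCase();
    if (current.role === role && name.includes(needle)) matches.push(current);
    // Recurse into children; leaf nodes have no `children` array.
    (current.children || []).forEach((child) => visit(child));
  };

  visit(node);
  return matches;
}

// Made-up tree for illustration.
const sampleTree = {
  role: 'WebArea',
  name: 'Login',
  children: [
    { role: 'textbox', name: 'Email' },
    { role: 'button', name: 'Sign in' },
    { role: 'button', name: 'Cancel' },
  ],
};

console.log(findByRole(sampleTree, 'button', 'sign').map((n) => n.name)); // [ 'Sign in' ]
```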
What Claude CANNOT do:
- ❌ See images or describe visual content
- ❌ Understand colors, fonts, or visual styling
- ❌ Detect visual layouts or spatial positioning
- ❌ Read text embedded in images
- ❌ Understand charts, graphs, or diagrams visually
The key insight: Claude understands the semantic meaning of page elements, not their visual appearance.
Example accessibility tree output:
{
  "role": "WebArea",
  "name": "Example Page",
  "children": [
    {
      "role": "heading",
      "name": "Welcome",
      "level": 1
    },
    {
      "role": "button",
      "name": "Click me",
      "focused": false
    }
  ]
}
Claude can read this and understand "there's a heading that says Welcome and a button I can click"!
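To make the tree even cheaper to pass to an LLM, it could be flattened into an indented outline before being placed in a prompt. This is a minimal sketch under the assumption that nodes carry `role`, `name`, an optional `level`, and optional `children`, as in the example above. `toOutline` is a hypothetical helper, not something the package is known to export.

```javascript
// Hypothetical helper: turn an accessibility tree into an indented text outline.
function toOutline(node, depth = 0) {
  const indent = '  '.repeat(depth);
  const label = node.name ? ` "${node.name}"` : '';
  const level = node.level ? ` (level ${node.level})` : '';
  const line = `${indent}${node.role}${label}${level}`;
  // Render children one level deeper; leaf nodes have no `children` array.
  const childLines = (node.children || []).map((child) => toOutline(child, depth + 1));
  return [line, ...childLines].join('\n');
}

// Same data as the example output above.
const exampleTree = {
  role: 'WebArea',
  name: 'Example Page',
  children: [
    { role: 'heading', name: 'Welcome', level: 1 },
    { role: 'button', name: 'Click me', focused: false },
  ],
};

console.log(toOutline(exampleTree));
// WebArea "Example Page"
//   heading "Welcome" (level 1)
//   button "Click me"
```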
📚 API Reference
BrowserService Methods
All methods return promises that resolve to result objects:
| Method | Parameters | Returns | Description |
|--------|-----------|---------|-------------|
| launch(options?) | LaunchOptions | ActionResult | Launch browser instance |
| navigate(url) | string | NavigateResult | Navigate to URL |
| screenshot(path?, options?) | string?, ScreenshotOptions? | ScreenshotResult | Take screenshot |
| getContent() | - | ContentResult | Get page HTML |
| getText() | - | TextResult | Get page text |
| execute(script) | string | ExecuteResult | Execute JavaScript |
| click(selector) | string | ActionResult | Click element |
| type(selector, text) | string, string | ActionResult | Type into input |
| waitForSelector(selector, timeout?) | string, number? | ActionResult | Wait for element |
| getPageInfo() | - | PageInfo | Get page title/URL |
| close() | - | ActionResult | Close browser |
See src/browser-service.d.ts for full TypeScript definitions.
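The TypeScript example earlier checks `navResult.success` and reads `navResult.error`. Assuming the other result types follow the same `{ success, error }` shape (an assumption; check src/browser-service.d.ts), a small helper could turn failed results into exceptions so that calling code does not have to test every result by hand. `unwrap` is hypothetical and is shown here with stand-in objects rather than a live browser.

```javascript
// Hypothetical helper: return the result if it succeeded, otherwise throw.
// Assumes a `{ success: boolean, error?: string }` shape, as seen on
// NavigateResult in the TypeScript example; other result types may differ.
function unwrap(result, action) {
  if (!result || result.success !== true) {
    const reason = (result && result.error) || 'unknown error';
    throw new Error(`${action} failed: ${reason}`);
  }
  return result;
}

// Stand-in result objects for illustration (no browser involved).
const okResult = { success: true, text: 'Hello' };
const failedResult = { success: false, error: 'net::ERR_NAME_NOT_RESOLVED' };

console.log(unwrap(okResult, 'getText').text); // Hello
try {
  unwrap(failedResult, 'navigate');
} catch (err) {
  console.log(err.message); // navigate failed: net::ERR_NAME_NOT_RESOLVED
}
```

With the real API this would presumably be used as `unwrap(await browser.navigate(url), 'navigate')`.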
🏗 Architecture
claude-web-client/
├── src/
│   ├── browser-service.js     # Core Puppeteer wrapper
│   ├── browser-service.d.ts   # TypeScript definitions
│   └── index.js               # Public API exports
├── bin/
│   └── cli.js                 # CLI interface
└── examples/
    ├── basic-usage.js         # Basic JS example
    ├── form-interaction.js    # Form interaction example
    ├── typescript-example.ts  # TypeScript example
    └── claude-demo.sh         # CLI demo for Claude
💰 Cost Analysis: Why This Matters
Running Costs Comparison
| Solution | Where It Runs | Cost per 1000 Interactions | Speed |
|----------|--------------|---------------------------|-------|
| Anthropic Web Search | Cloud | ~$5-10 (vision model calls) | Moderate |
| Vision-based browsing | Cloud | ~$10-20 (screenshot uploads) | Slow |
| This Tool (Local) | Your machine | $0.10-0.50 (text tokens only) | Fast |
| This Tool (EC2 t3.micro) | AWS | ~$0.01/hour + tokens | Fast |
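To project these per-1000 rates onto your own workload, a small calculation like the following may help. It is only a sketch: the rates are copied from the table above, while the 30-day month and the 100 interactions/day volume are assumptions you should replace with your own numbers. `monthlyCost` is a hypothetical helper.

```javascript
// Projects a monthly cost range from a per-1000-interactions price range.
function monthlyCost(interactionsPerDay, costPer1000, daysPerMonth = 30) {
  const monthlyInteractions = interactionsPerDay * daysPerMonth;
  const scale = monthlyInteractions / 1000;
  return { low: costPer1000.low * scale, high: costPer1000.high * scale };
}

// Rates copied from the table above (USD per 1000 interactions).
const visionRate = { low: 10, high: 20 };
const localRate = { low: 0.1, high: 0.5 };

// Assumed workload: 100 interactions per day.
const vision = monthlyCost(100, visionRate);
const local = monthlyCost(100, localRate);

console.log(`Vision-based: $${vision.low.toFixed(2)}-$${vision.high.toFixed(2)} per month`);
console.log(`This tool:    $${local.low.toFixed(2)}-$${local.high.toFixed(2)} per month`);
```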
The Economics
Traditional approach:
- Screenshot: 1000+ image tokens (~$0.01-0.02 per screenshot)
- Vision processing: Slow (upload + process)
- Data leaves your machine
Our approach:
- Accessibility tree: ~100 text tokens (~$0.0001)
- Text processing: Fast (local compute)
- Data stays local
Real-world example (using the per-1000 rates from the table above):
- 100 web interactions/day, about 3,000/month
- Vision-based browsing: ~$30-60/month
- This tool: ~$0.30-1.50/month (roughly 20-200x cheaper)
Deployment Options
Local (Best for development)
- Install on your machine
- Zero compute cost
- Full privacy
EC2 Instance (Best for production)
- t3.micro (~$7/month)
- Run 24/7 for your team
- Still 10-50x cheaper than cloud browsing
Docker Container
- Deploy anywhere
- Scale as needed
- Predictable costs
Community-Driven Battle Testing
This is open source because:
- Community contributions make it better
- Battle-tested by real users
- Faster bug fixes and features
- Cost savings benefit everyone
As more people use and contribute:
- More edge cases handled
- Better reliability
- More examples and use cases
- Becomes the standard for AI web automation
The Vision: Make web browsing for AI assistants accessible and affordable for everyone, not just those who can afford expensive cloud solutions.
🚀 Future Enhancements
- [ ] Multiple page/tab support
- [ ] Session persistence
- [ ] Cookie management
- [ ] Network request interception
- [ ] PDF generation
- [ ] Mobile device emulation
- [ ] Performance metrics collection
- [ ] Proxy support
- [ ] Custom headers and user agents
License
MIT
