better-playwright-mcp2
v3.0.1
Better Playwright MCP v2 - Based on Microsoft's playwright-mcp with HTTP server
better-playwright-mcp
A better Playwright MCP (Model Context Protocol) server that uses a client-server architecture for browser automation.
Why Better?
Traditional browser automation tools send entire page HTML to AI assistants, which quickly exhausts token limits and makes complex web interactions impractical. better-playwright-mcp solves this with an innovative semantic snapshot algorithm that reduces page content by up to 90% while preserving all meaningful elements.
The Problem
- Full page HTML often exceeds 100K tokens
- Most HTML is noise: inline styles, tracking scripts, invisible elements
- AI assistants have limited context windows; even a 200K-token window fills quickly
- Complex web automation becomes impossible after just a few page loads
Our Solution: Semantic Snapshots
Our core innovation is a multi-stage pruning algorithm that:
- Identifies meaningful elements: interactive elements (buttons, inputs), semantic HTML5 tags, and text-containing elements
- Generates unique identifiers: each element gets a hash-based xp attribute derived from its XPath for precise targeting
- Removes invisible content: elements with display:none, zero dimensions, or hidden parents are marked and removed
- Unwraps useless wrappers: eliminates divs and spans that only wrap other elements
- Strips unnecessary attributes: keeps only essential attributes like href, value, placeholder
Result: A clean, semantic representation that typically uses about 10% of the original tokens while keeping every element needed to read and interact with the page.
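The pruning stages above can be illustrated with a minimal sketch. It is not the project's extractor: the `SketchNode` shape, the `prune` and `render` functions, and the `hidden` flag (standing in for the display:none and zero-size checks) are all assumptions made for illustration, and identifier generation is left out.

```typescript
// Hypothetical node shape for illustration; the real extractor works on parsed HTML.
interface SketchNode {
  tag: string;
  attrs: { [name: string]: string };
  text: string;      // the node's own direct text
  hidden: boolean;   // stands in for display:none / zero-dimension checks
  children: SketchNode[];
}

const KEPT_ATTRS = ["href", "value", "placeholder"];
const WRAPPER_TAGS = ["div", "span"];

// Returns the pruned replacement for a node: [] if it is removed,
// its pruned children if it is unwrapped, or one cleaned node otherwise.
function prune(node: SketchNode): SketchNode[] {
  if (node.hidden) return []; // a hidden parent takes its whole subtree with it

  const children: SketchNode[] = [];
  for (const child of node.children) {
    children.push(...prune(child));
  }

  // A div/span with no text of its own only wraps other elements: unwrap it.
  // If it has no surviving children either, this drops it entirely.
  const isBareWrapper =
    WRAPPER_TAGS.indexOf(node.tag) !== -1 && node.text.trim() === "";
  if (isBareWrapper) return children;

  // Keep only the essential attributes.
  const attrs: { [name: string]: string } = {};
  for (const name of Object.keys(node.attrs)) {
    if (KEPT_ATTRS.indexOf(name) !== -1) attrs[name] = node.attrs[name];
  }
  return [{ tag: node.tag, attrs, text: node.text.trim(), hidden: false, children }];
}

// Renders one line per node, indenting children by two spaces.
function render(nodes: SketchNode[], indent = ""): string {
  const lines: string[] = [];
  for (const node of nodes) {
    let line = indent + node.tag;
    for (const name of Object.keys(node.attrs)) line += ` ${name}=${node.attrs[name]}`;
    if (node.text) line += ` ${node.text}`;
    lines.push(line);
    if (node.children.length > 0) lines.push(render(node.children, indent + "  "));
  }
  return lines.join("\n");
}

const page: SketchNode = {
  tag: "div", attrs: { class: "wrapper", style: "padding: 20px" }, text: "", hidden: false,
  children: [
    { tag: "span", attrs: {}, text: "tracking pixel", hidden: true, children: [] },
    { tag: "a", attrs: { href: "/docs", class: "nav-link" }, text: "Docs", hidden: false, children: [] },
    { tag: "div", attrs: {}, text: "", hidden: false, children: [
      { tag: "button", attrs: { onclick: "handleClick()", style: "color: white" }, text: " Click me ", hidden: false, children: [] },
    ] },
  ],
};

const snapshotText = render(prune(page));
console.log(snapshotText);
// a href=/docs Docs
// button Click me
```

Returning an array from `prune` lets one function express all three outcomes (remove, unwrap, keep), which keeps the recursion short.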
Architecture
This project implements a unique two-tier architecture:
- MCP Server - Communicates with AI assistants via Model Context Protocol
- HTTP Server - Runs in the background to control the actual browser instances
AI Assistant <--[MCP Protocol]--> MCP Server <--[HTTP]--> HTTP Server <---> Browser
This design allows the MCP server to remain lightweight while delegating browser control to a dedicated HTTP service.
Features
- 🎯 Up to 90% token reduction through semantic HTML snapshots
- 🎭 Full Playwright browser automation via MCP
- 🏗️ Client-server architecture for better separation of concerns
- 🛡️ Stealth mode to avoid detection
- 📍 Hash-based element identifiers for precise targeting
- 💾 Persistent browser profiles
- 🚀 Optimized for long-running automation tasks
- 📊 Token-aware output with automatic truncation
- 📄 Save processed HTML to files for external processing
Installation
Global Installation (for CLI usage)
npm install -g better-playwright-mcp
Local Installation (for SDK usage)
npm install better-playwright-mcp
Usage
As a JavaScript/TypeScript SDK
You can use the PlaywrightClient SDK programmatically in your Node.js applications:
Prerequisites:
First, start the HTTP server:
npx better-playwright-mcp@latest server
Then use the SDK in your code:
import { PlaywrightClient } from 'better-playwright-mcp';

async function automateWebPage() {
  // Connect to the HTTP server (must be running)
  const client = new PlaywrightClient('http://localhost:3102');

  // Create a page; the call also returns an initial snapshot
  const { pageId, snapshot: initialSnapshot } = await client.createPage(
    'my-page',            // page name
    'Test page',          // description
    'https://example.com' // URL
  );

  // Save the processed HTML to a file
  const result = await client.pageToHtmlFile(pageId); // trim: true by default
  console.log('HTML saved to:', result.filePath);
  // Returns: { filePath: "/tmp/page-abc123.html", fileSize: 12345, trimmed: true, ... }

  // Save original HTML without trimming
  const resultNoTrim = await client.pageToHtmlFile(pageId, false);
  // Returns original HTML without redundant element removal

  // Get accessibility tree snapshot
  const accessibilitySnapshot = await client.getAccessibilitySnapshot(pageId);
  console.log('Accessibility tree:', accessibilitySnapshot.data);
  // Returns accessibility tree with roles, names, and hierarchy

  // Get only interesting nodes (default)
  const snapshot1 = await client.getAccessibilitySnapshot(pageId, { interestingOnly: true });

  // Get full accessibility tree
  const snapshot2 = await client.getAccessibilitySnapshot(pageId, { interestingOnly: false });

  // Get a semantic snapshot (with xp references)
  // Named pageSnapshot so it does not redeclare the snapshot returned by createPage
  const pageSnapshot = await client.getPageSnapshot(pageId);
  console.log(pageSnapshot);
  // Returns simplified HTML like:
  // div xp=6204242d
  //   h1 xp=3fed137b Example Domain
  //   p xp=070e2633 This domain is for use...

  // Interact with the page using xp references from snapshot
  await client.browserClick(pageId, '3fed137b'); // Click the h1 element
  await client.browserType(pageId, '070e2633', 'Hello World', true); // Type and submit

  // Take screenshots
  const screenshot = await client.getScreenshot(pageId, { fullPage: true });

  // Clean up
  await client.closePage(pageId);
}
Available Methods:
- Page Management: createPage, closePage, listPages, activatePage
- Navigation: browserNavigate, browserNavigateBack, browserNavigateForward
- Interaction: browserClick, browserType, browserHover, browserSelectOption
- Snapshots: getPageSnapshot, getAccessibilitySnapshot, pageToHtmlFile, getScreenshot, getPDFSnapshot
- Utilities: waitForTimeout, waitForSelector, scrollToBottom, scrollToTop
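Interaction methods such as browserClick take an xp reference, so a script usually has to look one up in the snapshot text first. The helper below is a minimal sketch of that lookup. `parseSnapshot` and `findXp` are hypothetical names, not part of the SDK, and the code assumes the snapshot is a string whose lines look like "tag xp=hash optional text", as in the sample output above. Lines that carry other attributes before xp are not handled.

```typescript
interface SnapshotEntry {
  tag: string;
  xp: string;
  text: string;
}

// Assumes each snapshot line has the form "<tag> xp=<hex hash> <optional text>".
function parseSnapshot(snapshot: string): SnapshotEntry[] {
  const entries: SnapshotEntry[] = [];
  for (const rawLine of snapshot.split("\n")) {
    const match = /^\s*(\S+)\s+xp=([0-9a-f]+)\s*(.*)$/.exec(rawLine);
    if (match) {
      entries.push({ tag: match[1], xp: match[2], text: match[3].trim() });
    }
  }
  return entries;
}

// Returns the xp reference of the first element with the given tag whose
// text contains textFragment, or undefined when nothing matches.
function findXp(snapshot: string, tag: string, textFragment: string): string | undefined {
  for (const entry of parseSnapshot(snapshot)) {
    if (entry.tag === tag && entry.text.indexOf(textFragment) !== -1) {
      return entry.xp;
    }
  }
  return undefined;
}

const sample = [
  "div xp=6204242d",
  "  h1 xp=3fed137b Example Domain",
  "  p xp=070e2633 This domain is for use...",
].join("\n");

const headingRef = findXp(sample, "h1", "Example");
console.log(headingRef); // 3fed137b
```

The returned reference can then be passed to a call such as client.browserClick(pageId, headingRef), after checking that it is not undefined.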
Default Mode (MCP)
The MCP server requires an HTTP server to be running. You need to start both:
Step 1: Start the HTTP server
npx better-playwright-mcp@latest server
Step 2: In another terminal, start the MCP server
npx better-playwright-mcp@latest
The MCP server will:
- Start listening on stdio for MCP protocol messages
- Connect to the HTTP server on port 3102
- Route browser automation commands through the HTTP server
Options:
- --snapshot-dir <path>: Directory to save snapshots
Standalone HTTP Server Mode
You can also run the HTTP server independently (useful for debugging or custom integrations):
npx better-playwright-mcp@latest server
Options:
- -p, --port <number>: Server port (default: 3102)
- --host <string>: Server host (default: localhost)
- --headless: Run browser in headless mode
- --chromium: Use Chromium instead of Chrome
- --no-user-profile: Do not use persistent user profile
- --user-data-dir <path>: User data directory
- --snapshot-dir <path>: Directory to save snapshots
MCP Tools
When used with AI assistants, the following tools are available:
Page Management
- createPage: Create a new browser page with name and description
- activatePage: Activate a specific page by ID
- closePage: Close a specific page
- listPages: List all managed pages with titles and URLs
- closeAllPages: Close all managed pages
- listPagesWithoutId: List unmanaged browser pages
- closePagesWithoutId: Close all unmanaged pages
- closePageByIndex: Close page by index
Browser Actions
- browserClick: Click an element using its xp identifier
- browserType: Type text into an element
- browserHover: Hover over an element
- browserSelectOption: Select options in a dropdown
- browserPressKey: Press keyboard keys
- browserFileUpload: Upload files to a file input
- browserHandleDialog: Handle browser dialogs (alert, confirm, prompt)
- browserNavigate: Navigate to a URL
- browserNavigateBack: Go back to the previous page
- browserNavigateForward: Go forward to the next page
- scrollToBottom: Scroll to the bottom of the page or element
- scrollToTop: Scroll to the top of the page or element
- waitForTimeout: Wait for the specified number of milliseconds
- waitForSelector: Wait for an element to appear
Snapshot & Utilities
- getPageSnapshot: Get a semantic HTML snapshot with xp identifiers
- getAccessibilitySnapshot: Get an accessibility tree snapshot of the page
- getScreenshot: Take a screenshot (PNG/JPEG)
- getPDFSnapshot: Generate a PDF of the page
- getElementHTML: Get the HTML of a specific element
- pageToHtmlFile: Save processed page HTML to a temporary file
- downloadImage: Download an image from a URL
- captureSnapshot: Capture the full page with automatic scrolling
How It Works
Semantic Snapshots in Action
Before (original HTML):
<div class="wrapper" style="padding: 20px; margin: 10px;">
  <div class="container">
    <div class="inner">
      <button class="btn btn-primary" onclick="handleClick()"
              style="background: blue; color: white;">
        Click me
      </button>
    </div>
  </div>
</div>
After (semantic snapshot):
button xp=3fa2b8c1 Click me
The algorithm:
- Removes unnecessary wrapper divs
- Strips inline styles and event handlers
- Adds a unique identifier (the xp attribute), a hash of the element's XPath
- Preserves only meaningful content
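To make the identifier step concrete, here is a minimal sketch of turning an XPath into an 8-character hex id. The source only says that xp is a hash of the element's XPath; the choice of 32-bit FNV-1a, the function name `xpId`, and the example paths are assumptions for illustration, so the ids printed here will not match those the server produces.

```typescript
// 32-bit FNV-1a over the XPath's UTF-16 code units (equivalent to bytes for ASCII paths).
// Assumption: the real project may use a different hash function.
function xpId(xpath: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < xpath.length; i++) {
    hash ^= xpath.charCodeAt(i);
    // Multiply by the FNV prime 0x01000193 using shifts, then wrap to 32 bits.
    hash =
      (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  // Left-pad to a fixed width so every id has 8 hex characters.
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Hypothetical XPaths for two sibling buttons.
const firstButton = xpId("/html/body/div[1]/div[1]/div[1]/button[1]");
const secondButton = xpId("/html/body/div[1]/div[1]/div[1]/button[2]");
console.log(firstButton, secondButton);
```

Because the id depends only on the path, the same element gets the same id across snapshots as long as the page structure above it does not change.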
Diff-Based Optimization
To reduce data transfer and token usage:
- First snapshot is always complete
- Subsequent snapshots only include changes (diffs)
- Automatic caching for performance
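The behaviour described above (full snapshot first, changes afterwards) can be sketched with a small per-page cache. This is an illustration under stated assumptions, not the project's implementation: `SnapshotCache`, `diffSnapshots`, and the added/removed diff format are hypothetical, and the diff is a simple line-set comparison that ignores ordering.

```typescript
interface SnapshotDiff {
  added: string[];   // lines present now but not in the previous snapshot
  removed: string[]; // lines present before but missing now
}

// Line-set diff: quadratic, but fine for a sketch on small snapshots.
function diffSnapshots(previous: string, current: string): SnapshotDiff {
  const prevLines = previous.split("\n");
  const currLines = current.split("\n");
  return {
    added: currLines.filter(line => prevLines.indexOf(line) === -1),
    removed: prevLines.filter(line => currLines.indexOf(line) === -1),
  };
}

class SnapshotCache {
  private last: { [pageId: string]: string } = {};

  // Returns the full snapshot on the first call for a page, a diff afterwards.
  next(pageId: string, snapshot: string): string | SnapshotDiff {
    const seenBefore = Object.prototype.hasOwnProperty.call(this.last, pageId);
    const previous = this.last[pageId];
    this.last[pageId] = snapshot;
    return seenBefore ? diffSnapshots(previous, snapshot) : snapshot;
  }
}

const cache = new SnapshotCache();
const firstResult = cache.next("cart", "h1 xp=aaaa1111 Cart\np xp=bbbb2222 0 items");
const secondResult = cache.next("cart", "h1 xp=aaaa1111 Cart\np xp=bbbb2222 1 item");
console.log(firstResult);  // the complete first snapshot
console.log(secondResult); // only the changed line appears under added/removed
```

Keying lines by their full text means a text change shows up as one removed and one added line with the same xp id, which is enough for a consumer to spot the update.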
Stealth Features
Browser instances are configured with:
- Custom user agent strings
- Disabled automation indicators
- WebGL vendor spoofing
- Canvas fingerprint protection
Examples
Creating and Navigating Pages
// MCP Tool Usage
{
  "tool": "createPage",
  "arguments": {
    "name": "shopping",
    "description": "Amazon shopping page",
    "url": "https://amazon.com"
  }
}
// Returns: { pageId: "uuid", snapshot: "..." }
Interacting with Elements
// Click on element using its xp identifier
{
  "tool": "browserClick",
  "arguments": {
    "pageId": "uuid",
    "ref": "3fa2b8c1" // The xp attribute value from snapshot
  }
}
// Type text into input field
{
  "tool": "browserType",
  "arguments": {
    "pageId": "uuid",
    "ref": "xp456",
    "text": "search query",
    "submit": true // Press Enter after typing
  }
}
Capturing Page State
// Get semantic snapshot
{
  "tool": "getPageSnapshot",
  "arguments": {
    "pageId": "uuid"
  }
}
// Take screenshot
{
  "tool": "getScreenshot",
  "arguments": {
    "pageId": "uuid",
    "fullPage": true,
    "type": "png"
  }
}
// Save processed HTML to file
{
  "tool": "pageToHtmlFile",
  "arguments": {
    "pageId": "uuid",
    "trim": true // Optional, default: true (removes redundant elements)
  }
}
// Returns: { filePath: "/tmp/page-abc123.html", fileSize: 12345, trimmed: true, ... }
Development
Prerequisites
- Node.js >= 18.0.0
- TypeScript
- Chrome or Chromium browser
Building from Source
# Clone the repository
git clone https://github.com/yourusername/better-playwright-mcp.git
cd better-playwright-mcp
# Install dependencies
npm install
# Build the project
npm run build
# Run in development mode
npm run dev
Project Structure
better-playwright-mcp/
├── src/
│   ├── index.ts                    # MCP mode entry point
│   ├── server.ts                   # HTTP server mode entry point
│   ├── playwright-mcp.ts           # MCP server implementation
│   ├── client/
│   │   └── playwright-client.ts    # HTTP client for MCP→HTTP communication
│   ├── server/
│   │   └── playwright-server.ts    # HTTP server controlling browsers
│   ├── extractor/
│   │   ├── parse2.ts               # HTML parsing with xp identifier generation
│   │   ├── simplify-html.ts        # HTML simplification
│   │   └── utils.ts                # Extraction utilities
│   └── utils/
│       └── token-limiter.ts        # Token counting and limiting
├── bin/
│   └── cli.js                      # CLI entry point
├── package.json
├── tsconfig.json
├── CLAUDE.md                       # Instructions for AI assistants
└── README.md
Troubleshooting
Common Issues
MCP server not connecting
- Ensure the HTTP server is accessible on port 3102
- Check firewall settings
- Try running with DEBUG=* npx better-playwright-mcp
Browser not launching
- Ensure Chrome or Chromium is installed
- Try using the --chromium flag
- Check system resources
Token limit exceeded
- Snapshots are automatically truncated to 20,000 tokens
- Use targeted selectors to reduce snapshot size
- Consider using a screenshot instead of a snapshot for visual inspection
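For readers who want to see what truncating to a token budget might involve, here is a minimal sketch. It is not the project's token limiter: the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer, and `estimateTokens` and `truncateToTokens` are hypothetical names. The sketch cuts at line boundaries so no snapshot line is left half-written.

```typescript
const CHARS_PER_TOKEN = 4; // rough heuristic, not a real tokenizer

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Keeps whole lines from the start of the text until the budget is used up.
function truncateToTokens(
  text: string,
  maxTokens: number
): { text: string; truncated: boolean } {
  if (estimateTokens(text) <= maxTokens) return { text, truncated: false };

  const kept: string[] = [];
  let used = 0;
  for (const line of text.split("\n")) {
    const cost = estimateTokens(line + "\n"); // count the line break too
    if (used + cost > maxTokens) break;
    kept.push(line);
    used += cost;
  }
  return { text: kept.join("\n"), truncated: true };
}

// Ten identical 17-character lines cost 5 estimated tokens each.
const rows: string[] = [];
for (let i = 0; i < 10; i++) rows.push("p xp=00000001 row");

const limited = truncateToTokens(rows.join("\n"), 12);
console.log(limited.truncated, limited.text.split("\n").length); // true 2
```

With the 20,000-token limit mentioned above, the call would be truncateToTokens(snapshot, 20000). Rounding each line up separately makes the estimate conservative, so the kept text never exceeds the budget under this heuristic.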
Debug Mode
Enable detailed logging:
DEBUG=* npx better-playwright-mcp
Logs and Records
Operation records are saved to:
- macOS/Linux: /tmp/playwright-records/
- Windows: %TEMP%\playwright-records\
Each page has its own directory with timestamped operation logs.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT
