selenium-ai-agent
v3.0.0
Published
AI-powered Selenium MCP server for browser automation — 74 tools with accessibility tree discovery, test generation, self-healing, and Selenium Grid parallel execution for AI agents
Downloads
2,503
Maintainers
Readme
selenium-ai-agent
AI-powered Selenium MCP server for browser automation — 75 tools with accessibility tree discovery, selector teaching, BiDi cross-browser support, Selenium Grid parallel execution, test generation & self-healing pipeline, and session tracing.
One-Click Install
Install
npm install -g selenium-ai-agentOr run directly without installing:
npx selenium-ai-agentRequirements
- Node.js 18+
- Chrome browser (or Firefox/Edge)
- ChromeDriver is automatically managed by selenium-webdriver
Quick Start
Add to your MCP client config:
{
"mcpServers": {
"selenium-mcp": {
"command": "npx",
"args": ["selenium-ai-agent"]
}
}
}Then ask your AI assistant: "Navigate to https://example.com and take a screenshot"
Client Setup
Claude Code
claude mcp add selenium-mcp -- npx selenium-ai-agentOr add to your project .mcp.json:
{
"mcpServers": {
"selenium-mcp": {
"command": "npx",
"args": ["selenium-ai-agent"]
}
}
}Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
"mcpServers": {
"selenium-mcp": {
"command": "npx",
"args": ["selenium-ai-agent"]
}
}
}Config paths per OS:
- macOS:
~/Library/Application Support/Claude/claude_desktop_config.json - Windows:
%APPDATA%\Claude\claude_desktop_config.json - Linux:
~/.config/Claude/claude_desktop_config.json
Cursor
Add to .cursor/mcp.json (project) or ~/.cursor/mcp.json (global):
{
"mcpServers": {
"selenium-mcp": {
"command": "npx",
"args": ["selenium-ai-agent"]
}
}
}GitHub Copilot (VS Code 1.99+)
Add to .vscode/mcp.json:
{
"servers": {
"selenium-mcp": {
"command": "npx",
"args": ["selenium-ai-agent"],
"type": "stdio"
}
}
}Note: Copilot uses
"servers"instead of"mcpServers".
Cline
Open the MCP Servers panel in Cline, click Configure, then Advanced MCP Settings, and add:
{
"mcpServers": {
"selenium-mcp": {
"command": "npx",
"args": ["selenium-ai-agent"]
}
}
}Windsurf
Add to ~/.codeium/windsurf/mcp_config.json (global) or .windsurf/mcp_config.json (project):
{
"mcpServers": {
"selenium-mcp": {
"command": "npx",
"args": ["selenium-ai-agent"]
}
}
}Tool Auto-Approval (Reducing "Yes" Prompts)
All tools include MCP annotations (readOnlyHint, destructiveHint, etc.) that help clients auto-approve safe tools. Read-only tools like capture_page, recording_status, and grid_status are marked as non-destructive and can be auto-approved by clients that support annotations.
Claude Desktop
After the first approval, click "Always allow" for each tool to stop future prompts. Tools marked readOnlyHint: true may be auto-approved by the client.
Claude Code
Use --allow-mcp selenium-mcp to pre-approve all tools from this server:
claude --allow-mcp selenium-mcpOr configure in .claude/settings.json:
{
"permissions": {
"allow": ["mcp__selenium-mcp__*"]
}
}Cursor / Cline / Windsurf
These clients typically allow you to configure auto-approval per tool or per server in their settings. Check your client's MCP settings for "auto-approve" or "always allow" options.
Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| SELENIUM_GRID_URL | — | Grid hub URL (enables parallel features) |
| SELENIUM_BROWSER | chrome | Browser to use (chrome, firefox, edge) |
| SELENIUM_HEADLESS | false | Run browser in headless mode |
| SELENIUM_STEALTH | false | Enable stealth mode (hide automation indicators) |
| SELENIUM_MCP_OUTPUT_MODE | stdout | Output mode: stdout (return data to LLM) or file (save to disk) |
| SELENIUM_MCP_OUTPUT_DIR | auto | Output directory for generated files (auto-detected from project root) |
| SELENIUM_MCP_SAVE_TRACE | false | Save session trace JSON to <output>/traces/ |
| SELENIUM_MCP_UNRESTRICTED_FILES | false | Bypass workspace path validation (allow writing outside output dir) |
| SE_AVOID_STATS | — | Set to true to disable Selenium usage statistics |
Pass env vars in your MCP config:
{
"mcpServers": {
"selenium-mcp": {
"command": "npx",
"args": ["selenium-ai-agent"],
"env": {
"SELENIUM_HEADLESS": "true",
"SELENIUM_STEALTH": "true",
"SE_AVOID_STATS": "true"
}
}
}
}CLI Flags
npx selenium-ai-agent [flags]| Flag | Description |
|------|-------------|
| --stealth | Enable stealth mode |
| --headless | Run browser headless |
| --save-trace | Save session trace JSON |
| --output-mode=stdout\|file | Set output mode |
| --output-dir=<path> | Set output directory |
| --grid-url=<url> | Set Selenium Grid hub URL |
| --allow-unrestricted-file-access | Bypass workspace file path validation |
Tools (75)
Navigation (5)
| Tool | Description |
|------|-------------|
| navigate_to | Navigate the browser to a URL. Starts browser automatically if not running. |
| go_back | Navigate back in browser history. |
| go_forward | Navigate forward in browser history. |
| refresh_page | Refresh the current page. |
| scroll_page | Scroll the page in a direction (up/down/left/right) by pixel amount, or scroll a specific element into view by CSS selector. |
Page Analysis (2)
| Tool | Description |
|------|-------------|
| capture_page | Capture the current page state as an accessibility tree — returns elements with ARIA roles, semantic hierarchy, and refs (e1, e2, ...). Discovers up to 300 elements with visibility-aware selectors, ancestor scoping, and Shadow DOM traversal. Read-only. |
| take_screenshot | Take a screenshot (viewport, full-page, or element). Uses BiDi when available for full-page/element screenshots, falls back to classic API. Params: origin (viewport/document), ref (element), format (png/jpeg), quality. |
Elements (5)
| Tool | Description |
|------|-------------|
| click_element | Click an element using its ref from the page snapshot. |
| hover_element | Hover over an element using its ref. |
| select_option | Select a dropdown option by value, text, or index. |
| drag_drop | Drag from one element to another using refs. |
| teach_selector | Teach the system a preferred CSS selector for an element. Saved as Phase 0 (highest priority) in future element discovery on that domain. Auto-scopes to site-wide for header/nav/footer elements, or path-specific for content. |
Input (3)
| Tool | Description |
|------|-------------|
| input_text | Type text into an input field or textarea. |
| key_press | Press a keyboard key, optionally with modifiers (ctrl, alt, shift, meta). |
| file_upload | Upload a file through a file input element. |
Mouse (3)
| Tool | Description |
|------|-------------|
| mouse_move | Move mouse to specific coordinates. |
| mouse_click | Click at coordinates with specified button (left, right, middle). |
| mouse_drag | Drag from one position to another. |
Tabs (4)
| Tool | Description |
|------|-------------|
| tab_list | List all open browser tabs with titles and URLs. Read-only. |
| tab_select | Switch to a specific browser tab. |
| tab_new | Open a new browser tab, optionally navigating to a URL. |
| tab_close | Close a specific browser tab. |
Verification (4)
| Tool | Description |
|------|-------------|
| verify_element_visible | Verify that an element is visible on the page (with timeout). Read-only. |
| verify_text_visible | Verify that specific text is visible on the page (with timeout). Read-only. |
| verify_value | Verify that an input element has the expected value. Read-only. |
| verify_list_visible | Verify that multiple text items are all visible on the page. Read-only. |
Browser (7)
| Tool | Description |
|------|-------------|
| wait_for | Wait for a condition: element visible, clickable, present, URL contains, or title contains. |
| execute_javascript | Execute JavaScript code in the browser context with optional return value. |
| resize_window | Resize the browser window to specified dimensions. |
| dialog_handle | Handle browser dialogs (alert, confirm, prompt). |
| console_logs | Get or clear browser console logs. Uses BiDi event collector when available for cross-browser support, falls back to classic log API. |
| network_monitor | Monitor network requests: get requests, clear, or toggle offline mode. |
| pdf_generate | Generate a PDF from the current page. Uses BiDi printPage for cross-browser support (Chrome, Firefox, Edge), falls back to CDP. Params: format, landscape, scale, pageRanges. Optional filePath — omit to return as base64 resource. |
Session (3)
| Tool | Description |
|------|-------------|
| close_browser | Close the browser and end the session. |
| reset_session | Reset the browser session (close and restart). |
| set_stealth_mode | Enable/disable stealth mode — hides navigator.webdriver, patches plugins, sets realistic languages. |
Recording (4)
| Tool | Description |
|------|-------------|
| start_recording | Start recording browser actions for test script generation. |
| stop_recording | Stop recording and return full action log with element locators and framework hint. |
| recording_status | Check if recording is active and show recent actions. Read-only. |
| clear_recording | Clear all recorded browser actions. |
Test Planner (3)
| Tool | Description |
|------|-------------|
| planner_setup_page | Initialize test planning — navigate to app and start exploring. |
| planner_explore_page | Explore a page in detail, discovering elements, forms, and links. |
| planner_save_plan | Save completed test plan to a markdown file. |
Test Generator (6)
| Tool | Description |
|------|-------------|
| generator_setup_page | Initialize test generation session — navigate to app, start recording, set framework. |
| generator_read_log | Retrieve the action log from the recording session. Read-only. |
| generator_write_test | Save generated test code and update .test-manifest.json. Supports verify (validates selectors against live page) and specFile (links to spec). |
| generator_write_seed | Write a seed/bootstrap test (auth, fixtures, env setup) and register in manifest under seedTests[]. |
| generator_save_spec | Save a structured markdown spec to <output>/specs/. |
| generator_read_spec | Read a spec file. Read-only. |
Test Healer (5)
| Tool | Description |
|------|-------------|
| healer_run_tests | Execute tests and return output. Supports manifest mode (reads .test-manifest.json) or explicit mode (provide command + args). Runs seed tests first when present. |
| healer_debug_test | Run a single test in verbose mode with detailed output (15KB stdout, 8KB stderr). |
| healer_fix_test | Apply a fix to a test file with .bak backup. Supports verify (validates selectors in fixed code). |
| healer_inspect_page | Inspect current page against expected locators — reports found, missing, and changed elements with suggested updated locators. Use after test failure to understand UI drift. |
| browser_generate_locator | Generate robust locator strategy for an element by description. Read-only. |
Regression Analyzer (6)
| Tool | Description |
|------|-------------|
| analyzer_setup | Initialize regression analysis session with product URL and business context. |
| analyzer_import_context | Import additional context from files, inline text, or URLs. |
| analyzer_scan_product | Explore product using process walking and page scanning. |
| analyzer_build_risk_profile | Build risk profile from discovered features and context. Read-only. |
| analyzer_save_profile | Save risk profile to YAML or JSON file. |
| analyzer_generate_documentation | Generate product discovery documentation with screenshots. |
Batch (1)
| Tool | Description |
|------|-------------|
| batch_execute | Execute up to 20 tool steps in a single round trip. Intermediate steps skip snapshots for speed. |
Grid Management (4)
| Tool | Description |
|------|-------------|
| grid_status | Check Grid status — nodes, browsers, capacity. Read-only. |
| grid_start | Start Selenium Grid via Docker Compose with configurable Chrome/Firefox node counts. |
| grid_stop | Stop Selenium Grid. |
| grid_scale | Scale Grid to desired number of browser nodes. |
Grid Sessions (5)
| Tool | Description |
|------|-------------|
| session_create | Create a new browser session on the Grid. |
| session_select | Select a grid session as active browser for all subsequent tool calls. |
| session_list | List all active Grid sessions, optionally filtered by tags. Read-only. |
| session_destroy | Destroy a specific Grid session. |
| session_destroy_all | Destroy all Grid sessions, optionally filtered by tags. |
Grid Parallel Execution (3)
| Tool | Description |
|------|-------------|
| parallel_explore | Explore multiple URLs in parallel — each target gets its own Grid session. |
| parallel_execute | Execute multiple task sequences in parallel across Grid sessions. |
| planner_generate_plan | Generate structured test plan from parallel exploration results. |
Grid Exploration Analysis (2)
| Tool | Description |
|------|-------------|
| exploration_merge | Merge multiple exploration results, deduplicate pages, build site map. Read-only. |
| exploration_diff | Compare two exploration results — find added, removed, and changed pages. Read-only. |
Expectation System
Every tool accepts an optional expectation parameter to control what data is included in the response:
{
"expectation": {
"includeSnapshot": true,
"includeConsole": true,
"includeNetwork": true,
"snapshotOptions": { "selector": "#main", "maxLength": 5000 },
"consoleOptions": { "levels": ["error", "warn"], "maxMessages": 10 },
"diffOptions": { "enabled": true, "format": "unified" }
}
}| Option | Description |
|--------|-------------|
| includeSnapshot | Include page snapshot (element list) in the response |
| includeConsole | Include browser console logs |
| includeNetwork | Include network request summary (requires BiDi) |
| snapshotOptions.selector | CSS selector to scope element discovery |
| snapshotOptions.maxLength | Truncate snapshot text at this length |
| consoleOptions.levels | Filter by log level: error, warn, info, log |
| diffOptions.enabled | Return only changes since last snapshot |
| diffOptions.format | Diff format: minimal or unified |
Each tool category has sensible defaults (e.g., navigation tools include snapshot, verification tools don't).
BiDi Cross-Browser Features
The server uses WebDriver BiDi protocol (always enabled) for cross-browser features that go beyond what the classic WebDriver API offers:
- Full-page screenshots —
take_screenshotwithorigin: "document"captures the entire scrollable page, not just the viewport - Element screenshots —
take_screenshotwithref: "e5"captures a specific element - Cross-browser PDF —
pdf_generateworks on Chrome, Firefox, and Edge (was Chrome-only with CDP) - Console events —
console_logsuses BiDiLogInspectorfor real-time console events across all browsers - Network monitoring — BiDi network events provide request/response tracking
- Stealth mode — Injects preload scripts via BiDi
script.addPreloadScriptto mask automation indicators
BiDi features degrade gracefully — if a browser doesn't support a specific BiDi feature, the tool falls back to the classic API.
Selector Teaching & Hints
The teach_selector tool lets you override auto-computed selectors with your own preferred CSS selectors. Taught selectors are persisted to <output>/selector-hints.json and loaded as Phase 0 (highest priority) during element discovery on matching pages.
How It Works
- Call
teach_selectorwith a description and CSS selector while on the page - The selector is validated in-browser (must match exactly 1 visible element)
- Scope is auto-determined: header/nav/footer elements default to site-wide (
*), content elements default to the current path pattern - On subsequent page snapshots, matching hints are loaded and used before any auto-computation
Example
teach_selector({
description: "the NL language link",
css: "a[href='/nl/']",
scope: "*" // optional — auto-determined if omitted
})Hints file structure (selector-hints.json):
{
"example.com": {
"*": [
{ "css": "a[href='/nl/']", "tag": "a", "text": "NL" }
],
"/blog/*": [
{ "css": "#post-title", "tag": "h1", "text": "My Post" }
]
}
}Element Discovery
The server uses a 16-phase selector computation engine that produces human-readable, semantically meaningful CSS and XPath selectors for every discovered element.
Selector Priority (Phases 0–16)
| Phase | Strategy | Example |
|-------|----------|---------|
| 0 | Taught hints | User-taught a[href="/nl/"] |
| 1 | By ID | #login-form |
| 2 | By test ID | [data-testid="submit-btn"] |
| 2b | By descendant test ID | form:has([data-testid="email"]) |
| 3 | By role + name | button[aria-label="Close"] |
| 4 | By label | //label[normalize-space()='Email']//input |
| 5 | By placeholder | input[placeholder="Search..."] |
| 6 | By text | //a[normalize-space()='Sign In'] |
| 7 | By attribute | a[hreflang="nl"], img[alt="Logo"] |
| 9 | By ARIA role | [role="dialog"] |
| 10 | By state | dialog[open], [aria-expanded] |
| 11 | By table cell | #data-table > tbody > tr:nth-child(2) > td:nth-child(3) |
| 12 | By compound attrs | input[type="email"][name="user"] |
| 13 | By semantic class | button.primary-action |
| 14 | By position | #sidebar > ul > li:nth-of-type(3) |
| 15 | By text (loose) | //span[contains(normalize-space(),'Welcome')] |
| 16 | By positional index | (//button[normalize-space()='Save'])[2] |
Key Capabilities
- Visibility-aware — only visible elements are counted for uniqueness, preventing hidden duplicates from causing fallbacks to fragile selectors
- Ancestor scoping — when a selector isn't unique globally, it's scoped to the nearest ancestor with an ID, test attribute, or landmark (
nav[aria-label="Main"] a[href="/"]) - Shadow DOM — traverses open shadow roots, scopes CSS within shadow boundaries, uses
>>>notation for cross-boundary selectors - Two-pass discovery — semantic elements (links, buttons, headings) get refs first; generic elements with test attributes fill remaining budget
- Non-semantic class filtering — auto-skips CSS-in-JS hashes, Tailwind utilities, and framework-generated classes
Test Generation & Healing Pipeline
The generator and healer tools form a complete test automation pipeline:
1. Plan
planner_setup_page → planner_explore_page → planner_save_plan2. Record & Generate
generator_setup_page → [interact with app] → stop_recording → generator_write_test- Recording captures actions with element locators (id, name, text, aria-label)
generator_write_testvalidates selectors against the live page before saving- A
.test-manifest.jsonis created alongside tests with framework, run command, and test list
3. Heal
healer_run_tests → healer_inspect_page → healer_fix_test → healer_run_testshealer_run_testsreads.test-manifest.jsonto auto-discover how to run testshealer_inspect_pagecompares expected locators against the live page to find UI drifthealer_fix_testvalidates selectors in the fixed code before writing- Seed tests (auth, fixtures) are run automatically before the main test when registered in the manifest
Spec Files
Save structured requirements as markdown specs before generating tests:
generator_save_spec → generator_write_test (with specFile param)Session Tracing
Enable tracing to record every tool call and result as structured JSON:
npx selenium-ai-agent --save-traceOr via env var:
{
"env": { "SELENIUM_MCP_SAVE_TRACE": "true" }
}Traces are saved to <output>/traces/session-<timestamp>.json on session close. Each trace entry records:
- Tool name and parameters
- Result content and error status
- Timestamps for performance analysis
Workspace Isolation
By default, all file-writing tools (screenshots, PDFs, test files, plans, analyzer output) validate that paths resolve within the output directory. This prevents accidental writes to system paths.
To override (e.g., for CI/CD or trusted environments):
npx selenium-ai-agent --allow-unrestricted-file-accessThe healer_fix_test tool is exempt — it modifies existing project test files by design.
Selenium Grid
For parallel browser automation across multiple browsers, set SELENIUM_GRID_URL:
{
"mcpServers": {
"selenium-mcp": {
"command": "npx",
"args": ["selenium-ai-agent"],
"env": {
"SELENIUM_GRID_URL": "http://localhost:4444"
}
}
}
}Quick Start with Docker Compose
The project includes a Docker Compose file for local Grid setup:
# Start Grid with 4 Chrome + 1 Firefox nodes
grid_start
# Or use docker-compose directly
docker compose up -dParallel Workflows
Parallel exploration — explore multiple sections of a site simultaneously:
session_create (x3) → parallel_explore → exploration_mergeParallel execution — run test steps across browsers:
session_create (chrome + firefox) → parallel_executeCross-browser testing — same actions on different browsers:
session_create (chrome) → session_create (firefox) → parallel_executeSee the project README for Docker Compose setup and Grid architecture details.
Output Mode
Control how binary data (screenshots, PDFs) is returned:
| Mode | Behavior |
|------|----------|
| stdout (default) | Return base64-encoded data to the LLM for inline display |
| file | Save to disk in <output>/screenshots/ or <output>/pdfs/ |
npx selenium-ai-agent --output-mode=fileArchitecture
selenium-mcp-server/src/
├── server.ts # MCP server, tool routing, expectation system, tracing
├── context.ts # Browser session state, EventCollector, SessionTracer
├── types.ts # Core types (ToolResult, BrowserConfig, Expectation, Grid types)
├── types/
│ └── manifest.ts # Shared test manifest types (generator ↔ healer)
├── bidi/
│ ├── event-collector.ts # BiDi event subscriptions (console, network, navigation)
│ └── index.ts
├── trace/
│ ├── session-tracer.ts # Tool call + result recording
│ └── index.ts
├── utils/
│ ├── bidi-helpers.ts # BiDi WebSocket URL rewriting + context factory
│ ├── chrome-options.ts # Chrome options builder + stealth scripts
│ ├── element-discovery/ # Accessibility tree discovery (e1-e300)
│ │ ├── index.ts # Barrel exports
│ │ ├── discover.ts # discoverElements() with selector hints
│ │ ├── selector-scripts.ts # Browser-side computeSelector() (15 phases)
│ │ ├── tree-scripts.ts # Browser-side accessibility tree walker
│ │ ├── format-tree.ts # formatAccessibilityTree() (full/smart/minimal)
│ │ └── element-scripts.ts # extractElementInfo(), findElementByInfo()
│ ├── selector-hints.ts # Persistent domain-scoped selector hint storage
│ ├── paths.ts # Output directory resolution
│ ├── sandbox.ts # Workspace path validation
│ ├── selector-validation.ts # Extract + validate selectors from test code
│ ├── schema.ts # Zod → JSON Schema converter
│ └── docker.ts # Docker Compose helpers
├── grid/
│ ├── grid-client.ts # Grid REST API client
│ ├── grid-session.ts # Remote browser session
│ ├── session-pool.ts # Session lifecycle management
│ ├── session-context.ts # Context adapter for grid sessions
│ └── exploration-coordinator.ts
└── tools/ # 75 tools grouped by domain
├── base.ts # BaseTool abstract class + MCP annotations
├── index.ts # Tool registry
├── navigation/ # navigate_to, go_back, go_forward, refresh_page, scroll_page
├── page/ # capture_page, take_screenshot
├── elements/ # click, hover, select, drag_drop, teach_selector
├── input/ # input_text, key_press, file_upload
├── mouse/ # mouse_move, mouse_click, mouse_drag
├── tabs/ # tab_list, tab_select, tab_new, tab_close
├── verification/ # verify_element_visible, verify_text, verify_value, verify_list
├── browser/ # wait, javascript, resize, dialog, console, network, pdf
├── session/ # close_browser, reset_session, set_stealth_mode
├── recording/ # start, stop, status, clear
├── agents/ # planner, generator, healer, spec tools
├── analyzer/ # setup, import, scan, risk, save, documentation
├── batch/ # batch_execute
└── grid/ # 14 grid management + parallel execution toolsLicense
MIT
