selenium-ai-agent

v3.0.0


selenium-ai-agent

AI-powered Selenium MCP server for browser automation — 75 tools with accessibility tree discovery, selector teaching, BiDi cross-browser support, Selenium Grid parallel execution, test generation & self-healing pipeline, and session tracing.


Install

npm install -g selenium-ai-agent

Or run directly without installing:

npx selenium-ai-agent

Requirements

  • Node.js 18+
  • Chrome browser (or Firefox/Edge)
  • ChromeDriver is automatically managed by selenium-webdriver

Quick Start

Add to your MCP client config:

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"]
    }
  }
}

Then ask your AI assistant: "Navigate to https://example.com and take a screenshot"

Client Setup

Claude Code

claude mcp add selenium-mcp -- npx selenium-ai-agent

Or add to your project .mcp.json:

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"]
    }
  }
}

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"]
    }
  }
}

Config paths per OS:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

Cursor

Add to .cursor/mcp.json (project) or ~/.cursor/mcp.json (global):

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"]
    }
  }
}

GitHub Copilot (VS Code 1.99+)

Add to .vscode/mcp.json:

{
  "servers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"],
      "type": "stdio"
    }
  }
}

Note: Copilot uses "servers" instead of "mcpServers".
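The difference between the two config shapes can be sketched as a small conversion. The helper name toVsCodeConfig and the interface names are hypothetical and not part of this package; only the "mcpServers", "servers", and "type": "stdio" keys come from the examples above.

```typescript
// Shape used by most clients in this README (Claude, Cursor, Cline, Windsurf).
interface McpServerEntry {
  command: string;
  args: string[];
  env?: { [key: string]: string };
}

interface McpServersConfig {
  mcpServers: { [serverName: string]: McpServerEntry };
}

// Shape used by .vscode/mcp.json: top-level "servers" plus a "type" per entry.
interface VsCodeServerEntry extends McpServerEntry {
  type: "stdio";
}

interface VsCodeConfig {
  servers: { [serverName: string]: VsCodeServerEntry };
}

// Hypothetical helper: converts the common shape into the VS Code shape.
function toVsCodeConfig(config: McpServersConfig): VsCodeConfig {
  const servers: { [serverName: string]: VsCodeServerEntry } = {};
  for (const serverName of Object.keys(config.mcpServers)) {
    const entry = config.mcpServers[serverName];
    servers[serverName] = { ...entry, type: "stdio" };
  }
  return { servers };
}

const commonConfig: McpServersConfig = {
  mcpServers: {
    "selenium-mcp": { command: "npx", args: ["selenium-ai-agent"] },
  },
};

const vsCodeConfig = toVsCodeConfig(commonConfig);
console.log(JSON.stringify(vsCodeConfig, null, 2));
```

This is only useful if you maintain one config and generate the other; writing .vscode/mcp.json by hand works just as well.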

Cline

Open the MCP Servers panel in Cline, click Configure, then Advanced MCP Settings, and add:

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"]
    }
  }
}

Windsurf

Add to ~/.codeium/windsurf/mcp_config.json (global) or .windsurf/mcp_config.json (project):

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"]
    }
  }
}

Tool Auto-Approval (Reducing "Yes" Prompts)

All tools include MCP annotations (readOnlyHint, destructiveHint, etc.) that help clients auto-approve safe tools. Read-only tools like capture_page, recording_status, and grid_status are marked as non-destructive and can be auto-approved by clients that support annotations.
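As a rough illustration of how a client might act on these annotations, the sketch below approves only tools that declare themselves read-only. The canAutoApprove policy and the annotation values in sampleTools are assumptions made for the example; the real values come from the server's tool listing, and each client applies its own rules.

```typescript
// Subset of the MCP tool annotation fields named above.
interface ToolAnnotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
}

interface ToolDescriptor {
  name: string;
  annotations?: ToolAnnotations;
}

// Hypothetical client-side policy: auto-approve only tools that are
// explicitly read-only and not flagged as destructive.
function canAutoApprove(tool: ToolDescriptor): boolean {
  const annotations = tool.annotations;
  if (!annotations) return false;
  return annotations.readOnlyHint === true && annotations.destructiveHint !== true;
}

// Illustrative values only; not copied from the server.
const sampleTools: ToolDescriptor[] = [
  { name: "capture_page", annotations: { readOnlyHint: true, destructiveHint: false } },
  { name: "grid_status", annotations: { readOnlyHint: true, destructiveHint: false } },
  { name: "close_browser", annotations: { readOnlyHint: false, destructiveHint: true } },
  { name: "navigate_to" },
];

const autoApproved = sampleTools.filter(canAutoApprove).map((tool) => tool.name);
console.log(autoApproved);
```

Tools without annotations are treated as needing a prompt here, which is the conservative choice.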

Claude Desktop

After the first approval, click "Always allow" for each tool to stop future prompts. Tools marked readOnlyHint: true may be auto-approved by the client.

Claude Code

Use --allow-mcp selenium-mcp to pre-approve all tools from this server:

claude --allow-mcp selenium-mcp

Or configure in .claude/settings.json:

{
  "permissions": {
    "allow": ["mcp__selenium-mcp__*"]
  }
}

Cursor / Cline / Windsurf

These clients typically allow you to configure auto-approval per tool or per server in their settings. Check your client's MCP settings for "auto-approve" or "always allow" options.


Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| SELENIUM_GRID_URL | (unset) | Grid hub URL (enables parallel features) |
| SELENIUM_BROWSER | chrome | Browser to use (chrome, firefox, edge) |
| SELENIUM_HEADLESS | false | Run browser in headless mode |
| SELENIUM_STEALTH | false | Enable stealth mode (hide automation indicators) |
| SELENIUM_MCP_OUTPUT_MODE | stdout | Output mode: stdout (return data to LLM) or file (save to disk) |
| SELENIUM_MCP_OUTPUT_DIR | auto | Output directory for generated files (auto-detected from project root) |
| SELENIUM_MCP_SAVE_TRACE | false | Save session trace JSON to <output>/traces/ |
| SELENIUM_MCP_UNRESTRICTED_FILES | false | Bypass workspace path validation (allow writing outside output dir) |
| SE_AVOID_STATS | (unset) | Set to true to disable Selenium usage statistics |

Pass env vars in your MCP config:

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"],
      "env": {
        "SELENIUM_HEADLESS": "true",
        "SELENIUM_STEALTH": "true",
        "SE_AVOID_STATS": "true"
      }
    }
  }
}
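To make the defaults in the table concrete, here is a minimal sketch of how such variables could be read into a typed settings object. It is not the package's actual code: readSettings, ServerSettings, and the rule that only the literal string "true" enables a flag are assumptions for illustration.

```typescript
type BrowserName = "chrome" | "firefox" | "edge";
type OutputMode = "stdout" | "file";
type EnvMap = { [name: string]: string | undefined };

interface ServerSettings {
  gridUrl: string | undefined;
  browser: BrowserName;
  headless: boolean;
  stealth: boolean;
  outputMode: OutputMode;
  saveTrace: boolean;
  unrestrictedFiles: boolean;
}

// Assumption: only the literal string "true" switches a boolean option on.
function isEnabled(value: string | undefined): boolean {
  return value === "true";
}

// Hypothetical reader that applies the defaults listed in the table.
// Unknown browser or output-mode values fall back to the default (assumption).
function readSettings(env: EnvMap): ServerSettings {
  const browser = env["SELENIUM_BROWSER"];
  return {
    gridUrl: env["SELENIUM_GRID_URL"],
    browser: browser === "firefox" || browser === "edge" ? browser : "chrome",
    headless: isEnabled(env["SELENIUM_HEADLESS"]),
    stealth: isEnabled(env["SELENIUM_STEALTH"]),
    outputMode: env["SELENIUM_MCP_OUTPUT_MODE"] === "file" ? "file" : "stdout",
    saveTrace: isEnabled(env["SELENIUM_MCP_SAVE_TRACE"]),
    unrestrictedFiles: isEnabled(env["SELENIUM_MCP_UNRESTRICTED_FILES"]),
  };
}

// Same values as the "env" block in the config above.
const demoSettings = readSettings({
  SELENIUM_HEADLESS: "true",
  SELENIUM_STEALTH: "true",
  SE_AVOID_STATS: "true",
});
console.log(demoSettings);
```

Passing the environment in as a plain map keeps the sketch testable; in a Node.js process the map would typically be process.env.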

CLI Flags

npx selenium-ai-agent [flags]

| Flag | Description |
|------|-------------|
| --stealth | Enable stealth mode |
| --headless | Run browser headless |
| --save-trace | Save session trace JSON |
| --output-mode=stdout\|file | Set output mode |
| --output-dir=<path> | Set output directory |
| --grid-url=<url> | Set Selenium Grid hub URL |
| --allow-unrestricted-file-access | Bypass workspace file path validation |
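For readers who want to see how flags of this form (bare switches and --name=value pairs) can be parsed, a minimal sketch follows. parseFlags and CliOptions are hypothetical names; the package's real argument handling, including how it treats unknown flags, may differ.

```typescript
interface CliOptions {
  stealth: boolean;
  headless: boolean;
  saveTrace: boolean;
  allowUnrestrictedFileAccess: boolean;
  outputMode?: "stdout" | "file";
  outputDir?: string;
  gridUrl?: string;
}

// Hypothetical parser for the flags listed above.
function parseFlags(argv: string[]): CliOptions {
  const options: CliOptions = {
    stealth: false,
    headless: false,
    saveTrace: false,
    allowUnrestrictedFileAccess: false,
  };
  for (const arg of argv) {
    // Split "--flag=value" at the first "=" so values may contain "=" themselves.
    const eq = arg.indexOf("=");
    const flagName = eq === -1 ? arg : arg.slice(0, eq);
    const flagValue = eq === -1 ? undefined : arg.slice(eq + 1);
    switch (flagName) {
      case "--stealth": options.stealth = true; break;
      case "--headless": options.headless = true; break;
      case "--save-trace": options.saveTrace = true; break;
      case "--allow-unrestricted-file-access": options.allowUnrestrictedFileAccess = true; break;
      case "--output-mode":
        if (flagValue === "stdout" || flagValue === "file") options.outputMode = flagValue;
        break;
      case "--output-dir":
        if (flagValue) options.outputDir = flagValue;
        break;
      case "--grid-url":
        if (flagValue) options.gridUrl = flagValue;
        break;
      default:
        break; // Assumption: unknown flags are ignored.
    }
  }
  return options;
}

const parsedFlags = parseFlags(["--headless", "--output-mode=file", "--grid-url=http://localhost:4444"]);
console.log(parsedFlags);
```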


Tools (75)

Navigation (5)

| Tool | Description |
|------|-------------|
| navigate_to | Navigate the browser to a URL. Starts browser automatically if not running. |
| go_back | Navigate back in browser history. |
| go_forward | Navigate forward in browser history. |
| refresh_page | Refresh the current page. |
| scroll_page | Scroll the page in a direction (up/down/left/right) by pixel amount, or scroll a specific element into view by CSS selector. |

Page Analysis (2)

| Tool | Description |
|------|-------------|
| capture_page | Capture the current page state as an accessibility tree: returns elements with ARIA roles, semantic hierarchy, and refs (e1, e2, ...). Discovers up to 300 elements with visibility-aware selectors, ancestor scoping, and Shadow DOM traversal. Read-only. |
| take_screenshot | Take a screenshot (viewport, full-page, or element). Uses BiDi when available for full-page/element screenshots, falls back to classic API. Params: origin (viewport/document), ref (element), format (png/jpeg), quality. |

Elements (5)

| Tool | Description |
|------|-------------|
| click_element | Click an element using its ref from the page snapshot. |
| hover_element | Hover over an element using its ref. |
| select_option | Select a dropdown option by value, text, or index. |
| drag_drop | Drag from one element to another using refs. |
| teach_selector | Teach the system a preferred CSS selector for an element. Saved as Phase 0 (highest priority) in future element discovery on that domain. Auto-scopes to site-wide for header/nav/footer elements, or path-specific for content. |

Input (3)

| Tool | Description |
|------|-------------|
| input_text | Type text into an input field or textarea. |
| key_press | Press a keyboard key, optionally with modifiers (ctrl, alt, shift, meta). |
| file_upload | Upload a file through a file input element. |

Mouse (3)

| Tool | Description |
|------|-------------|
| mouse_move | Move mouse to specific coordinates. |
| mouse_click | Click at coordinates with specified button (left, right, middle). |
| mouse_drag | Drag from one position to another. |

Tabs (4)

| Tool | Description |
|------|-------------|
| tab_list | List all open browser tabs with titles and URLs. Read-only. |
| tab_select | Switch to a specific browser tab. |
| tab_new | Open a new browser tab, optionally navigating to a URL. |
| tab_close | Close a specific browser tab. |

Verification (4)

| Tool | Description |
|------|-------------|
| verify_element_visible | Verify that an element is visible on the page (with timeout). Read-only. |
| verify_text_visible | Verify that specific text is visible on the page (with timeout). Read-only. |
| verify_value | Verify that an input element has the expected value. Read-only. |
| verify_list_visible | Verify that multiple text items are all visible on the page. Read-only. |

Browser (7)

| Tool | Description |
|------|-------------|
| wait_for | Wait for a condition: element visible, clickable, present, URL contains, or title contains. |
| execute_javascript | Execute JavaScript code in the browser context with optional return value. |
| resize_window | Resize the browser window to specified dimensions. |
| dialog_handle | Handle browser dialogs (alert, confirm, prompt). |
| console_logs | Get or clear browser console logs. Uses BiDi event collector when available for cross-browser support, falls back to classic log API. |
| network_monitor | Monitor network requests: get requests, clear, or toggle offline mode. |
| pdf_generate | Generate a PDF from the current page. Uses BiDi printPage for cross-browser support (Chrome, Firefox, Edge), falls back to CDP. Params: format, landscape, scale, pageRanges. Optional filePath; omit to return as base64 resource. |

Session (3)

| Tool | Description |
|------|-------------|
| close_browser | Close the browser and end the session. |
| reset_session | Reset the browser session (close and restart). |
| set_stealth_mode | Enable/disable stealth mode: hides navigator.webdriver, patches plugins, sets realistic languages. |

Recording (4)

| Tool | Description |
|------|-------------|
| start_recording | Start recording browser actions for test script generation. |
| stop_recording | Stop recording and return full action log with element locators and framework hint. |
| recording_status | Check if recording is active and show recent actions. Read-only. |
| clear_recording | Clear all recorded browser actions. |

Test Planner (3)

| Tool | Description |
|------|-------------|
| planner_setup_page | Initialize test planning: navigate to app and start exploring. |
| planner_explore_page | Explore a page in detail, discovering elements, forms, and links. |
| planner_save_plan | Save completed test plan to a markdown file. |

Test Generator (6)

| Tool | Description |
|------|-------------|
| generator_setup_page | Initialize test generation session: navigate to app, start recording, set framework. |
| generator_read_log | Retrieve the action log from the recording session. Read-only. |
| generator_write_test | Save generated test code and update .test-manifest.json. Supports verify (validates selectors against live page) and specFile (links to spec). |
| generator_write_seed | Write a seed/bootstrap test (auth, fixtures, env setup) and register in manifest under seedTests[]. |
| generator_save_spec | Save a structured markdown spec to <output>/specs/. |
| generator_read_spec | Read a spec file. Read-only. |

Test Healer (5)

| Tool | Description |
|------|-------------|
| healer_run_tests | Execute tests and return output. Supports manifest mode (reads .test-manifest.json) or explicit mode (provide command + args). Runs seed tests first when present. |
| healer_debug_test | Run a single test in verbose mode with detailed output (15KB stdout, 8KB stderr). |
| healer_fix_test | Apply a fix to a test file with .bak backup. Supports verify (validates selectors in fixed code). |
| healer_inspect_page | Inspect current page against expected locators; reports found, missing, and changed elements with suggested updated locators. Use after test failure to understand UI drift. |
| browser_generate_locator | Generate robust locator strategy for an element by description. Read-only. |

Regression Analyzer (6)

| Tool | Description |
|------|-------------|
| analyzer_setup | Initialize regression analysis session with product URL and business context. |
| analyzer_import_context | Import additional context from files, inline text, or URLs. |
| analyzer_scan_product | Explore product using process walking and page scanning. |
| analyzer_build_risk_profile | Build risk profile from discovered features and context. Read-only. |
| analyzer_save_profile | Save risk profile to YAML or JSON file. |
| analyzer_generate_documentation | Generate product discovery documentation with screenshots. |

Batch (1)

| Tool | Description |
|------|-------------|
| batch_execute | Execute up to 20 tool steps in a single round trip. Intermediate steps skip snapshots for speed. |

Grid Management (4)

| Tool | Description |
|------|-------------|
| grid_status | Check Grid status: nodes, browsers, capacity. Read-only. |
| grid_start | Start Selenium Grid via Docker Compose with configurable Chrome/Firefox node counts. |
| grid_stop | Stop Selenium Grid. |
| grid_scale | Scale Grid to desired number of browser nodes. |

Grid Sessions (5)

| Tool | Description |
|------|-------------|
| session_create | Create a new browser session on the Grid. |
| session_select | Select a grid session as active browser for all subsequent tool calls. |
| session_list | List all active Grid sessions, optionally filtered by tags. Read-only. |
| session_destroy | Destroy a specific Grid session. |
| session_destroy_all | Destroy all Grid sessions, optionally filtered by tags. |

Grid Parallel Execution (3)

| Tool | Description |
|------|-------------|
| parallel_explore | Explore multiple URLs in parallel; each target gets its own Grid session. |
| parallel_execute | Execute multiple task sequences in parallel across Grid sessions. |
| planner_generate_plan | Generate structured test plan from parallel exploration results. |

Grid Exploration Analysis (2)

| Tool | Description |
|------|-------------|
| exploration_merge | Merge multiple exploration results, deduplicate pages, build site map. Read-only. |
| exploration_diff | Compare two exploration results: find added, removed, and changed pages. Read-only. |
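The added/removed/changed comparison that exploration_diff describes can be sketched as a diff keyed by URL. The PageSummary shape and the rule that a page counts as "changed" when its title or element count differs are assumptions for this example; the tool's real result format is not documented here.

```typescript
// Hypothetical per-page record from one exploration run.
interface PageSummary {
  url: string;
  title: string;
  elementCount: number;
}

interface ExplorationDiff {
  added: string[];
  removed: string[];
  changed: string[];
}

function indexByUrl(pages: PageSummary[]): { [url: string]: PageSummary } {
  const index: { [url: string]: PageSummary } = {};
  for (const page of pages) index[page.url] = page;
  return index;
}

// Compares two runs: URLs only in "after" are added, URLs only in "before"
// are removed, and URLs in both with differing fields are changed.
function diffExplorations(before: PageSummary[], after: PageSummary[]): ExplorationDiff {
  const beforeByUrl = indexByUrl(before);
  const afterByUrl = indexByUrl(after);
  const diff: ExplorationDiff = { added: [], removed: [], changed: [] };
  for (const page of after) {
    const previous = beforeByUrl[page.url];
    if (!previous) {
      diff.added.push(page.url);
    } else if (previous.title !== page.title || previous.elementCount !== page.elementCount) {
      diff.changed.push(page.url);
    }
  }
  for (const page of before) {
    if (!afterByUrl[page.url]) diff.removed.push(page.url);
  }
  return diff;
}

const explorationDiff = diffExplorations(
  [
    { url: "https://example.com/", title: "Home", elementCount: 30 },
    { url: "https://example.com/pricing", title: "Pricing", elementCount: 40 },
    { url: "https://example.com/old", title: "Old", elementCount: 5 },
  ],
  [
    { url: "https://example.com/", title: "Home", elementCount: 30 },
    { url: "https://example.com/pricing", title: "Pricing", elementCount: 52 },
    { url: "https://example.com/new", title: "New", elementCount: 12 },
  ]
);
console.log(explorationDiff);
```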


Expectation System

Every tool accepts an optional expectation parameter to control what data is included in the response:

{
  "expectation": {
    "includeSnapshot": true,
    "includeConsole": true,
    "includeNetwork": true,
    "snapshotOptions": { "selector": "#main", "maxLength": 5000 },
    "consoleOptions": { "levels": ["error", "warn"], "maxMessages": 10 },
    "diffOptions": { "enabled": true, "format": "unified" }
  }
}

| Option | Description |
|--------|-------------|
| includeSnapshot | Include page snapshot (element list) in the response |
| includeConsole | Include browser console logs |
| includeNetwork | Include network request summary (requires BiDi) |
| snapshotOptions.selector | CSS selector to scope element discovery |
| snapshotOptions.maxLength | Truncate snapshot text at this length |
| consoleOptions.levels | Filter by log level: error, warn, info, log |
| diffOptions.enabled | Return only changes since last snapshot |
| diffOptions.format | Diff format: minimal or unified |

Each tool category has sensible defaults (e.g., navigation tools include snapshot, verification tools don't).
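One way to picture how category defaults and a caller's expectation combine is a shallow merge in which the caller's values win. The Expectation interface below mirrors the options listed above, while categoryDefaults and resolveExpectation are hypothetical; only the navigation and verification snapshot defaults come from the text.

```typescript
// Mirrors the documented expectation options.
interface Expectation {
  includeSnapshot?: boolean;
  includeConsole?: boolean;
  includeNetwork?: boolean;
  snapshotOptions?: { selector?: string; maxLength?: number };
  consoleOptions?: { levels?: string[]; maxMessages?: number };
  diffOptions?: { enabled?: boolean; format?: "minimal" | "unified" };
}

// Hypothetical defaults table; only these two entries are stated in the README.
const categoryDefaults: { [category: string]: Expectation } = {
  navigation: { includeSnapshot: true },
  verification: { includeSnapshot: false },
};

// Shallow merge: anything the caller sets overrides the category default.
function resolveExpectation(category: string, requested?: Expectation): Expectation {
  const defaults = categoryDefaults[category] || {};
  return { ...defaults, ...(requested || {}) };
}

console.log(resolveExpectation("navigation"));
console.log(resolveExpectation("verification", { includeSnapshot: true, includeConsole: true }));
```

A shallow merge means a caller who passes snapshotOptions replaces the whole nested object; whether the server merges nested options more finely is not stated.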


BiDi Cross-Browser Features

The server uses the WebDriver BiDi protocol (always enabled) for cross-browser features that go beyond what the classic WebDriver API offers:

  • Full-page screenshots: take_screenshot with origin: "document" captures the entire scrollable page, not just the viewport
  • Element screenshots: take_screenshot with ref: "e5" captures a specific element
  • Cross-browser PDF: pdf_generate works on Chrome, Firefox, and Edge (was Chrome-only with CDP)
  • Console events: console_logs uses BiDi LogInspector for real-time console events across all browsers
  • Network monitoring: BiDi network events provide request/response tracking
  • Stealth mode: injects preload scripts via BiDi script.addPreloadScript to mask automation indicators

BiDi features degrade gracefully — if a browser doesn't support a specific BiDi feature, the tool falls back to the classic API.
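The graceful-degradation pattern can be sketched as "try the preferred path, use the classic one if it throws". The sketch is synchronous to stay short, whereas real driver calls are asynchronous. withFallback and the two stand-in capture functions are hypothetical and do not call Selenium.

```typescript
interface Capture {
  via: "bidi" | "classic";
  data: string;
}

// Runs the preferred implementation and falls back if it throws.
function withFallback<T>(preferred: () => T, fallback: () => T): T {
  try {
    return preferred();
  } catch (_error) {
    return fallback();
  }
}

// Stand-in for a BiDi call on a browser that lacks the feature.
const bidiFullPageScreenshot: () => Capture = () => {
  throw new Error("BiDi full-page capture not supported by this browser");
};

// Stand-in for the classic WebDriver screenshot call.
const classicViewportScreenshot: () => Capture = () => {
  return { via: "classic", data: "<base64 png>" };
};

const captured = withFallback(bidiFullPageScreenshot, classicViewportScreenshot);
console.log(captured.via);
```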


Selector Teaching & Hints

The teach_selector tool lets you override auto-computed selectors with your own preferred CSS selectors. Taught selectors are persisted to <output>/selector-hints.json and loaded as Phase 0 (highest priority) during element discovery on matching pages.

How It Works

  1. Call teach_selector with a description and CSS selector while on the page
  2. The selector is validated in-browser (must match exactly 1 visible element)
  3. Scope is auto-determined: header/nav/footer elements default to site-wide (*), content elements default to the current path pattern
  4. On subsequent page snapshots, matching hints are loaded and used before any auto-computation

Example

teach_selector({
  description: "the NL language link",
  css: "a[href='/nl/']",
  scope: "*"  // optional — auto-determined if omitted
})

Hints file structure (selector-hints.json):

{
  "example.com": {
    "*": [
      { "css": "a[href='/nl/']", "tag": "a", "text": "NL" }
    ],
    "/blog/*": [
      { "css": "#post-title", "tag": "h1", "text": "My Post" }
    ]
  }
}
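Given a hints file of this shape, looking up the hints that apply to a page might look like the sketch below. The matching rules are assumptions: "*" matches every path, a scope ending in "/*" is treated as a path prefix, and anything else must match exactly. hintsFor and scopeMatches are hypothetical names.

```typescript
interface SelectorHint {
  css: string;
  tag: string;
  text: string;
}

type HintsFile = { [domain: string]: { [scope: string]: SelectorHint[] } };

// Assumed semantics: "*" is site-wide, "/blog/*" is a prefix match on "/blog/".
function scopeMatches(scope: string, pathname: string): boolean {
  if (scope === "*") return true;
  if (scope.slice(-2) === "/*") {
    const prefix = scope.slice(0, -1); // keeps the trailing slash, e.g. "/blog/"
    return pathname.slice(0, prefix.length) === prefix;
  }
  return scope === pathname;
}

// Collects every hint whose scope covers the given page.
function hintsFor(hints: HintsFile, domain: string, pathname: string): SelectorHint[] {
  const scopes = hints[domain];
  if (!scopes) return [];
  let matched: SelectorHint[] = [];
  for (const scope of Object.keys(scopes)) {
    if (scopeMatches(scope, pathname)) matched = matched.concat(scopes[scope]);
  }
  return matched;
}

// Same content as the selector-hints.json example above.
const sampleHints: HintsFile = {
  "example.com": {
    "*": [{ css: "a[href='/nl/']", tag: "a", text: "NL" }],
    "/blog/*": [{ css: "#post-title", tag: "h1", text: "My Post" }],
  },
};

console.log(hintsFor(sampleHints, "example.com", "/blog/hello").map((hint) => hint.css));
console.log(hintsFor(sampleHints, "example.com", "/about").map((hint) => hint.css));
```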

Element Discovery

The server uses a 16-phase selector computation engine that produces human-readable, semantically meaningful CSS and XPath selectors for every discovered element.

Selector Priority (Phases 0–16)

| Phase | Strategy | Example |
|-------|----------|---------|
| 0 | Taught hints | User-taught a[href="/nl/"] |
| 1 | By ID | #login-form |
| 2 | By test ID | [data-testid="submit-btn"] |
| 2b | By descendant test ID | form:has([data-testid="email"]) |
| 3 | By role + name | button[aria-label="Close"] |
| 4 | By label | //label[normalize-space()='Email']//input |
| 5 | By placeholder | input[placeholder="Search..."] |
| 6 | By text | //a[normalize-space()='Sign In'] |
| 7 | By attribute | a[hreflang="nl"], img[alt="Logo"] |
| 9 | By ARIA role | [role="dialog"] |
| 10 | By state | dialog[open], [aria-expanded] |
| 11 | By table cell | #data-table > tbody > tr:nth-child(2) > td:nth-child(3) |
| 12 | By compound attrs | input[type="email"][name="user"] |
| 13 | By semantic class | button.primary-action |
| 14 | By position | #sidebar > ul > li:nth-of-type(3) |
| 15 | By text (loose) | //span[contains(normalize-space(),'Welcome')] |
| 16 | By positional index | (//button[normalize-space()='Save'])[2] |
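The general idea of a phased selector search (try strategies in priority order and keep the first candidate that matches exactly one visible element) can be sketched as follows. This is a simplified illustration covering four of the phases, not the package's computeSelector(). The match counter is injected so the sketch runs without a browser, and all names are hypothetical.

```typescript
interface ElementInfo {
  tag: string;
  id?: string;
  testId?: string;
  ariaLabel?: string;
  placeholder?: string;
}

interface Strategy {
  phase: string;
  build: (element: ElementInfo) => string | undefined;
}

// A few of the phases from the table, in priority order.
const strategies: Strategy[] = [
  { phase: "1: ID", build: (e) => (e.id ? `#${e.id}` : undefined) },
  { phase: "2: test ID", build: (e) => (e.testId ? `[data-testid="${e.testId}"]` : undefined) },
  { phase: "3: role + name", build: (e) => (e.ariaLabel ? `${e.tag}[aria-label="${e.ariaLabel}"]` : undefined) },
  { phase: "5: placeholder", build: (e) => (e.placeholder ? `${e.tag}[placeholder="${e.placeholder}"]` : undefined) },
];

interface SelectorChoice {
  phase: string;
  selector: string;
}

// Returns the first candidate that is unique among visible elements.
function computeSelector(
  element: ElementInfo,
  countVisibleMatches: (selector: string) => number
): SelectorChoice | undefined {
  for (const strategy of strategies) {
    const candidate = strategy.build(element);
    if (candidate !== undefined && countVisibleMatches(candidate) === 1) {
      return { phase: strategy.phase, selector: candidate };
    }
  }
  return undefined;
}

// Fake page: two visible elements share the ID "submit", one has the test ID.
const visibleCounts: { [selector: string]: number } = {
  "#submit": 2,
  '[data-testid="submit-btn"]': 1,
};

const selectorChoice = computeSelector(
  { tag: "button", id: "submit", testId: "submit-btn" },
  (selector) => visibleCounts[selector] || 0
);
console.log(selectorChoice);
```

In a real page the counter could wrap document.querySelectorAll plus a visibility check, which is what keeps hidden duplicates from forcing a weaker selector.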

Key Capabilities

  • Visibility-aware — only visible elements are counted for uniqueness, preventing hidden duplicates from causing fallbacks to fragile selectors
  • Ancestor scoping — when a selector isn't unique globally, it's scoped to the nearest ancestor with an ID, test attribute, or landmark (nav[aria-label="Main"] a[href="/"])
  • Shadow DOM — traverses open shadow roots, scopes CSS within shadow boundaries, uses >>> notation for cross-boundary selectors
  • Two-pass discovery — semantic elements (links, buttons, headings) get refs first; generic elements with test attributes fill remaining budget
  • Non-semantic class filtering — auto-skips CSS-in-JS hashes, Tailwind utilities, and framework-generated classes

Test Generation & Healing Pipeline

The generator and healer tools form a complete test automation pipeline:

1. Plan

planner_setup_page → planner_explore_page → planner_save_plan

2. Record & Generate

generator_setup_page → [interact with app] → stop_recording → generator_write_test

  • Recording captures actions with element locators (id, name, text, aria-label)
  • generator_write_test validates selectors against the live page before saving
  • A .test-manifest.json is created alongside tests with framework, run command, and test list

3. Heal

healer_run_tests → healer_inspect_page → healer_fix_test → healer_run_tests

  • healer_run_tests reads .test-manifest.json to auto-discover how to run tests
  • healer_inspect_page compares expected locators against the live page to find UI drift
  • healer_fix_test validates selectors in the fixed code before writing
  • Seed tests (auth, fixtures) are run automatically before the main test when registered in the manifest
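The seed-first ordering can be illustrated with a small sketch. The manifest fields below are guesses based on the description (framework, run command, test list, seedTests[]); the real .test-manifest.json schema may use different names, and planRunOrder, the mocha command, and the file paths are hypothetical.

```typescript
// Hypothetical manifest shape; field names are assumptions.
interface TestManifest {
  framework: string;
  runCommand: string;
  tests: string[];
  seedTests?: string[];
}

// Builds the list of commands to run: seed tests first, then main tests.
function planRunOrder(manifest: TestManifest): string[] {
  const seeds = manifest.seedTests || [];
  return seeds.concat(manifest.tests).map((file) => `${manifest.runCommand} ${file}`);
}

const demoManifest: TestManifest = {
  framework: "mocha", // illustrative value
  runCommand: "npx mocha",
  tests: ["tests/checkout.test.js"],
  seedTests: ["tests/seed/login.test.js"],
};

console.log(planRunOrder(demoManifest));
```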

Spec Files

Save structured requirements as markdown specs before generating tests:

generator_save_spec → generator_write_test (with specFile param)

Session Tracing

Enable tracing to record every tool call and result as structured JSON:

npx selenium-ai-agent --save-trace

Or via env var:

{
  "env": { "SELENIUM_MCP_SAVE_TRACE": "true" }
}

Traces are saved to <output>/traces/session-<timestamp>.json on session close. Each trace entry records:

  • Tool name and parameters
  • Result content and error status
  • Timestamps for performance analysis
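As an example of the performance analysis such a trace allows, the sketch below totals calls, errors, and elapsed time per tool. The TraceEntry field names are assumptions chosen to match the three bullets above; check a real trace file for the actual keys.

```typescript
// Hypothetical entry shape; the real trace JSON may use other key names.
interface TraceEntry {
  tool: string;
  params: { [key: string]: unknown };
  isError: boolean;
  startedAt: number; // epoch milliseconds
  finishedAt: number; // epoch milliseconds
}

interface ToolStats {
  calls: number;
  errors: number;
  totalMs: number;
}

// Aggregates call count, error count, and total duration per tool.
function summarizeTrace(entries: TraceEntry[]): { [tool: string]: ToolStats } {
  const stats: { [tool: string]: ToolStats } = {};
  for (const entry of entries) {
    const current = stats[entry.tool] || { calls: 0, errors: 0, totalMs: 0 };
    current.calls += 1;
    if (entry.isError) current.errors += 1;
    current.totalMs += entry.finishedAt - entry.startedAt;
    stats[entry.tool] = current;
  }
  return stats;
}

const traceSummary = summarizeTrace([
  { tool: "navigate_to", params: { url: "https://example.com" }, isError: false, startedAt: 1000, finishedAt: 1800 },
  { tool: "click_element", params: { ref: "e5" }, isError: false, startedAt: 2000, finishedAt: 2150 },
  { tool: "click_element", params: { ref: "e9" }, isError: true, startedAt: 3000, finishedAt: 3400 },
]);
console.log(traceSummary);
```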

Workspace Isolation

By default, all file-writing tools (screenshots, PDFs, test files, plans, analyzer output) validate that paths resolve within the output directory. This prevents accidental writes to system paths.

To override (e.g., for CI/CD or trusted environments):

npx selenium-ai-agent --allow-unrestricted-file-access

The healer_fix_test tool is exempt — it modifies existing project test files by design.
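The kind of check described above (does a target path resolve inside the output directory?) can be sketched with plain string handling. This is an illustration for POSIX-style paths only. It ignores symlinks and Windows paths, and isInsideOutputDir is a hypothetical name rather than the package's sandbox code.

```typescript
// Resolves "." and ".." segments of an absolute POSIX-style path.
function normalizeSegments(absolutePath: string): string[] {
  const segments: string[] = [];
  for (const part of absolutePath.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      segments.pop();
      continue;
    }
    segments.push(part);
  }
  return segments;
}

// True when the target, resolved against the output directory if relative,
// stays within that directory.
function isInsideOutputDir(outputDir: string, target: string): boolean {
  const absoluteTarget = target.charAt(0) === "/" ? target : `${outputDir}/${target}`;
  const root = normalizeSegments(outputDir);
  const resolved = normalizeSegments(absoluteTarget);
  if (resolved.length < root.length) return false;
  // Compare whole segments so "/work/outside" is not mistaken for "/work/out".
  return root.every((segment, index) => resolved[index] === segment);
}

console.log(isInsideOutputDir("/work/out", "screenshots/home.png"));
console.log(isInsideOutputDir("/work/out", "../../etc/passwd"));
```

In Node.js the same idea is usually written with path.resolve and path.relative, which also cover platform-specific separators.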


Selenium Grid

For parallel browser automation across multiple browsers, set SELENIUM_GRID_URL:

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"],
      "env": {
        "SELENIUM_GRID_URL": "http://localhost:4444"
      }
    }
  }
}

Quick Start with Docker Compose

The project includes a Docker Compose file for local Grid setup:

# Start the Grid (4 Chrome + 1 Firefox nodes) by asking your assistant to call the grid_start tool
grid_start

# Or run Docker Compose directly from a shell
docker compose up -d

Parallel Workflows

Parallel exploration — explore multiple sections of a site simultaneously:

session_create (x3) → parallel_explore → exploration_merge

Parallel execution — run test steps across browsers:

session_create (chrome + firefox) → parallel_execute

Cross-browser testing — same actions on different browsers:

session_create (chrome) → session_create (firefox) → parallel_execute

See the project README for Docker Compose setup and Grid architecture details.


Output Mode

Control how binary data (screenshots, PDFs) is returned:

| Mode | Behavior |
|------|----------|
| stdout (default) | Return base64-encoded data to the LLM for inline display |
| file | Save to disk in <output>/screenshots/ or <output>/pdfs/ |

npx selenium-ai-agent --output-mode=file

Architecture

selenium-mcp-server/src/
├── server.ts              # MCP server, tool routing, expectation system, tracing
├── context.ts             # Browser session state, EventCollector, SessionTracer
├── types.ts               # Core types (ToolResult, BrowserConfig, Expectation, Grid types)
├── types/
│   └── manifest.ts        # Shared test manifest types (generator ↔ healer)
├── bidi/
│   ├── event-collector.ts # BiDi event subscriptions (console, network, navigation)
│   └── index.ts
├── trace/
│   ├── session-tracer.ts  # Tool call + result recording
│   └── index.ts
├── utils/
│   ├── bidi-helpers.ts    # BiDi WebSocket URL rewriting + context factory
│   ├── chrome-options.ts  # Chrome options builder + stealth scripts
│   ├── element-discovery/   # Accessibility tree discovery (e1-e300)
│   │   ├── index.ts         # Barrel exports
│   │   ├── discover.ts      # discoverElements() with selector hints
│   │   ├── selector-scripts.ts # Browser-side computeSelector() (15 phases)
│   │   ├── tree-scripts.ts  # Browser-side accessibility tree walker
│   │   ├── format-tree.ts   # formatAccessibilityTree() (full/smart/minimal)
│   │   └── element-scripts.ts # extractElementInfo(), findElementByInfo()
│   ├── selector-hints.ts    # Persistent domain-scoped selector hint storage
│   ├── paths.ts           # Output directory resolution
│   ├── sandbox.ts         # Workspace path validation
│   ├── selector-validation.ts # Extract + validate selectors from test code
│   ├── schema.ts          # Zod → JSON Schema converter
│   └── docker.ts          # Docker Compose helpers
├── grid/
│   ├── grid-client.ts     # Grid REST API client
│   ├── grid-session.ts    # Remote browser session
│   ├── session-pool.ts    # Session lifecycle management
│   ├── session-context.ts # Context adapter for grid sessions
│   └── exploration-coordinator.ts
└── tools/                 # 75 tools grouped by domain
    ├── base.ts            # BaseTool abstract class + MCP annotations
    ├── index.ts           # Tool registry
    ├── navigation/        # navigate_to, go_back, go_forward, refresh_page, scroll_page
    ├── page/              # capture_page, take_screenshot
    ├── elements/          # click, hover, select, drag_drop, teach_selector
    ├── input/             # input_text, key_press, file_upload
    ├── mouse/             # mouse_move, mouse_click, mouse_drag
    ├── tabs/              # tab_list, tab_select, tab_new, tab_close
    ├── verification/      # verify_element_visible, verify_text, verify_value, verify_list
    ├── browser/           # wait, javascript, resize, dialog, console, network, pdf
    ├── session/           # close_browser, reset_session, set_stealth_mode
    ├── recording/         # start, stop, status, clear
    ├── agents/            # planner, generator, healer, spec tools
    ├── analyzer/          # setup, import, scan, risk, save, documentation
    ├── batch/             # batch_execute
    └── grid/              # 14 grid management + parallel execution tools

License

MIT