@dhivakaranthonydoss/mcp-selenium

v0.1.5

Published

8 months ago

A Model Context Protocol (MCP) server for Selenium WebDriver automation with support for Chrome, Firefox, and Edge browsers


dhivakaranthonydoss

mcp model-context-protocol selenium webdriver browser-automation testing web-scraping chrome firefox edge automation alerts web-testing

🚀 MCP Selenium Server

✨ What is MCP Selenium Server?

MCP Selenium Server connects AI assistants to web browser automation. Built on the Model Context Protocol (MCP), it lets AI agents operate a browser the way a person would: clicking buttons, filling in forms, taking screenshots, and handling more involved interactions such as drag and drop and alert dialogs.

Perfect for:

🤖 AI-driven web automation
🧪 Automated testing workflows
📊 Web scraping and data extraction
🔍 Browser-based monitoring
📱 Cross-browser compatibility testing

🎯 Features

🌐 Browser Management

Multi-browser support: Chrome, Firefox, Microsoft Edge
Headless mode: Run browsers invisibly in the background
Custom options: Configure browser arguments and preferences
Session management: Handle multiple browser instances

🎮 Element Interactions

Smart element finding: ID, CSS, XPath, name, tag, class selectors
User actions: Click, type, hover, drag & drop
Advanced gestures: Double-click, right-click, keyboard shortcuts
File operations: Upload files with ease

🚨 Alert Handling

JavaScript alerts: Accept or dismiss alert dialogs
Confirmations: Handle confirm dialogs programmatically
Prompts: Send text to prompt dialogs
Text extraction: Get alert message content

📸 Visual Capture

Screenshots: Capture full-page or element-specific images
Base64 support: Get images as data URIs
File saving: Save screenshots to disk
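When a screenshot comes back as a data URI rather than a file, the image bytes can be recovered by stripping the prefix and decoding the base64 payload. The sketch below assumes a standard `data:image/png;base64,...` string; the exact shape this server returns is not documented here, and `decodeDataUri` is a hypothetical helper name.

```javascript
// Hypothetical helper: splits a base64 data URI into its MIME type and bytes.
function decodeDataUri(dataUri) {
  const match = /^data:([^;,]+);base64,(.*)$/s.exec(dataUri);
  if (!match) {
    throw new Error("Not a base64 data URI");
  }
  return { mimeType: match[1], bytes: Buffer.from(match[2], "base64") };
}

// Demo input: the 8-byte PNG file signature, base64-encoded (not a full image).
const sample = "data:image/png;base64,iVBORw0KGgo=";
const { mimeType, bytes } = decodeDataUri(sample);
console.log(mimeType, bytes.length);
```

The decoded `bytes` buffer can then be written to disk with `fs.writeFileSync` or passed to an image library.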

⚡ Advanced Capabilities

Wait strategies: Smart waiting for elements and conditions
Error handling: Robust error management and reporting
Timeout controls: Configurable timeouts for all operations

🚀 Quick Start

1. Install the Package

npm install -g @dhivakaranthonydoss/mcp-selenium

2. Configure Your MCP Client

Add to your MCP configuration (e.g., Claude Desktop):

{
  "mcpServers": {
    "selenium": {
      "command": "npx",
      "args": ["-y", "@dhivakaranthonydoss/mcp-selenium"]
    }
  }
}
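If your client already has other servers configured, the `selenium` entry needs to be merged into the existing `mcpServers` object rather than replacing it. A minimal sketch of that merge follows; `addSeleniumServer` is a hypothetical helper, the `other` server is made up for the demo, and reading or writing the config file is left out.

```javascript
// Hypothetical helper: returns a copy of an MCP client config with the
// "selenium" entry from this README added under "mcpServers".
function addSeleniumServer(config = {}) {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      selenium: {
        command: "npx",
        args: ["-y", "@dhivakaranthonydoss/mcp-selenium"],
      },
    },
  };
}

// Demo: an existing config that already lists one other (made-up) server.
const existing = {
  mcpServers: { other: { command: "node", args: ["other.js"] } },
};
const merged = addSeleniumServer(existing);
console.log(JSON.stringify(merged, null, 2));
```

The helper returns a new object and leaves the input untouched, so the original config can be kept as a backup.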

3. Start Automating!

Your AI assistant can now control browsers:

🤖 "Please open Chrome, navigate to example.com, and take a screenshot"

📦 Installation

Option 1: Global Installation

npm install -g @dhivakaranthonydoss/mcp-selenium

Option 2: Use with NPX (Recommended)

npx @dhivakaranthonydoss/mcp-selenium

Option 3: Local Project Installation

npm install @dhivakaranthonydoss/mcp-selenium

🛠️ Supported Browsers

| Browser | Version | Status | Notes |
|---------|---------|--------|-------|
| Chrome | 70+ | ✅ Full Support | Recommended for best performance |
| Firefox | 60+ | ✅ Full Support | Good alternative to Chrome |
| Microsoft Edge | 79+ | ✅ Full Support | Chromium-based versions |

Note: The browsers you want to automate must already be installed on your system; the server does not install them. WebDriver is managed automatically.

📚 API Reference

Browser Management

`start_browser`

Launch a new browser session

{
  "tool": "start_browser",
  "parameters": {
    "browser": "chrome",
    "options": {
      "headless": true,
      "arguments": ["--no-sandbox", "--disable-dev-shm-usage"]
    }
  }
}
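The `tool`/`parameters` objects in this reference describe what to call, not the message sent on the wire. MCP clients send tool invocations as JSON-RPC 2.0 `tools/call` requests, with the tool name in `params.name` and its parameters in `params.arguments`. A sketch of that translation, where `toToolsCallRequest` is a hypothetical helper and not part of this package:

```javascript
// Hypothetical helper: converts a README-style { tool, parameters } object
// into an MCP JSON-RPC 2.0 "tools/call" request.
let nextId = 1;
function toToolsCallRequest({ tool, parameters = {} }) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name: tool, arguments: parameters },
  };
}

const request = toToolsCallRequest({
  tool: "start_browser",
  parameters: {
    browser: "chrome",
    options: {
      headless: true,
      arguments: ["--no-sandbox", "--disable-dev-shm-usage"],
    },
  },
});
console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library builds this request for you; the sketch only shows how the documented fields map onto it.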

`navigate`

Navigate to a URL

{
  "tool": "navigate",
  "parameters": {
    "url": "https://example.com"
  }
}

`close_session`

Close the current browser session

{
  "tool": "close_session",
  "parameters": {}
}

Element Interactions

`find_element`

Locate an element on the page

{
  "tool": "find_element",
  "parameters": {
    "by": "id",
    "value": "search-box",
    "timeout": 10000
  }
}

`click_element`

Click on an element

{
  "tool": "click_element",
  "parameters": {
    "by": "css",
    "value": ".submit-button"
  }
}

`send_keys`

Type text into an element

{
  "tool": "send_keys",
  "parameters": {
    "by": "name",
    "value": "username",
    "text": "myusername"
  }
}

`get_element_text`

Extract text from an element

{
  "tool": "get_element_text",
  "parameters": {
    "by": "css",
    "value": ".status-message"
  }
}

Mouse Actions

`hover`

Hover over an element

{
  "tool": "hover",
  "parameters": {
    "by": "css",
    "value": ".dropdown-trigger"
  }
}

`drag_and_drop`

Drag one element to another

{
  "tool": "drag_and_drop",
  "parameters": {
    "by": "id",
    "value": "draggable-item",
    "targetBy": "id",
    "targetValue": "drop-zone"
  }
}

`double_click`

Perform a double-click

{
  "tool": "double_click",
  "parameters": {
    "by": "css",
    "value": ".file-item"
  }
}

`right_click`

Perform a right-click (context menu)

{
  "tool": "right_click",
  "parameters": {
    "by": "css",
    "value": ".context-menu-trigger"
  }
}

Alert Handling

`accept_alert`

Accept a JavaScript alert

{
  "tool": "accept_alert",
  "parameters": {}
}

`dismiss_alert`

Dismiss/cancel an alert

{
  "tool": "dismiss_alert",
  "parameters": {}
}

`get_alert_text`

Get the text from an alert

{
  "tool": "get_alert_text",
  "parameters": {}
}

`send_alert_text`

Send text to a prompt dialog

{
  "tool": "send_alert_text",
  "parameters": {
    "text": "My response"
  }
}
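These four tools are typically used together: read the dialog text, then decide whether to accept or dismiss it. The sketch below shows that flow. It assumes a `callTool(name, args)` function supplied by your MCP client that resolves to the tool's text result. Both that function and its return shape are assumptions, so a stub stands in for it here.

```javascript
// Reads the alert text, accepts the alert if the text matches `pattern`,
// and dismisses it otherwise. `callTool` is an assumed client function.
async function resolveAlert(callTool, pattern) {
  const text = await callTool("get_alert_text", {});
  const action = pattern.test(text) ? "accept_alert" : "dismiss_alert";
  await callTool(action, {});
  return { text, action };
}

// Stub standing in for a real MCP client so the sketch runs on its own.
const calls = [];
async function stubCallTool(name) {
  calls.push(name);
  return name === "get_alert_text" ? "Delete this item?" : "ok";
}

const pending = resolveAlert(stubCallTool, /delete/i);
pending.then(({ text, action }) => console.log(`${action}: ${text}`));
```

Reading the text before acting matters because the dialog, and its message, is gone once it has been accepted or dismissed.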

Keyboard & File Operations

`press_key`

Press a keyboard key

{
  "tool": "press_key",
  "parameters": {
    "key": "Enter"
  }
}

`upload_file`

Upload a file using a file input

{
  "tool": "upload_file",
  "parameters": {
    "by": "css",
    "value": "input[type='file']",
    "filePath": "/path/to/file.pdf"
  }
}

`take_screenshot`

Capture a screenshot

{
  "tool": "take_screenshot",
  "parameters": {
    "outputPath": "/path/to/screenshot.png"
  }
}

🎨 Examples

Example 1: Basic Web Automation

// Start browser and navigate
await startBrowser({ browser: "chrome", options: { headless: false }});
await navigate({ url: "https://example.com" });

// Find and interact with elements
await clickElement({ by: "css", value: ".login-button" });
await sendKeys({ by: "id", value: "username", text: "testuser" });
await sendKeys({ by: "id", value: "password", text: "password123" });
await clickElement({ by: "css", value: ".submit-btn" });

// Take a screenshot
await takeScreenshot({ outputPath: "./login-success.png" });

Example 2: Form Automation with Alerts

// Fill out a contact form (assumes a browser session is already open on the form page)
await sendKeys({ by: "name", value: "email", text: "user@example.com" });
await sendKeys({ by: "name", value: "message", text: "Hello World!" });
await clickElement({ by: "css", value: ".send-button" });

// Handle confirmation alert
const alertText = await getAlertText();
console.log("Alert says:", alertText);
await acceptAlert();

Example 3: E-commerce Automation

// Product search and selection (assumes a browser session is already open on the store page)
await sendKeys({ by: "css", value: ".search-input", text: "laptop" });
await pressKey({ key: "Enter" });

// Hover over product to see details
await hover({ by: "css", value: ".product-item:first-child" });

// Add to cart
await clickElement({ by: "css", value: ".add-to-cart" });

// Drag product to wishlist
await dragAndDrop({ 
  by: "css", value: ".product-item",
  targetBy: "css", targetValue: ".wishlist-area"
});
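The helper functions used in these examples (`startBrowser`, `clickElement`, and so on) are not defined anywhere in this README, so they are best read as illustrative camelCase wrappers around the tools in the API Reference. One way such wrappers could be generated is sketched below. It assumes your MCP client exposes a `callTool(name, args)` function; a recording stub is used in its place.

```javascript
// Tool names taken from the API Reference in this README.
const TOOL_NAMES = [
  "start_browser", "navigate", "close_session", "find_element",
  "click_element", "send_keys", "get_element_text", "hover",
  "drag_and_drop", "double_click", "right_click", "accept_alert",
  "dismiss_alert", "get_alert_text", "send_alert_text", "press_key",
  "upload_file", "take_screenshot",
];

// Converts snake_case to camelCase, e.g. "start_browser" -> "startBrowser".
const toCamelCase = (name) =>
  name.replace(/_([a-z])/g, (_, letter) => letter.toUpperCase());

// Builds { startBrowser, navigate, ... }; each wrapper forwards to callTool.
function makeHelpers(callTool) {
  const helpers = {};
  for (const name of TOOL_NAMES) {
    helpers[toCamelCase(name)] = (args = {}) => callTool(name, args);
  }
  return helpers;
}

// Stub client that records calls instead of talking to a server.
const recorded = [];
const helpers = makeHelpers(async (name, args) => {
  recorded.push({ name, args });
});

helpers.navigate({ url: "https://example.com" });
helpers.takeScreenshot({ outputPath: "./page.png" });
console.log(recorded.map((call) => call.name));
```

With a real client, destructuring the result (`const { startBrowser, navigate } = makeHelpers(callTool)`) gives the function names used in the examples above.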

🔧 Configuration

Browser Options

| Option | Type | Description | Example |
|--------|------|-------------|---------|
| headless | boolean | Run browser without GUI | true |
| arguments | string[] | Custom browser arguments | ["--no-sandbox"] |
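Both options are passed under `options` in a `start_browser` call. The sketch below assembles that payload. `buildStartBrowser` is a hypothetical helper, and the identifiers `firefox` and `edge` are assumed spellings, since only `chrome` appears in this README's examples.

```javascript
// "chrome" appears in this README; "firefox" and "edge" are assumed
// spellings for the other two supported browsers.
const BROWSERS = ["chrome", "firefox", "edge"];

// Hypothetical helper: builds a README-style start_browser call, including
// only the options the caller actually set.
function buildStartBrowser(browser, { headless, args } = {}) {
  if (!BROWSERS.includes(browser)) {
    throw new Error(`Unsupported browser: ${browser}`);
  }
  const options = {};
  if (typeof headless === "boolean") options.headless = headless;
  if (Array.isArray(args) && args.length > 0) options.arguments = args;
  return { tool: "start_browser", parameters: { browser, options } };
}

const call = buildStartBrowser("chrome", {
  headless: true,
  args: ["--no-sandbox"],
});
console.log(JSON.stringify(call, null, 2));
```

Leaving unset options out of the payload means the server's own defaults apply, whatever they are.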

Locator Strategies

| Strategy | Description | Example |
|----------|-------------|---------|
| id | Find by element ID | "submit-button" |
| css | Find by CSS selector | ".btn.primary" |
| xpath | Find by XPath expression | "//button[@type='submit']" |
| name | Find by name attribute | "username" |
| tag | Find by tag name | "button" |
| class | Find by class name | "submit-btn" |
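Every element tool takes the same `by`/`value` pair, so a malformed locator can be caught before it reaches the server. A small sketch of such a check follows; `makeLocator` is a hypothetical helper and the error messages are illustrative.

```javascript
// The six strategies listed in the Locator Strategies table.
const STRATEGIES = new Set(["id", "css", "xpath", "name", "tag", "class"]);

// Hypothetical helper: returns a { by, value } pair or throws on bad input.
function makeLocator(by, value) {
  if (!STRATEGIES.has(by)) {
    throw new Error(`Unknown locator strategy: ${by}`);
  }
  if (typeof value !== "string" || value.length === 0) {
    throw new Error("Locator value must be a non-empty string");
  }
  return { by, value };
}

const submit = makeLocator("xpath", "//button[@type='submit']");
console.log(submit);
```

The result can be spread into any element call, for example `{ tool: "click_element", parameters: { ...submit } }`.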

🤝 Contributing

We welcome contributions! Here's how you can help:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

Development Setup

# Clone the repository and enter it
git clone https://github.com/dhivakaranthonydoss/mcp-selenium.git
cd mcp-selenium

# Install dependencies
npm install

# Start development
npm run start

📄 License

This project is licensed under the ISC License - see the LICENSE file for details.

👨‍💻 Author

Dhivakaran Anthony Doss

GitHub: @dhivakaranthonydoss
Package: @dhivakaranthonydoss/mcp-selenium

🙏 Acknowledgments

Built on the Model Context Protocol (MCP)
Powered by Selenium WebDriver
Inspired by the need for AI-driven browser automation

⭐ Star this repo if you find it useful! ⭐
