@dhivakaranthonydoss/mcp-selenium
v0.1.5
A Model Context Protocol (MCP) server for Selenium WebDriver automation with support for Chrome, Firefox, and Edge browsers
🚀 MCP Selenium Server
✨ What is MCP Selenium Server?
MCP Selenium Server connects AI assistants to web browser automation. Built on the Model Context Protocol (MCP), it lets AI agents interact with web pages much as a person would: clicking buttons, filling in forms, taking screenshots, and handling more complex interactions such as alerts, drag and drop, and file uploads.
Perfect for:
- 🤖 AI-driven web automation
- 🧪 Automated testing workflows
- 📊 Web scraping and data extraction
- 🔍 Browser-based monitoring
- 📱 Cross-browser compatibility testing
🎯 Features
🌐 Browser Management
- Multi-browser support: Chrome, Firefox, Microsoft Edge
- Headless mode: Run browsers invisibly in the background
- Custom options: Configure browser arguments and preferences
- Session management: Handle multiple browser instances
🎮 Element Interactions
- Element location: Find elements by ID, CSS selector, XPath, name, tag name, or class name
- User actions: Click, type, hover, drag & drop
- Advanced gestures: Double-click, right-click, keyboard shortcuts
- File uploads: Attach local files to file inputs
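The `press_key` tool in the API reference takes a key name such as `"Enter"`. In the W3C WebDriver protocol, non-printing keys are sent as characters from the Unicode Private Use Area (Enter is U+E007, Tab is U+E004, and so on). The sketch below shows how a server *could* translate friendly key names into those characters; the `toWebDriverKey` helper and the set of names it accepts are assumptions for illustration, not this package's code.

```javascript
// Hypothetical mapping from friendly key names (as passed to press_key)
// to the code points the W3C WebDriver spec assigns to special keys.
const WEBDRIVER_KEYS = {
  Backspace: "\uE003",
  Tab: "\uE004",
  Return: "\uE006",
  Enter: "\uE007",
  Shift: "\uE008",
  Control: "\uE009",
  Alt: "\uE00A",
  Escape: "\uE00C",
  Space: "\uE00D",
  PageUp: "\uE00E",
  PageDown: "\uE00F",
  End: "\uE010",
  Home: "\uE011",
  ArrowLeft: "\uE012",
  ArrowUp: "\uE013",
  ArrowRight: "\uE014",
  ArrowDown: "\uE015",
  Delete: "\uE017",
};

function toWebDriverKey(name) {
  // A single printable character ("a", "1", "/") is sent as-is.
  if (typeof name === "string" && [...name].length === 1) return name;
  // Own-property check so names like "toString" are not matched.
  if (!Object.prototype.hasOwnProperty.call(WEBDRIVER_KEYS, name)) {
    throw new Error(`Unknown key name: ${name}`);
  }
  return WEBDRIVER_KEYS[name];
}
```

With selenium-webdriver the same values are exposed as constants on `Key` (for example `Key.ENTER`), so a real server would more likely reuse those than keep its own table.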
🚨 Alert Handling
- JavaScript alerts: Accept or dismiss alert dialogs
- Confirmations: Handle confirm dialogs programmatically
- Prompts: Send text to prompt dialogs
- Text extraction: Get alert message content
📸 Visual Capture
- Screenshots: Capture full-page or element-specific images
- Base64 support: Get images as data URIs
- File saving: Save screenshots to disk
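When a screenshot comes back as a base64 data URI rather than a file, the client has to split off the `data:image/png;base64,` prefix and decode the rest before saving it. The sketch below shows that step in Node.js; `decodeDataUri` and `isPng` are hypothetical helpers, and the assumption that screenshots are PNG comes from the `.png` paths used in this README, not from documented behavior.

```javascript
// Hypothetical helper: split a base64 data URI (for example a screenshot
// returned as "data:image/png;base64,...") into its MIME type and raw bytes.
function decodeDataUri(uri) {
  const match = /^data:([^;,]+);base64,(.*)$/s.exec(uri);
  if (!match) {
    throw new Error("Expected a base64 data URI");
  }
  const [, mimeType, payload] = match;
  return { mimeType, bytes: Buffer.from(payload, "base64") };
}

// Every PNG file starts with this 8-byte signature.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Sanity check before writing the bytes to a .png file.
function isPng(bytes) {
  return bytes.length >= 8 && bytes.subarray(0, 8).equals(PNG_SIGNATURE);
}
```

To save the decoded image, pass `bytes` to Node's `fs.writeFileSync(path, bytes)`.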
⚡ Advanced Capabilities
- Wait strategies: Wait for elements and conditions before interacting with them
- Error handling: Catch and report errors from browser operations
- Timeout controls: Configurable timeouts, such as the timeout parameter of find_element
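Waits and timeouts like these usually follow Selenium's explicit-wait idea: poll a condition until it holds or a deadline passes. The sketch below shows that pattern on its own. `waitFor` and its option names are hypothetical, and reading the `timeout` value of `find_element` (10000 in the API reference) as milliseconds is an assumption.

```javascript
// Hypothetical explicit-wait helper: re-check `condition` every `intervalMs`
// until it returns a truthy value, or reject once `timeoutMs` has elapsed.
async function waitFor(condition, { timeoutMs = 10000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await condition();
    if (result) return result;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Sleep before the next check instead of busy-looping.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In a real server the condition would be something like "an element matching this locator exists"; returning the found value lets the caller use it directly.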
🚀 Quick Start
1. Install the Package
npm install -g @dhivakaranthonydoss/mcp-selenium
2. Configure Your MCP Client
Add the server to your MCP client's configuration (e.g., Claude Desktop):
{
"mcpServers": {
"selenium": {
"command": "npx",
"args": ["-y", "@dhivakaranthonydoss/mcp-selenium"]
}
}
}
3. Start Automating!
Your AI assistant can now control browsers:
🤖 "Please open Chrome, navigate to example.com, and take a screenshot"
📦 Installation
Option 1: Global Installation
npm install -g @dhivakaranthonydoss/mcp-selenium
Option 2: Use with NPX (Recommended)
npx @dhivakaranthonydoss/mcp-selenium
Option 3: Local Project Installation
npm install @dhivakaranthonydoss/mcp-selenium
🛠️ Supported Browsers
| Browser | Version | Status | Notes |
|---------|---------|--------|-------|
| Chrome | 70+ | ✅ Full Support | Recommended for best performance |
| Firefox | 60+ | ✅ Full Support | Good alternative to Chrome |
| Microsoft Edge | 79+ | ✅ Full Support | Chromium-based versions |
Note: Make sure the browsers you plan to use are installed on your system. The browser drivers (such as chromedriver, geckodriver, and msedgedriver) are managed automatically.
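As an illustration of the version table, a launcher could compare a detected browser version string against these minimums before starting a session. This is a sketch under assumptions: `isSupportedBrowser` is hypothetical, the lowercase keys `chrome`, `firefox`, and `edge` are assumed to match the `browser` values accepted by `start_browser`, and the README does not say whether the server performs such a check.

```javascript
// Minimum major versions taken from the Supported Browsers table.
const MIN_MAJOR_VERSION = { chrome: 70, firefox: 60, edge: 79 };

// Hypothetical check: is this browser/version pair at or above the minimum?
function isSupportedBrowser(browser, version) {
  const name = String(browser).toLowerCase();
  // Own-property check so names like "constructor" are rejected.
  if (!Object.prototype.hasOwnProperty.call(MIN_MAJOR_VERSION, name)) return false;
  // "120.0.6099.109" -> 120
  const major = Number.parseInt(String(version).split(".")[0], 10);
  return Number.isInteger(major) && major >= MIN_MAJOR_VERSION[name];
}
```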
📚 API Reference
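The entries in this reference use a `tool`/`parameters` shorthand. Under the MCP specification, a client invokes a server tool by sending a JSON-RPC 2.0 request whose `method` is `tools/call` and whose `params` carry the tool `name` and its `arguments`. The sketch below shows that mapping; `toToolsCallRequest` and its incrementing id counter are illustrative only, and in practice your MCP client (such as Claude Desktop) builds these requests for you.

```javascript
// Sketch: wrap the { tool, parameters } shorthand used in this reference
// in the JSON-RPC 2.0 "tools/call" request an MCP client sends.
let nextId = 1;

function toToolsCallRequest({ tool, parameters = {} }) {
  if (typeof tool !== "string" || tool.length === 0) {
    throw new Error("A tool name is required");
  }
  return {
    jsonrpc: "2.0",
    id: nextId++, // each request needs an id unique within the session
    method: "tools/call",
    params: { name: tool, arguments: parameters },
  };
}
```

For example, `toToolsCallRequest({ tool: "navigate", parameters: { url: "https://example.com" } })` yields a request with `params.name` set to `"navigate"` and `params.arguments.url` set to the URL.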
Browser Management
start_browser
Launch a new browser session
{
"tool": "start_browser",
"parameters": {
"browser": "chrome",
"options": {
"headless": true,
"arguments": ["--no-sandbox", "--disable-dev-shm-usage"]
}
}
}
navigate
Navigate to a URL
{
"tool": "navigate",
"parameters": {
"url": "https://example.com"
}
}
close_session
Close the current browser session
{
"tool": "close_session",
"parameters": {}
}
Element Interactions
find_element
Locate an element on the page
{
"tool": "find_element",
"parameters": {
"by": "id",
"value": "search-box",
"timeout": 10000
}
}
click_element
Click on an element
{
"tool": "click_element",
"parameters": {
"by": "css",
"value": ".submit-button"
}
}
send_keys
Type text into an element
{
"tool": "send_keys",
"parameters": {
"by": "name",
"value": "username",
"text": "myusername"
}
}
get_element_text
Extract text from an element
{
"tool": "get_element_text",
"parameters": {
"by": "css",
"value": ".status-message"
}
}
Mouse Actions
hover
Hover over an element
{
"tool": "hover",
"parameters": {
"by": "css",
"value": ".dropdown-trigger"
}
}
drag_and_drop
Drag one element to another
{
"tool": "drag_and_drop",
"parameters": {
"by": "id",
"value": "draggable-item",
"targetBy": "id",
"targetValue": "drop-zone"
}
}
double_click
Perform a double-click
{
"tool": "double_click",
"parameters": {
"by": "css",
"value": ".file-item"
}
}
right_click
Perform a right-click (context menu)
{
"tool": "right_click",
"parameters": {
"by": "css",
"value": ".context-menu-trigger"
}
}
Alert Handling
accept_alert
Accept a JavaScript alert
{
"tool": "accept_alert",
"parameters": {}
}
dismiss_alert
Dismiss/cancel an alert
{
"tool": "dismiss_alert",
"parameters": {}
}
get_alert_text
Get the text from an alert
{
"tool": "get_alert_text",
"parameters": {}
}
send_alert_text
Send text to a prompt dialog
{
"tool": "send_alert_text",
"parameters": {
"text": "My response"
}
}
Keyboard & File Operations
press_key
Press a keyboard key
{
"tool": "press_key",
"parameters": {
"key": "Enter"
}
}
upload_file
Upload a file using a file input
{
"tool": "upload_file",
"parameters": {
"by": "css",
"value": "input[type='file']",
"filePath": "/path/to/file.pdf"
}
}
take_screenshot
Capture a screenshot
{
"tool": "take_screenshot",
"parameters": {
"outputPath": "/path/to/screenshot.png"
}
}
🎨 Examples
Example 1: Basic Web Automation
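// startBrowser, navigate, clickElement, sendKeys, takeScreenshot, and
// closeSession are illustrative helpers: each stands for a call to the MCP
// tool with the matching snake_case name (start_browser, click_element, ...).
// Start browser and navigate
await startBrowser({ browser: "chrome", options: { headless: false } });
await navigate({ url: "https://example.com" });
// Find and interact with elements
await clickElement({ by: "css", value: ".login-button" });
await sendKeys({ by: "id", value: "username", text: "testuser" });
await sendKeys({ by: "id", value: "password", text: "password123" });
await clickElement({ by: "css", value: ".submit-btn" });
// Take a screenshot
await takeScreenshot({ outputPath: "./login-success.png" });
// Close the browser session when done
await closeSession();
Example 2: Form Automation with Alerts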
// Fill out a contact form
await sendKeys({ by: "name", value: "email", text: "user@example.com" });
await sendKeys({ by: "name", value: "message", text: "Hello World!" });
await clickElement({ by: "css", value: ".send-button" });
// Handle confirmation alert
const alertText = await getAlertText();
console.log("Alert says:", alertText);
await acceptAlert();
Example 3: E-commerce Automation
// Product search and selection
await sendKeys({ by: "css", value: ".search-input", text: "laptop" });
await pressKey({ key: "Enter" });
// Hover over product to see details
await hover({ by: "css", value: ".product-item:first-child" });
// Add to cart
await clickElement({ by: "css", value: ".add-to-cart" });
// Drag product to wishlist
await dragAndDrop({
by: "css", value: ".product-item",
targetBy: "css", targetValue: ".wishlist-area"
});
🔧 Configuration
Browser Options
| Option | Type | Description | Example |
|--------|------|-------------|---------|
| headless | boolean | Run browser without GUI | true |
| arguments | string[] | Custom browser arguments | ["--no-sandbox"] |
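A sketch, under stated assumptions, of how these two options could become browser launch flags. The flags themselves are real browser switches (`--headless=new` selects the newer headless mode in recent Chrome and Chromium-based Edge releases; Firefox uses `-headless`), but `buildBrowserArgs` and the choice to prepend the headless flag are illustrative, not this package's code. With selenium-webdriver, the resulting list would typically be handed to the browser's options object via `addArguments`.

```javascript
// Hypothetical helper: turn { headless, arguments } into launch flags.
const HEADLESS_FLAG = {
  chrome: "--headless=new", // newer headless mode in recent Chrome releases
  edge: "--headless=new",   // Edge is Chromium-based and accepts the same flag
  firefox: "-headless",
};

function buildBrowserArgs(browser, options = {}) {
  const name = String(browser).toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(HEADLESS_FLAG, name)) {
    throw new Error(`Unsupported browser: ${browser}`);
  }
  // Copy so the caller's array is never mutated.
  const args = [...(options.arguments ?? [])];
  // Respect a headless flag the user already supplied.
  const alreadyHeadless = args.some((arg) => /^-{1,2}headless\b/.test(arg));
  if (options.headless && !alreadyHeadless) {
    args.unshift(HEADLESS_FLAG[name]);
  }
  return args;
}
```

For the start_browser example in the API reference, `buildBrowserArgs("chrome", { headless: true, arguments: ["--no-sandbox", "--disable-dev-shm-usage"] })` returns the two user arguments preceded by `--headless=new`.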
Locator Strategies
| Strategy | Description | Example |
|----------|-------------|---------|
| id | Find by element ID | "submit-button" |
| css | Find by CSS selector | ".btn.primary" |
| xpath | Find by XPath expression | "//button[@type='submit']" |
| name | Find by name attribute | "username" |
| tag | Find by tag name | "button" |
| class | Find by class name | "submit-btn" |
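The W3C WebDriver specification defines only five locator strategies (`css selector`, `link text`, `partial link text`, `tag name`, and `xpath`), so lookups by id, name, or class have to be expressed as CSS before they reach the browser driver. The sketch below shows one way to translate the six strategies in this table; `toW3CLocator` and the exact selectors it emits are assumptions, not this package's implementation.

```javascript
// Quote a value for a double-quoted CSS attribute selector.
// Simplified: escapes quotes and backslashes but not newlines.
function cssString(value) {
  return '"' + String(value).replace(/["\\]/g, "\\$&") + '"';
}

// Hypothetical translation of this README's `by` strategies into
// W3C WebDriver { using, value } locators.
function toW3CLocator({ by, value }) {
  switch (by) {
    case "css":
      return { using: "css selector", value };
    case "xpath":
      return { using: "xpath", value };
    case "tag":
      return { using: "tag name", value };
    case "id":
      return { using: "css selector", value: `[id=${cssString(value)}]` };
    case "name":
      return { using: "css selector", value: `[name=${cssString(value)}]` };
    case "class":
      // [class~="x"] matches elements whose class list contains x, like .x
      return { using: "css selector", value: `[class~=${cssString(value)}]` };
    default:
      throw new Error(`Unknown locator strategy: ${by}`);
  }
}
```

Using attribute selectors for id, name, and class avoids having to escape CSS identifiers, at the cost of slightly longer selectors.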
🤝 Contributing
We welcome contributions! Here's how you can help:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Development Setup
# Clone the repository
git clone https://github.com/dhivakaranthonydoss/mcp-selenium.git
# Install dependencies
npm install
# Start development
npm run start
📄 License
This project is licensed under the ISC License - see the LICENSE file for details.
👨‍💻 Author
Dhivakaran Anthony Doss
- GitHub: @dhivakaranthonydoss
- Package: @dhivakaranthonydoss/mcp-selenium
🙏 Acknowledgments
- Built on the Model Context Protocol (MCP)
- Powered by Selenium WebDriver
- Inspired by the need for AI-driven browser automation
⭐ Star this repo if you find it useful! ⭐
