mcp-webdriverio
v1.0.5
A Message Control Protocol (MCP) server implementation for WebdriverIO, enabling remote browser automation through a message-based interface
MCP WebdriverIO
Overview
This project implements a Message Control Protocol server that wraps WebdriverIO functionality, allowing browser automation through a standardized message-based interface. It provides a set of tools for browser control, element interaction, and session management.
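To make the message-based interface concrete, here is a minimal sketch of how a server could route tool messages to handlers by name. It is not this package's actual implementation: `ToolRegistry`, `ToolHandler`, and the `echo` tool are hypothetical names introduced for illustration. Only the `{ type: 'tool', name, params }` message shape is taken from this README.

```typescript
// Hypothetical sketch of name-based tool dispatch; not the package's real server code.
interface ToolMessage {
  type: 'tool';
  name: string;
  params: Record<string, unknown>;
}

type ToolHandler = (params: Record<string, unknown>) => Promise<unknown>;

class ToolRegistry {
  private handlers = new Map<string, ToolHandler>();

  // Register a handler under a tool name such as 'navigate'.
  register(name: string, handler: ToolHandler): void {
    this.handlers.set(name, handler);
  }

  // Look up the handler for a message and invoke it with the message params.
  async handleMessage(message: ToolMessage): Promise<unknown> {
    const handler = this.handlers.get(message.name);
    if (!handler) {
      throw new Error(`Unknown tool: ${message.name}`);
    }
    return handler(message.params);
  }
}

// Illustrative tool that echoes its params back in a `content` array.
const registry = new ToolRegistry();
registry.register('echo', async (params) => ({ content: [params] }));
```

Keeping each tool behind a name lookup is one way to let browser, navigation, and element tools live in separate modules while sharing a single entry point.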
Features
- Browser session management (start, close)
- Navigation control
- Element interaction (find, click, type, get text)
- Cross-browser support (Chrome, Firefox, Safari)
- Headless mode support
- Session-based architecture
- TypeScript support
Prerequisites
- Node.js (v14 or higher)
- npm or yarn
- Chrome, Firefox, or Safari browser installed
- For Chrome/Firefox: WebDriver installed (ChromeDriver/geckodriver)
Installation
- Clone the repository:
    git clone https://github.com/hiroksarker/mcp-webdriverio.git
    cd mcp-webdriverio
- Install dependencies:
    npm install
Project Structure
mcp-webdriverio/
├── src/
│   ├── lib/
│   │   ├── server/
│   │   │   ├── tools/
│   │   │   │   ├── browser.ts      # Browser control tools
│   │   │   │   ├── elements.ts     # Element interaction tools
│   │   │   │   └── navigation.ts   # Navigation tools
│   │   │   └── server.ts           # MCP server implementation
│   │   └── types.ts                # TypeScript type definitions
│   └── index.ts                    # Main entry point
├── tests/
│   ├── pages/
│   │   └── LoginPage.ts            # Page object for login page
│   └── specs/
│       └── example.spec.ts         # Example test suite
├── package.json
└── README.md
Available Tools
Browser Tools
start_browser: Start a new browser session
{
  type: 'tool',
  name: 'start_browser',
  params: {
    browserName: 'chrome' | 'firefox' | 'safari',
    headless?: boolean
  }
}
close_browser: Close an existing browser session
{
  type: 'tool',
  name: 'close_browser',
  params: {
    sessionId: string
  }
}
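The two browser tool shapes can be written as TypeScript types with small builder functions, so a typo in `browserName` is caught before a message is sent. This is a sketch only: `parseBrowserName`, `startBrowser`, and `closeBrowser` are hypothetical helpers and are not exported by this package.

```typescript
// Message shapes copied from the browser tool list in this README.
type BrowserName = 'chrome' | 'firefox' | 'safari';

interface StartBrowserMessage {
  type: 'tool';
  name: 'start_browser';
  params: { browserName: BrowserName; headless?: boolean };
}

interface CloseBrowserMessage {
  type: 'tool';
  name: 'close_browser';
  params: { sessionId: string };
}

const SUPPORTED_BROWSERS: readonly BrowserName[] = ['chrome', 'firefox', 'safari'];

// Hypothetical helper: narrows an untrusted string (for example from a config file)
// to one of the supported browser names, or throws.
function parseBrowserName(input: string): BrowserName {
  const normalized = input.trim().toLowerCase();
  for (const name of SUPPORTED_BROWSERS) {
    if (name === normalized) return name;
  }
  throw new Error(`Unsupported browser: ${input}`);
}

// Hypothetical builder: omits `headless` entirely when the caller does not set it.
function startBrowser(browserName: BrowserName, headless?: boolean): StartBrowserMessage {
  const params: StartBrowserMessage['params'] = { browserName };
  if (headless !== undefined) params.headless = headless;
  return { type: 'tool', name: 'start_browser', params };
}

// Hypothetical builder for the matching close message.
function closeBrowser(sessionId: string): CloseBrowserMessage {
  return { type: 'tool', name: 'close_browser', params: { sessionId } };
}
```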
Navigation Tools
navigate: Navigate to a URL
{
  type: 'tool',
  name: 'navigate',
  params: {
    sessionId: string,
    url: string
  }
}
get_url: Get current page URL
{
  type: 'tool',
  name: 'get_url',
  params: {
    sessionId: string
  }
}
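A sketch of typed builders for the two navigation messages follows. The `navigate` and `getUrl` functions are hypothetical helpers, and the http(s)-only check is a design choice made here, not a documented rule of the server; loosen it if you need `file:` or `about:` URLs.

```typescript
// Message shapes copied from the navigation tool list in this README.
interface NavigateMessage {
  type: 'tool';
  name: 'navigate';
  params: { sessionId: string; url: string };
}

interface GetUrlMessage {
  type: 'tool';
  name: 'get_url';
  params: { sessionId: string };
}

// Hypothetical guard: reject strings without an http(s) scheme so that a relative
// path such as 'example.com/login' fails early instead of inside the browser.
function navigate(sessionId: string, url: string): NavigateMessage {
  if (!/^https?:\/\/\S+$/i.test(url)) {
    throw new Error(`Expected an absolute http(s) URL, got: ${url}`);
  }
  return { type: 'tool', name: 'navigate', params: { sessionId, url } };
}

// Hypothetical builder for the get_url message.
function getUrl(sessionId: string): GetUrlMessage {
  return { type: 'tool', name: 'get_url', params: { sessionId } };
}
```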
Element Tools
find_element: Find an element on the page
{
  type: 'tool',
  name: 'find_element',
  params: {
    sessionId: string,
    by: 'css selector' | 'xpath' | 'id',
    value: string,
    timeout?: number
  }
}
element_action: Perform actions on elements
{
  type: 'tool',
  name: 'element_action',
  params: {
    sessionId: string,
    elementId: string,
    action: 'click' | 'type' | 'clear' | 'submit',
    value?: string
  }
}
getText: Get text content of an element
{
  type: 'tool',
  name: 'getText',
  params: {
    sessionId: string,
    elementId: string
  }
}
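The element tools can be typed the same way. The sketch below assumes that the `type` action needs a `value` while the other actions do not; the README marks `value` as optional without saying when it is required, so treat that check as an assumption. `findElement` and `elementAction` are hypothetical helper names.

```typescript
// Shapes copied from the element tool list in this README.
type LocatorStrategy = 'css selector' | 'xpath' | 'id';
type ElementAction = 'click' | 'type' | 'clear' | 'submit';

interface FindElementMessage {
  type: 'tool';
  name: 'find_element';
  params: { sessionId: string; by: LocatorStrategy; value: string; timeout?: number };
}

interface ElementActionMessage {
  type: 'tool';
  name: 'element_action';
  params: { sessionId: string; elementId: string; action: ElementAction; value?: string };
}

// Hypothetical builder: only includes `timeout` when the caller provides one.
function findElement(
  sessionId: string,
  by: LocatorStrategy,
  value: string,
  timeout?: number
): FindElementMessage {
  const params: FindElementMessage['params'] = { sessionId, by, value };
  if (timeout !== undefined) params.timeout = timeout;
  return { type: 'tool', name: 'find_element', params };
}

// Hypothetical builder. Assumption: 'type' requires text, other actions ignore it.
function elementAction(
  sessionId: string,
  elementId: string,
  action: ElementAction,
  value?: string
): ElementActionMessage {
  if (action === 'type' && value === undefined) {
    throw new Error("The 'type' action requires a value");
  }
  const params: ElementActionMessage['params'] = { sessionId, elementId, action };
  if (value !== undefined) params.value = value;
  return { type: 'tool', name: 'element_action', params };
}
```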
Running Tests
- Start the MCP server:
    npm start
- Run the test suite:
    npm test
Example Usage
Here's a simple example of using the MCP server to automate a login flow:
// Start browser session
const startResponse = await server.mcpServer.handleMessage({
  type: 'tool',
  name: 'start_browser',
  params: {
    browserName: 'chrome',
    headless: true
  }
});
const sessionId = startResponse.content[0].sessionId;

// Navigate to login page
await server.mcpServer.handleMessage({
  type: 'tool',
  name: 'navigate',
  params: {
    sessionId,
    url: 'https://example.com/login'
  }
});

// Find and fill username
const usernameResponse = await server.mcpServer.handleMessage({
  type: 'tool',
  name: 'find_element',
  params: {
    sessionId,
    by: 'css selector',
    value: '#username'
  }
});
await server.mcpServer.handleMessage({
  type: 'tool',
  name: 'element_action',
  params: {
    sessionId,
    elementId: usernameResponse.content[0].elementId,
    action: 'type',
    value: 'testuser'
  }
});

// Close browser session
await server.mcpServer.handleMessage({
  type: 'tool',
  name: 'close_browser',
  params: { sessionId }
});
Best Practices
Session Management
- Always close browser sessions after use
- Use unique session IDs for parallel test execution
- Handle session cleanup in error cases
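One way to follow the session guidance above is to wrap each run in a helper that always closes the session, including when the body throws. The sketch below is an assumption-laden illustration: `withSession`, `SendMessage`, and the `ToolResponse` shape are hypothetical names, and the only response detail taken from this README is that the start response exposes `content[0].sessionId`.

```typescript
// Hypothetical types for illustration only.
interface ToolMessage {
  type: 'tool';
  name: string;
  params: Record<string, unknown>;
}
interface ToolResponse {
  content: Array<Record<string, unknown>>;
}
type SendMessage = (message: ToolMessage) => Promise<ToolResponse>;

interface StartOptions {
  browserName: 'chrome' | 'firefox' | 'safari';
  headless?: boolean;
}

// Starts a session, runs `body` with its id, and closes the session afterwards.
async function withSession<T>(
  send: SendMessage,
  options: StartOptions,
  body: (sessionId: string) => Promise<T>
): Promise<T> {
  const started = await send({
    type: 'tool',
    name: 'start_browser',
    params: { browserName: options.browserName, headless: options.headless },
  });
  const first = started.content[0];
  const sessionId = first ? first.sessionId : undefined;
  if (typeof sessionId !== 'string') {
    throw new Error('start_browser did not return a sessionId');
  }
  try {
    return await body(sessionId);
  } finally {
    // Runs on success and on failure, so the browser session is always closed.
    await send({ type: 'tool', name: 'close_browser', params: { sessionId } });
  }
}
```

With the server from the example in this README, `send` could be `(m) => server.mcpServer.handleMessage(m)`. Because each call starts its own session, parallel tests that each call `withSession` do not share a session ID.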
Element Interaction
- Use appropriate selectors (prefer CSS selectors over XPath)
- Add timeouts for dynamic elements
- Implement proper error handling for element not found cases
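For the not-found case, a small guard can turn a missing element ID into a descriptive error before it reaches `element_action`. This README does not document what a failed `find_element` response looks like, so the sketch below assumes only the success shape from the usage example (`content[0].elementId`) and treats anything else as not found. `requireElementId` is a hypothetical helper name.

```typescript
// Assumption: a successful find_element response carries `content[0].elementId`.
// The failure shape is undocumented, so every other shape is treated as "not found".
interface FindElementResponse {
  content?: Array<{ elementId?: unknown }>;
}

// Returns the element id, or throws an error that names the selector that failed.
function requireElementId(response: FindElementResponse, selector: string): string {
  const first = response.content && response.content[0];
  const elementId = first ? first.elementId : undefined;
  if (typeof elementId !== 'string' || elementId.length === 0) {
    throw new Error(`Element not found for selector "${selector}"`);
  }
  return elementId;
}
```

Including the selector in the error message makes a failing step easier to trace back to the page object or test that issued it.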
Page Objects
- Use page objects to encapsulate page-specific logic
- Keep selectors in page objects
- Implement reusable actions in page objects
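A page object built on the message interface might look like the sketch below. It is not the contents of `tests/pages/LoginPage.ts`, which this README does not show. The `SendMessage` type is hypothetical, `#username` comes from the usage example, and the `#password` and submit-button selectors are assumptions.

```typescript
// Hypothetical types for illustration only.
interface ToolMessage {
  type: 'tool';
  name: string;
  params: Record<string, unknown>;
}
interface ToolResponse {
  content: Array<Record<string, unknown>>;
}
type SendMessage = (message: ToolMessage) => Promise<ToolResponse>;

class LoginPage {
  // Selectors live in one place. '#username' is from the README example;
  // '#password' and the submit selector are assumed for illustration.
  private readonly selectors = {
    username: '#username',
    password: '#password',
    submit: 'button[type="submit"]',
  };

  constructor(private readonly send: SendMessage, private readonly sessionId: string) {}

  // Reusable action: fill both fields, then click submit.
  async login(username: string, password: string): Promise<void> {
    await this.typeInto(this.selectors.username, username);
    await this.typeInto(this.selectors.password, password);
    const submitId = await this.find(this.selectors.submit);
    await this.act(submitId, 'click');
  }

  private async find(selector: string): Promise<string> {
    const res = await this.send({
      type: 'tool',
      name: 'find_element',
      params: { sessionId: this.sessionId, by: 'css selector', value: selector },
    });
    return String(res.content[0].elementId);
  }

  private async typeInto(selector: string, text: string): Promise<void> {
    const elementId = await this.find(selector);
    await this.act(elementId, 'type', text);
  }

  private async act(elementId: string, action: 'click' | 'type', value?: string): Promise<void> {
    await this.send({
      type: 'tool',
      name: 'element_action',
      params: { sessionId: this.sessionId, elementId, action, value },
    });
  }
}
```

A test then reads as `await new LoginPage(send, sessionId).login('testuser', 'secret')`, and a selector change touches only the `selectors` object.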
Contributing
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- WebdriverIO team for the excellent automation framework
- Selenium WebDriver for the WebDriver protocol
- All contributors who have helped improve this project
