selenium-mcp
v1.5.1
A Model Context Protocol (MCP) server that provides advanced screenshot capabilities using Selenium WebDriver. Perfect for AI agents, automated testing, visual regression testing, and content capture workflows.
Selenium Screenshot Server
📋 Issue Tracker - Report bugs, request features, or ask questions
Getting Started (For AI Agents)
Quick Setup in Cursor
- Clone and install the server:
git clone <repository-url>
cd selenium
npm install
npm test # Verify installation
- Add to your Cursor MCP configuration:
Create or edit ~/.cursor/mcp.json (macOS/Linux) or %APPDATA%\Cursor\mcp.json (Windows):
{
"mcpServers": {
"selenium-screenshot": {
"command": "node",
"args": ["/path/to/your/selenium/src/server.js"],
"env": {
"NODE_ENV": "production"
}
}
}
}
- Restart Cursor and start using the screenshot tool!
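The configuration entry above can also be generated rather than typed by hand. The following is a minimal sketch, assuming the repository was cloned into a known directory; `buildMcpConfig` is a hypothetical helper and not part of this package.

```javascript
const path = require('node:path');

// Hypothetical helper: builds the "selenium-screenshot" entry for mcp.json
// from the directory where the repository was cloned.
function buildMcpConfig(repoDir) {
  return {
    mcpServers: {
      'selenium-screenshot': {
        command: 'node',
        // Resolve to an absolute path, matching the /path/to/... form shown above.
        args: [path.resolve(repoDir, 'src', 'server.js')],
        env: { NODE_ENV: 'production' },
      },
    },
  };
}

// Print the JSON so it can be pasted into mcp.json.
console.log(JSON.stringify(buildMcpConfig('.'), null, 2));
```

Run it from the cloned directory and merge the printed object into any existing `mcpServers` block rather than overwriting the file.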
Demo Commands
Try these commands in Cursor:
Take a screenshot of https://google.com
Take a full page screenshot of https://google.com in desktop viewport
Take a screenshot of the Google logo on https://google.com
Take a screenshot of https://google.com in mobile viewport
What You Can Do
- Basic screenshots: "Take a screenshot of [URL]"
- Full page capture: "Take a full page screenshot of [URL]"
- Element-specific: "Take a screenshot of the [element] on [URL]"
- Responsive testing: "Take screenshots of [URL] in mobile, tablet, and desktop viewports"
- Debug elements: "Take a screenshot of the [element] on [URL] with highlighting"
For detailed setup instructions, see CURSOR_SETUP.md.
Features
- Full Page Screenshots: Capture entire page content including areas below the fold
- Element-Specific Screenshots: Target specific DOM elements with CSS selectors
- Multiple Viewport Sizes: Support for mobile, tablet, and desktop presets
- Custom Viewport Dimensions: Flexible viewport sizing for responsive testing
- Wait Conditions: Wait for selectors or custom time periods
- Element Highlighting: Debug mode for element-specific screenshots
- Headless Mode: Configurable browser visibility (default: true for efficiency)
- High-Quality Output: PNG format with configurable quality
- HTML Retrieval: Get page HTML content with structure analysis options
Installation
Prerequisites
- Node.js 18+
- Chrome browser installed
- ChromeDriver (automatically managed by Selenium)
For AI Agents (Recommended)
Follow the Getting Started section above for quick setup in Cursor.
For Direct Usage
# Clone the repository
git clone <repository-url>
cd selenium
# Install dependencies
npm install
# Run tests to verify installation
npm test
Usage
Quick Reference for AI Agents
| Command | Description |
| ---------------------------------------------------------- | ------------------------ |
| Take a screenshot of [URL] | Basic screenshot |
| Take a full page screenshot of [URL] | Capture entire page |
| Take a screenshot of [URL] in mobile viewport | Mobile device testing |
| Take a screenshot of the [element] on [URL] | Element-specific capture |
| Take screenshots of [URL] in mobile, tablet, and desktop | Responsive testing |
| Take a screenshot of [URL] with element highlighting | Debug mode |
| Take a screenshot of [URL] with visible browser | Non-headless mode |
| Get the HTML content of [URL] | Basic HTML retrieval |
| Get the HTML structure of [URL] | Structure mode (default) |
| Get the full HTML of [URL] | Complete HTML content |
| Click the [element] on [URL] | Basic element click |
| Click the [element] on [URL] with visible browser | Non-headless click |
| Type [text] into [field] on [URL] | Basic text input |
| Type [text] into [field] on [URL] with visible browser | Non-headless text input |
Basic Screenshot
// Take a basic screenshot of a webpage
const result = await takeScreenshot({
url: 'https://example.com',
viewportPreset: 'desktop',
});
Full Page Screenshot
// Capture the entire page including scrollable content
const result = await takeScreenshot({
url: 'https://example.com',
fullPage: true,
viewportPreset: 'desktop',
});
Element-Specific Screenshot
// Capture only a specific element
const result = await takeScreenshot({
url: 'https://example.com',
elementSelector: 'h1',
highlightElement: true, // Optional: highlight the element for debugging
});
Mobile Viewport
// Test responsive design with mobile viewport
const result = await takeScreenshot({
url: 'https://example.com',
viewportPreset: 'mobile',
fullPage: true,
});
Custom Viewport with Wait Conditions
// Custom viewport with wait conditions
const result = await takeScreenshot({
url: 'https://example.com',
viewportPreset: 'custom',
width: 1200,
height: 800,
waitForSelector: '.content-loaded',
waitTime: 2000,
userInteractionTime: 3000,
});
Headless Mode Configuration
// Default: headless mode (efficient, no visible browser)
const result = await takeScreenshot({
url: 'https://example.com',
headless: true, // default
});
// Non-headless mode (visible browser for debugging)
// A separate name avoids redeclaring `result` in the same scope
const visibleResult = await takeScreenshot({
url: 'https://example.com',
headless: false, // browser will be visible
});
HTML Retrieval
🚀 PREFERRED METHOD: Use getPageHtml to save HTML to a temp file, then use standard command-line tools for processing.
HTML to File (Recommended)
// Get HTML content and save to temp file
const result = await getPageHtml({
url: 'https://example.com',
mode: 'structure', // or 'full'
});
console.log(result.filePath); // e.g., /tmp/page-html-abc123.html
Benefits:
- File-based approach - LLM can use grep, sed, awk, etc. for any processing
- No token limits - Content saved to files, process however you want
- Better performance - No large content in responses
- Maximum flexibility - Use any command-line tool to filter/analyze HTML
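The same file-based processing can be done from Node.js when shell tools are not convenient. This is a minimal sketch that mimics `grep -c` on a saved HTML file; `countMatchingLines` is a hypothetical helper, and the demo file stands in for whatever `getPageHtml` actually writes.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Counts the lines of a saved HTML file that match a pattern (like `grep -c`).
function countMatchingLines(filePath, pattern) {
  const lines = fs.readFileSync(filePath, 'utf8').split('\n');
  return lines.filter((line) => pattern.test(line)).length;
}

// Stand-in for the temp file; the path and content here are made up.
const demoPath = path.join(os.tmpdir(), 'page-html-demo.html');
fs.writeFileSync(
  demoPath,
  '<html>\n<a href="/a">A</a>\n<p>text</p>\n<a href="/b">B</a>\n</html>\n'
);

console.log(countMatchingLines(demoPath, /<a\s/)); // 2
```

In real use, pass `result.filePath` from the call above instead of `demoPath`.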
With Wait Conditions
// Wait for specific element before getting HTML
const result = await getPageHtml({
url: 'https://example.com',
waitForSelector: '.content-loaded',
waitTime: 2000,
});
Non-Headless Mode
const result = await getPageHtml({
url: 'https://example.com',
headless: false,
});
Click Element
Basic Click
// Click an element on a webpage
const result = await clickElement({
url: 'https://example.com',
selector: '#submit-button',
});
With Wait Conditions
// Wait for element to be present before clicking
const result = await clickElement({
url: 'https://example.com',
selector: '#submit-button',
waitForSelector: '#form-loaded',
waitTime: 2000,
});
Non-Headless Mode
// Click with visible browser for debugging
const result = await clickElement({
url: 'https://example.com',
selector: '#submit-button',
headless: false,
});
Type Text
Basic Text Input
// Type text into an input field
const result = await typeText({
url: 'https://example.com',
selector: 'input[name="username"]',
text: 'user@example.com',
});
With Clear First
// Clear field before typing (default behavior)
const result = await typeText({
url: 'https://example.com',
selector: 'input[name="username"]',
text: 'user@example.com',
clearFirst: true, // default
});
Without Clearing
// Type without clearing existing text
const result = await typeText({
url: 'https://example.com',
selector: 'input[name="username"]',
text: ' @example.com',
clearFirst: false,
});
With Wait Conditions
// Wait for element to be present before typing
const result = await typeText({
url: 'https://example.com',
selector: 'input[name="username"]',
text: 'user@example.com',
waitForSelector: '#login-form',
waitTime: 1000,
});
Non-Headless Mode
// Type with visible browser for debugging
const result = await typeText({
url: 'https://example.com',
selector: 'input[name="username"]',
text: 'user@example.com',
headless: false,
});
API Reference
Parameters
| Parameter | Type | Default | Description |
| --------------------- | ------- | ------------ | --------------------------------------------------------------------- |
| url | string | required | URL of the page to screenshot |
| viewportPreset | string | 'desktop' | Viewport size preset: 'mobile', 'tablet', 'desktop', 'custom' |
| width | number | 1920 | Custom viewport width (used with viewportPreset: 'custom') |
| height | number | 1080 | Custom viewport height (used with viewportPreset: 'custom') |
| elementSelector | string | - | CSS selector for element-specific screenshot |
| fullPage | boolean | false | Capture full page including scroll |
| waitForSelector | string | - | CSS selector to wait for before screenshot |
| waitTime | number | - | Time to wait after page load (ms) |
| userInteractionTime | number | 5000 | Time to wait for user login/navigation (ms) |
| highlightElement | boolean | false | Highlight target element for debugging |
| headless | boolean | true | Run browser in headless mode (default: true for efficiency) |
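To make the defaults in this table concrete, here is a minimal sketch of how a caller might fill in omitted parameters before invoking the tool. `withScreenshotDefaults` and `SCREENSHOT_DEFAULTS` are hypothetical names; the server may apply its defaults differently.

```javascript
// Defaults copied from the parameter table above.
const SCREENSHOT_DEFAULTS = {
  viewportPreset: 'desktop',
  width: 1920,
  height: 1080,
  fullPage: false,
  userInteractionTime: 5000,
  highlightElement: false,
  headless: true,
};

// Hypothetical helper, not part of the server's API:
// rejects a missing url and fills in every omitted default.
function withScreenshotDefaults(params) {
  if (!params || typeof params.url !== 'string' || params.url === '') {
    throw new TypeError('url is required');
  }
  // Caller-supplied values win over the defaults.
  return { ...SCREENSHOT_DEFAULTS, ...params };
}

console.log(withScreenshotDefaults({ url: 'https://example.com', fullPage: true }));
```

Parameters without a default in the table (`elementSelector`, `waitForSelector`, `waitTime`) are simply left unset.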
HTML Retrieval Parameters
| Parameter | Type | Default | Description |
| ----------------- | -------- | ---------------------------- | --------------------------------------------------------------------- |
| url | string | required | URL of the page to get HTML from |
| mode | string | 'structure' | HTML retrieval mode: 'structure' (clean DOM) or 'full' (complete) |
| stripElements | string[] | ['script', 'svg', 'style'] | Element types to strip from HTML |
| waitForSelector | string | - | CSS selector to wait for before getting HTML |
| waitTime | number | - | Time to wait after page load (ms) |
| headless | boolean | true | Run browser in headless mode (default: true for efficiency) |
Click Element Parameters
| Parameter | Type | Default | Description |
| ----------------- | ------- | ------------ | ----------------------------------------------------------- |
| url | string | required | URL of the page to interact with |
| selector | string | required | CSS selector for the element to click |
| waitForSelector | string | - | CSS selector to wait for before clicking |
| waitTime | number | - | Time to wait after page load (ms) |
| headless | boolean | true | Run browser in headless mode (default: true for efficiency) |
Type Text Parameters
| Parameter | Type | Default | Description |
| ----------------- | ------- | ------------ | ----------------------------------------------------------- |
| url | string | required | URL of the page to interact with |
| selector | string | required | CSS selector for the input field |
| text | string | required | Text to type into the field |
| clearFirst | boolean | true | Clear the field before typing (default: true) |
| waitForSelector | string | - | CSS selector to wait for before typing |
| waitTime | number | - | Time to wait after page load (ms) |
| headless | boolean | true | Run browser in headless mode (default: true for efficiency) |
Viewport Presets
| Preset | Width | Height | Use Case |
| --------- | ------------ | ------------ | ------------------------- |
| mobile | 375 | 667 | Mobile device testing |
| tablet | 768 | 1024 | Tablet device testing |
| desktop | 1920 | 1080 | Desktop testing (default) |
| custom | configurable | configurable | Custom dimensions |
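The preset table can be read as a simple lookup. The sketch below shows one way to resolve a preset to pixel dimensions, assuming `custom` falls back to the documented 1920x1080 defaults when `width` or `height` is omitted; `resolveViewport` is a hypothetical name, not the server's own function.

```javascript
// Dimensions copied from the preset table above.
const VIEWPORT_PRESETS = {
  mobile: { width: 375, height: 667 },
  tablet: { width: 768, height: 1024 },
  desktop: { width: 1920, height: 1080 },
};

// Hypothetical helper: maps screenshot parameters to concrete dimensions.
function resolveViewport({ viewportPreset = 'desktop', width = 1920, height = 1080 } = {}) {
  if (viewportPreset === 'custom') {
    return { width, height };
  }
  // Own-property check so inherited keys such as "constructor" are rejected.
  if (!Object.prototype.hasOwnProperty.call(VIEWPORT_PRESETS, viewportPreset)) {
    throw new RangeError(`Unknown viewport preset: ${viewportPreset}`);
  }
  return { ...VIEWPORT_PRESETS[viewportPreset] };
}

console.log(resolveViewport({ viewportPreset: 'mobile' }));
console.log(resolveViewport({ viewportPreset: 'custom', width: 1200, height: 800 }));
```

Note that in this sketch `width` and `height` are ignored unless the preset is `custom`, which matches the "used with viewportPreset: 'custom'" wording in the parameter table.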
Screenshot Return Format
{
success: true,
mimeType: 'image/png',
data: 'base64-encoded-image-data',
size: 12345 // bytes
}
HTML Retrieval Return Format
{
success: true,
mode: 'structure', // or 'full'
data: '<html>...</html>', // plain text HTML
contentLength: 3720, // characters
stripElements: ['script', 'svg', 'style']
}
Click Element Return Format
// Success
{
success: true
}
// Error
{
success: false,
error: 'Element not found: Check if selector \'#submit\' is correct. The element may not exist on the page.',
userMessage: 'Element not found: Check if selector \'#submit\' is correct. The element may not exist on the page.'
}
Type Text Return Format
// Success
{
success: true
}
// Error
{
success: false,
error: 'Input field not found: Check if selector \'#username\' is correct. The field may not exist on the page.',
userMessage: 'Input field not found: Check if selector \'#username\' is correct. The field may not exist on the page.'
}
MCP Server Usage
Starting the Server
# Run the MCP server
node src/server.js
Setting Up in Cursor
For detailed instructions on integrating this server with Cursor, see CURSOR_SETUP.md.
Quick Setup Example:
{
"mcpServers": {
"selenium-screenshot": {
"command": "node",
"args": ["/path/to/selenium/src/server.js"],
"env": {
"NODE_ENV": "production"
}
}
}
}
MCP Tool Registration
The server registers a take_screenshot tool with the following schema:
{
"name": "take_screenshot",
"description": "Take a screenshot of a web page with advanced options",
"inputSchema": {
"type": "object",
"properties": {
"url": { "type": "string", "description": "URL to screenshot" },
"viewportPreset": {
"type": "string",
"enum": ["mobile", "tablet", "desktop", "custom"],
"default": "desktop"
},
"elementSelector": {
"type": "string",
"description": "CSS selector for element-specific screenshot"
},
"fullPage": {
"type": "boolean",
"default": false,
"description": "Capture full page including scroll"
},
"headless": {
"type": "boolean",
"default": true,
"description": "Run browser in headless mode (default: true for efficiency)"
}
},
"required": ["url"]
}
}
Testing
Run All Tests
npm test
Run Tests with Coverage
npm run test:coverage
Test Categories
- Unit Tests: Core functionality and edge cases
- Integration Tests: Real website testing
- Viewport Tests: Responsive design validation
- Element Tests: Element-specific screenshot functionality
- Full Page Tests: Scroll capture and stitching
Development
Project Structure
selenium/
├── src/
│ ├── server.js # MCP server with DI pattern
│ ├── logger.js # Logging utilities
│ └── tools/
│ └── screenshot.js # Core screenshot functionality
├── test/ # Test files
├── screenshots/ # Generated screenshots
└── docs/ # Documentation
Dependency Injection Pattern
This app uses the getDeps pattern for dependency injection. New code should follow this pattern:
- Define a getDeps function that returns all dependencies
- main should accept a _getDeps argument (defaulting to getDeps)
- This enables easy testing and swapping of dependencies
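The pattern described above can be sketched as follows. Only the names `getDeps`, `main`, and `_getDeps` come from this README; the dependencies themselves (`now`, `log`) are invented for illustration and do not reflect the project's actual modules.

```javascript
// Production dependencies, gathered in one place.
const getDeps = () => ({
  now: () => Date.now(),
  log: (msg) => console.log(msg),
});

// main pulls everything it needs from _getDeps, which defaults to getDeps.
function main(label, _getDeps = getDeps) {
  const { now, log } = _getDeps();
  const message = `${label} at ${now()}`;
  log(message);
  return message;
}

// In a test, pass deterministic fakes instead of the real dependencies.
const logged = [];
const fakeDeps = () => ({
  now: () => 1700000000000,
  log: (msg) => logged.push(msg),
});

main('capture', fakeDeps);
console.log(logged); // [ 'capture at 1700000000000' ]
```

Because the fakes are injected through an ordinary argument, no module mocking is needed to make `main` deterministic.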
Adding New Features
- Follow the getDeps pattern for dependency injection
- Add comprehensive tests for new functionality
- Update documentation with usage examples
- Ensure backward compatibility
Error Handling
The server provides clear error messages for common scenarios:
- Element not found: Returns error when CSS selector doesn't match
- Page load timeout: Handles slow-loading pages gracefully
- Invalid URLs: Validates URL format before processing
- Browser errors: Captures and reports WebDriver errors
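Callers usually want a failed result to interrupt their flow instead of being silently ignored. This is a minimal sketch built on the `success`, `error`, and `userMessage` fields shown in the return formats above; `unwrapResult` is a hypothetical helper, not something the server exports.

```javascript
// Hypothetical helper: returns a successful result unchanged and
// converts a failed one into a thrown Error.
function unwrapResult(result) {
  if (result && result.success) {
    return result;
  }
  // Prefer the user-facing message, then the raw error text.
  const message =
    (result && (result.userMessage || result.error)) || 'Unknown tool error';
  throw new Error(message);
}

try {
  unwrapResult({ success: false, error: "Element not found: Check if selector '#submit' is correct." });
} catch (err) {
  console.error(err.message);
}
```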
Troubleshooting
MCP Logs in Cursor
When using this server with Cursor, you can view detailed logs to troubleshoot issues:
- Open Debug Console: In Cursor, go to View → Debug Console (or press Ctrl+Shift+Y / Cmd+Shift+Y)
- Look for MCP Logs: The server logs will appear in the Debug Console with timestamps
- Common Log Messages:
  - [INFO] Starting screenshot capture - Server is processing your request
  - [DEBUG] Headless mode enabled/disabled - Shows browser visibility setting
  - [ERROR] Screenshot capture failed - Indicates what went wrong
  - [DEBUG] WebDriver initialized/closed - Browser lifecycle events
Common Issues
Headless Mode Problems: If headless mode fails in your environment:
- Try setting headless: false to see the browser window
- Check if Chrome is installed and accessible
- Some CI/CD environments may not support headless mode
Timeout Issues: If screenshots are timing out:
- Increase userInteractionTime for slow-loading pages
- Use waitForSelector to wait for specific content
- Check your internet connection
Element Not Found: If element-specific screenshots fail:
- Verify the CSS selector is correct
- Use browser dev tools to test the selector
- Try highlightElement: true to debug element location
Performance Considerations
- Timeout: 15-second timeout for all page operations
- Memory: Optimized for large screenshots
- Concurrency: Single browser instance (no concurrent requests)
- Caching: No built-in caching (planned for future versions)
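Since the server works with a single browser instance, a client that issues several requests may want to serialize them on its side. The following is a minimal sketch of a promise-chain queue, assuming each request is wrapped in an async function; `createSerialQueue` is a hypothetical helper and the demo tasks are stand-ins for real tool calls.

```javascript
// Returns an enqueue function that runs async tasks strictly one after another.
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    const run = tail.then(() => task());
    // Keep the chain usable after a failure; the caller still
    // observes the rejection through the returned promise.
    tail = run.catch(() => {});
    return run;
  };
}

// Demo with stand-in tasks: the slow task finishes before the fast one starts.
const enqueue = createSerialQueue();
const demoOrder = [];
const slowTask = () =>
  new Promise((resolve) => setTimeout(() => { demoOrder.push('slow'); resolve(); }, 20));
const fastTask = async () => { demoOrder.push('fast'); };

Promise.all([enqueue(slowTask), enqueue(fastTask)]).then(() => {
  console.log(demoOrder); // [ 'slow', 'fast' ]
});
```

A rejected task does not block later ones, so one failed screenshot will not stall the rest of a batch.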
Alpha Usage Guidelines
What's Ready for Production
✅ Core Functionality
- Basic screenshots with viewport control
- Full page screenshot capture
- Element-specific screenshots
- Multiple viewport presets
- Wait conditions and timeouts
- Error handling and logging
✅ Testing & Quality
- Test coverage of 71% overall
- Real-world integration testing
- Error scenario validation
- Performance testing
✅ Documentation
- Complete API reference
- Usage examples
- Installation instructions
- Development guidelines
Known Limitations
⚠️ Alpha Limitations
- Single browser instance (no concurrent requests)
- No built-in caching or browser pooling
- Limited to Chrome browser
- No PDF or video output (planned for Phase 2, Step 3)
Recommended Usage Patterns
- Start Simple: Begin with basic screenshots before using advanced features
- Test Responsively: Use viewport presets to test different device sizes
- Handle Errors: Implement proper error handling for production use
- Monitor Performance: Watch for timeout issues with complex pages
- Validate Output: Always verify screenshot quality and content
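One inexpensive way to validate output is to confirm that the returned `data` decodes to a PNG. The sketch below checks the standard 8-byte PNG signature on a result shaped like the Screenshot Return Format; `looksLikePng` is a hypothetical helper, and it checks only the file header, not the image content.

```javascript
// The 8-byte signature that every PNG file starts with.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Hypothetical helper: true when result.data is base64 that decodes to PNG bytes.
function looksLikePng(result) {
  if (!result || !result.success || typeof result.data !== 'string') {
    return false;
  }
  const bytes = Buffer.from(result.data, 'base64');
  return (
    bytes.length >= PNG_SIGNATURE.length &&
    bytes.subarray(0, PNG_SIGNATURE.length).equals(PNG_SIGNATURE)
  );
}

// Stand-in result: a PNG signature followed by a few arbitrary bytes.
const fakeImage = Buffer.concat([PNG_SIGNATURE, Buffer.from([0, 1, 2, 3])]);
console.log(looksLikePng({ success: true, data: fakeImage.toString('base64') })); // true
console.log(looksLikePng({ success: true, data: Buffer.from('hello').toString('base64') })); // false
```

A visual check is still needed for blank or partially rendered pages, since those are valid PNG files too.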
Roadmap
Phase 2, Step 2: Performance Optimizations
- Browser pooling for concurrent requests
- Caching mechanisms
- Performance monitoring
Phase 2, Step 3: Advanced Features
- PDF generation
- Video capture
- Batch processing
Phase 3: Production Readiness
- Configuration management
- Monitoring and observability
- Deployment automation
Contributing
- Follow the getDeps pattern for dependency injection
- Add tests for new functionality
- Update documentation
- Ensure all tests pass before submitting
Changelog
Recent Changes
- Updated demo URLs from Apple Music to example.com for better reliability
- Added console.log mocking in Jest setup to reduce test verbosity
- Removed legacy HTML mode documentation sections
- Improved README structure and formatting
Version History
- v1.0.0 - Initial release with core Selenium MCP functionality
- v1.1.0 - Added filtered HTML retrieval capabilities
- v1.2.0 - Enhanced browser pool management and error handling
License
[Add your license information here]
