selenium-mcp

v1.5.1

Published

8 months ago

A Model Context Protocol (MCP) server that provides advanced screenshot capabilities using Selenium WebDriver. Perfect for AI agents, automated testing, visual regression testing, and content capture workflows.

grahampheath

Selenium Screenshot Server

📋 Issue Tracker - Report bugs, request features, or ask questions

Getting Started (For AI Agents)

Quick Setup in Cursor

Clone and install the server:

git clone <repository-url>
cd selenium
npm install
npm test  # Verify installation

Add to your Cursor MCP configuration: Create or edit ~/.cursor/mcp.json (macOS/Linux) or %APPDATA%\Cursor\mcp.json (Windows):

{
  "mcpServers": {
    "selenium-screenshot": {
      "command": "node",
      "args": ["/path/to/your/selenium/src/server.js"],
      "env": {
        "NODE_ENV": "production"
      }
    }
  }
}

Restart Cursor and start using the screenshot tool!
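
The path in args has to point at your own clone. As a minimal sketch, the hypothetical helper buildMcpConfig below (it is not part of this package) builds the same entry with the server path resolved to an absolute path, so the output can be pasted into mcp.json:

```javascript
const path = require('node:path');

// Hypothetical helper (not part of this package): builds the mcp.json entry
// shown above for a given clone directory.
function buildMcpConfig(cloneDir) {
  return {
    mcpServers: {
      'selenium-screenshot': {
        command: 'node',
        // Resolve to an absolute path so the entry does not depend on the working directory.
        args: [path.resolve(cloneDir, 'src', 'server.js')],
        env: { NODE_ENV: 'production' },
      },
    },
  };
}

// Run from inside the cloned repository to print a ready-to-paste entry.
console.log(JSON.stringify(buildMcpConfig(process.cwd()), null, 2));
```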

Demo Commands

Try these commands in Cursor:

Take a screenshot of https://google.com

Take a full page screenshot of https://google.com in desktop viewport

Take a screenshot of the Google logo on https://google.com

Take a screenshot of https://google.com in mobile viewport

What You Can Do

Basic screenshots: "Take a screenshot of [URL]"
Full page capture: "Take a full page screenshot of [URL]"
Element-specific: "Take a screenshot of the [element] on [URL]"
Responsive testing: "Take screenshots of [URL] in mobile, tablet, and desktop viewports"
Debug elements: "Take a screenshot of the [element] on [URL] with highlighting"

For detailed setup instructions, see CURSOR_SETUP.md.

Features

Full Page Screenshots: Capture entire page content including areas below the fold
Element-Specific Screenshots: Target specific DOM elements with CSS selectors
Multiple Viewport Sizes: Support for mobile, tablet, and desktop presets
Custom Viewport Dimensions: Flexible viewport sizing for responsive testing
Wait Conditions: Wait for selectors or custom time periods
Element Highlighting: Debug mode for element-specific screenshots
Headless Mode: Configurable browser visibility (headless defaults to true for efficiency)
High-Quality Output: PNG format with configurable quality
HTML Retrieval: Get page HTML content with structure analysis options

Installation

Prerequisites

Node.js 18+
Chrome browser installed
ChromeDriver (automatically managed by Selenium)

For AI Agents (Recommended)

Follow the Getting Started section above for quick setup in Cursor.

For Direct Usage

# Clone the repository
git clone <repository-url>
cd selenium

# Install dependencies
npm install

# Run tests to verify installation
npm test

Usage

Quick Reference for AI Agents

| Command | Description |
| --- | --- |
| Take a screenshot of [URL] | Basic screenshot |
| Take a full page screenshot of [URL] | Capture entire page |
| Take a screenshot of [URL] in mobile viewport | Mobile device testing |
| Take a screenshot of the [element] on [URL] | Element-specific capture |
| Take screenshots of [URL] in mobile, tablet, and desktop | Responsive testing |
| Take a screenshot of [URL] with element highlighting | Debug mode |
| Take a screenshot of [URL] with visible browser | Non-headless mode |
| Get the HTML content of [URL] | Basic HTML retrieval |
| Get the HTML structure of [URL] | Structure mode (default) |
| Get the full HTML of [URL] | Complete HTML content |
| Click the [element] on [URL] | Basic element click |
| Click the [element] on [URL] with visible browser | Non-headless click |
| Type [text] into [field] on [URL] | Basic text input |
| Type [text] into [field] on [URL] with visible browser | Non-headless text input |

Basic Screenshot

// Take a basic screenshot of a webpage
const result = await takeScreenshot({
  url: 'https://example.com',
  viewportPreset: 'desktop',
});

Full Page Screenshot

// Capture the entire page including scrollable content
const result = await takeScreenshot({
  url: 'https://example.com',
  fullPage: true,
  viewportPreset: 'desktop',
});

Element-Specific Screenshot

// Capture only a specific element
const result = await takeScreenshot({
  url: 'https://example.com',
  elementSelector: 'h1',
  highlightElement: true, // Optional: highlight the element for debugging
});

Mobile Viewport

// Test responsive design with mobile viewport
const result = await takeScreenshot({
  url: 'https://example.com',
  viewportPreset: 'mobile',
  fullPage: true,
});

Custom Viewport with Wait Conditions

// Custom viewport with wait conditions
const result = await takeScreenshot({
  url: 'https://example.com',
  viewportPreset: 'custom',
  width: 1200,
  height: 800,
  waitForSelector: '.content-loaded',
  waitTime: 2000,
  userInteractionTime: 3000,
});

Headless Mode Configuration

// Default: headless mode (efficient, no visible browser)
const result = await takeScreenshot({
  url: 'https://example.com',
  headless: true, // default
});

// Non-headless mode (visible browser for debugging)
// Uses a separate variable name so both examples can run in the same scope.
const visibleResult = await takeScreenshot({
  url: 'https://example.com',
  headless: false, // browser will be visible
});

HTML Retrieval

🚀 PREFERRED METHOD: Use getPageHtml to save HTML to a temp file, then use standard command-line tools for processing.

HTML to File (Recommended)

// Get HTML content and save to temp file
const result = await getPageHtml({
  url: 'https://example.com',
  mode: 'structure', // or 'full'
});

console.log(result.filePath); // e.g., /tmp/page-html-abc123.html

Benefits:

File-based approach - LLM can use grep, sed, awk, etc. for any processing
No token limits - Content saved to files, process however you want
Better performance - No large content in responses
Maximum flexibility - Use any command-line tool to filter/analyze HTML
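
The saved file can also be processed from Node instead of the shell. The sketch below assumes the file path comes from result.filePath as in the example above; extractLinks is a hypothetical helper, not part of this package, and its regex is only meant for a quick scan:

```javascript
const fs = require('node:fs');

// Hypothetical helper: collects href values from an HTML file on disk.
// A regex is adequate for a quick scan; use a real HTML parser for anything stricter.
function extractLinks(htmlFilePath) {
  const html = fs.readFileSync(htmlFilePath, 'utf8');
  const links = [];
  const hrefPattern = /href\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(hrefPattern)) {
    links.push(match[1]); // first capture group is the attribute value
  }
  return links;
}
```

A call such as extractLinks(result.filePath) would then return the link targets as an array of strings.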

With Wait Conditions

// Wait for specific element before getting HTML
const result = await getPageHtml({
  url: 'https://example.com',
  waitForSelector: '.content-loaded',
  waitTime: 2000,
});

Non-Headless Mode

const result = await getPageHtml({
  url: 'https://example.com',
  headless: false,
});

Click Element

Basic Click

// Click an element on a webpage
const result = await clickElement({
  url: 'https://example.com',
  selector: '#submit-button',
});

With Wait Conditions

// Wait for element to be present before clicking
const result = await clickElement({
  url: 'https://example.com',
  selector: '#submit-button',
  waitForSelector: '#form-loaded',
  waitTime: 2000,
});

Non-Headless Mode

// Click with visible browser for debugging
const result = await clickElement({
  url: 'https://example.com',
  selector: '#submit-button',
  headless: false,
});

Type Text

Basic Text Input

// Type text into an input field
const result = await typeText({
  url: 'https://example.com',
  selector: 'input[name="username"]',
  text: 'user@example.com',
});

With Clear First

// Clear field before typing (default behavior)
const result = await typeText({
  url: 'https://example.com',
  selector: 'input[name="username"]',
  text: 'user@example.com',
  clearFirst: true, // default
});

Without Clearing

// Type without clearing existing text
const result = await typeText({
  url: 'https://example.com',
  selector: 'input[name="username"]',
  text: ' @example.com',
  clearFirst: false,
});

With Wait Conditions

// Wait for element to be present before typing
const result = await typeText({
  url: 'https://example.com',
  selector: 'input[name="username"]',
  text: 'user@example.com',
  waitForSelector: '#login-form',
  waitTime: 1000,
});

Non-Headless Mode

// Type with visible browser for debugging
const result = await typeText({
  url: 'https://example.com',
  selector: 'input[name="username"]',
  text: 'user@example.com',
  headless: false,
});

API Reference

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | required | URL of the page to screenshot |
| viewportPreset | string | 'desktop' | Viewport size preset: 'mobile', 'tablet', 'desktop', 'custom' |
| width | number | 1920 | Custom viewport width (used with viewportPreset: 'custom') |
| height | number | 1080 | Custom viewport height (used with viewportPreset: 'custom') |
| elementSelector | string | - | CSS selector for element-specific screenshot |
| fullPage | boolean | false | Capture full page including scroll |
| waitForSelector | string | - | CSS selector to wait for before screenshot |
| waitTime | number | - | Time to wait after page load (ms) |
| userInteractionTime | number | 5000 | Time to wait for user login/navigation (ms) |
| highlightElement | boolean | false | Highlight target element for debugging |
| headless | boolean | true | Run browser in headless mode (default: true for efficiency) |

HTML Retrieval Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | required | URL of the page to get HTML from |
| mode | string | 'structure' | HTML retrieval mode: 'structure' (clean DOM) or 'full' (complete) |
| stripElements | string[] | ['script', 'svg', 'style'] | Element types to strip from HTML |
| waitForSelector | string | - | CSS selector to wait for before getting HTML |
| waitTime | number | - | Time to wait after page load (ms) |
| headless | boolean | true | Run browser in headless mode (default: true for efficiency) |

Click Element Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | required | URL of the page to interact with |
| selector | string | required | CSS selector for the element to click |
| waitForSelector | string | - | CSS selector to wait for before clicking |
| waitTime | number | - | Time to wait after page load (ms) |
| headless | boolean | true | Run browser in headless mode (default: true for efficiency) |

Type Text Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | required | URL of the page to interact with |
| selector | string | required | CSS selector for the input field |
| text | string | required | Text to type into the field |
| clearFirst | boolean | true | Clear the field before typing (default: true) |
| waitForSelector | string | - | CSS selector to wait for before typing |
| waitTime | number | - | Time to wait after page load (ms) |
| headless | boolean | true | Run browser in headless mode (default: true for efficiency) |

Viewport Presets

| Preset | Width | Height | Use Case |
| --- | --- | --- | --- |
| mobile | 375 | 667 | Mobile device testing |
| tablet | 768 | 1024 | Tablet device testing |
| desktop | 1920 | 1080 | Desktop testing (default) |
| custom | configurable | configurable | Custom dimensions |
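
To make the preset rules concrete, here is a minimal sketch of how the options could map to dimensions. The values come from the table above; resolveViewport and VIEWPORT_PRESETS are hypothetical names, and the server's actual implementation may differ:

```javascript
// Preset dimensions copied from the Viewport Presets table.
const VIEWPORT_PRESETS = {
  mobile: { width: 375, height: 667 },
  tablet: { width: 768, height: 1024 },
  desktop: { width: 1920, height: 1080 },
};

// Hypothetical helper: maps screenshot options to concrete dimensions.
// width/height are only honoured when viewportPreset is 'custom'.
function resolveViewport({ viewportPreset = 'desktop', width = 1920, height = 1080 } = {}) {
  if (viewportPreset === 'custom') {
    return { width, height };
  }
  if (!Object.hasOwn(VIEWPORT_PRESETS, viewportPreset)) {
    throw new Error(`Unknown viewportPreset: ${viewportPreset}`);
  }
  return { ...VIEWPORT_PRESETS[viewportPreset] };
}
```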

Screenshot Return Format

{
  success: true,
  mimeType: 'image/png',
  data: 'base64-encoded-image-data',
  size: 12345 // bytes
}
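
Because data is base64 text, it has to be decoded before it is usable as an image file. A minimal sketch, assuming a result object of the shape shown above; saveScreenshot is a hypothetical helper, not part of this package:

```javascript
const fs = require('node:fs');

// Hypothetical helper: writes a screenshot result to disk as a PNG file
// and returns the number of bytes written.
function saveScreenshot(result, outputPath) {
  if (!result.success) {
    throw new Error('Screenshot failed; nothing to save');
  }
  // `data` is base64 text, so decode it back into raw image bytes.
  const bytes = Buffer.from(result.data, 'base64');
  fs.writeFileSync(outputPath, bytes);
  return bytes.length;
}
```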

HTML Retrieval Return Format

{
  success: true,
  mode: 'structure', // or 'full'
  data: '<html>...</html>', // plain text HTML
  contentLength: 3720, // characters
  stripElements: ['script', 'svg', 'style']
}

Click Element Return Format

// Success
{
  success: true
}

// Error
{
  success: false,
  error: 'Element not found: Check if selector \'#submit\' is correct. The element may not exist on the page.',
  userMessage: 'Element not found: Check if selector \'#submit\' is correct. The element may not exist on the page.'
}

Type Text Return Format

// Success
{
  success: true
}

// Error
{
  success: false,
  error: 'Input field not found: Check if selector \'#username\' is correct. The field may not exist on the page.',
  userMessage: 'Input field not found: Check if selector \'#username\' is correct. The field may not exist on the page.'
}
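
Callers can branch on success and surface userMessage when an action fails. The sketch below assumes only the result shapes shown above; describeResult is a hypothetical helper:

```javascript
// Hypothetical helper: turns a click/type result into a one-line status message.
function describeResult(action, result) {
  if (result.success) {
    return `${action} succeeded`;
  }
  // Prefer the user-facing message; fall back to the raw error text.
  const reason = result.userMessage || result.error || 'unknown error';
  return `${action} failed: ${reason}`;
}
```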

MCP Server Usage

Starting the Server

# Run the MCP server
node src/server.js

Setting Up in Cursor

For detailed instructions on integrating this server with Cursor, see CURSOR_SETUP.md.

Quick Setup Example:

{
  "mcpServers": {
    "selenium-screenshot": {
      "command": "node",
      "args": ["/path/to/selenium/src/server.js"],
      "env": {
        "NODE_ENV": "production"
      }
    }
  }
}

MCP Tool Registration

The server registers a take_screenshot tool with the following schema:

{
  "name": "take_screenshot",
  "description": "Take a screenshot of a web page with advanced options",
  "inputSchema": {
    "type": "object",
    "properties": {
      "url": { "type": "string", "description": "URL to screenshot" },
      "viewportPreset": {
        "type": "string",
        "enum": ["mobile", "tablet", "desktop", "custom"],
        "default": "desktop"
      },
      "elementSelector": {
        "type": "string",
        "description": "CSS selector for element-specific screenshot"
      },
      "fullPage": {
        "type": "boolean",
        "default": false,
        "description": "Capture full page including scroll"
      },
      "headless": {
        "type": "boolean",
        "default": true,
        "description": "Run browser in headless mode (default: true for efficiency)"
      }
    },
    "required": ["url"]
  }
}
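
For clients that talk to the server directly rather than through Cursor, a tool invocation is a JSON-RPC 2.0 tools/call request, as defined by the MCP specification. The sketch below only builds the message; buildToolCall is a hypothetical helper, and a real session must complete the MCP initialize handshake before sending it:

```javascript
// Builds a JSON-RPC 2.0 "tools/call" request in the shape the MCP specification defines.
function buildToolCall(id, toolName, args) {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: toolName, arguments: args },
  };
}

const request = buildToolCall(1, 'take_screenshot', {
  url: 'https://example.com',
  viewportPreset: 'mobile',
  fullPage: true,
});

// Over the stdio transport, each message is a single line of JSON.
console.log(JSON.stringify(request));
```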

Testing

Run All Tests

npm test

Run Tests with Coverage

npm run test:coverage

Test Categories

Unit Tests: Core functionality and edge cases
Integration Tests: Real website testing
Viewport Tests: Responsive design validation
Element Tests: Element-specific screenshot functionality
Full Page Tests: Scroll capture and stitching

Development

Project Structure

selenium/
├── src/
│   ├── server.js          # MCP server with DI pattern
│   ├── logger.js          # Logging utilities
│   └── tools/
│       └── screenshot.js  # Core screenshot functionality
├── test/                  # Test files
├── screenshots/           # Generated screenshots
└── docs/                  # Documentation

Dependency Injection Pattern

This app uses the getDeps pattern for dependency injection. New code should follow this pattern:

Define a getDeps function that returns all dependencies
main should accept a _getDeps argument (defaulting to getDeps)
This enables easy testing and swapping of dependencies
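
The three points above can be sketched as follows. The dependencies log and now are illustrative assumptions, not the ones server.js actually injects:

```javascript
// Collects every external dependency in one place.
function getDeps() {
  return {
    log: (message) => console.log(message),
    now: () => Date.now(),
  };
}

// `main` takes the factory as an argument, defaulting to the real one.
function main(_getDeps = getDeps) {
  const { log, now } = _getDeps();
  const startedAt = now();
  log(`started at ${startedAt}`);
  return startedAt;
}

// In a test, pass a factory that returns fakes instead of real dependencies.
const messages = [];
const fakeDeps = () => ({ log: (m) => messages.push(m), now: () => 42 });
main(fakeDeps);
```

With the fakes in place, the test can assert on messages and on the return value without touching the console or the system clock.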

Adding New Features

Follow the getDeps pattern for dependency injection
Add comprehensive tests for new functionality
Update documentation with usage examples
Ensure backward compatibility

Error Handling

The server provides clear error messages for common scenarios:

Element not found: Returns error when CSS selector doesn't match
Page load timeout: Handles slow-loading pages gracefully
Invalid URLs: Validates URL format before processing
Browser errors: Captures and reports WebDriver errors
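
A caller can run the same kind of URL check before sending a request. This is a minimal sketch using the standard URL constructor; isValidHttpUrl is a hypothetical helper, and the server's own validation rules may be stricter or looser:

```javascript
// Hypothetical validator: accepts only absolute http(s) URLs.
function isValidHttpUrl(candidate) {
  let parsed;
  try {
    parsed = new URL(candidate); // throws TypeError on malformed input
  } catch {
    return false;
  }
  return parsed.protocol === 'http:' || parsed.protocol === 'https:';
}
```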

Troubleshooting

MCP Logs in Cursor

When using this server with Cursor, you can view detailed logs to troubleshoot issues:

Open Debug Console: In Cursor, go to View → Debug Console (or press Ctrl+Shift+Y / Cmd+Shift+Y)
Look for MCP Logs: The server logs will appear in the Debug Console with timestamps
Common Log Messages:
- [INFO] Starting screenshot capture - Server is processing your request
- [DEBUG] Headless mode enabled/disabled - Shows browser visibility setting
- [ERROR] Screenshot capture failed - Indicates what went wrong
- [DEBUG] WebDriver initialized/closed - Browser lifecycle events

Common Issues

Headless Mode Problems: If headless mode fails in your environment:

Try setting headless: false to see the browser window
Check if Chrome is installed and accessible
Some CI/CD environments may not support headless mode

Timeout Issues: If screenshots are timing out:

Increase userInteractionTime for slow-loading pages
Use waitForSelector to wait for specific content
Check your internet connection

Element Not Found: If element-specific screenshots fail:

Verify the CSS selector is correct
Use browser dev tools to test the selector
Try highlightElement: true to debug element location

Performance Considerations

Timeout: 15-second timeout for all page operations
Memory: Optimized for large screenshots
Concurrency: Single browser instance (no concurrent requests)
Caching: No built-in caching (planned for future versions)
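
If a caller needs a deadline of its own on top of the server's timeout, one common approach is to race the call against a timer. The sketch below is an assumption about client code, not something the server provides; withTimeout is a hypothetical helper, and it stops waiting without cancelling the underlying browser operation:

```javascript
// Hypothetical helper: rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer so the process can exit.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```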

Alpha Usage Guidelines

What's Ready for Production

✅ Core Functionality

Basic screenshots with viewport control
Full page screenshot capture
Element-specific screenshots
Multiple viewport presets
Wait conditions and timeouts
Error handling and logging

✅ Testing & Quality

Test coverage of 71% overall
Real-world integration testing
Error scenario validation
Performance testing

✅ Documentation

Complete API reference
Usage examples
Installation instructions
Development guidelines

Known Limitations

⚠️ Alpha Limitations

Single browser instance (no concurrent requests)
No built-in caching or browser pooling
Limited to Chrome browser
No PDF or video output (planned for Phase 2, Step 3)
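
Since the server handles one request at a time, a client that issues many calls may want to serialize them itself. A minimal sketch, assuming client-side code; createSerialQueue is a hypothetical helper, not part of this package:

```javascript
// Hypothetical client-side queue: runs async tasks one at a time, in call order.
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    // Start the task only after the previous one has settled.
    const run = tail.then(() => task());
    // Swallow rejections on the chain so one failure does not block later tasks;
    // the caller still sees the rejection through `run`.
    tail = run.catch(() => {});
    return run;
  };
}
```

Each screenshot, click, or type call would be wrapped as a task function and passed to enqueue, which returns that task's own promise.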

Recommended Usage Patterns

Start Simple: Begin with basic screenshots before using advanced features
Test Responsively: Use viewport presets to test different device sizes
Handle Errors: Implement proper error handling for production use
Monitor Performance: Watch for timeout issues with complex pages
Validate Output: Always verify screenshot quality and content

Roadmap

Phase 2, Step 2: Performance Optimizations

Browser pooling for concurrent requests
Caching mechanisms
Performance monitoring

Phase 2, Step 3: Advanced Features

PDF generation
Video capture
Batch processing

Phase 3: Production Readiness

Configuration management
Monitoring and observability
Deployment automation

Contributing

Follow the getDeps pattern for dependency injection
Add tests for new functionality
Update documentation
Ensure all tests pass before submitting

Changelog

Recent Changes

Updated demo URLs from Apple Music to example.com for better reliability
Added console.log mocking in Jest setup to reduce test verbosity
Removed legacy HTML mode documentation sections
Improved README structure and formatting

Version History

v1.0.0 - Initial release with core Selenium MCP functionality
v1.1.0 - Added filtered HTML retrieval capabilities
v1.2.0 - Enhanced browser pool management and error handling

License

[Add your license information here]
