sentinel-mcp

v1.4.6 · Published 2 months ago

MCP-to-browser bridge library for AI-driven test automation


mcp model-context-protocol browser-automation playwright test-generation ai-testing web-automation cdp chrome-devtools-protocol testing typescript ai-agents

Sentinel MCP

Sentinel MCP is a browser automation library that bridges the Model Context Protocol (MCP) with Playwright, enabling AI agents to interact with web applications through 57 specialized tools. It is written in TypeScript and provides intelligent element detection, visual feedback, and automatic Playwright code generation.

✨ Features

57 MCP Tools - Comprehensive browser control across 7 categories
Smart Element Detection - 15-stage algorithm with compound components, shadow DOM, and scroll tracking
Advanced Element Detection - Set-based detection constants, interactivity heuristics, and bounding-box and paint-order filtering
Visual Feedback - Annotated screenshots with element highlighting (Retina/HiDPI display support)
Scroll Indicators - Page and element scroll positions in serialized output
Code Generation - Automatic Playwright code generation for every action
CDP Integration - Chrome DevTools Protocol for reliable browser communication
TypeScript First - Full type safety with comprehensive type definitions
Security Built-in - Input sanitization and XSS protection
Test Ready - 177 unit tests, 130 integration tests

📦 Installation

npm install sentinel-mcp

Requirements

Node.js >= 18.0.0
Playwright browsers (installed automatically)

🚀 Quick Start

import { BrowserOrchestrator } from 'sentinel-mcp';

// Initialize the orchestrator
const orchestrator = new BrowserOrchestrator();
await orchestrator.initialize();

// Navigate to a website
await orchestrator.navigate('https://example.com');

// Take a snapshot with element detection
const snapshot = await orchestrator.takeSnapshot({
  includeScreenshot: true,
  highlightElements: true
});

console.log(`Found ${snapshot.selectorMap.size} interactive elements`);
console.log(snapshot.screenshot); // Base64 screenshot with annotations

// Interact with elements by index (1-based)
await orchestrator.executeTool('fill', {
  index: 1,
  text: '[email protected]'
});

const clickResult = await orchestrator.executeTool('click', {
  index: 2
});

// Get the Playwright code generated for the click
console.log(clickResult.code); // Ready-to-use Playwright test code

// Clean up
await orchestrator.shutdown();

📚 API Documentation

Core Classes

BrowserOrchestrator

Main orchestration class for browser automation.

declare class BrowserOrchestrator {
  constructor();

  // Initialize browser
  initialize(options?: BrowserOptions): Promise<void>;

  // Navigate to URL
  navigate(url: string): Promise<void>;

  // Take DOM snapshot
  takeSnapshot(options?: SnapshotOptions): Promise<DOMSnapshot>;

  // Execute MCP tool
  executeTool(name: string, args: unknown): Promise<ToolResult>;

  // Get tool registry
  getToolRegistry(): ToolRegistry;

  // Get page registry
  getPageRegistry(): PageRegistry;

  // Shutdown browser
  shutdown(): Promise<void>;
}

ToolRegistry

Manage and execute tools dynamically.

declare class ToolRegistry {
  constructor();

  // Get all tools
  getAllTools(): ToolDefinition[];

  // Get tool by name
  getTool(name: string): ToolDefinition | undefined;

  // Get all tool names
  getToolNames(): string[];

  // Execute tool
  executeTool(name: string, args: unknown, context: ToolExecutionContext): Promise<ToolResult>;
}

ToolResponse

Format tool responses with code generation.

const response = new ToolResponse();

// Add text output
response.appendLine('Operation successful');

// Add generated code
response.addCode('await page.click("#button")');

// Set metadata
response.setMetadata('elementCount', 5);

// Format response
const result: ToolResult = await response.format();

🛠️ Tools

Sentinel MCP provides 57 tools across 7 categories:

Actions (9 tools)

Navigate and interact with web elements:

navigate - Navigate to URLs with configurable wait options
click - Click elements (supports right-click, double-click)
fill - Fill input fields with optional Enter key
scroll - Scroll page or elements in any direction
hover - Hover over elements for tooltips/menus
rightClick - Trigger context menus
doubleClick - Double-click elements
dragAndDrop - Drag and drop between elements
pressKey - Keyboard input with modifier keys

Page Operations (14 tools)

Control page behavior and state:

getTitle - Get current page title
getUrl - Get current page URL
goBack - Navigate backward in history
goForward - Navigate forward in history
reload - Reload current page
getTabs - List all open tabs
selectTab - Switch between tabs
newTab - Open new tab with optional URL
closeTab - Close current or specified tab
evaluate - Execute JavaScript in page context (with security validation)
executePlaywright - Execute Playwright API code with accurate error line numbers
getConsoleLogs - Retrieve browser console logs
setViewport - Change viewport dimensions
handleDialog - Configure alert/confirm/prompt handling

Forms (8 tools)

Specialized form interactions:

selectOption - Select dropdown options (by value, label, or index)
checkCheckbox - Check checkbox elements
uncheckCheckbox - Uncheck checkbox elements
uploadFile - Upload files to file inputs
clearInput - Clear input field values
submitForm - Submit forms
type - Type text with configurable delay for natural input
focus - Focus on input elements

Assertions (7 tools)

Verify page state and content:

assertVisible - Assert element is visible
assertText - Assert element text content
assertValue - Assert input element value
assertAttribute - Assert element attribute value
assertUrl - Assert current URL matches pattern
assertTitle - Assert page title
assertExists - Assert element exists in DOM

Inspection (8 tools)

Query and inspect elements:

getElementText - Get text content of elements
getElementAttribute - Get element attribute values
getElementValue - Get input element values
isVisible - Check if element is visible
isEnabled - Check if element is enabled
isChecked - Check checkbox/radio state
getElements - Query multiple elements
queryPage - Advanced element queries with filters

Waits (7 tools)

Synchronize with page state:

waitForElement - Wait for element to appear
waitForNavigation - Wait for page navigation to complete
waitForLoadState - Wait for specific load state (load, domcontentloaded, networkidle)
waitForTimeout - Wait for fixed duration
waitForFunction - Wait for custom JavaScript condition
waitForSelector - Wait for CSS selector to match
waitForUrl - Wait for URL to match pattern

Dialogs (4 tools)

Handle browser dialogs:

acceptDialog - Accept alerts/confirms/prompts
dismissDialog - Dismiss/cancel dialogs
getDialogMessage - Get dialog message text
typeIntoDialog - Type into prompt dialogs

💡 Usage Examples

Basic Navigation and Interaction

const orchestrator = new BrowserOrchestrator();
await orchestrator.initialize();

// Navigate to a page
await orchestrator.navigate('https://example.com');

// Take snapshot to see available elements
const snapshot = await orchestrator.takeSnapshot({
  includeScreenshot: true,
  highlightElements: true
});

// Elements are indexed starting from 1
snapshot.selectorMap.forEach((el, idx) => {
  console.log(`[${idx}] ${el.tagName} - ${el.meaningfulText || el.attributes.placeholder || el.ariaLabel}`);
});

// Interact with specific element by index
await orchestrator.executeTool('click', { index: 1 });

Form Filling Workflow

// Navigate to login page
await orchestrator.navigate('https://app.example.com/login');

// Fill login form
await orchestrator.executeTool('fill', {
  index: 1, // Email input
  text: '[email protected]'
});

await orchestrator.executeTool('fill', {
  index: 2, // Password input
  text: 'secretPassword',
  pressEnter: true // Submit form after filling
});

// Wait for navigation
await orchestrator.executeTool('waitForNavigation', {
  timeout: 5000
});

// Verify login success
await orchestrator.executeTool('assertUrl', {
  pattern: '/dashboard'
});

Assertions and Testing

// Navigate to page
await orchestrator.navigate('https://example.com/profile');

// Verify page state
await orchestrator.executeTool('assertTitle', {
  expected: 'User Profile'
});

await orchestrator.executeTool('assertVisible', {
  index: 1 // Profile picture element
});

await orchestrator.executeTool('assertText', {
  index: 3,
  text: 'Welcome back!'
});

// Check input values
const result = await orchestrator.executeTool('getElementValue', {
  index: 2
});
console.log('Current value:', result.metadata?.value);

Advanced: Execute Custom Playwright Code

// Execute Playwright API code directly
const result = await orchestrator.executeTool('executePlaywright', {
  code: `
    // Access to 'page' object
    const title = await page.title();
    const cookies = await page.context().cookies();

    // Perform complex operations
    await page.evaluate(() => {
      localStorage.setItem('theme', 'dark');
    });

    // Return values
    return { title, cookieCount: cookies.length };
  `,
  timeout: 30000
});

console.log(result.metadata?.result); // { title: '...', cookieCount: 3 }

Working with Multiple Tabs

// Open new tab
await orchestrator.executeTool('newTab', {
  url: 'https://example.com/page2'
});

// List all tabs (the tab list is returned in the result metadata)
const tabsResult = await orchestrator.executeTool('getTabs', {});
const tabs = tabsResult.metadata?.tabs;
console.log(tabs);

// Switch to second tab
await orchestrator.executeTool('selectTab', { tabId: tabs[1].id });

// Close current tab
await orchestrator.executeTool('closeTab', {});

Handling Dialogs

// Configure dialog handling before triggering
await orchestrator.executeTool('handleDialog', {
  action: 'accept',
  promptText: 'Yes, I agree' // For prompt dialogs
});

// Trigger action that opens dialog
await orchestrator.executeTool('click', { index: 10 });

// Dialog is automatically handled based on configuration

🏗️ Architecture

Element Detection Algorithm

Sentinel MCP uses a 15-stage element detection algorithm:

CDP DOM Snapshot - Capture full DOM state via DOMSnapshot.captureSnapshot
Accessibility Integration - Include ARIA roles, labels, and attributes
DevicePixelRatio Detection - Calculate via Page.getLayoutMetrics (deviceWidth / cssWidth)
Coordinate Transformation - Convert CDP device pixels → CSS pixels
Interactivity Detection - Tags, roles, cursors, event handlers (using optimized constant sets)
Clickable Element Detection - Multi-factor detection using tags, roles, cursors, and attributes
Compound Component Detection - Identify date pickers, color pickers, range sliders, custom selects
Shadow DOM Traversal - Process shadow roots and shadow DOM elements
Bounding Box Filtering - Remove elements 99% contained within propagating parents
Paint Order Filtering - O(n) RectUnion algorithm to remove occluded elements
Viewport Filtering - Remove elements outside visible viewport
Scroll Position Tracking - Capture page and element scroll data
Deduplication - Remove duplicate selectors
DOM Order Sorting - Maintain document order
Index Assignment - Assign stable 1-based indices for tool use
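The last three stages (deduplication, DOM order sorting, index assignment) can be pictured roughly as below. This is a minimal sketch, not the library's code: the `Candidate` shape and `assignIndices` function are hypothetical, and it assumes duplicates are detected by identical selector strings, keeping the earliest one in document order.

```typescript
// Hypothetical element candidate produced by the earlier detection stages.
interface Candidate {
  selector: string;
  domOrder: number; // position in document order
  tagName: string;
}

// Dedupe by selector, sort into document order, then assign 1-based indices.
function assignIndices(candidates: Candidate[]): Map<number, Candidate> {
  const bySelector = new Map<string, Candidate>();
  for (const c of candidates) {
    const existing = bySelector.get(c.selector);
    // Keep the earliest occurrence of each selector.
    if (!existing || c.domOrder < existing.domOrder) bySelector.set(c.selector, c);
  }
  const ordered = [...bySelector.values()].sort((a, b) => a.domOrder - b.domOrder);
  const selectorMap = new Map<number, Candidate>();
  ordered.forEach((c, i) => selectorMap.set(i + 1, c)); // indices start at 1
  return selectorMap;
}
```

Returning a `Map<number, …>` mirrors how the Quick Start reads `snapshot.selectorMap.size`, although the real value type is not documented here.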

Element Detection Implementation

Sentinel MCP uses optimized element detection with the following approach:

Constants

All detection constants are Sets for O(1) lookup performance:

INTERACTIVE_TAGS: button, input, select, textarea, a, details, summary, option, optgroup (excludes label)
INTERACTIVE_ROLES: button, link, menu, menuitem, option, radio, checkbox, tab, textbox, combobox, slider, spinbutton, search, searchbox, listbox
EVENT_HANDLER_ATTRIBUTES: onclick, onmousedown, onmouseup, onkeydown, onkeyup, tabindex (no touch events)
CLICKABLE_CURSORS: Only pointer (not grab, text, etc.)
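A simplified interactivity check built on these Sets might look like this. The Set contents are copied from the lists above; the `NodeInfo` shape and `isInteractive` function are hypothetical, and the real detector combines more signals than this single OR of four tests.

```typescript
// Detection constants from the lists above; Set.has() gives O(1) membership checks.
const INTERACTIVE_TAGS = new Set([
  "button", "input", "select", "textarea", "a",
  "details", "summary", "option", "optgroup", // label is deliberately excluded
]);
const INTERACTIVE_ROLES = new Set([
  "button", "link", "menu", "menuitem", "option", "radio", "checkbox", "tab",
  "textbox", "combobox", "slider", "spinbutton", "search", "searchbox", "listbox",
]);
const EVENT_HANDLER_ATTRIBUTES = new Set([
  "onclick", "onmousedown", "onmouseup", "onkeydown", "onkeyup", "tabindex",
]);
const CLICKABLE_CURSORS = new Set(["pointer"]);

// Hypothetical, simplified view of a DOM node.
interface NodeInfo {
  tagName: string;
  attributes: Record<string, string>;
  cursor?: string; // computed CSS cursor value
}

// A node counts as interactive if any one signal matches.
function isInteractive(node: NodeInfo): boolean {
  if (INTERACTIVE_TAGS.has(node.tagName.toLowerCase())) return true;
  const role = node.attributes["role"];
  if (role !== undefined && INTERACTIVE_ROLES.has(role)) return true;
  if (Object.keys(node.attributes).some((name) => EVENT_HANDLER_ATTRIBUTES.has(name))) return true;
  return node.cursor !== undefined && CLICKABLE_CURSORS.has(node.cursor);
}
```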

Paint Order O(n) Algorithm

Uses RectUnion class for efficient occlusion detection:

Groups elements by paint order (z-index)
Processes from highest to lowest paint order
Tracks covered area using union of rectangles
Only adds opaque elements (opacity >= 0.8, non-transparent background)
O(n) complexity vs naive O(n²) approach
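The following sketch illustrates paint-order occlusion filtering with a rectangle union, under stated assumptions. `Rect`, `RectUnion`, `PaintedElement`, and `filterOccluded` are hypothetical names, opacity is assumed to be precomputed into an `opaque` flag, and this straightforward rectangle-subtraction version does not attempt to reproduce the O(n) bound claimed above.

```typescript
interface Rect { x1: number; y1: number; x2: number; y2: number } // x1 < x2, y1 < y2

// Parts of `a` not covered by `b` (at most four disjoint rectangles).
function subtract(a: Rect, b: Rect): Rect[] {
  if (b.x2 <= a.x1 || b.x1 >= a.x2 || b.y2 <= a.y1 || b.y1 >= a.y2) return [a]; // no overlap
  const out: Rect[] = [];
  if (b.y1 > a.y1) out.push({ x1: a.x1, y1: a.y1, x2: a.x2, y2: b.y1 }); // band above b
  if (b.y2 < a.y2) out.push({ x1: a.x1, y1: b.y2, x2: a.x2, y2: a.y2 }); // band below b
  const top = Math.max(a.y1, b.y1);
  const bottom = Math.min(a.y2, b.y2);
  if (b.x1 > a.x1) out.push({ x1: a.x1, y1: top, x2: b.x1, y2: bottom }); // left of b
  if (b.x2 < a.x2) out.push({ x1: b.x2, y1: top, x2: a.x2, y2: bottom }); // right of b
  return out;
}

class RectUnion {
  private rects: Rect[] = []; // kept pairwise disjoint

  // True if `r` is fully covered by the rectangles added so far.
  contains(r: Rect): boolean {
    let remaining: Rect[] = [r];
    for (const covered of this.rects) {
      remaining = remaining.flatMap((piece) => subtract(piece, covered));
      if (remaining.length === 0) return true;
    }
    return remaining.length === 0;
  }

  // Store only the uncovered parts of `r`.
  add(r: Rect): void {
    let pieces: Rect[] = [r];
    for (const covered of this.rects) pieces = pieces.flatMap((p) => subtract(p, covered));
    this.rects.push(...pieces);
  }
}

interface PaintedElement { id: number; paintOrder: number; rect: Rect; opaque: boolean }

// Walk paint-order groups from top to bottom; drop elements hidden by opaque ones above.
function filterOccluded(elements: PaintedElement[]): PaintedElement[] {
  const groups = new Map<number, PaintedElement[]>();
  for (const el of elements) {
    const g = groups.get(el.paintOrder) ?? [];
    g.push(el);
    groups.set(el.paintOrder, g);
  }
  const union = new RectUnion();
  const visible: PaintedElement[] = [];
  for (const order of [...groups.keys()].sort((a, b) => b - a)) {
    const group = groups.get(order)!;
    // Check the whole group first so elements in the same layer never hide each other.
    for (const el of group) if (!union.contains(el.rect)) visible.push(el);
    for (const el of group) if (el.opaque) union.add(el.rect);
  }
  return visible;
}
```

In this sketch an element is removed only when it is completely covered; a partially visible element stays in the snapshot.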

Bounding Box Filtering

Removes redundant nested elements:

Filters out children 99% contained within propagating parents (<a>, <button>)
Reduces snapshot size significantly
Maintains interactive element hierarchy
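Here is a rough sketch of the 99% containment rule. `Box`, `containmentRatio`, and `isRedundantChild` are hypothetical names. The sketch assumes the ratio is measured as the fraction of the child's area lying inside the parent, and it ignores any exceptions the real filter may apply (for example, keeping nested form inputs).

```typescript
interface Box { x: number; y: number; width: number; height: number }

const PROPAGATING_TAGS = new Set(["a", "button"]);
const CONTAINMENT_THRESHOLD = 0.99;

// Fraction of the child's area inside the parent's box, from 0 to 1.
function containmentRatio(child: Box, parent: Box): number {
  const ix = Math.max(0, Math.min(child.x + child.width, parent.x + parent.width) - Math.max(child.x, parent.x));
  const iy = Math.max(0, Math.min(child.y + child.height, parent.y + parent.height) - Math.max(child.y, parent.y));
  const area = child.width * child.height;
  return area === 0 ? 1 : (ix * iy) / area; // zero-area children count as fully contained
}

// Drop the child when a propagating parent (<a>, <button>) already represents it.
function isRedundantChild(child: Box, parentTag: string, parent: Box): boolean {
  return PROPAGATING_TAGS.has(parentTag.toLowerCase()) &&
    containmentRatio(child, parent) >= CONTAINMENT_THRESHOLD;
}
```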

Snapshot Caching

Performance optimization with automatic expiration:

5-second TTL cache prevents redundant DOM processing for same URL+viewport
Automatic expiration and deterministic pruning (every 10 snapshots)
Cache key based on MD5 hash of URL and viewport dimensions
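A minimal TTL cache along these lines could look like this. The class name, the key format (`url|WxH`), and the reading of "every 10 snapshots" as "every 10 writes" are assumptions for illustration. The injectable clock is there only to make expiry testable. MD5 serves as a cache key here, not as a security measure.

```typescript
import { createHash } from "node:crypto";

interface CacheEntry<T> { value: T; expiresAt: number }

// Hypothetical snapshot cache: 5 s TTL, full sweep of expired entries every 10 writes.
class SnapshotCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  private writesSincePrune = 0;
  private readonly ttlMs: number;
  private readonly pruneEvery: number;
  private readonly now: () => number;

  constructor(ttlMs = 5_000, pruneEvery = 10, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.pruneEvery = pruneEvery;
    this.now = now;
  }

  // MD5 of URL plus viewport dimensions.
  static key(url: string, width: number, height: number): string {
    return createHash("md5").update(`${url}|${width}x${height}`).digest("hex");
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expire lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    // Deterministic pruning: sweep expired entries every `pruneEvery` writes.
    if (++this.writesSincePrune >= this.pruneEvery) {
      this.writesSincePrune = 0;
      const t = this.now();
      for (const [k, e] of this.entries) if (e.expiresAt <= t) this.entries.delete(k);
    }
  }
}
```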

Display Scaling & Coordinate Systems

Proper handling of Retina/HiDPI displays (2x, 3x scale factors):

DevicePixelRatio Calculation

// Via Page.getLayoutMetrics (NOT Performance.getMetrics)
const devicePixelRatio = deviceWidth / cssWidth;
// Example: 2560 / 1280 = 2.0 (Retina display)

Coordinate Transformation

CDP Returns: Bounds in device pixels (e.g., [0, 0, 2560, 1440])
Parser Converts: Divide by devicePixelRatio → CSS pixels (e.g., [0, 0, 1280, 720])
Screenshot: Playwright captures at CSS resolution (1280x720)
Highlights: Draw using CSS pixel coordinates directly (no scaling needed)

This ensures pixel-perfect element highlighting on all display types.
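Here is the arithmetic from the two subsections above as a small sketch. `ViewportWidths`, `devicePixelRatio`, and `toCssPixels` are hypothetical names; the two widths are assumed to have already been read from `Page.getLayoutMetrics`. The bounds follow CDP's `[x, y, width, height]` rectangle layout.

```typescript
// Hypothetical input: device-pixel and CSS-pixel widths of the same viewport.
interface ViewportWidths { deviceWidth: number; cssWidth: number }

type Bounds = [x: number, y: number, width: number, height: number];

function devicePixelRatio({ deviceWidth, cssWidth }: ViewportWidths): number {
  // Fall back to 1 (no scaling) if the CSS width is unavailable.
  return cssWidth > 0 ? deviceWidth / cssWidth : 1;
}

// Convert device-pixel bounds to CSS pixels so they line up with a CSS-resolution screenshot.
function toCssPixels(bounds: Bounds, dpr: number): Bounds {
  return bounds.map((v) => v / dpr) as Bounds;
}
```

With the Retina example above, `devicePixelRatio({ deviceWidth: 2560, cssWidth: 1280 })` is `2`, and `toCssPixels([0, 0, 2560, 1440], 2)` gives `[0, 0, 1280, 720]`.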

Serialization Features

The serialized DOM output includes:

Element Markers

Interactive elements are marked with [index] in the output:

<button class="submit">[1]Submit</button>
<input type="email" placeholder="Email">[2]

Scroll Indicators

Page scroll position at the top:

PAGE_SCROLL: (V:100px/2000px)

Scrollable elements show scroll capability:

<div class="scrollable">[5] (scroll: 0px/500px)
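The indicator strings can be reproduced with small formatters like these. They are hypothetical helpers. The sketch assumes the first number is the current vertical scroll offset and the second is the total scrollable height, both in CSS pixels; the README does not define the numbers precisely.

```typescript
// Assumption: (offset / total scrollable height), rounded to whole CSS pixels.
function formatPageScroll(scrollTop: number, scrollHeight: number): string {
  return `PAGE_SCROLL: (V:${Math.round(scrollTop)}px/${Math.round(scrollHeight)}px)`;
}

// Only elements whose content overflows their visible box get an indicator.
function isScrollable(scrollHeight: number, clientHeight: number): boolean {
  return scrollHeight > clientHeight;
}

function formatElementScroll(scrollTop: number, scrollHeight: number): string {
  return `(scroll: ${Math.round(scrollTop)}px/${Math.round(scrollHeight)}px)`;
}

// Usage: reproduces the element example above.
const line = `<div class="scrollable">[5] ${formatElementScroll(0, 500)}`;
```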

Shadow DOM

Shadow roots are represented:

<custom-element>
  #shadow-root
    <button>[3]Click Me</button>
</custom-element>

Minimal Output

Only required nodes are serialized:

Interactive elements themselves
All ancestor elements (for hierarchy context)
Bounding box filtered (removes 99% contained children)
Paint order filtered (removes occluded elements)

Result: ~5KB DOM snapshots instead of full 500KB+ page HTML
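Here is a simplified serializer that follows these rules and emits the `[index]` and `#shadow-root` conventions shown above. It is a sketch under assumptions. The `SerNode` shape, `serialize`, and the void-tag list are hypothetical. Bounding-box and paint-order filtering are assumed to have already run. Attribute values are not escaped.

```typescript
// Hypothetical node model (not the library's ElementData type).
interface SerNode {
  tag: string;
  attrs?: Record<string, string>;
  index?: number;             // 1-based interactive index, if interactive
  text?: string;
  children?: SerNode[];
  shadowChildren?: SerNode[]; // contents of an attached shadow root
}

const VOID_TAGS = new Set(["input", "img", "br", "hr"]);

// True if the node or any descendant (including shadow content) is interactive.
function hasInteractive(node: SerNode): boolean {
  return node.index !== undefined ||
    [...(node.children ?? []), ...(node.shadowChildren ?? [])].some(hasInteractive);
}

function serialize(node: SerNode, depth = 0): string[] {
  if (!hasInteractive(node)) return []; // keep only interactive nodes and their ancestors
  const pad = "  ".repeat(depth);
  const attrs = Object.entries(node.attrs ?? {}).map(([k, v]) => ` ${k}="${v}"`).join("");
  const marker = node.index !== undefined ? `[${node.index}]` : "";
  const open = `<${node.tag}${attrs}>`;
  if (VOID_TAGS.has(node.tag)) return [`${pad}${open}${marker}`];
  const kids = node.children ?? [];
  const shadow = node.shadowChildren ?? [];
  if (kids.length === 0 && shadow.length === 0) {
    return [`${pad}${open}${marker}${node.text ?? ""}</${node.tag}>`];
  }
  const lines = [`${pad}${open}${marker}${node.text ?? ""}`];
  if (shadow.length > 0) {
    lines.push(`${pad}  #shadow-root`);
    for (const c of shadow) lines.push(...serialize(c, depth + 2));
  }
  for (const c of kids) lines.push(...serialize(c, depth + 1));
  lines.push(`${pad}</${node.tag}>`);
  return lines;
}
```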

Code Generation

Every tool execution generates equivalent Playwright code:

const result = await orchestrator.executeTool('fill', {
  index: 1,
  text: '[email protected]'
});

console.log(result.code);
// Output:
// await page.locator('#email').fill('[email protected]');
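One way to turn a resolved tool call into a Playwright statement is a small template function like the one below. It is a hypothetical sketch: the `ToolCall` union covers only four tools, and it assumes the element index has already been resolved to a selector. How Sentinel MCP chooses that selector is not described here. The generated calls (`page.locator(...).click()`, `.fill()`, `.hover()`, `page.goto()`) are standard Playwright APIs.

```typescript
// Hypothetical resolved tool calls (index already mapped to a selector).
type ToolCall =
  | { tool: "click"; selector: string }
  | { tool: "fill"; selector: string; text: string }
  | { tool: "hover"; selector: string }
  | { tool: "navigate"; url: string };

// Render a single-quoted JS string literal, escaping backslashes, quotes, and newlines.
function quote(s: string): string {
  return `'${s.replace(/\\/g, "\\\\").replace(/'/g, "\\'").replace(/\n/g, "\\n")}'`;
}

function generateCode(call: ToolCall): string {
  switch (call.tool) {
    case "click":
      return `await page.locator(${quote(call.selector)}).click();`;
    case "fill":
      return `await page.locator(${quote(call.selector)}).fill(${quote(call.text)});`;
    case "hover":
      return `await page.locator(${quote(call.selector)}).hover();`;
    case "navigate":
      return `await page.goto(${quote(call.url)});`;
  }
}
```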

The serialized DOM output also includes [index] markers for easy identification:

PAGE_SCROLL: (V:0px/2000px)

<form class="login-form">
  <input type="email" id="email">[1]
  <input type="password" id="password">[2]
  <button type="submit">[3]Login</button>
</form>

Store and replay these generated commands for:

Test automation
Workflow recording
Debugging and inspection
Documentation generation

CDP Integration

Uses Chrome DevTools Protocol for reliable browser communication:

Session Management - Automatic CDP session creation and recovery
DOM Snapshots - Fast, accurate DOM state capture
Event Handling - Real-time browser events
Console Capture - Intercept console logs
Network Monitoring - Track network requests (future feature)

⚙️ Configuration

BrowserOptions

interface BrowserOptions {
  headless?: boolean;          // Default: false
  viewport?: {
    width: number;            // Default: 1280
    height: number;           // Default: 720
  };
  slowMo?: number;            // Slow down operations (ms)
  devtools?: boolean;         // Open devtools
  timeout?: number;           // Default timeout (ms)
  userAgent?: string;         // Custom user agent
}

await orchestrator.initialize({
  headless: false,
  viewport: { width: 1920, height: 1080 },
  slowMo: 100
});
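To show how the documented defaults combine with caller-supplied options, here is a hypothetical `resolveBrowserOptions` helper. It applies only the two defaults the interface comments state (`headless: false` and a 1280x720 viewport) and passes every other field through unchanged. It is not part of the sentinel-mcp API.

```typescript
// Local copy of the BrowserOptions interface above.
interface BrowserOptions {
  headless?: boolean;
  viewport?: { width: number; height: number };
  slowMo?: number;
  devtools?: boolean;
  timeout?: number;
  userAgent?: string;
}

type ResolvedOptions = BrowserOptions & {
  headless: boolean;
  viewport: { width: number; height: number };
};

// Fill in the documented defaults; caller-provided values win.
function resolveBrowserOptions(options: BrowserOptions = {}): ResolvedOptions {
  return {
    ...options,
    headless: options.headless ?? false,
    viewport: options.viewport ?? { width: 1280, height: 720 },
  };
}
```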

SnapshotOptions

interface SnapshotOptions {
  includeScreenshot?: boolean;    // Include base64 screenshot
  fullPage?: boolean;             // Full page vs viewport screenshot
  highlightElements?: boolean;    // Annotate elements on screenshot
}

const snapshot = await orchestrator.takeSnapshot({
  includeScreenshot: true,
  fullPage: false,
  highlightElements: true
});

🧪 Testing

# Run all tests
npm test

# Run unit tests only
npm run test:unit

# Run integration tests only
npm run test:integration

# Run tests in watch mode
npm run test:unit:watch

# Run tests with coverage
npm run test:coverage

Test Structure

Unit Tests (177) - Test individual components and utilities
Integration Tests (130) - Test real browser interactions
Test Coverage - High coverage across all modules

🔒 Security

Input Sanitization

The evaluate tool sanitizes JavaScript code to prevent:

Prototype pollution (__proto__, constructor.prototype)
Constructor access for code execution
Direct eval() calls
Unsafe patterns
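A pattern-based check for the categories above could look roughly like this. The rule set is hypothetical, since the actual patterns are not documented here. Regex deny-lists like this one can be bypassed with string concatenation or computed property names, so treat it as a first line of defense rather than a sandbox.

```typescript
// Hypothetical deny-list modeled on the categories listed above.
const UNSAFE_PATTERNS: ReadonlyArray<{ name: string; re: RegExp }> = [
  { name: "prototype pollution (__proto__)", re: /__proto__/ },
  { name: "prototype pollution (constructor.prototype)", re: /constructor\s*\.\s*prototype/ },
  { name: "constructor access", re: /constructor\s*\.\s*constructor|\[\s*['"`]constructor['"`]\s*\]/ },
  { name: "direct eval", re: /\beval\s*\(/ },
  { name: "Function constructor", re: /\bFunction\s*\(/ },
];

// Returns the names of all matched rules; an empty array means no rule fired.
function findUnsafePatterns(code: string): string[] {
  return UNSAFE_PATTERNS.filter(({ re }) => re.test(code)).map(({ name }) => name);
}
```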

Safe Code Execution

The executePlaywright tool:

Executes code in isolated async function scope
Provides accurate error line numbers for debugging
Cleans up temporary files automatically
Enforces execution timeouts
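The scope-plus-timeout idea can be sketched with an async function built from the code string and raced against a timer. This is an illustration under assumptions, not the tool's implementation. The README mentions temporary files, which this sketch does not use. `runWithTimeout` is a hypothetical name, and `page` is passed as the function's only parameter. The code still runs with full access to the Node.js process, and `Promise.race` stops waiting on timeout but does not cancel the running code.

```typescript
type AsyncFn = (...args: unknown[]) => Promise<unknown>;

// The AsyncFunction constructor is not a named global, so take it from an async function's prototype.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor as new (
  ...params: string[]
) => AsyncFn;

// Wrap `code` as the body of `async (page) => { ... }` and race it against a timer.
async function runWithTimeout(code: string, page: unknown, timeoutMs: number): Promise<unknown> {
  const fn = new AsyncFunction("page", code);
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Execution timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    return await Promise.race([fn(page), timeout]);
  } finally {
    clearTimeout(timer); // don't keep the event loop alive once the code settles
  }
}
```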

📖 TypeScript Support

Full TypeScript support with comprehensive type definitions:

import {
  BrowserOrchestrator,
  ToolRegistry,
  ToolResponse,
  ToolResult,
  DOMSnapshot,
  ElementData,
  BrowserOptions
} from 'sentinel-mcp';

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Development Setup

# Clone repository
git clone https://github.com/KylixMedusa/sentinel-mcp.git
cd sentinel-mcp

# Install dependencies
npm install

# Build
npm run build

# Run tests
npm test

# Run linter
npm run lint

# Type checking
npm run typecheck

📝 Changelog

See CHANGELOG.md for release history and breaking changes.

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

Playwright - Browser automation framework
MCP Protocol - Model Context Protocol specification

📊 Project Stats

57 Tools across 7 categories
15-Stage Detection Algorithm with advanced filtering
177 Unit Tests with Vitest
130 Integration Tests with Playwright
TypeScript - Full type safety
Node.js >= 18.0.0
Retina/HiDPI Support - 2x, 3x displays

Made with ❤️ for AI-driven browser automation