sentinel-mcp
v1.4.5
MCP-to-browser bridge library for AI-driven test automation
Sentinel MCP
Sentinel MCP is a browser automation library that bridges the Model Context Protocol (MCP) with Playwright, enabling AI agents to interact with web applications through 57 specialized tools. It is written in TypeScript and features intelligent element detection, visual feedback, and automatic Playwright code generation.
✨ Features
- 57 MCP Tools - Comprehensive browser control across 7 categories
- Smart Element Detection - 15-stage algorithm with compound components, shadow DOM, and scroll tracking
- Advanced Element Detection - Optimized constants, detection heuristics, and filtering algorithms
- Visual Feedback - Annotated screenshots with element highlighting (Retina/HiDPI display support)
- Scroll Indicators - Page and element scroll positions in serialized output
- Code Generation - Automatic Playwright code generation for every action
- CDP Integration - Chrome DevTools Protocol for reliable browser communication
- TypeScript First - Full type safety with comprehensive type definitions
- Security Built-in - Input sanitization and XSS protection
- Test Ready - 177 unit tests, 130 integration tests
📦 Installation
npm install sentinel-mcp
Requirements
- Node.js >= 18.0.0
- Playwright browsers (installed automatically)
🚀 Quick Start
import { BrowserOrchestrator } from 'sentinel-mcp';
// Initialize the orchestrator
const orchestrator = new BrowserOrchestrator();
await orchestrator.initialize();
// Navigate to a website
await orchestrator.navigate('https://example.com');
// Take a snapshot with element detection
const snapshot = await orchestrator.takeSnapshot({
includeScreenshot: true,
highlightElements: true
});
console.log(`Found ${snapshot.selectorMap.size} interactive elements`);
console.log(snapshot.screenshot); // Base64 screenshot with annotations
// Interact with elements by index (1-based)
await orchestrator.executeTool('fill', {
index: 1,
text: 'user@example.com' // placeholder address
});
const clickResult = await orchestrator.executeTool('click', {
index: 2
});
// Get the Playwright code generated for the click
console.log(clickResult.code); // Ready-to-use Playwright test code
// Clean up
await orchestrator.shutdown();
📚 API Documentation
Core Classes
BrowserOrchestrator
Main orchestration class for browser automation.
const orchestrator = new BrowserOrchestrator();
// Initialize browser
await orchestrator.initialize(options?: BrowserOptions);
// Navigate to URL
await orchestrator.navigate(url: string): Promise<void>;
// Take DOM snapshot
await orchestrator.takeSnapshot(options?: SnapshotOptions): Promise<DOMSnapshot>;
// Execute MCP tool
await orchestrator.executeTool(name: string, args: unknown): Promise<ToolResult>;
// Get tool registry
orchestrator.getToolRegistry(): ToolRegistry;
// Get page registry
orchestrator.getPageRegistry(): PageRegistry;
// Shutdown browser
await orchestrator.shutdown(): Promise<void>;
ToolRegistry
Manage and execute tools dynamically.
const registry = new ToolRegistry();
// Get all tools
registry.getAllTools(): ToolDefinition[];
// Get tool by name
registry.getTool(name: string): ToolDefinition | undefined;
// Get all tool names
registry.getToolNames(): string[];
// Execute tool
await registry.executeTool(name: string, args: unknown, context: ToolExecutionContext): Promise<ToolResult>;
ToolResponse
Format tool responses with code generation.
const response = new ToolResponse();
// Add text output
response.appendLine('Operation successful');
// Add generated code
response.addCode('await page.click("#button")');
// Set metadata
response.setMetadata('elementCount', 5);
// Format response
await response.format(): Promise<ToolResult>;
🛠️ Tools
Sentinel MCP provides 57 tools across 7 categories:
Actions (9 tools)
Navigate and interact with web elements:
- navigate - Navigate to URLs with configurable wait options
- click - Click elements (supports right-click, double-click)
- fill - Fill input fields with optional Enter key
- scroll - Scroll page or elements in any direction
- hover - Hover over elements for tooltips/menus
- rightClick - Trigger context menus
- doubleClick - Double-click elements
- dragAndDrop - Drag and drop between elements
- pressKey - Keyboard input with modifier keys
Page Operations (14 tools)
Control page behavior and state:
- getTitle - Get current page title
- getUrl - Get current page URL
- goBack - Navigate backward in history
- goForward - Navigate forward in history
- reload - Reload current page
- getTabs - List all open tabs
- selectTab - Switch between tabs
- newTab - Open new tab with optional URL
- closeTab - Close current or specified tab
- evaluate - Execute JavaScript in page context (with security validation)
- executePlaywright - Execute Playwright API code with accurate line numbers
- getConsoleLogs - Retrieve browser console logs
- setViewport - Change viewport dimensions
- handleDialog - Configure alert/confirm/prompt handling
Forms (8 tools)
Specialized form interactions:
- selectOption - Select dropdown options (by value, label, or index)
- checkCheckbox - Check checkbox elements
- uncheckCheckbox - Uncheck checkbox elements
- uploadFile - Upload files to file inputs
- clearInput - Clear input field values
- submitForm - Submit forms
- type - Type text with configurable delay for natural input
- focus - Focus on input elements
Assertions (7 tools)
Verify page state and content:
- assertVisible - Assert element is visible
- assertText - Assert element text content
- assertValue - Assert input element value
- assertAttribute - Assert element attribute value
- assertUrl - Assert current URL matches pattern
- assertTitle - Assert page title
- assertExists - Assert element exists in DOM
Inspection (8 tools)
Query and inspect elements:
- getElementText - Get text content of elements
- getElementAttribute - Get element attribute values
- getElementValue - Get input element values
- isVisible - Check if element is visible
- isEnabled - Check if element is enabled
- isChecked - Check checkbox/radio state
- getElements - Query multiple elements
- queryPage - Advanced element queries with filters
Waits (7 tools)
Synchronize with page state:
- waitForElement - Wait for element to appear
- waitForNavigation - Wait for page navigation to complete
- waitForLoadState - Wait for specific load state (load, domcontentloaded, networkidle)
- waitForTimeout - Wait for fixed duration
- waitForFunction - Wait for custom JavaScript condition
- waitForSelector - Wait for CSS selector to match
- waitForUrl - Wait for URL to match pattern
Dialogs (4 tools)
Handle browser dialogs:
- acceptDialog - Accept alerts/confirms/prompts
- dismissDialog - Dismiss/cancel dialogs
- getDialogMessage - Get dialog message text
- typeIntoDialog - Type into prompt dialogs
💡 Usage Examples
Basic Navigation and Interaction
const orchestrator = new BrowserOrchestrator();
await orchestrator.initialize();
// Navigate to a page
await orchestrator.navigate('https://example.com');
// Take snapshot to see available elements
const snapshot = await orchestrator.takeSnapshot({
includeScreenshot: true,
highlightElements: true
});
// Elements are indexed starting from 1
snapshot.selectorMap.forEach((el, idx) => {
console.log(`[${idx}] ${el.tagName} - ${el.meaningfulText || el.attributes.placeholder || el.ariaLabel}`);
});
// Interact with specific element by index
await orchestrator.executeTool('click', { index: 1 });
Form Filling Workflow
// Navigate to login page
await orchestrator.navigate('https://app.example.com/login');
// Fill login form
await orchestrator.executeTool('fill', {
index: 1, // Email input
text: 'user@example.com'
});
await orchestrator.executeTool('fill', {
index: 2, // Password input
text: 'secretPassword',
pressEnter: true // Submit form after filling
});
// Wait for navigation
await orchestrator.executeTool('waitForNavigation', {
timeout: 5000
});
// Verify login success
await orchestrator.executeTool('assertUrl', {
pattern: '/dashboard'
});
Assertions and Testing
// Navigate to page
await orchestrator.navigate('https://example.com/profile');
// Verify page state
await orchestrator.executeTool('assertTitle', {
expected: 'User Profile'
});
await orchestrator.executeTool('assertVisible', {
index: 1 // Profile picture element
});
await orchestrator.executeTool('assertText', {
index: 3,
text: 'Welcome back!'
});
// Check input values
const result = await orchestrator.executeTool('getElementValue', {
index: 2
});
console.log('Current value:', result.metadata?.value);
Advanced: Execute Custom Playwright Code
// Execute Playwright API code directly
const result = await orchestrator.executeTool('executePlaywright', {
code: `
// Access to 'page' object
const title = await page.title();
const cookies = await page.context().cookies();
// Perform complex operations
await page.evaluate(() => {
localStorage.setItem('theme', 'dark');
});
// Return values
return { title, cookieCount: cookies.length };
`,
timeout: 30000
});
console.log(result.metadata?.result); // { title: '...', cookieCount: 3 }
Working with Multiple Tabs
// Open new tab
await orchestrator.executeTool('newTab', {
url: 'https://example.com/page2'
});
// List all tabs
const tabsResult = await orchestrator.executeTool('getTabs', {});
const tabs = tabsResult.metadata?.tabs;
console.log(tabs);
// Switch to second tab
await orchestrator.executeTool('selectTab', { tabId: tabs[1].id });
// Close current tab
await orchestrator.executeTool('closeTab', {});
Handling Dialogs
// Configure dialog handling before triggering
await orchestrator.executeTool('handleDialog', {
action: 'accept',
promptText: 'Yes, I agree' // For prompt dialogs
});
// Trigger action that opens dialog
await orchestrator.executeTool('click', { index: 10 });
// Dialog is automatically handled based on configuration
🏗️ Architecture
Element Detection Algorithm
Sentinel MCP uses a 15-stage element detection algorithm:
- CDP DOM Snapshot - Capture full DOM state via DOMSnapshot.captureSnapshot
- Accessibility Integration - Include ARIA roles, labels, and attributes
- DevicePixelRatio Detection - Calculate via Page.getLayoutMetrics (deviceWidth / cssWidth)
- Coordinate Transformation - Convert CDP device pixels → CSS pixels
- Interactivity Detection - Tags, roles, cursors, event handlers (using optimized constant sets)
- Clickable Element Detection - Multi-factor detection using tags, roles, cursors, and attributes
- Compound Component Detection - Identify date pickers, color pickers, range sliders, custom selects
- Shadow DOM Traversal - Process shadow roots and shadow DOM elements
- Bounding Box Filtering - Remove elements 99% contained within propagating parents
- Paint Order Filtering - O(n) RectUnion algorithm to remove occluded elements
- Viewport Filtering - Remove elements outside visible viewport
- Scroll Position Tracking - Capture page and element scroll data
- Deduplication - Remove duplicate selectors
- DOM Order Sorting - Maintain document order
- Index Assignment - Assign stable 1-based indices for tool use
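The last three stages (deduplication, DOM order sorting, index assignment) can be pictured with a short TypeScript sketch. It assumes a hypothetical `Candidate` record that already carries a selector and a document-order position from earlier stages; these names are illustrative, not Sentinel MCP's internal types. The result is a `Map` keyed by 1-based index, similar in spirit to how `snapshot.selectorMap` is used in the examples above.

```typescript
// Hypothetical candidate shape; not Sentinel MCP's internal type.
interface Candidate {
  selector: string; // selector computed by earlier stages
  documentOrder: number; // position in a pre-order DOM traversal
  tagName: string;
}

// Deduplicate by selector, restore document order, then assign 1-based indices.
function assignIndices(candidates: Candidate[]): Map<number, Candidate> {
  // Deduplication: keep the first candidate seen for each selector.
  const bySelector = new Map<string, Candidate>();
  for (const c of candidates) {
    if (!bySelector.has(c.selector)) bySelector.set(c.selector, c);
  }

  // DOM order sorting: earlier filtering stages may have reordered candidates.
  const ordered = [...bySelector.values()].sort(
    (a, b) => a.documentOrder - b.documentOrder,
  );

  // Index assignment: indices start at 1, as the tools expect.
  const selectorMap = new Map<number, Candidate>();
  ordered.forEach((c, i) => selectorMap.set(i + 1, c));
  return selectorMap;
}

const indexed = assignIndices([
  { selector: "#submit", documentOrder: 7, tagName: "button" },
  { selector: "#email", documentOrder: 3, tagName: "input" },
  { selector: "#submit", documentOrder: 7, tagName: "button" },
]);
for (const [index, c] of indexed) console.log(`[${index}] ${c.tagName} ${c.selector}`);
```

This prints `[1] input #email` followed by `[2] button #submit`: the duplicate is dropped and the input gets the lower index because it comes first in the document.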
Element Detection Implementation
Sentinel MCP uses optimized element detection with the following approach:
Constants
All detection constants are Sets for O(1) lookup performance:
- INTERACTIVE_TAGS: button, input, select, textarea, a, details, summary, option, optgroup (excludes label)
- INTERACTIVE_ROLES: button, link, menu, menuitem, option, radio, checkbox, tab, textbox, combobox, slider, spinbutton, search, searchbox, listbox
- EVENT_HANDLER_ATTRIBUTES: onclick, onmousedown, onmouseup, onkeydown, onkeyup, tabindex (no touch events)
- CLICKABLE_CURSORS: only pointer (not grab, text, etc.)
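A minimal TypeScript sketch of how these sets can back an interactivity check. The set contents are taken from the list above; the `NodeInfo` shape and the rule that any single signal is enough are assumptions for illustration, since the multi-factor clickable detection described in the pipeline may weigh these signals differently.

```typescript
// Constant sets as listed in this README.
const INTERACTIVE_TAGS = new Set([
  "button", "input", "select", "textarea", "a",
  "details", "summary", "option", "optgroup", // note: no "label"
]);
const INTERACTIVE_ROLES = new Set([
  "button", "link", "menu", "menuitem", "option", "radio", "checkbox", "tab",
  "textbox", "combobox", "slider", "spinbutton", "search", "searchbox", "listbox",
]);
const EVENT_HANDLER_ATTRIBUTES = new Set([
  "onclick", "onmousedown", "onmouseup", "onkeydown", "onkeyup", "tabindex",
]);
const CLICKABLE_CURSORS = new Set(["pointer"]);

// Hypothetical node summary; field names are illustrative.
interface NodeInfo {
  tagName: string;
  role?: string;
  cursor?: string; // computed CSS cursor
  attributes: Record<string, string>;
}

// Assumed rule: any one signal marks the node as interactive.
// Each check is an O(1) Set lookup (the attribute check loops over the node's own attributes).
function isInteractive(node: NodeInfo): boolean {
  if (INTERACTIVE_TAGS.has(node.tagName.toLowerCase())) return true;
  if (node.role !== undefined && INTERACTIVE_ROLES.has(node.role)) return true;
  if (node.cursor !== undefined && CLICKABLE_CURSORS.has(node.cursor)) return true;
  return Object.keys(node.attributes).some((name) =>
    EVENT_HANDLER_ATTRIBUTES.has(name.toLowerCase()),
  );
}

console.log(isInteractive({ tagName: "DIV", attributes: { onclick: "open()" } }));
```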
Paint Order O(n) Algorithm
Uses RectUnion class for efficient occlusion detection:
- Groups elements by paint order (z-index)
- Processes from highest to lowest paint order
- Tracks covered area using union of rectangles
- Only adds opaque elements (opacity >= 0.8, non-transparent background)
- O(n) complexity vs naive O(n²) approach
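The occlusion idea can be sketched in TypeScript as follows. This is not the library's RectUnion class: it swaps in a simpler technique that tests whether a rectangle lies inside the union of covering rectangles by repeatedly subtracting each cover. That test is correct, but it does not have the O(n) bound claimed above. The `PaintedElement` fields are hypothetical; the grouping by paint order and the opacity >= 0.8 rule follow the list above.

```typescript
// Edges are half-open: a rect covers x1 <= x < x2 and y1 <= y < y2.
interface Rect { x1: number; y1: number; x2: number; y2: number }

// Hypothetical element record; field names are illustrative.
interface PaintedElement {
  id: string;
  paintOrder: number; // higher values paint on top
  rect: Rect;
  opacity: number;
  hasBackground: boolean; // true when the background is not transparent
}

// Subtract `cut` from `r`, returning up to four non-overlapping pieces.
function subtract(r: Rect, cut: Rect): Rect[] {
  const ix1 = Math.max(r.x1, cut.x1);
  const iy1 = Math.max(r.y1, cut.y1);
  const ix2 = Math.min(r.x2, cut.x2);
  const iy2 = Math.min(r.y2, cut.y2);
  if (ix1 >= ix2 || iy1 >= iy2) return [r]; // no overlap
  const pieces: Rect[] = [];
  if (r.y1 < iy1) pieces.push({ x1: r.x1, y1: r.y1, x2: r.x2, y2: iy1 }); // above
  if (iy2 < r.y2) pieces.push({ x1: r.x1, y1: iy2, x2: r.x2, y2: r.y2 }); // below
  if (r.x1 < ix1) pieces.push({ x1: r.x1, y1: iy1, x2: ix1, y2: iy2 }); // left
  if (ix2 < r.x2) pieces.push({ x1: ix2, y1: iy1, x2: r.x2, y2: iy2 }); // right
  return pieces;
}

// True when `r` lies entirely inside the union of `covers`.
function isCovered(r: Rect, covers: Rect[]): boolean {
  let remaining: Rect[] = [r];
  for (const c of covers) {
    remaining = remaining.flatMap((piece) => subtract(piece, c));
    if (remaining.length === 0) return true;
  }
  return false;
}

function filterOccluded(elements: PaintedElement[]): PaintedElement[] {
  // Group by paint order, then walk the groups from highest to lowest.
  const groups = new Map<number, PaintedElement[]>();
  for (const el of elements) {
    const group = groups.get(el.paintOrder) ?? [];
    group.push(el);
    groups.set(el.paintOrder, group);
  }
  const orders = [...groups.keys()].sort((a, b) => b - a);

  const covered: Rect[] = [];
  const visible: PaintedElement[] = [];
  for (const order of orders) {
    const group = groups.get(order) ?? [];
    // Test the whole group first so elements at the same paint order don't hide each other.
    for (const el of group) if (!isCovered(el.rect, covered)) visible.push(el);
    // Only sufficiently opaque elements add to the covered area.
    for (const el of group) {
      if (el.opacity >= 0.8 && el.hasBackground) covered.push(el.rect);
    }
  }
  return visible;
}
```

Because coverage is checked against the union rather than any single rectangle, an element hidden behind two adjacent opaque panels is also removed.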
Bounding Box Filtering
Removes redundant nested elements:
- Filters out children 99% contained within propagating parents (<a>, <button>)
- Reduces snapshot size significantly
- Maintains interactive element hierarchy
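The containment test behind this filter can be sketched in TypeScript. The 99% threshold and the `<a>`/`<button>` parents come from the list above; the `Box` shape and helper names are hypothetical, and any exceptions the library makes for particular children are not modeled here.

```typescript
interface Box { x: number; y: number; width: number; height: number }

// Parents whose click behavior propagates to their children (per the README).
const PROPAGATING_TAGS = new Set(["a", "button"]);
const CONTAINMENT_THRESHOLD = 0.99;

// Fraction of the child's area that lies inside the parent (0 for empty boxes).
function containmentRatio(child: Box, parent: Box): number {
  const overlapW =
    Math.min(child.x + child.width, parent.x + parent.width) - Math.max(child.x, parent.x);
  const overlapH =
    Math.min(child.y + child.height, parent.y + parent.height) - Math.max(child.y, parent.y);
  const childArea = child.width * child.height;
  if (childArea <= 0 || overlapW <= 0 || overlapH <= 0) return 0;
  return (overlapW * overlapH) / childArea;
}

// A child is redundant when a propagating parent already contains almost all of it.
function isRedundantChild(child: Box, parent: Box, parentTag: string): boolean {
  return (
    PROPAGATING_TAGS.has(parentTag.toLowerCase()) &&
    containmentRatio(child, parent) >= CONTAINMENT_THRESHOLD
  );
}

// An icon inside a button adds nothing beyond the button itself.
console.log(isRedundantChild({ x: 10, y: 10, width: 16, height: 16 }, { x: 0, y: 0, width: 100, height: 40 }, "button"));
```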
Snapshot Caching
Performance optimization with automatic expiration:
- 5-second TTL cache prevents redundant DOM processing for same URL+viewport
- Automatic expiration and deterministic pruning (every 10 snapshots)
- Cache key based on MD5 hash of URL and viewport dimensions
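A minimal TypeScript sketch of this caching scheme, using Node's `node:crypto` `createHash("md5")` for the key. The class name, the `url|WxH` key format, and the injectable clock (added so expiry can be tested) are assumptions; only the 5-second TTL, the prune-every-10 rule, and MD5 over URL plus viewport come from the list above.

```typescript
import { createHash } from "node:crypto";

interface Viewport {
  width: number;
  height: number;
}

// Hypothetical TTL cache keyed by an MD5 hash of URL + viewport.
class SnapshotCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private writes = 0;
  private readonly ttlMs: number;
  private readonly pruneEvery: number;
  private readonly now: () => number;

  // Defaults follow the README: 5-second TTL, prune every 10 snapshots.
  constructor(ttlMs = 5_000, pruneEvery = 10, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.pruneEvery = pruneEvery;
    this.now = now;
  }

  static key(url: string, viewport: Viewport): string {
    return createHash("md5")
      .update(`${url}|${viewport.width}x${viewport.height}`)
      .digest("hex");
  }

  get(url: string, viewport: Viewport): T | undefined {
    const key = SnapshotCache.key(url, viewport);
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired entries are dropped on read
      return undefined;
    }
    return entry.value;
  }

  set(url: string, viewport: Viewport, value: T): void {
    const key = SnapshotCache.key(url, viewport);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    // Deterministic pruning: sweep all expired entries on every Nth write.
    this.writes += 1;
    if (this.writes % this.pruneEvery === 0) this.prune();
  }

  get size(): number {
    return this.entries.size;
  }

  private prune(): void {
    const now = this.now();
    for (const [key, entry] of this.entries) {
      if (entry.expiresAt <= now) this.entries.delete(key);
    }
  }
}
```

Pruning on a fixed write count, rather than on a timer, keeps cleanup deterministic and avoids background work when no snapshots are taken.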
Display Scaling & Coordinate Systems
Proper handling of Retina/HiDPI displays (2x, 3x scale factors):
DevicePixelRatio Calculation
// Via Page.getLayoutMetrics (NOT Performance.getMetrics)
const devicePixelRatio = deviceWidth / cssWidth;
// Example: 2560 / 1280 = 2.0 (Retina display)
Coordinate Transformation
- CDP Returns: Bounds in device pixels (e.g., [0, 0, 2560, 1440])
- Parser Converts: Divide by devicePixelRatio → CSS pixels (e.g., [0, 0, 1280, 720])
- Screenshot: Playwright captures at CSS resolution (1280x720)
- Highlights: Draw using CSS pixel coordinates directly (no scaling needed)
This ensures pixel-perfect element highlighting on all display types.
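The ratio and the device-to-CSS conversion described above reduce to two small functions, sketched here in TypeScript over plain data. Which CDP fields supply `deviceWidth` and `cssWidth` is an assumption: this sketch reads the deprecated device-pixel `layoutViewport.clientWidth` and the CSS-pixel `cssLayoutViewport.clientWidth` from a `Page.getLayoutMetrics` response, so check the CDP documentation before relying on it. Bounds use the `[x, y, width, height]` layout of CDP `DOMSnapshot` rectangles.

```typescript
// Subset of the CDP Page.getLayoutMetrics result used here (assumed field choice).
interface LayoutMetricsSubset {
  layoutViewport: { clientWidth: number }; // device pixels (deprecated field)
  cssLayoutViewport: { clientWidth: number }; // CSS pixels
}

// CDP DOMSnapshot rectangles are [x, y, width, height].
type Bounds = [x: number, y: number, width: number, height: number];

// devicePixelRatio = deviceWidth / cssWidth (falls back to 1 if cssWidth is 0).
function devicePixelRatio(metrics: LayoutMetricsSubset): number {
  const cssWidth = metrics.cssLayoutViewport.clientWidth;
  return cssWidth > 0 ? metrics.layoutViewport.clientWidth / cssWidth : 1;
}

// Device pixels -> CSS pixels: divide every component by the ratio.
function toCssPixels(bounds: Bounds, dpr: number): Bounds {
  const [x, y, width, height] = bounds;
  return [x / dpr, y / dpr, width / dpr, height / dpr];
}

const dpr = devicePixelRatio({
  layoutViewport: { clientWidth: 2560 },
  cssLayoutViewport: { clientWidth: 1280 },
});
console.log(dpr, toCssPixels([0, 0, 2560, 1440], dpr)); // 2 [ 0, 0, 1280, 720 ]
```

Since the screenshot is captured at CSS resolution, the converted bounds can be drawn onto it directly, which is why the highlight step needs no further scaling.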
Serialization Features
The serialized DOM output includes:
Element Markers
Interactive elements are marked with [index] in the output:
<button class="submit">[1]Submit</button>
<input type="email" placeholder="Email">[2]
Scroll Indicators
Page scroll position at the top:
PAGE_SCROLL: (V:100px/2000px)
Scrollable elements show scroll capability:
<div class="scrollable">[5] (scroll: 0px/500px)
Shadow DOM
Shadow roots are represented:
<custom-element>
#shadow-root
<button>[3]Click Me</button>
</custom-element>
Minimal Output
Only required nodes are serialized:
- Interactive elements themselves
- All ancestor elements (for hierarchy context)
- Bounding box filtered (removes 99% contained children)
- Paint order filtered (removes occluded elements)
Result: ~5KB DOM snapshots instead of full 500KB+ page HTML
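The marker and scroll formats shown above can be reproduced with two hypothetical helpers, sketched in TypeScript. The helper names and the `SerializedElement` shape are illustrative; reading the two scroll numbers as "current offset / total scrollable size" is an assumption, as is the order of marker, scroll note, and text when an element has all three.

```typescript
// Hypothetical serializer input; not Sentinel MCP's internal type.
interface SerializedElement {
  openTag: string; // e.g. '<button class="submit">'
  closeTag?: string; // omitted for void elements such as <input>
  index?: number; // set only for interactive elements
  text?: string;
  scroll?: { offset: number; total: number }; // assumed meaning of "0px/500px"
}

// Page-level indicator, e.g. PAGE_SCROLL: (V:100px/2000px)
function pageScrollLine(scrollY: number, scrollHeight: number): string {
  return `PAGE_SCROLL: (V:${Math.round(scrollY)}px/${Math.round(scrollHeight)}px)`;
}

// Interactive elements get an [index] marker right after the opening tag.
function serializeElement(el: SerializedElement): string {
  const marker = el.index !== undefined ? `[${el.index}]` : "";
  const scroll = el.scroll ? ` (scroll: ${el.scroll.offset}px/${el.scroll.total}px)` : "";
  return `${el.openTag}${marker}${scroll}${el.text ?? ""}${el.closeTag ?? ""}`;
}

console.log(pageScrollLine(100, 2000));
console.log(serializeElement({ openTag: '<button class="submit">', closeTag: "</button>", index: 1, text: "Submit" }));
```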
Code Generation
Every tool execution generates equivalent Playwright code:
const result = await orchestrator.executeTool('fill', {
index: 1,
text: 'user@example.com'
});
console.log(result.code);
// Output:
// await page.locator('#email').fill('user@example.com');
The serialized DOM output also includes [index] markers for easy identification:
PAGE_SCROLL: (V:0px/2000px)
<form class="login-form">
<input type="email" id="email">[1]
<input type="password" id="password">[2]
<button type="submit">[3]Login</button>
</form>
Store and replay these generated commands for:
- Test automation
- Workflow recording
- Debugging and inspection
- Documentation generation
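How a tool call could be turned into a Playwright statement is sketched below in TypeScript. `generateCode` and `quote` are hypothetical helpers, and the sketch assumes the element index has already been resolved to a selector such as `#email`. The `fill` case mirrors the output shown above; rendering `pressEnter` as a follow-up `press('Enter')` is an assumption about how that option might be expressed.

```typescript
// Quote a string as a single-quoted JavaScript literal.
function quote(s: string): string {
  const escaped = s.replace(/\\/g, "\\\\").replace(/'/g, "\\'").replace(/\n/g, "\\n");
  return `'${escaped}'`;
}

interface ToolArgs {
  text?: string;
  pressEnter?: boolean;
}

// Hypothetical generator: tool name + resolved selector -> Playwright statement(s).
function generateCode(tool: string, selector: string, args: ToolArgs = {}): string {
  const locator = `page.locator(${quote(selector)})`;
  switch (tool) {
    case "fill": {
      const lines = [`await ${locator}.fill(${quote(args.text ?? "")});`];
      if (args.pressEnter) lines.push(`await ${locator}.press('Enter');`);
      return lines.join("\n");
    }
    case "click":
      return `await ${locator}.click();`;
    case "hover":
      return `await ${locator}.hover();`;
    default:
      throw new Error(`no code template for tool: ${tool}`);
  }
}

console.log(generateCode("fill", "#email", { text: "user@example.com" }));
```

Escaping through a single `quote` helper keeps the emitted code valid even when user text contains quotes or newlines.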
CDP Integration
Uses Chrome DevTools Protocol for reliable browser communication:
- Session Management - Automatic CDP session creation and recovery
- DOM Snapshots - Fast, accurate DOM state capture
- Event Handling - Real-time browser events
- Console Capture - Intercept console logs
- Network Monitoring - Track network requests (future feature)
⚙️ Configuration
BrowserOptions
interface BrowserOptions {
headless?: boolean; // Default: false
viewport?: {
width: number; // Default: 1280
height: number; // Default: 720
};
slowMo?: number; // Slow down operations (ms)
devtools?: boolean; // Open devtools
timeout?: number; // Default timeout (ms)
userAgent?: string; // Custom user agent
}
await orchestrator.initialize({
headless: false,
viewport: { width: 1920, height: 1080 },
slowMo: 100
});
SnapshotOptions
interface SnapshotOptions {
includeScreenshot?: boolean; // Include base64 screenshot
fullPage?: boolean; // Full page vs viewport screenshot
highlightElements?: boolean; // Annotate elements on screenshot
}
const snapshot = await orchestrator.takeSnapshot({
includeScreenshot: true,
fullPage: false,
highlightElements: true
});
🧪 Testing
# Run all tests
npm test
# Run unit tests only
npm run test:unit
# Run integration tests only
npm run test:integration
# Run tests in watch mode
npm run test:unit:watch
# Run tests with coverage
npm run test:coverage
Test Structure
- Unit Tests (127+) - Test individual components and utilities
- Integration Tests (25+) - Test real browser interactions
- Test Coverage - High coverage across all modules
🔒 Security
Input Sanitization
The evaluate tool sanitizes JavaScript code to prevent:
- Prototype pollution (__proto__, constructor.prototype)
- Constructor access for code execution
- Direct eval() calls
- Unsafe patterns
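A pattern-based check of this kind can be sketched in TypeScript as a small deny-list. The regular expressions and reason strings are assumptions for illustration, not the library's actual rules. A deny-list like this is a heuristic filter, not a sandbox, and obfuscated code can slip past it.

```typescript
// Illustrative deny-list; the patterns are assumptions, not the library's rules.
const BLOCKED_PATTERNS: ReadonlyArray<[RegExp, string]> = [
  [/__proto__/, "prototype pollution via __proto__"],
  [/constructor\s*\.\s*prototype/, "prototype pollution via constructor.prototype"],
  [/constructor\s*\.\s*constructor/, "constructor access for code execution"],
  [/\beval\s*\(/, "direct eval() call"],
];

// Returns the reason for every matching pattern; an empty array means nothing matched.
function findUnsafePatterns(code: string): string[] {
  return BLOCKED_PATTERNS.filter(([pattern]) => pattern.test(code)).map(([, reason]) => reason);
}

console.log(findUnsafePatterns("[].constructor.constructor('return this')()"));
```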
Safe Code Execution
The executePlaywright tool:
- Executes code in isolated async function scope
- Provides accurate error line numbers for debugging
- Cleans up temporary files automatically
- Enforces execution timeouts
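A minimal TypeScript sketch of the first and last points: running user code inside an async function body with `page` as its only parameter, and enforcing a timeout. `runWithTimeout` is a hypothetical helper; line-number mapping and temporary-file cleanup are not modeled. Note that `Promise.race` only stops waiting when the timeout fires; it cannot cancel code that is still running.

```typescript
// The AsyncFunction constructor is not a global; this is the standard way to reach it.
const AsyncFunction = Object.getPrototypeOf(async () => {}).constructor as new (
  ...args: string[]
) => (...args: unknown[]) => Promise<unknown>;

async function runWithTimeout(code: string, page: unknown, timeoutMs: number): Promise<unknown> {
  // The body sees only its `page` parameter and globals, not this function's locals.
  const fn = new AsyncFunction("page", code);
  let timer: ReturnType<typeof setTimeout> | undefined = undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Execution timed out after ${timeoutMs}ms`)), timeoutMs);
  });
  try {
    return await Promise.race([fn(page), timeout]);
  } finally {
    clearTimeout(timer); // avoid a dangling timer when the code finishes first
  }
}

// Usage with a stand-in page object:
const fakePage = { title: async () => "Example Domain" };
runWithTimeout("const title = await page.title(); return { title };", fakePage, 1_000).then(console.log);
```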
📖 TypeScript Support
Full TypeScript support with comprehensive type definitions:
import {
BrowserOrchestrator,
ToolRegistry,
ToolResponse,
ToolResult,
DOMSnapshot,
ElementData,
BrowserOptions
} from 'sentinel-mcp';
🤝 Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Development Setup
# Clone repository
git clone https://github.com/KylixMedusa/sentinel-mcp.git
cd sentinel-mcp
# Install dependencies
npm install
# Build
npm run build
# Run tests
npm test
# Run linter
npm run lint
# Type checking
npm run typecheck
📝 Changelog
See CHANGELOG.md for release history and breaking changes.
📄 License
MIT License - see LICENSE file for details.
🙏 Acknowledgments
- Playwright - Browser automation framework
- MCP Protocol - Model Context Protocol specification
📊 Project Stats
- 57 Tools across 7 categories
- 15-Stage Detection Algorithm with advanced filtering
- 177 Unit Tests with Vitest
- 130 Integration Tests with Playwright
- TypeScript - Full type safety
- Node.js >= 18.0.0
- Retina/HiDPI Support - 2x, 3x displays
Made with ❤️ for AI-driven browser automation
