
mcp-ui-probe v0.2.0

MCP server for UI testing with headless drivers and intelligent UI recognition

Downloads: 25

UI-Probe: Test Any Website in Plain English

⚠️ Important: LLM API Key Required

UI-Probe uses AI to intelligently understand web pages and forms. An OpenAI or Anthropic API key is required.

  • Cost: ~$0.01-0.10 per test depending on complexity
  • Fallback Mode: Set UI_PROBE_FALLBACK_MODE=true for basic Playwright functionality without LLM
  • Get API Key:
    • OpenAI: https://platform.openai.com/api-keys
    • Anthropic: https://console.anthropic.com/

Without a valid API key, operations will fail with clear error messages.

The Problem

Website testing is broken. You write hundreds of lines of code that break the moment a developer changes a button class from btn-primary to button-primary. Your tests fail not because the app is broken, but because someone moved a div or renamed an ID.

The Solution

UI-Probe is an assistant-first, Claude/MCP-native web app tester.

UI-Probe lets you test websites by describing what you want to do in plain English. No code. No selectors. Just describe it like you'd tell a human.

⚡ Not the First, But a Different Take

Tools like Testim, Mabl, and Rainforest QA already bring codeless/AI testing to market. They’re powerful, but often enterprise-heavy, SaaS-locked, and tuned for QA engineers.

UI-Probe takes a different path:

  • Assistant-first → Runs natively inside Claude via MCP. You just talk to your assistant and watch it work.
  • Plain English by default → No scripts, no recorders. Just tell it what you want tested.
  • Beginner-friendly → PMs, designers, and non-devs can use it right away.
  • Open-source + lightweight → Clone, run, hack. No vendor lock-in.
  • From testing → to doing → Today: check your web flows. Tomorrow: actually run them.

From Testing → to Doing

UI-Probe doesn’t stop at testing. The same way you say:

  • “✅ Test if users can sign up”

…you can also say:

  • “✅ Actually sign me up for an account”
  • “✅ Buy a blue shirt from the shop”
  • “✅ Order me a ham + mustard sandwich from sandwich.com and deliver it”

So what starts as a QA helper can also become your personal web agent — able to test, repeat, and even perform real tasks for you.

UI-Probe is not just testing: it's a simple, universal intention layer for the web.

What Makes UI-Probe Different

🚀 Stateful Testing Orchestrator vs Script Generator

Unlike standard Playwright MCP tools that create a new JavaScript file for every test, UI-Probe is a persistent, intelligent testing server that maintains context and learns from interactions.

Standard Playwright MCP:

  • Creates a new .js file for each test
  • No memory between tests
  • You write: await page.click('#submit')
  • Starts fresh browser each time
  • Manual selector management

UI-Probe:

  • Single persistent server managing all tests
  • Maintains browser context and state
  • You say: "Sign up as a new user"
  • Intelligent form understanding
  • Journey recording and replay

🧠 AI-Powered Intelligence Built-In

  • 🤖 LLM Strategy Engine (requires API key) - Uses GPT-4/Claude to understand UI context and intent
  • 🤖 Workflow Decomposer (requires API key) - Automatically breaks "Order a pizza" into logical steps
  • 🤖 Adaptive Executor (requires API key) - Adjusts strategy when pages behave unexpectedly
  • 🤖 Error Enhancer (requires API key) - Provides intelligent, actionable error messages
  • 🤖 Form Inference Engine (requires API key) - Automatically understands form structure, validation rules, and generates appropriate test data

📼 Journey Recording & Replay System

Complete journey management that no standard Playwright tool offers:

  • ✅ Record once, replay forever (works in fallback mode) - Capture complex workflows and replay them
  • 🤖 Journey Validation (requires API key) - Ensures recorded journeys remain valid
  • 🤖 Journey Analysis (requires API key) - Identifies patterns and improvements
  • 🤖 Journey Discovery (requires API key) - Automatically discovers new test paths
  • 🤖 Smart selector generation (requires API key) - Creates resilient selectors that survive UI changes

🎯 Natural Language Goal Execution

// Standard Playwright approach - you write the code:
const { chromium } = require('playwright');
const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto('http://example.com');
await page.fill('#email', '[email protected]');
await page.fill('#password', 'password123');
await page.click('button[type="submit"]');

// UI-Probe approach - just describe your goal:
run_flow({ goal: "Sign up as a new user" })

📊 Additional Unique Features

  • ✅ Built-in test playground (works in fallback mode) - Test pages included to try before deploying to your project
  • ✅ Real-time monitoring (works in fallback mode) - Watch tests run with live feedback
  • ✅ Claude-native (works in fallback mode) - Designed specifically for Claude Code CLI, not retrofitted
  • 🤖 Actually works for non-devs (requires API key) - PMs, designers, QA can use it immediately
  • ✅ Open source (works in fallback mode) - No vendor lock-in, customize as needed
  • 🤖 Semantic AI Resolution (requires API key) - Smart hybrid that uses Playwright's semantic selectors first, then falls back to LLM intelligence only when needed
  • 🤖 No Code Required (requires API key) - Unlike raw Playwright MCP where Claude must write test scripts, UI-Probe works immediately without any programming
  • ✅ Deterministic JSON Responses (works in fallback mode) - Every action returns structured JSON that enables conditional logic and automation:
{
  "success": true,
  "data": {
    "clicked": true,
    "selector": "button:has-text(\"Submit Form\")",
    "currentUrl": "http://localhost:8083/test/forms",
    "pageTitle": "Forms Testing - UI-Probe"
  }
}

This means you can build intelligent workflows:

# If form submission fails, try alternative flow
if response.success == false:
  navigate to backup_url
  retry with different_data
# Instead of this nightmare:
await driver.findElement(By.xpath("//div[@id='login-form']//input[@name='email']")).sendKeys("[email protected]");
await driver.findElement(By.css(".btn-submit.primary")).click();

# You can use natural language:
"Test the signup form"

# Or explicit commands when you need precision:
fill_form {"email": "[email protected]"}
click_button "Sign Up"
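To make the contrast concrete, here is a minimal JavaScript sketch of branching on UI-Probe's structured responses. The runTool helper is a hypothetical stand-in for however your MCP client invokes these tools; only the { success, data } response shape comes from the example above.

```javascript
// Hypothetical stand-in for an MCP tool call; real UI-Probe tools return
// structured JSON like { success, data } as shown above.
let attempts = 0;
async function runTool(name, args) {
  attempts += 1;
  // Simulate: the first fill_form fails, later calls succeed.
  const success = !(name === 'fill_form' && attempts === 1);
  return { success, data: { tool: name, args } };
}

// Branch on the deterministic JSON response instead of scraping logs.
async function submitWithFallback(primaryUrl, backupUrl, formData) {
  const first = await runTool('fill_form', { url: primaryUrl, formData });
  if (first.success) return first;

  // Primary flow failed: navigate to the backup URL and retry.
  await runTool('navigate', { url: backupUrl });
  return runTool('fill_form', { url: backupUrl, formData });
}
```

Because every action returns the same structured shape, the retry logic stays a few lines of ordinary code rather than selector-specific error handling.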

For Complete Beginners

Never written code? Perfect! UI-Probe is designed for you:

  1. Install it (one-time setup, 5 minutes)
  2. Tell it what to test in plain English
  3. Get clear results - "✅ Account created" or "❌ The signup button is hidden"

Example for Non-Developers

Want to test your website's contact form every day? Just type:

# In Claude, you can be natural:
"Test the contact form on my homepage"

# Claude figures out the URL from context, or you can be explicit:
fill_form "https://mysite.com/contact" {"message": "Testing!"}

That's it. No programming required.

For Developers

UI-Probe gives you:

  • Self-healing tests - Automatically adapts when UI changes
  • 80% less code - One line instead of dozens
  • AI-powered intelligence - Uses GPT-4/Claude to understand pages
  • Clear error messages - "Button is hidden by cookie banner" vs "ElementNotInteractableException"

Quick Start (5 Minutes)

🎯 TL;DR - Get Running Quickly

npx mcp-ui-probe setup                                                      # 1. Install browsers (one-time)
echo "OPENAI_API_KEY=sk-..." > .env                                        # 2. Add API key (required for AI features)
curl -sSL https://raw.githubusercontent.com/Hulupeep/mcp-ui-probe/main/scripts/claude-setup.sh | bash  # 3. Connect to Claude
claude                                                                       # 4. Start using!

Prerequisites

  • Node.js 18+ (Download - just click "Next" through installer)
  • Claude Code CLI or any terminal
  • OpenAI or Anthropic API key (REQUIRED for intelligent features, see Cost Estimation below)

System Requirements

  • OS: Windows, macOS, or Linux
  • Node.js: Version 18 or higher
  • Disk Space: ~500MB for Playwright browsers (one-time download)
  • RAM: 2GB minimum, 4GB recommended

Option 1: Use with npx (Easiest - Works Everywhere!)

Step 1: Initial Setup (One-Time Only)

# Install Playwright browsers needed for web testing (~500MB, takes 2-3 minutes)
npx mcp-ui-probe setup

Step 2: Configure LLM Provider (Required for Intelligent Features)

UI-Probe requires an API key for intelligent form understanding and workflow execution:

# Create a .env file in your current directory
echo "OPENAI_API_KEY=your-key-here" > .env
# OR for Anthropic
echo "ANTHROPIC_API_KEY=your-key-here" > .env

# Optional: Enable fallback mode for basic Playwright functionality without LLM
echo "UI_PROBE_FALLBACK_MODE=true" >> .env

Important: Without a valid API key, intelligent features will not work:

  • 🤖 Form field inference and understanding
  • 🤖 Natural language workflow decomposition
  • 🤖 Smart error message enhancement
  • 🤖 Adaptive element detection

See Cost Estimation section below for API usage costs.

Step 3: Connect to Claude Code CLI

Why this extra step? Claude Code CLI can't find npx by itself because it doesn't have access to your shell's PATH. You need to tell Claude exactly where npx is located on your computer.

Option A: Automatic Setup (Easiest - does everything for you):

# This script will:
# 1. Find where npx is installed on your computer
# 2. Add UI-Probe to Claude with the correct path
# 3. Verify everything is configured properly
curl -sSL https://raw.githubusercontent.com/Hulupeep/mcp-ui-probe/main/scripts/claude-setup.sh | bash

After running this, just restart Claude and UI-Probe will be ready to use!

Option B: Manual Setup (if automatic doesn't work):

Step 1: Find your npx path

# On macOS/Linux:
which npx
# Example output: /usr/local/bin/npx or ~/.nvm/versions/node/v20.11.0/bin/npx

# On Windows:
where npx
# Example output: C:\Program Files\nodejs\npx.cmd

Step 2: Add to Claude with the full path

# Use YOUR path from Step 1:
claude mcp add ui-probe "/full/path/to/npx" "mcp-ui-probe@latest" "start"

# Real examples:
# Standard Node:
claude mcp add ui-probe "/usr/local/bin/npx" "mcp-ui-probe@latest" "start"

# Using NVM:
claude mcp add ui-probe "$HOME/.nvm/versions/node/v22.11.0/bin/npx" "mcp-ui-probe@latest" "start"

# Windows:
claude mcp add ui-probe "C:\Program Files\nodejs\npx.cmd" "mcp-ui-probe@latest" "start"

Step 4: Start Using UI-Probe in Claude!

# Start Claude Code CLI
claude

# UI-Probe tools are now available! Try:
# - Navigate to websites
# - Analyze page elements
# - Fill and submit forms
# - Run complete test flows

Step 5: (Optional) Try the Test Playground

Want to see UI-Probe in action before testing your own sites?

# Start the built-in test server with example forms
npx mcp-ui-probe test-server   # Runs on http://localhost:8081/test
npx mcp-ui-probe test-server --port 3000   # Use custom port if 8081 is busy

# Visit http://localhost:8081/test in your browser to see the playground
# Then in Claude, try: run_flow "Sign up as new user" "http://localhost:8081/test"

Option 2: Install from Source

# Clone it (this downloads the code)
git clone https://github.com/Hulupeep/mcp-ui-probe.git
cd mcp-ui-probe

# Install it (this sets everything up)
npm install

# CRITICAL: Install browsers (one-time, takes 2-3 minutes)
npx playwright install

# Add to Claude:
claude mcp add ui-probe "node" "/path/to/mcp-ui-probe/dist/index.js"

Start Testing!

# In Claude, just describe what you want:
"Test if users can sign up on example.com"

# Or be specific:
run_flow "Go to https://example.com/signup and create an account"

How UI-Probe Works in Claude

Natural Language (Default - Just Talk Normally!)

UI-Probe understands what you want to do:

# Just describe what you want - UI-Probe figures it out:
"Test if users can sign up on example.com"
"Check if the checkout process works"
"Fill out the contact form with test data"
"Click the submit button"

Explicit Commands (When You Need Precise Control)

Sometimes you need to be specific about exactly what to do:

# Use explicit commands for precise control:
navigate "https://staging.myapp.com/login"           # Go to exact URL
fill_form "https://myapp.com/contact" {"message": "Test"}  # Fill specific fields
click_button "Submit Order"                          # Click exact button text
assert_element "div.success" "visible"               # Check specific element

Best Practice: Start with natural language. If UI-Probe needs clarification or you need precise control, switch to explicit commands.

Common Tasks

Test a Login Form

# Natural language (recommended to start):
"Test if users can log in to myapp.com"
"Check the login flow"

# Explicit commands (for precise control):
navigate "https://myapp.com/login"
fill_form {"email": "[email protected]", "password": "password123"}
click_button "Sign In"
verify_page {"expectedContent": ["Dashboard", "Welcome"]}

Test a Purchase

# Natural language:
"Buy a blue shirt from shop.com"
"Test the checkout process with a test credit card"

# Explicit commands:
navigate "https://shop.com"
click_button "Shirts"
click_button "Blue Cotton Tee"
click_button "Add to Cart"
fill_form {"card": "4111111111111111", "exp": "12/25", "cvv": "123"}
click_button "Complete Order"

Test Form Validation

# Test what happens with bad data:
fill_form "https://myapp.com/signup" {"email": "not-an-email"}
# UI-Probe tells you: "❌ Email validation error appeared"

Check if Something Exists

assert_element "https://myapp.com" "Free shipping" "visible"
# Returns: "✅ Found 'Free shipping' on page"

How UI-Probe Compares

Traditional Testing Tools

  • Write code with specific selectors
  • Tests break when UI changes
  • Cryptic error messages
  • Need programming knowledge
  • Hundreds of lines of code

UI-Probe

  • Describe in plain English
  • Self-healing when UI changes
  • Clear, human-friendly errors
  • No programming needed
  • One line does it all

Real-World Examples

E-commerce Site

# Complete purchase flow
"Buy the cheapest laptop on the site"

# UI-Probe automatically:
# - Finds the shop
# - Searches for laptops
# - Sorts by price
# - Adds to cart
# - Fills checkout
# - Completes purchase

SaaS Application

# Test free trial signup
"Sign up for a free trial with a company email"

# UI-Probe:
# - Navigates to signup
# - Detects it's a business form
# - Fills company fields
# - Uses appropriate test data
# - Verifies trial activated

Banking App

# Test money transfer
"Transfer $50 from checking to savings"

# UI-Probe:
# - Logs in securely
# - Navigates to transfers
# - Fills amount
# - Selects accounts
# - Confirms transfer

🏗️ Architecture Overview

UI-Probe is built as an intelligent, stateful testing orchestrator rather than a simple script runner:

┌─────────────────────────────────────────────────────────┐
│                     Claude Code CLI                      │
│                    (Natural Language)                    │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│                   MCP Server (Persistent)                │
│  ┌─────────────────────────────────────────────────┐    │
│  │            Intelligent Components               │    │
│  │  • LLM Strategy Engine (GPT-4/Claude)          │    │
│  │  • Workflow Decomposer                         │    │
│  │  • Adaptive Executor                           │    │
│  │  • Form Inference Engine                       │    │
│  │  • Error Enhancer                              │    │
│  └─────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────┐    │
│  │            Journey System                       │    │
│  │  • Journey Recorder & Player                   │    │
│  │  • Journey Validator & Analyzer                │    │
│  │  • Journey Discovery & Storage                 │    │
│  └─────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────┐    │
│  │         Playwright Driver (Stateful)            │    │
│  │  • Persistent browser context                   │    │
│  │  • Smart selector generation                    │    │
│  │  • Automatic retry & recovery                   │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│                   Your Web Application                   │
└─────────────────────────────────────────────────────────┘

Key Architectural Advantages:

  1. Persistent Server - No startup overhead, maintains state across tests
  2. Intelligent Layer - LLM-powered understanding, not just automation
  3. Journey System - Record once, replay with variations
  4. Stateful Context - Remembers login sessions, previous interactions
  5. Adaptive Execution - Adjusts strategy based on page behavior

🔬 Technical Architecture (For LLMs & Developers)

Request Flow: From Natural Language to Execution

User → "Search for blue t-shirt on Amazon"
  │
  ▼
┌─────────────────────────────────────────────────────────────┐
│ 1. MCP Server (src/server/MCPServer.ts:1270-1524)           │
│    handleRunFlow(goal: "Search for blue t-shirt")           │
└──────────────────┬──────────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────────────────┐
│ 2. LLM Strategy (src/llm/llmStrategy.ts:86-148)             │
│    ✨ CALLS OPENAI API                                       │
│    parseGoal("Search for blue t-shirt")                     │
│    → {                                                       │
│         action: "fill",                                      │
│         target: "search bar",                                │
│         targetType: "input",                                 │
│         value: "blue t-shirt",  ← PARSED VALUE              │
│         submit: true                                         │
│       }                                                       │
└──────────────────┬──────────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────────────────┐
│ 3. Form Inference (src/infer/form.ts)                       │
│    Analyzes page to understand form structure:              │
│    → {                                                       │
│         name: "site-search",                                 │
│         fields: [                                            │
│           {name: "field-keywords", type: "text"}, ← TARGET   │
│           {name: "nav-search-submit-button", type: "submit"} │
│         ]                                                     │
│       }                                                       │
└──────────────────┬──────────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────────────────┐
│ 4. Override Creation (src/server/MCPServer.ts:1404-1441)    │
│    ✨ CRITICAL FIX: Maps LLM value to field name             │
│    overrides = {                                             │
│      "field-keywords": "blue t-shirt"  ← Uses LLM value      │
│    }                                                          │
│    (Instead of random "sample384")                           │
└──────────────────┬──────────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────────────────┐
│ 5. Flow Engine (src/flows/flowEngine.ts:15-89)              │
│    executeFlow(page, formSchema, overrides)                 │
│    For each field:                                           │
│      - Checks overrides first                                │
│      - Uses LLM value if present                             │
│      - Generates random data only if not in overrides        │
└──────────────────┬──────────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────────────────┐
│ 6. Playwright Execution                                      │
│    page.fill("#field-keywords", "blue t-shirt") ✅           │
│    page.click("#nav-search-submit-button")                  │
└─────────────────────────────────────────────────────────────┘

OpenAI Integration Points

UI-Probe uses OpenAI GPT-4 at these specific points (all in src/llm/llmStrategy.ts):

1. Goal Parsing (Line 86-148)

async parseGoal(goal: string): Promise<ParsedGoal>
  • Input: Natural language goal ("Search for blue t-shirt")
  • Output: Structured action object with {action, target, value}
  • OpenAI Call: openai.chat.completions.create() at line 261
  • Model: gpt-4-turbo-preview (configurable via LLM_MODEL)
  • Cost: ~$0.01-0.05 per request
  • Fallback: Regex parser if API fails or UI_PROBE_FALLBACK_MODE=true

Prompt sent to OpenAI (line 322-358):

Parse this UI testing goal into structured actions:
"Search for blue t-shirt"

Return JSON:
{
  "action": "fill",
  "target": "search bar",
  "value": "blue t-shirt",
  "submit": true
}
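For illustration, the request parseGoal sends could be assembled like this. The payload shape follows OpenAI's chat completions API, but the exact prompt wording and defaults shown here are assumptions, not the project's real prompt:

```javascript
// Sketch of the chat-completions payload parseGoal might build.
// The system/user wording is illustrative, not the project's actual prompt.
function buildParseGoalRequest(goal, model = 'gpt-4-turbo-preview') {
  return {
    model,
    temperature: 0.3,
    response_format: { type: 'json_object' },
    messages: [
      {
        role: 'system',
        content:
          'Parse UI testing goals into structured actions. ' +
          'Return JSON with action, target, value, submit.',
      },
      {
        role: 'user',
        content: `Parse this UI testing goal into structured actions:\n"${goal}"`,
      },
    ],
  };
}

// The payload would then be passed to openai.chat.completions.create(payload).
```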

2. Alternative Selector Suggestions (Line 224-238)

async suggestAlternatives(failedSelector: string, pageContent: string): Promise<string[]>
  • When Called: Only when primary selector fails
  • Input: Failed selector + page HTML snippet
  • Output: Array of alternative selectors to try
  • Example: .submit-button fails → suggests ["button[type='submit']", "[aria-label='Submit']"]

3. Error Interpretation (Line 150-163)

async interpretError(error: string, context: any): Promise<ErrorInterpretation>
  • When Called: When test fails
  • Input: Error message + page context
  • Output: Human-readable explanation + recovery suggestions
  • Example: "Element not clickable" → "Element may be covered by overlay, try dismissing modal"

4. Text Completion (Line 168-222)

async complete(prompt: string): Promise<string>
  • Used by: Custom integrations and future features
  • Current Usage: Minimal (reserved for advanced features)

Key Files & Responsibilities

src/
├── server/
│   └── MCPServer.ts              # Main MCP server, request handling
│       ├── handleRunFlow()       # Entry point for natural language goals
│       ├── handleClickButton()   # Button clicking with AI fallback
│       └── handleNavigate()      # Page navigation
│
├── llm/
│   ├── llmStrategy.ts            # ✨ OpenAI integration hub
│   │   ├── parseGoal()           # Natural language → structured action
│   │   ├── suggestAlternatives() # Selector recovery
│   │   ├── interpretError()      # Error analysis
│   │   └── callLLM()             # Core OpenAI API wrapper
│   │
│   ├── workflowDecomposer.ts    # Multi-step workflow parsing
│   └── adaptiveExecutor.ts      # Execution with retry logic
│
├── flows/
│   └── flowEngine.ts             # Form execution engine
│       ├── executeFlow()         # Runs through form fields
│       ├── fillField()           # Fills individual field
│       └── Uses overrides map to prioritize LLM values
│
├── infer/
│   └── form.ts                   # Form structure analysis
│       └── inferForm()           # Analyzes page to understand forms
│
└── utils/
    ├── dataSynthesizer.ts        # Test data generation
    │   └── generateFieldData()   # Creates field values (checks overrides first)
    └── goalParser.ts             # Regex-based fallback parser

Data Flow Example: Amazon Search

// User command via MCP
run_flow({ goal: "Search for blue t-shirt" })

// Step 1: MCPServer.handleRunFlow()
const parsedGoal = await llmStrategy.parseGoal(goal);
// OpenAI returns: {action: "fill", value: "blue t-shirt"}

// Step 2: Analyze page structure
const analysis = await analyzeUI();
// Finds: search form with field-keywords input

// Step 3: Create overrides (THE FIX!)
const overrides = {
  "field-keywords": "blue t-shirt"  // Maps LLM value to field name
};

// Step 4: Execute with overrides
await flowEngine.executeFlow(page, formSchema, overrides);
//   → Calls dataSynthesizer.generateFieldData(field, overrides)
//   → Checks overrides["field-keywords"] first
//   → Returns "blue t-shirt" (not random data!)

// Step 5: Playwright fills the field
await page.fill("#field-keywords", "blue t-shirt");
await page.click("#nav-search-submit-button");

// ✅ Result: Searches for "blue t-shirt", not "sample384"

Environment Configuration

# Required for LLM features
OPENAI_API_KEY=sk-...              # OpenAI API key
LLM_MODEL=gpt-4-turbo-preview      # Model selection
LLM_TEMPERATURE=0.3                # Response randomness (0-1)

# Optional
UI_PROBE_FALLBACK_MODE=false       # Set true to disable LLM calls
LLM_CACHE_ENABLED=true             # Cache LLM responses (5 min TTL)
LLM_REQUEST_TIMEOUT=60000          # API timeout in ms
LLM_MAX_RETRIES=2                  # Retry failed API calls

# Cost controls
UI_PROBE_COST_LIMITS=true          # Enable cost tracking
UI_PROBE_MAX_COST=10               # Max spend in USD
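One way these variables might be loaded into a typed config object (a sketch; the defaults mirror the comments above, and the function name is made up):

```javascript
// Illustrative config loader for the environment variables listed above.
// Defaults match the documented values; the loader itself is hypothetical.
function loadConfig(env = process.env) {
  return {
    apiKey: env.OPENAI_API_KEY ?? null,
    model: env.LLM_MODEL ?? 'gpt-4-turbo-preview',
    temperature: Number(env.LLM_TEMPERATURE ?? 0.3),
    fallbackMode: env.UI_PROBE_FALLBACK_MODE === 'true',
    cacheEnabled: env.LLM_CACHE_ENABLED !== 'false',
    requestTimeoutMs: Number(env.LLM_REQUEST_TIMEOUT ?? 60000),
    maxRetries: Number(env.LLM_MAX_RETRIES ?? 2),
    maxCostUsd: Number(env.UI_PROBE_MAX_COST ?? 10),
  };
}
```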

Common Integration Patterns

Pattern 1: LLM-First with Fallback

// Try LLM parsing first, fall back to regex if fails
const parsed = await llmStrategy.parseGoal(goal);
// If LLM unavailable or fails → uses GoalParser.parse() as fallback
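A regex fallback of this kind could look like the following sketch. This is illustrative only, not the project's actual GoalParser:

```javascript
// Illustrative regex fallback: maps a few common phrasings to structured
// actions when no LLM is available (not the real GoalParser).
function parseGoalFallback(goal) {
  let m;
  if ((m = goal.match(/^(?:search for|find|look for)\s+(.+)$/i))) {
    return { action: 'fill', target: 'search bar', value: m[1], submit: true };
  }
  if ((m = goal.match(/^click\s+(?:the\s+)?(.+?)(?:\s+button)?$/i))) {
    return { action: 'click', target: m[1] };
  }
  if ((m = goal.match(/^(?:go to|navigate to|open)\s+(\S+)$/i))) {
    return { action: 'navigate', target: m[1] };
  }
  return null; // unrecognized goal: caller reports a clear error
}
```

The trade-off is visible immediately: the regex route only handles phrasings it was written for, while the LLM route absorbs variations for free.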

Pattern 2: Heuristics-First with LLM Recovery

// Try Playwright selectors (15+ patterns)
if (!found) {
  // Fall back to AI element detection
  const aiSelector = await findClickableElementWithAI(page, text);
}

Pattern 3: Cache for Cost Reduction

// LLM responses cached for 5 minutes
// Same goal → uses cached response, no API call
if (cacheEnabled && cached) {
  return cached;
}
const response = await callLLM(prompt);
cache.set(goal, response);
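Fleshed out into something runnable, a 5-minute TTL cache along these lines might look like this (the class and its API are illustrative):

```javascript
// Illustrative TTL cache for LLM responses, keyed by the goal string.
class ResponseCache {
  constructor(ttlMs = 5 * 60 * 1000) { // 5-minute TTL by default
    this.ttlMs = ttlMs;
    this.entries = new Map();
  }

  // `now` is injectable so expiry can be tested without waiting.
  get(key, now = Date.now()) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (now - entry.storedAt > this.ttlMs) {
      this.entries.delete(key); // expired: evict and miss
      return undefined;
    }
    return entry.value;
  }

  set(key, value, now = Date.now()) {
    this.entries.set(key, { value, storedAt: now });
  }
}
```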

Debugging LLM Integration

Enable Debug Logging

export LOG_LEVEL=debug
export UI_PROBE_DEBUG=true

Check if OpenAI is Being Called

# Look for these log messages:
[DEBUG] Attempting LLM goal parsing (attempt 1/3)
[DEBUG] LLM goal parsing succeeded
[INFO] Using LLM-parsed value for field

View LLM Costs

// Check usage tracker (src/monitoring/usageTracker.ts)
const stats = llmStrategy.getUsageTracker()?.getStats();
console.log(`Total cost: $${stats.totalCost}`);
console.log(`Total tokens: ${stats.totalTokens}`);

Test Without LLM

# Use fallback mode for testing without API costs
UI_PROBE_FALLBACK_MODE=true npm start

Performance Characteristics

| Operation | Time | Cost | Caching |
|-----------|------|------|---------|
| Goal parsing (cached) | <50ms | $0 | ✅ 5 min |
| Goal parsing (uncached) | 500-2000ms | ~$0.02 | ❌ |
| Alternative selectors | 800-1500ms | ~$0.03 | ✅ 5 min |
| Error interpretation | 600-1200ms | ~$0.02 | ❌ |
| Form inference (no LLM) | 100-300ms | $0 | N/A |
| Playwright actions | 50-200ms | $0 | N/A |

Total cost per test: $0.01-0.10 depending on complexity and caching

Architecture Decision Records

Why GPT-4 Turbo instead of GPT-3.5?

  • Accuracy: GPT-4 better understands form context and intent
  • Cost/Benefit: $0.02 extra per test, but 40% fewer failed tests
  • Configurable: Can use GPT-3.5 via LLM_MODEL=gpt-3.5-turbo

Why Parse Goal with LLM instead of Regex?

  • Flexibility: Handles variations ("search", "look for", "find")
  • Context: Understands "sign up" vs "sign in" distinction
  • Extensibility: Easy to add new action types without code changes
  • Fallback Available: Regex parser activates if LLM unavailable

Why Overrides Map instead of Direct Field Mapping?

  • Flexibility: Supports both LLM values and user-provided data
  • Priority: LLM values > formData > constraints > generated data
  • Compatibility: Works with existing data synthesizer
  • Testability: Easy to inject test data
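That priority order could be implemented roughly as follows; the function names and the constraints shape are hypothetical:

```javascript
// Illustrative resolution order: LLM overrides > user formData >
// constraint defaults > generated data. Names here are hypothetical.
function resolveFieldValue(field, { overrides = {}, formData = {}, constraints = {} } = {}) {
  if (field.name in overrides) return overrides[field.name];   // LLM-parsed value
  if (field.name in formData) return formData[field.name];     // user-provided data
  if (constraints[field.name]?.default !== undefined) {
    return constraints[field.name].default;                    // schema default
  }
  return generateRandomValue(field);                           // synthesized fallback
}

// Stand-in for the real data synthesizer.
function generateRandomValue(field) {
  return field.type === 'email'
    ? '[email protected]'
    : `sample${Math.floor(Math.random() * 1000)}`;
}
```

Injecting test data is then just a matter of passing an overrides map, which is exactly how the LLM-parsed values win over random generation.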

Built-in Test Playground

UI-Probe includes a comprehensive test playground to try before deploying to your project:

# Start the test server (runs on port 8081)
npm run test:server

# Visit http://localhost:8081/test in your browser to see the playground

Available Test Pages:

  • Main Test Page (/test) - Complete sign-up form with validation
  • Forms Testing (/test/forms) - Every input type (text, select, radio, checkbox, etc.)
  • Navigation Testing (/test/navigation) - Multi-page navigation and routing
  • Dynamic Content (/test/dynamic) - JavaScript-driven UI updates
  • Validation Scenarios (/test/validation) - Error handling and edge cases

Test in Claude:

# Analyze form structure
analyze_ui "http://localhost:8081/test/forms"

# Fill and submit forms
fill_form "http://localhost:8081/test/forms" {"firstName": "John", "email": "[email protected]"}

# Run complete flows
run_flow(goal="Sign up as new user", url="http://localhost:8081/test")

Cost Estimation

UI-Probe uses LLM API calls for intelligent features. Here's what to expect:

| Operation | GPT-4 Tokens | Estimated Cost | Frequency |
|-----------|--------------|----------------|-----------|
| Navigate (basic) | 0 | $0.000 | Per page |
| Navigate (with LLM analysis) | ~500 | $0.005 | Per page |
| Form Analysis (LLM) | ~1000 | $0.010 | Per form |
| Error Collection (LLM enhanced) | ~300 | $0.003 | Per test |
| UI Analysis (LLM) | ~800 | $0.008 | Per page |
| Complete Workflow (run_flow) | ~2000 | $0.020 | Per test |
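These per-operation figures work out to a blended rate of roughly $0.01 per 1,000 tokens. As a sanity check (the rate is inferred from the table above, not an official price):

```javascript
// Rough cost model implied by the table: ~$0.01 per 1,000 GPT-4 tokens.
const BLENDED_RATE_PER_1K = 0.01; // USD; approximation, not an official price

function estimateCost(tokens) {
  return (tokens / 1000) * BLENDED_RATE_PER_1K;
}

// One run_flow test (~2000 tokens) plus one form analysis (~1000 tokens):
const perTest = estimateCost(2000) + estimateCost(1000); // roughly $0.03
```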

Cost Examples:

  • Per Test Suite (10 tests with LLM): ~$0.30-$1.00
  • Per Month (1000 tests with LLM): ~$30-$100
  • Fallback Mode: $0.00 (no LLM calls)

Cost Optimization Tips:

  1. Use fallback mode for basic navigation and clicking
  2. Enable LLM caching to reduce repeated API calls
  3. Use explicit selectors when you know the exact element
  4. Reserve run_flow for complex workflows that truly need intelligence
  5. Monitor usage through your OpenAI/Anthropic dashboard

Fallback Mode

UI-Probe can operate in fallback mode without LLM, providing basic Playwright functionality:

Enable Fallback Mode:

# In .env file:
UI_PROBE_FALLBACK_MODE=true

### What Works in Fallback Mode (No API Key Required)

  • **Basic navigation** - `navigate(url)`
  • **Element clicking** - Using explicit selectors
  • **Form filling** - With explicit field names
  • **Screenshots** - Capture page state
  • **Simple assertions** - Check for element presence
  • **Journey replay** - Recorded journeys with explicit selectors

### What Requires LLM (API Key Required)

  • 🤖 **Natural language workflows** - "Sign up as a new user"
  • 🤖 **Form inference** - Automatic form structure understanding
  • 🤖 **Smart element detection** - Finding elements without exact selectors
  • 🤖 **Error enhancement** - Intelligent error messages
  • 🤖 **Journey analysis** - AI-powered optimization
  • 🤖 **Adaptive execution** - Self-healing when UI changes

**Recommendation:** Start with fallback mode for simple tests, and add an API key when you need intelligence.

## Configuration

### Basic (.env file)

```bash
# Required for intelligent features (see Cost Estimation above)
OPENAI_API_KEY=sk-...
# OR
ANTHROPIC_API_KEY=your-key

# Optional: Use fallback mode without LLM
UI_PROBE_FALLBACK_MODE=true

# See the browser window
HEADLESS=false

# Show detailed logs
DEBUG=true
```

### Advanced Options

```bash
# Timeout for slow sites (milliseconds)
TIMEOUT=60000

# Retry failed operations
MAX_RETRIES=5

# Take screenshots on failure
SCREENSHOT_ON_FAILURE=true

# LLM Configuration
LLM_PROVIDER=openai  # or 'anthropic'
LLM_MODEL=gpt-4-turbo-preview
LLM_CACHE_ENABLED=true  # Reduce costs by caching responses
```

## Troubleshooting

### "Failed to connect" or "Connection failed" in Claude Code CLI

**Problem:** Claude shows ui-probe as "failed" or can't connect when you run `claude mcp list`.

**Solution:** Claude Code CLI needs the full path to npx, not just "npx".

  1. Find your npx location:
     ```bash
     which npx  # Mac/Linux
     where npx  # Windows
     ```
  2. Remove the broken configuration:
     ```bash
     claude mcp remove ui-probe
     ```
  3. Add with the full path:
     ```bash
     # Use YOUR actual path from step 1
     claude mcp add ui-probe "/path/from/step1/npx" "mcp-ui-probe@latest" "start"
     ```
  4. Start a new Claude Code session:
     ```bash
     claude  # The MCP server will now connect properly
     ```

Common npx locations:

  • Standard Node.js: `/usr/local/bin/npx`
  • NVM (Node Version Manager): `~/.nvm/versions/node/vXX.XX.X/bin/npx`
  • Homebrew (Mac): `/opt/homebrew/bin/npx`
  • Windows: `C:\Program Files\nodejs\npx.cmd`

### "Port already in use" when starting test server

**Solution:** Use a different port:

```bash
npx mcp-ui-probe test-server --port 3000
```

### "Navigation failed"

The site can't be reached. Check:

  1. Is the URL correct?
  2. Is the site running? (for localhost)
  3. Have you run `npx playwright install`?

### "Element not found"

The button/form/link isn't there. Try:

  1. `analyze_ui "URL"` to see what's on the page
  2. `wait_for "URL" "element" "visible"` for slow-loading content
  3. Be more specific: "the blue submit button" vs just "submit"

### Form won't fill

The form probably uses custom elements. Try:

  1. `analyze_ui` to see what UI-Probe detects
  2. Use `click` for custom dropdowns
  3. Use `run_flow` for complex interactions

## 🎬 Journey Recording & Replay System

New Feature! UI-Probe now includes an intelligent journey recording system that eliminates the need to rediscover UI flows repeatedly.

### ⚡ Speed Up Testing by 80%

Record user interactions once, replay them instantly:

```
# Record a complete signup flow
journey_record_start {"name": "User Signup", "description": "Complete signup from landing page"}
# Perform your interactions...
journey_record_stop {"tags": ["auth", "signup"], "category": "user-onboarding"}

# Replay instantly later
journey_play {"journeyId": "journey_20250925_123456", "config": {"speed": 1.5}}
```

### 🤖 AI-Powered Intelligence

  • **Smart naming:** AI generates meaningful journey names and descriptions
  • **Context validation:** Ensures journeys only run when appropriate (can't order on a signup page)
  • **Self-healing selectors:** Adapts to UI changes with multiple fallback strategies
  • **Pattern recognition:** Suggests similar journeys and optimizations
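The "multiple fallback strategies" idea can be sketched roughly as follows. This is a simplified illustration, not UI-Probe's actual implementation; `page.query` and the `step` fields are hypothetical:

```javascript
// Simplified sketch of self-healing selector resolution (illustrative only).
// Each recorded step keeps several ways to find the same element; replay
// tries them in order until one matches.
function resolveElement(page, step) {
  const candidates = [
    step.id && `#${step.id}`,                         // original id, if recorded
    step.testId && `[data-testid="${step.testId}"]`,  // stable test hook
    step.text && `text=${step.text}`,                 // visible text fallback
    step.role && `role=${step.role}`,                 // ARIA role fallback
  ].filter(Boolean);

  for (const selector of candidates) {
    const el = page.query(selector); // hypothetical lookup
    if (el) return el;
  }
  return null; // all strategies failed; the step is reported as broken
}
```

Even when a developer renames the element's id, the text or role fallback still finds it, which is why recorded journeys survive moderate UI changes.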

### 🔍 Journey Discovery

```
# Find journeys compatible with your current page
journey_discover {"url": "https://mysite.com/login", "limit": 5}

# Search by tags, category, or success rate
journey_search {"query": "checkout", "tags": ["payment"], "minSuccessRate": 0.8}
```

### 📊 Success Tracking

Each journey tracks:

  • Success rate across multiple runs
  • Performance metrics and timing
  • Usage statistics
  • Difficulty estimation

### 🎯 Benefits

  • **80% faster test execution** by eliminating element discovery
  • **Self-healing** - journeys adapt to UI changes automatically
  • **Reusable** - share journeys across teams and projects
  • **Discoverable** - AI-powered search finds relevant flows
  • **Reliable** - context validation prevents mismatched executions

📖 See Complete Journey System Documentation for advanced features and usage patterns.

## Smart Features

### Automatic Test Data

UI-Probe generates appropriate test data:

  • Valid emails that pass validation
  • Strong passwords that meet requirements
  • Phone numbers in the right format
  • Realistic names and addresses
  • Test credit cards (4111111111111111)
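A rough sketch of how field-aware data generation can work (illustrative only, not the actual UI-Probe generator; `generateTestValue` is a hypothetical name):

```javascript
// Illustrative sketch: match a field name to a realistic-looking test value.
function generateTestValue(fieldName) {
  const name = fieldName.toLowerCase();
  if (name.includes("email")) return `test+${Date.now()}@example.com`; // unique per run
  if (name.includes("password")) return "Str0ng!Passw0rd";             // meets common rules
  if (name.includes("phone")) return "+1-555-0100";                    // reserved test range
  if (name.includes("card")) return "4111111111111111";                // standard test Visa
  return "Test value";
}
```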

### Self-Healing Tests

When developers change:

  • Class names → UI-Probe still finds the button
  • IDs → Still works
  • Page structure → Adapts automatically
  • Text labels → Understands context

### Clear Error Messages

❌ Traditional: "WebDriverException: unknown error: Element is not clickable at point (780, 532)"

✅ UI-Probe: "The submit button is hidden behind a cookie consent banner. Try dismissing the banner first."

## Response Structure

MCP UI-Probe provides two types of deterministic JSON responses:

### Simple Tool Responses

Individual tool commands (`navigate`, `click_button`, `analyze_ui`, etc.) return:

| Field | Type | Description |
|-------|------|-------------|
| success | boolean | Operation success status |
| data | object | Tool-specific response data |
| error | string | Error message (when success=false) |

Example:

```json
{
  "success": true,
  "data": {
    "clicked": true,
    "selector": "button:has-text(\"Submit\")",
    "currentUrl": "http://example.com/success",
    "pageTitle": "Success Page"
  }
}
```

### Complex Test Execution Responses

Comprehensive test commands (`run_flow`, `fill_and_submit`) return detailed reports with:

  • Complete test execution flow with all steps
  • Performance metrics and timings
  • Error collection and categorization
  • Accessibility findings
  • Test artifacts and evidence

This deterministic structure enables intelligent automation:

```javascript
// Simple tool for conditional logic
const result = await ui_probe.click_button({ text: "Submit" });
if (result.success) {
  console.log(`Navigated to: ${result.data.currentUrl}`);
} else {
  console.log(`Failed: ${result.error}`);
}

// Complex command for full test execution
const testResult = await ui_probe.run_flow({
  goal: "Complete signup process",
  url: "http://example.com/signup"
});
console.log(`Test ${testResult.result}: ${testResult.metrics.steps} steps`);
```

📖 See Full Response Documentation for comprehensive details on both simple and complex response types.

## API Reference

### Core Testing Commands

| Command | What it does | Example |
|---------|--------------|---------|
| navigate | Go to a page | `navigate "https://site.com"` |
| analyze_ui | See what's on the page | `analyze_ui "https://site.com"` |
| fill_form | Fill out a form | `fill_form "URL" {"field": "value"}` |
| run_flow | Do multiple steps | `run_flow "Sign up and verify email"` |
| click_button | Click a button | `click_button { text: "Submit" }` |
| assert_element | Check if something exists | `assert_element "URL" "text" "visible"` |
| wait_for | Wait for something | `wait_for "URL" "Loading..." "hidden"` |

### Journey Recording & Replay Commands

| Command | What it does | Example |
|---------|--------------|---------|
| journey_record_start | Start recording interactions | `journey_record_start {"name": "Login Flow"}` |
| journey_record_stop | Stop and save recording | `journey_record_stop {"tags": ["auth"]}` |
| journey_play | Replay saved journey | `journey_play {"journeyId": "journey_20250925_123456"}` |
| journey_validate | Check if journey can run | `journey_validate {"journeyId": "journey_20250925_123456"}` |
| journey_search | Search saved journeys | `journey_search {"query": "login", "tags": ["auth"]}` |
| journey_discover | Find compatible journeys | `journey_discover {"url": "https://site.com/login"}` |
| journey_list | List all journeys | `journey_list {"category": "auth"}` |
| journey_analyze | Get AI analysis | `journey_analyze {"journeyId": "journey_20250925_123456"}` |

📖 See Complete API Reference for detailed parameters and response structures.

## Comparison with playwright-mcp

### Quick Summary

UI-Probe and playwright-mcp are complementary tools, not competitors:

  • **playwright-mcp**: Low-level infrastructure tool providing primitive browser commands for AI agents
  • **UI-Probe**: High-level testing application with a plain English interface for end users

### Key Differences

| Aspect | playwright-mcp | UI-Probe |
|--------|---------------|----------|
| Target User | Developers building AI agents | Non-technical users (PMs, designers, QA) |
| Interface | Element-based (`browser_click`, `browser_type`) | Intent-based (`run_flow "Sign up"`) |
| Self-Healing | No - breaks if DOM changes | Yes - uses AI to adapt to changes |
| Test Data | User must provide | Auto-generates valid data |
| Setup | Single npx command | Clone repo + npm install |
| Code Required | Yes - Claude writes test scripts | No - works immediately |
| Response Format | Raw browser events | Structured JSON for automation |

### Critical Advantage: No Code Generation Required

With playwright-mcp, Claude must write and execute test code:

```javascript
// Claude has to generate this for every test
await page.goto('http://example.com');
await page.fill('#email', '[email protected]');
await page.click('button[type="submit"]');
// Error handling, retries, validation...
```

With UI-Probe, just describe what you want:

```
"Test the login form"
# That's it - no code generation needed
```

### Deterministic JSON for Automation

Every UI-Probe action returns predictable JSON that enables conditional logic:

```javascript
// UI-Probe response - always structured the same way
{
  "success": true,
  "data": {
    "formSubmitted": true,
    "validationErrors": [],
    "nextUrl": "/dashboard"
  }
}

// This enables intelligent automation:
if (!response.success) {
  // Handle failure automatically
  useAlternativeFlow();
}
```

### The Bottom Line

  • Use playwright-mcp if you're a developer building an AI agent that needs browser control
  • Use UI-Probe if you want to test websites without writing code

Think of playwright-mcp as the engine and UI-Probe as the user-friendly car built around it.

→ For detailed comparison, see docs/comparison.md

## Contributing

We love contributions! See CONTRIBUTING.md.

## Support

## License

MIT - Use it however you want!


**Stop writing code that breaks. Start testing like a human.**

Ready? Install now or see examples.