@devload/pagent

v1.0.0

Published

2 months ago

An agent that acts on real web pages. CLI and AI-powered browser automation with MCP support.

devload

pagent browser-automation mcp ai-agent playwright chrome-extension cli e2e web-scraping

PAGENT

Page AGENT: an agent that acts on real web pages. Control your browser from the CLI or from AI assistants, with YAML-based UI testing (Playwright) and a Chrome extension bridge for real browser automation.

Features

YAML DSL: Write test scenarios in simple, readable YAML
Playwright-powered: Fast, reliable browser testing (Chromium only)
Rich artifacts: Collect HTML, screenshots, console logs, network requests, HAR files, computed styles, and traces
Parallel execution: Run multiple scenarios concurrently
Retry support: Automatically retry failed tests
CI-friendly: JSON output and proper exit codes

Installation

# Install globally from npm
npm install -g @devload/pagent

# Or install locally in your project
npm install @devload/pagent

From Source

# Clone the repository
git clone https://github.com/devload/pagent.git
cd pagent

# Install dependencies and build
npm install
npm run build

Quick Start

# Initialize with example scenarios
pagent init

# Validate scenarios
pagent validate "scenarios/*.yaml"

# Run all scenarios
pagent run "scenarios/*.yaml"

# Run with visible browser
pagent run scenarios/smoke-example.yaml --headed

CLI Commands

`pagent init`

Initialize PAGENT in the current directory with example scenarios.

pagent init [options]

Options:
  --json     Output as JSON
  --force    Overwrite existing files

Creates:

scenarios/ directory with example YAML files
artifacts/ directory with .gitignore
Example scenarios: smoke-example.yaml, hackernews.yaml

`pagent run <pattern>`

Run YAML test scenarios.

pagent run <pattern> [options]

Arguments:
  pattern              Glob pattern for scenario files (e.g., "scenarios/*.yaml")

Options:
  --headed             Run in headed mode (visible browser)
  --workers <number>   Number of parallel workers (default: 1)
  --base-url <url>     Override base URL for all scenarios
  --timeout <ms>       Default timeout in milliseconds
  --retries <number>   Number of retries for failed tests (default: 0)
  --artifact-dir <dir> Directory for artifacts (default: ./artifacts)
  --json               Output as JSON only

Examples:

# Run single scenario
pagent run scenarios/login.yaml

# Run all scenarios in parallel
pagent run "scenarios/*.yaml" --workers 4

# Run with retries and custom timeout
pagent run "scenarios/*.yaml" --retries 2 --timeout 60000

# CI mode with JSON output
pagent run "scenarios/*.yaml" --json
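
The `--retries` option can be read as "one initial attempt plus up to N further attempts". The sketch below illustrates that reading only; `runWithRetries` and its callback are hypothetical names, not part of PAGENT's API, and the real runner may behave differently.

```typescript
// Hypothetical helper: one reading of --retries, not PAGENT's implementation.
type AttemptResult = { ok: boolean; attempts: number };

function runWithRetries(
  runOnce: (attempt: number) => boolean,
  retries: number,
): AttemptResult {
  // One initial attempt plus up to `retries` additional attempts.
  const maxAttempts = Math.max(0, Math.floor(retries)) + 1;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (runOnce(attempt)) {
      return { ok: true, attempts: attempt + 1 };
    }
  }
  return { ok: false, attempts: maxAttempts };
}

// A scenario that fails once and then passes needs --retries 1 or higher.
const flaky = (attempt: number): boolean => attempt >= 1;
console.log(runWithRetries(flaky, 0)); // { ok: false, attempts: 1 }
console.log(runWithRetries(flaky, 2)); // { ok: true, attempts: 2 }
```

Under this reading, the default of `--retries 0` means every scenario runs exactly once.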

`pagent validate <pattern>`

Validate YAML scenario files without running them.

pagent validate <pattern> [options]

Arguments:
  pattern    Glob pattern for scenario files

Options:
  --json     Output as JSON

`pagent list [dir]`

List YAML scenarios in a directory.

pagent list [dir] [options]

Arguments:
  dir        Directory to search (default: "scenarios")

Options:
  --json     Output as JSON

YAML DSL Specification

Schema (Version 1)

version: 1                          # Required: Schema version
name: my-test-scenario              # Required: Unique scenario name
baseURL: https://example.com        # Optional: Base URL for relative paths

use:                                # Optional: Browser configuration
  headless: true                    # Default: true
  viewport:                         # Default: { width: 1280, height: 720 }
    width: 1280
    height: 720
  timeoutMs: 30000                  # Default: 30000 (30 seconds)
  locale: en-US                     # Optional: Browser locale
  timezoneId: America/New_York      # Optional: Timezone
  userAgent: "..."                  # Optional: Custom user agent

artifacts:                          # Optional: Artifact collection settings
  html: true                        # Capture final page HTML
  screenshot: true                  # Capture final screenshot
  console: true                     # Capture console logs
  network: true                     # Capture network requests
  har: false                        # Save HAR file (default: false)
  trace: on-first-retry             # Trace recording: on|off|on-first-retry|retain-on-failure
  styles:                           # Compute styles for specific selectors
    - selector: "#myButton"
      computed: ["display", "color", "font-size"]
  networkBodyCapture:               # Capture response bodies (default: disabled)
    enabled: false
    maxSizeBytes: 1048576           # Max 1MB per response
    contentTypes: ["text/*", "application/json"]

steps:                              # Required: List of test steps
  - goto: "/login"
  - fill: { selector: "#email", text: "user@example.com" }
  - click: { selector: "#submit" }
  - expect: { urlContains: "/dashboard" }
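
The `networkBodyCapture` keys in the schema suggest that a response body is kept only when capture is enabled, the body fits within `maxSizeBytes`, and its content type matches one of the `contentTypes` patterns. The following is a minimal sketch of such a check, assuming that `text/*` is a simple prefix wildcard; the function names are hypothetical and PAGENT's actual matching rules may differ.

```typescript
// Hypothetical types mirroring the `networkBodyCapture` keys in the schema.
interface BodyCaptureConfig {
  enabled: boolean;
  maxSizeBytes: number;
  contentTypes: string[];
}

// Match a content type against a pattern such as "text/*" or "application/json".
function matchesContentType(contentType: string, pattern: string): boolean {
  // Drop parameters like "; charset=utf-8" and normalise case before comparing.
  const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  const wanted = pattern.trim().toLowerCase();
  if (wanted.endsWith("/*")) {
    return mime.startsWith(wanted.slice(0, -1)); // "text/*" becomes prefix "text/"
  }
  return mime === wanted;
}

// Assumed rule: capture must be enabled, the body small enough, and the type allowed.
function shouldCaptureBody(
  cfg: BodyCaptureConfig,
  contentType: string,
  sizeBytes: number,
): boolean {
  if (!cfg.enabled) return false;
  if (sizeBytes > cfg.maxSizeBytes) return false;
  return cfg.contentTypes.some((p) => matchesContentType(contentType, p));
}

const captureConfig: BodyCaptureConfig = {
  enabled: true,
  maxSizeBytes: 1048576,
  contentTypes: ["text/*", "application/json"],
};
console.log(shouldCaptureBody(captureConfig, "application/json; charset=utf-8", 2048)); // true
console.log(shouldCaptureBody(captureConfig, "image/png", 2048)); // false
```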

Available Steps

Navigation

# Navigate to URL (absolute or relative to baseURL)
- goto: "/login"
- goto: "https://example.com/page"

Interactions

# Click element
- click: { selector: "#button" }
- click: { selector: "button", button: "right", clickCount: 2 }

# Fill input (clears existing content)
- fill: { selector: "#email", text: "user@example.com" }

# Type text (character by character with optional delay)
- type: { selector: "#search", text: "query", delayMs: 50 }

# Press keyboard key
- press: { key: "Enter" }
- press: { key: "Tab", selector: "#input" }

Waiting

# Wait for element
- waitFor: { selector: ".loaded" }
- waitFor: { selector: "#content", timeoutMs: 10000 }

# Wait for page state
- waitFor: { state: "load" }
- waitFor: { state: "domcontentloaded" }
- waitFor: { state: "networkidle", timeoutMs: 15000 }

Assertions

# Check element visibility
- expect: { visible: "#welcome-message" }
- expect: { hidden: ".loading-spinner" }

# Check page content
- expect: { textContains: "Welcome back" }

# Check URL
- expect: { urlContains: "/dashboard" }

Snapshots

# Take screenshot
- snapshot: { name: "after-login" }
- snapshot: { name: "full-page", fullPage: true }

Debug

# Execute JavaScript
- evaluate: { js: "console.log('Debug:', document.title)" }
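
When scenarios are generated or inspected from TypeScript, the step forms above can be modelled as a union of single-key objects. The types below are inferred from the examples in this section and are hypothetical; PAGENT's exported types may be named and shaped differently.

```typescript
// Hypothetical types inferred from the step examples; not PAGENT's own definitions.
type Step =
  | { goto: string }
  | { click: { selector: string; button?: string; clickCount?: number } }
  | { fill: { selector: string; text: string } }
  | { type: { selector: string; text: string; delayMs?: number } }
  | { press: { key: string; selector?: string } }
  | { waitFor: { selector?: string; state?: string; timeoutMs?: number } }
  | { expect: { visible?: string; hidden?: string; textContains?: string; urlContains?: string } }
  | { snapshot: { name: string; fullPage?: boolean } }
  | { evaluate: { js: string } };

// Each step is an object whose single key names the action.
function stepType(step: Step): string {
  return Object.keys(step)[0] ?? "unknown";
}

const loginSteps: Step[] = [
  { goto: "/login" },
  { fill: { selector: "#email", text: "user@example.com" } },
  { click: { selector: "#submit" } },
  { expect: { urlContains: "/dashboard" } },
];
console.log(loginSteps.map(stepType)); // [ 'goto', 'fill', 'click', 'expect' ]
```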

Output Structure

artifacts/
└── 20251220-143052-abc123/          # Run ID
    ├── summary.json                  # Overall run summary
    └── my-scenario/                  # Scenario folder
        ├── summary.json              # Scenario summary
        ├── page.html                 # Final page HTML
        ├── screenshot.png            # Final screenshot
        ├── console.jsonl             # Console logs (one JSON per line)
        ├── network.jsonl             # Network requests (one JSON per line)
        ├── network.har               # HAR file (if enabled)
        ├── styles.json               # Computed styles
        ├── trace.zip                 # Playwright trace
        └── after-login.png           # Named snapshots
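
The example run ID looks like a timestamp followed by a short suffix. Assuming the format is `YYYYMMDD-HHMMSS-<suffix>` and that the timestamp is in UTC (both are guesses based on the single example shown), a run directory name could be parsed as sketched below. `parseRunId` is a hypothetical helper, not a PAGENT export.

```typescript
// Assumed format: YYYYMMDD-HHMMSS-<suffix>, inferred from the example run ID.
interface ParsedRunId {
  startedAt: Date; // interpreted as UTC, which is an assumption
  suffix: string;
}

function parseRunId(runId: string): ParsedRunId | null {
  const m = /^(\d{4})(\d{2})(\d{2})-(\d{2})(\d{2})(\d{2})-(.+)$/.exec(runId);
  if (!m) return null;
  // Capture groups 1..6 are the date and time parts; group 7 is the suffix.
  const [year, month, day, hour, minute, second] = m.slice(1, 7).map(Number);
  return {
    startedAt: new Date(Date.UTC(year, month - 1, day, hour, minute, second)),
    suffix: m[7],
  };
}

const parsed = parseRunId("20251220-143052-abc123");
console.log(parsed?.startedAt.toISOString()); // 2025-12-20T14:30:52.000Z
console.log(parsed?.suffix); // abc123
```

Sorting run directories by name gives chronological order under the same assumption, since the timestamp comes first.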

summary.json Format

{
  "ok": true,
  "runId": "20251220-143052-abc123",
  "scenario": "my-scenario",
  "filePath": "/path/to/scenario.yaml",
  "startedAt": "2025-12-20T14:30:52.000Z",
  "endedAt": "2025-12-20T14:31:04.000Z",
  "durationMs": 12345,
  "steps": [
    { "i": 1, "type": "goto", "ok": true, "durationMs": 1200 },
    { "i": 2, "type": "fill", "ok": true, "durationMs": 50 },
    { "i": 3, "type": "click", "ok": true, "durationMs": 100 },
    { "i": 4, "type": "expect", "ok": true, "durationMs": 30 }
  ],
  "artifacts": {
    "html": "page.html",
    "screenshot": "screenshot.png",
    "console": "console.jsonl",
    "network": "network.jsonl",
    "snapshots": ["after-login.png"]
  }
}
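
A summary in this shape is easy to post-process, for example to report failed steps and the slowest step in CI logs. The sketch below uses only fields that appear in the example above; the interface and function names are hypothetical.

```typescript
// Shapes inferred from the summary.json example; fields not needed here are omitted.
interface StepResult {
  i: number;
  type: string;
  ok: boolean;
  durationMs: number;
}
interface ScenarioSummary {
  ok: boolean;
  scenario: string;
  durationMs: number;
  steps: StepResult[];
}

function describeSummary(summary: ScenarioSummary): string {
  const failed = summary.steps.filter((s) => !s.ok);
  // Start from null so an empty step list is handled without throwing.
  const slowest = summary.steps.reduce<StepResult | null>(
    (best, s) => (best === null || s.durationMs > best.durationMs ? s : best),
    null,
  );
  const status = summary.ok ? "PASS" : "FAIL";
  const slowestText = slowest
    ? `slowest: #${slowest.i} ${slowest.type} (${slowest.durationMs}ms)`
    : "no steps";
  return `${status} ${summary.scenario}: ${failed.length} failed step(s), ${slowestText}`;
}

const exampleSummary: ScenarioSummary = {
  ok: true,
  scenario: "my-scenario",
  durationMs: 12345,
  steps: [
    { i: 1, type: "goto", ok: true, durationMs: 1200 },
    { i: 2, type: "fill", ok: true, durationMs: 50 },
    { i: 3, type: "click", ok: true, durationMs: 100 },
    { i: 4, type: "expect", ok: true, durationMs: 30 },
  ],
};
console.log(describeSummary(exampleSummary));
// PASS my-scenario: 0 failed step(s), slowest: #1 goto (1200ms)
```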

console.jsonl Format

{"timestamp":"2025-12-20T14:30:53.000Z","type":"log","text":"Page loaded"}
{"timestamp":"2025-12-20T14:30:54.000Z","type":"error","text":"API error","location":{"url":"app.js","lineNumber":42}}
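
Because each line is a standalone JSON object, the file can be parsed line by line. The sketch below parses the two example lines and prints only the errors; `ConsoleEntry` and `parseJsonl` are hypothetical names, and in practice the string would come from reading `console.jsonl` from the scenario folder.

```typescript
// Shape inferred from the console.jsonl example lines.
interface ConsoleEntry {
  timestamp: string;
  type: string;
  text: string;
  location?: { url: string; lineNumber: number };
}

// Parse JSON Lines text: one JSON object per non-empty line.
function parseJsonl<T>(content: string): T[] {
  return content
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as T);
}

const consoleSample = [
  '{"timestamp":"2025-12-20T14:30:53.000Z","type":"log","text":"Page loaded"}',
  '{"timestamp":"2025-12-20T14:30:54.000Z","type":"error","text":"API error","location":{"url":"app.js","lineNumber":42}}',
].join("\n");

const consoleErrors = parseJsonl<ConsoleEntry>(consoleSample).filter(
  (entry) => entry.type === "error",
);
for (const entry of consoleErrors) {
  const where = entry.location
    ? ` (${entry.location.url}:${entry.location.lineNumber})`
    : "";
  console.log(`${entry.text}${where}`); // API error (app.js:42)
}
```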

network.jsonl Format

{"timestamp":"2025-12-20T14:30:52.500Z","method":"GET","url":"https://api.example.com/user","resourceType":"fetch","status":200,"statusText":"OK","requestHeaders":{"authorization":"[MASKED]"},"responseHeaders":{"content-type":"application/json"},"timing":{"startTime":1703082652500,"responseEnd":1703082652700,"durationMs":200}}
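
The `status` and `timing.durationMs` fields make it straightforward to flag failed or slow requests. The sketch below is one way to do that; `findProblemRequests` is a hypothetical helper, and the second sample line (a 401 response) is invented for illustration.

```typescript
// Subset of the network.jsonl fields shown above; other fields are ignored here.
interface NetworkEntry {
  method: string;
  url: string;
  status: number;
  timing: { durationMs: number };
}

// Return entries that failed (HTTP status >= 400) or took longer than `slowMs`.
function findProblemRequests(jsonl: string, slowMs: number): NetworkEntry[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as NetworkEntry)
    .filter((e) => e.status >= 400 || e.timing.durationMs > slowMs);
}

// Sample data: the first line is trimmed from the example above, the second is made up.
const networkSample = [
  '{"method":"GET","url":"https://api.example.com/user","status":200,"timing":{"durationMs":200}}',
  '{"method":"POST","url":"https://api.example.com/login","status":401,"timing":{"durationMs":80}}',
].join("\n");

for (const e of findProblemRequests(networkSample, 1000)) {
  console.log(`${e.method} ${e.url} -> ${e.status} in ${e.timing.durationMs}ms`);
}
// POST https://api.example.com/login -> 401 in 80ms
```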

Exit Codes

0: All scenarios passed
1: One or more scenarios failed, or validation error
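
The rule above can be expressed as a small function, which is handy when wrapping PAGENT in a custom script that has to set its own exit status. `exitCodeFor` is a hypothetical helper, not a PAGENT export.

```typescript
// Map scenario results to an exit code following the rule above.
// An empty result list counts as "all passed" in this sketch.
function exitCodeFor(results: { ok: boolean }[], validationError = false): 0 | 1 {
  if (validationError) return 1;
  return results.every((r) => r.ok) ? 0 : 1;
}

console.log(exitCodeFor([{ ok: true }, { ok: true }])); // 0
console.log(exitCodeFor([{ ok: true }, { ok: false }])); // 1
console.log(exitCodeFor([], true)); // 1
```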

Security

Sensitive headers are automatically masked in network logs:

authorization
cookie
set-cookie
x-api-key
x-auth-token
x-access-token

Response body capture is disabled by default. When enabled, bodies are limited by size and content type.
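
The masking behaviour can be pictured as a case-insensitive lookup against the header names listed above, with the `[MASKED]` placeholder seen in the `network.jsonl` example. This is a minimal sketch of the idea, not PAGENT's actual code; `maskHeaders` is a hypothetical name.

```typescript
// Header names taken from the list above.
const SENSITIVE_HEADERS = new Set([
  "authorization",
  "cookie",
  "set-cookie",
  "x-api-key",
  "x-auth-token",
  "x-access-token",
]);

// Return a copy of the headers with sensitive values replaced by "[MASKED]".
function maskHeaders(headers: Record<string, string>): Record<string, string> {
  const masked: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    // HTTP header names are case-insensitive, so compare in lower case.
    masked[name] = SENSITIVE_HEADERS.has(name.toLowerCase()) ? "[MASKED]" : value;
  }
  return masked;
}

console.log(maskHeaders({ Authorization: "Bearer abc", "content-type": "application/json" }));
// { Authorization: '[MASKED]', 'content-type': 'application/json' }
```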

Programmatic Usage

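import { runScenario, loadScenario } from '@devload/pagent';

// Load the scenario and stop early if it fails validation
const loadResult = await loadScenario('./scenarios/test.yaml');
if (!loadResult.success) {
  throw new Error(`Invalid scenario: ${JSON.stringify(loadResult.errors)}`);
}

// Run the validated scenario
const result = await runScenario(loadResult.scenario!, './scenarios/test.yaml', {
  headless: true,
  artifactDir: './artifacts',
});

console.log(result.summary.ok ? 'Passed' : 'Failed');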

MCP Integration

The runScenario function can be wrapped as an MCP tool:

import { runScenario, loadScenario } from '@devload/pagent';

// In your MCP server tool handler
async function runUITest(scenarioPath: string) {
  const loadResult = await loadScenario(scenarioPath);
  if (!loadResult.success) {
    return { error: loadResult.errors };
  }

  const result = await runScenario(loadResult.scenario!, scenarioPath);
  return result.summary;
}

Assumptions & Defaults

Browser: Chromium only (for speed and consistency)
Timeout: 30 seconds default for all operations
Viewport: 1280x720 default
Artifacts: HTML, screenshot, console, network enabled by default
HAR: Disabled by default (large files)
Trace: Recorded on first retry by default
Network body: Not captured by default (security)
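
The trace default interacts with the retry default. Assuming PAGENT's trace modes carry the same meaning as Playwright's `trace` option (an assumption, since the modes are only named here), the decision can be sketched as follows. The function names are hypothetical.

```typescript
type TraceMode = "on" | "off" | "on-first-retry" | "retain-on-failure";

// Should a trace be recorded for this attempt?
// attempt 0 is the first run, attempt 1 is the first retry.
function shouldRecordTrace(mode: TraceMode, attempt: number): boolean {
  switch (mode) {
    case "on":
    case "retain-on-failure": // recorded on every attempt, kept only on failure
      return true;
    case "on-first-retry":
      return attempt === 1;
    case "off":
      return false;
  }
}

// Should a recorded trace be kept once the attempt has finished?
function shouldKeepTrace(mode: TraceMode, failed: boolean): boolean {
  return mode === "retain-on-failure" ? failed : mode !== "off";
}

console.log(shouldRecordTrace("on-first-retry", 0)); // false
console.log(shouldRecordTrace("on-first-retry", 1)); // true
console.log(shouldKeepTrace("retain-on-failure", false)); // false
```

If this reading holds, the defaults (`trace: on-first-retry` with `--retries 0`) never produce a `trace.zip`; running with at least one retry or setting `trace: on` would be needed to get one.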

Troubleshooting

"Browser executable not found"

npx playwright install chromium

"Timeout waiting for selector"

Increase the timeout in your scenario:

use:
  timeoutMs: 60000

Or for specific steps:

- waitFor: { selector: ".slow-element", timeoutMs: 30000 }

Viewing Traces

npx playwright show-trace artifacts/*/trace.zip

Chrome Extension Bridge

PAGENT includes a Chrome extension bridge for controlling a real browser instance.

Setup

Install the Extension:

# Open Chrome and go to chrome://extensions/
# Enable "Developer mode"
# Click "Load unpacked" and select chrome-extension/ folder

Start the Bridge Server:
```
pagent bridge start
```
Connect the Extension:
- Click the PAGENT extension icon
- Click "Connect"

Bridge Commands

# Start bridge server (default port 9222)
pagent bridge start
pagent bridge start --port 9000

# Get current page info
pagent bridge exec getPageInfo

# Capture screenshot
pagent bridge exec screenshot ./page.png

# Get page HTML
pagent bridge exec getDOM
pagent bridge exec getDOM "#main-content"

# Execute JavaScript
pagent bridge exec execute "document.title"
pagent bridge exec execute "document.querySelectorAll('a').length"

# Interact with elements
pagent bridge exec click "#submit-button"
pagent bridge exec fill "#email" "user@example.com"
pagent bridge exec navigate "https://example.com"

# Tab management
pagent bridge exec newTab "https://google.com"
pagent bridge exec listTabs
pagent bridge exec switchTab 123456789
pagent bridge exec closeTab 123456789

# Execute on specific tab (without switching)
pagent bridge exec getPageInfo --tab 123456789
pagent bridge exec screenshot --tab 123456789

# Get captured logs
pagent bridge exec consoleLogs
pagent bridge exec networkLogs

Bridge Architecture

┌──────────────────────────────────────┐
│          Chrome Browser              │
│  ┌────────────────────────────────┐  │
│  │     PAGENT Extension           │  │
│  │    (WebSocket Client)          │  │
│  └─────────────┬──────────────────┘  │
└────────────────┼─────────────────────┘
                 │ ws://localhost:9222
                 ▼
┌──────────────────────────────────────┐
│   pagent bridge start                │
│   (WebSocket Server)                 │
└──────────────────────────────────────┘

Use Cases

Real Browser Testing: Test in actual Chrome with real extensions
DevTools Integration: Access console logs, network requests in real-time
Manual + Automated: Combine manual browsing with CLI automation
AI Integration: Ask AI assistants to control your browser via MCP

MCP Integration (Model Context Protocol)

PAGENT can be used as an MCP server, allowing AI assistants like Claude to control your browser.

Setup for Claude Code (CLI) - Recommended

Option 1: Using claude mcp add command

# Add PAGENT as MCP server (one command!)
claude mcp add pagent -s user -- npx -y @devload/pagent

# Or with environment variable
claude mcp add pagent -s user -e BROWSER_BRIDGE_URL=ws://localhost:9222 -- npx -y @devload/pagent

# Verify installation
claude mcp list

Option 2: Manual configuration

Add to your project's .mcp.json:

{
  "mcpServers": {
    "pagent": {
      "command": "npx",
      "args": ["-y", "@devload/pagent"],
      "env": {
        "BROWSER_BRIDGE_URL": "ws://localhost:9222"
      }
    }
  }
}

Or add to global config (~/.claude/settings.json):

{
  "mcpServers": {
    "pagent": {
      "command": "npx",
      "args": ["-y", "@devload/pagent"],
      "env": {
        "BROWSER_BRIDGE_URL": "ws://localhost:9222"
      }
    }
  }
}

Setup for Claude Desktop

Add to your Claude Desktop config (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "pagent": {
      "command": "npx",
      "args": ["-y", "@devload/pagent"],
      "env": {
        "BROWSER_BRIDGE_URL": "ws://localhost:9222"
      }
    }
  }
}

Prerequisites

Before using MCP, complete the following steps:

Install the Chrome Extension:

# Load unpacked extension from chrome-extension/ folder
# in chrome://extensions/

Start the Bridge Server:
```
pagent bridge start
```
Connect the Extension:
- Click the PAGENT extension icon
- Click "Connect"

Available MCP Tools

| Tool | Description |
|------|-------------|
| browser_list_tabs | List all open browser tabs |
| browser_get_page_info | Get URL, title, and state of a tab |
| browser_navigate | Navigate to a URL |
| browser_new_tab | Open a new tab |
| browser_close_tab | Close a tab |
| browser_click | Click an element by CSS selector |
| browser_fill | Fill an input field |
| browser_screenshot | Capture a screenshot |
| browser_get_dom | Get page HTML content |
| browser_console_logs | Get browser console logs |
| browser_network_logs | Get network request logs |

Example Usage with Claude

Once connected, you can ask Claude to:

"Open google.com and search for 'Anthropic Claude'"
"Take a screenshot of the current page"
"Fill in the login form with my email"
"Click the submit button"
"Get all the links on this page"

License

MIT