wdio-agent-service

v0.2.0

Published

5 days ago

WebdriverIO plugin for agentic browser actions. Integrates into the WebdriverIO codebase and ecosystem.

vincegraics

wdio-plugin wdio-service webdriverio agent testing ai ollama

WebdriverIO Agent Service

A WebdriverIO service that adds LLM-powered browser and mobile automation through a simple browser.agent(prompt) command. Powered by @wdio/mcp for element snapshots.

Why?

Modern UIs change frequently — cookie banners appear, button labels shift, modals pop up unexpectedly. Traditional selectors break. LLMs can handle this ambiguity, but running every action through an LLM is slow and expensive.

The solution: use both.

// Stable actions → use regular WebdriverIO (fast, free, reliable)
await browser.url('https://shop.example.com');
await browser.$('input#search').setValue('mechanical keyboard');
await browser.$('button[type="submit"]').click();

// Unpredictable UI → let the LLM handle it (flexible, resilient)
await browser.agent('accept the cookie consent banner');
await browser.agent('close any promotional popup');

// Back to stable actions
await browser.$('.product-card').click();
await browser.$('#add-to-cart').click();

This hybrid approach lets you:

- Save tokens on predictable interactions
- Stay resilient against UI changes where they matter most
- Keep tests fast by only invoking the LLM when needed
- Reduce flakiness in areas prone to A/B tests or dynamic content

Installation

npm install wdio-agent-service

Configuration

Add the service to your wdio.conf.ts:

export const config: WebdriverIO.Config = {
  // ...
  services: [
    ['agent', {
      provider: 'ollama',
      model: 'qwen2.5-coder:3b',
      maxActions: 2,
    }]
  ],
};

Config Options

| Option | Type | Default | Description |
|---------------|----------------------------|----------------------------|-------------|
| provider | 'ollama' | 'ollama' | LLM provider |
| providerUrl | string | 'http://localhost:11434' | Provider API endpoint |
| model | string | 'qwen2.5-coder:7b' | Model name |
| maxActions | number | 1 | Maximum actions per prompt |
| timeout | number | 30000 | Request timeout in ms |
| toonFormat | 'yaml-like' \| 'tabular' | 'yaml-like' | Element encoding format. yaml-like works better with smaller models, tabular is more token-efficient for larger models |
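
To make the defaults above concrete, here is a minimal sketch of how user options could be overlaid on them. The names `AgentServiceOptions`, `DEFAULT_OPTIONS`, and `resolveOptions` are hypothetical and are not exports of this package; only the option names and default values come from the table.

```typescript
// Hypothetical type mirroring the documented options; not an export of wdio-agent-service.
interface AgentServiceOptions {
  provider: 'ollama';
  providerUrl: string;
  model: string;
  maxActions: number;
  timeout: number; // milliseconds
  toonFormat: 'yaml-like' | 'tabular';
}

// Defaults copied from the Config Options table.
const DEFAULT_OPTIONS: AgentServiceOptions = {
  provider: 'ollama',
  providerUrl: 'http://localhost:11434',
  model: 'qwen2.5-coder:7b',
  maxActions: 1,
  timeout: 30000,
  toonFormat: 'yaml-like',
};

// Illustrative helper: any option the user omits falls back to the documented default.
function resolveOptions(overrides: Partial<AgentServiceOptions> = {}): AgentServiceOptions {
  return { ...DEFAULT_OPTIONS, ...overrides };
}

// Only model and maxActions are overridden here; the rest keep their defaults.
const resolved = resolveOptions({ model: 'qwen2.5-coder:3b', maxActions: 2 });
console.log(resolved);
```

Note that maxActions defaults to 1, so a prompt that needs several steps will likely need a higher value.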

Usage

The service adds a single command to the browser object:

const actions = await browser.agent('your natural language instruction');
// [{ type: 'CLICK', target: 'button#submit' }]
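
Because the command returns the actions it performed, a test can check what the LLM actually did. The sketch below assumes the result shape shown in the comment above; the `value` field and the helper names `summarizeActions` and `assertPerformed` are assumptions made for illustration, not part of the package.

```typescript
// Assumed result shape, based on the example above. `value` is a guess for SET_VALUE.
type AgentActionType = 'CLICK' | 'SET_VALUE' | 'NAVIGATE' | 'TAP';

interface AgentAction {
  type: AgentActionType;
  target: string;
  value?: string;
}

// Turn the returned actions into readable log lines.
function summarizeActions(actions: AgentAction[]): string[] {
  return actions.map((a) =>
    a.value === undefined ? `${a.type} ${a.target}` : `${a.type} ${a.target} = ${a.value}`,
  );
}

// Fail the test if the agent did not perform the kind of action we expected.
function assertPerformed(actions: AgentAction[], expected: AgentActionType): void {
  if (!actions.some((a) => a.type === expected)) {
    const seen = summarizeActions(actions).join('; ') || 'none';
    throw new Error(`Expected a ${expected} action, got: ${seen}`);
  }
}

// In a real spec, `example` would be the array returned by browser.agent(...).
const example: AgentAction[] = [{ type: 'CLICK', target: 'button#submit' }];
assertPerformed(example, 'CLICK');
console.log(summarizeActions(example));
```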

Browser Actions

| Action | Description | Example Prompt |
|-------------|--------------------------|-----------------------------------------------|
| CLICK | Click on an element | "click the login button" |
| SET_VALUE | Type into an input field | "type [email protected] in the email field" |
| NAVIGATE | Go to a URL | "navigate to https://example.com" |

Mobile Actions

| Action | Description | Example Prompt |
|-------------|--------------------------|---------------------------------------|
| TAP | Tap on an element | "tap the Sign In button" |
| SET_VALUE | Type into an input field | "enter admin in the username field" |

The service auto-detects the platform (browser, Android, iOS) and adjusts the available actions and prompts accordingly.
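
How the service detects the platform is not documented here. As a rough illustration only, detection could be based on the session's platformName capability, with the action sets taken from the two tables above. `detectPlatform` and `availableActions` are hypothetical names; WebdriverIO also exposes `browser.isAndroid` and `browser.isIOS` flags that could serve the same purpose.

```typescript
type Platform = 'browser' | 'android' | 'ios';

// Minimal slice of the session capabilities needed for this sketch.
interface CapabilitiesLike {
  platformName?: string;
}

// Assumption: anything that is not explicitly Android or iOS is treated as a browser.
// A mobile web session on Android would be misclassified by this simple check.
function detectPlatform(caps: CapabilitiesLike): Platform {
  const name = (caps.platformName ?? '').toLowerCase();
  if (name === 'android') return 'android';
  if (name === 'ios') return 'ios';
  return 'browser';
}

// Action sets as listed in the Browser Actions and Mobile Actions tables.
function availableActions(platform: Platform): string[] {
  return platform === 'browser' ? ['CLICK', 'SET_VALUE', 'NAVIGATE'] : ['TAP', 'SET_VALUE'];
}

console.log(availableActions(detectPlatform({ platformName: 'Android' })));
```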

Browser Examples

await browser.agent('accept all cookies');
await browser.agent('close the newsletter signup modal');
await browser.agent('click on Settings in the user menu');
await browser.agent('enter [email protected] in the email input');

Mobile Examples (Appium)

await browser.agent('skip the onboarding');
await browser.agent('accept all cookies');
await browser.agent('go to Account');
await browser.agent('fill in admin into username field and password into password field');

Mobile Setup (Appium)

The service works with Appium for Android and iOS automation. Configure your wdio.conf.ts with Appium capabilities:

export const config: WebdriverIO.Config = {
  hostname: '127.0.0.1',
  port: 4723,
  path: '/',

  capabilities: [{
    platformName: 'Android',
    'appium:deviceName': 'emulator-5554',
    'appium:automationName': 'UiAutomator2',
    'appium:app': '/path/to/your/app.apk',
  }],

  services: [
    ['agent', {
      provider: 'ollama',
      model: 'qwen2.5-coder:3b',
      maxActions: 5,
    }]
  ],
};

Local LLM Setup (Ollama)

1. Install Ollama
2. Pull a model: `ollama pull qwen2.5-coder:3b`
3. Run `ollama serve` in the terminal
4. Ollama runs on http://localhost:11434 by default
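
Before a test run it can help to confirm that Ollama is reachable and the configured model has been pulled. The sketch below relies on Ollama's `GET /api/tags` endpoint, which lists locally available models. The helpers `hasModel` and `checkOllama` and the `FetchJson` type are hypothetical; calling this from a hook such as `onPrepare` in wdio.conf.ts is one possible place for it, not something the service requires.

```typescript
// Relevant part of the GET /api/tags response: a list of locally available models.
interface OllamaTagsResponse {
  models: { name: string }[];
}

// True if the exact model tag (for example 'qwen2.5-coder:3b') is present locally.
function hasModel(tags: OllamaTagsResponse, model: string): boolean {
  return tags.models.some((m) => m.name === model);
}

// The HTTP call is injected so the check stays easy to test.
type FetchJson = (url: string) => Promise<unknown>;

async function checkOllama(
  fetchJson: FetchJson,
  model: string,
  providerUrl = 'http://localhost:11434',
): Promise<void> {
  const tags = (await fetchJson(`${providerUrl}/api/tags`)) as OllamaTagsResponse;
  if (!hasModel(tags, model)) {
    throw new Error(`Model ${model} not found locally; run: ollama pull ${model}`);
  }
}

// Usage with the global fetch available in Node 18+:
// await checkOllama(async (url) => (await fetch(url)).json(), 'qwen2.5-coder:3b');
```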

Recommended Models

| Model | Size | Speed | Accuracy |
|---------------------|-------|---------|----------|
| qwen2.5-coder:3b | 1.9GB | Fastest | Good |
| qwen2.5-coder:7b | 4.7GB | Fast | Better |
| qwen2.5-coder:14b | 9GB | Medium | Best |

How It Works

1. browser.agent(prompt) captures visible, interactable elements using @wdio/mcp/snapshot
2. Elements are encoded in a token-efficient format (TOON) and sent to the LLM alongside the prompt
3. The LLM returns structured JSON actions (CLICK, SET_VALUE, NAVIGATE, TAP)
4. Actions are executed via WebdriverIO
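
Since the LLM's reply is plain text that should contain JSON, it has to be parsed and validated before anything is executed. The following is a minimal sketch of that step, not the service's actual implementation: `parseActions`, the `Action` shape, and the choice to silently drop unknown entries are all assumptions.

```typescript
type ActionType = 'CLICK' | 'SET_VALUE' | 'NAVIGATE' | 'TAP';

// Assumed action shape; `value` is a guess for actions that carry text.
interface Action {
  type: ActionType;
  target: string;
  value?: string;
}

const KNOWN_TYPES: readonly string[] = ['CLICK', 'SET_VALUE', 'NAVIGATE', 'TAP'];

// Parse the raw LLM reply, keep only well-formed actions, and cap them at maxActions.
function parseActions(raw: string, maxActions: number): Action[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error('LLM response is not valid JSON');
  }

  // Accept either a single action object or an array of them.
  const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
  const actions: Action[] = [];

  for (const item of items) {
    if (typeof item !== 'object' || item === null) continue;
    const { type, target, value } = item as Record<string, unknown>;
    if (typeof type !== 'string' || !KNOWN_TYPES.includes(type)) continue;
    if (typeof target !== 'string') continue;
    actions.push({
      type: type as ActionType,
      target,
      ...(typeof value === 'string' ? { value } : {}),
    });
  }

  return actions.slice(0, maxActions);
}

console.log(parseActions('[{"type":"CLICK","target":"button#submit"}]', 1));
```

Validating against a fixed list of action types keeps a hallucinated action from reaching the browser, and the cap mirrors the maxActions option.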

The platform (browser/Android/iOS) is auto-detected, and the prompt and available actions adjust accordingly.

When to Use `agent()` vs Regular Commands

| Scenario | Recommendation |
|-------------------------------------|---------------------|
| Static selectors that rarely change | Regular WebdriverIO |
| Login forms with stable IDs | Regular WebdriverIO |
| Cookie consent banners | agent() |
| Promotional popups/modals | agent() |
| Third-party widgets | agent() |
| Elements with dynamic/generated IDs | agent() |
| A/B tested UI components | agent() |
| Mobile onboarding flows | agent() |
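
For elements that sit between the two columns, one possible pattern is to try a regular selector first and only fall back to agent() when it does not match. This is a sketch of that idea, not a feature of the service. `clickOrAgent` and the two `*Like` interfaces are hypothetical; they cover only the calls used here (`$`, `isExisting`, `click`, `agent`), and whether the real browser object satisfies them without a cast is untested.

```typescript
// Minimal structural types covering only the calls used below.
interface ElementLike {
  isExisting(): Promise<boolean>;
  click(): Promise<void>;
}

interface AgentBrowserLike {
  $(selector: string): ElementLike | PromiseLike<ElementLike>;
  agent(prompt: string): Promise<unknown>;
}

// Try the cheap selector first; spend tokens on the LLM only if it finds nothing.
async function clickOrAgent(
  browser: AgentBrowserLike,
  selector: string,
  prompt: string,
): Promise<'selector' | 'agent'> {
  const element = await browser.$(selector);
  if (await element.isExisting()) {
    await element.click();
    return 'selector';
  }
  await browser.agent(prompt);
  return 'agent';
}

// Example call inside a spec (selector and prompt are placeholders):
// await clickOrAgent(browser, '#cookie-accept', 'accept the cookie consent banner');
```

The return value records which path was taken, which makes it easy to log how often the fallback is actually needed.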

License

MIT