wdio-agent-service

v0.2.0

WebdriverIO plugin for agentic browser actions. Integrates into the WebdriverIO codebase and ecosystem.

Downloads

320

WebdriverIO Agent Service

A WebdriverIO service that adds LLM-powered browser and mobile automation through a simple browser.agent(prompt) command. Powered by @wdio/mcp for element snapshots.

Why?

Modern UIs change frequently — cookie banners appear, button labels shift, modals pop up unexpectedly. Traditional selectors break. LLMs can handle this ambiguity, but running every action through an LLM is slow and expensive.

The solution: use both.

// Stable actions → use regular WebdriverIO (fast, free, reliable)
await browser.url('https://shop.example.com');
await browser.$('input#search').setValue('mechanical keyboard');
await browser.$('button[type="submit"]').click();

// Unpredictable UI → let the LLM handle it (flexible, resilient)
await browser.agent('accept the cookie consent banner');
await browser.agent('close any promotional popup');

// Back to stable actions
await browser.$('.product-card').click();
await browser.$('#add-to-cart').click();

This hybrid approach lets you:

  • Save tokens on predictable interactions
  • Stay resilient against UI changes where they matter most
  • Keep tests fast by only invoking the LLM when needed
  • Reduce flakiness in areas prone to A/B tests or dynamic content
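
One way to apply the hybrid approach is a small wrapper that tries a known selector first and only calls the LLM when that selector is missing. The sketch below is an illustration, not part of this package: `clickOrAgent`, `AgentBrowser`, and `ElementLike` are hypothetical names, and the interfaces are minimal stand-ins for the WebdriverIO `browser` object so the snippet compiles on its own.

```typescript
// Minimal structural types (hypothetical) so the sketch compiles without
// WebdriverIO installed. In a real suite the global `browser` would be passed in.
interface ElementLike {
  isExisting(): Promise<boolean>;
  click(): Promise<void>;
}

interface AgentBrowser {
  $(selector: string): ElementLike;
  agent(prompt: string): Promise<unknown>;
}

type ClickPath = 'selector' | 'agent';

// Hypothetical helper: use the cheap selector when it matches,
// otherwise hand the step to browser.agent().
async function clickOrAgent(
  browser: AgentBrowser,
  selector: string,
  fallbackPrompt: string,
): Promise<ClickPath> {
  const element = browser.$(selector);
  if (await element.isExisting()) {
    await element.click();
    return 'selector';
  }
  await browser.agent(fallbackPrompt);
  return 'agent';
}
```

The return value records which path ran, which can help when reviewing how often a test actually needed the LLM.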

Installation

npm install wdio-agent-service

Configuration

Add the service to your wdio.conf.ts:

export const config: WebdriverIO.Config = {
  // ...
  services: [
    ['agent', {
      provider: 'ollama',
      model: 'qwen2.5-coder:3b',
      maxActions: 2,
    }]
  ],
};

Config Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| provider | 'ollama' | 'ollama' | LLM provider |
| providerUrl | string | 'http://localhost:11434' | Provider API endpoint |
| model | string | 'qwen2.5-coder:7b' | Model name |
| maxActions | number | 1 | Maximum actions per prompt |
| timeout | number | 30000 | Request timeout in ms |
| toonFormat | 'yaml-like' \| 'tabular' | 'yaml-like' | Element encoding format. yaml-like works better with smaller models, tabular is more token-efficient for larger models |
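
To make the defaults concrete, here is a minimal sketch of how a partial configuration could resolve against them. The `AgentServiceOptions` type and the `resolveOptions` function are hypothetical names used for illustration (the package's own exported types may differ); the default values are copied from the options table.

```typescript
// Hypothetical type mirroring the documented options.
interface AgentServiceOptions {
  provider: 'ollama';
  providerUrl: string;
  model: string;
  maxActions: number;
  timeout: number; // milliseconds
  toonFormat: 'yaml-like' | 'tabular';
}

// Defaults as listed in the options table.
const DEFAULT_OPTIONS: AgentServiceOptions = {
  provider: 'ollama',
  providerUrl: 'http://localhost:11434',
  model: 'qwen2.5-coder:7b',
  maxActions: 1,
  timeout: 30000,
  toonFormat: 'yaml-like',
};

// Merge user overrides on top of the defaults (assumed behavior).
function resolveOptions(
  overrides: Partial<AgentServiceOptions> = {},
): AgentServiceOptions {
  return { ...DEFAULT_OPTIONS, ...overrides };
}

const resolved = resolveOptions({ model: 'qwen2.5-coder:3b', maxActions: 2 });
console.log(resolved);
```

Under this assumption, any option left out of the service entry in wdio.conf.ts keeps its documented default.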

Usage

The service adds a single command to the browser object:

const actions = await browser.agent('your natural language instruction');
// [{ type: 'CLICK', target: 'button#submit' }]
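
Because the command returns the actions it performed, a test can log them for debugging. The sketch below assumes only the `type` and `target` fields shown in the example return value; `AgentAction` and `describeActions` are hypothetical names, and the real objects may carry more fields.

```typescript
// Shape inferred from the example return value above; other fields may exist.
interface AgentAction {
  type: string;
  target?: string;
}

// Hypothetical helper: turn the returned actions into readable log lines.
function describeActions(actions: AgentAction[]): string[] {
  return actions.map((action, index) => {
    const target = action.target ? ` -> ${action.target}` : '';
    return `${index + 1}. ${action.type}${target}`;
  });
}

const lines = describeActions([{ type: 'CLICK', target: 'button#submit' }]);
console.log(lines.join('\n'));
```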

Browser Actions

| Action | Description | Example Prompt |
|--------|-------------|----------------|
| CLICK | Click on an element | "click the login button" |
| SET_VALUE | Type into an input field | "type [email protected] in the email field" |
| NAVIGATE | Go to a URL | "navigate to https://example.com" |

Mobile Actions

| Action | Description | Example Prompt |
|--------|-------------|----------------|
| TAP | Tap on an element | "tap the Sign In button" |
| SET_VALUE | Type into an input field | "enter admin in the username field" |

The service auto-detects the platform (browser, Android, iOS) and adjusts the available actions and prompts accordingly.
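
The README does not show how detection is implemented. As a rough sketch, it could be derived from the `isAndroid` and `isIOS` flags that WebdriverIO exposes on the `browser` object, with the action lists taken from the two tables above. `detectPlatform`, `PlatformFlags`, and `ACTIONS_BY_PLATFORM` are hypothetical names, not the service's internals.

```typescript
type Platform = 'browser' | 'android' | 'ios';

// Subset of the boolean flags WebdriverIO sets on the browser object.
interface PlatformFlags {
  isAndroid?: boolean;
  isIOS?: boolean;
}

// Action sets as documented in the Browser Actions and Mobile Actions tables.
const ACTIONS_BY_PLATFORM: Record<Platform, string[]> = {
  browser: ['CLICK', 'SET_VALUE', 'NAVIGATE'],
  android: ['TAP', 'SET_VALUE'],
  ios: ['TAP', 'SET_VALUE'],
};

// Assumed logic: mobile flags win, everything else counts as a desktop browser.
function detectPlatform(flags: PlatformFlags): Platform {
  if (flags.isAndroid) return 'android';
  if (flags.isIOS) return 'ios';
  return 'browser';
}

const platform = detectPlatform({ isAndroid: true });
console.log(platform, ACTIONS_BY_PLATFORM[platform]);
```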

Browser Examples

await browser.agent('accept all cookies');
await browser.agent('close the newsletter signup modal');
await browser.agent('click on Settings in the user menu');
await browser.agent('enter [email protected] in the email input');

Mobile Examples (Appium)

await browser.agent('skip the onboarding');
await browser.agent('accept all cookies');
await browser.agent('go to Account');
await browser.agent('fill in admin into username field and password into password field');

Mobile Setup (Appium)

The service works with Appium for Android and iOS automation. Configure your wdio.conf.ts with Appium capabilities:

export const config: WebdriverIO.Config = {
  hostname: '127.0.0.1',
  port: 4723,
  path: '/',

  capabilities: [{
    platformName: 'Android',
    'appium:deviceName': 'emulator-5554',
    'appium:automationName': 'UiAutomator2',
    'appium:app': '/path/to/your/app.apk',
  }],

  services: [
    ['agent', {
      provider: 'ollama',
      model: 'qwen2.5-coder:3b',
      maxActions: 5,
    }]
  ],
};

Local LLM Setup (Ollama)

  1. Install Ollama
  2. Pull a model: ollama pull qwen2.5-coder:3b
  3. Run ollama serve in the terminal
  4. Ollama runs on http://localhost:11434 by default
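
Before a test run, it can help to confirm that Ollama is reachable and the model has been pulled. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models. `isModelAvailable` and `FetchLike` are hypothetical names; the fetch function is passed in as a parameter so the helper can be exercised without a running server.

```typescript
// Only the part of the /api/tags response this sketch reads.
interface OllamaTagsResponse {
  models?: { name: string }[];
}

// Minimal fetch signature, satisfied by the global fetch in Node 18+.
type FetchLike = (
  url: string,
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Hypothetical helper: true if the model appears in Ollama's local model list.
async function isModelAvailable(
  model: string,
  fetchImpl: FetchLike,
  baseUrl = 'http://localhost:11434',
): Promise<boolean> {
  try {
    const response = await fetchImpl(`${baseUrl}/api/tags`);
    if (!response.ok) return false;
    const body = (await response.json()) as OllamaTagsResponse;
    return (body.models ?? []).some((entry) => entry.name === model);
  } catch {
    // Connection refused or similar: treat Ollama as unavailable.
    return false;
  }
}

// Usage in a setup hook (assumes Node 18+ for the global fetch):
//   const ready = await isModelAvailable('qwen2.5-coder:3b', fetch);
```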

Recommended Models

| Model | Size | Speed | Accuracy |
|-------|------|-------|----------|
| qwen2.5-coder:3b | 1.9GB | Fastest | Good |
| qwen2.5-coder:7b | 4.7GB | Fast | Better |
| qwen2.5-coder:14b | 9GB | Medium | Best |

How It Works

  1. browser.agent(prompt) captures visible, interactable elements using @wdio/mcp/snapshot
  2. Elements are encoded in a token-efficient format (TOON) and sent to the LLM alongside the prompt
  3. The LLM returns structured JSON actions (CLICK, SET_VALUE, NAVIGATE, TAP)
  4. Actions are executed via WebdriverIO

The platform (browser/Android/iOS) is auto-detected, and the prompt and available actions adjust accordingly.
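
Step 3 can be illustrated with a small parser that accepts the model's JSON reply, keeps only the documented action types, and respects the maxActions limit. This is a sketch under assumptions: the README does not show the service's actual validation, and `parseActions`, `ParsedAction`, and `isKnownAction` are hypothetical names.

```typescript
// Action types named in this README.
const KNOWN_ACTIONS = ['CLICK', 'SET_VALUE', 'NAVIGATE', 'TAP'] as const;
type ActionType = (typeof KNOWN_ACTIONS)[number];

// Assumed minimal shape; real actions may carry extra fields.
interface ParsedAction {
  type: ActionType;
  target?: string;
}

// Type guard: an object whose `type` is one of the known action names.
function isKnownAction(item: unknown): item is ParsedAction {
  if (typeof item !== 'object' || item === null) return false;
  const type = (item as { type?: unknown }).type;
  return (
    typeof type === 'string' &&
    (KNOWN_ACTIONS as readonly string[]).includes(type)
  );
}

// Parse the raw LLM reply, drop unknown actions, and cap the count.
// Throws a SyntaxError if the reply is not valid JSON.
function parseActions(raw: string, maxActions: number): ParsedAction[] {
  const data: unknown = JSON.parse(raw);
  const list: unknown[] = Array.isArray(data) ? data : [data];
  return list.filter(isKnownAction).slice(0, maxActions);
}

const reply =
  '[{"type":"CLICK","target":"#accept"},{"type":"HOVER"},{"type":"TAP","target":"#ok"}]';
console.log(parseActions(reply, 1));
```

Filtering before executing keeps an unexpected model reply from triggering actions outside the documented set.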

When to Use agent() vs Regular Commands

| Scenario | Recommendation |
|----------|----------------|
| Static selectors that rarely change | Regular WebdriverIO |
| Login forms with stable IDs | Regular WebdriverIO |
| Cookie consent banners | agent() |
| Promotional popups/modals | agent() |
| Third-party widgets | agent() |
| Elements with dynamic/generated IDs | agent() |
| A/B tested UI components | agent() |
| Mobile onboarding flows | agent() |

License

MIT