elementus-ai

v1.5.1

Published

a month ago

Self-healing element resolution for Playwright, WDIO & Appium. AI-powered fallback when selectors break.

morph93

playwright webdriverio wdio appium self-healing ai locator selector test-automation element-resolution llm gemini vision

Elementus

Self-healing element resolution for Playwright, WebDriverIO & Appium.

When a selector breaks, Elementus uses AI to find the element by natural-language description. Works with any action (click, fill, hover) and any assertion (toHaveText, toBeVisible). Supports local LLMs via LM Studio and cloud LLMs via Google Gemini API.

Installation

npm install elementus-ai

One-Prompt Setup

Copy this prompt to your AI coding agent (Claude, Cursor, Copilot, etc.) and it will analyze your project and integrate Elementus automatically:

I just installed the npm package "elementus-ai" — a self-healing element resolution library for test automation. Analyze my project and integrate it. Follow these steps:

1. DETECT MY FRAMEWORK
   - Search for: playwright.config, wdio.conf, appium config files
   - Check package.json for: @playwright/test, playwright, webdriverio, wdio, appium
   - Read a few existing test files to understand the test structure
   - Note whether the project is TypeScript (tsconfig.json or .ts test files) — this changes the fixture syntax (see step 3)
   - If none found, tell me you can't detect a supported framework and stop

2. CHOOSE THE LLM PROVIDER
   - Ask me: "Do you want to use a local LLM (LM Studio, free, private) or Google Gemini (cloud, fast, ~$0.001 per AI-healed selector on gemini-3.5-flash; selectors that still work cost nothing)?"
   - If Gemini: ask for API key or check for GEMINI_API_KEY env var
   - If LM Studio: use defaults (localhost:1234) with a vision/grounding model loaded (recommended: holo-3.1-9b)

3. INTEGRATE BASED ON MY FRAMEWORK

   First create the elementus instance using the provider chosen in step 2:
     const { createElementus } = require('elementus-ai')   // ESM/TS: import { createElementus } from 'elementus-ai'
     const el = createElementus({ provider: 'gemini', geminiApiKey: process.env.GEMINI_API_KEY })
     // LM Studio instead: createElementus({ provider: 'lmstudio', lmStudioUrl: 'http://localhost:1234/v1/chat/completions', model: 'holo-3.1-9b' })
   For exact, copy-pasteable code (TypeScript fixture, onHeal recipe), read the installed package's
   node_modules/elementus-ai/README.md ("Quick Start", "TypeScript", "Detecting heals") and docs/API.md.

   For Playwright:
   - Create or update a fixtures file that overrides the page fixture with el.wrapPage(page)
   - Make sure all tests import from the fixtures file instead of @playwright/test
   - TypeScript projects: use import/export and type the override as base.extend<{ page: ElementusPage }>({ ... }) (import ElementusPage from elementus-ai) so { ai } is autocompleted and documented. Types are bundled — do NOT add @types or a "declare module" shim
   - Set actionTimeout: 10000 in playwright config (Elementus respects framework timeouts)
   - Optional heal visibility: only if the user wants healed locators surfaced, wire onHeal to the
     framework's reporter. Use e.suggestion — a ready-to-use, escaped, framework-native replacement
     locator — verbatim. Playwright: push a testInfo "healed" annotation (HTML report) AND
     testInfo.attach (so it also reaches Allure + custom reporters); optionally fail CI on heal via an
     env-gated afterEach/fixture. WebdriverIO/Appium: collect in onHeal, fail in afterTest. See
     "Detecting heals" in the README. Default off

   For WebDriverIO:
   - In wdio.conf.js before hook, wrap browser and override global $:
     const wrapped = el.wrapBrowser(browser); globalThis.$ = wrapped.$.bind(wrapped)
   - This way all page objects use plain $() with optional { ai } — zero changes needed

   For Appium:
   - Add el.wrapBrowser(driver) in the before hook

4. EXAMPLE TEST
   Ask me one of these three options:
   a) "Which test case would you like me to add { ai } to as an example?"
   b) "Or should I pick one test from your repo that has fragile selectors?"
   c) "Or do you prefer no changes to existing tests — just the setup/config?"

   If (a): apply { ai } to 1-2 fragile locators in the test I specify
   If (b): find one test with fragile selectors (auto-generated IDs, deep CSS paths, nth-child), apply { ai } to 1-2 locators in that single test, explain why you chose it
   If (c): skip this step — just confirm the setup is ready and show a standalone code snippet of how { ai } would look with my framework

   IMPORTANT: Never modify more than one test file. This is an example — the user decides where to apply { ai } going forward.

5. VERIFY
   - If a test was modified: run that single test to confirm it passes
   - If no test was modified: confirm Elementus loads without errors by running a quick import check

Rules:
- Only modify ONE test file maximum, and only 1-2 locators in it — this is a demo, not a migration
- Do NOT add { ai } to every locator — only to ones with fragile selectors
- Stable selectors (data-testid, explicit IDs, aria labels) should NOT get { ai } — zero overhead matters
- The { ai } description should use words that appear in or near the element's visible text
- Never add runtime dependencies — Elementus has zero deps by design

Quick Start

const { createElementus } = require('elementus-ai')

const el = createElementus({
  provider: 'gemini',
  geminiApiKey: process.env.GEMINI_API_KEY,
})

// Wrap your page — add { ai } to any locator that might break:
const p = el.wrapPage(page)
await p.locator('#submit-btn', { ai: 'Submit order button' }).click()

// Locators WITHOUT { ai } work normally — zero overhead:
await p.locator('#stable-element').click()

Using TypeScript or ESM? Use import { createElementus } from 'elementus-ai'; type definitions are bundled. See the TypeScript section for the typed fixture pattern.

LLM Provider Setup

Option A: Local LLM via LM Studio (free, private)

Download LM Studio
Load a vision-capable model. Recommended: holo-3.1-9b — a GUI-grounding model that locates on-screen elements far better than general chat VLMs, and it's small (9B). Any vision model works, but grounding models earn their keep on the vision-fallback path.
Start the local server (default: http://localhost:1234)

const el = createElementus({
  provider: 'lmstudio',
  lmStudioUrl: 'http://localhost:1234/v1/chat/completions',
  model: 'holo-3.1-9b',
})

Tips for the local setup:

Context length: set it to 16k+ in LM Studio — the ARIA-snapshot grounding step can send large prompts, and the default 4k will silently truncate.
Semantic matching: load an embedding model (e.g. text-embedding-nomic-embed-text-v1.5) and set embeddingModel to let paraphrased descriptions ("sign in" vs "log in") resolve without vision.
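
To make the semantic-matching tip concrete, here is a minimal sketch of paraphrase matching by cosine similarity. The three-dimensional vectors are made up for illustration (a real embedding model returns far more dimensions), and rankBySimilarity is a hypothetical helper, not part of the Elementus API.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Hypothetical helper: sort candidates by similarity to the description vector.
function rankBySimilarity(descriptionVec, candidates) {
  return candidates
    .map(c => ({ text: c.text, score: cosine(descriptionVec, c.vec) }))
    .sort((x, y) => y.score - x.score)
}

// Made-up vectors: "Log in" is deliberately close to "sign in", "Delete" is not.
const signIn = [0.9, 0.1, 0.2]
const ranked = rankBySimilarity(signIn, [
  { text: 'Delete', vec: [0.1, 0.9, 0.3] },
  { text: 'Log in', vec: [0.8, 0.2, 0.3] },
])
console.log(ranked[0].text) // Log in
```

Keyword scoring would find no overlap between "sign in" and "Log in"; a vector comparison like this is what lets the paraphrase resolve without a vision call.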

Option B: Google Gemini API (cloud, fast, better vision)

Get an API key from Google AI Studio

const el = createElementus({
  provider: 'gemini',
  geminiApiKey: 'AIza...',  // or set GEMINI_API_KEY env var
  geminiModel: 'gemini-3.5-flash',  // or 'gemini-3.1-flash-lite' (cheaper/faster)
})

Older gemini-2.5-flash / gemini-2.5-flash-lite still work (Google retires them 2026-10-16) — Elementus picks the right thinking config per model family automatically.

Note: Google's computer-use capability requires dedicated models (gemini-2.5-computer-use-preview-*, gemini-3-flash-preview) and is not available on gemini-3.5-flash. Elementus's own vision pipeline does not need it — this only matters if you point geminiModel at a computer-use model expecting agentic behavior.

Framework Setup

Playwright

Wrap page once, add { ai } to any locator:

const p = el.wrapPage(page)
await p.locator('#btn', { ai: 'Submit order button' }).click()
await p.locator('#email', { ai: 'Email input field' }).fill('user@example.com')

Recommended: Playwright fixture (wrap once for all tests):

// fixtures.js
const { test: base, expect } = require('@playwright/test')
const { createElementus } = require('elementus-ai')
const el = createElementus({ provider: 'gemini', geminiApiKey: '...' })

// Override the built-in page fixture so every test receives the wrapped page.
const test = base.extend({
  page: async ({ page }, use) => {
    await use(el.wrapPage(page))
  }
})
module.exports = { test, expect }

// example.spec.js: import test from the fixtures file; page is already wrapped
const { test } = require('./fixtures')
test('example', async ({ page }) => {
  await page.locator('#btn', { ai: 'Submit button' }).click()
})

WebDriverIO

Override the global $ in your wdio.conf.js so all page objects work transparently:

// wdio.conf.js
const { createElementus } = require('elementus-ai')
const el = createElementus({ provider: 'gemini', geminiApiKey: '...' })

exports.config = {
  // ... other config
  async before() {
    const wrapped = el.wrapBrowser(browser)
    globalThis.$ = wrapped.$.bind(wrapped)
  }
}

// In tests / page objects — plain $() with optional { ai }:
await $('[data-testid="btn-send"]')                                // unchanged, zero overhead
await $('#btn', { ai: 'Submit order button' }).click()              // self-healing
await $('#email', { ai: 'Email input field' }).setValue('user@example.com')

Appium (Native Android / iOS / Flutter)

Same wrapBrowser pattern. Elementus auto-detects native apps and parses the element tree from driver.getPageSource() (XML) instead of DOM scanning.

const d = el.wrapBrowser(driver)
await d.$('~loginButton', { ai: 'Login button on welcome screen' }).click()
await d.$('~emailField', { ai: 'Email input' }).setValue('user@example.com')

Works with Flutter, React Native, native Android/iOS — any Appium driver.

TypeScript

Type definitions are bundled — there is no @types/elementus-ai package to install and no declare module shim to write. Because @playwright/test is an optional peer, WDIO/Appium-only projects can use the types without installing Playwright.

import { createElementus, type ElementusPage } from 'elementus-ai'

Typed Playwright fixture. wrapPage changes the page's runtime value but not its static type, so override the page fixture's type with ElementusPage — then { ai } is recognized and autocompleted (with docs) in your tests:

// fixtures.ts
import { test as base, expect } from '@playwright/test'
import { createElementus, type ElementusPage } from 'elementus-ai'

const el = createElementus({ provider: 'gemini', geminiApiKey: process.env.GEMINI_API_KEY })

export const test = base.extend<{ page: ElementusPage }>({
  page: async ({ page }, use) => {
    await use(el.wrapPage(page))
  },
})
export { expect }

// In tests — page is already wrapped and typed:
test('example', async ({ page }) => {
  await page.locator('#btn', { ai: 'Submit button' }).click() // { ai } type-checks
  await page.locator('#btn').click()                          // plain locator, zero overhead
})

The override is for editor support — IntelliSense and inline docs for { ai }. It heals at runtime either way, and because Playwright's locator() options are permissive, { ai } compiles with or without the override; the override just surfaces it as a documented option.

Page Object Model. Type the page your objects receive as ElementusPage:

import { type ElementusPage } from 'elementus-ai'

abstract class BasePage {
  constructor(protected readonly page: ElementusPage) {}
}

class LoginPage extends BasePage {
  readonly submit = this.page.locator('#submit', { ai: 'Submit button' })
}

Exported types: ElementusOptions, Elementus, ElementusPage, AiLocatorOptions. AiLocatorOptions is Playwright's own locator() option type plus ai?: string, derived from the installed Playwright version so it never drifts.

API Reference

`el.wrapPage(page)`

Wraps a Playwright page. Returns a proxy where page.locator(selector, { ai: 'description' }) auto-creates AI-fallback locators. Locators without { ai } pass through unchanged.
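
The pass-through behavior can be pictured with a small Proxy. This is only a sketch of the idea under stated assumptions: fakePage stands in for a Playwright page, wrapPageSketch is a hypothetical name, and the real wrapper's internals are not shown in this README.

```javascript
// Hypothetical stand-in for a Playwright page; only locator() is modeled.
const fakePage = {
  locator(selector, options) {
    return { selector, options, kind: 'plain' }
  },
}

// Hypothetical wrapper: intercept locator() and branch on the { ai } option.
function wrapPageSketch(page) {
  return new Proxy(page, {
    get(target, prop, receiver) {
      if (prop !== 'locator') return Reflect.get(target, prop, receiver)
      return (selector, options = {}) => {
        const { ai, ...rest } = options
        const locator = target.locator(selector, rest)
        // No description: hand back the untouched locator.
        if (!ai) return locator
        // Description present: tag the locator so a fallback could use it later.
        return { ...locator, kind: 'ai-fallback', description: ai }
      }
    },
  })
}

const wrapped = wrapPageSketch(fakePage)
console.log(wrapped.locator('#stable').kind)                       // plain
console.log(wrapped.locator('#btn', { ai: 'Submit button' }).kind) // ai-fallback
```

The point of the design is that call sites keep using locator() as before; only the presence of { ai } changes what comes back.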

`el.wrapBrowser(browser)`

Wraps a WDIO/Appium browser. Returns a proxy where browser.$(selector, { ai: 'description' }) auto-creates AI-fallback elements.

`el.find(context, description)`

Find element by description only (no locator needed). Returns a framework-native locator/element.

const found = await el.find(page, 'Submit order button')
await found.click()
await expect(found).toHaveText('Submit')

`el.locate(context, locator, description)`

Try locator first, fall back to AI if it fails. Returns a framework-native locator/element. Respects your framework's configured action timeout.

const found = await el.locate(page, page.locator('#btn'), 'Submit button')
await found.click()

`el.click(context, locator, description)`

Click with optimized fallback: uses page.goto() for links (avoids hover/overlay issues) and JS click for buttons (no mouse movement). Best for navigation clicks. Respects your framework's configured action timeout.

await el.click(page, page.locator('#nav-blog'), 'Blog page link')
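
As a rough illustration of the "links navigate, buttons get a JS click" rule described above, the decision could look like the following. chooseClickStrategy and the element shape are hypothetical; the library's actual heuristics may differ.

```javascript
// Hypothetical decision helper; element is a plain description, not a real handle.
function chooseClickStrategy(element) {
  if (element.tag === 'a' && element.href) {
    // Links: navigate directly, which sidesteps hover menus and overlays.
    return { type: 'navigate', url: element.href }
  }
  // Everything else: click from script, with no mouse movement involved.
  return { type: 'js-click' }
}

console.log(chooseClickStrategy({ tag: 'a', href: '/blog' }).type) // navigate
console.log(chooseClickStrategy({ tag: 'button' }).type)           // js-click
```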

`el.wrap(context, locator, description)`

Low-level: wraps any single locator/element with AI fallback. Prefer wrapPage/wrapBrowser for cleaner code.

Configuration

createElementus({
  // LLM Provider
  provider: 'lmstudio',    // 'lmstudio' | 'gemini'

  // LM Studio
  lmStudioUrl: 'http://localhost:1234/v1/chat/completions',
  model: 'holo-3.1-9b',

  // Gemini
  geminiApiKey: null,       // or GEMINI_API_KEY env var
  geminiModel: 'gemini-3.5-flash',

  // Behavior
  maxCandidates: 20,        // max elements sent to LLM for disambiguation
  visionMaxWidth: 1280,     // max screenshot width (px) sent to vision LLM

  // Fingerprint cache (opt-in) — remember healed elements across runs and
  // re-match them algorithmically (zero LLM cost, ~ms) before any AI call
  cacheFile: null,          // e.g. './elementus-cache.json'

  // Semantic matching (opt-in) — embedding model for paraphrase matching
  // when keyword scoring finds nothing
  embeddingModel: null,     // e.g. 'text-embedding-nomic-embed-text-v1.5'

  // Debugging
  debug: false,             // save screenshots to debugDir
  debugDir: null,           // required when debug: true, e.g. './debug'

  // Custom stop words
  stopWords: null,          // Set of words to ignore in descriptions

  // Heal telemetry (opt-in) — called once per heal with { description, selector, method, framework, resolved, suggestion }.
  // Best-effort & isolated: a throwing callback never breaks a test. See "Detecting heals" below.
  onHeal: null,             // e.g. (e) => console.log('healed', e.selector, '→', e.description)
})

Detecting heals (reporting & CI)

A heal is silent by default: the test still passes. To surface it, pass an onHeal(event) callback; Elementus calls it once per heal with { description, selector, method, framework, resolved, suggestion }. The library only emits the event; turning it into a report annotation or a CI failure is framework-specific (a few lines on your side). The event fires when the AI resolves a replacement element, so it signals "the selector drifted and AI stepped in", not "the action ultimately passed".
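
The "best-effort and isolated" contract for onHeal can be sketched as a guarded call. emitHeal is a hypothetical helper, and the event values below (in particular method, resolved, and the suggestion string) are invented for illustration; only the field names come from this README.

```javascript
// Hypothetical emitter: call the user's onHeal callback, swallowing any error it throws.
function emitHeal(onHeal, event) {
  if (typeof onHeal !== 'function') return false
  try {
    onHeal(event)
    return true
  } catch (err) {
    // A telemetry failure must not fail the test; just note it.
    console.warn('onHeal callback threw:', err.message)
    return false
  }
}

const collected = []
const healEvent = {
  description: 'Submit order button',
  selector: '#submit-btn',
  method: 'dom-scoring',   // assumed value
  framework: 'playwright',
  resolved: true,          // assumed value
  suggestion: "page.getByRole('button', { name: 'Submit' })", // assumed shape
}

emitHeal(e => collected.push(e), healEvent)                     // delivered
emitHeal(() => { throw new Error('reporter down') }, healEvent) // swallowed
console.log(collected.length) // 1
```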

Playwright — annotate the test (shows in the Playwright HTML report), and optionally fail CI on heal:

// fixtures.js
const { test: base } = require('@playwright/test')
const { createElementus } = require('elementus-ai')

const el = createElementus({
  provider: 'gemini',
  geminiApiKey: process.env.GEMINI_API_KEY,
  onHeal: (e) => {
    // e.suggestion is a ready-to-use, escaped, framework-native replacement locator — use it verbatim.
    const line = `${e.selector}  →  ${e.suggestion ?? '(located visually — no DOM suggestion)'}`
    try {
      const info = base.info()
      info.annotations.push({ type: 'healed', description: line })          // Playwright HTML report
      void info.attach('healed', { body: line, contentType: 'text/plain' }) // Allure + custom reporters
    } catch {}  // ignored outside a running test
  },
})

const test = base.extend({
  page: async ({ page }, use) => { await use(el.wrapPage(page)) },
  // Optional: fail CI when a selector only passed via self-healing (drift you should fix).
  failOnHeal: [async ({}, use, testInfo) => {
    await use()
    const heals = testInfo.annotations.filter(a => a.type === 'healed')
    if (process.env.ELEMENTUS_FAIL_ON_HEAL && heals.length) {
      throw new Error('Passed only via self-healing — fix the selector(s):\n  ' + heals.map(h => h.description).join('\n  '))
    }
  }, { auto: true }],
})
module.exports = { test, expect: require('@playwright/test').expect }

(TypeScript: import the HealEvent type from elementus-ai; the annotation/fixture code is identical.)

WebdriverIO / Appium — collect in onHeal, fail (or report) in afterTest:

// wdio.conf.js
const { createElementus } = require('elementus-ai')
let heals = []
const el = createElementus({ provider: 'gemini', geminiApiKey: '...', onHeal: (e) => heals.push(e) })

exports.config = {
  beforeTest() { heals = [] },
  afterTest()  {
    if (process.env.ELEMENTUS_FAIL_ON_HEAL && heals.length) {
      throw new Error('healed: ' + heals.map(h => h.description).join(', '))
    }
  },
}

Where each report shows it: the annotation appears in the Playwright HTML report (test → Annotations). In Allure, the heal shows in the test's step tree (the broken locator → the healed [data-elementus=…] element) and, with fail-on-heal on, in the failure message. Note that allure-playwright does not convert runtime annotations into labels or steps, so for a dedicated Allure marker also call testInfo.attach('healed', { body: `${e.selector} → ${e.description}`, contentType: 'text/plain' }); attachments surface in Allure and most custom reporters. A custom HTML reporter surfaces the heal via the failure message and captured stdout.

Security Notes

Debug screenshots capture the full page — including any sensitive data visible on it. Keep debugDir out of version control.
Page content reaches the LLM. Element texts and screenshots from the page under test are sent to your configured LLM provider as part of the resolution prompts. Run Elementus against pages you trust, and prefer a local LLM (LM Studio) for sensitive applications.
The Gemini API key is sent via the x-goog-api-key header (never in the URL) and can be supplied via the GEMINI_API_KEY env var instead of code.
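
For readers unfamiliar with header-based authentication, the sketch below builds (but does not send) a request of that shape. It assumes the public Gemini REST endpoint and a minimal generateContent body; buildGeminiRequest is a hypothetical helper, and how Elementus assembles its own requests is not shown in this README.

```javascript
// Build (but do not send) a Gemini generateContent request with the key in a header.
function buildGeminiRequest(apiKey, model, prompt) {
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-goog-api-key': apiKey, // key travels in a header, so it never appears in the URL
      },
      body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
    },
  }
}

// Placeholder key for illustration; read the real one from process.env.GEMINI_API_KEY.
const req = buildGeminiRequest('AIza-example-key', 'gemini-3.5-flash', 'Which element is the Submit button?')
console.log(req.url.includes('AIza-example-key')) // false
// To send it: await fetch(req.url, req.options)
```

Keeping the key out of the URL matters because URLs tend to end up in proxy logs and error messages, while request headers usually do not.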

Timeouts

Elementus respects your framework's configured timeouts. It does not override or race against them. Set appropriate action timeouts in your framework config:

// Playwright: playwright.config.js or test.use()
test.use({ actionTimeout: 10_000 })  // 10s before the locator fails, then AI takes over

// WDIO: wdio.conf.js
exports.config = {
  waitforTimeout: 10000,  // default timeout (ms) for all waitFor* commands
}

If a selector works, it returns immediately (zero overhead). If it fails after your configured timeout, the Elementus AI fallback kicks in.

How It Works

When a selector fails, Elementus runs a 5-step pipeline — free and deterministic steps first, LLM steps later, vision last:

Step 1: Locator — Try the original selector. If it works, done (zero overhead).

Step 2: Fingerprint cache (opt-in via cacheFile) — If this selector+description healed before on this page, re-match the stored multi-attribute fingerprint (tag, id, text, neighbor text, position, size, …) against the live DOM with weighted similarity. Milliseconds, zero LLM cost. Accepted only with a confidence threshold and a margin over the runner-up.
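
A minimal sketch of the "confidence threshold plus margin over the runner-up" rule from Step 2. The weights, the 0.6 threshold, the 0.15 margin, and the reduced attribute set are all assumptions chosen for the example; the real values are not documented here.

```javascript
// Assumed weights and cut-offs, for illustration only.
const WEIGHTS = { tag: 1, id: 3, text: 3, neighborText: 2 }
const THRESHOLD = 0.6 // minimum similarity to accept
const MARGIN = 0.15   // required lead over the runner-up

// Weighted share of attributes on which two fingerprints agree, in [0, 1].
function similarity(stored, live) {
  let matched = 0, total = 0
  for (const [attr, weight] of Object.entries(WEIGHTS)) {
    total += weight
    if (stored[attr] !== undefined && stored[attr] === live[attr]) matched += weight
  }
  return matched / total
}

// Return the best live element, or null when the match is weak or ambiguous.
function rematch(stored, liveElements) {
  const scored = liveElements
    .map(el => ({ el, score: similarity(stored, el) }))
    .sort((a, b) => b.score - a.score)
  const [best, runnerUp] = scored
  if (!best || best.score < THRESHOLD) return null
  if (runnerUp && best.score - runnerUp.score < MARGIN) return null
  return best.el
}

const stored = { tag: 'button', id: 'submit-btn', text: 'Submit', neighborText: 'Order total' }
const live = [
  { tag: 'button', id: 'submit-v2', text: 'Submit', neighborText: 'Order total' }, // id changed
  { tag: 'button', id: 'cancel', text: 'Cancel', neighborText: 'Order total' },
]
console.log(rematch(stored, live).id) // submit-v2
```

The margin check is what prevents a confident-looking but ambiguous match: two near-identical candidates both score high, so neither is accepted and the pipeline moves on.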

Step 3: DOM/Element Tree Scoring — Scan all interactive elements on the page (DOM for web, XML source for native apps). Score each by keyword and phrase relevance to the description. If one clear winner, use it. If multiple tied, send the ranked top-N to the LLM. If all identical (e.g., 10x "Edit" buttons), use positional LLM with coordinates. With embeddingModel set, zero keyword matches fall back to semantic (embedding) ranking.
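
The three outcomes of Step 3 (one clear winner, a tie that goes to the LLM, or no keyword match) can be sketched with a simple overlap count. The stop-word list and the scoring function are simplified assumptions, not the library's actual scoring.

```javascript
const STOP_WORDS = new Set(['the', 'a', 'an', 'of', 'in']) // assumed list

// Lowercase, split on non-word characters, drop stop words.
function keywords(text) {
  return text.toLowerCase().split(/\W+/).filter(w => w && !STOP_WORDS.has(w))
}

// Score = number of description keywords present in the element's text.
function score(description, elementText) {
  const words = new Set(keywords(elementText))
  return keywords(description).filter(w => words.has(w)).length
}

// Decide the next step: use a clear winner, ask the LLM on a tie, or report no match.
function resolve(description, elements, maxCandidates = 20) {
  const ranked = elements
    .map(el => ({ el, score: score(description, el.text) }))
    .filter(r => r.score > 0)
    .sort((a, b) => b.score - a.score)
  if (ranked.length === 0) return { outcome: 'no-match' }
  const top = ranked.filter(r => r.score === ranked[0].score)
  if (top.length === 1) return { outcome: 'winner', el: top[0].el }
  return { outcome: 'ask-llm', candidates: ranked.slice(0, maxCandidates).map(r => r.el) }
}

const elements = [{ text: 'Submit order' }, { text: 'Cancel order' }, { text: 'Help' }]
console.log(resolve('Submit order button', elements).outcome) // winner
console.log(resolve('order button', elements).outcome)        // ask-llm
console.log(resolve('Privacy link', elements).outcome)        // no-match
```

In this sketch the 'no-match' outcome is where an embedding-based ranking would take over when embeddingModel is set.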

Step 4: Snapshot grounding — Playwright: capture an ARIA accessibility snapshot with element refs and ask the text LLM to pick the matching ref. WDIO/native: synthesize an indexed role/name list from the element scan and do the same. No vision model needed.
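
For the WDIO/native variant of Step 4, an indexed role/name list and a defensive parse of the model's answer might look like this. The prompt wording, buildGroundingPrompt, and parseIndex are hypothetical; the README does not show the actual prompt.

```javascript
// Turn scanned elements into an indexed role/name list a text LLM can pick from.
function buildGroundingPrompt(description, elements) {
  const lines = elements.map((el, i) => `[${i}] ${el.role} "${el.name}"`)
  return [
    `Pick the element that best matches: "${description}".`,
    'Answer with the index only.',
    ...lines,
  ].join('\n')
}

// Parse the model's reply defensively: accept only an in-range integer.
function parseIndex(reply, count) {
  const match = /\d+/.exec(reply)
  if (!match) return null
  const index = Number(match[0])
  return index < count ? index : null
}

const scanned = [
  { role: 'button', name: 'Cancel' },
  { role: 'button', name: 'Place order' },
]
console.log(buildGroundingPrompt('Submit order button', scanned))
console.log(parseIndex('[1]', scanned.length))  // 1
console.log(parseIndex('none', scanned.length)) // null
```

Because only text goes to the model, this step works with any chat-capable LLM and costs far less than a screenshot-based call.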

Step 5: Vision — First Set-of-Marks: numbered badges drawn on the candidate elements, one vision call returns a badge number. If that fails: screenshot with a labeled 3x3 grid, region re-scan, then a precise-coordinate fail-safe for elements a DOM scan can never see (e.g. canvas-rendered icons). The fail-safe narrows the search band to roughly one viewport before asking the model for pixels — so the target is never too small to locate — then verifies the result on a zoomed crop and fails loudly rather than ever clicking the wrong place.
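
The five steps above can be read as an ordered chain in which the first step to return an element wins and the whole chain fails loudly if none does. The sketch below uses stand-in steps with hypothetical names; it shows the control flow only, not the real resolvers.

```javascript
// Run resolvers in order; the first one that returns a non-null element wins.
async function runPipeline(steps, request) {
  for (const step of steps) {
    const element = await step.run(request)
    if (element !== null) return { element, resolvedBy: step.name }
  }
  // Nothing resolved: fail loudly rather than act on a guess.
  throw new Error(`Could not resolve "${request.description}"`)
}

// Stand-in steps, cheapest first. Each returns an element or null.
const steps = [
  { name: 'locator', run: async () => null },           // original selector is broken
  { name: 'fingerprint-cache', run: async () => null }, // nothing cached yet
  { name: 'dom-scoring', run: async () => 'button.submit' },
  { name: 'snapshot-grounding', run: async () => { throw new Error('should not run') } },
  { name: 'vision', run: async () => { throw new Error('should not run') } },
]

runPipeline(steps, { selector: '#submit-btn', description: 'Submit order button' })
  .then(result => console.log(result.resolvedBy)) // dom-scoring
```

Ordering the steps by cost means the LLM and vision calls are only paid for when the free, deterministic steps have already come up empty.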

Tips for Writing Descriptions

Good — use words that appear in or near the element:

'Submit order button' matches <button>Submit</button>
'Email input field' matches <input> near "Email" label
'Privacy Policy footer link' matches <a>Privacy Policy</a>

For identical elements, add positional context:

'first Edit button near the top'
'Delete button in the third row'

Avoid vague descriptions:

'the button' matches every button
'Save Changes button' is specific and matchable

Platform Support

| Platform | Element scan | Vision | Status |
|----------|-------------|--------|--------|
| Playwright (web) | DOM | Full-page screenshot + LLM | Full support |
| WDIO (web) | DOM | Viewport screenshot + LLM | Full support (WDIO v9+ recommended) |
| Appium (mobile web) | DOM | Viewport screenshot + LLM | Full support |
| Appium (native Android/iOS) | XML source | None | Element-tree resolution (no vision) |
| Appium (Flutter) | XML source | None | Element-tree resolution (no vision) |
| Appium (React Native) | XML source | None | Element-tree resolution (no vision) |

Notes:

WDIO screenshots are viewport-only, so the vision fallback sees the current viewport rather than the full page. DOM scoring always covers the whole document on every platform.
Native apps have no DOM to overlay the vision grid on — resolution uses the native element tree (accessibility-id, resource-id, text) and stops there with an actionable error if unresolved.

License

MIT