@agentic-intelligence/dom-engine

v1.2.0-dev.9

Published

7 months ago

Agentic DOM Intelligence - A lightweight TypeScript library for DOM analysis and manipulation, designed for web automation and AI agents

chapa0711

dom analysis automation typescript web-scraping ai-agents browser-automation dom-manipulation interactive-elements scroll-management action-execution human-like-interaction click-simulation type-simulation

DOM Engine

A simple, lightweight library that turns website DOMs into actionable context for browser agents. Supports both browser extension environments and headless browser automation via Puppeteer.

Installation

npm install @agentic-intelligence/dom-engine

Usage

Basic Usage

import { getInteractiveContext, scrollToNewContent, executeActions } from '@agentic-intelligence/dom-engine';

// 1. Analyze page and get interactive elements
const domData = getInteractiveContext({ injectTrackers: true });
console.log('Interactive elements:', domData.interactiveElements);
console.log('Scroll info:', domData.scrollInfo);

// 2. Execute actions on elements
const actions = [
  {
    agenticPurposeId: domData.interactiveElements.inputs[0].agenticPurposeId,
    actionType: "type",
    value: "user@example.com"
  },
  {
    agenticPurposeId: domData.interactiveElements.buttons[0].agenticPurposeId,
    actionType: "click"
  }
];

const result = executeActions(actions);
console.log('Actions executed:', result.results);

// 3. Navigate with smart scroll
const scrollResult = scrollToNewContent();
if (scrollResult.success) {
  console.log('Scrolled to:', scrollResult.scrolledTo);
}

Note: injectTrackers adds a unique agenticPurposeId to each element, which AI agents can use to reference and interact with that element. It also attaches event listeners that detect clicks and help evaluate whether an interaction succeeded.
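
To make the role of agenticPurposeId concrete, here is a minimal sketch of resolving an ID chosen by an agent back to its element descriptor. The ElementDescriptor and InteractiveElements types and the findByPurposeId helper are hypothetical names introduced for illustration; they are not part of the library's exported API and rely only on field names shown in this README.

```typescript
// Hypothetical types: only the fields this README shows are modeled.
interface ElementDescriptor {
  agenticPurposeId: string;
  text: string;
}

interface InteractiveElements {
  buttons: ElementDescriptor[];
  inputs: ElementDescriptor[];
  links: ElementDescriptor[];
}

// Hypothetical helper: returns the first descriptor whose ID matches,
// searching buttons, then inputs, then links; undefined if none match.
function findByPurposeId(
  elements: InteractiveElements,
  purposeId: string
): ElementDescriptor | undefined {
  const all = elements.buttons.concat(elements.inputs, elements.links);
  for (let i = 0; i < all.length; i++) {
    if (all[i].agenticPurposeId === purposeId) {
      return all[i];
    }
  }
  return undefined;
}

// Sample data (made up for this sketch).
const sampleElements: InteractiveElements = {
  buttons: [{ agenticPurposeId: "a1b2c3d4", text: "Submit" }],
  inputs: [{ agenticPurposeId: "e5f6g7h8", text: "Name: email" }],
  links: [],
};

const found = findByPurposeId(sampleElements, "e5f6g7h8");
console.log(found ? found.text : "not found");
```

A lookup like this can help an agent confirm that an ID it produced still refers to a known element before building an action for it.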

DOM Analysis

import { getInteractiveContext } from '@agentic-intelligence/dom-engine';

// Analyze entire page
const domData = getInteractiveContext({ injectTrackers: true });
console.log('Buttons found:', domData.interactiveElements.buttons);
console.log('Inputs found:', domData.interactiveElements.inputs);
console.log('Links found:', domData.interactiveElements.links);
console.log('Total elements:', domData.interactiveElements.total);

Example Response

Here's what a typical response looks like:

const domData = getInteractiveContext({ injectTrackers: true });

// Example response structure:
{
  interactiveElements: {
    total: 5,
    buttons: [
      {
        text: "Submit",
        agenticPurposeId: "a1b2c3d4",
        className: "btn btn-primary",
        ...
      }
    ],
    inputs: [
      {
        text: "Placeholder: Enter your email | Name: email",
        agenticPurposeId: "e5f6g7h8",
        type: "email",
        className: "form-control",
        ...
      }
    ],
    links: [
      {
        text: "Text: Learn more | Title: Documentation",
        agenticPurposeId: "i9j0k1l2",
        href: "/docs",
        className: "nav-link",
        ...
      }
    ],
    ...
  },
  scrollInfo: {
    totalHeight: 2000,
    viewportHeight: 800,
    scrollTop: 0,
    verticalScrollPercentage: 0,
    remainingHeight: 1200,
    nextContentPixel: 800
  }
}
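
Since the response is meant to serve as context for an agent, one plausible next step is flattening it into short text lines for a prompt. The sketch below assumes only the text and agenticPurposeId fields shown above; DescribedElement, DomSnapshot, and toPromptLines are hypothetical names, and the line format is an arbitrary choice rather than anything the library prescribes.

```typescript
// Hypothetical types covering only fields shown in the example response.
interface DescribedElement {
  text: string;
  agenticPurposeId: string;
}

interface DomSnapshot {
  interactiveElements: {
    buttons: DescribedElement[];
    inputs: DescribedElement[];
    links: DescribedElement[];
  };
}

// Hypothetical helper: one line per element, in the form "[kind id] text".
function toPromptLines(snapshot: DomSnapshot): string[] {
  const elements = snapshot.interactiveElements;
  const line = (kind: string) => (el: DescribedElement): string =>
    "[" + kind + " " + el.agenticPurposeId + "] " + el.text;
  return elements.buttons
    .map(line("button"))
    .concat(elements.inputs.map(line("input")))
    .concat(elements.links.map(line("link")));
}

// Values copied from the example response above.
const exampleSnapshot: DomSnapshot = {
  interactiveElements: {
    buttons: [{ text: "Submit", agenticPurposeId: "a1b2c3d4" }],
    inputs: [{ text: "Placeholder: Enter your email | Name: email", agenticPurposeId: "e5f6g7h8" }],
    links: [{ text: "Text: Learn more | Title: Documentation", agenticPurposeId: "i9j0k1l2" }],
  },
};

const promptLines = toPromptLines(exampleSnapshot);
console.log(promptLines.join("\n"));
// [button a1b2c3d4] Submit
// [input e5f6g7h8] Placeholder: Enter your email | Name: email
// [link i9j0k1l2] Text: Learn more | Title: Documentation
```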

Scroll Management

import { getInteractiveContext, scrollToNewContent } from '@agentic-intelligence/dom-engine';

// Get scroll information (no parameters needed!)
const domData = getInteractiveContext();
console.log('Scroll percentage:', domData.scrollInfo.verticalScrollPercentage);
console.log('Remaining content:', domData.scrollInfo.remainingHeight);

// Scroll to new content (automatically handles scroll to top if no new content)
const result = scrollToNewContent();
console.log('Scrolled to:', result.scrolledTo);

Smart Scroll Behavior:

If there's new content below: scrolls to the next unseen content
If no new content is available: scrolls back to the top (pixel 0)
Always returns success: true with the scroll position
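
The rules above can be sketched as a small pure function over the scrollInfo fields. This is an illustration of the documented behavior only, assuming remainingHeight and nextContentPixel mean what their names suggest; nextScrollTarget is a hypothetical helper, and the library's actual implementation may differ.

```typescript
// Subset of the scrollInfo fields shown in this README.
interface ScrollInfo {
  remainingHeight: number;   // pixels of content below the viewport
  nextContentPixel: number;  // where the next unseen content starts
}

// Hypothetical helper mirroring the documented rules:
// unseen content below -> scroll to it; otherwise -> back to pixel 0.
function nextScrollTarget(info: ScrollInfo): number {
  return info.remainingHeight > 0 ? info.nextContentPixel : 0;
}

// Values from the example response: 1200px remain, next content at 800.
console.log(nextScrollTarget({ remainingHeight: 1200, nextContentPixel: 800 })); // 800

// At the bottom of the page nothing remains, so the target is the top.
console.log(nextScrollTarget({ remainingHeight: 0, nextContentPixel: 2000 })); // 0
```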

Action Execution

import { executeActions } from '@agentic-intelligence/dom-engine';

// Execute multiple actions
const actions = [
  {
    agenticPurposeId: "e5f6g7h8", // the email input from the example response
    actionType: "type",
    value: "user@example.com"
  },
  {
    agenticPurposeId: "a1b2c3d4",
    actionType: "click"
  }
];

const result = executeActions(actions);
console.log('Results:', result.results);

Available Action Types:

click: Click on buttons, links, or any clickable element
type: Type text into inputs, textareas, or contentEditable elements

Human-like Interaction:

Simulates realistic mouse events with coordinates
Multiple fallback methods for reliable clicking
Proper event sequences (mouseover, mousedown, mouseup, click)
Keyboard events for activation
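
As a rough illustration of the event sequence listed above, the sketch below plans the four mouse events with shared coordinates. Targeting the center of the element's bounding box is an assumption made for this example, and Rect, PlannedMouseEvent, and planClick are hypothetical names; the library's own click logic is not shown in this README and may work differently.

```typescript
// Hypothetical shapes; Rect matches the relevant fields of a DOMRect.
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

interface PlannedMouseEvent {
  type: string;
  clientX: number;
  clientY: number;
}

// Event order as listed in this README.
const CLICK_SEQUENCE = ["mouseover", "mousedown", "mouseup", "click"];

// Plans one event per step, all aimed at the center of the rectangle
// (the center point is an assumption of this sketch).
function planClick(rect: Rect): PlannedMouseEvent[] {
  const clientX = rect.left + rect.width / 2;
  const clientY = rect.top + rect.height / 2;
  return CLICK_SEQUENCE.map((type) => ({ type, clientX, clientY }));
}

const plan = planClick({ left: 100, top: 40, width: 80, height: 20 });
console.log(plan.map((e) => e.type).join(" -> "));
// mouseover -> mousedown -> mouseup -> click

// In a browser, each planned event could then be dispatched, for example:
//   element.dispatchEvent(new MouseEvent(e.type, {
//     bubbles: true, clientX: e.clientX, clientY: e.clientY
//   }));
```

Keeping the planning step free of DOM calls makes the sequence easy to inspect or log before anything is dispatched.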

Using with Puppeteer & Headless Browsers

Building the Bundle

Before using with Puppeteer, create a browser-compatible bundle:

// bundle-dom-engine.js
const esbuild = require('esbuild');

async function bundleDomEngine() {
    try {
        await esbuild.build({
            entryPoints: [require.resolve('@agentic-intelligence/dom-engine')],
            bundle: true,
            format: 'iife',
            globalName: 'DomEngine',
            platform: 'browser',
            target: 'es2015',
            outfile: './dom-engine-bundle.js',
            minify: false
        });
        console.log('✅ Bundle created successfully');
    } catch (error) {
        console.error('❌ Error creating bundle:', error);
    }
}

bundleDomEngine();

Injecting and Using with Puppeteer

const puppeteer = require('puppeteer');
const path = require('path');

// CommonJS does not allow top-level await, so the steps run inside an async function
async function main() {
    // Launch browser and navigate
    const browser = await puppeteer.launch({ headless: true });
    try {
        const page = await browser.newPage();
        await page.goto('https://example.com', { waitUntil: 'networkidle2' });

        // Inject the bundle
        const bundlePath = path.join(__dirname, 'dom-engine-bundle.js');
        await page.addScriptTag({ path: bundlePath });

        // Use dom-engine through the DomEngine global exposed by the bundle
        const domData = await page.evaluate(() => {
            return DomEngine.getInteractiveContext({
                injectTrackers: true
            });
        });

        console.log('Found buttons:', domData.interactiveElements.buttons);

        // Click the first button, if one was found, by agenticPurposeId
        const button = domData.interactiveElements.buttons[0];
        if (button) {
            await page.evaluate((purposeId) => {
                const element = document.querySelector(`[data-agentic-purpose-id="${purposeId}"]`);
                element?.click();
            }, button.agenticPurposeId);
        }
    } finally {
        // Close the browser even if a step above throws
        await browser.close();
    }
}

main().catch(console.error);

Project Structure

src/
├── core/
│   └── dom-engine.ts          # Main DOM analysis engine
├── read/
│   ├── element-analyzer.ts    # Element text extraction and analysis
│   └── interactive-finder.ts  # Interactive element detection
├── actions/
│   ├── executor.ts            # Action coordination and execution
│   ├── click.ts               # Click action implementation
│   ├── type.ts                # Type action implementation
│   └── scroll.ts              # Scroll action implementation
├── utils/
│   └── helpers.ts             # Utility functions
├── types.ts                   # TypeScript type definitions
└── index.ts                   # Public API exports

Use Case Examples

🤖 AI Agents & Automation

// AI agent workflow
const domData = getInteractiveContext({ injectTrackers: true });
const actions = aiAgent.decideActions(domData.interactiveElements);
const result = executeActions(actions);
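
The three-line workflow above can be extended into a bounded loop that analyzes, decides, acts, and scrolls when the agent has nothing to do. The sketch below is one possible shape under stated assumptions: the Engine interface is a hypothetical stand-in that mirrors the three documented functions so the loop can run without a real DOM, and runAgentLoop and the stub engine are illustrative names, not library exports.

```typescript
// Hypothetical minimal shapes; the library's real types are richer.
interface Action {
  agenticPurposeId: string;
  actionType: "click" | "type";
  value?: string;
}

interface Snapshot {
  interactiveElements: { total: number };
}

// Stand-in for the three functions documented in this README.
interface Engine {
  getInteractiveContext(options?: { injectTrackers?: boolean }): Snapshot;
  executeActions(actions: Action[]): { results: { success: boolean }[] };
  scrollToNewContent(): { success: boolean; scrolledTo: number };
}

// Runs up to maxSteps rounds. Each round re-analyzes the page; if the
// agent returns no actions, the loop scrolls instead of acting.
// Returns the total number of action results received.
function runAgentLoop(
  engine: Engine,
  decide: (snapshot: Snapshot) => Action[],
  maxSteps: number
): number {
  let executed = 0;
  for (let step = 0; step < maxSteps; step++) {
    const snapshot = engine.getInteractiveContext({ injectTrackers: true });
    const actions = decide(snapshot);
    if (actions.length === 0) {
      engine.scrollToNewContent();
      continue;
    }
    executed += engine.executeActions(actions).results.length;
  }
  return executed;
}

// Usage with a stub engine (no real DOM) to show the control flow.
let scrollCalls = 0;
let decideCalls = 0;
const stubEngine: Engine = {
  getInteractiveContext: () => ({ interactiveElements: { total: 1 } }),
  executeActions: (actions) => ({ results: actions.map(() => ({ success: true })) }),
  scrollToNewContent: () => {
    scrollCalls++;
    return { success: true, scrolledTo: 0 };
  },
};

// The stub agent clicks once on the first round, then does nothing.
const executedCount = runAgentLoop(
  stubEngine,
  () => (decideCalls++ === 0 ? [{ agenticPurposeId: "a1b2c3d4", actionType: "click" }] : []),
  3
);
console.log(executedCount, scrollCalls); // 1 2
```

Passing the engine in as a parameter keeps the loop testable; in a browser, an object wrapping the imported library functions could be supplied instead of the stub.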

🧪 E2E Testing

// Automated testing
const domData = getInteractiveContext({ injectTrackers: true });
const actions = [
  { agenticPurposeId: "email-input", actionType: "type", value: "user@example.com" },
  { agenticPurposeId: "submit-btn", actionType: "click" }
];
const result = executeActions(actions);
assert(result.results.every(r => r.success));

API Reference

Core Functions

`getInteractiveContext(options?)`

Analyzes the DOM and returns interactive elements with scroll information.

Parameters:

options.injectTrackers?: boolean - Inject unique IDs for action tracking
options.context?: DOMContext - Custom DOM context for extensions/iframes

Returns: DOMExtractionResult

`executeActions(actions, context?)`

Executes multiple actions on DOM elements.

Parameters:

actions: Action[] - Array of actions to execute
context?: DOMContext - Custom DOM context

Returns: ActionsResult
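
Because actions are often produced by a model, it may be worth checking them before they are passed to executeActions. The sketch below is a hypothetical pre-flight check, not a library feature: the Action shape is inferred from the examples in this README, and the rule that a "type" action needs a string value is an assumption based on those examples.

```typescript
// Action shape inferred from the examples in this README (assumption).
interface Action {
  agenticPurposeId: string;
  actionType: "click" | "type";
  value?: string;
}

// Hypothetical helper: returns a list of human-readable problems,
// or an empty array when every action looks well-formed.
function validateActions(actions: Action[]): string[] {
  const problems: string[] = [];
  actions.forEach((action, index) => {
    if (!action.agenticPurposeId) {
      problems.push("action " + index + ": missing agenticPurposeId");
    }
    if (action.actionType === "type" && typeof action.value !== "string") {
      problems.push("action " + index + ": type action needs a value");
    }
  });
  return problems;
}

const candidateActions: Action[] = [
  { agenticPurposeId: "e5f6g7h8", actionType: "type" }, // value omitted
  { agenticPurposeId: "a1b2c3d4", actionType: "click" },
];

console.log(validateActions(candidateActions));
// [ 'action 0: type action needs a value' ]
```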

`scrollToNewContent(context?)`

Scrolls to new content, or returns to the top if no new content is available.

Parameters:

context?: DOMContext - Custom DOM context

Returns: ScrollResult

Roadmap

✅ Smart Element Analysis: Automatically detects interactive elements (buttons, inputs, links)
✅ Advanced Categorization: Classifies elements by type and functionality
✅ Smart Scroll Management: Intelligent scroll control with automatic top return
✅ Visibility Filtering: Only processes actually visible elements
✅ Zero Dependencies: Pure JavaScript, no external libraries
✅ Cross-Platform: Works in modern browsers and Node.js
✅ Custom DOM Context: Support for analyzing different document contexts (extensions, iframes)
✅ Element Tracking: Inject unique IDs for agent tracking and interaction
🔲 Interaction History: Track and maintain history of interacted elements
🔲 Iframe Processing: Support for analyzing and interacting with iframe content

Contributing

Contributions are welcome! Please:

Fork the project
Create a feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Made with ❤️ by Luis Chapa Morin
