@agentic-intelligence/dom-engine
v1.2.0-dev.9
Agentic DOM Intelligence - A lightweight TypeScript library for DOM analysis and manipulation, designed for web automation and AI agents
DOM Engine
A simple, lightweight library that turns website DOMs into actionable context for browser agents. Supports both browser extension environments and headless browser automation via Puppeteer.
Installation
npm install @agentic-intelligence/dom-engine
Usage
Basic Usage
import { getInteractiveContext, scrollToNewContent, executeActions } from '@agentic-intelligence/dom-engine';
// 1. Analyze page and get interactive elements
const domData = getInteractiveContext({ injectTrackers: true });
console.log('Interactive elements:', domData.interactiveElements);
console.log('Scroll info:', domData.scrollInfo);
// 2. Execute actions on elements
const actions = [
  {
    agenticPurposeId: domData.interactiveElements.inputs[0].agenticPurposeId,
    actionType: "type",
    value: "[email protected]"
  },
  {
    agenticPurposeId: domData.interactiveElements.buttons[0].agenticPurposeId,
    actionType: "click"
  }
];
const result = executeActions(actions);
console.log('Actions executed:', result.results);
// 3. Navigate with smart scroll
const scrollResult = scrollToNewContent();
if (scrollResult.success) {
  console.log('Scrolled to:', scrollResult.scrolledTo);
}
Note: injectTrackers adds a unique agenticPurposeId to each element that AI agents can use to reference and interact with elements. It also injects event listeners to detect clicks and evaluate if interactions were successful.
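Because an agent refers to elements only by agenticPurposeId, it can help to index the analysis result by that ID before acting on the agent's choice. The following is a minimal sketch, not part of the library: the interfaces are assumptions inferred from the example output in this README, and indexByPurposeId is a hypothetical helper name.

```typescript
// Assumed shapes, inferred from this README's example output; the library's
// real type definitions may differ.
interface ElementDescriptor {
  text: string;
  agenticPurposeId: string;
  className?: string;
}

interface InteractiveElements {
  buttons: ElementDescriptor[];
  inputs: ElementDescriptor[];
  links: ElementDescriptor[];
}

type ElementKind = "button" | "input" | "link";

interface IndexedElement extends ElementDescriptor {
  kind: ElementKind;
}

// Hypothetical helper: build a lookup table keyed by agenticPurposeId so an
// ID chosen by an agent can be checked before it is used in an action.
function indexByPurposeId(elements: InteractiveElements): Map<string, IndexedElement> {
  const index = new Map<string, IndexedElement>();
  const groups: Array<[ElementKind, ElementDescriptor[]]> = [
    ["button", elements.buttons],
    ["input", elements.inputs],
    ["link", elements.links],
  ];
  for (const [kind, group] of groups) {
    for (const element of group) {
      index.set(element.agenticPurposeId, { ...element, kind });
    }
  }
  return index;
}
```

With such an index, an agent response that names an ID missing from the current page can be rejected early instead of being sent to executeActions.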
DOM Analysis
import { getInteractiveContext } from '@agentic-intelligence/dom-engine';
// Analyze entire page
const domData = getInteractiveContext({ injectTrackers: true });
console.log('Buttons found:', domData.interactiveElements.buttons);
console.log('Inputs found:', domData.interactiveElements.inputs);
console.log('Links found:', domData.interactiveElements.links);
console.log('Total elements:', domData.interactiveElements.total);
Example Response
Here's what a typical response looks like:
const domData = getInteractiveContext({ injectTrackers: true });
// Example response structure:
{
  interactiveElements: {
    total: 5,
    buttons: [
      {
        text: "Submit",
        agenticPurposeId: "a1b2c3d4",
        className: "btn btn-primary",
        ...
      }
    ],
    inputs: [
      {
        text: "Placeholder: Enter your email | Name: email",
        agenticPurposeId: "e5f6g7h8",
        type: "email",
        className: "form-control",
        ...
      }
    ],
    links: [
      {
        text: "Text: Learn more | Title: Documentation",
        agenticPurposeId: "i9j0k1l2",
        href: "/docs",
        className: "nav-link",
        ...
      }
    ],
    ...
  },
  scrollInfo: {
    totalHeight: 2000,
    viewportHeight: 800,
    scrollTop: 0,
    verticalScrollPercentage: 0,
    remainingHeight: 1200,
    nextContentPixel: 800
  }
}
Scroll Management
import { getInteractiveContext, scrollToNewContent } from '@agentic-intelligence/dom-engine';
// Get scroll information (no parameters needed!)
const domData = getInteractiveContext();
console.log('Scroll percentage:', domData.scrollInfo.verticalScrollPercentage);
console.log('Remaining content:', domData.scrollInfo.remainingHeight);
// Scroll to new content (automatically handles scroll to top if no new content)
const result = scrollToNewContent();
console.log('Scrolled to:', result.scrolledTo);
Smart Scroll Behavior:
- If there's new content below: scrolls to the next unseen content
- If no new content available: scrolls back to the top (pixel 0)
- Always returns success: true with the scroll position
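The rules above can be pictured as a small pure function over the scroll metrics. This is a sketch of the described behavior, not the library's implementation: decideScrollTarget and both interfaces are hypothetical names, and the arithmetic is inferred from the example scrollInfo values shown earlier (nextContentPixel equal to scrollTop plus viewportHeight).

```typescript
// Assumed subset of scrollInfo, named after the fields in the example output.
interface ScrollMetrics {
  totalHeight: number;
  viewportHeight: number;
  scrollTop: number;
}

interface ScrollDecision {
  success: true;
  scrolledTo: number;
}

// Hypothetical sketch of the documented rule: advance to the first unseen
// pixel if any content remains below the viewport, otherwise return to 0.
function decideScrollTarget(metrics: ScrollMetrics): ScrollDecision {
  const bottomOfView = metrics.scrollTop + metrics.viewportHeight;
  const remainingHeight = Math.max(0, metrics.totalHeight - bottomOfView);
  // A real browser clamps the requested offset to its maximum scroll
  // position; this sketch reports the requested target unclamped.
  const scrolledTo = remainingHeight > 0 ? bottomOfView : 0;
  return { success: true, scrolledTo };
}
```

For the example page (totalHeight 2000, viewportHeight 800, scrollTop 0), this sketch yields a target of 800, which matches the nextContentPixel value in the example response.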
Action Execution
import { executeActions } from '@agentic-intelligence/dom-engine';
// Execute multiple actions
const actions = [
  {
    // Type into the email input (ID taken from the example response)
    agenticPurposeId: "e5f6g7h8",
    actionType: "type",
    value: "[email protected]"
  },
  {
    // Click the Submit button (ID taken from the example response)
    agenticPurposeId: "a1b2c3d4",
    actionType: "click"
  }
];
const result = executeActions(actions);
console.log('Results:', result.results);
Available Action Types:
- click: Click on buttons, links, or any clickable element
- type: Type text into inputs, textareas, or contentEditable elements
Human-like Interaction:
- Simulates realistic mouse events with coordinates
- Multiple fallback methods for reliable clicking
- Proper event sequences (mouseover, mousedown, mouseup, click)
- Keyboard events for activation
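To make the event sequence concrete, the sketch below plans the four mouse events listed above at the center of an element's bounding box. It is an illustration under assumptions, not the library's click implementation: planClickEvents, RectLike, and PlannedMouseEvent are hypothetical names, and targeting the center point is an assumption.

```typescript
// Minimal rectangle shape, compatible with what getBoundingClientRect() returns.
interface RectLike {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Event order as listed in this README.
const CLICK_SEQUENCE = ["mouseover", "mousedown", "mouseup", "click"] as const;

interface PlannedMouseEvent {
  type: (typeof CLICK_SEQUENCE)[number];
  clientX: number;
  clientY: number;
}

// Hypothetical helper: compute the element's center (an assumption) and
// return one planned event per step of the sequence, in order.
function planClickEvents(rect: RectLike): PlannedMouseEvent[] {
  const clientX = rect.left + rect.width / 2;
  const clientY = rect.top + rect.height / 2;
  return CLICK_SEQUENCE.map((type) => ({ type, clientX, clientY }));
}
```

In a browser, each planned entry could be turned into `new MouseEvent(type, { bubbles: true, clientX, clientY })` and sent with `element.dispatchEvent(...)`.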
Using with Puppeteer & Headless Browsers
Building the Bundle
Before using with Puppeteer, create a browser-compatible bundle:
// bundle-dom-engine.js
const esbuild = require('esbuild');
async function bundleDomEngine() {
  try {
    await esbuild.build({
      entryPoints: [require.resolve('@agentic-intelligence/dom-engine')],
      bundle: true,
      format: 'iife',
      globalName: 'DomEngine',
      platform: 'browser',
      target: 'es2015',
      outfile: './dom-engine-bundle.js',
      minify: false
    });
    console.log('✅ Bundle created successfully');
  } catch (error) {
    console.error('❌ Error creating bundle:', error);
  }
}
bundleDomEngine();
Injecting and Using with Puppeteer
const puppeteer = require('puppeteer');
const path = require('path');
// Wrapped in an async function because CommonJS modules do not support top-level await
async function main() {
  // Launch browser and navigate
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://example.com', { waitUntil: 'networkidle2' });
  // Inject the bundle
  const bundlePath = path.join(__dirname, 'dom-engine-bundle.js');
  await page.addScriptTag({ path: bundlePath });
  // Use dom-engine
  const domData = await page.evaluate(() => {
    return DomEngine.getInteractiveContext({
      injectTrackers: true
    });
  });
  console.log('Found buttons:', domData.interactiveElements.buttons);
  // Click element by agenticPurposeId
  const button = domData.interactiveElements.buttons[0];
  await page.evaluate((purposeId) => {
    const element = document.querySelector(`[data-agentic-purpose-id="${purposeId}"]`);
    element?.click();
  }, button.agenticPurposeId);
  await browser.close();
}
main().catch(console.error);
Project Structure
src/
├── core/
│   └── dom-engine.ts          # Main DOM analysis engine
├── read/
│   ├── element-analyzer.ts    # Element text extraction and analysis
│   └── interactive-finder.ts  # Interactive element detection
├── actions/
│   ├── executor.ts            # Action coordination and execution
│   ├── click.ts               # Click action implementation
│   ├── type.ts                # Type action implementation
│   └── scroll.ts              # Scroll action implementation
├── utils/
│   └── helpers.ts             # Utility functions
├── types.ts                   # TypeScript type definitions
└── index.ts                   # Public API exports
Use Case Example
🤖 AI Agents & Automation
// AI agent workflow
const domData = getInteractiveContext({ injectTrackers: true });
const actions = aiAgent.decideActions(domData.interactiveElements);
const result = executeActions(actions);
🧪 E2E Testing
// Automated testing
const domData = getInteractiveContext({ injectTrackers: true });
const actions = [
  { agenticPurposeId: "email-input", actionType: "type", value: "[email protected]" },
  { agenticPurposeId: "submit-btn", actionType: "click" }
];
const result = executeActions(actions);
assert(result.results.every(r => r.success));
API Reference
Core Functions
getInteractiveContext(options?)
Analyzes the DOM and returns interactive elements with scroll information.
Parameters:
- options.injectTrackers?: boolean - Inject unique IDs for action tracking
- options.context?: DOMContext - Custom DOM context for extensions/iframes
Returns: DOMExtractionResult
executeActions(actions, context?)
Executes multiple actions on DOM elements.
Parameters:
- actions: Action[] - Array of actions to execute
- context?: DOMContext - Custom DOM context
Returns: ActionsResult
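Since actions usually come from an agent, it may be worth checking them before calling executeActions. The sketch below is one possible pre-flight check, not part of the library: ActionSketch mirrors the action objects used in this README, findInvalidActions is a hypothetical helper, and the rule that "type" needs a value is an assumption.

```typescript
// Assumed action shape, mirroring the action objects used in this README.
type ActionTypeSketch = "click" | "type";

interface ActionSketch {
  agenticPurposeId: string;
  actionType: ActionTypeSketch;
  value?: string;
}

// Hypothetical pre-flight check: returns one message per problem found,
// or an empty array when every action looks executable.
function findInvalidActions(actions: ActionSketch[], knownIds: ReadonlySet<string>): string[] {
  const problems: string[] = [];
  actions.forEach((action, position) => {
    if (!knownIds.has(action.agenticPurposeId)) {
      problems.push(`action ${position}: unknown agenticPurposeId ${action.agenticPurposeId}`);
    }
    // Assumption: a "type" action is meaningless without text to enter.
    if (action.actionType === "type" && action.value === undefined) {
      problems.push(`action ${position}: type action has no value`);
    }
  });
  return problems;
}
```

The set of known IDs would typically be collected from the buttons, inputs, and links returned by the most recent getInteractiveContext call.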
scrollToNewContent(context?)
Scrolls to new content or returns to top if no new content available.
Parameters:
- context?: DOMContext - Custom DOM context
Returns: ScrollResult
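The three functions above can be combined into an observe, act, scroll loop. The sketch below shows one way to structure it, with the engine passed in as an object so the loop can be exercised without a browser. EngineLike and runAgentLoop are hypothetical names; only the results[].success and scrolledTo fields are taken from the examples in this README, and stopping once the scroll wraps to pixel 0 is an assumption.

```typescript
// Hypothetical interface covering only the parts of the API used by the loop.
interface EngineLike<Ctx, Act> {
  getInteractiveContext(options?: { injectTrackers?: boolean }): Ctx;
  executeActions(actions: Act[]): { results: Array<{ success: boolean }> };
  scrollToNewContent(): { success: boolean; scrolledTo: number };
}

interface LoopSummary {
  steps: number;
  executed: number;
  failed: number;
}

// Sketch of an agent loop: analyze the page, let `decide` pick actions, run
// them, and scroll when there is nothing to do. Stops after maxSteps, or when
// scrolling wraps back to the top (assumed to mean the page was fully scanned).
function runAgentLoop<Ctx, Act>(
  engine: EngineLike<Ctx, Act>,
  decide: (context: Ctx) => Act[],
  maxSteps: number
): LoopSummary {
  let steps = 0;
  let executed = 0;
  let failed = 0;
  while (steps < maxSteps) {
    steps += 1;
    const context = engine.getInteractiveContext({ injectTrackers: true });
    const actions = decide(context);
    if (actions.length > 0) {
      const { results } = engine.executeActions(actions);
      executed += results.length;
      failed += results.filter((r) => !r.success).length;
      continue;
    }
    const scroll = engine.scrollToNewContent();
    if (scroll.scrolledTo === 0) break;
  }
  return { steps, executed, failed };
}
```

In real use, the engine argument could be an object holding the three functions imported from the package, and decide would wrap the call to the agent or model.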
Roadmap
- ✅ Smart Element Analysis: Automatically detects interactive elements (buttons, inputs, links)
- ✅ Advanced Categorization: Classifies elements by type and functionality
- ✅ Smart Scroll Management: Intelligent scroll control with automatic top return
- ✅ Visibility Filtering: Only processes actually visible elements
- ✅ Zero Dependencies: Pure JavaScript, no external libraries
- ✅ Cross-Platform: Works in modern browsers and Node.js
- ✅ Custom DOM Context: Support for analyzing different document contexts (extensions, iframes)
- ✅ Element Tracking: Inject unique IDs for agent tracking and interaction
- 🔲 Interaction History: Track and maintain history of interacted elements
- 🔲 Iframe Processing: Support for analyzing and interacting with iframe content
Contributing
Contributions are welcome! Please:
- Fork the project
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
Made with ❤️ by Luis Chapa Morin
