@isoldex/sentinel
v4.1.0
AI-powered browser automation with Gemini – fast, cheap Stagehand alternative
Sentinel is an AI-powered browser automation framework built on Playwright and designed around the principle that web automation should be expressed in plain language, not CSS selectors or XPaths.
Describe what you want to do. Sentinel figures out how.
The fastest, cheapest alternative to Stagehand and BrowserUse — ~40× lower cost (Gemini Flash vs. GPT-4o), multi-LLM support, vision grounding, autonomous agent loop, and self-healing locators built in.
Table of Contents
- Why Sentinel over Stagehand?
- Features
- Installation
- Quick Start
- Configuration
- API Reference
- LLM Providers
- Architecture
- Error Handling
- Self-Healing Locators
- Intelligent Error Messages
- Prompt Cache
- Proxy Providers
- Examples
Why Sentinel over Stagehand?
Real-World Benchmark (April 2026)
Same model (Gemini Flash), same instructions, same machine. Default settings for both.
| Website | Sentinel | Stagehand v3.2 | Winner |
|---|---|---|---|
| Amazon (search + extract) | 3 steps, $0.003 | 6 actions, ~$0.03 | Sentinel (2x fewer steps, 10x cheaper) |
| npmjs (search + click + extract) | 5 steps, $0.003 | 10 actions, ~$0.03 | Sentinel (2x fewer steps) |
| Booking.com (search + extract hotels) | 6 steps, $0.003 | 13 actions, failed | Sentinel (Stagehand can't extract) |
| Wikipedia DE (search + extract article) | 4 steps, $0.003 | timeout/failed | Sentinel (Stagehand crashes) |
| Durchblicker.at (complex multi-step form) | 17 steps, $0.018 | 26 actions, failed | Sentinel (Stagehand gives up) |
Why the difference? Stagehand sends the entire accessibility tree to the LLM (29,000-51,000 tokens per action). Sentinel filters to the 50 most relevant elements (2,000-5,000 tokens). Result: 6-10x fewer tokens, faster responses, better decisions.
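The README does not spell out how the "most relevant" elements are chosen. As a rough illustration only, here is a minimal sketch of keyword-relevance filtering, assuming a hypothetical UiElement shape and a simple keyword-overlap score. Neither is taken from Sentinel's source.

```typescript
// Hypothetical element shape; Sentinel's internal representation may differ.
interface UiElement {
  role: string;
  name: string;
}

// Lowercase a string and split it into alphanumeric keywords.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Score = number of distinct instruction keywords that appear in the element's role or name.
function relevance(instruction: string, el: UiElement): number {
  const keywords = tokenize(instruction);
  const words = tokenize(`${el.role} ${el.name}`);
  return keywords.filter((k, i) => keywords.indexOf(k) === i && words.indexOf(k) !== -1).length;
}

// Keep the `max` highest-scoring elements; ties keep their original page order.
function filterElements(instruction: string, elements: UiElement[], max = 50): UiElement[] {
  if (elements.length <= max) return elements;
  return elements
    .map((el, index) => ({ el, index, score: relevance(instruction, el) }))
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .slice(0, max)
    .map((entry) => entry.el);
}
```

Pages with fewer elements than the limit pass through unchanged, so the filter only costs anything on large pages.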
Feature Comparison
| | Sentinel | Stagehand | BrowserUse |
|---|---|---|---|
| Default model | Gemini 3 Flash | GPT-4o | GPT-4o / Claude |
| Cost per run | ~$0.002 | ~$0.03-0.08 | ~$0.05 |
| Tokens per action | 2-5k | 29-51k | ~10-20k |
| Separate planner model | ✅ (plannerModel option) | ❌ | ❌ |
| Declarative form filling | ✅ (fillForm(json)) | ❌ | ❌ |
| Network interception | ✅ (intercept()) | ❌ | ❌ |
| TOTP/MFA automation | ✅ (mfa: { secret }) | ❌ | ❌ |
| Cookie/overlay auto-recovery | ✅ (proactive) | partial | ❌ |
| Click-target verification | ✅ | ❌ | ❌ |
| Validation error detection | ✅ | ❌ | ❌ |
| Form field intelligence | ✅ (filled/unfilled status) | ❌ | ❌ |
| Self-healing locators | ✅ | ❌ | ❌ |
| Custom LLM provider | ✅ (OpenAI, Claude, Gemini, Ollama) | Partial (adapter-based) | OpenAI, Claude |
| Parallel execution | ✅ | ❌ | ❌ |
| CLI | ✅ | ❌ | ❌ |
| MCP server | ✅ | ❌ | ❌ |
| Playwright Test fixture | ✅ | ❌ | ❌ |
| Selector export | ✅ | ❌ | ❌ |
| Detection mode | aom / hybrid / vision | dom / hybrid | vision |
| Language | TypeScript | TypeScript | Python |
| Open source | ✅ MIT | ✅ MIT | ✅ MIT |
Features
| Feature | Description |
|---|---|
| Natural Language Actions | act('Click the login button') — no selectors needed |
| Structured Extraction | Zod-typed extract() with full TypeScript inference |
| Network Interception | intercept(pattern, trigger) — capture raw API data instead of scraping DOM |
| Declarative Form Filling | fillForm({ name: 'Max', email: '[email protected]' }) — one JSON, all fields |
| TOTP/MFA Automation | mfa: { type: 'totp', secret: '...' } — auto-generate 2FA codes during login |
| Autonomous Agent Loop | run(goal) — Plan, Execute, Verify, Reflect cycle |
| Separate Planner Model | plannerModel: 'gemini-3.1-pro' — smart model for planning, cheap model for execution |
| Detection Modes | mode: 'aom' \| 'hybrid' \| 'vision' — choose speed vs. reliability |
| Click-Target Verification | Verifies the element at click coordinates matches the intended target |
| Validation Error Detection | Reads form error messages (aria-invalid, role=alert) and shows them to the planner |
| Form Intelligence | Separates form fields from buttons, tracks filled/unfilled status |
| Widget Detection | 9 patterns: custom dropdowns, datepickers, sliders, CSS-library components |
| Cookie/Overlay Auto-Recovery | Proactively dismisses cookie banners before each step |
| Self-Healing Locators | Cache successful element lookups — skip the LLM on repeated calls |
| Vision Grounding | Vision-model fallback for canvas, shadow DOM, and custom components |
| Multi-LLM Support | OpenAI, Claude, Gemini, Ollama — swap providers with one line |
| Parallel Execution | Sentinel.parallel(tasks, { concurrency: 5 }) for batch automation |
| Stealth and Proxy | Bezier mouse movements, human-like typing delays, proxy rotation |
| Session Persistence | Save and restore cookies and localStorage for authenticated flows |
| Selector Export | AI-generated stable CSS selectors for Playwright test integration |
| CLI | npx @isoldex/sentinel run/act/extract/screenshot (e.g. npx sentinel run "goal" --url https://...), no code required |
| MCP Server | Expose all browser tools to Claude Desktop, Cursor, Windsurf, or any MCP-compatible AI assistant |
| Playwright Test Fixture | import { test } from '@isoldex/sentinel/test': the ai fixture drops into existing suites |
| Intelligent Errors | Failure messages include which paths were tried and an actionable fix tip |
Installation
npm install @isoldex/sentinel playwright
npx playwright install chromium
Playwright is a peer dependency. Install it alongside Sentinel. The playwright install chromium step downloads the browser binary.
Requirements: Node.js 18+
Quick Start
import { Sentinel, z } from '@isoldex/sentinel';
// With Gemini (built-in, no extra package needed):
const sentinel = new Sentinel({ apiKey: process.env.GEMINI_API_KEY! });
// Or with any other provider — see LLM Providers section:
// const sentinel = new Sentinel({ apiKey: '', provider: new OpenAIProvider({ apiKey: '...' }) });
await sentinel.init();
await sentinel.goto('https://news.ycombinator.com');
// Extract structured data
const data = await sentinel.extract('Get the top 3 stories', z.object({
  stories: z.array(z.object({
    title: z.string(),
    points: z.number(),
  }))
}));
console.log(data.stories);
// Natural language actions
await sentinel.act('Click on the "new" link in the header');
await sentinel.act('Fill "[email protected]" into the email field');
await sentinel.close();
Sentinel works with any supported LLM provider. The built-in provider uses Gemini and requires a GEMINI_API_KEY. For other providers (OpenAI, Claude, Ollama), pass a provider option and set apiKey to an empty string; no .env file is required.
# Only needed when using the built-in Gemini provider:
GEMINI_API_KEY=your_api_key_here
GEMINI_VERSION=gemini-3-flash-preview # optional, defaults to gemini-3-flash-preview
Configuration
new Sentinel(options: SentinelOptions)
| Option | Type | Default | Description |
|---|---|---|---|
| apiKey | string | — | API key for the built-in Gemini provider. Pass '' when using a custom provider. |
| headless | boolean | false | Run browser in headless mode |
| browser | 'chromium' \| 'firefox' \| 'webkit' | 'chromium' | Browser engine. Note: CDP/AOM is Chromium-only; Firefox and WebKit fall back to DOM parsing. |
| viewport | { width: number; height: number } | 1280x720 | Viewport dimensions |
| verbose | 0 \| 1 \| 2 \| 3 | 1 | Log verbosity: 0 = silent, 1 = key actions, 2 = full debug, 3 = LLM decision JSON + chunk stats |
| enableCaching | boolean | true | Cache AOM state between calls (500ms TTL). Set to false for always-fresh state. |
| visionFallback | boolean | false | Enable Vision fallback when AOM cannot locate an element. Uses the configured provider's analyzeImage — works with Gemini, OpenAI, Claude, and Ollama vision models. |
| provider | LLMProvider | GeminiService | Custom LLM provider (see LLM Providers) |
| sessionPath | string | — | Path to a session file. If the file exists, it is loaded on init(). Saves cookies and localStorage only. |
| userDataDir | string | — | Path to a persistent browser profile directory. Persists cookies, localStorage, IndexedDB, and ServiceWorkers. Required for services that use IndexedDB for auth (e.g. WhatsApp Web). Takes precedence over sessionPath. |
| proxy | ProxyOptions \| IProxyProvider | — | Static proxy config or a dynamic proxy provider (see Proxy Providers) |
| humanLike | boolean | false | Human-like mouse movement via cubic Bézier curves, pre-click pauses (80–200ms), and per-keystroke delays (30–80ms) |
| domSettleTimeoutMs | number | 3000 | Maximum time (ms) to wait for the DOM to settle after an action |
| locatorCache | boolean \| string | false | Cache successful element lookups. true = in-memory, 'file.json' = file-persisted. Skips LLM on repeated calls. |
| promptCache | boolean \| string | false | Cache LLM responses by prompt hash. true = in-memory (200 entries, LRU), 'file.json' = file-persisted |
| maxElements | number | 50 | Max interactive elements sent to the LLM per act() call. Filters by keyword relevance when the page has more. |
| mode | 'aom' \| 'hybrid' \| 'vision' | 'aom' | Element detection mode. aom = fast/cheap, hybrid = AOM + vision fallback, vision = screenshot-based (CUA-style) |
| plannerModel | string | — | Gemini model name for the agent planner (e.g. 'gemini-3.1-pro-preview'). Uses a stronger model for planning decisions while keeping a cheap model for execution. |
| plannerProvider | LLMProvider | — | Custom LLM provider for the planner (overrides plannerModel). |
| mfa | { type: 'totp', secret: string } | — | TOTP/MFA configuration. When set, the agent auto-generates 2FA codes during login flows. The secret is the base32 key from your authenticator app. |
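For background on the mfa option: authenticator apps derive their codes from the base32 secret using the TOTP algorithm of RFC 6238. The sketch below shows that standard algorithm (HMAC-SHA1, 30-second steps) using Node's crypto module. The function names are illustrative, and this is not a claim about how Sentinel generates codes internally.

```typescript
import { createHmac } from "node:crypto";

// Decode an RFC 4648 base32 string (the format authenticator apps display).
function base32Decode(input: string): Uint8Array {
  const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
  const clean = input.toUpperCase().replace(/[\s=]/g, "");
  const bytes: number[] = [];
  let bits = 0;
  let value = 0;
  for (let i = 0; i < clean.length; i++) {
    const index = alphabet.indexOf(clean.charAt(i));
    if (index === -1) throw new Error(`Invalid base32 character: ${clean.charAt(i)}`);
    value = (value << 5) | index;
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((value >>> bits) & 0xff);
      value &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return Uint8Array.from(bytes);
}

// RFC 6238 TOTP: HMAC-SHA1 over the big-endian 8-byte time-step counter,
// followed by the dynamic truncation defined in RFC 4226.
function totp(secretBase32: string, unixSeconds: number, digits = 6, stepSeconds = 30): string {
  let counter = Math.floor(unixSeconds / stepSeconds);
  const message = new Uint8Array(8);
  for (let i = 7; i >= 0; i--) {
    message[i] = counter % 256;
    counter = Math.floor(counter / 256);
  }
  const hmac = createHmac("sha1", base32Decode(secretBase32)).update(message).digest();
  const offset = hmac[hmac.length - 1] & 0x0f;
  const binary =
    ((hmac[offset] & 0x7f) << 24) |
    (hmac[offset + 1] << 16) |
    (hmac[offset + 2] << 8) |
    hmac[offset + 3];
  return (binary % Math.pow(10, digits)).toString().padStart(digits, "0");
}
```

A call such as totp(secret, Date.now() / 1000) yields the code that is valid for the current 30-second window.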
ProxyOptions
| Field | Type | Description |
|---|---|---|
| server | string | Proxy server URL, e.g. http://proxy.example.com:8080 |
| username | string | Optional proxy username |
| password | string | Optional proxy password |
API Reference
Core Actions
sentinel.init(): Promise<void>
Initialize the browser and all internal engines. Must be called before any other method.
await sentinel.init();
sentinel.goto(url: string): Promise<void>
Navigate to a URL and wait for the DOM to settle.
await sentinel.goto('https://example.com');
sentinel.close(): Promise<void>
Close the browser and release all resources.
await sentinel.close();
sentinel.act(instruction, options?): Promise<ActionResult>
Perform a natural language action on the current page. After every action, Sentinel runs semantic verification and retries automatically on weak confidence.
await sentinel.act('Click the "Add to Cart" button');
await sentinel.act('Fill "[email protected]" into the email field');
await sentinel.act('Select "Germany" from the country dropdown');
await sentinel.act('Press Enter');
await sentinel.act('Scroll down');
await sentinel.act('Double-click the product image');
Variable interpolation is supported:
await sentinel.act('Fill %email% into the email field', {
  variables: { email: '[email protected]' }
});
Supported action types: click, fill, append, hover, press, select, double-click, right-click, scroll-down, scroll-up, scroll-to
The append action adds text to the end of an input field without clearing its existing content:
await sentinel.act('Append " (urgent)" to the subject line');
ActOptions
| Field | Type | Description |
|---|---|---|
| variables | Record<string, string> | Values to interpolate into the instruction string |
| retries | number | Override the default retry count (default: 2) |
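To make the variables option concrete, here is a minimal sketch of %name% interpolation. The function name is hypothetical, and leaving unknown placeholders untouched is an assumption; the README does not say how Sentinel treats missing variables.

```typescript
// Replace each %name% placeholder with its value from `variables`.
// Placeholders with no matching key are left as-is (an assumption, see above).
function interpolate(instruction: string, variables: Record<string, string>): string {
  return instruction.replace(/%(\w+)%/g, (match: string, name: string) => {
    const value = Object.prototype.hasOwnProperty.call(variables, name)
      ? variables[name]
      : undefined;
    return value !== undefined ? value : match;
  });
}
```

Keeping secrets in variables rather than in the instruction string also means the instruction itself can be logged or cached without exposing them, provided the interpolated form is not what gets stored.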
ActionResult
| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the action was successfully verified |
| message | string | Human-readable outcome description |
| action | string (optional) | The resolved action that was executed |
Data Extraction
sentinel.extract<T>(instruction, schema): Promise<T>
Extract structured data from the current page. The schema can be a Zod schema or a raw JSON Schema object. TypeScript generics are inferred automatically from the schema.
import { Sentinel, z } from '@isoldex/sentinel';
const result = await sentinel.extract(
  'Get all product names and prices',
  z.object({
    products: z.array(z.object({
      name: z.string(),
      price: z.number(),
    }))
  })
);
// result.products is typed as { name: string; price: number }[]
Observation
sentinel.observe(instruction?): Promise<ObserveResult[]>
Return a list of interactive elements visible on the current page, optionally filtered by a natural language hint.
const elements = await sentinel.observe();
const loginElements = await sentinel.observe('Find login-related elements');
Autonomous Agent
sentinel.run(goal, options?): Promise<AgentResult>
Run a fully autonomous multi-step agent to achieve a high-level goal. The agent operates in a Plan → Execute → Verify → Reflect loop until the goal is met, the step limit is reached, or an abort condition triggers.
const result = await sentinel.run(
  'Go to Amazon, search for "mechanical keyboard under 100 euros", and extract the top 5 results',
  {
    maxSteps: 20,
    onStep: (event) => {
      console.log(`Step ${event.stepNumber}: ${event.instruction}`);
      console.log(`  Reasoning: ${event.reasoning}`);
    },
  }
);
console.log(result.success);      // boolean
console.log(result.goalAchieved); // boolean
console.log(result.totalSteps);   // number
console.log(result.message);      // human-readable summary
console.log(result.history);      // AgentStepEvent[]
console.log(result.data);         // structured data extracted during the run (if any)
AgentRunOptions
| Field | Type | Default | Description |
|---|---|---|---|
| maxSteps | number | 15 | Maximum number of steps before aborting |
| onStep | (event: AgentStepEvent) => void | — | Callback invoked after each step |
AgentResult
| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the agent considers the goal achieved |
| goalAchieved | boolean | Result of the final LLM reflection check |
| totalSteps | number | Number of steps executed |
| message | string | Human-readable outcome |
| history | AgentStepEvent[] | Full step-by-step history |
| data | any (optional) | Structured data extracted by an extract step during the run |
The agent automatically aborts if the same instruction repeats three times without progress (loop detection) or if three consecutive steps fail.
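The two abort rules can be sketched as a check over the recent step history. This is an illustration under assumptions: the StepOutcome shape and checkAbort name are hypothetical, and "without progress" is approximated here as the same instruction appearing three times in a row.

```typescript
// Hypothetical per-step record; the real AgentStepEvent carries more fields.
interface StepOutcome {
  instruction: string;
  success: boolean;
}

type AbortReason = "consecutive-failures" | "loop-detected" | null;

// Inspect the last `limit` steps and report why the agent should stop, if at all.
function checkAbort(history: StepOutcome[], limit = 3): AbortReason {
  if (history.length < limit) return null;
  const recent = history.slice(-limit);
  // Rule 1: every one of the last `limit` steps failed.
  if (recent.every((step) => !step.success)) return "consecutive-failures";
  // Rule 2: the same instruction was issued `limit` times in a row.
  const firstInstruction = recent[0].instruction;
  if (recent.every((step) => step.instruction === firstInstruction)) return "loop-detected";
  return null;
}
```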
AgentResult.selectors is also populated after each run — a camelCase map of instruction slugs to the most stable CSS selector found for that element. Copy them directly into Playwright tests.
sentinel.fillForm(data, options?): Promise<AgentResult>
Fill a form declaratively with a JSON object. Sentinel maps keys to form fields automatically via LLM — no step-by-step instructions needed. Works in any language.
await sentinel.goto('https://www.durchblicker.at/autoversicherung');
await sentinel.fillForm({
  brand: 'BMW',
  model: '4er',
  year: 2020,
  fuel: 'Benzin',
  postalCode: '1010',
});
// Sentinel maps: brand → Marke, model → Modell, year → Baujahr, etc.
// Fills all fields top-to-bottom, then clicks submit.
sentinel.intercept(urlPattern, trigger): Promise<T[]>
Capture raw API responses during an action. Extracts structured JSON data directly from network traffic instead of scraping the DOM — more reliable, complete, and fast.
const hotels = await sentinel.intercept('graphql', async () => {
  await sentinel.act('Click the search button');
});
// hotels = [{ data: { searchResults: [{ name: "Hotel A", price: 89 }, ...] } }]
sentinel.runStream(goal, options?): AsyncGenerator<AgentStepEvent | AgentResult>
Streams agent steps in real time. Yields one AgentStepEvent per step, then the final AgentResult. Designed for Server-Sent Events in Next.js App Router routes or any for await consumer.
// Next.js API Route (App Router)
import { Sentinel } from '@isoldex/sentinel';

export async function GET() {
  const sentinel = new Sentinel({ apiKey: process.env.GEMINI_API_KEY! });
  await sentinel.init();
  await sentinel.goto('https://example.com');
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      try {
        for await (const event of sentinel.runStream('Find the checkout button')) {
          // Response bodies expect bytes, so encode each SSE frame before enqueueing
          controller.enqueue(encoder.encode(`data: ${JSON.stringify(event)}\n\n`));
        }
        controller.close();
      } finally {
        // Release the browser even if the agent or the client connection fails
        await sentinel.close();
      }
    },
  });
  return new Response(stream, { headers: { 'Content-Type': 'text/event-stream' } });
}
Sentinel.parallel(tasks, options?): Promise<ParallelResult[]>
Run multiple independent agent tasks in parallel, each in its own browser session. A worker pool limits simultaneous sessions to concurrency (default: 3). Results are returned in input order. One task failing never affects others.
const results = await Sentinel.parallel(
  [
    { goal: 'Extract top 5 products from amazon.de/s?k=laptop', url: 'https://amazon.de/s?k=laptop' },
    { goal: 'Extract top 5 products from amazon.de/s?k=phone', url: 'https://amazon.de/s?k=phone' },
    { goal: 'Get homepage headline', url: 'https://news.ycombinator.com' },
  ],
  {
    concurrency: 3,
    sentinelOptions: { apiKey: process.env.GEMINI_API_KEY!, headless: true },
    onProgress: (completed, total, result) => {
      console.log(`${completed}/${total} done - ${result.success ? 'ok' : 'failed'}`);
    },
  }
);
ParallelOptions
| Field | Type | Default | Description |
|---|---|---|---|
| concurrency | number | 3 | Max simultaneous browser sessions |
| sentinelOptions | SentinelOptions | — | Options applied to every session |
| onProgress | (completed, total, result) => void | — | Fires after each task completes |
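The worker-pool behaviour described for Sentinel.parallel (bounded concurrency, results in input order, isolated failures) can be sketched generically. The runPool helper and Settled type below are hypothetical and do not open browsers; they only illustrate the scheduling pattern.

```typescript
// Outcome of one task: either a value or the error it threw.
type Settled<R> = { ok: true; value: R } | { ok: false; error: unknown };

// Run `worker` over `tasks` with at most `concurrency` tasks in flight.
// Results are stored by input index, and a failing task never rejects the pool.
async function runPool<T, R>(
  tasks: T[],
  worker: (task: T, index: number) => Promise<R>,
  concurrency = 3
): Promise<Settled<R>[]> {
  const results = new Array<Settled<R>>(tasks.length);
  let next = 0; // index of the next unclaimed task

  // Each lane claims tasks one at a time until none are left.
  async function lane(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await worker(tasks[index] as T, index) };
      } catch (error) {
        results[index] = { ok: false, error };
      }
    }
  }

  const laneCount = Math.max(1, Math.min(concurrency, tasks.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

Because each lane pulls the next index only after finishing its current task, a slow task delays one lane rather than the whole batch.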
Tab Management
// Open a new tab and optionally navigate to a URL
const tabIndex = await sentinel.newTab('https://google.com');
// Switch the active tab
await sentinel.switchTab(0);
await sentinel.switchTab(tabIndex);
// Close a tab by index
await sentinel.closeTab(tabIndex);
// Number of currently open tabs
console.log(sentinel.tabCount);
Note: When using Firefox or WebKit, CDP is not available. AOM-based state parsing falls back to DOM on those browsers.
Session Persistence
Save and restore authenticated sessions across runs — cookies and localStorage included.
// First run: log in once and save the session
await sentinel.goto('https://github.com/login');
await sentinel.act('Fill "myuser" into the username field');
await sentinel.act('Fill "mypassword" into the password field');
await sentinel.act('Click the sign in button');
await sentinel.saveSession('./sessions/github.json');
// Subsequent runs: load the saved session and skip the login page
const sentinel = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY!,
  sessionPath: './sessions/github.json', // loaded automatically on init()
});
await sentinel.init();
await sentinel.goto('https://github.com'); // already authenticated
sentinel.saveSession(filePath: string): Promise<void>
Writes Playwright storageState (cookies + localStorage) to a JSON file.
sentinel.hasLoginForm(): Promise<boolean>
Returns true if the current page contains a password input field.
Record and Replay
Capture any automation session as a replayable workflow.
// Start recording
sentinel.startRecording('checkout-flow');
await sentinel.goto('https://shop.example.com');
await sentinel.act('Click the login button');
await sentinel.act('Fill "[email protected]" into the email field');
await sentinel.act('Click Add to Cart');
// Stop and get the workflow object
const workflow = sentinel.stopRecording();
// Export as TypeScript source code
const code = sentinel.exportWorkflowAsCode(workflow);
console.log(code);
// Export as JSON
const json = sentinel.exportWorkflowAsJSON(workflow);
// Replay the workflow
await sentinel.replay(workflow);
Vision
sentinel.screenshot(): Promise<Buffer>
Take a PNG screenshot of the current viewport. Returns a Buffer.
const png = await sentinel.screenshot();
sentinel.describeScreen(): Promise<string>
Uses the configured provider's vision capability to produce a natural language description of the current page. Requires visionFallback: true in SentinelOptions.
const sentinel = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY!,
  visionFallback: true,
});
const description = await sentinel.describeScreen();
console.log(description);
Vision Grounding also activates automatically inside act() when the AOM state parser cannot locate the target element; no additional code is needed.
Observability
Events
Sentinel extends Node.js EventEmitter. The following events are emitted:
sentinel.on('action', (event) => {
  console.log('Action:', event.instruction, event.result);
});
sentinel.on('navigate', (event) => {
  console.log('Navigated to:', event.url);
});
sentinel.on('close', () => {
  console.log('Browser closed');
});
Direct Page Access
// Raw Playwright Page and BrowserContext objects
const page = sentinel.page;
const context = sentinel.context;
Token Tracking
const usage = sentinel.getTokenUsage();
console.log(usage);
// {
// totalInputTokens: 9800,
// totalOutputTokens: 2600,
// totalTokens: 12400,
// estimatedCostUsd: 0.00093,
// entries: [...]
// }
// Export full log as JSON to a file
sentinel.exportLogs('./logs/session.json');
LLM Providers
Sentinel supports four LLM providers out of the box. Pass the provider via the provider option. The built-in shortcut (apiKey without an explicit provider) uses Gemini.
OpenAI
Requires: npm install openai
import { Sentinel, OpenAIProvider } from '@isoldex/sentinel';
const sentinel = new Sentinel({
  apiKey: '',
  provider: new OpenAIProvider({
    apiKey: process.env.OPENAI_API_KEY!,
    model: 'gpt-4o',
  }),
});
Claude
Requires: npm install @anthropic-ai/sdk
import { Sentinel, ClaudeProvider } from '@isoldex/sentinel';
const sentinel = new Sentinel({
  apiKey: '',
  provider: new ClaudeProvider({
    apiKey: process.env.ANTHROPIC_API_KEY!,
    model: 'claude-sonnet-4-6',
  }),
});
Gemini
Built-in — no extra package needed.
import { Sentinel, GeminiProvider } from '@isoldex/sentinel';
const sentinel = new Sentinel({
  apiKey: '',
  provider: new GeminiProvider({
    apiKey: process.env.GEMINI_API_KEY!,
    model: 'gemini-3-flash-preview', // or set GEMINI_VERSION in .env
  }),
});
Shorthand (uses Gemini implicitly):
const sentinel = new Sentinel({ apiKey: process.env.GEMINI_API_KEY! });
Ollama (local)
Requires a running Ollama instance. No additional npm package needed.
import { Sentinel, OllamaProvider } from '@isoldex/sentinel';
const sentinel = new Sentinel({
  apiKey: '',
  provider: new OllamaProvider({
    model: 'llama3.2',
    baseURL: 'http://localhost:11434', // default
  }),
});
Provider Comparison
| Provider | Class | Default Model | Peer Dependency | Notes |
|---|---|---|---|---|
| OpenAI | OpenAIProvider | gpt-4o | npm install openai | Supports any OpenAI-compatible API via baseURL |
| Claude | ClaudeProvider | claude-sonnet-4-6 | npm install @anthropic-ai/sdk | — |
| Gemini | GeminiProvider | gemini-3-flash-preview | none (bundled) | Set GEMINI_VERSION env var to override model |
| Ollama | OllamaProvider | — (required) | none | Runs locally; no API key needed |
All providers implement automatic retry with exponential backoff on rate limit errors (HTTP 429/503), connection resets, and timeouts (up to 3 attempts).
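Retry with exponential backoff can be sketched as follows. The 500 ms base delay, the doubling factor, and the helper names are assumptions, since the README only states the attempt limit and the retryable conditions. The sketch checks HTTP status codes only and leaves out connection resets and timeouts.

```typescript
// Treat errors carrying an HTTP status of 429 or 503 as retryable.
function isRetryable(error: unknown): boolean {
  const status = (error as { status?: number } | null)?.status;
  return status === 429 || status === 503;
}

// Delay before the retry that follows attempt N: base, 2x base, 4x base, ...
function backoffDelayMs(attempt: number, baseMs = 500): number {
  return baseMs * Math.pow(2, attempt - 1);
}

// Call `call` up to `maxAttempts` times, sleeping between retryable failures.
async function withRetry<T>(call: () => Promise<T>, maxAttempts = 3, baseMs = 500): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (error) {
      // Give up on the last attempt or on errors that retrying cannot fix.
      if (attempt >= maxAttempts || !isRetryable(error)) throw error;
      await new Promise<void>((resolve) => setTimeout(resolve, backoffDelayMs(attempt, baseMs)));
    }
  }
}
```

Production implementations often add random jitter to the delay so that many clients do not retry in lockstep.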
Custom Provider
Implement the LLMProvider interface to integrate any LLM:
import { Sentinel } from '@isoldex/sentinel';
import type { LLMProvider, SchemaInput } from '@isoldex/sentinel';
class MyProvider implements LLMProvider {
  async generateStructuredData<T>(prompt: string, schema: SchemaInput<T>): Promise<T> {
    // call your API and return parsed, typed data
    throw new Error('generateStructuredData is not implemented yet');
  }
  async generateText(prompt: string, systemInstruction?: string): Promise<string> {
    // call your API and return plain text
    throw new Error('generateText is not implemented yet');
  }
}
const sentinel = new Sentinel({ apiKey: '', provider: new MyProvider() });
Architecture
Sentinel is composed of five cooperating subsystems:
StateParser
Produces a normalized list of interactive UI elements from the current page state.
- AOM (primary): reads the full accessibility tree via CDP Accessibility.getFullAXTree. Enriches generic button names with card/container context by walking AOM ancestors and scraping nearby DOM headings, paragraphs, and badge spans. This allows the LLM to distinguish identical-named buttons across card UIs (e.g. multiple "Select plan" buttons each resolve to a unique, context-rich label).
- DOM fallback: used on Firefox and WebKit where CDP is unavailable.
- Form input fallback: handles inputs not exposed through the accessibility tree.
ActionEngine
Translates a natural language instruction into a Playwright action using a three-layer fallback:
- Coordinate click: computes element coordinates from the AOM bounding box and uses page.mouse.wheel/page.mouse.click.
- Vision Grounding: if coordinate click fails or the element is off-screen, captures a screenshot and uses the configured provider's vision capability to locate the element visually (requires visionFallback: true; supported by Gemini, OpenAI, Claude, and Ollama vision models).
- Playwright locators: a four-strategy chain of exact role+name, inexact role+name, CSS :has-text, and plain text locator.
Before clicking, a viewport bounds check confirms the element is visible. If not, scrollIntoViewIfNeeded is called before retrying. Radio and checkbox inputs styled to hide the native control are handled via querySelector + closest('label') traversal.
Verifier
Confirms that an action produced the expected change without defaulting to LLM calls for common cases:
- URL/title change — navigation detected by comparing URLs before and after.
- Checked-state fast path — radio/checkbox selection detected directly (confidence 0.92).
- DOM delta — detects significant DOM mutations.
- LLM semantic verification — full before/after state comparison sent to the LLM when fast paths are inconclusive.
On LLM errors, the Verifier returns { success: true, confidence: 0.5 } rather than throwing, so automation continues rather than aborting.
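The tiered checks above can be sketched as a function that tries the cheap comparisons first and only then calls the LLM. The Snapshot shape, the 0.9 and 0.8 confidence values, and the five-node DOM threshold are placeholders chosen for illustration; only the 0.92 checked-state confidence and the { success: true, confidence: 0.5 } error fallback come from the description.

```typescript
// Hypothetical before/after page summary.
interface Snapshot {
  url: string;
  title: string;
  checked: string[]; // names of currently checked radio/checkbox inputs
  nodeCount: number; // rough DOM size
}

interface Verdict {
  success: boolean;
  confidence: number;
  via: "navigation" | "checked-state" | "dom-delta" | "llm" | "llm-error";
}

type LlmCheck = (before: Snapshot, after: Snapshot) => Promise<Verdict>;

async function verify(before: Snapshot, after: Snapshot, llmCheck: LlmCheck): Promise<Verdict> {
  // 1. URL or title change: treat as a successful navigation (0.9 is a placeholder).
  if (before.url !== after.url || before.title !== after.title) {
    return { success: true, confidence: 0.9, via: "navigation" };
  }
  // 2. A radio/checkbox became checked.
  if (after.checked.some((name) => before.checked.indexOf(name) === -1)) {
    return { success: true, confidence: 0.92, via: "checked-state" };
  }
  // 3. Significant DOM change (the threshold of 5 nodes is arbitrary).
  if (Math.abs(after.nodeCount - before.nodeCount) >= 5) {
    return { success: true, confidence: 0.8, via: "dom-delta" };
  }
  // 4. Fall back to the LLM; on failure, continue with low confidence instead of throwing.
  try {
    return await llmCheck(before, after);
  } catch {
    return { success: true, confidence: 0.5, via: "llm-error" };
  }
}
```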
AgentLoop
Implements the autonomous agent cycle:
Plan → Execute → Verify → Reflect → (repeat)
- Plan: The Planner uses the current page state and a rolling memory window to decide the next action.
- Execute: The planned instruction is passed to the ActionEngine.
- Verify: The Verifier confirms success.
- Reflect: After the loop exits, a final LLM reflection checks whether the goal was actually achieved.
Abort conditions: three consecutive failures, instruction loop detected (same instruction repeated three times without progress), or maxSteps reached.
DOM Settle
After every navigation or action, Sentinel waits for the DOM to stabilize using a MutationObserver that resolves after 300ms of silence (hard cap: 3 seconds). This replaces the previous networkidle wait and correctly handles SPA route transitions that do not produce network activity.
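The timing rule (resolve after 300 ms of silence, never later than 3 seconds) can be modelled as a pure function over mutation timestamps. This is a model of the rule as described, not the MutationObserver code itself, and the function name is hypothetical.

```typescript
// Given the times (ms after the action) at which DOM mutations were observed,
// return the time at which the settle wait would resolve.
function settleTimeMs(mutationTimesMs: number[], quietMs = 300, capMs = 3000): number {
  let lastActivity = 0; // the action itself counts as the first activity
  const sorted = [...mutationTimesMs].sort((a, b) => a - b);
  for (const t of sorted) {
    if (t >= capMs) break; // the hard cap fires before this mutation
    if (t - lastActivity >= quietMs) break; // a full quiet window already elapsed
    lastActivity = t; // the mutation restarts the quiet window
  }
  return Math.min(lastActivity + quietMs, capMs);
}
```

A page that never mutates settles after one quiet window, while a page that mutates continuously is cut off at the cap.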
Error Handling
All Sentinel errors extend SentinelError, which carries a code string and an optional context object.
import {
  SentinelError,
  ActionError,
  ExtractionError,
  NavigationError,
  AgentError,
  NotInitializedError,
} from '@isoldex/sentinel';
try {
  await sentinel.act('Click the submit button');
} catch (err) {
  if (err instanceof ActionError) {
    console.error('Action failed:', err.message, err.code, err.context);
  }
}
| Class | Code | When thrown |
|---|---|---|
| SentinelError | — | Base class; never thrown directly |
| ActionError | ACTION_FAILED | Action fails after all retries |
| ExtractionError | EXTRACTION_FAILED | Structured extraction fails |
| NavigationError | NAVIGATION_FAILED | Navigation to a URL fails |
| AgentError | AGENT_ERROR | Agent loop exceeds max steps or gets stuck |
| NotInitializedError | NOT_INITIALIZED | Any method called before init() |
Self-Healing Locators
Enable locator caching to skip the LLM on repeated act() calls:
// In-memory: cached for the lifetime of this instance
const sentinel = new Sentinel({ apiKey, locatorCache: true });
// File-persisted: survives process restarts — ideal for test suites
const sentinel = new Sentinel({ apiKey, locatorCache: '.sentinel-cache.json' });
On the first call, Sentinel runs the full LLM pipeline and caches { action, role, name } for the resolved element. On subsequent calls with the same URL and instruction, it finds the element directly in the current DOM: no LLM call, zero token cost.
If the cached element is no longer present or the action fails, the entry is automatically invalidated and the LLM path takes over.
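The lookup-then-invalidate flow can be sketched with an in-memory map keyed by URL and instruction. The class and the resolveLocator helper are illustrative, the element check and LLM call are passed in as synchronous stand-ins, and the key format is an assumption.

```typescript
interface CachedLocator {
  action: string;
  role: string;
  name: string;
}

// Minimal in-memory cache with the get/set/invalidate shape described above.
class MemoryLocatorCache {
  private entries = new Map<string, CachedLocator>();
  private key(url: string, instruction: string): string {
    return `${url}\n${instruction}`;
  }
  get(url: string, instruction: string): CachedLocator | undefined {
    return this.entries.get(this.key(url, instruction));
  }
  set(url: string, instruction: string, entry: CachedLocator): void {
    this.entries.set(this.key(url, instruction), entry);
  }
  invalidate(url: string, instruction: string): void {
    this.entries.delete(this.key(url, instruction));
  }
}

// Use the cached locator when its element still exists; otherwise drop the
// entry, ask the LLM again, and cache the fresh answer.
function resolveLocator(
  cache: MemoryLocatorCache,
  url: string,
  instruction: string,
  elementExists: (locator: CachedLocator) => boolean,
  askLlm: () => CachedLocator
): { locator: CachedLocator; usedLlm: boolean } {
  const cached = cache.get(url, instruction);
  if (cached !== undefined) {
    if (elementExists(cached)) return { locator: cached, usedLlm: false };
    cache.invalidate(url, instruction);
  }
  const fresh = askLlm();
  cache.set(url, instruction, fresh);
  return { locator: fresh, usedLlm: true };
}
```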
Provide a custom cache (e.g. Redis-backed for distributed test runs) by implementing ILocatorCache:
import type { ILocatorCache, CachedLocator } from '@isoldex/sentinel';
class RedisLocatorCache implements ILocatorCache {
  get(url: string, instruction: string): CachedLocator | undefined { /* ... */ }
  set(url: string, instruction: string, entry: CachedLocator): void { /* ... */ }
  invalidate(url: string, instruction: string): void { /* ... */ }
}
Intelligent Error Messages
When all action paths fail, Sentinel returns a structured error with the full diagnostic:
const result = await sentinel.act('Click the checkout button');
if (!result.success) {
  console.log(result.message);
  // Action failed: "Click the checkout button" on "Checkout"
  // 3 paths tried:
  //   • coordinate-click: Element "Checkout" is outside viewport at (640, 950), triggering scroll fallback
  //   • vision-grounding: Element not found in screenshot
  //   • locator-fallback: strict mode violation: locator resolved to 3 elements
  // Tip: The element may be outside the visible area. Try first:
  //   sentinel.act('scroll to "Checkout"')
  console.log(result.attempts);
  // [
  //   { path: 'coordinate-click', error: '...' },
  //   { path: 'vision-grounding', error: '...' },
  //   { path: 'locator-fallback', error: '...' },
  // ]
}
result.attempts is only present on failure and lists each attempted path with its specific error.
Prompt Cache
Cache LLM responses by a hash of the prompt + DOM state. A cache hit costs zero tokens and skips the model entirely. The cache naturally misses when the URL, page title, or element list changes — no manual invalidation needed.
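A least-recently-used cache of this kind can be built on a Map, which preserves insertion order. The sketch below is a generic LRU with a 200-entry default; the cacheKey helper uses JSON.stringify as a stand-in for the hash, and both names are hypothetical.

```typescript
// Generic LRU cache: the Map's first key is always the least recently used.
class LruCache<V> {
  private entries = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity = 200) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Stand-in for the prompt hash: any change to the prompt, URL, title, or
// element list produces a different key, so stale entries are never hit.
function cacheKey(prompt: string, url: string, title: string, elementNames: string[]): string {
  return JSON.stringify([prompt, url, title, elementNames]);
}
```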
// In-memory (LRU, 200 entries)
const sentinel = new Sentinel({ apiKey, promptCache: true });
// File-persisted — survives process restarts
const sentinel = new Sentinel({ apiKey, promptCache: 'sentinel-prompt-cache.json' });
// Flush the cache programmatically (e.g. between test runs)
sentinel.clearPromptCache();
Plug in your own backend by implementing IPromptCache:
import type { IPromptCache } from '@isoldex/sentinel';
class RedisPromptCache implements IPromptCache {
  async get(key: string): Promise<string | undefined> { /* ... */ }
  async set(key: string, value: string): Promise<void> { /* ... */ }
}
Proxy Providers
The proxy option accepts either a static ProxyOptions object or a dynamic IProxyProvider that rotates proxies on every request.
Round-Robin (static list)
import { RoundRobinProxyProvider } from '@isoldex/sentinel';
const proxy = new RoundRobinProxyProvider([
  { server: 'http://p1:8080', username: 'u', password: 'pw' },
  { server: 'http://p2:8080', username: 'u', password: 'pw' },
]);
const sentinel = new Sentinel({ apiKey, proxy });
Webshare (API-backed, lazy-fetch)
import { WebshareProxyProvider } from '@isoldex/sentinel';
const proxy = new WebshareProxyProvider({
  apiKey: process.env.WEBSHARE_KEY!,
  protocol: 'http', // or 'socks5'
  country: 'DE', // optional geo-filter
});
const sentinel = new Sentinel({ apiKey, proxy });
Proxies are fetched once on the first getProxy() call and cached. Concurrent calls during the initial fetch wait on the same Promise, so no duplicate API requests are made. releaseProxy() is called automatically on sentinel.close().
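The "fetch once, share the Promise" pattern can be sketched as below. LazyProxyList is a hypothetical class, the fetch function is injected rather than calling any real API, the asynchronous getProxy signature is an assumption, and the round-robin rotation is only one possible policy.

```typescript
interface ProxyOptions {
  server: string;
  username?: string;
  password?: string;
}

class LazyProxyList {
  private pending: Promise<ProxyOptions[]> | undefined;
  private cursor = 0;
  private readonly fetchList: () => Promise<ProxyOptions[]>;

  constructor(fetchList: () => Promise<ProxyOptions[]>) {
    this.fetchList = fetchList;
  }

  async getProxy(): Promise<ProxyOptions> {
    // The first caller starts the fetch; later callers reuse the same Promise,
    // whether it is still in flight or already resolved.
    if (this.pending === undefined) this.pending = this.fetchList();
    const list = await this.pending;
    if (list.length === 0) throw new Error("Proxy list is empty");
    const proxy = list[this.cursor % list.length] as ProxyOptions;
    this.cursor++;
    return proxy;
  }
}
```

Storing the Promise rather than the resolved list is what removes the race: there is no window in which a second caller sees "not fetched yet" after the first fetch has started.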
Custom Provider
import type { IProxyProvider, ProxyOptions } from '@isoldex/sentinel';
class MyProxyPool implements IProxyProvider {
  getProxy(): ProxyOptions { return { server: 'http://...' }; }
  releaseProxy?(): void { /* return proxy to pool */ }
}
Examples
See the examples/ directory for ready-to-run scripts:
- hacker-news.ts: Extract top stories from Hacker News
- google-search.ts: Search Google, extract results, and demo Record and Replay
- agent-amazon.ts: Autonomous shopping agent on Amazon
Licensed under MIT. Author: Huseyin Aras — [email protected]
