@memberjunction/computer-use
v5.7.0
Published
MemberJunction: Computer Use - Vision-to-Action engine for driving web browsers via LLM reasoning over screenshots. MJ-independent base layer.
Keywords
Readme
@memberjunction/computer-use
A Vision-to-Action browser automation engine that enables LLM-driven web interactions through screenshot analysis and natural language reasoning.
Overview
The ComputerUse package provides a sophisticated abstraction layer for LLM-controlled browser automation. Unlike traditional browser automation tools that require explicit selectors and DOM knowledge, ComputerUse allows AI models to interact with web pages the same way humans do: by seeing screenshots and reasoning about what actions to take next.
Core Concept: The engine captures screenshots of web pages, sends them to vision-capable LLMs, and executes the actions the model determines are necessary to accomplish a natural language goal. A separate judge component evaluates progress and determines goal completion.
This package serves as the foundational layer for MemberJunction's Computer Use capabilities, designed to be extended by the @memberjunction/computer-use-engine package for full MJ integration.
Key Features
🎯 Vision-Based Automation
- Screenshot-Driven Reasoning: LLMs analyze visual state instead of DOM structure
- Normalized Coordinate System: Actions use 1000x1000 space for viewport independence
- Ring Buffer History: Maintains bounded screenshot history for context
- Adaptive Delays: Configurable screenshot capture delays for render stability
🧠 Intelligent Judge System
- HybridJudge (Recommended): Combines heuristics with LLM evaluation
- Zero-cost stuck detection (identical screenshots)
- Navigation loop detection (URL cycles)
- Error accumulation tracking
- LLM fallback for ambiguous cases
- HeuristicJudge: Fast, free pattern-based evaluation
- LLMJudge: Vision-capable model for nuanced goal assessment
- Configurable Frequency: Evaluate every step, every N steps, or on stagnation
🔐 Sophisticated Authentication
- Per-Domain Bindings: Different auth methods for different domains
- Lazy Application: Auth applied on first navigation to each domain
- Global Callback Support: SSO and MFA flows via custom callbacks
- Multiple Auth Types: Basic, Bearer, API Key, OAuth2, Cookies, LocalStorage, Custom
🛠️ Extensible Tool System
- Type-Safe Tools: Generic
ComputerUseTool<TInput, TOutput>with JSON Schema - Controller Integration: Tools exposed to LLM for custom operations
- Error Resilience: Tool failures don't crash the step
- Automatic Schema Generation: Input/output validation via JSON Schema
🌐 Navigation Control
- Domain Filtering: Whitelist/blacklist with wildcard support
- Pattern Matching:
*.example.com, exact domains, or wildcard* - Precedence Rules: BlockedDomains override AllowedDomains
- Pre-Navigation Checks: Validation before browser navigation
🔌 Browser Adapter Abstraction
- Strategy Pattern: Swap browser implementations (Playwright, Selenium, Puppeteer)
- PlaywrightBrowserAdapter: Production-ready Chromium automation
- Isolated Interface: No direct browser library dependencies in core logic
- Extensibility: Implement
BaseBrowserAdapterfor custom browsers
🎪 Robust Error Handling
- No Uncaught Exceptions: All errors wrapped in
ComputerUseResult - Categorized Errors: Browser crashes, timeouts, LLM errors, auth failures, etc.
- Partial Success: Tool/action failures captured but don't halt execution
- Graceful Degradation: Missing LLM data returns empty actions
🎛️ Flexible Configuration
- Model Selection: Separate controller and judge models
- Custom Prompts: Template-based with variable substitution
- Browser Settings: Headless mode, viewport size, user agent, timeouts
- Performance Tuning: Max steps, screenshot depth, delays
- Cooperative Cancellation:
Stop()method for graceful termination
Installation
npm install @memberjunction/computer-useRequired Peer Dependency: Install Playwright for the default browser adapter:
npm install playwright
npx playwright install chromiumBasic Usage
Simple Goal Execution
import { ComputerUseEngine, RunComputerUseParams, ModelConfig } from '@memberjunction/computer-use';
// Create engine instance
const engine = new ComputerUseEngine();
// Configure the run
const params = new RunComputerUseParams();
params.Goal = "Navigate to news.ycombinator.com and find the top story title";
params.StartUrl = "https://news.ycombinator.com";
params.MaxSteps = 10;
params.Headless = false; // Watch the browser work
// Configure models (controller analyzes screenshots, judge evaluates completion)
params.ControllerModel = new ModelConfig("anthropic", "claude-3-5-sonnet-20241022");
params.JudgeModel = new ModelConfig("anthropic", "claude-3-5-sonnet-20241022");
// Execute
const result = await engine.Run(params);
if (result.Success) {
console.log(`Goal completed in ${result.TotalSteps} steps!`);
console.log(`Final URL: ${result.FinalUrl}`);
console.log(`Judge verdict: ${result.FinalJudgeVerdict?.Reason}`);
} else {
console.error(`Failed: ${result.Status}`);
if (result.Error) {
console.error(`Error: ${result.Error.Message}`);
}
}
// Access step-by-step history
for (const step of result.Steps) {
console.log(`Step ${step.StepNumber}: ${step.ControllerReasoning}`);
console.log(`Actions: ${step.ActionsRequested.length}`);
// step.Screenshot contains base64 PNG for audit/debugging
}Using Authentication
import {
RunComputerUseParams,
DomainAuthBinding,
BasicAuthMethod,
BearerTokenAuthMethod,
ComputerUseAuthConfig
} from '@memberjunction/computer-use';
const params = new RunComputerUseParams();
params.Goal = "Login to example.com and check notifications";
params.StartUrl = "https://example.com/login";
// Configure per-domain authentication
const authConfig = new ComputerUseAuthConfig();
// Basic auth for login form (credentials shown to LLM)
authConfig.Bindings.push(new DomainAuthBinding(
["example.com"],
new BasicAuthMethod({
Username: "[email protected]",
Password: "secure-password",
FormLogin: true // LLM will enter credentials into form
})
));
// Bearer token for API subdomain
authConfig.Bindings.push(new DomainAuthBinding(
["api.example.com"],
new BearerTokenAuthMethod({
Token: "eyJhbGciOi..."
})
));
params.Auth = authConfig;
const result = await engine.Run(params);Adding Custom Tools
import { ComputerUseTool, RunComputerUseParams } from '@memberjunction/computer-use';
// Define a tool for the controller LLM to invoke
const weatherTool = new ComputerUseTool<{ city: string }, { temperature: number; condition: string }>(
"get_weather",
"Get current weather for a city",
{
type: "object",
properties: {
city: { type: "string", description: "City name" }
},
required: ["city"]
},
async (args) => {
// Your implementation (API call, database lookup, etc.)
const weather = await fetchWeatherAPI(args.city);
return {
temperature: weather.temp,
condition: weather.condition
};
}
);
const params = new RunComputerUseParams();
params.Goal = "Check the weather in San Francisco using the weather tool";
params.Tools = [weatherTool];
const result = await engine.Run(params);
// Access tool call history
for (const step of result.Steps) {
for (const toolCall of step.ToolCalls || []) {
console.log(`Tool: ${toolCall.ToolName}`);
console.log(`Input: ${JSON.stringify(toolCall.Input)}`);
console.log(`Output: ${JSON.stringify(toolCall.Output)}`);
}
}Navigation Control
const params = new RunComputerUseParams();
params.Goal = "Research competitor pricing on trusted sites only";
params.StartUrl = "https://google.com";
// Only allow specific trusted domains
params.AllowedDomains = [
"google.com",
"competitor1.com",
"competitor2.com",
"*.trusteddomain.com" // Wildcard for subdomains
];
// Block known problematic domains
params.BlockedDomains = [
"ads.example.com",
"tracker.analytics.com"
];
const result = await engine.Run(params);
// Browser will be prevented from navigating to blocked/non-allowed domainsJudge Configuration
import { RunComputerUseParams, EveryNStepsFrequency, OnStagnationFrequency } from '@memberjunction/computer-use';
const params = new RunComputerUseParams();
params.Goal = "Find and save the top 5 tech news headlines";
// Option 1: Evaluate every 3 steps (cost/accuracy balance)
params.JudgeFrequency = new EveryNStepsFrequency(3);
// Option 2: Evaluate only when stuck (cost optimization)
params.JudgeFrequency = new OnStagnationFrequency(3); // After 3 identical screenshots
// Option 3: Evaluate every step (maximum accuracy, higher cost)
// params.JudgeFrequency = new EveryStepFrequency(); // Default
const result = await engine.Run(params);
// Check judge evaluations
for (const step of result.Steps) {
if (step.JudgeVerdict) {
console.log(`Step ${step.StepNumber} Judge:`);
console.log(` Done: ${step.JudgeVerdict.Done}`);
console.log(` Confidence: ${step.JudgeVerdict.Confidence}`);
console.log(` Reason: ${step.JudgeVerdict.Reason}`);
console.log(` Feedback: ${step.JudgeVerdict.Feedback}`);
}
}Architecture
Core Components
ComputerUseEngine
The main orchestrator that manages the entire vision-to-action loop:
Run(params)
→ Initialize Components (NavGuard, AuthHandler, ToolProvider, Judge)
→ Launch Browser
→ Apply Global Auth Callback
→ Navigate to StartUrl
→ Main Step Loop:
├─ Capture Screenshot → Ring Buffer
├─ Build Controller Prompt (goal, screenshots, tools, feedback)
├─ Execute Controller Prompt → LLM reasoning
├─ Parse Response → Actions + Tool Calls
├─ Execute Tool Calls → Wrapped results
├─ Scale Actions (1000x1000 → viewport pixels)
├─ Execute Browser Actions (with NavGuard + Auth)
├─ Evaluate Judge (if frequency indicates)
├─ Check Judge verdict (Done=true → exit)
└─ Track empty steps (3+ consecutive → abort)
→ Close Browser
→ Return ComputerUseResultExtension Points (protected virtual methods):
executeControllerPrompt()- Override to use custom LLM routingexecuteJudgePrompt()- Override for custom judge executiononStepComplete()- Hook called after each step (logging, persistence)onRunComplete()- Hook called at run completion (cleanup, finalization)
Browser Adapters
BaseBrowserAdapter (abstract interface):
abstract class BaseBrowserAdapter {
abstract Launch(config: BrowserConfig): Promise<void>;
abstract Close(): Promise<void>;
abstract Navigate(url: string): Promise<void>;
abstract CaptureScreenshot(): Promise<string>; // Base64 PNG
abstract ExecuteAction(action: BrowserAction): Promise<ActionResult>;
abstract SetExtraHeaders(headers: Record<string, string>): Promise<void>;
abstract SetCookies(cookies: Cookie[]): Promise<void>;
abstract SetLocalStorage(domain: string, items: Record<string, string>): Promise<void>;
}PlaywrightBrowserAdapter (default implementation):
- Uses Playwright's Chromium for production automation
- Handles all browser action types (Click, Type, Keypress, Scroll, Navigate, etc.)
- Manages viewport scaling and coordinate normalization
- Supports custom headers, cookies, and localStorage injection
Custom Adapters: Implement BaseBrowserAdapter to use Selenium, Puppeteer, or custom browsers.
Judge System
Judge Verdict Structure:
interface JudgeVerdict {
Done: boolean; // Has the goal been accomplished?
Confidence: number; // 0.0 - 1.0 certainty score
Reason: string; // Explanation of the verdict
Feedback: string; // Actionable guidance for controller
SuggestedNextAction?: string; // Optional hint (not enforced)
}Judge Types:
HeuristicJudge (zero-cost):
- Stuck detection: 3+ identical screenshots in a row
- Loop detection: 2 or 3-length URL navigation cycles
- Error accumulation: 3+ consecutive step errors
- Returns
Confidence > 0when problem detected,Confidence = 0when inconclusive
LLMJudge (vision-based):
- Analyzes screenshots with LLM to evaluate goal completion
- Nuanced understanding of visual state
- Higher cost but handles complex goal states
HybridJudge (recommended default):
- Heuristics first (free, fast)
- LLM fallback when heuristics are inconclusive (cost optimization)
- Best balance of cost and accuracy
Authentication System
Lifecycle:
- Global Callback (optional): Runs once after browser launch for SSO/MFA
- Per-Domain Bindings: Applied lazily on first navigation to each domain
- First-Match Wins: First binding in array order matching the domain applies
- One-Time Application:
appliedDomainsSet prevents re-application
Auth Methods:
// Basic Auth (FormLogin variant with credential exposure)
new BasicAuthMethod({
Username: "user",
Password: "pass",
FormLogin: true // Credentials shown to controller LLM for form entry
})
// Bearer Token (Authorization header)
new BearerTokenAuthMethod({
Token: "eyJhbGci..."
})
// API Key Header
new APIKeyHeaderAuthMethod({
HeaderName: "X-API-Key",
HeaderValue: "sk-..."
})
// OAuth2 Client Credentials
new OAuthClientCredentialsAuthMethod({
ClientId: "client-id",
ClientSecret: "secret",
TokenEndpoint: "https://auth.example.com/token",
Scopes: ["read", "write"]
})
// Cookie Injection
new CookieInjectionAuthMethod({
Cookies: [{
Name: "session",
Value: "abc123",
Domain: "example.com",
Path: "/",
Secure: true
}]
})
// LocalStorage Injection (pre-populated at context creation)
new LocalStorageInjectionAuthMethod({
Items: {
"auth_token": "xyz789",
"user_id": "12345"
}
})
// Custom Callback
new CustomCallbackAuthMethod({
Callback: async (adapter) => {
await adapter.Navigate("https://example.com/custom-auth");
// Custom auth logic here
}
})Tool System
Tool Definition:
const tool = new ComputerUseTool<TInput, TOutput>(
name: string, // Alphanumeric + underscores only
description: string, // LLM-friendly explanation
inputSchema: object, // JSON Schema for validation
handler: async (args: TInput) => Promise<TOutput>,
outputSchema?: object // Optional output documentation
);Tool Execution:
- Controller LLM requests tool calls in its response
- Engine executes handlers and wraps results in
ToolCallRecord - Failures captured but don't crash the step
- Results provided to controller in next prompt for adaptation
Example Tool Call Record:
interface ToolCallRecord {
ToolName: string;
Input: TInput;
Output?: TOutput;
Error?: string;
DurationMs: number;
}Navigation Guard
Purpose: Enforces domain restrictions to prevent the controller from navigating to untrusted sites.
Pattern Matching:
*- Allow all domains (open navigation)example.com- Exact match only*.example.com- All subdomains of example.com
Precedence Rules:
- If domain matches
BlockedDomains→ BLOCKED - Else if domain matches
AllowedDomains→ ALLOWED - Else if
AllowedDomainsis empty → ALLOWED (open navigation) - Else → BLOCKED
Data Flow
Coordinate Normalization
The controller LLM reasons about actions in a normalized 1000x1000 space regardless of actual viewport size. This provides viewport independence and consistent prompting.
Flow:
- Screenshot captured at actual viewport size (e.g., 1280x720)
- Controller reasons about clicks at normalized coordinates (e.g.,
Click(500, 300)= center) - Engine scales coordinates to actual viewport before browser execution
- Example:
(500, 300)normalized →(640, 216)at 1280x720 viewport
Benefits:
- Prompts don't need to mention viewport size
- Same prompt works across different browser configurations
- LLM reasoning is viewport-agnostic
Screenshot Ring Buffer
Purpose: Maintain bounded screenshot history without unbounded memory growth.
Behavior:
- Configurable via
ScreenshotHistoryDepth(default: 5) - Acts as a ring buffer (FIFO eviction)
- Most recent N screenshots included in controller prompt
- Provides temporal context while limiting memory usage
Example with depth=3:
Step 1: [A] → Buffer: [A]
Step 2: [B] → Buffer: [A, B]
Step 3: [C] → Buffer: [A, B, C]
Step 4: [D] → Buffer: [B, C, D] (A evicted)
Step 5: [E] → Buffer: [C, D, E] (B evicted)Error Handling
Philosophy: Never throw exceptions; wrap all errors in results for graceful handling.
Error Categories (ErrorCategory enum):
BrowserCrash- Browser process terminated unexpectedlyNavigationTimeout- Page load exceeded timeoutElementNotFound- Action target not found in pageLLMError- LLM API call failedLLMParseError- LLM response parsing failedToolExecutionError- Custom tool handler threw exceptionAuthenticationError- Auth method application failedDomainBlocked- Navigation blocked by NavigationGuardCancelled- User calledStop()method
Error Wrapping:
interface ComputerUseError {
Message: string;
Category: ErrorCategory;
Step?: number;
OriginalError?: Error;
}Recovery Strategies:
- Tool failures: Wrapped, reported to controller, step continues
- Action failures: Wrapped, reported to controller, step continues
- Parse failures: Return empty actions, empty step detection triggers after 3 consecutive
- Fatal errors: Stop execution, return result with Error field populated
Configuration
Run Parameters
Required:
params.Goal = "Natural language description of what to accomplish";Model Configuration:
params.ControllerModel = new ModelConfig("anthropic", "claude-3-5-sonnet-20241022");
params.JudgeModel = new ModelConfig("anthropic", "claude-3-5-sonnet-20241022");
// Optional: Override driver class
params.ControllerModel = new ModelConfig("anthropic", "claude-3-5-sonnet-20241022", "AnthropicLLM");Browser Configuration:
params.Headless = true; // Hide browser window
params.BrowserConfig.ViewportWidth = 1280;
params.BrowserConfig.ViewportHeight = 720;
params.BrowserConfig.UserAgent = "Custom/1.0";
params.BrowserConfig.NavigationTimeoutMs = 30000; // Page load timeout
params.BrowserConfig.ActionTimeoutMs = 10000; // Action execution timeout
params.BrowserConfig.SlowMo = 100; // Debugging: slow down actions by 100msEngine Configuration:
params.MaxSteps = 30; // Maximum controller loop iterations
params.ScreenshotHistoryDepth = 5; // Screenshots in ring buffer
params.ScreenshotDelayMs = 500; // Wait after action before screenshotJudge Configuration:
import { EveryStepFrequency, EveryNStepsFrequency, OnStagnationFrequency } from '@memberjunction/computer-use';
params.JudgeFrequency = new EveryStepFrequency(); // Default: every step
params.JudgeFrequency = new EveryNStepsFrequency(5); // Every 5 steps
params.JudgeFrequency = new OnStagnationFrequency(3); // After 3 stuck/loop indicatorsNavigation Configuration:
params.AllowedDomains = ["example.com", "*.trusted.com", "partner.com"];
params.BlockedDomains = ["ads.example.com", "tracker.com"];Authentication Configuration:
const authConfig = new ComputerUseAuthConfig();
// Per-domain bindings
authConfig.Bindings.push(new DomainAuthBinding(
["example.com", "*.example.com"],
new BearerTokenAuthMethod({ Token: "..." })
));
// Global callback for SSO/MFA
authConfig.GlobalCallback = async (adapter) => {
await adapter.Navigate("https://sso.example.com");
// Custom SSO flow
};
params.Auth = authConfig;Custom Prompts:
// Override controller prompt (template variables supported)
params.ControllerPrompt = `
You are controlling a browser to accomplish this goal: {{goal}}
Current step: {{stepNumber}}/{{maxSteps}}
Current URL: {{url}}
Previous judge feedback: {{judgeFeedback}}
Analyze the screenshot and determine the next action...
`;
// Override judge prompt
params.JudgePrompt = `
Goal: {{goal}}
Current URL: {{url}}
Has the goal been accomplished? Respond with JSON...
`;Available Template Variables:
{{goal}}- User's natural language goal{{stepNumber}}- Current step number{{maxSteps}}- Maximum steps configured{{url}}- Current page URL{{judgeFeedback}}- Feedback from most recent judge verdict{{credentials}}- Available auth credentials (for FormLogin scenarios){{tools}}- JSON schema of available tools
Browser Actions
The controller LLM can request these actions in its response:
// Click at normalized coordinates
{
Type: "Click",
X: 500, // 0-1000 normalized
Y: 300, // 0-1000 normalized
Button: "Left", // "Left" | "Right" | "Middle"
ClickCount: 1 // Single, double, triple click
}
// Type text
{ Type: "Type", Text: "hello world" }
// Press keys (supports combinations)
{ Type: "Keypress", Key: "Enter" }
{ Type: "Keypress", Key: "Shift+A" }
{ Type: "Keypress", Key: "ControlOrMeta+C" } // Ctrl on Windows/Linux, Cmd on Mac
// Individual key state
{ Type: "KeyDown", Key: "Shift" }
{ Type: "KeyUp", Key: "Shift" }
// Scroll
{ Type: "Scroll", DeltaX: 0, DeltaY: -100 } // Negative = up
// Wait (for loading, animations)
{ Type: "Wait", DurationMs: 2000 }
// Navigate to URL
{ Type: "Navigate", Url: "https://example.com" }
// Browser navigation
{ Type: "GoBack" }
{ Type: "GoForward" }
{ Type: "Refresh" }Results
interface ComputerUseResult {
Success: boolean; // Goal accomplished?
Status: ComputerUseStatus; // Terminal state
Steps: StepRecord[]; // Complete history
TotalSteps: number;
TotalDurationMs: number;
FinalScreenshot?: string; // Base64 PNG
FinalUrl?: string;
FinalJudgeVerdict?: JudgeVerdict;
Error?: ComputerUseError; // If Status is 'Error'
}
type ComputerUseStatus =
| "Completed" // Goal achieved (judge verdict Done=true)
| "Failed" // Judge determined goal failed
| "MaxStepsReached" // Hit MaxSteps without completion
| "Error" // Fatal error occurred
| "Cancelled"; // Stop() calledinterface StepRecord {
StepNumber: number; // 1-indexed
Screenshot: string; // Base64 PNG
ControllerReasoning: string; // LLM's explanation
ActionsRequested: BrowserAction[];
ActionResults: ActionResult[];
ToolCalls?: ToolCallRecord[];
JudgeVerdict?: JudgeVerdict; // If evaluated this step
DurationMs: number;
Error?: ComputerUseError;
Url: string;
}Advanced Usage
Extending ComputerUseEngine
The engine is designed to be subclassed for custom integrations:
import { ComputerUseEngine, RunComputerUseParams } from '@memberjunction/computer-use';
class CustomComputerUseEngine extends ComputerUseEngine {
// Override to use custom LLM routing
protected override async executeControllerPrompt(
request: ControllerPromptRequest
): Promise<string> {
// Your custom LLM integration
const response = await myCustomLLMService.call({
prompt: request.Prompt,
images: request.Screenshots,
tools: request.Tools
});
return response.text;
}
// Override to use custom judge execution
protected override async executeJudgePrompt(
request: JudgePromptRequest
): Promise<string> {
// Your custom judge logic
return await myJudgeService.evaluate(request);
}
// Hook called after each step
protected override async onStepComplete(
step: StepRecord,
context: RunContext
): Promise<void> {
// Custom logging, persistence, etc.
await this.saveStepToDatabase(step);
}
// Hook called at run completion
protected override async onRunComplete(
result: ComputerUseResult,
context: RunContext
): Promise<void> {
// Cleanup, finalization, etc.
await this.updateDashboard(result);
}
}
const engine = new CustomComputerUseEngine();
const result = await engine.Run(params);Custom Browser Adapters
Implement BaseBrowserAdapter to use alternative browser automation libraries:
import { BaseBrowserAdapter, BrowserConfig, BrowserAction, ActionResult } from '@memberjunction/computer-use';
class SeleniumBrowserAdapter extends BaseBrowserAdapter {
private driver: WebDriver;
async Launch(config: BrowserConfig): Promise<void> {
// Initialize Selenium WebDriver
this.driver = await new Builder()
.forBrowser('chrome')
.setChromeOptions(new ChromeOptions().headless())
.build();
}
async CaptureScreenshot(): Promise<string> {
return await this.driver.takeScreenshot();
}
async ExecuteAction(action: BrowserAction): Promise<ActionResult> {
switch (action.Type) {
case "Click":
// Selenium click implementation
break;
case "Type":
// Selenium type implementation
break;
// ... implement all action types
}
}
// ... implement remaining abstract methods
}
// Use custom adapter
const engine = new ComputerUseEngine();
const params = new RunComputerUseParams();
params.BrowserAdapter = new SeleniumBrowserAdapter();Cooperative Cancellation
For long-running tasks, support graceful cancellation:
const engine = new ComputerUseEngine();
const params = new RunComputerUseParams();
params.Goal = "Long-running research task...";
// Start execution (non-blocking)
const resultPromise = engine.Run(params);
// Cancel after 60 seconds
setTimeout(() => {
engine.Stop(); // Graceful cancellation after current step
}, 60000);
const result = await resultPromise;
if (result.Status === "Cancelled") {
console.log("Run was cancelled");
console.log(`Completed ${result.TotalSteps} steps before cancellation`);
}Multi-Step SSO Authentication
Handle complex authentication flows with global callbacks:
const authConfig = new ComputerUseAuthConfig();
// Global callback for SSO flow
authConfig.GlobalCallback = async (adapter) => {
// Navigate to SSO login
await adapter.Navigate("https://sso.example.com/login");
// Wait for redirect
await adapter.ExecuteAction({ Type: "Wait", DurationMs: 2000 });
// Enter credentials (these would be from secure storage)
await adapter.ExecuteAction({ Type: "Type", Text: "username" });
await adapter.ExecuteAction({ Type: "Keypress", Key: "Tab" });
await adapter.ExecuteAction({ Type: "Type", Text: "password" });
await adapter.ExecuteAction({ Type: "Keypress", Key: "Enter" });
// Wait for MFA prompt
await adapter.ExecuteAction({ Type: "Wait", DurationMs: 3000 });
// Handle MFA (could integrate with authenticator app)
const mfaCode = await getMFACode();
await adapter.ExecuteAction({ Type: "Type", Text: mfaCode });
await adapter.ExecuteAction({ Type: "Keypress", Key: "Enter" });
};
params.Auth = authConfig;Best Practices
✅ Do
- Use HybridJudge for the best cost/accuracy balance
- Set reasonable MaxSteps (20-50) to prevent runaway costs
- Configure AllowedDomains for security-sensitive tasks
- Use ring buffer depth of 3-5 for sufficient context without memory bloat
- Add descriptive tool descriptions so the LLM understands when to use them
- Test with Headless=false initially to debug controller behavior
- Check result.Success before accessing step data
- Store screenshots from critical steps for audit trails
- Use EveryNSteps or OnStagnation for cost optimization on long tasks
- Provide clear, specific goals for better controller performance
- Use template variables in custom prompts for flexibility
- Implement onStepComplete for persistence in production
- Set appropriate timeouts based on expected page load times
❌ Don't
- Don't use
anytypes - this package maintains full type safety - Don't ignore result.Error - check for errors even if Success is false
- Don't modify params during execution - params are frozen after Run() starts
- Don't call Run() concurrently on the same engine instance - create separate instances
- Don't assume screenshots are identical to visual equality - use judge evaluation
- Don't expose sensitive credentials in FormLogin scenarios without review
- Don't use EveryStep judge frequency for long tasks (cost explosion)
- Don't hardcode viewport sizes in custom prompts - use normalized coordinates
- Don't skip error handling for tool implementations - wrap in try/catch
- Don't use overly broad AllowedDomains (
*) for security-sensitive tasks - Don't rely on precise element targeting - this is vision-based, not selector-based
- Don't expect pixel-perfect clicking - vision models approximate element centers
Performance Optimization
- Judge Frequency: Use
EveryNStepsFrequency(5)orOnStagnationFrequency(3)for long tasks - Screenshot Depth: Use depth of 3-5; more history = larger prompts = higher cost
- Screenshot Delay: Tune based on site render times (100-500ms for static, 1000ms+ for SPAs)
- Model Selection: Use lighter models for simple tasks (Claude Haiku for basic navigation)
- Max Steps: Set based on expected task complexity (simple navigation: 10, complex research: 50)
Security Considerations
- Domain Whitelisting: Always use
AllowedDomainsfor production tasks - Credential Exposure: Be cautious with
FormLogin: true- credentials are shown to LLM - Tool Validation: Validate tool inputs thoroughly to prevent injection attacks
- Screenshot Retention: PII/PHI may appear in screenshots - handle according to compliance needs
- Error Messages: Don't expose sensitive system details in error messages
- Network Isolation: Consider running in sandboxed environments for untrusted goals
Debugging Tips
- Enable visual mode: Set
Headless: falseto watch the browser - Slow down execution: Use
SlowMo: 1000to see each action clearly - Check step reasoning: Examine
step.ControllerReasoningto understand LLM decisions - Review screenshots: Decode
step.Screenshotbase64 to see what the LLM saw - Enable judge every step: Use
EveryStepFrequency()for detailed evaluation - Check action results: Review
step.ActionResultsfor execution details - Monitor tool calls: Examine
step.ToolCallsfor tool execution history - Review judge feedback: Check
step.JudgeVerdict.Feedbackfor guidance
Troubleshooting
Controller keeps clicking the same element
Cause: Vision model may not detect state changes from previous actions.
Solution:
- Increase
ScreenshotDelayMsto allow more render time - Check if site uses animations that obscure state changes
- Add custom feedback in judge prompts about progress indicators
Browser times out on navigation
Cause: Page load exceeds NavigationTimeoutMs.
Solution:
- Increase timeout:
params.BrowserConfig.NavigationTimeoutMs = 60000 - Check if site has slow resources (fonts, analytics)
- Use navigation guard to skip slow third-party domains
Judge always says goal is incomplete
Cause: Goal may be ambiguous or too complex for vision-based evaluation.
Solution:
- Break down into smaller, more specific goals
- Provide more explicit success criteria in goal description
- Add URL pattern or element checks to judge prompt
- Use custom tools to verify state programmatically
Tool calls failing repeatedly
Cause: JSON Schema mismatch or handler exceptions.
Solution:
- Validate tool input schema matches handler expectations
- Add error handling in tool handler with descriptive messages
- Check
step.ToolCalls[].Errorfor specific failure reasons - Ensure async handlers return promises correctly
MaxSteps reached without completion
Cause: Goal too complex, controller stuck, or judge not evaluating.
Solution:
- Increase
MaxStepsfor complex tasks - Review step reasoning to identify loops
- Enable HybridJudge to detect stuck states
- Simplify goal or break into multiple runs
Contributing
We welcome contributions! Please see the main MemberJunction Contributing Guide for details.
Development Setup
# Clone repository
git clone https://github.com/memberjunction/mj.git
cd mj
# Install dependencies
npm install
# Build the package
cd packages/AI/ComputerUse
npm run build
# Run tests
npm testPrompt Sync Process
The package includes default prompts for the controller and judge LLMs. These prompts are automatically synced from MemberJunction metadata during the build process:
- Source:
metadata/prompts/templates/computer-use/(metadata definitions) - Target:
packages/AI/ComputerUse/src/prompts/(package code) - Sync Script:
scripts/sync-prompts.mjs(runs automatically as prebuild step)
The script converts Nunjucks-style metadata templates to simple TypeScript constants. If you modify the metadata prompts, they'll be automatically synced the next time you run npm run build.
For more details, see scripts/README.md.
License
This package is part of the MemberJunction project and is licensed under the MIT License. See LICENSE for details.
