@ejazullah/smart-browser-automation

v1.0.9, published 3 months ago

A smart AI-driven browser automation library and REST API server that uses MCP (Model Context Protocol) and LangChain for multi-step task execution. It can be used programmatically as a library or run as an HTTP API server for remote automation.


Maintainer: ejazdoinger

Keywords: browser-automation, ai, langchain, mcp, playwright, automation, smart-automation, multi-step, huggingface, express, api, server, rest-api

Smart Browser Automation

A powerful AI-driven browser automation library built on MCP (Model Context Protocol) and LangChain. It executes complex multi-step browser automation tasks, described in natural language, through a programmatic API.

🚀 Features

AI-Powered: Smart task execution using LangChain and LLM integration
Multi-step Automation: Execute complex browser workflows
MCP Integration: Model Context Protocol for advanced browser control
Multiple LLM Support: HuggingFace, Ollama, OpenAI, and an extensible architecture
Flexible Configuration: Easy setup and customization options

📦 Installation

npm install @ejazullah/smart-browser-automation

🏃‍♂️ Quick Start

import { SmartBrowserAutomation, HuggingFaceConfig } from '@ejazullah/smart-browser-automation';

// Configuration
const llmConfig = new HuggingFaceConfig("your_huggingface_token");
const mcpEndpoint = 'http://your-mcp-endpoint';
const cdpEndpoint = 'wss://your-cdp-endpoint';

// Create automation instance
const automation = new SmartBrowserAutomation({
  maxSteps: 10,
  temperature: 0.0
});

try {
  // Initialize
  await automation.initialize(llmConfig, mcpEndpoint, cdpEndpoint);

  // Execute task
  const result = await automation.executeTask(
    "go to https://duckduckgo.com/ and search for 'AI tools'",
    { verbose: true }
  );

  console.log("Task completed:", result);
} finally {
  // Clean up
  await automation.close();
}
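Hardcoding tokens and endpoints is fine for a first run, but you may prefer to read them from the environment. The following is a minimal sketch, not part of the package: the helper `loadAutomationEnv` and the variable names `HF_TOKEN`, `MCP_ENDPOINT` and `CDP_ENDPOINT` are hypothetical choices for illustration.

```typescript
// Hypothetical variable names; rename them to match your deployment.
interface AutomationEnv {
  hfToken: string;
  mcpEndpoint: string;
  cdpEndpoint: string;
}

const REQUIRED_VARS = {
  hfToken: "HF_TOKEN",
  mcpEndpoint: "MCP_ENDPOINT",
  cdpEndpoint: "CDP_ENDPOINT",
} as const;

// Reads the three values from an env-like record (for example process.env)
// and throws a single error that lists every missing or empty variable.
function loadAutomationEnv(env: Record<string, string | undefined>): AutomationEnv {
  const missing: string[] = [];
  const read = (name: string): string => {
    const value = env[name]?.trim();
    if (!value) {
      missing.push(name);
      return "";
    }
    return value;
  };
  const result: AutomationEnv = {
    hfToken: read(REQUIRED_VARS.hfToken),
    mcpEndpoint: read(REQUIRED_VARS.mcpEndpoint),
    cdpEndpoint: read(REQUIRED_VARS.cdpEndpoint),
  };
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return result;
}
```

With it, the Quick Start configuration becomes `const env = loadAutomationEnv(process.env);` followed by `new HuggingFaceConfig(env.hfToken)` and `automation.initialize(llmConfig, env.mcpEndpoint, env.cdpEndpoint)`. Collecting every missing name before throwing reports all configuration gaps in one run.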

📚 Examples

Basic Usage

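// examples/search-example.js
import { SmartBrowserAutomation, HuggingFaceConfig } from '../index.js';

// Replace with your own endpoints
const mcpEndpoint = 'http://your-mcp-endpoint';
const cdpEndpoint = 'wss://your-cdp-endpoint';

async function searchExample() {
  const automation = new SmartBrowserAutomation({ maxSteps: 10 });
  const llmConfig = new HuggingFaceConfig("your_token");

  try {
    await automation.initialize(llmConfig, mcpEndpoint, cdpEndpoint);
    const result = await automation.executeTask(
      "go to https://example.com and find the contact information"
    );
    console.log("Task completed:", result);
    return result;
  } finally {
    // Clean up and close connections, even if the task throws
    await automation.close();
  }
}

searchExample().catch(console.error);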

TypeScript Usage

This package includes full TypeScript declarations for a better development experience:

import {
  SmartBrowserAutomation,
  HuggingFaceConfig,
  type TaskExecutionOptions,
  type TaskExecutionResult
} from '@ejazullah/smart-browser-automation';

// Replace with your own endpoints
const mcpEndpoint = 'http://your-mcp-endpoint';
const cdpEndpoint = 'wss://your-cdp-endpoint';

async function typedExample(): Promise<void> {
  // Configuration with type checking
  const config = new HuggingFaceConfig('your-api-key');
  const automation = new SmartBrowserAutomation({
    maxSteps: 10,
    temperature: 0.1
  });

  // Options with proper typing
  const options: TaskExecutionOptions = {
    verbose: true,
    onProgress: (update) => {
      console.log(`Step ${update.step}: ${update.message}`);
    }
  };

  try {
    await automation.initialize(config, mcpEndpoint, cdpEndpoint);

    // Result with proper typing
    const result: TaskExecutionResult = await automation.executeTask(
      "Navigate to Google and search for TypeScript tutorials",
      options
    );

    console.log(`Completed ${result.steps} steps, success: ${result.success}`);
  } finally {
    await automation.close();
  }
}

typedExample().catch(console.error);
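If you want to keep progress updates for later inspection rather than only printing them, the `onProgress` callback can feed a small recorder. This is a sketch under assumptions: `createProgressRecorder` is a hypothetical helper, and the `ProgressUpdate` interface declares only the `step` and `message` fields that the example above reads; the package's own update type may carry more.

```typescript
// Assumed shape: only the fields the TypeScript example reads.
interface ProgressUpdate {
  step: number;
  message: string;
}

interface ProgressRecorder {
  history: ProgressUpdate[];
  onProgress: (update: ProgressUpdate) => void;
  lastStep: () => number;
}

// Keeps every update in memory and optionally forwards a formatted line to a logger.
function createProgressRecorder(log?: (line: string) => void): ProgressRecorder {
  const history: ProgressUpdate[] = [];
  return {
    history,
    onProgress: (update) => {
      // Store a shallow copy so later mutation by the caller cannot alter history
      history.push({ ...update });
      log?.(`Step ${update.step}: ${update.message}`);
    },
    lastStep: () => {
      const last = history[history.length - 1];
      return last ? last.step : 0;
    },
  };
}
```

Pass it as `onProgress: recorder.onProgress` in `TaskExecutionOptions`, then read `recorder.history` after `executeTask` resolves.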

🛠️ Configuration

LLM Configurations

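// Import the config classes used below
import { HuggingFaceConfig, OllamaConfig } from '@ejazullah/smart-browser-automation';

// HuggingFace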
const hfConfig = new HuggingFaceConfig("hf_token", {
  model: "microsoft/DialoGPT-medium",
  temperature: 0.0
});

// Ollama
const ollamaConfig = new OllamaConfig("ollama_endpoint", {
  model: "llama2",
  temperature: 0.1
});

Publishing to NPM

Log in to npm:

npm login

Use the publishing script:

./publish.sh

Or manually:

npm version patch  # or minor/major
npm publish

📈 Use Cases

Web Scraping: Automated data extraction from websites
E2E Testing: End-to-end testing automation
Form Automation: Automated form filling and submission
Social Media Management: Automated posting and interactions
Website Monitoring: Change detection and monitoring
Data Entry: Bulk data processing and entry tasks

📖 Documentation

🤝 Contributing

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

Issues: GitHub Issues
Documentation: Check the examples/ directory

Made with ❤️ by Ejaz Ullah

Features

🤖 AI-Driven: Uses advanced language models to understand and execute complex browser tasks
🔄 Multi-Step Execution: Automatically performs sequences of actions to complete tasks
🧠 Smart Decision Making: Analyzes page content and decides next actions intelligently
🔌 Multiple LLM Support: Works with Hugging Face, Ollama, OpenAI, and other providers
🎯 Task Completion Detection: Knows when a task is fully completed
📊 Detailed Logging: Provides comprehensive execution logs and results

Quick Start

import { SmartBrowserAutomation, HuggingFaceConfig } from '@ejazullah/smart-browser-automation';

// Configure your LLM
const llmConfig = new HuggingFaceConfig("your-hugging-face-api-key");

// MCP and WebDriver configuration
const mcpEndpoint = 'http://your-mcp-server:8006/mcp';
const driverUrl = 'wss://your-webdriver-endpoint';

// Create automation instance
const automation = new SmartBrowserAutomation({
  maxSteps: 10,
  temperature: 0.0
});

try {
  // Initialize and execute task
  await automation.initialize(llmConfig, mcpEndpoint, driverUrl);

  const result = await automation.executeTask(
    "go to https://example.com and fill out the contact form"
  );

  console.log(result);
} finally {
  // Clean up and close connections, even if the task throws
  await automation.close();
}

Configuration Options

LLM Configurations

Hugging Face

import { HuggingFaceConfig } from '@ejazullah/smart-browser-automation';

const config = new HuggingFaceConfig(
  "your-api-key", 
  "Qwen/Qwen3-Coder-480B-A35B-Instruct" // optional model
);

Ollama

import { OllamaConfig } from '@ejazullah/smart-browser-automation';

const config = new OllamaConfig(
  "http://localhost:11434", // optional base URL
  "llama2" // optional model
);

OpenAI

import { OpenAIConfig } from '@ejazullah/smart-browser-automation';

const config = new OpenAIConfig("your-api-key", "gpt-4");

Automation Options

const automation = new SmartBrowserAutomation({
  maxSteps: 15,        // Maximum steps to execute
  temperature: 0.1,    // LLM temperature (0.0 = deterministic)
});
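The sketch below shows one way to apply the documented defaults (`maxSteps` 10, `temperature` 0.0, as listed in the API Reference) before constructing the instance. `resolveAutomationOptions` is hypothetical, and the range checks are assumptions of this sketch rather than limits the package is known to enforce.

```typescript
interface AutomationOptions {
  maxSteps: number;
  temperature: number;
}

// Defaults as documented in the Constructor section.
const DEFAULT_OPTIONS: AutomationOptions = { maxSteps: 10, temperature: 0.0 };

// Fills in missing values and rejects clearly invalid ones.
function resolveAutomationOptions(partial: Partial<AutomationOptions> = {}): AutomationOptions {
  const options: AutomationOptions = {
    maxSteps: partial.maxSteps ?? DEFAULT_OPTIONS.maxSteps,
    temperature: partial.temperature ?? DEFAULT_OPTIONS.temperature,
  };
  // Assumed bounds: at least one step, and a finite non-negative temperature.
  if (!Number.isInteger(options.maxSteps) || options.maxSteps < 1) {
    throw new RangeError(`maxSteps must be a positive integer, got ${options.maxSteps}`);
  }
  if (!Number.isFinite(options.temperature) || options.temperature < 0) {
    throw new RangeError(`temperature must be a finite number >= 0, got ${options.temperature}`);
  }
  return options;
}
```

For example, `new SmartBrowserAutomation(resolveAutomationOptions({ maxSteps: 15 }))` fails fast on a typo such as `maxSteps: 0` instead of starting a run.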

API Reference

SmartBrowserAutomation

Constructor

new SmartBrowserAutomation(config)
- config.maxSteps (number): Maximum execution steps (default: 10)
- config.temperature (number): LLM temperature (default: 0.0)

Methods

`initialize(llmConfig, mcpEndpoint, driverUrl)`

Initialize the automation system.

llmConfig: LLM configuration object
mcpEndpoint: MCP server endpoint URL
driverUrl: WebDriver WebSocket URL
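A malformed endpoint is easier to diagnose before `initialize()` is called. The helper below is a hypothetical pre-flight check, not part of the package; it assumes the schemes used in this README's examples (`http://` for the MCP server, `wss://` for the driver endpoint), and other setups may legitimately differ.

```typescript
// Returns null instead of throwing when the string is not an absolute URL.
function parseUrl(value: string): URL | null {
  try {
    return new URL(value);
  } catch {
    return null;
  }
}

// Collects human-readable problems; an empty array means both endpoints look usable.
function validateEndpoints(mcpEndpoint: string, driverUrl: string): string[] {
  const problems: string[] = [];
  const check = (label: string, value: string, schemes: string[]): void => {
    const url = parseUrl(value);
    if (url === null) {
      problems.push(`${label} is not a valid URL: ${value}`);
      return;
    }
    if (schemes.indexOf(url.protocol) === -1) {
      problems.push(`${label} should use ${schemes.join(" or ")}, got ${url.protocol}`);
    }
  };
  // Assumed schemes, taken from the examples in this README.
  check("mcpEndpoint", mcpEndpoint, ["http:", "https:"]);
  check("driverUrl", driverUrl, ["ws:", "wss:"]);
  return problems;
}
```

Returning a list rather than throwing on the first problem lets you report both endpoints at once.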

`executeTask(taskDescription, options)`

Execute an automation task.

taskDescription (string): Natural language description of the task
options.verbose (boolean): Enable detailed logging (default: true)
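options.systemPrompt (string): Custom system prompt for the AI
options.onProgress (function): Callback that receives progress updates with step and message fields, as in the TypeScript example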

Returns:

{
  success: boolean,
  steps: number,
  results: Array,
  completed: boolean
}
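The documented return value can be checked with a local type that mirrors it. In the sketch below, `TaskResult` and `summarizeResult` are hypothetical names, `results` is typed as `unknown[]` because its element type is not documented, and treating a run as done only when both `success` and `completed` are true is an assumption, since the difference between the two flags is not specified here.

```typescript
// Local mirror of the documented return shape.
interface TaskResult {
  success: boolean;
  steps: number;
  results: unknown[];
  completed: boolean;
}

// One-line summary of a run; `maxSteps` should match the value passed to the constructor.
function summarizeResult(result: TaskResult, maxSteps: number): string {
  if (result.success && result.completed) {
    return `done in ${result.steps} step(s)`;
  }
  // Not completed and the step budget is used up: likely stopped by maxSteps.
  if (!result.completed && result.steps >= maxSteps) {
    return `stopped at the step limit (${maxSteps}) before completing`;
  }
  return `not completed (success=${result.success}, steps=${result.steps})`;
}
```

If runs often end at the step limit, raising `maxSteps` or splitting the task into smaller descriptions are the obvious levers.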

`close()`

Clean up and close connections.

Examples

Search Example

const result = await automation.executeTask(
  "go to https://duckduckgo.com/ and search for 'AI tools'"
);

Form Filling Example

const result = await automation.executeTask(
  "navigate to the contact page, fill out the form with name 'John Doe' and email 'john.doe@example.com', then submit it"
);
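Longer workflows like the form-filling task can be easier to maintain as an ordered list of steps joined into a single description. `composeTask` is a hypothetical helper that produces the same "..., then ..." phrasing used in these examples; the package itself only ever sees the final string.

```typescript
// Joins ordered steps into one natural-language instruction, skipping blank entries.
function composeTask(steps: string[]): string {
  const cleaned = steps.map((s) => s.trim()).filter((s) => s.length > 0);
  if (cleaned.length === 0) {
    throw new Error("composeTask needs at least one non-empty step");
  }
  return cleaned.join(", then ");
}
```

Usage: `automation.executeTask(composeTask(["navigate to the contact page", "fill out the form with name 'John Doe'", "submit it"]))`.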

E-commerce Example

const result = await automation.executeTask(
  "go to the online store, search for 'laptop', filter by price under $1000, and add the first result to cart"
);

Error Handling

try {
  await automation.initialize(llmConfig, mcpEndpoint, driverUrl);
  const result = await automation.executeTask("your task here");
  
  if (!result.success) {
    console.error("Task failed:", result);
  }
} catch (error) {
  console.error("Automation error:", error);
} finally {
  await automation.close();
}
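For flaky networks or pages, the pattern above can be wrapped in a retry loop. `withRetry` is a hypothetical helper, not part of the package; it retries only when the wrapped function throws, and its defaults (3 attempts, 1000 ms base delay with linear backoff) are arbitrary choices for this sketch.

```typescript
// Calls `task` up to `maxAttempts` times, waiting delayMs * attempt between failures.
async function withRetry<T>(
  task: (attempt: number) => Promise<T>,
  maxAttempts = 3,
  delayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

To retry on a reported failure as well, throw inside the callback when `result.success` is false. Each retry re-runs the whole task from the start, so avoid it for tasks with side effects such as submitting forms or adding items to a cart.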

Requirements

Node.js 18+
A running MCP server with browser capabilities
Access to a WebDriver endpoint
API key for your chosen LLM provider
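To fail early on an unsupported runtime, you could compare the running Node.js version against the 18+ requirement. The helper below is a hypothetical sketch that parses version strings of the form `18.17.0` or `v18.17.0` and checks only the major version.

```typescript
// True when the major version in `version` is at least `minMajor`.
function meetsNodeRequirement(version: string, minMajor = 18): boolean {
  const match = /^v?(\d+)\./.exec(version.trim());
  if (!match) {
    // Unparseable input is treated as unsupported.
    return false;
  }
  return Number(match[1]) >= minMajor;
}
```

Call it as `meetsNodeRequirement(process.versions.node)` at startup and exit with a clear message when it returns `false`.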

License

MIT

Contributing

Contributions are welcome! Please read our contributing guidelines and submit pull requests to our repository.

Support

For issues and questions, please visit our GitHub Issues page.
