@ejazullah/smart-browser-automation
v1.0.9
A smart AI-driven browser automation library and REST API server using MCP (Model Context Protocol) and LangChain for multi-step task execution. Includes both programmatic library usage and HTTP API server for remote automation.
Smart Browser Automation
An AI-driven browser automation library built on MCP (Model Context Protocol) and LangChain. It executes complex multi-step browser automation tasks from natural language descriptions and is used programmatically as a library.
🚀 Features
- AI-Powered: Smart task execution using LangChain and LLM integration
- Multi-step Automation: Execute complex browser workflows
- MCP Integration: Model Context Protocol for advanced browser control
- Multiple LLM Support: HuggingFace, Ollama, and extensible architecture
- Flexible Configuration: Easy setup and customization options
📦 Installation
npm install @ejazullah/smart-browser-automation
🏃‍♂️ Quick Start
import { SmartBrowserAutomation, HuggingFaceConfig } from '@ejazullah/smart-browser-automation';
// Configuration
const llmConfig = new HuggingFaceConfig("your_huggingface_token");
const mcpEndpoint = 'http://your-mcp-endpoint';
const cdpEndpoint = 'wss://your-cdp-endpoint';
// Create automation instance
const automation = new SmartBrowserAutomation({
  maxSteps: 10,
  temperature: 0.0
});
try {
  // Initialize
  await automation.initialize(llmConfig, mcpEndpoint, cdpEndpoint);
  // Execute task
  const result = await automation.executeTask(
    "go to https://duckduckgo.com/ and search for 'AI tools'",
    { verbose: true }
  );
  console.log("Task completed:", result);
} finally {
  // Clean up
  await automation.close();
}
📚 Examples
Basic Usage
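// examples/search-example.js
import { SmartBrowserAutomation, HuggingFaceConfig } from '../index.js';
// Placeholder endpoints; replace with your own
const mcpEndpoint = 'http://your-mcp-endpoint';
const cdpEndpoint = 'wss://your-cdp-endpoint';
async function searchExample() {
  const automation = new SmartBrowserAutomation({ maxSteps: 10 });
  const llmConfig = new HuggingFaceConfig("your_token");
  try {
    await automation.initialize(llmConfig, mcpEndpoint, cdpEndpoint);
    const result = await automation.executeTask(
      "go to https://example.com and find the contact information"
    );
    console.log(result);
  } finally {
    // Always release connections, even if the task throws
    await automation.close();
  }
}
TypeScript Usage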
This package includes full TypeScript declarations for a better development experience:
import {
  SmartBrowserAutomation,
  HuggingFaceConfig,
  type TaskExecutionOptions,
  type TaskExecutionResult
} from '@ejazullah/smart-browser-automation';
async function typedExample() {
  // Configuration with type checking
  const config = new HuggingFaceConfig('your-api-key');
  const automation = new SmartBrowserAutomation({
    maxSteps: 10,
    temperature: 0.1
  });
  // Options with proper typing
  const options: TaskExecutionOptions = {
    verbose: true,
    onProgress: (update) => {
      console.log(`Step ${update.step}: ${update.message}`);
    }
  };
  // Placeholder endpoints; replace with your own
  const mcpEndpoint = 'http://your-mcp-endpoint';
  const cdpEndpoint = 'wss://your-cdp-endpoint';
  await automation.initialize(config, mcpEndpoint, cdpEndpoint);
  // Result with proper typing
  const result: TaskExecutionResult = await automation.executeTask(
    "Navigate to Google and search for TypeScript tutorials",
    options
  );
  console.log(`Completed ${result.steps} steps, success: ${result.success}`);
  await automation.close();
}
🛠️ Configuration
LLM Configurations
// HuggingFace
const hfConfig = new HuggingFaceConfig("hf_token", {
  model: "microsoft/DialoGPT-medium",
  temperature: 0.0
});
// Ollama
const ollamaConfig = new OllamaConfig("ollama_endpoint", {
  model: "llama2",
  temperature: 0.1
});
Publishing to NPM
- Log in to npm:
npm login
- Use the publishing script:
./publish.sh
- Or publish manually:
npm version patch # or minor/major
npm publish
📈 Use Cases
- Web Scraping: Automated data extraction from websites
- E2E Testing: End-to-end testing automation
- Form Automation: Automated form filling and submission
- Social Media Management: Automated posting and interactions
- Website Monitoring: Change detection and monitoring
- Data Entry: Bulk data processing and entry tasks
📖 Documentation
🤝 Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🆘 Support
- Issues: GitHub Issues
- Documentation: Check the examples/ directory
Made with ❤️ by Ejaz Ullah
Features
- 🤖 AI-Driven: Uses advanced language models to understand and execute complex browser tasks
- 🔄 Multi-Step Execution: Automatically performs sequences of actions to complete tasks
- 🧠 Smart Decision Making: Analyzes page content and decides next actions intelligently
- 🔌 Multiple LLM Support: Works with Hugging Face, Ollama, OpenAI, and other providers
- 🎯 Task Completion Detection: Knows when a task is fully completed
- 📊 Detailed Logging: Provides comprehensive execution logs and results
Installation
npm install @ejazullah/smart-browser-automation
Quick Start
import { SmartBrowserAutomation, HuggingFaceConfig } from '@ejazullah/smart-browser-automation';
// Configure your LLM
const llmConfig = new HuggingFaceConfig("your-hugging-face-api-key");
// MCP and WebDriver configuration
const mcpEndpoint = 'http://your-mcp-server:8006/mcp';
const driverUrl = 'wss://your-webdriver-endpoint';
// Create automation instance
const automation = new SmartBrowserAutomation({
  maxSteps: 10,
  temperature: 0.0
});
// Initialize and execute task
await automation.initialize(llmConfig, mcpEndpoint, driverUrl);
const result = await automation.executeTask(
  "go to https://example.com and fill out the contact form"
);
console.log(result);
await automation.close();
Configuration Options
LLM Configurations
Hugging Face
import { HuggingFaceConfig } from '@ejazullah/smart-browser-automation';
const config = new HuggingFaceConfig(
  "your-api-key",
  "Qwen/Qwen3-Coder-480B-A35B-Instruct" // optional model
);
Ollama
import { OllamaConfig } from '@ejazullah/smart-browser-automation';
const config = new OllamaConfig(
  "http://localhost:11434", // optional base URL
  "llama2" // optional model
);
OpenAI
import { OpenAIConfig } from '@ejazullah/smart-browser-automation';
const config = new OpenAIConfig("your-api-key", "gpt-4");
Automation Options
const automation = new SmartBrowserAutomation({
  maxSteps: 15, // Maximum steps to execute
  temperature: 0.1, // LLM temperature (0.0 = deterministic)
});
API Reference
SmartBrowserAutomation
Constructor
new SmartBrowserAutomation(config)
- config.maxSteps (number): Maximum execution steps (default: 10)
- config.temperature (number): LLM temperature (default: 0.0)
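
The constructor options can be checked on the caller's side before an instance is created. The following is a minimal sketch, not part of the library: `resolveAutomationOptions` and `DEFAULT_OPTIONS` are hypothetical names, and the defaults are simply the ones documented above. The allowed ranges (positive integer, non-negative number) are assumptions.

```javascript
// Hypothetical helper: merge caller options with the documented defaults
// and reject values that are unlikely to be meaningful.
const DEFAULT_OPTIONS = { maxSteps: 10, temperature: 0.0 };

function resolveAutomationOptions(config = {}) {
  const options = { ...DEFAULT_OPTIONS, ...config };
  // Assumption: maxSteps must be a positive integer
  if (!Number.isInteger(options.maxSteps) || options.maxSteps < 1) {
    throw new RangeError(`maxSteps must be a positive integer, got ${options.maxSteps}`);
  }
  // Assumption: temperature must be a non-negative number
  if (
    typeof options.temperature !== 'number' ||
    Number.isNaN(options.temperature) ||
    options.temperature < 0
  ) {
    throw new RangeError(`temperature must be a non-negative number, got ${options.temperature}`);
  }
  return options;
}
```

The returned object could then be passed straight to the constructor, for example `new SmartBrowserAutomation(resolveAutomationOptions({ maxSteps: 15 }))`.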
Methods
initialize(llmConfig, mcpEndpoint, driverUrl)
Initialize the automation system.
- llmConfig: LLM configuration object
- mcpEndpoint: MCP server endpoint URL
- driverUrl: WebDriver WebSocket URL
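
A malformed endpoint is easier to diagnose before `initialize` is called than after a connection attempt fails. The sketch below is a hypothetical helper (`checkEndpoints` is not part of the library). It assumes, based only on the placeholder URLs used in this README, that the MCP endpoint uses `http`/`https` and the driver URL uses `ws`/`wss`.

```javascript
// Hypothetical pre-flight check for the two endpoint arguments.
// Returns a list of human-readable problems; an empty list means both look fine.
function checkEndpoints(mcpEndpoint, driverUrl) {
  const problems = [];
  const expectProtocol = (label, value, protocols) => {
    let url;
    try {
      url = new URL(value); // throws on strings that are not absolute URLs
    } catch {
      problems.push(`${label} is not a valid URL: ${value}`);
      return;
    }
    if (!protocols.includes(url.protocol)) {
      problems.push(`${label} should use ${protocols.join(' or ')}, got ${url.protocol}`);
    }
  };
  // Assumed schemes, inferred from the examples in this README
  expectProtocol('mcpEndpoint', mcpEndpoint, ['http:', 'https:']);
  expectProtocol('driverUrl', driverUrl, ['ws:', 'wss:']);
  return problems;
}
```

A caller might log the returned problems and skip `initialize` when the list is not empty.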
executeTask(taskDescription, options)
Execute an automation task.
- taskDescription (string): Natural language description of the task
- options.verbose (boolean): Enable detailed logging (default: true)
- options.systemPrompt (string): Custom system prompt for the AI
Returns:
{
  success: boolean,
  steps: number,
  results: Array,
  completed: boolean
}
close()
Clean up and close connections.
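
Since `close()` releases connections, it is worth making sure it runs even when a task throws. One way to do that is a small wrapper around the three documented methods. This is a sketch only: `runTask` and the shape of its `init` argument are hypothetical, and it relies on nothing beyond `initialize`, `executeTask`, and `close` as described above.

```javascript
// Hypothetical wrapper: initialize, run one task, and always close.
// `automation` is any object exposing initialize/executeTask/close.
async function runTask(automation, init, taskDescription, options = {}) {
  try {
    await automation.initialize(init.llmConfig, init.mcpEndpoint, init.driverUrl);
    return await automation.executeTask(taskDescription, options);
  } finally {
    // Runs on success and on failure; the original error still propagates
    await automation.close();
  }
}
```

Because the wrapper only depends on the method names, it can be exercised with a stub object in tests without a real MCP server or browser.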
Examples
Search Example
const result = await automation.executeTask(
  "go to https://duckduckgo.com/ and search for 'AI tools'"
);
Form Filling Example
const result = await automation.executeTask(
  "navigate to the contact page, fill out the form with name 'John Doe' and email '[email protected]', then submit it"
);
E-commerce Example
const result = await automation.executeTask(
  "go to the online store, search for 'laptop', filter by price under $1000, and add the first result to cart"
);
Error Handling
try {
  await automation.initialize(llmConfig, mcpEndpoint, driverUrl);
  const result = await automation.executeTask("your task here");
  if (!result.success) {
    console.error("Task failed:", result);
  }
} catch (error) {
  console.error("Automation error:", error);
} finally {
  await automation.close();
}
Requirements
- Node.js 18+
- A running MCP server with browser capabilities
- Access to a WebDriver endpoint
- API key for your chosen LLM provider
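
The requirements above can be checked at startup so that a missing piece is reported up front. The following is a rough sketch under stated assumptions: `missingRequirements` is a hypothetical helper, and the environment variable names `MCP_ENDPOINT`, `DRIVER_URL`, and `LLM_API_KEY` are invented for illustration. The library itself does not read them.

```javascript
// Hypothetical startup check mirroring the requirements list.
// Returns the items that appear to be missing; empty means ready.
function missingRequirements({ nodeVersion = process.versions.node, env = process.env } = {}) {
  const missing = [];
  const major = Number.parseInt(nodeVersion.split('.')[0], 10);
  if (!(major >= 18)) {
    missing.push(`Node.js 18+ (found ${nodeVersion})`);
  }
  // Assumed variable names; adapt them to however your project stores settings
  for (const name of ['MCP_ENDPOINT', 'DRIVER_URL', 'LLM_API_KEY']) {
    if (!env[name]) {
      missing.push(`environment variable ${name}`);
    }
  }
  return missing;
}
```

This only confirms that values are present; it does not verify that the MCP server or WebDriver endpoint is actually reachable.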
License
MIT
Contributing
Contributions are welcome! Please read our contributing guidelines and submit pull requests to our repository.
Support
For issues and questions, please visit our GitHub Issues page.
