@freelw/agent-tars-core
v0.3.21
Agent TARS core.
@agent-tars/core
Agent TARS is a general multimodal AI Agent stack that brings the power of GUI Agent and Vision into your terminal, computer, browser and product.
It ships primarily with a CLI and a Web UI. It aims to provide a workflow that is closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with various real-world MCP tools.
📣 Just released: Agent TARS Beta - check out our announcement blog post!
https://github.com/user-attachments/assets/772b0eef-aef7-4ab9-8cb0-9611820539d8
For more use cases, please check out #842.
Overview
@agent-tars/core is the core implementation of Agent TARS, built on top of the Tarko Agent framework. It provides a comprehensive multimodal AI agent with advanced browser automation, filesystem operations, and intelligent search capabilities.
Core Features
- 🖱️ One-Click Out-of-the-box CLI - Supports both headful (Web UI) and headless (server) execution.
- 🌐 Hybrid Browser Agent - Control browsers using GUI Agent, DOM, or a hybrid strategy.
- 🔄 Event Stream - Protocol-driven Event Stream drives Context Engineering and Agent UI.
- 🧰 MCP Integration - The kernel is built on MCP and also supports mounting MCP Servers to connect to real-world tools.
Quick Start
# Launch with `npx`.
npx @agent-tars/cli@latest
# Install globally (requires Node.js >= 22)
npm install @agent-tars/cli@latest -g
# Run with your preferred model provider
agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key
Visit the comprehensive Quick Start guide for detailed setup instructions.
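When the CLI is launched from a script rather than typed by hand, it can help to build the argument list in one place. The following is a minimal sketch, not part of `@agent-tars/cli`: the `CliModelConfig` type and the `buildAgentTarsArgs` helper are hypothetical names introduced here, and only the `--provider`, `--model`, and `--apiKey` flags come from the commands above.

```typescript
// Hypothetical config shape for this sketch; not a type exported by the CLI.
interface CliModelConfig {
  provider: string;
  model: string;
  apiKey: string;
}

// Hypothetical helper: builds the argument list for the `agent-tars` command.
// Only the three flags shown in the Quick Start commands are used.
function buildAgentTarsArgs(config: CliModelConfig): string[] {
  return [
    '--provider', config.provider,
    '--model', config.model,
    '--apiKey', config.apiKey,
  ];
}

const exampleArgs = buildAgentTarsArgs({
  provider: 'anthropic',
  model: 'claude-3-7-sonnet-latest',
  apiKey: 'your-api-key',
});

// Prints the same command line as the Quick Start example.
console.log(['agent-tars', ...exampleArgs].join(' '));
```

The resulting array can be passed as the argument list to a process launcher such as Node's `child_process.spawn`, which avoids shell quoting issues.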
Quick Start
Installation
npm install @agent-tars/core
Running Agent TARS
Agent TARS can be started in multiple ways:
Option 1: Using @agent-tars/cli (Recommended)
# Install globally
npm install -g @agent-tars/cli
# Run Agent TARS
agent-tars
# Or use directly via npx
npx @agent-tars/cli
Option 2: Using @tarko/agent-cli
# Install globally
npm install -g @tarko/agent-cli
# Run Agent TARS through tarko CLI
tarko run agent-tars
# Or use directly via npx
npx @tarko/agent-cli run agent-tars
Option 3: Programmatic Usage
Basic Usage
import { AgentTARS } from '@agent-tars/core';
// Create an agent instance
const agent = new AgentTARS({
  model: {
    provider: 'openai',
    model: 'gpt-4',
    apiKey: process.env.OPENAI_API_KEY,
  },
  workspace: './workspace',
  browser: {
    headless: false,
    control: 'hybrid',
  },
});
// Initialize and run
await agent.initialize();
const result = await agent.run('Search for the latest AI research papers');
console.log(result);
Configuration
AgentTARSOptions
interface AgentTARSOptions {
  // Model configuration
  model?: {
    provider: 'openai' | 'anthropic' | 'doubao';
    model: string;
    apiKey: string;
  };
  // Browser settings
  browser?: {
    headless?: boolean;
    control?: 'dom' | 'visual-grounding' | 'hybrid';
    cdpEndpoint?: string;
  };
  // Search configuration
  search?: {
    provider: 'browser_search' | 'tavily';
    count?: number;
    apiKey?: string;
  };
  // Workspace settings
  workspace?: string;
  // MCP implementation
  mcpImpl?: 'in-memory' | 'stdio';
}
Browser Control Modes
- dom: Direct DOM manipulation (fastest, most reliable)
- visual-grounding: Vision-based interaction (most flexible)
- hybrid: Combines both approaches (recommended)
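If the control mode comes from user input such as an environment variable or a config file, it may be worth validating it before passing it to `browser.control`. The sketch below is an illustration only: `parseControlMode` and `BROWSER_CONTROL_MODES` are hypothetical names that `@agent-tars/core` does not export, and falling back to `hybrid` is an assumption based on it being the recommended mode above.

```typescript
// The three modes listed above, mirrored locally so the sketch is self-contained.
const BROWSER_CONTROL_MODES = ['dom', 'visual-grounding', 'hybrid'] as const;
type BrowserControlMode = (typeof BROWSER_CONTROL_MODES)[number];

// Hypothetical helper: normalizes free-form input to a known control mode.
// Unknown or missing values fall back to 'hybrid' (an assumption of this sketch).
function parseControlMode(value: string | undefined): BrowserControlMode {
  const normalized = (value ?? '').trim().toLowerCase();
  const match = BROWSER_CONTROL_MODES.find((mode) => mode === normalized);
  return match ?? 'hybrid';
}

console.log(parseControlMode('DOM'));
console.log(parseControlMode('something-else'));
```

The returned value is typed as the same union used by `browser.control` in `AgentTARSOptions`, so it can be assigned there without a cast.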
Advanced Usage
Custom Instructions
const agent = new AgentTARS({
  instructions: `
    You are a specialized research assistant.
    Focus on academic papers and technical documentation.
    Always provide citations and sources.
  `,
  // ... other options
});
Browser State Management
// Get browser control information
const browserInfo = agent.getBrowserControlInfo();
console.log(`Mode: ${browserInfo.mode}`);
console.log(`Tools: ${browserInfo.tools.join(', ')}`);
// Access browser manager
const browserManager = agent.getBrowserManager();
if (browserManager) {
  const isAlive = await browserManager.isBrowserAlive();
  console.log(`Browser status: ${isAlive ? 'alive' : 'dead'}`);
}
Workspace Operations
// Get current workspace
const workspace = agent.getWorkingDirectory();
console.log(`Working in: ${workspace}`);
// All file operations are automatically scoped to workspace
const result = await agent.run('Create a summary.md file with today\'s findings');
Error Handling
try {
  await agent.initialize();
  const result = await agent.run('Your task here');
} catch (error) {
  console.error('Agent error:', error);
} finally {
  // Always cleanup
  await agent.cleanup();
}
API Reference
Core Methods
- initialize(): Initialize the agent and all components
- run(message): Execute a task with the given message
- cleanup(): Clean up all resources
- getWorkingDirectory(): Get current workspace path
- getBrowserControlInfo(): Get browser control status
- getBrowserManager(): Access browser manager instance
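The lifecycle methods listed above (`initialize`, `run`, `cleanup`) can be wrapped so that cleanup always runs, even when a task fails. This is a minimal sketch under stated assumptions: `AgentLike` and `runTask` are hypothetical names, and the `Promise` return types are inferred from the `await` calls in this README rather than taken from the package's type definitions.

```typescript
// Minimal structural type covering only the three lifecycle methods.
// Return types are assumptions based on how the README awaits them.
interface AgentLike {
  initialize(): Promise<void>;
  run(message: string): Promise<unknown>;
  cleanup(): Promise<void>;
}

// Hypothetical helper: initialize, run one task, and always clean up.
// Errors from initialize() or run() propagate to the caller after cleanup.
async function runTask(agent: AgentLike, message: string): Promise<unknown> {
  try {
    await agent.initialize();
    return await agent.run(message);
  } finally {
    await agent.cleanup();
  }
}
```

Because the helper depends only on a structural type, it can be exercised with a stub object in tests and, assuming the real class matches this shape, called with an `AgentTARS` instance.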
Events
The agent emits events through the event stream:
agent.eventStream.subscribe((event) => {
  if (event.type === 'tool_result') {
    console.log(`Tool ${event.name} completed`);
  }
});
Resources
- 📄 Blog Post
- 🐦 Release Announcement on Twitter
- 🐦 Official Twitter
- 💬 Discord Community
- 💬 Feishu (Lark) Community Group
- 🚀 Quick Start
- 💻 CLI Documentation
- 🖥️ Web UI Guide
- 📁 Workspace Documentation
- 🔌 MCP Documentation
Features
- 🌐 Advanced Browser Control: Multiple control strategies (DOM, Visual, Hybrid)
- 📁 Safe Filesystem Operations: Workspace-scoped file management
- 🔍 Intelligent Search: Integration with multiple search providers
- 🔧 MCP Integration: Built-in Model Context Protocol support
- 📸 Visual Understanding: Screenshot-based browser interaction
- 🛡️ Safety First: Path validation and workspace isolation
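To illustrate what path validation and workspace isolation can look like in general, here is a small sketch of a workspace-scoped path check. It is not the implementation used by `@agent-tars/core`: `resolveInWorkspace` is a hypothetical helper, and the check is purely lexical, so it does not account for symbolic links.

```typescript
import * as path from 'node:path';

// Hypothetical helper: resolve `target` against `workspace` and reject
// any result that would land outside the workspace root.
function resolveInWorkspace(workspace: string, target: string): string {
  const root = path.resolve(workspace);
  const resolved = path.resolve(root, target);
  const relative = path.relative(root, resolved);

  // The path escapes the root if the relative form climbs upward,
  // or is absolute (for example, a different drive on Windows).
  const escapes =
    relative === '..' ||
    relative.startsWith('..' + path.sep) ||
    path.isAbsolute(relative);

  if (escapes) {
    throw new Error(`Path escapes workspace: ${target}`);
  }
  return resolved;
}

// A path inside the workspace resolves normally.
console.log(resolveInWorkspace('./workspace', 'notes/summary.md'));
```

A call such as `resolveInWorkspace('./workspace', '../secret.txt')` throws instead of returning a path outside the workspace.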
What's Changed
See Full CHANGELOG
Contributing
See CONTRIBUTING.md for development guidelines.
License
Apache-2.0 - See LICENSE for details.
