@freelw/agent-tars-core

v0.3.21

Published

14 days ago

Agent TARS core.

@agent-tars/core

Agent TARS is a general multimodal AI agent stack that brings the power of GUI Agent and Vision into your terminal, computer, browser, and product.

It ships primarily as a CLI and a Web UI, and aims to provide a workflow closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with real-world MCP tools.

📣 Just released: Agent TARS Beta - check out our announcement blog post!

https://github.com/user-attachments/assets/772b0eef-aef7-4ab9-8cb0-9611820539d8

For more use cases, please check out #842.

Overview

@agent-tars/core is the core implementation of Agent TARS, built on top of the Tarko Agent framework. It provides a comprehensive multimodal AI agent with advanced browser automation, filesystem operations, and intelligent search capabilities.

Core Features

🖱️ One-Click Out-of-the-box CLI - Supports both headful (Web UI) and headless (server) execution.
🌐 Hybrid Browser Agent - Control browsers using GUI Agent, DOM, or a hybrid strategy.
🔄 Event Stream - Protocol-driven Event Stream drives Context Engineering and Agent UI.
🧰 MCP Integration - The kernel is built on MCP and also supports mounting MCP Servers to connect to real-world tools.

Quick Start

# Launch with `npx`.
npx @agent-tars/cli@latest

# Install globally (requires Node.js >= 22)
npm install @agent-tars/cli@latest -g

# Run with your preferred model provider
agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key

Visit the comprehensive Quick Start guide for detailed setup instructions.

Installation

npm install @agent-tars/core

Running Agent TARS

Agent TARS can be started in multiple ways:

Option 1: Using @agent-tars/cli (Recommended)

# Install globally
npm install -g @agent-tars/cli

# Run Agent TARS
agent-tars

# Or use directly via npx
npx @agent-tars/cli

Option 2: Using @tarko/agent-cli

# Install globally
npm install -g @tarko/agent-cli

# Run Agent TARS through tarko CLI
tarko run agent-tars

# Or use directly via npx
npx @tarko/agent-cli run agent-tars

Option 3: Programmatic Usage

Basic Usage

import { AgentTARS } from '@agent-tars/core';

// Create an agent instance
const agent = new AgentTARS({
  model: {
    provider: 'openai',
    model: 'gpt-4',
    apiKey: process.env.OPENAI_API_KEY,
  },
  workspace: './workspace',
  browser: {
    headless: false,
    control: 'hybrid',
  },
});

// Initialize and run
await agent.initialize();
const result = await agent.run('Search for the latest AI research papers');
console.log(result);

Configuration

AgentTARSOptions

interface AgentTARSOptions {
  // Model configuration
  model?: {
    provider: 'openai' | 'anthropic' | 'doubao';
    model: string;
    apiKey: string;
  };
  
  // Browser settings
  browser?: {
    headless?: boolean;
    control?: 'dom' | 'visual-grounding' | 'hybrid';
    cdpEndpoint?: string;
  };
  
  // Search configuration
  search?: {
    provider: 'browser_search' | 'tavily';
    count?: number;
    apiKey?: string;
  };
  
  // Workspace settings
  workspace?: string;
  
  // MCP implementation
  mcpImpl?: 'in-memory' | 'stdio';
}
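
The interface above declares `model.apiKey` as a plain `string`, while values read from `process.env` can be `undefined`. A minimal sketch of one way to bridge that gap is shown below. `ModelConfig`, `requireEnv`, and `modelFromEnv` are hypothetical names introduced for illustration and are not part of `@agent-tars/core`; the provider and model values simply mirror the Basic Usage example.

```typescript
// Local copy of the `model` shape from AgentTARSOptions, so the sketch is self-contained.
type ModelConfig = {
  provider: 'openai' | 'anthropic' | 'doubao';
  model: string;
  apiKey: string;
};

// Hypothetical helper (not a library API): read a required variable or fail early.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Hypothetical helper: build the `model` block with an apiKey that is guaranteed to be a string.
function modelFromEnv(
  env: Record<string, string | undefined> = process.env,
): ModelConfig {
  return {
    provider: 'openai',
    model: 'gpt-4',
    apiKey: requireEnv('OPENAI_API_KEY', env),
  };
}
```

The result could then be passed as the `model` field when constructing the agent, so a missing key fails at startup rather than on the first model call.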

Browser Control Modes

dom: Direct DOM manipulation (fastest, most reliable)
visual-grounding: Vision-based interaction (most flexible)
hybrid: Combines both approaches (recommended)
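
As a rough illustration of how the three modes might be chosen, the sketch below encodes one possible rule of thumb. It is an assumption, not documented library behavior: `ModeHints` and `chooseControlMode` are hypothetical, and the premise that `visual-grounding` needs a vision-capable model is inferred from the screenshot-based interaction described in this README.

```typescript
type BrowserControlMode = 'dom' | 'visual-grounding' | 'hybrid';

// Hypothetical inputs describing the model and the target pages.
interface ModeHints {
  modelHasVision: boolean; // can the model interpret screenshots?
  domIsUsable: boolean;    // false for, e.g., canvas-heavy pages with little DOM structure
}

// Rule of thumb (an assumption): fall back to DOM without vision,
// use vision alone when the DOM is not helpful, otherwise prefer hybrid.
function chooseControlMode(hints: ModeHints): BrowserControlMode {
  if (!hints.modelHasVision) return 'dom';
  if (!hints.domIsUsable) return 'visual-grounding';
  return 'hybrid';
}
```

The returned value has the same type as the `browser.control` option in `AgentTARSOptions`.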

Advanced Usage

Custom Instructions

const agent = new AgentTARS({
  instructions: `
    You are a specialized research assistant.
    Focus on academic papers and technical documentation.
    Always provide citations and sources.
  `,
  // ... other options
});

Browser State Management

// Get browser control information
const browserInfo = agent.getBrowserControlInfo();
console.log(`Mode: ${browserInfo.mode}`);
console.log(`Tools: ${browserInfo.tools.join(', ')}`);

// Access browser manager
const browserManager = agent.getBrowserManager();
if (browserManager) {
  const isAlive = await browserManager.isBrowserAlive();
  console.log(`Browser status: ${isAlive ? 'alive' : 'dead'}`);
}

Workspace Operations

// Get current workspace
const workspace = agent.getWorkingDirectory();
console.log(`Working in: ${workspace}`);

// All file operations are automatically scoped to workspace
const result = await agent.run('Create a summary.md file with today\'s findings');
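
To make "scoped to workspace" concrete, here is a minimal sketch of the kind of path check such scoping usually implies. It is not the library's actual implementation; `resolveInWorkspace` is a hypothetical helper, and only Node's built-in `path` module is used.

```typescript
import * as path from 'node:path';

// Hypothetical helper: resolve `requested` against `workspace` and reject
// anything that would land outside it (via `..` segments or an absolute path).
function resolveInWorkspace(workspace: string, requested: string): string {
  const root = path.resolve(workspace);
  const target = path.resolve(root, requested);
  const relative = path.relative(root, target);

  // A relative path equal to `..`, starting with `../`, or still absolute
  // (for example another drive on Windows) points outside the workspace root.
  const escapes =
    relative === '..' ||
    relative.startsWith('..' + path.sep) ||
    path.isAbsolute(relative);

  if (escapes) {
    throw new Error(`Path escapes workspace: ${requested}`);
  }
  return target;
}
```

With this check, `resolveInWorkspace('./workspace', 'summary.md')` resolves to a file inside the workspace, while `resolveInWorkspace('./workspace', '../secret.txt')` throws.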

Error Handling

try {
  await agent.initialize();
  const result = await agent.run('Your task here');
} catch (error) {
  console.error('Agent error:', error);
} finally {
  // Always cleanup
  await agent.cleanup();
}
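
The same pattern can be wrapped in a small reusable helper, sketched below. `AgentLifecycle` and `runOnce` are hypothetical names; the interface lists only the three methods used above, and their return types are assumptions rather than the library's published signatures.

```typescript
// Minimal shape this sketch relies on; the real AgentTARS class has more members.
interface AgentLifecycle<T> {
  initialize(): Promise<void>;
  run(message: string): Promise<T>;
  cleanup(): Promise<void>;
}

// Hypothetical helper: initialize, run one task, and always clean up,
// whether the task succeeds or throws. Errors still propagate to the caller.
async function runOnce<T>(agent: AgentLifecycle<T>, message: string): Promise<T> {
  try {
    await agent.initialize();
    return await agent.run(message);
  } finally {
    await agent.cleanup();
  }
}
```

A caller would then write `await runOnce(agent, 'Your task here')` and handle errors in its own `try`/`catch`, without repeating the cleanup logic at each call site.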

API Reference

Core Methods

initialize(): Initialize the agent and all components
run(message): Execute a task with the given message
cleanup(): Clean up all resources
getWorkingDirectory(): Get current workspace path
getBrowserControlInfo(): Get browser control status
getBrowserManager(): Access browser manager instance

Events

The agent emits events through the event stream:

agent.eventStream.subscribe((event) => {
  if (event.type === 'tool_result') {
    console.log(`Tool ${event.name} completed`);
  }
});
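
Building on the subscription above, the following sketch tallies how often each tool completes. `AgentEvent`, `EventStreamLike`, and `trackToolResults` are hypothetical names, and the event shape (`type`, optional `name`) is assumed from the snippet above rather than taken from the library's type definitions.

```typescript
// Assumed minimal event shape, based only on the fields used in the snippet above.
interface AgentEvent {
  type: string;
  name?: string;
}

// Assumed minimal stream shape; the return value of subscribe is deliberately left unspecified.
interface EventStreamLike {
  subscribe(callback: (event: AgentEvent) => void): unknown;
}

// Hypothetical helper: count completed tool calls per tool name.
// The returned Map is updated live as events arrive.
function trackToolResults(stream: EventStreamLike): Map<string, number> {
  const counts = new Map<string, number>();
  stream.subscribe((event) => {
    if (event.type === 'tool_result' && event.name) {
      counts.set(event.name, (counts.get(event.name) ?? 0) + 1);
    }
  });
  return counts;
}
```

Assuming `agent.eventStream` matches this shape, `trackToolResults(agent.eventStream)` could be called before `agent.run(...)` and the map inspected afterwards.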

Features

🌐 Advanced Browser Control: Multiple control strategies (DOM, Visual, Hybrid)
📁 Safe Filesystem Operations: Workspace-scoped file management
🔍 Intelligent Search: Integration with multiple search providers
🔧 MCP Integration: Built-in Model Context Protocol support
📸 Visual Understanding: Screenshot-based browser interaction
🛡️ Safety First: Path validation and workspace isolation

What's Changed

See Full CHANGELOG

Contributing

See CONTRIBUTING.md for development guidelines.

License

Apache-2.0 - See LICENSE for details.
