@open-agent-studio/agent v0.12.9
CLI agent runtime with Skill Hub, Plan Files, and permissioned tools
🤖 Agent Runtime
Your autonomous AI employee. Give it a goal, walk away. It decomposes, executes, scripts, and learns — all by itself.
$ npm install -g @open-agent-studio/agent
$ agent init
$ agent
🤖 Agent Runtime v0.11.0
> Build a system health dashboard with monitoring scripts
🧠 Decomposing into 5 subtasks...
⚡ [1/5] Create project structure ✓
⚡ [2/5] Gather system data ✓ ← created .agent/scripts/system-info/
⚡ [3/5] Build HTML dashboard ✓ ← created dashboard.html + dashboard.css
⚡ [4/5] Create update script ✓ ← created .agent/scripts/update-dashboard/
⚡ [5/5] Write README ✓
✓ Goal completed (42.1s) — 5/5 tasks done
What Is This?
Agent Runtime is your personal, autonomous AI employee that runs right on your computer.
Have you ever used ChatGPT or Claude and felt annoyed that you have to constantly copy-paste code, fix silly mistakes, or do the actual computer work for them? Agent Runtime fixes that.
Instead of treating the AI like a chatbot, you treat it like an intern. You assign it a big goal (like "Build a custom React dashboard" or "Write a Python script that cleans up my hard drive"), and then you walk away.
Here is what the Agent does while you sleep:
- Breaks your goal into steps — it thinks about what needs to be done.
- Types code by itself — it creates HTML, Python, and JavaScript files in your project directory.
- Runs programs — it executes terminal commands safely, installing its own dependencies if it needs them.
- Checks its work — if it breaks something or a script crashes, it reads the error message, plans a fix, and tries again.
- Teamwork (Swarms) — tackling a huge task? It deploys a team of AI sub-workers (a coder, a reviewer) that collaborate for you.
- Remembers you — close your laptop halfway through? All good. The built-in SQLite database remembers everything across sessions indefinitely.
- Visually Stunning App — want to see what it's thinking? Open Agent Studio's web dashboard and watch its glowing, real-time matrix terminal work.
Think of it as a junior developer you can assign tasks to and check on later.
🚀 Quick Start (5 minutes)
1. Install
npm install -g @open-agent-studio/agent
2. Initialize a project
cd your-project
agent init
This creates a .agent/ directory with configuration, skills, commands, and scripts.
3. Configure your LLM
# Set your preferred LLM provider
export OPENAI_API_KEY=sk-...
# OR
export ANTHROPIC_API_KEY=sk-ant-...
The agent supports OpenAI, Anthropic, Azure OpenAI, and Ollama (local) with automatic fallback.
4. Start using it
# Interactive mode (recommended)
agent
# Or one-shot command
agent run "Add input validation to the signup form"
# Run remotely on a cloud server
agent run "Add input validation" --remote http://server:3333
# Or start the background daemon
agent daemon start
📖 How It Works
The Agent Loop
You give a goal
↓
🧠 LLM decomposes it into subtasks with dependencies
↓
⚡ Daemon picks up tasks (up to 3 in parallel)
↓
🔧 Each task uses tools: file system, shell, git, HTTP, scripts, credentials
↓
✅ On success → saves output, triggers dependent tasks
❌ On failure → retries 3x, then re-decomposes with LLM
↓
💾 Everything stored in memory for future context
Tool Ecosystem
The agent has access to these tools when executing tasks:
| Tool | What It Does |
|------|-------------|
| fs.read / fs.write | Read and write files |
| fs.mkdir / fs.list | Create directories, list contents |
| cmd.run | Execute shell commands |
| git.status / git.diff / git.commit | Git operations |
| http.request | Make HTTP API calls (GET/POST/PUT/DELETE) |
| secrets.get / secrets.list | Access encrypted credentials |
| script.run | Execute project scripts by name |
| command.execute | Run pre-defined command workflows |
| notify.send | Send alerts via webhook, email, or log |
| cost.summary | Get token usage and cost tracking |
| desktop.browser.open | Open a URL in the agent's Playwright browser |
| desktop.browser.scrape | Extract text/HTML from a web page |
| desktop.browser.click / fill | Interact with web page elements |
| desktop.browser.screenshot | Capture a PNG screenshot of the current page |
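As an illustrative sketch (not the actual runtime code), the retry-then-re-plan behavior of the loop above might look like this; `runTask` and `replan` are hypothetical names:

```javascript
// Illustrative sketch of the execute / retry / re-plan loop.
// On the final failure, the error is handed back to the LLM for re-decomposition.
function runTask(task, replan, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return { ok: true, output: task.execute() };
    } catch (err) {
      if (attempt === maxRetries) {
        // Retries exhausted: ask the LLM to break the task down differently.
        return { ok: false, replannedInto: replan(task, err) };
      }
      // Otherwise fall through and retry.
    }
  }
}
```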
🎯 Goal-Driven Autonomy
Creating Goals
# From CLI
agent goal add "Build authentication with OAuth2" --priority 1
# The LLM auto-decomposes it:
# Task 1: Set up OAuth2 dependencies
# Task 2: Create auth routes (depends on: 1)
# Task 3: Implement token exchange (depends on: 1)
# Task 4: Add middleware (depends on: 2, 3)
# Task 5: Write tests (depends on: 4)
The Daemon
The daemon is the heart of autonomous execution. It runs in the background and:
- Picks up pending tasks from the queue
- Runs up to 3 tasks in parallel (independent tasks only)
- Chains outputs — downstream tasks get results from their dependencies
- Re-plans on failure — uses LLM to suggest alternative approaches
- Loads all project capabilities — skills, scripts, commands, plugins, credentials
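The parallel, dependency-aware scheduling described above can be sketched as a batching algorithm; the task shape (`id`, `deps`) is an assumption for illustration:

```javascript
// Sketch: group tasks into batches of up to `maxConcurrent` whose
// dependencies have already completed.
function schedule(tasks, maxConcurrent = 3) {
  const done = new Set();
  const batches = []; // each batch holds task ids that could run in parallel
  while (done.size < tasks.length) {
    const ready = tasks
      .filter(t => !done.has(t.id) && t.deps.every(d => done.has(d)))
      .slice(0, maxConcurrent);
    if (ready.length === 0) throw new Error("dependency cycle");
    ready.forEach(t => done.add(t.id));
    batches.push(ready.map(t => t.id));
  }
  return batches;
}
```

For the OAuth2 example above, this yields batches `[1]`, `[2, 3]`, `[4]`, `[5]` — tasks 2 and 3 run in parallel once task 1 finishes.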
agent daemon start # Start background processing
agent daemon status # Check health & progress
agent daemon logs # View execution log
agent daemon stop     # Graceful shutdown
Example Daemon Log
🧠 Auto-decomposing goal #1: "Build data pipeline for GitHub API"
✅ Created 5 subtask(s)
🔄 Processing task #1: "Fetch trending repos"
📦 Loaded: 2 skills, 3 commands, 6 scripts, 1 plugin, 8 credentials
✅ Task #1 completed
🔄 Processing task #2: "Transform JSON response" [parallel: 2]
🔄 Processing task #3: "Save to file" [parallel: 3]
✅ Task #2 completed
✅ Task #3 completed
🔄 Processing task #4: "Create re-run script"
✅ Task #4 completed — Goal 100% complete
🔑 Credential Vault
The agent has a built-in encrypted credential store so it can use API keys, tokens, and passwords securely.
How It Works
- Vault — Secrets stored in .agent/vault.json, encrypted with AES-256-GCM
- .env fallback — Credentials from .env are auto-detected
- Interactive capture — If the agent needs a credential it doesn't have, it asks you via Studio
Adding Credentials
Via Studio UI:
- Open Agent Studio → Credentials
- Click "Add Secret"
- Enter key name (e.g., GITHUB_TOKEN) and value
- Stored encrypted on disk
Via .env file:
GITHUB_TOKEN=ghp_xxxx
OPENAI_API_KEY=sk-xxxx
APIFY_TOKEN=apify_api_xxxx
Via CLI tools:
The LLM uses secrets.get({ key: "GITHUB_TOKEN" }) to retrieve credentials during task execution. It never hardcodes them.
📊 Agent Studio (Web Dashboard)
A full web-based management console for your agent:
agent studio
# → Agent Studio running at http://localhost:3333
agent studio --remote
# → Starts a secure tunnel and prints a QR code in terminal for mobile access!
Pages
| Page | What It Shows |
|------|--------------|
| Console | Real-time terminal with live command relay |
| Capabilities | Loaded tools, permissions, provider info |
| Goals & Tasks | Create goals, track progress, view task status |
| Templates | Pre-built goal templates (blog writer, data pipeline, etc.) |
| Credentials | Encrypted vault — add/delete API keys and tokens |
| Live Stream | Real-time WebSocket streaming of task execution output |
| Skills | Installed skills with success metrics |
| Commands | Lightweight automation templates |
| Scripts | Project scripts with execution and output viewer |
| Plugins | Installed plugin bundles |
| Daemon | Start/stop daemon, view logs, health status |
| Costs | LLM token usage, spend tracking by model and day |
| Memory | Search and browse persistent agent memory |
Goal Templates
Studio includes 6 pre-built goal templates for common workflows:
- 📊 System Health Monitor — Dashboard with CPU/memory/disk monitoring
- ✍️ Blog Post Writer — Research + write + SEO optimization
- 🕷️ Apify Actor Creator — Scaffold a web scraping actor
- 🔍 Code Review & Refactor — Analyze and improve code quality
- 🔄 Data Pipeline — Fetch → transform → save with error handling
- 📅 Recurring Report — Automated daily/weekly reports
🛠️ Extensibility
Skills
Reusable AI capabilities defined by a skill.json manifest + prompt.md:
agent skills list # List installed skills
agent skills create my-skill # Create a custom skill
agent skills stats # View success metrics
agent skills fix my-skill     # Auto-repair with LLM
Example: Create .agent/skills/deploy/skill.json + prompt.md — the agent uses it whenever a deployment goal comes up.
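For illustration, a deploy skill's manifest might pair a name and tool list with its prompt file; the exact schema below is an assumption, not the documented format:

```json
{
  "name": "deploy",
  "description": "Deploy the current branch to the target environment",
  "tools": ["cmd.run", "git.status"],
  "prompt": "prompt.md"
}
```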
Commands
Lightweight goal templates — just a markdown file with YAML frontmatter:
---
name: deploy-staging
description: Deploy current branch to staging
tools: [cmd.run, git.status]
---
# Deploy to Staging
1. Run `npm test` to verify all tests pass
2. Run `npm run build`
3. Push to staging branch
> /deploy-staging    # Use from interactive mode
Scripts
Direct automation (no LLM needed) — shell, Python, or Node.js:
# .agent/scripts/deploy/script.yaml
name: deploy-staging
description: Build and deploy to staging
entrypoint: run.sh
agent scripts run deploy-staging
The daemon auto-discovers scripts and can execute them via the script.run tool.
Plugins
Bundle native Node.js tools, skills, commands, scripts, and hooks into a single distributable package. The Agent Hub acts as the official registry for community plugins.
# Install the official GitHub plugin from the Hub
agent plugins install github
# Or install from a local path
agent plugins install ./my-plugin
# List installed plugins
agent plugins list
Featured Plugin: GitHub (github)
- Grants the agent zero-dependency native control over GitHub.
- Can create repos, open PRs, and manipulate Issues.
- Unlocks Advanced Global Search natively.
- Can view, dispatch, and monitor GitHub Actions CI/CD workflows.
16 Plugins Available — GitHub plus Slack, Notion, Vercel, Supabase, Stripe, AWS, Discord, OpenAI, Linear, Docker, MongoDB, Firebase, Telegram, HuggingFace, and Resend.
🐳 Sandboxed Execution
Run commands safely inside Docker containers:
agent sandbox start # Spin up ephemeral container
agent sandbox status # Container info
agent sandbox stop     # Destroy sandbox
🐝 Multi-Agent Swarm
Coordinate specialized agents (Planner, Coder, Reviewer, Researcher, Tester):
agent swarm start "Build a REST API with auth"
agent swarm status # View agents & tasks
agent swarm roles      # List available roles
🖥️ Desktop Automation
Cross-platform desktop control (Linux, macOS, Windows):
agent desktop screenshot # Capture screen
agent desktop click 500 300 # Mouse click
agent desktop type "Hello" --enter
agent desktop hotkey ctrl+s   # Keyboard shortcut
🌈 Multimodal Interfaces
Voice, vision, and speech powered by OpenAI:
agent multimodal transcribe audio.wav # Whisper STT
agent multimodal analyze image.png # GPT-4o Vision
agent multimodal speak "Done!"         # TTS
🌐 Browser Automation (Playwright)
Built-in headless browser control with session persistence:
# Setup (one-time)
npx playwright install chromium
# The agent uses browser tools automatically:
agent run "Open https://example.com and scrape the heading"
agent run "Log into dashboard and download the monthly report"
| Tool | What It Does |
|------|--------------|
| desktop.browser.open | Navigate to a URL (headless by default) |
| desktop.browser.click | Click elements by CSS/XPath selector |
| desktop.browser.fill | Type into input fields |
| desktop.browser.scrape | Extract text or HTML from the page |
| desktop.browser.screenshot | Capture a PNG screenshot |
| desktop.browser.close | Close browser and persist session |
Session Persistence — Cookies and localStorage are saved to .agent/browser-session.json on close and restored on next open, so the agent stays authenticated across runs.
☁️ Remote Execution (Agent Cloud)
Offload heavy LLM inference to a remote server while streaming output to your local terminal:
# On the remote server / cloud VM:
agent studio --port 3333
# On your local machine:
agent run "Summarize the entire codebase" --remote http://server:3333
Uses Server-Sent Events (SSE) via POST /api/execute — progress, warnings, and results stream back in real time.
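A minimal parser for those SSE frames, assuming the endpoint emits standard `data:` lines with events separated by blank lines:

```javascript
// Sketch: split a received chunk into SSE events and extract
// the payload from each event's `data:` lines.
function parseSSE(chunk) {
  return chunk
    .split("\n\n") // events are separated by a blank line
    .map(event => event
      .split("\n")
      .filter(line => line.startsWith("data:"))
      .map(line => line.slice(5).trim())
      .join("\n"))
    .filter(data => data.length > 0);
}
```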
Lifecycle Hooks
Intercept execution at 10 event points:
{
"hooks": {
"after:tool": [{
"match": "fs.write",
"command": "npx prettier --write {{path}}"
}]
}
}
🤖 Interactive Mode
The conversational REPL with multi-turn context:
agent
> Add rate limiting to the /api/auth endpoint
⚡ fs.read(src/routes/auth.ts) ✓
⚡ fs.write(src/middleware/rateLimit.ts) ✓
✓ Done
> Now write tests for it
⚡ fs.write(src/__tests__/rateLimit.test.ts) ✓
⚡ cmd.run(npm test) ✓
✓ All 5 tests passing
> /deploy-staging
Running command: deploy-staging...
Slash Commands
| Command | Action |
|---------|--------|
| /help | Show all available commands |
| /skills | List installed skills |
| /commands | List available commands |
| /scripts | List available scripts |
| /model | Display LLM provider info |
| /compact | Summarize and free context |
🏗️ Architecture
┌─────────────────────────────────────────────────────────┐
│ CLI / REPL / Studio │
├─────────────────────────────────────────────────────────┤
│ LLM Router │
│ OpenAI │ Anthropic │ Azure │ Ollama (fallback chain) │
├──────────┬──────────┬──────────┬────────────────────────┤
│ Skills │ Commands │ Scripts │ Plugins │
│ prompt │ .md │ .yaml │ bundles │
├──────────┴──────────┴──────────┴────────────────────────┤
│ Tool Registry & Policy Engine │
│ fs.* │ cmd.run │ git.* │ http.* │ secrets.* │ browser.* │
├─────────────────────────────────────────────────────────┤
│ Goal Decomposer │ Daemon │ Credential Vault │ Memory │
│ Remote Execute │ Browser Manager │ Session Persistence │
└─────────────────────────────────────────────────────────┘
Key Components
| Component | Purpose |
|-----------|---------|
| LLM Router | Multi-provider routing with fallback chains (OpenAI → Anthropic → Ollama) |
| Goal Decomposer | LLM-powered breakdown of goals into dependency-aware task graphs |
| Daemon Service | Background task runner with parallel execution, retries, re-planning |
| Credential Vault | AES-256-GCM encrypted secret storage with .env fallback |
| Tool Registry | Sandboxed execution with permission gates |
| Policy Engine | Human-in-the-loop approval for sensitive operations |
| Memory Store | SQLite + FTS5 persistent memory across sessions |
| Plugin Loader | Discovers and loads sub-packages of skills, commands, scripts, hooks |
📋 Full CLI Reference
Core Commands
agent # Interactive REPL
agent run "<goal>" # One-shot goal execution
agent run "<goal>" --remote URL # Execute on a remote server
agent init # Initialize project
agent studio # Web dashboard at :3333
agent doctor # System health check
agent update          # Update to latest version
Goal & Daemon
agent goal add "<title>" # Create a goal
agent goal list # List all goals
agent goal decompose <id> # AI breakdown into tasks
agent goal status <id> # Task-level progress
agent daemon start # Start background worker
agent daemon stop # Stop gracefully
agent daemon status # Health & uptime
agent daemon logs     # Recent execution logs
Check session status
$ agent sessions list
Resume a session natively
$ agent run --session
$ agent sessions resume
Multi-Agent Swarm & Remote Delegation
Coordinate multiple specialized agents, or assign tasks to remote instances securely using API keys.
# Start a multi-agent orchestrated run locally
$ agent swarm start "Refactor the database schema" --max-agents 3
# Add a remote agent instance from another machine
$ agent swarm add-remote http://10.0.0.5:3334 coder --name "MacBook Pro" --key "oas_abcd123"
MCP Server (Model Context Protocol)
Expose the agent's files, memories, and skills (as prompts) to an IDE like Cursor over Stdio or HTTP+SSE.
$ agent mcp # runs STDIO transport
$ agent mcp --http 3100   # runs HTTP+SSE transport on port 3100
Plugin Management
Install 1st-party or community plugins directly from GitHub.
$ agent plugins search "database"
$ agent plugins install https://github.com/postgres-connector
Config
agent.yaml (or .agent/config.json)
llm:
provider: openai # openai | anthropic | azure | ollama
model: gpt-4o
fallback:
- provider: anthropic
model: claude-3-sonnet
daemon:
maxConcurrent: 3 # Parallel task limit
policy:
permissions:
    - "*"             # Wildcard for full autonomy
LLM Providers
| Provider | Env Variable | Models |
|----------|-------------|--------|
| OpenAI | OPENAI_API_KEY | gpt-4o, gpt-4o-mini |
| Anthropic | ANTHROPIC_API_KEY | claude-3-sonnet, claude-3-opus |
| Azure OpenAI | AZURE_OPENAI_API_KEY + AZURE_OPENAI_ENDPOINT | Any deployed model |
| Ollama | None (local) | llama3, codellama, mistral |
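The fallback chain can be sketched as a loop over providers in priority order; the provider interface (`name`, `complete`) is assumed for illustration:

```javascript
// Sketch: try each configured provider in order, collecting
// failures, and only give up once every provider has failed.
function completeWithFallback(providers, prompt) {
  const errors = [];
  for (const provider of providers) {
    try {
      return { provider: provider.name, text: provider.complete(prompt) };
    } catch (err) {
      errors.push(`${provider.name}: ${err.message}`); // remember why it failed
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```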
Skills, Commands, Scripts, Plugins
agent skills list | create | stats | fix
agent commands list
agent scripts list | run <name> | show <name>
agent plugins list | install <path> | remove <name>
agent hooks list | add <event> <cmd>
Memory & Reports
agent memory search "<query>" # Semantic search
agent memory add "<fact>" # Store a fact
agent report generate          # Activity summary
📁 Project Structure
After agent init, your project contains:
your-project/
├── .agent/
│ ├── config.json # Agent configuration
│ ├── vault.json # Encrypted credentials (auto-created)
│ ├── memory.db # SQLite persistent memory
│ ├── daemon.log # Daemon execution log
│ ├── skills/ # Custom skills
│ │ └── my-skill/
│ │ ├── skill.json
│ │ └── prompt.md
│ ├── commands/ # Lightweight commands
│ │ └── deploy.md
│ ├── scripts/ # Automation scripts
│ │ └── health-check/
│ │ ├── script.yaml
│ │ └── run.sh
│ ├── plugins/ # Installed plugins
│ └── hooks/
│ └── hooks.json # Lifecycle hooks
└── .env                   # Environment variables (auto-detected)
🔒 Security
- Credential encryption — AES-256-GCM with machine-specific keys
- Permission gating — Policy engine controls which tools can execute
- Human-in-the-loop — Tasks can require manual approval before executing
- No credential leaking — Secrets are never logged or included in LLM prompts as raw values
- Sandboxed execution — Tools execute within the project directory scope
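The permission gating described above could be sketched as a wildcard matcher over tool names; the exact semantics of the policy engine are an assumption here:

```javascript
// Sketch: check a tool name against a permission list supporting
// "*" (full autonomy), exact names, and "namespace.*" wildcards.
function isAllowed(tool, permissions) {
  return permissions.some(p =>
    p === "*" ||                                            // full autonomy
    p === tool ||                                           // exact, e.g. "cmd.run"
    (p.endsWith(".*") && tool.startsWith(p.slice(0, -1)))   // namespace, e.g. "fs.*"
  );
}
```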
🆕 What's New in v0.12.0
- 🤖 Native Anthropic Computer Use — Seamlessly integrates the computer_20241022 tool spec globally.
- 🖥️ OS-Level UI Parsing — Native accessibility extraction parses the screen into a true Desktop DOM (macOS, Windows, Linux AT-SPI) via desktop.ui_tree.
- 🕵️ GUI Operator Persona — The operator swarm agent role takes full autonomous control of your mouse and keyboard using live screen context.
- ☁️ Persistent Sessions — Agent conversations persisted in memory.db for multi-session resuming.
- 🔗 MCP Server — Exposes built-in tools over HTTP/SSE via agent mcp serve.
- 🔌 Plugin Hub — Remote 1-click install via URL / UI marketplace.
🆕 What's New in v0.11.0
- 🌐 Browser Automation — Built-in Playwright browser control (desktop.browser.*) with headless/headed modes and session persistence
- ☁️ Remote Execution — agent run --remote http://server:3333 offloads LLM inference to a cloud server with real-time SSE streaming
- 📡 POST /api/execute — New streaming API endpoint for remote goal execution
- 🚀 Remote Studio Access — agent studio --remote generates a secure tunnel URL + QR code for mobile access
- 📡 Live Task Streaming — Real-time event timeline of daemon task execution
- 🔑 Interactive Credential Capture — When the daemon needs a secret mid-task, a modal pops up in Studio
- 🔔 Notifications Plugin — Auto-notifies on goal completion/failure via Slack, Discord, or Email
- 💰 Cost Tracker Plugin — Token usage + cost tracking with Studio dashboard
- ⚡ Parallel Task Execution — Up to 3 independent tasks run simultaneously
- 🔗 Task Output Chaining — Downstream tasks receive upstream results
- 🔁 Dynamic Re-decomposition — Failed tasks trigger LLM re-planning
- 🐝 Multi-Agent Swarm — Coordinate specialized agents (Planner, Coder, Reviewer)
- 🖥️ Desktop Automation — Cross-platform screen capture, mouse, and keyboard control
- 🌈 Multimodal — Voice transcription, image analysis, and text-to-speech
- 🐳 Sandboxed Execution — Run commands in Docker containers
🤝 Contributing
We welcome contributions! Key areas:
- Writing new Skills and Commands
- Improving LLM prompt engineering
- Building Studio UI components
- Creating community Plugins
- Writing documentation
License
MIT
