@open-agent-studio/agent v0.12.9
CLI agent runtime with Skill Hub, Plan Files, and permissioned tools
🤖 Agent Runtime
Your autonomous AI employee. Give it a goal, walk away. It decomposes, executes, scripts, and learns — all by itself.
$ npm install -g @open-agent-studio/agent
$ agent init
$ agent
🤖 Agent Runtime v0.11.0
> Build a system health dashboard with monitoring scripts
🧠 Decomposing into 5 subtasks...
⚡ [1/5] Create project structure ✓
⚡ [2/5] Gather system data ✓ ← created .agent/scripts/system-info/
⚡ [3/5] Build HTML dashboard ✓ ← created dashboard.html + dashboard.css
⚡ [4/5] Create update script ✓ ← created .agent/scripts/update-dashboard/
⚡ [5/5] Write README ✓
✓ Goal completed (42.1s) — 5/5 tasks done
What Is This?
Agent Runtime is your personal, autonomous AI employee that runs right on your computer.
Have you ever used ChatGPT or Claude and felt annoyed that you have to constantly copy-paste code, fix silly mistakes, or do the actual computer work for them? Agent Runtime fixes that.
Instead of treating the AI like a chatbot, you treat it like an intern. You assign it a big goal (like "Build a custom React dashboard" or "Write a Python script that cleans up my hard drive"), and then you walk away.
Here is what the Agent does while you sleep:
- Breaks your goal into steps — it thinks about what needs to be done.
- Types code by itself — it creates HTML, Python, and JavaScript files in your project directory.
- Runs programs — it executes terminal commands safely, installing its own dependencies if it needs them.
- Checks its work — if it breaks something or a script crashes, it reads the error message, plans a fix, and tries again.
- Teamwork (Swarms) — tackling a huge task? It deploys a team of AI sub-workers (a coder, a reviewer) that collaborate for you.
- Remembers you — close your laptop halfway through? All good. The built-in SQLite database remembers everything across sessions indefinitely.
- Visually Stunning App — want to see what it's thinking? Open Agent Studio's web dashboard and watch its glowing, real-time matrix terminal work.
Think of it as a junior developer you can assign tasks to and check on later.
🚀 Quick Start (5 minutes)
1. Install
npm install -g @open-agent-studio/agent
2. Initialize a project
cd your-project
agent init
This creates a .agent/ directory with configuration, skills, commands, and scripts.
3. Configure your LLM
# Set your preferred LLM provider
export OPENAI_API_KEY=sk-...
# OR
export ANTHROPIC_API_KEY=sk-ant-...
The agent supports OpenAI, Anthropic, Azure OpenAI, and Ollama (local) with automatic fallback.
4. Start using it
# Interactive mode (recommended)
agent
# Or one-shot command
agent run "Add input validation to the signup form"
# Run remotely on a cloud server
agent run "Add input validation" --remote http://server:3333
# Or start the background daemon
agent daemon start
📖 How It Works
The Agent Loop
You give a goal
↓
🧠 LLM decomposes it into subtasks with dependencies
↓
⚡ Daemon picks up tasks (up to 3 in parallel)
↓
🔧 Each task uses tools: file system, shell, git, HTTP, scripts, credentials
↓
✅ On success → saves output, triggers dependent tasks
❌ On failure → retries 3x, then re-decomposes with LLM
↓
💾 Everything stored in memory for future context
Tool Ecosystem
The agent has access to these tools when executing tasks:
| Tool | What It Does |
|------|-------------|
| fs.read / fs.write | Read and write files |
| fs.mkdir / fs.list | Create directories, list contents |
| cmd.run | Execute shell commands |
| git.status / git.diff / git.commit | Git operations |
| http.request | Make HTTP API calls (GET/POST/PUT/DELETE) |
| secrets.get / secrets.list | Access encrypted credentials |
| script.run | Execute project scripts by name |
| command.execute | Run pre-defined command workflows |
| notify.send | Send alerts via webhook, email, or log |
| cost.summary | Get token usage and cost tracking |
| desktop.browser.open | Open a URL in the agent's Playwright browser |
| desktop.browser.scrape | Extract text/HTML from a web page |
| desktop.browser.click / fill | Interact with web page elements |
| desktop.browser.screenshot | Capture a PNG screenshot of the current page |
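As an illustrative sketch (not the actual runtime code), the retry-then-re-plan behavior of the loop above might look like this; `runTask` and `replan` are hypothetical names:

```javascript
// Illustrative sketch of the execute / retry / re-plan loop.
// On the final failure, the error is handed back to the LLM for re-decomposition.
function runTask(task, replan, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return { ok: true, output: task.execute() };
    } catch (err) {
      if (attempt === maxRetries) {
        // Retries exhausted: ask the LLM to break the task down differently.
        return { ok: false, replannedInto: replan(task, err) };
      }
      // Otherwise fall through and retry.
    }
  }
}
```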
🎯 Goal-Driven Autonomy
Creating Goals
# From CLI
agent goal add "Build authentication with OAuth2" --priority 1
# The LLM auto-decomposes it:
# Task 1: Set up OAuth2 dependencies
# Task 2: Create auth routes (depends on: 1)
# Task 3: Implement token exchange (depends on: 1)
# Task 4: Add middleware (depends on: 2, 3)
# Task 5: Write tests (depends on: 4)
The Daemon
The daemon is the heart of autonomous execution. It runs in the background and:
- Picks up pending tasks from the queue
- Runs up to 3 tasks in parallel (independent tasks only)
- Chains outputs — downstream tasks get results from their dependencies
- Re-plans on failure — uses LLM to suggest alternative approaches
- Loads all project capabilities — skills, scripts, commands, plugins, credentials
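The parallel, dependency-aware scheduling described above can be sketched as a batching algorithm; the task shape (`id`, `deps`) is an assumption for illustration:

```javascript
// Sketch: group tasks into batches of up to `maxConcurrent` whose
// dependencies have already completed.
function schedule(tasks, maxConcurrent = 3) {
  const done = new Set();
  const batches = []; // each batch holds task ids that could run in parallel
  while (done.size < tasks.length) {
    const ready = tasks
      .filter(t => !done.has(t.id) && t.deps.every(d => done.has(d)))
      .slice(0, maxConcurrent);
    if (ready.length === 0) throw new Error("dependency cycle");
    ready.forEach(t => done.add(t.id));
    batches.push(ready.map(t => t.id));
  }
  return batches;
}
```

For the OAuth2 example above, this yields batches `[1]`, `[2, 3]`, `[4]`, `[5]` — tasks 2 and 3 run in parallel once task 1 finishes.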
agent daemon start # Start background processing
agent daemon status # Check health & progress
agent daemon logs # View execution log
agent daemon stop     # Graceful shutdown
Example Daemon Log
🧠 Auto-decomposing goal #1: "Build data pipeline for GitHub API"
✅ Created 5 subtask(s)
🔄 Processing task #1: "Fetch trending repos"
📦 Loaded: 2 skills, 3 commands, 6 scripts, 1 plugin, 8 credentials
✅ Task #1 completed
🔄 Processing task #2: "Transform JSON response" [parallel: 2]
🔄 Processing task #3: "Save to file" [parallel: 3]
✅ Task #2 completed
✅ Task #3 completed
🔄 Processing task #4: "Create re-run script"
✅ Task #4 completed — Goal 100% complete
🔑 Credential Vault
The agent has a built-in encrypted credential store so it can use API keys, tokens, and passwords securely.
How It Works
- Vault — Secrets stored in .agent/vault.json, encrypted with AES-256-GCM
- .env fallback — Credentials from .env are auto-detected
- Interactive capture — If the agent needs a credential it doesn't have, it asks you via Studio
Adding Credentials
Via Studio UI:
- Open Agent Studio → Credentials
- Click "Add Secret"
- Enter key name (e.g., GITHUB_TOKEN) and value
- Stored encrypted on disk
Via .env file:
GITHUB_TOKEN=ghp_xxxx
OPENAI_API_KEY=sk-xxxx
APIFY_TOKEN=apify_api_xxxx
Via CLI tools:
The LLM uses secrets.get({ key: "GITHUB_TOKEN" }) to retrieve credentials during task execution. It never hardcodes them.
📊 Agent Studio (Web Dashboard)
A full web-based management console for your agent:
agent studio
# → Agent Studio running at http://localhost:3333
agent studio --remote
# → Starts a secure tunnel and prints a QR code in terminal for mobile access!
Pages
| Page | What It Shows |
|------|--------------|
| Console | Real-time terminal with live command relay |
| Capabilities | Loaded tools, permissions, provider info |
| Goals & Tasks | Create goals, track progress, view task status |
| Templates | Pre-built goal templates (blog writer, data pipeline, etc.) |
| Credentials | Encrypted vault — add/delete API keys and tokens |
| Live Stream | Real-time WebSocket streaming of task execution output |
| Skills | Installed skills with success metrics |
| Commands | Lightweight automation templates |
| Scripts | Project scripts with execution and output viewer |
| Plugins | Installed plugin bundles |
| Daemon | Start/stop daemon, view logs, health status |
| Costs | LLM token usage, spend tracking by model and day |
| Memory | Search and browse persistent agent memory |
Goal Templates
Studio includes 6 pre-built goal templates for common workflows:
- 📊 System Health Monitor — Dashboard with CPU/memory/disk monitoring
- ✍️ Blog Post Writer — Research + write + SEO optimization
- 🕷️ Apify Actor Creator — Scaffold a web scraping actor
- 🔍 Code Review & Refactor — Analyze and improve code quality
- 🔄 Data Pipeline — Fetch → transform → save with error handling
- 📅 Recurring Report — Automated daily/weekly reports
🛠️ Extensibility
Skills
Reusable AI capabilities defined by a skill.json manifest + prompt.md:
agent skills list # List installed skills
agent skills create my-skill # Create a custom skill
agent skills stats # View success metrics
agent skills fix my-skill     # Auto-repair with LLM
Example: Create .agent/skills/deploy/skill.json + prompt.md — the agent uses it whenever a deployment goal comes up.
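For illustration, a deploy skill's manifest might pair a name and tool list with its prompt file; the exact schema below is an assumption, not the documented format:

```json
{
  "name": "deploy",
  "description": "Deploy the current branch to the target environment",
  "tools": ["cmd.run", "git.status"],
  "prompt": "prompt.md"
}
```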
Commands
Lightweight goal templates — just a markdown file with YAML frontmatter:
---
name: deploy-staging
description: Deploy current branch to staging
tools: [cmd.run, git.status]
---
# Deploy to Staging
1. Run `npm test` to verify all tests pass
2. Run `npm run build`
3. Push to staging branch
> /deploy-staging    # Use from interactive mode
Scripts
Direct automation (no LLM needed) — shell, Python, or Node.js:
# .agent/scripts/deploy/script.yaml
name: deploy-staging
description: Build and deploy to staging
entrypoint: run.sh
agent scripts run deploy-staging
The daemon auto-discovers scripts and can execute them via the script.run tool.
Plugins
Bundle native Node.js tools, skills, commands, scripts, and hooks into a single distributable package. The Agent Hub acts as the official registry for community plugins.
# Install the official GitHub plugin from the Hub
agent plugins install github
# Or install from a local path
agent plugins install ./my-plugin
# List installed plugins
agent plugins list
Featured Plugin: GitHub (github)
- Grants the agent zero-dependency native control over GitHub.
- Can create repos, open PRs, and manipulate Issues.
- Unlocks Advanced Global Search natively.
- Can view, dispatch, and monitor GitHub Actions CI/CD workflows.
16 Plugins Available — GitHub plus Slack, Notion, Vercel, Supabase, Stripe, AWS, Discord, OpenAI, Linear, Docker, MongoDB, Firebase, Telegram, HuggingFace, and Resend.
🐳 Sandboxed Execution
Run commands safely inside Docker containers:
agent sandbox start # Spin up ephemeral container
agent sandbox status # Container info
agent sandbox stop     # Destroy sandbox
🐝 Multi-Agent Swarm
Coordinate specialized agents (Planner, Coder, Reviewer, Researcher, Tester):
agent swarm start "Build a REST API with auth"
agent swarm status # View agents & tasks
agent swarm roles      # List available roles
🖥️ Desktop Automation
Cross-platform desktop control (Linux, macOS, Windows):
agent desktop screenshot # Capture screen
agent desktop click 500 300 # Mouse click
agent desktop type "Hello" --enter
agent desktop hotkey ctrl+s   # Keyboard shortcut
🌈 Multimodal Interfaces
Voice, vision, and speech powered by OpenAI:
agent multimodal transcribe audio.wav # Whisper STT
agent multimodal analyze image.png # GPT-4o Vision
agent multimodal speak "Done!"         # TTS
🌐 Browser Automation (Playwright)
Built-in headless browser control with session persistence:
# Setup (one-time)
npx playwright install chromium
# The agent uses browser tools automatically:
agent run "Open https://example.com and scrape the heading"
agent run "Log into dashboard and download the monthly report"
| Tool | What It Does |
|------|--------------|
| desktop.browser.open | Navigate to a URL (headless by default) |
| desktop.browser.click | Click elements by CSS/XPath selector |
| desktop.browser.fill | Type into input fields |
| desktop.browser.scrape | Extract text or HTML from the page |
| desktop.browser.screenshot | Capture a PNG screenshot |
| desktop.browser.close | Close browser and persist session |
Session Persistence — Cookies and localStorage are saved to .agent/browser-session.json on close and restored on next open, so the agent stays authenticated across runs.
☁️ Remote Execution (Agent Cloud)
Offload heavy LLM inference to a remote server while streaming output to your local terminal:
# On the remote server / cloud VM:
agent studio --port 3333
# On your local machine:
agent run "Summarize the entire codebase" --remote http://server:3333
Uses Server-Sent Events (SSE) via POST /api/execute — progress, warnings, and results stream back in real time.
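A minimal parser for those SSE frames, assuming the endpoint emits standard `data:` lines with events separated by blank lines:

```javascript
// Sketch: split a received chunk into SSE events and extract
// the payload from each event's `data:` lines.
function parseSSE(chunk) {
  return chunk
    .split("\n\n") // events are separated by a blank line
    .map(event => event
      .split("\n")
      .filter(line => line.startsWith("data:"))
      .map(line => line.slice(5).trim())
      .join("\n"))
    .filter(data => data.length > 0);
}
```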
Lifecycle Hooks
Intercept execution at 10 event points:
{
"hooks": {
"after:tool": [{
"match": "fs.write",
"command": "npx prettier --write {{path}}"
}]
}
}
🤖 Interactive Mode
The conversational REPL with multi-turn context:
agent
> Add rate limiting to the /api/auth endpoint
⚡ fs.read(src/routes/auth.ts) ✓
⚡ fs.write(src/middleware/rateLimit.ts) ✓
✓ Done
> Now write tests for it
⚡ fs.write(src/__tests__/rateLimit.test.ts) ✓
⚡ cmd.run(npm test) ✓
✓ All 5 tests passing
> /deploy-staging
Running command: deploy-staging...
Slash Commands
| Command | Action |
|---------|--------|
| /help | Show all available commands |
| /skills | List installed skills |
| /commands | List available commands |
| /scripts | List available scripts |
| /model | Display LLM provider info |
| /compact | Summarize and free context |
🏗️ Architecture
┌─────────────────────────────────────────────────────────┐
│ CLI / REPL / Studio │
├─────────────────────────────────────────────────────────┤
│ LLM Router │
│ OpenAI │ Anthropic │ Azure │ Ollama (fallback chain) │
├──────────┬──────────┬──────────┬────────────────────────┤
│ Skills │ Commands │ Scripts │ Plugins │
│ prompt │ .md │ .yaml │ bundles │
├──────────┴──────────┴──────────┴────────────────────────┤
│ Tool Registry & Policy Engine │
│ fs.* │ cmd.run │ git.* │ http.* │ secrets.* │ browser.* │
├─────────────────────────────────────────────────────────┤
│ Goal Decomposer │ Daemon │ Credential Vault │ Memory │
│ Remote Execute │ Browser Manager │ Session Persistence │
└─────────────────────────────────────────────────────────┘
Key Components
| Component | Purpose |
|-----------|---------|
| LLM Router | Multi-provider routing with fallback chains (OpenAI → Anthropic → Ollama) |
| Goal Decomposer | LLM-powered breakdown of goals into dependency-aware task graphs |
| Daemon Service | Background task runner with parallel execution, retries, re-planning |
| Credential Vault | AES-256-GCM encrypted secret storage with .env fallback |
| Tool Registry | Sandboxed execution with permission gates |
| Policy Engine | Human-in-the-loop approval for sensitive operations |
| Memory Store | SQLite + FTS5 persistent memory across sessions |
| Plugin Loader | Discovers and loads sub-packages of skills, commands, scripts, hooks |
📋 Full CLI Reference
Core Commands
agent # Interactive REPL
agent run "<goal>" # One-shot goal execution
agent run "<goal>" --remote URL # Execute on a remote server
agent init # Initialize project
agent studio # Web dashboard at :3333
agent doctor # System health check
agent update          # Update to latest version
Goal & Daemon
agent goal add "<title>" # Create a goal
agent goal list # List all goals
agent goal decompose <id> # AI breakdown into tasks
agent goal status <id> # Task-level progress
agent daemon start # Start background worker
agent daemon stop # Stop gracefully
agent daemon status # Health & uptime
agent daemon logs     # Recent execution logs
Check session status
$ agent sessions list
Resume a session natively
$ agent run --session
$ agent sessions resume
Multi-Agent Swarm & Remote Delegation
Coordinate multiple specialized agents, or assign tasks to remote instances securely using API keys.
# Start a multi-agent orchestrated run locally
$ agent swarm start "Refactor the database schema" --max-agents 3
# Add a remote agent instance from another machine
$ agent swarm add-remote http://10.0.0.5:3334 coder --name "MacBook Pro" --key "oas_abcd123"
MCP Server (Model Context Protocol)
Expose the agent's files, memories, and skills (as prompts) to an IDE like Cursor over Stdio or HTTP+SSE.
$ agent mcp # runs STDIO transport
$ agent mcp --http 3100   # runs HTTP+SSE transport on port 3100
Plugin Management
Install 1st-party or community plugins directly from GitHub.
$ agent plugins search "database"
$ agent plugins install https://github.com/postgres-connector
Config
agent.yaml (or .agent/config.json)
llm:
provider: openai # openai | anthropic | azure | ollama
model: gpt-4o
fallback:
- provider: anthropic
model: claude-3-sonnet
daemon:
maxConcurrent: 3 # Parallel task limit
policy:
permissions:
    - "*"             # Wildcard for full autonomy
LLM Providers
| Provider | Env Variable | Models |
|----------|-------------|--------|
| OpenAI | OPENAI_API_KEY | gpt-4o, gpt-4o-mini |
| Anthropic | ANTHROPIC_API_KEY | claude-3-sonnet, claude-3-opus |
| Azure OpenAI | AZURE_OPENAI_API_KEY + AZURE_OPENAI_ENDPOINT | Any deployed model |
| Ollama | None (local) | llama3, codellama, mistral |
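The fallback chain can be sketched as a loop over providers in priority order; the provider interface (`name`, `complete`) is assumed for illustration:

```javascript
// Sketch: try each configured provider in order, collecting
// failures, and only give up once every provider has failed.
function completeWithFallback(providers, prompt) {
  const errors = [];
  for (const provider of providers) {
    try {
      return { provider: provider.name, text: provider.complete(prompt) };
    } catch (err) {
      errors.push(`${provider.name}: ${err.message}`); // remember why it failed
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```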
Skills, Commands, Scripts, Plugins
agent skills list | create | stats | fix
agent commands list
agent scripts list | run <name> | show <name>
agent plugins list | install <path> | remove <name>
agent hooks list | add <event> <cmd>
Memory & Reports
agent memory search "<query>" # Semantic search
agent memory add "<fact>" # Store a fact
agent report generate          # Activity summary
📁 Project Structure
After agent init, your project contains:
your-project/
├── .agent/
│ ├── config.json # Agent configuration
│ ├── vault.json # Encrypted credentials (auto-created)
│ ├── memory.db # SQLite persistent memory
│ ├── daemon.log # Daemon execution log
│ ├── skills/ # Custom skills
│ │ └── my-skill/
│ │ ├── skill.json
│ │ └── prompt.md
│ ├── commands/ # Lightweight commands
│ │ └── deploy.md
│ ├── scripts/ # Automation scripts
│ │ └── health-check/
│ │ ├── script.yaml
│ │ └── run.sh
│ ├── plugins/ # Installed plugins
│ └── hooks/
│ └── hooks.json # Lifecycle hooks
└── .env                   # Environment variables (auto-detected)
🔒 Security
- Credential encryption — AES-256-GCM with machine-specific keys
- Permission gating — Policy engine controls which tools can execute
- Human-in-the-loop — Tasks can require manual approval before executing
- No credential leaking — Secrets are never logged or included in LLM prompts as raw values
- Sandboxed execution — Tools execute within the project directory scope
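The permission gating described above could be sketched as a wildcard matcher over tool names; the exact semantics of the policy engine are an assumption here:

```javascript
// Sketch: check a tool name against a permission list supporting
// "*" (full autonomy), exact names, and "namespace.*" wildcards.
function isAllowed(tool, permissions) {
  return permissions.some(p =>
    p === "*" ||                                            // full autonomy
    p === tool ||                                           // exact, e.g. "cmd.run"
    (p.endsWith(".*") && tool.startsWith(p.slice(0, -1)))   // namespace, e.g. "fs.*"
  );
}
```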
🆕 What's New in v0.12.0
- 🤖 Native Anthropic Computer Use — Seamlessly integrates the computer_20241022 tool spec globally.
- 🖥️ OS-Level UI Parsing — Native accessibility extraction parses the screen into a true Desktop DOM (macOS, Windows, Linux AT-SPI) via desktop.ui_tree.
- 🕵️ GUI Operator Persona — The operator swarm agent role takes full autonomous control of your mouse and keyboard using live screen context.
- ☁️ Persistent Sessions — Agent conversations persisted in memory.db for multi-session resuming.
- 🔗 MCP Server — Exposes built-in tools over HTTP/SSE via agent mcp serve.
- 🔌 Plugin Hub — Remote 1-click install via URL / UI marketplace.
🆕 What's New in v0.11.0
- 🌐 Browser Automation — Built-in Playwright browser control (desktop.browser.*) with headless/headed modes and session persistence
- ☁️ Remote Execution — agent run --remote http://server:3333 offloads LLM inference to a cloud server with real-time SSE streaming
- 📡 POST /api/execute — New streaming API endpoint for remote goal execution
- 🚀 Remote Studio Access — agent studio --remote generates a secure tunnel URL + QR code for mobile access
- 📡 Live Task Streaming — Real-time event timeline of daemon task execution
- 🔑 Interactive Credential Capture — When the daemon needs a secret mid-task, a modal pops up in Studio
- 🔔 Notifications Plugin — Auto-notifies on goal completion/failure via Slack, Discord, or Email
- 💰 Cost Tracker Plugin — Token usage + cost tracking with Studio dashboard
- ⚡ Parallel Task Execution — Up to 3 independent tasks run simultaneously
- 🔗 Task Output Chaining — Downstream tasks receive upstream results
- 🔁 Dynamic Re-decomposition — Failed tasks trigger LLM re-planning
- 🐝 Multi-Agent Swarm — Coordinate specialized agents (Planner, Coder, Reviewer)
- 🖥️ Desktop Automation — Cross-platform screen capture, mouse, and keyboard control
- 🌈 Multimodal — Voice transcription, image analysis, and text-to-speech
- 🐳 Sandboxed Execution — Run commands in Docker containers
🤝 Contributing
We welcome contributions! Key areas:
- Writing new Skills and Commands
- Improving LLM prompt engineering
- Building Studio UI components
- Creating community Plugins
- Writing documentation
License
MIT
