visionclaw
v0.1.67
A personal assistant agent that runs on your desktop, receives commands from messaging channels, and executes tasks autonomously.
VisionClaw
A personal assistant agent that runs on your desktop (macOS/Windows). It receives command messages from pre-configured channels (Gmail, Telegram, Discord) and executes tasks autonomously using desktop control and browser automation. Results are sent back through the same channel.
Features
- Autonomous Desktop Agent: Runs continuously as a long-running process on your computer
- Gmail Identity: The agent has its own Gmail account for email and Google Calendar
- Multi-Channel Support: Receives commands via Gmail, Telegram, Discord
- Desktop Control: Takes screenshots, controls mouse/keyboard, runs terminal commands
- Browser Automation: Navigates and interacts with web pages via Playwright
- Google Calendar: Manages its own schedule for recurring tasks and reminders
- Fast Responder: Auto-acknowledges messages while the agent is busy working on a task
- Self-Improving: Can add new skills to itself and upgrade to new versions
- Runtime Observability: Built-in HTTP obs page for live logs while the agent is running
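
The Fast Responder item above can be pictured as a small decision made before a message reaches the main agent. The following is only a sketch under assumptions: `AgentState`, `fastRespond`, and the wording of the acknowledgement are hypothetical names chosen for illustration, not VisionClaw's actual implementation.

```typescript
// Hypothetical state shape; the real fast-responder may track more than this.
interface AgentState {
  busy: boolean;
  currentTask?: string;
}

// Shorten long messages so the acknowledgement stays readable.
function truncate(text: string, max: number): string {
  return text.length <= max ? text : text.slice(0, max - 3) + "...";
}

// Return an acknowledgement when the agent is busy, or null when it is idle
// and the message can go straight to the main agent loop.
function fastRespond(state: AgentState, incomingText: string): string | null {
  if (!state.busy) return null;
  const task = state.currentTask ?? "another task";
  return `Got your message ("${truncate(incomingText, 60)}"). ` +
    `I'm currently working on ${task} and will pick this up next.`;
}
```

Returning `null` for the idle case keeps the acknowledgement path separate from normal handling, so the caller can decide whether to queue the message or process it immediately.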
Architecture
VisionClaw is built on the Claude Agent SDK V2. It runs as a single-threaded agent with a wake/sleep loop triggered by incoming messages or a periodic heartbeat.
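
To make the wake/sleep loop concrete, here is a minimal sketch of how such a loop could be structured. It does not use the Claude Agent SDK and is not taken from the VisionClaw source; `InboundMessage`, `WakeDecision`, `decideWake`, and `runLoop` are hypothetical names, and the 250 ms polling slice is an arbitrary assumption.

```typescript
// Hypothetical types for illustration only.
interface InboundMessage {
  channel: "gmail" | "telegram" | "discord";
  text: string;
}

type WakeDecision =
  | { kind: "message"; message: InboundMessage }
  | { kind: "heartbeat" }
  | { kind: "sleep"; forMs: number };

// Decide what to do next: pending messages take priority, then a due
// heartbeat, otherwise sleep until the next heartbeat is due.
function decideWake(
  inbox: InboundMessage[],
  nowMs: number,
  lastHeartbeatMs: number,
  heartbeatIntervalMs: number,
): WakeDecision {
  const next = inbox.shift(); // FIFO: oldest message first
  if (next !== undefined) return { kind: "message", message: next };
  const dueAt = lastHeartbeatMs + heartbeatIntervalMs;
  if (nowMs >= dueAt) return { kind: "heartbeat" };
  return { kind: "sleep", forMs: dueAt - nowMs };
}

// Single-threaded loop: each wake is handled to completion before the
// next decision is made, so only one task runs at a time.
async function runLoop(
  inbox: InboundMessage[],
  handle: (decision: WakeDecision) => Promise<void>,
  heartbeatIntervalMs: number,
  shouldStop: () => boolean,
): Promise<void> {
  let lastHeartbeatMs = Date.now();
  while (!shouldStop()) {
    const decision = decideWake(inbox, Date.now(), lastHeartbeatMs, heartbeatIntervalMs);
    if (decision.kind === "sleep") {
      // Sleep in short slices so newly arrived messages are noticed promptly.
      await new Promise((resolve) => setTimeout(resolve, Math.min(decision.forMs, 250)));
      continue;
    }
    if (decision.kind === "heartbeat") lastHeartbeatMs = Date.now();
    await handle(decision);
  }
}
```

Keeping the decision in a pure function (`decideWake`) makes the wake priorities easy to test without timers; channel adapters would only need to push into `inbox`.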
src/
index.ts # CLI entry point
logger.ts # Structured logger
onboarding/ # Interactive setup wizard (Gmail, OAuth, channels)
agent/ # Core agent loop, session, context, fast-responder
tools/ # Custom tools (notify, browser, calendar, screenshot, etc.)
channels/ # Channel adapters (Gmail, Telegram, Discord)
email/ # Gmail email tool implementation
calendar/ # Google Calendar integration
memory/ # Persistent memory store
skills/ # Skill installation logic
config/ # Configuration management
obs/ # Runtime observability HTTP server
gui/ # Electron desktop GUI

Prerequisites
- Node.js >= 20
- An Anthropic API key
- A dedicated Gmail account for the agent
- Google Cloud OAuth2 credentials (Client ID + Secret) with Gmail and Calendar API enabled
Setup
# Install dependencies
pnpm install
# Build
pnpm run build
# Run (starts onboarding if not configured)
pnpm start
# Or run in development mode
pnpm run dev
# Reconfigure an existing profile (add/remove channels, rotate keys)
visionclaw reconfigure --profile default

The first run triggers an interactive onboarding wizard that will:
- Ask for your Anthropic API key
- Ask for a dedicated Gmail address for the agent
- Walk through Google OAuth2 authorization (Gmail + Calendar scopes)
- Optionally configure Telegram and Discord
Configuration is stored per profile at ~/.visionclaw/profiles/<profile>/config.json.
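
As an illustration of the path layout above, a helper along these lines could resolve a profile's config file. This is a sketch, not VisionClaw code: `profileConfigPath` is a hypothetical name, and the rule that profile names contain only letters, digits, `_`, and `-` is an assumption made to keep the path inside the profiles directory.

```typescript
// Build <home>/.visionclaw/profiles/<profile>/config.json.
// `homeDir` is passed in (for example from os.homedir()) so the function stays pure.
function profileConfigPath(homeDir: string, profile: string): string {
  // Assumption: profile names are simple identifiers. Rejecting anything else
  // prevents values like "../x" from escaping the profiles directory.
  if (!/^[A-Za-z0-9_-]+$/.test(profile)) {
    throw new Error(`Invalid profile name: ${profile}`);
  }
  const base = homeDir.replace(/[\\/]+$/, ""); // drop trailing separators
  return [base, ".visionclaw", "profiles", profile, "config.json"].join("/");
}
```

Forward slashes are used throughout; Node.js accepts them on Windows as well as macOS.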
Observability (HTTP)
When the agent is running, it serves a local observability page showing live logs.
- URL: http://127.0.0.1:3101/obs
- SSE stream: GET /obs/events
- Snapshot: GET /obs/snapshot
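
A client for the SSE stream could look roughly like the sketch below. The README does not document the event payload format, so each event's data is treated as an opaque string; `parseSse` and `tailObs` are hypothetical helper names. The parser follows the standard Server-Sent Events framing (events separated by a blank line, payload carried on `data:` lines), and `tailObs` relies on the global `fetch` available in Node.js 20.

```typescript
// Parse an SSE text buffer into the data payloads of complete events.
// Any trailing incomplete event is returned as `rest` for the next chunk.
function parseSse(buffer: string): { events: string[]; rest: string } {
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? "";
  const events: string[] = [];
  for (const block of blocks) {
    const dataLines = block
      .split("\n")
      .filter((line) => line.startsWith("data:")) // skips comments and other fields
      .map((line) => line.slice(5).replace(/^ /, ""));
    if (dataLines.length > 0) events.push(dataLines.join("\n"));
  }
  return { events, rest };
}

// Follow the live log stream and print each event payload.
async function tailObs(baseUrl = "http://127.0.0.1:3101"): Promise<void> {
  const res = await fetch(`${baseUrl}/obs/events`, {
    headers: { Accept: "text/event-stream" },
  });
  if (!res.ok || !res.body) {
    throw new Error(`obs stream unavailable: HTTP ${res.status}`);
  }
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let pending = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    pending += decoder.decode(value, { stream: true });
    const parsed = parseSse(pending);
    pending = parsed.rest;
    for (const payload of parsed.events) console.log(payload);
  }
}
```

Calling `tailObs()` while the agent is running would print events as they arrive; pass a different base URL if the host or port has been changed.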
The observability server is controlled via advanced config (not asked during onboarding):
{
"obs": {
"enabled": true,
"host": "127.0.0.1",
"port": 3101,
"bufferSize": 1000
}
}

GUI
A desktop GUI (Electron) is available for configuration and monitoring:
cd gui
pnpm install
pnpm start

Channels
| Channel  | Requirements                  | Status    |
|----------|-------------------------------|-----------|
| Gmail    | Gmail account (required)      | Always on |
| Telegram | Bot token from @BotFather     | Optional  |
| Discord  | Bot token + channel allowlist | Optional  |
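
The Discord channel allowlist can be thought of as a simple membership check applied before a message is treated as a command. The sketch below assumes a config shape; `DiscordChannelConfig`, `allowedChannelIds`, and `isAllowedDiscordChannel` are hypothetical names, since the README only states that a bot token and a channel allowlist are required.

```typescript
// Hypothetical config shape for the Discord channel.
interface DiscordChannelConfig {
  botToken: string;
  allowedChannelIds: string[];
}

// Accept commands only from channels that are explicitly listed.
// An empty allowlist denies everything (fail closed).
function isAllowedDiscordChannel(config: DiscordChannelConfig, channelId: string): boolean {
  return config.allowedChannelIds.includes(channelId);
}
```

Failing closed matters here because the agent can control the desktop and run terminal commands, so messages from unlisted channels should never reach it.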
Custom Tools
| Tool | Description |
|------------------------|-------------|
| wait | Pause execution for a specified duration |
| notify_user | Send a message back through a channel (text + optional attachments) |
| finish | Signal task completion, return to sleep |
| take_screenshot | Capture desktop screenshot |
| browser | Open a Chrome instance with CDP for Playwright automation |
| manage_email | List, search, read, send, reply, and manage Gmail messages |
| manage_calendar | Manage Google Calendar events |
| manage_skills | Install, list, create, and delete skills |
| memory | Persistent memory storage across wake cycles |
| upgrade | Check for and install updates |
| computer_use_click | Click on a UI element described in natural language |
| computer_use_type | Type text into the focused field |
| computer_use_key | Press a key or key combination |
| computer_use_scroll | Scroll at a target location |
| computer_use_drag | Drag from one element to another |
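
One way to picture how tools like these plug into an agent is a name-to-handler registry. The sketch below is illustrative only and does not reflect the Claude Agent SDK's tool API or VisionClaw's actual input schemas: `ToolResult`, `ToolHandler`, `dispatch`, and the `action`/`key`/`value`/`summary` fields are assumptions, and the `memory` handler uses an in-process `Map` where the real tool persists data across wake cycles.

```typescript
// Hypothetical result type: a tool either lets the agent continue or ends the task.
type ToolResult =
  | { status: "continue"; output: string }
  | { status: "finished"; summary: string };

type ToolHandler = (input: Record<string, unknown>) => ToolResult;

const memoryStore = new Map<string, string>(); // stand-in for persistent storage
const tools = new Map<string, ToolHandler>();

// finish: signal task completion so the agent can return to sleep.
tools.set("finish", (input) => ({
  status: "finished",
  summary: String(input.summary ?? ""),
}));

// memory: minimal get/set interface (assumed input schema).
tools.set("memory", (input) => {
  const key = String(input.key ?? "");
  if (input.action === "set") {
    memoryStore.set(key, String(input.value ?? ""));
    return { status: "continue", output: "ok" };
  }
  return { status: "continue", output: memoryStore.get(key) ?? "" };
});

// Look up a tool by name and run it; unknown names are reported, not thrown.
function dispatch(name: string, input: Record<string, unknown>): ToolResult {
  const handler = tools.get(name);
  if (handler === undefined) {
    return { status: "continue", output: `Unknown tool: ${name}` };
  }
  return handler(input);
}
```

Reporting unknown tools as ordinary output, rather than throwing, lets the agent read the error and choose a different tool on its next step.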
