@charzhu/openjaw-agent

v0.3.2

Published

7 days ago

OpenJaw Agent — Autonomous desktop AI assistant for the terminal. Rich Ink TUI, 100+ tools, multi-channel bridges (Telegram, Feishu, Teams, WeChat). Standalone, no MCP server required.


charzhu

ai agent assistant terminal tui ink anthropic openai github-copilot automation desktop telegram feishu teams wechat

What is OpenJaw Agent?

OpenJaw Agent is a standalone AI assistant that runs in your Windows terminal and can automate your entire desktop:

📧 Email — Read, compose, reply, forward, search (Outlook via COM)
💬 Teams — Send messages, read chats, monitor conversations (UIA automation)
💬 WeChat — Send messages, read chats via official iLink Bot API
🌐 Browser — Navigate, click, type, screenshot, extract content (Chrome CDP)
📄 Office — Word, Excel, PowerPoint automation (COM)
📁 Files — Read, write, edit, search, list, delete
🖥️ System — Run commands, clipboard, notifications, web search
🧠 Memory — Persistent hybrid-search memory, shared with MCP mode
🔌 MCP — Auto-discovers and connects to external MCP servers
🗣️ Voice — Text-to-speech (edge-tts) and speech-to-text input
🎓 Skills — 19 bundled skills for email drafting, research, document creation, and more

It connects to Claude or GPT via your proxy or API key, reasons about your request, picks the right tools, and executes them autonomously.

Multi-Channel Access

Beyond the terminal, the agent can be accessed through messaging bridges — all running the same agent loop with full tool access:

| Channel | Flag | How It Works |
|---------|------|-------------|
| Terminal | (default) | New TUI built on React + vendored @openjaw/ink, with streaming, status bars, tool progress, and model/session controls |
| Telegram | --telegram | Long-polling bot: text, voice, photos. No public IP needed |
| Telegram headless | --telegram --headless | Telegram-only, no terminal UI |
| Teams | --teams | Self-chat bridge via Graph API. No bot registration needed |
| Feishu (Lark) | --feishu | WebSocket events via official SDK |
| WeChat | --wechat | iLink Bot API, QR login from iOS WeChat |
| Legacy Ink UI | --legacy-ui | Previous Ink UI (pre-rewrite) for A/B comparison |
| Legacy REPL | --legacy | Simple readline REPL |

Bridges run alongside the terminal UI (hybrid mode) or standalone (headless).

Terminal UI Architecture

The default terminal experience is the new TUI rewrite. src/bootstrap.ts initializes the in-process AgentLoop, tools, MCP, memory, voice, and bridge services; src/eventBridge.ts translates agent chunks into typed GatewayEvent objects; and src/agentBus.ts exposes a hermes-compatible GatewayClient event/RPC bus. src/rpcHandlers.ts serves the ported hermes hooks (useMainApp, useSessionLifecycle, useSubmission, and related hooks), while nanostores under src/app/* hold UI state for turns, overlays, selections, and delegation. React components in src/components/* render through the vendored @openjaw/ink renderer, a fork of hermes ink; see packages/openjaw-ink/VENDOR.md for provenance.

Use the default command for the new TUI, --legacy-ui to A/B test against the previous Ink UI, or --legacy for the readline REPL. See docs/TUI.md for a concise map of the UI rewrite.
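
As a rough illustration of the chunk-to-event translation described above, the sketch below maps agent chunks onto a discriminated union. The `AgentChunk` and `GatewayEvent` shapes and the `toGatewayEvent` function are hypothetical stand-ins; the real definitions live in src/agentEvents.ts and src/eventBridge.ts and may look quite different.

```typescript
// Hypothetical chunk shapes; the actual AgentLoop chunk format is not shown in this README.
type AgentChunk =
  | { kind: "text"; delta: string }
  | { kind: "tool_start"; tool: string }
  | { kind: "tool_end"; tool: string; ok: boolean }
  | { kind: "done" };

// Hypothetical event union standing in for the real GatewayEvent type.
type GatewayEvent =
  | { type: "message.delta"; text: string }
  | { type: "tool.progress"; name: string; status: "running" | "ok" | "error" }
  | { type: "turn.complete" };

// Translates one agent chunk into one typed UI event.
function toGatewayEvent(chunk: AgentChunk): GatewayEvent {
  switch (chunk.kind) {
    case "text":
      return { type: "message.delta", text: chunk.delta };
    case "tool_start":
      return { type: "tool.progress", name: chunk.tool, status: "running" };
    case "tool_end":
      return { type: "tool.progress", name: chunk.tool, status: chunk.ok ? "ok" : "error" };
    case "done":
      return { type: "turn.complete" };
  }
}
```

Keeping the translation in one pure function is what lets the UI hooks stay unaware of how the agent loop produces its output.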

Quick Start

Prerequisites

Node.js 22.5+ (uses the built-in node:sqlite module). Node 23.5+ is recommended: its bundled SQLite includes FTS5 for faster memory search, while older builds fall back to a LIKE-based scan.
Windows 10/11 recommended (some automation tools require Windows; cross-platform basics work elsewhere)
LLM access — Maestro proxy, Anthropic/OpenAI API keys, or GitHub Copilot via /connect
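
The Node.js thresholds above can be checked with a small helper. This is only a sketch: `memorySearchMode` is a hypothetical name, and it classifies by version number alone, whereas the agent may well probe SQLite directly for FTS5 support.

```typescript
type MemorySearchMode = "unsupported" | "like-fallback" | "fts5";

// Classifies a Node.js version string (e.g. "v22.5.0") against the
// thresholds listed in the prerequisites.
function memorySearchMode(nodeVersion: string): MemorySearchMode {
  const match = /^v?(\d+)\.(\d+)/.exec(nodeVersion);
  if (!match) return "unsupported";
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const atLeast = (maj: number, min: number): boolean =>
    major > maj || (major === maj && minor >= min);
  if (!atLeast(22, 5)) return "unsupported"; // node:sqlite is not available
  return atLeast(23, 5) ? "fts5" : "like-fallback";
}
```

Calling it with `process.version` before install gives a quick indication of which search path to expect.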

Install from npm (recommended)

# Global install — exposes the `openjaw-agent` command on PATH
npm install -g @charzhu/openjaw-agent
openjaw-agent

# Or run on demand without installing
npx @charzhu/openjaw-agent

The package ships a single bundled dist/main.js (no postinstall build step). After install, run /connect to pick a provider, then /model to pick a model.

Install from source (for contributors)

git clone https://github.com/charzhu/openjaw.git
cd openjaw/openjaw-agent

# One-click install (builds openjaw core + agent, configures proxy)
install.bat

# Or manually:
cd ../openjaw-mcp && npm install && npm run build   # build core first
cd ../openjaw-agent && npm install && npm run build

Configure

For normal interactive use, you do not need to edit provider keys in YAML. Launch OpenJaw Agent, run /connect to pick Maestro, Anthropic, OpenAI, or GitHub Copilot, then run /model to choose from connected providers. /connect stores direct provider credentials in ~/.openjaw-agent/auth.json and each provider context starts with a sensible default model.

config.yaml in the package directory, or ~/.openjaw-agent/config.yaml after first run, is mainly for advanced defaults and non-interactive deployments:

llm:
  provider: anthropic              # updated by /connect
  model: claude-sonnet-4-20250514   # updated by /model
  api_key: proxy-token             # placeholder; direct keys are stored by /connect
  base_url: http://localhost:23333/api/anthropic  # default Maestro proxy URL
  max_tokens: 16384
  temperature: 0.7

Advanced OpenAI-compatible proxy tuning is still available with openai_tool_mode: auto | compact | full and openai_max_tools, but it is not part of the normal setup flow.

llm:
  provider: openai
  model: gpt-5.4
  api_key: proxy-token  # actual key is read from ~/.openjaw-agent/auth.json
  max_tokens: 16384
  temperature: 0.7

For local proxy mode through Agent Maestro (http://localhost:23333/api/anthropic or http://localhost:23333/api/openai), use /connect maestro, then /model to choose a Maestro model. Maestro model selections use the local proxy and do not use Anthropic/OpenAI API keys stored by /connect.

For OpenAI-compatible proxy requests, the default openai_tool_mode: auto sends a compact subset of tools instead of the full registry. All tools remain executable locally; the model can request additional tools with openjaw_load_tools. Use openai_tool_mode: full only for endpoints that reliably handle the full tool inventory.

llm:
  provider: anthropic
  model: claude-sonnet-4-20250514
  api_key: proxy-token  # actual key is read from ~/.openjaw-agent/auth.json
  max_tokens: 16384
  temperature: 0.7

telegram:
  token: "YOUR_BOT_TOKEN"           # Get from @BotFather
  allowed_users: [YOUR_USER_ID]     # Get from @userinfobot

feishu:
  app_id: "YOUR_APP_ID"
  app_secret: "YOUR_APP_SECRET"
  allowed_users: ["open_id_1"]      # Optional allowlist
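
The compact tool advertising described earlier in this section, for OpenAI-compatible proxies, can be pictured roughly as follows. Everything here is an assumption made for illustration: the `core` flag, the `toolsToAdvertise` function, and in particular the choice to treat `auto` and `compact` the same and to apply the `openai_max_tools` cap only outside `full` mode, since the README does not spell out those details.

```typescript
type ToolMode = "auto" | "compact" | "full";

interface ToolDef {
  name: string;
  core: boolean; // hypothetical flag marking tools that belong in the compact subset
}

// Picks which tool schemas to send with a request. Tools left out are still
// registered locally and can be added later through the `loaded` set, which
// stands in for names requested via openjaw_load_tools.
function toolsToAdvertise(
  registry: ToolDef[],
  mode: ToolMode,
  maxTools: number,
  loaded: ReadonlySet<string>,
): ToolDef[] {
  if (mode === "full") return registry;
  // Assumption: "auto" and "compact" both send the compact subset here.
  const subset = registry.filter((tool) => tool.core || loaded.has(tool.name));
  return subset.slice(0, maxTools);
}
```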

Run

# Launch with the new TUI (default: React + @openjaw/ink)
openjawagent.bat        # Windows launch script
node dist/main.js       # or run the built entry point directly

# A/B test against the previous Ink UI (pre-rewrite)
node dist/main.js --legacy-ui

# Launch with messaging bridges
node dist/main.js --telegram          # New TUI + Telegram
node dist/main.js --telegram --headless  # Telegram only
node dist/main.js --teams             # New TUI + Teams self-chat
node dist/main.js --feishu            # New TUI + Feishu
node dist/main.js --wechat            # New TUI + WeChat

# Launch with legacy readline REPL
node dist/main.js --legacy

Usage

Chat Naturally

Just type your request:

❯ read my latest emails and summarize them
❯ send a teams message to PM Team Meeting saying "heading into meeting"
❯ what did Chenlu say in Teams this week?
❯ open the report in Excel and add a SUM formula in B10

The agent reasons about your request, picks the right tools, executes them, and shows results with real-time feedback.

Commands

| Command | Description |
|---------|-------------|
| /help | Show all commands |
| /model | Switch LLM provider/model (interactive picker) |
| /connect | Connect/disconnect/list LLM provider contexts |
| /schedule "prompt" every 15m | Schedule recurring tasks |
| /schedule list | Show active scheduled tasks |
| /schedule stop <id> | Cancel a scheduled task |
| /workflow <goal> | Start an advisory dynamic workflow with read-only workers |
| /workflow status [id] | Open navigable worker status and detail view |
| /tools | List all available tools |
| /mcp | Manage MCP server connections |
| /memory | Search persistent memory |
| /voice | Toggle voice input/output |
| /clear | Clear conversation history (new session) |
| /exit | Quit the agent |

Switch Models at Runtime

❯ /model
Current: anthropic/claude-sonnet-4-20250514

? Select model (↑↓ to navigate, Enter to select):
❯ anthropic/claude-opus-4.6
  anthropic/claude-sonnet-4.6
  anthropic/claude-sonnet-4-20250514
  anthropic/claude-haiku-4.5
  Maestro/gpt-5.4
  Maestro/gpt-4.1
  ...

/model shows models from connected provider contexts only. If Maestro is connected, models are grouped under Maestro but still select the underlying anthropic or openai provider so the correct local proxy endpoint is used. Selection is saved to config and persists across restarts.

Connect Providers

Use /connect after launch to pick the active provider context. Direct provider credentials are stored locally under ~/.openjaw-agent/auth.json; secrets are not written to the repo config or shown in status output. Selecting a provider also applies its default model, and /model can change it afterward.

For direct Anthropic/OpenAI usage, run:

❯ /connect anthropic <api-key>
❯ /connect openai <api-key>

The command stores the API key locally and switches the active provider context. Local Maestro proxy mode does not require these keys:

❯ /connect maestro

OpenJaw Agent also supports GitHub Copilot as a first-class provider. The bundled Copilot client ID currently uses opencode's OAuth app because testing showed it exposes the full Copilot model list. You can override it with your organization's OAuth App client ID if needed.

llm:
  provider: github-copilot
  model: gpt-5.4
  api_key: proxy-token
  copilot_oauth_client_id: Iv1.b507a08c87ecfe98

You can also override it with GITHUB_COPILOT_CLIENT_ID in the environment.

❯ /connect github-copilot

The command prints a GitHub device-login URL and one-time code, waits for authorization, then stores the credential. For GitHub Enterprise:

❯ /connect github-copilot enterprise company.ghe.com

Then switch models:

❯ /model github-copilot gpt-5.4

Useful credential commands:

| Command | Description |
|---------|-------------|
| /connect | Show connected provider contexts and switch options |
| /connect maestro | Use local Maestro proxy with no API key |
| /connect anthropic <api-key> | Store an Anthropic API key and select Anthropic |
| /connect openai <api-key> | Store an OpenAI API key and select OpenAI |
| /connect list | List connected contexts and stored credentials without showing secrets |
| /connect status | Show credential store path and provider status |
| /connect disconnect | Pick a connected provider, including Maestro, to disconnect |

Copilot model discovery uses GitHub Copilot's /models endpoint when connected and falls back to a static catalog when offline. GPT models use Copilot's OpenAI-compatible endpoints; Copilot Claude models exposed through /v1/messages use the Anthropic-compatible shim.

Schedule Recurring Tasks

❯ /schedule "check my inbox for urgent emails" every 15m
  ✓ Scheduled task #1: runs every 15 minutes

❯ /schedule list
  #1  every 15m  runs: 3  last: 7:35 AM
      "check my inbox for urgent emails"

❯ /schedule stop 1
  ✓ Task #1 stopped
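
For readers curious how a `/schedule "prompt" every 15m` argument string might be interpreted, here is a minimal parsing sketch. `parseSchedule` and `ScheduleRequest` are hypothetical names, and the `s`, `h`, and `d` units are assumptions; the README only demonstrates minutes.

```typescript
interface ScheduleRequest {
  prompt: string;
  intervalMs: number;
}

// Assumed unit suffixes; only "m" appears in the examples above.
const UNIT_MS: Record<string, number> = {
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
};

// Parses the text after "/schedule", e.g. `"check inbox" every 15m`.
// Returns null for anything that is not a quoted prompt plus an interval.
function parseSchedule(args: string): ScheduleRequest | null {
  const match = /^"([^"]+)"\s+every\s+(\d+)([smhd])$/.exec(args.trim());
  if (!match) return null;
  const [, prompt = "", amountText = "", unit = ""] = match;
  const amount = Number(amountText);
  const unitMs = UNIT_MS[unit];
  if (unitMs === undefined || amount <= 0) return null;
  return { prompt, intervalMs: amount * unitMs };
}
```

Subcommands such as `list` and `stop 1` fall through to `null` here and would be dispatched separately.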

Sessions (Resume Conversations)

Sessions auto-save after each turn. Resume anytime:

# List recent sessions
node dist/main.js --sessions

# Resume the most recent session
node dist/main.js --continue    # or -c

# Resume a specific session
node dist/main.js --resume a1b2c3d4

Skills

19 bundled skills provide multi-step workflows that the agent can invoke:

| Skill | Description |
|-------|-------------|
| daily-briefing | Morning briefing from emails, calendar, Teams |
| deep-research | Multi-source web research with synthesis |
| email-drafting | Compose polished emails |
| email-with-attachment | Send emails with file attachments |
| meeting-summarizer | Summarize meeting transcripts |
| create-docx | Create Word documents |
| create-pptx | Create PowerPoint presentations |
| create-pdf-report | Generate PDF reports |
| data-analysis | Analyze datasets and create charts |
| web-research | Quick web lookups |
| translation | Multi-language translation |
| proofreading | Grammar and style review |
| summarization | Summarize long documents |
| desktop-cleanup | Organize files and desktop |
| doc-coauthoring | Co-author documents with edit tracking |
| internal-comms | Draft internal communications |
| competitive-battlecard | Competitive analysis documents |
| skill-creator | Create new custom skills |
| refresh-token | Refresh auth tokens for bridges |

Custom skills can be added to ~/.openjaw-agent/skills/ as Markdown files. User skills override bundled skills of the same name.
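
The override rule above (a user skill replaces a bundled skill with the same name) amounts to a two-pass merge. The sketch below is illustrative only; the `Skill` shape and `resolveSkills` are hypothetical and are not taken from src/skills/registry.ts.

```typescript
interface Skill {
  name: string;
  source: "bundled" | "user";
  body: string; // Markdown content of the skill file
}

// Builds the effective skill registry: bundled skills first, then user
// skills, so a user skill overwrites a bundled one with the same name.
function resolveSkills(bundled: Skill[], user: Skill[]): Map<string, Skill> {
  const registry = new Map<string, Skill>();
  for (const skill of bundled) registry.set(skill.name, skill);
  for (const skill of user) registry.set(skill.name, skill);
  return registry;
}
```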

MCP Integration

OpenJaw Agent auto-discovers and connects to external MCP servers from 5 config sources (in priority order):

1. ~/.openjaw-agent/mcp.json: the agent's own config (always trusted)
2. .mcp.json + parent directories: Claude Code project config
3. ~/.copilot/mcp-config.json: GitHub Copilot CLI config
4. ~/.copilot/installed-plugins/: Copilot marketplace plugins (WorkIQ, etc.)
5. .vscode/mcp.json: VS Code config (normalized to standard format)

Supports stdio, SSE, and Streamable HTTP transports. Tools from external servers appear as mcp__<server>__<tool> and can be used alongside built-in tools.

Use /mcp to interactively manage connections — toggle servers, view tools, reconnect.
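
To make the `mcp__<server>__<tool>` naming and the source priority order concrete, here is a small sketch. All function and type names are hypothetical. Two behaviors are assumptions rather than documented facts: that an earlier source wins when two sources define the same server name, and that a qualified name is split at the first `__` after the server name.

```typescript
interface ServerConfig {
  command?: string; // stdio transport
  url?: string; // SSE or Streamable HTTP transport
}

// Builds the namespaced name under which an external tool is exposed.
function qualifyToolName(server: string, tool: string): string {
  return `mcp__${server}__${tool}`;
}

// Splits a namespaced tool name back into its parts, or returns null
// for built-in tools that carry no MCP prefix.
function parseToolName(qualified: string): { server: string; tool: string } | null {
  const match = /^mcp__(.+?)__(.+)$/.exec(qualified);
  if (!match) return null;
  const [, server = "", tool = ""] = match;
  return { server, tool };
}

// Merges server definitions from several config sources, given in
// priority order. Assumption: the first definition of a name wins.
function mergeByPriority(sources: Array<Map<string, ServerConfig>>): Map<string, ServerConfig> {
  const merged = new Map<string, ServerConfig>();
  for (const source of sources) {
    source.forEach((config, name) => {
      if (!merged.has(name)) merged.set(name, config);
    });
  }
  return merged;
}
```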

Dynamic Workflows

Use /workflow <goal> for complex advisory work that benefits from parallel read-only workers. OpenJaw asks the active model to return a JSON workflow graph, dynamically plans worker count from the goal, schedules workers through an adaptive queue, and shows progress in the /workflow status [id] overlay. Use arrow keys or j/k to select workers, Enter/→ for details, Esc/← to return, s to sort, f to filter, and q to close.

Workflows include a final synthesizer worker. When the workflow completes, the synthesizer's answer is posted back into the transcript and is also visible through /workflow show [id] and the worker details view.

Workflow workers are intentionally non-mutating in this version: they can inspect files and search, but cannot edit files, run shell/code execution, send messages, or update memory. Results are persisted under ~/.openjaw-agent/workflows/ and replayable through the workflow status and spawn-tree archive paths.
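
One way to picture the non-mutating restriction is as a capability filter applied to the tool registry before a worker starts. This is a sketch under assumptions: the capability tags, `ToolSpec`, and `workerTools` are hypothetical, and the actual enforcement mechanism is not described in this README.

```typescript
// Hypothetical capability tags for illustration.
type Capability = "read" | "search" | "edit" | "exec" | "message" | "memory-write";

interface ToolSpec {
  name: string;
  capabilities: Capability[];
}

// Capabilities a read-only workflow worker is allowed to use.
const WORKER_ALLOWED: ReadonlySet<Capability> = new Set<Capability>(["read", "search"]);

// Keeps only tools whose every capability is on the allowlist, so a tool
// that both reads and writes is excluded.
function workerTools(all: ToolSpec[]): ToolSpec[] {
  return all.filter((tool) => tool.capabilities.every((cap) => WORKER_ALLOWED.has(cap)));
}
```

An allowlist fails closed: a newly added tool with an unrecognized capability is withheld from workers until someone opts it in.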

Architecture

openjaw-agent/
├── src/
│   ├── main.ts              # CLI dispatch: new TUI default, --legacy-ui, --legacy
│   ├── entry.tsx            # New TUI entry; renders React with @openjaw/ink
│   ├── bootstrap.ts         # Initializes AgentLoop, tools, MCP, memory, voice, bridges
│   ├── agentBus.ts          # In-process GatewayClient event/RPC bus
│   ├── agentEvents.ts       # GatewayEvent type union for the ported UI
│   ├── eventBridge.ts       # Converts AgentLoop chunks into GatewayEvents
│   ├── rpcHandlers.ts       # Hermes-style RPC handlers for UI hooks
│   ├── app.tsx              # New TUI root component
│   ├── app/                 # Ported hermes hooks + nanostores UI state
│   │   ├── useMainApp.ts
│   │   ├── useSessionLifecycle.ts
│   │   ├── useSubmission.ts
│   │   └── uiStore.ts       # Plus overlay/turn/delegation/input stores
│   ├── components/          # New TUI React components
│   │   ├── appChrome.tsx
│   │   ├── streamingAssistant.tsx
│   │   ├── sessionPicker.tsx
│   │   ├── modelPicker.tsx
│   │   └── todoPanel.tsx
│   ├── legacy-ink-ui.tsx    # Previous Ink UI (--legacy-ui)
│   ├── agent-loop.ts        # ReAct orchestrator (parallel tools, usage tracking)
│   ├── mcp-client.ts         # MCP client — auto-discovers external servers
│   ├── bridges/
│   │   ├── telegram.ts       # Telegram bot bridge (long-polling)
│   │   ├── teams.ts          # Teams self-chat bridge (Graph API)
│   │   ├── feishu.ts         # Feishu/Lark bridge (WebSocket)
│   │   ├── wechat.ts         # WeChat iLink bridge (HTTP polling)
│   │   └── format.ts         # Platform-specific message formatting
│   ├── voice/
│   │   ├── index.ts          # Voice manager (enable/disable, settings)
│   │   ├── tts.ts            # Text-to-speech (edge-tts)
│   │   └── stt.ts            # Speech-to-text (Windows Speech Recognition)
│   ├── prompts/
│   │   ├── index.ts          # Structured prompt assembly (static/dynamic boundary)
│   │   ├── sections.ts       # Memoization framework + SYSTEM_PROMPT_DYNAMIC_BOUNDARY
│   │   ├── identity.ts       # Static: agent persona
│   │   ├── reasoning.ts      # Static: ReAct rules
│   │   ├── safety.ts         # Static: safety guardrails
│   │   ├── computerUse.ts    # Static: computer use guidelines
│   │   ├── user.ts           # Dynamic: user preferences
│   │   ├── memory.ts         # Dynamic: persistent memory
│   │   └── context.ts        # Dynamic: runtime context
│   ├── providers/
│   │   ├── types.ts          # LLMProvider + streaming + usage types
│   │   ├── anthropic.ts      # Anthropic (cache_control, streaming, usage)
│   │   ├── openai.ts         # OpenAI (Responses + Completions API, usage)
│   │   ├── cache-control.ts  # System prompt → cache-scoped blocks
│   │   └── index.ts          # Provider factory
│   ├── skills/
│   │   └── registry.ts       # Skill discovery (bundled + user overrides)
│   ├── tools/
│   │   └── skill-tool.ts     # LLM-callable skill invocation tool
│   ├── utils/
│   │   └── frontmatter.ts    # Markdown frontmatter parser for skills
│   ├── cost-tracker.ts       # Per-model pricing + session cost tracking
│   ├── context-manager.ts    # Token estimation + context window warnings
│   ├── cache-monitor.ts      # Prompt cache break detection between turns
│   ├── telemetry.ts          # JSONL event logging (~/.openjaw-agent/telemetry/)
│   ├── config.ts             # Config loader (~/.openjaw-agent/config.yaml)
│   ├── session.ts            # Session persistence + resume
│   ├── scheduler.ts          # Recurring task scheduler
│   ├── fork.ts               # Background sub-agent spawning
│   ├── pet.ts                # Pet companion system (Chinese mythical creatures)
│   ├── computer-use.ts       # Anthropic computer use executor
│   ├── clipboard-image.ts    # Windows clipboard image reader
│   ├── image-resize.ts       # Image processing for vision APIs
│   └── repl.ts               # Legacy readline REPL (--legacy mode)
├── prompts/                   # Markdown prompt files
│   ├── IDENTITY.md
│   ├── REASONING.md
│   ├── SAFETY.md
│   ├── COMPUTER_USE.md
│   └── USER.md
├── skills/                    # Bundled skill definitions (19 skills)
├── docs/
│   └── TUI.md                # Concise TUI rewrite overview
├── packages/
│   └── openjaw-ink/          # Vendored @openjaw/ink renderer forked from @hermes/ink
├── config.yaml               # Bundled default config
├── install.bat               # One-click Windows installer
├── openjawagent.bat          # Launch script
├── package.json
└── tsconfig.json

Key Design Decisions

TUI Rewrite — The default UI is an in-process React TUI rendered by @openjaw/ink. GatewayEvent streaming and RPC handlers isolate AgentLoop/service logic from ported hermes hooks, while nanostores keep UI state small and composable. Use --legacy-ui for the previous Ink UI when comparing behavior.

ReAct Loop — The agent loop implements a Reasoning + Acting pattern: the LLM reasons about the request, selects tools, the loop executes them in parallel, feeds results back, and repeats until done.
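
The loop described above can be sketched in a few lines. This is a simplified illustration, not the code in src/agent-loop.ts: the `Model`, `Tool`, and message types, the `runReAct` name, and the step limit are all assumptions.

```typescript
interface ToolCall { id: string; name: string; input: unknown }
interface ModelTurn { text: string; toolCalls: ToolCall[] }
interface ToolResult { id: string; output: string }
type Message =
  | { role: "user" | "assistant"; content: string }
  | { role: "tool"; results: ToolResult[] };

type Model = (history: Message[]) => Promise<ModelTurn>;
type Tool = (input: unknown) => Promise<string>;

async function runReAct(
  model: Model,
  tools: Record<string, Tool>,
  request: string,
  maxSteps = 8, // hypothetical safety limit
): Promise<string> {
  const history: Message[] = [{ role: "user", content: request }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(history); // reason
    history.push({ role: "assistant", content: turn.text });
    if (turn.toolCalls.length === 0) return turn.text; // no tools requested: done

    // Act: run all requested tools concurrently, turning failures into results
    // so the model can see and react to them.
    const results = await Promise.all(
      turn.toolCalls.map(async (call): Promise<ToolResult> => {
        const tool = tools[call.name];
        if (!tool) return { id: call.id, output: `unknown tool: ${call.name}` };
        try {
          return { id: call.id, output: await tool(call.input) };
        } catch (err) {
          return { id: call.id, output: `error: ${String(err)}` };
        }
      }),
    );
    history.push({ role: "tool", results }); // feed results back
  }
  return "stopped: step limit reached";
}
```

`Promise.all` preserves the order of the tool calls, so results line up with the requests even though the tools run concurrently.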

Prompt Caching — The system prompt is split into static and dynamic sections separated by a boundary marker. Static sections (identity, reasoning, safety, computer use) get Anthropic's cache_control: { type: 'ephemeral' } for prompt caching. Dynamic sections (user profile, memory, context) are memoized per session. Cache breaks are tracked by CacheMonitor to surface cost spikes.
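
A minimal sketch of the static/dynamic split follows. The block shape matches the Anthropic Messages API, where `system` may be an array of text blocks carrying `cache_control`. The marker string and the `toCacheScopedBlocks` function are placeholders; the real boundary constant is defined in src/prompts/sections.ts and its value is not shown here.

```typescript
// Placeholder value; the real marker lives in src/prompts/sections.ts.
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = "<!-- DYNAMIC_BOUNDARY -->";

interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Splits the assembled system prompt at the boundary marker. Only the
// static prefix is marked cacheable; the dynamic suffix is sent as-is.
function toCacheScopedBlocks(systemPrompt: string): SystemBlock[] {
  const index = systemPrompt.indexOf(SYSTEM_PROMPT_DYNAMIC_BOUNDARY);
  if (index === -1) return [{ type: "text", text: systemPrompt }];

  const staticPart = systemPrompt.slice(0, index).trim();
  const dynamicPart = systemPrompt.slice(index + SYSTEM_PROMPT_DYNAMIC_BOUNDARY.length).trim();

  const blocks: SystemBlock[] = [];
  if (staticPart) {
    blocks.push({ type: "text", text: staticPart, cache_control: { type: "ephemeral" } });
  }
  if (dynamicPart) blocks.push({ type: "text", text: dynamicPart });
  return blocks;
}
```

Because caching matches on the prefix, any change inside the static part invalidates the cache, which is the kind of break CacheMonitor is described as tracking.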

Tool Permission Model — Sensitive tools (shell commands, file deletion, file writes) trigger an interactive permission dialog. Users can approve once or allow-all for the session.
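
The approve-once versus allow-all behavior can be modeled as a small stateful gate. This sketch makes several assumptions: the names are hypothetical, the prompt callback is synchronous for brevity (the real dialog is interactive), and "allow-all" is taken to cover every sensitive tool for the rest of the session rather than only the tool being asked about.

```typescript
type Decision = "allow-once" | "allow-all" | "deny";

// Returns a check function that decides whether a tool call may proceed.
// `ask` stands in for the interactive permission dialog.
function createPermissionGate(
  sensitive: ReadonlySet<string>,
  ask: (tool: string) => Decision,
): (tool: string) => boolean {
  let allowAll = false; // session-scoped; resets when the gate is recreated
  return (tool: string): boolean => {
    if (!sensitive.has(tool) || allowAll) return true;
    const decision = ask(tool);
    if (decision === "allow-all") allowAll = true;
    return decision !== "deny";
  };
}
```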

Fork System — The agent can spawn background sub-agents for parallel work. Forks share the parent's system prompt and tools but run with their own conversation history.

Telemetry — Every agent turn is logged as structured JSONL to ~/.openjaw-agent/telemetry/ with token counts, cost, duration, and cache metrics. Files rotate daily.
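
For orientation, a JSONL telemetry writer with daily rotation could be structured like this. The field names, the date-based file name, and the use of UTC are all assumptions; the actual record schema in src/telemetry.ts is not documented here.

```typescript
// Hypothetical record shape covering the metrics named above.
interface TurnRecord {
  timestamp: string; // ISO 8601
  model: string;
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  costUsd: number;
  durationMs: number;
}

// One file per UTC day, e.g. "2025-01-02.jsonl" (assumed naming scheme).
function telemetryFileName(date: Date): string {
  return `${date.toISOString().slice(0, 10)}.jsonl`;
}

// JSONL means exactly one JSON object per line.
function toJsonlLine(record: TurnRecord): string {
  return JSON.stringify(record) + "\n";
}
```

Appending one line per turn keeps writes cheap and lets standard line-oriented tools process the log.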

Provider Plugin Architecture

Adding a new LLM provider:

1. Create src/providers/yourprovider.ts implementing LLMProvider
2. Add case 'yourprovider': to src/providers/index.ts
3. That's it: config-driven, no other changes needed
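
The steps above might look like this in miniature. The `LLMProvider` interface shown here is a simplified guess (the real one in src/providers/types.ts also covers streaming and usage), and `EchoProvider`, `ChatRequest`, and `createProvider` are hypothetical names.

```typescript
// Simplified stand-in for the real LLMProvider interface.
interface ChatRequest {
  model: string;
  prompt: string;
}

interface LLMProvider {
  readonly name: string;
  complete(request: ChatRequest): Promise<string>;
}

// Step 1: a provider implementation (here a trivial echo for illustration).
class EchoProvider implements LLMProvider {
  readonly name = "yourprovider";

  async complete(request: ChatRequest): Promise<string> {
    return `[${request.model}] ${request.prompt}`;
  }
}

// Step 2: one extra case in the factory, selected by the configured name.
function createProvider(name: string): LLMProvider {
  switch (name) {
    case "yourprovider":
      return new EchoProvider();
    default:
      throw new Error(`Unknown provider: ${name}`);
  }
}
```

With this shape, setting `provider: yourprovider` under `llm:` in config.yaml would be enough to select the new implementation.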

Shared Memory

Both OpenJaw Agent and OpenJaw MCP server share the same SQLite database at ~/.openjaw/memory.db:

Hybrid search: FTS5 keyword + Jaccard + HRR semantic similarity
Searchable via /memory command or memory_search tool

No duplication — write from either mode, search from either mode.
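
Of the three signals in the hybrid search, Jaccard similarity is the simplest to show: the size of the token intersection divided by the size of the token union. The sketch below is a generic implementation, not the project's code, and its ASCII-only tokenizer is an assumption that would need replacing for non-Latin text.

```typescript
// Lowercases and splits on non-alphanumeric characters (ASCII only).
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, in the range [0, 1].
function jaccard(a: string, b: string): number {
  const setA = tokenize(a);
  const setB = tokenize(b);
  if (setA.size === 0 && setB.size === 0) return 0;
  let intersection = 0;
  setA.forEach((token) => {
    if (setB.has(token)) intersection++;
  });
  return intersection / (setA.size + setB.size - intersection);
}
```

For example, `jaccard("urgent email inbox", "inbox email summary")` shares two of four distinct tokens and evaluates to 0.5.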

No Conflicts with MCP Server

Both can run simultaneously:

| | MCP Server | Agent |
|-|-----------|-------|
| Transport | stdio (pipe) | terminal (readline/Ink) |
| Ports | none | none |
| Config | ~/.openjaw/config.yaml | ~/.openjaw-agent/config.yaml |
| Memory | ~/.openjaw/memory/ (shared) | ~/.openjaw/memory/ (shared) |
| Sessions | managed by host | ~/.openjaw-agent/sessions/ |
| Telemetry | none | ~/.openjaw-agent/telemetry/ |

Build from Source

cd projects/openjaw-agent

# Install dependencies (includes openjaw core as local dependency)
npm install

# Build TypeScript
npm run build

# Run
node dist/main.js

# Or use the batch file
openjawagent.bat

Development

# Run in dev mode (tsx, no build needed)
npm run dev

# Rebuild after changes
npm run build

CLI Reference

openjaw-agent                              Start new session (new TUI)
openjaw-agent --legacy-ui                  Start previous Ink UI (pre-rewrite) for A/B comparison
openjaw-agent --legacy                     Start legacy readline REPL
openjaw-agent --resume <session-id>        Resume a specific session
openjaw-agent --continue (-c)              Resume the most recent session
openjaw-agent --sessions                   List recent sessions
openjaw-agent --telegram                   New TUI + Telegram bridge (hybrid)
openjaw-agent --telegram --headless        Telegram only (no terminal UI)
openjaw-agent --teams                      New TUI + Teams self-chat bridge
openjaw-agent --feishu                     New TUI + Feishu bot bridge
openjaw-agent --wechat                     New TUI + WeChat iLink bridge
openjaw-agent --help                       Show help

Environment Variables (Optional Overrides)

| Variable | Description |
|----------|-------------|
| ANTHROPIC_API_KEY | Anthropic API key (overrides config) |
| ANTHROPIC_AUTH_TOKEN | Anthropic auth token (for proxy) |
| ANTHROPIC_BASE_URL | Anthropic proxy URL |
| OPENAI_API_KEY | OpenAI API key (overrides config) |
| OPENAI_BASE_URL | OpenAI proxy URL |

Config file values are the default. Environment variables override when set.
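
The precedence rule can be expressed as a small pure function. This is a sketch for the Anthropic variables only; `LlmSettings` and `applyEnvOverrides` are hypothetical names, and treating an empty variable as unset is an assumption.

```typescript
interface LlmSettings {
  apiKey: string;
  baseUrl: string;
}

// Starts from config-file values and lets environment variables win
// when they are set to a non-empty string.
function applyEnvOverrides(
  config: LlmSettings,
  env: Record<string, string | undefined>,
): LlmSettings {
  return {
    apiKey: env["ANTHROPIC_API_KEY"] || config.apiKey,
    baseUrl: env["ANTHROPIC_BASE_URL"] || config.baseUrl,
  };
}
```

In a Node.js process this would be called with `process.env` as the second argument.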

License

Internal use only. Part of the BravoPM monorepo.