automatey
v0.1.16
Lean & mean MCP-powered CLI agent — Nemotron, OpenAI, Anthropic, Perplexity
Automatey is a minimal, MCP-powered CLI agent that lets any LLM wield real tools — file ops, shell commands, web search, memory, planning — through a clean ReAct loop. No bloat. No framework lock-in. Just a sharp hook and a fast ship.
══════════════════════════════════════════════════════════════════════
🤖 automatey — lean & mean agent
══════════════════════════════════════════════════════════════════════
Provider: nemotron
Model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4
Session: session-2026-03-12
Sandbox: /workspace/my-project/sandbox
Type /help for commands | Ctrl+C to exit
Logs: tail -f ~/.automatey/logs/agent.log
══════════════════════════════════════════════════════════════════════
Features
- Providers: Nemotron (vLLM/OpenAI-compatible), OpenAI, Anthropic, Perplexity
- MCP tools: Any stdio or HTTP MCP server; auto-loaded from ./mcp.json, ~/.automatey/mcp.json, or the bundled mcp.json
- ReAct loop: Up to 50 tool-call rounds per message (configurable)
- Chain-of-thought: /think toggle for Nemotron / Anthropic reasoning tokens
- Sandbox: Isolated directory with timestamped subdirectories for agent file I/O and code execution; configurable via --sandbox or the SANDBOX_MODE env var
- Sub-agents: run_subagent built-in tool spawns stateless nested ReAct loops with full MCP access; supports AgentConfig declarations for reusable agent definitions
- Ag-loop: Iterative eval-and-refine runner — runs a worker, judges output via LLM-as-judge, refines instructions, repeats until the score threshold is met
- LLM-as-Judge: Structured evaluation via generateObject() with Zod schemas; supports on-failure/always modes with Perplexity, OpenAI, Anthropic, or Nemotron as judge (see the sketch after this list)
- Model builder: Shared buildModel() utility constructs AI SDK model instances from a simple { provider, model } spec — used by judge, refiner, and ag-loop
- Prompt refiner: Rewrites agent instructions using judge feedback; enforces brevity constraints for smaller worker models
- Sessions: Save/load conversation sessions in ~/.automatey/sessions/
- Checkpoints: /checkpoint — full conversation snapshots with BM25 keyword search
- Auto-compact: LLM summarises older context when usage ≥ 90%
- Skills: Progressive SKILL.md loading from the canonical universal path .agents/skills/, plus extra directories via AUTOMATEY_SKILLS_DIRS
- Eval system: /eval <file.jsonl> runs MCQ, keyword, and regex benchmarks with per-category accuracy reports
- Planner MCP: Bundled mcp/planner — todos + plans, prompts for task breakdown
- Coder MCP: Bundled mcp/coder — read/write/edit files, run commands, glob files, prompts for analysis
- Agent MCP: Bundled mcp/agent — expose automatey as an MCP tool server (run tasks, check status, sampling, prompts, resources)
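A minimal sketch of the judge step in TypeScript, using the AI SDK's generateObject() with a Zod schema. automatey routes model construction through buildModel(); here OpenAI is hardcoded for brevity, and the JudgeVerdict fields are illustrative, not the project's exact schema:
import { generateObject } from "ai";
import { createOpenAI } from "@ai-sdk/openai";
import { z } from "zod";

// Hypothetical verdict shape — the real schema may differ.
const JudgeVerdict = z.object({
  score: z.number().min(0).max(10).describe("Overall quality, 0-10"),
  pass: z.boolean().describe("Whether the output clears the threshold"),
  feedback: z.string().describe("Concrete suggestions for the refiner"),
});

export async function judge(task: string, workerOutput: string) {
  const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY });
  // generateObject() constrains the judge model to JSON matching the schema.
  const { object } = await generateObject({
    model: openai("gpt-4o"),
    schema: JudgeVerdict,
    prompt: `Task:\n${task}\n\nWorker output:\n${workerOutput}\n\nEvaluate the output against the task.`,
  });
  return object; // { score, pass, feedback }
}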
Quick Start
Install from npm (recommended)
npm install -g automatey
automatey chat
Or run without installing:
npx automatey chat
npx automatey "summarise this repo"
npx ay "what time is it?"
From source
git clone https://github.com/automatey-org/automatey.git
cd automatey
npm install
cp mcp.example.json mcp.json # edit to add your API keys
npm run build
node dist/index.js chat
Override provider, model, and sandbox:
node dist/index.js chat --provider openai --model gpt-4o --sandbox ./my-sandbox
Non-interactive one-shot (quiet by default, --verbose to see diagnostics):
node dist/index.js chat --message "List all TODO comments in this repo"
node dist/index.js chat --message "Summarise this file" --verbose
CLI Installation
Option A — npm link (recommended)
npm run build
npm run link:cli   # registers "automatey" globally via symlink
After linking:
automatey chat
automatey --help
Unlink: npm run unlink:cli
Option B — Shell alias (ay)
./scripts/setup-alias.sh   # adds 'ay' alias to ~/.bashrc (or ~/.zshrc / fish)
Or manually:
alias ay="automatey" # bash / zsh
abbr -a ay automatey   # fish
PowerShell (add to $PROFILE):
Set-Alias -Name ay -Value automatey
After setup:
ay chat # interactive session
ay chat --message "do something" # one-shot
ay "do something" # one-shot shorthand
ay --help
Option C — Docker alias (ay)
For environments without Node.js, run automatey via Docker and alias it:
bash / zsh (add to ~/.bashrc or ~/.zshrc):
ay() {
docker run -it --rm \
--env-file "${AUTOMATEY_ENV:-$HOME/.automatey/.env}" \
-v "$HOME/.automatey:/home/automatey/.automatey" \
-v "$(pwd):/work" -w /work \
maxgolovanov/automatey "$@"
}
PowerShell (add to $PROFILE):
function ay {
docker run -it --rm `
--env-file "$env:USERPROFILE\.automatey\.env" `
-v "$env:USERPROFILE\.automatey:/home/automatey/.automatey" `
-v "${PWD}:/work" -w /work `
maxgolovanov/automatey @args
}
fish (add to ~/.config/fish/functions/ay.fish):
function ay
docker run -it --rm \
--env-file "$HOME/.automatey/.env" \
-v "$HOME/.automatey:/home/automatey/.automatey" \
-v (pwd):/work -w /work \
maxgolovanov/automatey $argv
end
Usage is identical to a native install:
ay # interactive chat, history persisted
ay "summarise this repo" # one-shot against the current directoryHow persistence works: The
-v ~/.automatey:/home/automatey/.automateymount shares the host's~/.automateydirectory with the container. Everything the agent stores there — chat history, sessions, config, checkpoints, logs — survives across container runs.
The ~/.automatey data directory
| Path | Contents |
|------|----------|
| ~/.automatey/config.json | LLM provider, model, base URL |
| ~/.automatey/mcp.json | Global MCP server definitions |
| ~/.automatey/mcp.chat.json | Chat-scoped subset of MCP servers |
| ~/.automatey/.env | API keys and env overrides (Docker --env-file) |
| ~/.automatey/history | Persistent readline history (up to 500 entries) |
| ~/.automatey/sessions/ | Saved conversation sessions |
| ~/.automatey/checkpoints/ | Full conversation snapshots |
| ~/.automatey/planner/ | Planner MCP server data |
| ~/.automatey/logs/ | Agent log files |
MCP Config — mcp.json
The agent looks for MCP config in this order:
1. ./mcp.json (current working directory / project root)
2. ~/.automatey/mcp.json (global fallback)
3. Bundled mcp.json from the package install directory (npx / global install)
Copy mcp.example.json from this repo as your starting point:
cp mcp.example.json mcp.json # project-local
# OR
cp mcp.example.json ~/.automatey/mcp.json   # global
Prefer strict JSON with no comments. Field descriptions and examples live in docs/MCP-CONFIG.md.
Example mcp.json:
{
"mcpServers": {
"planner": {
"transport": "stdio",
"command": "node",
"args": ["./mcp/planner/dist/server.js"]
},
"brave-search": {
"transport": "stdio",
"command": "npx",
"args": ["-y", "@brave/brave-search-mcp-server", "--transport", "stdio"],
"env": { "BRAVE_API_KEY": "${env:BRAVE_API_KEY}" },
"requiresEnv": "BRAVE_API_KEY"
},
"memento": {
"transport": "http",
"url": "http://localhost:3500/mcp",
"portCheck": true
}
}
}
Config
Config lives in ~/.automatey/config.json (auto-created on first run):
{
"provider": "nemotron",
"defaultModel": "auto",
"llm": {
"baseUrl": "http://localhost:8002"
}
}
For nemotron, defaultModel may be:
- an exact model ID
- auto to use the first model returned by /v1/models
- a wildcard mask such as nvidia/NVIDIA-Nemotron-3-Super*
Environment variables (copy .env.defaults → .env to override):
| Variable | Default | Description |
|---|---|---|
| LLM_PROVIDER | nemotron | nemotron \| openai \| anthropic \| perplexity |
| LLM_MODEL | auto | Model ID, auto, or wildcard mask for nemotron |
| LLM_BASE_URL | http://localhost:8002 | vLLM / OpenAI-compatible endpoint |
| LLM_MAX_TOKENS | 200000 | Context window budget (chars, ~4/token) |
| LLM_MAX_OUTPUT_TOKENS | 16384 | Per-call output limit |
| TEMPERATURE | 0.6 | Sampling temperature |
| AGENT_MAX_TOOL_ROUNDS | 50 | Max ReAct rounds |
| AGENT_MAX_EMPTY_RETRIES | 2 | Retries on empty LLM response |
| AGENT_COMPACT_THRESHOLD | 0.9 | Auto-compact at 90% context fill |
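For example, a minimal .env that pins the documented defaults explicitly (values are just the defaults from the table above):
LLM_PROVIDER=nemotron
LLM_BASE_URL=http://localhost:8002
LLM_MODEL=auto
TEMPERATURE=0.6
AGENT_MAX_TOOL_ROUNDS=50
AGENT_COMPACT_THRESHOLD=0.9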
Commands
| Command | Description |
|---------|-------------|
| /help | Show all commands |
| /model | List / switch model |
| /think [on\|off\|budget N] | Toggle CoT reasoning |
| /save [name] | Save session |
| /load [name] | Load session |
| /servers | Manage MCP connections |
| /config | Show config |
| /cost | Show estimated token usage |
| /compact | Manually compact context via LLM summarization |
| /checkpoint [save\|list\|restore N\|search q\|delete N] | Manage checkpoints |
| /eval <file.jsonl> | Run a JSONL eval file against the current LLM (see the sketch below) |
| /clear | Clear context |
| /exit | Quit |
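An eval file is JSON Lines with one case per line, covering the MCQ, keyword, and regex benchmark types. The field names below are purely illustrative — check the repo's docs for the actual schema:
{"type": "mcq", "category": "math", "question": "2 + 2 = ?", "choices": ["3", "4", "5"], "answer": "4"}
{"type": "keyword", "category": "tools", "prompt": "Name automatey's built-in sub-agent tool.", "keywords": ["run_subagent"]}
{"type": "regex", "category": "format", "prompt": "Print today's date in ISO format.", "pattern": "\\d{4}-\\d{2}-\\d{2}"}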
Context Management
Auto-compact
When estimated context usage reaches AGENT_COMPACT_THRESHOLD (default 90%), the older portion of the conversation is automatically summarized by the LLM and replaced with a concise summary message. This keeps the token count manageable without discarding knowledge.
The /compact command can also be used to trigger compaction manually at any time.
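A sketch of the trigger logic, assuming the character-based budget from the env table (~4 chars/token); the real implementation may differ:
// Illustrative auto-compact check — constants mirror the env-table defaults.
const LLM_MAX_TOKENS = 200_000;       // context budget in chars (~4 chars/token)
const AGENT_COMPACT_THRESHOLD = 0.9;  // compact at 90% fill

function shouldCompact(conversationChars: number): boolean {
  return conversationChars / LLM_MAX_TOKENS >= AGENT_COMPACT_THRESHOLD;
}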
Checkpoints
Checkpoints are full conversation snapshots saved to ~/.automatey/checkpoints/ as JSON:
/checkpoint save # save current conversation
/checkpoint list # list checkpoints (newest first)
/checkpoint restore 2 # restore checkpoint #2 into context
/checkpoint search "bm25" # BM25 keyword search across all checkpoints
/checkpoint delete 3     # delete checkpoint #3
The BM25 search indexes the full message history of every checkpoint and ranks them by keyword relevance.
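For reference, a toy BM25 scorer over tokenized checkpoints looks like this — an illustrative sketch, not automatey's actual implementation:
type Doc = { id: string; tokens: string[] };

function bm25(query: string[], doc: Doc, corpus: Doc[], k1 = 1.5, b = 0.75): number {
  const N = corpus.length;
  const avgLen = corpus.reduce((sum, d) => sum + d.tokens.length, 0) / N;
  let score = 0;
  for (const term of query) {
    const tf = doc.tokens.filter((t) => t === term).length;
    if (tf === 0) continue;
    // Document frequency: how many checkpoints mention the term at all.
    const df = corpus.filter((d) => d.tokens.includes(term)).length;
    const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
    const norm = 1 - b + (b * doc.tokens.length) / avgLen;
    score += (idf * tf * (k1 + 1)) / (tf + k1 * norm);
  }
  return score;
}

// Rank checkpoints for a query, highest score first.
const rank = (q: string[], corpus: Doc[]) =>
  [...corpus].sort((a, z) => bm25(q, z, corpus) - bm25(q, a, corpus));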
Sub-Agents
Sub-agents let the main agent delegate focused tasks to a nested ReAct loop — with its own tools, model, and system prompt. The parent spawns the sub-agent, waits for its result, and continues.
How it works
The LLM calls the built-in run_subagent tool:
run_subagent(
description: "Write and test a Node.js hello-world script",
prompt: "Create hello.js that prints Hello World, run it with node, and confirm the output.",
agentName: "nemotron-coder" // optional — uses a named .agent.md definition
)
The sub-agent runs its own ReAct loop (up to maxToolRounds) and returns a plain-text result to the main agent.
Named agents — .agent.md files
Define reusable agents as .agent.md files in .agents/agents/ (project-local) or ~/.agents/agents/ (global). Format is YAML frontmatter + system prompt body:
---
name: nemotron-coder
description: >
Code generation sub-agent on the local Nemotron vLLM instance.
Use for writing, editing, and running code — offloads code tasks
to a local model without burning cloud API quota.
model: auto
provider: nemotron
baseUrl: http://192.168.0.58:8002/v1
tools:
- read_file
- write_file
- execute_command
- glob_files
- list_dir
- search_text
- edit_file
maxToolRounds: 12
---
You are a code-generation agent running on a local Nemotron model.
Write complete, working code and save it to disk.
Use execute_command to run and verify output. Never truncate code.
Place in:
- .agents/agents/nemotron-coder.agent.md — project-local (resolved from cwd at runtime)
- ~/.agents/agents/nemotron-coder.agent.md — global (available in every directory)
Extra directories: set AUTOMATEY_AGENTS_DIRS (path-separator-delimited) to add more search paths.
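For example (paths illustrative; the separator is : on Unix, ; on Windows):
export AUTOMATEY_AGENTS_DIRS="$HOME/shared-agents:/opt/team-agents"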
Frontmatter fields
| Field | Description |
|---|---|
| name | Agent identifier (falls back to filename stem) |
| description | When to use this agent — shown to the LLM in the system prompt |
| model | Model ID, auto, or wildcard mask (e.g. *Super*, nvidia/*) |
| provider | nemotron \| openai \| anthropic \| perplexity |
| baseUrl | Base URL for OpenAI-compatible endpoints |
| apiKey | API key — supports ${ENV_VAR} interpolation |
| tools | Allowlist of tool names (absent = inherit all parent tools) |
| mcpServers | Allowlist of MCP server names (absent = inherit all) |
| maxToolRounds | Max ReAct rounds (absent = parent's setting) |
Model resolution
For nemotron agents, model supports the same resolution as the main config:
model: auto # picks the first model from /v1/models
model: "*Super*" # glob-matches the first model containing "Super"
model: nvidia/* # any nvidia/ model
model: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4   # exact ID
automatey probes the baseUrl's /v1/models endpoint at sub-agent spawn time to resolve wildcards.
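A sketch of that resolution logic (illustrative, not automatey's exact code; assumes baseUrl already includes the /v1 suffix, as in the frontmatter example above):
// Resolve "auto" or a glob mask against an OpenAI-compatible model list.
async function resolveModel(baseUrl: string, mask: string): Promise<string> {
  const res = await fetch(`${baseUrl}/models`);
  const { data } = (await res.json()) as { data: { id: string }[] };
  if (mask === "auto") return data[0].id; // first advertised model
  // Escape regex specials except *, then treat * as a wildcard.
  const escaped = mask.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp("^" + escaped.split("*").join(".*") + "$");
  const hit = data.find((m) => re.test(m.id));
  if (!hit) throw new Error(`No model matches mask "${mask}"`);
  return hit.id;
}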
Listing agents
/agents list # show all discovered agents + scope (project / global)
/agents info <name> # show frontmatter + system prompt for an agent
/agents help              # usage and file format reference
Example prompts
- Write a CLI tool in Go that downloads a URL to a file — use nemotron-coder
- Run nemotron-coder to generate a Python FastAPI stub for a todo list API, then run it and show me the /docs URL
- Delegate to nemotron-coder: read package.json, bump the patch version, and commit it
Generic sub-agents (no .agent.md)
Omit agentName, and the sub-agent uses the parent's own LLM client and full tool set:
Split this task into three parallel sub-agents: one for backend tests,
one for frontend tests, one for linting. Run each and summarise.
Sub-agents are stateless — they do not share session, context, or compaction with the parent. Each sub-agent starts fresh with an empty context containing only its system prompt and the prompt passed by the parent.
Built-in MCP Servers
🗂 Coder (mcp/coder)
| Tool | What it does |
|------|-------------|
| read_file | Read file contents with optional line range |
| write_file | Write / create a file |
| edit_file | Replace an exact string in a file |
| execute_command | Run a shell command (default cwd: sandbox) |
| search_text | Grep-style text search |
| list_dir | List directory contents |
| glob_files | Find files by glob pattern (**/*.ts, src/**) |
Prompts: analyze-file, search-and-replace, scaffold
📋 Planner (mcp/planner)
Todos and multi-step plans persisted to ~/.automatey/planner/.
Prompts: break-down-task, daily-standup, project-plan
🤖 Agent (mcp/agent)
Exposes automatey itself as an MCP tool server — lets a host (e.g. VS Code Copilot) delegate tasks to automatey.
Tools:
| Tool | What it does |
|------|-------------|
| automatey_run_task | Run a task: spawns automatey as a child process with the given prompt, model, provider, sandbox, and timeout |
| automatey_status | Check LLM reachability and return config info |
| automatey_sample | Request an LLM completion from the connected client via MCP sampling |
Prompts: code-review, refactor, write-tests, explain, run-task
Resources: automatey://config (agent config), automatey://health (LLM reachability)
Sampling: The automatey_sample tool uses MCP sampling to request completions from the client's LLM. The client must advertise sampling capability.
Logging: Diagnostic messages sent to the client via MCP logging notifications.
All configuration comes from environment variables — zero hardcoded defaults:
| Variable | Required | Description |
|---|---|---|
| AUTOMATEY_CMD | ✅ | Command to invoke automatey (e.g. npx tsx /path/to/src/index.ts) |
| LLM_BASE_URL | ✅ | vLLM / OpenAI-compatible endpoint |
| LLM_MODEL | ✅ | Model ID |
| LLM_PROVIDER | ✅ | nemotron \| openai \| anthropic \| perplexity |
| AUTOMATEY_TIMEOUT | | Default task timeout in ms (default: 300000) |
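For example, a host's MCP config could register the bundled server like this (the server path and values are illustrative):
{
  "mcpServers": {
    "automatey-agent": {
      "transport": "stdio",
      "command": "node",
      "args": ["./mcp/agent/dist/server.js"],
      "env": {
        "AUTOMATEY_CMD": "npx automatey",
        "LLM_BASE_URL": "http://localhost:8002",
        "LLM_MODEL": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4",
        "LLM_PROVIDER": "nemotron"
      }
    }
  }
}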
The agent MCP server has its own test suite (23 tests — unit, functional, e2e) in mcp/agent/tests/.
Docker
Run as a container
# Interactive chat (with persistent history + config)
docker run -it --rm \
-e LLM_PROVIDER=openai \
-e OPENAI_API_KEY=sk-... \
-v ~/.automatey:/home/automatey/.automatey \
maxgolovanov/automatey chat
# One-shot (non-interactive)
docker run --rm \
-e LLM_PROVIDER=openai \
-e OPENAI_API_KEY=sk-... \
maxgolovanov/automatey --message "Summarise this repo" --verbose
Persistent data: The -v ~/.automatey:/home/automatey/.automatey volume mount persists history, sessions, checkpoints, config, and logs between container runs. See The ~/.automatey data directory for what's stored there.
Using Docker Model Runner as the LLM backend
Docker Model Runner (DMR) lets you run LLMs locally via an OpenAI-compatible API on localhost:12434. It ships with Docker Desktop and Docker Engine 28+ — no API key needed.
# Pull and start a model on the host first
docker model pull ai/llama3.2
The networking challenge: when automatey runs in a container, localhost:12434 resolves to the container itself, where nothing is listening. You need to point it at the host.
Option A — host.docker.internal (Docker Desktop on Mac/Windows, Docker Engine 28+ on Linux)
docker run -it --rm \
-e LLM_PROVIDER=openai \
-e LLM_BASE_URL=http://host.docker.internal:12434/engines/v1 \
-e LLM_MODEL=ai/llama3.2 \
-e LLM_API_KEY=ignored \
--add-host=host.docker.internal:host-gateway \
maxgolovanov/automatey chat
--add-host=host.docker.internal:host-gateway is required on Linux Docker Engine (it's automatic on Docker Desktop).
Option B — --network host (Linux only, simplest)
docker run -it --rm \
--network host \
-e LLM_PROVIDER=openai \
-e LLM_BASE_URL=http://localhost:12434/engines/v1 \
-e LLM_MODEL=ai/llama3.2 \
-e LLM_API_KEY=ignored \
maxgolovanov/automatey chat
Option C — Docker Compose (recommended for persistent setups)
# docker-compose.yml
services:
automatey:
image: maxgolovanov/automatey
stdin_open: true
tty: true
extra_hosts:
- "host.docker.internal:host-gateway" # Linux only; remove on Docker Desktop
environment:
LLM_PROVIDER: openai
LLM_BASE_URL: http://host.docker.internal:12434/engines/v1
LLM_MODEL: ai/llama3.2
LLM_API_KEY: ignored
volumes:
- ~/.automatey:/home/automatey/.automatey
docker compose run --rm automatey chat
Available DMR models
See docker model list or hub.docker.com/u/ai. Common choices:
| Model | Size | Use case |
|---|---|---|
| ai/llama3.2 | 2GB | Fast, general purpose |
| ai/smollm2 | 500MB | Lightweight, low RAM |
| ai/phi4-mini | 4GB | Strong reasoning |
| ai/qwen2.5-coder | 5GB | Code tasks |
DMR uses llama.cpp under the hood — CPU-only works, GPU (NVIDIA/Apple Silicon) is used automatically when available.
Development
npm run dev # tsx watch (no build needed)
npm run build # tsc + build all MCP servers
npm test # 189 tests (Vitest)
npm run test:watch   # watch mode
VS Code tasks (Ctrl+Shift+B / Ctrl+Shift+P → Run Task):
- Build: All
- Run: CLI (automatey) — builds first, then launches
- Run: CLI (dev, no build) — tsx, faster iteration
- Test: All
- Test: Hello World (verbose)
Tests
Tests 189 passed | 2 skipped
├── unit/ chat-engine, coder-server, command-parser, config-manager,
│ context-manager, eval-runner, llm-client,
│ markdown-rendering, mcp-config-manager,
│ session-manager, skills-manager, sub-agent
├── integration/ openai, anthropic, nemotron, planner,
│ coder-hello-world (all 3 providers), cli-message,
│ markdown-streaming
└── mcp/agent/   config, runner (unit + functional), e2e server
The coder-hello-world integration tests drive a full ReAct loop per provider — the LLM writes index.js, executes it, and the output is verified. Results live in sandbox/<provider>/:
[openai] execute_command output: Hello, World!
[anthropic] execute_command output: Hello, World!
[nemotron] execute_command output: Hello, World!
Artwork
Logos in extra/logo/ are from the Automatey terminal project,
licensed CC BY 4.0 — Copyright © 2024–2026 Top-5 And Contributors.
Used here with attribution as permitted by the license.
License
MIT — see LICENSE.
