automatey
v0.1.16
Lean & mean MCP-powered CLI agent — Nemotron, OpenAI, Anthropic, Perplexity
Automatey is a minimal, MCP-powered CLI agent that lets any LLM wield real tools — file ops, shell commands, web search, memory, planning — through a clean ReAct loop. No bloat. No framework lock-in. Just a sharp hook and a fast ship.
══════════════════════════════════════════════════════════════════════
🤖 automatey — lean & mean agent
══════════════════════════════════════════════════════════════════════
Provider: nemotron
Model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4
Session: session-2026-03-12
Sandbox: /workspace/my-project/sandbox
Type /help for commands | Ctrl+C to exit
Logs: tail -f ~/.automatey/logs/agent.log
══════════════════════════════════════════════════════════════════════
Features
- Providers: Nemotron (vLLM/OpenAI-compatible), OpenAI, Anthropic, Perplexity
- MCP tools: Any stdio or HTTP MCP server; auto-loaded from ./mcp.json, ~/.automatey/mcp.json, or the bundled mcp.json
- ReAct loop: Up to 50 tool-call rounds per message (configurable)
- Chain-of-thought: /think toggle for Nemotron / Anthropic reasoning tokens
- Sandbox: Isolated directory with timestamped subdirectories for agent file I/O and code execution; configurable via --sandbox or the SANDBOX_MODE env var
- Sub-agents: run_subagent built-in tool spawns stateless nested ReAct loops with full MCP access; supports AgentConfig declarations for reusable agent definitions
- Ag-loop: Iterative eval-and-refine runner — runs a worker, judges output via LLM-as-judge, refines instructions, repeats until the score threshold is met
- LLM-as-Judge: Structured evaluation via generateObject() with Zod schemas; supports on-failure/always modes with Perplexity, OpenAI, Anthropic, or Nemotron as judge (see the sketch after this list)
- Model builder: Shared buildModel() utility constructs AI SDK model instances from a simple { provider, model } spec — used by judge, refiner, and ag-loop
- Prompt refiner: Rewrites agent instructions using judge feedback; enforces brevity constraints for smaller worker models
- Sessions: Save/load conversation sessions in ~/.automatey/sessions/
- Checkpoints: /checkpoint — full conversation snapshots with BM25 keyword search
- Auto-compact: LLM summarises older context when usage ≥ 90%
- Skills: Progressive SKILL.md loading from the canonical universal path .agents/skills/, plus extra directories via AUTOMATEY_SKILLS_DIRS
- Eval system: /eval <file.jsonl> runs MCQ, keyword, and regex benchmarks with per-category accuracy reports
- Planner MCP: Bundled mcp/planner — todos + plans, prompts for task breakdown
- Coder MCP: Bundled mcp/coder — read/write/edit files, run commands, glob files, prompts for analysis
- Agent MCP: Bundled mcp/agent — expose automatey as an MCP tool server (run tasks, check status, sampling, prompts, resources)
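A minimal sketch of the judge step in TypeScript, using the AI SDK's generateObject() with a Zod schema. automatey routes model construction through buildModel(); here OpenAI is hardcoded for brevity, and the JudgeVerdict fields are illustrative, not the project's exact schema:
import { generateObject } from "ai";
import { createOpenAI } from "@ai-sdk/openai";
import { z } from "zod";

// Hypothetical verdict shape — the real schema may differ.
const JudgeVerdict = z.object({
  score: z.number().min(0).max(10).describe("Overall quality, 0-10"),
  pass: z.boolean().describe("Whether the output clears the threshold"),
  feedback: z.string().describe("Concrete suggestions for the refiner"),
});

export async function judge(task: string, workerOutput: string) {
  const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY });
  // generateObject() constrains the judge model to JSON matching the schema.
  const { object } = await generateObject({
    model: openai("gpt-4o"),
    schema: JudgeVerdict,
    prompt: `Task:\n${task}\n\nWorker output:\n${workerOutput}\n\nEvaluate the output against the task.`,
  });
  return object; // { score, pass, feedback }
}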
Quick Start
Install from npm (recommended)
npm install -g automatey
automatey chat
Or run without installing:
npx automatey chat
npx automatey "summarise this repo"
npx ay "what time is it?"
From source
git clone https://github.com/automatey-org/automatey.git
cd automatey
npm install
cp mcp.example.json mcp.json # edit to add your API keys
npm run build
node dist/index.js chat
Override provider, model, and sandbox:
node dist/index.js chat --provider openai --model gpt-4o --sandbox ./my-sandbox
Non-interactive one-shot (quiet by default, --verbose to see diagnostics):
node dist/index.js chat --message "List all TODO comments in this repo"
node dist/index.js chat --message "Summarise this file" --verbose
CLI Installation
Option A — npm link (recommended)
npm run build
npm run link:cli   # registers "automatey" globally via symlink
After linking:
automatey chat
automatey --help
Unlink: npm run unlink:cli
Option B — Shell alias (ay)
./scripts/setup-alias.sh   # adds 'ay' alias to ~/.bashrc (or ~/.zshrc / fish)
Or manually:
alias ay="automatey" # bash / zsh
abbr -a ay automatey   # fish
PowerShell (add to $PROFILE):
Set-Alias -Name ay -Value automatey
After setup:
ay chat # interactive session
ay chat --message "do something" # one-shot
ay "do something" # one-shot shorthand
ay --help
Option C — Docker alias (ay)
For environments without Node.js, run automatey via Docker and alias it:
bash / zsh (add to ~/.bashrc or ~/.zshrc):
ay() {
docker run -it --rm \
--env-file "${AUTOMATEY_ENV:-$HOME/.automatey/.env}" \
-v "$HOME/.automatey:/home/automatey/.automatey" \
-v "$(pwd):/work" -w /work \
maxgolovanov/automatey "$@"
}
PowerShell (add to $PROFILE):
function ay {
docker run -it --rm `
--env-file "$env:USERPROFILE\.automatey\.env" `
-v "$env:USERPROFILE\.automatey:/home/automatey/.automatey" `
-v "${PWD}:/work" -w /work `
maxgolovanov/automatey @args
}
fish (add to ~/.config/fish/functions/ay.fish):
function ay
docker run -it --rm \
--env-file "$HOME/.automatey/.env" \
-v "$HOME/.automatey:/home/automatey/.automatey" \
-v (pwd):/work -w /work \
maxgolovanov/automatey $argv
end
Usage is identical to a native install:
ay # interactive chat, history persisted
ay "summarise this repo" # one-shot against the current directoryHow persistence works: The
-v ~/.automatey:/home/automatey/.automateymount shares the host's~/.automateydirectory with the container. Everything the agent stores there — chat history, sessions, config, checkpoints, logs — survives across container runs.
The ~/.automatey data directory
| Path | Contents |
|------|----------|
| ~/.automatey/config.json | LLM provider, model, base URL |
| ~/.automatey/mcp.json | Global MCP server definitions |
| ~/.automatey/mcp.chat.json | Chat-scoped subset of MCP servers |
| ~/.automatey/.env | API keys and env overrides (Docker --env-file) |
| ~/.automatey/history | Persistent readline history (up to 500 entries) |
| ~/.automatey/sessions/ | Saved conversation sessions |
| ~/.automatey/checkpoints/ | Full conversation snapshots |
| ~/.automatey/planner/ | Planner MCP server data |
| ~/.automatey/logs/ | Agent log files |
MCP Config — mcp.json
The agent looks for MCP config in this order:
1. ./mcp.json (current working directory / project root)
2. ~/.automatey/mcp.json (global fallback)
3. Bundled mcp.json from the package install directory (npx / global install)
Copy mcp.example.json from this repo as your starting point:
cp mcp.example.json mcp.json # project-local
# OR
cp mcp.example.json ~/.automatey/mcp.json   # global
Prefer strict JSON with no comments. Field descriptions and examples live in docs/MCP-CONFIG.md.
Example mcp.json:
{
"mcpServers": {
"planner": {
"transport": "stdio",
"command": "node",
"args": ["./mcp/planner/dist/server.js"]
},
"brave-search": {
"transport": "stdio",
"command": "npx",
"args": ["-y", "@brave/brave-search-mcp-server", "--transport", "stdio"],
"env": { "BRAVE_API_KEY": "${env:BRAVE_API_KEY}" },
"requiresEnv": "BRAVE_API_KEY"
},
"memento": {
"transport": "http",
"url": "http://localhost:3500/mcp",
"portCheck": true
}
}
}
Config
Config lives in ~/.automatey/config.json (auto-created on first run):
{
"provider": "nemotron",
"defaultModel": "auto",
"llm": {
"baseUrl": "http://localhost:8002"
}
}
For nemotron, defaultModel may be:
- an exact model ID
- auto to use the first model returned by /v1/models
- a wildcard mask such as nvidia/NVIDIA-Nemotron-3-Super*
Environment variables (copy .env.defaults → .env to override):
| Variable | Default | Description |
|---|---|---|
| LLM_PROVIDER | nemotron | nemotron \| openai \| anthropic \| perplexity |
| LLM_MODEL | auto | Model ID, auto, or wildcard mask for nemotron |
| LLM_BASE_URL | http://localhost:8002 | vLLM / OpenAI-compatible endpoint |
| LLM_MAX_TOKENS | 200000 | Context window budget (chars, ~4/token) |
| LLM_MAX_OUTPUT_TOKENS | 16384 | Per-call output limit |
| TEMPERATURE | 0.6 | Sampling temperature |
| AGENT_MAX_TOOL_ROUNDS | 50 | Max ReAct rounds |
| AGENT_MAX_EMPTY_RETRIES | 2 | Retries on empty LLM response |
| AGENT_COMPACT_THRESHOLD | 0.9 | Auto-compact at 90% context fill |
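For example, a minimal .env that pins the documented defaults explicitly (values are just the defaults from the table above):
LLM_PROVIDER=nemotron
LLM_BASE_URL=http://localhost:8002
LLM_MODEL=auto
TEMPERATURE=0.6
AGENT_MAX_TOOL_ROUNDS=50
AGENT_COMPACT_THRESHOLD=0.9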
Commands
| Command | Description |
|---------|-------------|
| /help | Show all commands |
| /model | List / switch model |
| /think [on\|off\|budget N] | Toggle CoT reasoning |
| /save [name] | Save session |
| /load [name] | Load session |
| /servers | Manage MCP connections |
| /config | Show config |
| /cost | Show estimated token usage |
| /compact | Manually compact context via LLM summarization |
| /checkpoint [save\|list\|restore N\|search q\|delete N] | Manage checkpoints |
| /eval <file.jsonl> | Run a JSONL eval file against the current LLM (see the sketch below) |
| /clear | Clear context |
| /exit | Quit |
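An eval file is JSON Lines with one case per line, covering the MCQ, keyword, and regex benchmark types. The field names below are purely illustrative — check the repo's docs for the actual schema:
{"type": "mcq", "category": "math", "question": "2 + 2 = ?", "choices": ["3", "4", "5"], "answer": "4"}
{"type": "keyword", "category": "tools", "prompt": "Name automatey's built-in sub-agent tool.", "keywords": ["run_subagent"]}
{"type": "regex", "category": "format", "prompt": "Print today's date in ISO format.", "pattern": "\\d{4}-\\d{2}-\\d{2}"}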
Context Management
Auto-compact
When estimated context usage reaches AGENT_COMPACT_THRESHOLD (default 90%), the older portion of the conversation is automatically summarized by the LLM and replaced with a concise summary message. This keeps the token count manageable without discarding knowledge.
The /compact command can also be used to trigger compaction manually at any time.
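A sketch of the trigger logic, assuming the character-based budget from the env table (~4 chars/token); the real implementation may differ:
// Illustrative auto-compact check — constants mirror the env-table defaults.
const LLM_MAX_TOKENS = 200_000;       // context budget in chars (~4 chars/token)
const AGENT_COMPACT_THRESHOLD = 0.9;  // compact at 90% fill

function shouldCompact(conversationChars: number): boolean {
  return conversationChars / LLM_MAX_TOKENS >= AGENT_COMPACT_THRESHOLD;
}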
Checkpoints
Checkpoints are full conversation snapshots saved to ~/.automatey/checkpoints/ as JSON:
/checkpoint save # save current conversation
/checkpoint list # list checkpoints (newest first)
/checkpoint restore 2 # restore checkpoint #2 into context
/checkpoint search "bm25" # BM25 keyword search across all checkpoints
/checkpoint delete 3     # delete checkpoint #3
The BM25 search indexes the full message history of every checkpoint and ranks them by keyword relevance.
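For reference, a toy BM25 scorer over tokenized checkpoints looks like this — an illustrative sketch, not automatey's actual implementation:
type Doc = { id: string; tokens: string[] };

function bm25(query: string[], doc: Doc, corpus: Doc[], k1 = 1.5, b = 0.75): number {
  const N = corpus.length;
  const avgLen = corpus.reduce((sum, d) => sum + d.tokens.length, 0) / N;
  let score = 0;
  for (const term of query) {
    const tf = doc.tokens.filter((t) => t === term).length;
    if (tf === 0) continue;
    // Document frequency: how many checkpoints mention the term at all.
    const df = corpus.filter((d) => d.tokens.includes(term)).length;
    const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
    const norm = 1 - b + (b * doc.tokens.length) / avgLen;
    score += (idf * tf * (k1 + 1)) / (tf + k1 * norm);
  }
  return score;
}

// Rank checkpoints for a query, highest score first.
const rank = (q: string[], corpus: Doc[]) =>
  [...corpus].sort((a, z) => bm25(q, z, corpus) - bm25(q, a, corpus));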
Sub-Agents
Sub-agents let the main agent delegate focused tasks to a nested ReAct loop — with its own tools, model, and system prompt. The parent spawns the sub-agent, waits for its result, and continues.
How it works
The LLM calls the built-in run_subagent tool:
run_subagent(
description: "Write and test a Node.js hello-world script",
prompt: "Create hello.js that prints Hello World, run it with node, and confirm the output.",
agentName: "nemotron-coder" // optional — uses a named .agent.md definition
)
The sub-agent runs its own ReAct loop (up to maxToolRounds) and returns a plain-text result to the main agent.
Named agents — .agent.md files
Define reusable agents as .agent.md files in .agents/agents/ (project-local) or ~/.agents/agents/ (global). Format is YAML frontmatter + system prompt body:
---
name: nemotron-coder
description: >
Code generation sub-agent on the local Nemotron vLLM instance.
Use for writing, editing, and running code — offloads code tasks
to a local model without burning cloud API quota.
model: auto
provider: nemotron
baseUrl: http://192.168.0.58:8002/v1
tools:
- read_file
- write_file
- execute_command
- glob_files
- list_dir
- search_text
- edit_file
maxToolRounds: 12
---
You are a code-generation agent running on a local Nemotron model.
Write complete, working code and save it to disk.
Use execute_command to run and verify output. Never truncate code.
Place in:
- .agents/agents/nemotron-coder.agent.md — project-local (resolved from cwd at runtime)
- ~/.agents/agents/nemotron-coder.agent.md — global (available in every directory)
Extra directories: set AUTOMATEY_AGENTS_DIRS (path-separator-delimited) to add more search paths.
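For example (paths illustrative; the separator is : on Unix, ; on Windows):
export AUTOMATEY_AGENTS_DIRS="$HOME/shared-agents:/opt/team-agents"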
Frontmatter fields
| Field | Description |
|---|---|
| name | Agent identifier (falls back to filename stem) |
| description | When to use this agent — shown to the LLM in the system prompt |
| model | Model ID, auto, or wildcard mask (e.g. *Super*, nvidia/*) |
| provider | nemotron \| openai \| anthropic \| perplexity |
| baseUrl | Base URL for OpenAI-compatible endpoints |
| apiKey | API key — supports ${ENV_VAR} interpolation |
| tools | Allowlist of tool names (absent = inherit all parent tools) |
| mcpServers | Allowlist of MCP server names (absent = inherit all) |
| maxToolRounds | Max ReAct rounds (absent = parent's setting) |
Model resolution
For nemotron agents, model supports the same resolution as the main config:
model: auto # picks the first model from /v1/models
model: "*Super*" # glob-matches the first model containing "Super"
model: nvidia/* # any nvidia/ model
model: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4   # exact ID
automatey probes the baseUrl's /v1/models endpoint at sub-agent spawn time to resolve wildcards.
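A sketch of that resolution logic (illustrative, not automatey's exact code; assumes baseUrl already includes the /v1 suffix, as in the frontmatter example above):
// Resolve "auto" or a glob mask against an OpenAI-compatible model list.
async function resolveModel(baseUrl: string, mask: string): Promise<string> {
  const res = await fetch(`${baseUrl}/models`);
  const { data } = (await res.json()) as { data: { id: string }[] };
  if (mask === "auto") return data[0].id; // first advertised model
  // Escape regex specials except *, then treat * as a wildcard.
  const escaped = mask.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp("^" + escaped.split("*").join(".*") + "$");
  const hit = data.find((m) => re.test(m.id));
  if (!hit) throw new Error(`No model matches mask "${mask}"`);
  return hit.id;
}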
Listing agents
/agents list # show all discovered agents + scope (project / global)
/agents info <name> # show frontmatter + system prompt for an agent
/agents help              # usage and file format reference
Example prompts
- Write a CLI tool in Go that downloads a URL to a file — use nemotron-coder
- Run nemotron-coder to generate a Python FastAPI stub for a todo list API, then run it and show me the /docs URL
- Delegate to nemotron-coder: read package.json, bump the patch version, and commit it
Generic sub-agents (no .agent.md)
Omit agentName, and the sub-agent uses the parent's own LLM client and full tool set:
Split this task into three parallel sub-agents: one for backend tests,
one for frontend tests, one for linting. Run each and summarise.
Sub-agents are stateless — they do not share session, context, or compaction with the parent. Each sub-agent starts fresh with an empty context containing only its system prompt and the prompt passed by the parent.
Built-in MCP Servers
🗂 Coder (mcp/coder)
| Tool | What it does |
|------|-------------|
| read_file | Read file contents with optional line range |
| write_file | Write / create a file |
| edit_file | Replace an exact string in a file |
| execute_command | Run a shell command (default cwd: sandbox) |
| search_text | Grep-style text search |
| list_dir | List directory contents |
| glob_files | Find files by glob pattern (**/*.ts, src/**) |
Prompts: analyze-file, search-and-replace, scaffold
📋 Planner (mcp/planner)
Todos and multi-step plans persisted to ~/.automatey/planner/.
Prompts: break-down-task, daily-standup, project-plan
🤖 Agent (mcp/agent)
Exposes automatey itself as an MCP tool server — lets a host (e.g. VS Code Copilot) delegate tasks to automatey.
Tools:
| Tool | What it does |
|------|-------------|
| automatey_run_task | Run a task: spawns automatey as a child process with the given prompt, model, provider, sandbox, and timeout |
| automatey_status | Check LLM reachability and return config info |
| automatey_sample | Request an LLM completion from the connected client via MCP sampling |
Prompts: code-review, refactor, write-tests, explain, run-task
Resources: automatey://config (agent config), automatey://health (LLM reachability)
Sampling: The automatey_sample tool uses MCP sampling to request completions from the client's LLM. The client must advertise sampling capability.
Logging: Diagnostic messages sent to the client via MCP logging notifications.
All configuration comes from environment variables — zero hardcoded defaults:
| Variable | Required | Description |
|---|---|---|
| AUTOMATEY_CMD | ✅ | Command to invoke automatey (e.g. npx tsx /path/to/src/index.ts) |
| LLM_BASE_URL | ✅ | vLLM / OpenAI-compatible endpoint |
| LLM_MODEL | ✅ | Model ID |
| LLM_PROVIDER | ✅ | nemotron \| openai \| anthropic \| perplexity |
| AUTOMATEY_TIMEOUT | | Default task timeout in ms (default: 300000) |
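For example, a host's MCP config could register the bundled server like this (the server path and values are illustrative):
{
  "mcpServers": {
    "automatey-agent": {
      "transport": "stdio",
      "command": "node",
      "args": ["./mcp/agent/dist/server.js"],
      "env": {
        "AUTOMATEY_CMD": "npx automatey",
        "LLM_BASE_URL": "http://localhost:8002",
        "LLM_MODEL": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4",
        "LLM_PROVIDER": "nemotron"
      }
    }
  }
}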
The agent MCP server has its own test suite (23 tests — unit, functional, e2e) in mcp/agent/tests/.
Docker
Run as a container
# Interactive chat (with persistent history + config)
docker run -it --rm \
-e LLM_PROVIDER=openai \
-e OPENAI_API_KEY=sk-... \
-v ~/.automatey:/home/automatey/.automatey \
maxgolovanov/automatey chat
# One-shot (non-interactive)
docker run --rm \
-e LLM_PROVIDER=openai \
-e OPENAI_API_KEY=sk-... \
maxgolovanov/automatey --message "Summarise this repo" --verbose
Persistent data: The -v ~/.automatey:/home/automatey/.automatey volume mount persists history, sessions, checkpoints, config, and logs between container runs. See The ~/.automatey data directory for what's stored there.
Using Docker Model Runner as the LLM backend
Docker Model Runner (DMR) lets you run LLMs locally via an OpenAI-compatible API on localhost:12434. It ships with Docker Desktop and Docker Engine 28+ — no API key needed.
# Pull and start a model on the host first
docker model pull ai/llama3.2
The networking challenge: when automatey runs in a container, localhost:12434 resolves to the container itself, where nothing is listening. You need to point it at the host.
Option A — host.docker.internal (Docker Desktop on Mac/Windows, Docker Engine 28+ on Linux)
docker run -it --rm \
-e LLM_PROVIDER=openai \
-e LLM_BASE_URL=http://host.docker.internal:12434/engines/v1 \
-e LLM_MODEL=ai/llama3.2 \
-e LLM_API_KEY=ignored \
--add-host=host.docker.internal:host-gateway \
maxgolovanov/automatey chat
--add-host=host.docker.internal:host-gateway is required on Linux Docker Engine (it's automatic on Docker Desktop).
Option B — --network host (Linux only, simplest)
docker run -it --rm \
--network host \
-e LLM_PROVIDER=openai \
-e LLM_BASE_URL=http://localhost:12434/engines/v1 \
-e LLM_MODEL=ai/llama3.2 \
-e LLM_API_KEY=ignored \
maxgolovanov/automatey chat
Option C — Docker Compose (recommended for persistent setups)
# docker-compose.yml
services:
automatey:
image: maxgolovanov/automatey
stdin_open: true
tty: true
extra_hosts:
- "host.docker.internal:host-gateway" # Linux only; remove on Docker Desktop
environment:
LLM_PROVIDER: openai
LLM_BASE_URL: http://host.docker.internal:12434/engines/v1
LLM_MODEL: ai/llama3.2
LLM_API_KEY: ignored
volumes:
- ~/.automatey:/home/automatey/.automatey
docker compose run --rm automatey chat
Available DMR models
See docker model list or hub.docker.com/u/ai. Common choices:
| Model | Size | Use case |
|---|---|---|
| ai/llama3.2 | 2GB | Fast, general purpose |
| ai/smollm2 | 500MB | Lightweight, low RAM |
| ai/phi4-mini | 4GB | Strong reasoning |
| ai/qwen2.5-coder | 5GB | Code tasks |
DMR uses llama.cpp under the hood — CPU-only works, GPU (NVIDIA/Apple Silicon) is used automatically when available.
Development
npm run dev # tsx watch (no build needed)
npm run build # tsc + build all MCP servers
npm test # 189 tests (Vitest)
npm run test:watch   # watch mode
VS Code tasks (Ctrl+Shift+B / Ctrl+Shift+P → Run Task):
- Build: All
- Run: CLI (automatey) — builds first, then launches
- Run: CLI (dev, no build) — tsx, faster iteration
- Test: All
- Test: Hello World (verbose)
Tests
Tests 189 passed | 2 skipped
├── unit/ chat-engine, coder-server, command-parser, config-manager,
│ context-manager, eval-runner, llm-client,
│ markdown-rendering, mcp-config-manager,
│ session-manager, skills-manager, sub-agent
├── integration/ openai, anthropic, nemotron, planner,
│ coder-hello-world (all 3 providers), cli-message,
│ markdown-streaming
└── mcp/agent/   config, runner (unit + functional), e2e server
The coder-hello-world integration tests drive a full ReAct loop per provider — the LLM writes index.js, executes it, and the output is verified. Results live in sandbox/<provider>/:
[openai] execute_command output: Hello, World!
[anthropic] execute_command output: Hello, World!
[nemotron] execute_command output: Hello, World!
Artwork
Logos in extra/logo/ are from the Automatey terminal project,
licensed CC BY 4.0 — Copyright © 2024–2026 Top-5 And Contributors.
Used here with attribution as permitted by the license.
License
MIT — see LICENSE.
