# gro
Provider-agnostic LLM agent runtime with virtual memory and context management.
gro is a headless CLI that runs persistent agent loops against any LLM provider, with automatic context management, MCP tool-use, and AgentChat integration.
## Install

```sh
npm install -g @tjamescouch/gro
```

Requires Node.js 18+.
## Quick start

```sh
# One-shot prompt (Anthropic by default)
export ANTHROPIC_API_KEY=sk-...
gro "explain the CAP theorem in two sentences"

# Interactive conversation with virtual memory
gro -i

# Use OpenAI
export OPENAI_API_KEY=sk-...
gro -m gpt-4o "hello"

# Pipe mode
echo "summarize this" | gro -p

# Resume last session
gro -i -c
```

## Providers
| Provider | Models | Env var |
|----------|--------|---------|
| Anthropic (default) | claude-sonnet-4-5, claude-haiku-4-5, claude-opus-4 | ANTHROPIC_API_KEY |
| OpenAI | gpt-4o, o3, gpt-4o-mini | OPENAI_API_KEY |
| Local | llama3, mistral, qwen, etc. | none (Ollama / LM Studio) |
The provider is auto-inferred from the model name: `-m claude-sonnet-4-5` → Anthropic, `-m gpt-4o` → OpenAI.
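For local models, point gro at an OpenAI-compatible endpoint. A minimal sketch, assuming an Ollama server on its default port (the URL and model name are illustrative; adjust them to your setup):

```sh
# Local model via an OpenAI-compatible server (URL is an example)
gro -P local -m llama3 --base-url http://localhost:11434/v1 "hello"
```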
## Virtual Memory
gro includes a swim-lane VirtualMemory system that manages context as a sliding window, allowing agents to work with arbitrarily long histories without burning tokens on stale context.
```sh
# Enable virtual memory (default in persistent mode)
gro -i --gro-memory virtual

# Explicit simple mode (unbounded buffer, no paging)
gro -i --gro-memory simple
```

How it works:
- Messages are partitioned into swim lanes: assistant / user / system / tool
- When working memory exceeds the high watermark, old messages are summarized and paged to disk
- Summaries include `` markers — load any page back with a marker
- High-importance messages (tagged ``) survive compaction
- Summarization uses a configurable cheaper model
```sh
# Use Haiku for compression, Sonnet for reasoning
gro -i -m claude-sonnet-4-5 --summarizer-model claude-haiku-4-5
```
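The working-memory window itself is sized with `--context-tokens` (default 8192): a larger budget delays paging, a smaller one compacts sooner. For example, with an illustrative budget:

```sh
# Virtual memory with a larger working window before compaction kicks in
gro -i --gro-memory virtual --context-tokens 16384
```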
## Extended Thinking

Control model reasoning depth dynamically with the `` stream marker. The thinking level selects the model tier and allocates extended thinking tokens (Anthropic only).
gro -i -m claude-sonnet-4-5 "solve this complex problem"
# Agent can emit to escalate to Opus with high thinking budget| Level | Tier | Use case | |-------|------|----------| | 0.0–0.24 | Haiku | Fast, cheap — formatting, lookups, routine transforms | | 0.25–0.64 | Sonnet | Balanced — most tasks requiring judgment or code | | 0.65–1.0 | Opus | Deep reasoning — architecture, when stuck, low confidence |
The thinking budget decays ×0.6 per idle round unless renewed with another `` marker. Agents naturally step down from Opus to Haiku when not actively working on complex problems.
Token reservation (v1.5.10): 30% of max_tokens is reserved for completion output to prevent truncation. Example: maxTokens=4096, thinking=0.8 → ~2293 thinking tokens, ~1803 output tokens.
## Prompt Caching
Anthropic prompt caching is enabled by default. System prompts and tool definitions are cached automatically, reducing cost by ~90% on repeat calls. Cache hits are logged: `[cache read:7993]`.
Disable with `--no-prompt-caching`.
## Batch Summarization

When `enableBatchSummarization` is set, context compaction queues summarization requests to the Anthropic Batch API (50% cost discount, async). The agent continues immediately with a placeholder summary. A background worker polls for completion and updates pages on disk.
## Stream Markers

gro parses inline `@@marker@@` directives from model output and acts on them:
| Marker | Effect |
|--------|--------|
| | Hot-swap to a different model mid-conversation |
| | Set thinking level — controls model tier and thinking budget |
| | Tag message importance (0–1) for compaction priority |
| `@@important@@` | Line is reproduced verbatim in all summaries |
| `@@ephemeral@@` | Line may be omitted from summaries |
| | Load a paged memory block into context |
| `` | Release a loaded page |
Markers are stripped before display — users never see them. Models use them as a control plane.
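As a hypothetical illustration, a model turn might look like this on the wire, with the first line reproduced verbatim in later summaries and both markers stripped before display:

```
We must keep the public API schema frozen at v2 during this migration. @@important@@
Rough scratchpad of approaches already ruled out... @@ephemeral@@
```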
## MCP Support

gro automatically discovers MCP servers from Claude Code's config (`~/.claude/settings.json`). Provide an explicit config with `--mcp-config`.
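A config file simply lists the MCP servers to launch. A minimal sketch, assuming the conventional `mcpServers` shape and the reference filesystem server (the server choice and schema here are illustrative, not gro-specific requirements):

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
    }
  }
}
```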
```sh
gro --mcp-config ./my-servers.json "use the filesystem tool to list files"
gro --no-mcp "no tools"
```

## AgentChat Integration
Run gro as a persistent agent connected to an AgentChat network:
```sh
gro -i --persistent --system-prompt-file _base.md --mcp-config agentchat-mcp.json
```

Persistent mode (`--persistent`) keeps the agent in a continuous tool-calling loop. If the model stops calling tools, gro injects a system nudge to resume listening. The agent loops indefinitely: `agentchat_listen` → process messages → respond → `agentchat_listen`.
An external process manager (systemd, supervisor, etc.) maintains the gro process lifecycle. Auto-save triggers every 10 tool rounds to survive crashes.
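A minimal systemd unit sketch for that setup (the unit name, user, working directory, and env file are placeholders; adapt them to your environment):

```ini
# /etc/systemd/system/gro-agent.service (hypothetical)
[Unit]
Description=gro persistent agent
After=network-online.target

[Service]
User=agent
WorkingDirectory=/home/agent/workspace
# Env file holds e.g. ANTHROPIC_API_KEY=...
EnvironmentFile=/home/agent/.config/gro-agent.env
ExecStart=/usr/bin/env gro -i --persistent --system-prompt-file _base.md --mcp-config agentchat-mcp.json
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```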
## Shell Tool
Enable a built-in shell tool for executing commands:
```sh
gro -i --bash "help me debug this"
```

Commands run with a 120s timeout and a 30KB output cap. The tool must be explicitly enabled — it is not available by default.
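The shell tool composes with the other modes; an illustrative invocation (not a prescribed workflow):

```sh
# Review piped input and let the agent run commands as needed
git diff | gro -p --bash "review this diff and run the test suite if anything looks risky"
```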
## Built-in Tools
These tools are always available (no flags required):
| Tool | Description |
|------|-------------|
| Read | Read file contents with optional line range |
| Write | Write content to a file (creates dirs) |
| Glob | Find files by glob pattern (.gitignore aware) |
| Grep | Search file contents with regex |
| apply_patch | Apply unified patches to files |
| gro_version | Runtime identity and version info |
| memory_status | VirtualMemory statistics |
| compact_context | Force immediate context compaction |
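Because no flags are needed, a one-shot prompt can already search and read the working directory; the model typically reaches for Glob, Grep, and Read to answer something like:

```sh
gro "find TODO comments under src/ and summarize the three most important ones"
```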
## Options

```
-P, --provider               openai | anthropic | local (default: anthropic)
-m, --model                  model name (auto-infers provider)
--base-url                   API base URL override
--system-prompt              system prompt text
--system-prompt-file         read system prompt from file
--append-system-prompt       append to system prompt
--append-system-prompt-file  append system prompt from file
--context-tokens             working memory budget in tokens (default: 8192)
--max-turns                  max tool rounds per turn (default: 10)
--summarizer-model           model for context summarization
--gro-memory                 virtual | simple (default: virtual in -i mode)
--mcp-config                 MCP servers config (JSON file or string)
--no-mcp                     disable MCP server connections
--no-prompt-caching          disable Anthropic prompt caching
--bash                       enable built-in shell tool
--persistent                 persistent agent mode (continuous loop)
--output-format              text | json | stream-json (default: text)
-p, --print                  print response and exit (non-interactive)
-c, --continue               continue most recent session
-r, --resume [id]            resume session by ID
-i, --interactive            interactive conversation mode
--verbose                    verbose output
-V, --version                show version
-h, --help                   show help
```

## Session Persistence
Sessions are saved to `.gro/context/<session-id>/`:
```
.gro/
  context/
    a1b2c3d4/
      messages.json   # full message history
      meta.json       # model, provider, timestamps
      pages/          # VirtualMemory paged summaries
```

Resume with `-c` (most recent) or `-r <id>` (specific). Disable with `--no-session-persistence`.
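For example, to reopen the session shown above (the ID is whatever directory name appears under `.gro/context/`):

```sh
gro -i -r a1b2c3d4
```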
## Architecture
```
src/
  main.ts                    # CLI entry, flag parsing, agent loop
  session.ts                 # Session persistence and tool-pair sanitization
  errors.ts                  # Typed error hierarchy (GroError)
  logger.ts                  # Logger with ANSI color support
  stream-markers.ts          # Stream marker parser and dispatcher
  drivers/
    anthropic.ts             # Native Anthropic Messages API driver (no SDK)
    streaming-openai.ts      # OpenAI-compatible streaming driver
    types.ts                 # ChatDriver interface, message types
  batch/
    anthropic-batch.ts       # Anthropic Batch API client
  memory/
    agent-memory.ts          # AgentMemory interface
    virtual-memory.ts        # Swim-lane paged context (VirtualMemory)
    simple-memory.ts         # Unbounded buffer (SimpleMemory)
    advanced-memory.ts       # Extended memory with vector index
    summarization-queue.ts   # Queue for async batch summarization
    summarizer-prompt.md     # Externalized summarizer system prompt
    batch-worker.ts          # Background batch summarization worker
    batch-worker-manager.ts  # Worker lifecycle manager
  mcp/
    client.ts                # MCP client manager
  tools/
    bash.ts                  # Built-in shell tool (--bash flag)
    read.ts                  # File reader
    write.ts                 # File writer
    glob.ts                  # Glob file finder
    grep.ts                  # Regex content search
    agentpatch.ts            # Unified patch application
    version.ts               # gro_version introspection tool
    memory-status.ts         # VirtualMemory stats tool
    compact-context.ts       # Manual compaction trigger
  utils/
    rate-limiter.ts          # Token bucket rate limiter
    timed-fetch.ts           # Fetch with configurable timeout
    retry.ts                 # Exponential backoff retry logic
```

## Development
```sh
git clone https://github.com/tjamescouch/gro.git
cd gro
npm install
npm run build
npm test
```

## License
MIT
## For Agents

See `_base.md` for boot context and the stream marker reference.
