rune-agent
v0.1.1
Autonomous AI agent platform with multi-channel support, proactive execution, and cross-session memory
ᚱ RUNE
Autonomous AI agent that codes, runs, verifies, and remembers.
Plan-execute-observe-correct loop with OS-level sandboxing, cross-session memory, and multi-channel deployment.
Quick Start · What RUNE Does · How It Works · Docs · Contributing

What Problem Does RUNE Solve?
AI coding agents have gotten good at reading code and suggesting changes. But there's a gap between "here's what I'd do" and actually getting it done reliably:
- Context limits end your session. A large refactoring task hits the token ceiling and the agent stops. You start over, re-explain everything, and hope it picks up where it left off.
- Every session starts from zero. The agent doesn't remember what it learned about your codebase yesterday. You re-explain your conventions, your architecture, your preferences — every time.
- "Done" doesn't mean done. The agent writes code and declares success without running tests or verifying the build. You discover it's broken after the session ends.
- Agents stall silently. The agent reads the same file over and over, runs the same failing command in a loop, burns tokens — and there's no circuit breaker.
RUNE is an autonomous agent built around solving these four problems. It continues past context limits via deterministic checkpointing, maintains memory across sessions, enforces execution evidence before declaring completion, and detects stalls with physical tool removal.
It also does more — multi-channel deployment (Telegram, Discord, Slack, ...), multi-agent orchestration, browser automation, cron scheduling — but the core value proposition is: an agent loop that doesn't lie, doesn't forget, and doesn't give up at 200K tokens.
Status
RUNE is alpha software. It's in active development with 93K LOC, 1886 tests across 240 test files, and 46 built-in tools. The core agent loop, safety system, and memory architecture are functional and tested. But:
- There is no npm package yet (install from source)
- APIs may change between versions
- Some channel adapters are more mature than others (TUI and Telegram are the most tested)
- Performance benchmarks are internal — we publish the test code, not polished benchmark suites
If you're looking for production-ready stability, this isn't there yet. If you want to explore what an agent loop looks like when you engineer seriously around context limits, completion verification, and stall detection — read on.
Quick Start
git clone https://github.com/rune-ai/rune.git
cd rune && pnpm install && pnpm build && pnpm link --global

# At least one LLM provider required
rune env set OPENAI_API_KEY "sk-..."
# and/or
rune env set ANTHROPIC_API_KEY "sk-ant-..."

rune                                  # Interactive TUI
rune exec "analyze this repo" --verbose # One-shot execution
rune web                              # Web dashboard + daemon

Prerequisites: Node.js >= 18, pnpm. Browser tools require npx playwright install chromium.
What RUNE Does
The Agent Loop
RUNE runs a plan-execute-observe-correct loop built on AI SDK v6. Each step goes through:
Goal classification → Prompt assembly → LLM call → Guardian check
→ Sandbox execution → Cache check → Stall detection → Budget check
→ Evidence tracking → Continue or complete

This isn't a thin wrapper around generateText. It's an orchestration layer that validates, optimizes, and enforces at every step.
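The per-step pipeline above can be sketched as a chain of checks, each of which may veto completion or force another step. A hypothetical sketch (names and shapes here are illustrative, not RUNE's actual API):

```typescript
// Hypothetical sketch of a per-step check pipeline; not RUNE's actual API.
type StepState = { tokensUsed: number; budget: number; writes: number; stalled: boolean };
type Verdict = "continue" | "complete" | "stop";
type Check = (s: StepState) => Verdict | null; // null = this check has no opinion

const budgetCheck: Check = (s) => (s.tokensUsed / s.budget >= 0.97 ? "stop" : null);
const stallCheck: Check = (s) => (s.stalled ? "stop" : null);
const evidenceCheck: Check = (s) => (s.writes === 0 ? "continue" : null); // no writes yet: keep going

function runChecks(s: StepState, checks: Check[]): Verdict {
  for (const c of checks) {
    const v = c(s);
    if (v !== null) return v; // first opinionated check wins
  }
  return "complete";
}
```

The ordering matters: hard limits (budget, stall) are consulted before softer nudges like the evidence check.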
46 Built-in Tools
| Group | Tools | Notes |
|---|---|---|
| file | read write edit delete list search | Structured output, optional regex |
| bash | sandboxed shell | Guardian validation + OS sandbox |
| browser | navigate observe act batch extract screenshot ... | Playwright, with relay mode for real browser sessions |
| code | analyze findDef findRefs impact | TypeScript Compiler API + SQLite code graph |
| web | search fetch | Brave API + URL→markdown |
| memory | search save | Vector + keyword hybrid, cross-session |
| delegate | task orchestrate | Sub-agent or multi-agent pipeline |
| project | map | Aider-style repo map with ref counts |
| cron | create list delete | Scheduled autonomous tasks |
| mcp | dynamic | Any MCP server → native capability |
Plus think, ask_user, task.*, skill.*, and more. 5 policy profiles (safe → full) control which tools are available.
Multi-Channel
Run the same agent from TUI, Web UI, Telegram, Discord, Slack, Mattermost, LINE, WhatsApp, or Google Chat. Cross-channel identity resolution lets you start a task in one channel and continue in another — same memory, same conversation context.
LLM Providers
OpenAI (gpt-5.2), Anthropic (claude-sonnet-4-5), Ollama (any local model). Task-aware router picks the right model based on complexity and risk. 3-level automatic failover. Override in ~/.rune/config.yaml.
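Multi-level failover can be sketched as trying providers in priority order until one succeeds. A minimal illustration (hypothetical shape; RUNE's actual router and its three levels live in src/llm/):

```typescript
// Sketch of priority-ordered provider failover (hypothetical shape).
type Provider = (prompt: string) => Promise<string>;

async function withFailover(prompt: string, providers: Provider[]): Promise<string> {
  let lastErr: unknown;
  for (const p of providers) {   // e.g. [primary, secondary, local Ollama]
    try {
      return await p(prompt);
    } catch (err) {
      lastErr = err;             // remember the failure, fall through to the next level
    }
  }
  throw lastErr;                 // every level failed
}
```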
How It Works
This section explains the engineering behind RUNE's four core promises. All mechanisms described below have corresponding test suites — paths are linked so you can read and run them yourself.
1. Deterministic Checkpoint — Past Context Limits
When the token budget reaches 80%, RUNE doesn't ask the LLM to summarize what it's done. Instead, the host extracts a checkpoint from its own state — zero LLM calls, < 1ms CPU:
Modified files (max 128) + recent tool calls (max 15) + execution evidence
+ hard failures + stall state + last natural language response (max 1500 chars)
→ Serialized checkpoint → New segment starts with this context

The insight: the host already tracks everything in closure variables. Modified file lists, tool call history, evidence counters, stall state — it's all there. Asking the LLM to summarize is slow, expensive, and lossy.
4-phase wind-down prevents abrupt cuts:
| Budget | Action |
|---|---|
| 70% | Snapshot to disk (non-blocking, fire-and-forget) |
| 80% | Deterministic checkpoint → new segment |
| 90% | Emergency: immediate checkpoint |
| 97% | Hard stop: all tools disabled |
Up to 5 seamless continuations. The agent doesn't know it was interrupted.
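The extraction itself can be sketched as pure state truncation. The caps (128 files, 15 tool calls, 1500 chars) come from the description above; the state shape is a hypothetical stand-in:

```typescript
// Sketch of host-side checkpoint extraction: zero LLM calls, just truncation.
// Shapes are hypothetical; only the caps come from the text above.
interface HostState {
  modifiedFiles: string[];
  recentToolCalls: { tool: string; ok: boolean }[];
  hardFailures: string[];
  lastResponse: string;
}

function extractCheckpoint(s: HostState) {
  return {
    modifiedFiles: s.modifiedFiles.slice(-128),    // max 128 files
    recentToolCalls: s.recentToolCalls.slice(-15), // max 15 calls
    hardFailures: s.hardFailures,
    lastResponse: s.lastResponse.slice(0, 1500),   // max 1500 chars
  };
}
```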
Source: src/agent/checkpoint.ts
Tests: tests/unit/agent/checkpoint.test.ts (13 tests), tests/unit/agent/loop-rollover.test.ts (5 tests)
2. Execution Evidence Gate — No Fake Completions
Every tool call is classified into 5 evidence categories: reads, writes, executions, verifications, browser actions.
When generateText completes, RUNE checks: did the task require code (requiresCode || complexity !== 'simple')? If yes, but writes + executions + browserActions === 0 — the model planned but never executed. RUNE forces continuation with an [EXECUTION EVIDENCE GATE] system message, up to 50 additional steps.
Still no execution evidence after forced continuation? → success: false. Honest failure, not fake success.
This also tracks requirement-level traces for auditable reporting — each gate decision records what was expected, what was found, and why it passed or failed.
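The gate condition can be sketched directly from the description above (hypothetical signature, not the code in completion-gate.ts):

```typescript
// Sketch of the evidence-gate condition (hypothetical signature).
interface Evidence {
  reads: number; writes: number; executions: number;
  verifications: number; browserActions: number;
}

function needsForcedContinuation(
  requiresCode: boolean,
  complexity: "simple" | "moderate" | "complex",
  e: Evidence,
): boolean {
  const codeExpected = requiresCode || complexity !== "simple";
  const acted = e.writes + e.executions + e.browserActions > 0;
  return codeExpected && !acted; // planned but never executed: force continuation
}
```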
Source: src/agent/completion-gate.ts, src/agent/evidence-gate.ts
Tests: tests/unit/agent/execution-intent-gate.test.ts
3. Stall Detection — Physical Tool Removal
Four layers detect non-productive loops:
| Layer | Detection | Enforcement |
|---|---|---|
| Sequential | Same tool 4-5+ consecutive calls | Nudge message (max 3/session) |
| Cumulative | file.read 15+ total across session | Warning → hard stop at 18 |
| Bash | Same command 3+ times, or 2+ slow commands (>30s avg) | Tool removed from active set |
| Failure | Same error signature 4+ times | SYSTEM STOP |
"Physical enforcement" means the tool is removed from activeTools — the model literally cannot call it. This is stronger than any prompt-level instruction because the tool doesn't exist in the schema anymore.
Bash commands are normalized via extractBashIntent() (heredocs get djb2-hashed) to catch semantically identical but syntactically different repeats.
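A minimal sketch of that normalization, using the djb2 hash named above (the heredoc pattern and whitespace collapsing are simplified illustrations, not the shipped extractBashIntent()):

```typescript
// Sketch of bash-intent normalization for repeat detection.
function djb2(s: string): number {
  let h = 5381;
  for (let i = 0; i < s.length; i++) h = ((h * 33) ^ s.charCodeAt(i)) >>> 0; // djb2a variant
  return h;
}

function extractBashIntentSketch(cmd: string): string {
  const heredoc = cmd.match(/<<\s*'?(\w+)'?[\s\S]*\1/);
  if (heredoc) return `heredoc:${djb2(heredoc[0])}`; // hash the body so repeats collide
  return cmd.trim().replace(/\s+/g, " ");            // collapse whitespace differences
}
```

Two commands that differ only in whitespace, or two heredocs with identical bodies, map to the same key, so the repeat counter still fires.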
Source: src/agent/loop.ts — StallState, buildToolSet(), prepareStep()
4. Cross-Session Memory — Three Tiers
| Tier | Mechanism | Trigger |
|---|---|---|
| Auto-inject | Top-K similar episodes (vector + keyword hybrid) injected into system prompt | Every request, before the agent starts |
| Agent tools | memory.search / memory.save — the agent actively queries and stores mid-task | Agent's own judgment |
| User Model | Work profile (languages, tools, hours), autonomy preferences, long-term goals | Accumulated over time |
Goal-aware budgeting prevents irrelevant memory from polluting context:
| Goal Type | Context Budget |
|---|---|
| Casual chat | 12K tokens |
| Web lookup | 20K |
| Default | 24K |
| Complex coding | 36K |
3-layer casual defense: unrecognized intent → domain='general', similarity threshold 0.4, fresh chat gate blocks injection entirely. Your React project history won't leak into "what's the weather" conversations.
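Goal-aware budgeting plus the casual injection gate can be sketched as a lookup with a similarity floor. The token figures come from the table above; the function shape is an assumption:

```typescript
// Sketch of goal-aware context budgeting with the casual injection gate.
type GoalType = "casual_chat" | "web_lookup" | "default" | "complex_coding";

const CONTEXT_BUDGET: Record<GoalType, number> = {
  casual_chat: 12_000,
  web_lookup: 20_000,
  default: 24_000,
  complex_coding: 36_000,
};

function memoryBudget(goal: GoalType, topSimilarity: number): number {
  if (goal === "casual_chat" && topSimilarity < 0.4) return 0; // casual defense: no injection
  return CONTEXT_BUDGET[goal];
}
```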
Source: src/memory/, src/agent/memory-bridge.ts
Tests: tests/unit/agent/memory-bridge-context.test.ts
Additional Systems
Safety — Three Independent Layers
Guardian: Every bash command analyzed twice — raw and normalized (hex escapes decoded, $HOME/~ expanded). Higher risk score wins. Blocks git destructive ops (--force, reset --hard, branch -D), SQL destructive (DROP, TRUNCATE), encoding bypass attempts.
OS Sandbox: Commands execute inside macOS Seatbelt (sandbox-exec) or Linux bubblewrap (bwrap). Kernel-enforced path restrictions — writable paths, readable paths, blocked paths, network access are all configurable.
Cross-Channel Approval: High-risk commands trigger native approval UI (TUI dialog, Telegram InlineKeyboard, Discord ActionRow). 60s timeout → auto-deny.
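The Guardian's two-pass idea, scoring both the raw and normalized command and keeping the higher risk, can be sketched as follows. The scoring heuristic here is a toy stand-in for the real analyzer:

```typescript
// Sketch of two-pass command analysis: score raw and normalized, keep the max.
function normalizeCmd(cmd: string): string {
  return cmd
    .replace(/\\x([0-9a-fA-F]{2})/g, (_m: string, h: string) =>
      String.fromCharCode(parseInt(h, 16))) // decode hex escapes
    .replace(/~|\$HOME/g, "/home/user");    // expand home (placeholder path)
}

function riskScore(cmd: string): number {
  let score = 0;
  if (/rm\s+-rf|reset --hard|branch -D|--force/.test(cmd)) score += 80; // destructive git/fs
  if (/\bDROP\b|\bTRUNCATE\b/i.test(cmd)) score += 80;                  // destructive SQL
  return score;
}

function guardianScore(cmd: string): number {
  return Math.max(riskScore(cmd), riskScore(normalizeCmd(cmd))); // higher risk wins
}
```

A command like `rm \x2drf /` scores 0 raw but 80 after hex-decoding, which is exactly the encoding-bypass case the second pass exists for.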
Source: src/safety/
Token Optimization
Multiple CPU-only (zero LLM cost) techniques reduce token usage:
| Technique | What it does | Source |
|---|---|---|
| Goal-aware prompt sectioning | Chat skips CODE/WEB/BROWSER prompt sections | prompts.ts |
| Tool subset filtering | Chat: 4 tools, web: 9, full: 46 | tool-adapter.ts |
| Cognitive Cache | Repeated file reads return compact preview (head 3 + tail 3 lines). Invalidated on write | cognitive-cache.ts |
| Observation masking | 3-tier step-distance masking (inspired by arXiv:2508.21433) | loop.ts |
| Phase-adaptive window | Masking window adjusts by phase (exploration/implementation/verification) | loop.ts |
| Active tools reduction | Step 6+, unused tools hidden from schema | loop.ts |
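As an illustration of the Cognitive Cache's compact preview (head 3 + tail 3 lines), a hypothetical helper might look like:

```typescript
// Hypothetical helper showing the "head 3 + tail 3" preview shape returned
// for repeated reads of an unchanged file.
function compactPreview(content: string, head = 3, tail = 3): string {
  const lines = content.split("\n");
  if (lines.length <= head + tail) return content; // small file: return as-is
  const elided = lines.length - head - tail;
  return [...lines.slice(0, head), `… ${elided} lines elided …`, ...lines.slice(-tail)].join("\n");
}
```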
Benchmark methodology: Token savings are measured in tests/unit/agent/token-savings-benchmark.test.ts using three simulated session scenarios (13-step code analysis, 8-step browser session, 3-step chat). The test uses countTokens() from gpt-tokenizer on raw vs. masked message arrays. Run pnpm test -- tests/unit/agent/token-savings-benchmark.test.ts to see results with your own setup.
Multi-Agent Orchestration
Complex tasks can be decomposed across 4 specialized roles:
| Role | Access | Max Steps | Timeout |
|---|---|---|---|
| researcher | read-only | 15 | 300s |
| planner | think + read + task + memory | 10 | 180s |
| executor | all capabilities | 20 | 600s |
| communicator | think + read + memory + ask_user | 8 | 120s |
Pipeline: Plan → Static Validate (< 1ms, catches cycles/dangling deps) → Execute (worker pool, failure propagation) → Quality Gate (hollow answer detection, no-action executor detection) → Integrate.
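The static validation step, catching cycles and dangling dependencies before any LLM call, can be sketched over a simple task list (hypothetical shapes; the real validator is src/agent/plan-validator.ts):

```typescript
// Sketch of static plan validation: flag dangling deps, then detect cycles
// with a gray/black DFS over the dependency graph.
interface Task { id: string; deps: string[] }

function validatePlan(tasks: Task[]): string[] {
  const errors: string[] = [];
  const ids = new Set(tasks.map((t) => t.id));
  for (const t of tasks)
    for (const d of t.deps)
      if (!ids.has(d)) errors.push(`dangling dep: ${t.id} -> ${d}`);

  const byId = new Map(tasks.map((t): [string, Task] => [t.id, t]));
  const color = new Map<string, number>(); // unset = white, 1 = gray, 2 = black
  const visit = (id: string): boolean => {
    if (color.get(id) === 1) return true;  // back edge: cycle
    if (color.get(id) === 2) return false;
    color.set(id, 1);
    for (const d of byId.get(id)?.deps ?? []) if (visit(d)) return true;
    color.set(id, 2);
    return false;
  };
  for (const t of tasks) if (visit(t.id)) { errors.push("cycle detected"); break; }
  return errors;
}
```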
Source:
src/agent/orchestrator.ts,src/agent/roles.ts,src/agent/plan-validator.ts,src/agent/quality-gate.ts
Goal Classifier
Two-tier hybrid classification routes every input before the agent loop starts:
- Tier 1 (Regex): < 1ms, $0. Returns if confidence >= 0.8
- Tier 2 (LLM): ~500ms, ~$0.00003. generateObject() + Zod schema for ambiguous inputs
Classification controls: prompt sections, tool subsets, context budgets, evidence gate activation. LRU cached (100 entries, 5-min TTL).
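Tier 1 can be sketched as a first-match rule table with a confidence floor (patterns and threshold here are illustrative, not the shipped rule set):

```typescript
// Sketch of the tier-1 regex classifier with a 0.8 confidence floor.
interface Classification { goal: string; confidence: number }

const RULES: Array<[RegExp, string, number]> = [
  [/^(hi|hello|thanks)\b/i, "casual_chat", 0.95],
  [/\b(refactor|implement|fix|test)\b/i, "complex_coding", 0.85],
  [/\b(search|look up|latest)\b/i, "web_lookup", 0.8],
];

function classifyTier1(input: string): Classification | null {
  for (const [re, goal, confidence] of RULES)
    if (re.test(input) && confidence >= 0.8) return { goal, confidence };
  return null; // ambiguous: fall through to the LLM tier
}
```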
Source: src/agent/goal-classifier.ts
Tests: tests/unit/agent/goal-classifier.test.ts
CLI Reference
rune # Interactive TUI
rune exec "task" [--verbose] # One-shot autonomous execution
rune web [--no-open] # Daemon + Web UI
rune daemon <start|stop|status> # Background daemon
rune env <set|get|list|unset> # Environment config
rune env quick "connect telegram" # Natural language setup
rune token <create|list|revoke> # API token management

Configuration
# LLM providers (at least one)
rune env set OPENAI_API_KEY "sk-..."
rune env set ANTHROPIC_API_KEY "sk-ant-..."
# Channels (optional)
rune env set TELEGRAM_BOT_TOKEN "..."
rune env set DISCORD_BOT_TOKEN "..."
# Web search (optional)
rune env set BRAVE_API_KEY "..."

Runtime config: ~/.rune/config.yaml — safety presets, LLM routing overrides, proactive behavior, hooks.
MCP servers: ~/.rune/mcp.json (user) or .rune/mcp.json (project). Tools appear as mcp.{server}.{tool} with full policy and approval support.
Development
pnpm dev # Watch mode (tsup)
pnpm build # Full build (backend + web)
pnpm typecheck # tsc --noEmit
pnpm lint && pnpm format
pnpm test # 1886 tests / 240 files
pnpm test:coverage
pnpm probe:all # Live integration probes (API key required)

Source Layout
src/
├── agent/ # Loop, classifier, prompt, checkpoint, orchestrator, scheduler
├── capabilities/ # 46 tools
├── safety/ # Guardian, sandbox, policy, approval
├── memory/ # Episodic (JSONL + vector), user model, promotion
├── intelligence/ # AST analyzer, code graph (SQLite)
├── conversation/ # Cross-channel conversation manager
├── llm/ # Multi-provider router + failover
├── channels/ # 9 adapters
├── daemon/ # Daemon, gateway, heartbeat, scheduler
├── browser/ # Playwright + relay mode
├── mcp/ # MCP client + bridge
├── api/ # REST / RPC / SSE / NDJSON
├── ui/ # React/Ink TUI
└── utils/ # Logger, tokenizer, retry, errors
web/                   # React + Vite dashboard

Documentation
| Doc | Contents |
|---|---|
| Architecture | System diagrams, request lifecycle, module deep-dives |
| Safety Rollout | Policy tuning and auto-promotion |
| Evaluation | Mock provider, live probes, grading |
| Skills | SKILL.md spec, auto-generation |
| Timeouts | Timeout design across subsystems |
| CONTRIBUTING | How to contribute |
| SECURITY | Vulnerability disclosure |
Troubleshooting
rune env set OPENAI_API_KEY "sk-..."    # Only configured providers are health-checked
npx playwright install chromium          # Reinstall the browser for Playwright tools
export RUNE_EMBEDDING_GPU=false          # Fall back to CPU embeddings
rune daemon stop && rune daemon start    # Restart the daemon

MIT © RUNE Contributors
