@chinmaymk/ra
v0.0.3
One loop. Infinite agents. Type definitions for the ra AI agent framework.
What is ra?
ra is a small, hackable agent. Nothing hidden behind abstractions you can't reach. It doesn't even ship with a system prompt. Every part of the loop is exposed via config and can be extended by writing scripts or plain TypeScript. Middleware hooks let you intercept every step — model calls, tool execution, streaming, all of it.
It comes with built-in tools for filesystem, shell, network, and user interaction, provides persistent, resumable sessions, and has an FTS5 memory backed by SQLite. It speaks MCP both ways — connect to external MCP servers for additional tools, or expose ra itself as an MCP server so you can use it from Cursor, Claude Desktop, or anything else that speaks the protocol. It talks to Anthropic, OpenAI, Google, Ollama, Bedrock, and Azure — switching providers is a single flag — and supports extended thinking for models that offer it.
It gives you real control over context. Deterministic discovery for common formats (CLAUDE.md, AGENTS.md, README.md), pattern resolution, prompt caching, compaction, token tracking. A skill system that can pull skills from GitHub repos or npm packages.
It runs as a CLI, REPL, HTTP server, or MCP server. No runtime dependencies. Fully observable via structured logs and traces for each session, so you can actually see what your agent is doing.
All of this is configurable via a layered config system — env vars, config files (JSON, YAML, TOML), or CLI flags. Each layer overrides the last.
ra "What is the capital of France?"
ra --provider openai --model gpt-4.1 "Explain this error"
ra --skill code-review --file diff.patch "Review this diff"
cat server.log | ra "Find the root cause of these errors"
ra # interactive REPL
Install
curl -fsSL https://raw.githubusercontent.com/chinmaymk/ra/main/install.sh | bash
ra --help
Quick Start
export RA_ANTHROPIC_API_KEY="sk-..."
ra "Summarize the key points of this file" --file report.pdf # one-shot with file attachment
ra # interactive REPL
cat error.log | ra "Explain this error" # pipe stdin
git diff | ra --skill code-review "Review these changes" # pipe + skill
ra --http # streaming HTTP API
ra --mcp-stdio # MCP server for Cursor / Claude Desktop
The Agent Loop
ra's core loop is simple: send messages to the model, stream the response, execute any tool calls, repeat. Every step fires a middleware hook you can intercept. The loop handles iteration, token tracking, and tool execution — you control everything else through system prompts, skills, and middleware.
┌─────────────────────────────────────────────────┐
│ beforeLoopBegin │
└──────────────────────┬──────────────────────────┘
▼
┌─── beforeModelCall ◄────────────┐
│ │
▼ │
Stream response │
(onStreamChunk) │
│ │
▼ │
afterModelResponse │
│ │
├── No tool calls? ──► afterLoopComplete
│
▼
beforeToolExecution
│
▼
Execute tools
│
├── ask_user? ──► suspend (loop exits without afterLoopComplete)
│
▼
afterToolExecution
│
▼
afterLoopIteration ────────────────────►┘
The loop tracks token usage per iteration, enforces maxIterations, and supports an AbortController — any middleware can call ctx.stop() to halt the loop cleanly.
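The control flow above can be sketched in plain TypeScript. This is an illustrative reduction, not ra's actual internals — `callModel` and `runTool` are hypothetical stand-ins for a provider client and a tool registry:

```typescript
// Minimal agent-loop sketch: call model, run tools, repeat until the
// model stops requesting tools or maxIterations is reached.
type Msg = { role: string; content: string };
type ToolCall = { name: string; args: unknown };
type ModelResponse = { text: string; toolCalls: ToolCall[] };

async function runLoop(
  messages: Msg[],
  callModel: (msgs: Msg[]) => Promise<ModelResponse>,
  runTool: (call: ToolCall) => Promise<string>,
  maxIterations = 10,
): Promise<string> {
  for (let i = 0; i < maxIterations; i++) {
    // beforeModelCall -> stream response -> afterModelResponse
    const res = await callModel(messages);
    messages.push({ role: "assistant", content: res.text });
    // no tool calls: the loop is done (afterLoopComplete)
    if (res.toolCalls.length === 0) return res.text;
    // beforeToolExecution / afterToolExecution around each call
    for (const call of res.toolCalls) {
      messages.push({ role: "tool", content: await runTool(call) });
    }
  }
  throw new Error("maxIterations exceeded");
}
```

The real loop layers middleware hooks, token accounting, and suspension (ask_user) on top of this skeleton.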
Context Control
Smart context compaction
When conversations grow, ra compacts automatically. It splits the history into three zones — pinned messages (system prompt, first user message), compactable middle, and recent turns — then summarizes the middle with a cheap model. You keep the context that matters.
compaction:
enabled: true
threshold: 0.8 # trigger at 80% of context window
model: claude-haiku-4-5-20251001 # cheap model for summarization
Uses real token counts when available, never splits tool call boundaries, and picks a cheap default compaction model per provider (Haiku for Anthropic, GPT-4o-mini for OpenAI, Gemini Flash for Google).
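The three-zone split can be illustrated with a short sketch. The zone sizes and the `summarize` stub are assumptions for illustration, not ra's real implementation:

```typescript
// Three-zone compaction sketch: pin the head, keep recent turns
// verbatim, and replace the middle with a cheap-model summary.
type Msg = { role: string; content: string };

function compact(
  history: Msg[],
  summarize: (msgs: Msg[]) => string, // stand-in for a cheap-model call
  pinned = 2,  // e.g. system prompt + first user message
  recent = 4,  // recent turns kept verbatim
): Msg[] {
  if (history.length <= pinned + recent) return history;
  const head = history.slice(0, pinned);
  const middle = history.slice(pinned, history.length - recent);
  const tail = history.slice(history.length - recent);
  return [...head, { role: "user", content: summarize(middle) }, ...tail];
}
```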
Token tracking & prompt caching
ra tracks input and output tokens across every iteration. Your middleware can read cumulative usage via ctx.loop.usage and enforce budgets, log costs, or trigger compaction early. On Anthropic, cache hints are automatically added to system prompts and tool definitions — no config needed.
Extended thinking
ra --thinking high "Design a database schema for a social network"
Three budget levels (low, medium, high) control how much the model reasons before responding. Thinking output streams to the terminal in real time.
Context discovery
ra can discover and inject project context files into the conversation before your prompt. Configure which files to look for via the context.patterns config:
context:
enabled: true
patterns:
- "CLAUDE.md"
- "AGENTS.md"
- "CONVENTIONS.md"
ra walks the directory tree upward to the git root, finds matching files, and injects them as context.
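The upward walk might look like this sketch. Filesystem access is abstracted behind a `listDir` callback; the traversal logic is an assumption for illustration, not ra's actual code:

```typescript
// Walk from cwd up to a root directory, collecting files whose names
// match the configured context patterns.
function discoverContext(
  cwd: string,
  root: string,
  patterns: string[],
  listDir: (dir: string) => string[],
): string[] {
  const found: string[] = [];
  let dir = cwd;
  while (true) {
    for (const name of listDir(dir)) {
      if (patterns.includes(name)) found.push(`${dir}/${name}`);
    }
    if (dir === root) break;
    dir = dir.split("/").slice(0, -1).join("/") || "/";
  }
  return found;
}
```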
Pattern resolution
Reference files and URLs inline in your prompts — ra resolves them before the model sees the message.
ra "explain what @src/auth.ts does" # file contents injected
ra "review @src/utils/*.ts for consistency" # glob expansion
ra "summarize url:https://example.com/api-docs" # fetched page content
Two built-in resolvers (@ for files/globs, url: for URLs) are enabled by default. Add custom resolvers for GitHub issues, database records, or anything else via context.resolvers in your config.
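A custom resolver could be modeled roughly like this. The `Resolver` shape is an assumption inferred from the `@`/`url:` behavior described above, not ra's documented `context.resolvers` interface:

```typescript
// Hypothetical resolver: scan the prompt for tokens with a given prefix
// and substitute resolved content. Assumes the prefix contains no regex
// metacharacters that need escaping.
type Resolver = { prefix: string; resolve: (ref: string) => string };

function applyResolvers(prompt: string, resolvers: Resolver[]): string {
  let out = prompt;
  for (const r of resolvers) {
    const re = new RegExp(`${r.prefix}(\\S+)`, "g");
    out = out.replace(re, (_, ref) => r.resolve(ref));
  }
  return out;
}
```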
Providers
Same config, any backend. Switch with a flag.
ra --provider anthropic --model claude-sonnet-4-6 "Review this PR"
ra --provider openai --model gpt-4.1 "Explain this error"
ra --provider google --model gemini-2.5-pro "Summarize this doc"
ra --provider ollama --model llama3 "Write a haiku"
ra --provider bedrock --model anthropic.claude-sonnet-4-6 "Triage this bug"
ra --provider azure --azure-deployment my-gpt4o "Analyze this log"
| Provider | Env vars |
|----------|---------|
| anthropic | RA_ANTHROPIC_API_KEY |
| openai | RA_OPENAI_API_KEY |
| google | RA_GOOGLE_API_KEY |
| bedrock | RA_BEDROCK_REGION |
| ollama | RA_OLLAMA_HOST |
| azure | RA_AZURE_ENDPOINT, RA_AZURE_DEPLOYMENT, RA_AZURE_API_KEY (optional) |
Bedrock falls back to the standard AWS credential chain. Azure falls back to DefaultAzureCredential. Anthropic, OpenAI, and Google support --<provider>-base-url flags. Ollama uses --ollama-host, Azure uses --azure-endpoint.
Interfaces
Same agent, multiple entry points.
| Interface | Flag | Use case |
|-----------|------|----------|
| CLI | --interface cli (default with a prompt) | Pipe it, chain it, cron it |
| REPL | --interface repl (default without a prompt) | Interactive sessions with tool use and history |
| HTTP | --http | Streaming SSE or sync JSON for your product |
| MCP | --mcp-stdio / --mcp | Expose ra as a tool for Cursor, Claude Desktop, other agents |
CLI
Streams to stdout and exits. Supports piped stdin — when input is piped, ra reads it and auto-switches to CLI mode.
ra "What's wrong with this code?" --file buggy.ts
cat error.log | ra "Explain this error"
git diff | ra "Summarize these changes"
echo "hello world" | ra # stdin becomes the prompt
ra --resume <session-id> "Continue from where we left off"
REPL
Full interactive sessions with slash commands.
ra
> How does the auth module work?
> /skill code-review # activate a skill for next message
> /attach diff.patch # attach a file to next message
> /context # show discovered context files
> /memories # see what the agent remembers
> /forget dark mode # delete memories matching a query
> /resume abc-123 # resume a previous session
> /clear # start fresh
HTTP API
ra --http --http-port 8080 --http-token secret
| Endpoint | Method | Description |
|----------|--------|-------------|
| /chat | POST | SSE stream — data: {"type":"text","delta":"..."} |
| /chat/sync | POST | Blocking JSON — {"response":"..."} |
| /sessions | GET | List stored sessions |
Both endpoints accept {"messages": [...], "sessionId": "..."}. The streaming endpoint also emits ask_user events when the agent needs input.
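A client can reassemble the streamed text by parsing the `data:` lines shown in the table; a minimal sketch, assuming the event shape `{"type":"text","delta":"..."}`:

```typescript
// Parse SSE "data:" lines from the /chat stream and accumulate the
// text deltas into the full response.
type ChatEvent = { type: string; delta?: string };

function collectText(sseBody: string): string {
  let text = "";
  for (const line of sseBody.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const ev: ChatEvent = JSON.parse(line.slice(6));
    if (ev.type === "text" && ev.delta) text += ev.delta;
  }
  return text;
}
```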
MCP server
Expose the full agent loop as a tool for other agents.
ra --mcp-stdio # stdio for Cursor / Claude Desktop
ra --mcp # HTTP transport
{
"mcpServers": {
"ra": {
"command": "ra",
"args": ["--mcp-stdio"]
}
}
}
When built-in tools are enabled, they're also exposed as individual MCP tools — so other agents get access to ra's filesystem, shell, and network tools directly.
Built-in Tools
Tools are enabled by default and are self-describing — each includes a detailed schema and description so the model knows when and how to use them. Shell execution is platform-specific (execute_bash on Linux/macOS, execute_powershell on Windows).
| Category | Tools |
|----------|-------|
| Filesystem | read_file, write_file, update_file, append_file, list_directory, search_files, glob_files, move_file, copy_file, delete_file |
| Shell | execute_bash (Linux/macOS) / execute_powershell (Windows) |
| Network | web_fetch |
| Agent | ask_user, checklist, subagent |
update_file does exact string replacement (like Claude Code's Edit). ask_user suspends the loop and returns control to the caller — the REPL prints the question, CLI prints a session ID for --resume, HTTP emits an SSE event. subagent forks parallel copies of the agent for independent tasks, with token usage rolling up into the parent.
To bring your own tools via MCP instead, set builtinTools: false.
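The exact-string-replacement behavior described for update_file can be sketched as follows. The uniqueness check is an assumption based on how such edit tools commonly behave, not a documented detail:

```typescript
// Exact string replacement: fail unless the target occurs exactly
// once, then splice in the replacement.
function updateFileContent(content: string, oldStr: string, newStr: string): string {
  const first = content.indexOf(oldStr);
  if (first === -1) throw new Error("string not found");
  if (content.indexOf(oldStr, first + 1) !== -1) throw new Error("string is not unique");
  return content.slice(0, first) + newStr + content.slice(first + oldStr.length);
}
```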
Permissions
Control what tools can do with regex-based allow/deny rules per tool, per field. Deny always takes priority.
# ra.config.yml
permissions:
rules:
- tool: execute_bash
command:
allow: ["^git ", "^bun "]
deny: ["--force", "--hard", "--no-verify"]
- tool: write_file
path:
allow: ["^src/", "^tests/"]
deny: ["\\.env"]
Each rule key (other than tool) matches a field from the tool's input schema. When a call is denied, the model gets a clear error and can adjust. Set default_action: deny for an allowlist-only approach.
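The deny-takes-priority semantics amount to roughly this check per field. This is a simplified sketch — default_action handling and schema-field extraction are omitted:

```typescript
// Deny-first permission check: any matching deny pattern rejects the
// value; otherwise, if allow patterns exist, at least one must match.
type FieldRule = { allow?: string[]; deny?: string[] };

function isAllowed(value: string, rule: FieldRule): boolean {
  if (rule.deny?.some((p) => new RegExp(p).test(value))) return false;
  if (rule.allow && rule.allow.length > 0) {
    return rule.allow.some((p) => new RegExp(p).test(value));
  }
  return true;
}
```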
Skills
Reusable instruction bundles — roles, behaviors, scripts, and reference docs packaged as directories. Skills use progressive disclosure: the model sees skill names and descriptions first, then reads the full SKILL.md on demand.
skills/
code-review/
SKILL.md # frontmatter + instructions
scripts/
gather-diff.sh # on-demand — model runs when needed
references/
style-guide.md # on-demand — read via /skill-ref
---
name: code-review
description: Reviews code for bugs, style, and best practices
---
You are a senior code reviewer. Focus on:
- Correctness and edge cases
- Performance implications
- Naming and readability
ra --skill code-review "Review the latest changes" # CLI — always-on
ra skill install github:user/repo # install from GitHub
ra skill install npm:[email protected] # install from npm
ra skill install https://example.com/skills.tgz # install from URL
ra skill list # list installed skills
ra skill remove code-review # remove a skill
Scripts and references are loaded on demand — not eagerly at activation. In the REPL, use /skill-run <skill> <script> and /skill-ref <skill> <reference> to load them into context when needed. Skills support multi-runtime scripts (bash, python, typescript, javascript, go) with shebang detection.
Built-in skills
ra ships with ready-to-use skills:
| Skill | Purpose |
|-------|---------|
| code-review | Reviews code for bugs, security, style, and correctness |
| architect | Designs systems and evaluates architecture decisions |
| planner | Breaks work into concrete steps before implementation |
| debugger | Systematically diagnoses bugs and unexpected behavior |
| code-style | Reviews and writes code for clarity, simplicity, and correctness |
| writer | Writes clear technical documentation, READMEs, and guides |
ra --skill architect "Design a queue system for email notifications"
ra --skill debugger --file crash.log "Find the root cause"
Sessions
ra persists every conversation as JSONL. Resume any session from any interface.
ra --resume <session-id> "Continue with the next step"
storage:
path: .ra/sessions
maxSessions: 100 # auto-prune oldest
ttlDays: 30 # auto-expire
Sessions are auto-saved after each turn. Each session directory contains messages, logs, and traces. Resume from the REPL (/resume <id>), CLI (--resume <id>), or HTTP API (sessionId field).
MCP
ra speaks MCP in both directions.
As a client — connect to external MCP servers. Their tools become available to the model.
mcp:
client:
- name: filesystem
transport: stdio
command: npx
args: ["-y", "@anthropic/mcp-filesystem"]
- name: database
transport: sse
url: http://localhost:8080/mcp
- name: github
transport: stdio
command: npx
args: ["-y", "@modelcontextprotocol/server-github"]
env:
GITHUB_PERSONAL_ACCESS_TOKEN: "${GITHUB_TOKEN}"
lazySchemas: true # default — strip schemas, reveal on first call
All MCP tools get server-prefixed names (github__search, database__query) to avoid conflicts across servers. With lazy schema loading (default), only the inputSchema is stripped. The first call to each tool returns the full parameter schema instead of executing — the model retries with correct parameters. You only pay for schemas of tools actually used.
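The server-prefixed naming can be sketched in a few lines (illustrative only — not ra's actual registry code):

```typescript
// Namespace MCP tool names per server so identical tool names from
// different servers don't collide.
function namespaceTools(servers: Record<string, string[]>): string[] {
  return Object.entries(servers).flatMap(([server, tools]) =>
    tools.map((t) => `${server}__${t}`),
  );
}
```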
As a server — ra --mcp-stdio exposes the full agent loop as a single MCP tool, plus all built-in tools as individual MCP tools. See Interfaces → MCP server for config examples.
You can also run the MCP server alongside another interface — for example, a REPL with an MCP sidecar:
ra --mcp-server-enabled --mcp-server-port 4000 --repl
Middleware
Hook into every step of the agent loop. Define hooks as inline expressions in config or as TypeScript files when you need real logic. Every hook gets the full conversation history and can call ctx.stop() to halt the loop.
| Hook | When | Context |
|------|------|---------|
| beforeLoopBegin | Once at start | messages, iteration, usage |
| beforeModelCall | Before each LLM call | request (messages, model, tools), loop state |
| onStreamChunk | Per streaming token | chunk (text/thinking/tool_call), loop state |
| afterModelResponse | After model finishes | request, loop state |
| beforeToolExecution | Before each tool call | toolCall (name, arguments, id), loop state |
| afterToolExecution | After each tool returns | toolCall, result (content, isError), loop state |
| afterLoopIteration | After each full iteration | messages, iteration, usage |
| afterLoopComplete | After the loop ends | messages, iteration, usage |
| onError | On exceptions | error, phase (model_call/tool_execution/stream), loop state |
// middleware/token-budget.ts — stop if we've used too many tokens
export default async (ctx) => {
if (ctx.loop.usage.inputTokens + ctx.loop.usage.outputTokens > 100_000) {
ctx.stop()
}
}
Hooks can also be inline expressions in config for simple cases. All hooks support a configurable timeout via toolTimeout (default: 30s).
Observability
Structured JSON logging and tracing are built in. By default, logs and traces are written to the session directory alongside conversation messages — keeping stdout/stderr clean.
.ra/sessions/{session-id}/
meta.json # session metadata
messages.jsonl # conversation messages
logs.jsonl # structured logs
traces.jsonl # trace spans
# ra.config.yml
observability:
enabled: true # default: true
logs:
level: info # debug | info | warn | error
output: session # session | stderr | stdout | file
filePath: .ra/logs.jsonl # only used when output is 'file'
traces:
output: session # session | stderr | stdout | file
filePath: .ra/traces.jsonl
Every startup event, model call, tool execution, compaction, and error is logged. Traces follow an OpenTelemetry-inspired span hierarchy (agent.loop → agent.iteration → agent.model_call / agent.tool_execution), each recording duration, status, and attributes. Both emit JSONL — pipe through jq to explore. See docs/observability.md for the full event reference.
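Because both streams are JSONL, small scripts go a long way. For example, a sketch that sums span durations by name — note the `name` and `durationMs` field names are assumptions about the trace shape, not documented fields:

```typescript
// Aggregate span durations from a traces.jsonl dump, grouped by span name.
type Span = { name: string; durationMs: number };

function totalDurations(jsonl: string): Map<string, number> {
  const totals = new Map<string, number>();
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue;
    const span: Span = JSON.parse(line);
    totals.set(span.name, (totals.get(span.name) ?? 0) + span.durationMs);
  }
  return totals;
}
```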
Memory
SQLite-backed memory with FTS5 full-text search. The agent gets memory_save, memory_search, and memory_forget tools, and recent memories are injected at the start of each loop.
ra --memory # enable memory for this session
ra --list-memories # list all stored memories
ra --memories "typescript" # search memories
ra --forget "dark mode" # delete matching memories
memory:
enabled: true
path: .ra/memory.db
maxMemories: 1000
ttlDays: 90
injectLimit: 5 # inject top-N recent memories (0 to disable)
The agent decides when to save and forget — tool descriptions guide it to capture user preferences, project decisions, and corrections.
Recipes
Pre-built agent configurations you can use directly or fork.
Coding Agent
A general-purpose coding agent with file editing, shell execution, codebase navigation, extended thinking, and smart context compaction. Uses 200 max iterations and high thinking budget.
ra --config recipes/coding-agent/ra.config.yaml
Code Review Agent
Reviews diffs for correctness, style, and performance. Connects to GitHub via MCP, includes a diff-gathering script and style guide, and enforces a token budget via middleware.
ra --config recipes/code-review-agent/ra.config.yaml --file diff.patch "Review this"
Configuration
Layered config. Each layer overrides the previous.
defaults → config file → env vars → CLI flags
# ra.config.yml — all sections are optional
provider: anthropic
model: claude-sonnet-4-6
systemPrompt: You are a helpful coding assistant.
maxIterations: 50
thinking: medium
skills: [code-review]
Every option shown in the sections above (compaction, permissions, memory, mcp, middleware, etc.) goes in this file. Environment variables use the RA_ prefix (RA_PROVIDER, RA_MODEL, RA_ANTHROPIC_API_KEY), and CLI flags override everything:
ra --provider openai --model gpt-4.1 --thinking high --max-iterations 10 "Review this"
Scripting
Use --exec to run a TypeScript or JavaScript file that imports ra's internals programmatically.
ra --exec ./scripts/batch-review.ts
GitHub Actions
Use ra directly in your CI/CD workflows. No install step needed — the action downloads the binary automatically.
- uses: chinmaymk/ra@latest
with:
prompt: "Review this PR for bugs and security issues"
provider: anthropic
model: claude-sonnet-4-6
env:
RA_ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
The action exposes the same configuration as the CLI — provider, model, skills, thinking, file attachments, and custom config files. See the GitHub Actions docs for full usage.
Building from Source
git clone https://github.com/chinmaymk/ra.git && cd ra
bun install
bun run compile # → dist/ra
bun tsc # type-check
bun test # run tests
License
MIT
