ai-rlm

v2.1.0

Published

14 days ago

Recursive Language Model (RLM) implementation using the Vercel AI SDK. Process long contexts through iterative code execution and sub-LLM queries.


jhsu

ai rlm recursive language-model llm ai-sdk context large-context agent tool

ai-rlm

A Recursive Language Model (RLM) implementation, exposed as an AI SDK Agent or as a tool.

Based on the paper "Recursive Language Models" by Zhang, Kraska, and Khattab (2025).

Overview

RLM is an inference strategy where LLMs treat long contexts as part of an external environment rather than feeding them directly to the model. The LLM writes JavaScript code to programmatically examine, decompose, and recursively call sub-LLMs over snippets.

Key Features

Iterative Code Execution: The model writes JavaScript code, sees output, then writes more code
Sub-LLM Queries: Access to llm_query() and llm_query_batched() for semantic analysis
Context Management: Efficient handling of large contexts through chunking
Sandboxed REPL: JavaScript execution in a sandboxed QuickJS WebAssembly context
Pluggable Sandbox Interface: Swap the execution environment with your own sandbox implementation
AI SDK Integration: Works as an Agent or Tool with the Vercel AI SDK
Multiple Usage Patterns: Use as standalone agent or as a tool in larger workflows

Installation

npm install ai-rlm ai zod @ai-sdk/openai

ai and zod are peer dependencies and must be installed in your project.

The model and subModel settings accept any AI SDK LanguageModel — use any provider (OpenAI, Anthropic, Google, etc.).

Usage

As Agent (Recommended)

The RLMAgent class exposes an agent-style API (generate() and stream()) that follows AI SDK conventions:

import { RLMAgent } from 'ai-rlm';
import { openai } from '@ai-sdk/openai';

// Create agent
const agent = new RLMAgent({
  model: openai('gpt-4.1'),              // Root agent model
  subModel: openai('gpt-4.1-mini'),      // Sub-LLM model for queries
  maxIterations: 20,                      // Max REPL iterations
  maxLLMCalls: 50,                        // Max sub-LLM calls
});

// Process a context
const context = `
  The quick brown fox jumps over the lazy dog.
  The magic number is 42.
`;

const query = 'What is the magic number?';

const result = await agent.generate({
  prompt: query,
  options: { context },
});

const rlmResult = result.output;

console.log('Answer:', result.text);
console.log('Iterations:', rlmResult.iterations);
console.log('LLM Calls:', rlmResult.llmCallCount);
console.log('Steps:', rlmResult.steps); // Full trajectory

As Tool

Use createRLMTool to create an AI SDK-compatible tool for use with generateText or ToolLoopAgent:

import { createRLMTool } from 'ai-rlm';
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Create the tool
const rlmTool = createRLMTool({
  model: openai('gpt-4.1'),
  subModel: openai('gpt-4.1-mini'),
  description:
    'Use for repository-scale security review when the relevant files are too large to inspect directly.',
});

// Use in generateText
const result = await generateText({
  model: openai('gpt-4.1'),
  tools: { analyzeLargeContext: rlmTool },
  prompt: 'Analyze this large codebase for security vulnerabilities',
});

With ToolLoopAgent

import { ToolLoopAgent } from 'ai';
import { createRLMTool } from 'ai-rlm';
import { openai } from '@ai-sdk/openai';

const agent = new ToolLoopAgent({
  model: openai('gpt-4.1'),
  tools: {
    analyzeLargeContext: createRLMTool({
      model: openai('gpt-4.1'),
      subModel: openai('gpt-4.1-mini'),
    }),
    // ... other tools
  },
});

const result = await agent.generate({
  prompt: 'Check this document for compliance issues',
});

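Streaming Support

RLMAgent also exposes stream(), which takes the same arguments as generate(). In the snippet below, agent is an RLMAgent and largeDocument is the context to analyze: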

const stream = await agent.stream({
  prompt: 'Analyze this',
  options: { context: largeDocument },
});

// textStream emits the final text after generate() completes
const reader = stream.textStream.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(value);
}
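When the whole answer is needed as one string, the reader loop above can be wrapped in a small helper. This is only a sketch: collectText and demoStream are hypothetical names, the synthetic stream merely stands in for stream.textStream, and a runtime with the global web ReadableStream (for example Node.js 18 or later) is assumed.

```typescript
// Reads a ReadableStream<string> to completion and returns the joined text.
// Hypothetical helper; not part of ai-rlm.
async function collectText(stream: ReadableStream<string>): Promise<string> {
  const reader = stream.getReader();
  let text = '';
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      text += value;
    }
  } finally {
    // Release the lock so the stream is not left locked after an error.
    reader.releaseLock();
  }
  return text;
}

// Synthetic stream used in place of stream.textStream for demonstration.
const demoStream = new ReadableStream<string>({
  start(controller) {
    controller.enqueue('The magic number ');
    controller.enqueue('is 42.');
    controller.close();
  },
});

const collected = collectText(demoStream);
collected.then((text) => console.log(text));
```

With a real agent, the call would presumably look like collectText(stream.textStream).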

How It Works

The RLM agent writes JavaScript code to explore the context in an iterative loop:

// First, explore the context
console.log('Context length:', context.length);
console.log('First 200 chars:', context.substring(0, 200));

// Search for specific patterns
const lines = context.split('\n');
const targetLine = lines.find(line => line.includes('magic number'));
console.log('Found:', targetLine);

// Store result for later
const answer = targetLine?.match(/magic number is (\d+)/)?.[1];

// Submit answer
FINAL_VAR("answer")

Each run follows these steps:

1. Context Loading: The context is loaded into a sandboxed JavaScript REPL environment
2. Iterative Reasoning: The root LLM writes JavaScript code to explore the context
3. Code Execution: Code is executed in a QuickJS WebAssembly sandbox with a 30s timeout
4. Sub-LLM Queries: For semantic analysis, llm_query() delegates to a sub-model
5. Result Accumulation: The model iterates until it finds an answer or reaches maxIterations
6. Final Answer: The model submits an answer using FINAL(answer) or FINAL_VAR("variable_name")
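The steps above can be sketched as a toy loop. This is not the ai-rlm implementation: runSnippet, runLoop, and scriptedModel are hypothetical names, a scripted function stands in for the root LLM, and new Function stands in for the QuickJS sandbox (it provides no isolation). Sub-LLM queries and the timeout are left out.

```typescript
// Toy version of the explore / execute / submit loop.
// All names below are hypothetical scaffolding, not ai-rlm internals.
type Step = { iteration: number; code: string; output: string };

// Runs one snippet with `context`, a capturing console.log, and FINAL().
// `new Function` is NOT a sandbox; it only stands in for the QuickJS one.
function runSnippet(code: string, context: string): { output: string; final?: string } {
  const logged: string[] = [];
  const state: { final?: string } = {};
  const fakeConsole = {
    log: (...args: unknown[]) => {
      logged.push(args.map(String).join(' '));
    },
  };
  const FINAL = (answer: unknown) => {
    state.final = String(answer);
  };
  new Function('context', 'console', 'FINAL', code)(context, fakeConsole, FINAL);
  return { output: logged.join('\n'), final: state.final };
}

// `writeCode` stands in for the root LLM: it sees prior steps and returns code.
function runLoop(
  writeCode: (history: Step[]) => string,
  context: string,
  maxIterations = 20,
): { answer?: string; steps: Step[] } {
  const steps: Step[] = [];
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const code = writeCode(steps);
    const { output, final } = runSnippet(code, context);
    steps.push({ iteration, code, output });
    if (final !== undefined) return { answer: final, steps }; // FINAL() was called
  }
  return { steps }; // iteration budget exhausted without an answer
}

// Scripted "model": explore on the first turn, extract and submit on the second.
const scriptedModel = (history: Step[]): string =>
  history.length === 0
    ? "console.log('Context length:', context.length);"
    : "const m = context.match(/magic number is (\\d+)/); FINAL(m ? m[1] : 'not found');";

const loopResult = runLoop(scriptedModel, 'The magic number is 42.');
console.log(loopResult.answer, loopResult.steps.length);
```

The loop stops as soon as a snippet calls FINAL(), which mirrors step 6; otherwise it ends when the iteration budget runs out.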

System Prompt

The RLM system prompt instructs the model to:

EXPLORE FIRST - Look at data before processing
ITERATE - Write small code snippets, observe outputs
VERIFY BEFORE SUBMITTING - Check results are correct
USE llm_query FOR SEMANTICS - Code finds WHERE; LLM understands WHAT
CHUNK SMARTLY - Feed substantial chunks to sub-LLMs (~500K chars)
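To make the chunking guidance concrete, here is a minimal sketch of the kind of helper a model might write inside the REPL before querying sub-LLMs. chunkByLines and buildChunkPrompts are hypothetical names, the prompt wording is only an example, and the 500,000-character default simply mirrors the "~500K chars" hint above.

```typescript
// Splits text into chunks of at most maxChars characters, preferring line
// boundaries. A single line longer than maxChars is hard-split.
function chunkByLines(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new RangeError('maxChars must be positive');
  const chunks: string[] = [];
  let current: string[] = [];
  let size = 0; // length of current.join('\n')
  const flush = () => {
    if (current.length > 0) chunks.push(current.join('\n'));
    current = [];
    size = 0;
  };
  for (const line of text.split('\n')) {
    // Cost of appending this line, including the joining newline if needed.
    const added = current.length === 0 ? line.length : line.length + 1;
    if (size + added > maxChars) flush();
    if (line.length > maxChars) {
      for (let i = 0; i < line.length; i += maxChars) {
        chunks.push(line.slice(i, i + maxChars));
      }
      continue;
    }
    current.push(line);
    size += current.length === 1 ? line.length : line.length + 1;
  }
  flush();
  return chunks;
}

// Builds one prompt per chunk. Inside the sandbox, the resulting array could
// be handed to llm_query_batched(prompts).
function buildChunkPrompts(text: string, question: string, maxChars = 500_000): string[] {
  const chunks = chunkByLines(text, maxChars);
  return chunks.map(
    (chunk, i) => `Chunk ${i + 1} of ${chunks.length}. ${question}\n\n${chunk}`,
  );
}

const demoChunks = chunkByLines('aaa\nbbb\nccc', 7);
console.log(demoChunks); // [ 'aaa\nbbb', 'ccc' ]
```

Splitting on line boundaries keeps each record intact, so a sub-LLM does not see a sentence or row cut in half unless a single line exceeds the limit.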

REPL Sandbox

The JavaScript REPL runs code in a QuickJS WebAssembly sandboxed context:

Available in the Sandbox:

context: The input context (string or object)
console.log() / console.error(): Output logging
llm_query(prompt): Query a sub-LLM for semantic analysis
llm_query_batched(prompts): Send several prompts to the sub-LLM as one batch
FINAL(answer): Submit final answer directly
FINAL_VAR("varName"): Submit a variable from the REPL by name
Standard JavaScript: All ES6+ features, Array methods, String methods, Math, JSON, etc.

Security Features:

30-second timeout on code execution
No access to Node.js built-in modules or file system
No network access
Sandboxed console output capture

Logging

Library diagnostics are silent by default. To see internal agent logs, pass an explicit logger and log level:

import { RLMAgent } from 'ai-rlm';
import { openai } from '@ai-sdk/openai';

const agent = new RLMAgent({
  model: openai('gpt-4.1'),
  subModel: openai('gpt-4.1-mini'),
  logger: console,
  logLevel: 'debug',
});

Use this for local debugging. In application code, prefer wiring logger to your app's logging system rather than relying on console.

Custom Sandbox Implementations

RLMAgent supports user-defined sandboxes through sandboxFactory.

import {
  RLMAgent,
  createQuickJSSandbox,
  type RLMSandbox,
  type RLMSandboxFactoryOptions,
} from 'ai-rlm';
import { openai } from '@ai-sdk/openai';

const sandboxFactory = (options: RLMSandboxFactoryOptions): RLMSandbox => {
  // Wrap the default QuickJS sandbox, or return your own implementation.
  return createQuickJSSandbox(options);
};

const agent = new RLMAgent({
  model: openai('gpt-4.1'),
  subModel: openai('gpt-4.1-mini'),
  sandboxFactory,
});

Your sandbox must implement:

interface RLMSandbox {
  loadContext(context: RLMContext): Promise<void>;
  executeJavaScript(code: string): Promise<{
    stdout: string;
    stderr: string;
    error?: string;
    result?: unknown;
  }>;
  getVariable(name: string): unknown;
  getLLMCallCount(): number;
  getUsageSummary(): RLMUsageSummary;
  cleanup(): void;
}

Custom sandbox factories are also propagated to recursive sub_rlm() calls.
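As an illustration of wrapping rather than replacing a sandbox, the sketch below decorates any RLMSandbox so that each executed snippet is recorded. It is written under assumptions: the interface is copied locally from the definition above so the block stands alone, RLMUsageSummary is typed as unknown because its shape is not documented here, and withAudit, AuditEntry, and stubSandbox are hypothetical names.

```typescript
// Local copies of the documented shapes so the sketch is self-contained;
// in a real project these would be imported from 'ai-rlm'.
type RLMContext = string | string[] | Record<string, unknown>;
type RLMUsageSummary = unknown; // shape not documented in this README
interface ExecResult { stdout: string; stderr: string; error?: string; result?: unknown }
interface RLMSandbox {
  loadContext(context: RLMContext): Promise<void>;
  executeJavaScript(code: string): Promise<ExecResult>;
  getVariable(name: string): unknown;
  getLLMCallCount(): number;
  getUsageSummary(): RLMUsageSummary;
  cleanup(): void;
}

interface AuditEntry { code: string; ms: number; failed: boolean }

// Wraps a sandbox, recording every snippet and delegating everything else.
function withAudit(inner: RLMSandbox, audit: AuditEntry[]): RLMSandbox {
  return {
    loadContext: (context) => inner.loadContext(context),
    async executeJavaScript(code) {
      const started = Date.now();
      const result = await inner.executeJavaScript(code);
      audit.push({ code, ms: Date.now() - started, failed: result.error !== undefined });
      return result;
    },
    getVariable: (name) => inner.getVariable(name),
    getLLMCallCount: () => inner.getLLMCallCount(),
    getUsageSummary: () => inner.getUsageSummary(),
    cleanup: () => inner.cleanup(),
  };
}

// Stub inner sandbox for demonstration only; it does not execute anything.
const stubSandbox: RLMSandbox = {
  loadContext: async () => {},
  executeJavaScript: async (code) => ({ stdout: `ran ${code.length} chars`, stderr: '' }),
  getVariable: () => undefined,
  getLLMCallCount: () => 0,
  getUsageSummary: () => ({}),
  cleanup: () => {},
};

const auditLog: AuditEntry[] = [];
const audited = withAudit(stubSandbox, auditLog);
const auditedRun = audited.executeJavaScript('console.log(1)');
auditedRun.then((result) => console.log(result.stdout, auditLog.length));
```

In a factory, the wrapper would presumably be applied as (options) => withAudit(createQuickJSSandbox(options), auditLog), following the sandboxFactory pattern shown earlier.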

API Reference

RLMAgent

The primary class for using RLM as an agent.

`constructor(settings: RLMAgentSettings)`

import type { LanguageModel } from 'ai';

interface RLMAgentSettings {
  model: LanguageModel;     // Required: Root agent model
  subModel?: LanguageModel; // Optional: Sub-LLM model (defaults to model)
  maxIterations?: number;   // Max REPL iterations (default: 20)
  maxLLMCalls?: number;     // Max sub-LLM calls (default: 50)
  maxOutputChars?: number;  // Max REPL output chars (default: 100000)
  maxHistoryPreview?: number; // Max output preview chars in model history (default: 500)
  prepareIteration?: (ctx) => PrepareIterationResult | void | Promise<PrepareIterationResult | void>;
  prepareSubAgent?: (ctx) => PrepareSubAgentResult | void | Promise<PrepareSubAgentResult | void>;
  logger?: RLMLogger;       // Optional injected logger
  logLevel?: RLMLogLevel;   // Log level for internal diagnostics (default: "silent")
  sandboxFactory?: RLMSandboxFactory; // Optional custom sandbox factory
}
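For reference, the documented defaults can be expressed as a small resolver. This is a sketch for reasoning about effective limits, not how the library applies them internally; LimitSettings and withDocumentedDefaults are hypothetical names, and only the settings with a documented default are covered.

```typescript
// Subset of RLMAgentSettings that has documented defaults.
interface LimitSettings {
  maxIterations?: number;
  maxLLMCalls?: number;
  maxOutputChars?: number;
  maxHistoryPreview?: number;
  logLevel?: string;
}

// Fills in the defaults listed in the settings table for any omitted field.
function withDocumentedDefaults(settings: LimitSettings = {}): Required<LimitSettings> {
  return {
    maxIterations: settings.maxIterations ?? 20,
    maxLLMCalls: settings.maxLLMCalls ?? 50,
    maxOutputChars: settings.maxOutputChars ?? 100_000,
    maxHistoryPreview: settings.maxHistoryPreview ?? 500,
    logLevel: settings.logLevel ?? 'silent',
  };
}

const effective = withDocumentedDefaults({ maxIterations: 5 });
console.log(effective);
```

Using ?? rather than || means an explicit 0 is kept as given instead of being replaced by the default.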

`async generate(options): Promise<RLMGenerateResult>`

Generate an answer by iteratively analyzing the context.

Parameters:

interface RLMAgentCallParameters {
  context: RLMContext;                    // The large context to analyze
  query: string;                          // The question or task
  abortSignal?: AbortSignal;              // Optional abort signal
  timeout?: number;                       // Optional timeout in ms
  onStepFinish?: (step: REPLStep) => void; // Callback for each step
}

Returns:

interface RLMGenerateResult {
  text: string;             // The generated answer
  steps: REPLStep[];        // Array of REPL steps taken
  llmCallCount: number;     // Total LLM calls made
  iterations: number;       // Total iterations performed
  usage: RLMUsageSummary;   // Aggregated token usage across root + sub-calls
}

interface REPLStep {
  iteration: number;
  reasoning: string;        // The model's reasoning before code
  code: string;             // JavaScript code executed
  output: string;           // Console output and results
}
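The steps array is often easier to scan as a compact trace. The sketch below formats a list of REPLStep values into one line per iteration; formatTrajectory and preview are hypothetical helpers, the interface is copied locally so the block stands alone, and demoSteps is made-up sample data rather than real agent output.

```typescript
// Local copy of the documented REPLStep shape.
interface REPLStep {
  iteration: number;
  reasoning: string;
  code: string;
  output: string;
}

// Collapses whitespace and truncates long output for display.
function preview(text: string, maxChars: number): string {
  const flat = text.replace(/\s+/g, ' ').trim();
  return flat.length <= maxChars ? flat : flat.slice(0, maxChars) + '...';
}

// One line per step: iteration number, code size, and an output preview.
function formatTrajectory(steps: REPLStep[], maxChars = 80): string {
  return steps
    .map((step) => {
      const codeLines = step.code.split('\n').length;
      return `#${step.iteration} (${codeLines} code line(s)): ${preview(step.output, maxChars)}`;
    })
    .join('\n');
}

// Made-up sample data for demonstration.
const demoSteps: REPLStep[] = [
  {
    iteration: 1,
    reasoning: 'Look at the data first.',
    code: 'console.log(context.length);',
    output: '58',
  },
  {
    iteration: 2,
    reasoning: 'Find the relevant line.',
    code: "const hit = context.split('\\n').find((l) => l.includes('magic'));\nconsole.log(hit);",
    output: '  The magic number is 42.\n',
  },
];

console.log(formatTrajectory(demoSteps));
```

The same formatter could be applied to a single step inside an onStepFinish callback to print progress as the run proceeds.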

`async stream(options): Promise<RLMStreamResult>`

Run generate() and emit AI SDK-style stream parts for iteration progress and final text output.

Returns:

interface RLMStreamResult extends RLMGenerateResult {
  textStream: ReadableStream<string>;  // Emits text-delta content
  fullStream: ReadableStream<TextStreamPart<ToolSet>>; // Emits start/start-step/finish-step/text/finish events
}

createRLMTool

Factory function to create RLM as an AI SDK-compatible tool.

`createRLMTool(config?: RLMToolConfig)`

import type { LanguageModel } from 'ai';

function createRLMTool(config?: {
  model?: LanguageModel;    // Root agent model
  subModel?: LanguageModel; // Sub-LLM model
  maxIterations?: number;   // Max iterations (default: 20)
  maxLLMCalls?: number;     // Max LLM calls (default: 50)
  maxOutputChars?: number;  // Max output chars (default: 100000)
  description?: string;     // Extra guidance appended to the tool description
  logger?: RLMLogger;       // Optional injected logger
  logLevel?: RLMLogLevel;   // Log level for internal diagnostics
}): Tool

Tool Input Schema:

{
  context: string | string[] | Record<string, unknown>;
  query: string;
  maxIterations?: number;   // Optional override
  maxLLMCalls?: number;     // Optional override
}
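A runtime check that mirrors this input shape can be useful when constructing tool inputs by hand, for instance in tests. The sketch below is a plain type guard written from the schema above; isRLMToolInput and its helpers are hypothetical names, and the library's own validation may be stricter.

```typescript
type RLMContext = string | string[] | Record<string, unknown>;

interface RLMToolInput {
  context: RLMContext;
  query: string;
  maxIterations?: number;
  maxLLMCalls?: number;
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function isOptionalNumber(value: unknown): boolean {
  return value === undefined || typeof value === 'number';
}

// Returns true when `value` matches the documented tool input shape.
function isRLMToolInput(value: unknown): value is RLMToolInput {
  if (!isPlainObject(value)) return false;
  const { context, query, maxIterations, maxLLMCalls } = value;
  const contextOk =
    typeof context === 'string' ||
    (Array.isArray(context) && context.every((item) => typeof item === 'string')) ||
    isPlainObject(context);
  return (
    contextOk &&
    typeof query === 'string' &&
    isOptionalNumber(maxIterations) &&
    isOptionalNumber(maxLLMCalls)
  );
}

console.log(isRLMToolInput({ context: ['a', 'b'], query: 'Which line mentions b?' }));
console.log(isRLMToolInput({ context: 42, query: 'invalid context' }));
```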

Tool Output:

{
  answer: string;           // The generated answer
  iterations: number;       // Number of iterations
  stepsTaken: number;       // Number of steps executed
}

RLMContext

Context can be any of these formats:

type RLMContext = string | string[] | Record<string, unknown>;

string: Raw text document
string[]: Array of lines or documents
Record<string, unknown>: JSON/structured data
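Because the three formats call for different first steps, exploration code usually begins by checking which one it received. The sketch below shows one way to branch on the type; describeContext is a hypothetical helper, and the summary wording is arbitrary.

```typescript
type RLMContext = string | string[] | Record<string, unknown>;

// Returns a one-line summary of the context's shape and size.
function describeContext(context: RLMContext): string {
  if (typeof context === 'string') {
    return `string, ${context.length} chars, ${context.split('\n').length} line(s)`;
  }
  if (Array.isArray(context)) {
    const total = context.reduce((sum, item) => sum + item.length, 0);
    return `array, ${context.length} item(s), ${total} chars total`;
  }
  const keys = Object.keys(context);
  // Show at most five keys so large objects stay readable.
  return `object, ${keys.length} key(s): ${keys.slice(0, 5).join(', ')}`;
}

console.log(describeContext('a\nb'));
console.log(describeContext(['ab', 'c']));
console.log(describeContext({ x: 1, y: 2 }));
```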

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      RLMAgent Class                         │
├─────────────────────────────────────────────────────────────┤
│  ┌───────────────────────────────────────────────────────┐  │
│  │              REPL Environment (QuickJS)               │  │
│  │  - Sandboxed JavaScript execution                     │  │
│  │  - llm_query() for sub-LLM semantic analysis          │  │
│  │  - 30s timeout protection                             │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │              generate() Method                        │  │
│  │  1. Generate reasoning + JS code                      │  │
│  │  2. Execute in sandboxed context                      │  │
│  │  3. Process llm_query markers → real LLM calls        │  │
│  │  4. Check for FINAL() answer                          │  │
│  │  5. Repeat or return answer                           │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │              stream() Method                          │  │
│  │  - Delegates to generate()                            │  │
│  │  - Emits start-step / finish-step progress events     │  │
│  │  - Emits text-start / text-delta / text-end / finish  │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                              │
                              │ createRLMTool()
                              ▼
                    ┌──────────────────────┐
                    │    AI SDK Tool        │
                    │ - Tool interface      │
                    │ - Input validation    │
                    │ - Auto-execution      │
                    └──────────────────────┘

Examples

Run the examples:

# Basic agent examples
bun run examples/basic-usage.ts

# Tool integration examples
bun run examples/tool-usage.ts

# Individual examples
bun run -e "import { example1SimpleTextSearch } from './examples/basic-usage.ts'; example1SimpleTextSearch()"

CLI Codebase Search

This repo includes a local CLI script for searching a codebase with RLMAgent.

The CLI uses a ToolLoopAgent orchestrator with these tools:

list_files
search_files
read_file
analyze_with_rlm (deep analysis on selected files)

This avoids preloading the entire repository into one context window.

npm run code-search -- ./path/to/codebase "Where is authentication handled?"

You can also run the bin directly:

node ./bin/rlm-codebase-search.js ./path/to/codebase "How are API routes defined?"

Required environment variable:

export OPENAI_API_KEY="your_key_here"

Example Files

examples/basic-usage.ts: Agent API examples (generate, stream, callbacks)
examples/tool-usage.ts: Tool API examples (with generateText, ToolLoopAgent)
examples/document-comparison.ts: Document diffing example
examples/data-transformation.ts: Data extraction and transformation

License

MIT

References

Paper: "Recursive Language Models" (Zhang, Kraska, Khattab, 2025)
AI SDK Documentation: https://sdk.vercel.ai/docs
