RLLM: Recursive Large Language Models (TypeScript)
A TypeScript implementation of Recursive Language Models for processing large contexts with LLMs.
Inspired by Cloudflare's Code Mode approach.
Key differences from the Python version:
- V8 isolates instead of subprocess/TCP
- Zod schema support for typed context
- TypeScript-native
Installation
pnpm add rllm
# or
npm install rllm
Demo
RLLM analyzing a node_modules directory — the LLM writes JavaScript to parse dependencies, query sub-LLMs in parallel, and synthesize a final answer:

Built with Gemini Flash 3. See the full interactive example in examples/node-modules-viz/.
Quick Start
The LLM writes JavaScript code that runs in a secure V8 isolate:
import { createRLLM } from 'rllm';
const rlm = createRLLM({
model: 'gpt-4o-mini',
verbose: true,
});
// Full RLM completion - prompt first, context in options
const result = await rlm.completion(
"What are the key findings in this research?",
{ context: hugeDocument }
);
console.log(result.answer);
console.log(`Iterations: ${result.iterations}, Sub-LLM calls: ${result.usage.subCalls}`);
Structured Context with Zod Schema
For structured data, you can provide a Zod schema. The LLM will receive type information, enabling it to write better code:
import { z } from 'zod';
import { createRLLM } from 'rllm';
// Define schema for your data
const DataSchema = z.object({
users: z.array(z.object({
id: z.string(),
name: z.string(),
role: z.enum(['admin', 'user', 'guest']),
activity: z.array(z.object({
date: z.string(),
action: z.string(),
})),
})),
settings: z.record(z.string(), z.boolean()),
});
const rlm = createRLLM({ model: 'gpt-4o-mini' });
const result = await rlm.completion(
"How many admin users are there? What actions did they perform?",
{
context: myData,
contextSchema: DataSchema, // LLM sees the type structure!
}
);
The LLM will know it can access context.users, context.settings, etc. with full type awareness.
For an unstructured text context like the Quick Start example, the LLM will write code like:
// LLM-generated code runs in V8 isolate
const chunks = [];
for (let i = 0; i < context.length; i += 50000) {
chunks.push(context.slice(i, i + 50000));
}
const findings = await llm_query_batched(
chunks.map(c => `Extract key findings from:\n${c}`)
);
const summary = await llm_query(`Combine findings:\n${findings.join('\n')}`);
print(summary);
giveFinalAnswer({ message: summary });
API Reference
createRLLM(options)
Create an RLLM instance with sensible defaults.
const rlm = createRLLM({
model: 'gpt-4o-mini', // Model name
provider: 'openai', // 'openai' | 'anthropic' | 'gemini' | 'openrouter' | 'custom'
apiKey: process.env.KEY, // Optional, uses env vars by default
baseUrl: undefined, // Optional, required for 'custom' provider
verbose: true, // Enable logging
});
Custom Provider (OpenAI-Compatible APIs)
Use the custom provider to connect to any OpenAI-compatible API (e.g., vLLM, Ollama, LM Studio, Azure OpenAI):
const rlm = createRLLM({
provider: 'custom',
model: 'llama-3.1-8b',
baseUrl: 'http://localhost:8000/v1', // Required for custom provider
apiKey: 'your-api-key', // Optional, depends on your API
verbose: true,
});
Note: When using provider: 'custom', the baseUrl parameter is required. An error will be thrown if it's not provided.
RLLM Methods
| Method | Description |
|--------|-------------|
| rlm.completion(prompt, options) | Full RLM completion with code execution |
| rlm.chat(messages) | Direct LLM chat |
| rlm.getClient() | Get underlying LLM client |
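For the lighter-weight methods, a minimal sketch (the { role, content } message shape and the string return value are assumptions based on typical OpenAI-style clients, not documented here):

// Direct chat with the main model, bypassing the RLM loop and code execution
const reply = await rlm.chat([
  { role: 'user', content: 'Summarize Recursive Language Models in one sentence.' },
]);
console.log(reply);

// Grab the underlying LLM client for custom calls
const client = rlm.getClient();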
CompletionOptions
| Option | Type | Description |
|--------|------|-------------|
| context | string \| T | The context data available to LLM-generated code |
| contextSchema | ZodType<T> | Optional Zod schema describing context structure |
| onEvent | (event) => void | Optional callback for real-time execution events (see Real-time Events) |
Sandbox Bindings
The V8 isolate provides these bindings to LLM-generated code:
| Binding | Description |
|---------|-------------|
| context | The loaded context data |
| llm_query(prompt, model?) | Query sub-LLM |
| llm_query_batched(prompts, model?) | Batch query sub-LLMs |
| giveFinalAnswer({ message, data? }) | Return final answer |
| print(...) | Console output |
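As an illustration, code emitted by the main LLM could combine these bindings as in the sketch below (the model override string and the shape of the data payload are illustrative assumptions):

// LLM-generated code: query a sub-LLM on a slice of the context,
// then return a final answer with an extra data payload
const headline = await llm_query(
  `Give a one-line summary of:\n${context.slice(0, 2000)}`,
  'gpt-4o-mini' // optional model override
);
print('Headline:', headline);
giveFinalAnswer({ message: headline, data: { contextChars: context.length } });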
Real-time Events
Subscribe to execution events for visualizations, debugging, or streaming UIs:
const result = await rlm.completion("Analyze this data", {
context: myData,
onEvent: (event) => {
switch (event.type) {
case "iteration_start":
console.log(`Starting iteration ${event.iteration}`);
break;
case "llm_query_start":
console.log("LLM thinking...");
break;
case "code_execution_start":
console.log(`Executing:\n${event.code}`);
break;
case "final_answer":
console.log(`Answer: ${event.answer}`);
break;
}
}
});
| Event Type | Description |
|------------|-------------|
| iteration_start | New iteration beginning |
| llm_query_start | Main LLM query starting |
| llm_query_end | Main LLM response received |
| code_execution_start | V8 isolate executing code |
| code_execution_end | Code execution finished |
| final_answer | giveFinalAnswer() called with answer |
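The same onEvent hook can also drive lightweight instrumentation. A minimal sketch that relies only on event.type and local timestamps:

// Record a simple timeline of execution events for profiling or a streaming UI
const timeline: { type: string; at: number }[] = [];

await rlm.completion('Analyze this data', {
  context: myData,
  onEvent: (event) => timeline.push({ type: event.type, at: Date.now() }),
});

console.table(timeline); // e.g. iteration_start -> llm_query_start -> ... -> final_answer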
Architecture
┌─────────────────────────────────────────────────────────────┐
│ RLLM TypeScript │
│ │
│ ┌─────────────┐ ┌──────────────────────────────────┐ │
│ │ RLLM │ │ V8 Isolate (Sandbox) │ │
│ │ Class │───▶│ │ │
│ └─────────────┘ │ • context (injected data) │ │
│ │ │ • llm_query() ──┐ │ │
│ │ │ • llm_query_batched() │ │
│ ▼ │ • print() / console │ │
│ ┌─────────────┐ │ • giveFinalAnswer() │ │
│ │ LLMClient │◀───┼──────────────────┘ │ │
│ │ (OpenAI) │ │ │ │
│ └─────────────┘ │ LLM-generated JS code runs here │ │
│ └──────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
No TCP. No subprocess. Direct function calls via bindings.
Why V8 Isolates? (Not TCP/Containers)
The Python RLLM uses subprocess + TCP sockets for code execution. We use V8 isolates instead:
Python RLLM: LLM → Python exec() → subprocess → TCP socket → LMHandler
TypeScript: LLM → V8 isolate (same process) → direct function calls
Benefits:
- No TCP/network - Direct function calls via bindings (see the sketch after this list)
- Fast startup - Isolates spin up in milliseconds
- Secure - V8's built-in memory isolation
- Simple - No containers, no socket servers
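The binding pattern itself is straightforward. Here is a minimal sketch of exposing a host function to sandboxed code as a direct call, assuming the isolated-vm package (the README does not name rllm's actual sandbox library, so treat this as illustrative):

import ivm from 'isolated-vm';

// Create an isolate with a memory cap and a fresh context
const isolate = new ivm.Isolate({ memoryLimit: 128 });
const context = await isolate.createContext();

// Expose a host function into the sandbox: no sockets, no subprocess
await context.global.set('print', new ivm.Callback((msg: string) => console.log(msg)));

// Run untrusted (LLM-generated) code against that binding
const script = await isolate.compileScript(`print('hello from the isolate');`);
await script.run(context);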
Development
# Install dependencies
pnpm install
# Build
pnpm build
# Run example
pnpm example
# Run tests
pnpm test
License
MIT - Same as the original Python RLLM.
Credits
Based on the Recursive Language Models paper and Python implementation by Alex Zhang et al.
Reference: RLM Blogpost
