# rlm-ts
A TypeScript re-implementation of Recursive Language Models that runs entirely in the browser. Based on Prime Intellect's RLM paper and reference implementation.
## What is an RLM?
Standard LLMs degrade on long contexts — performance drops and cost scales linearly. Recursive Language Models flip this: instead of dumping all data into one prompt, the model gets a persistent JavaScript REPL and can spawn sub-LLM calls to process data in parallel. The main model's context stays small; heavy data processing is delegated.
Three primitives:
- Persistent REPL — the model writes JavaScript in ```repl``` blocks. Variables declared with `var` persist across executions, building up state incrementally.
- `llm_query(prompt)` — spawns a sub-LLM call for summarization, extraction, or interpretation. The sub-LLM handles the data; the main model only sees the summary.
- `llm_batch(prompts)` — runs multiple sub-LLM calls concurrently for parallel analysis.
The model iterates — writing code, inspecting results, delegating to sub-LLMs, refining its analysis — until it produces a final answer.
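For illustration, a single iteration's ```repl``` block might look like this. The actual code is written by the model at run time, and `logs` here stands in for whatever data you inject as a tool:

```repl
// Work on a manageable slice of the injected data
var chunk = logs.slice(0, 50)
// Delegate interpretation of the raw records to a cheap sub-LLM
var summary = llm_query("Summarize these log entries: " + JSON.stringify(chunk))
console.log(summary)
```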
## How it works
````
Browser                                      Your Server
┌─────────────────────────────────┐          ┌──────────────────────┐
│                                 │          │                      │
│  RLM Core (main thread)         │          │  /api/rlm            │
│  ├── Iteration loop             │          │  ├── Main calls      │
│  ├── Parse ```repl``` blocks    ├──fetch───┤  │   (Sonnet)        │
│  ├── Build message history      │          │  ├── Sub-LLM calls   │
│  └── Detect FINAL answer        │          │  │   (Haiku)         │
│         │                       │          │  └── Batch calls     │
│         ▼                       │          │                      │
│  Web Worker (sandbox)           │          └──────────┬───────────┘
│  ├── eval() code execution      │                     │
│  ├── Persistent var state       │                     ▼
│  ├── llm_query() via Atomics    │             LLM Provider API
│  └── Console capture            │
│                                 │
└─────────────────────────────────┘
````

The sandbox runs in a Web Worker — an isolated thread with no DOM, no network access, and no main-thread scope. When the model calls `llm_query()` inside the sandbox, it blocks synchronously via SharedArrayBuffer + `Atomics.wait()` while the main thread fetches from your API route and writes the response back to shared memory.
This means the model can write natural synchronous code:
```js
var data = sleep_data.slice(0, 30)
var analysis = llm_query("Analyze these sleep records: " + JSON.stringify(data))
console.log(analysis)
```

Even though the LLM call is async under the hood.
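`llm_batch` follows the same pattern. A sketch of a parallel analysis, assuming (as in the proxy route below) that it returns one string per prompt, in order:

```repl
// Chunk the data and fan the analysis out to parallel sub-LLM calls
var chunks = [sleep_data.slice(0, 30), sleep_data.slice(30, 60)]
var prompts = chunks.map(function (c) {
  return "Summarize these sleep records: " + JSON.stringify(c)
})
var summaries = llm_batch(prompts) // one result per prompt
console.log(summaries.join("\n---\n"))
```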
## Install

```bash
npm install rlm-ts
```

## Usage
### 1. Create an API route
The RLM needs a server-side proxy to keep API keys off the client. Here's a Next.js example using the Vercel AI SDK:
```ts
// app/api/rlm/route.ts
import { generateText } from "ai";
import { gateway } from "@ai-sdk/gateway";

export async function POST(req: Request) {
  const body = await req.json();

  if (body.type === "batch") {
    const results = await Promise.all(
      body.prompts.map((prompt: string) =>
        generateText({
          model: gateway(body.model || "anthropic/claude-haiku-4.5"),
          prompt,
        }).then((r) => r.text)
      )
    );
    return Response.json({ results });
  }

  if (body.messages) {
    const { text } = await generateText({
      model: gateway(body.model || "anthropic/claude-sonnet-4-5"),
      messages: body.messages,
    });
    return Response.json({ text });
  }

  const { text } = await generateText({
    model: gateway(body.model || "anthropic/claude-haiku-4.5"),
    prompt: body.prompt,
  });
  return Response.json({ text });
}
```

The route handles three request shapes:
- Messages array — main RLM loop calls (uses the strong model)
- Prompt string — sub-LLM calls from the sandbox (uses the cheap model)
- Batch — parallel sub-LLM calls
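Put differently, the proxy needs to accept roughly these request and response shapes. The types below are a sketch inferred from the example route above, not types exported by rlm-ts:

```ts
// Inferred request/response shapes for the /api/rlm proxy (illustrative)
type RLMProxyRequest =
  | { messages: { role: string; content: string }[]; model?: string } // main RLM loop
  | { prompt: string; model?: string }                                // sub-LLM call
  | { type: "batch"; prompts: string[]; model?: string };             // llm_batch

type RLMProxyResponse =
  | { text: string }       // main loop and single sub-LLM calls
  | { results: string[] }; // batch calls
```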
### 2. Add required headers
SharedArrayBuffer requires these HTTP headers:
```js
// next.config.mjs
const nextConfig = {
  headers: async () => [
    {
      source: "/(.*)",
      headers: [
        { key: "Cross-Origin-Opener-Policy", value: "same-origin" },
        { key: "Cross-Origin-Embedder-Policy", value: "require-corp" },
      ],
    },
  ],
};

export default nextConfig;
```

### 3. Run it
```ts
import { RLM } from "rlm-ts";

const rlm = new RLM({
  endpoint: "/api/rlm",
  tools: {
    users: {
      tool: [{ name: "Alice", age: 30 }, { name: "Bob", age: 25 }],
      description: "User records: {name, age}",
    },
  },
  onIteration: (iteration) => {
    console.log(`Iteration ${iteration.index + 1}:`, iteration.response);
  },
});

const result = await rlm.run("Who is the oldest user?");
console.log(result.answer);
```

## API
### `new RLM(options)`
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| endpoint | string | required | URL of your LLM proxy route |
| model | string | "anthropic/claude-sonnet-4-5" | Model for the main RLM loop |
| subModel | string | "anthropic/claude-haiku-4.5" | Model for sub-LLM calls |
| maxIterations | number | 20 | Max loop iterations before forcing a final answer |
| maxOutputChars | number | 20000 | Truncate execution output beyond this length |
| maxTimeout | number | — | Max total run time in ms |
| maxErrors | number | 5 | Stop after this many consecutive sandbox errors |
| systemPrompt | string | Built-in prompt | Override the system prompt |
| tools | Record<string, ToolDefinition> | {} | Data injected into the sandbox |
| onIteration | (iteration: RLMIteration) => void | — | Callback after each iteration |
| onSubLLMCall | (req: { prompt, model? }) => void | — | Callback when a sub-LLM call is made |
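For example, a more fully configured instance might look like this. The values are illustrative; the model names must match whatever your proxy route expects:

```ts
import { RLM } from "rlm-ts";

const rlm = new RLM({
  endpoint: "/api/rlm",
  model: "anthropic/claude-sonnet-4-5",   // main loop model
  subModel: "anthropic/claude-haiku-4.5", // sub-LLM model
  maxIterations: 10,
  maxOutputChars: 10000,
  maxErrors: 3,
  onIteration: (iteration) => console.log(`iteration ${iteration.index + 1}`),
  onSubLLMCall: (req) => console.log("sub-LLM:", req.prompt.slice(0, 80)),
});
```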
### `rlm.run(prompt): Promise<RLMResult>`
Returns:
```ts
interface RLMResult {
  answer: string;             // The final answer
  iterations: RLMIteration[]; // Full iteration history
  totalTime: number;          // Total execution time in ms
}
```

## Tools
Tools are data values injected into the sandbox as global variables. The model can access them by name in its code.
```ts
tools: {
  // With description (shown in system prompt)
  my_data: {
    tool: [1, 2, 3],
    description: "An array of numbers",
  },

  // Plain value
  config: { threshold: 0.5 },
}
```

Tools must be JSON-serializable (no functions — they can't be sent to a Web Worker via postMessage).
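Inside the sandbox, the model can then refer to each tool by its key as a global. An illustrative model-written block using the `my_data` and `config` tools above:

```repl
// Tool values from the options object are plain globals in the sandbox
var above = my_data.filter(function (n) { return n > config.threshold })
console.log(above) // [1, 2, 3]
```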
## Architecture
### Why eval() in a Web Worker?
The Python RLM uses exec() with restricted builtins. We use eval() in a Web Worker, which gives us:
- Variable persistence — `var` declarations in `eval()` become Worker globals and survive across `executeCode()` calls. This is critical for the RLM loop, where the model builds up state incrementally (see the sketch below).
- Natural sandboxing — Web Workers have no DOM, no `document`, no `window`, no `localStorage`, no cookies. Network access (`fetch`, `XMLHttpRequest`) is explicitly blocked.
- No infrastructure — the Python version needs Docker containers or Modal sandboxes. The Worker IS the sandbox.
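A minimal sketch of the persistence idea (not the actual rlm-ts worker code): indirect `eval` runs in the Worker's global scope, so `var` declarations made in one execution are still visible in the next:

```ts
// worker-sketch.ts — illustrative only
self.onmessage = (e: MessageEvent<string>) => {
  // Indirect eval executes in the Worker's global scope, so
  // `var x = 1` in one message is still defined for the next one.
  const result = (0, eval)(e.data);
  self.postMessage(String(result));
};
```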
### Why SharedArrayBuffer + Atomics?
The model writes synchronous code (`var result = llm_query("...")`), but LLM calls require an async `fetch()`. `Atomics.wait()` blocks the Worker thread while the main thread handles the fetch and writes the response to shared memory. This lets the model write natural code without `async`/`await`.
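A sketch of that blocking pattern on the Worker side (not the library's actual internals; the memory layout here, a flag word plus a length word followed by UTF-8 bytes, is just one way to do it):

```ts
// Worker-side sketch: block until the main thread writes the LLM response
function blockingQuery(sab: SharedArrayBuffer, prompt: string): string {
  const i32 = new Int32Array(sab);
  Atomics.store(i32, 0, 0);                   // clear the "ready" flag
  postMessage({ type: "llm_query", prompt }); // ask the main thread to fetch
  Atomics.wait(i32, 0, 0);                    // block while the flag is still 0
  const len = Atomics.load(i32, 1);           // response length written by main thread
  const bytes = new Uint8Array(len);
  bytes.set(new Uint8Array(sab, 8, len));     // copy out of shared memory
  return new TextDecoder().decode(bytes);
}
```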
This requires the `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp` headers.
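A quick way to confirm the headers took effect: `crossOriginIsolated` must be `true` in the page before the sandbox can use SharedArrayBuffer.

```ts
// Run in the page before constructing the RLM
if (!crossOriginIsolated) {
  throw new Error("SharedArrayBuffer unavailable: check COOP/COEP headers");
}
```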
## Differences from the Python implementation
| | Python RLM | rlm-ts |
|---|---|---|
| Sandbox | exec() + restricted builtins, Docker, Modal | Web Worker + eval() |
| IPC | TCP sockets (LMHandler) | postMessage + SharedArrayBuffer |
| State persistence | dill serialization, self.locals dict | Worker globals via eval() |
| Sub-LLM sync | Blocking TCP request | Atomics.wait() |
| Recursion | rlm_query() spawns child RLM with own REPL | Not yet (v0.1) |
| Filesystem | tempfile, open(), os | None (browser) |
## Development
```bash
npm run build   # Build worker + main package
npm run sync    # Build and copy to ../dashboard/node_modules/rlm-ts
npm test        # Run tests
```

## License
MIT
