Kado RLM - Recursive Language Model Library
A production-ready Node.js/TypeScript library implementing Recursive Language Models (RLMs) for handling arbitrarily long contexts. Based on the RLM research paper, this library enables LLMs to process inputs up to two orders of magnitude beyond their native context windows.
Features
- RLM Orchestration: Treats long prompts as external environment data, allowing LLMs to programmatically examine, decompose, and recursively call themselves over context snippets
- Pluggable Tool System: Register any RAG, knowledge base, database, or API as callable functions
- Multi-Provider Support: OpenAI, Anthropic, and Google AI out of the box
- Secure Sandbox: V8 isolates for safe execution of LLM-generated code
- Full Observability: Prometheus metrics, Loki logging, and Tempo tracing via Grafana stack
- Built-in Benchmarking: Compare RLM performance against base LLM calls
- Production Ready: Circuit breakers, retry logic, rate limiting, and health checks
Installation
As a Library (Recommended)
npm install kado-rlm
# or
pnpm add kado-rlm
# or
yarn add kado-rlm
From Source
git clone https://github.com/your-org/kado-rlm.git
cd kado-rlm
pnpm install
pnpm build
Quick Start
Library Usage
import {
  RLMOrchestrator,
  ContextManager,
  createLLMClient,
  defineTools
} from 'kado-rlm';

// 1. Create an LLM client
const llmClient = createLLMClient('openai', { model: 'gpt-4o' });

// 2. Create a context manager with your long content
const contextManager = new ContextManager(yourLongDocument);

// 3. (Optional) Define custom tools for RAG, databases, etc.
const tools = defineTools([
  {
    name: 'search_docs',
    description: 'Search the knowledge base for relevant information',
    parameters: [
      { name: 'query', type: 'string', description: 'Search query', required: true },
      { name: 'limit', type: 'number', description: 'Max results', default: 5 },
    ],
    handler: async (query: string, limit = 5) => {
      return await yourVectorDB.search(query, { topK: limit });
    },
  },
]);

// 4. Create the orchestrator
const orchestrator = new RLMOrchestrator({
  llmClient,
  contextManager,
  customTools: tools,
  maxIterations: 20,
  maxDepth: 2,
});

// 5. Run!
const result = await orchestrator.run('What are the key findings in this document?');
console.log(result.answer);
console.log(`Completed in ${result.usage.iterations} iterations`);
Running as a Service
# Set up environment
cp env.example .env
# Edit .env with your API keys
# Start development server
pnpm dev
# Or production
pnpm build
pnpm start
Then make HTTP requests:
curl -X POST http://localhost:3000/v1/completion \
-H "Content-Type: application/json" \
-d '{
"prompt": "What is the secret code mentioned in the text?",
"context": "... your long context here ...",
"provider": "openai",
"model": "gpt-4o"
}'
Custom Tools
The pluggable tool system lets you register any external service as a function the LLM can call during reasoning.
Defining Tools
import { defineTools } from 'kado-rlm';
const tools = defineTools([
  // RAG / Vector Search
  {
    name: 'rag_search',
    description: 'Search the vector database for semantically similar documents',
    parameters: [
      { name: 'query', type: 'string', description: 'Natural language search query', required: true },
      { name: 'topK', type: 'number', description: 'Number of results to return', default: 10 },
    ],
    returns: 'Array of { content, score, metadata }',
    handler: async (query: string, topK = 10) => {
      const embedding = await embeddings.embed(query);
      return await pinecone.query({ vector: embedding, topK });
    },
  },
  // Database Lookup
  {
    name: 'get_customer',
    description: 'Fetch customer details from the database',
    parameters: [
      { name: 'customerId', type: 'string', description: 'Customer ID', required: true },
    ],
    handler: async (customerId: string) => {
      return await db.customers.findById(customerId);
    },
  },
  // External API
  {
    name: 'check_weather',
    description: 'Get current weather for a location',
    parameters: [
      { name: 'city', type: 'string', description: 'City name', required: true },
    ],
    handler: async (city: string) => {
      const response = await fetch(`https://api.weather.com/v1/current?city=${encodeURIComponent(city)}`);
      return response.json();
    },
  },
]);
The LLM can then use these tools in its generated code:
// LLM-generated sandbox code
const docs = await rag_search("authentication flow", 5);
const customer = await get_customer("cust_12345");
for (const doc of docs) {
  print(`Found: ${doc.content.slice(0, 100)}...`);
}

giveFinalAnswer({
  message: "Based on the documentation and customer data...",
  data: { sources: docs.map(d => d.metadata.source) }
});
See the Tools Guide for detailed documentation on registering tools, and the RAG Integration Guide for RAG-specific patterns.
API Endpoints
| Method | Path | Description |
|--------|------|-------------|
| POST | /v1/completion | Run RLM completion with context |
| POST | /v1/chat | Direct LLM call (baseline comparison) |
| POST | /v1/benchmark | Start benchmark run |
| GET | /v1/benchmark/:id | Get benchmark results |
| GET | /v1/models | List available models |
| GET | /health | Liveness probe |
| GET | /ready | Readiness probe |
| GET | /metrics | Prometheus metrics |
| GET | /docs | Swagger UI documentation |
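For example, the completion endpoint can be called from Node 18+ as in the sketch below. The request body mirrors the curl example above; the answer field in the response is an assumption based on the result shape shown in Quick Start.
// Minimal sketch of a client call to POST /v1/completion (Node 18+ global fetch).
// The response is assumed to include an `answer` field, mirroring the Quick Start result.
const res = await fetch('http://localhost:3000/v1/completion', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'What is the secret code mentioned in the text?',
    context: '... your long context here ...',
    provider: 'openai',
    model: 'gpt-4o',
  }),
});
const completion = await res.json();
console.log(completion.answer); // assumed response field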
Benchmarking
The built-in benchmark system compares RLM performance against direct LLM calls.
Via API
curl -X POST http://localhost:3000/v1/benchmark \
-H "Content-Type: application/json" \
-d '{
"tasks": ["sniah", "multi-niah", "aggregation"],
"sizes": [8000, 16000, 32000, 64000],
"provider": "openai",
"model": "gpt-4o",
"runs": 3
}'
Via CLI
# Run benchmark suite
pnpm benchmark --tasks sniah,aggregation --sizes 8000,16000,32000 --provider openai --model gpt-4o
# Output as JSON
pnpm benchmark --output json > results.json
Task Types
| Task | Description | Complexity |
|------|-------------|------------|
| sniah | Single needle-in-haystack | Constant |
| multi-niah | Multiple needles | Linear |
| aggregation | Count/sum across context | Linear |
| pairwise | Find matching pairs | Quadratic |
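To make the task types concrete, the sketch below shows how a single needle-in-haystack (sniah) case can be built: one fact is buried in filler text of a target size, and the answer is scored on whether it recovers that fact. This is an illustration only, not the library's internal benchmark generator.
// Illustrative sniah task: bury one "needle" fact in filler text of a target size
// and score whether the model's answer recovers it. Not the library's internal generator.
function makeSniahTask(sizeChars: number) {
  const needle = 'The secret code is 7431.';
  const filler = 'Lorem ipsum dolor sit amet. '.repeat(Math.ceil(sizeChars / 28));
  const insertAt = Math.floor(filler.length / 2);
  return {
    context: filler.slice(0, insertAt) + needle + filler.slice(insertAt),
    prompt: 'What is the secret code mentioned in the text?',
    score: (answer: string) => (answer.includes('7431') ? 1 : 0),
  };
}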
Observability
Local Development with Grafana Stack
cd docker
docker-compose up -d
# Access services:
# - Kado RLM: http://localhost:3000
# - Grafana: http://localhost:3001 (admin/admin)
# - Prometheus: http://localhost:9090
Metrics
Key metrics exposed at /metrics:
- rlm_request_duration_seconds - Request latency histogram
- rlm_iterations_total - RLM iteration count
- rlm_recursion_depth - Recursion depth distribution
- rlm_tokens_total - Token usage by type
- rlm_errors_total - Error counts by type
- rlm_circuit_breaker_state - Circuit breaker status
Logging
Logs are structured JSON with correlation IDs. In development, they are pretty-printed via pino-pretty.
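An illustrative log line (field names are examples, not the exact schema):
{"level":"info","time":"2024-05-01T12:00:00.000Z","correlationId":"req-9f2c","msg":"completion finished","iterations":4,"durationMs":1820}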
Tracing
OpenTelemetry traces exported to Tempo:
- Span per API request
- Child spans for LLM calls, sandbox executions, recursive sub-calls
- Automatic trace ID propagation
Configuration
Configure via environment variables (see env.example):
# Server
PORT=3000
NODE_ENV=development
# LLM Providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
DEFAULT_PROVIDER=openai
DEFAULT_MODEL=gpt-4o
# RLM Settings
MAX_ITERATIONS=20
MAX_RECURSION_DEPTH=3
SANDBOX_TIMEOUT_MS=5000
SANDBOX_MEMORY_MB=128
# Observability
METRICS_ENABLED=true
LOKI_ENABLED=false
TRACING_ENABLED=false
Architecture
┌─────────────────────────────────────────────────────────┐
│ API Layer │
│ (Fastify + Rate Limiting + Auth + Swagger) │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ RLM Orchestrator │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Context │ │ Sandbox │ │ Custom │ │
│ │ Manager │ │ (V8 Isolate)│ │ Tools │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ LLM Providers │
│ ┌────────┐ ┌───────────┐ ┌────────┐ │
│ │ OpenAI │ │ Anthropic │ │ Google │ │
│ └────────┘ └───────────┘ └────────┘ │
└─────────────────────────────────────────────────────────┘
Development
# Install dependencies
pnpm install
# Run in development mode
pnpm dev
# Type check
pnpm typecheck
# Run tests
pnpm test
# Run tests with coverage
pnpm test:coverage
# Build for production
pnpm build
# Start production server
pnpm start
Publishing to npm
First-Time Setup
Create an npm account at npmjs.com
Log in to npm:
npm login
Update package.json:
- Change name if kado-rlm is taken (e.g., @your-org/kado-rlm)
- Update repository.url to your actual repo
- Set the author field
Publishing
# 1. Make sure tests pass
pnpm test:run
# 2. Build the package
pnpm build
# 3. Verify what will be published
npm pack --dry-run
# 4. Publish (first time)
npm publish
# 5. For scoped packages (@your-org/kado-rlm)
npm publish --access public
Versioning
# Patch release (bug fixes): 0.1.0 → 0.1.1
npm version patch
# Minor release (new features): 0.1.0 → 0.2.0
npm version minor
# Major release (breaking changes): 0.1.0 → 1.0.0
npm version major
# Then publish
npm publish
Automated Publishing (GitHub Actions)
Create .github/workflows/publish.yml:
name: Publish to npm
on:
  release:
    types: [created]

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v2
        with:
          version: 8
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          registry-url: 'https://registry.npmjs.org'
      - run: pnpm install
      - run: pnpm test:run
      - run: pnpm build
      - run: npm publish
        env:
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
Add your npm token as a GitHub secret named NPM_TOKEN.
Production Deployment
Docker
# Build image
docker build -f docker/Dockerfile -t kado-rlm .
# Run container
docker run -p 3000:3000 \
-e OPENAI_API_KEY=sk-... \
-e NODE_ENV=production \
kado-rlm
Health Checks
- /health - Basic liveness (process running)
- /ready - Readiness (providers configured, memory OK)
Configure Kubernetes probes:
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 5
Stability Features
Circuit Breaker
Automatic circuit breaking for LLM provider failures (illustrated in the sketch below):
- Opens after 5 consecutive failures
- Half-open after 30s cooldown
- Per-provider tracking
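The sketch below models this behavior with the defaults listed above; it is a conceptual illustration, not the library's internal implementation.
// Conceptual circuit breaker: open after 5 consecutive failures, allow a trial
// request (half-open) once a 30s cooldown has elapsed. Illustrative only.
class ConceptualCircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private cooldownMs = 30_000) {}

  canRequest(): boolean {
    if (this.failures < this.threshold) return true;        // closed
    return Date.now() - this.openedAt >= this.cooldownMs;   // half-open after cooldown
  }

  recordSuccess(): void { this.failures = 0; }               // close the circuit

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = Date.now(); // (re)open the circuit
  }
}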
Retry Logic
Exponential backoff with jitter for transient errors (sketched below):
- 3 retries by default
- Handles rate limits, timeouts, 5xx errors
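As a rough sketch of the retry policy described above (3 retries, exponential backoff plus random jitter); the library's actual retry internals, including error classification, may differ.
// Conceptual retry helper: exponential backoff plus random jitter, 3 retries by default.
// Illustrative only; classification of rate limits, timeouts, and 5xx errors is omitted.
async function withRetry<T>(fn: () => Promise<T>, retries = 3, baseMs = 500): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;
      const backoff = baseMs * 2 ** attempt;          // 500ms, 1s, 2s, ...
      const jitter = Math.random() * backoff * 0.5;   // add up to 50% random jitter
      await new Promise((resolve) => setTimeout(resolve, backoff + jitter));
    }
  }
}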
Resource Limits
| Resource | Default | Configurable |
|----------|---------|--------------|
| Max iterations | 20 | Yes |
| Max recursion depth | 3 | Yes |
| Sandbox CPU time | 5s | Yes |
| Sandbox memory | 128MB | Yes |
| Request timeout | 300s | Yes |
| Max context size | 10MB | Yes |
References
- Recursive Language Models (RLM) Paper
- RLM Research Paper PDF
- Technical Design Document
- Tools Guide — Detailed guide on registering custom tools
- RAG Integration Guide — Patterns for RAG integration
License
MIT
