Kado RLM - Recursive Language Model Library
A production-ready Node.js/TypeScript library implementing Recursive Language Models (RLMs) for handling arbitrarily long contexts. Based on the RLM research paper, this library enables LLMs to process inputs up to two orders of magnitude beyond their native context windows.
Features
- RLM Orchestration: Treats long prompts as external environment data, allowing LLMs to programmatically examine, decompose, and recursively call themselves over context snippets
- Pluggable Tool System: Register any RAG, knowledge base, database, or API as callable functions
- Multi-Provider Support: OpenAI, Anthropic, and Google AI out of the box
- Secure Sandbox: V8 isolates for safe execution of LLM-generated code
- Full Observability: Prometheus metrics, Loki logging, and Tempo tracing via Grafana stack
- Built-in Benchmarking: Compare RLM performance against base LLM calls
- Production Ready: Circuit breakers, retry logic, rate limiting, and health checks
Installation
As a Library (Recommended)
npm install kado-rlm
# or
pnpm add kado-rlm
# or
yarn add kado-rlm
From Source
git clone https://github.com/your-org/kado-rlm.git
cd kado-rlm
pnpm install
pnpm build
Quick Start
Library Usage
import {
  RLMOrchestrator,
  ContextManager,
  createLLMClient,
  defineTools
} from 'kado-rlm';

// 1. Create an LLM client
const llmClient = createLLMClient('openai', { model: 'gpt-4o' });

// 2. Create a context manager with your long content
const contextManager = new ContextManager(yourLongDocument);

// 3. (Optional) Define custom tools for RAG, databases, etc.
const tools = defineTools([
  {
    name: 'search_docs',
    description: 'Search the knowledge base for relevant information',
    parameters: [
      { name: 'query', type: 'string', description: 'Search query', required: true },
      { name: 'limit', type: 'number', description: 'Max results', default: 5 },
    ],
    handler: async (query: string, limit = 5) => {
      return await yourVectorDB.search(query, { topK: limit });
    },
  },
]);

// 4. Create the orchestrator
const orchestrator = new RLMOrchestrator({
  llmClient,
  contextManager,
  customTools: tools,
  maxIterations: 20,
  maxDepth: 2,
});

// 5. Run!
const result = await orchestrator.run('What are the key findings in this document?');
console.log(result.answer);
console.log(`Completed in ${result.usage.iterations} iterations`);
Running as a Service
# Set up environment
cp env.example .env
# Edit .env with your API keys
# Start development server
pnpm dev
# Or production
pnpm build
pnpm start
Then make HTTP requests:
curl -X POST http://localhost:3000/v1/completion \
-H "Content-Type: application/json" \
-d '{
"prompt": "What is the secret code mentioned in the text?",
"context": "... your long context here ...",
"provider": "openai",
"model": "gpt-4o"
}'
Custom Tools
The pluggable tool system lets you register any external service as a function the LLM can call during reasoning.
Defining Tools
import { defineTools } from 'kado-rlm';
const tools = defineTools([
  // RAG / Vector Search
  {
    name: 'rag_search',
    description: 'Search the vector database for semantically similar documents',
    parameters: [
      { name: 'query', type: 'string', description: 'Natural language search query', required: true },
      { name: 'topK', type: 'number', description: 'Number of results to return', default: 10 },
    ],
    returns: 'Array of { content, score, metadata }',
    handler: async (query: string, topK = 10) => {
      const embedding = await embeddings.embed(query);
      return await pinecone.query({ vector: embedding, topK });
    },
  },
  // Database Lookup
  {
    name: 'get_customer',
    description: 'Fetch customer details from the database',
    parameters: [
      { name: 'customerId', type: 'string', description: 'Customer ID', required: true },
    ],
    handler: async (customerId: string) => {
      return await db.customers.findById(customerId);
    },
  },
  // External API
  {
    name: 'check_weather',
    description: 'Get current weather for a location',
    parameters: [
      { name: 'city', type: 'string', description: 'City name', required: true },
    ],
    handler: async (city: string) => {
      const response = await fetch(`https://api.weather.com/v1/current?city=${encodeURIComponent(city)}`);
      return response.json();
    },
  },
]);
The LLM can then use these tools in its generated code:
// LLM-generated sandbox code
const docs = await rag_search("authentication flow", 5);
const customer = await get_customer("cust_12345");
for (const doc of docs) {
  print(`Found: ${doc.content.slice(0, 100)}...`);
}

giveFinalAnswer({
  message: "Based on the documentation and customer data...",
  data: { sources: docs.map(d => d.metadata.source) }
});
See the Tools Guide for detailed documentation on registering tools, and the RAG Integration Guide for RAG-specific patterns.
API Endpoints
| Method | Path | Description |
|--------|------|-------------|
| POST | /v1/completion | Run RLM completion with context |
| POST | /v1/chat | Direct LLM call (baseline comparison) |
| POST | /v1/benchmark | Start benchmark run |
| GET | /v1/benchmark/:id | Get benchmark results |
| GET | /v1/models | List available models |
| GET | /health | Liveness probe |
| GET | /ready | Readiness probe |
| GET | /metrics | Prometheus metrics |
| GET | /docs | Swagger UI documentation |
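For example, the completion endpoint can be called from Node 18+ as in the sketch below. The request body mirrors the curl example above; the answer field in the response is an assumption based on the result shape shown in Quick Start.
// Minimal sketch of a client call to POST /v1/completion (Node 18+ global fetch).
// The response is assumed to include an `answer` field, mirroring the Quick Start result.
const res = await fetch('http://localhost:3000/v1/completion', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'What is the secret code mentioned in the text?',
    context: '... your long context here ...',
    provider: 'openai',
    model: 'gpt-4o',
  }),
});
const completion = await res.json();
console.log(completion.answer); // assumed response field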
Benchmarking
The built-in benchmark system compares RLM performance against direct LLM calls.
Via API
curl -X POST http://localhost:3000/v1/benchmark \
-H "Content-Type: application/json" \
-d '{
"tasks": ["sniah", "multi-niah", "aggregation"],
"sizes": [8000, 16000, 32000, 64000],
"provider": "openai",
"model": "gpt-4o",
"runs": 3
}'
Via CLI
# Run benchmark suite
pnpm benchmark --tasks sniah,aggregation --sizes 8000,16000,32000 --provider openai --model gpt-4o
# Output as JSON
pnpm benchmark --output json > results.json
Task Types
| Task | Description | Complexity |
|------|-------------|------------|
| sniah | Single needle-in-haystack | Constant |
| multi-niah | Multiple needles | Linear |
| aggregation | Count/sum across context | Linear |
| pairwise | Find matching pairs | Quadratic |
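To make the task types concrete, the sketch below shows how a single needle-in-haystack (sniah) case can be built: one fact is buried in filler text of a target size, and the answer is scored on whether it recovers that fact. This is an illustration only, not the library's internal benchmark generator.
// Illustrative sniah task: bury one "needle" fact in filler text of a target size
// and score whether the model's answer recovers it. Not the library's internal generator.
function makeSniahTask(sizeChars: number) {
  const needle = 'The secret code is 7431.';
  const filler = 'Lorem ipsum dolor sit amet. '.repeat(Math.ceil(sizeChars / 28));
  const insertAt = Math.floor(filler.length / 2);
  return {
    context: filler.slice(0, insertAt) + needle + filler.slice(insertAt),
    prompt: 'What is the secret code mentioned in the text?',
    score: (answer: string) => (answer.includes('7431') ? 1 : 0),
  };
}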
Observability
Local Development with Grafana Stack
cd docker
docker-compose up -d
# Access services:
# - Kado RLM: http://localhost:3000
# - Grafana: http://localhost:3001 (admin/admin)
# - Prometheus: http://localhost:9090
Metrics
Key metrics exposed at /metrics:
- rlm_request_duration_seconds - Request latency histogram
- rlm_iterations_total - RLM iteration count
- rlm_recursion_depth - Recursion depth distribution
- rlm_tokens_total - Token usage by type
- rlm_errors_total - Error counts by type
- rlm_circuit_breaker_state - Circuit breaker status
Logging
Logs are structured JSON with correlation IDs. In development, they are pretty-printed via pino-pretty.
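An illustrative log line (field names are examples, not the exact schema):
{"level":"info","time":"2024-05-01T12:00:00.000Z","correlationId":"req-9f2c","msg":"completion finished","iterations":4,"durationMs":1820}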
Tracing
OpenTelemetry traces exported to Tempo:
- Span per API request
- Child spans for LLM calls, sandbox executions, recursive sub-calls
- Automatic trace ID propagation
Configuration
Configure via environment variables (see env.example):
# Server
PORT=3000
NODE_ENV=development
# LLM Providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
DEFAULT_PROVIDER=openai
DEFAULT_MODEL=gpt-4o
# RLM Settings
MAX_ITERATIONS=20
MAX_RECURSION_DEPTH=3
SANDBOX_TIMEOUT_MS=5000
SANDBOX_MEMORY_MB=128
# Observability
METRICS_ENABLED=true
LOKI_ENABLED=false
TRACING_ENABLED=false
Architecture
┌─────────────────────────────────────────────────────────┐
│ API Layer │
│ (Fastify + Rate Limiting + Auth + Swagger) │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ RLM Orchestrator │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Context │ │ Sandbox │ │ Custom │ │
│ │ Manager │ │ (V8 Isolate)│ │ Tools │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ LLM Providers │
│ ┌────────┐ ┌───────────┐ ┌────────┐ │
│ │ OpenAI │ │ Anthropic │ │ Google │ │
│ └────────┘ └───────────┘ └────────┘ │
└─────────────────────────────────────────────────────────┘
Development
# Install dependencies
pnpm install
# Run in development mode
pnpm dev
# Type check
pnpm typecheck
# Run tests
pnpm test
# Run tests with coverage
pnpm test:coverage
# Build for production
pnpm build
# Start production server
pnpm start
Publishing to npm
First-Time Setup
Create an npm account at npmjs.com
Log in to npm:
npm login
Update package.json:
- Change name if kado-rlm is taken (e.g., @your-org/kado-rlm)
- Update repository.url to your actual repo
- Set the author field
Publishing
# 1. Make sure tests pass
pnpm test:run
# 2. Build the package
pnpm build
# 3. Verify what will be published
npm pack --dry-run
# 4. Publish (first time)
npm publish
# 5. For scoped packages (@your-org/kado-rlm)
npm publish --access public
Versioning
# Patch release (bug fixes): 0.1.0 → 0.1.1
npm version patch
# Minor release (new features): 0.1.0 → 0.2.0
npm version minor
# Major release (breaking changes): 0.1.0 → 1.0.0
npm version major
# Then publish
npm publish
Automated Publishing (GitHub Actions)
Create .github/workflows/publish.yml:
name: Publish to npm
on:
  release:
    types: [created]

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v2
        with:
          version: 8
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          registry-url: 'https://registry.npmjs.org'
      - run: pnpm install
      - run: pnpm test:run
      - run: pnpm build
      - run: npm publish
        env:
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
Add your npm token as a GitHub secret named NPM_TOKEN.
Production Deployment
Docker
# Build image
docker build -f docker/Dockerfile -t kado-rlm .
# Run container
docker run -p 3000:3000 \
-e OPENAI_API_KEY=sk-... \
-e NODE_ENV=production \
kado-rlm
Health Checks
- /health - Basic liveness (process running)
- /ready - Readiness (providers configured, memory OK)
Configure Kubernetes probes:
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 5
Stability Features
Circuit Breaker
Automatic circuit breaking for LLM provider failures (illustrated in the sketch below):
- Opens after 5 consecutive failures
- Half-open after 30s cooldown
- Per-provider tracking
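The sketch below models this behavior with the defaults listed above; it is a conceptual illustration, not the library's internal implementation.
// Conceptual circuit breaker: open after 5 consecutive failures, allow a trial
// request (half-open) once a 30s cooldown has elapsed. Illustrative only.
class ConceptualCircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private cooldownMs = 30_000) {}

  canRequest(): boolean {
    if (this.failures < this.threshold) return true;        // closed
    return Date.now() - this.openedAt >= this.cooldownMs;   // half-open after cooldown
  }

  recordSuccess(): void { this.failures = 0; }               // close the circuit

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = Date.now(); // (re)open the circuit
  }
}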
Retry Logic
Exponential backoff with jitter for transient errors (sketched below):
- 3 retries by default
- Handles rate limits, timeouts, 5xx errors
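As a rough sketch of the retry policy described above (3 retries, exponential backoff plus random jitter); the library's actual retry internals, including error classification, may differ.
// Conceptual retry helper: exponential backoff plus random jitter, 3 retries by default.
// Illustrative only; classification of rate limits, timeouts, and 5xx errors is omitted.
async function withRetry<T>(fn: () => Promise<T>, retries = 3, baseMs = 500): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;
      const backoff = baseMs * 2 ** attempt;          // 500ms, 1s, 2s, ...
      const jitter = Math.random() * backoff * 0.5;   // add up to 50% random jitter
      await new Promise((resolve) => setTimeout(resolve, backoff + jitter));
    }
  }
}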
Resource Limits
| Resource | Default | Configurable |
|----------|---------|--------------|
| Max iterations | 20 | Yes |
| Max recursion depth | 3 | Yes |
| Sandbox CPU time | 5s | Yes |
| Sandbox memory | 128MB | Yes |
| Request timeout | 300s | Yes |
| Max context size | 10MB | Yes |
References
- Recursive Language Models (RLM) Paper
- RLM Research Paper PDF
- Technical Design Document
- Tools Guide — Detailed guide on registering custom tools
- RAG Integration Guide — Patterns for RAG integration
License
MIT
