
@automagik/rlmx

v0.260331.5

Published

RLM algorithm CLI for coding agents — prompt externalization, Python REPL with symbolic recursion, code-driven navigation

Downloads

2,392

Readme

rlmx

RLM algorithm CLI for coding agents — prompt externalization, Python REPL with symbolic recursion, code-driven navigation.

Based on the RLM paper (REPL-based LLM Method). Uses pi/ai as the multi-provider LLM client.

Install

npm install -g @automagik/rlmx

Quick Start

# Scaffold config files in current directory
rlmx init

# Run a query
rlmx "What is the meaning of life?"

# Query with context (directory of docs)
rlmx "How does IPC work?" --context ./docs/

# Query with a single file as context
rlmx "Summarize this paper" --context paper.md --output json

# Pipe data in
cat data.csv | rlmx "Analyze this dataset"

How It Works

rlmx implements the RLM (REPL-LM) algorithm:

  1. Prompt externalization — Your context (files, directories) is loaded into a Python REPL as the context variable. Only metadata (type, size, chunk lengths) appears in the LLM message history. The LLM never sees the raw context in its messages.

  2. Iterative REPL loop — The LLM writes Python code in ```repl``` blocks. rlmx executes each block in a persistent Python subprocess, feeds results back, and the LLM iterates until it calls FINAL() or FINAL_VAR().

  3. Recursive sub-calls — Inside REPL code, the LLM can call:

    • llm_query(prompt) — single LLM completion (fast, one-shot)
    • llm_query_batched(prompts) — concurrent LLM calls
    • rlm_query(prompt) — spawn a child RLM session (full iterative loop)
    • rlm_query_batched(prompts) — parallel child RLM sessions
  4. Termination — The loop ends when the LLM calls FINAL("answer") or FINAL_VAR("variable_name"), or when max iterations (default 30) is reached.
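
The four steps above can be condensed into a toy driver. This is an illustrative sketch, not rlmx's actual implementation: the stubbed `llm` and `execute_python` stand in for a real model and the Python subprocess, and the regex-based FINAL() detection is an assumption made for the example.

```python
import re

def run_rlm_loop(query, llm, execute_python, max_iterations=30):
    """Minimal sketch of the RLM loop: ask the LLM, run its repl
    blocks, feed results back, stop when it calls FINAL(...)."""
    history = [query]
    for _ in range(max_iterations):
        reply = llm("\n".join(history))
        # Did the model terminate with FINAL("answer")?
        done = re.search(r'FINAL\("(.*?)"\)', reply)
        if done:
            return done.group(1)
        # Otherwise execute each ```repl``` block and append the output.
        for block in re.findall(r"```repl\n(.*?)```", reply, re.DOTALL):
            history.append(execute_python(block))
    return None  # max iterations reached

# Stub LLM: one exploratory step, then a final answer.
replies = iter(['```repl\nprint(len(context))\n```', 'FINAL("3 items")'])
answer = run_rlm_loop("How big is the context?",
                      llm=lambda _: next(replies),
                      execute_python=lambda code: "3")
print(answer)  # → 3 items
```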

CAG Mode (Cache-Augmented Generation)

CAG mode bakes your full context into the system prompt and leverages provider-level caching so that subsequent queries against the same context are dramatically cheaper and faster.

When to use --cache vs default RLM

| Mode | Best for | How it works |
|------|----------|--------------|
| Default (RLM) | Large corpora, exploratory analysis | Context loaded into REPL context variable; LLM navigates it programmatically |
| --cache | Repeated questions on same docs, study sessions, batch Q&A | Full context injected into system prompt and cached at the provider |

Use --cache when you plan to ask multiple questions about the same set of documents. Use default RLM when the context is too large for a single system prompt or you need programmatic navigation.

Cost comparison

| Query | Cost |
|-------|------|
| First query (cache miss) | Full input token cost (context + prompt) |
| Subsequent queries (cache hit) | 50-90% cheaper -- only cache-read tokens are billed |

The exact savings depend on your provider. Google and Anthropic both offer significant discounts on cached input tokens.
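
As a back-of-envelope example, here is the arithmetic behind that table. The 200K-token context size and the $0.075 per 1M input-token rate are assumptions chosen for illustration, and the 90% discount is one plausible provider figure:

```python
# Illustrative numbers only: a 200K-token context at $0.075 per 1M
# input tokens, with a 90% discount on cache hits (provider-dependent).
context_tokens = 200_000
price_per_m = 0.075
first_query = context_tokens / 1e6 * price_per_m  # cache miss: full cost
cached_query = first_query * (1 - 0.90)           # cache hit: 90% off
ten_questions = first_query + 9 * cached_query
uncached_ten = 10 * first_query
print(f"${ten_questions:.4f} cached vs ${uncached_ten:.4f} uncached")
```

Ten questions against the same context cost roughly a fifth of what they would uncached, and the gap widens with every additional question.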

Batch usage

Process a list of questions against cached context:

rlmx batch questions.txt --context ./docs/
rlmx batch questions.txt --context ./docs/ --output json

Each question in the file is run sequentially, reusing the cached context. The first question pays full cost; subsequent questions benefit from the cache.

Cache warmup and estimation

Warm the cache and estimate costs before running queries:

rlmx cache --context ./docs/ --estimate

This loads your context, calculates token counts, and shows estimated costs for cached vs uncached queries without making any LLM calls.

YAML configuration

Enable cache in your rlmx.yaml:

cache:
  enabled: true              # or use --cache flag per-invocation
  retention: long            # short|long -- maps to provider cache retention
  ttl: 3600                  # seconds -- provider-specific TTL
  expire-time: ""            # ISO 8601 -- for Google explicit caching
  session-prefix: "myproject" # prepended to content hash for sessionId
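
The session-prefix option says the prefix is prepended to a content hash to form the sessionId. Purely as an illustration of that idea (the hash algorithm, truncation, and separator here are all assumptions, not rlmx's documented scheme):

```python
import hashlib

def session_id(prefix, context_text):
    # Hypothetical scheme: sha256 of the context, truncated, joined
    # to the configured prefix. rlmx's actual derivation may differ.
    digest = hashlib.sha256(context_text.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}-{digest}"

print(session_id("myproject", "contents of ./docs/"))
```

The useful property is that the same prefix plus the same context always yields the same cache key, while any change to the context produces a new one.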

For detailed provider-specific TTL behavior (Google, Anthropic, Bedrock, OpenAI), see docs/TTL_CONTROL.md.

Gemini 3 Native (v0.4)

rlmx v0.4 integrates 14 Gemini 3 native features, making it the cheapest and most capable context agent available. All features are opt-in, additive, and silently ignored on non-Google providers.

Quick Start

# rlmx.yaml
model:
  provider: google
  model: gemini-3.1-flash-lite-preview

gemini:
  thinking-level: medium      # Control thinking depth
  google-search: true          # Web search in REPL
  url-context: true            # Fetch URLs in REPL
  code-execution: true         # Server-side Python
  media-resolution:
    images: high               # ~1120 tokens/image
    pdfs: medium               # ~560 tokens/page
    video: low                 # ~70 tokens/frame

rlmx "Research latest AI developments" --context ./notes/ --tools standard --thinking high

Features

| Feature | Config | CLI Flag | Description |
|---------|--------|----------|-------------|
| Thinking levels | gemini.thinking-level | --thinking | minimal/low/medium/high — controls reasoning depth |
| Thought signatures | automatic | — | Multi-turn quality via pi/ai signature circulation |
| Structured output | output.schema | — | JSON Schema enforcement via API (not text parsing) |
| Google Search | gemini.google-search | — | web_search() battery in REPL |
| URL Context | gemini.url-context | — | fetch_url() battery in REPL |
| Code Execution | gemini.code-execution | — | Server-side Python alongside local REPL |
| Image Generation | gemini.image-gen | — | generate_image() via Nano Banana |
| Media Resolution | gemini.media-resolution | — | Per-type token cost control |
| Batch API | — | --batch-api | 50% cost reduction for bulk operations |
| Context Caching | cache.enabled | --cache | 90% discount on cached tokens |
| Computer Use | gemini.computer-use | — | Planned for v0.5 |
| Maps Grounding | gemini.maps-grounding | — | Planned for v0.5 |
| File Search | gemini.file-search | — | Planned for v0.5 |
| Function + Tools | automatic | — | Custom functions + built-in tools in one API call |

Cost Comparison

| Mode | Cost (per 1M tokens) | Savings |
|------|---------------------|---------|
| Base (flash-lite) | $0.075 input / $0.30 output | — |
| + Context caching | ~$0.0075 input (cached) | 90% on input |
| + Batch API | ~$0.0375 input / $0.15 output | 50% on all |
| Cache + Batch | ~$0.00375 input (cached+batch) | 95% on cached input |

100 queries over 500K context: < $2.00 with cache + batch stacking.
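
Checking that claim with the rates from the table (the ~2K output tokens per answer is an assumption added for the arithmetic):

```python
# Recompute the "< $2.00" figure from the cost table's batch rates.
context_m = 0.5                           # 500K-token context, in millions
queries = 100
batch_input, batch_output = 0.0375, 0.15  # $/1M tokens at batch rates
cached_batch_input = 0.00375              # cache hit + batch stacked

input_cost = context_m * batch_input      # first query: cache miss
input_cost += (queries - 1) * context_m * cached_batch_input
output_cost = queries * 0.002 * batch_output  # ~2K output tokens each
total = input_cost + output_cost
print(f"${total:.2f}")
```

Under these assumptions the total lands around a quarter of a dollar, comfortably under the $2.00 bound.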

Provider Compatibility

| Feature | Google | Anthropic | OpenAI | Others |
|---------|--------|-----------|--------|--------|
| Thinking levels | native | ignored | ignored | ignored |
| Thought signatures | native | ignored | ignored | ignored |
| Structured output | API-enforced | FINAL() fallback | FINAL() fallback | FINAL() fallback |
| Web search/URL | native | error msg | error msg | error msg |
| Code execution | native | local only | local only | local only |
| Media resolution | native | ignored | ignored | ignored |
| Batch API | native | standard batch | standard batch | standard batch |
| Context caching | native | native | native | provider-dependent |

Gemini Batteries (REPL Functions)

Available with --tools standard or --tools full when provider is Google:

# In REPL code:
result = web_search("latest nodejs version")
print(result)

page = fetch_url("https://example.com/docs")
print(page[:500])

img_path = generate_image("architecture diagram of microservices")
print(img_path)

Non-Google providers get clear error messages: "web_search() requires provider: google".

Examples

See examples/ for complete configs:

  • gemini-research/ — Web search + URL context research agent
  • gemini-multimodal/ — Media resolution + image analysis
  • gemini-cheap-batch/ — Maximum cost stacking example

Config Files

Drop .md files in your working directory to customize behavior. Run rlmx init to scaffold defaults with inline comments.

| File | Purpose |
|------|---------|
| SYSTEM.md | System prompt sent to the LLM. Default: exact RLM paper prompt. |
| CONTEXT.md | Context loading documentation (informational). |
| TOOLS.md | Custom Python functions injected into the REPL namespace. |
| CRITERIA.md | Output format criteria appended to the system prompt. |
| MODEL.md | LLM provider and model selection. |

TOOLS.md Format

Define custom REPL tools as ## heading + python code block:

## search_docs
```python
def search_docs(keyword):
    """Search context for files matching keyword."""
    matches = [item for item in context if keyword.lower() in item['content'].lower()]
    return [m['path'] for m in matches]
```

## summarize_chunk
```python
def summarize_chunk(text, max_words=100):
    """Summarize a chunk of text."""
    return llm_query(f"Summarize in {max_words} words:\n{text}")
```
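
To make the format concrete, here is one way such a file could be parsed into a REPL namespace. This is a sketch that mimics, not reproduces, rlmx's actual loader; the `double` tool is a made-up example:

```python
import re

FENCE = "`" * 3  # three backticks, spelled out to keep this example readable

TOOLS_MD = f"""## double
{FENCE}python
def double(x):
    return x * 2
{FENCE}
"""

def load_tools(md_text):
    """Illustrative parser: exec every python code block from the
    markdown into one shared namespace of callable tools."""
    namespace = {}
    pattern = FENCE + r"python\n(.*?)" + FENCE
    for code in re.findall(pattern, md_text, re.DOTALL):
        exec(code, namespace)
    return namespace

tools = load_tools(TOOLS_MD)
print(tools["double"](21))  # → 42
```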

MODEL.md Format

provider: google
model: gemini-3.1-flash-lite-preview
sub-call-model: gemini-3.1-flash-lite-preview

Supports any provider available in pi/ai: anthropic, openai, google, etc.

CLI Reference

rlmx "query" [options]        Run an RLM query
rlmx init [--dir <path>]      Scaffold config files
rlmx batch <file> [options]   Run batch queries from a file
rlmx cache [options]          Cache management (warmup, estimate)

Options:
  --context <path>        Path to context (directory or file)
  --cache                 Enable CAG mode (cache context in system prompt)
  --output <mode>         Output mode: text (default), json, stream
  --verbose               Show iteration progress on stderr
  --max-iterations <n>    Maximum RLM iterations (default: 30)
  --timeout <ms>          Timeout in milliseconds (default: 300000)
  --dir <path>            Directory for init command (default: cwd)
  --help, -h              Show this help message
  --version, -v           Show version

Gemini options:
  --thinking <level>      Thinking level: minimal, low, medium, high
  --batch-api             Use Gemini Batch API for 50% cost reduction

Cache options:
  --estimate              Estimate cache costs without making LLM calls
  --session-prefix <str>  Override session prefix for cache key

Output Modes

Text (default)

Prints the final answer to stdout.

JSON

rlmx "query" --output json

Returns:

{
  "answer": "The answer to your query...",
  "references": ["docs/start/create-project.md", "docs/concept/inter-process-communication.md"],
  "usage": { "inputTokens": 12500, "outputTokens": 3200, "llmCalls": 5 },
  "iterations": 3,
  "model": "google/gemini-3.1-flash-lite-preview"
}

Stream

rlmx "query" --output stream

Emits JSONL events per iteration, then a final event.

Context Loading

| Input | Behavior |
|-------|----------|
| --context dir/ | Recursively reads *.md files as list[{path, content}] |
| --context file.md | Reads as single string |
| --context file.json | Parses JSON as dict or list |
| stdin pipe | Reads as single string |
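
The directory and single-file cases might look roughly like this (a sketch only; rlmx's real loader also handles JSON files and stdin, per the table):

```python
import pathlib

def load_context(path):
    """Illustrative loader: a directory becomes list[{path, content}]
    over its *.md files; a single file becomes one string."""
    p = pathlib.Path(path)
    if p.is_dir():
        return [{"path": str(f), "content": f.read_text()}
                for f in sorted(p.rglob("*.md"))]
    return p.read_text()
```

The list-of-dicts shape is what REPL code iterates over, e.g. `[item['path'] for item in context]`.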

Environment Variables

rlmx uses pi/ai for LLM calls. Set the appropriate API key for your provider:

  • GEMINI_API_KEY — for Google Gemini models (default provider)
  • ANTHROPIC_API_KEY — for Anthropic models
  • OPENAI_API_KEY — for OpenAI models

Programmatic API

import { rlmLoop, loadConfig, loadContext } from "@automagik/rlmx";

const config = await loadConfig("./");
const context = await loadContext("./docs/");

const result = await rlmLoop("How does IPC work?", context, config, {
  maxIterations: 10,
  timeout: 60000,
  verbose: false,
  output: "json",
});

console.log(result.answer);
console.log(result.references);

Requirements

  • Node.js >= 18
  • Python 3.10+ (for the REPL subprocess)
  • An LLM API key (Anthropic, OpenAI, Google, etc.)

License

MIT