@luckydraw/cumulus

v0.30.27

Published

9 days ago

RLM-based CLI chat wrapper for Claude with external history context management

soapko

claude cli chat rlm context mcp

Cumulus

A self-hosted multi-channel AI gateway built around Claude and other LLMs. Runs as a long-lived daemon, speaks to you through a web chat widget, Slack, Discord, iOS push, email webhooks, and more — with unlimited conversation context via the Recursive Language Model (RLM) pattern.

Originally a CLI wrapper for Claude, Cumulus has grown into a full gateway platform that coordinates agents, channels, models, and federated deployments.

What you get

Gateway daemon (cumulus-gateway) — HTTP + WebSocket server with per-thread conversations, streaming responses, and an admin API.
Web chat widget — embeddable /chat interface with voice mode, push notifications, media uploads, and rich blex block rendering (tables, forms, charts, kanban, diagrams).
Channel adapters — Slack and Discord bots, inbound email webhooks (Resend), and generic HTTP webhooks — all injecting into the same thread model.
Inter-agent messaging — threads can talk to each other via send_to_agent, with support for CC/BCC visibility.
Federation — hub-and-spoke mesh so agents on different machines can message each other across NATs.
Per-thread model selection — Claude (via CLI) or any HuggingFace model (GLM-5, Kimi-K2.5, Qwen3, etc.) with tool calling.
Scheduled triggers, email, push, media serving — built-in MCP tools so agents can send emails, schedule themselves, notify you, and upload files.
Unlimited history — JSONL per thread + vector store + adaptive context budget. Conversations never truncate.
Classic CLI (cumulus) — terminal chat for individual threads, backed by the same history store.

Architecture

┌──────────────────────────────────────────────────────────────┐
│  Clients                                                     │
│                                                              │
│   /chat (web)   Slack   Discord   Email   CLI   Push (PWA)   │
│        │         │       │        │       │      │           │
│        └─────────┴───────┴────────┴───────┴──────┘           │
│                          │                                   │
│                          ▼                                   │
│              ┌──────────────────────┐                        │
│              │  cumulus-gateway     │                        │
│              │  (HTTP / WS daemon)  │                        │
│              └──────────┬───────────┘                        │
│                         │                                    │
│     ┌───────────────────┼──────────────────────┐             │
│     ▼                   ▼                      ▼             │
│ ┌─────────┐      ┌──────────────┐      ┌───────────────┐     │
│ │ Thread  │      │ Model router │      │  Federation   │     │
│ │ store   │      │ Claude / HF  │      │  hub/spoke    │     │
│ │ (JSONL) │      │ MCP tools    │      │  (WSS mesh)   │     │
│ └────┬────┘      └──────┬───────┘      └───────────────┘     │
│      │                  │                                    │
│      ▼                  ▼                                    │
│  ~/.cumulus/       Claude CLI                                │
│  threads/          HuggingFace API                           │
│  content/          MCP stdio + in-process                    │
│  media/                                                      │
└──────────────────────────────────────────────────────────────┘

Every turn is a fresh model invocation. The gateway assembles a context budget from recent messages + RAG retrieval against the thread's history and content store, then streams the response back to the originating channel.
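As a rough illustration of that assembly step, here is a minimal sketch that fills a fixed token budget from recent messages first and retrieved snippets second. The `Piece` shape, the `assembleContext` name, and the "recent first, then retrieval" priority are assumptions for illustration; the gateway's actual ordering and scoring are not described in this README.

```typescript
// Hypothetical shape: Cumulus's real message and retrieval types are not shown here.
interface Piece {
  text: string;
  tokens: number; // pre-computed token estimate
}

// Fill a token budget: recent messages first (newest survive), then retrieved
// snippets in ranked order, skipping any snippet that does not fit.
function assembleContext(recent: Piece[], retrieved: Piece[], budget: number): Piece[] {
  let remaining = budget;
  const keptRecent: Piece[] = [];
  // Walk newest-to-oldest and stop at the first message that does not fit.
  for (const msg of [...recent].reverse()) {
    if (msg.tokens > remaining) break;
    keptRecent.unshift(msg);
    remaining -= msg.tokens;
  }
  const keptRetrieved: Piece[] = [];
  for (const piece of retrieved) {
    if (piece.tokens <= remaining) {
      keptRetrieved.push(piece);
      remaining -= piece.tokens;
    }
  }
  // Retrieved background goes ahead of the live conversation tail.
  return [...keptRetrieved, ...keptRecent];
}
```

Keeping the conversation tail contiguous while letting retrieval backfill the leftover budget is one plausible design; a real implementation could just as well reserve a fixed share for each source.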

Installation

Requires Node 20+.

npm install -g @luckydraw/cumulus

This installs three binaries:

| Command         | Purpose                                           |
| --------------- | ------------------------------------------------- |
| cumulus         | Terminal chat client for a single thread          |
| cumulus-mcp     | MCP server exposing history/content tools (stdio) |
| cumulus-gateway | Long-running daemon (HTTP + WebSocket + adapters) |

Quick start — gateway

# Interactive setup: detects project directories, installs a service, generates keys.
cumulus-gateway setup

# Or non-interactive:
cumulus-gateway setup --project-root ~/projects --port 8080

# Start / stop / reload (if you skip the service install):
cumulus-gateway start
cumulus-gateway stop
cumulus-gateway reload     # SIGHUP — drains active streams before restart

Setup writes ~/.cumulus/gateway.config.json, generates VAPID keys for push, scaffolds a systemd (Linux) or LaunchAgent (macOS) unit, and prints the generated API key.

Open http://localhost:8080/chat, paste the API key, and start talking. Messages hit your thread; responses stream back token-by-token.

Configuration

~/.cumulus/gateway.config.json — adjust any field with cumulus-gateway config set <key> <value> or edit directly:

{
  "apiKeys": ["sk-cumulus-…"],
  "port": 8080,
  "projectRoot": "/home/you/projects",
  "model": "claude", // default per-thread model
  "models": [
    // available models for thread picker
    { "id": "claude", "label": "Claude (CLI)", "provider": "claude-cli" },
    { "id": "zai-org/GLM-5", "label": "GLM-5", "provider": "huggingface" },
    { "id": "moonshotai/Kimi-K2.5", "label": "Kimi-K2.5", "provider": "huggingface" },
  ],
  "hfApiKey": "hf_…", // optional, for HuggingFace models
  "channels": {
    "slack": { "token": "xoxb-…", "signingSecret": "…", "appToken": "xapp-…" },
    "discord": { "token": "…", "clientId": "…" },
  },
  "resend": { "apiKey": "re_…", "defaultFrom": "[email protected]" },
  "vapid": { "publicKey": "…", "privateKey": "…", "subject": "mailto:[email protected]" },
  "federation": {
    "enabled": true,
    "role": "hub", // "hub" or "spoke"
    "allowedSpokes": ["mac-karl"], // hub only
    // spoke config: { role:"spoke", hub:"wss://host/federation", apiKey:"…", name:"mac-…" }
  },
}

Reload the daemon (cumulus-gateway reload) after editing. It waits for active streams to finish before restarting, so in-flight responses aren't dropped.

Gateway features

Per-thread model selection

Each thread can run on a different model. Use the dropdown in the widget header, or the REST API:

curl -X PUT http://localhost:8080/api/thread/my-thread/config \
  -H "X-API-Key: sk-…" \
  -H "Content-Type: application/json" \
  -d '{"model": "zai-org/GLM-5"}'

claude — spawns claude --print per turn. Gets the full Claude Code tool surface.
HuggingFace models — routed through an OpenAI-compatible endpoint with a built-in agentic loop that handles tool use, truncation recovery, and error retry.
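The same model switch can be issued from code. The sketch below only builds the request, so it can be inspected without a running gateway. `buildModelUpdate` is a hypothetical helper name; the path, the `X-API-Key` header, and the `{"model": …}` body come from the curl example, while the `Content-Type` header is an assumption about what the gateway expects for a JSON body.

```typescript
// Build (but do not send) the PUT request that changes a thread's model.
function buildModelUpdate(baseUrl: string, thread: string, apiKey: string, model: string) {
  const url = `${baseUrl}/api/thread/${encodeURIComponent(thread)}/config`;
  const init = {
    method: "PUT",
    headers: {
      "X-API-Key": apiKey,
      "Content-Type": "application/json", // assumed; not stated in this README
    },
    body: JSON.stringify({ model }),
  };
  return { url, init };
}

// Usage against a running gateway (global fetch is available in Node 18+):
// const { url, init } = buildModelUpdate("http://localhost:8080", "my-thread", "sk-…", "zai-org/GLM-5");
// const res = await fetch(url, init);
```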

Web chat widget

At /chat. Features:

Streaming responses over WebSocket, with interjection support (type while streaming to interrupt and redirect).
Blex blocks — ~~~blex:table, ~~~blex:poll, ~~~blex:kanban, ~~~blex:mermaid, and 18 other block types for rich interactive content.
Voice mode — hands-free conversation using browser STT + TTS (optionally server-side Piper).
Push notifications — PWA install + VAPID subscriptions. Agents call notify_user to alert you while you're away.
Media uploads — drag files in; upload_media tool returns a public URL backed by ~/.cumulus/media/.
Annotations — highlight text, attach comments, send back as chips.
Texitool integration — edit Unicode-art diagrams in-place via an embedded canvas.

Channel adapters

Slack (channels.slack) — Socket Mode bot. Thread naming: slack-{userId}-{channelId}.
Discord (channels.discord) — Gateway WebSocket. Thread naming: discord-{userId}-{channelId}.
Inbound webhooks — POST /api/hooks/:type for email (Resend), forms, and generic events. Config-driven thread routing with HMAC signature verification.
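For readers wiring up their own webhook sender, this is a minimal sketch of HMAC verification plus the adapter thread-naming pattern. The algorithm (HMAC-SHA256), the hex encoding, and both function names are assumptions; this README does not say which algorithm, header, or encoding the gateway uses.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme: HMAC-SHA256 over the raw request body, hex-encoded signature.
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// Thread name for a chat adapter, following the {platform}-{userId}-{channelId} pattern above.
function adapterThreadName(platform: "slack" | "discord", userId: string, channelId: string): string {
  return `${platform}-${userId}-${channelId}`;
}
```

Verifying against the raw body (before JSON parsing) matters because re-serialized JSON rarely matches the bytes the sender signed.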

Inter-agent messaging

Any thread can message another thread using the send_to_agent MCP tool:

send_to_agent(target="devops", message="Deploy the new build", visibility="cc")

cc (default) — all recipients see each other.
blind — each recipient thinks it's a direct message.
{hidden: […]} — selective (observer pattern, hidden agents invisible to visible recipients).

If the target is busy, the message is queued and delivered as a batch when that thread is idle ("while you were busy, 3 messages arrived…").
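The queue-then-batch behaviour can be pictured with a small sketch. `ThreadInbox` and `AgentMessage` are hypothetical names, and the gateway's real queue may differ (for example, in how it persists messages or formats the batch notice).

```typescript
// Hypothetical message shape for illustration only.
interface AgentMessage {
  from: string;
  text: string;
}

class ThreadInbox {
  private busy = false;
  private queued: AgentMessage[] = [];
  private deliver: (batch: AgentMessage[]) => void;

  constructor(deliver: (batch: AgentMessage[]) => void) {
    this.deliver = deliver;
  }

  // Deliver immediately when the thread is idle; otherwise hold the message.
  receive(msg: AgentMessage): void {
    if (this.busy) {
      this.queued.push(msg);
    } else {
      this.deliver([msg]);
    }
  }

  setBusy(): void {
    this.busy = true;
  }

  // Called when the thread finishes its turn: flush everything as one batch.
  setIdle(): void {
    this.busy = false;
    if (this.queued.length > 0) {
      const batch = this.queued;
      this.queued = [];
      this.deliver(batch);
    }
  }
}
```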

Federation

Two or more gateways can be linked in a hub-and-spoke topology. The hub runs at a stable URL; spokes connect outbound via WSS, so a spoke behind NAT needs no inbound port forwarding.

send_to_agent("thundercat:cumulus", "…")   # cross-gateway addressing
list_agents()                              # aggregates threads across all spokes

Heartbeats run every 25s using bidirectional WebSocket pings; dead connections are torn down within 75s.
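A minimal sketch of the two mechanics in this section: splitting a cross-gateway address and deciding when a silent link counts as dead. Reading `thundercat:cumulus` as `gateway:thread` is an inference from the example, and both function names are hypothetical. The 25s and 75s figures are the documented ones; note that 75s equals three 25s intervals, though whether the gateway counts missed heartbeats that way is an assumption.

```typescript
const HEARTBEAT_INTERVAL_MS = 25_000; // documented heartbeat interval
const DEAD_AFTER_MS = 75_000; // documented teardown bound

// Split "gateway:thread"; a bare name is assumed to refer to a local thread.
function parseAgentAddress(address: string): { gateway: string | null; thread: string } {
  const idx = address.indexOf(":");
  return idx === -1
    ? { gateway: null, thread: address }
    : { gateway: address.slice(0, idx), thread: address.slice(idx + 1) };
}

// True when the peer has been silent for longer than the teardown bound.
function isConnectionDead(lastHeardMs: number, nowMs: number): boolean {
  return nowMs - lastHeardMs > DEAD_AFTER_MS;
}

// Number of whole heartbeat intervals that have elapsed with no traffic.
function missedHeartbeats(lastHeardMs: number, nowMs: number): number {
  return Math.floor((nowMs - lastHeardMs) / HEARTBEAT_INTERVAL_MS);
}
```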

Scheduled triggers

Agents can schedule themselves:

schedule_trigger(at="2026-05-01T09:00:00Z", message="Follow up with lead")
schedule_trigger(cron="0 9 * * MON", message="Weekly check-in")
cancel_schedule(id="…")

Schedules are per-thread, persisted in {thread}.config.json, and fire as message injections into the thread.
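To make the two trigger forms concrete, here is a sketch of how persisted schedules might be typed and how one-shot triggers could be checked for firing. The `Schedule` shape and `dueOneShots` are hypothetical; the real {thread}.config.json schema is not shown in this README, and cron evaluation is left out because it needs a cron parser.

```typescript
// Hypothetical persisted shape: one-shot entries carry `at`, recurring ones carry `cron`.
type Schedule =
  | { id: string; at: string; message: string } // ISO 8601 timestamp
  | { id: string; cron: string; message: string }; // cron expression

// Return the one-shot schedules whose time has passed. Cron entries are ignored here.
function dueOneShots(schedules: Schedule[], now: Date): Schedule[] {
  return schedules.filter((s) => "at" in s && Date.parse(s.at) <= now.getTime());
}
```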

Email (Resend)

With resend.apiKey configured:

send_email(to="[email protected]", subject="Hello", body="…")
list_emails(limit=10)

Sends are rate-limited per thread (default 10/hour) and logged to thread history. The first email from a new thread triggers a notify_user ping.
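One way to picture a per-thread limit is a sliding-window limiter, sketched below. The default of 10 per hour is the documented figure; the sliding-window algorithm and the `ThreadRateLimiter` name are assumptions, since the README does not say how the gateway counts sends.

```typescript
class ThreadRateLimiter {
  private sent = new Map<string, number[]>(); // thread name -> send timestamps (ms)
  private limit: number;
  private windowMs: number;

  constructor(limit = 10, windowMs = 60 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the send when the thread is under its limit;
  // returns false (recording nothing) otherwise.
  tryAcquire(thread: string, nowMs: number): boolean {
    const recent = (this.sent.get(thread) ?? []).filter((t) => nowMs - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.sent.set(thread, recent);
      return false;
    }
    recent.push(nowMs);
    this.sent.set(thread, recent);
    return true;
  }
}
```

Passing the clock in as `nowMs` keeps the limiter deterministic and easy to test.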

Self-update

cumulus-gateway check-update      # compares running version to npm
cumulus-gateway update             # bumps to latest, saves previous for rollback
cumulus-gateway rollback           # restores the previous version

The widget's top bar also shows an "Update available" indicator when a new version lands on npm.

Classic CLI mode

The original RLM chat loop still works. Great for quick terminal work without running the gateway.

cumulus my-project             # open or create a thread
cumulus --list                 # list threads
cumulus --delete old-project

Each turn:

1. Append your message to ~/.cumulus/threads/my-project.jsonl.
2. Spawn claude --print with --mcp-config pointing to the cumulus MCP server.
3. Claude pulls whatever history it needs via search_history, peek_recent, etc.
4. Append the response to the JSONL.
5. Next turn starts from a fresh context.
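The append-only history in the steps above can be sketched with plain file operations. The `Turn` record shape and both helper names are hypothetical (the real JSONL schema is not documented here), and the demo writes to a temporary directory rather than ~/.cumulus/threads/.

```typescript
import { appendFileSync, mkdtempSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical record shape for illustration.
interface Turn {
  role: "user" | "assistant";
  content: string;
  ts: string;
}

// Append one record as a single line of JSON.
function appendTurn(file: string, turn: Turn): void {
  appendFileSync(file, JSON.stringify(turn) + "\n", "utf8");
}

// Read the last n records, similar in spirit to the peek_recent tool.
function lastTurns(file: string, n: number): Turn[] {
  if (n <= 0) return [];
  const lines = readFileSync(file, "utf8").split("\n").filter((l) => l.length > 0);
  return lines.slice(-n).map((l) => JSON.parse(l) as Turn);
}

// Demo: two turns written to a throwaway file.
const demoFile = join(mkdtempSync(join(tmpdir(), "cumulus-demo-")), "my-project.jsonl");
appendTurn(demoFile, { role: "user", content: "hello", ts: new Date().toISOString() });
appendTurn(demoFile, { role: "assistant", content: "hi there", ts: new Date().toISOString() });
```

Because each turn is one line, the file can be appended to without rewriting earlier history, which is what lets every turn start from a fresh context.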

MCP tools

The cumulus-mcp server exposes history and content tools. Usable from any MCP-compatible client.

History:

| Tool              | Purpose                                                          |
| ----------------- | ---------------------------------------------------------------- |
| search_history    | Keyword / semantic / hybrid search over a thread                 |
| peek_recent       | Last N messages                                                  |
| read_messages     | Message range by index                                           |
| get_history_stats | Count, token estimate, time range                                |
| get_summary       | Auto-generated summaries (recent chunk, full, or specific range) |
| sub_query         | Recursive sub-LLM call over retrieved messages                   |

Content store (file reads, bash output, web fetches):

| Tool                | Purpose                                  |
| ------------------- | ---------------------------------------- |
| read_file           | Read text/PDF, chunk + embed + store     |
| store_content       | Store arbitrary text for later retrieval |
| search_content      | Search across stored content             |
| retrieve_content    | Get full content by [STORED:xxx] id      |
| read_content_chunk  | Read a specific chunk index              |
| list_stored_content | List all stored items                    |
| detect_anomalies    | Find out-of-place content in a store     |
| forget_content      | Remove a stored item                     |

Gateway-only tools (available to agents running inside the daemon):

send_to_agent, list_agents, notify_user, schedule_trigger, cancel_schedule, list_schedules, send_email, list_emails, upload_media, create_plastic_app, update_pipeline.

RAG & context management

JSONL history per thread — every message, tool call, and tool result.
Content store — chunked file reads, embedded with local HuggingFace transformers, stored as binary Float32.
Segment summaries — LLM-generated per topic boundary, separately embedded for vocabulary-gap retrieval.
Adaptive context budget: self-tuning per thread based on time to first token (TTFT). Shrinks when responses are slow; grows when they are fast and the budget is near capacity. Default 300k tokens, floor 100k, ceiling 1M.
Query-type-aware retrieval — classifies queries (recall / synthesis / recent / decision) and adjusts scoring weights accordingly.
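The adaptive budget rule in the list above can be sketched as a small clamped update. The default, floor, and ceiling are the documented values (assumed here to be token counts). The TTFT thresholds, the "near capacity" ratio, the step factors, and the `nextBudget` name are all assumptions, since the README gives the bounds but not the tuning constants.

```typescript
const DEFAULT_BUDGET = 300_000; // documented default
const FLOOR = 100_000; // documented floor
const CEILING = 1_000_000; // documented ceiling

// Assumed tuning knobs, chosen only for illustration.
const SLOW_TTFT_MS = 8_000;
const FAST_TTFT_MS = 3_000;
const NEAR_CAPACITY = 0.9;

// Compute the next budget from the last turn's TTFT and how much of the budget was used.
function nextBudget(ttftMs: number, usedTokens: number, current: number = DEFAULT_BUDGET): number {
  let next = current;
  if (ttftMs > SLOW_TTFT_MS) {
    next = current * 0.8; // slow first token: shrink
  } else if (ttftMs < FAST_TTFT_MS && usedTokens / current >= NEAR_CAPACITY) {
    next = current * 1.25; // fast and nearly full: grow
  }
  // Clamp to the documented floor and ceiling.
  return Math.min(CEILING, Math.max(FLOOR, Math.round(next)));
}
```

Growing only when the budget is nearly full avoids inflating it for threads that never use the extra room.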

REST API (gateway)

| Method | Path                      | Purpose                                 |
| ------ | ------------------------- | --------------------------------------- |
| GET    | /health                   | Gateway status                          |
| POST   | /api/thread/:name/message | Send a message (SSE stream in response) |
| GET    | /api/thread/:name/history | Paginated thread history                |
| GET    | /api/thread/:name/config  | Thread config                           |
| PUT    | /api/thread/:name/config  | Update thread config (model, etc.)      |
| DELETE | /api/thread/:name         | Delete a thread                         |
| GET    | /api/threads              | List threads                            |
| GET    | /api/agents               | List threads + streaming status         |
| POST   | /api/agents/inject        | Inject message into a thread            |
| GET    | /api/models               | Available models                        |
| POST   | /api/media/upload         | Upload a file                           |
| GET    | /media/:filename          | Serve uploaded file                     |
| POST   | /api/hooks/:type          | Inbound webhook                         |
| GET    | /api/push/vapid-key       | Public VAPID key                        |
| POST   | /api/push/subscribe       | Register a push subscription            |
| GET    | /api/version              | Running version + update availability   |
| POST   | /api/admin/update         | Trigger self-update (admin key)         |

All /api/* routes require X-API-Key: <key> (from apiKeys[]).

WebSocket (/chat/ws) carries the same semantics with streaming, interjection, inject, worker_cancel, and voice-mode audio frames.
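Since the message endpoint answers with an SSE stream, a client has to split the incoming text into events. The sketch below handles the generic Server-Sent Events framing (`data:` lines, events separated by a blank line). `parseSse` is a hypothetical helper, it assumes `\n` line endings, and the JSON shape inside each `data:` payload is not documented in this README, so payloads are returned as raw strings.

```typescript
// Split buffered SSE text into the data payloads of complete events,
// returning any trailing partial event so the caller can prepend it to the next chunk.
function parseSse(buffer: string): { events: string[]; rest: string } {
  const blocks = buffer.split("\n\n");
  const rest = blocks.pop() ?? ""; // the last block may be incomplete
  const events = blocks
    .map((block) =>
      block
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, "")) // drop "data:" and one optional space
        .join("\n"),
    )
    .filter((data) => data.length > 0);
  return { events, rest };
}
```

A caller would read chunks from the response body, append each to `rest`, and run `parseSse` again until the stream closes.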

Development

git clone https://github.com/soapko/cumulus
cd cumulus
npm install
npm run build
npm test            # vitest
npm run lint
npm run type-check

Build: dist/ compiled TypeScript plus static assets (widget HTML/CSS/JS, blex bundles).
Tests: vitest, 700+ tests covering agentic loop, retriever, adapters, federation, scheduler, push.
Deploy workflow: bump version, npm publish, then cumulus-gateway reload on the host (SIGHUP drains active streams — see docs/tasks/050-graceful-restart.md).

Background

Cumulus implements the Recursive Language Model pattern: treat conversation history as an external environment the model queries programmatically, rather than stuffing everything into context. This enables reasoning over contexts 2+ orders of magnitude beyond the model's window, with graceful cost scaling.

See docs/ for task documents, ADRs, and implementation notes.

License

MIT