@luckydraw/cumulus
v0.30.27
RLM-based CLI chat wrapper for Claude with external history context management
Cumulus
A self-hosted multi-channel AI gateway built around Claude and other LLMs. Runs as a long-lived daemon, speaks to you through a web chat widget, Slack, Discord, iOS push, email webhooks, and more — with unlimited conversation context via the Recursive Language Model (RLM) pattern.
Originally a CLI wrapper for Claude, Cumulus has grown into a full gateway platform that coordinates agents, channels, models, and federated deployments.
What you get
- Gateway daemon (cumulus-gateway): HTTP + WebSocket server with per-thread conversations, streaming responses, and an admin API.
- Web chat widget: embeddable /chat interface with voice mode, push notifications, media uploads, and rich blex block rendering (tables, forms, charts, kanban, diagrams).
- Channel adapters: Slack and Discord bots, inbound email webhooks (Resend), and generic HTTP webhooks, all injecting into the same thread model.
- Inter-agent messaging: threads can talk to each other via send_to_agent, with support for CC/BCC visibility.
- Federation: hub-and-spoke mesh so agents on different machines can message each other across NATs.
- Per-thread model selection: Claude (via CLI) or any HuggingFace model (GLM-5, Kimi-K2.5, Qwen3, etc.) with tool calling.
- Scheduled triggers, email, push, media serving: built-in MCP tools so agents can send emails, schedule themselves, notify you, and upload files.
- Unlimited history: JSONL per thread + vector store + adaptive context budget. Conversations never truncate.
- Classic CLI (cumulus): terminal chat for individual threads, backed by the same history store.
Architecture
┌──────────────────────────────────────────────────────────────┐
│ Clients │
│ │
│ /chat (web) Slack Discord Email CLI Push (PWA) │
│ │ │ │ │ │ │ │
│ └─────────┴───────┴────────┴───────┴──────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ cumulus-gateway │ │
│ │ (HTTP / WS daemon) │ │
│ └──────────┬───────────┘ │
│ │ │
│ ┌───────────────────┼──────────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ Thread │ │ Model router │ │ Federation │ │
│ │ store │ │ Claude / HF │ │ hub/spoke │ │
│ │ (JSONL) │ │ MCP tools │ │ (WSS mesh) │ │
│ └────┬────┘ └──────┬───────┘ └───────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ~/.cumulus/ Claude CLI │
│ threads/ HuggingFace API │
│ content/ MCP stdio + in-process │
│ media/ │
└──────────────────────────────────────────────────────────────┘
Every turn is a fresh model invocation. The gateway assembles a context budget from recent messages + RAG retrieval against the thread's history and content store, then streams the response back to the originating channel.
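The per-turn context assembly described above can be pictured with a minimal sketch. It is not the gateway's actual code: the `Message` shape, the `estimateTokens` heuristic (about four characters per token), and the `assembleContext` function are all assumptions made for illustration.

```typescript
// Hypothetical message shape; the real thread store schema may differ.
interface Message {
  role: "user" | "assistant";
  text: string;
}

// Crude token estimate (about 4 characters per token). An assumption, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Fill the budget with the newest messages first, then with retrieved
// snippets (assumed to be ranked by relevance already) that still fit.
function assembleContext(
  recent: Message[],
  retrieved: string[],
  budgetTokens: number,
): { messages: Message[]; snippets: string[]; usedTokens: number } {
  let used = 0;
  const messages: Message[] = [];
  // Walk from newest to oldest so the latest turns survive when space runs out.
  for (const msg of [...recent].reverse()) {
    const cost = estimateTokens(msg.text);
    if (used + cost > budgetTokens) break;
    messages.unshift(msg); // restore chronological order
    used += cost;
  }
  const snippets: string[] = [];
  for (const snippet of retrieved) {
    const cost = estimateTokens(snippet);
    if (used + cost > budgetTokens) continue; // skip snippets that do not fit
    snippets.push(snippet);
    used += cost;
  }
  return { messages, snippets, usedTokens: used };
}
```

Recent turns are given priority over retrieved snippets here; a real implementation might reserve a share of the budget for retrieval instead.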
Installation
Requires Node 20+.
npm install -g @luckydraw/cumulus
This installs three binaries:
| Command | Purpose |
| ----------------- | ------------------------------------------------- |
| cumulus | Terminal chat client for a single thread |
| cumulus-mcp | MCP server exposing history/content tools (stdio) |
| cumulus-gateway | Long-running daemon (HTTP + WebSocket + adapters) |
Quick start — gateway
# Interactive setup: detects project directories, installs a service, generates keys.
cumulus-gateway setup
# Or non-interactive:
cumulus-gateway setup --project-root ~/projects --port 8080
# Start / stop / reload (if you skip the service install):
cumulus-gateway start
cumulus-gateway stop
cumulus-gateway reload # SIGHUP: drains active streams before restart
Setup writes ~/.cumulus/gateway.config.json, generates VAPID keys for push, scaffolds a systemd (Linux) or LaunchAgent (macOS) unit, and prints the generated API key.
Open http://localhost:8080/chat, paste the API key, and start talking. Messages hit your thread; responses stream back token-by-token.
Configuration
~/.cumulus/gateway.config.json — adjust any field with cumulus-gateway config set <key> <value> or edit directly:
{
"apiKeys": ["sk-cumulus-…"],
"port": 8080,
"projectRoot": "/home/you/projects",
"model": "claude", // default per-thread model
"models": [
// available models for thread picker
{ "id": "claude", "label": "Claude (CLI)", "provider": "claude-cli" },
{ "id": "zai-org/GLM-5", "label": "GLM-5", "provider": "huggingface" },
{ "id": "moonshotai/Kimi-K2.5", "label": "Kimi-K2.5", "provider": "huggingface" },
],
"hfApiKey": "hf_…", // optional, for HuggingFace models
"channels": {
"slack": { "token": "xoxb-…", "signingSecret": "…", "appToken": "xapp-…" },
"discord": { "token": "…", "clientId": "…" },
},
"resend": { "apiKey": "re_…", "defaultFrom": "you@example.com" },
"vapid": { "publicKey": "…", "privateKey": "…", "subject": "mailto:you@example.com" },
"federation": {
"enabled": true,
"role": "hub", // "hub" or "spoke"
"allowedSpokes": ["mac-karl"], // hub only
// spoke config: { role:"spoke", hub:"wss://host/federation", apiKey:"…", name:"mac-…" }
},
}
Reload the daemon (cumulus-gateway reload) after editing. It waits for active streams to finish before restarting, so in-flight responses aren't dropped.
Gateway features
Per-thread model selection
Each thread can run on a different model. Use the dropdown in the widget header, or the REST API:
curl -X PUT http://localhost:8080/api/thread/my-thread/config \
-H "X-API-Key: sk-…" \
-d '{"model": "zai-org/GLM-5"}'
- claude: spawns claude --print per turn. Gets the full Claude Code tool surface.
- HuggingFace models: routed through an OpenAI-compatible endpoint with a built-in agentic loop that handles tool use, truncation recovery, and error retry.
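The same model switch can be issued from code. The sketch below only builds the request, mirroring the curl call above; `buildSetModelRequest` and `RequestSpec` are hypothetical names, and the `Content-Type` header is an assumption, since the curl example does not set one.

```typescript
// Plain description of an HTTP request, so it can be inspected before sending.
interface RequestSpec {
  url: string;
  method: "PUT";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) the request that updates a thread's model.
function buildSetModelRequest(
  baseUrl: string,
  apiKey: string,
  thread: string,
  model: string,
): RequestSpec {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${root}/api/thread/${encodeURIComponent(thread)}/config`,
    method: "PUT",
    headers: {
      "X-API-Key": apiKey,
      // Assumption: the gateway accepts a JSON content type for this body.
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model }),
  };
}
```

In Node 20+, the result can be passed to the built-in `fetch` as `fetch(req.url, req)`.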
Web chat widget
At /chat. Features:
- Streaming responses over WebSocket, with interjection support (type while streaming to interrupt and redirect).
- Blex blocks: ~~~blex:table, ~~~blex:poll, ~~~blex:kanban, ~~~blex:mermaid, and 18 other block types for rich interactive content.
- Voice mode: hands-free conversation using browser STT + TTS (optionally server-side Piper).
- Push notifications: PWA install + VAPID subscriptions. Agents call notify_user to alert you while you're away.
- Media uploads: drag files in; the upload_media tool returns a public URL backed by ~/.cumulus/media/.
- Annotations: highlight text, attach comments, send back as chips.
- Texitool integration: edit Unicode-art diagrams in-place via an embedded canvas.
Channel adapters
- Slack (channels.slack): Socket Mode bot. Thread naming: slack-{userId}-{channelId}.
- Discord (channels.discord): Gateway WebSocket. Thread naming: discord-{userId}-{channelId}.
- Inbound webhooks: POST /api/hooks/:type for email (Resend), forms, and generic events. Config-driven thread routing with HMAC signature verification.
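The thread naming convention for Slack and Discord can be written as a pair of helpers. This is a sketch only: `threadNameFor` and `parseThreadName` are hypothetical names, and parsing assumes user and channel IDs contain no hyphens.

```typescript
type Channel = "slack" | "discord";

// Builds the thread name "{channel}-{userId}-{channelId}".
function threadNameFor(channel: Channel, userId: string, channelId: string): string {
  return `${channel}-${userId}-${channelId}`;
}

// Reverses threadNameFor. Assumption: neither ID contains a hyphen.
function parseThreadName(
  name: string,
): { channel: Channel; userId: string; channelId: string } | null {
  const match = /^(slack|discord)-([^-]+)-([^-]+)$/.exec(name);
  if (!match) return null;
  const [, channel, userId, channelId] = match;
  if (channel !== "slack" && channel !== "discord") return null;
  if (!userId || !channelId) return null;
  return { channel, userId, channelId };
}
```

Because the user ID is part of the name, two users in the same channel end up in separate threads.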
Inter-agent messaging
Any thread can message another thread using the send_to_agent MCP tool:
send_to_agent(target="devops", message="Deploy the new build", visibility="cc")
- cc (default): all recipients see each other.
- blind: each recipient thinks it's a direct message.
- {hidden: […]}: selective (observer pattern, hidden agents invisible to visible recipients).
If the target is busy, the message is queued and delivered as a batch when that thread is idle ("while you were busy, 3 messages arrived…").
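The visibility modes and the busy-queue behaviour can be sketched as follows. The names `visiblePeers` and `Inbox` are hypothetical, and one detail is an assumption: hidden agents are taken to see the visible recipients, which the text above does not state.

```typescript
type Visibility = "cc" | "blind" | { hidden: string[] };

// For each recipient, list the other recipients it is told about.
function visiblePeers(recipients: string[], visibility: Visibility): Map<string, string[]> {
  const result = new Map<string, string[]>();
  for (const me of recipients) {
    let peers: string[];
    if (visibility === "cc") {
      peers = recipients.filter((r) => r !== me);
    } else if (visibility === "blind") {
      peers = []; // looks like a direct message
    } else {
      const hidden = new Set(visibility.hidden);
      // Hidden agents are never listed. Assumption: they still see visible ones.
      peers = recipients.filter((r) => r !== me && !hidden.has(r));
    }
    result.set(me, peers);
  }
  return result;
}

// Holds messages for a busy thread and releases them as one batch.
class Inbox {
  private pending: string[] = [];

  enqueue(message: string): void {
    this.pending.push(message);
  }

  // Call when the thread goes idle. Returns one batched message, or null.
  drain(): string | null {
    if (this.pending.length === 0) return null;
    const batch = this.pending;
    this.pending = [];
    const noun = batch.length === 1 ? "message" : "messages";
    return [`While you were busy, ${batch.length} ${noun} arrived:`, ...batch].join("\n");
  }
}
```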
Federation
Two gateways can be linked in a hub-and-spoke topology. The hub runs at a stable URL; spokes connect outbound via WSS, so NAT doesn't matter.
send_to_agent("thundercat:cumulus", "…") # cross-gateway addressing
list_agents() # aggregates threads across all spokes
Heartbeats every 25s with bidirectional WebSocket pings; dead connections are torn down within 75s.
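Cross-gateway addressing and the heartbeat timeout can be illustrated with a small sketch. `parseAgentAddress` and `isConnectionDead` are hypothetical helpers. Treating a bare name as a local thread is an assumption, as is reading the 75s limit as three missed 25s heartbeats.

```typescript
// "gateway:thread" targets another gateway. Assumption: a bare name is local.
function parseAgentAddress(address: string): { gateway: string | null; thread: string } {
  const idx = address.indexOf(":");
  if (idx === -1) return { gateway: null, thread: address };
  return { gateway: address.slice(0, idx), thread: address.slice(idx + 1) };
}

const HEARTBEAT_INTERVAL_MS = 25_000;
// 75s equals three heartbeat intervals.
const DEAD_AFTER_MS = 3 * HEARTBEAT_INTERVAL_MS;

// True once no pong has been seen for the full timeout.
function isConnectionDead(lastPongAtMs: number, nowMs: number): boolean {
  return nowMs - lastPongAtMs >= DEAD_AFTER_MS;
}
```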
Scheduled triggers
Agents can schedule themselves:
schedule_trigger(at="2026-05-01T09:00:00Z", message="Follow up with lead")
schedule_trigger(cron="0 9 * * MON", message="Weekly check-in")
cancel_schedule(id="…")
Schedules are per-thread, persisted in {thread}.config.json, and fire as message injections into the thread.
Email (Resend)
With resend.apiKey configured:
send_email(to="you@example.com", subject="Hello", body="…")
list_emails(limit=10)
Rate-limited per thread (default 10/hour). All sends are logged to thread history. First email from a new thread triggers a notify_user ping.
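A per-thread limit such as 10/hour can be enforced with a sliding-window counter. The sketch below shows that general technique, not necessarily how Cumulus implements it. `SlidingWindowLimiter` is a hypothetical name.

```typescript
// Allows at most `limit` sends per thread within any window of `windowMs`.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly sent = new Map<string, number[]>();

  constructor(limit: number = 10, windowMs: number = 60 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records a send and returns true if allowed, otherwise returns false.
  tryAcquire(thread: string, nowMs: number): boolean {
    // Keep only the timestamps that are still inside the window.
    const recent = (this.sent.get(thread) ?? []).filter((t) => nowMs - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.sent.set(thread, recent);
      return false;
    }
    recent.push(nowMs);
    this.sent.set(thread, recent);
    return true;
  }
}
```

Passing the clock in as `nowMs` keeps the limiter deterministic, which makes it easy to test.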
Self-update
cumulus-gateway check-update # compares running version to npm
cumulus-gateway update # bumps to latest, saves previous for rollback
cumulus-gateway rollback # restores the previous version
The widget's top bar also shows an "Update available" indicator when a new version lands on npm.
Classic CLI mode
The original RLM chat loop still works. Great for quick terminal work without running the gateway.
cumulus my-project # open or create a thread
cumulus --list # list threads
cumulus --delete old-project
Each turn:
- Append your message to ~/.cumulus/threads/my-project.jsonl.
- Spawn claude --print with --mcp-config pointing to the cumulus MCP server.
- Claude pulls whatever history it needs via search_history, peek_recent, etc.
- Append the response to the JSONL.
- Next turn starts from a fresh context.
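The append-only JSONL history used in the steps above can be sketched as a pair of pure helpers. The `HistoryEntry` fields are assumptions for illustration; the actual on-disk schema is not documented here.

```typescript
// Hypothetical entry shape; the real JSONL schema may carry more fields.
interface HistoryEntry {
  role: "user" | "assistant";
  text: string;
  ts: string; // ISO 8601 timestamp
}

// One entry per line. JSON.stringify escapes embedded newlines,
// so an entry never spans more than one line.
function toJsonlLine(entry: HistoryEntry): string {
  return JSON.stringify(entry) + "\n";
}

// Parses a whole JSONL file, ignoring blank lines.
function parseJsonl(contents: string): HistoryEntry[] {
  return contents
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as HistoryEntry);
}
```

In Node, each line could be written with `appendFile` from `node:fs/promises`, which leaves earlier turns untouched.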
MCP tools
The cumulus-mcp server exposes history and content tools. Usable from any MCP-compatible client.
History:
| Tool | Purpose |
| ------------------- | ---------------------------------------------------------------- |
| search_history | Keyword / semantic / hybrid search over a thread |
| peek_recent | Last N messages |
| read_messages | Message range by index |
| get_history_stats | Count, token estimate, time range |
| get_summary | Auto-generated summaries (recent chunk, full, or specific range) |
| sub_query | Recursive sub-LLM call over retrieved messages |
Content store (file reads, bash output, web fetches):
| Tool | Purpose |
| --------------------- | ---------------------------------------- |
| read_file | Read text/PDF, chunk + embed + store |
| store_content | Store arbitrary text for later retrieval |
| search_content | Search across stored content |
| retrieve_content | Get full content by [STORED:xxx] id |
| read_content_chunk | Read a specific chunk index |
| list_stored_content | List all stored items |
| detect_anomalies | Find out-of-place content in a store |
| forget_content | Remove a stored item |
Gateway-only tools (available to agents running inside the daemon):
send_to_agent, list_agents, notify_user, schedule_trigger, cancel_schedule, list_schedules, send_email, list_emails, upload_media, create_plastic_app, update_pipeline.
RAG & context management
- JSONL history per thread — every message, tool call, and tool result.
- Content store — chunked file reads, embedded with local HuggingFace transformers, stored as binary Float32.
- Segment summaries — LLM-generated per topic boundary, separately embedded for vocabulary-gap retrieval.
- Adaptive context budget — self-tuning per thread based on TTFT. Shrinks when slow, grows when fast + near-capacity. Default 300k, floor 100k, ceiling 1M.
- Query-type-aware retrieval — classifies queries (recall / synthesis / recent / decision) and adjusts scoring weights accordingly.
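The adaptive context budget can be pictured as a small feedback rule. Only the default (300k), floor (100k), and ceiling (1M) come from the list above, and they are assumed here to be token counts. The TTFT thresholds, the "near capacity" ratio, and the step factors are hypothetical values chosen for illustration.

```typescript
const DEFAULT_BUDGET = 300_000;
const FLOOR_BUDGET = 100_000;
const CEILING_BUDGET = 1_000_000;

// Hypothetical tuning constants; the real thresholds are not documented here.
const SLOW_TTFT_MS = 8_000;
const FAST_TTFT_MS = 3_000;
const NEAR_CAPACITY = 0.9;
const SHRINK_FACTOR = 0.8;
const GROW_FACTOR = 1.25;

// Shrinks the budget after a slow turn, grows it after a fast turn that
// nearly filled the budget, and otherwise leaves it unchanged.
function nextBudget(current: number, ttftMs: number, usedTokens: number): number {
  let next = current;
  if (ttftMs > SLOW_TTFT_MS) {
    next = current * SHRINK_FACTOR;
  } else if (ttftMs < FAST_TTFT_MS && usedTokens >= current * NEAR_CAPACITY) {
    next = current * GROW_FACTOR;
  }
  // Clamp to the documented floor and ceiling.
  return Math.min(CEILING_BUDGET, Math.max(FLOOR_BUDGET, Math.round(next)));
}
```

Starting each thread at `DEFAULT_BUDGET` and calling `nextBudget` once per turn would let the budget settle at a value that fits the model's observed latency.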
REST API (gateway)
| Method | Path | Purpose |
| ------ | --------------------------- | --------------------------------------- |
| GET | /health | Gateway status |
| POST | /api/thread/:name/message | Send a message (SSE stream in response) |
| GET | /api/thread/:name/history | Paginated thread history |
| GET | /api/thread/:name/config | Thread config |
| PUT | /api/thread/:name/config | Update thread config (model, etc.) |
| DELETE | /api/thread/:name | Delete a thread |
| GET | /api/threads | List threads |
| GET | /api/agents | List threads + streaming status |
| POST | /api/agents/inject | Inject message into a thread |
| GET | /api/models | Available models |
| POST | /api/media/upload | Upload a file |
| GET | /media/:filename | Serve uploaded file |
| POST | /api/hooks/:type | Inbound webhook |
| GET | /api/push/vapid-key | Public VAPID key |
| POST | /api/push/subscribe | Register a push subscription |
| GET | /api/version | Running version + update availability |
| POST | /api/admin/update | Trigger self-update (admin key) |
All /api/* routes require X-API-Key: <key> (from apiKeys[]).
WebSocket (/chat/ws) carries the same semantics with streaming, interjection, inject, worker_cancel, and voice-mode audio frames.
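A client of the message endpoint has to split the SSE response into events. The sketch below follows the standard `data:` framing of server-sent events. What each payload contains (plain text, JSON, or something else) is not specified here, so the payloads are returned as raw strings. `parseSseEvents` is a hypothetical helper.

```typescript
// Extracts the data payloads of all complete events in `buffer` and returns
// any trailing partial event as `rest`, to be prepended to the next chunk.
function parseSseEvents(buffer: string): { events: string[]; rest: string } {
  const normalized = buffer.replace(/\r\n/g, "\n");
  const blocks = normalized.split("\n\n"); // a blank line ends an event
  const rest = blocks.pop() ?? "";
  const events: string[] = [];
  for (const block of blocks) {
    const data = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")) // drop one leading space
      .join("\n"); // multiple data lines form one payload
    if (data !== "") events.push(data);
  }
  return { events, rest };
}
```

A streaming reader would call this on each network chunk, feeding `rest` back in with the next chunk.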
Development
git clone https://github.com/soapko/cumulus
cd cumulus
npm install
npm run build
npm test # vitest
npm run lint
npm run type-check
- Build: dist/ (compiled TypeScript plus static assets: widget HTML/CSS/JS, blex bundles).
- Tests: vitest, 700+ tests covering agentic loop, retriever, adapters, federation, scheduler, push.
- Deploy workflow: bump version, npm publish, then cumulus-gateway reload on the host (SIGHUP drains active streams; see docs/tasks/050-graceful-restart.md).
Background
Cumulus implements the Recursive Language Model pattern: treat conversation history as an external environment the model queries programmatically, rather than stuffing everything into context. This enables reasoning over contexts 2+ orders of magnitude beyond the model's window, with graceful cost scaling.
See docs/ for task documents, ADRs, and implementation notes.
License
MIT
