
@luckydraw/cumulus

v0.30.27

RLM-based CLI chat wrapper for Claude with external history context management

Cumulus

A self-hosted multi-channel AI gateway built around Claude and other LLMs. Runs as a long-lived daemon, speaks to you through a web chat widget, Slack, Discord, iOS push, email webhooks, and more — with unlimited conversation context via the Recursive Language Model (RLM) pattern.

Originally a CLI wrapper for Claude, Cumulus has grown into a full gateway platform that coordinates agents, channels, models, and federated deployments.

What you get

  • Gateway daemon (cumulus-gateway) — HTTP + WebSocket server with per-thread conversations, streaming responses, and an admin API.
  • Web chat widget — embeddable /chat interface with voice mode, push notifications, media uploads, and rich blex block rendering (tables, forms, charts, kanban, diagrams).
  • Channel adapters — Slack and Discord bots, inbound email webhooks (Resend), and generic HTTP webhooks — all injecting into the same thread model.
  • Inter-agent messaging — threads can talk to each other via send_to_agent, with support for CC/BCC visibility.
  • Federation — hub-and-spoke mesh so agents on different machines can message each other across NATs.
  • Per-thread model selection — Claude (via CLI) or any HuggingFace model (GLM-5, Kimi-K2.5, Qwen3, etc.) with tool calling.
  • Scheduled triggers, email, push, media serving — built-in MCP tools so agents can send emails, schedule themselves, notify you, and upload files.
  • Unlimited history — JSONL per thread + vector store + adaptive context budget. Conversations never truncate.
  • Classic CLI (cumulus) — terminal chat for individual threads, backed by the same history store.

Architecture

┌──────────────────────────────────────────────────────────────┐
│  Clients                                                     │
│                                                              │
│   /chat (web)   Slack   Discord   Email   CLI   Push (PWA)   │
│        │         │       │        │       │      │           │
│        └─────────┴───────┴────────┴───────┴──────┘           │
│                          │                                   │
│                          ▼                                   │
│              ┌──────────────────────┐                        │
│              │  cumulus-gateway     │                        │
│              │  (HTTP / WS daemon)  │                        │
│              └──────────┬───────────┘                        │
│                         │                                    │
│     ┌───────────────────┼──────────────────────┐             │
│     ▼                   ▼                      ▼             │
│ ┌─────────┐      ┌──────────────┐      ┌───────────────┐     │
│ │ Thread  │      │ Model router │      │  Federation   │     │
│ │ store   │      │ Claude / HF  │      │  hub/spoke    │     │
│ │ (JSONL) │      │ MCP tools    │      │  (WSS mesh)   │     │
│ └────┬────┘      └──────┬───────┘      └───────────────┘     │
│      │                  │                                    │
│      ▼                  ▼                                    │
│  ~/.cumulus/       Claude CLI                                │
│  threads/          HuggingFace API                           │
│  content/          MCP stdio + in-process                    │
│  media/                                                      │
└──────────────────────────────────────────────────────────────┘

Every turn is a fresh model invocation. The gateway assembles the prompt context from recent messages plus RAG retrieval against the thread's history and content store, keeping it within the thread's token budget, then streams the response back to the originating channel.
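The per-turn assembly can be pictured as a greedy packing step. The sketch below is an illustration under stated assumptions, not Cumulus's implementation: messages are plain {role, content} records, tokens are estimated at roughly four characters each, and the newest messages are kept before any retrieved hits are added. The names assembleContext and estimateTokens are hypothetical.

```typescript
// Hypothetical message shape; Cumulus's real JSONL records may carry more fields.
interface Msg {
  role: "user" | "assistant";
  content: string;
}

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedy packing: newest recent messages first, then retrieved hits, within the budget.
function assembleContext(recent: Msg[], retrieved: Msg[], budgetTokens: number): Msg[] {
  const recentWindow: Msg[] = [];
  let used = 0;
  // Walk recent history from newest to oldest so the latest turns always survive.
  for (let i = recent.length - 1; i >= 0; i--) {
    const cost = estimateTokens(recent[i].content);
    if (used + cost > budgetTokens) break;
    recentWindow.unshift(recent[i]); // unshift keeps the output in chronological order
    used += cost;
  }
  // Spend whatever budget is left on RAG hits, skipping any that do not fit.
  const hits: Msg[] = [];
  for (const hit of retrieved) {
    const cost = estimateTokens(hit.content);
    if (used + cost > budgetTokens) continue;
    hits.push(hit);
    used += cost;
  }
  // Retrieved background first, then the recent conversation window.
  return [...hits, ...recentWindow];
}
```

Putting retrieved messages ahead of the recent window is one ordering choice among several; a real assembler might also drop hits that already appear in the recent window.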

Installation

Requires Node 20+.

npm install -g @luckydraw/cumulus

This installs three binaries:

| Command         | Purpose                                           |
| --------------- | ------------------------------------------------- |
| cumulus         | Terminal chat client for a single thread          |
| cumulus-mcp     | MCP server exposing history/content tools (stdio) |
| cumulus-gateway | Long-running daemon (HTTP + WebSocket + adapters) |

Quick start — gateway

# Interactive setup: detects project directories, installs a service, generates keys.
cumulus-gateway setup

# Or non-interactive:
cumulus-gateway setup --project-root ~/projects --port 8080

# Start / stop / reload (if you skip the service install):
cumulus-gateway start
cumulus-gateway stop
cumulus-gateway reload     # SIGHUP — drains active streams before restart

Setup writes ~/.cumulus/gateway.config.json, generates VAPID keys for push, scaffolds a systemd (Linux) or LaunchAgent (macOS) unit, and prints the generated API key.

Open http://localhost:8080/chat, paste the API key, and start talking. Messages hit your thread; responses stream back token-by-token.

Configuration

~/.cumulus/gateway.config.json — adjust any field with cumulus-gateway config set <key> <value> or edit directly:

{
  "apiKeys": ["sk-cumulus-…"],
  "port": 8080,
  "projectRoot": "/home/you/projects",
  "model": "claude", // default per-thread model
  "models": [
    // available models for thread picker
    { "id": "claude", "label": "Claude (CLI)", "provider": "claude-cli" },
    { "id": "zai-org/GLM-5", "label": "GLM-5", "provider": "huggingface" },
    { "id": "moonshotai/Kimi-K2.5", "label": "Kimi-K2.5", "provider": "huggingface" },
  ],
  "hfApiKey": "hf_…", // optional, for HuggingFace models
  "channels": {
    "slack": { "token": "xoxb-…", "signingSecret": "…", "appToken": "xapp-…" },
    "discord": { "token": "…", "clientId": "…" },
  },
  "resend": { "apiKey": "re_…", "defaultFrom": "…" },
  "vapid": { "publicKey": "…", "privateKey": "…", "subject": "mailto:…" },
  "federation": {
    "enabled": true,
    "role": "hub", // "hub" or "spoke"
    "allowedSpokes": ["mac-karl"], // hub only
    // spoke config: { role:"spoke", hub:"wss://host/federation", apiKey:"…", name:"mac-…" }
  },
}

Reload the daemon (cumulus-gateway reload) after editing. It waits for active streams to finish before restarting, so in-flight responses aren't dropped.
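Because the hub and spoke roles take different fields, a quick sanity check before reloading can catch a half-edited federation block. This is a hypothetical validator whose field names come only from the sample config above; treating an empty allowedSpokes list on a hub as an error is an assumption.

```typescript
// Shape inferred from the sample config; no other fields are assumed.
interface FederationConfig {
  enabled: boolean;
  role: "hub" | "spoke";
  allowedSpokes?: string[]; // hub only
  hub?: string; // spoke only, e.g. "wss://host/federation"
  apiKey?: string; // spoke only
  name?: string; // spoke only
}

// Hypothetical checker: returns a list of problems (empty when the block looks valid).
function validateFederation(f: FederationConfig): string[] {
  const errors: string[] = [];
  if (!f.enabled) return errors; // nothing to check when federation is off
  if (f.role === "hub") {
    // Assumption: a hub with no allowed spokes is treated as a misconfiguration.
    if (!f.allowedSpokes || f.allowedSpokes.length === 0) {
      errors.push("hub: allowedSpokes should list at least one spoke name");
    }
  } else if (f.role === "spoke") {
    if (!f.hub || !f.hub.startsWith("wss://")) errors.push("spoke: hub must be a wss:// URL");
    if (!f.apiKey) errors.push("spoke: apiKey is required");
    if (!f.name) errors.push("spoke: name is required");
  } else {
    errors.push(`unknown role: ${String(f.role)}`);
  }
  return errors;
}
```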

Gateway features

Per-thread model selection

Each thread can run on a different model. Use the dropdown in the widget header, or the REST API:

curl -X PUT http://localhost:8080/api/thread/my-thread/config \
  -H "X-API-Key: sk-…" \
  -H "Content-Type: application/json" \
  -d '{"model": "zai-org/GLM-5"}'

  • claude: spawns claude --print per turn and gets the full Claude Code tool surface.
  • HuggingFace models: routed through an OpenAI-compatible endpoint with a built-in agentic loop that handles tool use, truncation recovery, and error retry.
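The HuggingFace path is described as an agentic loop with tool use, truncation recovery, and error retry. A minimal synchronous sketch of that control flow, assuming OpenAI-style finish reasons (stop, length, tool_calls) and a stubbed model and tool runner; agentLoop, callWithRetry, and the message shapes are hypothetical, not the gateway's internals.

```typescript
// Hypothetical shapes loosely modeled on OpenAI-compatible chat completions.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}
interface ModelReply {
  content: string;
  toolCalls: ToolCall[];
  finishReason: "stop" | "length" | "tool_calls";
}
interface ChatMsg {
  role: "user" | "assistant" | "tool";
  content: string;
}
type CallModel = (history: ChatMsg[]) => ModelReply;
type RunTool = (call: ToolCall) => string;

// Error retry: re-issue the call until it succeeds or maxRetries is exhausted.
function callWithRetry(callModel: CallModel, history: ChatMsg[], maxRetries: number): ModelReply {
  let attempt = 0;
  while (true) {
    try {
      return callModel(history);
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      attempt++;
    }
  }
}

function agentLoop(history: ChatMsg[], callModel: CallModel, runTool: RunTool, maxSteps = 10): string {
  let answer = "";
  for (let step = 0; step < maxSteps; step++) {
    const reply = callWithRetry(callModel, history, 2);
    history.push({ role: "assistant", content: reply.content });
    if (reply.finishReason === "tool_calls") {
      // Tool use: run each requested tool and feed its result back to the model.
      for (const call of reply.toolCalls) {
        history.push({ role: "tool", content: runTool(call) });
      }
      continue;
    }
    answer += reply.content;
    if (reply.finishReason === "length") {
      // Truncation recovery: ask the model to resume where the output was cut off.
      history.push({ role: "user", content: "Continue exactly where you left off." });
      continue;
    }
    return answer; // finishReason "stop": the answer is complete
  }
  throw new Error(`no final answer after ${maxSteps} steps`);
}
```

A production loop would be asynchronous and would likely add backoff between retries; the synchronous form just keeps the control flow easy to follow.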

Web chat widget

At /chat. Features:

  • Streaming responses over WebSocket, with interjection support (type while streaming to interrupt and redirect).
  • Blex blocks: ~~~blex:table, ~~~blex:poll, ~~~blex:kanban, ~~~blex:mermaid, and 18 other block types for rich interactive content.
  • Voice mode — hands-free conversation using browser STT + TTS (optionally server-side Piper).
  • Push notifications — PWA install + VAPID subscriptions. Agents call notify_user to alert you while you're away.
  • Media uploads — drag files in; upload_media tool returns a public URL backed by ~/.cumulus/media/.
  • Annotations — highlight text, attach comments, send back as chips.
  • Texitool integration — edit Unicode-art diagrams in-place via an embedded canvas.
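Blex blocks travel inside ordinary message text as ~~~blex:<type> fences. Assuming each block opens with ~~~blex:<type> at the start of a line and closes with a bare ~~~ line (the exact grammar is not documented here), a client could pull them out like this; extractBlex is a hypothetical helper.

```typescript
// Hypothetical parsed form; the widget's real block schema is not assumed here.
interface BlexBlock {
  type: string; // e.g. "table", "poll", "kanban", "mermaid"
  body: string; // raw text between the fences
}

// Assumed grammar: "~~~blex:<type>" opens a block at line start; a bare "~~~" line closes it.
const BLEX_RE = /^~~~blex:([\w-]+)[^\n]*\n([\s\S]*?)^~~~[ \t]*$/gm;

function extractBlex(message: string): BlexBlock[] {
  const blocks: BlexBlock[] = [];
  BLEX_RE.lastIndex = 0; // the regex is global, so reset its state between calls
  let m: RegExpExecArray | null;
  while ((m = BLEX_RE.exec(message)) !== null) {
    // Drop the newline that precedes the closing fence.
    blocks.push({ type: m[1], body: m[2].replace(/\n$/, "") });
  }
  return blocks;
}
```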

Channel adapters

  • Slack (channels.slack) — Socket Mode bot. Thread naming: slack-{userId}-{channelId}.
  • Discord (channels.discord) — Gateway WebSocket. Thread naming: discord-{userId}-{channelId}.
  • Inbound webhooks: POST /api/hooks/:type for email (Resend), forms, and generic events. Config-driven thread routing with HMAC signature verification.
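Two details from the adapter list lend themselves to a sketch: the documented thread-naming scheme and HMAC verification of inbound webhooks. The signing scheme below (HMAC-SHA256 over the raw body, hex-encoded, one shared secret) is a generic assumption; real providers, Resend included, define their own headers and payload formats, so treat verifySignature as illustrative only.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Thread names follow the documented scheme, e.g. slack-{userId}-{channelId}.
function threadName(platform: "slack" | "discord", userId: string, channelId: string): string {
  return `${platform}-${userId}-${channelId}`;
}

// Generic HMAC-SHA256 check over the raw request body. The hex encoding and the single
// shared secret are assumptions; provider-specific schemes differ.
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const given = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws when lengths differ, so check length first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```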

Inter-agent messaging

Any thread can message another thread using the send_to_agent MCP tool:

send_to_agent(target="devops", message="Deploy the new build", visibility="cc")

  • cc (default): all recipients see each other.
  • blind: each recipient thinks it's a direct message.
  • {hidden: […]}: selective visibility for an observer pattern; agents listed as hidden receive the message but are invisible to the visible recipients.

If the target is busy, the message is queued and delivered as a batch when that thread is idle ("while you were busy, 3 messages arrived…").
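The three visibility modes and the busy-queue behavior can be modeled in a few lines. This is a hypothetical sketch: it assumes hidden recipients (observers) see the full recipient list, and the batch header only mimics the wording quoted above.

```typescript
type Visibility = "cc" | "blind" | { hidden: string[] };

// Which recipients a given viewer is shown on a message. Assumes hidden observers see everyone.
function shownRecipients(all: string[], viewer: string, vis: Visibility): string[] {
  if (vis === "cc") return all; // everyone sees everyone
  if (vis === "blind") return [viewer]; // reads like a direct message
  const hidden = new Set(vis.hidden);
  if (hidden.has(viewer)) return all; // observers get the full picture
  return all.filter((r) => !hidden.has(r)); // visible recipients never see observers
}

// Messages for a busy thread wait here until it goes idle.
const pending = new Map<string, string[]>();

function enqueue(target: string, message: string): void {
  const queue = pending.get(target) ?? [];
  queue.push(message);
  pending.set(target, queue);
}

// On idle, drain the queue into one injected message (returns null when nothing waited).
function drainBatch(target: string): string | null {
  const queue = pending.get(target);
  if (!queue || queue.length === 0) return null;
  pending.delete(target);
  const noun = queue.length === 1 ? "message" : "messages";
  const header = `While you were busy, ${queue.length} ${noun} arrived:`;
  return [header, ...queue.map((m, i) => `${i + 1}. ${m}`)].join("\n");
}
```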

Federation

Two gateways can be linked in a hub-and-spoke topology. The hub runs at a stable URL; spokes connect outbound via WSS, so NAT doesn't matter.

send_to_agent("thundercat:cumulus", "…")   # cross-gateway addressing
list_agents()                              # aggregates threads across all spokes

Heartbeats every 25s with bidirectional WebSocket pings; dead connections are torn down within 75s.
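Cross-gateway addresses use a gateway:thread form, and liveness follows a 25s ping cadence with a 75s cutoff. The sketch below assumes a bare name (no colon) means a thread on the local gateway and that any pong or message from the peer counts as proof of life; parseAddress and PeerLiveness are hypothetical names.

```typescript
// "thundercat:cumulus" -> gateway "thundercat", thread "cumulus".
// Assumption: a name without a colon refers to a thread on the local gateway.
function parseAddress(addr: string): { gateway: string | null; thread: string } {
  const i = addr.indexOf(":");
  if (i === -1) return { gateway: null, thread: addr };
  return { gateway: addr.slice(0, i), thread: addr.slice(i + 1) };
}

const PING_INTERVAL_MS = 25000; // heartbeat cadence from the README
const DEAD_AFTER_MS = 75000; // silence cutoff from the README (three missed beats)

// Tracks when a peer was last heard from; call tick() on every heartbeat interval.
class PeerLiveness {
  private lastSeen: number;

  constructor(now: number) {
    this.lastSeen = now;
  }

  // Any pong or message from the peer counts as proof of life (assumption).
  markAlive(now: number): void {
    this.lastSeen = now;
  }

  // Send another ping, or tear the connection down if the peer has been silent too long.
  tick(now: number): "ping" | "terminate" {
    return now - this.lastSeen >= DEAD_AFTER_MS ? "terminate" : "ping";
  }
}
```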

Scheduled triggers

Agents can schedule themselves:

schedule_trigger(at="2026-05-01T09:00:00Z", message="Follow up with lead")
schedule_trigger(cron="0 9 * * MON", message="Weekly check-in")
cancel_schedule(id="…")

Schedules are per-thread, persisted in {thread}.config.json, and fire as message injections into the thread.
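Firing a schedule reduces to asking which entries are due at a given moment. This sketch assumes cron expressions are evaluated in UTC and supports only *, plain numbers, comma lists, and three-letter day names; it is a simplification for illustration, not the scheduler Cumulus ships, and the Schedule shape is hypothetical.

```typescript
interface Schedule {
  id: string;
  at?: string; // ISO-8601 instant for one-shot triggers
  cron?: string; // five-field cron expression for recurring triggers
  message: string;
}

const DAYS = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"];

// Matches one cron field: "*", a number, a comma list, or (for day-of-week) a day name.
function fieldMatches(field: string, value: number, isDayOfWeek = false): boolean {
  if (field === "*") return true;
  return field.split(",").some((part) => {
    const idx = isDayOfWeek ? DAYS.indexOf(part.toUpperCase()) : -1;
    const n = idx >= 0 ? idx : Number(part);
    return n === value || (isDayOfWeek && n === 7 && value === 0); // 7 is also Sunday
  });
}

// Evaluated in UTC. Simplification: day-of-month and day-of-week are ANDed,
// unlike standard cron, which ORs them when both are restricted.
function cronMatches(expr: string, d: Date): boolean {
  const parts = expr.trim().split(/\s+/);
  if (parts.length !== 5) throw new Error(`expected 5 cron fields, got ${parts.length}`);
  const [minute, hour, dom, month, dow] = parts;
  return (
    fieldMatches(minute, d.getUTCMinutes()) &&
    fieldMatches(hour, d.getUTCHours()) &&
    fieldMatches(dom, d.getUTCDate()) &&
    fieldMatches(month, d.getUTCMonth() + 1) &&
    fieldMatches(dow, d.getUTCDay(), true)
  );
}

// Messages to inject at `now`. The caller should delete one-shot entries after they fire.
function dueMessages(schedules: Schedule[], now: Date): string[] {
  return schedules
    .filter((s) =>
      s.at !== undefined ? Date.parse(s.at) <= now.getTime() : s.cron !== undefined && cronMatches(s.cron, now),
    )
    .map((s) => s.message);
}
```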

Email (Resend)

With resend.apiKey configured:

send_email(to="…", subject="Hello", body="…")
list_emails(limit=10)

Rate-limited per thread (default 10/hour). All sends are logged to thread history. First email from a new thread triggers a notify_user ping.
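A per-thread cap of 10 sends per hour maps naturally onto a sliding-window limiter. The class below is a hypothetical sketch with in-memory state; a long-lived daemon would presumably persist or rebuild these timestamps across restarts.

```typescript
// Sliding-window limiter: at most `limit` sends per thread within `windowMs`.
// Hypothetical in-memory sketch; defaults mirror the README's 10 per hour.
class ThreadRateLimiter {
  private sent = new Map<string, number[]>();
  private limit: number;
  private windowMs: number;

  constructor(limit = 10, windowMs = 60 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records the send and returns true if allowed; returns false when the thread is over its limit.
  tryAcquire(thread: string, now: number = Date.now()): boolean {
    // Keep only timestamps still inside the window.
    const recent = (this.sent.get(thread) ?? []).filter((t) => now - t < this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.sent.set(thread, recent);
    return allowed;
  }
}
```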

Self-update

cumulus-gateway check-update   # compares running version to npm
cumulus-gateway update         # bumps to latest, saves previous for rollback
cumulus-gateway rollback       # restores the previous version

The widget's top bar also shows an "Update available" indicator when a new version lands on npm.
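Deciding whether an update is available comes down to comparing the running version with the latest published one (which npm view @luckydraw/cumulus version prints, for example). A hypothetical comparison helper that handles plain dotted versions such as 0.30.27 and ignores pre-release tags:

```typescript
// Compares dotted numeric versions; returns -1, 0, or 1.
// Pre-release suffixes ("-beta.1") are stripped, a deliberate simplification.
function compareVersions(a: string, b: string): number {
  const pa = a.split("-")[0].split(".").map(Number);
  const pb = b.split("-")[0].split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0); // missing parts count as 0
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// True when the registry holds a strictly newer version than the one running.
function updateAvailable(running: string, latestOnNpm: string): boolean {
  return compareVersions(latestOnNpm, running) > 0;
}
```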

Classic CLI mode

The original RLM chat loop still works. Great for quick terminal work without running the gateway.

cumulus my-project             # open or create a thread
cumulus --list                 # list threads
cumulus --delete old-project

Each turn:

  1. Append your message to ~/.cumulus/threads/my-project.jsonl.
  2. Spawn claude --print with --mcp-config pointing to the cumulus MCP server.
  3. Claude pulls whatever history it needs via search_history, peek_recent, etc.
  4. Append the response to the JSONL.
  5. Next turn starts from a fresh context.
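The five steps can be sketched with storage and model invocation injected, so the control flow runs without Claude installed. In the real CLI the store would be the thread's JSONL file (for example via fs.appendFileSync) and invoke would spawn claude --print with --mcp-config; ThreadStore, runTurn, and memoryStore are hypothetical names, and passing only the new message to the model is an assumption.

```typescript
// One JSONL record per message; the real files may store more (tool calls, results, metadata).
interface Entry {
  role: "user" | "assistant";
  content: string;
  ts: string;
}

// Storage is injected; the real CLI would append to ~/.cumulus/threads/<name>.jsonl.
interface ThreadStore {
  append(line: string): void;
  readAll(): string;
}

// Stand-in for spawning `claude --print --mcp-config <file>` (hypothetical signature).
type Invoke = (prompt: string) => string;

function appendEntry(store: ThreadStore, entry: Entry): void {
  store.append(JSON.stringify(entry) + "\n");
}

function readEntries(store: ThreadStore): Entry[] {
  return store
    .readAll()
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Entry);
}

function runTurn(store: ThreadStore, userText: string, invoke: Invoke): string {
  appendEntry(store, { role: "user", content: userText, ts: new Date().toISOString() }); // step 1
  // Steps 2-3 (assumed): only the new message goes in; the model pulls history via MCP tools.
  const reply = invoke(userText);
  appendEntry(store, { role: "assistant", content: reply, ts: new Date().toISOString() }); // step 4
  return reply; // step 5: no conversation state is kept in memory between turns
}

// In-memory store for trying the loop without touching ~/.cumulus.
function memoryStore(): ThreadStore {
  let buf = "";
  return {
    append: (line) => {
      buf += line;
    },
    readAll: () => buf,
  };
}
```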

MCP tools

The cumulus-mcp server exposes history and content tools. Usable from any MCP-compatible client.

History:

| Tool              | Purpose                                                          |
| ----------------- | ---------------------------------------------------------------- |
| search_history    | Keyword / semantic / hybrid search over a thread                 |
| peek_recent       | Last N messages                                                  |
| read_messages     | Message range by index                                           |
| get_history_stats | Count, token estimate, time range                                |
| get_summary       | Auto-generated summaries (recent chunk, full, or specific range) |
| sub_query         | Recursive sub-LLM call over retrieved messages                   |
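For intuition about what the history tools return, here are in-memory stand-ins for peek_recent, read_messages, get_history_stats, and the keyword mode of search_history. Parameter names, the half-open index range, and the four-characters-per-token estimate are assumptions, not the tools' documented signatures.

```typescript
interface HistoryMsg {
  role: string;
  content: string;
  ts: string; // ISO-8601 timestamp
}

// Last n messages; guards n <= 0 because slice(-0) would return everything.
function peekRecent(msgs: HistoryMsg[], n: number): HistoryMsg[] {
  return n <= 0 ? [] : msgs.slice(-n);
}

// Half-open index range [start, end), clamped to the history length.
function readMessages(msgs: HistoryMsg[], start: number, end: number): HistoryMsg[] {
  return msgs.slice(Math.max(0, start), Math.min(msgs.length, end));
}

function historyStats(msgs: HistoryMsg[]): { count: number; tokenEstimate: number; first: string | null; last: string | null } {
  // ~4 characters per token heuristic (assumption).
  const tokenEstimate = msgs.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
  return {
    count: msgs.length,
    tokenEstimate,
    first: msgs[0]?.ts ?? null,
    last: msgs[msgs.length - 1]?.ts ?? null,
  };
}

// Naive keyword search: rank message indices by how many query terms they contain.
function keywordSearch(msgs: HistoryMsg[], query: string, limit = 5): number[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return msgs
    .map((m, i) => ({ i, score: terms.filter((t) => m.content.toLowerCase().includes(t)).length }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score || a.i - b.i) // ties keep chronological order
    .map((r) => r.i)
    .slice(0, limit);
}
```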

Content store (file reads, bash output, web fetches):

| Tool                | Purpose                                  |
| ------------------- | ---------------------------------------- |
| read_file           | Read text/PDF, chunk + embed + store     |
| store_content       | Store arbitrary text for later retrieval |
| search_content      | Search across stored content             |
| retrieve_content    | Get full content by [STORED:xxx] id      |
| read_content_chunk  | Read a specific chunk index              |
| list_stored_content | List all stored items                    |
| detect_anomalies    | Find out-of-place content in a store     |
| forget_content      | Remove a stored item                     |
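The content store's chunk, embed, and store pipeline, together with the binary Float32 storage mentioned under RAG & context management, might look like the following. Chunk size and overlap are illustrative values rather than Cumulus defaults, the embedding model itself is left out, and the bytes use the platform's byte order (little-endian on common hardware).

```typescript
// Splits text into overlapping character windows (sizes are illustrative).
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window already reached the end
  }
  return chunks;
}

// Serialize an embedding as raw Float32 bytes.
function toBytes(vec: number[]): Uint8Array {
  return new Uint8Array(Float32Array.from(vec).buffer);
}

// Deserialize; slice() copies into a fresh buffer so the view starts 4-byte aligned.
function fromBytes(bytes: Uint8Array): Float32Array {
  const copy = bytes.slice();
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}

// Cosine similarity for ranking stored chunks against a query embedding.
function cosine(a: ArrayLike<number>, b: ArrayLike<number>): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}
```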

Gateway-only tools (available to agents running inside the daemon):

send_to_agent, list_agents, notify_user, schedule_trigger, cancel_schedule, list_schedules, send_email, list_emails, upload_media, create_plastic_app, update_pipeline.

RAG & context management

  • JSONL history per thread — every message, tool call, and tool result.
  • Content store — chunked file reads, embedded with local HuggingFace transformers, stored as binary Float32.
  • Segment summaries — LLM-generated per topic boundary, separately embedded for vocabulary-gap retrieval.
  • Adaptive context budget — self-tuning per thread based on TTFT. Shrinks when slow, grows when fast + near-capacity. Default 300k, floor 100k, ceiling 1M.
  • Query-type-aware retrieval — classifies queries (recall / synthesis / recent / decision) and adjusts scoring weights accordingly.
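The adaptive budget bullet gives the default (300k), floor (100k), and ceiling (1M) but not the tuning rule. One plausible rule, with hypothetical TTFT thresholds, near-capacity ratio, and step factors chosen only for illustration:

```typescript
const BUDGET = { initial: 300000, floor: 100000, ceiling: 1000000 }; // from the README

// Hypothetical tuning knobs; the gateway's real thresholds are not documented here.
const SLOW_TTFT_MS = 20000;
const FAST_TTFT_MS = 5000;
const NEAR_CAPACITY = 0.9;

// Next turn's budget given this turn's time-to-first-token and tokens actually used.
function nextBudget(current: number, ttftMs: number, tokensUsed: number): number {
  let next = current;
  if (ttftMs > SLOW_TTFT_MS) {
    next = current * 0.8; // slow first token: shrink
  } else if (ttftMs < FAST_TTFT_MS && tokensUsed >= NEAR_CAPACITY * current) {
    next = current * 1.25; // fast and nearly full: grow
  }
  // Clamp to the documented floor and ceiling.
  return Math.round(Math.min(BUDGET.ceiling, Math.max(BUDGET.floor, next)));
}
```

Growing only when the budget is nearly full avoids inflating it for threads that never use the extra room.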

REST API (gateway)

| Method | Path                      | Purpose                                 |
| ------ | ------------------------- | --------------------------------------- |
| GET    | /health                   | Gateway status                          |
| POST   | /api/thread/:name/message | Send a message (SSE stream in response) |
| GET    | /api/thread/:name/history | Paginated thread history                |
| GET    | /api/thread/:name/config  | Thread config                           |
| PUT    | /api/thread/:name/config  | Update thread config (model, etc.)      |
| DELETE | /api/thread/:name         | Delete a thread                         |
| GET    | /api/threads              | List threads                            |
| GET    | /api/agents               | List threads + streaming status         |
| POST   | /api/agents/inject        | Inject message into a thread            |
| GET    | /api/models               | Available models                        |
| POST   | /api/media/upload         | Upload a file                           |
| GET    | /media/:filename          | Serve uploaded file                     |
| POST   | /api/hooks/:type          | Inbound webhook                         |
| GET    | /api/push/vapid-key       | Public VAPID key                        |
| POST   | /api/push/subscribe       | Register a push subscription            |
| GET    | /api/version              | Running version + update availability   |
| POST   | /api/admin/update         | Trigger self-update (admin key)         |

All /api/* routes require X-API-Key: <key> (from apiKeys[]).
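A client for the message endpoint needs the X-API-Key header and an SSE reader. The sketch below only builds the request and parses an already-received SSE payload; the JSON body field (message) is an assumption, so check the gateway source for the actual payload shape. buildMessageRequest and parseSse are hypothetical names.

```typescript
// Builds a request for POST /api/thread/:name/message; the "message" body field is assumed.
function buildMessageRequest(baseUrl: string, thread: string, apiKey: string, text: string) {
  return {
    url: `${baseUrl}/api/thread/${encodeURIComponent(thread)}/message`,
    init: {
      method: "POST",
      headers: { "X-API-Key": apiKey, "Content-Type": "application/json", Accept: "text/event-stream" },
      body: JSON.stringify({ message: text }),
    },
  };
}

// Minimal SSE parser: events are separated by blank lines; "data:" lines are joined with "\n".
function parseSse(raw: string): { event: string; data: string }[] {
  const events: { event: string; data: string }[] = [];
  for (const block of raw.split(/\r?\n\r?\n/)) {
    let event = "message"; // default event type per the SSE format
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

With fetch, one would pass url and init from buildMessageRequest; for token-by-token rendering, chunks from response.body should be buffered until a blank line arrives and then handed to the parser, rather than waiting for the whole body.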

WebSocket (/chat/ws) carries the same semantics with streaming, interjection, inject, worker_cancel, and voice-mode audio frames.

Development

git clone https://github.com/soapko/cumulus
cd cumulus
npm install
npm run build
npm test            # vitest
npm run lint
npm run type-check

  • Build: dist/ compiled TypeScript plus static assets (widget HTML/CSS/JS, blex bundles).
  • Tests: vitest, 700+ tests covering agentic loop, retriever, adapters, federation, scheduler, push.
  • Deploy workflow: bump version, npm publish, then cumulus-gateway reload on the host (SIGHUP drains active streams; see docs/tasks/050-graceful-restart.md).

Background

Cumulus implements the Recursive Language Model pattern: treat conversation history as an external environment the model queries programmatically, rather than stuffing everything into context. This enables reasoning over contexts two or more orders of magnitude larger than the model's window, with graceful cost scaling.

See docs/ for task documents, ADRs, and implementation notes.

License

MIT