
@luckydraw/cumulus

v0.31.29


RLM-based CLI chat wrapper for Claude with external history context management


Cumulus

A self-hosted multi-channel AI gateway built around Claude and other LLMs. Runs as a long-lived daemon, speaks to you through a web chat widget, Slack, Discord, iOS push, email webhooks, and more — with unlimited conversation context via the Recursive Language Model (RLM) pattern.

Originally a CLI wrapper for Claude, Cumulus has grown into a full gateway platform that coordinates agents, channels, and models behind one persistent process.

What you get

  • Gateway daemon (cumulus-gateway) — HTTP + WebSocket server with per-thread conversations, streaming responses, and an admin API.
  • Web chat widget — embeddable /chat interface with voice mode, push notifications, file uploads with progress, and rich blex block rendering (tables, forms, charts, kanban, diagrams).
  • Channel adapters — Slack and Discord bots, inbound email webhooks (Resend), and generic HTTP webhooks — all injecting into the same thread model.
  • Inter-agent messaging — threads can talk to each other via send_to_agent, with support for CC/BCC visibility.
  • Per-thread model selection — Claude (via CLI) or any HuggingFace model (GLM-5, Kimi-K2.5, Qwen3, etc.) with tool calling.
  • Scheduled triggers, email, push, media serving — built-in MCP tools so agents can send emails, schedule themselves, notify you, and upload files.
  • Unlimited history — JSONL per thread + vector store + adaptive context budget. Conversations never truncate.
  • Classic CLI (cumulus) — terminal chat for individual threads, backed by the same history store.

Architecture

┌──────────────────────────────────────────────────────────────┐
│  Clients                                                     │
│                                                              │
│   /chat (web)   Slack   Discord   Email   CLI   Push (PWA)   │
│        │         │       │        │       │      │           │
│        └─────────┴───────┴────────┴───────┴──────┘           │
│                          │                                   │
│                          ▼                                   │
│              ┌──────────────────────┐                        │
│              │  cumulus-gateway     │                        │
│              │  (HTTP / WS daemon)  │                        │
│              └──────────┬───────────┘                        │
│                         │                                    │
│           ┌─────────────┴─────────────┐                      │
│           ▼                           ▼                      │
│      ┌─────────┐              ┌──────────────┐               │
│      │ Thread  │              │ Model router │               │
│      │ store   │              │ Claude / HF  │               │
│      │ (JSONL) │              │ MCP tools    │               │
│      └────┬────┘              └──────┬───────┘               │
│           │                          │                       │
│           ▼                          ▼                       │
│       ~/.cumulus/                Claude CLI                  │
│       threads/                   HuggingFace API             │
│       content/                   MCP stdio + in-process      │
│       media/                                                 │
└──────────────────────────────────────────────────────────────┘

Every turn is a fresh model invocation. The gateway fills a per-turn context budget with recent messages plus RAG retrieval against the thread's history and content store, then streams the response back to the originating channel.
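To make the per-turn assembly concrete, here is a minimal TypeScript sketch of filling a token budget: newest messages first, up to a reserved share, then ranked retrieval hits until the budget is spent. The names (`assembleContext`, `Msg`), the 50/50 split, and the 4-characters-per-token estimate are assumptions for illustration, not Cumulus internals.

```typescript
// Hypothetical message shape; Cumulus's real records may differ.
interface Msg {
  role: "user" | "assistant";
  text: string;
}

// Rough token estimate (assumption: about 4 characters per token).
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

// Fill `budget` with the newest messages (up to `recentShare` of it),
// then with retrieved snippets, which are assumed to be ranked best-first.
function assembleContext(
  history: Msg[],
  retrieved: string[],
  budget: number,
  recentShare = 0.5,
): { recent: Msg[]; snippets: string[]; used: number } {
  let used = 0;
  const recent: Msg[] = [];
  const recentCap = Math.floor(budget * recentShare);
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].text);
    if (used + cost > recentCap) break; // keep the recent window contiguous
    recent.unshift(history[i]);
    used += cost;
  }
  const snippets: string[] = [];
  for (const s of retrieved) {
    const cost = estimateTokens(s);
    if (used + cost > budget) continue; // too big: try the next snippet
    snippets.push(s);
    used += cost;
  }
  return { recent, snippets, used };
}
```

Stopping at the first recent message that doesn't fit keeps the recent window contiguous. Retrieved snippets are skipped individually, so one oversized hit doesn't block shorter ones ranked after it.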

Installation

Requires Node 20+.

npm install -g @luckydraw/cumulus

This installs three binaries:

| Command         | Purpose                                           |
| --------------- | ------------------------------------------------- |
| cumulus         | Terminal chat client for a single thread          |
| cumulus-mcp     | MCP server exposing history/content tools (stdio) |
| cumulus-gateway | Long-running daemon (HTTP + WebSocket + adapters) |

Quick start — gateway

# Interactive setup: detects project directories, installs a service, generates keys.
cumulus-gateway setup

# Or non-interactive:
cumulus-gateway setup --project-root ~/projects --port 8080

# Start / stop / reload (if you skip the service install):
cumulus-gateway start
cumulus-gateway stop
cumulus-gateway reload     # SIGHUP — drains active streams before restart

Setup writes ~/.cumulus/gateway.config.json, generates VAPID keys for push, scaffolds a systemd (Linux) or LaunchAgent (macOS) unit, and prints the generated API key.

Open http://localhost:8080/chat, paste the API key, and start talking. Messages hit your thread; responses stream back token-by-token.

Configuration

~/.cumulus/gateway.config.json — adjust any field with cumulus-gateway config set <key> <value> or edit directly:

{
  "apiKeys": ["sk-cumulus-…"],
  "port": 8080,
  "projectRoot": "/home/you/projects",
  "model": "claude", // default per-thread model
  "models": [
    // available models for thread picker
    { "id": "claude", "label": "Claude (CLI)", "provider": "claude-cli" },
    { "id": "zai-org/GLM-5", "label": "GLM-5", "provider": "huggingface" },
    { "id": "moonshotai/Kimi-K2.5", "label": "Kimi-K2.5", "provider": "huggingface" },
  ],
  "hfApiKey": "hf_…", // optional, for HuggingFace models
  "channels": {
    "slack": { "token": "xoxb-…", "signingSecret": "…", "appToken": "xapp-…" },
    "discord": { "token": "…", "clientId": "…" },
  },
  "resend": { "apiKey": "re_…", "defaultFrom": "…" },
  "vapid": { "publicKey": "…", "privateKey": "…", "subject": "mailto:…" },
}

Reload the daemon (cumulus-gateway reload) after editing. It waits for active streams to finish before restarting, so in-flight responses aren't dropped.
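The drain-before-restart behavior can be pictured as a counter of in-flight streams plus a deadline. Below is a minimal TypeScript sketch, assuming a hypothetical `StreamTracker` that request handlers call around each streamed response. The 120 s cap comes from the Reliability notes. Everything else is illustrative and not the gateway's actual code.

```typescript
// Hypothetical tracker for in-flight streams; not Cumulus's actual code.
class StreamTracker {
  private active = 0;
  private waiters: Array<() => void> = [];

  begin(): void {
    this.active++;
  }

  end(): void {
    this.active = Math.max(0, this.active - 1);
    if (this.active === 0) {
      for (const wake of this.waiters.splice(0)) wake();
    }
  }

  // Resolves true once every stream has ended, or false if the deadline passes first.
  drain(timeoutMs: number): Promise<boolean> {
    if (this.active === 0) return Promise.resolve(true);
    return new Promise((resolve) => {
      const timer = setTimeout(() => resolve(false), timeoutMs);
      this.waiters.push(() => {
        clearTimeout(timer);
        resolve(true);
      });
    });
  }
}

const tracker = new StreamTracker();

// On SIGHUP: wait up to 120 s for in-flight streams, then exit.
// Assumes the service manager (systemd / launchd) starts the process again.
process.on("SIGHUP", async () => {
  const clean = await tracker.drain(120_000);
  console.log(clean ? "drained, restarting" : "drain timed out, restarting anyway");
  process.exit(0);
});
```

Handlers would call `tracker.begin()` when a stream starts and `tracker.end()` in a `finally` block when it finishes, so errored streams are still counted as done.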

Gateway features

Per-thread model selection

Each thread can run on a different model. Use the dropdown in the widget header, or the REST API:

curl -X PUT http://localhost:8080/api/thread/my-thread/config \
  -H "X-API-Key: sk-…" \
  -H "Content-Type: application/json" \
  -d '{"model": "zai-org/GLM-5"}'

  • claude — spawns claude --print per turn. Gets the full Claude Code tool surface. Per-thread effort selector (low to max) maps to the CLI's --effort flag.
  • HuggingFace models — routed through an OpenAI-compatible endpoint with a built-in agentic loop that handles tool use, truncation recovery, and error retry.
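The same config update can also be sent from TypeScript. This minimal sketch uses the `fetch` and `Request` globals built into Node 20. The endpoint and the `X-API-Key` header come from the REST API table. Sending `Content-Type: application/json` is an assumption about what the gateway's body parser expects, and `buildModelUpdate` / `setThreadModel` are hypothetical helper names.

```typescript
// Build the PUT request that switches a thread's model (hypothetical helper).
function buildModelUpdate(baseUrl: string, apiKey: string, thread: string, model: string): Request {
  const url = new URL(`/api/thread/${encodeURIComponent(thread)}/config`, baseUrl);
  return new Request(url, {
    method: "PUT",
    headers: { "X-API-Key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ model }),
  });
}

// Send it and surface non-2xx responses as errors.
async function setThreadModel(baseUrl: string, apiKey: string, thread: string, model: string): Promise<void> {
  const res = await fetch(buildModelUpdate(baseUrl, apiKey, thread, model));
  if (!res.ok) throw new Error(`config update failed: HTTP ${res.status}`);
}
```

For example, `await setThreadModel("http://localhost:8080", "sk-…", "my-thread", "zai-org/GLM-5")` mirrors the curl call.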

Web chat widget

At /chat. Features:

  • Streaming responses over WebSocket, with interjection support (type while streaming to interrupt and redirect).
  • Multiple threads, side by side — Cmd/Ctrl+Click any thread in the sidebar to open it in a second panel alongside the current one. Useful for cross-referencing or driving two agents in parallel. On mobile, the view collapses to a single panel automatically.
  • Inline annotations — highlight any chat text, leave a comment via the popover, and send it back as a quoted chip. Chips can be edited or removed before sending. Works like leaving a margin note on what the agent just said.
  • Blex blocks: ~~~blex:TYPE fenced JSON renders as a rich, interactive component. 22 block types including:
    • Interactive input: poll (multi-question carousels, multi-select, write-in answers), confirm (Yes/No/Cancel), form (typed fields with validation). User responses serialize back into the chat input.
    • Embedded content: embed (sandboxed iframe for hosted apps and webpages, inline in the chat), image/gallery (with upload_media-served URLs), mermaid and svg diagrams.
    • Live data: table (sortable/selectable), chart, kanban, calendar, timeline, status, metric, progress, file-tree, terminal, code, diff (with Apply/Reject buttons), layout (composes other blocks), branch (step-through flowcharts).
  • Voice mode — hands-free conversation using browser STT + server-side Piper TTS, with sentence-by-sentence playback and barge-in.
  • Push notifications — PWA install + VAPID subscriptions. Agents call notify_user to alert you while you're away.
  • File attachments — drag or pick any file type. Non-image files upload via XHR with a per-chip progress bar and cancel; agents receive the absolute disk path and can read_file it directly. Images stay on the inline-base64 path for vision-capable models.
  • Texitool integration — edit Unicode-art diagrams in-place via an embedded canvas.
  • Update banner — auto-detects when a newer version is on npm and offers a one-click update.
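Blex blocks can be pulled out of a message before rendering. This minimal sketch assumes each block is a `~~~blex:TYPE` line, a JSON body, and a closing `~~~` line. The parser (`extractBlexBlocks`) is hypothetical, and the widget's real renderer and per-type schemas are not documented here.

```typescript
interface BlexBlock {
  type: string; // e.g. "poll", "table", "file-tree"
  data: unknown; // parsed JSON payload; schema depends on the block type
}

// Matches a whole block: "~~~blex:TYPE" line, JSON body, closing "~~~" line.
const BLEX_FENCE = /^~~~blex:([\w-]+)[ \t]*\n([\s\S]*?)\n~~~[ \t]*$/gm;

function extractBlexBlocks(message: string): BlexBlock[] {
  const blocks: BlexBlock[] = [];
  for (const m of message.matchAll(BLEX_FENCE)) {
    try {
      blocks.push({ type: m[1], data: JSON.parse(m[2]) });
    } catch {
      // Malformed JSON: skipped here; a real renderer might show the raw text instead.
    }
  }
  return blocks;
}
```

The lazy body match stops at the first closing `~~~` line, so several blocks in one message are returned separately and in order.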

Channel adapters

  • Slack (channels.slack) — Socket Mode bot. Thread naming: slack-{userId}-{channelId}.
  • Discord (channels.discord) — Gateway WebSocket. Thread naming: discord-{userId}-{channelId}.
  • Inbound webhooks: POST /api/hooks/:type for email (Resend), forms, and generic events. Config-driven thread routing with HMAC signature verification.
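HMAC signature verification usually means recomputing a digest over the raw request body with a shared secret and comparing it in constant time. This minimal sketch uses Node's `node:crypto` and assumes a hex-encoded HMAC-SHA256 of the raw body. The header name, encoding, and secret that Cumulus uses per hook type are not specified here, and providers such as Resend may sign with their own scheme.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// True if `signatureHex` is the hex HMAC-SHA256 of `rawBody` under `secret`.
function verifyHmac(rawBody: string | Buffer, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const given = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check length first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Verification has to run on the raw request bytes before any JSON parsing. Re-serializing the body can change whitespace or key order and break the digest.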

Inter-agent messaging

Any thread can message another thread on the same gateway using the send_to_agent MCP tool:

send_to_agent(target="devops", message="Deploy the new build", visibility="cc")

Visibility modes:
  • cc (default) — all recipients see each other.
  • blind — each recipient thinks it's a direct message.
  • {hidden: […]} — selective (observer pattern, hidden agents invisible to visible recipients).

If the target is busy, the message is queued and delivered as a batch when that thread is idle ("while you were busy, 3 messages arrived…").
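The queue-while-busy behavior amounts to a per-thread mailbox. It buffers messages during a turn and flushes them as one batch when the thread goes idle. This is a minimal sketch with hypothetical names (`AgentMailbox`, `deliver`). The batch wording is modeled on the example above, not taken from the gateway.

```typescript
interface AgentMessage {
  from: string;
  text: string;
}

// Hypothetical per-thread mailbox; `deliver` stands in for injecting a turn into the thread.
class AgentMailbox {
  private busy = false;
  private pending: AgentMessage[] = [];

  constructor(private deliver: (text: string) => void) {}

  send(msg: AgentMessage): void {
    if (this.busy) {
      this.pending.push(msg); // target is mid-turn: hold the message
    } else {
      this.deliver(`[${msg.from}] ${msg.text}`);
    }
  }

  setBusy(busy: boolean): void {
    this.busy = busy;
    if (!busy && this.pending.length > 0) {
      // Flush everything that arrived during the turn as a single injection.
      const batch = this.pending.splice(0);
      const lines = batch.map((m) => `[${m.from}] ${m.text}`).join("\n");
      this.deliver(`While you were busy, ${batch.length} message(s) arrived:\n${lines}`);
    }
  }
}
```

Batching means the agent sees one combined turn instead of several back-to-back ones, which keeps the number of model invocations down.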

Scheduled triggers

Agents can schedule themselves:

schedule_trigger(at="2026-05-01T09:00:00Z", message="Follow up with lead")
schedule_trigger(cron="0 9 * * MON", message="Weekly check-in")
cancel_schedule(id="…")

Schedules are per-thread, persisted in {thread}.config.json, and fire as message injections into the thread.
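One-shot (`at`) schedules reduce to a due-check on each clock tick. This minimal sketch covers that part only. It assumes a hypothetical `Schedule` shape loosely modeled on the `schedule_trigger` arguments. Cron evaluation is omitted and would need a real cron parser.

```typescript
interface Schedule {
  id: string;
  at?: string; // ISO-8601 timestamp for one-shot triggers
  cron?: string; // recurring trigger; not evaluated in this sketch
  message: string;
}

// Split schedules into one-shots due at `now` and everything that should be kept.
function takeDue(schedules: Schedule[], now: Date): { due: Schedule[]; remaining: Schedule[] } {
  const due: Schedule[] = [];
  const remaining: Schedule[] = [];
  for (const s of schedules) {
    if (s.at !== undefined && Date.parse(s.at) <= now.getTime()) {
      due.push(s); // fires once, then is dropped
    } else {
      remaining.push(s); // future one-shots and all cron entries stay
    }
  }
  return { due, remaining };
}
```

On each tick, a scheduler would inject every `due` message into its thread and write `remaining` back to the thread's config file.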

Email (Resend)

With resend.apiKey configured:

send_email(to="…", subject="Hello", body="…")
list_emails(limit=10)

Rate-limited per thread (default 10/hour). All sends are logged to thread history. First email from a new thread triggers a notify_user ping.
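A per-thread cap such as 10 sends per hour can be enforced with a sliding window of timestamps. This is a minimal sketch using a hypothetical `SendLimiter`. The README does not say whether Cumulus uses a sliding or a fixed window.

```typescript
// Sliding-window limiter: at most `limit` events per `windowMs`, tracked per thread.
class SendLimiter {
  private log = new Map<string, number[]>();

  constructor(private limit = 10, private windowMs = 60 * 60 * 1000) {}

  // Records the send and returns true if allowed; returns false when over the limit.
  tryAcquire(thread: string, now = Date.now()): boolean {
    const recent = (this.log.get(thread) ?? []).filter((t) => now - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.log.set(thread, recent);
      return false;
    }
    recent.push(now);
    this.log.set(thread, recent);
    return true;
  }
}
```

A sliding window avoids the burst a fixed hourly bucket allows at the boundary, where up to twice the limit could be sent within a few minutes.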

Reliability

  • Graceful restart: cumulus-gateway reload (SIGHUP) drains active Claude/HF streams for up to 120s before restarting, so deploys don't truncate responses.
  • Auto-resume after restart — interrupted threads get a resume nudge on startup so the agent picks back up with full RAG context.
  • Persistent streaming buffer — partial responses are flushed to disk every 5s during streaming and recovered on restart.
  • Truncation continuation: a finish_reason of "length" triggers max-token escalation (8k → 16k → 32k) and seamless continuation stitching.
  • WebSocket keepalive — server-side ping/pong every 30s; clients reload history if a stream goes silent for >120s.
  • Policy-error retry — Claude CLI transient "Usage Policy" refusals auto-retry up to 3 times with a visible "Retrying…" indicator.
  • HF transient-error retry: [Error: terminated], connection resets, and similar stream/network errors retry with exponential backoff.
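Truncation continuation can be sketched as a loop that climbs the 8k, 16k, 32k `max_tokens` ladder whenever the model stops with `finish_reason: "length"`, appending each partial chunk to the answer so far. In this minimal sketch, `complete` is a hypothetical stand-in for the OpenAI-compatible chat call. How Cumulus actually prompts the model to continue is not shown in the README.

```typescript
interface Completion {
  text: string;
  finishReason: "stop" | "length";
}

// Hypothetical model call: continues from `prefix` with at most `maxTokens` new tokens.
type CompleteFn = (prefix: string, maxTokens: number) => Promise<Completion>;

const TOKEN_LADDER = [8_000, 16_000, 32_000];

async function completeWithContinuation(complete: CompleteFn): Promise<string> {
  let answer = "";
  for (const maxTokens of TOKEN_LADDER) {
    const res = await complete(answer, maxTokens);
    answer += res.text; // stitch the new chunk onto what we already have
    if (res.finishReason !== "length") return answer;
  }
  return answer; // still truncated at the largest budget; return what we have
}
```

Escalating the limit on each retry means a reply that merely ran long needs fewer round trips than continuing repeatedly at 8k.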

Self-update

cumulus-gateway check-update   # compares running version to npm
cumulus-gateway update         # bumps to latest, saves previous for rollback
cumulus-gateway rollback       # restores the previous version

The widget's top bar also shows an "Update available" indicator (with a manual ↻ check button) when a new version lands on npm.
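The update check boils down to comparing the running version with the latest version published on npm. This minimal sketch assumes plain MAJOR.MINOR.PATCH versions without prerelease tags. It reads `dist-tags.latest` from the public registry's package document, but whether Cumulus queries the registry this way is an assumption.

```typescript
// True if `latest` is newer than `current` (numeric MAJOR.MINOR.PATCH comparison).
function isNewer(current: string, latest: string): boolean {
  const a = current.split(".").map(Number);
  const b = latest.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    if ((b[i] ?? 0) !== (a[i] ?? 0)) return (b[i] ?? 0) > (a[i] ?? 0);
  }
  return false;
}

// Reads the `latest` dist-tag from the public npm registry (scoped names need "/" encoded).
async function latestVersion(pkg: string): Promise<string> {
  const res = await fetch(`https://registry.npmjs.org/${pkg.replace("/", "%2F")}`);
  if (!res.ok) throw new Error(`registry returned HTTP ${res.status}`);
  const doc = (await res.json()) as { "dist-tags": { latest: string } };
  return doc["dist-tags"].latest;
}
```

Comparing numerically matters: as strings, "0.31.10" would sort before "0.31.9". For example, `isNewer("0.31.29", await latestVersion("@luckydraw/cumulus"))` reports whether an update is available.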

Classic CLI mode

The original RLM chat loop still works. Great for quick terminal work without running the gateway.

cumulus my-project             # open or create a thread
cumulus --list                 # list threads
cumulus --delete old-project

Each turn:

  1. Append your message to ~/.cumulus/threads/my-project.jsonl.
  2. Spawn claude --print with --mcp-config pointing to the cumulus MCP server.
  3. Claude pulls whatever history it needs via search_history, peek_recent, etc.
  4. Append the response to the JSONL.
  5. Next turn starts from a fresh context.
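Steps 1 and 4 are plain JSONL appends: one JSON object per line, so each turn is recorded without rewriting the file. This minimal sketch uses `node:fs`. The record fields (`role`, `content`, `ts`) are assumptions, not the actual Cumulus schema.

```typescript
import { appendFileSync, existsSync, readFileSync } from "node:fs";

// Hypothetical record shape for one history line.
interface HistoryRecord {
  role: "user" | "assistant";
  content: string;
  ts: string; // ISO timestamp
}

// Append one record. JSON.stringify escapes newlines inside strings, so each record stays on one line.
function appendRecord(file: string, rec: HistoryRecord): void {
  appendFileSync(file, JSON.stringify(rec) + "\n", "utf8");
}

// Read all records back, skipping blank lines such as the trailing newline.
function readRecords(file: string): HistoryRecord[] {
  if (!existsSync(file)) return [];
  return readFileSync(file, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as HistoryRecord);
}
```

Because appends never touch earlier lines, a crash mid-turn loses at most the line being written, and the rest of the history stays readable.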

MCP tools

The cumulus-mcp server exposes history and content tools. Usable from any MCP-compatible client.

History:

| Tool              | Purpose                                                          |
| ----------------- | ---------------------------------------------------------------- |
| search_history    | Keyword / semantic / hybrid search over a thread                 |
| peek_recent       | Last N messages                                                  |
| read_messages     | Message range by index                                           |
| get_history_stats | Count, token estimate, time range                                |
| get_summary       | Auto-generated summaries (recent chunk, full, or specific range) |
| sub_query         | Recursive sub-LLM call over retrieved messages                   |
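For intuition, the keyword mode of `search_history` and the behavior of `peek_recent` can be approximated over a thread's in-memory messages. In this minimal sketch, simple term-overlap scoring stands in for whatever ranking Cumulus uses. The semantic and hybrid modes, which rely on embeddings, are not shown.

```typescript
interface HistoryMessage {
  index: number;
  role: string;
  text: string;
}

const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Keyword search: score each message by how many distinct query terms it contains.
function searchHistory(messages: HistoryMessage[], query: string, limit = 5): HistoryMessage[] {
  const terms = new Set(tokenize(query));
  return messages
    .map((m) => {
      const words = new Set(tokenize(m.text));
      let score = 0;
      for (const t of terms) if (words.has(t)) score++;
      return { m, score };
    })
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score || b.m.index - a.m.index) // ties: newer first
    .slice(0, limit)
    .map((x) => x.m);
}

// Last N messages, oldest first.
const peekRecent = (messages: HistoryMessage[], n: number): HistoryMessage[] =>
  n > 0 ? messages.slice(-n) : [];
```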

Content store (file reads, bash output, web fetches):

| Tool                | Purpose                                  |
| ------------------- | ---------------------------------------- |
| read_file           | Read text/PDF, chunk + embed + store     |
| store_content       | Store arbitrary text for later retrieval |
| search_content      | Search across stored content             |
| retrieve_content    | Get full content by [STORED:xxx] id      |
| read_content_chunk  | Read a specific chunk index              |
| list_stored_content | List all stored items                    |
| detect_anomalies    | Find out-of-place content in a store     |
| forget_content      | Remove a stored item                     |

Gateway-only tools (available to agents running inside the daemon):

send_to_agent, list_agents, notify_user, schedule_trigger, cancel_schedule, list_schedules, send_email, list_emails, upload_media.

RAG & context management

  • JSONL history per thread — every message, tool call, and tool result.
  • Content store — chunked file reads, embedded with local HuggingFace transformers, stored as binary Float32.
  • Segment summaries — LLM-generated per topic boundary, separately embedded for vocabulary-gap retrieval.
  • Adaptive context budget — self-tuning per thread based on TTFT. Shrinks when slow, grows when fast + near-capacity. Default 300k, floor 100k, ceiling 1M.
  • Query-type-aware retrieval — classifies queries (recall / synthesis / recent / decision) and adjusts scoring weights accordingly.
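The adaptive context budget can be pictured as a small controller that adjusts the token budget after each turn. This minimal sketch assumes hypothetical TTFT thresholds (`slowMs`, `fastMs`), a 90% "near capacity" mark, and a multiplicative step. Only the default (300k), floor (100k), and ceiling (1M) come from the list above.

```typescript
interface BudgetTuning {
  floor: number;
  ceiling: number;
  slowMs: number; // TTFT above this shrinks the budget (assumed value)
  fastMs: number; // TTFT below this may grow it (assumed value)
  step: number; // multiplicative adjustment (assumed value)
}

const DEFAULTS: BudgetTuning = { floor: 100_000, ceiling: 1_000_000, slowMs: 20_000, fastMs: 5_000, step: 1.25 };

// Next budget from the last turn's time-to-first-token and the tokens it actually used.
function nextBudget(budget: number, ttftMs: number, tokensUsed: number, t: BudgetTuning = DEFAULTS): number {
  let next = budget;
  if (ttftMs > t.slowMs) {
    next = budget / t.step; // slow: shrink
  } else if (ttftMs < t.fastMs && tokensUsed >= 0.9 * budget) {
    next = budget * t.step; // fast and nearly full: grow
  }
  return Math.round(Math.min(t.ceiling, Math.max(t.floor, next)));
}
```

Growing only when the budget is nearly used keeps a fast but short conversation from inflating its budget for no benefit. Starting from the 300k default, the budget stays between the 100k floor and the 1M ceiling.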

REST API (gateway)

| Method | Path                      | Purpose                                 |
| ------ | ------------------------- | --------------------------------------- |
| GET    | /health                   | Gateway status                          |
| POST   | /api/thread/:name/message | Send a message (SSE stream in response) |
| GET    | /api/thread/:name/history | Paginated thread history                |
| GET    | /api/thread/:name/config  | Thread config                           |
| PUT    | /api/thread/:name/config  | Update thread config (model, etc.)      |
| DELETE | /api/thread/:name         | Delete a thread                         |
| GET    | /api/threads              | List threads                            |
| GET    | /api/agents               | List threads + streaming status         |
| POST   | /api/agents/inject        | Inject message into a thread            |
| GET    | /api/models               | Available models                        |
| POST   | /api/media/upload         | Upload a file                           |
| GET    | /media/:filename          | Serve uploaded file                     |
| POST   | /api/hooks/:type          | Inbound webhook                         |
| GET    | /api/push/vapid-key       | Public VAPID key                        |
| POST   | /api/push/subscribe       | Register a push subscription            |
| GET    | /api/version              | Running version + update availability   |
| POST   | /api/admin/update         | Trigger self-update (admin key)         |

All /api/* routes require X-API-Key: <key> (from apiKeys[]).

WebSocket (/chat/ws) carries the same semantics with streaming, interjection, inject, and voice-mode audio frames.
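The X-API-Key requirement can be checked against apiKeys[] with a constant-time comparison. This is a minimal sketch using `node:crypto`, and `isAuthorized` is a hypothetical name. Hashing both sides first yields equal-length buffers, so `timingSafeEqual` never throws and the key length isn't leaked.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

const digest = (s: string): Buffer => createHash("sha256").update(s).digest();

// True if the header value matches any configured key. Every key is compared,
// using fixed-length digests, so the loop has no early-exit timing difference.
function isAuthorized(headerValue: string | undefined, apiKeys: string[]): boolean {
  if (!headerValue) return false;
  const given = digest(headerValue);
  let ok = false;
  for (const key of apiKeys) {
    if (timingSafeEqual(given, digest(key))) ok = true;
  }
  return ok;
}
```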

Development

git clone https://github.com/soapko/cumulus
cd cumulus
npm install
npm run build
npm test            # vitest
npm run lint
npm run type-check

  • Build: dist/ compiled TypeScript plus static assets (widget HTML/CSS/JS, blex bundles).
  • Tests: vitest, 700+ tests covering agentic loop, retriever, adapters, scheduler, push.
  • Deploy workflow: bump version, npm publish, then cumulus-gateway reload on the host (SIGHUP drains active streams — see docs/tasks/050-graceful-restart.md).

Background

Cumulus implements the Recursive Language Model pattern: treat conversation history as an external environment the model queries programmatically, rather than stuffing everything into context. This enables reasoning over contexts 2+ orders of magnitude beyond the model's window, with graceful cost scaling.

See docs/ for task documents, ADRs, and implementation notes.

License

MIT