@cachly-dev/openclaw

v0.1.6

Cut LLM API costs 60–90% in 3 lines of code. Semantic cache with BM25+ fuzzy matching, persistent sessions, AI memory adapter. Works with OpenAI, Anthropic, LangChain, Vercel AI SDK. No embeddings required for basic caching.

You paid $0.08 for that answer. The next 1,000 identical asks: $0.00.
Semantic LLM cache + persistent sessions + AI memory. 3 lines. No embeddings required.

Free tier · License: Apache-2.0


Before / After

// ❌ Before: Every user message calls OpenAI. Every time. No exceptions.
const reply = await openai.chat.completions.create({ model: 'gpt-4o', messages })

// ✅ After: Same questions = zero API calls = zero cost
const cache = createSemanticLLMCache({ url: process.env.CACHLY_URL })
const reply = await cache.getOrSet(userMessage, () =>
  openai.chat.completions.create({ model: 'gpt-4o', messages })
)

"How do I reset my password?" → "How can I reset my pw?" → cache hit. $0.00.


Setup — 60 seconds

npm install @cachly-dev/openclaw
# Get a free Redis instance at cachly.dev (no credit card):
CACHLY_URL=redis://:<password>@<host>:6379

⚡ Semantic LLM Cache — 3 lines

Every time a user asks the same question in different words, you pay OpenAI again. This stops that.

import { createSemanticLLMCache } from '@cachly-dev/openclaw'

const cache = createSemanticLLMCache({ url: process.env.CACHLY_URL! })

// Wrap any LLM call — that's it
const answer = await cache.getOrSet(
  userPrompt,
  () => openai.chat.completions.create({ model: 'gpt-4o', messages: [...] })
)
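
For intuition, the wrapper follows the classic get-or-set pattern. The sketch below is a minimal in-memory version with exact matching and a TTL. It is not the package's implementation: `ExactCache`, its `normalize` step, and the injectable clock are hypothetical names chosen for illustration, and the real cache stores entries in Redis rather than a `Map`.

```typescript
// Hypothetical sketch of get-or-set caching; not the @cachly-dev/openclaw implementation.
type Entry<T> = { value: T; expiresAt: number }

class ExactCache<T> {
  private store = new Map<string, Entry<T>>()
  private ttlMs: number
  private now: () => number

  // ttlSeconds mirrors the `ttl` option used in this README; `now` is injectable for testing.
  constructor(ttlSeconds: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000
    this.now = now
  }

  // Assumed normalisation: trim, lowercase, collapse runs of whitespace.
  private normalize(prompt: string): string {
    return prompt.trim().toLowerCase().replace(/\s+/g, ' ')
  }

  async getOrSet(prompt: string, compute: () => Promise<T>): Promise<T> {
    const key = this.normalize(prompt)
    const hit = this.store.get(key)
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value // cache hit: compute() is never called
    }
    const value = await compute() // cache miss: pay for the call once
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs })
    return value
  }
}
```

A repeated prompt that differs only in case or spacing resolves from the `Map`. Anything looser than that needs fuzzy or semantic matching.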

Without an embed function and without vectorUrl: exact-match caching and local BM25+ fuzzy search work immediately (20–50% savings), with no extra API calls. "how do I reset password?" matches "password reset help", entirely in-process.

With vectorUrl but still no embed function: BM25 plus a hosted pgvector index, for higher hit rates across large caches.
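
To show why two differently worded prompts can match without embeddings, here is a small sketch of plain Okapi BM25 scoring over an in-memory list of cached prompts. It is an illustration only: it implements standard BM25 (not necessarily the BM25+ variant or the fuzzy matching the package uses), and `tokenize`, `bm25Scores`, and the default `k1`/`b` values are assumptions, not part of the package API.

```typescript
// Plain Okapi BM25 over a tiny in-memory corpus. Illustrative only.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0)
}

function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenize)
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / Math.max(docs.length, 1)
  const queryTerms = Array.from(new Set(tokenize(query)))

  return tokenized.map((doc) => {
    let score = 0
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length // term frequency in this doc
      if (tf === 0) continue
      const df = tokenized.filter((d) => d.includes(term)).length // docs containing the term
      // Smoothed IDF that stays positive even for very common terms.
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5))
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgLen))
    }
    return score
  })
}

const cachedPrompts = ['password reset help', 'billing invoice download', 'shipping times']
const scores = bm25Scores('how do I reset password?', cachedPrompts)
const best = scores.indexOf(Math.max(...scores))
console.log(cachedPrompts[best]) // "password reset help"
```

The query shares the terms "reset" and "password" with the first cached prompt, so only that prompt gets a non-zero score. A real cache would also need a minimum-score cutoff before treating the best match as a hit.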

Add semantic matching for 60–90% savings (10 more lines):

const cache = createSemanticLLMCache({
  url:       process.env.CACHLY_URL!,
  vectorUrl: process.env.CACHLY_VECTOR_URL,   // from cachly.dev dashboard
  embedFn:   (text) =>
    openai.embeddings.create({ model: 'text-embedding-3-small', input: text })
      .then(r => r.data[0].embedding),
  threshold: 0.92,   // cosine similarity (default)
  ttl:       3600,   // seconds
})
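
The `threshold` option is a cosine-similarity cutoff between the embedding of the incoming prompt and the embedding of a cached prompt. As a rough sketch of that comparison (the function names `cosineSimilarity` and `isSemanticHit` are hypothetical, and the package may compute this differently, for example inside pgvector):

```typescript
// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have the same length')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0 // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Treat a cached entry as a hit when similarity reaches the threshold (0.92 as in the config above).
function isSemanticHit(queryVec: number[], cachedVec: number[], threshold = 0.92): boolean {
  return cosineSimilarity(queryVec, cachedVec) >= threshold
}
```

Raising the threshold trades hit rate for precision: fewer paraphrases match, but unrelated prompts are less likely to share an answer.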

"How do I reset my password?" → "How can I reset my pw?" → same cache hit. 💰


📊 What you save

| Questions/day | Cache hit rate | Monthly saving (GPT-4o) |
|---------------|----------------|-------------------------|
| 100 | 40% exact | ~$8 |
| 100 | 70% semantic | ~$22 |
| 1 000 | 70% semantic | ~$220 |
| 10 000 | 70% semantic | ~$2 200 |
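
To adapt these numbers to your own traffic, the arithmetic is questions per day × days per month × hit rate × cost per uncached call. The sketch below assumes an average of $0.0105 per call, a figure back-computed from the 70% rows of the table rather than stated anywhere in this README; `estimateMonthlySaving` is a hypothetical helper, not part of the package.

```typescript
// Back-of-the-envelope estimate; costPerCallUsd is an assumed average, not a published price.
function estimateMonthlySaving(
  questionsPerDay: number,
  hitRate: number,        // fraction of questions served from cache, 0..1
  costPerCallUsd: number, // assumed average cost of one uncached LLM call
  daysPerMonth = 30,
): number {
  if (hitRate < 0 || hitRate > 1) throw new RangeError('hitRate must be between 0 and 1')
  const cachedCallsPerMonth = questionsPerDay * daysPerMonth * hitRate
  return cachedCallsPerMonth * costPerCallUsd
}

// 100 questions/day at a 70% hit rate and an assumed $0.0105 per call
console.log(estimateMonthlySaving(100, 0.7, 0.0105).toFixed(2)) // "22.05"
```

Plug in your own per-call cost (prompt and completion tokens at your model's pricing) for a more realistic figure.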

After 10 cache hits, the console logs:

🎯 cachly: 12,340 tokens saved this session (10 hits)
   Full stats → cachly.dev/dashboard

(Cost breakdown available in the dashboard.)


🗄️ Session Store

Persist conversation history in Redis — no cold starts, no lost context:

import { createCachlySessionStore } from '@cachly-dev/openclaw'

const sessions = createCachlySessionStore({
  url: process.env.CACHLY_URL!,
  ttl: 604800,  // 7 days
})

// Fall back to an empty history in case nothing is stored yet for this user
const history = (await sessions.get(userId)) ?? []
await sessions.set(userId, [...history, { role: 'user', content: message }])

Works with any LLM framework — OpenAI, LangChain, Vercel AI SDK, etc.
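
The `ttl` values in this README are plain seconds (`604800` for 7 days here, `7776000` for 90 days in the Memory Adapter example). If you prefer to write them in days, a tiny helper keeps the arithmetic visible. `daysToTtlSeconds` is a hypothetical name, not something the package exports.

```typescript
const SECONDS_PER_DAY = 24 * 60 * 60

// Convert a retention period in days to the seconds value expected by `ttl`.
function daysToTtlSeconds(days: number): number {
  if (!Number.isFinite(days) || days <= 0) throw new RangeError('days must be a positive number')
  return Math.round(days * SECONDS_PER_DAY)
}

const sessionTtl = daysToTtlSeconds(7)  // 604800
const memoryTtl = daysToTtlSeconds(90)  // 7776000
```

You could then pass `ttl: sessionTtl` to `createCachlySessionStore` instead of the literal.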


🧠 Memory Adapter

Long-term semantic memory — store facts, recall by meaning:

import { createCachlyMemoryAdapter } from '@cachly-dev/openclaw'

const memory = createCachlyMemoryAdapter({
  url:       process.env.CACHLY_URL!,
  vectorUrl: process.env.CACHLY_VECTOR_URL!,
  embedFn:   myEmbedFn,
  ttl:       7776000,  // 90 days
})

await memory.store({ id: 'pref-1', text: 'User prefers TypeScript over Python' })
const results = await memory.search('programming language preference', { topK: 5 })
// → [{ text: 'User prefers TypeScript over Python', score: 0.97 }]
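
`memory.search` returns the best matches for a query, limited by `topK`. How the adapter ranks internally is not documented here. The sketch below only illustrates the general idea of keeping the K highest-scoring items; `Scored`, `topK`, and the optional `minScore` cutoff are hypothetical names, and the scores are made-up illustration values.

```typescript
type Scored = { text: string; score: number }

// Return the k highest-scoring items, best first, optionally dropping weak matches.
function topK(items: Scored[], k: number, minScore = 0): Scored[] {
  return items
    .filter((item) => item.score >= minScore)
    .sort((a, b) => b.score - a.score) // filter() returned a copy, so the input is not mutated
    .slice(0, Math.max(0, k))
}

// Made-up similarity scores for three stored memories
const candidates: Scored[] = [
  { text: 'User prefers dark mode', score: 0.41 },
  { text: 'User prefers TypeScript over Python', score: 0.97 },
  { text: 'User deploys on Fridays', score: 0.12 },
]
console.log(topK(candidates, 2).map((c) => c.text))
// [ 'User prefers TypeScript over Python', 'User prefers dark mode' ]
```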

🔍 Brain Search (BM25+)

Full-text search over cached data — no embeddings needed:

import { brainSearch } from '@cachly-dev/openclaw'

const results = await brainSearch(process.env.CACHLY_VECTOR_URL!, 'deploy authentication error')
// → [{ key: 'lesson:fix:auth', score: 4.2, preview: '...' }]

🧩 Standalone — works with any LLM stack

No OpenClaw needed. Drop into LangChain, Vercel AI SDK, plain fetch, or any custom pipeline:

LangChain

import { createSemanticLLMCache } from '@cachly-dev/openclaw'
import { ChatOpenAI } from '@langchain/openai'

const cache = createSemanticLLMCache({ url: process.env.CACHLY_URL! })
const llm = new ChatOpenAI()

async function cachedInvoke(prompt: string) {
  return cache.getOrSet(
    prompt,
    () => llm.invoke(prompt).then(r => ({ content: r.content as string, model: 'gpt-4o' }))
  )
}

Vercel AI SDK

import { createSemanticLLMCache } from '@cachly-dev/openclaw'
import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'

const cache = createSemanticLLMCache({ url: process.env.CACHLY_URL! })

export async function POST(req: Request) {
  const { prompt } = await req.json()
  const result = await cache.getOrSet(
    prompt,
    () => generateText({ model: openai('gpt-4o'), prompt }).then(r => ({ content: r.text, model: 'gpt-4o' }))
  )
  return Response.json(result)
}

Plain fetch / any provider

import { createSemanticLLMCache } from '@cachly-dev/openclaw'

const cache = createSemanticLLMCache({ url: process.env.CACHLY_URL! })

const answer = await cache.getOrSet(
  userMessage,
  async () => {
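    const res = await fetch('https://api.anthropic.com/v1/messages', {
      method: 'POST',
      headers: {
        'x-api-key': process.env.ANTHROPIC_KEY!,
        'anthropic-version': '2023-06-01', // required by the Anthropic Messages API
        'content-type': 'application/json',
      },
      body: JSON.stringify({ model: 'claude-opus-4-5', max_tokens: 1024, messages: [{ role: 'user', content: userMessage }] }),
    })
    // Fail with a clear error instead of a TypeError on json.content below
    if (!res.ok) throw new Error(`Anthropic API error: ${res.status}`)
    const json = await res.json()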
    return { content: json.content[0].text, model: 'claude-opus-4-5', inputTokens: json.usage.input_tokens, outputTokens: json.usage.output_tokens }
  }
)

🦾 OpenClaw adapter (bonus)

If you use OpenClaw (22-channel AI assistant), one function wires everything:

import { createCachlyOpenClawConfig } from '@cachly-dev/openclaw'
import OpenAI from 'openai'

const openai = new OpenAI()

const cachlyConfig = await createCachlyOpenClawConfig({
  url:       process.env.CACHLY_URL!,
  vectorUrl: process.env.CACHLY_VECTOR_URL,
  embedFn:   (t) =>
    openai.embeddings.create({ model: 'text-embedding-3-small', input: t })
      .then(r => r.data[0].embedding),
})

const app = new OpenClawApp({
  ...cachlyConfig,
  // ... rest of your config
})

This gives you: semantic cache + persistent sessions + Redis memory — all across WhatsApp, Telegram, Slack, Discord at once.


👥 Team Brain

One shared cachly instance → every team member gets smarter from each other's work:

// Alice fixes a deploy issue:
await brain.learnFromAttempts({ topic: 'deploy:k8s', outcome: 'success', whatWorked: '...' })

// Bob starts a session the next day:
await brain.sessionStart()
// → "💡 alice solved deploy:k8s 1d ago: ..."

Team plans from €99/mo (10 seats) at cachly.dev/teams.


🚀 Upgrade path

| Level | What you get | Setup |
|-------|--------------|-------|
| Free: Exact + BM25 | 20–50% reduction, in-process BM25+ fuzzy, zero config | CACHLY_URL only |
| Free: Semantic cache | 60–90% cost reduction | + embedFn |
| Speed tier | Hosted pgvector, higher hit rates at scale | Speed plan at cachly.dev |
| Team Brain | Shared knowledge, team lessons, analytics | cachly.dev/teams |


🤖 Use with Python AI Agents

OpenClaw has a TypeScript SDK, but Python AI agents can use Cachly's REST API directly for persistent memory and semantic caching. Here are patterns for the most popular frameworks.

LangChain — Persistent Agent Memory

import os, requests
from langchain.schema import BaseMemory

CACHLY_URL = os.environ["CACHLY_URL"]
CACHLY_JWT = os.environ["CACHLY_JWT"]
INSTANCE_ID = os.environ["CACHLY_BRAIN_INSTANCE_ID"]
HEADERS = {"Authorization": f"Bearer {CACHLY_JWT}", "Content-Type": "application/json"}

class CachlyBrainMemory(BaseMemory):
    """LangChain memory backed by Cachly Brain — survives restarts."""

    @property
    def memory_variables(self):
        return ["brain_context"]

    def load_memory_variables(self, inputs: dict) -> dict:
        query = inputs.get("input", "")
        r = requests.post(
            f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/brain/smart-recall",
            headers=HEADERS,
            json={"query": query, "topK": 3},
            timeout=10,  # don't let an unreachable API hang the agent
        )
        lessons = r.json().get("results", [])
        context = "\n".join(f"- {l['whatWorked']}" for l in lessons if l.get("whatWorked"))
        return {"brain_context": context or "No relevant memory found."}

    def save_context(self, inputs: dict, outputs: dict) -> None:
        # Learn from what the agent discovered
        output = outputs.get("output", "")
        if output:
            requests.post(
                f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/brain/learn",
                headers=HEADERS,
                json={"topic": "agent:langchain", "outcome": "success", "whatWorked": output[:500]},
                timeout=10,
            )

    def clear(self):
        pass  # Brain is persistent — clear not supported by design


# Usage:
from langchain.agents import initialize_agent, AgentType
from langchain.chat_models import ChatOpenAI

memory = CachlyBrainMemory()
agent = initialize_agent(
    tools=[...],
    llm=ChatOpenAI(model="gpt-4o"),
    agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION,
    memory=memory,
    system_message="You have access to a persistent Brain. brain_context = {brain_context}",
)

CrewAI — Shared Team Brain Tool

import os, requests
from crewai_tools import BaseTool
from pydantic import BaseModel, Field

CACHLY_URL = os.environ["CACHLY_URL"]
CACHLY_JWT = os.environ["CACHLY_JWT"]
INSTANCE_ID = os.environ["CACHLY_BRAIN_INSTANCE_ID"]
HEADERS = {"Authorization": f"Bearer {CACHLY_JWT}"}

class RecallInput(BaseModel):
    query: str = Field(description="What to search for in the Brain")

class CachlyRecallTool(BaseTool):
    name: str = "cachly_brain_recall"
    description: str = "Search persistent memory for lessons, solutions, and context from past work"
    args_schema: type[BaseModel] = RecallInput

    def _run(self, query: str) -> str:
        r = requests.post(
            f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/brain/smart-recall",
            headers={**HEADERS, "Content-Type": "application/json"},
            json={"query": query, "topK": 5},
            timeout=10,  # don't let an unreachable API hang the tool call
        )
        results = r.json().get("results", [])
        if not results:
            return "No relevant memory found."
        return "\n".join(f"[{l['topic']}] {l['whatWorked']}" for l in results)

class LearnInput(BaseModel):
    topic: str = Field(description="Category slug like 'deploy:api' or 'fix:auth'")
    what_worked: str = Field(description="What solution worked")
    outcome: str = Field(default="success", description="success | failure | partial")

class CachlyLearnTool(BaseTool):
    name: str = "cachly_brain_learn"
    description: str = "Store a lesson in persistent memory so future agents can benefit from it"
    args_schema: type[BaseModel] = LearnInput

    def _run(self, topic: str, what_worked: str, outcome: str = "success") -> str:
        requests.post(
            f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/brain/learn",
            headers={**HEADERS, "Content-Type": "application/json"},
            json={"topic": topic, "outcome": outcome, "whatWorked": what_worked},
            timeout=10,
        )
        return f"✅ Stored lesson: {topic}"


# Usage with CrewAI:
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Research Analyst",
    goal="Research topics and store findings for the team",
    tools=[CachlyRecallTool(), CachlyLearnTool()],
    backstory="You have a persistent memory that survives across sessions.",
)

AutoGen / Microsoft AutoGen

import os, requests
from autogen import AssistantAgent, UserProxyAgent

CACHLY_URL = os.environ["CACHLY_URL"]
CACHLY_JWT = os.environ["CACHLY_JWT"]
INSTANCE_ID = os.environ["CACHLY_BRAIN_INSTANCE_ID"]
HEADERS = {"Authorization": f"Bearer {CACHLY_JWT}", "Content-Type": "application/json"}

def recall_brain(query: str) -> str:
    """Search Cachly Brain for relevant memory."""
    r = requests.post(
        f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/brain/smart-recall",
        headers=HEADERS, json={"query": query, "topK": 5}, timeout=10,
    )
    results = r.json().get("results", [])
    return "\n".join(f"• [{l['topic']}] {l['whatWorked']}" for l in results) or "No memory found."

def store_lesson(topic: str, what_worked: str, outcome: str = "success") -> str:
    """Store a lesson in Cachly Brain."""
    requests.post(
        f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/brain/learn",
        headers=HEADERS, json={"topic": topic, "outcome": outcome, "whatWorked": what_worked}, timeout=10,
    )
    return f"Stored: {topic}"

assistant = AssistantAgent(
    name="CachlyAssistant",
    system_message="""You are a helpful AI with persistent memory via Cachly Brain.
ALWAYS start by calling recall_brain() with the user's query.
ALWAYS end by calling store_lesson() with what you discovered.""",
    llm_config={
        "functions": [
            {"name": "recall_brain", "description": "Search persistent memory", "parameters": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}},
            {"name": "store_lesson", "description": "Store a lesson", "parameters": {"type": "object", "properties": {"topic": {"type": "string"}, "what_worked": {"type": "string"}, "outcome": {"type": "string"}}, "required": ["topic", "what_worked"]}},
        ],
    },
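)

# Assumption (AutoGen 0.2-style function calling): the assistant only proposes
# calls; a proxy agent must map the function names to the Python callables.
user_proxy = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
    code_execution_config=False,
    function_map={"recall_brain": recall_brain, "store_lesson": store_lesson},
)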

LlamaIndex — QueryEngine with Cachly Memory

import os, requests
from llama_index.core.memory import BaseMemory
from llama_index.core.schema import TextNode

CACHLY_URL = os.environ["CACHLY_URL"]
CACHLY_JWT = os.environ["CACHLY_JWT"]
INSTANCE_ID = os.environ["CACHLY_BRAIN_INSTANCE_ID"]
HEADERS = {"Authorization": f"Bearer {CACHLY_JWT}", "Content-Type": "application/json"}

class CachlyMemory(BaseMemory):
    """LlamaIndex memory backed by Cachly Brain."""

    def get(self, input: str, **kwargs) -> list[TextNode]:
        r = requests.post(
            f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/brain/smart-recall",
            headers=HEADERS, json={"query": input, "topK": 5}, timeout=10,
        )
        return [
            TextNode(text=f"[{l['topic']}] {l.get('whatWorked', '')}")
            for l in r.json().get("results", [])
        ]

    def put(self, messages) -> None:
        for msg in messages:
            if hasattr(msg, "content") and msg.content:
                requests.post(
                    f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/brain/learn",
                    headers=HEADERS,
                    json={"topic": "agent:llamaindex", "outcome": "success", "whatWorked": str(msg.content)[:500]},
                    timeout=10,
                )

    def reset(self) -> None:
        pass  # Persistent by design


# Usage:
from llama_index.core.chat_engine import CondensePlusContextChatEngine

chat_engine = CondensePlusContextChatEngine.from_defaults(
    index.as_retriever(),
    memory=CachlyMemory(),
    verbose=True,
)
response = chat_engine.chat("How did we fix the last deployment issue?")

Semantic Cache for LLM API Calls (Python)

Skip expensive LLM calls for semantically similar prompts — no embeddings needed on your side:

import os, hashlib, requests

CACHLY_URL = os.environ["CACHLY_URL"]
CACHLY_JWT = os.environ["CACHLY_JWT"]
INSTANCE_ID = os.environ["CACHLY_BRAIN_INSTANCE_ID"]
HEADERS = {"Authorization": f"Bearer {CACHLY_JWT}", "Content-Type": "application/json"}

def cached_llm_call(prompt: str, llm_fn, namespace: str = "cachly:sem:qa") -> str:
    """Call LLM with semantic caching via Cachly."""
    # 1. Check semantic cache
    r = requests.post(
        f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/cache/semantic-search",
        headers=HEADERS,
        json={"query": prompt, "namespace": namespace, "threshold": 0.85},
        timeout=10,  # a slow cache lookup should not block the LLM call indefinitely
    )
    hit = r.json().get("hit")
    if hit:
        return hit["value"]  # Cache hit — no LLM call needed 🎉

    # 2. Cache miss — call LLM
    response = llm_fn(prompt)

    # 3. Store in semantic cache
    key = hashlib.sha256(prompt.encode()).hexdigest()[:16]
    requests.post(
        f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/cache/semantic",
        headers=HEADERS,
        json={"key": key, "value": response, "namespace": namespace, "prompt": prompt},
        timeout=10,
    )
    return response


# Usage with any LLM:
import openai
client = openai.OpenAI()

def call_gpt(prompt: str) -> str:
    return client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content

answer = cached_llm_call("What is the capital of France?", call_gpt)

Environment Setup for Python Agents

pip install requests python-dotenv

# .env
CACHLY_URL=https://api.cachly.dev
CACHLY_JWT=cky_live_...          # from cachly.dev → Dashboard → API Keys
CACHLY_BRAIN_INSTANCE_ID=...     # from cachly.dev → Dashboard → Brain

Get your free instance at cachly.dev/setup-ai — no credit card required.


Apache-2.0 © cachly.dev