@cachly-dev/openclaw

v0.1.5 · Published 8 days ago

Cut LLM costs 60–90% in 3 lines. Semantic cache, persistent sessions, AI memory — standalone or OpenClaw adapter.

Downloads: 738
Vulnerabilities: 0 high, 0 medium, 0 low
Maintainer: heinrichneb
Keywords: openclaw, cachly, cache, redis, llm, semantic-cache, session


Cut LLM API costs by 60–90% in 3 lines of code.
Semantic cache, persistent sessions, and AI memory — works standalone or as a drop-in OpenClaw adapter.

Zero-friction start

npm install @cachly-dev/openclaw

Set one env var → done:

CACHLY_URL=redis://:<password>@<your-instance-host>:6379

Get a free instance at cachly.dev — no credit card, no infra.

⚡ Semantic LLM Cache — 3 lines

Every time a user asks the same question in different words, you pay OpenAI again. This stops that.

import { createSemanticLLMCache } from '@cachly-dev/openclaw'

const cache = createSemanticLLMCache({ url: process.env.CACHLY_URL! })

// Wrap any LLM call — that's it
const answer = await cache.getOrSet(
  userPrompt,
  () => openai.chat.completions.create({ model: 'gpt-4o', messages: [...] })
)
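To make the wrap-the-call pattern concrete, here is a minimal in-memory sketch of what a `getOrSet` helper does on a hit and on a miss. It is not the package's implementation: `ExactMatchCache` and `normalizeKey` are hypothetical names, the store is a plain `Map` rather than Redis, and only exact matches on a normalised prompt are handled.

```typescript
// Illustrative stand-in only; NOT the @cachly-dev/openclaw implementation.
type Entry<T> = { value: T; expiresAt: number }

// Normalise prompts so trivial differences (case, extra whitespace) share a key.
function normalizeKey(prompt: string): string {
  return prompt.trim().toLowerCase().replace(/\s+/g, ' ')
}

class ExactMatchCache<T> {
  private entries = new Map<string, Entry<T>>()
  private ttlSeconds: number

  constructor(ttlSeconds: number) {
    this.ttlSeconds = ttlSeconds
  }

  // Return a fresh cached value if there is one; otherwise run `compute` and store its result.
  async getOrSet(prompt: string, compute: () => Promise<T>): Promise<T> {
    const key = normalizeKey(prompt)
    const hit = this.entries.get(key)
    if (hit !== undefined && hit.expiresAt > Date.now()) return hit.value

    const value = await compute() // cache miss: this is the call you pay for
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlSeconds * 1000 })
    return value
  }
}
```

With this sketch, two prompts that differ only in case or spacing trigger a single upstream call; anything beyond that needs the fuzzy or semantic matching described in this section.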

Without embedFn and without vectorUrl: exact-match caching plus local BM25+ fuzzy search work immediately (20–50% savings) and make no API calls. For example, "how do I reset password?" matches "password reset help", entirely in-process.

With vectorUrl but still no embedFn: BM25 backed by a hosted pgvector index, for higher hit rates across large caches.
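For intuition on why "how do I reset password?" can match "password reset help" without embeddings, here is a small BM25+ ranking sketch. The tokeniser and the `K1`, `B`, and `DELTA` values are common textbook choices assumed for illustration; the package's actual settings are not documented here, and `bm25PlusScores` is a hypothetical helper.

```typescript
// Illustrative BM25+ ranking; parameters and tokeniser are assumptions,
// not the settings used by @cachly-dev/openclaw.
const K1 = 1.2    // term-frequency saturation
const B = 0.75    // document-length normalisation
const DELTA = 1   // BM25+ lower bound added for every matching term

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0)
}

function bm25PlusScores(query: string, docs: string[]): number[] {
  const tokenized = docs.map(tokenize)
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / Math.max(docs.length, 1)

  // Document frequency: how many documents contain each term.
  const df = new Map<string, number>()
  tokenized.forEach((doc) => {
    Array.from(new Set(doc)).forEach((term) => df.set(term, (df.get(term) ?? 0) + 1))
  })

  const queryTerms = Array.from(new Set(tokenize(query)))
  return tokenized.map((doc) => {
    let score = 0
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length
      if (tf === 0) continue // only terms present in the document contribute
      const idf = Math.log((docs.length + 1) / (df.get(term) ?? 1))
      const norm = K1 * (1 - B + B * (doc.length / avgLen))
      score += idf * (((K1 + 1) * tf) / (norm + tf) + DELTA)
    }
    return score
  })
}

const cachedPrompts = ['password reset help', 'how to deploy to kubernetes', 'billing invoice download']
const scores = bm25PlusScores('how do I reset password?', cachedPrompts)
const best = scores.indexOf(Math.max(...scores))
// best is 0: 'password reset help' shares both "reset" and "password" with the query
```

A real cache would also need a minimum score before treating the top result as a hit; that cut-off is a tuning decision this sketch leaves out.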

Add semantic matching for 60–90% savings (10 more lines):

const cache = createSemanticLLMCache({
  url:       process.env.CACHLY_URL!,
  vectorUrl: process.env.CACHLY_VECTOR_URL,   // from cachly.dev dashboard
  embedFn:   (text) =>
    openai.embeddings.create({ model: 'text-embedding-3-small', input: text })
      .then(r => r.data[0].embedding),
  threshold: 0.92,   // cosine similarity (default)
  ttl:       3600,   // seconds
})
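The `threshold` option is a cosine-similarity cut-off, so a short sketch of that comparison may help when tuning it. The three-dimensional vectors below are toy values standing in for real embeddings, and `isSemanticHit` is a hypothetical helper, not part of the package.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have the same length')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0
    const y = b[i] ?? 0
    dot += x * y
    normA += x * x
    normB += y * y
  }
  if (normA === 0 || normB === 0) return 0 // a zero vector is treated as "no similarity"
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// A cached entry counts as a hit when similarity reaches the threshold (0.92 is the stated default).
function isSemanticHit(query: number[], cached: number[], threshold = 0.92): boolean {
  return cosineSimilarity(query, cached) >= threshold
}

// Toy embeddings (assumed values, for illustration only).
const resetPassword = [0.9, 0.1, 0.0]
const resetPw = [0.85, 0.2, 0.05]
const billingQuestion = [0.1, 0.2, 0.95]

const paraphraseHit = isSemanticHit(resetPassword, resetPw)        // true: vectors point the same way
const unrelatedHit = isSemanticHit(resetPassword, billingQuestion) // false: vectors are nearly orthogonal
```

Raising the threshold trades hit rate for safety: fewer paraphrases match, but unrelated questions are less likely to receive a cached answer.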

"How do I reset my password?" → "How can I reset my pw?" → same cache hit. 💰

📊 What you save

| Questions/day | Cache hit rate | Monthly saving (GPT-4o) |
|---------------|----------------|-------------------------|
| 100           | 40% exact      | ~$8                     |
| 100           | 70% semantic   | ~$22                    |
| 1 000         | 70% semantic   | ~$220                   |
| 10 000        | 70% semantic   | ~$2 200                 |
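To estimate savings for your own traffic, the arithmetic is simply avoided calls times cost per call. The sketch below assumes a 30-day month and takes the per-call cost as an input; the table does not state its pricing assumptions, so the $0.0105 figure used here is only a value back-derived from the 70% rows, not a published price.

```typescript
// Estimated monthly saving = questions/day × days × hit rate × cost per avoided call.
// costPerCallUsd is an assumed input; measure your own average from real usage.
function monthlySavingUsd(
  questionsPerDay: number,
  hitRate: number,
  costPerCallUsd: number,
  daysPerMonth = 30,
): number {
  if (hitRate < 0 || hitRate > 1) throw new RangeError('hitRate must be between 0 and 1')
  const avoidedCalls = questionsPerDay * daysPerMonth * hitRate
  return avoidedCalls * costPerCallUsd
}

// 100 questions/day at a 70% hit rate avoids 2 100 calls a month;
// at an assumed $0.0105 per call that is about $22.
const smallSiteSaving = monthlySavingUsd(100, 0.7, 0.0105)
```

Because the result scales linearly with volume, ten times the traffic at the same hit rate gives ten times the saving, which matches the progression in the table.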

After 10 cache hits, the console logs:

🎯 cachly: 12,340 tokens saved this session (10 hits)
   Full stats → cachly.dev/dashboard

(Cost breakdown available in the dashboard.)

🗄️ Session Store

Persist conversation history in Redis — no cold starts, no lost context:

import { createCachlySessionStore } from '@cachly-dev/openclaw'

const sessions = createCachlySessionStore({
  url: process.env.CACHLY_URL!,
  ttl: 604800,  // 7 days
})

// Fall back to an empty history in case nothing is stored yet for this user
const history = (await sessions.get(userId)) ?? []
await sessions.set(userId, [...history, { role: 'user', content: message }])

Works with any LLM framework — OpenAI, LangChain, Vercel AI SDK, etc.
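To illustrate what a TTL-bound session means in practice, here is an in-memory stand-in with the same get/set shape. It is a sketch under assumptions: `InMemorySessionStore` is a hypothetical class, the real store lives in Redis, and the choice that every `set` restarts the expiry window is assumed rather than documented.

```typescript
type Message = { role: 'user' | 'assistant' | 'system'; content: string }

const SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60 // 604800, the ttl used in the example above

// Illustrative stand-in only; NOT the @cachly-dev/openclaw session store.
class InMemorySessionStore {
  private sessions = new Map<string, { history: Message[]; expiresAt: number }>()
  private ttlSeconds: number
  private now: () => number

  // `now` is injectable so expiry can be exercised without waiting a week.
  constructor(ttlSeconds: number, now: () => number = () => Date.now()) {
    this.ttlSeconds = ttlSeconds
    this.now = now
  }

  // Returns the stored history, or an empty array if missing or expired.
  get(userId: string): Message[] {
    const entry = this.sessions.get(userId)
    if (entry === undefined || entry.expiresAt <= this.now()) {
      this.sessions.delete(userId)
      return []
    }
    return entry.history
  }

  // Assumption: each write restarts the expiry window.
  set(userId: string, history: Message[]): void {
    this.sessions.set(userId, { history, expiresAt: this.now() + this.ttlSeconds * 1000 })
  }
}
```

Under that assumption an active conversation never expires mid-session, while an idle one disappears seven days after its last message.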

🧠 Memory Adapter

Long-term semantic memory — store facts, recall by meaning:

import { createCachlyMemoryAdapter } from '@cachly-dev/openclaw'

const memory = createCachlyMemoryAdapter({
  url:       process.env.CACHLY_URL!,
  vectorUrl: process.env.CACHLY_VECTOR_URL!,
  embedFn:   myEmbedFn,  // your embedding function, e.g. the embedFn from the semantic cache example
  ttl:       7776000,  // 90 days
})

await memory.store({ id: 'pref-1', text: 'User prefers TypeScript over Python' })
const results = await memory.search('programming language preference', { topK: 5 })
// → [{ text: 'User prefers TypeScript over Python', score: 0.97 }]
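Recall by meaning boils down to ranking stored vectors against the query vector and keeping the best `topK`. The sketch below assumes unit-length embeddings, where the dot product equals cosine similarity; `MemoryRecord`, `searchTopK`, and the two-dimensional vectors are hypothetical and chosen only to keep the example readable.

```typescript
type MemoryRecord = { id: string; text: string; embedding: number[] }
type SearchResult = { id: string; text: string; score: number }

// Dot product; equals cosine similarity when both vectors have length 1.
function dotProduct(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0)
}

// Score every record against the query, sort descending, keep the top K.
function searchTopK(query: number[], records: MemoryRecord[], topK: number): SearchResult[] {
  return records
    .map((r) => ({ id: r.id, text: r.text, score: dotProduct(query, r.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
}

// Toy unit vectors (assumed values, for illustration only).
const records: MemoryRecord[] = [
  { id: 'pref-1', text: 'User prefers TypeScript over Python', embedding: [1, 0] },
  { id: 'pref-2', text: 'User lives in Berlin', embedding: [0, 1] },
  { id: 'pref-3', text: 'User writes mostly backend code', embedding: [0.6, 0.8] },
]

const topTwo = searchTopK([0.8, 0.6], records, 2)
// topTwo holds 'pref-3' first, then 'pref-1'; 'pref-2' is cut off by topK
```

A linear scan like this is fine for a handful of records; a vector index such as the hosted pgvector mentioned in this README exists to avoid scanning everything at scale.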

🔍 Brain Search (BM25+)

Full-text search over cached data — no embeddings needed:

import { brainSearch } from '@cachly-dev/openclaw'

const results = await brainSearch(process.env.CACHLY_VECTOR_URL!, 'deploy authentication error')
// → [{ key: 'lesson:fix:auth', score: 4.2, preview: '...' }]

🧩 Standalone — works with any LLM stack

No OpenClaw needed. Drop into LangChain, Vercel AI SDK, plain fetch, or any custom pipeline:

LangChain

import { createSemanticLLMCache } from '@cachly-dev/openclaw'
import { ChatOpenAI } from '@langchain/openai'

const cache = createSemanticLLMCache({ url: process.env.CACHLY_URL! })
const llm = new ChatOpenAI({ model: 'gpt-4o' })  // set the model explicitly so it matches the name recorded below

async function cachedInvoke(prompt: string) {
  return cache.getOrSet(
    prompt,
    () => llm.invoke(prompt).then(r => ({ content: r.content as string, model: 'gpt-4o' }))
  )
}

Vercel AI SDK

import { createSemanticLLMCache } from '@cachly-dev/openclaw'
import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'

const cache = createSemanticLLMCache({ url: process.env.CACHLY_URL! })

export async function POST(req: Request) {
  const { prompt } = await req.json()
  const result = await cache.getOrSet(
    prompt,
    () => generateText({ model: openai('gpt-4o'), prompt }).then(r => ({ content: r.text, model: 'gpt-4o' }))
  )
  return Response.json(result)
}

Plain fetch / any provider

import { createSemanticLLMCache } from '@cachly-dev/openclaw'

const cache = createSemanticLLMCache({ url: process.env.CACHLY_URL! })

const answer = await cache.getOrSet(
  userMessage,
  async () => {
    const res = await fetch('https://api.anthropic.com/v1/messages', {
      method: 'POST',
      headers: {
        'x-api-key': process.env.ANTHROPIC_KEY!,
        'anthropic-version': '2023-06-01',  // required by the Anthropic Messages API
        'content-type': 'application/json',
      },
      body: JSON.stringify({ model: 'claude-opus-4-5', max_tokens: 1024, messages: [{ role: 'user', content: userMessage }] }),
    })
    if (!res.ok) throw new Error(`Anthropic API error: ${res.status}`)  // fail fast instead of returning an error payload
    const json = await res.json()
    return { content: json.content[0].text, model: 'claude-opus-4-5', inputTokens: json.usage.input_tokens, outputTokens: json.usage.output_tokens }
  }
)

🦾 OpenClaw adapter (bonus)

If you use OpenClaw (22-channel AI assistant), one function wires everything:

import { createCachlyOpenClawConfig } from '@cachly-dev/openclaw'
import OpenAI from 'openai'

const openai = new OpenAI()

const cachlyConfig = await createCachlyOpenClawConfig({
  url:       process.env.CACHLY_URL!,
  vectorUrl: process.env.CACHLY_VECTOR_URL,
  embedFn:   (t) =>
    openai.embeddings.create({ model: 'text-embedding-3-small', input: t })
      .then(r => r.data[0].embedding),
})

// OpenClawApp is not imported in this snippet; import it from your OpenClaw setup
const app = new OpenClawApp({
  ...cachlyConfig,
  // ... rest of your config
})

This gives you: semantic cache + persistent sessions + Redis memory — all across WhatsApp, Telegram, Slack, Discord at once.

👥 Team Brain

One shared cachly instance → every team member gets smarter from each other's work:

// `brain` is your Team Brain client (its setup is not shown here)
// Alice fixes a deploy issue:
await brain.learnFromAttempts({ topic: 'deploy:k8s', outcome: 'success', whatWorked: '...' })

// Bob starts a session the next day:
await brain.sessionStart()
// → "💡 alice solved deploy:k8s 1d ago: ..."

Team plans from €99/mo (10 seats) at cachly.dev/teams.

🚀 Upgrade path

| Level | What you get | Setup |
|-------|--------------|-------|
| Free: Exact + BM25 | 20–50% reduction, in-process BM25+ fuzzy, zero config | CACHLY_URL only |
| Free: Semantic cache | 60–90% cost reduction | + embedFn |
| Speed tier | Hosted pgvector, higher hit rates at scale | Speed plan at cachly.dev |
| Team Brain | Shared knowledge, team lessons, analytics | cachly.dev/teams |
