# @cachly-dev/openclaw
Cut LLM API costs by 60–90% in 3 lines of code.
Semantic cache, persistent sessions, and AI memory — works standalone or as a drop-in OpenClaw adapter.
## Zero-friction start
```bash
npm install @cachly-dev/openclaw
```

Set one env var → done:

```bash
CACHLY_URL=redis://:<password>@<host>:6379
```

Get a free instance at cachly.dev — no credit card, no infra.
## ⚡ Semantic LLM Cache — 3 lines
Every time a user asks the same question in different words, you pay OpenAI again. This stops that.
```ts
import { createSemanticLLMCache } from '@cachly-dev/openclaw'

const cache = createSemanticLLMCache({ url: process.env.CACHLY_URL! })

// Wrap any LLM call — that's it
const answer = await cache.getOrSet(
  userPrompt,
  () => openai.chat.completions.create({ model: 'gpt-4o', messages: [...] })
)
```

- Without an `embedFn` and without a `vectorUrl`: exact-match caching plus local BM25+ fuzzy search kick in immediately (20–50% savings). No API calls — "how do I reset password?" matches "password reset help", pure in-process.
- Without an `embedFn` but with a `vectorUrl`: BM25 against a hosted pgvector index, with higher hit rates across large caches.
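For instance, keyword-only matching against the hosted index needs nothing beyond the two connection URLs (a sketch using only the options documented here):

```ts
// BM25 keyword matching via the hosted index — no embedFn, so no embedding API costs
const cache = createSemanticLLMCache({
  url: process.env.CACHLY_URL!,
  vectorUrl: process.env.CACHLY_VECTOR_URL, // hosted pgvector / BM25 index
})
```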
Add semantic matching for 60–90% savings (10 more lines):
```ts
const cache = createSemanticLLMCache({
  url: process.env.CACHLY_URL!,
  vectorUrl: process.env.CACHLY_VECTOR_URL, // from the cachly.dev dashboard
  embedFn: (text) =>
    openai.embeddings.create({ model: 'text-embedding-3-small', input: text })
      .then(r => r.data[0].embedding),
  threshold: 0.92, // cosine similarity (default)
  ttl: 3600,       // seconds
})
```

"How do I reset my password?" → "How can I reset my pw?" → same cache hit. 💰
## 📊 What you save
| Questions/day | Cache hit rate | Monthly saving (GPT-4o) |
|---------------|----------------|-------------------------|
| 100           | 40% exact      | ~$8                     |
| 100           | 70% semantic   | ~$22                    |
| 1,000         | 70% semantic   | ~$220                   |
| 10,000        | 70% semantic   | ~$2,200                 |
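A back-of-envelope check of the 1,000/day row, assuming a blended cost of roughly $0.01 per average GPT-4o call (an assumption — your prompt and completion sizes will vary):

```ts
// Rough saving estimate; avgCostPerCall is an assumed blended figure
const callsPerDay = 1_000
const hitRate = 0.7            // semantic matching
const avgCostPerCall = 0.0105  // USD per call, input + output (assumed)
const monthlySaving = callsPerDay * 30 * hitRate * avgCostPerCall // ≈ $220/mo
```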
After 10 cache hits, the console logs:
```
🎯 cachly: 12,340 tokens saved this session (10 hits)
```

Full stats, including a cost breakdown, at cachly.dev/dashboard.
## 🗄️ Session Store
Persist conversation history in Redis — no cold starts, no lost context:
```ts
import { createCachlySessionStore } from '@cachly-dev/openclaw'

const sessions = createCachlySessionStore({
  url: process.env.CACHLY_URL!,
  ttl: 604800, // 7 days
})

const history = (await sessions.get(userId)) ?? [] // empty history for new users
await sessions.set(userId, [...history, { role: 'user', content: message }])
```

Works with any LLM framework — OpenAI, LangChain, Vercel AI SDK, etc.
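A full chat turn with the OpenAI SDK might look like this (a sketch — the `openai` client and incoming `message` are assumptions):

```ts
const history = (await sessions.get(userId)) ?? []

const reply = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [...history, { role: 'user', content: message }],
})

// Persist both sides of the turn
await sessions.set(userId, [
  ...history,
  { role: 'user', content: message },
  { role: 'assistant', content: reply.choices[0].message.content ?? '' },
])
```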
## 🧠 Memory Adapter
Long-term semantic memory — store facts, recall by meaning:
```ts
import { createCachlyMemoryAdapter } from '@cachly-dev/openclaw'

const memory = createCachlyMemoryAdapter({
  url: process.env.CACHLY_URL!,
  vectorUrl: process.env.CACHLY_VECTOR_URL!,
  embedFn: myEmbedFn,
  ttl: 7776000, // 90 days
})

await memory.store({ id: 'pref-1', text: 'User prefers TypeScript over Python' })
const results = await memory.search('programming language preference', { topK: 5 })
// → [{ text: 'User prefers TypeScript over Python', score: 0.97 }]
```
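Recalled memories are typically injected into the prompt. A minimal sketch, assuming an `openai` client and an incoming `userMessage`:

```ts
const memories = await memory.search(userMessage, { topK: 3 })
const system = `Known user facts:\n${memories.map(m => `- ${m.text}`).join('\n')}`

const reply = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: system },
    { role: 'user', content: userMessage },
  ],
})
```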
## 🔍 Brain Search (BM25+)

Full-text search over cached data — no embeddings needed:
```ts
import { brainSearch } from '@cachly-dev/openclaw'

const results = await brainSearch(process.env.CACHLY_VECTOR_URL!, 'deploy authentication error')
// → [{ key: 'lesson:fix:auth', score: 4.2, preview: '...' }]
```

## 🧩 Standalone — works with any LLM stack
No OpenClaw needed. Drop into LangChain, Vercel AI SDK, plain fetch, or any custom pipeline:
### LangChain
```ts
import { createSemanticLLMCache } from '@cachly-dev/openclaw'
import { ChatOpenAI } from '@langchain/openai'

const cache = createSemanticLLMCache({ url: process.env.CACHLY_URL! })
const llm = new ChatOpenAI()

async function cachedInvoke(prompt: string) {
  return cache.getOrSet(
    prompt,
    () => llm.invoke(prompt).then(r => ({ content: r.content as string, model: 'gpt-4o' }))
  )
}
```

### Vercel AI SDK
```ts
import { createSemanticLLMCache } from '@cachly-dev/openclaw'
import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'

const cache = createSemanticLLMCache({ url: process.env.CACHLY_URL! })

export async function POST(req: Request) {
  const { prompt } = await req.json()
  const result = await cache.getOrSet(
    prompt,
    () => generateText({ model: openai('gpt-4o'), prompt }).then(r => ({ content: r.text, model: 'gpt-4o' }))
  )
  return Response.json(result)
}
```

### Plain fetch / any provider
```ts
import { createSemanticLLMCache } from '@cachly-dev/openclaw'

const cache = createSemanticLLMCache({ url: process.env.CACHLY_URL! })

const answer = await cache.getOrSet(
  userMessage,
  async () => {
    const res = await fetch('https://api.anthropic.com/v1/messages', {
      method: 'POST',
      headers: {
        'x-api-key': process.env.ANTHROPIC_KEY!,
        'anthropic-version': '2023-06-01', // required by the Anthropic API
        'content-type': 'application/json',
      },
      body: JSON.stringify({ model: 'claude-opus-4-5', max_tokens: 1024, messages: [{ role: 'user', content: userMessage }] }),
    })
    const json = await res.json()
    return { content: json.content[0].text, model: 'claude-opus-4-5', inputTokens: json.usage.input_tokens, outputTokens: json.usage.output_tokens }
  }
)
```

## 🦾 OpenClaw adapter (bonus)
If you use OpenClaw (a 22-channel AI assistant), one function wires everything up:
```ts
import { createCachlyOpenClawConfig } from '@cachly-dev/openclaw'
import OpenAI from 'openai'

const openai = new OpenAI()

const cachlyConfig = await createCachlyOpenClawConfig({
  url: process.env.CACHLY_URL!,
  vectorUrl: process.env.CACHLY_VECTOR_URL,
  embedFn: (t) =>
    openai.embeddings.create({ model: 'text-embedding-3-small', input: t })
      .then(r => r.data[0].embedding),
})

const app = new OpenClawApp({
  ...cachlyConfig,
  // ... rest of your config
})
```

This gives you semantic cache + persistent sessions + Redis memory, all across WhatsApp, Telegram, Slack, and Discord at once.
## 👥 Team Brain
One shared cachly instance → every team member gets smarter from each other's work:
```ts
// Alice fixes a deploy issue:
await brain.learnFromAttempts({ topic: 'deploy:k8s', outcome: 'success', whatWorked: '...' })

// Bob starts a session the next day:
await brain.sessionStart()
// → "💡 alice solved deploy:k8s 1d ago: ..."
```

Team plans from €99/mo (10 seats) at cachly.dev/teams.
## 🚀 Upgrade path
| Level | What you get | Setup |
|-------|--------------|-------|
| Free — Exact + BM25 | 20–50% reduction, in-process BM25+ fuzzy, zero config | `CACHLY_URL` only |
| Free — Semantic cache | 60–90% cost reduction | + `embedFn` |
| Speed tier | Hosted pgvector, higher hit rates at scale | Speed plan at cachly.dev |
| Team Brain | Shared knowledge, team lessons, analytics | cachly.dev/teams |
## Links
- 📖 cachly.dev docs
- 🧠 AI Memory / MCP Server
- 📦 @cachly-dev/mcp-server — give your AI editor persistent memory (51 MCP tools)
- 🤖 OpenClaw
- 📦 npm
- 🐛 Issues
Apache-2.0 © cachly.dev
