raziel-agent
v1.0.0
Framework-agnostic agent runtime: a tool-loop agent with 3-layer context management (pruning, memory flush, compaction), keyword memory, and scheduling. Adapter-injected — bring your own store, model provider, and delivery channel.
raziel-agent
A framework-agnostic agent runtime. A tool-loop agent with 3-layer context management (pruning → memory flush → compaction), built-in memory and scheduling tools, and pluggable I/O via adapters you implement.
It runs indefinitely without losing context: large tool results get pruned, durable facts get flushed to memory before they're summarized away, and old history gets compacted into a structured summary — all after the response is sent, so the user never waits on it.
Built on the Vercel AI SDK's ToolLoopAgent.
The context-management algorithms are adapted from pi-mono and openclaw via trustclaw.
Install
npm install raziel-agent ai

ai (the Vercel AI SDK, v6+) is a peer dependency.
Concept
The runtime owns the algorithms. You own the I/O, via three adapters:
| Adapter | You implement | Used for |
| --- | --- | --- |
| StoreAdapter | persistence | instances, messages, memory, cron jobs, compaction state |
| ProviderAdapter | model resolution | turning an instance's model id into an AI SDK model |
| DeliveryAdapter | outbound messages | sending replies (Telegram, etc.) — optional, only for cron/push flows |
This keeps the package free of any database, web framework, or AI provider lock-in. Bring Postgres or sqlite or in-memory; bring the Vercel AI Gateway or a self-hosted OpenAI-compatible endpoint.
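As a rough illustration of this split, the sketch below shows what a delivery adapter and a provider adapter might look like. The argument shape of `send` mirrors the `delivery.send(...)` call in the usage example; the `Sketch`-prefixed interface names, the `resolveModel` method, and both classes are assumptions made for illustration, not the package's actual type definitions.

```typescript
// Hypothetical shapes for illustration only; check the package's exported
// types for the real adapter interfaces.
interface OutboundMessage {
  instanceId: string;
  channel: string; // e.g. "telegram"
  target: string | number; // e.g. a chat id (type assumed)
  text: string;
}

interface SketchDeliveryAdapter {
  send(message: OutboundMessage): Promise<void>;
}

// Assumed: the provider maps an instance's model id to some model handle.
interface SketchProviderAdapter<Model> {
  resolveModel(modelId: string): Model;
}

// Records messages in memory instead of sending them, which is handy in tests.
class RecordingDelivery implements SketchDeliveryAdapter {
  readonly sent: OutboundMessage[] = [];

  async send(message: OutboundMessage): Promise<void> {
    this.sent.push(message);
  }
}

// Looks model ids up in a fixed table and fails loudly on unknown ids.
class TableProvider<Model> implements SketchProviderAdapter<Model> {
  private readonly models: Map<string, Model>;

  constructor(models: Map<string, Model>) {
    this.models = models;
  }

  resolveModel(modelId: string): Model {
    const model = this.models.get(modelId);
    if (model === undefined) {
      throw new Error(`Unknown model id: ${modelId}`);
    }
    return model;
  }
}
```

Keeping the adapters this narrow is what lets the same runtime sit behind a web route, a Telegram bot, or a cron worker without changes to the agent loop.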
Usage
import { prepareAgentRun } from "raziel-agent";

const { agent, messages } = await prepareAgentRun({
  store,           // your StoreAdapter
  provider,        // your ProviderAdapter
  instanceId,      // which agent instance
  userMessage,     // the incoming message
  source: "web",   // "web" | "telegram" | "cron"
  thread: "main",  // "main" rolling thread, or { sessionId } for a scratch session
});

// Web: stream the response back to the client.
const result = await agent.stream({ prompt: messages });
return result.toUIMessageStreamResponse();

// Telegram / cron (instead of streaming): generate, then deliver
// through your DeliveryAdapter.
const result = await agent.generate({ prompt: messages });
await delivery.send({ instanceId, channel: "telegram", target: chatId, text: result.text });

prepareAgentRun handles everything: loading the instance, searching
memory for relevant context, building the system prompt, loading and
pruning the conversation, persisting the user turn, and wiring an
onFinish that records the assistant turn and starts the
fire-and-forget post-response tail (memory flush + compaction).
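
The fire-and-forget tail can be pictured as a promise that is started but never awaited on the response path. Below is a minimal sketch of that general pattern; `runPostResponseTail`, `onFinishSketch`, and the step functions are hypothetical names, not package exports, and the real tail may order or guard its steps differently.

```typescript
// Hypothetical names throughout; this shows the general pattern only.
type TailStep = () => Promise<void>;

// Runs the steps in order. A failing step stops the remaining steps, and the
// error is handed to onError so it never becomes an unhandled rejection.
function runPostResponseTail(
  steps: TailStep[],
  onError: (error: unknown) => void,
): Promise<void> {
  return (async () => {
    for (const step of steps) {
      await step();
    }
  })().catch(onError);
}

// In an onFinish-style callback: start the tail and return immediately,
// so the response is never delayed by flush or compaction work.
function onFinishSketch(memoryFlush: TailStep, compaction: TailStep): void {
  void runPostResponseTail([memoryFlush, compaction], (error) => {
    console.error("post-response tail failed", error);
  });
}
```

On serverless platforms the host may freeze the process once the response is sent, so a tail like this usually needs whatever "wait until" hook the platform provides; that concern is outside this sketch.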
The 3-layer context system
- Pruning — before every LLM call. Trims tool results > 4KB when context exceeds 30% of the window; hard-clears the oldest large tool results past 50%. The last 3 assistant turns are never pruned.
- Memory flush — once per compaction cycle, when context nears the compaction threshold. A single LLM call with only the memory tools, prompting the model to persist durable facts before they're summarized away.
- Compaction — after a response, fire-and-forget, when context exceeds the reserve threshold. Walks back ~20K tokens, snaps to a valid cut point (never splits a tool-call/tool-result pair), summarizes the older messages, and persists the summary under an optimistic lock.
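
The cut-point rule in the compaction step can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: `SketchMessage` is a simplified message shape with a precomputed token estimate, `findCutIndex` is a hypothetical helper, and snapping toward older messages (keeping slightly more than the budget) is an assumed choice.

```typescript
// Simplified message shape for illustration; real messages carry structured content.
interface SketchMessage {
  role: "user" | "assistant" | "tool";
  tokens: number; // assumed precomputed token estimate
}

// Returns the index of the first message to keep; everything before it would
// be summarized. Returns 0 when the whole history fits in the budget.
function findCutIndex(messages: SketchMessage[], keepRecentTokens: number): number {
  let kept = 0;
  let cut = messages.length;

  // Walk back from the newest message until the keep-recent budget is covered.
  while (cut > 0 && kept < keepRecentTokens) {
    cut -= 1;
    kept += messages[cut].tokens;
  }

  // Snap: the kept region must not start with a tool result, because its
  // matching tool call would land in the summarized region.
  while (cut > 0 && messages[cut].role === "tool") {
    cut -= 1;
  }
  return cut;
}
```

Called as `findCutIndex(history, 20_000)`, the walk stops once roughly 20K tokens of recent history are covered, then moves the boundary back past any tool results so each one stays next to the assistant message that requested it.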
Tunable via CompactionSettings; defaults: 200K window, 20K reserve,
20K keep-recent.
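
To make the pruning thresholds concrete, here is a minimal sketch of the rule described in the first bullet. The 4KB, 30%, 50%, and three-turn numbers come from the text; everything else is an assumption: the `PrunableMessage` shape, the function names, the head-and-tail trimming format, and the simplification that hard-clear mode clears every large tool result outside the protected tail rather than working oldest-first.

```typescript
// Thresholds come from the text above; the trimming format is an assumption.
const LARGE_RESULT_BYTES = 4 * 1024;
const SOFT_TRIM_RATIO = 0.3;
const HARD_CLEAR_RATIO = 0.5;
const PROTECTED_ASSISTANT_TURNS = 3;

interface PrunableMessage {
  role: "user" | "assistant" | "tool";
  text: string;
}

type PruneMode = "none" | "soft-trim" | "hard-clear";

function pruneMode(contextTokens: number, windowTokens: number): PruneMode {
  const ratio = contextTokens / windowTokens;
  if (ratio > HARD_CLEAR_RATIO) return "hard-clear";
  if (ratio > SOFT_TRIM_RATIO) return "soft-trim";
  return "none";
}

// Index where the protected tail begins: the third-from-last assistant
// message. With fewer than three assistant messages, everything is protected.
function protectedStart(messages: PrunableMessage[]): number {
  let seen = 0;
  for (let i = messages.length - 1; i >= 0; i -= 1) {
    if (messages[i].role === "assistant") {
      seen += 1;
      if (seen === PROTECTED_ASSISTANT_TURNS) return i;
    }
  }
  return 0;
}

function pruneToolResults(
  messages: PrunableMessage[],
  contextTokens: number,
  windowTokens: number,
): PrunableMessage[] {
  const mode = pruneMode(contextTokens, windowTokens);
  if (mode === "none") return messages;

  const tailStart = protectedStart(messages);
  const encoder = new TextEncoder();
  return messages.map((message, index) => {
    const isLarge = encoder.encode(message.text).length > LARGE_RESULT_BYTES;
    if (index >= tailStart || message.role !== "tool" || !isLarge) return message;
    const text =
      mode === "hard-clear"
        ? "[tool result cleared]"
        : `${message.text.slice(0, 1024)}\n[... trimmed ...]\n${message.text.slice(-1024)}`;
    return { ...message, text };
  });
}
```

With the default 200K window, this sketch starts trimming above 60K tokens of context and switches to clearing above 100K.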
Built-in tools
- memory_save: persist a durable fact.
- memory_search: retrieve relevant memories. The StoreAdapter decides how (keyword/BM25, vector similarity, …).
- schedule: create / list / delete cron jobs.
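
Since the retrieval strategy behind memory_search is left to the store, a simple keyword match is one possible starting point. The sketch below is an assumption-laden illustration: `MemoryRecord`, `tokenize`, and `keywordSearch` are hypothetical names, and the scoring (count of distinct query terms present) is a deliberately simpler stand-in for BM25.

```typescript
// Hypothetical record shape; a real store would likely carry timestamps,
// instance ids, and so on.
interface MemoryRecord {
  id: string;
  text: string;
}

// Lowercases and splits on non-alphanumeric characters, dropping
// single-character fragments such as a stray possessive "s".
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 1);
}

// Scores each memory by how many distinct query terms it contains and
// returns the best matches first; memories with no matching term are dropped.
function keywordSearch(
  memories: MemoryRecord[],
  query: string,
  limit: number = 5,
): MemoryRecord[] {
  const terms = Array.from(new Set(tokenize(query)));
  return memories
    .map((memory) => {
      const words = new Set(tokenize(memory.text));
      const score = terms.filter((term) => words.has(term)).length;
      return { memory, score };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.memory);
}
```

Because the agent only sees the tool's results, a store could later swap this for BM25 or vector similarity without touching the agent loop.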
Threads
Each instance has one main rolling thread — Telegram, cron, memory,
and compaction all operate on it. The web surface can additionally open
ephemeral scratch sessions (thread: { sessionId }) that don't feed
the main thread's memory or compaction.
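
The two thread forms accepted by `thread` can be modeled as a small union type. In the sketch below, `ThreadRef`, `isMainThread`, `threadKey`, and `feedsLongTermState` are hypothetical helpers, and the key format is an assumed storage convention rather than anything the package prescribes.

```typescript
// Mirrors the two values shown for the `thread` option: "main" or { sessionId }.
type ThreadRef = "main" | { sessionId: string };

function isMainThread(thread: ThreadRef): thread is "main" {
  return thread === "main";
}

// Assumed key scheme for a store that keeps messages per thread.
function threadKey(instanceId: string, thread: ThreadRef): string {
  return isMainThread(thread)
    ? `${instanceId}:main`
    : `${instanceId}:session:${thread.sessionId}`;
}

// Only the main thread feeds memory and compaction; scratch sessions do not.
function feedsLongTermState(thread: ThreadRef): boolean {
  return isMainThread(thread);
}
```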
License
MIT
