@dory-agentic/dory-agentic-sdk
v0.2.0
Dory Agentic SDK: LFS context management with AI SDK tool loops for long-running agents
The story of a session that would not end
Imagine you pair with an agent on a real project. Not a five-message demo, but a long session. You explore a repo, run tools, fix tests, argue about architecture, ship a patch, then ask a question that depends on something you said two hours ago.
Most agent stacks treat context like a fixed-size backpack. Every new turn adds weight. Eventually something has to fall out: old instructions, a failing test log, the variable name you agreed on in message twelve. The model does not get angry. It simply stops seeing what mattered.
Dory Agentic SDK exists for that moment.
We built Long-Form Session (LFS), a virtual memory layer that sits between your agent loop and the model. The full conversation still lives in a session store, but what the model sees on each call is curated dynamically: expanded when there is room, compressed when pressure rises, and restorable when the agent needs the exact words again.
You keep the long story. The model keeps a window it can actually think inside.
Proof at million-token scale
On AgencyBench V2 (GAIR/AgencyBench, ACL 2026), six scenarios were run with ~1M tokens of agent history each (Backend and Code domains, gemini-2.5-flash-lite). Native Gemini hit context truncation on every run. With LFS v4.6:

| Metric | Native Gemini | LFS v4.6 |
|--------|---------------|----------|
| Avg rubric score | 93.7 / 100 | 99.7 / 100 (+6.0) |
| Avg latency | 25.6s | 19.0s (~26% faster) |
| Success rate | 6 / 6 | 6 / 6 |
| Tokens per call (typical) | ~1.42M (truncated) | ~720K (managed) |

Standout case: Code scenario 3 scored 80/100 natively vs 100/100 with LFS (+20), after 524 parity swaps and a memory reflection that hydrated 9 archived chunks back into active context. Backend and Code category averages moved from 96.3 and 91.0 to 99.3 and 100.0 respectively.
That is the story in numbers: the session stays long, the window stays honest, and the agent can still find the needle.
The long session problem
After dozens of turns, three forces collide:
- Volume: code, tool output, and reasoning traces pile up faster than anyone summarizes them.
- A hard ceiling: every model has a context limit; you cannot paste infinity into one call.
- Silent loss: naive truncation drops the wrong messages, which are not always the oldest or the least important.

That is the gap we address. Not by pretending the window is bigger, but by managing it like memory.
What changes with LFS
| Without dynamic context | With Dory Agentic SDK |
|-------------------------|------------------------|
| One growing blob sent every call | A session store plus a prepared view per step |
| Old turns vanish unpredictably | Low-utility chunks archive with labeled placeholders |
| Agent cannot recover detail | remind tool hydrates exact text when needed |
| Same policy at turn 3 and turn 300 | Preflight routing: light path when calm, full MMU when pressured |

Benefits for builders
- Hours-long sessions: coding agents, support bots, research assistants that accumulate real history.
- Dynamic context window: fill ratio is monitored every model round; eviction tracks relevance, recency, and agent credit scores, not arbitrary cutoffs.
- Tool-loop native: built on the AI SDK `ToolLoopAgent`; LFS runs in `prepareStep` before each inference, so every tool call sees a fresh, budgeted prompt.
- Production-minded: path-safe skill loading, MCP lifecycle disposal, SSRF guards, session cache bounds.
- Two lanes: `standard` for baseline comparison; `optimized` for the full memory manager.
How the agent scores messages (and what gets forgotten)
LFS does not delete history. It ranks it every turn so the least useful material leaves the active window first, while the full session remains in storage.

Step 1: Chunk and index
Large user turns and tool output are split into chunks. Each chunk is embedded in a lightweight vector index so the current user query can retrieve semantically related pieces.
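As a rough illustration of this step, the sketch below splits text into fixed-size chunks and ranks them against a query with cosine similarity. The term-frequency vector is a deliberately simple stand-in for a real embedding model, and `chunkText`, `termFrequencies`, `cosine`, and `rankChunks` are hypothetical names, not SDK exports.

```typescript
// Illustrative sketch only: a toy term-frequency vector stands in for a real embedding model.
type SparseVector = Record<string, number>;

// Split text into consecutive windows of at most maxChars characters.
function chunkText(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}

// Count lowercase word occurrences; a null-prototype object avoids inherited keys.
function termFrequencies(text: string): SparseVector {
  const tf: SparseVector = Object.create(null);
  const words = text.toLowerCase().match(/[a-z0-9_]+/g) || [];
  for (const w of words) tf[w] = (tf[w] || 0) + 1;
  return tf;
}

// Cosine similarity between two sparse vectors (0 when either one is empty).
function cosine(a: SparseVector, b: SparseVector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const k of Object.keys(a)) {
    normA += a[k] * a[k];
    if (k in b) dot += a[k] * b[k];
  }
  for (const k of Object.keys(b)) normB += b[k] * b[k];
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Rank chunks by similarity to the query, most similar first.
function rankChunks(query: string, chunks: string[]): { text: string; sRel: number }[] {
  const q = termFrequencies(query);
  return chunks
    .map((text) => ({ text, sRel: cosine(q, termFrequencies(text)) }))
    .sort((x, y) => y.sRel - x.sRel);
}

const ranked = rankChunks("where did the TypeError on id come from", [
  "npm install finished in 3s",
  "TypeError: cannot read property id of undefined",
]);
console.log(ranked[0].text); // the TypeError chunk ranks first
```

The similarity value produced here plays the role of S_rel in the next step; a production index would swap the toy vectors for real embeddings without changing the ranking logic.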
Step 2: Utility score (what to keep in focus)
For each chunk, LFS computes utility from the current query:
U = β · S_rel + (1 − β) · W_agent

| Symbol | Meaning |
|--------|---------|
| S_rel | Semantic similarity between the chunk and the active query (vector retrieval) |
| W_agent | Agent-assigned credit weight (what the model marked as important via the scoring manifest) |
| β | Blend factor (default 0.5): balance retrieval vs agent judgment |

forgetScore is derived as 1 − U. Low forgetScore means “keep near the model.” High forgetScore means “safe to archive.”
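A minimal sketch of the two formulas above, assuming both S_rel and W_agent are normalized to [0, 1] (the README does not state their ranges). The type and function names are hypothetical and are not the SDK's API.

```typescript
// Hypothetical types and helper names for illustration; not the SDK's actual API.
interface ChunkSignals {
  id: string;
  sRel: number;   // S_rel: semantic similarity to the active query, assumed in [0, 1]
  wAgent: number; // W_agent: agent-assigned credit weight, assumed in [0, 1]
}

// Clamp to [0, 1] so out-of-range inputs cannot push scores outside the expected range.
function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// U = β · S_rel + (1 − β) · W_agent
function utility(c: ChunkSignals, beta: number = 0.5): number {
  const b = clamp01(beta);
  return b * clamp01(c.sRel) + (1 - b) * clamp01(c.wAgent);
}

// forgetScore = 1 − U
function forgetScore(c: ChunkSignals, beta: number = 0.5): number {
  return 1 - utility(c, beta);
}

const errorLog: ChunkSignals = { id: "chunk-12", sRel: 0.75, wAgent: 0.25 };
const u = utility(errorLog);     // 0.5 * 0.75 + 0.5 * 0.25 = 0.5
const f = forgetScore(errorLog); // 1 - 0.5 = 0.5
console.log(u, f);
```

Raising β toward 1 makes the score follow retrieval similarity alone; lowering it toward 0 trusts the agent's own credit assignments instead.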
Step 3: Temporal decay (what drifts toward archive)
Even strong chunks age unless the agent keeps using them:
- Each turn, decay increases forgetScore for conversational messages (mass-aware: larger messages decay faster under pressure).
- CODE_ASSET chunks get a short shield when referenced within the last few turns, then fade exponentially as the task moves on.
- TOOL_LOG chunks are penalized first under pressure (noisy logs leave before core instructions).
- Hydrated chunks (brought back via `remind` or reflection) get a grace period with forgetScore pinned to 0.
When forgetScore reaches 1.0, the message is a candidate for displacement: content moves to long-term memory and the active window shows a compact, labeled placeholder.
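The decay rules above can be sketched as a single per-turn update. Every constant here (base decay, shield length, fade rate, log penalty) and every field name is an assumption chosen for illustration; the README does not publish the real values or the exact decay curve.

```typescript
// Illustrative sketch: every constant and field name below is an assumption, not an SDK value.
type ChunkKind = "CONVERSATION" | "CODE_ASSET" | "TOOL_LOG";

interface DecayChunk {
  kind: ChunkKind;
  forgetScore: number;          // 0 = keep in focus, 1 = candidate for displacement
  tokens: number;               // size ("mass") of the chunk
  turnsSinceReferenced: number; // turns since the agent last used this chunk
  graceTurnsLeft: number;       // > 0 while a hydrated chunk is protected
}

const BASE_DECAY = 0.05;      // per-turn increase for conversational messages
const MASS_REFERENCE = 1000;  // tokens at which pressure doubles the base decay
const SHIELD_TURNS = 3;       // CODE_ASSET chunks referenced this recently do not decay
const CODE_FADE_RATE = 0.5;   // exponential fade rate once the shield has expired
const TOOL_LOG_PENALTY = 0.2; // extra decay for TOOL_LOG chunks under pressure

function decayOneTurn(c: DecayChunk, underPressure: boolean): DecayChunk {
  // Hydrated chunks stay pinned to 0 for the remainder of the grace period.
  if (c.graceTurnsLeft > 0) {
    return { ...c, forgetScore: 0, graceTurnsLeft: c.graceTurnsLeft - 1 };
  }
  let next = c.forgetScore;
  if (c.kind === "CODE_ASSET") {
    if (c.turnsSinceReferenced > SHIELD_TURNS) {
      // Retention (1 - forgetScore) shrinks by a constant factor each turn.
      next = 1 - (1 - c.forgetScore) * Math.exp(-CODE_FADE_RATE);
    }
  } else if (c.kind === "TOOL_LOG") {
    next += BASE_DECAY + (underPressure ? TOOL_LOG_PENALTY : 0);
  } else {
    // Mass-aware: under pressure, larger messages decay faster.
    const massFactor = underPressure ? 1 + c.tokens / MASS_REFERENCE : 1;
    next += BASE_DECAY * massFactor;
  }
  // Cap at 1.0, the point where the chunk becomes a displacement candidate.
  return { ...c, forgetScore: Math.min(1, next) };
}

const noisyLog: DecayChunk = {
  kind: "TOOL_LOG", forgetScore: 0.5, tokens: 4000, turnsSinceReferenced: 6, graceTurnsLeft: 0,
};
const afterTurn = decayOneTurn(noisyLog, true);
console.log(afterTurn.forgetScore.toFixed(2)); // "0.75"
```

Keeping the update pure (it returns a new object) makes it easy to replay a session turn by turn and inspect when each chunk crosses the 1.0 threshold.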
Step 4: Parity swap into long-term memory
When projected tokens exceed the memory cap (default 70% of contextMax, tunable from 50% to 99%):
- Chunks are sorted by utility (lowest first).
- Lowest-utility active chunks are marked `isArchived` and copied to `longTermMemory`.
- The model still sees where something went (ids, summaries, manifest), not a silent hole.
- The agent can call `remind` or trigger a reflection to pull exact text back (AgencyBench run: 9 chunks hydrated in one Code scenario).
Credit budget scales with window size: under pressure the model receives a bounded number of scoring credits so it must declare what it relied on, which feeds W_agent on later turns.
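The swap described in this step can be sketched as a greedy loop that archives the lowest-utility chunks until the projected size fits under the cap. The placeholder format, the `PLACEHOLDER_TOKENS` cost, and all type and function names are assumptions for illustration; only the 70% default and the 50% to 99% range come from the text above.

```typescript
// Illustrative sketch: names, placeholder format, and PLACEHOLDER_TOKENS are assumptions.
interface ActiveChunk {
  id: string;
  text: string;
  tokens: number;
  utility: number;
  isArchived: boolean;
}

interface SwapResult {
  active: ActiveChunk[];                  // window contents after the swap, original order kept
  longTermMemory: Record<string, string>; // archived text keyed by chunk id
  archivedIds: string[];                  // ids archived by this call, lowest utility first
}

const PLACEHOLDER_TOKENS = 8; // assumed cost of the labeled placeholder left in the window

function paritySwap(
  chunks: ActiveChunk[],
  contextMax: number,
  memoryCapFraction: number = 0.7,
): SwapResult {
  // Keep the cap fraction inside the documented 50% to 99% range.
  const fraction = Math.min(0.99, Math.max(0.5, memoryCapFraction));
  const cap = contextMax * fraction;
  let projected = chunks.reduce((sum, c) => sum + c.tokens, 0);

  const longTermMemory: Record<string, string> = {};
  const archivedIds: string[] = [];
  // Only chunks that would actually save tokens are candidates; lowest utility goes first.
  const candidates = chunks
    .filter((c) => !c.isArchived && c.tokens > PLACEHOLDER_TOKENS)
    .sort((a, b) => a.utility - b.utility);

  for (const c of candidates) {
    if (projected <= cap) break;
    longTermMemory[c.id] = c.text;
    archivedIds.push(c.id);
    projected += PLACEHOLDER_TOKENS - c.tokens;
  }

  const active = chunks.map((c) =>
    archivedIds.indexOf(c.id) === -1
      ? c
      : { ...c, text: `[archived ${c.id}: call remind to restore]`, tokens: PLACEHOLDER_TOKENS, isArchived: true },
  );
  return { active, longTermMemory, archivedIds };
}

const swap = paritySwap(
  [
    { id: "a", text: "agreed variable name: sessionCache", tokens: 400, utility: 0.9, isArchived: false },
    { id: "b", text: "verbose build log output", tokens: 500, utility: 0.1, isArchived: false },
    { id: "c", text: "earlier design discussion", tokens: 300, utility: 0.4, isArchived: false },
  ],
  1000,
);
console.log(swap.archivedIds); // [ 'b', 'c' ]
```

In this run the window starts at 1200 tokens against a cap of roughly 700. Archiving `b` leaves 708, still over the cap, so `c` goes next and the window settles at 416 with the high-utility chunk `a` untouched.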
How it works (every model call)
Think of each turn as a flight check:
- Merge new messages into the session.
- Preflight: if projected tokens stay below ~50% of the window, take the fast PASS_THRU lane; otherwise engage ELASTIC_SWAP and the memory manager.
- Chunk large user content and index it for similarity search.
- Score and swap: parity swap ranks chunks by utility (semantic match + agent weights + decay); the least useful leave the window first.
- Budget credits: under pressure, the model receives a scoring budget and manifest metadata so it cites what it actually used.
- Format: only then does the prompt go to the model, wrapped, labeled, and honest about what is archived.
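The preflight decision in the second step can be sketched as a one-line fill-ratio check. The function name and the exact 0.5 threshold are assumptions (the list above only says "~50% of the window"); the lane names are the ones used throughout this README.

```typescript
// Illustrative sketch: the function name and the exact 0.5 threshold are assumptions.
type ExecutionLane = "PASS_THRU" | "ELASTIC_SWAP";

function preflightRoute(
  projectedTokens: number,
  contextMax: number,
  threshold: number = 0.5,
): ExecutionLane {
  if (contextMax <= 0) throw new RangeError("contextMax must be positive");
  const fillRatio = projectedTokens / contextMax;
  // Below the threshold the window is calm; at or above it the memory manager engages.
  return fillRatio < threshold ? "PASS_THRU" : "ELASTIC_SWAP";
}

console.log(preflightRoute(4000, 32768));  // PASS_THRU (about 12% full)
console.log(preflightRoute(20000, 32768)); // ELASTIC_SWAP (about 61% full)
```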

```mermaid
flowchart LR
  subgraph store [Session]
    History[fullHistory]
    LTM[longTermMemory]
    Weights[agentWeights]
  end
  subgraph prepare [prepareStep on every tool call]
    Router[preflightRouter]
    Chunk[chunkManager]
    Vector[vectorDb]
    Parity[paritySwapper]
    Engine[memoryEngine]
    Format[formatForModel]
  end
  History --> Router
  Router --> Chunk --> Vector --> Parity --> Engine --> Format
  Parity --> LTM
  Format --> Model[(Language model)]
```

The agent loop (tools, streaming, cancellation) stays in the AI SDK. LFS only decides what the model reads.
A day in the life (narrative walkthrough)
Turn 1 to 5: Plenty of space.
Preflight returns PASS_THRU. The model sees the conversation almost as-is. LFS is quiet; you pay almost no overhead.
Turn 40: A fat log arrives.
Tool output pushes projected tokens past the threshold. LFS switches to ELASTIC_SWAP, chunks the new material, and starts scoring older messages. Anything irrelevant drifts toward archive summaries: still named, still addressable, no longer eating tokens.
Turn 41: The agent needs the exact error line.
It calls remind with a message or chunk id. The SDK pulls the raw text back from long-term storage into active context for that step. Surgical, not a full replay of history.
Turn 200: Still the same session id.
`createLfsAgent` and `LfsContextAdapter` share one session store. Memory cap fraction (default 70%, tunable 50% to 99%) tells LFS how aggressively to keep the window below the physical limit before the provider truncates.
That is the innovation: context as a managed resource, not a static string.
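The turn-41 step in the walkthrough, where one archived item is pulled back by id, can be sketched as follows. The entry shape, the `hydrate` name, and the three-turn grace period are assumptions; this does not reproduce the real `remind` tool's signature.

```typescript
// Illustrative sketch: entry shape, grace period, and function name are assumptions.
interface WindowEntry {
  id: string;
  text: string;
  isArchived: boolean;
  forgetScore: number;
  graceTurnsLeft: number;
}

// Restore the exact archived text for one id; every other entry is left untouched.
function hydrate(
  entries: WindowEntry[],
  longTermMemory: Record<string, string>,
  id: string,
  graceTurns: number = 3,
): WindowEntry[] {
  if (!Object.prototype.hasOwnProperty.call(longTermMemory, id)) {
    throw new Error(`No archived content for id "${id}"`);
  }
  const rawText = longTermMemory[id];
  return entries.map((e) =>
    e.id === id
      ? { ...e, text: rawText, isArchived: false, forgetScore: 0, graceTurnsLeft: graceTurns }
      : e,
  );
}

const archivedStore: Record<string, string> = {
  "chunk-40": "Error: ECONNREFUSED 127.0.0.1:5432 at line 88",
};
const beforeRemind: WindowEntry[] = [
  { id: "msg-12", text: "use sessionCache as the variable name", isArchived: false, forgetScore: 0.2, graceTurnsLeft: 0 },
  { id: "chunk-40", text: "[archived chunk-40]", isArchived: true, forgetScore: 1, graceTurnsLeft: 0 },
];
const restored = hydrate(beforeRemind, archivedStore, "chunk-40");
console.log(restored[1].text); // the exact archived error line
```

Because only the requested id changes, the token cost of a recall is bounded by the size of that one chunk rather than by the length of the session.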
Install
```shell
npm install @dory-agentic/dory-agentic-sdk
```

Monorepo / local development:

```shell
npm install file:../dory-agentic-sdk
```

```json
{
  "dependencies": {
    "@dory-agentic/dory-agentic-sdk": "file:../dory-agentic-sdk"
  }
}
```

Quick start
Full agent (tool loop + LFS)
```typescript
import { createLfsAgent, createOllamaProvider } from "@dory-agentic/dory-agentic-sdk";

const ollama = createOllamaProvider();
const model = ollama("qwen2.5-coder:7b");

const agent = await createLfsAgent({
  sessionId: "project-alpha",
  lane: "optimized",
  model,
  contextMax: 32768,
  memoryCapFraction: 0.7,
});

// Use agent.toolLoopAgent with your AI SDK patterns, then:
await agent.dispose();
```

Prepare context only (bring your own loop)
```typescript
import { LfsContextAdapter } from "@dory-agentic/dory-agentic-sdk";

const adapter = new LfsContextAdapter();

const prepared = await adapter.prepareStep({
  sessionId: "project-alpha",
  lane: "optimized",
  incomingMessages: history,
  replaceSessionHistory: true,
  contextMax: 32768,
  memoryCapFraction: 0.7,
  useRemindTool: true,
});

// prepared.messagesForModel → send to your model
// prepared.isMmuActive, prepared.executionLane → observability
```

One-shot helper
```typescript
import { runLfsTurn } from "@dory-agentic/dory-agentic-sdk";

const result = await runLfsTurn({
  sessionId: "bench-1",
  lane: "optimized",
  incomingMessages: messages,
  replaceSessionHistory: true,
  contextMax: 8192,
});
```

Core concepts
| Concept | Meaning |
|---------|---------|
| Session store | Canonical history, archived chunks, forget scores, agent weights |
| Utility / forgetScore | Rank for retention; archive when pressure forces lowest-U chunks out |
| PASS_THRU | Low pressure; minimal memory management |
| ELASTIC_SWAP | High pressure; chunking, parity swap, credit budget active |
| Memory cap fraction | How full LFS allows the window to get before aggressive eviction (50% to 99%) |
| longTermMemory | Archived chunks keyed by id; placeholders stay in the active manifest |
| remind | Tool to hydrate archived content back into active context |
| Lanes | standard (baseline) vs optimized (full LFS) |
Configuration
| Variable | Purpose |
|----------|---------|
| OLLAMA_BASE_URL | Base URL for Ollama integration tests |
| DORY_AGENTIC_SKILLS_ROOT | Allowed directory root for skillLoader |
| DORY_AGENTIC_MCP_FETCH_ALLOWLIST | Comma-separated host allowlist for fetch MCP URLs |
| MCP_FETCH_URL_ALLOWLIST | Alias for the above |
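
To make the allowlist variables concrete, here is a minimal sketch of how a comma-separated host list might be parsed and checked. The precedence of the primary variable over its alias, the exact-hostname matching, and both function names are assumptions; the SDK's actual SSRF guard may behave differently.

```typescript
// Illustrative sketch: precedence, matching rule, and function names are assumptions.
type Env = { [key: string]: string | undefined };

// Read the allowlist, preferring the primary variable and falling back to the alias.
function readFetchAllowlist(env: Env): string[] {
  const raw = env["DORY_AGENTIC_MCP_FETCH_ALLOWLIST"] || env["MCP_FETCH_URL_ALLOWLIST"] || "";
  return raw
    .split(",")
    .map((host) => host.trim().toLowerCase())
    .filter((host) => host.length > 0);
}

// Accept a URL only when it parses and its hostname matches an allowlisted host exactly.
function isHostAllowed(url: string, allowlist: string[]): boolean {
  let hostname: string;
  try {
    hostname = new URL(url).hostname.toLowerCase();
  } catch (_err) {
    return false; // unparsable URLs are rejected
  }
  return allowlist.indexOf(hostname) !== -1;
}

// In Node you would pass process.env; a literal object keeps this sketch self-contained.
const allowedHosts = readFetchAllowlist({
  MCP_FETCH_URL_ALLOWLIST: " api.example.com, docs.example.com ",
});
console.log(isHostAllowed("https://api.example.com/v1/tools", allowedHosts)); // true
console.log(isHostAllowed("http://169.254.169.254/latest", allowedHosts));    // false
```

An empty or unset variable yields an empty list here, so every URL is rejected by default; whether the SDK treats an unset allowlist as "deny all" is not stated in this README.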
Development
```shell
npm install
npm run typecheck
npm run test:unit    # fast, no network
npm run test:ollama  # optional; needs local Ollama
npm run build
```

Architecture (repository layout)
```
src/
  agent/      createLfsAgent, runLfsTurn, ToolLoopAgent wiring
  lfs/        preflight, chunking, vector index, parity swap, memory engine
  history/    manifest formatting for the model
  tools/      remind, skillLoader, registry
  mcp/        MCP client helpers
  providers/  Ollama and provider registry
```

Related packages
- `@dory-agentic/dory-agentic-sdk` (this repo): LFS virtual memory for AI SDK agent loops.
- `@dorycode-ai/sdk`: separate HTTP client for the DoryCode server.
License
MIT. See package.json.
