@dory-agentic/dory-agentic-sdk

v0.2.0, published a month ago

Dory Agentic SDK: LFS context management with AI SDK tool loops for long-running agents


agenaidy

ai-sdk agent context-window lfs long-running tool-loop memory llm

The story of a session that would not end

Imagine you pair with an agent on a real project. Not a five-message demo, but a long session. You explore a repo, run tools, fix tests, argue about architecture, ship a patch, then ask a question that depends on something you said two hours ago.

Most agent stacks treat context like a fixed-size backpack. Every new turn adds weight. Eventually something has to fall out: old instructions, a failing test log, the variable name you agreed on in message twelve. The model does not get angry. It simply stops seeing what mattered.

Dory Agentic SDK exists for that moment.

We built Long-Form Session (LFS), a virtual memory layer that sits between your agent loop and the model. The full conversation still lives in a session store, but what the model sees on each call is curated dynamically: expanded when there is room, compressed when pressure rises, and restorable when the agent needs the exact words again.

You keep the long story. The model keeps a window it can actually think inside.

Proof at million-token scale

On AgencyBench V2 (GAIR/AgencyBench, ACL 2026), six scenarios were run with ~1M tokens of agent history each (Backend and Code domains, gemini-2.5-flash-lite). Native Gemini hit context truncation on every run. With LFS v4.6:

| Metric | Native Gemini | LFS v4.6 |
|--------|---------------|----------|
| Avg rubric score | 93.7 / 100 | 99.7 / 100 (+6.0) |
| Avg latency | 25.6 s | 19.0 s (~26% lower) |
| Success rate | 6 / 6 | 6 / 6 |
| Tokens per call (typical) | ~1.42M (truncated) | ~720K (managed) |

Standout case: Code scenario 3 scored 80/100 natively vs 100/100 with LFS (+20), after 524 parity swaps and a memory reflection that hydrated 9 archived chunks back into active context. Backend and Code category averages moved from 96.3 and 91.0 to 99.3 and 100.0 respectively.

That is the story in numbers: the session stays long, the window stays honest, and the agent can still find the needle.

The long session problem

After dozens of turns, three forces collide:

Volume: code, tool output, and reasoning traces pile up faster than anyone summarizes them.
A hard ceiling: every model has a context limit; you cannot paste infinity into one call.
Silent loss: naive truncation drops messages by position, not by value, and the oldest message is not always the least important.

Figure: The long session problem: chat timeline grows while the context window fills to 100%

That is the gap we address. Not by pretending the window is bigger, but by managing it like memory.

What changes with LFS

| Without dynamic context | With Dory Agentic SDK |
|-------------------------|------------------------|
| One growing blob sent every call | A session store plus a prepared view per step |
| Old turns vanish unpredictably | Low-utility chunks are archived behind labeled placeholders |
| Agent cannot recover detail | remind tool hydrates exact text when needed |
| Same policy at turn 3 and turn 300 | Preflight routing: light path when calm, full MMU when pressured |

Figure: Before and after: lost context vs organized active and archived memory

Benefits for builders

Hours-long sessions: coding agents, support bots, research assistants that accumulate real history.
Dynamic context window: fill ratio is monitored every model round; eviction tracks relevance, recency, and agent credit scores, not arbitrary cutoffs.
Tool-loop native: built on the AI SDK ToolLoopAgent; LFS runs in prepareStep before each inference, so every tool call sees a fresh, budgeted prompt.
Production-minded: path-safe skill loading, MCP lifecycle disposal, SSRF guards, session cache bounds.
Two lanes: standard for baseline comparison; optimized for the full memory manager.
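Path-safe skill loading usually comes down to one check: resolve the requested file against an allowed root and refuse anything that lands outside it. The TypeScript below is a minimal sketch of that idea using Node's path module, not the SDK's skillLoader; resolveSkillPath is a hypothetical name, and its root argument stands in for the directory set via DORY_AGENTIC_SKILLS_ROOT.

```typescript
import * as path from "node:path";

// Hypothetical helper, not the SDK's skillLoader. `root` stands in for the
// directory configured via DORY_AGENTIC_SKILLS_ROOT.
function resolveSkillPath(root: string, requested: string): string {
  const base = path.resolve(root);
  // An absolute `requested` replaces `base` here, so it is caught below too.
  const target = path.resolve(base, requested);
  const rel = path.relative(base, target);
  const escapes =
    rel === "" || // the root directory itself is not a skill file
    rel === ".." ||
    rel.startsWith(".." + path.sep) ||
    path.isAbsolute(rel); // e.g. a different drive on Windows
  if (escapes) {
    throw new Error(`skill path escapes root: ${requested}`);
  }
  return target;
}

// Usage
console.log(resolveSkillPath("/srv/skills", "lint/SKILL.md")); // /srv/skills/lint/SKILL.md on POSIX
```

Comparing through path.relative rather than a string prefix avoids the classic bug where a sibling such as /srv/skills-evil passes a startsWith("/srv/skills") check.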

How the agent scores messages (and what gets forgotten)

LFS does not delete history. It ranks it every turn so the least useful material leaves the active window first, while the full session remains in storage.

Figure: Scoring and long-term memory: utility, forgetScore, and archive path

Step 1: Chunk and index

Large user turns and tool output are split into chunks. Each chunk is embedded in a lightweight vector index so the current user query can retrieve semantically related pieces.
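To make Step 1 concrete, here is a toy chunk-and-index in TypeScript. It is a sketch under loud assumptions: chunks split on blank lines and are capped by character count, and the "embedding" is a sparse bag-of-words vector compared by cosine similarity, standing in for whatever embedding model the SDK's vectorDb actually uses. chunkText and TinyVectorIndex are hypothetical names.

```typescript
// Hypothetical stand-ins for the SDK's chunkManager and vectorDb.
// Assumptions: split on blank lines, cap chunks at maxChars, and use
// bag-of-words counts instead of a real embedding model.

interface Chunk {
  id: string;
  text: string;
}

function chunkText(messageId: string, text: string, maxChars = 400): Chunk[] {
  const pieces: string[] = [];
  for (const para of text.split(/\n\s*\n/)) {
    for (let i = 0; i < para.length; i += maxChars) {
      const piece = para.slice(i, i + maxChars).trim();
      if (piece.length > 0) pieces.push(piece);
    }
  }
  return pieces.map((t, i) => ({ id: `${messageId}#${i}`, text: t }));
}

type SparseVec = Map<string, number>;

function embed(text: string): SparseVec {
  const v: SparseVec = new Map();
  for (const tok of text.toLowerCase().match(/[a-z0-9_]+/g) ?? []) {
    v.set(tok, (v.get(tok) ?? 0) + 1);
  }
  return v;
}

function norm(v: SparseVec): number {
  let s = 0;
  v.forEach((x) => { s += x * x; });
  return Math.sqrt(s);
}

function cosine(a: SparseVec, b: SparseVec): number {
  let dot = 0;
  a.forEach((x, tok) => { dot += x * (b.get(tok) ?? 0); });
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

class TinyVectorIndex {
  private entries: { chunk: Chunk; vec: SparseVec }[] = [];

  add(chunk: Chunk): void {
    this.entries.push({ chunk, vec: embed(chunk.text) });
  }

  // Top-k chunks by cosine similarity to the query: a stand-in for S_rel.
  query(q: string, k = 3): { chunk: Chunk; score: number }[] {
    const qv = embed(q);
    return this.entries
      .map((e) => ({ chunk: e.chunk, score: cosine(qv, e.vec) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}

// Usage
const index = new TinyVectorIndex();
const log = "npm test failed\nTypeError in parser.ts line 42\n\nunrelated build banner";
for (const c of chunkText("msg-40", log)) index.add(c);
console.log(index.query("why did the parser test fail", 1)[0].chunk.id); // msg-40#0
```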

Step 2: Utility score (what to keep in focus)

For each chunk, LFS computes utility from the current query:

U = β · S_rel + (1 − β) · W_agent

| Symbol | Meaning |
|--------|---------|
| S_rel | Semantic similarity between the chunk and the active query (vector retrieval) |
| W_agent | Agent-assigned credit weight (what the model marked as important via the scoring manifest) |
| β | Blend factor (default 0.5) balancing retrieval against agent judgment |

forgetScore is derived as 1 − U. Low forgetScore means “keep near the model.” High forgetScore means “safe to archive.”
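The utility blend is small enough to write out directly. This sketch assumes S_rel and W_agent are already normalized to [0, 1]; the clamping is our own guard, not documented SDK behavior.

```typescript
// U = beta * S_rel + (1 - beta) * W_agent, and forgetScore = 1 - U.
// Assumption: inputs are meant to lie in [0, 1]; clamping is a local safeguard.

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

function utility(sRel: number, wAgent: number, beta = 0.5): number {
  return clamp01(beta * clamp01(sRel) + (1 - beta) * clamp01(wAgent));
}

function forgetScore(sRel: number, wAgent: number, beta = 0.5): number {
  return 1 - utility(sRel, wAgent, beta);
}

// A chunk that matches the query at 0.75 and that the agent credited fully:
console.log(utility(0.75, 1));     // 0.875
console.log(forgetScore(0.75, 1)); // 0.125
```

With β = 0.5, that chunk's low forgetScore keeps it near the model; an irrelevant, uncredited chunk (0, 0) scores forgetScore 1 and is first in line for archiving.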

Step 3: Temporal decay (what drifts toward archive)

Even strong chunks age unless the agent keeps using them:

Each turn, decay increases forgetScore for conversational messages (mass-aware: larger messages decay faster under pressure).
CODE_ASSET chunks get a short shield when referenced within the last few turns, then fade exponentially as the task moves on.
TOOL_LOG chunks are penalized first under pressure (noisy logs leave before core instructions).
Hydrated chunks (brought back via remind or reflection) get a grace period with forgetScore pinned to 0.

When forgetScore reaches 1.0, the message is a candidate for displacement: content moves to long-term memory and the active window shows a compact, labeled placeholder.
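The decay rules above can be sketched as one per-turn update. Every constant here (BASE_DECAY, MASS_UNIT, SHIELD_TURNS, CODE_RETAIN, TOOL_LOG_PENALTY) is a hypothetical value chosen for illustration, and applying the linear decay to tool logs as well as conversation is our assumption; the README does not publish the SDK's actual numbers.

```typescript
// Hypothetical per-turn decay. All constants are illustrative, not the SDK's.
type Kind = "CONVERSATION" | "CODE_ASSET" | "TOOL_LOG";

interface ScoredChunk {
  id: string;
  kind: Kind;
  tokens: number;
  forgetScore: number;        // 0 = keep near the model, 1 = displace
  lastReferencedTurn: number;
  graceUntilTurn?: number;    // set when hydrated by remind or reflection
}

const BASE_DECAY = 0.05;      // forgetScore added per turn
const MASS_UNIT = 4000;       // tokens; bigger chunks decay faster under pressure
const SHIELD_TURNS = 3;       // CODE_ASSET is shielded this long after a reference
const CODE_RETAIN = 0.8;      // share of (1 - forgetScore) kept per turn after the shield
const TOOL_LOG_PENALTY = 0.1; // extra push for noisy logs under pressure

function decayOneTurn(c: ScoredChunk, turn: number, underPressure: boolean): ScoredChunk {
  // Hydrated chunks stay pinned at 0 during their grace period.
  if (c.graceUntilTurn !== undefined && turn <= c.graceUntilTurn) {
    return { ...c, forgetScore: 0 };
  }
  let f = c.forgetScore;
  if (c.kind === "CODE_ASSET") {
    // Shielded while recently referenced, then its utility fades exponentially.
    if (turn - c.lastReferencedTurn > SHIELD_TURNS) f = 1 - (1 - f) * CODE_RETAIN;
  } else {
    // Mass-aware linear decay for conversation and tool output.
    const mass = underPressure ? 1 + c.tokens / MASS_UNIT : 1;
    f += BASE_DECAY * mass;
    if (c.kind === "TOOL_LOG" && underPressure) f += TOOL_LOG_PENALTY;
  }
  return { ...c, forgetScore: Math.min(1, f) };
}

// At 1.0 a chunk becomes a candidate for displacement to long-term memory.
const isDisplacementCandidate = (c: ScoredChunk): boolean => c.forgetScore >= 1;
```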

Step 4: Parity swap into long-term memory

When projected tokens exceed the memory cap (default 70% of contextMax, tunable from 50% to 99%):

Chunks are sorted by utility (lowest first).
Lowest-utility active chunks are marked isArchived and copied to longTermMemory.
The model still sees where something went (ids, summaries, manifest), not a silent hole.
The agent can call remind or trigger a reflection to pull exact text back (AgencyBench run: 9 chunks hydrated in one Code scenario).
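The swap loop above can be sketched as follows, assuming each archived chunk leaves behind a placeholder with a fixed, hypothetical PLACEHOLDER_TOKENS cost. It sorts active chunks by utility, archives from the bottom until projected tokens fit under the cap, and copies each archived chunk into longTermMemory so the exact text stays recoverable. paritySwap and SwapChunk are illustrative names, not the SDK's exports.

```typescript
// Hypothetical parity swap. PLACEHOLDER_TOKENS is an assumed cost for the
// labeled stub left in the active window; the SDK's real format may differ.

interface SwapChunk {
  id: string;
  text: string;
  tokens: number;
  utility: number;
  isArchived: boolean;
}

const PLACEHOLDER_TOKENS = 12;

function paritySwap(
  chunks: SwapChunk[],
  capTokens: number,
  longTermMemory: Map<string, SwapChunk>,
): { chunks: SwapChunk[]; projected: number; archivedIds: string[] } {
  const cost = (c: SwapChunk) => (c.isArchived ? PLACEHOLDER_TOKENS : c.tokens);
  let projected = chunks.reduce((sum, c) => sum + cost(c), 0);
  const archived = new Set<string>();

  // Lowest utility first; filter() copies, so the caller's array order is untouched.
  const candidates = chunks.filter((c) => !c.isArchived).sort((a, b) => a.utility - b.utility);
  for (const c of candidates) {
    if (projected <= capTokens) break;
    if (c.tokens <= PLACEHOLDER_TOKENS) continue; // archiving would not free space
    longTermMemory.set(c.id, { ...c, isArchived: true }); // exact text is kept
    archived.add(c.id);
    projected -= c.tokens - PLACEHOLDER_TOKENS;
  }

  return {
    chunks: chunks.map((c) => (archived.has(c.id) ? { ...c, isArchived: true } : c)),
    projected,
    archivedIds: Array.from(archived),
  };
}
```

With a 110-token cap and chunks of 80, 60 and 40 tokens (utilities 0.9, 0.1 and 0.5), this sketch archives the 0.1 and 0.5 chunks and stops at 104 projected tokens; the high-utility chunk stays in the window.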

Credit budget scales with window size: under pressure the model receives a bounded number of scoring credits so it must declare what it relied on, which feeds W_agent on later turns.

How it works (every model call)

Think of each turn as a flight check:

Merge new messages into the session.
Preflight: if projected tokens stay below ~50% of the window, take the fast PASS_THRU lane; otherwise engage ELASTIC_SWAP and the memory manager.
Chunk large user content and index it for similarity search.
Score and swap: parity swap ranks chunks by utility (semantic match + agent weights + decay); the least useful leave the window first.
Budget credits: under pressure, the model receives a scoring budget and manifest metadata so it cites what it actually used.
Format: only then does the prompt go to the model, wrapped, labeled, and honest about what is archived.
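The preflight decision in the second step reduces to a threshold test. This sketch assumes roughly four characters per token (a real router would use the model's tokenizer) and takes the ~50% figure above as the PASS_THRU cutoff; routeStep and Msg are hypothetical names, not the SDK's preflightRouter API.

```typescript
// Hypothetical preflight router; the token estimate is a crude heuristic.
type Lane = "PASS_THRU" | "ELASTIC_SWAP";

interface Msg {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

const PASS_THRU_THRESHOLD = 0.5; // "~50% of the window"

function estimateTokens(messages: Msg[]): number {
  // Assumption: about 4 characters per token.
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

function routeStep(session: Msg[], incoming: Msg[], contextMax: number): { lane: Lane; projected: number } {
  const projected = estimateTokens(session) + estimateTokens(incoming);
  const lane: Lane = projected < PASS_THRU_THRESHOLD * contextMax ? "PASS_THRU" : "ELASTIC_SWAP";
  return { lane, projected };
}

// Usage: a ~1,000-token session, then a small question vs a fat tool log.
const session: Msg[] = [{ role: "user", content: "x".repeat(4000) }];
const calm = routeStep(session, [{ role: "user", content: "next step?" }], 8192);
const fatLog = routeStep(session, [{ role: "tool", content: "E".repeat(16000) }], 8192);
console.log(calm.lane, fatLog.lane); // PASS_THRU ELASTIC_SWAP
```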

Figure: LFS flow: session store → preflight → chunk → vector match → parity swap → dynamic context

```mermaid
flowchart LR
  subgraph store [Session]
    History[fullHistory]
    LTM[longTermMemory]
    Weights[agentWeights]
  end

  subgraph prepare [prepareStep on every tool call]
    Router[preflightRouter]
    Chunk[chunkManager]
    Vector[vectorDb]
    Parity[paritySwapper]
    Engine[memoryEngine]
    Format[formatForModel]
  end

  History --> Router
  Router --> Chunk --> Vector --> Parity --> Engine --> Format
  Parity --> LTM
  Format --> Model[(Language model)]
```

The agent loop (tools, streaming, cancellation) stays in the AI SDK. LFS only decides what the model reads.

A day in the life (narrative walkthrough)

Turn 1 to 5: Plenty of space.
Preflight returns PASS_THRU. The model sees the conversation almost as-is. LFS is quiet; you pay almost no overhead.

Turn 40: A fat log arrives.
Tool output pushes projected tokens past the threshold. LFS switches to ELASTIC_SWAP, chunks the new material, and starts scoring older messages. Anything irrelevant drifts toward archive summaries: still named, still addressable, no longer eating tokens.

Turn 41: The agent needs the exact error line.
It calls remind with a message or chunk id. The SDK pulls the raw text back from long-term storage into active context for that step. Surgical, not a full replay of history.
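A remind-style hydration can be sketched as swapping a placeholder for the stored text and starting a grace period. The ActiveEntry shape, GRACE_TURNS, and this remind function are assumptions for illustration; they are not the SDK's tool schema.

```typescript
// Hypothetical remind-style hydration; shapes and constants are illustrative.

interface ActiveEntry {
  id: string;
  text: string;        // exact text, or a labeled placeholder when archived
  archived: boolean;
  graceUntilTurn?: number;
}

const GRACE_TURNS = 2;

function remind(
  ids: string[],
  active: ActiveEntry[],
  longTermMemory: Map<string, { text: string }>,
  turn: number,
): { active: ActiveEntry[]; missing: string[] } {
  const wanted = new Set(ids);
  const missing = ids.filter((id) => !longTermMemory.has(id));
  const next = active.map((e): ActiveEntry => {
    const stored = longTermMemory.get(e.id);
    if (!wanted.has(e.id) || !e.archived || stored === undefined) return e;
    // Swap the placeholder for the exact archived text and pin it for a few turns.
    return { id: e.id, text: stored.text, archived: false, graceUntilTurn: turn + GRACE_TURNS };
  });
  return { active: next, missing };
}
```

Only the requested entries change; everything else in the window is returned as-is, which matches the "surgical, not a full replay" behavior described above.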

Turn 200: Still the same session id.
createLfsAgent and LfsContextAdapter share one session store. Memory cap fraction (default 70%, tunable 50% to 99%) tells LFS how aggressively to keep the window below the physical limit before the provider truncates.
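The cap fraction translates into a concrete token budget. This sketch assumes out-of-range values are clamped to the documented 50% to 99% band; the SDK may validate differently. memoryCapTokens is a hypothetical helper.

```typescript
// Converts memoryCapFraction into a token budget.
// Assumption: values outside 0.5..0.99 are clamped rather than rejected.
function memoryCapTokens(contextMax: number, memoryCapFraction = 0.7): number {
  const fraction = Math.min(0.99, Math.max(0.5, memoryCapFraction));
  return Math.floor(contextMax * fraction);
}

console.log(memoryCapTokens(32768));      // 22937 (the quick-start settings)
console.log(memoryCapTokens(32768, 0.3)); // 16384, clamped up to 50%
```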

That is the innovation: context as a managed resource, not a static string.

Install

```bash
npm install @dory-agentic/dory-agentic-sdk
```

Monorepo / local development:

```bash
npm install file:../dory-agentic-sdk
```

```json
{
  "dependencies": {
    "@dory-agentic/dory-agentic-sdk": "file:../dory-agentic-sdk"
  }
}
```

Quick start

Full agent (tool loop + LFS)

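```typescript
import { createLfsAgent, createOllamaProvider } from "@dory-agentic/dory-agentic-sdk";

// Assumes a local Ollama server with this model already pulled.
const ollama = createOllamaProvider();
const model = ollama("qwen2.5-coder:7b");

const agent = await createLfsAgent({
  sessionId: "project-alpha",   // reuse the same id to continue a long session
  lane: "optimized",            // full memory manager; "standard" is the baseline
  model,
  contextMax: 32768,            // model context window, in tokens
  memoryCapFraction: 0.7,       // how full the window may get before eviction (0.5 to 0.99)
});

try {
  // Use agent.toolLoopAgent with your AI SDK patterns here.
} finally {
  await agent.dispose();        // always release resources, even if the loop throws
}
```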

Prepare context only (bring your own loop)

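```typescript
import { LfsContextAdapter } from "@dory-agentic/dory-agentic-sdk";

// Your conversation so far (AI SDK-style message shape assumed here).
const history = [
  { role: "user" as const, content: "Why does the parser test fail?" },
];

const adapter = new LfsContextAdapter();
const prepared = await adapter.prepareStep({
  sessionId: "project-alpha",
  lane: "optimized",
  incomingMessages: history,
  replaceSessionHistory: true,  // replace the stored history with incomingMessages
  contextMax: 32768,
  memoryCapFraction: 0.7,
  useRemindTool: true,          // make the remind tool available for archived chunks
});

// prepared.messagesForModel → send to your model
// prepared.isMmuActive, prepared.executionLane → observability
```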

One-shot helper

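```typescript
import { runLfsTurn } from "@dory-agentic/dory-agentic-sdk";

// Messages for this turn (AI SDK-style message shape assumed here).
const messages = [
  { role: "user" as const, content: "Summarize the failing tests so far." },
];

const result = await runLfsTurn({
  sessionId: "bench-1",
  lane: "optimized",
  incomingMessages: messages,
  replaceSessionHistory: true,
  contextMax: 8192,
});
```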

Core concepts

| Concept | Meaning |
|---------|---------|
| Session store | Canonical history, archived chunks, forget scores, agent weights |
| Utility / forgetScore | Retention rank; under pressure, the lowest-U chunks are archived first |
| PASS_THRU | Low pressure; minimal memory management |
| ELASTIC_SWAP | High pressure; chunking, parity swap, credit budget active |
| Memory cap fraction | How full LFS allows the window to get before aggressive eviction (50% to 99%) |
| longTermMemory | Archived chunks keyed by id; placeholders stay in the active manifest |
| remind | Tool to hydrate archived content back into active context |
| Lanes | standard (baseline) vs optimized (full LFS) |

Configuration

| Variable | Purpose |
|----------|---------|
| OLLAMA_BASE_URL | Base URL for Ollama integration tests |
| DORY_AGENTIC_SKILLS_ROOT | Allowed directory root for skillLoader |
| DORY_AGENTIC_MCP_FETCH_ALLOWLIST | Comma-separated host allowlist for fetch MCP URLs |
| MCP_FETCH_URL_ALLOWLIST | Alias for DORY_AGENTIC_MCP_FETCH_ALLOWLIST |
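For the fetch allowlist, a minimal SSRF guard parses the comma-separated value into a set of hostnames and rejects any URL whose host is not in it. This sketch assumes exact, case-insensitive hostname matching, http/https only, and deny-by-default when the list is empty; whether the SDK also accepts subdomains or checks ports is not stated here. parseAllowlist and isFetchAllowed are hypothetical names.

```typescript
// Hypothetical SSRF guard for fetch MCP URLs. Assumptions: exact hostname
// match, http/https only, and an empty allowlist denies everything.

function parseAllowlist(raw: string | undefined): Set<string> {
  return new Set(
    (raw ?? "")
      .split(",")
      .map((h) => h.trim().toLowerCase())
      .filter((h) => h.length > 0),
  );
}

function tryParseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null; // unparsable input is treated as not allowed
  }
}

function isFetchAllowed(url: string, allowlist: Set<string>): boolean {
  const parsed = tryParseUrl(url);
  if (parsed === null) return false;
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;
  // The URL parser lowercases hostnames, so matching is case-insensitive.
  return allowlist.has(parsed.hostname);
}

// Usage: in the SDK the value would come from the environment variable above.
const allow = parseAllowlist("docs.example.com, api.github.com");
console.log(isFetchAllowed("https://api.github.com/repos", allow));   // true
console.log(isFetchAllowed("http://169.254.169.254/latest", allow));  // false
```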

Development

```bash
npm install
npm run typecheck
npm run test:unit      # fast, no network
npm run test:ollama    # optional; needs local Ollama
npm run build
```

Architecture (repository layout)

```text
src/
  agent/          createLfsAgent, runLfsTurn, ToolLoopAgent wiring
  lfs/            preflight, chunking, vector index, parity swap, memory engine
  history/        manifest formatting for the model
  tools/          remind, skillLoader, registry
  mcp/            MCP client helpers
  providers/      Ollama and provider registry
```

Related packages

@dory-agentic/dory-agentic-sdk (this repo): LFS virtual memory for AI SDK agent loops.
@dorycode-ai/sdk: separate HTTP client for the DoryCode server.

License

MIT. See package.json.