torus-ai

v0.18.0

Published

8 days ago

Torus — a minimal, ICM-structured Agent SDK: agent loop, tools, in-process MCP, markdown-contract subagents, multimodal input, and a free-first model cascade (NVIDIA Kimi K2.6 / DeepSeek V4 → Gemini → Claude) with cost routing.

0High
0Medium
0Low

anfer

agent sdk llm mcp nvidia nim kimi deepseek anthropic claude gemini google multimodal icm pipeline router cascade

Torus

npm: torus-ai · repo: aenfr/torus-ai

A minimal Agent SDK whose architecture is the ICM folder structure. Inspired by the Claude Agent SDK — same core ideas (agent loop, tools, in-process MCP, subagents, permissions, streaming) — but agents and pipelines are defined as markdown contracts in folders, not framework code.

Built on the Interpretable Context Methodology (ICM): folder structure as agent architecture, plain text as the interface, layered context loading.

Quick start

Requires Node ≥ 22.6 (runs TypeScript natively — no build step).

node examples/blog-pipeline/run.ts     # or: npm run demo

This runs a 3-stage pipeline (research → draft → polish) with the offline MockProvider — no API key needed. Each stage writes an artifact to its output/ folder; open them to inspect the handoff.

What's inside

| Concept (Claude Agent SDK) | Here | |---|---| | The agentic loop | src/loop.ts — gather → call model → run tools → repeat | | tool() / createSdkMcpServer() | src/tools.ts — in-process MCP, mcp__<server>__<tool> namespacing | | Built-in tools | src/builtins.ts — read_file / write_file / list_dir | | Permissions / canUseTool | src/permissions.ts — allowlist + wildcard + callback gate | | Subagents | src/subagents.ts — markdown stage contracts (Layer 2) | | Context management | src/context.ts — layered, scoped loading (Layers 0–4) | | query() streaming | src/index.ts — single-shot run yielding events | | Pipeline orchestration | src/pipeline.ts — sequential stages + review gates | | Lifecycle hooks | src/hooks.ts — PreToolUse / PostToolUse / Stop around the loop | | Eval / verification | src/eval.ts — LLM-judge, runEval, passAtK → feeds SkillLibrary | | Config security audit | src/audit.ts — deterministic A–F lint of packs / MCP / permissions | | Runtime content shield | src/shield.ts — strips invisible text, fences injection, blocks memory poisoning | | Control plane | src/control.ts — kill switch + circuit breaker + budget (Governor), owner authority | | Observability ledger | src/observability.ts — append-only record of tool calls, denials, halts | | Context compaction | src/compaction.ts — fold old turns into a summary / the Dream | | Atomic semantic memory | src/memory.ts — one fact, one file; propose → review → receipt | | Browser hands | src/browser.ts — wraps the agent-browser CLI as mcp__browser__* tools | | Model backends | src/providers/ — NvidiaProvider, GeminiProvider, AnthropicProvider, MockProvider + CascadeProvider |

Three ways to use it

1. Single agent run (query) — mirrors the Claude Agent SDK streaming shape:

import { query, MockProvider, tool, createSdkMcpServer } from "./src/index.ts";

const time = tool("now", "Current ISO time", { type: "object", properties: {} },
  () => ({ content: new Date().toISOString() }));
const clock = createSdkMcpServer({ name: "clock", tools: [time] });

for await (const ev of query("What time is it?", {
  provider: new MockProvider(),
  mcpServers: [clock],
  permissions: { allowedTools: ["mcp__clock__*"] },
})) {
  if (ev.type === "result") console.log(ev.finalText);
}

2. Folder pipeline (runPipeline) — the ICM workflow. Drop numbered stage folders with CONTEXT.md contracts under stages/, then run. See examples/blog-pipeline.

3. Real model — swap the provider and install the optional dep:

npm i @anthropic-ai/sdk        # Claude
export ANTHROPIC_API_KEY=sk-ant-...
# or
npm i @google/genai            # Gemini
export GOOGLE_API_KEY=...

import { AnthropicProvider, GeminiProvider } from "torus-ai";
const claude = new AnthropicProvider({ model: "claude-sonnet-4-6" });
const gemini = new GeminiProvider({ model: "gemini-2.5-flash" });

Providers & the default cascade

Four pluggable providers implement the same ModelProvider interface and drop into query(), runPipeline(), or runLoop() interchangeably:

| Provider | Package | Env | Default model | |---|---|---|---| | NvidiaProvider | none (fetch) | NVIDIA_API_KEY | moonshotai/kimi-k2.6 | | GeminiProvider | @google/genai | GOOGLE_API_KEY | gemini-2.5-flash | | AnthropicProvider | @anthropic-ai/sdk | ANTHROPIC_API_KEY | claude-sonnet-4-6 | | MockProvider | none | — | offline |

The default is a free-first cascade. If you don't pass a provider, query() uses createDefaultProvider() — it tries each step and falls through on failure:

NVIDIA Kimi K2.6 — main; agentic + tools (text), free NIM endpoint
NVIDIA DeepSeek V4 Pro — 1M-context text model, free; skipped for media
NVIDIA Llama-3.2-90B-Vision — image requests, free
Gemini 2.5 Flash — final fallback (image + video), different provider for resilience

import { query } from "torus-ai";          // NVIDIA_API_KEY in env → cascade default
for await (const ev of query("Explain MoE in one line")) { /* ... */ }

import { createDefaultProvider } from "torus-ai";
const provider = createDefaultProvider({ mainModel: "moonshotai/kimi-k2.6" });

It's capability-aware: image requests skip text-only steps and route to a vision model; video requests route only to a video-capable step.

Multimodal (image verified, video experimental)

Pass content blocks instead of a string. Images route to a vision step (NVIDIA Llama-Vision → Gemini); video routes to Gemini.

await query([
  { type: "text", text: "What animal is this?" },
  { type: "image", url: "https://example.com/cat.jpg" },   // or { data: base64, mimeType }
]);

Note: Kimi K2.6's docs claim vision, but its NIM endpoint is text-only in practice (verified) — so the cascade sends images to a real vision model instead. Video is experimental and currently served only by Gemini.

Cost routing (per provider)

Each model provider also supports route: true — fast heuristics, then a structured "judge" call on the cheap model, picking cheap vs expensive (never throws; falls back to expensive). Exposed for Claude and Gemini today:

new GeminiProvider({ route: true });   // gemini-2.5-flash-lite ↔ gemini-2.5-pro
new AnthropicProvider({ route: true }); // claude-haiku-4-5 ↔ claude-sonnet-4-6
import { getRoutingStats } from "torus-ai";

Keeping models fresh

models/registry.json is the source of truth for the cascade; models/POLICY.md is the rule for what earns a slot. A weekly GitHub Action (model-watch.yml) pulls NVIDIA's live /v1/models, flags new free endpoints as candidates, and opens a PR for human review against the policy. Run it locally with npm run model-watch.

Connecting to MCP servers (Torus is an MCP host)

Torus speaks two flavors of MCP:

In-process — tool() + createSdkMcpServer() define tools that run in your process (the toolkit, the catalog/billing servers).
External — connect to the wider MCP ecosystem (GitHub, Postgres, Slack, …) over stdio (a local subprocess) or HTTP. Their tools are discovered and registered under the same mcp__<server>__<tool> namespace, so they work in the loop, the cascade, and packs, gated by the same permission allowlist.

import { query, connectMcpServers } from "torus-ai";

const { servers, close } = await connectMcpServers({
  github: { command: "npx", args: ["-y", "@modelcontextprotocol/server-github"],
            env: { GITHUB_TOKEN: process.env.GITHUB_TOKEN! } }, // stdio
  docs:   { type: "http", url: "https://example.com/mcp" },     // remote
});

for await (const ev of query("List my 3 newest GitHub issues", {
  mcpServers: servers,
  permissions: { allowedTools: ["mcp__github__*"] }, // allowlist the external tools
})) { /* ... */ }

await close(); // tear down the subprocess / connection

Needs the optional @modelcontextprotocol/sdk package (npm i @modelcontextprotocol/sdk).

Browser hands (drive a real browser)

Give an agent a browser via agent-browser — a self-contained native CLI. Torus is the brain; agent-browser is the hands. createBrowserServer() wraps its action commands as mcp__browser__* tools, so they run through the same loop, permission allowlist, and confirm gate as everything else.

npm i -g agent-browser && agent-browser install   # one-time: binary + Chrome for Testing

import { createBrowserServer, BROWSER_CONFIRM_TOOLS, createSpecializedAgent } from "torus-ai";

const browser = createBrowserServer();   // spawns the `agent-browser` binary
const agent = createSpecializedAgent({
  name: "researcher",
  persona: "You browse to answer questions. Take a snapshot, then act on the @refs.",
  tools: [browser],
  guardrails: { allowedTools: ["mcp__browser__*"], confirm: BROWSER_CONFIRM_TOOLS }, // eval/upload need a yes
});

The agent's loop is open → snapshot (an accessibility tree with @e1 refs — the right thing for a model to reason over, not pixels) → click/fill a ref → get_text → close. The CLI is reached through an injectable exec seam, so tests and npm run example:browser drive a mock CLI fully offline — no binary, no Chrome. Drop the override and it spawns the real agent-browser. Outward tools (eval, upload) sit behind confirm, and the audit flags any outward browser tool you forget to gate.

Specializing for a product (packs)

Don't fork the SDK per product — load a pack. A pack is an adapter that turns the generic engine into a vertical specialist (a bridal consultant, a mortgage advisor, a support agent): persona + sales playbook + policy + domain tools + catalog grounding + guardrails.

import { createSpecializedAgent, createCatalogServer, createInvoiceServer,
         createHandoffServer } from "torus-ai";

const agent = createSpecializedAgent({
  name: "bridal",
  persona: "You are a warm bridal consultant for Aurora Bridal.",
  playbook: "discover needs → recommend → handle objections → close → settle → confirm",
  knowledge: { catalog: dresses, faqs: "Alterations take 3 weeks. ..." }, // → search_catalog
  tools: [createInvoiceServer(), createHandoffServer()],
  guardrails: {
    policy: "Never invent a price or availability. Max 10% discount. Escalate over $5,000.",
    confirm: ["mcp__billing__create_invoice"], // money step needs a yes
  },
}, {
  onConfirm: async (tool, input) => askHuman(tool, input), // your confirmation UI
});

for await (const ev of agent.query("Anything under $2k for an August wedding?")) { /* ... */ }

What the pack gives you, mapped to the engine:

| Pack part | Effect | |---|---| | persona + playbook + policy | assembled into the system prompt | | knowledge.catalog | auto-wired search_catalog tool + a "never guess prices" instruction | | tools | your domain actions (compose the toolkit: catalog, lead memory, invoice, handoff) | | guardrails.allowedTools / confirm / canUseTool | the safety gate — irreversible steps (billing) pause for confirmation | | model | defaults to the free-first cascade |

Edit content as files (persona.md, playbook.md, voice.md, policy.md, catalog.json, faqs.md) and loadPack("packs/bridal", { tools }) assembles the pack — so a shop owner edits the catalog and tone while devs write the few action tools.

Voice is an ICM Layer 3 communication guide every agent internalizes (configure the factory, not the product). Set voice: CLEAR_LANGUAGE_VOICE — or your own voice.md — so an agent, especially a human-facing one like the Alpha/coordinator, speaks so the reader understands on first read:

import { CLEAR_LANGUAGE_VOICE, createSpecializedAgent } from "torus-ai";
const agent = createSpecializedAgent(pack, { voice: CLEAR_LANGUAGE_VOICE });

Universes — one role, many cultures

A pack is portable. Drop it into a universe and it inherits that universe's culture: mission, values, how agents collaborate, voice, and baseline policy. The same role plays by local rules.

import { defineUniverse, CLEAR_LANGUAGE_VOICE } from "torus-ai";

const boutique = defineUniverse({
  name: "Aurora Bridal",
  culture: { name: "Aurora Bridal", values: "warm, unhurried, never pushy",
             voice: CLEAR_LANGUAGE_VOICE, policy: "Maximum discount you may offer is 10%." },
  roster: { sales: salesPack },
});
const agent = boutique.agent("sales"); // the sales pack, bound by the boutique culture

The same salesPack in a formal enterprise universe answers a discount request by qualifying the deal; in the boutique it warmly offers up to 10%. npm run example:universe.

The reusable toolkit (src/packkit.ts): createCatalogServer, createLeadMemoryServer, createInvoiceServer (generic settle stub), createHandoffServer.

A specialized agent does single-shot query() or stateful session() (remembers history across turns):

const chat = agent.session();
for await (const ev of chat.send("A lace gown under $2,000?")) { /* recommends + quotes */ }
for await (const ev of chat.send("I'll take that one — charge the $200 deposit")) {
  // "that one" resolves to the recommended gown via memory → invoice (gated) + reserve
}
// chat.messages holds the transcript — persist it and rehydrate with agent.session(saved)

Persist conversations (one per customer/ticket) so they survive a restart:

import { fileSessionStore, resumeSession } from "torus-ai";

const store = fileSessionStore("./sessions"); // or memorySessionStore(), or your own DB
const chat = await resumeSession(agent, store, customerId); // loads prior history
for await (const ev of chat.send("...")) { /* auto-saved when the turn ends */ }
// next visit: resumeSession(agent, store, customerId) picks up exactly where it left off

SessionStore is a two-method interface (load/save) — back it with Redis, Postgres, or anything; the transcript is plain JSON.

A complete runnable pack lives in packs/bridal — npm run pack:bridal runs a two-turn sale (discover → recommend → quote → close → settle) on the free cascade, with the $200 deposit behind the confirmation gate.

Appointments & booking

createCalendarServer adds scheduling tools — list_availability, book_appointment (both parties), reschedule_appointment, cancel_appointment — backed by an in-memory calendar with an onBook hook: the seam where a real backend goes.

const calendar = createCalendarServer({
  slots, businessName: "Aurora Bridal",
  onBook: async (appt) => {
    // wire your backend here, e.g. a Google Calendar MCP tool:
    // await gcal.callTool({ name: "create_event", arguments: {
    //   summary: appt.title, start: appt.start, attendees: [appt.attendeeEmail] } });
    // Google then emails the invite to both parties.
  },
});

packs/booking is a runnable scheduling assistant — npm run pack:booking books a fitting in a two-turn chat (offer slots → confirm → book), with the booking behind the confirmation gate (it's outward — it creates an event and emails an invite). To make it real, either connect a Google Calendar MCP server (via connectMcpServers) so the agent creates live events + invites, or point onBook at your own booking service.

Self-improvement — the dream loop

Agents can learn from the world, but only keep what fits who they are. The loop:

import { reflect, oracle, integrate } from "torus-ai";

// 1. the agent's OWN sessions → its Dream (self-state): identity, strengths, gaps, strategies
const dream = await reflect(provider, { role, culture, transcript: ownSessions });

// 2. an EXTERNAL knowledge transcript (a video, talk, call — transcribe it first) → Knowledge Units
const units = await oracle(provider, videoTranscript);

// 3. the Upgrader contrasts each unit against the Dream:
//    serves → integrate (updates the pack + the dream) · culture-clash → reject · money/integration → escalate
const result = await integrate(provider, { dream, knowledge: units, gate: askHuman });

The Dream makes integration self- and context-aware: the same lesson is absorbed by one agent and rejected by another, by their dreams. npm run example:dream runs it end to end (prefers a JSON-reliable model like Gemini). Dreams persist via fileDreamStore / memoryDreamStore.

The 6 AM briefing

Each area advisor surfaces the decisions that need you; a chief of staff pre-digests them into one short, clear-language brief — choices, not data — so you just direct.

import { morningBriefing } from "torus-ai";

const brief = await morningBriefing(provider, {
  advisors: [
    { area: "Sales", watches: "pipeline, conversion", context: "checkout conversion fell 8%" },
    { area: "Finance", watches: "cash, runway", context: "runway is 7 months" },
  ],
});
console.log(brief.summary);  // grouped by area; each item ends in a decision + options
brief.items;                 // structured: { area, decision, options, priority }

Wire it to a schedule (a cron / GitHub Action, like model-watch) so it lands before you wake. npm run example:briefing.

Agentic research

Ask a question; get a cited answer. research() plans focused sub-queries, fans them out in parallel through a pluggable search backend, dedupes the sources, and synthesizes a clear-language answer that cites [n].

import { research } from "torus-ai";

const report = await research(provider, "How should a small team run a standup?", {
  search: myWebSearch,   // (query) => SearchResult[] — a web-search MCP, an API, or a corpus
  breadth: 4,            // how many sub-queries to plan
});
console.log(report.answer);    // cited synthesis
report.sources;                // the [n] sources, deduped

The backend is just a function (query) => SearchResult[], so Torus stays search-engine-agnostic — wire a web-search MCP, Tavily/Brave, or your own index into the same slot. npm run example:research runs it offline over a tiny in-memory corpus.

Subagent delegation

A coordinator stays focused by handing self-contained tasks to specialist teammates. The roster is fixed (you define the team); which teammate handles a given task is the model's call at runtime. That's the primitive every multi-role org — and the Alpha Agent — is built on.

import { defineSubagent, createDelegationServer, createSpecializedAgent } from "torus-ai";

const researcher = defineSubagent({ name: "researcher", description: "Finds facts.", system: "..." });
const copywriter = defineSubagent({ name: "copywriter", description: "Writes copy.", system: "..." });

const team = createDelegationServer([researcher, copywriter]);   // a `delegate` tool
const editor = createSpecializedAgent({
  name: "editor",
  persona: "You delegate to teammates, then assemble their work.",
  tools: [team],
  guardrails: { allowedTools: ["mcp__delegation__delegate"] },
});

for await (const ev of editor.query("Research a benefit, then write a tagline.")) { /* ... */ }

The delegate tool advertises the roster to the parent, so it routes by specialty and rejects unknown teammates. defineSubagent builds a teammate from a system prompt; subagentFromAgent promotes any existing pack into one. npm run example:delegation.

The Alpha Agent — give it a goal, it runs the org

The capstone. Stand up a universe (culture + roster), then hand the Alpha one goal. It binds the culture, treats every role as a teammate, breaks the goal into tasks, delegates them (in sequence when one feeds the next), and returns a single clear-language, decision-ready answer. It composes everything below it — universes + packs + delegation + voice — into one entry point.

import { defineUniverse, defineAlphaAgent } from "torus-ai";

const agency = defineUniverse({ name: "Lumen Growth", culture, roster: { strategist, researcher, copywriter } });
const alpha = defineAlphaAgent({ universe: agency });

for await (const ev of alpha.pursue("Plan a $0-budget launch for a students' note app")) {
  if (ev.type === "result") console.log(ev.finalText);   // channel + insight + headline, assembled
}

In the live demo the Alpha delegates research → strategy → copy, each teammate seeing only its task, then assembles the plan. npm run example:alpha.

The Gatekeeper — intake triage in front of the org

Work and requests arrive faster than any org can act on them. The Gatekeeper is the filter in front of the Alpha and the 6 AM briefing: it sorts each incoming item into a lane (handle · delegate · defer · reject · escalate), sets a priority, gives one clear-language reason, and names the role to own anything it delegates.

import { createGatekeeper } from "torus-ai";

const gate = createGatekeeper({
  provider,
  roles: ["engineering", "sales", "support", "legal"],
  policy: "Customer-facing bugs → engineering. Leads → sales. Legal/threats → escalate. Spam → reject.",
});

const verdicts = await gate.triageMany(inbox);   // [{ decision, priority, reason, route }]
const act = verdicts.filter((v) => v.decision !== "reject" && v.decision !== "defer");

In the live demo a mixed inbox — a production outage, an enterprise lead, a billing question, a 1px nitpick, a crypto scam, a legal threat — sorts into P0 engineering, P1 sales, P1 support, P3 defer, reject, and P0 escalate. If triage itself fails it escalates rather than silently dropping. npm run example:gatekeeper.

Agent Skills (portable, model-agnostic)

A skill is a folder with a SKILL.md (YAML frontmatter: name + description) plus optional bundled files — the exact Anthropic Agent Skills format, so skills are portable to/from Claude Code and the open-source skills ecosystem.

Anthropic loads skills via a Claude VM + bash. Torus is model-agnostic, so it reproduces the same three-level progressive disclosure with tools — working on NVIDIA, Gemini, anything:

| Level | What | How | |---|---|---| | 1 · metadata | name + description | injected into the system prompt (~100 tokens) | | 2 · instructions | the SKILL.md body | use_skill(name) loads it on demand | | 3 · resources | bundled templates/scripts | read_skill_file(name, path) (sandboxed) |

import { loadSkills, createSpecializedAgent } from "torus-ai";

const skills = await loadSkills("./skills");          // each subfolder = one SKILL.md
const agent = createSpecializedAgent({ name: "bot", persona: "...", skills });
// the agent calls use_skill itself when a task matches a skill's description

npm run example:skills proves it live: an agent given a release-notes skill loads it, reads its TEMPLATE.md, and formats the notes — even dropping the non-user-visible change because the skill said to.

Agents that author their own skills

The dream loop doesn't just fold what it learns into a persona — it can write a new skill. Feed it a transcript; it distills the insight and authors a portable SKILL.md the whole org can reuse.

import { oracle, authorSkill, writeSkill, loadSkills } from "torus-ai";

const units = await oracle(provider, transcript);                 // distill
const skill = await authorSkill(provider, units[0], { role: "sales" });  // → a SKILL.md
await writeSkill("./skills", skill);                              // persist
const skills = await loadSkills("./skills");                     // reusable, loop closed

And skills earn their place by track record — a SkillLibrary logs win/loss outcomes and promotes the performers, ranking by the Wilson lower bound so a 2/2 fluke never outranks a proven 18/20:

import { createSkillLibrary } from "torus-ai";

const lib = createSkillLibrary();
await lib.record("handle-price-objection", true);     // log outcomes over time
await lib.promote();   // ["handle-price-objection"] — the ones that earned it
await lib.retire();    // underperformers to drop

npm run example:dream-skills runs the whole loop: distill → author → reload → promote.

Hardening — hooks, evals, audit, compaction

Four seams that turn the engine from a demo into something you can operate. Each is a small, provider-agnostic primitive in the ICM spirit (configure the factory, not the product) — and they compose. npm run example:hardening exercises all four offline (no API key).

Lifecycle hooks (the deterministic seam around the loop)

The permission gate decides whether a tool runs; hooks let you observe and intervene at the same lifecycle points the Claude Agent SDK exposes — PreToolUse (deny or rewrite a call before permissions), PostToolUse (observe the result), and Stop (the loop settled). They're the seam for cost tracking, secret scrubbing, a type-check after a write — or auto-learning at the end of a turn.

import { query, type Hooks } from "torus-ai";

const hooks: Hooks = {
  preToolUse: ({ name }) =>
    /delete|drop/.test(name) ? { behavior: "deny", message: "destructive tool blocked" } : undefined,
  postToolUse: ({ name, result }) => track(name, result),  // observe-only
  onStop: ({ messages }) => persistMetrics(messages),      // fires once, with the transcript
};

for await (const ev of query("…", { hooks })) { /* … */ }

Hooks thread through query(), createSpecializedAgent({ hooks }), and runLoop. Because Stop hands you the full transcript, the dream loop becomes a hook: dreamStopHook folds each finished turn into the agent's Dream automatically, instead of waiting for a manual reflect call.

import { createSpecializedAgent, dreamStopHook, fileDreamStore } from "torus-ai";

const dream = dreamStopHook(provider, { id: customerId, role: "sales", store: fileDreamStore("./dreams") });
const agent = createSpecializedAgent(pack, { hooks: dream });
// …chat… then read the live self-state any time:
dream.current();   // { identity, strengths, gaps, strategies, updatedAt }

Eval harness (the producer of the win/loss signal)

SkillLibrary ranks skills by track record — but something has to generate that record. The eval harness is that referee: an LLM-judge grades an output against a rubric (continuous score + pass/fail), runEval grades a whole suite (a checkpoint eval you re-run to catch regressions), and passAtK measures reliability across k attempts. Feed any grade straight back into the library to close the loop.

import { evaluate, recordEval, runEval, passAtK, createSkillLibrary } from "torus-ai";

const grade = await evaluate(provider, { task, output, rubric });   // { pass, score, reasons }
await recordEval(lib, "handle-price-objection", grade);             // → SkillLibrary.record

const report = await runEval(provider, cases, (task) => agentAnswer(task));   // { passRate, runs }
const reliability = await passAtK(provider, { task, rubric, k: 5 }, run);     // { passAtK, passes }

Never throws — a judge that fails grades as a conservative fail, so a broken judge can't fake a win.

Security audit (AgentShield for your own config surface)

The moment Torus runs multi-tenant, the contract surface becomes an attack surface: untrusted persona/policy/voice text can carry prompt injection, a * allowlist hands the agent every tool, an irreversible money tool with no confirmation is a foot-gun, and an external MCP server is code you didn't write inside your trust boundary. audit* is a deterministic linter (no model, no key — safe in CI) that grades the surface A–F.

import { auditPack, auditMcpConfigs, auditUniverse, auditReport, formatAuditReport } from "torus-ai";

const r = auditReport(auditPack(pack));   // { grade: "A".."F", ok, findings[] }
if (!r.ok) { console.log(formatAuditReport(r)); process.exit(2); }   // gate the build

auditMcpConfigs({ github: { command: "npx", args: ["-y", "…"] } });  // unpinned npx, shell-fetch, plaintext http
auditUniverse(universe);   // every pack in the roster + the culture's baseline policy

It flags wildcard allowlists, irreversible tools missing from guardrails.confirm, injection phrasing in contract text, and risky MCP launch configs — each with a remediation line.

Runtime content shield (the trust boundary for what arrives mid-run)

audit lints config at rest; the shield defends content that arrives during a run — a fetched web page, a tool result, a stored memory — which is where the real attacks land once an agent can browse. It stops three, all deterministically (no key):

Invisible text — white-on-white, zero-width, and bidi-override characters a human never sees but the model reads. scanContent strips them and flags it.
Injection in a page/URL — "ignore your instructions…" smuggled into retrieved content. guardServer wraps an MCP server so every tool's output is scrubbed, and injected content is fenced as data (wrapUntrusted) — the model is told to read it, not obey it.
Memory poisoning (the drip method) — instruction-like text parcelled into long-term memory to fire later. Memory holds data, never instructions: lintFact now rejects injection at the write, and scanFacts re-scans the store at rest (the drip assembles over time, so a whole-store scan catches what looked benign).

import { guardServer, createBrowserServer, scanContent, scanFacts } from "torus-ai";

const browser = guardServer(createBrowserServer());   // web content is scrubbed + fenced before the model sees it
scanContent(fetchedHtml);                             // { findings, sanitized, injected }
scanFacts(await store.query("acme"));                 // poison that slipped into memory earlier

The shield reuses #22's injection patterns and plugs into the same seams: a content firewall on the tool-result boundary (#24's browser intake), an injection reject at the memory write (#23), and a scan-at-rest that extends the audit to memory. npm run example:shield stops all three offline.

Control plane — terminate, bound, authorize

The shield and audit defend content; the control plane governs the agent itself — the gap the Agents of Chaos study named (most orgs can't terminate a misbehaving agent, bound its resource use, or stop it obeying non-owners). A Governor is a kill switch + circuit breaker + budget that drives the loop's AbortSignal; an Authority gates privileged tools to the owner.

import { createGovernor, createAuthority, query } from "torus-ai";

const gov = createGovernor({ budget: { maxToolCalls: 20, maxWallClockMs: 60_000 }, tripAfterFailures: 3 });
for await (const ev of query(prompt, { hooks: gov.hooks, signal: gov.signal })) { /* … */ }
gov.halt("operator pressed stop");   // terminate from anywhere — the loop stops next check
gov.state();                          // { halted, reason, toolCalls, failures, elapsedMs }

const auth = createAuthority({ owner: "dana", privilegedTools: ["mcp__billing__*"] });
auth.setPrincipal({ id: "stranger", role: "guest" }); // now billing is denied
query(prompt, { hooks: auth.hooks });

Both are just hooks + the abort signal, so they compose (mergeHooks(gov.hooks, auth.hooks)) with everything else. The loop halts between turns and again after the tool batch, so a budget breach stops before the next model call. npm run example:control shows the budget, breaker, kill switch, and authority gate offline.

Observability ledger (the controls, on the record)

Enforcing isn't reviewing. The ledger is an append-only record of the security-relevant moments — every tool call, every denial (permission / authority / governor / hook), every halt — so you can answer what did this agent try, what was blocked, and why. The loop already yields these as events and funnels every gate through permission_denied, so one pass-through wrapper captures it all.

import { createLedger, fileLedger, formatLedger } from "torus-ai";

const ledger = createLedger(fileLedger("./audit.jsonl"));   // or memoryLedger()
for await (const ev of ledger.observe(query(prompt, { hooks: gov.hooks, signal: gov.signal }))) { /* … */ }

await ledger.entries({ kind: "denied" });   // exactly what was blocked
await ledger.summary();                     // { total, byKind, denials, halts }
console.log(formatLedger(await ledger.entries()));

Back it with memoryLedger() (tests), fileLedger() (a JSONL audit trail you can git diff and ship to a SIEM), or your own sink. npm run example:observability runs a governed agent and prints its full review ledger offline.

Context compaction (the PreCompact seam)

A persisted session grows without bound. Compaction folds the old turns into a summary and keeps the recent ones verbatim — and since reflect already distills a transcript into a Dream, compactWithDream reuses it, so the agent carries forward who it has become, not a flat recap.

import { compactingSession, compactTranscript, compactWithDream } from "torus-ai";

// auto-compact a session before each turn (mirrors resumeSession's wrapping):
const chat = compactingSession(agent.session(), provider, { maxChars: 12000, keepRecent: 6 });

// or compact a raw transcript directly:
const { messages, compacted, summary } = await compactTranscript(provider, transcript, { maxChars: 12000 });
const dreamFolded = await compactWithDream(provider, { messages: transcript, role: "sales", store, id });

A no-op until the transcript crosses maxChars, so it's cheap to call every turn.

Atomic semantic memory (one fact, one file)

Compaction is the lossy half of memory; this is the lossless half. A session remembers the whole transcript or a prose summary — but neither lets you ask "what is this customer's budget?" and get 2000 back precisely. Adapted from the obsidian-memory-for-ai protocol, memory stores one fact per file keyed by entity + predicate, so recall is exact and survives compaction.

Two safety properties, both reusing what Torus already has:

Agents never write memory directly. remember_fact only proposes a write into an inbox — the same propose→approve→receipt gate Torus uses for money, pointed at memory. A misremembered fact can't silently land.
Schema + provenance. Every fact is checked against a controlled predicate list and stamped with which agent proposed it, when, and how sure — so a model can't quietly write garbage into long-term memory.

import { createMemoryServer, fileFactStore, fileOpLog, reviewProposals } from "torus-ai";

const store = fileFactStore("./vault");   // facts/{entity}/{predicate}.md — human-readable, git-diffable
const log = fileOpLog("./vault");         // _ops/applied.jsonl — an append-only audit trail
const memory = createMemoryServer({
  store, log, agent: "bridal",
  schema: { predicates: ["budget", "wedding_date", "gown_style"], minConfidence: 0.4 },
});

const agent = createSpecializedAgent({ name: "bridal", persona: "…", tools: [memory] });
// …chat… the agent calls remember_fact (which only queues). Then the shop applies the queue:
await reviewProposals(store, log, { gate: (p) => (p.fact.confidence ?? 0) >= 0.8 }); // or askHuman(p)
await store.get("elena-voss", "budget");   // { value: "2000", confidence: 0.9, agent: "bridal", … }

Stores are pluggable like SessionStore/DreamStore (memoryFactStore for tests, fileFactStore for a markdown vault, or your own DB). npm run example:memory runs the whole flow — propose → review → recall → audit — offline. The packs/bridal demo wires it in: the consultant records the customer's budget and date as durable facts, and the shop applies them after the chat.

Run it as a service (deploy on Render)

Torus is a library; server/index.ts wraps it in a thin HTTP service so you can host it. Native node:http, no web framework. Endpoints:

| Method | Route | Body | Returns | |---|---|---|---| | GET | /health | — | { ok, name, version } | | POST | /query | { prompt, system? } | { text } | | POST | /alpha | { goal, mission?, values?, roster? } | { text } |

/query runs a single agent turn (no file tools — a public endpoint shouldn't expose them). /alpha builds an Alpha over an inline roster and pursues the goal. If TORUS_API_KEY is set, every POST needs Authorization: Bearer <key>.

# local
npm run build && node --env-file=.env dist/server.js
curl -s localhost:8080/health
curl -s -X POST localhost:8080/alpha -H 'content-type: application/json' \
  -d '{"goal":"Plan a launch tweet for a free note app for students"}'

Deploy on Render — the repo ships a render.yaml blueprint:

Render Dashboard → New → Blueprint, pick this repo (connect GitHub once for the private repo).
Render reads render.yaml: build npm install && npm run build, start npm start, health check /health, free plan.
Set the secrets in the dashboard (they're sync:false, never in git): NVIDIA_API_KEY (required), GOOGLE_API_KEY (optional), TORUS_API_KEY (guards POSTs).

Free plan sleeps when idle; bump to starter (~$7/mo) to stay warm.

The stage contract (Layer 2)

Each stages/NN_verb/CONTEXT.md is both the agent's instructions and human docs:

# Stage 02 — draft

## Inputs
- Layer 4 (working):   ../01_research/output/research-output.md
- Layer 3 (reference): ../../_config/voice.md

## Process
Turn the research brief into a ~300-word first draft in the house voice.

## Outputs
- draft.md -> output/

## Tools          # optional — the stage's tool allowlist (source of truth)
- mcp__research__lookup

The runner reads ## Inputs to scope context (loads only those files), ## Tools to gate what the loop may call, and ## Outputs to name the artifact it persists.

Design notes

The contract is the control point. A stage loads only the files it names (ICM principle 3) and may call only the tools it lists.
output/ is the handoff. Stage NN's output is stage NN+1's input — the coordination "logic" is one folder feeding the next.
Observable by default. No logging layer — open the folders. Re-run one stage by itself; its ## Inputs table is its dependency declaration.
Provider-agnostic. The loop only needs ModelProvider.generate(). Mock for tests/offline, Anthropic for real, anything else you implement.

MIT-licensed, like the ICM protocol it's built on.