@vauban-org/agent-sdk
v1.0.0
Vauban agent primitives: loop, budget, routing, HITL, permissions, tracking, durable execution
Vauban agent primitives extracted from Command Center. Provides the core loop, budget tracking, provider routing, HITL approval, permissions, durable execution, and telemetry building blocks.
Quickstart
import {
  AgentLoop,
  createBudgetState,
  createProviderRouter,
} from "@vauban-org/agent-sdk";

const router = createProviderRouter({
  groqApiKey: process.env.GROQ_API_KEY,
});

const loop = new AgentLoop({
  agentId: "my-agent",
  agentVersion: "0.1.0",
  systemPrompt: "You are a helpful assistant.",
  provider: router,
  tools: myToolRegistry, // your tool registry, defined elsewhere
  budget: createBudgetState({ maxSteps: 10 }),
});

const result = await loop.run("Summarise the latest market news.");
console.log(result.finalMessage);

Public API
See CONTRACT.md for all signatures and the breaking-change policy.
| Export | Description |
|--------|-------------|
| AgentLoop | Multi-provider loop (Anthropic + Groq cascade) |
| SdkAgentLoop | Anthropic-direct loop with permission enforcement |
| AgentRegistry | Plugin registration + descriptor validation |
| createBudgetState | Per-run budget counters |
| createCoherenceDetector | Loop/stall detection |
| createProviderRouter | Anthropic → Groq fallback router |
| InMemoryApprovalStore | In-process HITL approval store |
| sanitizeExternalInput | Prompt injection defence |
| keepSafeOnly | Filter + return clean items |
| recordOutcome | OTel span finalisation helper |
| createAgentRunTracker | DB-backed token/cost accounting |
| createBullMQRunner | BullMQ queue/worker/DLQ factory |
| AGENT_IDS | Stable UUIDs per agent archetype |
| mapScopesToSdkPermissions | JWT scope → SdkPermissions |
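To make "per-run budget counters" from the table above concrete, here is a minimal sketch of a step budget that a loop consults before each iteration. The names `BudgetSketch`, `createBudgetSketch`, `tryConsumeStep`, and `runSteps` are hypothetical and are not the SDK's exports; the real shape of `createBudgetState` is defined in CONTRACT.md and may differ.

```typescript
// Hypothetical shapes for illustration only; see CONTRACT.md for the real signatures.
interface BudgetSketch {
  maxSteps: number;
  stepsUsed: number;
}

function createBudgetSketch(maxSteps: number): BudgetSketch {
  if (!Number.isInteger(maxSteps) || maxSteps <= 0) {
    throw new RangeError("maxSteps must be a positive integer");
  }
  return { maxSteps, stepsUsed: 0 };
}

// Records one step and returns true if budget remains; returns false once exhausted.
function tryConsumeStep(budget: BudgetSketch): boolean {
  if (budget.stepsUsed >= budget.maxSteps) return false;
  budget.stepsUsed += 1;
  return true;
}

// Drives `step` until it reports completion (returns true) or the budget runs out.
// Returns the number of steps actually executed.
function runSteps(budget: BudgetSketch, step: (index: number) => boolean): number {
  let executed = 0;
  while (tryConsumeStep(budget)) {
    executed += 1;
    if (step(executed - 1)) break;
  }
  return executed;
}
```

The point of the sketch is only that the budget is a per-run value: a fresh state is created for each run and is never shared between runs.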
Architecture
@vauban-org/agent-sdk
├── loop/         AgentLoop (minimal-loop) + SdkAgentLoop (sdk-loop)
├── budget/       AgentBudgetState, CoherenceDetector, compactToolLog
├── router/       ProviderRouter (Anthropic + Groq)
├── hitl/         ApprovalChannel interface + InMemoryApprovalStore
├── permissions/  SdkPermissions, mapScopesToSdkPermissions
├── safety/       sanitizeExternalInput, keepSafeOnly
├── tracking/     OTel gen-ai spans + AgentRunTracker
├── durable/      BullMQRunner (queues, workers, DLQ, flow)
└── registry/     AgentRegistry + AGENT_IDS

Trading-NQ Paper Validation (Required Before Live)
Before flipping TRADING_MODE=live:
- Run 3 weeks of TRADING_MODE=paper cycles in the production env
- Verify net_roi_pct is stable (median > 0 across 21 days)
- Verify slippage_bps is tolerable (P95 < 5 bps)
- Founder + RSO sign-off (per governance/founder-authority.md)
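The two numeric checks in the list above could be automated roughly as follows. This is a sketch under stated assumptions: the `PaperDay` record shape, the nearest-rank percentile method, and the function names are all hypothetical, and the sign-off step remains a manual gate.

```typescript
// Hypothetical record shape: one entry per paper-trading day.
interface PaperDay {
  netRoiPct: number;     // daily net_roi_pct
  slippageBps: number[]; // slippage_bps for each fill observed that day
}

function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Nearest-rank percentile for p in (0, 100]; assumes a non-empty input.
function percentile(values: number[], p: number): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// True only if there are at least 21 days of data, the median daily ROI is
// positive, and the 95th-percentile slippage is below 5 bps.
function paperGatePasses(days: PaperDay[]): boolean {
  if (days.length < 21) return false;
  const fills = days.reduce<number[]>((acc, d) => acc.concat(d.slippageBps), []);
  if (fills.length === 0) return false;
  const roiOk = median(days.map((d) => d.netRoiPct)) > 0;
  const slippageOk = percentile(fills, 95) < 5;
  return roiOk && slippageOk;
}
```

Using the median rather than the mean keeps a single outlier day from flipping the ROI check in either direction.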
Bundle size gates
size-limit enforces maximum bundle sizes per entry-point. These are checked automatically on every PR.
| Entry | Limit |
|-------|-------|
| dist/index.js (core) | 150 KB |
| dist/proof/index.js | 30 KB |
| dist/orchestration/ooda/index.js | 40 KB |
Run locally: pnpm size
Performance benchmarks
Micro-benchmarks for hot paths (AgentLoop.run, tracedPort, ProviderRouter.complete) live in benchmarks/sdk.bench.ts. Powered by tinybench.
# Run benchmarks + compare vs committed baseline
pnpm bench:sdk
# Capture a new baseline (after an intentional performance change)
pnpm bench:baseline

Updating the baseline after intentional changes
When you make a deliberate change that improves or alters performance:
- Run pnpm bench:baseline locally. This overwrites benchmarks/baseline.json.
- Review the diff: git diff packages/agent-sdk/benchmarks/baseline.json.
- Commit baseline.json alongside your code change with a message explaining the intentional shift.
- PR CI will use the new baseline for subsequent comparisons.
Never update the baseline just to "fix" a regression caused by unintended slowdowns — diagnose and fix those instead.
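For intuition, a baseline comparison can be as simple as the sketch below. The result shape (benchmark name mapped to mean latency in milliseconds), the 10% default tolerance, and the name `findRegressions` are assumptions made for illustration; the actual comparison logic behind pnpm bench:sdk and the format of baseline.json may differ.

```typescript
// Assumed shape: benchmark name -> mean latency in milliseconds.
type BenchResults = Record<string, number>;

interface Regression {
  name: string;
  baselineMs: number;
  currentMs: number;
  changePct: number;
}

// Flags benchmarks whose mean latency grew by more than `tolerancePct`
// relative to the baseline. Benchmarks missing from `current` are skipped.
function findRegressions(
  baseline: BenchResults,
  current: BenchResults,
  tolerancePct = 10,
): Regression[] {
  const regressions: Regression[] = [];
  for (const name of Object.keys(baseline)) {
    const baselineMs = baseline[name];
    const currentMs = current[name];
    if (currentMs === undefined || baselineMs <= 0) continue; // nothing to compare
    const changePct = ((currentMs - baselineMs) / baselineMs) * 100;
    if (changePct > tolerancePct) {
      regressions.push({ name, baselineMs, currentMs, changePct });
    }
  }
  return regressions;
}
```

A non-empty result is the signal to diagnose the slowdown, not to re-capture the baseline.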
Determinism guarantees
- random.next() / random.uuid(): seedable, deterministic, NON SECURITY-GRADE. Use for application logic only.
- random.crypto() / random.cryptoUuid(): crypto-strong, NON REPLAY-SAFE BY DESIGN (security trumps audit). Calling these during replay throws CryptoRandomDuringReplayError in strict mode.
- clock.now(): virtualized via RecordedClock during replay.
- LLM responses are replayed via LLMResponseCache, keyed by SHA-256(canonical({provider, model, messages, temperature, seed})).
Replay proves byte-identical traces when these are virtualized. It does NOT prove LLM provider-side determinism.
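The cache key described above can be sketched as follows. The canonical form used here (JSON with object keys sorted recursively and undefined properties dropped) is an assumption; the SDK's `canonical` function may serialise differently, so keys from this sketch should not be expected to match real cache entries. `LlmRequest` and `cacheKey` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Assumed canonical form: JSON with object keys sorted recursively, so that
// property order does not affect the resulting string.
function canonical(value: unknown): string {
  if (value === undefined) return "null"; // mirrors JSON's treatment inside arrays
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonical(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .filter((key) => obj[key] !== undefined)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonical(obj[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical request shape covering the five fields named in the key.
interface LlmRequest {
  provider: string;
  model: string;
  messages: { role: string; content: string }[];
  temperature: number;
  seed: number;
}

// SHA-256 over the canonical serialisation, hex-encoded.
function cacheKey(req: LlmRequest): string {
  return createHash("sha256").update(canonical(req), "utf8").digest("hex");
}
```

Because every field that influences the response is part of the key, changing the temperature or seed yields a different key and therefore a cache miss rather than a stale replay.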
Agentic privacy via HDNT
The privacy/ module implements the first slice of the Vauban Privacy Protocol
integration (sprint-576). It provides a TypeScript shim for the Hierarchical
Deterministic Nullifier Tree (HDNT) derivation paths described in the design docs:
- vauban-privacy-protocol/docs/research/03-agentic-privacy.md: agent nullifier hierarchy, delegation-as-Claim pattern.
- vauban-privacy-protocol/docs/research/05-cross-domain-nullifiers.md: HDNT derivation algorithm and cross-domain proof structure.
Agent nullifier hierarchy
import { deriveAgentNullifier, deriveActionNullifier } from "@vauban-org/agent-sdk";
// N_agent = Poseidon(N_root, agentPubkey) — unlinkable per agent
const agentNullifier = deriveAgentNullifier(masterKey, agentPubkey);
// N_action = Poseidon(N_agent, actionId, nonce) — one-time use
const actionNullifier = deriveActionNullifier(agentNullifier, actionId, nonce);

Delegation-as-Claim
import { buildDelegationClaim, isDelegationScopeValid } from "@vauban-org/agent-sdk";
const claim = buildDelegationClaim(parentClaimRef, delegateeNullifier, {
  allowedActions: ["trade", "report"],
  expiry: Math.floor(Date.now() / 1000) + 3600,
});
// Pass claim.scopeWitness to the Rust HDNT crate for STARK proof generation.

The authoritative Poseidon implementation is in the Rust HDNT crate at
vauban-privacy-protocol/crates/hdnt/. The TypeScript module uses starknet.js
hash.computePoseidonHashOnElements when the starknet peer dep is present,
and a deterministic placeholder otherwise (test/dev only).
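To show the shape of the two-level derivation without a Poseidon dependency, the sketch below substitutes SHA-256 for Poseidon. This is an assumption made purely for illustration: it is not the module's actual placeholder, its outputs are not compatible with the Rust HDNT crate or with starknet.js, and the helper names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// SHA-256 stand-in for Poseidon, for illustration only. Outputs are NOT
// compatible with the Rust HDNT crate or with starknet.js Poseidon hashes.
function placeholderHash(...parts: string[]): string {
  const h = createHash("sha256");
  for (const part of parts) {
    // Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
    h.update(`${part.length}:${part};`, "utf8");
  }
  return h.digest("hex");
}

// N_agent = H(N_root, agentPubkey): a distinct value per agent.
function sketchAgentNullifier(rootNullifier: string, agentPubkey: string): string {
  return placeholderHash("agent", rootNullifier, agentPubkey);
}

// N_action = H(N_agent, actionId, nonce): a distinct value per action and nonce.
function sketchActionNullifier(agentNullifier: string, actionId: string, nonce: number): string {
  return placeholderHash("action", agentNullifier, actionId, String(nonce));
}
```

The "agent" and "action" prefixes act as domain separators, so a value from one level of the tree cannot collide with a value from the other even if the remaining inputs happen to coincide.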
Inter-agent encrypted channels
The privacy/channel module (checkpoint 2, sprint-576) provides shielded
bidirectional communication between agents.
import {
  establishChannel,
  sendOverChannel,
  receiveOverChannel,
  ChannelError,
} from "@vauban-org/agent-sdk";

// Sender establishes a channel to the receiver
const channel = establishChannel(senderSecretKey, receiverPublicKey);

// Encrypt a message with a monotonically increasing nonce
const msg = sendOverChannel(channel, plaintext, nonceCounter);

// Receiver decrypts; replay and tamper protection are built in
try {
  const received = receiveOverChannel(channel, msg, lastAcceptedNonce);
} catch (e) {
  // In strict TypeScript `e` is `unknown`, so narrow before reading `.message`
  const reason = e instanceof Error ? e.message : String(e);
  if (reason === ChannelError.ReplayedNonce) { /* ... */ }
  if (reason === ChannelError.InvalidTag) { /* ... */ }
  if (reason === ChannelError.MismatchedChannel) { /* ... */ }
}

Security properties:
- Replay protection: nonce must be strictly increasing (ReplayedNonce error on violation).
- Integrity: HMAC-style Poseidon auth tag; tampered ciphertext triggers InvalidTag.
- Channel isolation: channel_id mismatch triggers MismatchedChannel.
- Directional asymmetry: sender and receiver roles are asymmetric; swapping them produces a different channel_id and shared_secret.
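The first three properties above amount to three receiver-side checks, sketched here with HMAC-SHA256 from Node's crypto module standing in for the Poseidon-based tag. This substitution, the `SketchMessage` shape, the order of the checks, and the function names are assumptions for illustration; the module's wire format and internals may differ.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical message shape for illustration.
interface SketchMessage {
  channelId: string;
  nonce: number;
  ciphertext: string;
  tag: string; // hex-encoded HMAC over channelId, nonce, and ciphertext
}

type VerifyResult = "ok" | "MismatchedChannel" | "ReplayedNonce" | "InvalidTag";

// HMAC-SHA256 stand-in for the Poseidon auth tag described above.
function computeTag(secret: string, channelId: string, nonce: number, ciphertext: string): string {
  return createHmac("sha256", secret)
    .update(JSON.stringify([channelId, nonce, ciphertext]), "utf8")
    .digest("hex");
}

// One possible order of checks: channel isolation, then replay, then integrity.
function verifyMessage(
  secret: string,
  channelId: string,
  lastAcceptedNonce: number,
  msg: SketchMessage,
): VerifyResult {
  if (msg.channelId !== channelId) return "MismatchedChannel";
  // Strictly increasing: a nonce equal to the last accepted one is a replay.
  if (msg.nonce <= lastAcceptedNonce) return "ReplayedNonce";
  const expected = Buffer.from(computeTag(secret, channelId, msg.nonce, msg.ciphertext), "hex");
  const actual = Buffer.from(msg.tag, "hex");
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) {
    return "InvalidTag";
  }
  return "ok";
}
```

Binding the channel id and nonce into the tag means an attacker cannot move a valid ciphertext to another channel or re-number it without invalidating the tag.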
STUB notice (ORQ-1): Key exchange uses Poseidon-derived shared secret as a placeholder for ML-KEM-768 (CRYSTALS-Kyber). Stream cipher uses Poseidon keystream as a placeholder for AES-256-GCM or ChaCha20-Poly1305. Not production-ready until ORQ-1 replaces both primitives.
Workspace dep (Command Center)
"@vauban-org/agent-sdk": "workspace:*"