@sonzai-labs/agents
v1.7.6
TypeScript SDK for the Sonzai Mind Layer API
Sonzai TypeScript SDK
The official TypeScript SDK for the Sonzai Mind Layer API. Build AI agents with persistent memory, evolving personality, and proactive behaviors.
Zero runtime dependencies. Uses the native fetch API. Works with Node.js (>=18), Bun, and Deno. Ships both ESM and CJS builds with full type definitions.
Installation
npm install @sonzai-labs/agents
# or:
bun add @sonzai-labs/agents
pnpm add @sonzai-labs/agents
# Deno:
import { Sonzai } from "npm:@sonzai-labs/agents";
Quick Start
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: "your-api-key" }); // or set SONZAI_API_KEY
const response = await client.agents.chat({
agent: "your-agent-id",
messages: [{ role: "user", content: "Hello! What's your favorite hobby?" }],
userId: "user-123",
});
console.log(response.content);
Authentication
Get your API key from the Sonzai Dashboard under Projects > API Keys.
const client = new Sonzai({ apiKey: "sk-..." }); // explicit
const client = new Sonzai(); // or: SONZAI_API_KEY=sk-...
API keys are sent as Authorization: Bearer <key>.
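As a rough illustration of the two key sources and the header format described above, the sketch below resolves a key and builds the header. The helper names `resolveApiKey` and `authHeaders` are hypothetical, and both the precedence (explicit key over environment variable) and the error on a missing key are assumptions rather than documented SDK behavior.

```typescript
// Hypothetical helpers for illustration; the SDK's internals may differ.
type Env = Record<string, string | undefined>;

// Assumption: an explicitly passed key wins over SONZAI_API_KEY.
function resolveApiKey(explicit: string | undefined, env: Env): string {
  const key = explicit ?? env["SONZAI_API_KEY"];
  if (!key) {
    throw new Error("No API key: pass apiKey or set SONZAI_API_KEY");
  }
  return key;
}

// The key travels as a standard Bearer token.
function authHeaders(apiKey: string): Record<string, string> {
  return { Authorization: `Bearer ${apiKey}` };
}
```

Passing the environment in as a plain object keeps the helper testable and independent of any one runtime's `process.env` or `Deno.env`.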
Configuration
const client = new Sonzai({
apiKey: "sk-...", // or SONZAI_API_KEY env var
baseUrl: "https://api.sonz.ai", // or SONZAI_BASE_URL env var
timeout: 30_000, // request timeout (ms)
maxRetries: 2, // retries for idempotent failures
defaultHeaders: { "X-My": "hdr" },
customFetch: fetch, // swap in undici / a mock / a wrapper
});
Idempotent requests (GET, DELETE) retry with exponential backoff on transient failures. Mutating requests (POST, PATCH, PUT) do not retry.
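The retry policy above can be sketched roughly as follows. This is not the SDK's implementation: the helper name `withRetry`, the 200 ms base delay, the plain doubling without jitter, and the caller-supplied `isTransient` predicate are all assumptions made for illustration.

```typescript
// Sketch only: delays and the "transient" test are assumed, not documented.
const IDEMPOTENT = new Set(["GET", "DELETE"]);

async function withRetry<T>(
  method: string,
  attempt: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxRetries = 2,
  baseDelayMs = 200,
): Promise<T> {
  // Mutating methods get zero retries, so a POST is never replayed.
  const retries = IDEMPOTENT.has(method.toUpperCase()) ? maxRetries : 0;
  for (let i = 0; ; i++) {
    try {
      return await attempt();
    } catch (err) {
      if (i >= retries || !isTransient(err)) throw err;
      // Exponential backoff: base, 2x base, 4x base, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
}
```

Restricting retries to idempotent methods avoids accidentally sending the same chat message or creating the same resource twice when a response is lost in transit.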
Resources
client.agents // chat, CRUD, dialogue, context engine
client.knowledge // project-scoped KB (docs, graph, facts, search)
client.evalTemplates // evaluation templates
client.evalRuns // eval/simulation runs, reconnectable streaming
client.voices // global voice catalog
client.webhooks // webhook registration and rotation
client.projects // project management & API keys
client.userPersonas // user persona CRUD
client.analytics // cost, usage, real-time analytics
client.workbench // internal simulation & time-machine
client.projectConfig // project-scoped config
client.accountConfig // tenant-scoped config
client.customLLM // bring-your-own-model (BYOM)
client.projectNotifications
Agent sub-resources:
client.agents.memory // tree, search, facts, timeline
client.agents.personality // Big5, dimensions, deltas, overlays
client.agents.sessions // start / end
client.agents.instances // parallel instances
client.agents.notifications // proactive notifications
client.agents.customStates // scoped key-value state
client.agents.voice // TTS / STT / live WebSocket
client.agents.generation // bio, character, seed memory
client.agents.priming // batch import / user priming
client.agents.inventory // user inventory
client.agents.schedules // user-scoped recurring events
Bring Your Own Key (BYOK)
BYOK lets you register your own LLM provider API keys with a project. Once set,
upstream LLM calls for that project route through your key — token billing falls
on your provider account, not Sonzai's. Keys are encrypted at rest server-side
and are never returned by the API (only the key prefix and health metadata are
exposed). Requires read:byok / write:byok scopes on the API key you use to
call these endpoints.
// List all configured BYOK providers for a project
const keys = await client.byok.list("project-id");
for (const k of keys) {
console.log(`${k.provider}: ${k.health_status} (active: ${k.is_active})`);
}
// Store or replace a key (validates against the provider before saving)
const key = await client.byok.set("project-id", "openai", "sk-...");
console.log(`Stored prefix: ${key.api_key_prefix}`);
// Enable or disable without rotating
await client.byok.setActive("project-id", "openai", false);
// Re-run the provider health check on a stored key
const result = await client.byok.test("project-id", "gemini");
console.log(result.health_status); // "healthy" | "invalid" | "unknown"
// Remove a stored key (project falls back to platform billing)
await client.byok.delete("project-id", "xai");
Supported providers: "openai" | "gemini" | "xai" | "openrouter".
REST path: /api/v1/projects/{project_id}/byok-keys[/{provider}[/test]].
Usage
Chat (non-streaming)
const response = await client.agents.chat({
agent: "agent-id",
messages: [{ role: "user", content: "Hello!" }],
userId: "user-123",
sessionId: "session-456", // auto-created if omitted
provider: "gemini", // gemini | zhipu | volcengine | openrouter | custom
model: "gemini-3.1-flash-lite",
});
console.log(response.content);
console.log(`Tokens: ${response.usage?.totalTokens}`);
Chat (streaming)
for await (const event of client.agents.chatStream({
agent: "agent-id",
messages: [{ role: "user", content: "Tell me a story" }],
})) {
const delta = event.choices?.[0]?.delta?.content ?? "";
process.stdout.write(delta);
}
Streams return an AsyncGenerator<ChatStreamEvent>. Each event carries a delta in choices[0].delta.content, plus optional usage on the final frame.
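When the full reply is needed as one string, the deltas can be accumulated while the stream is consumed. The sketch below uses a minimal local event shape, `StreamEventLike`, as a stand-in for the exported `ChatStreamEvent`; the `usage.totalTokens` field on stream events is an assumption borrowed from the non-streaming response.

```typescript
// Minimal local shape for illustration; the SDK exports the real ChatStreamEvent.
interface StreamEventLike {
  choices?: { delta?: { content?: string } }[];
  usage?: { totalTokens?: number }; // assumed shape
}

async function collectStream(
  events: AsyncIterable<StreamEventLike>,
): Promise<{ text: string; usage?: StreamEventLike["usage"] }> {
  let text = "";
  let usage: StreamEventLike["usage"];
  for await (const event of events) {
    // Frames without a content delta contribute an empty string.
    text += event.choices?.[0]?.delta?.content ?? "";
    // Keep the most recent usage payload; it normally arrives on the last frame.
    if (event.usage) usage = event.usage;
  }
  return { text, usage };
}
```

Because the helper accepts any `AsyncIterable`, it would work with the generator returned by `client.agents.chatStream(...)` as well as with a fake generator in tests.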
Chat (async with polling)
For chats that may run longer than your network can hold an SSE stream
open (Cloudflare/LB cuts at ~100s), queue the request and poll for the
result. Cancelling locally does not cancel the server-side task —
re-poll the same processingId later if needed.
// Fire-and-forget — returns { processingId, status: "queued" } immediately.
const queued = await client.agents.chatAsync({
agent: "agent-id",
messages: [{ role: "user", content: "Plan my week." }],
userId: "user-123",
sessionId: "session-456",
provider: "openai",
model: "gpt-4o",
});
// Manual poll loop — recommended backoff: 1s → 2s → 4s, capped at 5s.
let delay = 1000;
while (true) {
const result = await client.agents.pollChatResult("agent-id", queued.processingId);
// status: "queued" | "running" | "complete" | "failed"
// While running, `response` carries partial text and `phase`/`tool`
// reflect the latest progressive-elaboration event.
if (result.status === "complete" || result.status === "failed") {
console.log(result.response, result.sideEffects);
break;
}
await new Promise((r) => setTimeout(r, delay));
delay = Math.min(delay * 2, 5000);
}
// Convenience: queue + poll until terminal in one call.
const result = await client.agents.chatAsyncBlocking(
{
agent: "agent-id",
messages: [{ role: "user", content: "Plan my week." }],
userId: "user-123",
},
{ pollIntervalMs: 1000, maxPollIntervalMs: 5000, timeoutMs: 600_000 },
);
Streaming and cancellation (detached variants)
The default chat / chatStream methods use the SDK's own AbortController and do not accept a caller-supplied AbortSignal. This is a deliberate guard against a recurring anti-pattern:
// ❌ DON'T do this. req.signal fires the moment the HTTP request ends,
// which aborts fetch mid-generation and wastes the tokens.
app.post("/chat", async (req, res) => {
await someQueue.publish({ agentId, signal: req.signal });
res.json({ status: "queued" });
});
// Background worker:
const reply = await fetch("...", { signal: queueJob.signal /* ← stale! */ });
When you're calling Sonzai from a queue worker, NATS handler, Express/Hono route, or any context where the caller's signal lifetime is shorter than the AI generation, use the *Detached variants. They keep an internal AbortController whose signal is not chained to the caller's, while still:
- Enforcing an SDK-managed 5-minute timeout so calls can't leak indefinitely (override via opts.timeoutMs, or pass <= 0 to disable).
- Watching an optional opts.parentSignal so misuse surfaces as a console.warn (or opts.onParentCancel callback) instead of silently aborting fetch.
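The two behaviors listed above can be sketched with standard Web APIs. This is a conceptual illustration, not the SDK's source: `createDetachedSignal` and `DetachSketchOptions` are hypothetical names, and only the option names and the 5-minute default come from the text above.

```typescript
interface DetachSketchOptions {
  parentSignal?: AbortSignal;
  timeoutMs?: number; // <= 0 disables the timeout
  onParentCancel?: () => void;
}

function createDetachedSignal(opts: DetachSketchOptions = {}) {
  // Internal controller: its signal is never chained to the caller's.
  const controller = new AbortController();
  const timeoutMs = opts.timeoutMs ?? 300_000;
  const timer =
    timeoutMs > 0 ? setTimeout(() => controller.abort(), timeoutMs) : undefined;

  const onParentAbort = () => {
    // Observe only: report the cancellation, do not abort the internal controller.
    if (opts.onParentCancel) opts.onParentCancel();
    else console.warn("parent signal aborted; detached call continues");
  };
  if (opts.parentSignal?.aborted) onParentAbort();
  else opts.parentSignal?.addEventListener("abort", onParentAbort, { once: true });

  // Call dispose() once the request settles to release the timer and listener.
  const dispose = () => {
    if (timer !== undefined) clearTimeout(timer);
    opts.parentSignal?.removeEventListener("abort", onParentAbort);
  };
  return { signal: controller.signal, dispose };
}
```

The returned `signal` is what would be handed to `fetch`; the parent signal is only listened to, so an HTTP request ending early cannot cut the generation short.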
import type { DetachOptions } from "@sonzai-labs/agents";
// Non-streaming (aggregates the SSE stream like chat()):
const reply = await client.agents.chatDetached(
{ agent: "agent-id", messages: [...] },
{
parentSignal: req.signal, // observed, NOT propagated
timeoutMs: 120_000, // default: 300_000 (5 min)
onParentCancel: () => metrics.inc("chat_detached.parent_cancelled"),
} satisfies DetachOptions,
);
// Callback-based streaming:
await client.agents.chatStreamDetached(
{ agent: "agent-id", messages: [...] },
(event) => process.stdout.write(event.choices?.[0]?.delta?.content ?? ""),
{ parentSignal: req.signal },
);
// AsyncIterableIterator-based streaming (the TypeScript analogue of Go's
// ChatStreamChannelDetached):
for await (const event of client.agents.chatStreamChannelDetached(
{ agent: "agent-id", messages: [...] },
{ parentSignal: req.signal },
)) {
process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}
For interactive UIs where the caller really does want to cancel the generation when the user navigates away, keep using the plain chat / chatStream methods, which already abort fetch cleanly on internal timeout.
Sync vs async memory recall (memoryMode)
Supplementary memory recall can run synchronously (blocks context build until recall completes — every fact lands in the current turn) or asynchronously (races a deadline — slow hits spill to the next turn for lower first-token latency). Default is sync.
memoryMode is an agent-wide capability. You can set it at creation time or flip it later with updateCapabilities:
// At creation
await client.agents.create({
name: "Luna",
toolCapabilities: { memory_mode: "async" },
});
// Or flip an existing agent
const caps = await client.agents.updateCapabilities("agent-id", {
memoryMode: "async", // or "sync"
});
// Read the current mode
const current = await client.agents.getCapabilities("agent-id");
console.log(current.memoryMode);
updateCapabilities is PATCH-style: omitted fields are left unchanged. To skip the context engine entirely on a single chat (e.g. test paths), set skipContextBuild: true in the chat options.
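To make the sync/async distinction concrete, the sketch below races a recall promise against a deadline and hands back any late result for use on a later turn. It is purely conceptual: recall runs server-side in Sonzai, and `recallWithDeadline` and its return shape are hypothetical names chosen for this illustration.

```typescript
// Conceptual sketch of "async" recall: use what arrives before the deadline,
// and spill the rest to the next turn.
async function recallWithDeadline<T>(
  recall: Promise<T[]>,
  deadlineMs: number,
): Promise<{ now: T[]; later: Promise<T[]> | null }> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<{ done: false }>((resolve) => {
    timer = setTimeout(() => resolve({ done: false }), deadlineMs);
  });
  const winner = await Promise.race([
    recall.then((facts) => ({ done: true as const, facts })),
    deadline,
  ]);
  clearTimeout(timer);
  // Recall finished in time: every fact lands in the current turn.
  if (winner.done) return { now: winner.facts, later: null };
  // Deadline won: build context without the facts and keep the pending recall.
  return { now: [], later: recall };
}
```

In this picture, sync mode corresponds to simply awaiting `recall` with no deadline, which trades first-token latency for completeness.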
Advanced chat options
await client.agents.chat({
agent: "agent-id",
messages: [...],
userId: "user-123",
userDisplayName: "Alex",
instanceId: "instance-789", // parallel branch
sessionId: "session-456",
provider: "gemini",
model: "gemini-3.1-flash-lite",
language: "en",
timezone: "America/New_York",
compiledSystemPrompt: "You are a helpful assistant.",
toolCapabilities: { web_search: true, remember_name: true },
toolDefinitions: [
{
name: "get_weather",
description: "Get current weather",
parameters: { type: "object", properties: { city: { type: "string" } } },
},
],
maxTurns: 10,
skipContextBuild: false,
gameContext: { custom_fields: { /* ... */ } },
skillLevels: { negotiation: 5 },
});
Temperature
Pass temperature on any chat call to override the AI service's default sampling temperature (currently 0.1 for most models). Omit the field to inherit the server-side default.
await client.agents.chat({
agent: "agent-id",
messages: [...],
temperature: 0.7,
});
The Platform automatically adapts or omits temperature for providers whose models constrain it. You do not need to know provider-specific rules: pass the value you want, and the Platform reconciles it where necessary. temperature: 0 is forwarded as a literal zero.
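The distinction between omitting the field and sending a literal zero matters when wrapping the SDK in your own code, because a truthiness check would silently drop `0`. A minimal sketch, using a hypothetical `buildChatBody` helper that is not part of the SDK:

```typescript
interface ChatBodySketch {
  temperature?: number;
}

// Include the field whenever the caller supplied a number, including 0.
// A check like `if (temperature)` would wrongly treat 0 as "not set".
function buildChatBody(temperature?: number): ChatBodySketch {
  return temperature === undefined ? {} : { temperature };
}

console.log(JSON.stringify(buildChatBody(0))); // prints {"temperature":0}
console.log(JSON.stringify(buildChatBody())); // prints {}
```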
Provider constants
import { Sonzai, providers } from "@sonzai-labs/agents";
await client.agents.chat({
agent: "agent-id",
messages: [...],
provider: providers.GEMINI,
model: providers.models.gemini.FLASH_LITE,
});
Agent CRUD
const agent = await client.agents.create({
name: "Luna",
gender: "female",
bio: "A thoughtful AI companion",
personalityPrompt: "You are warm and empathetic",
// Big5 traits: 0-100 canonical scale. Values <=1 are also accepted as
// fractions (0.85 → 85) so existing code keeps working.
big5: { openness: 85, conscientiousness: 60 },
// Tool capabilities are configurable at creation time:
toolCapabilities: {
web_search: true,
remember_name: true,
image_generation: false,
inventory: false,
knowledge_base: true, // enable project-scoped KB search
memory_mode: "async", // "sync" (default) or "async"
},
});
await client.agents.get(agent.agentId);
await client.agents.update(agent.agentId, { name: "New Name" });
await client.agents.list({ pageSize: 20, search: "luna" });
await client.agents.delete(agent.agentId);
Memory
// Tree
const memory = await client.agents.memory.list("agent-id", {
userId: "user-123",
includeContents: true,
limit: 100,
});
for (const node of memory.nodes) {
console.log(`${node.title}: ${node.summary} (importance: ${node.importance})`);
}
// Semantic search (cosine embeddings when user_id is set, BM25 otherwise)
const results = await client.agents.memory.search("agent-id", {
query: "favorite food",
user_id: "user-123",
limit: 20,
});
// Timeline
const timeline = await client.agents.memory.timeline("agent-id", {
userId: "user-123",
start: "2026-01-01",
end: "2026-03-01",
});
// Fact CRUD
const fact = await client.agents.memory.createFact("agent-id", {
content: "Likes pizza",
factType: "preference",
importance: 0.8,
userId: "user-123",
});
await client.agents.memory.updateFact("agent-id", fact.factId, { importance: 0.9 });
await client.agents.memory.getFactHistory("agent-id", fact.factId);
await client.agents.memory.deleteFact("agent-id", fact.factId);
// Wisdom (agent-global) facts
await client.agents.memory.getWisdomAudit("agent-id", fact.factId);
await client.agents.memory.deleteWisdomFact("agent-id", fact.factId);
// Bulk create up to 1000 pre-formed facts in a single request.
// source_type="manual" — no LLM extraction.
await client.agents.memory.bulkCreateFacts("agent-id", {
userId: "user-123",
facts: [
{ content: "prefers espresso" },
{ content: "based in Singapore", factType: "location" },
],
});
// Seed / reset
await client.agents.memory.seed("agent-id", {
userId: "user-123",
memories: [{ atomic_text: "User is a chess enthusiast", fact_type: "interest" }],
});
await client.agents.memory.reset("agent-id", { userId: "user-123" });
// Paginated fact listing
const facts = await client.agents.memory.listFacts("agent-id", { limit: 50, offset: 0 });
Personality
const profile = await client.agents.personality.get("agent-id");
console.log(profile.profile.name);
console.log(profile.profile.big5.openness.score);
console.log(profile.profile.dimensions.warmth);
// Recent shifts and significant moments
const shifts = await client.agents.personality.getRecentShifts("agent-id");
const moments = await client.agents.personality.getSignificantMoments("agent-id", { limit: 10 });
// Per-user overlays (how the agent perceives a specific user)
const overlays = await client.agents.personality.listUserOverlays("agent-id");
const overlay = await client.agents.personality.getUserOverlay("agent-id", "user-123");
Sessions & instances
sessions.start() returns a Session handle that bundles the identity
tuple (agentId, userId, sessionId, instanceId) plus session-level
provider/model defaults. The handle drives the per-turn loop with one
fresh enriched context per turn — so the LLM you call out to (OpenAI,
Anthropic, Gemini, your own) sees up-to-date mood, recalled facts, and
recent turns on every message.
const session = await client.agents.sessions.start("agent-id", {
userId: "user-123",
sessionId: "session-456",
provider: "gemini", // session-level default
model: "gemini-3.1-flash-lite", // (per-turn overrides OK)
});
// Per-turn loop: fetch enriched context, hand it to your LLM, submit the turn.
const ctx = await session.context({ query: "what's the user about to say?" });
// ... build your prompt with `ctx`, call your LLM, get assistantReply ...
const result = await session.turn({
messages: [
{ role: "user", content: "what did we talk about last week?" },
{ role: "assistant", content: assistantReply },
],
// Prefetch the *next* enriched context in the same round-trip
// so the next user message renders without a second fetch.
fetchNextContext: { query: "anticipated next user message" },
});
console.log(result.mood); // sync mood update (undefined if none)
console.log(result.extraction_id); // async fact-extraction job id
console.log(result.next_context); // populated when fetchNextContext is set
// Poll deferred extraction (memory write-back) when you need to know it landed.
let status = await session.status(result.extraction_id);
while (status.state !== "done" && status.state !== "failed") {
await new Promise((r) => setTimeout(r, 500)); // then back off
status = await session.status(result.extraction_id);
}
// End the session. wait: true forces the CE pipeline to run synchronously
// (use in benchmarks/tests that query memory immediately after).
await session.end({ totalMessages: 10, durationSeconds: 300, wait: true });
Per-call provider/model on session.turn(...) and session.end(...)
override the session defaults; omit them to fall through. TurnMessage
carries tool_call_id and tool_calls for function-calling turns;
plain ChatMessage (used by chat / chatStream) is text-only.
Legacy void-style start/end
sessions.end("agent-id", { ... }) still works for callers that don't
need the handle:
await client.agents.sessions.end("agent-id", {
userId: "user-123",
sessionId: "session-456",
totalMessages: 10,
durationSeconds: 300,
});
Instances
// Parallel agent instances for A/B testing or sandboxed forks
const instances = await client.agents.instances.list("agent-id");
const instance = await client.agents.instances.create("agent-id", { name: "Beta" });
await client.agents.instances.reset("agent-id", instance.instance_id);
await client.agents.instances.delete("agent-id", instance.instance_id);
Context engine state
// Single-call enriched context — fact retrieval runs query-conditioned
// (two-pass: entity-filtered + raw-text vector), and recent_turns surfaces
// this session's raw messages before consolidation has run.
const ctx = await client.agents.getContext("agent-id", {
userId: "user-123",
query: "what did we discuss earlier about espresso?",
});
for (const turn of ctx.recent_turns ?? []) {
console.log(`[${turn.timestamp}] ${turn.role}: ${turn.content}`);
}
// Individual layer accessors (the single getContext call above pulls all of these)
await client.agents.getMood("agent-id", { userId: "user-123" });
await client.agents.getRelationships("agent-id");
await client.agents.getHabits("agent-id");
await client.agents.getGoals("agent-id");
await client.agents.getInterests("agent-id");
await client.agents.getDiary("agent-id");
await client.agents.getUsers("agent-id");
await client.agents.getBreakthroughs("agent-id");
await client.agents.getTimeMachine("agent-id", "2026-01-15T00:00:00Z");
Notifications
const pending = await client.agents.notifications.list("agent-id", {
status: "pending",
userId: "user-123",
limit: 20,
});
for (const n of pending.notifications) {
console.log(`[${n.check_type}] ${n.generated_message}`);
}
await client.agents.notifications.consume("agent-id", pending.notifications[0].message_id);
await client.agents.notifications.history("agent-id", { limit: 50 });
Custom states
const state = await client.agents.customStates.create("agent-id", {
key: "player_level",
value: { level: 15, xp: 2400 },
scope: "user", // or "global"
contentType: "json",
userId: "user-123",
});
await client.agents.customStates.upsert("agent-id", {
key: "player_level",
value: { level: 16 },
scope: "user",
userId: "user-123",
});
await client.agents.customStates.getByKey("agent-id", {
key: "player_level", scope: "user", userId: "user-123",
});
await client.agents.customStates.deleteByKey("agent-id", {
key: "player_level", scope: "user", userId: "user-123",
});
await client.agents.customStates.list("agent-id", { scope: "global", limit: 50 });
Knowledge base
// Documents
await client.knowledge.uploadDocument("project-id", "document.pdf", fileData);
const docs = await client.knowledge.listDocuments("project-id", 10);
await client.knowledge.getDocument("project-id", "doc-id");
await client.knowledge.deleteDocument("project-id", "doc-id");
// Structured facts
await client.knowledge.insertFacts("project-id", {
entities: [...],
relationships: [...],
});
// Graph nodes
const nodes = await client.knowledge.listNodes("project-id", { type: "Person", limit: 100, offset: 0 });
await client.knowledge.getNode("project-id", "node-id");
await client.knowledge.deleteNode("project-id", "node-id");
// Semantic search
const results = await client.knowledge.search("project-id", {
query: "what is the user's favorite food?",
limit: 10,
});
for (const r of results.results) {
console.log(`[${r.score.toFixed(2)}] ${r.label} (${r.type})`);
}
Voice (TTS / STT / live)
// Text-to-Speech
const tts = await client.agents.voice.tts("agent-id", {
text: "Hello, how are you?",
voiceName: "Kore",
language: "en-US",
});
const audioBytes = Buffer.from(tts.audio, "base64");
// Speech-to-Text
const stt = await client.agents.voice.stt("agent-id", {
audio: pcmBuffer.toString("base64"),
audioFormat: "pcm",
language: "en-US",
});
console.log(stt.transcript);
// Live bidirectional voice (WebSocket, Gemini Live)
const token = await client.agents.voice.getToken("agent-id", {
voiceName: "Kore",
language: "en-US",
userId: "user-123",
});
const stream = await client.agents.voice.stream(token);
stream.sendText("Hello!");
// or: stream.sendAudio(pcm16kHzMonoBytes);
for await (const event of stream) {
if (event.type === "input_transcript") console.log("User:", event.text);
if (event.type === "output_transcript") console.log("Agent:", event.text);
if (event.type === "audio") playPCM(event.audio); // 24 kHz PCM
if (event.type === "session_ended") break;
}
stream.close();
Evaluation & simulation
// One-off evaluation
const result = await client.agents.evaluate("agent-id", {
messages: [
{ role: "user", content: "I'm feeling sad today" },
{ role: "assistant", content: "I'm sorry to hear that..." },
],
templateId: "template-uuid",
});
console.log(result.score, result.feedback);
// Streaming simulation
for await (const event of client.agents.simulate("agent-id", {
userPersona: { name: "Alex", background: "College student" },
config: { max_sessions: 3, max_turns_per_session: 10 },
})) {
console.log(`[${event.type}] ${event.message}`);
}
// Fire-and-forget — returns RunRef, reconnect later
const ref = await client.agents.simulateAsync("agent-id", {
userPersona: { name: "Alex" },
config: { max_sessions: 2 },
});
for await (const event of client.evalRuns.streamEvents(ref.run_id, 0)) {
console.log(`[${event.type}] ${event.message}`);
}
// Combined simulation + evaluation
for await (const event of client.agents.runEval("agent-id", {
templateId: "template-uuid",
userPersona: { name: "Alex" },
simulationConfig: { max_sessions: 2 },
})) {
console.log(`[${event.type}] ${event.message}`);
}
// Re-evaluate an existing run with a different template
for await (const event of client.agents.evalOnly("agent-id", {
templateId: "new-template-uuid",
sourceRunId: "existing-run-uuid",
})) {
console.log(`[${event.type}] ${event.message}`);
}
// Templates & runs
await client.evalTemplates.list();
const template = await client.evalTemplates.create({
name: "Empathy Check",
scoringRubric: "...",
categories: [
{ name: "Awareness", weight: 0.5, criteria: "..." },
{ name: "Response", weight: 0.5, criteria: "..." },
],
});
await client.evalTemplates.update(template.id, { name: "v2" });
await client.evalTemplates.delete(template.id);
await client.evalRuns.list({ agentId: "agent-id", limit: 20, offset: 0 });
await client.evalRuns.get("run-id");
await client.evalRuns.delete("run-id");
Webhooks
// Register / update
const resp = await client.webhooks.register("agent.message.created", {
webhookUrl: "https://example.com/hook",
authHeader: "Bearer your-secret", // optional; added to delivery requests
});
console.log("signing secret:", resp.signing_secret);
// Inspect
await client.webhooks.list();
await client.webhooks.listDeliveryAttempts("agent.message.created");
// Rotate & delete
await client.webhooks.rotateSecret("agent.message.created");
await client.webhooks.delete("agent.message.created");
// Project-scoped variants
await client.webhooks.registerForProject("project-id", "agent.created", { webhookUrl });
await client.webhooks.listForProject("project-id");
await client.webhooks.deleteForProject("project-id", "agent.created");
Verify incoming webhooks on your endpoint with HMAC-SHA256 over the raw body using the signing_secret you received at registration:
import { createHmac, timingSafeEqual } from "node:crypto";
function verify(payload: Buffer, header: string, secret: string) {
const expected = createHmac("sha256", secret).update(payload).digest("hex");
const a = Buffer.from(expected, "hex");
const b = Buffer.from(header, "hex");
return a.length === b.length && timingSafeEqual(a, b);
}
Platform models & analytics
const { providers, default_model } = await client.listModels();
await client.analytics.usage({ start: "2026-01-01", end: "2026-03-01" });
await client.analytics.costs({ projectId: "project-id" });
await client.analytics.realtime();
await client.workbench.advanceTime("agent-id", { hours: 24 });
Error handling
import {
Sonzai,
SonzaiError, // base class
AuthenticationError, // 401
PermissionDeniedError, // 403
NotFoundError, // 404
BadRequestError, // 400
RateLimitError, // 429 — has `retryAfter?: number` (ms)
InternalServerError, // 5xx
APIError, // generic, has `statusCode: number`
StreamError, // SSE / WebSocket streaming
} from "@sonzai-labs/agents";
try {
const res = await client.agents.chat({ agent: "agent-id", messages: [...] });
} catch (err) {
if (err instanceof AuthenticationError) {
console.log("Invalid API key");
} else if (err instanceof NotFoundError) {
console.log("Agent not found");
} else if (err instanceof RateLimitError) {
console.log(`Rate limited, retry after ${err.retryAfter}ms`);
} else if (err instanceof SonzaiError) {
console.log(`API error: ${err.message}`);
}
}
Pagination
Two patterns are used depending on the endpoint:
// Cursor-based (agents list)
const page1 = await client.agents.list({ pageSize: 20 });
const page2 = await client.agents.list({ pageSize: 20, cursor: page1.next_cursor });
// Offset-based (most list endpoints)
const runs = await client.evalRuns.list({ agentId: "agent-id", limit: 50, offset: 0 });
const facts = await client.agents.memory.listFacts("agent-id", { limit: 50, offset: 50 });
Types
All request and response types are exported from the root entry point:
import type {
SonzaiConfig,
ChatMessage, ChatOptions, ChatResponse, ChatStreamEvent, ChatUsage,
MemoryNode, AtomicFact, MemoryResponse,
PersonalityProfile, Big5, PersonalityDimensions,
SimulationEvent, EvalTemplate, EvalRun,
// ...and many more
} from "@sonzai-labs/agents";
Most types are regenerated from the committed OpenAPI spec; a few SDK-specific options (SonzaiConfig, ChatOptions, streaming events) are hand-written.
Runtime compatibility
| Runtime | Version | Status |
|---------|---------|--------|
| Node.js | ≥ 18 | Full support |
| Bun | ≥ 1.0 | Full support |
| Deno | ≥ 1.28 | Full support |
The SDK uses only the standard Web API (fetch, ReadableStream, TextDecoder, URL, AbortController) with no runtime-specific dependencies. Package ships both ESM (dist/index.js) and CJS (dist/index.cjs) builds with matching type definitions.
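On an unfamiliar runtime, a quick feature check against the globals named above can confirm compatibility before installing. The helper `missingWebApis` below is a hypothetical illustration and is not exported by the SDK.

```typescript
// The globals the SDK is described as relying on.
const REQUIRED_GLOBALS = [
  "fetch",
  "ReadableStream",
  "TextDecoder",
  "URL",
  "AbortController",
] as const;

// Returns the names that are absent from the given scope (default: globalThis).
function missingWebApis(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): string[] {
  return REQUIRED_GLOBALS.filter((name) => typeof scope[name] === "undefined");
}

const missing = missingWebApis();
if (missing.length > 0) {
  console.warn(`Runtime is missing Web APIs: ${missing.join(", ")}`);
}
```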
Benchmarks
Sonzai leads on three independent benchmarks (LoCoMo, LongMemEval, SOTOPIA), running on the cheap end of the LLM stack — chat, judge, reader, and partner agent all run on Gemini 3.1 Flash Lite. No frontier-model arms race propping up the numbers; the lift is from the memory architecture. Drop in a heavier model and the ceiling goes up from there.
LoCoMo — long-term conversational memory (mem0's home turf)
10 peer-to-peer dialogues, 19–35 sessions each, 1540 QAs across 4 reasoning
categories. Run via mem0's published evaluation pipeline byte-for-byte
(their ANSWER_PROMPT + ACCURACY_PROMPT, dual-perspective ingest, dual
search) so numbers are directly comparable.
| Category | n | Sonzai (J) | mem0 (J, published) |
|---|---:|---:|---:|
| 1. single-hop | 282 | 0.720 | ~0.65 |
| 2. multi-hop | 321 | 0.723 | ~0.55 |
| 3. temporal reasoning | 96 | 0.531 | ~0.55 |
| 4. open-domain | 841 | 0.762 | ~0.71 |
| Overall | 1540 | 0.732 ✅ | ~0.67 |
Multi-hop shows Sonzai's largest margin (about +17 points over mem0). It is the hardest LoCoMo bucket and the one mem0's graph variant typically claims its lift on; Sonzai matches or beats it without graph-specific machinery.
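As a sanity check on the table, the Overall row is consistent with a question-count-weighted mean of the four category scores. That weighting is an assumption about how the overall J score is aggregated; the benchmark README is the authority on the exact method.

```typescript
// Per-category question counts and Sonzai J scores, copied from the table above.
const categories = [
  { name: "single-hop", n: 282, j: 0.72 },
  { name: "multi-hop", n: 321, j: 0.723 },
  { name: "temporal reasoning", n: 96, j: 0.531 },
  { name: "open-domain", n: 841, j: 0.762 },
];

const totalQuestions = categories.reduce((sum, c) => sum + c.n, 0);
const weightedOverall =
  categories.reduce((sum, c) => sum + c.n * c.j, 0) / totalQuestions;

console.log(totalQuestions, weightedOverall.toFixed(3)); // prints: 1540 0.732
```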
LongMemEval — retrieval (MemPalace's home turf)
| Metric | Sonzai | MemPalace (hybrid_v4) |
|---|---:|---:|
| R@G (overall recall) | 0.773 | 0.741 |
| R@1 (top-hit accuracy) | 0.800 | 0.770 |
| Recall@10, multi-session | 1.000 | 1.000 |
SOTOPIA longitudinal — compounding across sessions
Sonzai's USP: agents that compound. Same agent, same partner, N sessions,
advance_time between each. Canonical SOTOPIA scores session 1 only — we
also run it at s10, s20, s30 and add an 8th judge-scored dim
memory_continuity (0..10) grading whether the agent treats the
relationship as continuous with prior sessions.
Head-to-head at session 1 (no accumulated memory, standard SOTOPIA):
| Dimension (session 1) | Sonzai | MemPalace | Δ |
|---|---:|---:|---:|
| Believability (0..10) | 9.00 | 9.00 | tie |
| Relationship (−5..5) | 4.25 | 4.00 | +0.25 |
| Knowledge (0..10) | 7.75 | 6.50 | +1.25 |
| Goal (0..10) | 9.00 | 8.75 | +0.25 |
| Overall | 8.44 | 8.03 | +0.41 ✅ |
Sonzai improves across sessions (same agent, rolling history):
| Dim | s1 | s10 | s20 | s30 | Δ s1→s30 |
|---|---:|---:|---:|---:|---:|
| Believability (0..10) | 9.00 | 9.75 | 9.62 | 10.00 (ceiling) | +1.00 ↑ |
| Relationship (−5..5) | 4.25 | 5.00 | 4.75 | 5.00 (ceiling) | +0.75 ↑ |
| Knowledge (0..10) | 7.75 | 8.50 | 7.75 | 8.50 | +0.75 ↑ |
| Goal (0..10) | 9.00 | 9.75 | 9.50 | 9.75 | +0.75 ↑ |
| memory_continuity (0..10) | 5.00 | 10.00 (ceiling) | 9.75 | 10.00 (ceiling) | +5.00 ↑ |
| Overall | 8.44 | 9.45 | 9.38 | 9.56 | +1.13 ↑ |
Every non-floor dim climbs. Believability and relationship hit the rubric
ceiling by s30; memory_continuity hits the ceiling by s10 — Sonzai's
identity model is producing accurate unprompted callbacks before a
verbatim-retrieval baseline has history to compete.
Full scores, methodology, per-question-type breakdown, and reproduction
steps (including comparison against MemPalace's canonical
longmemeval_bench.py):
→ sonzai-python/benchmarks/README.md
Staying in sync with the production API
This SDK tracks https://api.sonz.ai/docs/openapi.json. A git pre-push hook
checks for drift; npm install / bun install auto-enables it via the
prepare script. To refresh the committed spec snapshot, run
bun run sync-spec (or just sync-spec) and commit the diff.
Development
git clone https://github.com/sonz-ai/sonzai-typescript.git
cd sonzai-typescript
bun install # or: npm install
bun test # or: npx vitest run
npx tsc --noEmit # type check
bun run build # or: npx tsup
License
MIT License — see LICENSE for details.
