@psci-labs/chat-runtime

v0.3.0

Published

3 days ago

Node agent runner for the Claude Agent SDK chat surface — Web-standard Request → Response handler with SSE streaming, MCP plumbing, and a pluggable persistence interface


blaize-at-psci

keenanb

`@psci-labs/chat-runtime`

Node agent runner for the Claude Agent SDK chat surface. Returns a Web-standard Request → Response handler with SSE streaming, MCP plumbing, and a pluggable persistence interface.

pnpm add @psci-labs/chat-runtime @anthropic-ai/claude-agent-sdk

@anthropic-ai/claude-agent-sdk is a peer dependency: pin the version your app wants, and the runtime forwards its calls to that version. @psci-labs/chat-protocol is a direct dependency, so there is no need to install it separately.

Persistence

import { InMemoryPersistence, type PersistenceAdapter } from '@psci-labs/chat-runtime';

const adapter: PersistenceAdapter = new InMemoryPersistence();

The interface (in src/persistence/adapter.ts) is intentionally narrow:

saveCheckpoint({ threadId, sessionId, sdkVersion, state, metadata }) — sdkVersion is required so SDK shape drift fails at the boundary.
loadCheckpoint(threadId) — returns the latest checkpoint for the thread, or null.
appendMessage(threadId, sessionId, message) / listMessages(threadId, sessionId?) — replay history, optionally scoped to a single session.
archiveSession(threadId, sessionId) — lifecycle hook; messages stay readable.
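To make the contract concrete, here is a minimal sketch of an adapter with the five methods listed above. The type names (CheckpointSketch, StoredMessageSketch, PersistenceAdapterSketch, MapPersistence), the field types, and the isArchived helper are assumptions made for illustration; the package's real definitions live in src/persistence/adapter.ts and may differ.

```typescript
// Hypothetical shapes: field types are assumptions, not the package's definitions.
interface CheckpointSketch {
  threadId: string;
  sessionId: string;
  sdkVersion: string; // required, so a checkpoint from another SDK release is detectable
  state: unknown;
  metadata?: Record<string, unknown>;
}

interface StoredMessageSketch {
  sessionId: string;
  message: unknown;
}

interface PersistenceAdapterSketch {
  saveCheckpoint(checkpoint: CheckpointSketch): Promise<void>;
  loadCheckpoint(threadId: string): Promise<CheckpointSketch | null>;
  appendMessage(threadId: string, sessionId: string, message: unknown): Promise<void>;
  listMessages(threadId: string, sessionId?: string): Promise<StoredMessageSketch[]>;
  archiveSession(threadId: string, sessionId: string): Promise<void>;
}

class MapPersistence implements PersistenceAdapterSketch {
  private checkpoints = new Map<string, CheckpointSketch>();
  private messages = new Map<string, StoredMessageSketch[]>();
  private archived = new Set<string>();

  async saveCheckpoint(checkpoint: CheckpointSketch): Promise<void> {
    if (!checkpoint.sdkVersion) throw new Error('sdkVersion is required');
    // Only the latest checkpoint per thread is kept.
    this.checkpoints.set(checkpoint.threadId, checkpoint);
  }

  async loadCheckpoint(threadId: string): Promise<CheckpointSketch | null> {
    return this.checkpoints.get(threadId) ?? null;
  }

  async appendMessage(threadId: string, sessionId: string, message: unknown): Promise<void> {
    const list = this.messages.get(threadId) ?? [];
    list.push({ sessionId, message });
    this.messages.set(threadId, list);
  }

  async listMessages(threadId: string, sessionId?: string): Promise<StoredMessageSketch[]> {
    const list = this.messages.get(threadId) ?? [];
    // Without a sessionId the whole thread is returned; with one, only that session.
    return sessionId === undefined ? [...list] : list.filter((m) => m.sessionId === sessionId);
  }

  async archiveSession(threadId: string, sessionId: string): Promise<void> {
    // Messages are left in place, so they stay readable after archiving.
    this.archived.add(`${threadId}:${sessionId}`);
  }

  // Hypothetical helper, not part of the documented interface.
  isArchived(threadId: string, sessionId: string): boolean {
    return this.archived.has(`${threadId}:${sessionId}`);
  }
}
```

A real adapter would back these maps with durable storage; the method set and the null return for a missing checkpoint are the parts taken from the list above.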

Reusable contract test

Every adapter is tested against the same suite via @psci-labs/chat-runtime/testing:

import { describe } from 'vitest';
import { runPersistenceAdapterContract } from '@psci-labs/chat-runtime/testing';
import { PostgresPersistence } from '../src/index.js';

describe('PostgresPersistence', () => {
  runPersistenceAdapterContract(() => new PostgresPersistence({ ...config }));
});

The factory must return a fresh empty adapter per call; the contract invokes it once per test case so suites stay isolated.

Auth (`getUserContext`)

The runtime is auth-agnostic. Apps inject a callback:

import { resolveUserContext, type GetUserContext } from '@psci-labs/chat-runtime';

const getUserContext: GetUserContext = async (req) => {
  // getServerSession stands in for your app's own session lookup.
  const session = await getServerSession(req);
  return session ? { userId: session.user.id } : null;
};

Returning null is the contract for "unauthenticated" — the runtime answers 401 with { error: 'Unauthorized' }. Throwing from the callback is reserved for genuine bugs in the integration and surfaces as 500.

The shape returned is validated against the UserContext schema from @psci-labs/chat-protocol, so a misshapen context (empty userId, wrong attributes type) fails loudly at the boundary.
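The three outcomes described above (null, a thrown error, a misshapen context) can be sketched as a small boundary function. This is an illustration under stated assumptions, not the runtime's code: authenticate, isValidUserContext, UserContextSketch, and the 500 response bodies are hypothetical, the hand-written check stands in for the real UserContext schema, and mapping a misshapen context to 500 is a guess at what "fails loudly" means.

```typescript
// Hypothetical stand-ins for the package's types.
type UserContextSketch = { userId: string; attributes?: Record<string, unknown> };
type GetUserContextSketch = (req: Request) => Promise<unknown>;

function json(status: number, body: unknown): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { 'content-type': 'application/json' },
  });
}

// Hand-rolled check standing in for the UserContext schema validation.
function isValidUserContext(value: unknown): value is UserContextSketch {
  if (typeof value !== 'object' || value === null) return false;
  const { userId, attributes } = value as { userId?: unknown; attributes?: unknown };
  if (typeof userId !== 'string' || userId.length === 0) return false;
  return (
    attributes === undefined ||
    (typeof attributes === 'object' && attributes !== null && !Array.isArray(attributes))
  );
}

// Returns the validated context, or the Response that should be sent instead.
async function authenticate(
  req: Request,
  getUserContext: GetUserContextSketch,
): Promise<UserContextSketch | Response> {
  let raw: unknown;
  try {
    raw = await getUserContext(req);
  } catch {
    // A throw signals an integration bug, so it surfaces as 500 (body is an assumption).
    return json(500, { error: 'Internal Server Error' });
  }
  // null is the documented signal for "unauthenticated".
  if (raw === null) return json(401, { error: 'Unauthorized' });
  // Assumption: a misshapen context is treated like an integration bug.
  if (!isValidUserContext(raw)) return json(500, { error: 'Invalid user context' });
  return raw;
}
```

Keeping "not logged in" (null) separate from "something broke" (throw) lets a client distinguish a login prompt from an error page.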

System prompt

Three input shapes — passed in via createAgentRunner({ systemPrompt }):

// Plain string — full custom prompt, SDK preset bypassed
systemPrompt: 'You are a billing-workflow specialist.'

// Bare preset — use the SDK's claude_code preset as-is
systemPrompt: { preset: 'claude_code' }

// Preset + per-request append (static or dynamic)
systemPrompt: {
  preset: 'claude_code',
  append: ({ userContext, threadId }) =>
    `User ${userContext.userId} on thread ${threadId}.`,
}

resolveSystemPrompt(input, ctx) translates these into the SDK's expected shape and is awaited per request, so the dynamic callback can do async work (e.g. read the user's tenant config from a DB).
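A rough sketch of that translation step follows. The names resolveSystemPromptSketch, SystemPromptInputSketch, and PromptContextSketch are hypothetical, and the output shape ({ type: 'preset', preset, append }) is an assumption about what the SDK accepts; check it against the SDK version you pin.

```typescript
// Hypothetical types mirroring the three input shapes described above.
type PromptContextSketch = { userContext: { userId: string }; threadId: string };
type AppendInputSketch = string | ((ctx: PromptContextSketch) => string | Promise<string>);
type SystemPromptInputSketch = string | { preset: 'claude_code'; append?: AppendInputSketch };

// Assumed SDK-side shape; verify against the SDK release you pin.
type SdkSystemPromptSketch =
  | string
  | { type: 'preset'; preset: 'claude_code'; append?: string };

async function resolveSystemPromptSketch(
  input: SystemPromptInputSketch,
  ctx: PromptContextSketch,
): Promise<SdkSystemPromptSketch> {
  // Plain string: pass through untouched, no preset involved.
  if (typeof input === 'string') return input;

  const base = { type: 'preset' as const, preset: input.preset };
  // Bare preset: nothing to append.
  if (input.append === undefined) return base;

  // Static append is used as-is; a callback is awaited so it may do async work.
  const append = typeof input.append === 'function' ? await input.append(ctx) : input.append;
  return { ...base, append };
}
```

Because the function is awaited once per request, a callback that reads tenant configuration adds its latency to every request, so caching inside the callback may be worth considering.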

Built-in MCP servers

Two SDK MCP servers ship with the runtime. Both are pluggable — apps inject the side-effect (notification delivery, user-question UI) via callbacks; the servers own the tool name, description, and Zod schema.

import { createNotificationsServer, createInteractionServer } from '@psci-labs/chat-runtime';

const notifications = createNotificationsServer({
  onNotify: async ({ type, title, body }) => {
    await pushNotificationService.send({
      /* ... */
    });
  },
});

const interaction = createInteractionServer({
  askUser: async (questions) => {
    // The session manager wires this up per session; threadId comes from the
    // enclosing session scope. Resolves when the user replies.
    return await sessionManager.askUser(threadId, questions);
  },
});

Renderer keys (used by chat-ui to dispatch to a renderer):

mcp__notifications__notify — type (progress/completed/error/info), title, body
mcp__interaction__ask_user — array of questions; supports freeform / single-select / multi-select / yes-no

For unit tests, the handler is also exported as makeNotifyHandler / makeAskUserHandler so you can exercise the business logic without spinning up the SDK.
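The split between "the server owns the schema" and "the app owns the side effect" might look roughly like the sketch below. It is not the exported makeNotifyHandler: the name makeNotifyHandlerSketch, the hand-written validation (standing in for the Zod schema), and the result text are all assumptions. Only the four notification types and the type/title/body fields come from the renderer key description above.

```typescript
const NOTIFY_TYPES = ['progress', 'completed', 'error', 'info'] as const;
type NotifyTypeSketch = (typeof NOTIFY_TYPES)[number];
type NotifyInputSketch = { type: NotifyTypeSketch; title: string; body: string };

// MCP-style tool result; the exact text returned by the real handler is unknown.
type ToolResultSketch = { content: Array<{ type: 'text'; text: string }>; isError?: boolean };

function makeNotifyHandlerSketch(onNotify: (input: NotifyInputSketch) => Promise<void>) {
  return async (args: unknown): Promise<ToolResultSketch> => {
    const { type, title, body } = (args ?? {}) as Record<string, unknown>;
    const validType = NOTIFY_TYPES.find((t) => t === type);

    // Validation happens before the app callback is ever invoked.
    if (!validType || typeof title !== 'string' || typeof body !== 'string') {
      return { content: [{ type: 'text', text: 'Invalid notification arguments' }], isError: true };
    }

    // The injected callback performs the actual delivery (push, email, and so on).
    await onNotify({ type: validType, title, body });
    return { content: [{ type: 'text', text: `Notification delivered: ${title}` }] };
  };
}
```

A unit test can then pass a recording callback and assert on what it received, without starting the SDK.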

SSE encoder

import { encodeSSE } from '@psci-labs/chat-runtime';

const stream = encodeSSE(eventIterable, { signal: req.signal });
return new Response(stream, {
  headers: { 'content-type': 'text/event-stream' },
});

encodeSSE returns a pull-based ReadableStream, so backpressure is automatic. A custom formatter hook lets you set the SSE event: line when you want typed events. Stream cancellation calls iter.return?.() so the upstream agent can release resources promptly.
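For readers who want to see the mechanics, here is a minimal sketch of a pull-based encoder with a formatter hook and cancellation that calls iter.return?.(). It is not the package's implementation: encodeSSESketch, SSEFrameSketch, and toWire are hypothetical names, and the signal option from the example above is left out for brevity.

```typescript
// Hypothetical frame shape produced by the formatter hook.
type SSEFrameSketch = { event?: string; data: string };

// Serializes one frame to SSE wire format: optional event line, data lines, blank line.
function toWire(frame: SSEFrameSketch): string {
  const eventLine = frame.event ? `event: ${frame.event}\n` : '';
  const dataLines = frame.data
    .split('\n')
    .map((line) => `data: ${line}`)
    .join('\n');
  return `${eventLine}${dataLines}\n\n`;
}

function encodeSSESketch<T>(
  events: AsyncIterable<T>,
  format: (event: T) => SSEFrameSketch = (event) => ({ data: JSON.stringify(event) }),
): ReadableStream<Uint8Array> {
  const iter = events[Symbol.asyncIterator]();
  const encoder = new TextEncoder();

  return new ReadableStream<Uint8Array>({
    // pull runs only when the consumer has room, which is where backpressure comes from.
    async pull(controller) {
      const { done, value } = await iter.next();
      if (done) {
        controller.close();
        return;
      }
      controller.enqueue(encoder.encode(toWire(format(value))));
    },
    // On cancel, give the upstream iterator a chance to run its cleanup.
    async cancel() {
      await iter.return?.();
    },
  });
}
```

Because nothing is read from the iterator until the stream asks for it, a slow client slows the producer instead of filling an unbounded buffer.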

Session manager

The session manager is the runtime's brain. It owns:

The active-sessions map keyed by threadId
A MessagePump<UserPrompt> per session (queues follow-up messages while the agent is busy, drains as the SDK pulls)
Lifecycle: start / continueSession / terminate / clearContext
Persistence checkpointing at session boundaries (sdkVersion always recorded; state carries previous-session IDs and cost/turn metrics)
respondToTool(threadId, answer) to resolve the mcp__interaction__ask_user waiter
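The queue-while-busy behavior of the message pump can be illustrated with a small async-iterable queue. This is a sketch of the general pattern, not the package's MessagePump: the class name MessagePumpSketch and its push/close methods are assumptions.

```typescript
// A minimal async queue: producers push, a single consumer pulls via async iteration.
class MessagePumpSketch<T> implements AsyncIterable<T> {
  private queue: T[] = [];
  private waiters: Array<(result: IteratorResult<T>) => void> = [];
  private closed = false;

  // Called when a follow-up message arrives, whether or not the consumer is ready.
  push(item: T): void {
    if (this.closed) throw new Error('pump is closed');
    const waiter = this.waiters.shift();
    if (waiter) {
      waiter({ value: item, done: false }); // consumer was already waiting
    } else {
      this.queue.push(item); // consumer is busy: hold the message
    }
  }

  // Ends iteration once queued items are drained.
  close(): void {
    this.closed = true;
    for (const waiter of this.waiters.splice(0)) {
      waiter({ value: undefined, done: true });
    }
  }

  [Symbol.asyncIterator](): AsyncIterator<T> {
    return {
      next: async (): Promise<IteratorResult<T>> => {
        if (this.queue.length > 0) {
          return { value: this.queue.shift() as T, done: false };
        }
        if (this.closed) return { value: undefined, done: true };
        // Nothing queued yet: park until the next push or close.
        return new Promise<IteratorResult<T>>((resolve) => this.waiters.push(resolve));
      },
    };
  }
}
```

An instance of such a queue could be handed to the agent as its prompt source, so messages sent mid-turn are delivered in order when the agent next asks for input.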

The SDK call is injected via a runAgent: (opts) => AsyncIterable<SDKEvent> factory, so tests drive the manager with canned event sequences and the Phase 2E runner factory binds the real query() call (with per-session MCP wiring).

import { SessionManager, InMemoryPersistence } from '@psci-labs/chat-runtime';
import { query } from '@anthropic-ai/claude-agent-sdk';

const mgr = new SessionManager({
  persistence: new InMemoryPersistence(),
  sdkVersion: '0.1.77',
  runAgent: ({ prompts, signal, resumeSessionId }) =>
    query({ prompt: prompts, options: { signal, resume: resumeSessionId } }),
});

const events = await mgr.start({ threadId, userMessage: { text: 'Hello' } });
// pipe through encodeSSE → return as Response

HTTP handlers

Four standalone handlers, one per route. Each takes (req, ctx) where ctx carries the resolved userContext, threadId, sessionManager, and persistence. The router (Phase 2E) is responsible for parsing the URL and resolving auth before dispatching.

| Handler | Route | Behavior |
| --- | --- | --- |
| handleStream | POST /threads/:id/stream | Body { message: { text } }. Starts a session, returns text/event-stream. 400 invalid body, 409 session already running. Forwards req.signal so client disconnects abort the agent. |
| handleHistory | GET /threads/:id/history | Optional ?sessionId. Returns { messages: StoredMessage[] }. |
| handleCancel | POST /threads/:id/cancel | Idempotent. terminate(threadId) → 200. |
| handleClear | POST /threads/:id/clear | Idempotent. clearContext(threadId) → 200. The current session is archived; its sessionId is queued for the next session's previousSessionIds. |

Status codes are surfaced explicitly so the UI can react: 409 says "the agent is busy, you might want to cancel"; 400 says "your client sent something I can't parse." Auth-derived 401s come from the router, not the handlers.
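The 400 and 409 paths for the stream route can be sketched as a pure decision function. The names parseStreamBody and decideStream, the error strings, and the order of the two checks are assumptions for illustration; the real handleStream may behave differently in those details.

```typescript
type StreamBodySketch = { message: { text: string } };

// Returns the parsed body, or null when the payload does not match { message: { text } }.
function parseStreamBody(raw: unknown): StreamBodySketch | null {
  if (typeof raw !== 'object' || raw === null) return null;
  const message = (raw as { message?: unknown }).message;
  if (typeof message !== 'object' || message === null) return null;
  const text = (message as { text?: unknown }).text;
  return typeof text === 'string' ? { message: { text } } : null;
}

type StreamDecisionSketch =
  | { status: 400; error: string }
  | { status: 409; error: string }
  | { status: 200; body: StreamBodySketch };

// Assumption: the body is validated before the busy check.
function decideStream(raw: unknown, sessionRunning: boolean): StreamDecisionSketch {
  const body = parseStreamBody(raw);
  if (body === null) return { status: 400, error: 'Invalid body' };
  if (sessionRunning) return { status: 409, error: 'Session already running' };
  return { status: 200, body };
}
```

A client can branch on the status alone: offer a cancel action on 409 and treat 400 as a bug in the request it built.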

Putting it all together: `createAgentRunner`

// app/api/chat/[...path]/route.ts (Next.js App Router)
import { createAgentRunner } from '@psci-labs/chat-runtime';
import { getServerSession } from 'next-auth';

const runner = createAgentRunner({
  apiKey: process.env.ANTHROPIC_API_KEY,
  model: 'claude-sonnet-4-6',

  getUserContext: async (req) => {
    const session = await getServerSession();
    return session ? { userId: session.user.id } : null;
  },

  systemPrompt: { preset: 'claude_code', append: 'Use only SharePoint MCP tools.' },
  allowedTools: ['Read', 'Grep', 'Glob', 'WebSearch'],

  mcpServers: { sharepoint: sharepointMcpServer },

  onNotify: async ({ type, title, body, userContext, threadId }) => {
    await pushService.send(userContext.userId, { type, title, body, threadId });
  },
  onAskUser: async ({ questions, threadId }) => {
    return await ui.deliverAndAwait(threadId, questions);
  },
});

export const { GET, POST } = runner.toNextHandlers();

The factory:

Defaults persistence to InMemoryPersistence if omitted
Pins sdkVersion to the bundled SDK release (overridable via sdkVersion)
Defaults disallowedTools to ['AskUserQuestion'] so the SDK's built-in AskUser stops shadowing the mcp__interaction__ask_user MCP renderer. Pass an explicit empty array (disallowedTools: []) to opt out and allow the SDK's version.
Always registers the interaction MCP server. If you don't supply an onAskUser callback, the runtime supplies a default that resolves via the session manager's pendingToolResponse slot — wired to POST /threads/:id/respond. Apps with custom AskUser delivery (Slack-mediated, server-side automation, etc.) override the default by supplying their own onAskUser.
Builds the per-session notifications MCP server only when onNotify is supplied
Exposes sessionManager and persistence on the returned AgentRunner so apps can read history or call respondToTool outside the HTTP layer
Accepts a runAgent override for tests and the future opencode/pi adapter
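Two of the defaults above hinge on the difference between "omitted" and "explicitly empty", which a short sketch makes concrete. RunnerOptionsSketch, resolveDisallowedTools, and builtInServerNames are hypothetical names; the server names are inferred from the renderer keys mcp__interaction__ask_user and mcp__notifications__notify.

```typescript
// Hypothetical subset of the factory options; the real option type is richer.
type RunnerOptionsSketch = {
  disallowedTools?: string[];
  onNotify?: (input: { title: string; body: string }) => Promise<void>;
};

function resolveDisallowedTools(opts: RunnerOptionsSketch): string[] {
  // undefined means "use the default"; an explicit [] means "block nothing".
  return opts.disallowedTools ?? ['AskUserQuestion'];
}

function builtInServerNames(opts: RunnerOptionsSketch): string[] {
  const names = ['interaction']; // always registered
  if (opts.onNotify) names.push('notifications'); // only when a callback is supplied
  return names;
}
```

Using ?? rather than || matters here: an empty array is a deliberate opt-out and must not fall back to the default.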

runner.handle(req) is also a Web-standard handler — works under any framework that exposes Request/Response (Hono, Fastify with @fastify/web-fetch, plain Node 22+ HTTP server, etc.).
