cardan
v0.6.0
Unified, zero-dependency adapter for major LLM provider APIs
cardan
Unified TypeScript adapter for major LLM provider APIs. Zero runtime dependencies — adapters speak HTTP via native fetch. Runs on Node ≥ 20, Deno, and edge runtimes.
See DESIGN.md for goals, non-goals, and provider tiers.
Status
0.x — API is unstable until raven/ticks migrate onto it. Implemented: core schema + Anthropic, OpenAI (Responses API), Google (Gemini API), xAI, Groq, and Modal (self-deployed, Chat Completions) adapters (generate, streaming, tools, structured output, thinking, vision; OpenAI, Google, and Modal also embeddings).
Usage
import { createCardan } from "cardan";
import { z } from "zod"; // zod 4; used by the structured-output example below
const cardan = createCardan(); // reads ANTHROPIC_API_KEY / OPENAI_API_KEY / GEMINI_API_KEY / XAI_API_KEY / GROQ_API_KEY from env
// generate
const result = await cardan.generate({
  model: "anthropic/claude-opus-4-8",
  messages: [{ role: "user", content: [{ type: "text", text: "Hello" }] }],
});
console.log(result.message, result.usage, result.finishReason);
// streaming (`messages` here and below stands for a message array like the one passed to generate above)
for await (const event of cardan.stream({ model: "anthropic/claude-opus-4-8", messages })) {
  if (event.type === "text_delta") process.stdout.write(event.text);
}
// tools (parameters accept plain JSON Schema or a zod 4 schema)
await cardan.generate({
  model: "anthropic/claude-opus-4-8",
  messages,
  tools: [{ name: "get_weather", description: "…", parameters: { type: "object", properties: { city: { type: "string" } } } }],
});
// structured output: zod schemas are converted and the result validated via .parse()
const extracted = await cardan.generate({
  model: "anthropic/claude-opus-4-8",
  messages,
  output: { schema: z.object({ name: z.string() }) },
});
console.log(extracted.output);
// built-in web search: the provider runs the searches server-side and the
// answer comes back with citations. `true` for defaults, or an options object.
const searched = await cardan.generate({
  model: "anthropic/claude-opus-4-8",
  messages,
  webSearch: { maxUses: 5, allowedDomains: ["arxiv.org"] },
});
console.log(searched.message, searched.citations); // [{ url, title?, snippet? }, …]
Per-provider use without the provider/ prefix:
import { AnthropicProvider } from "cardan";
const anthropic = new AnthropicProvider({ apiKey: "sk-…" });
await anthropic.generate({ model: "claude-opus-4-8", messages });
collectStream(stream) accumulates a stream into a GenerateResult-shaped value (message + finishReason + usage); collectStreamToMessage(stream) returns just the assistant Message, ready to push back into the next request. See Reasoning / thinking state.
Conversation
cardan.conversation(options) returns a stateful Conversation that holds a running transcript and collapses the "push user → generate → push assistant" dance into one ask. Every generation option (model, reasoning, tools, …) is a default carried on defaults, overridable per turn and reassignable mid-conversation — model is not privileged.
import { defineTool, type Infer } from "cardan";
import { z } from "zod";
const c = cardan.conversation({
  model: "anthropic/claude-opus-4-8",
  system: "You are a research assistant.",
  reasoning: { effort: "high" }, // a default for every turn
  label: "research", // tag for onCall telemetry
  onCall: (i) => console.log(`${i.tag} ${i.model} ${i.ms}ms ${i.usage.output.total}tok ${i.finishReason ?? i.error}`),
});
// Tools: defineTool infers the handler args from the schema (no casts).
const search = defineTool(
  { name: "web_search", description: "Search the web.", parameters: z.object({ query: z.string() }) },
  ({ query }) => runSearch(query), // query: string
);
// With tools, ask loops model↔tools until it stops. `compact` then rewrites the
// round-trips so their bulky raw outputs aren't replayed on later turns; the
// default keeps the tool-use trace but blanks the result bodies.
await c.ask("Research the topic and conclude.", { tools: [search], compact: true, step: "research" });
// Structured output is just an option on ask: pass `output.schema` and read the
// parsed (zod-validated) value off the result; type it yourself if you want.
const res = await c.ask("Emit the final report as JSON.", { output: { schema: reportSchema } });
const report = res.output as Infer<typeof reportSchema>;
c.defaults.model = "openai/gpt-5.4"; // switch model for all later turns
ask adds a user turn, generates, appends the reply, and returns the GenerateResult; with tools it loops (maxRounds caps the loop, forcing a tool-free conclusion on the last round). Structured output stays a plain option (output.schema): res.output holds the parsed value, and you can cast it with the exported Infer<typeof schema> helper if you want the static type. output and tools can't be combined in one ask (structured output is constrained decoding, which blocks tool calls, so ask throws); run the tool loop first, then ask again with output, as the examples do. cardan logs nothing itself: pass onCall to receive per-call telemetry (tag, model, ms, usage, citations, finishReason on success / error on failure) and format it however you like.
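The tool loop inside ask can be pictured with a minimal, self-contained sketch. It does not use cardan; LoopModel, LoopTools, and askLoopSketch are hypothetical names, the transcript is reduced to strings, and the way maxRounds is counted here (tools withheld on the last round) is an assumption about the behavior described above, not a statement of the library's implementation.

```typescript
// Hypothetical stand-ins: a "model" is any function from transcript to reply.
type LoopToolCall = { name: string; args: string };
type LoopReply = { text: string; toolCalls: LoopToolCall[] };
type LoopModel = (transcript: readonly string[], toolsAllowed: boolean) => LoopReply;
type LoopTools = Record<string, (args: string) => string>;

function askLoopSketch(
  model: LoopModel,
  tools: LoopTools,
  prompt: string,
  maxRounds: number,
): { text: string; rounds: number; transcript: string[] } {
  const transcript = [`user: ${prompt}`];
  for (let round = 1; ; round++) {
    // Assumption: on the final round no tools are offered, forcing a conclusion.
    const toolsAllowed = round < maxRounds;
    const reply = model(transcript, toolsAllowed);
    transcript.push(`assistant: ${reply.text}`);
    if (!toolsAllowed || reply.toolCalls.length === 0) {
      return { text: reply.text, rounds: round, transcript };
    }
    // Run every requested tool and append its result before the next round.
    for (const call of reply.toolCalls) {
      const tool = tools[call.name];
      transcript.push(`tool ${call.name}: ${tool ? tool(call.args) : "error: unknown tool"}`);
    }
  }
}
```

A model that keeps requesting tools is cut off at maxRounds, while a model that answers directly returns after one round.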
compact keeps a tool-using turn from bloating later context. The default compactor (redactToolResults) keeps the tool-call/result structure — so the model still sees it reached the conclusion by using tools and won't mistake it for innate knowledge — but replaces each result's payload with a short placeholder (this also keeps raw page content from tripping provider filters on replay). Pass your own Compactor ((region: Message[]) => Message[]) to customize, e.g. the built-in dropToolRounds (keep only the conclusion) or an LLM-written summary.
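As a rough illustration of what a custom Compactor might look like, the sketch below redacts tool-result payloads while keeping the call/result structure. CompactPart and CompactMessage are simplified, assumed shapes rather than cardan's real Message type, and the placeholder string is invented for the example.

```typescript
// Simplified stand-ins for cardan's Message and part types (assumed shapes).
type CompactPart =
  | { type: "text"; text: string }
  | { type: "tool_call"; id: string; name: string }
  | { type: "tool_result"; id: string; content: string };
type CompactMessage = { role: "user" | "assistant" | "tool"; content: CompactPart[] };
type CompactorSketch = (region: CompactMessage[]) => CompactMessage[];

// Keep every tool_call/tool_result pair, but swap each result payload for a
// short placeholder. Returns new objects; the input region is not mutated.
const redactSketch: CompactorSketch = (region) =>
  region.map((m) => ({
    ...m,
    content: m.content.map((p) =>
      p.type === "tool_result" ? { ...p, content: "[tool result omitted]" } : p,
    ),
  }));
```

Because the function matches the (region) => region shape, a variant that summarizes or drops rounds entirely would slot in the same way.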
fork(overrides?) branches a conversation: the copy shares the client/defaults but gets an independent transcript, so diverging turns never touch the original — use it before fanning work out in parallel (a shared mutable transcript would corrupt).
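The isolation that fork provides can be sketched with a toy class. ForkConversation and ForkTurn are hypothetical; the real Conversation also carries a client and many more defaults, and whether defaults are shared by reference or copied is not stated above, so this sketch simply copies them.

```typescript
type ForkTurn = { role: "user" | "assistant"; text: string };
type ForkDefaults = { model: string };

class ForkConversation {
  readonly defaults: ForkDefaults;
  private readonly transcript: ForkTurn[];

  constructor(defaults: ForkDefaults, transcript: ForkTurn[] = []) {
    this.defaults = defaults;
    this.transcript = transcript;
  }

  push(turn: ForkTurn): void {
    this.transcript.push(turn);
  }

  get length(): number {
    return this.transcript.length;
  }

  // The branch starts from a copy of the transcript, so later turns on either
  // side never show up on the other. Overrides layer on top of the defaults.
  fork(overrides: Partial<ForkDefaults> = {}): ForkConversation {
    return new ForkConversation({ ...this.defaults, ...overrides }, [...this.transcript]);
  }
}
```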
Flow
createFlow<State>()(config?) is a tiny runner for multi-step work over a shared typed state. There is no graph to declare — a step is just an async function, and it decides what runs next by returning goto(nextStep, patch?). Branching, loops, and fan-out live in ordinary code (if, recursion, an array of steps); a step that returns without a goto ends its branch. No edge table, no END sentinel.
A full pipeline — screen → investigate → report — showing a self-loop, intra-step parallelism, structured output, and ending by returning without a goto:
import { createFlow, goto, parallel, withConversations } from "cardan";
import type { ConversationContext, FlowEvent, Step } from "cardan";
import { z } from "zod";
type State = {
  candidates: Event[];
  picked: string[];
  reports: Record<string, Report>;
  attempt: number;
};
const flow = createFlow<State>()({
  maxSteps: 25, // cap supersteps (loop guard)
  reducers: { reports: (a, b) => ({ ...a, ...b }) }, // merge concurrent writes to `reports`
  extendCtx: withConversations(cardan), // every step gets ctx.conversation(...)
});
// 1) Screen candidates. If none are picked, loop back to retry (bounded by `attempt`).
const screen: Step<State, ConversationContext> = async (s, ctx) => {
  const c = ctx.conversation({ model: "anthropic/claude-opus-4-8", label: "screen" });
  const res = await c.ask(`Pick the noteworthy events:\n${format(s.candidates)}`, {
    output: { schema: z.object({ ids: z.array(z.string()) }) },
  });
  const picked = (res.output as { ids: string[] }).ids;
  if (picked.length === 0 && s.attempt < 2) {
    return goto(screen, { attempt: s.attempt + 1 }); // self-loop = retry
  }
  return goto(investigate, { picked }); // route on to the next step
};
// 2) Investigate each picked item concurrently: fan out *inside* one step with
// parallel(). Each item gets its own conversation, so transcripts don't collide;
// every ctx.conversation call surfaces as an `llm` flow event tagged "inv <id>".
const investigate: Step<State, ConversationContext> = async (s, ctx) => {
  const entries = await parallel(s.picked, async (id, _i, signal) => {
    const c = ctx.conversation({ model: "anthropic/claude-opus-4-8", label: `inv ${id}` });
    await c.ask(`Research ${id}.`, { tools: [search], compact: true, signal });
    const res = await c.ask("Emit the report.", { output: { schema: reportSchema }, signal });
    return [id, res.output as Report] as const;
  }, { concurrency: 4, signal: ctx.signal }); // ≤ 4 in flight; signal threads into each ask
  return goto(report, { reports: Object.fromEntries(entries) });
};
// 3) Publish and stop: returning without a goto ends the flow.
const report: Step<State> = async (s) => {
  await publish(s.reports);
};
const onEvent = (e: FlowEvent) => console.log(`${e.type} ${e.name ?? ""}`);
const final = await flow.run(screen, { candidates, picked: [], reports: {}, attempt: 0 }, { onEvent });
Routing lives in the step: return goto(next, patch?) to continue, goto([a, b], patch?) to fan out (those steps run in parallel in the next superstep), or a plain patch / nothing to stop. Loops are a step that gotos itself; conditional routing is an if/switch choosing the next step. Execution is by superstep: steps scheduled together run concurrently on the same state snapshot, their patches merge at the barrier (a reducer is required for any key two steps write at once, else it throws), then the next set runs. maxSteps caps the superstep count (the loop guard); concurrency caps in-flight tasks inside one parallel() call.
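The barrier merge described above can be sketched in isolation. mergeAtBarrier and the Barrier* types are hypothetical and not cardan's implementation; in particular, this sketch only applies a reducer between concurrent writes, and whether the library also reduces against the snapshot value is not stated here.

```typescript
type BarrierState = Record<string, unknown>;
type BarrierReducers = Record<string, (a: unknown, b: unknown) => unknown>;

// Merge the patches produced in one superstep into the snapshot they all started from.
function mergeAtBarrier(
  snapshot: BarrierState,
  patches: BarrierState[],
  reducers: BarrierReducers = {},
): BarrierState {
  const next: BarrierState = { ...snapshot };
  const written = new Set<string>();
  for (const patch of patches) {
    for (const [key, value] of Object.entries(patch)) {
      if (!written.has(key)) {
        next[key] = value; // first write in this superstep: plain overwrite
      } else {
        const reduce = reducers[key];
        if (!reduce) throw new Error(`concurrent writes to "${key}" need a reducer`);
        next[key] = reduce(next[key], value); // later writes go through the reducer
      }
      written.add(key);
    }
  }
  return next;
}
```

With a reducer such as (a, b) => ({ ...a, ...b }) on a key, two steps can each contribute entries to it in the same superstep; without one, the second write fails loudly instead of silently clobbering the first.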
Step-level fan-out is for branches with different logic that converge on a join — steps that goto the same function run once on the merged state:
const plan: Step<S> = () => goto([searchWeb, searchDocs]); // fan out
const searchWeb: Step<S> = async () => goto(synthesize, { web: await web() });
const searchDocs: Step<S> = async () => goto(synthesize, { docs: await docs() });
const synthesize: Step<S> = (s) => ({ answer: combine(s.web, s.docs) }); // joins, runs once
Parallel, two ways: do the same work over N items inside a step with parallel() (the common case: concurrency-limited and order-preserving); use step-level fan-out only when branches have different downstream routing. onEvent receives step:start / step:end / error, plus llm events for every call made through ctx.conversation.
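To make "concurrency-limited, order-preserving" concrete, here is a minimal sketch of such a map. parallelSketch is a hypothetical helper, not cardan's parallel(): it takes the limit as a plain number and omits the AbortSignal that the real call threads through.

```typescript
async function parallelSketch<T, R>(
  items: readonly T[],
  worker: (item: T, index: number) => Promise<R>,
  concurrency: number,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let nextIndex = 0;
  // Each runner claims the next unclaimed index until the list is exhausted.
  const runner = async (): Promise<void> => {
    while (nextIndex < items.length) {
      const i = nextIndex++;
      results[i] = await worker(items[i] as T, i); // write by index, so order is kept
    }
  };
  const runnerCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

Results land at the index of their input, so a slow first item does not reorder the output even when later items finish sooner.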
Behavior notes
- Message normalization (applied before every request): consecutive same-role messages merge; tool_result parts are relocated into a tool message directly after their tool_call, in call order; a dangling tool_call (no result) gets a synthesized error result (isError: true) so aborted conversations stay replayable; an orphan or duplicate tool_result throws invalid_request.
- System messages: leading system messages hoist to the provider's top-level system field (Anthropic system, Gemini systemInstruction); mid-conversation system messages downgrade to user text. OpenAI's Responses API accepts system anywhere in input, so system messages pass through in place.
- Anthropic oauth: pass { oauth: { credentials: { accessToken, refreshToken?, expiresAt? }, onRefresh? } } to authenticate with a Claude.ai OAuth token instead of apiKey. Sends Bearer auth, refreshes before expiry (persist the rotated token via onRefresh), and retries once on 401.
- OpenAI is stateless by default: every request sends store: false + include: ["reasoning.encrypted_content"]; context is replayed from messages and reasoning items survive multi-turn tool use via encrypted_content (held in ThinkingPart.signature with the item id in ThinkingPart.id). Override via providerOptions. The Responses API has no stop-sequence parameter, so stopSequences is ignored there.
- Background mode (background?: boolean, OpenAI/xAI Responses only): keeps long high-effort generations from dropping on idle-connection timeouts by decoupling execution from the HTTP connection. undefined (default) auto-enables it for high/xhigh/max reasoning effort; true/false force it. It forces store: true (so it's not ZDR-compatible; data is retained ~10 min): generate creates the response then polls GET /v1/responses/{id} to completion, and stream transparently resumes a dropped SSE via starting_after. Other providers ignore the flag (use streaming for long requests there). The total time is bounded by your signal.
- xAI speaks the same Responses API (its Chat Completions endpoint is documented as legacy), so the adapter subclasses the OpenAI one and inherits the stateless defaults and background mode above. Differences: reasoning.effort caps at high (grok-4.3+ only; omit reasoning for older models), no summary parameter is sent (xAI always returns detailed reasoning summaries), grok models keep temperature/top_p, and there is no embeddings API.
- Groq speaks the Chat Completions API (/openai/v1/chat/completions); Groq's Responses API is beta and rejects the store/include parameters the stateless OpenAI adapter depends on. Reasoning models (gpt-oss, qwen3) always get reasoning_format: "parsed", so thinking arrives in message.reasoning → thinking parts (no signature; never replayed). reasoning.effort → reasoning_effort: gpt-oss grades low/medium/high (xhigh/max cap to high), qwen3 only accepts none/default so graded efforts are omitted; enabled: false → "none" (qwen3 only; gpt-oss cannot disable reasoning). Omit reasoning for non-reasoning models. Structured output sends strict: true (constrained decoding) on gpt-oss and best-effort mode elsewhere; models without json_schema support (llama-3.x) reject it. Prompt caching is automatic (cache_read in usage details); oversized prompts (413) map to context_length; no embeddings API.
- Modal is for self-deployed models behind Modal web endpoints (vLLM/SGLang), which speak the Chat Completions API. baseUrl is required (per-deployment *.modal.run URL; or MODAL_BASE_URL). Auth is optional and dual-track: apiKey → Authorization: Bearer (vLLM/SGLang --api-key; MODAL_API_KEY) and/or proxyAuth → Modal-Key/Modal-Secret headers (Modal Proxy Auth Tokens; MODAL_KEY/MODAL_SECRET). reasoning_content in responses maps to thinking parts; thinking is never replayed (Chat Completions has no replay format). reasoning.effort → reasoning_effort (caps at high; unsupported servers reject it, so omit reasoning then), reasoning.enabled is ignored (use providerOptions, e.g. vLLM chat_template_kwargs). Sends max_tokens for compatibility; embed hits /v1/embeddings if the deployment serves an embedding model.
- Web search (webSearch: boolean | WebSearchOptions): a first-class option, not a Tool. It is a server-side tool, so the provider runs the searches and returns a finished answer with citations ({ url, title?, snippet? }[], also on the finish stream event). WebSearchOptions (maxUses, allowedDomains, blockedDomains, userLocation, contextSize) is the cross-provider subset; each adapter maps what it supports and ignores the rest (provider-specific knobs go through providerOptions). Routing: Anthropic/OpenAI/xAI server tools, Gemini Google Search grounding, Groq built-in browser_search (gpt-oss; incompatible with structured output) or automatic compound search. Requesting it on a model that can't do web search throws invalid_request (Modal never can). Anthropic's pause_turn (server tool-loop limit) is resumed transparently, so a single call still returns a finished turn. Citations are a normalized source list (the cross-provider common denominator); provider-specific inline-span data stays in raw.
- Usage: input.total includes cached tokens; breakdown in details (cache_read, cache_write, reasoning, and the web-search request count under web_search_requests, which is a billed request tally, not tokens).
- Retry: 429/529/5xx/network errors retry with exponential backoff (default 2 retries), honoring Retry-After (and Gemini's RetryInfo.retryDelay). Disable with retry: false. Streams only retry before the first byte.
- Capability table: models that reject sampling params (Fable 5 / Mythos 5 / Opus 4.7+, OpenAI o-series / non-chat gpt-5*) have temperature/topP dropped silently; Gemini 3 maps reasoning.effort to thinkingLevel, Gemini 2.x to thinkingBudget.
- reasoning: { enabled: true } → Anthropic adaptive thinking / Gemini includeThoughts / OpenAI reasoning.summary: "auto"; effort → Anthropic output_config.effort / Gemini thinking level or budget / OpenAI reasoning.effort (max caps to xhigh; enabled: false → effort: "none", gpt-5.1+ only). Use providerOptions for anything provider-specific (e.g. beta headers go in provider headers).
- Thinking parts: replayed with their signature; unsigned thinking parts are dropped on send; redacted: true maps to Anthropic redacted_thinking.
- Tool call ids: provider-assigned ids are preserved verbatim. Gemini 2.x omits function-call ids, so the adapter synthesizes cardan_call_… ids for pairing and strips them on replay; Gemini thoughtSignature values ride on signature of text/thinking/tool_call parts and are required for Gemini 3 function-calling replay (preserved identically in streaming and non-streaming; see below).
- Gemini files: image/file input supports inline bytes (inlineData) and URL → fileData.fileUri passthrough (Files API URIs); cardan does not wrap the File API itself. embed uses batchEmbedContents, which returns no usage metadata.
- Errors: all failures are CardanError with code (auth/rate_limit/overloaded/context_length/invalid_request/not_found/server/network/aborted/unknown), status, retryable, and the raw provider body in raw.
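The Retry note above can be illustrated with a small delay calculation. This is a sketch only: SKETCH_BASE_MS is an invented base delay (the actual base and any jitter are not documented here), and isRetryableStatus, retryAfterMs, and retryDelayMs are hypothetical helper names. Retry-After is parsed as either delta-seconds or an HTTP date, the two forms the header allows.

```typescript
// Assumed value for illustration only; not cardan's documented default.
const SKETCH_BASE_MS = 500;

// 429, 529, and the rest of the 5xx range are treated as retryable.
function isRetryableStatus(status: number): boolean {
  return status === 429 || status === 529 || (status >= 500 && status <= 599);
}

// Retry-After is either a number of seconds or an HTTP date.
function retryAfterMs(header: string | null, now: number): number | undefined {
  if (header === null) return undefined;
  const seconds = Number(header);
  if (header.trim() !== "" && Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const at = Date.parse(header);
  return Number.isNaN(at) ? undefined : Math.max(0, at - now);
}

// Prefer the server's hint; otherwise back off exponentially (attempt is 0-based).
function retryDelayMs(attempt: number, header: string | null, now: number = Date.now()): number {
  return retryAfterMs(header, now) ?? SKETCH_BASE_MS * 2 ** attempt;
}
```

With the default of 2 retries and this assumed base, a request with no Retry-After header would wait 500 ms and then 1000 ms before giving up.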
Reasoning / thinking state
Providers return opaque reasoning state that must be replayed verbatim for multi-turn / tool-use loops to keep working. cardan normalizes it onto ThinkingPart/TextPart/ToolCallPart and replays it to the same provider:
- Anthropic: thinking blocks carry signature; redacted_thinking carries opaque data (mapped to signature with redacted: true). Both are replayed unchanged and in order; unsigned thinking is dropped on send.
- OpenAI / xAI: stateless by default (store: false + include: ["reasoning.encrypted_content"]). The encrypted reasoning item is held in ThinkingPart.signature with its item id in ThinkingPart.id; both are required to replay, so summary-only thinking (no encrypted_content) is dropped. For server-side state instead, pass previous_response_id via providerOptions.
- Gemini: every Part (text, thought, or functionCall) may carry a thoughtSignature; it rides on signature and is sent back on the original Part. Signed Parts are never merged with each other or with unsigned Parts. Function-call ids are preserved and echoed in the matching functionResponse.
Streaming and non-streaming preserve the same replay-critical state. Signatures, encrypted reasoning content, ids, and tool-call signatures all survive collection identically.
Use collectStream(stream) / collectStreamToMessage(stream) to capture a streamed turn — they reassemble the parts (including signatures) correctly. If you consume stream events yourself, you must retain the signature field on text_delta/thinking_delta deltas, thinking_signature events, and tool_call event signatures; dropping them loses reasoning state and breaks the next turn. Push the collected Message back into messages as-is — don't reduce a tool-use turn to its text.
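If you do consume events by hand, the bookkeeping looks roughly like the sketch below. The event and part shapes are simplified assumptions modeled on the names mentioned above (text_delta, thinking_delta, thinking_signature, tool_call), not cardan's exact types, and assembleParts is a hypothetical helper; the assumption that a thinking_signature event applies to the thinking part currently being built is also mine.

```typescript
type StreamDeltaPart = { type: "text" | "thinking"; text: string; signature?: string };
type StreamToolCallPart = { type: "tool_call"; id: string; name: string; signature?: string };
type StreamOutPart = StreamDeltaPart | StreamToolCallPart;

type StreamSketchEvent =
  | { type: "text_delta" | "thinking_delta"; text: string; signature?: string }
  | { type: "thinking_signature"; signature: string }
  | { type: "tool_call"; id: string; name: string; signature?: string };

function assembleParts(events: readonly StreamSketchEvent[]): StreamOutPart[] {
  const parts: StreamOutPart[] = [];
  for (const ev of events) {
    const last = parts[parts.length - 1];
    if (ev.type === "thinking_signature") {
      // Assumed semantics: the signature belongs to the thinking part being built.
      if (last !== undefined && last.type === "thinking") last.signature = ev.signature;
    } else if (ev.type === "tool_call") {
      const part: StreamToolCallPart = { type: "tool_call", id: ev.id, name: ev.name };
      if (ev.signature !== undefined) part.signature = ev.signature; // keep it for replay
      parts.push(part);
    } else {
      const kind = ev.type === "text_delta" ? "text" : "thinking";
      if (last !== undefined && last.type !== "tool_call" && last.type === kind) {
        last.text += ev.text; // extend the part currently being built
        if (ev.signature !== undefined) last.signature = ev.signature;
      } else {
        const part: StreamDeltaPart = { type: kind, text: ev.text };
        if (ev.signature !== undefined) part.signature = ev.signature;
        parts.push(part);
      }
    }
  }
  return parts;
}
```

The point of the sketch is that every branch carries signature forward; a collector that only concatenates text would produce a message that cannot be replayed.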
Opaque state is provider-specific: replay a reasoning-bearing turn to the same provider that produced it. cardan does not strip foreign signatures, so feeding one provider's thinking/reasoning parts to another will send invalid opaque state — start a fresh turn (or drop the thinking parts) when switching providers mid-conversation.
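One way to drop the thinking parts before handing a transcript to a different provider is sketched below. ReplayPart and ReplayMessage are simplified, assumed shapes, and dropThinkingParts is a hypothetical helper rather than a cardan export. Note that it leaves any signature on text or tool_call parts in place; whether those also need clearing depends on the providers involved.

```typescript
type ReplayPart =
  | { type: "text"; text: string; signature?: string }
  | { type: "thinking"; text: string; signature?: string }
  | { type: "tool_call"; id: string; name: string; signature?: string };
type ReplayMessage = { role: "user" | "assistant"; content: ReplayPart[] };

// Return a copy of the transcript with every thinking part removed.
// The input messages are left untouched.
function dropThinkingParts(messages: readonly ReplayMessage[]): ReplayMessage[] {
  return messages.map((m) => ({
    role: m.role,
    content: m.content.filter((p) => p.type !== "thinking"),
  }));
}
```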
Development
npm install
npm run typecheck
npm test # fixture unit tests (no network)
npm run build