structured-llm

v0.3.1

Provider-agnostic TypeScript library for Zod-validated, fully-typed structured output from any LLM

npm install structured-llm zod

import OpenAI from "openai";
import { generate } from "structured-llm";
import { z } from "zod";

const openai = new OpenAI(); // reads OPENAI_API_KEY from env

const { data } = await generate({
  client: openai,       // pass your existing OpenAI / Anthropic / Gemini / Mistral client
  model: "gpt-4o-mini",
  schema: z.object({
    sentiment: z.enum(["positive", "negative", "neutral"]),
    score: z.number().min(0).max(1),
    tags: z.array(z.string()),
  }),
  prompt: "Analyze: The new MacBook completely changed how I work.",
});

console.log(data.sentiment); // "positive"
console.log(data.score);     // 0.94
console.log(data.tags);      // ["productivity", "hardware", "apple"]
// fully typed — no casting, no guessing

Why another structured output library?

You have a few options today:

| | structured-llm | Vercel AI SDK | instructor-js |
|---|---|---|---|
| Bring your own client | yes | no (their SDK) | partial |
| Zero runtime dependencies | yes | no | no |
| 14 providers | yes | yes | OpenAI only |
| Streaming partial objects | yes | yes | no |
| Fallback chain | yes | no | no |
| Retry with error feedback | yes | basic | yes |
| Standard Schema (Valibot, ArkType) | yes | no | no |
| Custom schema (no Zod) | yes | no | no |
| Works with local Ollama | yes | limited | no |
| AWS Bedrock | yes | no | no |

structured-llm has one job: take any LLM client you already have, take a Zod schema you already wrote, give back a typed object. No ecosystem lock-in.

Installation

npm install structured-llm zod
# or
pnpm add structured-llm zod
# or
yarn add structured-llm zod

Install only the provider SDKs you actually use:

npm install openai                       # OpenAI, Groq, xAI, Together, Fireworks, Ollama, Azure
npm install @anthropic-ai/sdk            # Anthropic
npm install @google/genai                # Gemini
npm install @mistralai/mistralai         # Mistral
npm install cohere-ai                    # Cohere
npm install @aws-sdk/client-bedrock-runtime  # AWS Bedrock

Requires: Node.js 18+, TypeScript 5+ (strict mode recommended)

Core functions

`generate`

Extracts a single structured object from the LLM.

import OpenAI from "openai";
import { z } from "zod";
import { generate } from "structured-llm";

const openai = new OpenAI(); // reads OPENAI_API_KEY from env

const InvoiceSchema = z.object({
  vendor: z.string(),
  amount: z.number(),
  currency: z.string().length(3),
  dueDate: z.string().describe("ISO 8601 date"),
  lineItems: z.array(z.object({
    description: z.string(),
    quantity: z.number(),
    unitPrice: z.number(),
  })),
  isPaid: z.boolean(),
});

const { data, usage } = await generate({
  client: openai,
  model: "gpt-4o-mini",
  schema: InvoiceSchema,
  prompt: invoiceText,
  systemPrompt: "You are a precise invoice parser.",
  temperature: 0,
  maxRetries: 3,
  trackUsage: true,
});

// data is fully typed as z.infer<typeof InvoiceSchema>
console.log(data.vendor);        // "Acme Corp"
console.log(data.lineItems[0]);  // { description: "...", quantity: 2, unitPrice: 49.99 }
console.log(usage?.estimatedCostUsd); // 0.000043

All options:

generate({
  // Provider — one of these two forms
  client: openai,              // pass an existing client (auto-detected)
  // OR
  provider: "openai",          // reads API key from env (OPENAI_API_KEY)
  apiKey: "sk-...",            // or pass the key directly
  baseURL: "...",              // optional custom endpoint

  model: "gpt-4o-mini",        // required
  schema: MyZodSchema,         // required — Zod, Standard Schema, or custom schema

  // Input — use prompt, messages, or both
  prompt: "...",
  messages: [
    { role: "system", content: "..." },
    { role: "user", content: "..." },
  ],
  systemPrompt: "...",         // shorthand for a system message

  // Extraction
  mode: "auto",                // "auto" | "tool-calling" | "json-mode" | "prompt-inject"

  // Retry
  maxRetries: 3,
  retryOptions: {
    strategy: "exponential",   // "immediate" (default) | "linear" | "exponential"
    baseDelayMs: 500,
  },

  // Generation params
  temperature: 0,
  maxTokens: 1000,
  topP: 1,                     // nucleus sampling
  seed: 42,                    // reproducible outputs (where supported)

  // Cancellation
  signal: abortController.signal,

  // Observability
  trackUsage: false,
  hooks: { ... },

  // Fallback
  fallbackChain: [ ... ],
});

`generateArray`

Extracts a list of items. Pass the schema for a single item, get back an array.

import { generateArray } from "structured-llm";
import { z } from "zod";

const TransactionSchema = z.object({
  date: z.string(),
  merchant: z.string(),
  amount: z.number(),
  category: z.enum(["food", "transport", "shopping", "utilities", "other"]),
});

const { data } = await generateArray({
  client: openai,
  model: "gpt-4o-mini",
  schema: TransactionSchema,     // schema for ONE transaction
  prompt: bankStatementText,
  minItems: 1,                   // hint to the LLM
  maxItems: 100,
});

// data is Transaction[]
const total = data.reduce((sum, t) => sum + t.amount, 0);
console.log(`${data.length} transactions, total: $${total.toFixed(2)}`);

`generateStream`

Streams the response, yielding partial objects as fields come in. Useful for long outputs or real-time UIs.

import { generateStream } from "structured-llm";
import { z } from "zod";

const ReportSchema = z.object({
  title: z.string(),
  executiveSummary: z.string(),
  sections: z.array(z.object({
    heading: z.string(),
    content: z.string(),
    keyPoints: z.array(z.string()),
  })),
  conclusion: z.string(),
  riskLevel: z.enum(["low", "medium", "high"]),
});

const stream = generateStream({
  client: openai,
  model: "gpt-4o",
  schema: ReportSchema,
  prompt: "Write a comprehensive market analysis for the EV industry in 2025.",
  signal: request.signal,   // cancel when the HTTP request is aborted
});

// Iterate over partial updates
for await (const event of stream) {
  if (event.isDone) {
    console.log("Complete:", event.partial.title);
    console.log("Sections:", event.partial.sections?.length);
  } else {
    // Partial report (fields may still be missing): render what you have so far
    process.stdout.write(".");
  }
}

// Or just await the final validated result
const { data } = await stream.result;

Automatically retries rate-limit and transient server errors (429, 502, 503, 529) with exponential backoff, rolling back any partial events from the failed attempt before retrying.

`generateArrayStream`

Stream array items as they complete. Each event contains the cumulative list of fully-parsed items so far.

import { generateArrayStream } from "structured-llm";
import { z } from "zod";

const stream = generateArrayStream({
  client: openai,
  model: "gpt-4o",
  schema: z.object({ name: z.string(), price: z.number(), category: z.string() }),
  prompt: "List 20 top-selling electronics products for 2025",
});

for await (const { items, isDone } of stream) {
  console.log(`${items.length} items loaded...`);
  if (isDone) renderFinalList(items);
}

// Or await the complete result directly
const { data } = await stream.result;

Each event:

interface ArrayStreamEvent<T> {
  items: T[];          // cumulative array of complete, validated items
  isDone: boolean;
  usage?: UsageInfo;   // only on the final event when trackUsage: true
}
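To see the cumulative semantics without an API key, here is a mock that emits events with the same `items` / `isDone` fields (usage omitted). `mockArrayStream` is hypothetical and only simulates the event shape; emitting a separate final event with `isDone: true` is an assumption about when the flag flips.

```typescript
// Mirrors the ArrayStreamEvent<T> shape documented above (usage omitted).
interface ArrayStreamEvent<T> {
  items: T[];
  isDone: boolean;
}

// Hypothetical mock: yields one event per completed item, each carrying the
// cumulative list so far, then a final event with isDone: true.
async function* mockArrayStream<T>(all: T[]): AsyncGenerator<ArrayStreamEvent<T>> {
  const items: T[] = [];
  for (const item of all) {
    items.push(item);
    yield { items: [...items], isDone: false };
  }
  yield { items: [...items], isDone: true };
}

// Consumes the stream the same way as the generateArrayStream example,
// recording how many items each event carried.
async function consume(): Promise<number[]> {
  const lengths: number[] = [];
  for await (const { items, isDone } of mockArrayStream(["a", "b", "c"])) {
    lengths.push(items.length);
    if (isDone) console.log(`done with ${items.length} items`);
  }
  return lengths;
}
```

Because each event carries the whole list, a UI can simply re-render from `items` on every event instead of appending deltas.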

`generateBatch`

Process many inputs against the same schema with controlled concurrency. Handles partial failures, progress callbacks, and aggregated usage stats.

import { generateBatch } from "structured-llm";

const { items, succeeded, failed, totalUsage } = await generateBatch({
  client: openai,
  model: "gpt-4o-mini",
  schema: SentimentSchema,
  inputs: reviews.map((text) => ({ prompt: text })),
  concurrency: 5,          // max parallel API calls (default 3)
  continueOnError: true,   // don't throw on individual failures (default true)
  onProgress: ({ completed, total, succeeded, failed }) => {
    console.log(`${completed}/${total} (${failed} failed)`);
  },
});

console.log(`${succeeded.length}/${items.length} succeeded`);
console.log(`Total cost: $${totalUsage?.estimatedCostUsd?.toFixed(4)}`);

// Results are in original input order
items.forEach(({ index, data, error, durationMs }) => {
  if (error) console.log(`[${index}] failed: ${error.message}`);
  else console.log(`[${index}] ${data.sentiment} (${durationMs}ms)`);
});
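Bounded concurrency with results kept in input order is the core of what `generateBatch` describes. Below is a dependency-free sketch, assuming each input maps to one async call. `runBatch` and `BatchItem` are hypothetical names that only mirror the `index` / `data` / `error` / `durationMs` fields shown above; they are not the library's implementation.

```typescript
interface BatchItem<R> {
  index: number;
  data?: R;
  error?: Error;
  durationMs: number;
}

// Hypothetical helper: runs `worker` over `inputs` with at most `concurrency`
// calls in flight, storing each result at its original input index.
async function runBatch<I, R>(
  inputs: I[],
  worker: (input: I) => Promise<R>,
  concurrency = 3,
): Promise<BatchItem<R>[]> {
  const results: BatchItem<R>[] = new Array(inputs.length);
  let next = 0;

  // Each lane claims the next unprocessed index until none remain.
  // Claiming is safe because JavaScript runs this synchronously.
  async function lane(): Promise<void> {
    while (next < inputs.length) {
      const index = next++;
      const start = Date.now();
      try {
        const data = await worker(inputs[index]);
        results[index] = { index, data, durationMs: Date.now() - start };
      } catch (err) {
        // continueOnError-style behaviour: record the failure and keep going.
        const error = err instanceof Error ? err : new Error(String(err));
        results[index] = { index, error, durationMs: Date.now() - start };
      }
    }
  }

  const lanes = Array.from({ length: Math.min(concurrency, inputs.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

Fixed worker "lanes" pulling from a shared index keep at most `concurrency` requests open without needing a queue library.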

`generateMultiSchema`

Run the same input through multiple Zod schemas simultaneously. Useful when you need different structured views of the same document.

import { generateMultiSchema } from "structured-llm";

const { results, totalUsage } = await generateMultiSchema({
  client: openai,
  model: "gpt-4o-mini",
  prompt: contractText,
  schemas: {
    keyTerms: KeyTermsSchema,      // parties, dates, governing law
    risks: RiskAssessmentSchema,   // red flags, severity scores
    obligations: ObligationSchema, // what each party must do
  },
  parallel: true,          // run all schemas concurrently (default true)
  continueOnError: true,   // individual schema failures don't abort others
});

console.log(results.keyTerms.data);    // KeyTerms | undefined
console.log(results.risks.data);       // RiskAssessment | undefined
console.log(results.obligations.data); // Obligations | undefined
console.log(results.risks.error);      // Error | undefined

`createClient`

Pre-configure a client once, call it many times. Useful when you're making lots of calls with the same provider/model/settings.

import { createClient } from "structured-llm";
import OpenAI from "openai";

const llm = createClient({
  client: new OpenAI(),
  model: "gpt-4o-mini",
  defaultOptions: {
    temperature: 0,
    maxRetries: 2,
    trackUsage: true,
    hooks: {
      onSuccess: ({ usage }) => {
        db.insert({ tokens: usage?.totalTokens, cost: usage?.estimatedCostUsd });
      },
    },
  },
});

// All calls inherit the defaults — override per-call as needed
const { data: sentiment } = await llm.generate({
  schema: SentimentSchema,
  prompt: "Analyze this review: ...",
});

const { data: entities } = await llm.generateArray({
  schema: EntitySchema,
  prompt: "Extract all named entities from: ...",
  temperature: 0.2,              // overrides defaultOptions.temperature
});

const stream = llm.generateStream({
  schema: ReportSchema,
  prompt: "Write a report on...",
});

// All helpers are also available on the client
const result = await llm.classify({ ... });
const data = await llm.extract({ ... });
const { results } = await llm.generateMultiSchema({ ... });
const batchResult = await llm.generateBatch({ ... });

High-level helpers

`classify`

Classify text into one of your categories. No schema boilerplate needed — pass an array of labels and get back a typed result.

import { classify } from "structured-llm";

const { label, confidence, reasoning } = await classify({
  client: openai,
  model: "gpt-4o-mini",
  prompt: "My payment was charged twice last week.",
  options: [
    { value: "billing", description: "Charge, refund, subscription issues" },
    { value: "auth", description: "Login, password, account access" },
    { value: "bug", description: "App not working as expected" },
    { value: "how-to", description: "Questions about how to use the product" },
  ],
  includeConfidence: true,   // 0–1 confidence score
  includeReasoning: true,    // one-sentence explanation
  allowMultiple: false,      // set true for multi-label classification
});

console.log(label);      // "billing"
console.log(confidence); // 0.97
console.log(reasoning);  // "User reports a duplicate charge, a billing issue."

With allowMultiple: true, the response has a labels array:

const { labels } = await classify({
  ...,
  allowMultiple: true,
  prompt: "URGENT: can't log in and my card was charged $500 I didn't authorize",
  options: ["billing", "auth", "urgent", "fraud"],
});
// labels: ["billing", "auth", "urgent", "fraud"]

`extract`

Extract specific fields from free-form text without writing a full Zod schema. Fields are optional by default — the LLM omits what it can't find.

import { extract } from "structured-llm";

const data = await extract({
  client: openai,
  model: "gpt-4o-mini",
  prompt: invoiceText,
  fields: {
    // shorthand — just a type string
    invoiceNumber: "string",
    totalAmount: "number",
    issueDate: "date",      // "string" | "number" | "boolean" | "date" | "email" | "phone" | "url" | "integer"

    // full FieldDef for more control
    vendorEmail: {
      type: "email",
      description: "Vendor's billing email address",
      required: true,        // validation error if missing
    },
    status: {
      type: "string",
      options: ["draft", "sent", "paid", "overdue"],  // enum
    },
  },
  requireAll: false,     // set true to make every field required
});

console.log(data.invoiceNumber); // "INV-2024-00842"
console.log(data.totalAmount);   // 10476
console.log(data.issueDate);     // "2024-03-05"
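To make the field rules concrete (optional unless `required: true` or `requireAll: true`, and `options` acting as an enum), here is a sketch of how such field definitions could be lowered to a JSON Schema object. The helper names and the JSON Schema `format` chosen for each type are assumptions, not the library's internals.

```typescript
type FieldType = "string" | "number" | "boolean" | "date" | "email" | "phone" | "url" | "integer";

interface FieldDef {
  type: FieldType;
  description?: string;
  required?: boolean;
  options?: string[];
}

type JsonSchema = Record<string, unknown>;

// Assumed mapping from shorthand type to a JSON Schema fragment.
const TYPE_MAP: Record<FieldType, JsonSchema> = {
  string: { type: "string" },
  number: { type: "number" },
  integer: { type: "integer" },
  boolean: { type: "boolean" },
  date: { type: "string", format: "date" },
  email: { type: "string", format: "email" },
  url: { type: "string", format: "uri" },
  phone: { type: "string" }, // JSON Schema has no standard phone format
};

function fieldToJsonSchema(def: FieldDef): JsonSchema {
  const base: JsonSchema = { ...TYPE_MAP[def.type] }; // copy so the table is never mutated
  if (def.options) base.enum = def.options;
  if (def.description) base.description = def.description;
  return base;
}

// Fields are optional unless marked required, or requireAll is true.
function fieldsToJsonSchema(fields: Record<string, FieldType | FieldDef>, requireAll = false): JsonSchema {
  const properties: Record<string, JsonSchema> = {};
  const required: string[] = [];
  for (const [key, spec] of Object.entries(fields)) {
    const def: FieldDef = typeof spec === "string" ? { type: spec } : spec;
    properties[key] = fieldToJsonSchema(def);
    if (requireAll || def.required) required.push(key);
  }
  return { type: "object", properties, required };
}
```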

`createTemplate`

Bind a prompt template to a schema and config. Reuse it across your app with different variable substitutions.

import { createTemplate } from "structured-llm";

const analyzeDoc = createTemplate({
  template: "Analyze this {{docType}} from {{company}}:\n\n{{content}}",
  schema: AnalysisSchema,
  client: openai,
  model: "gpt-4o-mini",
  systemPrompt: "You are a business analyst.",
  temperature: 0,
});

// run with variable substitution
const { data } = await analyzeDoc.run({
  docType: "contract",
  company: "Acme Corp",
  content: contractText,
});

// run as array extraction
const { data: items } = await analyzeDoc.runArray({
  docType: "meeting notes",
  company: "TechCo",
  content: notesText,
});

// preview the rendered prompt (no API call)
const prompt = analyzeDoc.render({ docType: "invoice", company: "Acme", content: "..." });

Variables use {{double_braces}} syntax. An error is thrown if a variable is missing at runtime.
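A minimal sketch of `{{double_braces}}` substitution that throws on a missing variable, as described above. `renderTemplate` is hypothetical; whether the real `render()` also tolerates whitespace inside the braces (this regex does) is an assumption.

```typescript
// Hypothetical renderer: replaces each {{name}} with vars[name] and throws
// if the template references a variable that was not supplied.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match, key: string) => {
    // hasOwnProperty avoids matching inherited keys such as "toString".
    if (!Object.prototype.hasOwnProperty.call(vars, key)) {
      throw new Error(`Missing template variable: ${key}`);
    }
    return vars[key];
  });
}

const rendered = renderTemplate("Analyze this {{docType}} from {{company}}", {
  docType: "invoice",
  company: "Acme",
});
console.log(rendered); // "Analyze this invoice from Acme"
```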

`withCache`

Wrap generate() with TTL-based memoization. Calls that repeat the same model, prompt (or messages), and schema skip the API and return the cached result.

import { withCache } from "structured-llm";

const cachedGenerate = withCache({
  ttl: 5 * 60 * 1000,  // 5-minute TTL (the default)
  debug: true,          // log cache hits/misses
  // store: customStore  // optional custom cache backend (e.g. Redis)
  // keyFn: (opts) => myKey  // optional custom cache key function
});

const r1 = await cachedGenerate({ client, model, schema, prompt: "same question" });
const r2 = await cachedGenerate({ client, model, schema, prompt: "same question" });

console.log(r1.fromCache); // false — hit the API
console.log(r2.fromCache); // true — served from cache, no API call

// Use a shared store across multiple withCache instances
import { createCacheStore } from "structured-llm";
const store = createCacheStore();
const cachedA = withCache({ store, ttl: 60_000 });
const cachedB = withCache({ store, ttl: 60_000 });

The cache key includes model, prompt/messages, and schema JSON — different schemas for the same prompt are cached separately.
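The sketch below shows one way a TTL cache keyed like that could work. `TtlCache` and `cacheKey` are hypothetical names, not exports of structured-llm (the documented entry points are `withCache` and `createCacheStore`). Serializing with `JSON.stringify` means property order inside the schema affects the key; a real implementation may normalize it.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number;
}

// Hypothetical key: model + prompt/messages + schema JSON, so the same prompt
// with a different schema gets its own entry.
function cacheKey(opts: { model: string; prompt?: string; messages?: unknown[]; jsonSchema: unknown }): string {
  return JSON.stringify([opts.model, opts.prompt ?? null, opts.messages ?? null, opts.jsonSchema]);
}

// Minimal in-memory TTL store; expired entries are evicted lazily on read.
class TtlCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs = 5 * 60 * 1000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```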

Providers

The library auto-detects the provider from your client instance. Just pass it in.

Native providers

| Provider | Install | Client class |
|---|---|---|
| OpenAI | npm i openai | new OpenAI() |
| Anthropic | npm i @anthropic-ai/sdk | new Anthropic() |
| Gemini | npm i @google/genai | new GoogleGenAI({ apiKey }) |
| Mistral | npm i @mistralai/mistralai | new Mistral({ apiKey }) |
| Cohere | npm i cohere-ai | new CohereClient({ token }) |
| AWS Bedrock | npm i @aws-sdk/client-bedrock-runtime | new BedrockRuntimeClient({ region }) |

OpenAI-compatible providers

These all use the OpenAI SDK pointed at a different endpoint:

import OpenAI from "openai";

// Groq: low-latency inference, well suited to real-time apps
const groq = new OpenAI({ apiKey: process.env.GROQ_API_KEY, baseURL: "https://api.groq.com/openai/v1" });
generate({ client: groq, model: "llama-3.3-70b-versatile", ... })

// xAI (Grok)
const xai = new OpenAI({ apiKey: process.env.XAI_API_KEY, baseURL: "https://api.x.ai/v1" });

// Together AI — large selection of open models
const together = new OpenAI({ apiKey: process.env.TOGETHER_API_KEY, baseURL: "https://api.together.xyz/v1" });
generate({ client: together, model: "meta-llama/Llama-3.3-70B-Instruct-Turbo", ... })

// Fireworks AI, Perplexity — same pattern

// Ollama — local models, completely free
const ollama = new OpenAI({ apiKey: "ollama", baseURL: "http://localhost:11434/v1" });
generate({ client: ollama, model: "llama3.2", mode: "json-mode" })

// Azure OpenAI
const azure = new OpenAI({
  apiKey: process.env.AZURE_OPENAI_API_KEY,
  baseURL: "https://your-resource.openai.azure.com/openai/deployments/gpt-4o",
});

AWS Bedrock

import { BedrockRuntimeClient } from "@aws-sdk/client-bedrock-runtime";
import { generate } from "structured-llm";

const bedrock = new BedrockRuntimeClient({ region: "us-east-1" });

const { data } = await generate({
  client: bedrock,
  model: "anthropic.claude-3-5-sonnet-20241022-v2:0",
  schema: MySchema,
  prompt: "...",
});

Or use the provider string to auto-initialize from environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION):

generate({
  provider: "bedrock",
  model: "amazon.nova-pro-v1:0",
  schema: MySchema,
  prompt: "...",
})

Auto-initialize from environment

Skip client creation entirely — pass a provider string and the library reads from env:

generate({
  provider: "openai",     // any one provider string from the list below
  model: "...",
  schema: ...,
  prompt: "...",
})

// provider       environment variable(s) read
// "openai"       OPENAI_API_KEY
// "anthropic"    ANTHROPIC_API_KEY
// "gemini"       GEMINI_API_KEY
// "mistral"      MISTRAL_API_KEY
// "groq"         GROQ_API_KEY
// "xai"          XAI_API_KEY
// "together"     TOGETHER_API_KEY
// "fireworks"    FIREWORKS_API_KEY
// "perplexity"   PERPLEXITY_API_KEY
// "cohere"       COHERE_API_KEY
// "bedrock"      AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY + AWS_REGION
// "ollama"       no key needed

Extraction modes

The library automatically picks the best extraction mode based on the model's capabilities. You can also set it explicitly.

| Mode | How it works | Reliability |
|---|---|---|
| tool-calling | Schema becomes a tool definition. LLM is forced to "call" it, guaranteeing JSON. | Highest |
| json-mode | Sets response_format: json_object. Schema embedded in system prompt. | High |
| prompt-inject | Schema appended to user prompt. JSON extracted from response with fallback parsing. | Good |

Auto-selection logic:

Does the model support tool calling?
  YES → tool-calling  (GPT-4o, Claude 3+, Gemini 1.5+, Mistral Large, Groq)
  NO
    Does the model support JSON mode?
      YES → json-mode  (GPT-3.5, Gemini Flash, Perplexity, most modern models)
      NO  → prompt-inject  (works on any model, including Ollama local models)
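The decision tree above reduces to a small function over the `toolCalling` and `jsonMode` flags that `getModelCapabilities` reports. This is a sketch: `selectMode` is a hypothetical name, and treating unknown models (no capability entry) as prompt-inject is an assumption.

```typescript
type ExtractionMode = "tool-calling" | "json-mode" | "prompt-inject";

// Subset of the capability fields shown under "Model utilities".
interface ModelCaps {
  toolCalling: boolean;
  jsonMode: boolean;
}

// Hypothetical restatement of the auto-selection logic above.
function selectMode(caps: ModelCaps | undefined): ExtractionMode {
  if (caps?.toolCalling) return "tool-calling";
  if (caps?.jsonMode) return "json-mode";
  return "prompt-inject"; // works on any model; assumed fallback for unknown models
}
```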

Override when needed:

generate({ ..., mode: "json-mode" })
generate({ ..., mode: "prompt-inject" })

Retry logic

On invalid JSON or schema validation failure, the library retries automatically. Each retry includes the validation errors so the LLM can fix its own output.

Attempt 1:  LLM returns { "score": 1.8, "sentiment": "mixed" }
            → validation fails: score must be ≤ 1, sentiment must be "positive"|"negative"|"neutral"

Attempt 2:  "Your previous response had errors:
             - score: Number must be less than or equal to 1
             - sentiment: Invalid enum value
            Please fix and respond with corrected JSON."
            → LLM returns { "score": 0.8, "sentiment": "positive" }  ✓
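The feedback turn in the example above can be sketched as a pure function over the message list. `withValidationFeedback` is a hypothetical helper; the exact wording, and whether the failed reply is echoed back as an assistant message, are assumptions based only on the illustration.

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical: append the rejected reply plus a correction request, so the
// next attempt sees both what it produced and why it was rejected.
function withValidationFeedback(history: Message[], lastResponse: string, issues: string[]): Message[] {
  const feedback =
    "Your previous response had errors:\n" +
    issues.map((issue) => `- ${issue}`).join("\n") +
    "\nPlease fix and respond with corrected JSON.";
  return [
    ...history,
    { role: "assistant", content: lastResponse },
    { role: "user", content: feedback },
  ];
}
```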

Rate-limit and transient server errors (429, 502, 503, 529) are also retried automatically with exponential backoff; no configuration is needed.

generate({
  ...,
  maxRetries: 3,          // default: 3 (set 0 to disable)
  retryOptions: {
    strategy: "exponential",   // "immediate" (default) | "linear" | "exponential"
    baseDelayMs: 500,          // base delay for linear/exponential strategies
  },
})
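The three `retryOptions.strategy` values could map to delays as sketched below. The formulas (linear = base * attempt, exponential = base * 2^(attempt - 1)) are assumptions; the README does not specify them, and production retry code often adds random jitter on top.

```typescript
type RetryStrategy = "immediate" | "linear" | "exponential";

// Hypothetical delay before retry number `attempt` (1-based).
function retryDelayMs(strategy: RetryStrategy, attempt: number, baseDelayMs = 500): number {
  switch (strategy) {
    case "immediate":
      return 0;
    case "linear":
      return baseDelayMs * attempt; // 500, 1000, 1500, ...
    case "exponential":
      return baseDelayMs * 2 ** (attempt - 1); // 500, 1000, 2000, ...
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
// e.g. await sleep(retryDelayMs("exponential", attempt)) before re-sending
```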

Fallback chain

Define a list of provider+model pairs to try in order. Falls back automatically if the primary provider fails (network error, rate limit, outage, etc.).

import OpenAI from "openai";
import Anthropic from "@anthropic-ai/sdk";

generate({
  // primary
  client: new OpenAI(),
  model: "gpt-4o",

  fallbackChain: [
    // first fallback — cheaper model, same provider
    { client: new OpenAI(), model: "gpt-4o-mini" },

    // second fallback — different provider
    { client: new Anthropic(), model: "claude-haiku-4-5-20251001" },

    // last resort — free local model
    { provider: "ollama", model: "llama3.2" },
  ],

  schema: ...,
  prompt: "...",
  hooks: {
    onError: ({ error }) => console.log("Primary failed, trying fallback:", error.message),
  },
})
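Conceptually, a fallback chain means "try each candidate in order and return the first success". Here is a provider-agnostic sketch in which each candidate is just a function performing one call. `firstSuccessful` and `Candidate` are hypothetical names, and the real library may decide differently which errors trigger a fallback (for example, it may not fall back on validation errors).

```typescript
interface Candidate<R> {
  label: string;          // e.g. "openai/gpt-4o" (illustrative)
  call: () => Promise<R>; // one complete attempt against this provider
}

// Hypothetical: try each candidate in order and return the first success.
// Errors are collected so the final failure explains every attempt.
async function firstSuccessful<R>(
  candidates: Candidate<R>[],
  onError?: (label: string, error: unknown) => void,
): Promise<{ label: string; result: R }> {
  const errors: string[] = [];
  for (const { label, call } of candidates) {
    try {
      return { label, result: await call() };
    } catch (error) {
      onError?.(label, error);
      errors.push(`${label}: ${error instanceof Error ? error.message : String(error)}`);
    }
  }
  throw new Error(`All providers failed:\n${errors.join("\n")}`);
}
```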

Usage tracking

Pass trackUsage: true to get token counts and a cost estimate back with every call.

const { data, usage } = await generate({
  ...,
  trackUsage: true,
});

console.log(usage);
// {
//   promptTokens: 312,
//   completionTokens: 95,
//   totalTokens: 407,
//   estimatedCostUsd: 0.0000891,    // based on published pricing
//   latencyMs: 843,
//   attempts: 1,
//   model: "gpt-4o-mini",
//   provider: "openai",
// }

The cost estimate uses a built-in pricing table updated with each release. For unknown models, estimatedCostUsd is undefined.
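As an illustration of how such an estimate can be derived, the sketch below assumes it is simply per-token pricing applied to the prompt and completion counts, using per-1M-token rates like the `inputCostPer1M` / `outputCostPer1M` fields returned by `getModelCapabilities`. The library's exact formula and rounding are not documented here, so treat `estimateCostUsd` (a hypothetical name) as an approximation.

```typescript
interface Pricing {
  inputCostPer1M: number;  // USD per 1M prompt tokens
  outputCostPer1M: number; // USD per 1M completion tokens
}

// Returns undefined when pricing is unknown, matching the documented
// behaviour for models missing from the pricing table.
function estimateCostUsd(promptTokens: number, completionTokens: number, pricing?: Pricing): number | undefined {
  if (!pricing) return undefined;
  return (promptTokens * pricing.inputCostPer1M + completionTokens * pricing.outputCostPer1M) / 1_000_000;
}

// gpt-4o-mini rates as listed under "Model utilities": $0.15 in, $0.60 out.
const exampleCost = estimateCostUsd(1000, 500, { inputCostPer1M: 0.15, outputCostPer1M: 0.6 });
console.log(exampleCost); // 0.00045
```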

Use the onSuccess hook to pipe usage data to your analytics or database:

createClient({
  ...,
  defaultOptions: {
    trackUsage: true,
    hooks: {
      onSuccess: ({ usage }) => {
        myAnalytics.record({
          model: usage?.model,
          tokens: usage?.totalTokens,
          cost: usage?.estimatedCostUsd,
          latency: usage?.latencyMs,
        });
      },
    },
  },
})

Hooks

Hooks run at each stage of the request lifecycle. Useful for logging, metrics, cost tracking, and debugging.

generate({
  ...,
  hooks: {
    // Fires before each LLM request (including retries)
    onRequest: ({ messages, model, provider, attempt }) => {
      logger.debug("LLM request", { model, provider, attempt, messageCount: messages.length });
    },

    // Fires when the LLM responds (before parsing/validation)
    onResponse: ({ rawResponse, attempt, model }) => {
      // useful for debugging what the LLM actually returned
    },

    // Fires on each partial update during generateStream() / generateArrayStream()
    onChunk: ({ partial, model }) => {
      broadcastToWebSocket(partial);
    },

    // Fires when a retry is about to happen
    onRetry: ({ attempt, maxRetries, error, model }) => {
      logger.warn(`Retrying (${attempt}/${maxRetries}): ${error}`);
    },

    // Fires when the final result passes validation
    onSuccess: ({ result, usage }) => {
      metrics.increment("llm.success", { model: usage?.model });
    },

    // Fires when all attempts fail
    onError: ({ error, allAttempts }) => {
      logger.error("LLM extraction failed", { error: error.message, attempts: allAttempts });
      alerting.send("LLM failure", error);
    },
  },
})

When using createClient, global hooks (set on the client) and per-call hooks both fire — you don't have to choose.

const llm = createClient({
  ...,
  defaultOptions: {
    hooks: { onSuccess: globalMetrics },   // always runs
  },
});

llm.generate({
  ...,
  hooks: { onSuccess: localLog },          // also runs, in addition to globalMetrics
});
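One way to get "both hooks fire" semantics is to compose the client-level and per-call hook into a single function. `composeHooks` is hypothetical; running the global hook first and awaiting each one in turn are assumptions, not documented behaviour.

```typescript
type Hook<A> = (args: A) => void | Promise<void>;

// Hypothetical: combine a client-level hook and a per-call hook so both run,
// client-level first. Either may be undefined.
function composeHooks<A>(globalHook?: Hook<A>, localHook?: Hook<A>): Hook<A> | undefined {
  if (!globalHook) return localHook;
  if (!localHook) return globalHook;
  const first: Hook<A> = globalHook;
  const second: Hook<A> = localHook;
  return async (args: A) => {
    await first(args);
    await second(args);
  };
}
```

With this shape, a throwing global hook would stop the local one from running; a real implementation might prefer to isolate hook errors.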

Error handling

All errors extend StructuredLLMError so you can catch them broadly or specifically.

import {
  StructuredLLMError,   // base class
  ValidationError,      // schema validation failed after all retries
  ParseError,           // LLM returned non-JSON after all retries
  ProviderError,        // upstream API error (rate limit, auth, network)
  MaxRetriesError,      // exceeded maxRetries (shouldn't normally see this)
  SchemaError,          // invalid schema passed in
  MissingInputError,    // no prompt or messages provided
} from "structured-llm";

try {
  const { data } = await generate({ ... });
} catch (err) {
  if (err instanceof ValidationError) {
    // The LLM consistently returned data that didn't match your schema
    console.log(err.issues);        // array of validation error strings
    console.log(err.lastResponse);  // the raw JSON string the LLM returned
    console.log(err.attempts);      // how many times it tried (maxRetries + 1)
  }

  if (err instanceof ParseError) {
    // The LLM kept returning non-JSON (rare with tool-calling mode)
    console.log(err.lastResponse);
  }

  if (err instanceof ProviderError) {
    // The provider API returned an error
    console.log(err.provider);      // "openai"
    console.log(err.statusCode);    // 429 (rate limit), 401 (auth), etc.
    console.log(err.originalError); // the raw error from the SDK
  }

  if (err instanceof StructuredLLMError) {
    // catch-all for any structured-llm error
  }
}

Custom schemas

You don't have to use Zod. Any object with a jsonSchema and parse function works:

// Hand-rolled validator
const { data } = await generate({
  ...,
  schema: {
    jsonSchema: {
      type: "object",
      properties: {
        score: { type: "number", minimum: 0, maximum: 1 },
        label: { type: "string", enum: ["spam", "ham"] },
      },
      required: ["score", "label"],
    },
    parse: (input) => {
      const d = input as { score: number; label: string };
      if (d.score < 0 || d.score > 1) throw new Error("score out of range");
      if (!["spam", "ham"].includes(d.label)) throw new Error("invalid label");
      return d;
    },
  },
});

// With TypeBox
import { Type, type Static } from "@sinclair/typebox";
import { TypeCompiler } from "@sinclair/typebox/compiler";

const UserSchema = Type.Object({
  name: Type.String(),
  age: Type.Number({ minimum: 0 }),
  role: Type.Union([Type.Literal("admin"), Type.Literal("user")]),
});
type User = Static<typeof UserSchema>;
const compiled = TypeCompiler.Compile(UserSchema);

const { data } = await generate({
  client: openai,
  model: "gpt-4o-mini",
  schema: {
    jsonSchema: UserSchema,
    parse: (input: unknown): User => {
      const errors = [...compiled.Errors(input)];
      if (errors.length) throw new Error(errors.map((e) => e.message).join(", "));
      return input as User;
    },
  },
  prompt: "...",
});

Standard Schema (Valibot, ArkType)

Libraries that implement the Standard Schema v1 spec are auto-detected and work without any adapters. This includes Valibot, ArkType, Effect Schema, and Zod v4.

import * as v from "valibot";
import { generate } from "structured-llm";

const PersonSchema = v.object({
  name: v.string(),
  age: v.number(),
  email: v.pipe(v.string(), v.email()),
});

// Just pass it — no adapter needed
const { data } = await generate({
  client: openai,
  model: "gpt-4o-mini",
  schema: PersonSchema,
  prompt: "Extract: Alice Smith, 28, alice@example.com",
});
// data.name, data.age, data.email are fully typed

import { type } from "arktype";

const UserType = type({ name: "string", score: "number" });

const { data } = await generate({
  client: openai,
  model: "gpt-4o-mini",
  schema: UserType,
  prompt: "...",
});

If you need to convert explicitly, use fromStandardSchema:

import { fromStandardSchema } from "structured-llm";
const schema = fromStandardSchema(valibotSchema);
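For reference, the Standard Schema v1 spec exposes a `~standard` property whose `validate` function returns either `{ value }` or `{ issues }` (possibly as a Promise). A minimal sketch of turning that into a throwing parser, the shape the Custom schemas section uses, could look like this. `parseWithStandardSchema` is hypothetical and is not `fromStandardSchema`'s implementation; it also ignores the `jsonSchema` half, which Standard Schema itself does not define.

```typescript
// Minimal local copy of the Standard Schema v1 types used here.
interface StandardIssue {
  readonly message: string;
  readonly path?: ReadonlyArray<PropertyKey | { readonly key: PropertyKey }>;
}
type StandardResult<O> =
  | { readonly value: O; readonly issues?: undefined }
  | { readonly issues: ReadonlyArray<StandardIssue> };
interface StandardSchemaV1<O> {
  readonly "~standard": {
    readonly version: 1;
    readonly vendor: string;
    readonly validate: (value: unknown) => StandardResult<O> | Promise<StandardResult<O>>;
  };
}

// Hypothetical adapter: validate, then throw a readable error on failure.
async function parseWithStandardSchema<O>(schema: StandardSchemaV1<O>, input: unknown): Promise<O> {
  const result = await schema["~standard"].validate(input);
  if (result.issues) {
    const messages = result.issues.map((issue) => {
      // Path segments may be plain keys or { key } objects.
      const path = issue.path?.map((seg) => String(typeof seg === "object" ? seg.key : seg)).join(".");
      return path ? `${path}: ${issue.message}` : issue.message;
    });
    throw new Error(messages.join("; "));
  }
  return result.value;
}
```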

Framework integrations

Next.js App Router

// app/api/analyze/route.ts — simple JSON endpoint
import { createStructuredRoute } from "structured-llm/next";
import { z } from "zod";

export const POST = createStructuredRoute({
  provider: "openai",
  model: "gpt-4o-mini",
  schema: z.object({
    category: z.enum(["bug", "feature", "question"]),
    priority: z.enum(["low", "medium", "high"]),
    summary: z.string(),
  }),
});
// Request:  POST /api/analyze  { "prompt": "App crashes on login" }
// Response: { "data": { "category": "bug", "priority": "high", "summary": "..." } }

// app/api/stream/route.ts — NDJSON streaming endpoint
import { createStreamingRoute } from "structured-llm/next";

export const POST = createStreamingRoute({
  provider: "openai",
  model: "gpt-4o",
  schema: ReportSchema,
});
// Streams: {"partial":{...},"isDone":false}\n{"partial":{...},"isDone":true,...}\n

// As a server action
import { withStructured } from "structured-llm/next";

export const classifyTicket = withStructured({
  provider: "openai",
  model: "gpt-4o-mini",
  schema: TicketSchema,
});

const result = await classifyTicket({ prompt: ticket.description });

Hono

import { Hono } from "hono";
import { structuredLLM, createStructuredHandler, createStreamingHandler } from "structured-llm/hono";

const app = new Hono();

// Middleware — attaches result to context, calls next()
app.post(
  "/extract",
  structuredLLM({
    provider: "openai",
    model: "gpt-4o-mini",
    schema: ContactSchema,
    promptFromBody: (body) => `Extract contact info from: ${body.text}`,
  }),
  (c) => c.json(c.get("structuredResult"))
);

// Route handler — responds directly
app.post("/analyze", createStructuredHandler({ provider: "openai", model: "gpt-4o-mini", schema: AnalysisSchema }));

// Streaming route handler
app.post("/stream", createStreamingHandler({ provider: "openai", model: "gpt-4o", schema: ReportSchema }));

Express

import express from "express";
import { structuredMiddleware, createStructuredHandler, createStreamingHandler } from "structured-llm/express";

const app = express();
app.use(express.json());

// Middleware — attaches to req.structured, calls next()
app.post(
  "/classify",
  structuredMiddleware({
    provider: "openai",
    model: "gpt-4o-mini",
    schema: IntentSchema,
    promptFromBody: (body) => body.message,
  }),
  (req, res) => res.json(req.structured)
);

// Route handler — responds directly
app.post("/analyze", createStructuredHandler({ provider: "openai", model: "gpt-4o-mini", schema: AnalysisSchema }));

// Streaming route handler (NDJSON)
app.post("/stream", createStreamingHandler({ provider: "openai", model: "gpt-4o", schema: ReportSchema }));

Model utilities

import { getModelCapabilities, listSupportedModels } from "structured-llm";

// check a specific model
const caps = getModelCapabilities("gpt-4o-mini");
// {
//   provider: "openai",
//   toolCalling: true,
//   jsonMode: true,
//   streaming: true,
//   contextWindow: 128000,
//   inputCostPer1M: 0.15,
//   outputCostPer1M: 0.6
// }

// list all supported models for a provider
listSupportedModels({ provider: "anthropic" });
// ["claude-opus-4-6", "claude-sonnet-4-6", "claude-3-7-sonnet-20250219", "claude-haiku-4-5-20251001", ...]

// list everything
listSupportedModels();
// all 40+ registered models

Models added in v0.2.0: gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3, o4-mini, claude-3-7-sonnet-20250219, claude-haiku-4-5-20251001, gemini-2.5-pro, gemini-2.5-flash, llama-4-scout-17b-16e-instruct, llama-4-maverick-17b-128e-instruct

Examples

The examples/ directory contains 40 complete, runnable examples:

git clone https://github.com/piyushgupta344/structured-llm
cd structured-llm && pnpm install

OPENAI_API_KEY=sk-... npx tsx examples/01-sentiment-analysis.ts

Core features:

| Example | What it demonstrates |
|---|---|
| 01-sentiment-analysis.ts | Batch sentiment scoring with confidence intervals |
| 02-data-extraction.ts | Parse meeting notes into structured agenda / action items |
| 03-multi-provider.ts | Run the same extraction across OpenAI, Anthropic, Gemini |
| 04-fallback-chain.ts | Automatic fallback when primary provider is unavailable |
| 05-streaming.ts | Real-time partial updates while generating a long report |
| 06-generate-array.ts | Parse a bank statement into typed transaction objects |
| 07-create-client.ts | Reusable client for an email triage pipeline |
| 08-custom-schema.ts | Bring your own validator instead of Zod |
| 09-fintech-analysis.ts | Parse earnings call transcripts and classify headlines |
| 10-ollama-local.ts | Run everything locally with Ollama — zero API cost |

Document processing:

| Example | What it demonstrates |
|---|---|
| 11-resume-parsing.ts | Extract skills, experience, and education from a CV |
| 12-invoice-extraction.ts | extract() helper — parse billing data from invoice text |
| 15-legal-contract-analysis.ts | generateMultiSchema() — key terms + risk assessment from one document |
| 18-medical-notes-extraction.ts | Extract vitals, symptoms, medications from clinical notes |
| 28-academic-paper-analysis.ts | Metadata + contributions from research papers |
| 39-multi-schema-document.ts | generateMultiSchema() — summary + quotes + actions from one document |

Classification & routing:

| Example | What it demonstrates |
|---|---|
| 13-content-moderation.ts | Multi-category content safety scoring |
| 14-support-ticket-routing.ts | classify() — route tickets to the right team with confidence |
| 34-multilingual-feedback.ts | generateBatch() — detect language, translate, and classify in bulk |
| 38-bug-triage.ts | generateBatch() — severity, priority, and owner assignment |

Data pipelines:

| Example | What it demonstrates |
|---|---|
| 17-product-catalog-normalization.ts | generateBatch() — normalize messy product data |
| 29-real-estate-listing.ts | generateArray() — parse multiple property listings at once |
| 33-competitor-analysis.ts | generateBatch() — competitive intelligence at scale |
| 37-caching-repeated-queries.ts | withCache() — avoid redundant API calls for identical inputs |
| 40-market-research-template.ts | createTemplate() — run the same research framework across markets |

Contributing

Contributions are welcome. See CONTRIBUTING.md for how to get started, what kind of PRs are accepted, and how to add a new provider.

License

MIT
