extrait

v0.7.2

High-level LLM text generation and structured JSON extraction with validation, repair, and streaming.

Features

  • Multi-candidate JSON extraction from LLM responses
  • Automatic repair with jsonrepair
  • Zod schema validation and coercion
  • Optional self-healing for validation failures
  • Streaming support
  • MCP tools
  • Vector embeddings (OpenAI-compatible + Voyage AI)
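To make "multi-candidate extraction" concrete, here is a minimal sketch of the general idea: collect several JSON-looking substrings from a model reply and keep the first one that parses. The helper names findJsonCandidates and firstValidJson are hypothetical and are not extrait's API; the library's actual candidate search and its jsonrepair step are likely more thorough.

```typescript
// Illustrative only: these helpers are hypothetical and not part of extrait.

/** Collect JSON-looking substrings: fenced block bodies first, then balanced {...} spans. */
function findJsonCandidates(text: string): string[] {
  const candidates: string[] = [];

  // Pass 1: bodies of fenced code blocks. The fence marker is built at runtime
  // so this snippet can itself live inside a fenced block.
  const fenceMark = "`".repeat(3);
  const fenced = new RegExp(fenceMark + "(?:json)?\\s*([\\s\\S]*?)" + fenceMark, "g");
  let match: RegExpExecArray | null;
  while ((match = fenced.exec(text)) !== null) {
    candidates.push(match[1].trim());
  }

  // Pass 2: balanced {...} spans, ignoring braces inside string literals.
  for (let start = 0; start < text.length; start++) {
    if (text[start] !== "{") continue;
    let depth = 0;
    let inString = false;
    for (let i = start; i < text.length; i++) {
      const ch = text[i];
      if (inString) {
        if (ch === "\\") i++; // skip the escaped character
        else if (ch === '"') inString = false;
      } else if (ch === '"') {
        inString = true;
      } else if (ch === "{") {
        depth++;
      } else if (ch === "}" && --depth === 0) {
        candidates.push(text.slice(start, i + 1));
        start = i; // resume scanning after this object
        break;
      }
    }
  }
  return candidates;
}

/** Return the first candidate that parses as JSON, or null if none does. */
function firstValidJson(text: string): unknown {
  for (const candidate of findJsonCandidates(text)) {
    try {
      return JSON.parse(candidate);
    } catch {
      // Not valid JSON; try the next candidate.
    }
  }
  return null;
}
```

A repair step would slot in where JSON.parse fails, before moving on to the next candidate.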

Installation

Install extrait with your preferred package manager.

bun add extrait
# or
npm install extrait
# or
deno add npm:extrait

Quick Start

Use a custom OpenAI-compatible transport to point extrait at a local endpoint.

import { createLLM, prompt, s } from "extrait";
import { z } from "zod";

const llm = createLLM({
  provider: "openai-compatible",
  model: "mistralai/ministral-3-3b",
  transport: {
    baseURL: "http://localhost:1234/v1",
    apiKey: process.env.LLM_API_KEY ?? "local-demo-key",
  },
});

const RecipeSchema = s.schema(
  "Recipe",
  z.object({
    title: s.string().min(1).describe("Short recipe title"),
    ingredients: s.array(s.string()).min(1).describe("Ingredient list"),
  })
);

const result = await llm.structured(
  RecipeSchema,
  prompt`Extract a simple recipe from this text: """${text}"""`
);

console.log(result.data);

Examples at a Glance

These examples cover the most common usage patterns in the repository.

bun run dev simple "Bun.js runtime"
bun run dev generate "Bun.js runtime"
bun run dev streaming
bun run dev calculator-tool

API Reference

The sections below cover the main building blocks of the library.

Create an LLM Client

Use createLLM() to configure the provider, model, transport, and client defaults.

const llm = createLLM({
  provider: "openai-compatible",           // or "anthropic-compatible"
  model: "gpt-5-nano",
  baseURL: "https://api.openai.com",       // optional alias for transport.baseURL
  apiKey: process.env.LLM_API_KEY,         // optional alias for transport.apiKey
  transport: {
    baseURL: "https://api.openai.com",     // optional
    apiKey: process.env.LLM_API_KEY,       // optional
    path: "/v1/chat/completions",          // optional; anthropic-compatible usually uses /v1/messages
    headers: { "x-trace-id": "docs-demo" }, // optional extra headers
    defaultBody: { user: "docs-demo" },    // optional provider body defaults
    version: "2023-06-01",                 // anthropic-compatible only
    fetcher: fetch,                        // optional custom fetch implementation
  },
  defaults: {
    mode: "loose",                        // or "strict"; loose allows repair
    selfHeal: 1,                          // optional retry attempts
    debug: false,                         // optional structured debug output
    // or:
    // debug: { enabled: true, verbose: true },
    systemPrompt: "You are a helpful assistant.",
    timeout: {
      request: 30_000,
      tool: 10_000,
    },
  },
});

baseURL and apiKey at the top level are shorthand aliases for transport.baseURL and transport.apiKey. For request-specific options such as stream, request, schemaInstruction, and parse tuning, see the sections below.

Common setup patterns:

// OpenAI-compatible gateway or local endpoint with top-level aliases
const llm = createLLM({
  provider: "openai-compatible",
  model: "gpt-4o-mini",
  baseURL: process.env.LLM_BASE_URL ?? "http://localhost:1234/v1",
  apiKey: process.env.LLM_API_KEY ?? "local-demo-key",
});

// Anthropic-compatible endpoint with explicit API version
const anthropic = createLLM({
  provider: "anthropic-compatible",
  model: "claude-3-5-sonnet-latest",
  transport: {
    baseURL: "https://api.anthropic.com",
    apiKey: process.env.LLM_API_KEY,
    version: "2023-06-01",
  },
});

Defining Schemas

Use the s wrapper around Zod for schema names, descriptions, and a more ergonomic authoring flow.

import { s } from "extrait";
import { z } from "zod";

const Schema = s.schema(
  "SchemaName",
  z.object({
    // String fields
    text: s.string().min(1).describe("Field description"),
    optional: s.string().optional(),
    withDefault: s.string().default("value"),

    // Numbers
    count: s.number().int().min(0).max(100),
    score: s.number().min(0).max(1),

    // Arrays
    items: s.array(s.string()).min(1).max(10),

    // Nested objects
    nested: z.object({
      field: s.string(),
    }),

    // Enums (use native Zod)
    category: z.enum(["a", "b", "c"]),

    // Booleans
    flag: s.boolean(),
  })
);

Making Structured Calls

structured() accepts a schema plus either a tagged prompt, a fluent prompt builder, or a raw message payload.

// Simple prompt
const simpleResult = await llm.structured(
  Schema,
  prompt`Your prompt with ${variables}`
);

// Multi-part prompt
const multiPartResult = await llm.structured(
  Schema,
  prompt()
    .system`You are an expert assistant.`
    .user`Analyze: """${input}"""`
);

// Multi-turn conversation
const conversationResult = await llm.structured(
  Schema,
  prompt()
    .system`You are an expert assistant.`
    .user`Hello`
    .assistant`Hi, how can I help?`
    .user`Analyze: """${input}"""`
);

// With options
const result = await llm.structured(
  Schema,
  prompt`Your prompt`,
  {
    mode: "loose",
    selfHeal: 1,
    debug: true,
    systemPrompt: "You are a helpful assistant.",
    stream: {
      to: "stdout",
      onData: (event) => {
        if (event.delta.text) {
          console.log("New visible text:", event.delta.text);
        }
        if (event.delta.reasoning) {
          console.log("New reasoning text:", event.delta.reasoning);
        }

        console.log("Current visible text:", event.snapshot.text);
        console.log("Current reasoning:", event.snapshot.reasoning);
        console.log("Current structured snapshot:", event.snapshot.data);

        if (event.done) {
          console.log("Streaming done.");
        }
      },
    },
    request: {
      signal: AbortSignal.timeout(30_000),  // optional AbortSignal
      reasoningEffort: "medium",            // optional reasoning effort hint
    },
    timeout: {
      request: 30_000,  // ms per LLM HTTP request
      tool: 10_000,     // ms per MCP tool call
    },
  }
);

prompt() builds an ordered messages payload. Use prompt`...` for a single string prompt, or the fluent builder for multi-turn conversations. The LLMMessage type is exported if you need to type your own message arrays.

In stream.onData, the event is split into two layers:

  • event.delta.text is only the newly received visible text since the previous event.
  • event.delta.reasoning is only the newly received reasoning text since the previous event.
  • event.snapshot.text is the full visible text accumulated so far.
  • event.snapshot.reasoning is the full normalized reasoning accumulated so far.
  • event.snapshot.data is the best structured JSON snapshot that can be parsed from the stream so far. It may stay unchanged while event.delta.text continues to grow.

Typical usage is:

  • render event.delta.text directly to a terminal or chat UI
  • optionally render event.delta.reasoning in a separate reasoning panel
  • use event.snapshot.data to drive partial structured UI state
  • use event.snapshot.text / event.snapshot.reasoning when you need the full accumulated state instead of only the latest increment
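The delta/snapshot split above can be sketched as a small state reducer. This is an illustration under assumptions: the StreamEvent interface below only mirrors the fields described in this section (its exact types are guessed), and ViewState and applyEvent are hypothetical names, not exports of extrait.

```typescript
// Assumed event shape, reduced to the fields described above.
interface StreamEvent {
  delta: { text?: string; reasoning?: string };
  snapshot: { text: string; reasoning: string; data: unknown };
  done: boolean;
}

// Hypothetical UI state driven by the stream.
interface ViewState {
  transcript: string;     // built from deltas only
  reasoningPanel: string; // built from reasoning deltas only
  partial: unknown;       // latest structured snapshot, if any
  finished: boolean;
}

const initialState: ViewState = {
  transcript: "",
  reasoningPanel: "",
  partial: null,
  finished: false,
};

/** Append the increments and keep the most recent structured snapshot. */
function applyEvent(state: ViewState, event: StreamEvent): ViewState {
  return {
    transcript: state.transcript + (event.delta.text ?? ""),
    reasoningPanel: state.reasoningPanel + (event.delta.reasoning ?? ""),
    partial: event.snapshot.data ?? state.partial,
    finished: event.done,
  };
}

// Simulated events standing in for what onData would receive.
const simulatedEvents: StreamEvent[] = [
  {
    delta: { text: '{"title":', reasoning: "Extracting the title." },
    snapshot: { text: '{"title":', reasoning: "Extracting the title.", data: null },
    done: false,
  },
  {
    delta: { text: ' "Toast"}' },
    snapshot: { text: '{"title": "Toast"}', reasoning: "Extracting the title.", data: { title: "Toast" } },
    done: true,
  },
];

const finalState = simulatedEvents.reduce(applyEvent, initialState);
console.log(finalState.transcript);
```

If the deltas are applied in order, the accumulated transcript should match the last event's snapshot.text, which is why a UI can use either layer depending on whether it renders incrementally or re-renders from scratch.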

You can also pass provider request options through request:

const result = await llm.structured(
  Schema,
  prompt`Summarize this document: """${text}"""`,
  {
    request: {
      temperature: 0,
      maxTokens: 800,
      body: { user: "demo-user" },
    },
  }
);

Making Text Calls

generate() is the high-level API for non-structured generation. It accepts the same prompt shapes as structured(), but does not inject any schema or parse the output.

// Simple prompt
const simpleResult = await llm.generate(
  prompt`Write a short summary of ${topic}.`
);

// Multi-message prompt
const multiMessageResult = await llm.generate(
  prompt()
    .system`You are a concise assistant.`
    .user`Summarize: """${text}"""`
);

// Raw messages payload
const result = await llm.generate({
  prompt: {
    messages: [
      { role: "user", content: "Say hello in one sentence." },
    ],
  },
});

Streaming mirrors structured(), except the snapshot only contains text and reasoning:

const result = await llm.generate(
  prompt`Explain ${topic} in one short paragraph.`,
  {
    stream: {
      enabled: true,
      onData: (event) => {
        process.stdout.write(event.delta.text);

        console.log("Full text so far:", event.snapshot.text);
        console.log("Full reasoning so far:", event.snapshot.reasoning);

        if (event.done) {
          console.log("Streaming done.");
        }
      },
    },
  }
);

Provider request options and MCP tools still go through request:

const result = await llm.generate(
  prompt`Use tools if needed and answer the user clearly.`,
  {
    request: {
      temperature: 0,
      maxTokens: 800,
      reasoningEffort: "medium",
      mcpClients: [calculatorMCP],
      maxToolRounds: 10,
    },
  }
);

How reasoningEffort is sent depends on the provider. On openai-compatible, it is sent as reasoning_effort, with max mapped to xhigh. On anthropic-compatible, it is sent as output_config.effort and auto-enables thinking: { type: "adaptive" }.
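The provider mapping for reasoningEffort can be sketched as a pure function. This is an illustration, not the library's code: reasoningBody is a hypothetical name, and the Effort union is an assumption (only "medium" and "max" appear in this README).

```typescript
type Provider = "openai-compatible" | "anthropic-compatible";

// Assumed set of effort levels; only "medium" and "max" are confirmed by the README.
type Effort = "low" | "medium" | "high" | "max";

/** Build the provider-specific body fragment for a reasoning effort hint. */
function reasoningBody(provider: Provider, effort: Effort): Record<string, unknown> {
  if (provider === "openai-compatible") {
    // "max" has no direct equivalent here, so it is mapped to "xhigh".
    return { reasoning_effort: effort === "max" ? "xhigh" : effort };
  }
  // anthropic-compatible: effort goes under output_config and adaptive thinking is enabled.
  return { output_config: { effort }, thinking: { type: "adaptive" } };
}

console.log(reasoningBody("openai-compatible", "max"));
console.log(reasoningBody("anthropic-compatible", "medium"));
```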

For existing history or multi-turn conversations, pass messages directly:

import { conversation } from "extrait";

const messages = conversation("You are a helpful assistant.", [
  { role: "user", text: "What is the speed of light?" },
  { role: "assistant", text: "Approximately 299,792 km/s in a vacuum." },
  { role: "user", text: "How long does light take to reach Earth from the Sun?" },
]);

const result = await llm.generate({ prompt: { messages } });

Use llm.adapter.complete(...) or llm.adapter.stream(...) only when you need the raw low-level provider interface.

Images (multimodal)

Use images() to build base64 image content blocks for vision-capable models.

import { images, prompt } from "extrait";
import { readFileSync } from "fs";

const base64 = readFileSync("photo.png").toString("base64");
const img = { base64, mimeType: "image/png" };

// With prompt() builder — pass LLMMessageContent array to .user() or .assistant()
const builderResult = await llm.structured(Schema,
  prompt()
    .system`You are a vision assistant.`
    .user([{ type: "text", text: "Describe this image." }, ...images(img)])
);

// With raw messages array
const result = await llm.structured(Schema, {
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Describe this image." },
        ...images(img),
      ],
    },
  ],
});

// Multiple images
const content = [
  { type: "text", text: "Compare these two images." },
  ...images([
    { base64: base64A, mimeType: "image/png" },
    { base64: base64B, mimeType: "image/jpeg" },
  ]),
];

images() accepts a single { base64, mimeType } object or an array, and always returns an LLMImageContent[] that spreads directly into a content array.

Conversations (multi-turn history)

Use conversation() to build an LLMMessage[] from an existing conversation history. This is the idiomatic way to pass prior turns to the LLM.

import { conversation } from "extrait";

const messages = conversation("You are a helpful assistant.", [
  { role: "user",      text: "What is the speed of light?" },
  { role: "assistant", text: "Approximately 299,792 km/s in a vacuum." },
  { role: "user",      text: "How long does light take to reach Earth from the Sun?" },
]);

// High-level text generation
const response = await llm.generate({ prompt: { messages } });

// Or pass the same messages to structured extraction
const result = await llm.structured(Schema, { messages });

Entries with images produce multimodal content automatically:

const messages = conversation("You are a vision assistant.", [
  {
    role: "user",
    text: "What is in this image?",
    images: [{ base64, mimeType: "image/png" }],
  },
]);

Result Object

Successful generate() calls return normalized text/reasoning plus request metadata:

{
  text: string,
  reasoning: string,
  attempts: GenerateAttempt[],
  usage?: {
    inputTokens?: number,
    outputTokens?: number,
    totalTokens?: number,
    cost?: number,
  },
  finishReason?: string,
}

Each attempts entry includes:

{
  attempt: number,
  via: "complete" | "stream",
  text: string,
  reasoning: string,
  usage?: LLMUsage,
  finishReason?: string,
}

Successful structured() calls return validated data plus normalized text/reasoning and trace metadata.

{
  data: T,                      // Validated data matching schema
  text: string,                 // Visible model text, without inline <think> blocks
  reasoning: string,            // Normalized reasoning across dedicated fields and inline <think>
  json: unknown | null,         // Parsed JSON before validation
  attempts: StructuredAttempt<T>[], // One entry per parse / self-heal attempt
  usage?: {
    inputTokens?: number,
    outputTokens?: number,
    totalTokens?: number,
    cost?: number,
  },
  finishReason?: string,        // e.g., "stop"
}

Each attempts entry includes:

{
  attempt: number,
  selfHeal: boolean,
  via: "complete" | "stream",
  text: string,
  reasoning: string,
  json: unknown | null,
  candidates: string[],
  repairLog: string[],
  zodIssues: z.ZodIssue[],
  success: boolean,
  usage?: LLMUsage,
  finishReason?: string,
  parsed: ParseLLMOutputResult<T>,
}
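Because each attempt carries its own optional usage, you may want totals across all parse and self-heal attempts. The sketch below assumes LLMUsage has the same fields as the top-level usage object shown above; sumUsage and AttemptLike are hypothetical names, and this README does not say whether the top-level usage already aggregates attempts.

```typescript
// Assumed to match the top-level usage shape shown above.
interface LLMUsage {
  inputTokens?: number;
  outputTokens?: number;
  totalTokens?: number;
  cost?: number;
}

// Only the attempt fields this sketch needs.
interface AttemptLike {
  attempt: number;
  usage?: LLMUsage;
}

/** Sum usage across attempts, treating missing fields as zero. */
function sumUsage(attempts: AttemptLike[]): Required<LLMUsage> {
  const total = { inputTokens: 0, outputTokens: 0, totalTokens: 0, cost: 0 };
  for (const { usage } of attempts) {
    if (!usage) continue; // some attempts may not report usage
    total.inputTokens += usage.inputTokens ?? 0;
    total.outputTokens += usage.outputTokens ?? 0;
    total.totalTokens += usage.totalTokens ?? 0;
    total.cost += usage.cost ?? 0;
  }
  return total;
}

const exampleAttempts: AttemptLike[] = [
  { attempt: 1, usage: { inputTokens: 100, outputTokens: 40, totalTokens: 140, cost: 0.25 } },
  { attempt: 2 }, // no usage reported
  { attempt: 3, usage: { inputTokens: 120, outputTokens: 30, totalTokens: 150, cost: 0.5 } },
];

console.log(sumUsage(exampleAttempts));
```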

Legacy inline <think>...</think> blocks are still supported, but the high-level structured() API now folds them into reasoning internally instead of exposing block metadata.
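As a rough picture of what folding inline think blocks into reasoning means, the sketch below separates them from the visible text. splitThink is a hypothetical helper written for this illustration; the library's own normalization may handle more cases, such as unclosed tags or dedicated reasoning fields.

```typescript
/** Split inline <think>...</think> blocks out of raw model output. */
function splitThink(raw: string): { text: string; reasoning: string } {
  const reasoningParts: string[] = [];

  // Remove each think block from the visible text and remember its contents.
  const visible = raw.replace(/<think>([\s\S]*?)<\/think>/g, (_match: string, inner: string) => {
    reasoningParts.push(inner.trim());
    return "";
  });

  return { text: visible.trim(), reasoning: reasoningParts.join("\n") };
}

const split = splitThink('<think>The user wants JSON.</think>{"ok": true}');
console.log(split.text);
console.log(split.reasoning);
```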

Error Handling

Catch StructuredParseError when repair and validation still fail.

import { StructuredParseError } from "extrait";

try {
  const result = await llm.structured(Schema, prompt`...`);
} catch (error) {
  if (error instanceof StructuredParseError) {
    console.error("Validation failed");
    console.error("Attempt:", error.attempt);
    console.error("Zod issues:", error.zodIssues);
    console.error("Repair log:", error.repairLog);
    console.error("Candidates:", error.candidates);
  }
}

Embeddings

Generate vector embeddings using llm.embed(). The embeddings field of the result is always a number[][], with one vector per input string.

// Create a dedicated embedder client (recommended)
const embedder = createLLM({
  provider: "openai-compatible",
  model: "text-embedding-3-small",
  transport: { apiKey: process.env.LLM_API_KEY },
});

// Single string
const { embeddings, model, usage } = await embedder.embed("Hello world");
const vector: number[] = embeddings[0];

// Multiple strings in one request
const { embeddings: batch } = await embedder.embed(["text one", "text two", "text three"]);
// batch[0], batch[1], batch[2]: one vector each

// Optional: override model or request extra options per call
const { embeddings: custom } = await embedder.embed("Hello", {
  model: "text-embedding-ada-002",
  dimensions: 512,              // supported by text-embedding-3-* models
  body: { user: "user-id" },    // pass-through to provider
});

Result shape:

{
  embeddings: number[][];  // one vector per input
  model: string;
  usage?: { inputTokens?: number; totalTokens?: number };
  raw?: unknown;           // full provider response
}
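A common next step with two embedding vectors is to compare them. Cosine similarity is a standard choice; the helper below is a generic sketch that works on any two rows of embeddings and is not something extrait exports.

```typescript
/** Cosine similarity of two equal-length vectors, in the range [-1, 1]. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; report 0 instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Stand-in vectors; in practice these would be embeddings[0] and embeddings[1].
console.log(cosineSimilarity([1, 0], [0, 1]));
console.log(cosineSimilarity([1, 2, 3], [2, 4, 6]));
```

Values near 1 indicate vectors pointing in nearly the same direction, which for text embeddings usually means similar meaning.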

Anthropic / Voyage AI

Anthropic does not provide a native embedding API. Their recommended solution is Voyage AI, which uses the same OpenAI-compatible format:

const embedder = createLLM({
  provider: "openai-compatible",
  model: "voyage-3",
  transport: {
    baseURL: "https://api.voyageai.com",
    apiKey: process.env.LLM_API_KEY,
  },
});

const { embeddings } = await embedder.embed(["query", "document"]);

Calling llm.embed() on an anthropic-compatible adapter throws a descriptive error pointing to Voyage AI.

MCP Tools

Attach MCP clients at request time to let the model call tools during structured generation.

import { createMCPClient } from "extrait";

const mcpClient = await createMCPClient({
  id: "calculator",
  transport: {
    type: "stdio",
    command: "bun",
    args: ["run", "examples/calculator-mcp-server.ts"],
  },
});

const result = await llm.structured(
  Schema,
  prompt`Calculate 14 + 8`,
  {
    request: {
      mcpClients: [mcpClient],
      maxToolRounds: 5,
      toolDebug: {
        enabled: true,
        includeRequest: true,
        includeResult: true,
      },
      onToolExecution: (execution) => {
        console.log(execution.name, execution.durationMs);
      },
      // Optional: transform tool output before it is sent back to the LLM
      transformToolOutput: (output, execution) => {
        return { ...output, source: execution.name };
      },
      // Optional: transform tool arguments before the tool is called
      transformToolArguments: (args, call) => args,
      // Optional: transform the full MCP call payload, including _meta
      transformToolCallParams: (params, call) => ({
        ...params,
        _meta: {
          source: "extrait-docs",
          clientId: call.clientId,
        },
      }),
      // Optional: custom error message when an unknown tool is called
      unknownToolError: (toolName) => `Tool "${toolName}" is not available.`,
    },
  }
);

await mcpClient.close?.();

transformToolArguments() only receives the tool input object. transformToolCallParams() runs after it and receives the full MCPCallToolParams payload that will be sent to the MCP client:

type MCPCallToolParams = {
  name: string;
  arguments?: Record<string, unknown>;
  _meta?: Record<string, unknown>;
};

MCP toolsets may change between tool rounds. Providers call listTools() before each round, so a client can expose additional tools after a previous callTool() result, or remove tools that are no longer available.

Use transformToolCallParams() when you need to attach MCP-specific metadata, override the final remote tool name, or otherwise change the full request passed to client.callTool(). This hook is exported as LLMToolCallParamsTransformer.
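The order of the two hooks can be sketched as follows. MCPCallToolParams is copied from above; ToolCall, ToolHooks, and buildCallParams are hypothetical names for this illustration, and the ToolCall shape is an assumption based on the call.clientId usage in the example. The real hooks may also be asynchronous.

```typescript
type MCPCallToolParams = {
  name: string;
  arguments?: Record<string, unknown>;
  _meta?: Record<string, unknown>;
};

// Assumed minimal shape of the tool call passed to both hooks.
interface ToolCall {
  clientId: string;
  name: string;
  arguments: Record<string, unknown>;
}

interface ToolHooks {
  transformToolArguments?: (args: Record<string, unknown>, call: ToolCall) => Record<string, unknown>;
  transformToolCallParams?: (params: MCPCallToolParams, call: ToolCall) => MCPCallToolParams;
}

/** Apply the argument hook first, then the full-payload hook. */
function buildCallParams(call: ToolCall, hooks: ToolHooks): MCPCallToolParams {
  // Step 1: the argument hook sees only the tool input object.
  const args = hooks.transformToolArguments
    ? hooks.transformToolArguments(call.arguments, call)
    : call.arguments;

  // Step 2: the params hook sees the full payload, including the transformed arguments.
  const params: MCPCallToolParams = { name: call.name, arguments: args };
  return hooks.transformToolCallParams
    ? hooks.transformToolCallParams(params, call)
    : params;
}

const finalParams = buildCallParams(
  { clientId: "calculator", name: "add", arguments: { a: 14, b: 8 } },
  {
    transformToolArguments: (args) => ({ ...args, precision: 2 }),
    transformToolCallParams: (params, call) => ({
      ...params,
      _meta: { clientId: call.clientId },
    }),
  }
);
console.log(finalParams);
```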

Timeouts

Use timeout to set per-request and per-tool-call time limits without managing AbortSignal manually.

const result = await llm.structured(Schema, prompt`...`, {
  timeout: {
    request: 30_000,  // abort the LLM HTTP request after 30s
    tool: 5_000,      // abort each MCP tool call after 5s
  },
});

Both fields are optional. timeout.request creates an AbortSignal.timeout internally; it is ignored if you also pass request.signal (your signal takes precedence). timeout.tool wraps each MCP client transparently.
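The precedence rule between request.signal and timeout.request can be sketched like this. resolveRequestSignal is a hypothetical helper, not the library's internal code; it only relies on the standard AbortSignal.timeout() static method.

```typescript
/** Pick the signal for an LLM request: an explicit signal wins over a timeout. */
function resolveRequestSignal(
  requestSignal: AbortSignal | undefined,
  timeoutMs: number | undefined
): AbortSignal | undefined {
  // A caller-provided signal takes precedence; the timeout is ignored.
  if (requestSignal) return requestSignal;
  // Otherwise derive a signal that aborts after timeoutMs milliseconds.
  if (timeoutMs !== undefined) return AbortSignal.timeout(timeoutMs);
  // Neither was provided: no abort behaviour.
  return undefined;
}

const controller = new AbortController();
const chosen = resolveRequestSignal(controller.signal, 30_000);
console.log(chosen === controller.signal);
```

If you need both behaviours at once, you would have to combine the signals yourself, since under this rule the timeout is skipped whenever a signal is supplied.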

You can also set defaults on the client:

const llm = createLLM({
  provider: "openai-compatible",
  model: "gpt-5-nano",
  transport: { apiKey: process.env.LLM_API_KEY },
  defaults: {
    timeout: { request: 60_000 },
  },
});

Examples

Run repository examples with bun run dev <example-name>.

Pass arguments after the example name:

bun run dev generate "Why Bun is fast"
bun run dev streaming
bun run dev streaming-with-tools
bun run dev abort-signal 120 "JSON cancellation demo"
bun run dev timeout 5000
bun run dev simple "Bun.js runtime"
bun run dev sentiment-analysis "I love this product."
bun run dev multi-step-reasoning "Why is the sky blue?"
bun run dev embeddings "the cat sat on the mat" "a feline rested on the rug"

Environment Variables

These environment variables are used across the examples and common client setups.

  • LLM_PROVIDER - openai-compatible or anthropic-compatible
  • LLM_BASE_URL - API endpoint (optional)
  • LLM_MODEL - Model name (default: gpt-5-nano)
  • LLM_API_KEY - API key for the provider
  • STRUCTURED_DEBUG=1 - Enable debug output

By default, structured debug prints text (public visible output) and reasoning (normalized reasoning). parseSource (the internal source used by parsing and self-heal) is only printed when debug.verbose is enabled.
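If you want to wire these variables into your own setup, a small reader function is one way to do it. This is a sketch under assumptions: configFromEnv and EnvConfig are hypothetical names, and falling back to openai-compatible when LLM_PROVIDER is unset is a guess, not documented behaviour. Only the gpt-5-nano default comes from the list above.

```typescript
type Provider = "openai-compatible" | "anthropic-compatible";

interface EnvConfig {
  provider: Provider;
  model: string;
  baseURL?: string;
  apiKey?: string;
  debug: boolean;
}

/** Read the variables listed above from an env-like object. */
function configFromEnv(env: Record<string, string | undefined>): EnvConfig {
  // Assumed fallback: anything other than "anthropic-compatible" means openai-compatible.
  const provider: Provider =
    env.LLM_PROVIDER === "anthropic-compatible" ? "anthropic-compatible" : "openai-compatible";

  return {
    provider,
    model: env.LLM_MODEL ?? "gpt-5-nano", // documented default
    baseURL: env.LLM_BASE_URL,            // optional
    apiKey: env.LLM_API_KEY,
    debug: env.STRUCTURED_DEBUG === "1",
  };
}

// In a real script you would pass process.env; a literal keeps this sketch self-contained.
const exampleConfig = configFromEnv({ LLM_MODEL: "mistralai/ministral-3-3b", STRUCTURED_DEBUG: "1" });
console.log(exampleConfig);
```

The resulting provider, model, baseURL, and apiKey values line up with the createLLM() options shown earlier.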

Testing

Run the test suite with Bun.

bun run test