L0 - Deterministic Streaming Execution Substrate (DSES) for AI

The missing reliability and observability layer for all AI streams.

L0: The Missing AI Reliability Substrate

LLMs produce high-value reasoning over a low-integrity transport layer. Streams stall, drop tokens, reorder events, violate timing guarantees, and expose no deterministic contract.

This breaks retries. It breaks supervision. It breaks reproducibility. It makes reliable AI systems impossible to build on top of raw provider streams.

L0 is the deterministic execution substrate that fixes the transport, with guardrails designed specifically for the streaming layer: stream-neutral, pattern-based, loop-safe, and timing-aware.

The result: production-grade, integrity-preserving, deterministic AI streams you can finally build real systems on.

It works with OpenAI, Vercel AI SDK, Mastra AI, and custom adapters. Supports multimodal streams, tool calls, and provides full deterministic replay.

npm install @ai2070/l0

Also available in Python as @ai-2070/l0-python (install with uv add ai2070-l0), a native implementation with full lifecycle and event-signature parity.

Production-grade reliability. Just pass your stream; L0 will take it from here.

L0 includes 3,000+ tests covering all major reliability features.

   Any AI Stream                    L0 Layer                         Your App
 ─────────────────    ┌──────────────────────────────────────┐    ─────────────
                      │                                      │
   Vercel AI SDK      │   Retry · Fallback · Resume          │      Reliable
   OpenAI / Mastra ──▶│   Guardrails · Timeouts · Consensus  │─────▶ Output
   Custom Streams     │   Full Observability                 │
                      │                                      │
                      └──────────────────────────────────────┘
 ─────────────────                                                ─────────────
  text / image /           L0 = Token-Level Reliability
  video / audio

Upcoming versions:

  • 1.0.0 - API freeze

Features

| Feature | Description |
| ------- | ----------- |
| 🔁 Smart Retries | Model-aware retries with fixed-jitter backoff. Automatic retries for zero-token output, network stalls, SSE disconnects, and provider overloads. |
| 🌐 Network Protection | Automatic recovery from dropped streams, slow responses, backgrounding, 429/503 load shedding, DNS errors, and partial chunks. |
| 🔀 Model Fallbacks | Automatically fall back to secondary models (e.g., 4o → 4o-mini → Claude/Gemini) with full retry logic. |
| 💥 Zero-Token/Stall Protection | Detects when a model produces nothing or stalls mid-stream. Automatically retries or switches to fallbacks. |
| 📍 Last-Known-Good Token Resumption | When a stream interrupts, L0 resumes generation from the last structurally valid token (opt-in). |
| 🧠 Drift Detection | Detects tone shifts, duplicated sentences, entropy spikes, markdown collapse, and meta-AI patterns before corruption. |
| 🧱 Structured Output | Guaranteed-valid JSON with Zod (v3/v4), Effect Schema, or JSON Schema. Auto-corrects missing braces, commas, and markdown fences. |
| 🩹 JSON Auto-Healing + Markdown Fence Repair | Automatic correction of truncated or malformed JSON (missing braces, brackets, quotes) and repair of broken Markdown code fences. Ensures clean extraction of structured data from noisy LLM output. |
| 🛡️ Guardrails | JSON, Markdown, LaTeX, and pattern validation with fast/slow path execution. Delta-only checks run sync; full-content scans defer to async to never block streaming. |
| ⚡ Race: Fastest-Model Wins | Run multiple models or providers in parallel and return the fastest valid stream. Ideal for ultra-low-latency chat and high-availability systems. |
| 🌿 Parallel: Fan-Out / Fan-In | Start multiple streams simultaneously and collect structured or summarized results. Perfect for agent-style multi-model workflows. |
| 🔗 Pipe: Streaming Pipelines | Compose multiple streaming steps (e.g., summarize → refine → translate) with safe state passing and guardrails between each stage. |
| 🧩 Consensus: Agreement Across Models | Combine multiple model outputs using unanimous, weighted, or best-match consensus. Guarantees high-confidence generation for safety-critical tasks. |
| 📄 Document Windows | Built-in chunking (token, paragraph, sentence, character). Ideal for long documents, transcripts, or multi-page processing. |
| 🎨 Formatting Helpers | Extract JSON/code from markdown fences, strip thinking tags, normalize whitespace, and clean LLM output for downstream processing. |
| 📊 Monitoring | Built-in integrations with OpenTelemetry and Sentry for metrics, tracing, and error tracking. |
| 🔔 Lifecycle Callbacks | onStart, onComplete, onError, onEvent, onViolation, onRetry, onFallback, onToolCall - full observability into every stream phase. |
| 📡 Streaming-First Runtime | Thin, deterministic wrapper over streamText() with unified event types (token, error, complete) for easy UIs. |
| 📼 Atomic Event Logs | Record every token, retry, fallback, and guardrail check as immutable events. Full audit trail for debugging and compliance. |
| 🔄 Byte-for-Byte Replays | Deterministically replay any recorded stream to reproduce exact output. Perfect for testing and time-travel debugging. |
| ⛔ Safety-First Defaults | Continuation off by default. Structured objects never resumed. No silent corruption. Integrity always preserved. |
| ⚡ Tiny & Explicit | 21KB gzipped core. Tree-shakeable with subpath exports (/core, /structured, /consensus, /parallel, /window). No frameworks, no heavy abstractions. |
| 🔌 Custom Adapters (BYOA) | Bring your own adapter for any LLM provider. Built-in adapters for Vercel AI SDK, OpenAI, and Mastra. |
| 🖼️ Multimodal Support | Build adapters for image/audio/video generation (FLUX.2, Stable Diffusion, Veo 3, CSM). Progress tracking, data events, and state management for non-text outputs. |
| 🚀 Nvidia Blackwell-Ready | Optimized for 1000+ tokens/s streaming. Ready for next-gen GPU inference speeds. |
| 🧪 Battle-Tested | 3,000+ unit tests and 250+ integration tests validating real streaming, retries, and advanced behavior. |

Know what you're doing? Skip the tutorial

Quick Start

With Vercel AI SDK: Minimal Usage

import { l0 } from "@ai2070/l0";
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

const result = await l0({
  // Primary model stream
  stream: () =>
    streamText({
      model: openai("gpt-5-mini"),
      prompt,
    }),
});

// Read the stream
for await (const event of result.stream) {
  if (event.type === "token") process.stdout.write(event.value);
}
Vercel AI SDK: With Reliability

import { l0, recommendedGuardrails, recommendedRetry } from "@ai2070/l0";
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

const result = await l0({
  stream: () => streamText({ model: openai("gpt-4o"), prompt }),
  fallbackStreams: [() => streamText({ model: openai("gpt-4o-mini"), prompt })],

  // Optional: Content-agnostic, text-based guardrails
  guardrails: recommendedGuardrails,

  // Optional: Retry configuration, default as follows
  retry: {
    attempts: 3, // LLM errors only
    maxRetries: 6, // Total (LLM + network)
    baseDelay: 1000,
    maxDelay: 10000,
    backoff: "fixed-jitter", // "exponential" | "linear" | "fixed" | "full-jitter"
  },
  // Or use presets:
  // minimalRetry       // { attempts: 2, maxRetries: 4, backoff: "linear" }
  // recommendedRetry   // { attempts: 3, maxRetries: 6, backoff: "fixed-jitter" }
  // strictRetry        // { attempts: 3, maxRetries: 6, backoff: "full-jitter" }
  // exponentialRetry   // { attempts: 4, maxRetries: 8, backoff: "exponential" }

  // Optional: Timeout configuration, default as follows
  timeout: {
    initialToken: 5000, // 5s to first token
    interToken: 10000, // 10s between tokens
  },

  onError: (error, willRetry) => console.log(`Error: ${error.message}`),
});

for await (const event of result.stream) {
  if (event.type === "token") process.stdout.write(event.value);
}
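The recommendedRetry preset imported above can be dropped in place of the inline retry object; a minimal variant, assuming (as the preset comments above suggest) that presets are passed directly as the retry value:

const withPreset = await l0({
  stream: () => streamText({ model: openai("gpt-4o"), prompt }),
  fallbackStreams: [() => streamText({ model: openai("gpt-4o-mini"), prompt })],
  guardrails: recommendedGuardrails,
  retry: recommendedRetry, // preset equivalent of { attempts: 3, maxRetries: 6, backoff: "fixed-jitter" }
});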

See Also: API.md for all options, ADVANCED.md for full examples

With OpenAI SDK

import OpenAI from "openai";
import { l0, openaiStream, recommendedGuardrails } from "@ai2070/l0";

const openai = new OpenAI();

const result = await l0({
  stream: openaiStream(openai, {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Generate a haiku about coding" }],
  }),
  guardrails: recommendedGuardrails,
});

for await (const event of result.stream) {
  if (event.type === "token") process.stdout.write(event.value);
}

With Mastra AI

import { Agent } from "@mastra/core/agent";
import { l0, mastraStream, recommendedGuardrails } from "@ai2070/l0";

const agent = new Agent({
  name: "haiku-writer",
  instructions: "You are a poet who writes haikus",
  model: "openai/gpt-4o",
});

const result = await l0({
  stream: mastraStream(agent, "Generate a haiku about coding"),
  guardrails: recommendedGuardrails,
});

for await (const event of result.stream) {
  if (event.type === "token") process.stdout.write(event.value);
}

Structured Output with Zod

import { structured } from "@ai2070/l0";
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

const schema = z.object({
  name: z.string(),
  age: z.number(),
  occupation: z.string(),
});

const result = await structured({
  schema,
  stream: () =>
    streamText({
      model: openai("gpt-4o-mini"),
      prompt:
        "Generate a fictional person as JSON with name, age, and occupation",
    }),
  autoCorrect: true, // Fix trailing commas, missing braces, markdown fences
});

console.log(result.data); // { name: "Alice", age: 32, occupation: "Engineer" }

Lifecycle Events

import { l0 } from "@ai2070/l0";
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

const result = await l0({
  stream: () => streamText({ model: openai("gpt-4o"), prompt }),

  onEvent: (event) => {
    if (event.type === "token") process.stdout.write(event.value || "");
    if (event.type === "error") console.error("Error:", event.error);
    if (event.type === "complete") console.log("\nDone!");
  },
});

for await (const _ of result.stream) {
  // Events already handled by onEvent
}
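The feature table lists further lifecycle callbacks (onStart, onComplete, onRetry, onViolation, onToolCall). A sketch of wiring several at once; apart from onError and onFallback, which appear elsewhere in this README, the argument signatures shown here are assumptions, so check API.md:

const monitored = await l0({
  stream: () => streamText({ model: openai("gpt-4o"), prompt }),
  onStart: () => console.log("stream started"), // assumed signature
  onRetry: (attempt) => console.log(`retry #${attempt}`), // assumed signature
  onFallback: (index, reason) => console.log(`switched to fallback ${index}`),
  onError: (error, willRetry) => console.error(error.message),
  onComplete: () => console.log("stream complete"), // assumed signature
});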

Fallback Models & Providers

import { l0 } from "@ai2070/l0";
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";

const result = await l0({
  // Primary model
  stream: () => streamText({ model: openai("gpt-4o"), prompt }),

  // Fallbacks: tried in order if primary fails (supports both model and provider fallbacks)
  fallbackStreams: [
    () => streamText({ model: openai("gpt-4o-mini"), prompt }),
    () => streamText({ model: anthropic("claude-sonnet-4-20250514"), prompt }),
  ],

  onFallback: (index, reason) => console.log(`Switched to fallback ${index}`),
});

Parallel Execution

import { parallel } from "@ai2070/l0";
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

const prompts = ["Name a fruit", "Name a color", "Name an animal"];

const results = await parallel(
  prompts.map((prompt) => ({
    stream: () => streamText({ model: openai("gpt-4o-mini"), prompt }),
  })),
  { concurrency: 3 },
);

results.results.forEach((r, i) => {
  console.log(`${prompts[i]}: ${r?.state.content.trim()}`);
});
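The feature table also describes a fastest-model-wins race mode alongside parallel execution. The helper name and return shape below are assumptions based on that description and the /parallel subpath ("Parallel/race operations"); see INTERCEPTORS_AND_PARALLEL.md for the actual API:

import { race } from "@ai2070/l0"; // assumed export
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";

// Hypothetical: start both providers and keep the first valid stream
const winner = await race([
  { stream: () => streamText({ model: openai("gpt-4o-mini"), prompt }) },
  { stream: () => streamText({ model: anthropic("claude-sonnet-4-20250514"), prompt }) },
]);

for await (const event of winner.stream) {
  if (event.type === "token") process.stdout.write(event.value);
}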

Pipe: Streaming Pipelines

import { pipe } from "@ai2070/l0";
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

const result = await pipe(
  [
    {
      name: "summarize",
      fn: (input) => ({
        stream: () =>
          streamText({
            model: openai("gpt-4o"),
            prompt: `Summarize: ${input}`,
          }),
      }),
    },
    {
      name: "translate",
      fn: (summary) => ({
        stream: () =>
          streamText({
            model: openai("gpt-4o"),
            prompt: `Translate to French: ${summary}`,
          }),
      }),
    },
  ],
  longDocument,
);

console.log(result.output); // French translation of summary
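The feature table also covers agreement across models (unanimous, weighted, or best-match consensus). The helper name and option below are assumptions based on that description and the ConsensusResult type; see CONSENSUS.md for the actual API:

import { consensus } from "@ai2070/l0"; // assumed export
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";

// Hypothetical: ask two models the same question and pick the best-matching answer
const agreement = await consensus(
  [
    { stream: () => streamText({ model: openai("gpt-4o"), prompt }) },
    { stream: () => streamText({ model: anthropic("claude-sonnet-4-20250514"), prompt }) },
  ],
  { mode: "best-match" }, // "unanimous" | "weighted" | "best-match" per the feature table
);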

Philosophy

  • No magic - Everything is explicit and predictable
  • Streaming-first - Built for real-time token delivery
  • Signals, not rewrites - Guardrails detect issues, don't modify output
  • Model-agnostic - Works with any model
  • Zero dependencies - The only (optional) peer dependency is the Vercel AI SDK, the OpenAI SDK, or Mastra AI

Bundle sizes (minified):

| Import | Size | Gzipped | Description |
| ------ | ---- | ------- | ----------- |
| @ai2070/l0 (full) | 191KB | 56KB | Everything |
| @ai2070/l0/core | 71KB | 21KB | Runtime + retry + errors |
| @ai2070/l0/structured | 61KB | 18KB | Structured output |
| @ai2070/l0/consensus | 72KB | 21KB | Multi-model consensus |
| @ai2070/l0/parallel | 58KB | 17KB | Parallel/race operations |
| @ai2070/l0/window | 62KB | 18KB | Document chunking |
| @ai2070/l0/guardrails | 18KB | 6KB | Validation rules |
| @ai2070/l0/monitoring | 27KB | 7KB | OTel/Sentry |
| @ai2070/l0/drift | 4KB | 2KB | Drift detection |
| @ai2070/l0/zod | 12KB | 4KB | Zod 4 validation schemas |

Dependency-free. Tree-shakeable subpath exports for minimal bundles.

Most applications should simply use import { l0 } from "@ai2070/l0". Only optimize imports if you're targeting edge runtimes or strict bundle constraints.
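For edge runtimes or tight bundle budgets, subpath imports would look like this; which symbols each subpath re-exports is an assumption beyond the bundle table above, so confirm against API.md:

// Pull in only the pieces you need (assumed re-exports; see API.md)
import { l0 } from "@ai2070/l0/core";
import { structured } from "@ai2070/l0/structured";
import { recommendedGuardrails } from "@ai2070/l0/guardrails";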

Zod Validation Schemas

L0 exports Zod 4 schemas for runtime validation of all L0 types:

import { L0StateSchema, L0EventSchema, GuardrailViolationSchema } from "@ai2070/l0/zod";

// Validate runtime data
const state = L0StateSchema.parse(unknownData);

// Type-safe validation
const result = L0EventSchema.safeParse(event);
if (result.success) {
  console.log(result.data.type);
}

Schemas are available for all core types: L0State, L0Event, L0Telemetry, RetryOptions, GuardrailViolation, ConsensusResult, PipelineResult, and more.
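Following the same naming pattern, a retry configuration could be validated before being handed to l0; RetryOptionsSchema is an assumed name inferred from L0StateSchema/L0EventSchema above:

import { RetryOptionsSchema } from "@ai2070/l0/zod"; // assumed schema name

const retry = RetryOptionsSchema.parse({
  attempts: 3,
  maxRetries: 6,
  backoff: "fixed-jitter",
});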

Documentation

| Guide | Description |
| ----- | ----------- |
| ADVANCED.md | Advanced usage |
| QUICKSTART.md | 5-minute getting started |
| API.md | Complete API reference |
| GUARDRAILS.md | Guardrails and validation |
| STRUCTURED_OUTPUT.md | Structured output guide |
| CONSENSUS.md | Multi-generation consensus |
| DOCUMENT_WINDOWS.md | Document chunking guide |
| NETWORK_ERRORS.md | Network error handling |
| ERROR_HANDLING.md | Error handling guide |
| PERFORMANCE.md | Performance tuning |
| INTERCEPTORS_AND_PARALLEL.md | Interceptors and parallel ops |
| MONITORING.md | Telemetry and metrics |
| EVENT_SOURCING.md | Record/replay, audit trails |
| FORMATTING.md | Formatting helpers |
| CUSTOM_ADAPTERS.md | Build your own adapters |
| MULTIMODAL.md | Image/audio/video support |


Support

L0 is developed and maintained independently. If your company depends on L0 or wants to support ongoing development (including the Python version, website docs, and future tooling), feel free to reach out:

[email protected]


License

Apache-2.0