@langextract-ts/langextract

v0.1.0

Published

13 days ago

TypeScript information extraction library built on AI SDK v6, inspired by Google langextract.

Vulnerabilities: 0 High, 0 Medium, 0 Low

tristanmanchester

langextract information-extraction llm ai-sdk nlp gemini openai typescript

Purpose

@langextract-ts/langextract is the TypeScript-native public package for language extraction workflows. This package defines stable entrypoints and contracts while implementation details stay behind internal module boundaries.

Attribution

This package is an independent TypeScript port of the original Google langextract project.

It is not an official Google package.

Public API

Current public API surface:

extract: extraction orchestration and high-level pipeline entrypoints.
io: document/file input and output contracts.
progress: terminal-friendly progress formatting and descriptors.
visualization: result rendering and visual summary contracts.
providers: model/provider wiring contracts for AI SDK v6 integrations.
types: shared public type exports for callers and adapters.
errors: exported error classes and error code contracts.

Public import example:

import { extract, createProviderRegistry } from "@langextract-ts/langextract";

Subpath entrypoint examples:

import { extract } from "@langextract-ts/langextract/extract";
import { resolveModel } from "@langextract-ts/langextract/providers";

Legacy compatibility aliases (for migration ergonomics):

@langextract-ts/langextract/extraction -> extract
@langextract-ts/langextract/factory -> providers
@langextract-ts/langextract/exceptions -> errors
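
When migrating call sites off the legacy aliases, the mapping above can be applied mechanically. The following is a minimal sketch of such a rewrite step; `rewriteLegacySpecifier` and `LEGACY_SUBPATHS` are hypothetical names for illustration and are not exported by the package.

```typescript
// Hypothetical migration helper (not part of the package): rewrites the
// legacy subpath aliases listed above to their current entrypoints.
const PACKAGE_NAME = "@langextract-ts/langextract";

// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const LEGACY_SUBPATHS = new Map<string, string>([
  ["extraction", "extract"],
  ["factory", "providers"],
  ["exceptions", "errors"],
]);

function rewriteLegacySpecifier(specifier: string): string {
  const prefix = `${PACKAGE_NAME}/`;
  // Leave the package root and unrelated packages untouched.
  if (!specifier.startsWith(prefix)) return specifier;
  const subpath = specifier.slice(prefix.length);
  const replacement = LEGACY_SUBPATHS.get(subpath);
  return replacement === undefined ? specifier : `${prefix}${replacement}`;
}
```

A codemod or lint autofix could call this on every import specifier it visits. Specifiers that are not legacy aliases pass through unchanged.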

Import Rules

Import through package entrypoints only.
Do not import from src/internal/*.
Do not use relative imports across package boundaries.

Provider Routing Policy

Routing is registry-first: provider/model resolution flows through the provider registry before model creation.
The default public model route is google/gemini-3-flash.
Aliases for this route follow the lifecycle Active -> Deprecated -> Sunset -> Removed, as documented in the migration docs.
Sunset aliases are blocked by default; set LANGEXTRACT_ALLOW_SUNSET_ALIASES=1 only for temporary migration overrides.
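
To make the policy above concrete, here is a self-contained sketch of how registry-first resolution with alias lifecycle stages might behave. It is an illustration under stated assumptions, not the package's implementation: the type names, `resolveModelId`, the alias names in `exampleAliases`, and the treatment of unknown ids are all hypothetical. Only the default route and the `LANGEXTRACT_ALLOW_SUNSET_ALIASES` variable come from this README.

```typescript
type AliasStage = "active" | "deprecated" | "sunset" | "removed";

interface AliasEntry {
  target: string; // canonical provider/model id the alias points to
  stage: AliasStage;
}

interface Resolution {
  modelId: string;
  warning?: string; // present when a deprecated or sunset alias was used
}

const DEFAULT_MODEL_ID = "google/gemini-3-flash";

// Hypothetical alias table; these alias names are invented for illustration.
const exampleAliases = new Map<string, AliasEntry>([
  ["flash", { target: DEFAULT_MODEL_ID, stage: "active" }],
  ["flash-old", { target: DEFAULT_MODEL_ID, stage: "deprecated" }],
  ["flash-legacy", { target: DEFAULT_MODEL_ID, stage: "sunset" }],
  ["flash-ancient", { target: DEFAULT_MODEL_ID, stage: "removed" }],
]);

function resolveModelId(
  requested: string | undefined,
  aliases: Map<string, AliasEntry>,
  env: Record<string, string | undefined>,
): Resolution {
  const id = requested ?? DEFAULT_MODEL_ID;
  const alias = aliases.get(id);
  // Assumption: ids that are not aliases are passed through as-is.
  if (alias === undefined) return { modelId: id };

  switch (alias.stage) {
    case "active":
      return { modelId: alias.target };
    case "deprecated":
      return { modelId: alias.target, warning: `alias "${id}" is deprecated` };
    case "sunset":
      // Blocked by default; the env override allows temporary use.
      if (env["LANGEXTRACT_ALLOW_SUNSET_ALIASES"] !== "1") {
        throw new Error(`alias "${id}" is sunset`);
      }
      return { modelId: alias.target, warning: `alias "${id}" is sunset` };
    case "removed":
      throw new Error(`alias "${id}" has been removed`);
  }
}
```

Passing `env` explicitly instead of reading `process.env` inside the function keeps the sketch easy to test. A caller would typically pass `process.env`.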

Warning Codes

extract(...) supports onWarning and emits stable warning codes:

alias_lifecycle: model alias is deprecated/sunset (routing still resolved).
batch_length_below_max_workers: maxWorkers exceeds batchLength.
missing_examples: examples were omitted for extraction calls.
prompt_alignment_failed: prompt validation found extraction text that cannot align to source text.
prompt_alignment_non_exact: prompt validation found non-exact alignment.
schema_fences_incompatible: schema constraints are enabled while resolver fences are enabled for raw-output schema providers.
schema_wrapper_incompatible: schema constraints are enabled while resolver wrapper key mode is enabled for raw-output schema providers.
schema_constraints_ignored_with_explicit_model: caller provided model directly while enabling useSchemaConstraints.
provider_environment: provider environment policy produced warnings (for example conflicting API key env vars).
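
Because the codes listed above are stable, callers can branch on them. The sketch below shows one way to collect warnings by code and flag codes that are not in this list. The payload shape `{ code, message }` and the helper `createWarningCollector` are assumptions made for illustration; check the package's exported types for the actual shape passed to onWarning.

```typescript
// Warning codes copied from the list above.
const WARNING_CODES = [
  "alias_lifecycle",
  "batch_length_below_max_workers",
  "missing_examples",
  "prompt_alignment_failed",
  "prompt_alignment_non_exact",
  "schema_fences_incompatible",
  "schema_wrapper_incompatible",
  "schema_constraints_ignored_with_explicit_model",
  "provider_environment",
] as const;

type WarningCode = (typeof WARNING_CODES)[number];

// Assumed payload shape; the real warning object may carry more fields.
interface ExtractWarning {
  code: string;
  message: string;
}

function isKnownWarningCode(code: string): code is WarningCode {
  return (WARNING_CODES as readonly string[]).includes(code);
}

// Hypothetical helper: counts warnings per code and records unknown codes once.
function createWarningCollector() {
  const counts = new Map<string, number>();
  const unknownCodes: string[] = [];
  return {
    counts,
    unknownCodes,
    onWarning(warning: ExtractWarning): void {
      counts.set(warning.code, (counts.get(warning.code) ?? 0) + 1);
      if (!isKnownWarningCode(warning.code) && !unknownCodes.includes(warning.code)) {
        unknownCodes.push(warning.code);
      }
    },
  };
}
```

If the assumed payload shape matches, the collector's `onWarning` method could be passed as the onWarning option of extract(...), and `counts` inspected after the call.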

Validation Controls

extract(...) exposes two independent validation controls:

promptValidationLevel (off | warn | error): controls example alignment preflight validation (failed / non-exact extraction alignment checks).
promptLintLevel (off | warn | error, default off): controls prompt-string lint checks (empty prompt, unresolved template variables, missing JSON instruction when applicable).

Compatibility alias:

prompt_lint_level is supported for snake_case callers.
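
To illustrate the kind of checks promptLintLevel governs, here is a rough, self-contained sketch of a prompt linter covering the three cases named above. It is not the package's linter: the function `lintPrompt`, the finding names, the assumed template syntaxes (`{{name}}` and `${name}`), and the keyword test for a JSON instruction are all assumptions.

```typescript
// Hypothetical finding names; the package may report these differently.
type LintFinding =
  | "empty_prompt"
  | "unresolved_template_variable"
  | "missing_json_instruction";

function lintPrompt(prompt: string, expectsJson: boolean): LintFinding[] {
  const findings: LintFinding[] = [];

  // An empty or whitespace-only prompt makes the other checks meaningless.
  if (prompt.trim() === "") {
    findings.push("empty_prompt");
    return findings;
  }

  // Assumed placeholder syntaxes: {{name}} and ${name}.
  if (/\{\{\s*[\w.]+\s*\}\}|\$\{[^}]*\}/.test(prompt)) {
    findings.push("unresolved_template_variable");
  }

  // Only relevant when the caller expects JSON output.
  if (expectsJson && !/\bjson\b/i.test(prompt)) {
    findings.push("missing_json_instruction");
  }

  return findings;
}
```

With a level of warn, findings like these would surface as warnings. With error, they would abort the call before any model request is made.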

Example:

import { extract } from "@langextract-ts/langextract";

await extract({
  textOrDocuments: "Invoice total is 124.50 EUR",
  promptDescription: "Extract the currency amount",
  examples: [
    {
      text: "Subtotal is 10.00 USD",
      extractions: [{ extractionClass: "amount", extractionText: "10.00 USD" }],
    },
  ],
  promptValidationLevel: "warn", // alignment preflight
  promptLintLevel: "off", // prompt lint (default is off)
});
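
The alignment preflight controlled by promptValidationLevel can be pictured with the following sketch, which checks whether each example's extractionText can be located in its text. This is a deliberately simplified stand-in: the package's alignment logic is likely more sophisticated, and `classifyAlignment`, `preflightExamples`, the normalization rule, and the choice to throw on any non-exact match at the error level are assumptions. The example shape and the two warning codes are taken from this README.

```typescript
type ValidationLevel = "off" | "warn" | "error";
type AlignmentStatus = "exact" | "non_exact" | "failed";

interface ExampleData {
  text: string;
  extractions: { extractionClass: string; extractionText: string }[];
}

// Simplified check: exact substring first, then a case- and
// whitespace-insensitive substring as the "non-exact" case.
function classifyAlignment(sourceText: string, extractionText: string): AlignmentStatus {
  if (extractionText.trim() === "") return "failed";
  if (sourceText.includes(extractionText)) return "exact";
  const normalize = (s: string): string => s.toLowerCase().replace(/\s+/g, " ").trim();
  return normalize(sourceText).includes(normalize(extractionText)) ? "non_exact" : "failed";
}

function preflightExamples(
  level: ValidationLevel,
  examples: ExampleData[],
  warn: (code: string, extractionText: string) => void,
): void {
  if (level === "off") return;
  for (const example of examples) {
    for (const { extractionText } of example.extractions) {
      const status = classifyAlignment(example.text, extractionText);
      if (status === "exact") continue;
      const code =
        status === "failed" ? "prompt_alignment_failed" : "prompt_alignment_non_exact";
      // Assumption: "error" rejects both failed and non-exact alignments.
      if (level === "error") throw new Error(`${code}: "${extractionText}"`);
      warn(code, extractionText);
    }
  }
}
```

In the example above, "10.00 USD" appears verbatim in "Subtotal is 10.00 USD", so this check would classify it as exact and stay silent at every level.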

Status

Migration hardening is active. Public contracts above are versioned, covered by parity-focused tests, and protected by architecture/release governance checks.
