
effect-langextract · v0.2.0 · 103 downloads

effect-langextract

Effect-native LLM-powered structured extraction with source-grounded character spans.

An Effect TypeScript port of google/langextract — use LLMs to extract structured information from unstructured text, with every extraction mapped to exact character positions in the source.

Features

  • Source grounding — extractions map to exact CharInterval positions in the original text
  • Schema-constrained output — structured JSON enforced by the LLM provider
  • Long document support — chunking, parallel processing, multiple extraction passes
  • Fuzzy alignment — handles paraphrased/reworded extractions via token-level matching
  • Multi-provider — Gemini, OpenAI, Anthropic, Ollama (extensible via services)
  • Interactive visualization — self-contained HTML with color-coded highlights
  • Effect-native — services, layers, typed errors, structured concurrency throughout
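To give a feel for the source-grounding and alignment ideas above, here is a self-contained sketch that locates an extraction's tokens in the source text and reports the covering character interval. This is an illustrative stand-in, not the library's implementation: `alignExtraction` and `tokenize` are local names, and it does exact token matching rather than the library's fuzzy matcher.

```typescript
// Toy token-level alignment: find where an extraction's tokens occur in the
// source text and report the covering character interval. Illustrative only.
type CharInterval = { startPos: number; endPos: number }

// Split text into lowercase tokens, remembering each token's start offset.
const tokenize = (text: string): { token: string; start: number }[] => {
  const out: { token: string; start: number }[] = []
  const re = /\S+/g
  let m: RegExpExecArray | null
  while ((m = re.exec(text)) !== null) {
    out.push({ token: m[0].toLowerCase(), start: m.index })
  }
  return out
}

// Return the char interval covering the first window of source tokens that
// matches the extraction's tokens, or null if no window matches.
const alignExtraction = (
  source: string,
  extraction: string
): CharInterval | null => {
  const src = tokenize(source)
  const ext = tokenize(extraction).map((t) => t.token)
  for (let i = 0; i + ext.length <= src.length; i++) {
    if (ext.every((tok, j) => src[i + j].token === tok)) {
      const first = src[i]
      const last = src[i + ext.length - 1]
      return { startPos: first.start, endPos: last.start + last.token.length }
    }
  }
  return null
}

console.log(alignExtraction("Alice visited Paris last summer.", "visited Paris"))
```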

Installation

npm install effect-langextract
# or
bun add effect-langextract

Peer dependency: typescript ^5

Quick Start — Library API

import { extract } from "effect-langextract"
import { makeExtractionExecutionLayer } from "effect-langextract"
import { BunHttpClient } from "@effect/platform-bun"
import { Effect } from "effect"

const program = extract({
  ingestion: {
    source: { _tag: "text", text: "Alice visited Paris last summer." },
    format: "text"
  },
  prompt: {
    description: "Extract people and places mentioned in the text.",
    examples: [
      {
        text: "Bob went to London.",
        extractions: [
          { class: "person", text: "Bob" },
          { class: "place", text: "London" }
        ]
      }
    ]
  },
  annotate: {
    maxCharBuffer: 50000,
    batchLength: 5,
    batchConcurrency: 2,
    providerConcurrency: 4,
    extractionPasses: 1
  }
})

const layer = makeExtractionExecutionLayer({
  provider: "openai",
  modelId: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY ?? "",
  providerConcurrency: 4,
  primedCacheNamespace: "openai"
})

program.pipe(
  Effect.provide(layer),
  Effect.provide(BunHttpClient.layer),
  Effect.runPromise
).then(console.log)

Streaming

Use extractStream for incremental AnnotatedDocument output:

import { extractStream } from "effect-langextract"
import { Effect, Stream } from "effect"

// `request` takes the same shape as the options object passed to extract()
extractStream(request).pipe(
  Stream.runForEach((doc) => Effect.log(doc.documentId))
)

Rendering

import { renderDocuments } from "effect-langextract"

// documents: the AnnotatedDocument results from extract/extractStream
// Renders JSON, JSONL, or a self-contained HTML visualization
renderDocuments({ documents, format: "html" })

CLI Usage

The CLI provides two subcommands: extract and visualize.

# Extract entities from text
effect-langextract extract \
  --input "Alice visited Paris last summer." \
  --input-format text \
  --examples-file ./examples.json \
  --provider openai \
  --prompt "Extract people and places."

# Generate HTML visualization from extraction output
effect-langextract visualize \
  --input ./annotated-document.json \
  --output-path ./output.html
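The file passed to --examples-file presumably holds the few-shot examples; a plausible shape, mirroring the `examples` array from the library API's prompt options (the exact file schema is an assumption, not documented here):

```json
[
  {
    "text": "Bob went to London.",
    "extractions": [
      { "class": "person", "text": "Bob" },
      { "class": "place", "text": "London" }
    ]
  }
]
```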

Provider Configuration

Each provider reads configuration from environment variables:

| Provider | API Key Env Var | Optional Env Vars |
|---|---|---|
| OpenAI | OPENAI_API_KEY | OPENAI_MODEL_ID, OPENAI_BASE_URL, OPENAI_ORGANIZATION |
| Gemini | GEMINI_API_KEY | GEMINI_MODEL_ID, GEMINI_BASE_URL |
| Anthropic | ANTHROPIC_API_KEY | ANTHROPIC_MODEL_ID, ANTHROPIC_BASE_URL |
| Ollama | (none) | OLLAMA_MODEL_ID, OLLAMA_BASE_URL |

Core Pipeline

Input Text -> Chunking -> Prompt Building -> LLM Inference -> Parsing -> Alignment -> Output
  1. Chunking — split documents into chunks respecting maxCharBuffer, optional context windows
  2. Prompting — few-shot prompts with description + examples + query chunk
  3. Inference — batch LLM calls with parallel workers
  4. Parsing — extract JSON/YAML from LLM output
  5. Alignment — map extraction text to source character positions using token-level matching
  6. Merge — combine results from multiple passes
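The chunking step (1) can be sketched as follows. This is an illustrative stand-in for the pipeline stage, not the library's implementation; `chunkText` is a local name, and the real chunker also handles context windows.

```typescript
// Greedy chunking sketch: pack sentences into chunks of at most maxCharBuffer
// characters, breaking at sentence boundaries where possible. Illustrative only.
const chunkText = (text: string, maxCharBuffer: number): string[] => {
  // Naive sentence split: a run of non-terminators followed by terminators.
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text]
  const chunks: string[] = []
  let current = ""
  for (const sentence of sentences) {
    // Start a new chunk if adding this sentence would overflow the buffer.
    if (current.length + sentence.length > maxCharBuffer && current.length > 0) {
      chunks.push(current)
      current = ""
    }
    current += sentence
  }
  if (current.length > 0) chunks.push(current)
  return chunks
}

console.log(chunkText("One. Two. Three. Four.", 10))
```

Because the chunks are simple concatenable slices of the input, every extraction aligned within a chunk can be offset back to a position in the original document.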

Key Data Types

  • Extraction — class, text, charInterval, attributes, alignmentStatus
  • Document — text, documentId, additionalContext
  • AnnotatedDocument — document + extractions + tokenizedText
  • CharInterval — startPos, endPos
  • AlignmentStatus — match_exact | match_greater | match_lesser | match_fuzzy
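Based on the field lists above, the data shapes can be sketched as TypeScript interfaces. Field names come from the list; which fields are optional is an assumption, not verified against the package's actual type definitions.

```typescript
// Sketch of the data shapes listed above; optionality is assumed.
type AlignmentStatus =
  | "match_exact"
  | "match_greater"
  | "match_lesser"
  | "match_fuzzy"

interface CharInterval {
  startPos: number
  endPos: number
}

interface Extraction {
  class: string
  text: string
  charInterval?: CharInterval
  attributes?: Record<string, unknown>
  alignmentStatus?: AlignmentStatus
}

interface Document {
  text: string
  documentId?: string
  additionalContext?: string
}

interface AnnotatedDocument {
  document: Document
  extractions: Extraction[]
  tokenizedText?: string[]
}

// Example: an extraction grounded to characters 14..19 of the source.
const example: Extraction = {
  class: "place",
  text: "Paris",
  charInterval: { startPos: 14, endPos: 19 },
  alignmentStatus: "match_exact"
}
console.log(example.class, example.charInterval?.startPos)
```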

Attribution

This project is an Effect TypeScript port of google/langextract, originally written in Python. The core extraction pipeline, alignment algorithm, and data model are derived from that work.


Development

Run

bun install
bun run typecheck
bun run test
bun run build

CLI (development)

bun run cli -- extract --input "Alice visited Paris" --input-format text --examples-file ./examples.json --provider anthropic
bun run cli -- visualize --input ./annotated-document.json --output-path ./output.html

Node-ready runtime:

bun run cli:node -- extract --input "Alice visited Paris" --input-format text --examples-file ./examples.json --provider anthropic

Performance Harness

bun run perf:annotator
bun run perf:annotator:report

Parity Diff Harness

Fixture-driven parity regression checks against the Python reference:

bun run parity:diff
bun run parity:diff:report
bun run parity:diff:update   # refresh baselines after intentional changes

Testing Convention

Services expose canonical Effect.Service test APIs:

  • Stateless/simple services: static readonly Test
  • Configurable/stateful services: static testLayer(...)

Live provider smoke tests are opt-in:

LANGEXTRACT_LIVE_PROVIDER_SMOKE=true bun run test
bun run test:smoke:providers

License

MIT