@claritylabs/cl-sdk

v1.3.8

Deterministic insurance intelligence primitives for regulated AI agents

CL-SDK

Deterministic insurance intelligence primitives for regulated AI agents.

Documentation | npm | GitHub

Installation

npm install @claritylabs/cl-sdk pdf-lib zod

What It Does

  • Document Extraction — Deterministic extraction pipeline with focused model calls that turns insurance PDFs or host-provided Docling documents into structured data with page-level provenance, quality gates, first-class definitions and covered reasons, referential coverage resolution, cost-aware formatting, and automatic declarations-to-schema promotion (limits, deductibles, locations, broker, loss payees, premium, taxes/fees, summary)
  • Source Grounding — Shared source spans, hierarchical table row/cell evidence, source chunks, source stores, quoted evidence validation, and deterministic evidence ordering across extraction, query, application, PCE, and case workflows
  • Query Agent — Citation-backed question answering over stored documents, source spans, and inbound photos/PDFs/text with sub-question decomposition, bounded retrieval planning, attachment-only reasoning when retrieval is unnecessary, and grounding verification
  • Application Processing — Bounded intake workflows with deterministic planning, covering field extraction, prior-answer backfill, context auto-fill, document lookup gating, topic-based question batching, reply parsing, source-backed field provenance, and PDF mapping
  • Policy Change Endorsements — PCE intake, evidence collection, missing-info handling, quality gates, execution mode selection, and reviewable submission packets
  • Case Workflows — Shared primitives for evidence-backed proposals, missing information, validation issues, stable IDs, and packet artifacts
  • Agent System — Composable prompt modules for building insurance-aware agents across email, chat, SMS, Slack, and Discord with human-reviewable behavior
  • Storage — DocumentStore, MemoryStore, SourceStore, and ApplicationStore interfaces with reference implementations where appropriate

Quick Start

import { createExtractor } from "@claritylabs/cl-sdk";

// `yourProvider` below stands in for your own model client.
const extractor = createExtractor({
  generateText: async ({ prompt, system, maxTokens, taskKind, budgetDiagnostics, providerOptions }) => {
    const result = await yourProvider.generate({ prompt, system, maxTokens, taskKind, budgetDiagnostics, providerOptions });
    return { text: result.text, usage: result.usage };
  },
  generateObject: async ({ prompt, system, schema, maxTokens, taskKind, budgetDiagnostics, providerOptions }) => {
    // Pass providerOptions.pdfBase64 and/or providerOptions.images to your model
    const result = await yourProvider.generateStructured({ prompt, system, schema, maxTokens, taskKind, budgetDiagnostics, providerOptions });
    return { object: result.object, usage: result.usage };
  },
  concurrency: 3,
  pageMapConcurrency: 3,
  extractorConcurrency: 4,
  formatConcurrency: 2,
  reviewMode: "auto",
});

const result = await extractor.extract(pdfBase64); // pdfBase64: the policy PDF as a base64 string
console.log(result.document);     // Typed InsuranceDocument
console.log(result.chunks);       // DocumentChunk[] for vector storage
console.log(result.sourceSpans);  // SourceSpan[] when supplied by the host
console.log(result.reviewReport); // Quality gate results
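The concurrency, pageMapConcurrency, extractorConcurrency, and formatConcurrency options each cap how many model calls a stage runs at once. This README does not show how CL-SDK schedules those calls internally, so the following is only a generic sketch of what a cap like `concurrency: 3` means. `mapWithLimit` is a hypothetical helper, not part of the SDK.

```typescript
// Hypothetical helper (not a CL-SDK export): run async tasks with at most
// `limit` in flight, returning results in the original input order.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item no worker has claimed yet

  // Each worker keeps claiming the next unclaimed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }

  // Start at most `limit` workers; fewer if there are fewer items.
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Because each worker awaits one call before claiming another, no more than `limit` calls are ever pending, while results still line up with their inputs.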

Optional Docling input

If your host pre-processes a PDF with Docling, pass the serialized DoclingDocument JSON instead of a PDF. CL-SDK does not install or run Python Docling; it consumes the parsed document, builds source spans, and runs the same focused structuring pipeline over Docling page text. Docling tables are represented as table, row, and cell source spans; row spans are treated as the canonical evidence for extracted table facts.

const result = await extractor.extract({
  kind: "docling_document",
  document: doclingDocumentJson,
  sourceKind: "policy_pdf",
}, "policy-123");

Source Grounding

Source spans are the v1 evidence layer. Build spans from PDF text, OCR, emails, attachments, or structured fields, then pass them into extraction and downstream workflows:

import { buildPageSourceSpans, MemorySourceStore, createExtractor } from "@claritylabs/cl-sdk";

const pageOneText = "..."; // text from your PDF text/OCR pipeline
const sourceSpans = buildPageSourceSpans([
  { documentId: "policy-123", sourceKind: "policy_pdf", pageNumber: 1, text: pageOneText },
]);

const sourceStore = new MemorySourceStore();
const extractor = createExtractor({ generateText, generateObject, sourceStore });

const result = await extractor.extract(pdfBase64, "policy-123", { sourceSpans });
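To make "page-level span" concrete, here is a minimal sketch of building one span per page with a stable ID and a text hash. The README mentions `sourceTextHash` but not which hash function backs it, so this sketch assumes a 32-bit FNV-1a over UTF-16 code units as a small, deterministic stand-in. `PageInput`, `PageSpan`, `fnv1a`, and `buildPageSpansSketch` are all hypothetical names; use the SDK's own `buildPageSourceSpans` in real code.

```typescript
// Hypothetical shapes for illustration; CL-SDK's real SourceSpan type may differ.
interface PageInput {
  documentId: string;
  sourceKind: string;
  pageNumber: number;
  text: string;
}

interface PageSpan extends PageInput {
  id: string;
  textHash: string;
}

// 32-bit FNV-1a over UTF-16 code units: deterministic but NOT cryptographic.
// An assumed stand-in, not the hash CL-SDK actually uses.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, multiplied mod 2^32
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// One span per page: the ID is derived from document and page, and the hash
// lets downstream code detect when the underlying text has changed.
function buildPageSpansSketch(pages: readonly PageInput[]): PageSpan[] {
  return pages.map((p) => ({
    id: `${p.documentId}:p${p.pageNumber}`,
    ...p,
    textHash: fnv1a(p.text),
  }));
}
```

Deriving IDs from the document and page rather than from random values means re-running extraction over the same pages yields the same span IDs, which keeps stored citations valid.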

When source spans are available, section and endorsement extraction returns a compact index with page ranges, short excerpts, and sourceSpanIds/sourceTextHash instead of asking the model to reproduce full policy wording. Table-derived records prefer parent row spans over isolated cells, and coverage schedule rows can be recovered deterministically when the model misses explicit table evidence. Store result.sourceSpans and source chunks as the canonical evidence corpus for Q&A and source viewers; use result.chunks for structured facts and navigation metadata.
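The row-over-cell preference can be illustrated with a small, self-contained sketch: given the span IDs a fact cites, replace each table cell with its parent row, drop unknown IDs, and deduplicate while keeping the first-seen order. The `Span` shape with a `parentId` link and the `canonicalEvidence` function are assumptions for illustration, not CL-SDK's actual types or logic.

```typescript
// Hypothetical span shape; CL-SDK's real SourceSpan type may differ.
type SpanKind = "table" | "row" | "cell" | "page";

interface Span {
  id: string;
  kind: SpanKind;
  parentId?: string; // cell -> row, row -> table
  text: string;
}

// Map cited span IDs to canonical evidence: cells are promoted to their
// parent row, unknown IDs are dropped, and duplicates are removed.
function canonicalEvidence(citedIds: readonly string[], spans: readonly Span[]): string[] {
  const byId = new Map(spans.map((s) => [s.id, s] as const));
  const out: string[] = [];
  const seen = new Set<string>();
  for (const id of citedIds) {
    const span = byId.get(id);
    if (!span) continue; // cited ID not in the corpus: skip it
    let chosen = span.id;
    if (span.kind === "cell" && span.parentId) {
      const parent = byId.get(span.parentId);
      if (parent?.kind === "row") chosen = parent.id; // row is the canonical evidence
    }
    if (!seen.has(chosen)) {
      seen.add(chosen);
      out.push(chosen);
    }
  }
  return out;
}
```

A lone cell such as "$1,000" is ambiguous without its row (which coverage, which column), which is why a row span is the more useful unit to show a reviewer.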

See the full documentation for architecture, provider setup, API reference, and more.

Multimodal Querying

createQueryAgent() now accepts user-supplied attachments on each query. This is meant for flows like:

  • an SMS user texting a photo of apartment damage
  • a broker or insured emailing a COI or other PDF for context
  • a caller pasting text from an email thread alongside a question

import { createQueryAgent } from "@claritylabs/cl-sdk";

const agent = createQueryAgent({
  generateText,
  generateObject,
  documentStore,
  memoryStore,
  sourceRetriever,
});

const result = await agent.query({
  question: "What details do we still need, and does this relate to the stored policy?",
  conversationId: "conv-123",
  attachments: [
    {
      kind: "image",
      name: "damage.jpg",
      mimeType: "image/jpeg",
      base64: damagePhotoBase64,
    },
    {
      kind: "pdf",
      name: "coi.pdf",
      mimeType: "application/pdf",
      base64: coiPdfBase64,
    },
  ],
});

The query workflow first interprets each attachment into structured evidence, then uses the query classifier to decide whether stored-document retrieval is needed. Simple or attachment-only questions can skip retrieval and reason over the available evidence directly; document-backed questions still retrieve chunks, reason over citations, and run grounding verification. Verification can request targeted retry retrieval for weak sub-answers.
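The control flow described above can be sketched as a function with injected steps. This is a simplified, synchronous illustration under stated assumptions. In CL-SDK these steps are model calls and the real internals are not shown in this README. `Evidence`, `QueryDeps`, and `gatherEvidence` are hypothetical names.

```typescript
// Hypothetical types; they only mirror the steps the README describes.
interface Evidence {
  source: "attachment" | "document";
  text: string;
}

interface QueryDeps {
  interpretAttachment: (name: string) => Evidence;
  classify: (question: string, evidence: readonly Evidence[]) => { needsRetrieval: boolean };
  retrieve: (question: string) => Evidence[];
}

// 1. Interpret every attachment first.
// 2. Ask the classifier whether stored documents are needed at all.
// 3. Retrieve only when the classifier says so; otherwise reason over
//    the attachment evidence alone.
function gatherEvidence(
  question: string,
  attachmentNames: readonly string[],
  deps: QueryDeps,
): Evidence[] {
  const evidence = attachmentNames.map((name) => deps.interpretAttachment(name));
  if (deps.classify(question, evidence).needsRetrieval) {
    evidence.push(...deps.retrieve(question));
  }
  return evidence;
}
```

Keeping retrieval behind the classifier decision is what lets an attachment-only question avoid a document-store round trip entirely.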

Important: your generateObject callback must actually forward multimodal payloads from providerOptions to the model request:

  • providerOptions.attachments for generic image/pdf/text inputs
  • providerOptions.pdfBase64 for PDF inputs
  • providerOptions.images for image inputs
  • providerOptions.doclingText for host-provided Docling document inputs
  • providerOptions.sourceSpans and providerOptions.sourceChunks for source evidence when your host passes them through

If your callback ignores those fields, the model will only see the text prompt.
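As a sketch of that forwarding step, the function below turns a prompt plus the multimodal providerOptions fields into content parts for a model request. The README names these fields but not their types, so every shape here is an assumption: `AssumedProviderOptions`, the attachment fields, and the `Part` request format of the imaginary provider are hypothetical. Check the SDK's type definitions and your provider's API before adapting it. sourceSpans and sourceChunks are left out for brevity.

```typescript
// Assumed shapes: the README lists these providerOptions fields, not their types.
interface AssumedAttachment {
  kind: "image" | "pdf" | "text";
  mimeType?: string;
  base64?: string;
  text?: string;
}

interface AssumedProviderOptions {
  attachments?: AssumedAttachment[];
  pdfBase64?: string;
  images?: { base64: string; mimeType: string }[];
  doclingText?: string;
}

// Hypothetical content-part format for an imaginary model API.
type Part =
  | { type: "text"; text: string }
  | { type: "file"; mimeType: string; data: string };

function buildParts(prompt: string, opts: AssumedProviderOptions = {}): Part[] {
  const parts: Part[] = [{ type: "text", text: prompt }];
  if (opts.pdfBase64) {
    parts.push({ type: "file", mimeType: "application/pdf", data: opts.pdfBase64 });
  }
  for (const img of opts.images ?? []) {
    parts.push({ type: "file", mimeType: img.mimeType, data: img.base64 });
  }
  if (opts.doclingText) {
    parts.push({ type: "text", text: opts.doclingText });
  }
  for (const a of opts.attachments ?? []) {
    if (a.kind === "text" && a.text) {
      parts.push({ type: "text", text: a.text });
    } else if (a.base64) {
      // Fall back to a generic MIME type when the attachment does not state one.
      const mimeType = a.mimeType ?? (a.kind === "pdf" ? "application/pdf" : "application/octet-stream");
      parts.push({ type: "file", mimeType, data: a.base64 });
    }
  }
  return parts;
}
```

If a callback only sends `prompt`, everything after the first part is silently lost, which is the failure mode the note above warns about.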

Model routing metadata

Every SDK model callback may receive taskKind, budgetDiagnostics, and trace. Hosts can use these provider-agnostic fields for cheap-first routing, fallback, and telemetry without the SDK hardcoding model names. Example task kinds include extraction_classify, extraction_focused, extraction_review, query_reason, application_extract_fields, and pce_impact_analysis. budgetDiagnostics includes the resolved output-token cap, the lower preferred task budget, and truncation-risk warnings for the current subtask. When model capabilities include maxOutputTokens, the SDK uses that model maximum as the request cap instead of treating low task preferences as hard limits. trace identifies the current extractor, page range, format batch, or source-backed call so host logs can show what was being generated instead of a generic model-call label.
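On the host side, taskKind makes cheap-first routing straightforward. The sketch below is a hypothetical router: which task kinds count as cheap-first is an illustrative assumption, the tier names are placeholders rather than real models, and the call is synchronous for brevity where a real provider call would be async.

```typescript
// Hypothetical host-side router; tiers are placeholders, not real model names.
type Tier = "cheap" | "strong";

// Assumption for illustration: routine extraction tasks try a cheap model first.
const CHEAP_FIRST = new Set(["extraction_classify", "extraction_focused", "application_extract_fields"]);

function tiersFor(taskKind: string | undefined): Tier[] {
  return taskKind !== undefined && CHEAP_FIRST.has(taskKind) ? ["cheap", "strong"] : ["strong"];
}

// Try each tier in order and fall back to the next one on failure.
function generateWithFallback(
  taskKind: string | undefined,
  call: (tier: Tier) => string,
): { tier: Tier; text: string } {
  let lastError: unknown = new Error("no tiers configured");
  for (const tier of tiersFor(taskKind)) {
    try {
      return { tier, text: call(tier) };
    } catch (err) {
      lastError = err; // remember the failure and try the next tier
    }
  }
  throw lastError;
}
```

Because the routing table lives in the host, switching providers or models never requires changing SDK code, which matches the provider-agnostic intent of taskKind.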

Bounded Agentic Workflows

CL-SDK uses deterministic scaffolding with agentic decision points rather than fixed all-tools-all-the-time chains:

  • Extraction page mapping and review choose focused follow-up extractors from the live extractor catalog. Definitions and covered reasons can fall back through section extraction when a focused run returns no usable records.
  • Supplementary extraction runs only when page assignments, form inventory, existing extracted text, or review follow-up tasks indicate regulatory, claims, notice, cancellation, or contact facts are likely present.
  • Referential coverage resolution tries cheap local section/form matches first, then uses bounded target-specific actions for declarations, schedules, sections, page-location lookup, or skip.
  • Page mapping, focused extractors, referential lookup, and formatting use separate concurrency controls. Page-scoped PDF and image ranges are cached so overlapping extractor tasks do not repeatedly slice or render the same pages.
  • Formatting skips the LLM cleanup pass for plain prose and formats long or noisy markdown/table/list content in parallel batches.
  • reviewMode: "auto" skips the expensive LLM review pass when deterministic checks are clean and source spans are available. Use "always" for maximum review coverage or "skip" when the host owns quality review separately.
  • Application processing plans optional backfill, context auto-fill, document search, batching, reply parsing, lookup, explanations, and next-batch email generation based on current state.

These gates reduce unnecessary provider calls while preserving reliability for edge cases where additional focused extraction or retrieval is needed.
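The reviewMode gate described in the list is the simplest of these to state as code. This sketch encodes only the rule given above: "always" runs the LLM review, "skip" never does, and "auto" skips it when deterministic checks are clean and source spans are available. `ReviewInputs` and `shouldRunLlmReview` are hypothetical names, not SDK exports.

```typescript
type ReviewMode = "auto" | "always" | "skip";

// Hypothetical inputs to the decision.
interface ReviewInputs {
  deterministicIssues: number; // number of failed deterministic checks
  hasSourceSpans: boolean;
}

function shouldRunLlmReview(mode: ReviewMode, inputs: ReviewInputs): boolean {
  if (mode === "always") return true;
  if (mode === "skip") return false;
  // "auto": the expensive pass is only worth it when something is uncertain.
  const clean = inputs.deterministicIssues === 0 && inputs.hasSourceSpans;
  return !clean;
}
```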

Development

npm install
npm run build      # ESM + CJS + types via tsup
npm run dev        # Watch mode
npm run typecheck  # tsc --noEmit
npm test           # vitest

Zero framework dependencies. Peer deps: pdf-lib, zod. Optional: better-sqlite3.

License

Apache-2.0