intext

intext — a minimal TypeScript library to extract structured JSON from arbitrarily long text using:

  • sliding-window chunking (word-based by default)
  • per‑chunk extraction prompts to your LLM client
  • a single, schema‑aware final reduction that returns the final JSON object

No runtime dependencies. You bring an OpenAI‑compatible client (an object exposing chat.completions.create(args)).


Table of contents

  1. Overview
  2. Install
  3. Quickstart
  4. Example OpenAI client
  5. API Reference
  6. Chunking & tuning
  7. Aggregation (final reduction)
  8. Examples
  9. Tests & build
  10. Security & privacy
  11. License

1 — Overview

intext helps you extract the fields you define from long documents by:

  • tokenizing / chunking the text with a sliding window
  • sending per‑chunk extraction prompts to your provided LLM client
  • parsing and normalizing per‑chunk results
  • aggregating them with a final schema‑aware reduction (LLM returns the final JSON object)
  • returning final JSON plus provenance (which chunks contributed to each field)

Key ideas:

  • Zero runtime dependencies
  • You provide the LLM client; intext never calls external APIs directly
  • Schema‑driven: you tell intext what to extract using a simple JSON‑Schema‑like object

2 — Install

Install from npm:

npm install intext
# or
pnpm add intext
# or
yarn add intext

During local development, use the source directly or create a tarball with npm pack.


3 — Quickstart

import { createIntext, SchemaField } from "intext"; // or from "./src/index" in this repo

// Minimal OpenAI‑compatible client
function createOpenAIClient(apiKey: string, baseURL = "https://api.openai.com/v1") {
  return {
    chat: {
      completions: {
        create: async (args: Record<string, any>) => {
          const res = await fetch(`${baseURL}/chat/completions`, {
            method: "POST",
            headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
            body: JSON.stringify(args),
          });
          if (!res.ok) throw new Error(`LLM error ${res.status}`);
          return res.json();
        },
      },
    },
  };
}

// Create an intext instance
const openai = createOpenAIClient(process.env.OPENAI_API_KEY!);
const intext = createIntext({
  openai,
  clientParams: { model: "gpt-4o", temperature: 0 },
  // optional library‑level defaults, e.g. { stream: false }
  // defaultRequestParams: { stream: false },
});

// Define a JSON‑Schema‑like target object shape
const schema: SchemaField = {
  type: "object",
  properties: {
    issue: { type: "string", description: "one‑sentence summary" },
    next_moves: {
      type: "array",
      description: "list of actions",
      items: {
        type: "object",
        properties: {
          text: { type: "string" },
          owner: { type: "string" },
          due: { type: "string" },
        },
      },
    },
    status: {
      type: "string",
      description: "overall state",
      enum: ["open", "blocked", "done"],
    },
  },
};

const longText = `...very long transcript...`;

const result = await intext.extract(longText, {
  schema,
  chunkTokens: 1500,
  overlapTokens: 300,
  concurrency: 3,
  // per‑call overrides take precedence over clientParams/defaultRequestParams
  // llmCallOptions: { temperature: 0.2 },
});

console.log(JSON.stringify(result.json, null, 2));
console.log(result.metadata.perFieldProvenance);

4 — Example OpenAI client

A copy‑ready client using fetch (Node >= 18 or compatible runtimes):

export function createOpenAIClient(apiKey: string, baseURL = "https://api.openai.com/v1") {
  return {
    chat: {
      completions: {
        create: async (args: Record<string, any>) => {
          const r = await fetch(`${baseURL}/chat/completions`, {
            method: "POST",
            headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
            body: JSON.stringify(args),
          });
          if (!r.ok) {
            const t = await r.text();
            throw new Error(`OpenAI error ${r.status}: ${t}`);
          }
          return r.json();
        },
      },
    },
  };
}
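
Because intext only calls chat.completions.create, any endpoint that speaks the OpenAI chat‑completions protocol can be wired up the same way. A minimal sketch (the base URL and model name below are placeholders, not defaults of this library):

const localClient = createOpenAIClient("unused-key", "http://localhost:8000/v1");

const { extract } = createIntext({
  openai: localClient,
  clientParams: { model: "my-local-model", temperature: 0 },
});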

5 — API Reference

createIntext(params) => { extract }

Parameters:

  • openai: OpenAICompatibleClient
    • Must expose chat.completions.create(args) => Promise<any>
  • clientParams: ClientPreferredParams
    • e.g., { model: string; temperature?: number; max_tokens?: number; ... }
  • defaultRequestParams?: Record<string, any>
    • Optional library‑level defaults merged into every request

Returns an object with:

  • extract(text: string, opts: ExtractOptions): Promise<ExtractResult>
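
For instance, a sketch of an instance configured with library‑level defaults (the model name and stream flag are illustrative; schema and longText are assumed to be defined as in the Quickstart):

const { extract } = createIntext({
  openai,                                    // any OpenAI-compatible client
  clientParams: { model: "gpt-4o-mini", temperature: 0 },
  defaultRequestParams: { stream: false },   // optional, merged into every request
});

const result = await extract(longText, { schema, concurrency: 2 });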

Types

type SchemaEnumValue = string | number | boolean | null;

type SchemaNodeBase = {
  description?: string;
  enum?: SchemaEnumValue[];
};

type PrimitiveSchemaNode = SchemaNodeBase & {
  type: "string" | "number" | "boolean";
};

type ArraySchemaNode = SchemaNodeBase & {
  type: "array";
  items: SchemaNode;
};

type ObjectSchemaNode = SchemaNodeBase & {
  type: "object";
  properties: Record<string, SchemaNode>;
  required?: string[];
};

export type SchemaNode = PrimitiveSchemaNode | ArraySchemaNode | ObjectSchemaNode;

export type SchemaField = ObjectSchemaNode;

enum constrains a field (or array items) to a set of literal values. intext includes these hints in the prompts and drops per‑chunk values that fall outside the allowed set.
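
For example, a string field and array items constrained by enum might look like this (field names are illustrative):

const priority: SchemaNode = {
  type: "string",
  description: "ticket priority",
  enum: ["low", "medium", "high"],
};

const labels: SchemaNode = {
  type: "array",
  description: "labels attached to the ticket",
  items: { type: "string", enum: ["bug", "feature", "question"] },
};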

export type ExtractOptions = {
  schema: SchemaField;
  chunkTokens?: number;    // default 1500
  overlapTokens?: number;  // default 300
  concurrency?: number;    // default 3
  tokenizer?: (text: string) => string[];
  llmCallOptions?: Record<string, any>;
  debug?: boolean;
};

export type ExtractResult = {
  json: Record<string, any>;
  metadata: {
    chunkCount: number;
    perFieldProvenance: Record<string, { sourceChunks: number[] }>;
    rawChunkResults: Array<{ chunkId: number; parsed: Record<string, any>; raw?: string }>;
  };
};

export type OpenAICompatibleClient = {
  chat: { completions: { create: (args: Record<string, any>) => Promise<any> } };
};

export type ClientPreferredParams = {
  model: string;
  temperature?: number;
  max_tokens?: number;
  [key: string]: any;
};

6 — Chunking & tuning

  • The default tokenizer is word‑based (text.split(/\s+/)); bring your own tokenizer if you need exact model token counts (see the sketch after this list).
  • Defaults: chunkTokens = 1500, overlapTokens = 300.
  • Overlap ensures that items split across chunk boundaries appear in full in at least one chunk.
  • For lower latency, reduce chunk size or increase concurrency (watch cost).
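
For example, a custom tokenizer can be passed per call (reusing the Quickstart's intext, schema and longText; the character‑based splitter below is purely illustrative, a stand‑in for a real model tokenizer):

// Naive approximation: treat every run of up to 4 characters as one "token".
// Swap in a real tokenizer for your model if you need exact counts.
const charTokenizer = (text: string): string[] => text.match(/.{1,4}/gs) ?? [];

const result = await intext.extract(longText, {
  schema,
  tokenizer: charTokenizer,
  chunkTokens: 2000,
  overlapTokens: 200,
});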

7 — Aggregation (final reduction)

intext aggregates per‑chunk results using a single final reduction prompt that includes your JSON schema and all per‑chunk results. The LLM returns ONLY the final JSON object. The library returns that JSON and provenance (which chunks contributed non‑null values per field).

You can implement your own aggregation externally by consuming metadata.rawChunkResults.
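
A minimal sketch of such an external merge, where the merge rule ("last non‑null value wins per top‑level field") is just one possible choice:

const { metadata } = await intext.extract(longText, { schema });

// Ignore the built-in reduction and merge the per-chunk results yourself.
const merged: Record<string, any> = {};
for (const { parsed } of metadata.rawChunkResults) {
  for (const [field, value] of Object.entries(parsed)) {
    if (value !== null && value !== undefined) merged[field] = value;
  }
}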


8 — Examples

This repo includes runnable examples:

  • examples/basic-example.ts
  • examples/meeting-analysis.ts

Run with:

npm run examples:basic
npm run examples:meeting

Make sure OPENAI_API_KEY is set in your environment.


9 — Tests & build

Run tests:

npm test

Build the library (emits to dist/):

npm run build

10 — Security & privacy

  • intext does not call the network by itself — you provide the client.
  • Handle API keys securely and respect data governance policies.
  • Consider redacting sensitive text before tokenization if it will be sent to an external service.
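
For example, a simple redaction pass before calling extract (the regex patterns below are illustrative and far from exhaustive):

// Illustrative only: mask obvious email addresses and long digit runs before extraction.
const redact = (text: string): string =>
  text
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[email]")
    .replace(/\b\d{6,}\b/g, "[number]");

const result = await intext.extract(redact(longText), { schema });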

11 — License

MIT