llm-json-extract

v0.6.1

Extract & validate JSON from messy LLM output. Designed for Claude / Codex CLI workflows where tool-use is unavailable. XML-tag aware, with multi-stage fallbacks and jsonrepair.

Extract and validate JSON from messy LLM output — the model is free to think out loud, explain itself, or wrap its answer in prose. As long as the actual JSON is somewhere in the response (ideally inside <result>...</result> tags), you'll get a clean parsed object back.

Designed for workflows where provider-native structured output is not available — Claude Code CLI, Codex CLI, agent frameworks, or any pipeline that asks a model for JSON via prompt rather than tool_use / response_format.

import { extractJson } from "llm-json-extract";

const llmOutput = `
<thinking>The user wants a list of fruits...</thinking>
<result>
{
  "items": ["apple", "banana", "cherry"],
  "count": 3,  // trailing comma — fine
}
</result>
`;

const data = extractJson(llmOutput);
// data === { items: ["apple", "banana", "cherry"], count: 3 }

With a schema:

import { extractJsonWith } from "llm-json-extract";
import { z } from "zod";

// llmOutput is defined in the example above
const Schema = z.object({ items: z.array(z.string()), count: z.number() });
const value = extractJsonWith(llmOutput, Schema);
// fully typed, validated

Why?

The Anthropic API has tool_use. OpenAI has Structured Outputs. But CLIs don't expose them. If you're shelling out to claude -p or codex exec from a batch script, the only thing you get back is free-form text — possibly with reasoning, prose, code fences, or trailing commas mixed in.

And even when you can enforce JSON-only output, doing so often hurts answer quality on reasoning-heavy tasks. Letting the model think freely and just pulling the JSON back out of its response is usually the better trade-off.

This package implements the de facto pattern Anthropic recommends in its docs: ask the model to wrap its answer in an XML tag, then extract it. It adds fallbacks for the common cases where the model didn't quite follow instructions.

Features

  • Prose-tolerant by design — the model can think out loud; only the tagged answer is extracted
  • XML tag aware — finds <result>...</result>, <json>...</json>, etc. (configurable)
  • Multi-stage fallbacks — tag → fenced code block → bare {...} / [...] in raw text
  • Parse-aware fallthrough — if the preferred candidate fails to parse (or fails your schema), the next candidate is tried automatically; object/array results are preferred over stray primitives
  • Document-position pickLast — when the model echoes a prompt example, picks the real answer at the end
  • jsonrepair integrated — fixes trailing commas, single quotes, comments, unquoted keys
  • Schema-agnostic validation — pass zod.parse, valibot, arktype, or any (unknown) => T
  • No required peer deps — works standalone, opt-in validation
  • Typed errors: LlmJsonExtractError with stage (extract / parse / validate) and the raw text for debugging
  • ESM + CJS dual build, full .d.ts, npm provenance signed

Install

npm install llm-json-extract
# or
pnpm add llm-json-extract
# or
yarn add llm-json-extract

Usage

Prompt the model

The whole point of this library is that the model doesn't need to output JSON only — it can think out loud, explain itself, apologize, add a friendly closing line, whatever. As long as the actual answer is wrapped in <result>...</result> somewhere, you'll get clean JSON out. This is a feature, not a bug: forcing JSON-only output often degrades answer quality, especially for reasoning-heavy tasks.

Minimal — wrap the answer, prose is fine:

Wrap your final JSON answer in <result>...</result>. You can explain
your reasoning freely before or after.

Encourage reasoning (often improves quality):

Think through the problem step by step. When you're ready, put the
final JSON answer in <result>...</result>. You don't need to suppress
your reasoning — anything outside the tags is ignored.

With a schema:

Return JSON matching this schema, wrapped in <result>...</result>:

{
  "name": string,
  "age": integer,
  "hobbies": string[]
}

Trailing commas, comments, and single quotes are tolerated. Prose
outside the tags is fine.

Avoid example-echo collisions:

If your prompt shows an example like <result>{"score": 0}</result>, the model may echo it as part of its reasoning. pickLast (default) grabs the last <result> block, which is normally the real answer — but you can be explicit by using a different tag for the example:

Example format (do not copy these values):
<example>{"score": 0}</example>

Your real answer goes in <result>...</result>.
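
The "last block wins" idea behind pickLast can be sketched in a few lines. This is only an illustration under simple assumptions (a regex-based matcher, one tag name at a time); lastTagBody is a hypothetical helper, not the library's implementation or part of its API.

```typescript
// Hypothetical helper, not part of llm-json-extract: returns the body of the
// last <tag>...</tag> pair in document order, or null if there is none.
function lastTagBody(text: string, tag: string): string | null {
  // Escape regex metacharacters so the tag name is matched literally.
  const name = tag.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Case-insensitive, attributes allowed, non-greedy body across newlines.
  const pattern = new RegExp(
    `<${name}(?:\\s[^>]*)?>([\\s\\S]*?)</${name}\\s*>`,
    "gi",
  );
  let last: string | null = null;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    last = (match[1] ?? "").trim(); // later matches overwrite earlier ones
  }
  return last;
}

// The model echoes the prompt example first, then gives its real answer.
const echoed = `
The format is <result>{"score": 0}</result>, so I will follow it.
<result>{"score": 87}</result>
`;

const answer = lastTagBody(echoed, "result");
// answer is '{"score": 87}', the block closest to the end
```

Using a separate <example> tag in the prompt, as shown above, removes the ambiguity entirely, so the position rule only has to act as a safety net.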

Extract

import { extractJson } from "llm-json-extract";

const data = extractJson(llmOutput); // returns unknown — cast or validate as needed

Extract + validate (zod)

import { extractJsonWith } from "llm-json-extract";
import { z } from "zod";

const User = z.object({
  name: z.string(),
  age: z.number(),
  hobbies: z.array(z.string()),
});

const user = extractJsonWith(llmOutput, User);
//    ^ type is z.infer<typeof User>

// `User.parse` also works if you prefer the function form:
// const user = extractJsonWith(llmOutput, User.parse);

Extract + validate (valibot)

import { extractJsonWith } from "llm-json-extract";
import * as v from "valibot";

const User = v.object({ name: v.string(), age: v.number() });
const user = extractJsonWith(llmOutput, (x) => v.parse(User, x));

Extract + validate (arktype)

import { extractJsonWith } from "llm-json-extract";
import { type } from "arktype";

const User = type({ name: "string", age: "number" });
const user = extractJsonWith(llmOutput, (x) => User.assert(x));
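
Because validation is described as schema-agnostic (any (unknown) => T), a hand-written validator with no dependencies should fit the same slot. The sketch below shows one possible shape; the User interface and parseUser are hypothetical names chosen for illustration.

```typescript
interface User {
  name: string;
  age: number;
}

// A plain (unknown) => User validator: returns the typed value or throws.
function parseUser(value: unknown): User {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new TypeError("expected a JSON object");
  }
  const record = value as Record<string, unknown>;
  const name = record["name"];
  const age = record["age"];
  if (typeof name !== "string") {
    throw new TypeError("name must be a string");
  }
  if (typeof age !== "number" || !Number.isFinite(age)) {
    throw new TypeError("age must be a finite number");
  }
  return { name, age };
}

// Stand-in for a value that has already been extracted and parsed.
const checkedUser = parseUser(JSON.parse('{"name": "Ada", "age": 36}'));
```

Assuming the function form works as it does for User.parse in the zod example, this would be passed as extractJsonWith(llmOutput, parseUser).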

Just the string (for piping)

import { extractJsonString } from "llm-json-extract";

const jsonStr = extractJsonString(llmOutput); // string | null — not parsed

All candidates (advanced)

import { extractJsonCandidates } from "llm-json-extract";

const candidates = extractJsonCandidates(llmOutput);
// e.g. ["<final answer>", "<earlier echo>", "<fence body>", "<stray bare JSON>"]

Useful when you want to score, log, or pick candidates yourself. extractJson and extractJsonWith already try each candidate automatically, so most users don't need this.

CLI example

Pipe Claude Code CLI output directly:

claude -p 'List 3 fruits. Reply as <result>{"items":[...]}</result>.' \
  --output-format json \
  | jq -r .result \
  | node -e '
      import("llm-json-extract").then(({ extractJson }) => {
        let buf=""; process.stdin.on("data",d=>buf+=d).on("end",()=>{
          console.log(extractJson(buf));
        });
      });
    '

Or in a Node script that calls codex exec / claude -p:

import { execSync } from "node:child_process";
import { extractJsonWith } from "llm-json-extract";
import { z } from "zod";

const out = execSync(`codex exec "List 3 fruits as <result>{...}</result>"`, {
  encoding: "utf8",
});

const Schema = z.object({ items: z.array(z.string()) });
const { items } = extractJsonWith(out, Schema);

Options

All options are optional — the values below are the defaults:

extractJson(llmOutput, {
  tags: ["result", "json", "output"], // tag names; document position decides priority (not list order)
  pickLast: true,         // when multiple matches, prefer the one closer to the end
  tryCodeFence: true,     // also collect ```json``` / ``` ``` blocks as candidates
  tryBareJson: true,      // also collect balanced {...} / [...] runs as candidates
  repair: true,           // run jsonrepair before JSON.parse
});

Errors

import { extractJson, LlmJsonExtractError } from "llm-json-extract";

try {
  const data = extractJson(llmOutput);
} catch (e) {
  if (e instanceof LlmJsonExtractError) {
    console.error(e.stage);     // "extract" | "parse" | "validate"
    console.error(e.raw);       // the original input
    console.error(e.extracted); // the substring we tried to parse (or null)
  }
}
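
One way to use the stage field is to choose a follow-up instruction when retrying the model. The sketch below is an assumption about how a caller might do that, not something the library provides: the three stage values come from the error description above, while retryHint and its wording are hypothetical.

```typescript
// The stage values mirror those listed for LlmJsonExtractError;
// the hint text is illustrative only.
type Stage = "extract" | "parse" | "validate";

// Hypothetical helper: picks a follow-up instruction for a retry prompt.
function retryHint(stage: Stage): string {
  switch (stage) {
    case "extract":
      // Nothing that looked like an answer was found.
      return "Wrap your final JSON answer in <result>...</result>.";
    case "parse":
      // A candidate was found but was not parseable JSON.
      return "The JSON inside <result> could not be parsed. Resend it as valid JSON.";
    case "validate":
      // The JSON parsed but the validator rejected it.
      return "The JSON parsed but did not match the requested schema. Resend it with the required fields.";
  }
}
```

Inside the catch block above, a caller could append retryHint(e.stage) to the next prompt, assuming e.stage is typed as the same three-value union.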

Extraction strategy

A list of candidate JSON strings is built in this order:

  1. Tag match: <result>, <json>, <output> by default (case-insensitive, attributes OK). The preferred match (last in document by default; controlled by pickLast) goes first, then other matches in document order.
  2. Code fence: ```json blocks first, then bare ``` ``` blocks, in document order.
  3. Bare JSON: balanced {...} / [...] runs in the text, respecting strings and escapes.

Then extractJson walks the candidate list, running jsonrepair and JSON.parse on each, returning the first one that yields an object or array. If only primitives (strings, numbers, etc.) parse — which happens when jsonrepair turns a stray prose word like nope into a string — those are returned only as a last resort. extractJsonWith does the same, additionally skipping candidates that fail your validator.
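
The candidate order and the walk described above can be approximated in a short, dependency-free sketch. It is not the library's code: all function names are hypothetical, the default tags are hard-coded, the jsonrepair step is left out (so trailing commas and comments would fail here), and the bare-JSON scanner does not check that bracket types match, leaving that to JSON.parse.

```typescript
const TAGS = ["result", "json", "output"];
const FENCE = "`".repeat(3); // built at runtime to avoid a literal fence here

// Stage 1: tag bodies, preferred (last) match first, then the rest in order.
function tagCandidates(text: string): string[] {
  const pattern = new RegExp(
    `<(${TAGS.join("|")})(?:\\s[^>]*)?>([\\s\\S]*?)</\\1\\s*>`,
    "gi",
  );
  const bodies: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) bodies.push((m[2] ?? "").trim());
  const last = bodies.pop();
  return last === undefined ? [] : [last, ...bodies];
}

// Stage 2: fenced blocks, json-labelled ones before unlabelled ones.
function fenceCandidates(text: string): string[] {
  const pattern = new RegExp(
    FENCE + "([A-Za-z]*)[ \\t]*\\r?\\n([\\s\\S]*?)" + FENCE,
    "g",
  );
  const labelled: string[] = [];
  const bare: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    const lang = (m[1] ?? "").toLowerCase();
    const body = (m[2] ?? "").trim();
    if (lang === "json") labelled.push(body);
    else if (lang === "") bare.push(body);
  }
  return [...labelled, ...bare];
}

// Index of the bracket that closes the one at `start`, or -1 if unbalanced.
function matchingClose(text: string, start: number): number {
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{" || ch === "[") depth++;
    else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) return i;
    }
  }
  return -1;
}

// Stage 3: balanced {...} / [...] runs in the raw text.
function bareCandidates(text: string): string[] {
  const found: string[] = [];
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    const end = ch === "{" || ch === "[" ? matchingClose(text, i) : -1;
    if (end === -1) {
      i++;
      continue;
    }
    found.push(text.slice(i, end + 1));
    i = end + 1;
  }
  return found;
}

// Walk the candidates: first object/array wins, primitives are a last resort.
function extractFirstJson(text: string): unknown {
  const candidates = [
    ...tagCandidates(text),
    ...fenceCandidates(text),
    ...bareCandidates(text),
  ];
  let primitive: { value: unknown } | null = null;
  for (const candidate of candidates) {
    let value: unknown;
    try {
      value = JSON.parse(candidate);
    } catch {
      continue; // parse-aware fallthrough: try the next candidate
    }
    if (typeof value === "object" && value !== null) return value;
    primitive = primitive ?? { value };
  }
  if (primitive !== null) return primitive.value;
  throw new Error("no JSON candidate parsed");
}
```

Keeping the three stages as separate functions mirrors the description above and makes it easy to see why a tagged answer beats a fenced block, which in turn beats stray braces in prose.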

When not to use this

If you can call the Anthropic or OpenAI API directly, prefer tool use (Claude) or Structured Outputs (OpenAI) — they enforce schemas at the decoding step, with stronger guarantees than any post-hoc parser can give. This library is for the cases where you can't.

License

MIT