llm-json-repair

v0.1.11

Published

4 days ago

Repair and parse broken JSON from LLM output (OpenAI, Claude, …) — fixes code fences, trailing commas, and truncated streams, then validates against your schema. Type-safe, zero-dependency, Standard Schema compatible

llm-json-repair

Repair and parse broken JSON from LLM output (OpenAI, Claude, …) — strips code fences, fixes trailing commas and single quotes, closes truncated streams, then optionally validates against your schema and gives you back fully typed data. Tree-shakeable, ships ESM + CJS + types, and has zero runtime dependencies

The problem: LLMs love to "return JSON" and then wrap it in ```json, add a trailing comma, use 'single quotes', prepend "Sure, here you go!", or get cut off mid-stream. JSON.parse throws on all of it. This library fixes the JSON first, then validates it — once, correctly, without exceptions to catch

At a glance

| Export | What it does | | --- | --- | | repairJson | Repair + (optionally) validate → typed RepairResult<T> | | repairJsonAsync | Same, for schemas with async validation/refinements | | repairJsonOrThrow | Returns the value directly; throws on failure | | repairJsonOrThrowAsync | Async variant of repairJsonOrThrow | | repairJsonStream | Incremental parser for JSON arriving in chunks | | repairJsonFromStream | Drain an AsyncIterable<string> → final result | | repairToString | Repair and return a canonical, valid JSON string | | tolerantParse | The low-level repairing parser → unknown |

There's also a CLI for piping JSON through the repairer from a shell.

Install

npm install llm-json-repair

node >= 20. No runtime dependencies — bring your own Standard Schema validator (Zod, Valibot, ArkType, …) only if you want validation

Quick start

import { repairJson } from "llm-json-repair";

const raw = '```json\n{ name: "Ada", admin: true, } // the boss\n```';

const result = repairJson(raw);
if (result.ok) {
  console.log(result.value);    // { name: "Ada", admin: true }
  console.log(result.repaired); // true — input wasn't clean JSON
  console.log(result.repairs);  // [{ kind: "code_fence" }, { kind: "unquoted_key" }, …]
}

Working with a streaming model? Jump to Streaming. Calling it from a shell? There's a CLI.

`repairJson(input, schema?, options?)`

Repairs messy JSON and, if you pass a schema, validates the result and infers the output type. Never throws — failures come back as a discriminated union:

import { repairJson } from "llm-json-repair";
import { z } from "zod";

const User = z.object({ name: z.string(), age: z.number() });

const result = repairJson("{name:'Ada',age:36,}", User);

if (result.ok) {
  result.value; // typed as { name: string; age: number }
} else {
  result.error.code;   // "validation_error"
  result.error.issues; // standard-schema issues, ready to display
}

type RepairResult<T> =
  | { ok: true;  value: T; repaired: boolean; repairs: RepairEvent[] }
  | { ok: false; error: { code: RepairErrorCode; message: string; issues?; cause? } };

type RepairErrorCode = "empty_input" | "parse_error" | "validation_error" | "async_schema";

| Field | Description | | --- | --- | | value | The parsed (and schema-validated, if a schema was given) value | | repaired | true when the input wasn't already JSON.parse-able and had to be repaired | | repairs | The specific fixes that were applied, in order | | error.code | Discriminator for the failure — branch on it without parsing messages | | error.issues | Standard Schema issues, present on validation_error |

What gets repaired

| Input | Result | | --- | --- | | ```json\n{"a":1}\n``` | { a: 1 } | | Here you go: {"a":1} | { a: 1 } | | {"a":1,"b":2,} | { a: 1, b: 2 } | | {'a':1} / {a:1} | { a: 1 } | | {"a":True,"b":None} | { a: true, b: null } | | {"a":1,"b":[1,2,3 (truncated) | { a: 1, b: [1, 2, 3] } | | // comment + /* block */ | stripped |

Safety & correctness

The repairer is built to be predictable on hostile or sloppy input:

No prototype pollution. A "__proto__" key is kept as an own property (exactly like JSON.parse), never merged into the prototype chain.
Bounded recursion. Deeply nested input fails with parse_error at maxDepth (default 512) instead of overflowing the stack.
Strict numbers. Only valid JSON number syntax becomes a number — 0x1F and 007 are kept verbatim as strings rather than silently turning into 31 / 7. Opt into bigint to keep large integers exact.

Async schemas

For schemas with async validation or refinements, use repairJsonAsync — the sync repairJson returns an async_schema error if it encounters one:

import { repairJsonAsync } from "llm-json-repair";

const result = await repairJsonAsync(raw, MyAsyncSchema);

Prefer throwing?

repairJsonOrThrow (and its async variant) return the value directly and throw a JsonRepairError carrying the same code and issues:

import { repairJsonOrThrow, JsonRepairError } from "llm-json-repair";

try {
  const user = repairJsonOrThrow(raw, User); // typed
} catch (err) {
  if (err instanceof JsonRepairError) {
    console.error(err.code, err.issues);
  }
}

`repairToString`

When you just want clean JSON text back (not a parsed value), repairToString repairs the input and re-serializes it canonically. Throws JsonRepairError if nothing parseable is found:

import { repairToString } from "llm-json-repair";

repairToString("{name:'Ada',}"); // '{"name":"Ada"}'

Streaming

LLMs emit structured output token by token. repairJsonStream parses the buffer as it grows, closing whatever isn't finished yet, so you can render a live, partial UI. push() returns the best-effort value so far; end() runs the final schema validation:

import { repairJsonStream } from "llm-json-repair";

const stream = repairJsonStream(User);

for await (const token of llm) {
  const partial = stream.push(token);
  if (partial.ok) render(partial.value); // updates as tokens arrive
}

const final = stream.end(); // validated against `User`

Prefer to hand over an async iterable? repairJsonFromStream drains it for you and reports each partial via onPartial:

import { repairJsonFromStream } from "llm-json-repair";

const result = await repairJsonFromStream(response, User, {
  onPartial: (p) => p.ok && render(p.value),
});

During streaming the schema is not applied (partial data would fail validation) — push() gives you the best-effort value, and validation runs once at end().

Inspecting the repairs

Every successful result carries a repairs array describing exactly what was changed — handy for logging or measuring how messy a model's output is:

const r = repairJson("```json\n{name:'Ada',}\n```");
r.ok && r.repairs.map((e) => e.kind);
// ["code_fence", "unquoted_key", "non_standard_quotes", "trailing_comma"]

Kinds: code_fence, surrounding_prose, comment, leading_comma, trailing_comma, unquoted_key, non_standard_quotes, bareword_string, closed_string, closed_object, closed_array. Each event also has an index into the parsed content.

Options

All repair* functions accept a RepairOptions bag — as the second argument when there's no schema, or the third alongside one:

repairJson(raw, { maxDepth: 64 });
repairJson(raw, User, { bigint: true });

| Option | Default | Description | | --- | --- | --- | | maxDepth | 512 | Nesting limit; deeper input fails with parse_error instead of overflowing the stack | | bigint | false | Parse integers beyond Number.MAX_SAFE_INTEGER as bigint instead of losing precision |

`tolerantParse`

The low-level repairing parser. Takes a string, returns unknown (no schema, no Result wrapper) — useful if you want to plug repair into your own pipeline:

import { tolerantParse } from "llm-json-repair";

const value = tolerantParse("[1,2,3,"); // [1, 2, 3]

CLI

The package ships an llm-json-repair bin that repairs JSON from a file or stdin and prints canonical JSON to stdout:

cat broken.json | llm-json-repair          # repair a file/stream
llm-json-repair data.json --pretty         # pretty-print
echo '{a:1,b:2,}' | npx llm-json-repair    # one-off, no install

Flags: --pretty, --bigint, --quiet, --help, --version. Exits 1 when nothing parseable is found.

Benchmarks

Run npm run bench (Vitest, vs jsonrepair). Indicative numbers on an M-series laptop, higher is better:

| Input | llm-json-repair | jsonrepair | | --- | --- | --- | | clean JSON | 91k ops/s | 7k ops/s | | messy (quotes, comments, commas) | 125k ops/s | 131k ops/s | | truncated | 16k ops/s | 8k ops/s | | markdown-fenced | 14k ops/s | — (not supported) |

Clean input takes a JSON.parse fast path, so there's no repair overhead until something is actually broken. Your numbers will vary with payload and hardware.

Development

npm install
npm test          # Vitest (incl. fast-check property tests)
npm run bench     # Vitest benchmarks vs jsonrepair
npm run lint      # Biome
npm run typecheck
npm run build     # ESM + CJS + .d.ts via tsup

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

llm-json-repair

At a glance

Install

Quick start

repairJson(input, schema?, options?)

What gets repaired

Safety & correctness

Async schemas

Prefer throwing?

repairToString

Streaming

Inspecting the repairs

Options

tolerantParse

CLI

Benchmarks

Development

License

`repairJson(input, schema?, options?)`

`repairToString`

`tolerantParse`