@isdk/json-repair
v0.1.2
A powerful, schema-guided JSON repair library for fixing broken JSON generated by LLMs, featuring semantic coercion and fuzzy matching.
A powerful, schema-guided JSON repair library specifically designed to fix "broken" JSON generated by LLMs (Large Language Models).
🚀 Why this library?
LLMs often output JSON that is syntactically invalid or semantically "noisy". Common issues include:
- Missing quotes around keys or values.
- Unescaped quotes inside strings (e.g., `"a"b"`).
- Missing commas or closing braces.
- Natural language mixed into values (e.g., `"age": "about 30 years old"`).
- Implicit structures (e.g., an array of objects without curly braces).
Standard json-repair libraries only fix syntax. @isdk/json-repair uses your JSON Schema as a map to intelligently navigate, repair, and coerce the output into valid, structured data.
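One way to picture "using the schema as a map" is the rough sketch below. It is not the library's implementation: `splitBySchemaKeys` is a hypothetical helper that treats only the schema's property names as key boundaries, ignores separators inside brackets, and assumes unquoted keys. It also does not track quotes, so a schema key followed by a colon inside a quoted value would still be treated as a boundary.

```typescript
// Hypothetical helper, not part of @isdk/json-repair.
// Splits a loosely formatted object body into raw key/value strings,
// treating only the schema's property names as key boundaries.
function splitBySchemaKeys(text: string, keys: string[]): Record<string, string> {
  // Drop the outer braces if present.
  const body = text.trim().replace(/^\{/, "").replace(/\}$/, "");
  const hits: { key: string; start: number; valueStart: number }[] = [];
  let depth = 0;

  for (let i = 0; i < body.length; i++) {
    const ch = body.charAt(i);
    // Track (), [] and {} so separators inside them are ignored.
    if (ch === "(" || ch === "[" || ch === "{") depth++;
    else if (ch === ")" || ch === "]" || ch === "}") depth = Math.max(0, depth - 1);
    if (depth > 0) continue;

    // A key must start the body or follow whitespace or a comma.
    if (i > 0 && !/[\s,]/.test(body.charAt(i - 1))) continue;

    for (const key of keys) {
      if (body.slice(i, i + key.length) !== key) continue;
      // The key must be followed by optional whitespace and a colon.
      const colon = /^\s*:/.exec(body.slice(i + key.length));
      if (colon) {
        hits.push({ key, start: i, valueStart: i + key.length + colon[0].length });
        break;
      }
    }
  }

  const out: Record<string, string> = {};
  hits.forEach((hit, n) => {
    const next = hits[n + 1];
    const end = next ? next.start : body.length;
    // Greedy capture: everything up to the next schema key belongs to this value.
    out[hit.key] = body.slice(hit.valueStart, end).trim().replace(/,$/, "").trim();
  });
  return out;
}

const pairs = splitBySchemaKeys(
  '{ query: "python" OR "js", status: Success! }',
  ["query", "status"]
);
console.log(pairs);
// { query: '"python" OR "js"', status: 'Success!' }
```

The raw values still need cleaning (for example, mapping `Success!` to an enum member), which is a separate step from finding the key boundaries.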
✨ Key Features
- Schema-Guided Repair: Uses your Schema to resolve ambiguities (e.g., knowing if a colon marks a new key or is part of a string).
- Greedy String Capture: Intelligently captures unquoted or broken strings until the next valid Schema key.
- Balanced Delimiter Tracking: Correctly handles structural markers (commas, colons) inside parentheses, brackets, or braces within DSL-like strings (e.g., `@func(a, b)`).
- The "Parity Rule": Solves nested quote ambiguity (e.g., correctly distinguishing between `"A" OR "B"` and `"a"b"`).
- Semantic Coercion:
  - Fuzzy Enum Matching: Matches "Processing!" to `"processing"` if defined in Schema enums.
  - Noisy Number Extraction: Extracts `1200.5` from `"Approx. 1,200.50 USD"`.
  - Boolean Variants: Recognizes `yes/no`, `on/off`, `1/0`, `确定/取消` (Chinese for "confirm/cancel") as booleans.
- Implicit Structure Support: Automatically repairs missing braces in objects and arrays.
- Performance: Supports reusing `SchemaWalker` instances for high-throughput batch processing.
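To make the semantic coercion features more concrete, here is a minimal sketch of what each one could look like in isolation. The function names (`extractNumber`, `matchEnum`, `coerceBoolean`) and the exact matching rules are assumptions made for illustration. They are not the library's API, and the library's own heuristics may differ.

```typescript
// Illustrative helpers only; names and rules are assumptions, not library code.

// Noisy number extraction: take the first number-like run and drop thousands separators.
function extractNumber(text: string): number | undefined {
  const match = /-?\d[\d,]*(?:\.\d+)?/.exec(text);
  if (!match) return undefined;
  const value = Number(match[0].replace(/,/g, ""));
  return isNaN(value) ? undefined : value;
}

// Fuzzy enum matching: compare after lowercasing and removing non-alphanumerics
// (ASCII only in this sketch).
function matchEnum(text: string, options: string[]): string | undefined {
  const normalize = (s: string): string => s.toLowerCase().replace(/[^a-z0-9]/g, "");
  const wanted = normalize(text);
  for (const option of options) {
    if (normalize(option) === wanted) return option;
  }
  return undefined;
}

// Boolean variants: a small lookup of accepted spellings.
const TRUE_WORDS = ["true", "yes", "on", "1", "确定"];
const FALSE_WORDS = ["false", "no", "off", "0", "取消"];

function coerceBoolean(text: string): boolean | undefined {
  const word = text.trim().toLowerCase();
  if (TRUE_WORDS.indexOf(word) !== -1) return true;
  if (FALSE_WORDS.indexOf(word) !== -1) return false;
  return undefined;
}

console.log(extractNumber("Approx. 1,200.50 USD")); // 1200.5
console.log(matchEnum("Processing!", ["pending", "processing"])); // processing
console.log(coerceBoolean("Yes")); // true
```

Each helper returns `undefined` when nothing matches, so the caller can decide whether to keep the raw value or report an error.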
📦 Installation
    npm install @isdk/json-repair @apidevtools/json-schema-ref-parser
    # or
    pnpm add @isdk/json-repair @apidevtools/json-schema-ref-parser

🛠 Usage
1. Schema-Guided Repair (Recommended)
This is the most powerful way to use the library. Providing a schema enables advanced features like semantic coercion, greedy capture, and accurate disambiguation.

    import { jsonRepair } from '@isdk/json-repair';

    const schema = {
      type: 'object',
      properties: {
        query: { type: 'string' },
        status: { enum: ['success', 'error'] }
      }
    };

    const brokenJson = '{ query: "python" OR "js", status: Success! }';
    const result = await jsonRepair(brokenJson, schema);
    console.log(result);
    // Output: { query: '"python" OR "js"', status: 'success' }

2. Simple Usage (Without Schema)
For basic syntax repair where schema guidance isn't required:

    import { jsonRepair } from '@isdk/json-repair';

    const result = await jsonRepair('{ name: John, age: 30 }');
    console.log(result); // { name: 'John', age: 30 }

Advanced: Reusing SchemaWalker (Batch Processing)
For better performance when processing many items with the same schema:

    import { jsonRepair, SchemaWalker } from '@isdk/json-repair';

    // mySchema is your JSON Schema; brokenItems is an array of raw LLM output strings.
    const walker = await SchemaWalker.create(mySchema);
    for (const item of brokenItems) {
      const result = await jsonRepair(item, walker);
      // ...
    }

⚠️ Known Limitations & Disambiguation
- The "Parity Rule": When a string is wrapped in quotes and contains internal quotes, we count the internal ones.
  - Odd count (e.g., `"a"b"`): We assume the outer quotes are delimiters -> `a"b`.
  - Even count (e.g., `"A" OR "B"`): We assume the outer quotes are part of the expression -> `"A" OR "B"`.
  - Limitation: Highly unusual patterns like `"a"b"c"` might remain ambiguous.
- Markdown Blocks: This library focuses on the JSON content. If your LLM wraps its output in a Markdown code fence (a triple-backtick `json` opener and a closing triple backtick), please strip those markers before passing the string.
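Both points above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: `stripCodeFence` and `applyParityRule` are hypothetical names, the parity sketch ignores backslash-escaped quotes, and treating a value with zero internal quotes as an ordinary quoted string is an assumption, since the rule above only covers strings that contain internal quotes.

```typescript
// Hypothetical pre-processing helper: return the content of the first
// Markdown code fence, or the trimmed input if there is no fence.
function stripCodeFence(text: string): string {
  const match = /```[\w-]*[ \t]*\r?\n([\s\S]*?)\r?\n?```/.exec(text);
  const inner = match ? match[1] : undefined;
  return (inner !== undefined ? inner : text).trim();
}

// Hypothetical sketch of the parity rule for a raw value wrapped in double quotes.
function applyParityRule(raw: string): string {
  const value = raw.trim();
  const wrapped =
    value.length >= 2 &&
    value.charAt(0) === '"' &&
    value.charAt(value.length - 1) === '"';
  if (!wrapped) return value;

  const inner = value.slice(1, -1);
  // Count the quotes between the outer pair (escaped quotes are not handled here).
  const innerQuotes = inner.split('"').length - 1;

  // Assumption: no internal quotes means an ordinary quoted string.
  // Odd count: the outer quotes are delimiters, so return the inner text.
  if (innerQuotes === 0 || innerQuotes % 2 === 1) return inner;
  // Even count: the outer quotes are part of the expression, so keep them.
  return value;
}

console.log(stripCodeFence('```json\n{ "a": 1 }\n```')); // { "a": 1 }
console.log(applyParityRule('"a"b"')); // a"b
console.log(applyParityRule('"A" OR "B"')); // "A" OR "B"
```

Under this sketch `"a"b"c"` has two internal quotes and is kept whole, which may or may not be what the author intended. That is the ambiguity noted in the limitation above.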
📄 License
MIT © Riceball LEE
