@solvers-hub/llm-json
v0.1.11
Published
A TypeScript SDK to extract and correct JSON from LLM outputs
Readme
LLM-JSON Extractor
A TypeScript SDK for extracting and correcting JSON data from LLM outputs.
Overview
LLM-JSON is a lightweight library designed to parse and extract JSON objects from large language model (LLM) outputs. It can handle multiple JSON objects within text, extract text separately from JSON, and even attempt to fix malformed JSON.
Key Features
- Text/JSON Separation: Cleanly separates text content from JSON data in LLM outputs
- Multiple JSON Support: Extracts multiple JSON objects or arrays from a single input
- JSON Validation & Correction: Automatically fixes common JSON formatting errors from LLMs
- Code Block Support: Extracts JSON from markdown code blocks (```json)
- Schema Validation: Validates extracted JSON against provided schemas
- TypeScript Support: Written in TypeScript with full type definitions
Quick Start
Installation
npm install @solvers-hub/llm-jsonBasic Usage
import { LlmJson } from '@solvers-hub/llm-json';
const llmOutput = `Here's some text followed by JSON:
{
"name": "John",
"age": 30,
"skills": ["JavaScript", "TypeScript", "React"]
}`;
const llmJson = new LlmJson({ attemptCorrection: true });
const { text, json } = llmJson.extract(llmOutput);
console.log(text); // ['Here\'s some text followed by JSON:']
console.log(json); // [{ name: 'John', age: 30, skills: ['JavaScript', 'TypeScript', 'React'] }]Schema Validation
You can validate extracted JSON against schemas:
import { LlmJson } from '@solvers-hub/llm-json';
const schemas = [
{
name: 'person',
schema: {
type: 'object',
properties: {
name: { type: 'string' },
age: { type: 'integer' }
},
required: ['name', 'age']
}
}
];
const llmJson = new LlmJson({
attemptCorrection: true,
schemas
});
const llmOutput = `Here's a person: {"name": "John", "age": 30}
And some other data: {"title": "Meeting notes"}`;
const result = llmJson.extract(llmOutput);
// Note: All extracted JSON objects are included in the json array
console.log(result.json);
// [
// { name: 'John', age: 30 },
// { title: 'Meeting notes' }
// ]
// The validatedJson array includes validation results for each JSON object
console.log(result.validatedJson);
// [
// {
// json: { name: 'John', age: 30 },
// matchedSchema: 'person',
// isValid: true
// },
// {
// json: { title: 'Meeting notes' },
// matchedSchema: null,
// isValid: false,
// validationErrors: [...] // Validation errors
// }
// ]
Advanced Usage
Using with Zod Schemas
While llm-json uses JSON Schema internally, you can easily integrate with Zod using the zod-to-json-schema library:
npm install zod zod-to-json-schemaimport { LlmJson } from '@solvers-hub/llm-json';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';
// Define your Zod schema
const productSchema = z.object({
productId: z.string(),
quantity: z.number().min(0)
});
// Convert to JSON Schema
const llmJson = new LlmJson({
attemptCorrection: true,
schemas: [{
name: 'product',
schema: zodToJsonSchema(productSchema)
}]
});
// Process LLM output
const result = llmJson.extract('Found product: {"productId": "abc-123", "quantity": 10}');
// Get validated result
if (result.validatedJson[0]?.isValid) {
const validatedProduct = productSchema.parse(result.validatedJson[0].json);
console.log(validatedProduct); // Type-safe!
}Handling Nested Validation Errors
When validating complex nested schemas, you can identify specific validation failures:
const playlistSchema = {
name: 'playlist',
schema: {
type: 'object',
properties: {
playlistId: { type: 'string' },
tracks: {
type: 'array',
items: {
type: 'object',
properties: {
id: { type: 'string' },
title: { type: 'string' },
duration: { type: 'number' }
},
required: ['id', 'title', 'duration']
}
}
},
required: ['playlistId', 'tracks']
}
};
const result = llmJson.extract(llmOutput);
if (!result.validatedJson[0]?.isValid) {
// Identify which array item failed
result.validatedJson[0].validationErrors?.forEach(error => {
console.log('Path:', error.instancePath); // e.g., "/tracks/1/duration"
console.log('Message:', error.message);
// Extract the failed item index
const match = error.instancePath?.match(/\/tracks\/(\d+)/);
if (match) {
const failedIndex = parseInt(match[1]);
console.log('Failed track:', result.json[0].tracks[failedIndex]);
}
});
}Two-Stage Parsing for Performance
For high-throughput pipelines, use a two-stage strategy:
class OptimizedParser {
private fastParser = new LlmJson({ attemptCorrection: false }); // Stage 1
private fallbackParser = new LlmJson({ attemptCorrection: true }); // Stage 2
parse(input: string) {
// Stage 1: Fast path (no correction)
const result = this.fastParser.extract(input);
// Only use Stage 2 if Stage 1 failed AND input contains JSON-like patterns
if (result.json.length === 0 && this.hasJsonPattern(input)) {
return this.fallbackParser.extract(input); // Stage 2: Fallback with correction
}
return result;
}
private hasJsonPattern(input: string): boolean {
return /\{[^}]*:/.test(input) || /\[[^\]]*\{/.test(input);
}
}Trigger conditions for Stage 2:
- Stage 1 returns empty
jsonarray - Input contains JSON-like patterns (
{,[,:, etc.) - Distinguishes between "no JSON" vs "malformed JSON"
Streaming/Chunked Input Handling
For real-time LLM streaming, buffer chunks until complete JSON is detected:
class StreamingHandler {
private buffer = '';
private llmJson = new LlmJson({ attemptCorrection: true });
processChunk(chunk: string) {
this.buffer += chunk;
// Detect complete JSON by counting braces
const complete = this.findCompleteJson(this.buffer);
if (complete) {
const result = this.llmJson.extract(complete.json);
this.buffer = complete.remaining;
return result.json;
}
return []; // Still buffering
}
private findCompleteJson(text: string): { json: string; remaining: string } | null {
let depth = 0;
let start = text.indexOf('{');
if (start === -1) return null;
for (let i = start; i < text.length; i++) {
if (text[i] === '{') depth++;
if (text[i] === '}') depth--;
if (depth === 0) {
return {
json: text.substring(start, i + 1),
remaining: text.substring(i + 1)
};
}
}
return null; // Incomplete
}
}Primary challenges:
- Detecting JSON boundaries in incomplete streams
- Handling braces inside strings (must track string state)
- Supporting nested objects (depth counting)
- Managing buffer size for long streams
Processing Multiple Schema Types
Build a typed processor for different data types:
class LogProcessor {
private llmJson: LlmJson;
constructor(schemas: { errorLog: any; infoLog: any }) {
this.llmJson = new LlmJson({
attemptCorrection: true,
schemas: [
{ name: 'errorLog', schema: schemas.errorLog },
{ name: 'infoLog', schema: schemas.infoLog }
]
});
}
process(logEntry: string): { errors: any[]; infos: any[] } {
const result = this.llmJson.extract(logEntry);
const errors: any[] = [];
const infos: any[] = [];
result.validatedJson?.forEach(validated => {
if (validated.isValid) {
if (validated.matchedSchema === 'errorLog') {
errors.push(validated.json);
} else if (validated.matchedSchema === 'infoLog') {
infos.push(validated.json);
}
}
});
return { errors, infos };
}
}Examples
See the examples directory for comprehensive examples:
- example.ts - Basic extraction and correction
- array-example.ts - Extracting arrays from LLM output
- schema-validation-example.ts - JSON Schema validation
- schema-validation-examples.ts - Complex nested schemas
- zod-integration-example.ts - Using Zod schemas with llm-json
- advanced-patterns-example.ts - Streaming, two-stage parsing, error detection
Run examples:
npm run example:run -- examples/example.ts
npm run example:run -- examples/zod-integration-example.ts
npm run example:run -- examples/advanced-patterns-example.tsAPI Reference
ExtractResult
interface ExtractResult {
text: string[]; // Non-JSON text segments
json: any[]; // Extracted JSON objects/arrays
validatedJson?: Array<{ // Validation results (if schemas provided)
json: any;
matchedSchema: string | null;
isValid: boolean;
validationErrors?: Array<{
instancePath?: string; // Path to failed property (e.g., "/tracks/1/duration")
schemaPath?: string; // Path in schema
message?: string; // Error description
}>;
}>;
}Distinguishing Parse Failures from Empty Input
function diagnoseResult(input: string, result: ExtractResult): string {
// Case 1: Success
if (result.json.length > 0) return 'success';
// Case 2: No JSON-like patterns = genuinely empty
const hasJsonPattern = /\{[^}]*:/.test(input) || /\[[^\]]*\{/.test(input);
if (!hasJsonPattern) return 'no_json';
// Case 3: Has JSON patterns but extraction failed = parse failure
return 'parse_failure';
}License
MIT © 2025
