llm-json-fix v1.0.0
LLM JSON Fix
A comprehensive library for repairing malformed JSON outputs from Large Language Models (LLMs).
Why This Library?
JSON outputs from LLMs are powerful but notoriously inconsistent. Even a small 1% failure rate in JSON formatting can cause system failures that are difficult to debug. This library automatically identifies and repairs common issues in LLM-generated JSON, making your AI integrations more robust and reliable.
Features
- LLM-Specific Repairs: Handles unique issues in AI-generated content
- Markdown Cleanup: Removes code blocks, explanatory text, and other non-JSON content
- Streaming Support: Process arbitrarily large documents with minimal memory usage
- Schema Flexibility: Works with any JSON structure
- Model-Specific Optimizations: Can be configured for OpenAI, Anthropic, or other LLMs
Installation
```sh
# Using npm
npm install llm-json-fix

# Using yarn
yarn add llm-json-fix

# Using pnpm
pnpm add llm-json-fix
```
Requirements
- Node.js 14.0.0 or higher
- Works in both CommonJS and ESM environments
Basic Usage
```js
import { fixLLMJson } from 'llm-json-fix';

// Malformed JSON from an LLM
const response = `Here's the JSON you requested: \`\`\`json
{
  name: "John",
  items: ['apple', 'banana', ...],
  active: True
}
\`\`\``;

// Repair the JSON
const fixedJson = fixLLMJson(response);

// Use the fixed JSON
const data = JSON.parse(fixedJson);
console.log(data);
```
Issues Fixed
Incomplete JSON Structures
- Truncated outputs where closing brackets are missing
- Unfinished arrays or objects due to token limits
- Partial final elements
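To illustrate the idea behind truncation repair, here is a minimal sketch, not this library's actual implementation (`closeTruncated` is a hypothetical helper), that tracks unclosed brackets with a stack and appends the missing closers:

```js
// Hypothetical sketch -- NOT llm-json-fix's implementation.
// Walk the text, push the expected closer for every '{' or '[',
// pop on '}' or ']', and append whatever is still open at the end.
// String contents are skipped so brackets inside strings don't count.
function closeTruncated(json) {
  const stack = [];
  let inString = false;
  for (let i = 0; i < json.length; i++) {
    const ch = json[i];
    if (inString) {
      if (ch === '\\') i++;              // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === '{') stack.push('}');
    else if (ch === '[') stack.push(']');
    else if (ch === '}' || ch === ']') stack.pop();
  }
  return json + stack.reverse().join('');
}

console.log(closeTruncated('{"items": [1, 2'));  // {"items": [1, 2]}
```

A real repair also has to close unterminated strings and discard half-written final elements, which this sketch ignores.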
Quote Inconsistencies
- Mixing of single and double quotes
- Unclosed quotes
- Incorrectly escaped quotes within strings
Schema Violations
- Property names without quotes
- Extra or missing commas
- Trailing commas (valid in JavaScript but invalid in JSON)
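As a rough illustration (again not the library's implementation; `fixSchemaViolations` is a hypothetical name), the unquoted-key and trailing-comma cases can often be patched with two regex passes. A robust repair needs a tolerant parser instead, since regexes can mangle string contents that happen to match:

```js
// Hypothetical sketch -- NOT llm-json-fix's implementation.
function fixSchemaViolations(json) {
  return json
    // quote bare property names: {name: ...} -> {"name": ...}
    .replace(/([{,]\s*)([A-Za-z_$][\w$]*)(\s*:)/g, '$1"$2"$3')
    // drop trailing commas before a closing brace or bracket
    .replace(/,(\s*[}\]])/g, '$1');
}

console.log(fixSchemaViolations('{name: "John", tags: ["a", "b",],}'));
// {"name": "John", "tags": ["a", "b"]}
```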
Markdown Artifacts
- Code block markers (```) included in the JSON
- Explanation text mixed with JSON output
- Markdown formatting within JSON strings
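A crude way to strip such artifacts (a hypothetical sketch, not the library's approach) is to keep only the span from the first opening bracket to the last closing one, which discards fence markers and surrounding prose in one step:

```js
// Hypothetical sketch -- NOT llm-json-fix's implementation.
// Keeps the substring from the first '{' or '[' to the last '}' or ']'.
function extractJson(text) {
  const start = text.search(/[{[]/);
  const end = Math.max(text.lastIndexOf('}'), text.lastIndexOf(']'));
  if (start === -1 || end < start) return text.trim();
  return text.slice(start, end + 1);
}

console.log(extractJson('Sure! {"ok": true} Hope that helps.'));  // {"ok": true}
```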
LLM Hallucinations
- Explanatory comments included in the JSON
- "..." or "[more items]" placeholders
- Natural language interruptions mid-JSON
Nested JSON Formatting Issues
- Inconsistent indentation
- Improperly escaped nested JSON strings
- Confusion between string representations of objects and actual objects
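The last point, stringified objects where real objects were expected, can be illustrated with a hypothetical post-processing pass (not the library's implementation) that re-parses string values that look like serialized JSON:

```js
// Hypothetical sketch -- NOT llm-json-fix's implementation.
// Recursively walks a parsed value and replaces any string that looks
// like serialized JSON with the object it represents.
function inlineStringifiedJson(value) {
  if (Array.isArray(value)) return value.map(inlineStringifiedJson);
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const [key, v] of Object.entries(value)) out[key] = inlineStringifiedJson(v);
    return out;
  }
  if (typeof value === 'string' && /^\s*[{[]/.test(value)) {
    try { return inlineStringifiedJson(JSON.parse(value)); }
    catch { return value; }  // not actually JSON -- keep the string
  }
  return value;
}

const data = { user: '{"name": "John"}', note: 'plain text' };
console.log(inlineStringifiedJson(data));  // { user: { name: 'John' }, note: 'plain text' }
```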
API Reference
Regular API
```ts
fixLLMJson(text: string, options?: FixLLMJsonOptions): string
```
Options
```ts
interface FixLLMJsonOptions {
  // Whether to apply model-specific fixes (default: true)
  applyModelSpecificFixes?: boolean;

  // The specific LLM model being used, for optimized repairs
  // Supported values: 'openai', 'anthropic', 'general'
  model?: 'openai' | 'anthropic' | 'general';

  // Whether to preserve comments in the JSON (default: false)
  preserveComments?: boolean;

  // Whether to be verbose about changes being made
  verbose?: boolean;
}
```
Streaming API
For processing large files or streams:
```js
import { createLLMJsonFixStream } from 'llm-json-fix/stream';
import { createReadStream, createWriteStream } from 'fs';
import { pipeline } from 'stream';

const inputStream = createReadStream('broken.json');
const outputStream = createWriteStream('fixed.json');

const fixStream = createLLMJsonFixStream({
  bufferSize: 64 * 1024, // 64 KB
  model: 'openai'
});

pipeline(inputStream, fixStream, outputStream, (err) => {
  if (err) {
    console.error('Error:', err);
  } else {
    console.log('JSON successfully repaired!');
  }
});
```
Command Line Interface
This package provides a command-line tool for repairing JSON files:
```sh
# Install globally
npm install -g llm-json-fix

# Repair a file
llm-json-fix broken.json > fixed.json

# Or with options
llm-json-fix broken.json --output fixed.json --model openai --verbose
```
CLI Options
```
--version, -v         Show application version
--help, -h            Display help for command
--output, -o          Output file
--overwrite           Overwrite the input file
--buffer              Buffer size in bytes, for example 64K (default) or 1M
--model               Specify the LLM model (openai, anthropic, general)
--verbose             Show detailed repair information
--preserve-comments   Preserve comments in the output
```
Examples
See the examples directory for more usage examples.
Common Patterns & Integration Tips
With OpenAI
```js
try {
  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      { role: "system", content: "Respond with valid JSON only." },
      { role: "user", content: prompt }
    ]
  });

  const content = response.choices[0].message.content;
  const fixedJson = fixLLMJson(content, { model: 'openai' });
  const data = JSON.parse(fixedJson);
  // Use the data...
} catch (error) {
  console.error('Error:', error);
}
```
With Anthropic Claude
```js
try {
  const response = await anthropic.messages.create({
    model: "claude-3-opus-20240229",
    max_tokens: 4000,
    messages: [
      { role: "user", content: "Return this data as JSON: " + prompt }
    ],
    system: "Return only valid JSON data with no additional text."
  });

  const content = response.content[0].text;
  const fixedJson = fixLLMJson(content, { model: 'anthropic' });
  const data = JSON.parse(fixedJson);
  // Use the data...
} catch (error) {
  console.error('Error:', error);
}
```
License
Package Contents
The npm package includes:
- CommonJS build for Node.js environments
- ESM build for modern JavaScript environments
- UMD build for browser usage
- TypeScript type definitions
- CLI executable
- Full documentation
For more information, see the changelog.
