loot-json

v0.5.0

Published

4 months ago

Don't just parse. Loot it. The ultimate JSON extractor for unstable LLM outputs.

rossjang

json parser llm extractor ai openai gpt claude markdown repair robust

💎 loot-json

loot-json is a TypeScript library that salvages valid JSON from the messy, unpredictable text generated by Large Language Models (LLMs). Whether the JSON is buried in Markdown, mixed with chain-of-thought reasoning, or structurally broken, loot-json extracts it.

✨ Features

🛡️ Durable: Extracts JSON from Markdown blocks, plain text, or mixed content
🔧 Forgiving: Auto-repairs common LLM syntax errors (trailing commas, single quotes, comments)
📦 Zero Dependency: Lightweight and fast
🇹 Type-Safe: Full TypeScript generics support
⚡ Streaming: Incremental parsing for real-time field extraction
🔍 Field Extraction: Extract specific fields without parsing the entire document
✅ Validation: Built-in JSON Schema validation

📦 Installation

npm install loot-json

🚀 Quick Start

import { loot } from 'loot-json';

const dirtyText = `
  Sure! I found the item for you.
  \`\`\`json
  {
    "id": "sword_01",
    "damage": 50, // It's strong!
  }
  \`\`\`
`;

const item = loot(dirtyText);
console.log(item); // { id: "sword_01", damage: 50 }

📖 API Reference

`loot<T>(text, options?)`

Extract and parse JSON from text.

// Basic usage
const data = loot<Item>(text);

// With options (a separate variable name avoids redeclaring `data`)
const results = loot(text, {
  silent: true,        // Return null instead of throwing
  repair: true,        // Auto-repair malformed JSON (default: true)
  all: true,           // Extract all JSON objects found
  reportRepairs: true, // Include repair logs in the result
});

Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| silent | boolean | false | Return null instead of throwing on failure |
| repair | boolean | true | Attempt to repair malformed JSON |
| all | boolean | false | Extract all JSON objects found |
| reportRepairs | boolean | false | Include repair logs in result |

Examples

// Extract all JSON objects
const items = loot(text, { all: true });
// [{ id: 1 }, { id: 2 }, { id: 3 }]

// Silent mode (no throwing)
const maybeData = loot(maybeHasJson, { silent: true });
if (maybeData === null) {
  console.log('No JSON found');
}

// With repair report
const { result, repairs } = loot(text, { reportRepairs: true });
console.log(repairs);
// [{ type: 'trailing_comma', position: 45, description: '...', fixed: true }]

Error Handling with `isLootError`

Type-safe error handling for loot operations.

import { loot, isLootError } from 'loot-json';

try {
  const output = getLLMOutput();
  const data = loot<MySchema>(output);
  // use data...
} catch (error) {
  if (isLootError(error)) {
    // TypeScript knows error is LootError
    console.log('Loot failed:', error.code, error.message);
    
    switch (error.code) {
      case 'EMPTY_INPUT':
        // Handle empty input
        break;
      case 'NO_JSON_FOUND':
        // Handle no JSON found
        break;
      case 'PARSE_FAILED':
        // Handle parse failure
        break;
    }
  }
}

Error Codes

| Code | Description |
|------|-------------|
| EMPTY_INPUT | Input text is empty or not a string |
| NO_JSON_FOUND | No valid JSON found in text |
| PARSE_FAILED | JSON parsing failed even after repair |
| FIELD_NOT_FOUND | Requested field not found (lootField) |
| VALIDATION_FAILED | Schema validation failed |
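
The narrowing that `isLootError` provides can be illustrated with a self-contained sketch. Everything here is hypothetical: `SketchLootError`, `isSketchLootError`, and `recoveryFor` are illustration-only names, and the real `LootError` class may carry more than `code` and `message`. The recovery strategies are placeholders, not recommendations from the library.

```typescript
// Error codes copied from the table above.
type SketchErrorCode =
  | 'EMPTY_INPUT'
  | 'NO_JSON_FOUND'
  | 'PARSE_FAILED'
  | 'FIELD_NOT_FOUND'
  | 'VALIDATION_FAILED';

// Hypothetical stand-in for the library's error class; the real shape may differ.
class SketchLootError extends Error {
  readonly code: SketchErrorCode;

  constructor(code: SketchErrorCode, message: string) {
    super(message);
    this.code = code;
    this.name = 'SketchLootError';
  }
}

// User-defined type guard: narrows `unknown` to SketchLootError.
// It checks the shape rather than the prototype chain, so it also works
// when the class is compiled down to ES5.
function isSketchLootError(error: unknown): error is SketchLootError {
  return (
    error instanceof Error &&
    error.name === 'SketchLootError' &&
    typeof (error as { code?: unknown }).code === 'string'
  );
}

// Map each code to an illustrative recovery strategy.
function recoveryFor(error: unknown): 'retry' | 'reprompt' | 'rethrow' {
  if (!isSketchLootError(error)) return 'rethrow';
  switch (error.code) {
    case 'EMPTY_INPUT':
      return 'retry';
    case 'NO_JSON_FOUND':
    case 'PARSE_FAILED':
    case 'VALIDATION_FAILED':
      return 'reprompt';
    case 'FIELD_NOT_FOUND':
      return 'rethrow';
  }
}
```

Because the `switch` covers every member of the union, the compiler can flag a missing case if a new code is added to `SketchErrorCode`.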

`lootField<T>(text, path, options?)`

Extract a specific field without parsing the entire document.

import { lootField } from 'loot-json';

// Simple field
const name = lootField<string>(text, 'name');

// Nested field (dot notation)
const city = lootField(text, 'user.address.city');

// Bracket notation for special keys
const value = lootField(text, 'data["special.key"]');
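
To make the path syntax concrete, here is a minimal sketch of how dot and bracket notation could be split into segments and resolved. `parsePath` and `getByPath` are hypothetical helpers, not part of loot-json, and unlike `lootField` this sketch walks an already-parsed object rather than scanning raw text.

```typescript
// Split a path such as 'data["special.key"]' into its segments.
// Assumption: only bare dotted names and double-quoted bracket keys are supported.
function parsePath(path: string): string[] {
  const segments: string[] = [];
  // Alternative 1: ["quoted key"]   Alternative 2: a bare name between dots.
  const token = /\["([^"]*)"\]|([^.\[\]]+)/g;
  let match: RegExpExecArray | null;
  while ((match = token.exec(path)) !== null) {
    const key = match[1] !== undefined ? match[1] : match[2];
    if (key !== undefined) segments.push(key);
  }
  return segments;
}

// Walk a parsed value along the path; returns undefined when a step is missing.
function getByPath(root: unknown, path: string): unknown {
  let current: unknown = root;
  for (const segment of parsePath(path)) {
    if (typeof current !== 'object' || current === null) return undefined;
    current = (current as { [key: string]: unknown })[segment];
  }
  return current;
}

const doc = {
  user: { address: { city: 'Seoul' } },
  data: { 'special.key': 42 },
};

console.log(getByPath(doc, 'user.address.city'));   // Seoul
console.log(getByPath(doc, 'data["special.key"]')); // 42
```

Bracket notation matters because a key such as `special.key` would otherwise be split at the dot into two segments.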

`IncrementalLoot<T>`

Incremental JSON parser that extracts fields in real time while the LLM response is still streaming.

import { IncrementalLoot } from 'loot-json';

interface ChatResponse {
  dialogue: string;
  emotion: string;
  pose: string;
}

const parser = new IncrementalLoot<ChatResponse>({
  fields: ['dialogue', 'emotion', 'pose'],
  onFieldComplete: (field, value) => {
    if (field === 'dialogue') {
      startTTS(value as string); // Start TTS immediately!
    }
  },
});

for await (const chunk of llmStream) {
  const result = parser.addChunk(chunk);
  
  if (result.isFieldComplete('dialogue')) {
    console.log('Dialogue ready:', result.getField('dialogue'));
  }
  
  if (result.isComplete()) {
    break;
  }
}

const finalResult = parser.getResult();
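
The idea behind per-field completion can be sketched without the library. The class below is a simplified, hypothetical stand-in (`SketchIncremental` is not a loot-json export): it buffers chunks and reports a string field as soon as its closing quote arrives. It assumes string-valued fields only and does not reflect how `IncrementalLoot` is actually implemented.

```typescript
// Hypothetical, simplified stand-in for an incremental field extractor.
// Assumption: only string-valued fields are tracked, and a key is matched
// anywhere in the buffer (including, incorrectly, inside another string value).
type FieldCallback = (field: string, value: string) => void;

function escapeRegExp(source: string): string {
  return source.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

class SketchIncremental {
  private buffer = '';
  private completed: { [field: string]: string } = {};
  private fields: string[];
  private onFieldComplete: FieldCallback;

  constructor(fields: string[], onFieldComplete: FieldCallback) {
    this.fields = fields;
    this.onFieldComplete = onFieldComplete;
  }

  addChunk(chunk: string): void {
    this.buffer += chunk;
    for (const field of this.fields) {
      // Each field is reported at most once.
      if (Object.prototype.hasOwnProperty.call(this.completed, field)) continue;
      // "field" : "value", where the value's closing unescaped quote has arrived.
      const pattern = new RegExp(
        '"' + escapeRegExp(field) + '"\\s*:\\s*("(?:[^"\\\\]|\\\\.)*")'
      );
      const match = pattern.exec(this.buffer);
      const quoted = match ? match[1] : undefined;
      if (quoted === undefined) continue;
      // JSON.parse decodes escapes such as \" and \n in the captured literal.
      const value = JSON.parse(quoted) as string;
      this.completed[field] = value;
      this.onFieldComplete(field, value);
    }
  }

  getField(field: string): string | undefined {
    return this.completed[field];
  }
}
```

Re-scanning the whole buffer on every chunk keeps the sketch short; a production parser would more likely keep tokenizer state between chunks so that each character is examined once.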

`validate<T>(data, schema)`

Validate data against a JSON Schema (subset of Draft-07).

import { validate } from 'loot-json';

const result = validate(data, {
  type: 'object',
  properties: {
    dialogue: { type: 'string', minLength: 1 },
    emotion: { type: 'string', enum: ['happy', 'sad', 'angry'] },
    affinity: { type: 'number', minimum: -10, maximum: 10 },
  },
  required: ['dialogue', 'emotion'],
});

if (result.valid) {
  console.log('Valid:', result.data);
} else {
  console.log('Errors:', result.errors);
}

Supported Schema Keywords

| Category | Keywords |
|----------|----------|
| Type | type (string, number, integer, boolean, object, array, null) |
| String | minLength, maxLength, pattern, format |
| Number | minimum, maximum, exclusiveMinimum, exclusiveMaximum, multipleOf |
| Object | properties, required, additionalProperties |
| Array | items, minItems, maxItems, uniqueItems |
| Enum | enum, const |
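
To show what a few of these keywords check, here is a minimal validator sketch covering only `type`, `enum`, `minLength`, `minimum`, `maximum`, `properties`, and `required`. `SketchSchema` and `sketchValidate` are hypothetical names; this is not the library's validator, and it returns a plain list of error strings rather than the `{ valid, data, errors }` result shown above.

```typescript
// Hypothetical mini-schema type covering a handful of keywords from the table.
interface SketchSchema {
  type?: 'string' | 'number' | 'integer' | 'boolean' | 'object' | 'array' | 'null';
  enum?: unknown[];
  minLength?: number;
  minimum?: number;
  maximum?: number;
  properties?: { [key: string]: SketchSchema };
  required?: string[];
}

// JSON-style type name: distinguishes null and arrays from plain objects.
function jsonTypeOf(value: unknown): string {
  if (value === null) return 'null';
  if (Array.isArray(value)) return 'array';
  return typeof value;
}

function sketchValidate(value: unknown, schema: SketchSchema, path = '$'): string[] {
  const errors: string[] = [];
  const actual = jsonTypeOf(value);

  if (schema.type !== undefined) {
    const ok = schema.type === 'integer'
      ? typeof value === 'number' && isFinite(value) && Math.floor(value) === value
      : actual === schema.type;
    if (!ok) return [`${path}: expected ${schema.type}, got ${actual}`];
  }
  // Simplification: strict equality, so enum members must be primitives.
  if (schema.enum !== undefined && schema.enum.indexOf(value) === -1) {
    errors.push(`${path}: value not in enum`);
  }
  // Simplification: counts UTF-16 code units rather than code points.
  if (typeof value === 'string' && schema.minLength !== undefined && value.length < schema.minLength) {
    errors.push(`${path}: shorter than minLength ${schema.minLength}`);
  }
  if (typeof value === 'number') {
    if (schema.minimum !== undefined && value < schema.minimum) errors.push(`${path}: below minimum ${schema.minimum}`);
    if (schema.maximum !== undefined && value > schema.maximum) errors.push(`${path}: above maximum ${schema.maximum}`);
  }
  if (actual === 'object') {
    const obj = value as { [key: string]: unknown };
    for (const key of schema.required || []) {
      if (!(key in obj)) errors.push(`${path}.${key}: required`);
    }
    const props = schema.properties || {};
    for (const key of Object.keys(props)) {
      const sub = props[key];
      // Recurse into each declared property that is present.
      if (sub !== undefined && key in obj) errors.push(...sketchValidate(obj[key], sub, `${path}.${key}`));
    }
  }
  return errors;
}
```

Returning early on a type mismatch avoids piling keyword errors on top of a value that is already the wrong kind.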

`repairJson(text, trackRepairs?)`

Repair malformed JSON without extraction.

import { repairJson } from 'loot-json';

// Simple repair
const fixed = repairJson('{"key": "value",}');
// '{"key": "value"}'

// With repair tracking
const { text, repairs } = repairJson('{"key": "value",}', true);
console.log(repairs);
// [{ type: 'trailing_comma', ... }]

🔧 What It Fixes

| Issue | Example | Fixed |
|-------|---------|-------|
| Trailing commas | { "a": 1, } | { "a": 1 } |
| Single quotes | { 'key': 'value' } | { "key": "value" } |
| Comments | { "a": 1 // comment } | { "a": 1 } |
| Unquoted keys | { key: "value" } | { "key": "value" } |
| Invalid values | { "a": undefined } | { "a": null } |
| Unescaped newlines | { "a": "line1\nline2" } | { "a": "line1\\nline2" } |
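
Two of these repairs, comments and trailing commas, can be sketched as a single string-aware pass. `sketchRepair` is a hypothetical function written for illustration; it is not how `repairJson` is implemented, and it does not handle single quotes, unquoted keys, or the other rows in the table.

```typescript
// Hypothetical, simplified repair pass: strips // and /* */ comments and
// trailing commas, while copying double-quoted strings through untouched.
function sketchRepair(input: string): string {
  let out = '';
  let i = 0;
  while (i < input.length) {
    const ch = input.charAt(i);
    const next = input.charAt(i + 1);
    if (ch === '"') {
      // Copy the whole string literal verbatim, honoring backslash escapes.
      let j = i + 1;
      while (j < input.length && input.charAt(j) !== '"') {
        j += input.charAt(j) === '\\' ? 2 : 1;
      }
      out += input.slice(i, j + 1);
      i = j + 1;
    } else if (ch === '/' && next === '/') {
      // Line comment: skip to the end of the line, keeping the newline.
      while (i < input.length && input.charAt(i) !== '\n') i++;
    } else if (ch === '/' && next === '*') {
      // Block comment: skip past the closing marker (or to the end if unclosed).
      const end = input.indexOf('*/', i + 2);
      i = end === -1 ? input.length : end + 2;
    } else if (ch === '}' || ch === ']') {
      // Drop a comma that is the last non-whitespace character before a closer.
      const trimmed = out.replace(/\s+$/, '');
      if (trimmed.charAt(trimmed.length - 1) === ',') {
        out = trimmed.slice(0, -1) + out.slice(trimmed.length);
      }
      out += ch;
      i++;
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

const repaired = sketchRepair('{ "id": "sword_01", "damage": 50, // It\'s strong!\n }');
console.log(JSON.parse(repaired)); // { id: 'sword_01', damage: 50 }
```

Tracking string boundaries is the important part: a plain regex replace would also strip `//` from URLs or commas that sit inside string values.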

🏗️ Advanced Usage

Utility Functions

import { 
  findJsonCandidates, 
  extractFromMarkdown, 
  extractByBraces 
} from 'loot-json';

// Find all JSON candidates in text
const candidates = findJsonCandidates(messyText);

// Extract from markdown code blocks only
const fromMarkdown = extractFromMarkdown(text);

// Extract by balanced braces
const byBraces = extractByBraces(text);
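
For intuition, the two extraction strategies named above can be sketched as follows. These are hypothetical re-implementations (`sketchExtractFromMarkdown` and `sketchExtractByBraces` are not library exports), and the real utilities may return different shapes or handle more cases, such as top-level arrays.

```typescript
// Three backticks, built by concatenation so this listing stays easy to embed.
const FENCE = '`' + '`' + '`';

// Return the bodies of fenced code blocks, with or without a language tag.
function sketchExtractFromMarkdown(text: string): string[] {
  const pattern = new RegExp(FENCE + '[a-zA-Z]*[ \\t]*\\r?\\n([\\s\\S]*?)' + FENCE, 'g');
  const blocks: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const body = match[1];
    if (body !== undefined) blocks.push(body.trim());
  }
  return blocks;
}

// Return every top-level {...} span whose braces balance.
// Assumption: only double-quoted strings are tracked, and only objects are found.
function sketchExtractByBraces(text: string): string[] {
  const found: string[] = [];
  let depth = 0;
  let start = -1;
  let inString = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (inString) {
      if (ch === '\\') i++;                  // skip the escaped character
      else if (ch === '"') inString = false; // closing quote
      continue;
    }
    if (ch === '"' && depth > 0) {
      inString = true;                       // ignore quotes in surrounding prose
    } else if (ch === '{') {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === '}' && depth > 0) {
      depth--;
      if (depth === 0) found.push(text.slice(start, i + 1));
    }
  }
  return found;
}
```

Braces inside string values are skipped, so a value such as `"}"` does not end the candidate early. Candidates found this way still need to be parsed, and possibly repaired, before use.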

📊 Use Cases

1. LLM Response Parsing

const llmResponse = await openai.chat.completions.create({ ... });
// `content` is typed `string | null` in the OpenAI SDK, so fall back to an empty string
const data = loot<ResponseSchema>(llmResponse.choices[0].message.content ?? '');

2. Streaming with Early TTS

const parser = new IncrementalLoot({
  fields: ['dialogue'],
  onFieldComplete: (field, value) => {
    if (field === 'dialogue') {
      ttsEngine.speak(value); // Start speaking before full response!
    }
  },
});

for await (const chunk of stream) {
  parser.addChunk(chunk.choices[0].delta.content || '');
}

3. Structured Output Validation

const { result, repairs } = loot(llmOutput, { reportRepairs: true });

if (repairs.length > 0) {
  logger.warn('LLM output required repairs:', repairs);
}

const validated = validate(result, mySchema);
if (!validated.valid) {
  logger.error('Schema validation failed:', validated.errors);
}

📄 License

🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.
