@lakshyarohilaa/llmforge

v1.0.0

Published

4 months ago

Production-grade, framework-agnostic library for normalizing and transforming LLM responses.

Downloads

0High
0Medium
0Low

lakshyarohilaa

llm normalize openai anthropic gemini ai nlp text-processing json-extraction markdown

LLMForge

A production-grade, framework-agnostic JavaScript/TypeScript library for normalizing and transforming LLM responses.

LLMForge provides a comprehensive, extensible solution for cleaning, formatting, and extracting structured data from Large Language Model (LLM) outputs. It handles common annoyances like markdown wrapping, extra whitespace, emojis, PII, and malformed JSON.

🚀 Key Features

Zero dependencies for core functionality
Tree-shakeable modular design — only bundle what you use
TypeScript-first with full type safety and excellent IntelliSense
Dual package support (ESM and CommonJS)
Environment agnostic (Works in Node.js, Browsers, React, Vue, Svelte, etc.)
Chainable API for intuitive transformation workflows
Middleware architecture for easy extensibility
Async support for complex transformation pipelines
30+ transformation functions with comprehensive edge case handling

📦 Installation

npm install llmforge
# or
yarn add llmforge
# or
pnpm add llmforge

🛠️ Quick Start

import { normalize } from 'llmforge';

// Basic usage
const cleaned = normalize("  Hello 🌍!  ")
  .trim()
  .removeEmojis()
  .value();
// Output: "Hello !"

// Extract JSON from LLM response
const json = normalize('```json\n{"name": "John"}\n```').getJSON();
// Output: { name: "John" }

// Chain multiple transformations
const result = normalize("**Bold** text with https://example.com")
  .removeMD()
  .removeURLs()
  .trim()
  .value();
// Output: "Bold text with"

📖 Complete API Reference

Core Methods

| Method | Description | Returns | |--------|-------------|---------| | .value() | Returns the current normalized value | string | | .transform(fn) | Applies a custom transformation function | this | | .orDefault(fallback) | Returns fallback if value is empty | this |

Text Transformations (20+ Functions)

Whitespace Management

| Method | Description | Example | |--------|-------------|---------| | .trim() | Removes leading and trailing whitespace | " hello " → "hello" | | .collapseSpaces() | Collapses multiple spaces/tabs into one | "a b" → "a b" | | .normalizeLineBreaks() | Standardizes line endings to \n (handles \r\n, \r) | "a\r\nb" → "a\nb" | | .removeExtraNewlines() | Collapses multiple newlines into one | "a\n\n\nb" → "a\nb" |

Markdown & Code

| Method | Description | Example | |--------|-------------|---------| | .removeMD() | Strips markdown formatting (bold, italic, headers, links, code) | "**bold**" → "bold" | | .removeCodeBlocks() | Removes both fenced and inline code blocks | "text `code` more" → "text more" |

Direct Functions:

import * as textTransforms from 'llmforge';

textTransforms.extractCodeBlocks(text); // Returns string[] of code blocks

URLs & Citations

| Method | Description | Example | |--------|-------------|---------| | .removeURLs() | Removes HTTP/HTTPS URLs | "Check https://example.com" → "Check " | | .removeCitations() | Removes citations like [1], (Smith, 2020) | "Text [1]" → "Text " |

HTML

| Method | Description | Example | |--------|-------------|---------| | .stripHTML() | Removes all HTML tags | "<p>Hello</p>" → "Hello" | | .escapeHTML() | Escapes HTML special characters | "<div>" → "<div>" | | .unescapeHTML() | Unescapes HTML entities | "<div>" → "<div>" |

Case Transformations

| Method | Description | Example | |--------|-------------|---------| | .toTitleCase() | Capitalizes first letter of each word | "hello world" → "Hello World" | | .toSentenceCase() | Capitalizes only the first letter | "HELLO WORLD" → "Hello world" |

Punctuation

| Method | Description | Example | |--------|-------------|---------| | .normalizeQuotes() | Converts smart quotes to standard quotes | ""smart"" → "\"smart\"" | | .normalizePunctuation() | Normalizes excessive punctuation | "What!!!!" → "What!" |

Truncation & Splitting

| Method | Description | Example | |--------|-------------|---------| | .truncate(maxLength, ellipsis?) | Truncates text with optional ellipsis | truncate("Hello World", 8) → "Hello..." |

Direct Functions:

import * as textTransforms from 'llmforge';

textTransforms.splitIntoChunks(text, 100); // Returns string[] of chunks

Privacy & Security

| Method | Description | Example | |--------|-------------|---------| | .maskPII() | Masks emails, phones, SSN, credit cards | "[email protected]" → "[EMAIL]" | | .censor(words) | Censors specified words with asterisks | censor("bad word", ["bad"]) → "*** word" |

Emoji Transformations (4 Functions)

| Method | Description | Example | |--------|-------------|---------| | .removeEmojis() | Removes all emojis | "Hello 👍" → "Hello " | | .replaceEmojis(map) | Replaces emojis based on mapping | replaceEmojis("👍", {"👍": "like"}) → "like" | | .addEmojis(options) | Adds sentiment-based emojis | addEmojis("Great", {sentiment: "positive"}) → "Great 😊" | | .translateEmojis() | Translates emojis to text | "👍 ❤️" → "thumbs up heart" |

Supported emoji translations:

👍 → "thumbs up"
👎 → "thumbs down"
❤️ → "heart"
😊 → "smile"
😂 → "laugh"
😢 → "cry"
🔥 → "fire"
✅ → "check"
❌ → "cross"
⚠️ → "warning"
🎉 → "celebration"
💯 → "100"

Structure Transformations (7+ Functions)

JSON Handling

| Method | Description | Example | |--------|-------------|---------| | .extractJSON() | Extracts and stringifies JSON (chainable) | '{"key": "value"}' → '{"key":"value"}' | | .getJSON() | Extracts and returns parsed JSON (terminator) | '{"key": "value"}' → { key: "value" } |

Direct Functions:

import * as structureTransforms from 'llmforge';

// Validate JSON against schema
structureTransforms.validateJSON(json, { 
  required: ['name', 'email'],
  type: 'object'
});
// Returns: { isValid: boolean, errors: string[] }

JSON extraction handles:

JSON wrapped in markdown code blocks (```json ... ```)
JSON embedded in surrounding text
Nested JSON objects and arrays
Malformed JSON (returns null gracefully)

Lists

| Method | Description | Example | |--------|-------------|---------| | .normalizeLists() | Converts numbered lists to bullets | "1. First" → "• First" | | .convertBulletsToNumbers() | Converts bullets to numbered lists | "• First" → "1. First" |

Direct Functions:

import * as structureTransforms from 'llmforge';

// Extract list items as array
structureTransforms.extractListItems("• Item 1\n• Item 2");
// Returns: ["Item 1", "Item 2"]

// Extract XML tag contents
structureTransforms.extractXMLTags("<name>John</name>", "name");
// Returns: ["John"]

// Parse structured LLM output (Claude-style)
structureTransforms.parseStructuredOutput("<thinking>...</thinking><answer>42</answer>");
// Returns: { thinking: "...", answer: "42", json?: any }

🧩 Middleware & Extensibility

Create custom processing pipelines with the middleware system.

Basic Middleware Usage

import { createNormalizer, middleware } from 'llmforge';

const pipeline = createNormalizer([
  middleware.trimWhitespace(),
  middleware.removeEmojis(),
  middleware.collapseSpaces(),
  middleware.validateLength({ min: 1, max: 1000 })
]);

const result = pipeline.process("  Hello 🌍  ");
// Output: "Hello"

Built-in Middleware

| Middleware | Description | |------------|-------------| | middleware.trimWhitespace() | Trims leading/trailing whitespace | | middleware.removeEmojis() | Removes all emojis | | middleware.collapseSpaces() | Collapses multiple spaces | | middleware.validateLength(options) | Validates text length (throws on failure) | | middleware.custom(fn, name) | Creates custom middleware |

Custom Middleware

const uppercase = middleware.custom(
  (text) => text.toUpperCase(),
  "uppercase"
);

const normalizer = createNormalizer([uppercase]);
normalizer.process("hello"); // "HELLO"

Presets

Pre-configured middleware for common use cases:

import { presets, createNormalizer } from 'llmforge';

// Chat message normalization
const chatNormalizer = createNormalizer(presets.chatMessage);

// Email formatting (removes emojis, normalizes whitespace)
const emailNormalizer = createNormalizer(presets.emailFormat);

// API response cleaning
const apiNormalizer = createNormalizer(presets.apiResponse);

// Code extraction
const codeNormalizer = createNormalizer(presets.codeExtraction);

// Content moderation
const moderationNormalizer = createNormalizer(presets.contentModeration);

⚡ Async Support

Handle asynchronous transformations with AsyncNormalizer.

import { normalizeAsync } from 'llmforge';

const result = await normalizeAsync("  HELLO  ")
  .transform((text) => text.trim())
  .transform(async (text) => {
    // Simulate async operation (API call, etc.)
    return await Promise.resolve(text.toLowerCase());
  })
  .value();
// Output: "hello"

Async Middleware

import { createNormalizer, middleware } from 'llmforge';

const pipeline = createNormalizer([
  middleware.trimWhitespace(),
  middleware.custom(async (text) => {
    const response = await fetch('/api/process', {
      method: 'POST',
      body: text
    });
    return await response.text();
  }, 'apiProcess')
]);

const result = await pipeline.processAsync(text);

📚 Usage Examples

Extract JSON from LLM Response

import { normalize } from 'llmforge';

const llmResponse = `
Here's the user data:
\`\`\`json
{
  "name": "Alex",
  "email": "[email protected]",
  "age": 30
}
\`\`\`
`;

const userData = normalize(llmResponse).getJSON();
// Output: { name: "Alex", email: "[email protected]", age: 30 }

Clean Chat Messages

function cleanUserMessage(message: string): string {
  return normalize(message)
    .trim()
    .collapseSpaces()
    .normalizePunctuation()
    .truncate(500)
    .value();
}

cleanUserMessage("  Hello!!!   How are you????  ");
// Output: "Hello! How are you?"

Remove PII for Privacy

function sanitizeContent(text: string): string {
  return normalize(text)
    .maskPII()
    .removeURLs()
    .value();
}

sanitizeContent("Contact me at [email protected] or 555-123-4567");
// Output: "Contact me at [EMAIL] or [PHONE]"

Format Email from LLM

import { normalize, presets, createNormalizer } from 'llmforge';

const emailNormalizer = createNormalizer(presets.emailFormat);

function formatEmail(llmResponse: string): string {
  return emailNormalizer.process(llmResponse);
}

Extract Code Blocks

import { textTransforms } from 'llmforge';

const llmResponse = `
Here's the solution:
\`\`\`javascript
console.log('Hello');
\`\`\`
And here's another:
\`\`\`python
print('World')
\`\`\`
`;

const codeBlocks = textTransforms.extractCodeBlocks(llmResponse);
// Output: ["console.log('Hello');", "print('World')"]

Parse Structured Output (Claude-style)

import { structureTransforms } from 'llmforge';

const response = "<thinking>Let me analyze...</thinking><answer>42</answer>";
const parsed = structureTransforms.parseStructuredOutput(response);
// Output: { thinking: "Let me analyze...", answer: "42" }

🧪 Edge Case Handling

llmforge is battle-tested with 51 comprehensive tests covering:

✅ Empty strings - All functions handle empty input gracefully
✅ Very long strings - Tested with 10,000+ character strings
✅ Unicode support - Full support for emojis and international characters
✅ Malformed JSON - Returns null instead of throwing errors
✅ Special characters - Handles null bytes and control characters
✅ Nested structures - Supports deeply nested JSON and HTML
✅ Mixed content - Handles text with multiple formats simultaneously

📊 Direct Transform Functions

All transformations are also available as standalone functions for maximum flexibility:

import * as textTransforms from 'llmforge';
import * as emojiTransforms from 'llmforge';
import * as structureTransforms from 'llmforge';

// Use directly without chaining
const trimmed = textTransforms.trimWhitespace("  hello  ");
const json = structureTransforms.extractJSON('{"key": "value"}');
const noEmojis = emojiTransforms.removeEmojis("Hello 👍");

Complete Function List

Text Transforms:

trimWhitespace(text)
collapseSpaces(text)
normalizeLineBreaks(text)
removeExtraNewlines(text)
removeMDFormatting(text)
removeCodeBlocks(text)
extractCodeBlocks(text) → string[]
removeURLs(text)
removeCitations(text)
stripHTMLTags(text)
escapeHTML(text)
unescapeHTML(text)
toTitleCase(text)
toSentenceCase(text)
normalizeQuotes(text)
normalizePunctuation(text)
truncate(text, maxLength, ellipsis?)
splitIntoChunks(text, chunkSize) → string[]
maskPII(text)
censor(text, words[])

Emoji Transforms:

removeEmojis(text)
replaceEmojis(text, map)
addEmojis(text, options?)
translateEmojis(text)

Structure Transforms:

extractJSON(text) → any | null
validateJSON(json, schema) → {isValid, errors}
normalizeLists(text)
convertBulletsToNumbers(text)
extractListItems(text) → string[]
extractXMLTags(text, tagName) → string[]
parseStructuredOutput(text) → {thinking?, answer?, json?}

🎯 TypeScript Support

Full TypeScript support with comprehensive type definitions:

import { 
  Normalizer, 
  AsyncNormalizer,
  NormalizerOptions, 
  TransformFn,
  Middleware,
  ValidationResult 
} from 'llmforge';

const options: NormalizerOptions = {
  strictMode: true,
  onWarning: (warning) => console.warn(warning),
  onError: (error) => console.error(error),
};

const normalizer: Normalizer = normalize("text", options);

🚀 Performance

Lazy evaluation - Transformations execute only when .value() is called
Optimized regex - Efficient pattern matching for all operations
Zero dependencies - Minimal bundle size (<20KB gzipped)
Tree-shakeable - Import only what you need

🌐 Browser Support

Works in all modern browsers and Node.js 16+.

<!-- Via CDN -->
<script type="module">
  import { normalize } from 'https://unpkg.com/llmforge';
  
  const result = normalize("  text  ").trim().value();
</script>

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

LLMForge

🚀 Key Features

📦 Installation

🛠️ Quick Start

📖 Complete API Reference

Core Methods

Text Transformations (20+ Functions)

Whitespace Management

Markdown & Code

URLs & Citations

HTML

Case Transformations

Punctuation

Truncation & Splitting

Privacy & Security

Emoji Transformations (4 Functions)

Structure Transformations (7+ Functions)

JSON Handling

Lists

🧩 Middleware & Extensibility

Basic Middleware Usage

Built-in Middleware

Custom Middleware

Presets

⚡ Async Support

Async Middleware

📚 Usage Examples

Extract JSON from LLM Response

Clean Chat Messages

Remove PII for Privacy

Format Email from LLM

Extract Code Blocks

Parse Structured Output (Claude-style)

🧪 Edge Case Handling

📊 Direct Transform Functions

Complete Function List

🎯 TypeScript Support

🚀 Performance

🌐 Browser Support

📜 License

🤝 Contributing

📖 Documentation

🔗 Links