unllm

v2.1.0

Clean LLM output to keyboard-printable text

Downloads

513

Convert LLM output to clean, human-like text by removing AI artifacts and normalizing typography.

import { clean } from 'unllm';

const llmOutput = "Hey there! 👋 This\u00A0message uses\u2014fancy chars\u2026 🚀";
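// Dash and ellipsis normalization are opt-in, so enable them explicitly
const result = clean(llmOutput, { dashes: true, ellipsis: true });
// → "Hey there! 👋 This message uses-fancy chars... 🚀"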

Why?

LLMs (ChatGPT, Claude, etc.) often generate text with problematic Unicode characters that make output look artificial:

  • Control characters: NULL (\u0000), invisible formatting marks
  • Typographic Unicode: Em dashes (\u2014), fancy spaces (\u00A0), ellipsis
  • Invisible chars: Zero-width spaces, byte order marks (BOM), direction marks

This library normalizes LLM output to look natural while preserving emojis, quotes, and international text (Arabic, Chinese, Cyrillic, etc.).

What it does

| Input | Output | Effect |
|-------|--------|--------|
| "Hello\u0000World" | "HelloWorld" | Removes NULL |
| "Hello\u00A0World" | "Hello World" | NBSP → space |
| "foo\u2014bar" | "foo-bar" | Em dash → hyphen (opt-in) |
| "Wait\u2026" | "Wait..." | Ellipsis → dots (opt-in) |
| "He said \u201CHi\u201D" | "He said 'Hi'" | Smart quotes → ' (opt-in) |
| "Hi 👋 مرحبا" | "Hi 👋 مرحبا" | Preserves emojis & international text |

Installation

npm install unllm
# or
pnpm add unllm
# or
bun add unllm

API

clean(text: string, options?: CleanOptions): string

Removes LLM artifacts and normalizes typography to clean, human-like text.

Options:

interface CleanOptions {
  invisible?: boolean;        // Remove control/invisible chars (default: true)
  spaces?: boolean;           // Normalize Unicode spaces (default: true)
  dashes?: boolean;           // Normalize em/en dashes (default: false)
  ellipsis?: boolean;         // Normalize ellipsis (default: false)
  quotes?: boolean | string;  // Normalize smart quotes (default: false)
                              // true = normalize to ', string = normalize to that char
}
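
The `quotes` option is the only one that accepts either a boolean or a string. As a rough illustration of that contract, the sketch below maps the option to a replacement character. The helper names `resolveQuoteChar` and `normalizeQuotes` and the `SMART_QUOTES` pattern are hypothetical and are not taken from unllm's source; the real library may cover more quote characters.

```typescript
// Hypothetical helpers for illustration only; not unllm's internals.

// Map the documented `quotes` option to a replacement character,
// or null when quote normalization is switched off.
function resolveQuoteChar(quotes?: boolean | string): string | null {
  if (typeof quotes === "string") return quotes;
  return quotes === true ? "'" : null;
}

// Assumed subset of "smart quotes": curly single and double quotes.
const SMART_QUOTES = /[\u2018\u2019\u201C\u201D]/g;

function normalizeQuotes(text: string, quotes?: boolean | string): string {
  const replacement = resolveQuoteChar(quotes);
  if (replacement === null) return text;
  // A function replacer avoids special handling of "$" in the replacement.
  return text.replace(SMART_QUOTES, () => replacement);
}

console.log(normalizeQuotes("He said \u201CHello\u201D", true));
console.log(normalizeQuotes("He said \u201CHello\u201D", '"'));
```

Leaving `quotes` unset (or `false`) returns the text unchanged, which mirrors the documented default.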

What it preserves:

  • Emojis (including multi-part with ZWJ: 👨‍👩‍👧‍👦)
  • International text (Arabic, Chinese, Cyrillic, etc.)
  • Line breaks and tabs
  • Regular punctuation and symbols
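
To see why multi-part emojis need special care, the sketch below spells out the family emoji from the list above as code points. It does not use unllm; it only shows that the sequence contains zero-width joiners (U+200D), so a cleaner that stripped every invisible character blindly would split it into separate emojis.

```typescript
// man, ZWJ, woman, ZWJ, girl, ZWJ, boy
const ZWJ = "\u200D";
const family = `\u{1F468}${ZWJ}\u{1F469}${ZWJ}\u{1F467}${ZWJ}\u{1F466}`;

// Array.from iterates by code point, not by UTF-16 unit.
const codePoints = Array.from(family).length;
const joiners = Array.from(family).filter((ch) => ch === ZWJ).length;

// A naive cleaner that deletes every ZWJ leaves four separate emojis.
const naivelyStripped = family.split(ZWJ).join("");
const remainingCodePoints = Array.from(naivelyStripped).length;

console.log({ codePoints, joiners, remainingCodePoints });
```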

Examples:

import { clean } from 'unllm';

// Basic usage (invisible + spaces only)
clean("Hello\u00A0World");
// → "Hello World"

// Enable all normalizations
clean("Text\u0000\u00A0\u2014test\u2026", {
  invisible: true,
  spaces: true,
  dashes: true,
  ellipsis: true,
  quotes: true
});
// → "Text -test..."

// Normalize smart quotes to single quote
clean("He said \u201CHello\u201D", { quotes: true });
// → "He said 'Hello'"

// Normalize smart quotes to double quote
clean("He said \u201CHello\u201D", { quotes: '"' });
// → 'He said "Hello"'

// Disable everything (pass-through)
clean("Keep\u00A0all\u2014chars", {
  invisible: false,
  spaces: false
});
// → "Keep\u00A0all\u2014chars"

// Preserves international text
clean("C'est génial\u00A0!");
// → "C'est génial !"

inspect(text: string, options?: CleanOptions): Issue[]

Analyzes text and returns an array of the issues found. Uses the same options as clean().

Returns:

interface Issue {
  char: string;        // The problematic character
  code: number;        // Unicode code point
  hex: string;         // Hex representation (e.g., "U+00A0")
  position: number;    // Position in string
  type: 'control' | 'invisible' | 'typography';
  name: string;        // Human-readable name
}

Usage:

import { inspect } from 'unllm';

const issues = inspect("Hello\u00A0World");

console.log(issues);
// [
//   {
//     char: '\u00A0',
//     code: 160,
//     hex: 'U+00A0',
//     position: 5,
//     type: 'typography',
//     name: 'NO-BREAK SPACE'
//   }
// ]

// Detect smart quotes (disabled by default)
const quoteIssues = inspect("He said \u201CHello\u201D", { quotes: true });
// → 2 issues: LEFT/RIGHT DOUBLE QUOTATION MARK

// Quick check
if (issues.length > 0) {
  const cleaned = clean("Hello\u00A0World");
}
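
For reporting, it can help to collapse an `Issue[]` into counts per type. The sketch below is one way to do that. `countByType` is a hypothetical helper, not part of unllm, and the `Issue` interface is copied from the one documented above so the block runs on its own. The second sample entry is hand-written for illustration; its `name` and `hex` values are assumptions about what `inspect` would report.

```typescript
// Shape copied from the Issue interface documented above.
interface Issue {
  char: string;
  code: number;
  hex: string;
  position: number;
  type: 'control' | 'invisible' | 'typography';
  name: string;
}

// Hypothetical helper (not part of unllm): count issues per type.
function countByType(issues: Issue[]): Record<Issue['type'], number> {
  const counts: Record<Issue['type'], number> = {
    control: 0,
    invisible: 0,
    typography: 0,
  };
  for (const issue of issues) {
    counts[issue.type] += 1;
  }
  return counts;
}

// Sample data: the first entry matches the inspect() output shown above,
// the second is an assumed entry written by hand.
const sample: Issue[] = [
  { char: '\u00A0', code: 160, hex: 'U+00A0', position: 5, type: 'typography', name: 'NO-BREAK SPACE' },
  { char: '\u0000', code: 0, hex: 'U+0000', position: 11, type: 'control', name: 'NULL' },
];

const summary = countByType(sample);
console.log(summary);
```

With the package installed, the same helper could be called as `countByType(inspect(text))`.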

Use Cases

  • LLM output normalization: Clean ChatGPT/Claude responses for consistent formatting
  • Translation quality: Normalize AI-translated text to remove artifacts
  • Database storage: Ensure clean text before storing LLM output
  • API responses: Remove problematic characters that break JSON/XML
  • Content moderation: Detect and fix LLM-generated formatting issues
  • Text comparison: Normalize before diffing or deduplication

Character Categories

Control Characters (removed)

  • NULL (\u0000)
  • Other C0/C1 control characters (line breaks and tabs are preserved)
  • Backspace, vertical tab, form feed, etc.

Invisible Characters (removed)

  • Zero-width space (\u200B)
  • Zero-width non-joiner (\u200C)
  • Left-to-right/right-to-left marks
  • Word joiner, invisible operators
  • Byte order mark (BOM) (\uFEFF)

Typography (normalized)

  • Unicode spaces: NBSP (\u00A0), em space, en space, etc. → regular space
  • Dashes (opt-in): em dash (\u2014), en dash (\u2013), minus (\u2212) → -
  • Ellipsis (opt-in): \u2026 → ...
  • Soft hyphen: \u00AD → removed
  • Smart quotes (opt-in): “ ” ‘ ’ « » etc. → ' or custom char
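
The default-on categories above could be approximated with a few character classes. The sketch below is an illustration under stated assumptions, not unllm's implementation: the patterns cover only a subset of the listed characters, `sketchClean` is a hypothetical name, and the real library may treat edge cases differently.

```typescript
// Illustrative only: these patterns are assumptions, not unllm's source.

// C0/C1 controls and DEL, except tab (U+0009), line feed (U+000A)
// and carriage return (U+000D).
const CONTROL = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u009F]/g;

// A few invisible characters from the list above. U+200D (ZWJ) is
// deliberately left out so multi-part emojis survive.
const INVISIBLE = /[\u200B\u200C\u200E\u200F\u2060\uFEFF]/g;

// NBSP plus the U+2000 to U+200A range (en space, em space, and so on).
const SPACES = /[\u00A0\u2000-\u200A]/g;

function sketchClean(text: string): string {
  return text
    .replace(CONTROL, "")
    .replace(INVISIBLE, "")
    .replace(SPACES, " ");
}

console.log(JSON.stringify(sketchClean("Text\u0000\u00A0test\uFEFF")));
```

Removal runs before space normalization here, but since the three classes do not overlap, the order does not change the result.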

Design Principles

  • Simple API: Just two functions (clean and inspect)
  • Zero configuration: Works out of the box with sensible defaults
  • International-friendly: Preserves all legitimate text (Arabic, Chinese, etc.)
  • Emoji-aware: Intelligently handles complex emoji sequences
  • Zero dependencies: Lightweight and secure
  • Type-safe: Full TypeScript support

License

MIT © Teimur Gasanov