remark-math-sanitizer

v2.0.3

A pre-processing pipeline that fixes the most common ways LLM output breaks remark-math / KaTeX rendering.


The problem

LLMs produce markdown with math that consistently breaks standard remark-math in five ways:

| # | Failure mode | Example input | What breaks |
|---|---|---|---|
| 1 | Currency before math | `Cost $50 then $E=mc^2$ done.` | The `$` on `$50` steals the opening delimiter of `$E=mc^2$`, leaving a dangling `$`, and KaTeX errors out |
| 2 | Garbled prose in `$...$` | `displacement is $7.2 m at 33.7° above the positive $x$` | KaTeX renders each English word as spaced italic characters |
| 3 | Bare LaTeX environments | `\begin{equation}E=mc^2\end{equation}` | remark-math ignores environments without surrounding `$$` delimiters |
| 4 | `%` inside math | `$50%$ complete` | KaTeX treats `%` as a comment and silently drops everything after it |
| 5 | Unicode in math spans | `$\alpha” + 1$` | KaTeX strict-mode errors on smart quotes, em-dashes, etc. |

None of these are fixable by upgrading remark-math or switching to remark-math-extended — they happen in the text before it reaches the parser.
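The sketch below shows how failure mode 1 arises. It assumes a simplified scanner that pairs each `$` with the next `$`, left to right. That is roughly how code-span-style inline math is tokenised, but it is not remark-math's actual implementation. `naiveInlineSpans` is a hypothetical name used only for illustration.

```typescript
// Hypothetical illustration of failure mode 1: pair each single `$` with
// the next `$`, scanning left to right. Display math ($$) is not handled.
function naiveInlineSpans(text: string): string[] {
  const spans: string[] = [];
  let open = -1; // index of the currently unmatched `$`, or -1
  for (let i = 0; i < text.length; i++) {
    if (text[i] !== '$') continue;
    if (open === -1) {
      open = i;
    } else {
      spans.push(text.slice(open + 1, i)); // body between the pair
      open = -1;
    }
  }
  return spans; // a trailing unmatched `$` stays literal text
}

console.log(naiveInlineSpans('Cost $50 then $E=mc^2$ done.')); // [ '50 then ' ]
```

For `Cost $50 then $E=mc^2$ done.` the first pair captures `50 then `. The equation's closing `$` is then left with nothing to pair with.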


Install

Install this package and its peer dependencies:

```bash
npm install remark-math-sanitizer react-markdown remark-math rehype-katex katex
```

No runtime dependencies of its own. ESM only ("type": "module").


Usage

Basic — ReactMarkdown

The most common setup. Call sanitizeLatexContent on the LLM output before passing it to ReactMarkdown, then load the KaTeX stylesheet once in your app shell.

```tsx
// app/layout.tsx (or pages/_app.tsx; in a plain index.html, add a <link> to katex.min.css instead)
import 'katex/dist/katex.min.css';
```

```tsx
// components/ChatMessage.tsx
'use client';

import ReactMarkdown from 'react-markdown';
import remarkMath from 'remark-math';
import rehypeKatex from 'rehype-katex';
import { sanitizeLatexContent } from 'remark-math-sanitizer';

interface ChatMessageProps {
  content: string;
}

export function ChatMessage({ content }: ChatMessageProps) {
  return (
    <ReactMarkdown
      remarkPlugins={[remarkMath]}
      rehypePlugins={[rehypeKatex]}
    >
      {sanitizeLatexContent(content)}
    </ReactMarkdown>
  );
}
```

Streaming (LLM token-by-token)

sanitizeLatexContent is a pure, synchronous, stateless function, so you can safely call it on the accumulated string after every token. React's reconciliation then updates only the DOM nodes that changed, though ReactMarkdown still re-parses the whole string on each render.

```tsx
'use client';

import { useState } from 'react';
import ReactMarkdown from 'react-markdown';
import remarkMath from 'remark-math';
import rehypeKatex from 'rehype-katex';
import { sanitizeLatexContent } from 'remark-math-sanitizer';

export function StreamingMessage() {
  const [raw, setRaw] = useState('');

  async function startStream() {
    const res = await fetch('/api/chat', { method: 'POST' });
    const reader = res.body!.getReader();
    const decoder = new TextDecoder();
    let accumulated = '';

    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      accumulated += decoder.decode(value, { stream: true });
      setRaw(accumulated);          // store raw; sanitize at render time
    }
    accumulated += decoder.decode(); // flush any buffered multi-byte sequence
    setRaw(accumulated);
  }

  return (
    <>
      <button onClick={startStream}>Ask</button>
      {/* Sanitized on every render. Keep comments outside ReactMarkdown:
          its children must be a single string. */}
      <ReactMarkdown
        remarkPlugins={[remarkMath]}
        rehypePlugins={[rehypeKatex]}
      >
        {sanitizeLatexContent(raw)}
      </ReactMarkdown>
    </>
  );
}
```

Tip: If you memoize the sanitized output, key the memo on the raw string:

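```tsx
import { useMemo } from 'react';

// Inside the component: re-sanitize only when the raw text changes.
const clean = useMemo(() => sanitizeLatexContent(raw), [raw]);
```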

Next.js App Router

Load the KaTeX stylesheet in your root layout and render messages from a Client Component. ReactMarkdown can also render on the server, but chat messages usually come from client-side state such as a stream.

```tsx
// app/layout.tsx
import 'katex/dist/katex.min.css';

export default function RootLayout({ children }: { children: React.ReactNode }) {
  return (
    <html lang="en">
      <body>{children}</body>
    </html>
  );
}
```

```tsx
// components/message.tsx
'use client';

import ReactMarkdown from 'react-markdown';
import remarkMath from 'remark-math';
import rehypeKatex from 'rehype-katex';
import { sanitizeLatexContent } from 'remark-math-sanitizer';

export function Message({ content }: { content: string }) {
  return (
    <ReactMarkdown
      remarkPlugins={[remarkMath]}
      rehypePlugins={[rehypeKatex]}
    >
      {sanitizeLatexContent(content)}
    </ReactMarkdown>
  );
}
```

Without React — unified / remark pipeline

Works with any unified-based pipeline (Astro, Vite, Node.js scripts, etc.):

```ts
import { unified } from 'unified';
import remarkParse from 'remark-parse';
import remarkMath from 'remark-math';
import remarkRehype from 'remark-rehype';
import rehypeKatex from 'rehype-katex';
import rehypeStringify from 'rehype-stringify';
import { sanitizeLatexContent } from 'remark-math-sanitizer';

// Raw model output to render; replace with your own source.
const llmOutput = 'Cost $50 then $E=mc^2$ done.';

const processor = unified()
  .use(remarkParse)
  .use(remarkMath)
  .use(remarkRehype)
  .use(rehypeKatex)
  .use(rehypeStringify);

// Top-level await: run this file as an ES module.
const html = String(await processor.process(sanitizeLatexContent(llmOutput)));
```

Reducing sanitization with the system prompt

Add LATEX_FORMATTING_GUIDELINES to your LLM system prompt to instruct the model to emit well-formed LaTeX from the start:

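```ts
import OpenAI from 'openai';
import { LATEX_FORMATTING_GUIDELINES } from 'remark-math-sanitizer';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const userMessage = 'Explain E=mc^2 in one paragraph.';

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    {
      role: 'system',
      content: `You are a helpful assistant.\n\n${LATEX_FORMATTING_GUIDELINES}`,
    },
    { role: 'user', content: userMessage },
  ],
});
```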

This tells the model to use $…$ / $$…$$ delimiters, avoid mixing currency and math on the same line, and use standard LaTeX command names — reducing how much the sanitizer needs to fix at render time.


Examples

A complete, runnable sample client lives in examples/demo. It pipes the five failure-mode inputs through a real unified → remark-math → rehype-katex pipeline and asserts on both the sanitized string and the rendered HTML.

```bash
# from the repo root, build once so the demo's file:../.. dep resolves
npm install
npm run build

cd examples/demo
npm install
npm test          # automated PASS/FAIL checks
npm run render    # writes output.html for visual side-by-side comparison
```

Example inputs and sanitized outputs

| # | Failure mode | Raw input | Sanitized output |
|---|---|---|---|
| 1 | Currency before math | `Cost $50 then $E=mc^2$ done.` | `Cost &#36;50 then $E=mc^2$ done.` |
| 2 | Garbled prose in `$…$` | `The displacement is $7.2 m at 33.7° above the positive $x$ direction.` | `The displacement is &#36;7.2 m at 33.7° above the positive &#36;x$ direction.` |
| 3 | Bare LaTeX environment | `\begin{equation}E=mc^2\end{equation}` | `$$\n\begin{equation}E=mc^2\end{equation}\n$$` |
| 4 | `%` inside math | `We are $50\%$ complete.` | `We are $50\%$ complete.` (preserved; already escaped) |
| 5 | Unicode in math spans | `Let $\alpha” + 1$ be defined.` | `Let $\alpha" + 1$ be defined.` (smart quote → ASCII) |

2.0 default — entity escaping. Currency dollar signs are escaped as the HTML character reference &#36; rather than \$. Entities are tokenised separately from math delimiters by every CommonMark-conformant parser and survive any plugin order, custom transformer, or middleware that might un-escape backslash sequences before math parsing. Pass { currencyEscape: 'backslash' } to opt back into the 1.x output style — see Options below.
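The sketch below shows the difference between the two escape styles. `escapeCurrencySketch` is hypothetical and much cruder than the package's `escapeCurrencyDollars`. It escapes any `$` that directly precedes a digit. It also assumes genuine math spans such as `$2x$` have already been shielded, which step 1 of the pipeline does.

```typescript
type CurrencyEscape = 'entity' | 'backslash';

// Hypothetical sketch: escape every `$` that directly precedes a digit.
// Assumes real math spans were already shielded; on raw text it would
// also hit math like `$2x$`.
function escapeCurrencySketch(text: string, mode: CurrencyEscape = 'entity'): string {
  const replacement = mode === 'entity' ? '&#36;' : '\\$';
  // (?<!\\) leaves dollars that are already backslash-escaped alone.
  // A replacer function keeps `$` from being read as a replacement pattern.
  return text.replace(/(?<!\\)\$(?=\d)/g, () => replacement);
}

const input = 'Cost $50 then $E=mc^2$ done.';
console.log(escapeCurrencySketch(input));              // Cost &#36;50 then $E=mc^2$ done.
console.log(escapeCurrencySketch(input, 'backslash')); // Cost \$50 then $E=mc^2$ done.
```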

What each case proves

  • Case 1 — the $ on $50 is escaped to &#36;, so the opening $ of $E=mc^2$ is no longer stolen by remark-math; KaTeX renders the equation correctly.
  • Case 2 — both stray $ are replaced with &#36;, so KaTeX never sees the garbled span. The rendered HTML contains no class="katex" for this paragraph.
  • Case 3 — the bare environment is wrapped in $$…$$, producing a katex-display block.
  • Case 4 — the explicit \% survives the pipeline; KaTeX renders 50%.
  • Case 5 — the smart right-double-quote (\u201D) inside the math span is replaced with ASCII ", avoiding a KaTeX strict-mode error.
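Cases 4 and 5 both apply a fix only inside math spans. The sketch below shows that scoping idea under some simplifying assumptions:

- It only looks at single-dollar spans on one line.
- It ignores `$$…$$` and escaped dollars.
- It uses a hypothetical character map.

None of these function names are package exports.

```typescript
// Hypothetical map from typographic characters to ASCII.
const UNICODE_TO_ASCII: Record<string, string> = {
  '\u201C': '"', // left double quotation mark
  '\u201D': '"', // right double quotation mark
  '\u2018': "'", // left single quotation mark
  '\u2019': "'", // right single quotation mark
  '\u2013': '-', // en dash
  '\u2014': '-', // em dash
};

// Rewrites the body of each single-dollar inline span on one line.
// Simplification: ignores $$…$$ display math and escaped dollars.
function mapInlineMath(text: string, fix: (body: string) => string): string {
  return text.replace(/\$([^$\n]+?)\$/g, (_m: string, body: string) => `$${fix(body)}$`);
}

// Case 4: escape `%` inside math unless it is already `\%`.
function escapeMathPercentSketch(text: string): string {
  return mapInlineMath(text, (body) => body.replace(/(?<!\\)%/g, '\\%'));
}

// Case 5: swap smart quotes and Unicode dashes for ASCII, inside math only.
function sanitizeMathUnicodeSketch(text: string): string {
  return mapInlineMath(text, (body) =>
    body.replace(/[\u201C\u201D\u2018\u2019\u2013\u2014]/g, (ch) => UNICODE_TO_ASCII[ch] ?? ch),
  );
}

console.log(escapeMathPercentSketch('We are $50%$ complete.'));                // We are $50\%$ complete.
console.log(sanitizeMathUnicodeSketch('Let $\\alpha\u201D + 1$ be defined.')); // Let $\alpha" + 1$ be defined.
```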

The demo also runs sanity checks on the smaller exported helpers (containsMathExpressions, normalizeLatexDelimiters, wrapBareLatexEnvironments).
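To show roughly what `wrapBareLatexEnvironments` does, here is a hypothetical sketch. It is not the package's code. It assumes a fixed list of environments: `equation`, `align`, `gather` and their starred forms. It treats an odd number of `$$` before a match as "already inside display math".

```typescript
// Hypothetical set of display environments to wrap, plus starred variants.
// \1\2 makes the \end{...} name match the \begin{...} name exactly.
const BARE_ENV = /\\begin\{(equation|align|gather)(\*?)\}[\s\S]*?\\end\{\1\2\}/g;

function wrapBareEnvironmentsSketch(text: string): string {
  return text.replace(
    BARE_ENV,
    (match: string, _env: string, _star: string, offset: number, whole: string) => {
      // An odd number of `$$` before the match means we are already inside
      // display math, so leave it alone (this keeps the sketch idempotent).
      const opens = whole.slice(0, offset).match(/\$\$/g)?.length ?? 0;
      if (opens % 2 === 1) return match;
      return `$$\n${match}\n$$`;
    },
  );
}

console.log(wrapBareEnvironmentsSketch('\\begin{equation}E=mc^2\\end{equation}'));
```

Running it twice gives the same result as running it once, so calling it again on a streamed message that grows token by token is harmless.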


Exports

| Export | Description |
|---|---|
| `sanitizeLatexContent(str, options?)` | Main function. Runs the full 12-step pipeline. |
| `wrapBareLatexEnvironments(str)` | Wraps `\begin{equation}…\end{equation}` (and other display environments) in `$$…$$`. |
| `stripCurrencyDollarBeforeMathResult(str)` | Collapses `$calc = $RESULT$` (e.g. `$15(18) + 5(22) = $380$`) into a single math span by removing the spurious `$` before the result. |
| `fixAdjacentInlineAndDisplayMath(str)` | Inserts `\n\n` between `$inline$` and an immediately following `$$display$$` so remark-math parses each correctly (otherwise `\begin{cases}…` bodies render verbatim). |
| `escapeGarbledInlineMath(str)` | Detects and escapes `$…$` spans that contain prose rather than LaTeX. |
| `escapeCurrencyDollars(str)` | Escapes `$50`, `$5M`, `$4.0T`, etc. so they are not parsed as math. |
| `escapeCurrencyRanges(str)` | Escapes both `$` in ranges like `$5–$10`. |
| `escapeMathPercent(str)` | Escapes `%` inside `$…$` so KaTeX doesn't treat it as a comment. |
| `sanitizeMathUnicode(str)` | Replaces smart quotes and Unicode dashes inside math spans with ASCII. |
| `normalizeLatexDelimiters(str)` | Converts `\(…\)` → `$…$` and `\[…\]` → `$$…$$`. |
| `containsMathExpressions(str)` | Returns `true` if the string contains any math expression. |
| `escapeLatexSpecialChars(str)` | Escapes standalone `$` followed by whitespace. |
| `LATEX_FORMATTING_GUIDELINES` | System-prompt snippet instructing LLMs to emit well-formed LaTeX. |
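`normalizeLatexDelimiters` is described as a straight delimiter swap. Here is a minimal sketch of that mapping. It assumes the delimiters are neither nested nor escaped, and `normalizeDelimitersSketch` is a hypothetical name.

```typescript
// Hypothetical sketch of the documented delimiter swap:
//   \[ … \]  ->  $$ … $$      \( … \)  ->  $ … $
// Assumes delimiters are not nested and not themselves escaped.
function normalizeDelimitersSketch(text: string): string {
  return text
    .replace(/\\\[([\s\S]+?)\\\]/g, (_m: string, body: string) => `$$${body}$$`)
    .replace(/\\\(([\s\S]+?)\\\)/g, (_m: string, body: string) => `$${body}$`);
}

console.log(normalizeDelimitersSketch('Euler: \\(e^{i\\pi} + 1 = 0\\)')); // Euler: $e^{i\pi} + 1 = 0$
```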


Options

All escape helpers and sanitizeLatexContent accept an optional second argument:

```ts
interface SanitizeOptions {
  /**
   * How currency dollar signs are escaped so remark-math does not pair them.
   *
   * - `'entity'`     (default)  emit `&#36;`. Survives any plugin order or
   *                              middleware that might un-escape backslashes.
   * - `'backslash'`              emit `\$`. 1.x behaviour. Use only if your
   *                              downstream renderer doesn't decode HTML
   *                              entities (rare).
   */
  currencyEscape?: 'entity' | 'backslash';
}
```

```ts
import { sanitizeLatexContent } from 'remark-math-sanitizer';

// Default: entity escaping (recommended)
sanitizeLatexContent('Cost $50 then $E=mc^2$ done.');
// → 'Cost &#36;50 then $E=mc^2$ done.'

// Opt-in 1.x backslash escaping
sanitizeLatexContent('Cost $50 then $E=mc^2$ done.', { currencyEscape: 'backslash' });
// → 'Cost \\$50 then $E=mc^2$ done.'
```

Migrating from 1.x

The only breaking change in 2.0 is the default escape style. If you assert on literal substrings of sanitized output (e.g. in tests), either:

  1. Update assertions from \$ to &#36;, or
  2. Pass { currencyEscape: 'backslash' } everywhere to preserve old output.

The rendered HTML is identical in both modes — &#36; and \$ both decode to a literal $ character in the final DOM.
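During migration you may want tests that pass in either mode. One option is to normalise both escape forms before comparing. `decodeCurrencyEscapes` is a hypothetical test helper, not a package export. It deliberately erases the difference between the modes, so do not use it in tests that check which style was emitted.

```typescript
// Hypothetical test helper: map both escape styles (`&#36;` and `\$`)
// back to a plain `$` so one expected string covers either mode.
function decodeCurrencyEscapes(sanitized: string): string {
  return sanitized.replace(/&#36;|\\\$/g, () => '$');
}

const expected = 'Cost $50 then $E=mc^2$ done.';
console.log(decodeCurrencyEscapes('Cost &#36;50 then $E=mc^2$ done.') === expected); // true
console.log(decodeCurrencyEscapes('Cost \\$50 then $E=mc^2$ done.') === expected);   // true
```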


Pipeline diagram

sanitizeLatexContent runs these steps in order:

```text
LLM output
    │
    ▼
0.  wrapBareLatexEnvironments
    └─ \begin{equation}…\end{equation} → $$\n\begin{equation}…\end{equation}\n$$
    │
    ▼
0c. stripCurrencyDollarBeforeMathResult
    └─ $calc = $RESULT$  →  $calc = RESULT$
       Required pattern: parenthesised group + `=` + `$<digits>$`.
       e.g. `$15(18) + 5(22) = $380$` → `$15(18) + 5(22) = 380$`
       (KaTeX then renders the entire calculation as one valid span).
    │
    ▼
1.  PROTECT real math spans
    └─ 1a. $$…$$ display math (atomic, left-to-right)
    └─ 1b. $…$ inline math, paired by consecutive-position scan that
          PREFERS math-token-containing inner over lazy left-to-right
          pairing, so it correctly identifies $E=mc^2$ in
          "Cost $50 then formula $E=mc^2$ done."
    │
    ▼
2.  escapeGarbledInlineMath   (on non-protected content)
    └─ prose/CJK/bold inside $…$ → &#36;…&#36; (\$…\$ in 'backslash' mode)
    │
    ▼
3.  escapeMathPercent          (on non-protected content)
4.  escapeCurrencyRanges       (safe: real math is shielded)
5.  escapeCurrencyDollars      (safe: real math is shielded)
    │
    ▼
6.  RESTORE protected spans
    └─ \0MATHn\0 → original $…$
    │
    ▼
7.  escapeGarbledInlineMath   (second pass: catches protected-but-garbled spans,
    │                          e.g. $7.2 m at 33.7^\circ above the positive $)
    ▼
8.  normalizeLatexDelimiters   \(…\) → $…$   \[…\] → $$…$$
9.  escapeMathPercent          (second pass: catches % in newly created spans)
10. sanitizeMathUnicode        (replace Unicode in all math spans)
    │
    ▼
11. fixAdjacentInlineAndDisplayMath
    └─ `$inline$ $$display$$` (same line) → `$inline$\n\n$$display$$`
       remark-math otherwise treats the display block as raw text and
       `\begin{cases}…\end{cases}` renders verbatim.
    │
    ▼
  sanitized output  →  ReactMarkdown + remarkMath + rehypeKatex
```
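Step 0c's required pattern can be approximated with a single regular expression. This is a hypothetical sketch with a made-up function name. The regex encodes only the documented shape: a parenthesised group, then `=`, then `$<digits>$`. Allowing an optional decimal part is an added assumption.

```typescript
// Hypothetical approximation of step 0c. Captures:
//   1: `15(18) + 5(22) = ` (must contain a (...) group and end with `=`)
//   2: `380`               (digits, optional decimal part, between `$`s)
const CALC_THEN_RESULT = /\$([^$\n]*\([^$\n]*\)[^$\n]*=\s*)\$(\d+(?:\.\d+)?)\$/g;

function stripDollarBeforeResultSketch(text: string): string {
  // Rebuild one span: `$` + calculation + result + `$` (drops the inner `$`).
  return text.replace(CALC_THEN_RESULT, (_m: string, calc: string, result: string) => `$${calc}${result}$`);
}

console.log(stripDollarBeforeResultSketch('$15(18) + 5(22) = $380$')); // $15(18) + 5(22) = 380$
```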

The double-pass on escapeGarbledInlineMath (steps 2 and 7) is necessary because step 1 must protect spans that look like math (contain ^ or _) before currency escaping runs, but some of those spans turn out to be physics prose (e.g. $7.2 m at 33.7^\circ above the positive $). Step 7 catches those after restoration.
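Steps 1, 5 and 6 follow a protect, escape, restore pattern. The condensed sketch below rests on these assumptions:

- It handles inline spans only.
- It uses a hypothetical math-token set: `\`, `^`, `_` and `=`.
- A naive "`$` before a digit" rule stands in for `escapeCurrencyDollars`.

To approximate step 1b, it skips any candidate pair without a math token and retries from the next `$`. It does not model the second garbled-prose pass (step 7).

```typescript
// Hypothetical "math token" test used to decide which `$…$` pairs are real math.
const MATH_TOKEN = /[\\^_=]/;

function protectEscapeRestore(text: string): string {
  // Step 1 (simplified): collect `$` positions, then walk consecutive pairs.
  const dollars: number[] = [];
  for (let i = 0; i < text.length; i++) if (text[i] === '$') dollars.push(i);

  const spans: Array<[number, number]> = [];
  for (let k = 0; k + 1 < dollars.length; ) {
    const start = dollars[k]!;
    const end = dollars[k + 1]!;
    const inner = text.slice(start + 1, end);
    if (!inner.includes('\n') && MATH_TOKEN.test(inner)) {
      spans.push([start, end]); // real math: protect it and skip both dollars
      k += 2;
    } else {
      k += 1; // not math: treat this `$` as text and retry from the next one
    }
  }

  // Replace protected spans with NUL-delimited placeholders.
  const saved: string[] = [];
  let masked = '';
  let cursor = 0;
  for (const [start, end] of spans) {
    masked += text.slice(cursor, start) + `\u0000MATH${saved.length}\u0000`;
    saved.push(text.slice(start, end + 1));
    cursor = end + 1;
  }
  masked += text.slice(cursor);

  // Step 5 stand-in: escape `$` before a digit; shielded math is untouched.
  const escaped = masked.replace(/\$(?=\d)/g, () => '&#36;');

  // Step 6: restore the original spans.
  return escaped.replace(/\u0000MATH(\d+)\u0000/g, (m: string, n: string) => saved[Number(n)] ?? m);
}

console.log(protectEscapeRestore('Cost $50 then $E=mc^2$ done.')); // Cost &#36;50 then $E=mc^2$ done.
```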


Comparison with remark-math

| Capability | remark-math | remark-math-sanitizer |
|---|:---:|:---:|
| Parse `$…$` / `$$…$$` as math | ✅ | n/a (still uses remark-math for parsing) |
| Recognise `\(…\)` / `\[…\]` delimiters | ❌ | ✅ (normalises to `$…$` / `$$…$$` first) |
| Fix currency-before-math parity bug | ❌ | ✅ |
| Detect garbled prose in `$…$` | ❌ | ✅ |
| Wrap bare `\begin{equation}` | ❌ | ✅ |
| Escape `%` inside math | ❌ | ✅ |
| Sanitize Unicode in math spans | ❌ | ✅ |
| No runtime dependencies | ❌ | ✅ |

remark-math-sanitizer is a pre-processor, not a replacement for remark-math. You still need remark-math + rehype-katex in your render stack — this library just makes sure the text fed to them is correct.



Contributing

```bash
git clone https://github.com/arunrao/remark-math-sanitizer
cd remark-math-sanitizer
npm install
npm test          # run tests with vitest
npm run build     # compile to dist/
```

All new heuristics in escapeGarbledInlineMath must include a failing test case that demonstrates the real-world LLM output being fixed, and a passing test case confirming the nearest valid math expression is still preserved.
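Here is an illustration of that convention: a hypothetical prose-detection heuristic with one case of each kind. The heuristic flags a span with two or more plain words of three or more letters after dropping `\text{…}` groups. Both the heuristic and the function name are assumptions, not the package's actual rules. The repo's real tests use vitest, while this sketch uses plain checks so it runs on its own.

```typescript
// Hypothetical heuristic: a `$…$` body "looks like prose" if, after dropping
// \text{…} groups, it still contains two or more plain words of 3+ letters.
// The lookbehind skips LaTeX command names such as \circ or \alpha.
function looksLikeProse(body: string): boolean {
  const withoutText = body.replace(/\\text\{[^}]*\}/g, '');
  const words = withoutText.match(/(?<![\\a-zA-Z])[a-zA-Z]{3,}/g) ?? [];
  return words.length >= 2;
}

// One case of each kind, as the contributing rule asks.
const cases: Array<{ body: string; prose: boolean }> = [
  // Real-world LLM output the heuristic must catch.
  { body: '7.2 m at 33.7^\\circ above the positive ', prose: true },
  // Nearest valid math that must still be treated as math.
  { body: '7.2\\,\\text{m} \\text{ at } 33.7^\\circ', prose: false },
];

for (const { body, prose } of cases) {
  if (looksLikeProse(body) !== prose) throw new Error(`unexpected result for: ${body}`);
}
```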


License

MIT © Arun Rao