@tpmjs/tools-html-sanitize

v0.1.0 · Published 15 days ago

Sanitize HTML to prevent XSS attacks using isomorphic-dompurify

Vulnerabilities: 0 high, 0 medium, 0 low

Maintainer: thomasdavis

Keywords: tpmjs security html sanitize xss dompurify

Installation

npm install @tpmjs/tools-html-sanitize
# or
pnpm add @tpmjs/tools-html-sanitize
# or
yarn add @tpmjs/tools-html-sanitize

Usage

With Vercel AI SDK

import { htmlSanitizeTool } from '@tpmjs/tools-html-sanitize';
import { generateText } from 'ai';

const result = await generateText({
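  model: yourModel, // placeholder: any language model supported by the AI SDK
  tools: {
    htmlSanitize: htmlSanitizeTool,
  },
  prompt: 'Sanitize this HTML so it is safe to render: <p>Hello</p><script>alert("XSS")</script>',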
});

Direct Usage

import { htmlSanitizeTool } from '@tpmjs/tools-html-sanitize';

const result = await htmlSanitizeTool.execute({
  html: '<p>Safe content</p><script>alert("XSS")</script>',
});

console.log(result.sanitized);
// <p>Safe content</p>

console.log(result);
// {
//   sanitized: '<p>Safe content</p>',
//   removedCount: 1,
//   warnings: ['Removed script tags to prevent XSS']
// }

Features

XSS Prevention - Removes dangerous scripts and event handlers
Safe Defaults - Pre-configured with common safe HTML tags
Customizable - Configure allowed tags and attributes
Warnings - Reports what dangerous content was removed
Isomorphic - Works in Node.js and browser environments
Protocol Filtering - Removes javascript: and unsafe data: URLs

Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| html | string | Yes | The HTML string to sanitize |
| options | SanitizeOptions | No | Configuration for allowed tags and attributes |

SanitizeOptions

{
  allowedTags?: string[];           // Array of allowed HTML tag names
  allowedAttributes?: Record<string, string[]>;  // Tag -> allowed attribute names; '*' applies to all tags
}

Returns

{
  sanitized: string;      // The sanitized HTML
  removedCount: number;   // Number of elements removed
  warnings: string[];     // Descriptions of what was removed
}
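
When the tool runs inside an AI SDK agent, its output may reach your code as an untyped value. The sketch below is a minimal, hypothetical helper (the names `SanitizeResult`, `isSanitizeResult` and `describeResult` are illustrative, not exports of this package) that narrows such a value to the documented return shape and summarizes it for logging.

```typescript
// Hypothetical type mirroring the documented return value (not exported by the package).
interface SanitizeResult {
  sanitized: string;
  removedCount: number;
  warnings: string[];
}

// Narrow an untyped value (e.g. a tool result) to the documented shape.
function isSanitizeResult(value: unknown): value is SanitizeResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.sanitized === "string" &&
    typeof v.removedCount === "number" &&
    Array.isArray(v.warnings) &&
    v.warnings.every((w) => typeof w === "string")
  );
}

// One-line summary of what was stripped, suitable for logs.
function describeResult(result: SanitizeResult): string {
  if (result.removedCount === 0 && result.warnings.length === 0) {
    return "nothing removed";
  }
  return `${result.removedCount} element(s) removed: ${result.warnings.join("; ")}`;
}

const output: unknown = {
  sanitized: "<p>Safe content</p>",
  removedCount: 1,
  warnings: ["Removed script tags to prevent XSS"],
};
if (isSanitizeResult(output)) {
  console.log(describeResult(output));
  // 1 element(s) removed: Removed script tags to prevent XSS
}
```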

Examples

Basic XSS Prevention

const result = await htmlSanitizeTool.execute({
  html: '<p onclick="alert(1)">Click me</p><script>alert("XSS")</script>',
});

console.log(result.sanitized);
// <p>Click me</p>

console.log(result.warnings);
// ['Removed inline event handlers (onclick, onerror, etc.)', 'Removed script tags to prevent XSS']

Custom Allowed Tags

const result = await htmlSanitizeTool.execute({
  html: '<p>Paragraph</p><div>Div</div><script>alert(1)</script>',
  options: {
    allowedTags: ['p'],  // Only allow <p> tags
  },
});

console.log(result.sanitized);
// <p>Paragraph</p>Div
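
The output suggests that a disallowed tag such as `<div>` is unwrapped (its text survives) while `<script>` is dropped together with its contents. The sketch below reproduces that behaviour on a toy node tree rather than a real parsed DOM; the `HtmlNode` type and the set of tags dropped with their content are assumptions for illustration, not how this package or DOMPurify is implemented. Attributes are ignored to keep it short.

```typescript
// Toy node tree standing in for a parsed DOM (assumption for illustration).
type HtmlNode = string | { tag: string; children: HtmlNode[] };

// Tags removed together with everything inside them (assumed set).
const DROP_WITH_CONTENT = new Set(["script", "style", "iframe", "object", "embed"]);

// Escape text so it cannot be re-parsed as markup.
function escapeText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Keep allowed elements, unwrap disallowed ones (keeping their children),
// and drop dangerous ones along with their content.
function filterTags(nodes: HtmlNode[], allowed: ReadonlySet<string>): string {
  return nodes
    .map((node) => {
      if (typeof node === "string") return escapeText(node);
      const tag = node.tag.toLowerCase();
      if (DROP_WITH_CONTENT.has(tag)) return "";
      const inner = filterTags(node.children, allowed);
      return allowed.has(tag) ? `<${tag}>${inner}</${tag}>` : inner;
    })
    .join("");
}

const tree: HtmlNode[] = [
  { tag: "p", children: ["Paragraph"] },
  { tag: "div", children: ["Div"] },
  { tag: "script", children: ["alert(1)"] },
];
console.log(filterTags(tree, new Set(["p"])));
// <p>Paragraph</p>Div
```

Checking the drop list before the allowlist means a tag like `<style>` stays removed even if someone adds it to the allowed tags by mistake.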

Custom Allowed Attributes

const result = await htmlSanitizeTool.execute({
  html: '<a href="https://example.com" onclick="alert(1)" data-custom="value">Link</a>',
  options: {
    allowedTags: ['a'],
    allowedAttributes: {
      'a': ['href'],  // Only allow href attribute on <a> tags
    },
  },
});

console.log(result.sanitized);
// <a href="https://example.com">Link</a>
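
Attribute filtering is also an allowlist: anything not listed for the tag, including `onclick` and `data-custom`, is removed. A minimal sketch of that check, assuming the `'*'` wildcard key from the default configuration applies to every tag and that allowlisted names are lowercase (the function names are hypothetical):

```typescript
// Tag name -> allowed attribute names; the '*' key applies to every tag.
// Names in the allowlist are assumed to be lowercase.
type AllowedAttributes = Record<string, string[]>;

// Own-property lookup so keys like "constructor" don't hit Object.prototype.
function listFor(allowed: AllowedAttributes, key: string): string[] {
  return Object.prototype.hasOwnProperty.call(allowed, key) ? allowed[key] : [];
}

function isAttributeAllowed(tag: string, attr: string, allowed: AllowedAttributes): boolean {
  const name = attr.toLowerCase();
  return (
    listFor(allowed, tag.toLowerCase()).includes(name) ||
    listFor(allowed, "*").includes(name)
  );
}

// Keep only allowlisted attributes; event handlers such as onclick are dropped
// simply because they never appear on the allowlist.
function filterAttributes(
  tag: string,
  attrs: Record<string, string>,
  allowed: AllowedAttributes,
): Record<string, string> {
  const kept: Record<string, string> = {};
  for (const [name, value] of Object.entries(attrs)) {
    if (isAttributeAllowed(tag, name, allowed)) kept[name] = value;
  }
  return kept;
}

console.log(
  filterAttributes(
    "a",
    { href: "https://example.com", onclick: "alert(1)", "data-custom": "value" },
    { a: ["href"] },
  ),
);
// { href: 'https://example.com' }
```

Using `hasOwnProperty` instead of a plain index lookup keeps a tag named `constructor` from matching inherited object properties.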

Remove Dangerous Protocols

const result = await htmlSanitizeTool.execute({
  html: '<a href="javascript:alert(1)">Click</a>',
});

console.log(result.sanitized);
// <a>Click</a>

console.log(result.warnings);
// ['Removed javascript: protocol from links']
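
Protocol filtering is safest as an allowlist over the parsed URL rather than a string search for `javascript:`, because browsers ignore embedded tabs, newlines and letter case in the scheme. The sketch below uses the WHATWG `URL` parser built into Node.js and browsers; the set of safe protocols is an assumption, and unlike the package (which removes only unsafe `data:` URLs) it rejects every `data:` URL.

```typescript
// Protocols accepted in this sketch; an assumption, not the package's actual list.
const SAFE_PROTOCOLS = new Set(["http:", "https:", "mailto:"]);

// Parse the URL the way a browser would, then check its scheme against the allowlist.
function isSafeUrl(raw: string): boolean {
  try {
    // Relative URLs resolve against a dummy base (and so count as https:). The WHATWG
    // parser strips leading whitespace/control characters and embedded tabs/newlines
    // and lowercases the scheme, so "java\tscript:" and "JAVASCRIPT:" are caught too.
    const url = new URL(raw, "https://base.invalid/");
    return SAFE_PROTOCOLS.has(url.protocol);
  } catch {
    return false; // reject anything that does not parse
  }
}

console.log(isSafeUrl("javascript:alert(1)")); // false
console.log(isSafeUrl("https://safe.com")); // true
console.log(isSafeUrl("photo.jpg")); // true (relative)
```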

Remove iframes and Embeds

const result = await htmlSanitizeTool.execute({
  html: '<p>Safe</p><iframe src="evil.com"></iframe><embed src="malware.swf">',
});

console.log(result.sanitized);
// <p>Safe</p>

console.log(result.warnings);
// ['Removed iframe tags', 'Removed object or embed tags']

Preserve Safe Images

const result = await htmlSanitizeTool.execute({
  html: '<img src="photo.jpg" alt="Photo" onerror="alert(1)">',
});

console.log(result.sanitized);
// <img src="photo.jpg" alt="Photo">

console.log(result.warnings);
// ['Removed inline event handlers (onclick, onerror, etc.)']

Complex HTML Sanitization

const result = await htmlSanitizeTool.execute({
  html: `
    <div class="container">
      <h1>Title</h1>
      <p>Safe paragraph</p>
      <script>alert("XSS")</script>
      <style>body { display: none; }</style>
      <a href="javascript:void(0)">Bad link</a>
      <a href="https://safe.com">Good link</a>
    </div>
  `,
});

console.log(result.sanitized);
// <div class="container">
//   <h1>Title</h1>
//   <p>Safe paragraph</p>
//   <a>Bad link</a>
//   <a href="https://safe.com">Good link</a>
// </div>

console.log(result.removedCount);
// 2

console.log(result.warnings);
// ['Removed script tags to prevent XSS', 'Removed javascript: protocol from links', 'Removed style tags']
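
Three warnings but a `removedCount` of 2 is consistent with the Returns section: `removedCount` counts removed elements (the `<script>` and `<style>` tags), whereas stripping the `javascript:` URL removes only an attribute. A hypothetical tally that keeps those two figures separate might look like this (the `Removal` type and the warning wording are illustrative, not the package's internals):

```typescript
// Hypothetical record of what a sanitizer stripped.
type Removal =
  | { kind: "element"; tag: string }
  | { kind: "attribute"; tag: string; attribute: string };

// removedCount counts whole elements only; warnings are de-duplicated per kind.
function tally(removals: Removal[]): { removedCount: number; warnings: string[] } {
  const warnings = new Set<string>();
  let removedCount = 0;
  for (const r of removals) {
    if (r.kind === "element") {
      removedCount += 1;
      warnings.add(`Removed ${r.tag} tags`);
    } else {
      warnings.add(`Removed unsafe ${r.attribute} from <${r.tag}>`);
    }
  }
  return { removedCount, warnings: Array.from(warnings) };
}

const summary = tally([
  { kind: "element", tag: "script" },
  { kind: "attribute", tag: "a", attribute: "href" },
  { kind: "element", tag: "style" },
]);
console.log(summary.removedCount); // 2
console.log(summary.warnings.length); // 3
```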

Default Allowed Tags

['p', 'br', 'span', 'div', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
 'strong', 'em', 'b', 'i', 'u', 'ul', 'ol', 'li',
 'a', 'img', 'blockquote', 'code', 'pre']

Default Allowed Attributes

{
  'a': ['href', 'title', 'target'],
  'img': ['src', 'alt', 'title', 'width', 'height'],
  '*': ['class', 'id']  // Allowed on all tags
}

Security Features

| Feature | Description |
|---------|-------------|
| Script removal | Removes <script> tags |
| Event handler removal | Removes onclick, onerror, etc. |
| Protocol filtering | Blocks javascript: and unsafe data: URLs |
| iframe removal | Removes <iframe> by default |
| Object/embed removal | Removes <object> and <embed> |
| Style removal | Removes <style> tags by default |

Common Use Cases

Sanitize User-Generated Content

const userComment = '<p>Great post!</p><script>stealCookies()</script>';
const result = await htmlSanitizeTool.execute({ html: userComment });

console.log(result.sanitized);
// <p>Great post!</p>  (safe to display)

Allow Only Text Formatting

const result = await htmlSanitizeTool.execute({
  html: richTextEditorContent, // HTML produced by your rich-text editor
  options: {
    allowedTags: ['p', 'br', 'strong', 'em', 'u'],
    allowedAttributes: {},
  },
});

Preserve Links with Validation

const result = await htmlSanitizeTool.execute({
  html: markdownConverted, // HTML produced by converting Markdown
  options: {
    allowedTags: ['p', 'a', 'strong', 'em'],
    allowedAttributes: {
      'a': ['href', 'title'],
    },
  },
});

Error Handling

try {
  await htmlSanitizeTool.execute({
    html: null, // invalid input
  });
} catch (error) {
  console.error(error instanceof Error ? error.message : error);
  // "HTML input must be a string"
}
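
Input often arrives as `unknown` (form fields, parsed JSON), so it can help to validate it before calling the tool. A minimal sketch; `requireHtmlString` is a hypothetical helper that reuses the error message shown above:

```typescript
// Throw early if the value is not a string (message mirrors the one above).
function requireHtmlString(value: unknown): string {
  if (typeof value !== "string") {
    throw new TypeError("HTML input must be a string");
  }
  return value;
}

// e.g. a parsed JSON request body where the field may be missing or null
const body: Record<string, unknown> = JSON.parse('{"comment": null}');

try {
  const html = requireHtmlString(body.comment);
  console.log(html.length);
} catch (error) {
  console.error(error instanceof Error ? error.message : String(error));
  // HTML input must be a string
}
```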

Best Practices

Use Default Settings - The defaults are secure for most use cases
Whitelist, Don't Blacklist - Only allow known-safe tags and attributes
Check Warnings - Review warnings to understand what was removed
Validate Context - Different contexts may need different allowed tags
Defense in Depth - Combine with Content Security Policy (CSP)
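
One way to apply the context advice is to keep one allowlist per rendering context, reusing the option sets from the Common Use Cases section. In this sketch the context names and `buildArgs` are hypothetical; the `comment` context omits `options` so the tool uses its defaults, as in the user-generated content example.

```typescript
// Mirrors the documented SanitizeOptions shape.
interface SanitizeOptions {
  allowedTags?: string[];
  allowedAttributes?: Record<string, string[]>;
}

// Hypothetical rendering contexts; undefined means "use the tool's defaults".
type RenderContext = "comment" | "richText" | "markdownLinks";

const PRESETS: Record<RenderContext, SanitizeOptions | undefined> = {
  comment: undefined,
  richText: {
    allowedTags: ["p", "br", "strong", "em", "u"],
    allowedAttributes: {},
  },
  markdownLinks: {
    allowedTags: ["p", "a", "strong", "em"],
    allowedAttributes: { a: ["href", "title"] },
  },
};

// Build the argument object for htmlSanitizeTool.execute for a given context.
function buildArgs(html: string, context: RenderContext): { html: string; options?: SanitizeOptions } {
  const options = PRESETS[context];
  return options ? { html, options } : { html };
}

console.log(buildArgs("<p>Hi</p>", "richText").options?.allowedTags);
// [ 'p', 'br', 'strong', 'em', 'u' ]
```

Keeping the presets in one place makes it easy to review every allowlist the application uses.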

Limitations

Does not validate HTML syntax or report syntax errors
Does not check link destinations (only protocols)
Does not sanitize CSS within style attributes
May remove legitimate content if the allowed tags and attributes are too restrictive
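
Because link destinations are not checked, an application that should only link to known sites can add its own check on the sanitized `href` values. A minimal sketch, assuming absolute HTTPS links and a caller-supplied set of trusted hosts written in lowercase (subdomains of a trusted host are accepted; `isAllowedDestination` is a hypothetical helper):

```typescript
// Accept only absolute https: links whose host is trusted or a subdomain of a trusted host.
function isAllowedDestination(href: string, trustedHosts: ReadonlySet<string>): boolean {
  let url: URL;
  try {
    url = new URL(href); // no base: relative links throw and are rejected
  } catch {
    return false;
  }
  if (url.protocol !== "https:") return false;
  const host = url.hostname; // already lowercased by the URL parser
  for (const trustedHost of trustedHosts) {
    if (host === trustedHost || host.endsWith(`.${trustedHost}`)) return true;
  }
  return false;
}

const hosts = new Set(["example.com"]);
console.log(isAllowedDestination("https://docs.example.com/guide", hosts)); // true
console.log(isAllowedDestination("https://example.com.evil.net/", hosts)); // false
```

Matching on `.example.com` as a suffix, rather than `example.com`, prevents look-alike hosts such as `notexample.com` from passing.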

License

MIT
