harden-urls

v1.0.0

Published

5 months ago

🛡️ URL hardening and sanitization utilities with safe defaults, pattern-based allow/block lists, and cleanup for links and mailto URIs.


harden-urls 🛡️ Core URL Sanitizer Utilities


The robust, protocol-aware, and dependency-free URL sanitizer for secure markdown and user-generated web content.

🎯 Why `harden-urls`? The Security Gap

Most URL sanitization techniques (such as basic URL prefix checks) are insufficient against sophisticated attacks in modern web environments. This is especially true when rendering markdown or other content that is user-submitted or AI-generated (for example, output compromised through prompt poisoning).

Libraries like Vercel Labs' markdown-sanitizers are a great start, but they often miss deeper vulnerabilities:

❌ Protocol Evasion: Using encoded control characters (%0A) or invisible Unicode characters (\u200B) to hide malicious protocols like javascript: or data:.
❌ Homoglyph & Unicode Spoofing: Using visually similar Unicode characters to bypass domain allow-lists.
❌ Tracking & Phishing: Failure to clean up unsafe or excessive tracking parameters (e.g., in mailto: links).
❌ Prefix Boundary Confusion: Simple prefix-matching libraries may trust a URL because it starts with an allowed domain, yet fail to detect malicious subdomains or path components. For example, if the allowed prefix is `https://allowed-prefix.com`, prefix-based libraries often accept `https://allowed-prefix.com.evil.com/bypass.js`. The reason: they only check the starting string and skip the critical step of parsing the hostname to confirm the actual origin.
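
To make the prefix-confusion case concrete, here is a small TypeScript sketch built on the standard WHATWG `URL` class. The helpers `naivePrefixCheck`, `originCheck`, and `tryParse` are hypothetical and not part of the harden-urls API; they only contrast raw string matching with hostname parsing.

```typescript
// Hypothetical helpers; not part of the harden-urls API.

// Parse an absolute URL, returning null instead of throwing.
function tryParse(input: string): URL | null {
  try {
    return new URL(input);
  } catch {
    return null;
  }
}

// Naive: trusts anything that merely starts with the allowed prefix.
function naivePrefixCheck(url: string, allowedPrefix: string): boolean {
  return url.startsWith(allowedPrefix);
}

// Safer: parse the URL and compare the actual protocol and hostname.
function originCheck(url: string, allowedHost: string): boolean {
  const parsed = tryParse(url);
  return parsed !== null && parsed.protocol === "https:" && parsed.hostname === allowedHost;
}

const subdomainAttack = "https://allowed-prefix.com.evil.com/bypass.js";
// The allowed name sits in the userinfo part; the real host is evil.com.
const userinfoAttack = "https://allowed-prefix.com@evil.com/";

console.log(naivePrefixCheck(subdomainAttack, "https://allowed-prefix.com")); // true
console.log(originCheck(subdomainAttack, "allowed-prefix.com")); // false
console.log(naivePrefixCheck(userinfoAttack, "https://allowed-prefix.com")); // true
console.log(originCheck(userinfoAttack, "allowed-prefix.com")); // false
console.log(originCheck("https://allowed-prefix.com/docs", "allowed-prefix.com")); // true
```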

👉 harden-urls bridges this gap by offering a multi-layered defense as the foundation for secure URL processing.
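
As a rough illustration of what "multi-layered" can mean, the sketch below chains four checks: strip invisible characters, apply NFKC normalization, parse with the WHATWG `URL` parser, then enforce a protocol allow-list. It is an assumption-based sketch, not the harden-urls source; `normalizeAndCheck`, `ALLOWED_PROTOCOLS`, and `INVISIBLE_CHARS` are hypothetical names. Note that NFKC folds compatibility forms such as fullwidth letters but does not map cross-script lookalikes (for example Cyrillic "а" to Latin "a"), so domain allow-lists still need their own handling.

```typescript
// Hypothetical sketch of layered URL normalization; not the harden-urls source.

const ALLOWED_PROTOCOLS = new Set(["https:", "mailto:"]);

// C0 controls, DEL, and common zero-width characters (ZWSP, ZWNJ, ZWJ,
// word joiner, BOM) that can hide or split a scheme name.
// The g flag is fine here because replace() resets lastIndex itself.
const INVISIBLE_CHARS = /[\u0000-\u001F\u007F\u200B-\u200D\u2060\uFEFF]/g;

function tryParse(input: string): URL | null {
  try {
    return new URL(input);
  } catch {
    return null;
  }
}

function normalizeAndCheck(input: string): string | null {
  // Layer 1: remove invisible characters so the check sees the "real" scheme.
  const stripped = input.replace(INVISIBLE_CHARS, "");
  // Layer 2: fold compatibility forms (e.g. fullwidth letters) to canonical ones.
  const folded = stripped.normalize("NFKC");
  // Layer 3: parse instead of string matching; relative or malformed input is rejected.
  const parsed = tryParse(folded);
  if (parsed === null) return null;
  // Layer 4: protocol allow-list.
  return ALLOWED_PROTOCOLS.has(parsed.protocol) ? parsed.href : null;
}

console.log(normalizeAndCheck("java\u200Bscript:alert(1)")); // null
console.log(normalizeAndCheck("ｊａｖａｓｃｒｉｐｔ:alert(1)")); // null
console.log(normalizeAndCheck("https://example.com/docs")); // https://example.com/docs
```

Rejecting everything the parser cannot treat as an absolute URL is a deliberately strict choice in this sketch; a real sanitizer would need an explicit policy for relative links.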

✨ Features at a Glance

🚀 Installation & Setup

```sh
pnpm add harden-urls   # Recommended
# or
npm install harden-urls
# or
yarn add harden-urls
```

Basic Usage

Use `createUrlSanitizer` to configure your security rules once and reuse the returned function for every URL.

```ts
import { createUrlSanitizer, toRegexps } from "harden-urls";

// Helper to convert simple strings/patterns into robust regexes
const trustedDomains = toRegexps([
  "*.mycorp.com", // Allow subdomains
  "partner-api.io", // Exact domain match
]);

const sanitizer = createUrlSanitizer({
  // Only allow the https: and mailto: protocols
  allowedProtocols: ["https:", "mailto:"],

  // Allow URLs matching our specific domain patterns
  allowedPatterns: trustedDomains,

  // Automatically remove common tracking params from all URLs
  stripParams: ["utm_", "fbclid", "gclid"],

  // OPTIONAL: Configure for specific edge cases
  allowSchemaRelative: true, // Allow //example.com (defaults to false)
  // blockPathTraversal defaults to true, blocking /../
});

// Example 1: Clean and safe
sanitizer("https://sub.mycorp.com/docs?utm_source=email");
// → "https://sub.mycorp.com/docs" (tracking param stripped)

// Example 2: Blocked by protocol
sanitizer("ftp://insecure.net/file");
// → null (ftp: is not in allowedProtocols)

// Example 3: Blocked by pattern
sanitizer("https://evil-tracking.com/path");
// → null (does not match allowedPatterns)
```
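
The `stripParams` option is described above only by its effect. One plausible way to produce that effect, shown here as a hypothetical sketch, is to treat each entry as a key prefix and delete matching keys through `URLSearchParams`. Whether harden-urls matches entries by prefix, by exact name, or both is an assumption, and `stripTrackingParams` is not a library export.

```typescript
// Hypothetical sketch: delete query keys that start with any rule string.
// Prefix matching is an assumption about how stripParams entries behave.
function stripTrackingParams(input: string, rules: string[]): string {
  const url = new URL(input); // throws on relative or malformed input

  // Collect matching keys first, then delete, so the params are never
  // mutated while being iterated.
  const toDelete: string[] = [];
  url.searchParams.forEach((_value, key) => {
    if (rules.some((rule) => key.startsWith(rule))) toDelete.push(key);
  });
  for (const key of toDelete) url.searchParams.delete(key);

  return url.href;
}

const trackingRules = ["utm_", "fbclid", "gclid"];

console.log(stripTrackingParams("https://sub.mycorp.com/docs?page=2&utm_source=email&fbclid=abc", trackingRules));
// https://sub.mycorp.com/docs?page=2

console.log(stripTrackingParams("mailto:team@example.com?subject=Hello&utm_campaign=launch", trackingRules));
// mailto:team@example.com?subject=Hello
```

Because the WHATWG parser also exposes the query of non-special URLs, the same function cleans `mailto:` links, the case raised under Tracking & Phishing.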

⚙️ API Reference

`sanitizeUrl(url: string, options?: SanitizeOptions): string | null`

The core function. Cleans the input URL and returns the sanitized string, or null if the URL is blocked by any rule.

`isSafeUrl(url: string, options?: SanitizeOptions): boolean`

A boolean check. Returns true if sanitizeUrl would return a non-null string.

`createUrlSanitizer(options?: SanitizeOptions): (url: string) => string | null`

The preferred method. Returns a pre-configured sanitizer function for performance and cleaner code.
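
The relationship between the three entry points can be sketched as follows. The internals are assumptions for illustration only: this sketch matches `allowedPatterns` against the parsed hostname and defaults to `https:` alone, while the real `SanitizeOptions` has more flags. `sanitizeUrlSketch`, `isSafeUrlSketch`, `createUrlSanitizerSketch`, `SketchOptions`, and `tryParse` are hypothetical stand-ins, not the library's code.

```typescript
// Hypothetical stand-ins for the documented API; not the harden-urls source.
interface SketchOptions {
  allowedProtocols?: string[];
  allowedPatterns?: RegExp[]; // assumed here to be matched against the hostname
}

function tryParse(input: string): URL | null {
  try {
    return new URL(input);
  } catch {
    return null;
  }
}

// Core check: returns the normalized URL string, or null if any rule blocks it.
function sanitizeUrlSketch(url: string, options: SketchOptions = {}): string | null {
  const parsed = tryParse(url);
  if (parsed === null) return null;

  const protocols = options.allowedProtocols ?? ["https:"]; // assumed default
  if (!protocols.includes(parsed.protocol)) return null;

  const host = parsed.hostname;
  const patterns = options.allowedPatterns;
  if (patterns !== undefined && !patterns.some((re) => re.test(host))) return null;

  return parsed.href;
}

// Boolean view: true exactly when the sanitizer would return a string.
function isSafeUrlSketch(url: string, options?: SketchOptions): boolean {
  return sanitizeUrlSketch(url, options) !== null;
}

// Factory: snapshot the options once and return a reusable closure.
function createUrlSanitizerSketch(options: SketchOptions = {}): (url: string) => string | null {
  const snapshot: SketchOptions = {
    allowedProtocols: options.allowedProtocols?.slice(),
    allowedPatterns: options.allowedPatterns?.slice(),
  };
  return (url: string) => sanitizeUrlSketch(url, snapshot);
}

const corpOnly = createUrlSanitizerSketch({
  allowedProtocols: ["https:"],
  allowedPatterns: [/^([a-z0-9-]+\.)*mycorp\.com$/],
});

console.log(corpOnly("https://docs.mycorp.com/guide")); // https://docs.mycorp.com/guide
console.log(corpOnly("https://mycorp.com.evil.com/")); // null
```

In this sketch the factory copies the options once, so later changes to the caller's arrays cannot alter an already-built sanitizer. A real implementation could also do expensive setup (such as compiling patterns) at that point, which is where a performance benefit would come from.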

`SanitizeOptions` (Key Security Flags)

🔒 Best Practices and Gotchas

1. Always Use `allowedProtocols`

The protocol allow-list is your most important defense. If you need to allow data: URLs, make sure your pattern list tightly restricts the media type (e.g., data:image/png).
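
If you do need `data:` URLs, a tight pattern might look like the hypothetical one below. It accepts only base64-encoded PNG, JPEG, GIF, and WebP images; that media-type list is an assumption chosen for illustration. `image/svg+xml` is deliberately excluded because SVG documents can embed script. `SAFE_DATA_URL` and `isAllowedDataUrl` are not harden-urls exports.

```typescript
// Hypothetical data: URL allow-list pattern (no /g flag, so test() is stateless).
// Anchored at both ends; only base64 raster images are accepted.
const SAFE_DATA_URL = /^data:image\/(?:png|jpeg|gif|webp);base64,[A-Za-z0-9+/]+={0,2}$/;

function isAllowedDataUrl(url: string): boolean {
  return SAFE_DATA_URL.test(url);
}

console.log(isAllowedDataUrl("data:image/png;base64,iVBORw0KGgo=")); // true
console.log(isAllowedDataUrl("data:text/html;base64,PHNjcmlwdD4=")); // false
console.log(isAllowedDataUrl("data:image/svg+xml;base64,PHN2Zz4=")); // false
```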

2. Check Global Flags (`/g`)

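Gotcha: If you construct a RegExp for patterns by hand, do not use the global (/g) flag. On a /g regex, test() records where the last match ended in lastIndex and resumes from there on the next call, so the same pattern can match one URL and then fail to match the next. Security checks can be silently skipped as a result. Use helpers like toRegexps to avoid this.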
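
The statefulness is easy to reproduce with the standard `RegExp` API. The snippet below calls `test()` three times on the same input; the global version alternates because `lastIndex` carries over between calls. `withoutStatefulFlags` is a hypothetical helper, not a harden-urls export, showing one way to neutralize `/g` (and the sticky `/y` flag, which also relies on `lastIndex`) on regexes you did not build yourself.

```typescript
// Same anchored pattern, with and without the global flag.
const globalPattern = /^https:\/\/docs\.example\.com\//g;
const plainPattern = /^https:\/\/docs\.example\.com\//;

const sampleUrl = "https://docs.example.com/guide";

// test() on a /g regex starts searching at lastIndex, then updates it.
const globalResults = [globalPattern.test(sampleUrl), globalPattern.test(sampleUrl), globalPattern.test(sampleUrl)];
const plainResults = [plainPattern.test(sampleUrl), plainPattern.test(sampleUrl), plainPattern.test(sampleUrl)];

console.log(globalResults); // [ true, false, true ]
console.log(plainResults); // [ true, true, true ]

// Hypothetical helper: rebuild a regex without the stateful g and y flags.
function withoutStatefulFlags(re: RegExp): RegExp {
  return new RegExp(re.source, re.flags.replace(/[gy]/g, ""));
}

const repaired = withoutStatefulFlags(globalPattern);
console.log(repaired.test(sampleUrl), repaired.test(sampleUrl)); // true true
```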

3. `harden-urls` is Not an HTML Sanitizer

Crucial: This library only sanitizes the URL attribute value. It does not protect against arbitrary HTML or script tags within the content itself.

If rendering markdown, you must use a comprehensive HTML sanitizer like rehype-sanitize alongside this library if you allow any kind of raw HTML or components.
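
If you build HTML strings by hand instead of through a markdown pipeline, the same separation applies: a sanitized URL is still just a string and must be escaped for the context it lands in. The hypothetical `escapeHtml` and `renderLink` helpers below sketch that second layer; they are not part of harden-urls and are no substitute for a full HTML sanitizer such as rehype-sanitize when raw HTML is allowed.

```typescript
// Hypothetical helper: escape text for a quoted HTML attribute or element content.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical: safeHref is assumed to have already passed a URL sanitizer.
// Both the attribute value and the link label still need escaping.
function renderLink(safeHref: string, label: string): string {
  return `<a href="${escapeHtml(safeHref)}">${escapeHtml(label)}</a>`;
}

console.log(renderLink("https://example.com/it's", "<b>docs</b>"));
// <a href="https://example.com/it&#39;s">&lt;b&gt;docs&lt;/b&gt;</a>
```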

🤝 Contribution

We welcome contributions! If you find a security vulnerability, have a feature request, or want to fix a bug, please:

Open an Issue: Discuss the change you wish to make. Security bugs should be reported privately first, if possible.
Submit a Pull Request: Ensure your code passes all tests and is formatted correctly. New features require tests!

This project is maintained by a small team and relies on community feedback and contributions. Thank you!

Comparison Table

| Library | Safe Defaults | Pattern-Based | Param Cleanup | Protocol Config | Unicode Hardening |
| :---------------------------------- | :--------------------------- | :------------ | :------------ | :-------------- | :---------------- |
| harden-urls | ✅ Comprehensive | ✅ Yes | ✅ Yes | ✅ Granular | ✅ NFKC |
| vercel-labs/markdown-sanitizers | ⚠️ Partial (URL prefix only) | ❌ | ❌ | ⚠️ Limited | ❌ |

⚠️ Important: Neither harden-urls nor Vercel’s harden-react-markdown protects against arbitrary HTML — you must use rehype-sanitize if using rehype-raw.

Related Packages

rehype-harden-urls — plug into markdown pipelines
harden-react-markdown-urls — secure React wrapper

Acknowledgments

Grateful to:

They laid the groundwork — this package aims to bridge the gap between safety and flexibility.

License

This library is licensed under the MIT open-source license.
