harden-urls
v1.0.0
🛡️ URL hardening and sanitization utilities with safe defaults, pattern-based allow/block lists, and cleanup for links and mailto URIs.
harden-urls 🛡️ Core URL Sanitizer Utilities

The robust, protocol-aware, and dependency-free URL sanitizer for secure markdown and user-generated web content.
🎯 Why harden-urls? The Security Gap
Most URL sanitization techniques (such as basic URL prefix checks) are insufficient against sophisticated attacks in modern web environments, especially when rendering user-submitted or AI-generated content or markdown (for example, model output compromised through prompt poisoning).
Libraries like Vercel Labs' markdown-sanitizers are a great start, but they often miss deeper vulnerabilities:
- ❌ Protocol Evasion: Using encoded control or zero-width characters (%0A or \u200B) to hide malicious protocols like javascript: or data:.
- ❌ Homoglyph & Unicode Spoofing: Using visually similar Unicode characters to bypass domain allow-lists.
- ❌ Tracking & Phishing: Failure to clean up unsafe or excessive tracking parameters (e.g., in mailto: links).
- ❌ Prefix Boundary Confusion: Simple prefix-matching libraries may trust a URL if it starts with an allowed domain, but fail to detect malicious subdomains or path components:
  // If allowed prefix is 'https://allowed-prefix.com'
  // ⚠️ Prefix-based libraries often allow this malicious URL:
  https://allowed-prefix.com.evil.com/bypass.js
  Reason: These libraries only check the starting string, missing the critical step of parsing the hostname to confirm the actual origin.
👉 harden-urls bridges this gap by offering a multi-layered defense as the foundation for secure URL processing.
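The prefix boundary confusion described above can be reproduced with the standard WHATWG URL parser alone. The sketch below is illustrative only: naivePrefixCheck and originCheck are hypothetical helpers, not part of the harden-urls API, and the library's own matching logic may differ.

```typescript
// Hypothetical helpers for illustration; not part of the harden-urls API.
const ALLOWED_ORIGIN = "https://allowed-prefix.com";

// Naive check: trusts any string that begins with the allowed prefix.
function naivePrefixCheck(url: string): boolean {
  return url.startsWith(ALLOWED_ORIGIN);
}

// Safer check: parse first, then compare the parsed origin exactly.
function originCheck(url: string): boolean {
  try {
    return new URL(url).origin === ALLOWED_ORIGIN;
  } catch {
    return false; // unparseable input is rejected
  }
}

const malicious = "https://allowed-prefix.com.evil.com/bypass.js";
console.log(naivePrefixCheck(malicious)); // true: the prefix matches
console.log(originCheck(malicious)); // false: the real host is allowed-prefix.com.evil.com
console.log(originCheck("https://allowed-prefix.com/docs")); // true
```

The same parsed-origin comparison also rejects userinfo tricks such as https://allowed-prefix.com@evil.com/, where the text before the @ is a username rather than the host.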
✨ Features at a Glance
| Feature | Description | Security Posture |
| :--------------------------- | :--------------------------------------------------------------------------------------------- | :---------------------- |
| Protocol-First | Only protocols in the safeProtocols list (e.g., https:, mailto:) are allowed by default. | Sane Defaults |
| Pattern-Based Filtering | Allows granular control via allowedPatterns (domains/paths) and blockedPatterns. | Explicit Opt-In |
| Query Param Cleaning | Strips common tracking/malicious parameters (utm_, body, subject, etc.) automatically. | Defense-in-Depth |
| Unicode Hardening (NFKC) | Normalizes input and strips control/zero-width characters to mitigate obfuscation attacks. | Obfuscation Defense |
| Minimal & Typed | Zero dependencies, highly performant, and built 100% in TypeScript. | Reliability |
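To make the Unicode Hardening row concrete, here is a minimal sketch of NFKC normalization followed by removal of control and zero-width characters. The helper name and the exact character set are assumptions made for illustration; harden-urls may use a different set or order of steps.

```typescript
// Hypothetical helper for illustration; harden-urls may normalize differently.
// Matches C0/C1 control characters plus common zero-width and BOM code points.
const INVISIBLE_CHARS = /[\u0000-\u001F\u007F-\u009F\u200B-\u200D\u2060\uFEFF]/g;

function normalizeForInspection(input: string): string {
  // NFKC folds compatibility forms (e.g. fullwidth letters) into their plain equivalents.
  return input.normalize("NFKC").replace(INVISIBLE_CHARS, "");
}

// Fullwidth "java" followed by "script" with a zero-width space hidden inside it.
const disguised = "\uFF4A\uFF41\uFF56\uFF41scr\u200Bipt:alert(1)";
console.log(normalizeForInspection(disguised)); // "javascript:alert(1)"
```

Note that NFKC only folds compatibility characters. It does not map cross-script lookalikes (for example, Cyrillic "а" to Latin "a"), so a domain allow-list check on the parsed hostname is still needed.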
🚀 Installation & Setup
pnpm add harden-urls # Recommended
or
npm install harden-urls
or
yarn add harden-urls
Basic Usage
Use createUrlSanitizer to pre-configure your security rules for optimal performance.
import { createUrlSanitizer, toRegexps } from "harden-urls";

// Helper to convert simple strings/patterns into robust regexes
const trustedDomains = toRegexps([
  "*.mycorp.com", // Allow subdomains
  "partner-api.io", // Exact domain match
]);

const sanitizer = createUrlSanitizer({
  // Only allow HTTPS and Mailto protocols
  allowedProtocols: ["https:", "mailto:"],
  // Allow URLs matching our specific domain patterns
  allowedPatterns: trustedDomains,
  // Automatically remove common tracking params from all URLs
  stripParams: ["utm_", "fbclid", "gclid"],
  // OPTIONAL: Configure for specific edge cases
  allowSchemaRelative: true, // Allow //example.com (defaults to false)
  // blockPathTraversal defaults to true, blocking /../
});

// Example 1: Clean and safe
sanitizer("https://sub.mycorp.com/docs?utm_source=email");
// → "https://sub.mycorp.com/docs" (tracking param stripped)

// Example 2: Blocked by protocol
sanitizer("ftp://insecure.net/file");
// → null (ftp: is not in allowedProtocols)

// Example 3: Blocked by pattern
sanitizer("https://evil-tracking.com/path");
// → null (does not match allowedPatterns)
⚙️ API Reference
sanitizeUrl(url: string, options?: SanitizeOptions): string | null
The core function. Cleans the input URL and returns the sanitized string, or null if the URL is blocked by any rule.
isSafeUrl(url: string, options?: SanitizeOptions): boolean
A boolean check. Returns true if sanitizeUrl would return a non-null string.
createUrlSanitizer(options?: SanitizeOptions): (url: string) => string | null
The preferred method. Returns a pre-configured sanitizer function for performance and cleaner code.
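The reason a pre-configured sanitizer helps can be shown with a small factory sketch: the options are processed once, and the returned closure reuses them on every call. SketchOptions and createSketchSanitizer are hypothetical stand-ins that check the protocol only. They are not the library's implementation, and the real SanitizeOptions has more fields.

```typescript
// Hypothetical, simplified options; the real SanitizeOptions has more fields.
interface SketchOptions {
  allowedProtocols: string[];
}

function createSketchSanitizer(options: SketchOptions): (url: string) => string | null {
  // Work done once, at configuration time.
  const protocols = new Set(options.allowedProtocols.map((p) => p.toLowerCase()));
  return (url) => {
    try {
      const parsed = new URL(url);
      // The parser lowercases the scheme, so "HTTPS:" and "https:" compare equal.
      return protocols.has(parsed.protocol) ? parsed.toString() : null;
    } catch {
      return null; // unparseable input is treated as blocked
    }
  };
}

const sketch = createSketchSanitizer({ allowedProtocols: ["https:", "mailto:"] });

// A boolean check in the spirit of isSafeUrl: safe means "not blocked".
const isSafe = (url: string): boolean => sketch(url) !== null;

console.log(sketch("https://sub.mycorp.com/docs")); // "https://sub.mycorp.com/docs"
console.log(sketch("ftp://insecure.net/file")); // null
console.log(isSafe("javascript:alert(1)")); // false
```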
SanitizeOptions (Key Security Flags)
| Option | Type | Default | Description |
| :-------------------- | :--------- | :------------------------- | :----------------------------------------------------------- |
| allowedProtocols | string[] | ['https:', 'http:', ...] | Protocols permitted. Primary security control. |
| allowedPatterns | RegExp[] | [] | Only URLs matching these patterns are allowed (if provided). |
| blockedPatterns | RegExp[] | [] | URLs matching these are blocked immediately. |
| stripParams | string[] | ['utm_', 'fbclid', ...] | Query parameter names/prefixes to strip. |
| allowSchemaRelative | boolean | false | Allows //example.com. Requires explicit opt-in. |
| blockPathTraversal | boolean | true | Prevents relative paths (/path) containing .. segments. |
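As an illustration of the stripParams row, the sketch below removes query parameters with the standard URLSearchParams API. How harden-urls tells a name from a prefix is not documented here, so this sketch assumes that entries ending in "_" are prefixes and everything else is an exact name. stripQueryParams is a hypothetical helper.

```typescript
// Hypothetical helper. Assumption: entries ending in "_" are prefixes, others are exact names.
function stripQueryParams(rawUrl: string, stripParams: string[]): string {
  const url = new URL(rawUrl);

  // Collect keys first so deletion does not disturb iteration.
  const keys: string[] = [];
  url.searchParams.forEach((_value, key) => keys.push(key));

  for (const key of keys) {
    const shouldStrip = stripParams.some((rule) =>
      rule.endsWith("_") ? key.startsWith(rule) : key === rule
    );
    if (shouldStrip) url.searchParams.delete(key);
  }
  return url.toString();
}

console.log(
  stripQueryParams("https://sub.mycorp.com/docs?utm_source=email&page=2", ["utm_", "fbclid"])
);
// "https://sub.mycorp.com/docs?page=2"
```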
🔒 Best Practices and Gotchas
1. Always Use allowedProtocols
The protocol whitelist is your most important defense. If you need to allow data: URLs, ensure your pattern list tightly restricts the media type (e.g., data:image/png).
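One way to restrict the media type tightly is a pattern anchored at the start and closed at the data separator, as sketched below. The pattern is a hypothetical example, not one shipped with harden-urls. A looser prefix such as data:image/ would also admit data:image/svg+xml, which can carry script.

```typescript
// Hypothetical pattern: anchors the full media type up to the data separator.
const pngDataUrl = /^data:image\/png(?:;base64)?,/i;

console.log(pngDataUrl.test("data:image/png;base64,iVBORw0KGgo=")); // true
console.log(pngDataUrl.test("data:image/svg+xml,<svg onload=alert(1)>")); // false
console.log(pngDataUrl.test("data:text/html,<script>alert(1)</script>")); // false
```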
2. Check Global Flags (/g)
Gotcha: If you manually construct a RegExp for patterns, do not use the global (/g) flag. The test() method with /g maintains state, which can lead to security checks being accidentally skipped. Use helpers like toRegexps to avoid this.
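The statefulness is easy to reproduce with plain JavaScript regular expressions, independent of this library. In the sketch below, the same URL passes the first check and fails the second, because a global RegExp resumes searching from its lastIndex.

```typescript
const candidate = "https://partner-api.io/v1";

// With /g, test() records where the last match ended in lastIndex.
const globalPattern = /^https:\/\/partner-api\.io\//g;
console.log(globalPattern.test(candidate)); // true
console.log(globalPattern.test(candidate)); // false: search resumed mid-string, then lastIndex reset

// Without /g, test() always starts from the beginning of the string.
const statelessPattern = /^https:\/\/partner-api\.io\//;
console.log(statelessPattern.test(candidate)); // true
console.log(statelessPattern.test(candidate)); // true
```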
3. harden-urls is Not an HTML Sanitizer
Crucial: This library only sanitizes the URL attribute value. It does not protect against arbitrary HTML or script tags within the content itself.
- If rendering markdown, you must use a comprehensive HTML sanitizer like rehype-sanitize alongside this library if you allow any kind of raw HTML or components.
🤝 Contribution
We welcome contributions! If you find a security vulnerability, have a feature request, or want to fix a bug, please:
- Open an Issue: Discuss the change you wish to make. Security bugs should be reported privately first, if possible.
- Submit a Pull Request: Ensure your code passes all tests and is formatted correctly. New features require tests!
This project is maintained by a small team and relies on community feedback and contributions. Thank you!
Comparison Table
| Library | Safe Defaults | Pattern-Based | Param Cleanup | Protocol Config | Unicode Hardening |
| :---------------------------------- | :--------------------------- | :------------ | :------------ | :-------------- | :---------------- |
| harden-urls | ✅ Comprehensive | ✅ Yes | ✅ Yes | ✅ Granular | ✅ NFKC |
| vercel-labs/markdown-sanitizers | ⚠️ Partial (URL prefix only) | ❌ | ❌ | ⚠️ Limited | ❌ |
⚠️ Important: Neither harden-urls nor Vercel’s harden-react-markdown protects against arbitrary HTML; you must use rehype-sanitize if using rehype-raw.
Related Packages
- rehype-harden-urls: plug into markdown pipelines
- harden-react-markdown-urls: secure React wrapper
Acknowledgments
Grateful to:
They laid the groundwork; this package aims to bridge the gap between safety and flexibility.
License
This library is licensed under the MIT open-source license.
