sanitize-contents
v1.0.3
A robust, lightweight, and customizable library to sanitize sensitive information (passwords, tokens, API keys, connection strings, PCI/PII data) from strings and HTML content.
Perfect for logging, error reporting, and audit trails where you need to keep data clean and safe.
✨ Features
- 🚀 Smart Detection: Automatically identifies connection strings, raw SQL inserts, query parameters, auth headers, and more.
- 🛡️ Connection Strings: Redacts credentials from `mongodb://`, `postgres://`, `redis://`, `mysql://`, and generic URIs.
- 🔑 Keys & Tokens: Redacts common keys like `api_key`, `access_token`, `secret`, `jwt`, and `bearer` tokens.
- 📝 Credential Labels: Identifies patterns like `Password: ...` and `API Key: ...` in unstructured text.
- 🌐 Network Safe: Sanitizes `Authorization` headers and `curl` command flags (`-u`).
- 🎨 Customizable: Choose your replacement string (e.g., `[HIDDEN]`) or remove content entirely.
- 📄 HTML Support: Safely sanitizes text content within HTML while preserving valid structure.
- ⚡ Lightweight: Minimal overhead, high performance.
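To make the connection-string feature concrete, here is a minimal sketch of how credentials in a URI could be masked with a single regular expression. It is an illustration only, not the library's actual implementation; the function name `redactUriCredentials` and the pattern are assumptions made for this example.

```javascript
// Hypothetical sketch, NOT the code used inside sanitize-contents.
// Replaces the "user:password" part of any scheme://user:password@host URI.
function redactUriCredentials(text, replacement = '[REDACTED]') {
  // Group 1 captures the scheme ("mongodb://"); the rest matches userinfo up to "@".
  const pattern = /\b([a-z][a-z0-9+.-]*:\/\/)[^\s\/@]+@/gi;
  return text.replace(pattern, (_match, scheme) => scheme + replacement + '@');
}

console.log(redactUriCredentials('mongodb://admin:superSecretPassword@localhost:27017/my_app'));
// mongodb://[REDACTED]@localhost:27017/my_app

console.log(redactUriCredentials('https://api.example.com?lang=en'));
// https://api.example.com?lang=en (no credentials, so nothing changes)
```

A function replacer is used instead of a `$1` replacement string so that a custom replacement containing `$` is inserted literally.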
📦 Installation
npm install sanitize-contents
🚀 Usage
Basic Sanitization
import { sanitizeContent } from 'sanitize-contents';
// Connection Strings
const dbUrl = 'mongodb://admin:superSecretPassword@localhost:27017/my_app';
console.log(sanitizeContent(dbUrl));
// Output: mongodb://[REDACTED]@localhost:27017/my_app
// API Keys in URLs
const apiUrl = 'https://api.example.com?api_key=sk-123456789&lang=en';
console.log(sanitizeContent(apiUrl));
// Output: https://api.example.com?api_key=[REDACTED]&lang=en
// Authorization Headers
const authHeader = 'Authorization: Bearer my-confidential-token-value';
console.log(sanitizeContent(authHeader));
// Output: Authorization: Bearer [REDACTED]
🛠️ Custom Options
You can customize the replacement behavior using the options object.
Custom Replacement Text & Removal
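// Replacement
const log = 'login failed: password=hunter2'; // example input
sanitizeContent(log, { replacement: '[HIDDEN]' }); // Output: ... password=[HIDDEN]
// Removal
const sensitive = 'api_key=sk-123456789'; // example input
sanitizeContent(sensitive, { replacement: '' }); // Output: api_key=
Excluding Specific Words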
Prevent specific words from being redacted, even if they follow a sensitive label.
sanitizeContent("password: REDACTED", { excludeWords: ["REDACTED"] });
// Output: "password: REDACTED"
📄 HTML Content
To sanitize content inside HTML without breaking tags, use sanitizeDescription. This preserves your HTML structure while only cleaning the text nodes.
import { sanitizeDescription } from 'sanitize-contents';
const html = '<div class="user-info">Password: <b>secret123</b></div>';
const result = sanitizeDescription(html);
// Output: <div class="user-info">Password: <b>[REDACTED]</b></div>
🔍 What it Detects
- Databases: MongoDB, MySQL, Postgres, Redis connection strings.
- Web: HTTP query params (`api_key`, `token`, `password`, `secret`, `session`, etc.).
- Headers: `Authorization` (Bearer/Basic), `Cookie` values (if resembling tokens).
- Commands: `curl` auth flags.
- Code: Variable assignments like `PASSWORD = "..."`, `const secret = '...'`.
- Text: Natural language credential dumps (`"password is ..."`, `"login with ..."`).
- Formats: JSON (heuristic), XML tags, URL-encoded values.
💡 User Help & Guidelines
To ensure the best balance between security and readability, follow these guidelines:
🎯 Choosing the Right Method
- `sanitizeContent(text, options)`: Best for plain strings, logs, URLs, and code snippets. It is fast and handles raw patterns directly.
- `sanitizeDescription(html, options)`: Required for HTML documents. It uses a DOM parser so that sensitive data inside tags is redacted without breaking your HTML layout or attributes.
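The idea of cleaning only text while leaving tags and attributes alone can be sketched as follows. This example deliberately swaps the DOM parser for a naive tag-splitting regex so it stays self-contained, so treat it as an illustration of the principle rather than how `sanitizeDescription` works internally. `sanitizeTextNodes` and `maskDigits` are hypothetical names.

```javascript
// Hypothetical sketch: applies `sanitizeText` only to the text between tags.
// Naive by design: it assumes no ">" inside attribute values, comments, or scripts,
// and it cannot link a label in one text node to a value in another.
function sanitizeTextNodes(html, sanitizeText) {
  return html
    .split(/(<[^>]*>)/) // the capturing group keeps tags at odd indices
    .map((part, index) => (index % 2 === 1 ? part : sanitizeText(part)))
    .join('');
}

// Stand-in sanitizer for the demo (not a library function).
const maskDigits = (text) => text.replace(/\d/g, '#');

console.log(sanitizeTextNodes('<div class="row1">PIN: <b>1234</b></div>', maskDigits));
// <div class="row1">PIN: <b>####</b></div>
```

Note that the digit in `class="row1"` survives because tags are passed through untouched, which is the property that makes a structure-aware method preferable to running a plain-string sanitizer over raw HTML.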
✅ Recommended Patterns
The library is optimized to catch sensitive data following these labels:
- `password`, `pass`, `pwd`, `secret`, `token`, `key`
- `api_key`, `access_token`, `auth_token`
- `credentials`, `creds`
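As a rough illustration of label-based detection, the sketch below redacts whatever follows one of the labels above and a `:` or `=` separator. The label list is copied from this section; the function name `redactLabeledValues`, the regular expression, and the "value is the next run of non-space characters" rule are assumptions for the example, not the library's actual matching logic.

```javascript
// Labels taken from the list above; everything else here is a hypothetical sketch.
const LABELS = [
  'password', 'pass', 'pwd', 'secret', 'token', 'key',
  'api_key', 'access_token', 'auth_token',
  'credentials', 'creds',
];

function redactLabeledValues(text, replacement = '[REDACTED]') {
  // Longest labels first so "password" is preferred over "pass".
  const alternation = [...LABELS].sort((a, b) => b.length - a.length).join('|');
  // label, then ":" or "=" with optional spaces, then the value (non-space run).
  const pattern = new RegExp(`\\b(${alternation})(\\s*[:=]\\s*)(\\S+)`, 'gi');
  return text.replace(pattern, (_match, label, separator) => label + separator + replacement);
}

console.log(redactLabeledValues('password: hunter2'));
// password: [REDACTED]
console.log(redactLabeledValues('api_key=abc123 lang=en'));
// api_key=[REDACTED] lang=en
```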
⚠️ Avoiding False Positives
We have implemented smart filtering for technical terms and common words so they are NOT redacted:
- Transitions: `in`, `on`, `at`, `by`, `with`, `as`, `of`, `to`, `is`, `was`, `it`
- Technical Status: `enabled`, `disabled`, `status`, `mode`, `reference`, `ref`
- Known Services: `lastpass`, `home`
- Temporary Indicators: `temp`, `temporary`
Examples:
| Input | Output | Result |
| :--- | :--- | :--- |
| password in home | password in home | ✅ Ignored |
| password lastpass | password lastpass | ✅ Ignored |
| temporary: my value | temporary: my value | ✅ Ignored |
| temporary password: 123 | temporary password: [REDACTED] | 🛡️ Redacted |
| enbled key refrence | enbled key refrence | ✅ Ignored |
| status code: 200 | status code: 200 | ✅ Ignored |
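One plausible way to get the behavior in the table is to check the word that follows a label against an ignore list before redacting it. The sketch below does this for the single label `password`; the ignore list is copied from the bullets in this section, while `redactAfterPassword` and its regular expression are hypothetical and much simpler than whatever the library does (for example, there is no typo handling here).

```javascript
// Ignore list copied from this section; the rest is a hypothetical sketch.
const IGNORED_WORDS = new Set([
  'in', 'on', 'at', 'by', 'with', 'as', 'of', 'to', 'is', 'was', 'it',
  'enabled', 'disabled', 'status', 'mode', 'reference', 'ref',
  'lastpass', 'home', 'temp', 'temporary',
]);

function redactAfterPassword(text, replacement = '[REDACTED]') {
  // "password", then either a ":" / "=" separator or plain whitespace, then the next word.
  const pattern = /\b(password)\b(\s*[:=]\s*|\s+)(\S+)/gi;
  return text.replace(pattern, (match, label, separator, value) =>
    // Leave the text untouched when the following word is on the ignore list.
    IGNORED_WORDS.has(value.toLowerCase()) ? match : label + separator + replacement
  );
}

console.log(redactAfterPassword('password in home'));        // password in home
console.log(redactAfterPassword('temporary password: 123')); // temporary password: [REDACTED]
```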
📋 Best Practices for Data Handling
- Be Specific: If you log a generic sentence like `"I forgot my password in lastpass"`, the library recognizes the context and leaves it unredacted.
- Exclude Already Redacted Data: If your data source already has some fields marked as "REDACTED" or "PRIVATE", use the `excludeWords` option to prevent double-redaction.
  - Example: `sanitizeContent("pass: [HIDDEN]", { excludeWords: ["[HIDDEN]"] })`
- Typo Protection: The library handles common typos automatically (e.g., `passwrod`, `enbled`, `refrence`) to ensure safety even with human error.
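For the logging use case, one way to apply these practices consistently is to route every log call through a small wrapper that sanitizes the final line. The sketch below takes the sanitizer as a parameter so it runs on its own; in a real project you would presumably pass `sanitizeContent` from this package. `createSafeLogger`, `toText`, and the stand-in `maskBearer` are hypothetical names introduced for this example.

```javascript
// Hypothetical helper: converts any log argument to text before sanitizing.
function toText(part) {
  if (typeof part === 'string') return part;
  if (part instanceof Error) return `${part.name}: ${part.message}`;
  // JSON.stringify returns undefined for undefined and functions, so fall back to String().
  return JSON.stringify(part) ?? String(part);
}

// `sanitize` is any (string) => string function; `sink` receives the cleaned line.
function createSafeLogger(sanitize, sink = console.log) {
  return (...parts) => sink(sanitize(parts.map(toText).join(' ')));
}

// Stand-in sanitizer for the demo (not the library function).
const maskBearer = (text) => text.replace(/Bearer \S+/g, 'Bearer [REDACTED]');

const safeLog = createSafeLogger(maskBearer);
safeLog('request failed:', 'Authorization: Bearer abc123');
// request failed: Authorization: Bearer [REDACTED]
```

Sanitizing the joined line, rather than each argument separately, lets a label in one argument and its value in the next be seen together.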
📄 License
ISC
