sanitize-contents

v1.0.3

Published

12 days ago

A robust, lightweight, and customizable library to sanitize/redact sensitive information (passwords, tokens, API keys, connection strings) from strings and HTML content.


kishan.npmlib



A robust, lightweight, and customizable library to sanitize sensitive information (passwords, tokens, API keys, connection strings, PCI/PII data) from strings and HTML content.

Perfect for logging, error reporting, and audit trails where you need to keep data clean and safe.

✨ Features

🚀 Smart Detection: Automatically identifies connection strings, raw SQL inserts, query parameters, auth headers, and more.
🛡️ Connection Strings: Redacts credentials from mongodb://, postgres://, redis://, mysql://, and generic URIs.
🔑 Keys & Tokens: Redacts common keys like api_key, access_token, secret, jwt, bearer tokens.
📝 Credential Labels: Identifies patterns like Password: ..., API Key: ... in unstructured text.
🌐 Network Safe: Sanitizes Authorization headers and curl command flags (-u).
🎨 Customizable: Choose your replacement string (e.g., [HIDDEN]) or remove content entirely.
📄 HTML Support: Safely sanitizes text content within HTML while preserving valid structure.
⚡ Lightweight: Minimal overhead, high performance.

📦 Installation

npm install sanitize-contents

🚀 Usage

Basic Sanitization

import { sanitizeContent } from 'sanitize-contents';

// Connection Strings
const dbUrl = 'mongodb://admin:superSecretPassword@localhost:27017/my_app';
console.log(sanitizeContent(dbUrl)); 
// Output: mongodb://[REDACTED]@localhost:27017/my_app

// API Keys in URLs
const apiUrl = 'https://api.example.com?api_key=sk-123456789&lang=en';
console.log(sanitizeContent(apiUrl)); 
// Output: https://api.example.com?api_key=[REDACTED]&lang=en

// Authorization Headers
const authHeader = 'Authorization: Bearer my-confidential-token-value';
console.log(sanitizeContent(authHeader)); 
// Output: Authorization: Bearer [REDACTED]
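To make the connection-string output above concrete, here is a minimal sketch of how credentials in a URI can be redacted with a single regular expression. It is not the library's implementation; the function name redactUriCredentials is hypothetical and the pattern only covers the simple scheme://user:password@host shape.

```javascript
// Illustrative only: NOT how sanitize-contents is implemented.
// Hypothetical helper that redacts the "user:password" part of a URI.
function redactUriCredentials(text, replacement = '[REDACTED]') {
  // scheme://userinfo@host  ->  scheme://[REDACTED]@host
  // userinfo is any run of characters without whitespace, "/" or "@".
  return text.replace(
    /\b([a-z][a-z0-9+.-]*:\/\/)[^\s\/@]+@/gi,
    // A replacer function avoids "$" in the replacement being treated specially.
    (_match, scheme) => `${scheme}${replacement}@`
  );
}

console.log(
  redactUriCredentials('mongodb://admin:superSecretPassword@localhost:27017/my_app')
);
// Prints: mongodb://[REDACTED]@localhost:27017/my_app
```

URIs without an "@" are left unchanged, so ordinary links in a log line pass through as-is.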

🛠️ Custom Options

You can customize the replacement behavior using the options object.

Custom Replacement Text & Removal

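// Replacement (log is an example input; any string containing a credential works)
const log = 'login failed: password=hunter2';
sanitizeContent(log, { replacement: '[HIDDEN]' }); // Output: ... password=[HIDDEN]

// Removal (an empty replacement string drops the value entirely)
const sensitive = 'api_key=sk-123456789';
sanitizeContent(sensitive, { replacement: '' }); // Output: api_key=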

Excluding Specific Words

Prevent specific words from being redacted, even if they follow a sensitive label.

sanitizeContent("password: REDACTED", { excludeWords: ["REDACTED"] });
// Output: "password: REDACTED"

📄 HTML Content

To sanitize content inside HTML without breaking tags, use sanitizeDescription. This preserves your HTML structure while only cleaning the text nodes.

import { sanitizeDescription } from 'sanitize-contents';

const html = '<div class="user-info">Password: <b>secret123</b></div>';
const result = sanitizeDescription(html);
// Output: <div class="user-info">Password: <b>[REDACTED]</b></div>
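The idea of cleaning only text nodes can be sketched without a DOM parser. The example below is a simplified illustration, not the library's approach (which, per this README, uses a DOM parser): it splits the markup on tags and applies a sanitizer only to the text between them. The name mapTextSegments and the stand-in sanitizer are hypothetical. Unlike sanitizeDescription, this sketch does not connect a label in one text node to a value in another, and it assumes attribute values do not contain ">".

```javascript
// Illustrative only: a regex split, NOT a real HTML parser.
// Applies sanitizeText to text between tags and leaves the tags untouched.
function mapTextSegments(html, sanitizeText) {
  return html
    // Splitting on a capture group keeps the matched tags in the result array.
    .split(/(<[^>]*>)/)
    .map((part) =>
      part.startsWith('<') && part.endsWith('>') ? part : sanitizeText(part)
    )
    .join('');
}

// Stand-in sanitizer for the demo; in real use this would be a proper sanitizer.
const demoSanitizer = (s) => s.replace(/token=\w+/g, 'token=[REDACTED]');

console.log(mapTextSegments('<a href="?token=abc">token=abc</a>', demoSanitizer));
// Prints: <a href="?token=abc">token=[REDACTED]</a>
```

Note that the attribute value is left alone here because the sketch never touches tags; whether attributes should also be cleaned depends on your use case.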

🔍 What it Detects

Databases: MongoDB, MySQL, Postgres, Redis connection strings.
Web: HTTP Query Params (api_key, token, password, secret, session, etc.).
Headers: Authorization (Bearer/Basic), Cookie values (if they resemble tokens).
Commands: curl auth flags.
Code: variable assignments like PASSWORD = "...", const secret = '...'.
Text: Natural language credential dumps ("password is ...", "login with ...").
Formats: JSON (heuristic), XML tags, URL-encoded values.
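As an illustration of the "Web" category, the following sketch redacts the values of sensitive query parameters while keeping the parameter names and the rest of the URL. It is an assumption about how such detection could work, not the library's code; redactQueryParams and SENSITIVE_PARAMS are hypothetical names, and the parameter list is copied from the entry above.

```javascript
// Illustrative only: NOT the library's implementation.
// Parameter names taken from the "Web" entry; the real list may be longer.
const SENSITIVE_PARAMS = ['api_key', 'token', 'password', 'secret', 'session'];

function redactQueryParams(text, replacement = '[REDACTED]') {
  const names = SENSITIVE_PARAMS.join('|');
  // Match "?name=value" or "&name=value"; the value ends at "&", "#" or whitespace.
  const pattern = new RegExp(`([?&](?:${names})=)[^&#\\s]*`, 'gi');
  // Keep the "?name=" prefix and swap only the value.
  return text.replace(pattern, (_match, prefix) => `${prefix}${replacement}`);
}

console.log(redactQueryParams('https://api.example.com?api_key=sk-123456789&lang=en'));
// Prints: https://api.example.com?api_key=[REDACTED]&lang=en
```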

💡 User Help & Guidelines

To ensure the best balance between security and readability, follow these guidelines:

🎯 Choosing the Right Method

sanitizeContent(text, options): Best for plain strings, logs, URLs, and code snippets. It is fast and handles raw patterns directly.
sanitizeDescription(html, options): Required for HTML documents. It uses a DOM parser so that sensitive data inside elements is redacted without breaking your HTML layout or attributes.

✅ Recommended Patterns

The library is optimized to catch sensitive data following these labels:

password, pass, pwd, secret, token, key
api_key, access_token, auth_token
credentials, creds

⚠️ Avoiding False Positives

The library filters out common words and technical terms so that they are NOT redacted when they follow a sensitive label:

Transitions: in, on, at, by, with, as, of, to, is, was, it
Technical Status: enabled, disabled, status, mode, reference, ref
Known Services: lastpass, home
Temporary Indicators: temp, temporary

Examples:
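The interaction between labels, the ignore list, and excludeWords can be sketched as follows. This is a simplified model, not the library's actual logic: the function redactAfterLabel is hypothetical, the label list is a subset of the Recommended Patterns, and the ignore list is copied from the categories above.

```javascript
// Illustrative only: a simplified model, NOT the library's implementation.
const LABELS = ['password', 'pass', 'pwd', 'secret', 'token', 'key'];
const IGNORED = new Set([
  'in', 'on', 'at', 'by', 'with', 'as', 'of', 'to', 'is', 'was', 'it',
  'enabled', 'disabled', 'status', 'mode', 'reference', 'ref',
  'lastpass', 'home', 'temp', 'temporary',
]);

function redactAfterLabel(text, { replacement = '[REDACTED]', excludeWords = [] } = {}) {
  const excluded = new Set(excludeWords.map((w) => w.toLowerCase()));
  // label, optional ":" or "=" with surrounding spaces, then the next word.
  const pattern = new RegExp(`\\b(${LABELS.join('|')})\\b(\\s*[:=]?\\s*)(\\S+)`, 'gi');
  return text.replace(pattern, (match, label, sep, value) => {
    const word = value.toLowerCase();
    // Leave the text alone when the "value" is an ignored or excluded word.
    if (IGNORED.has(word) || excluded.has(word)) return match;
    return `${label}${sep}${replacement}`;
  });
}

console.log(redactAfterLabel('password: hunter2'));
// Prints: password: [REDACTED]
console.log(redactAfterLabel('I forgot my password in lastpass'));
// Prints: I forgot my password in lastpass
```

In the second call the word after the label is "in", which is on the ignore list, so nothing is redacted.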

📋 Best Practices for Data Handling

Be Specific: Prose that mentions a label without an actual credential, such as "I forgot my password in lastpass", is left unredacted, because the words that follow the label ("in", "lastpass") are on the ignore list.
Exclude Already Redacted Data: If your data source already has some fields marked as "REDACTED" or "PRIVATE", use the excludeWords option to prevent double-redaction.
- Example: sanitizeContent("pass: [HIDDEN]", { excludeWords: ["[HIDDEN]"] })
Typo Protection: The library handles common typos automatically (e.g., passwrod, enbled, refrence) to ensure safety even with human error.
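For logging and error reporting, one way to apply these practices is to route every message through a sanitizer before it reaches the log sink. The wrapper below is a sketch under assumptions: createSafeLogger is a hypothetical helper, not part of this package, and it takes the sanitizer as a parameter so it can be used with sanitizeContent or any other string-to-string function.

```javascript
// Illustrative only: a hypothetical wrapper, not part of sanitize-contents.
// sanitize: (string) => string, e.g. sanitizeContent from this package.
// sink: receives the final, cleaned log line (defaults to console.log).
function createSafeLogger(sanitize, sink = console.log) {
  const toText = (arg) => {
    if (arg instanceof Error) return `${arg.name}: ${arg.message}`;
    if (typeof arg === 'string') return arg;
    // JSON.stringify returns undefined for some inputs (e.g. undefined itself),
    // so fall back to String(). Circular structures would still throw here.
    return JSON.stringify(arg) ?? String(arg);
  };
  // Sanitize each argument separately, then join into one line.
  return (...args) => sink(args.map((arg) => sanitize(toText(arg))).join(' '));
}

// Demo with a stand-in sanitizer; swap in sanitizeContent in real code.
const demoSanitize = (s) => s.replace(/password=\S+/g, 'password=[REDACTED]');
const log = createSafeLogger(demoSanitize);
log('login attempt', 'password=hunter2');
// Prints: login attempt password=[REDACTED]
```

Passing the sanitizer in as an argument keeps the wrapper easy to test and lets you pre-bind options such as replacement or excludeWords in one place.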

📄 License

ISC
