octocode-data-masker
v1.0.0
A TypeScript library for masking sensitive data in strings, including PII, tokens, API keys, and more
sensitive-data-masker
A high-performance TypeScript library for detecting and masking sensitive data in strings. Protect PII, API keys, tokens, credentials, and other confidential information with intelligent masking algorithms and configurable accuracy levels.
Features
- 🛡️ 200+ Detection Patterns: Comprehensive coverage for modern security needs
- ⚡ High Performance: Optimized regex engine with pattern caching
- 🎯 Accuracy Control: Configure detection sensitivity (high/medium/low)
- 🔧 Flexible Masking: Smart partial masking that preserves readability
- 📦 Zero Dependencies: Lightweight and secure
- 🌍 International Support: Handles US, UK, Canadian, and international formats
- 🔍 Pattern Filtering: Include or exclude specific pattern types
- 📊 Detailed Results: Get match counts, positions, and masked values
Installation
npm install sensitive-data-masker
yarn add sensitive-data-masker
Quick Start
import { mask, hasSensitiveContent, getPatternMatches } from 'sensitive-data-masker';
// Basic usage - intelligent partial masking
const text = 'My email is [email protected] and my SSN is 123-45-6789';
const result = mask(text);
console.log(result.output);
// "My email is **[email protected]** and my SSN is **3-45-67**"
console.log(result.found);
// { email: 1, ssn: 1 }
// Check if content contains sensitive data
const isSensitive = hasSensitiveContent(text);
console.log(isSensitive); // true
// Get detailed pattern matches with positions
const matches = getPatternMatches(text);
console.log(matches);
// [
// {
// pattern: 'email',
// matches: [{ match: '[email protected]', startIndex: 12, endIndex: 27 }]
// },
// {
// pattern: 'ssn',
// matches: [{ match: '123-45-6789', startIndex: 44, endIndex: 54 }]
// }
// ]
API Reference
mask(input: string, options?: MaskingOptions): MaskResult
Masks sensitive content in a string using intelligent partial masking.
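Several sample outputs in this README (for example `123-45-6789` becoming `**3-45-67**`) are consistent with replacing the first two and last two characters of each match. The sketch below illustrates that idea only. `partialMask` and its `edge` parameter are hypothetical names, not part of the library's API, and the API-key samples in this README come out shorter than their inputs, so the real algorithm evidently differs in some cases.

```typescript
// Hypothetical helper: mask `edge` characters at each end of a matched value.
// Inferred from this README's sample outputs, not taken from the library source.
function partialMask(value: string, maskChar = '*', edge = 2): string {
  // Values too short to keep a visible middle are masked completely.
  if (value.length <= edge * 2) {
    return maskChar.repeat(value.length);
  }
  const middle = value.slice(edge, value.length - edge);
  return maskChar.repeat(edge) + middle + maskChar.repeat(edge);
}

console.log(partialMask('123-45-6789'));         // "**3-45-67**"
console.log(partialMask('4111-1111-1111-1111')); // "**11-1111-1111-11**"
console.log(partialMask('abc'));                 // "***"
```

Keeping the middle visible favors readability over secrecy, which is why a full-length option such as preserveLength exists for stricter cases.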
Options
interface MaskingOptions {
maskChar?: string; // Character used for masking (default: '*')
preserveLength?: boolean; // Preserve original length (default: false)
excludePatterns?: string[]; // Patterns to exclude from masking
onlyPatterns?: string[]; // Only mask these patterns
matchAccuracy?: 'high' | 'medium' | 'low'; // Detection sensitivity
}
Returns
interface MaskResult {
output: string; // Masked string
found: { [name: string]: number }; // Count of each pattern found
matches: string[]; // Original matched values
masked: string[]; // Masked versions of matches
}
hasSensitiveContent(input: string, options?): boolean
Quickly check if a string contains sensitive data without performing masking.
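A check like this can return at the first hit instead of collecting every match. The sketch below shows that early-exit idea with two simplified regexes. `containsSensitive` and `samplePatterns` are hypothetical, and the patterns are illustrative stand-ins rather than the library's real definitions.

```typescript
// Illustrative, simplified patterns; the library's real definitions are not shown in this README.
const samplePatterns: { name: string; source: string }[] = [
  { name: 'email', source: '[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+' },
  { name: 'ssn', source: '\\b\\d{3}-\\d{2}-\\d{4}\\b' },
];

// Hypothetical helper: `some` stops iterating as soon as one pattern matches.
function containsSensitive(input: string): boolean {
  // Non-global regexes are used so `test` carries no `lastIndex` state between calls.
  return samplePatterns.some(({ source }) => new RegExp(source, 'i').test(input));
}

console.log(containsSensitive('SSN on file: 123-45-6789')); // true
console.log(containsSensitive('hello world'));              // false
```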
import { hasSensitiveContent } from 'sensitive-data-masker';
hasSensitiveContent('[email protected]'); // true
hasSensitiveContent('hello world'); // false
// With options
hasSensitiveContent('sk-1234567890abcdef', {
matchAccuracy: 'high',
excludePatterns: ['genericId']
}); // true
getPatternMatches(input: string, options?): PatternMatch[]
Get detailed information about all pattern matches including their positions.
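In the sample outputs in this README, endIndex points at the last character of the match (an 11-character SSN starting at 44 ends at 54). The sketch below shows one way to collect matches with inclusive positions using a plain `exec` loop. `findWithPositions` and `PositionedMatch` are hypothetical names, not the library's internals.

```typescript
interface PositionedMatch {
  match: string;
  startIndex: number;
  endIndex: number; // inclusive, mirroring the README's sample output
}

// Hypothetical helper: collect every match of one regex together with its position.
function findWithPositions(input: string, regex: RegExp): PositionedMatch[] {
  // exec only advances through the string when the global flag is set.
  const flags = regex.flags.includes('g') ? regex.flags : regex.flags + 'g';
  const scanner = new RegExp(regex.source, flags);
  const found: PositionedMatch[] = [];
  let m: RegExpExecArray | null;
  while ((m = scanner.exec(input)) !== null) {
    if (m[0].length === 0) {
      scanner.lastIndex++; // step past empty matches to avoid an infinite loop
      continue;
    }
    found.push({ match: m[0], startIndex: m.index, endIndex: m.index + m[0].length - 1 });
  }
  return found;
}

const ssnMatches = findWithPositions('SSN: 123-45-6789', /\d{3}-\d{2}-\d{4}/);
console.log(ssnMatches.length, ssnMatches[0].startIndex, ssnMatches[0].endIndex); // 1 5 15
```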
import { getPatternMatches } from 'sensitive-data-masker';
const matches = getPatternMatches('Contact: [email protected] and key: sk-123abc');
console.log(matches);
// [
// {
// pattern: 'email',
// matches: [{ match: '[email protected]', startIndex: 9, endIndex: 22 }]
// },
// {
// pattern: 'openaiApiKey',
// matches: [{ match: 'sk-123abc', startIndex: 33, endIndex: 41 }]
// }
// ]
Advanced Usage
Custom Masking Options
import { mask } from 'sensitive-data-masker';
// Custom masking character
const result = mask('API key: sk-1234567890abcdef', { maskChar: '#' });
console.log(result.output);
// "API key: ##-1234567890ab##"
// Preserve original length
const result2 = mask('secret123', { preserveLength: true });
console.log(result2.output);
// "*********" (full length masked)
// Use high accuracy mode (fewer false positives)
const result3 = mask('sk-1234567890abcdef', { matchAccuracy: 'high' });
console.log(result3.output);
// "**-1234567890ab**"
Pattern Filtering
// Only mask specific patterns
const result = mask('Email: [email protected], API: sk-123', {
onlyPatterns: ['email', 'openaiApiKey']
});
// Exclude certain patterns
const result2 = mask('Email: [email protected], UUID: 123e4567-e89b-12d3-a456-426614174000', {
excludePatterns: ['uuid', 'genericId']
});
// Combine with accuracy control
const result3 = mask(sensitiveText, {
matchAccuracy: 'high',
excludePatterns: ['uuid']
});
Supported Pattern Categories
The library detects sensitive data across 25 categories with 200+ patterns:
🆔 Personally Identifiable Information (PII)
- Email addresses (multiple formats)
- Phone numbers (US, International, E.164)
- Social Security Numbers (US with various formats)
- Driver's license numbers, Medical record numbers
- Tax IDs (TIN/EIN), Canadian SIN, UK National Insurance Numbers
☁️ Cloud Provider Credentials
- AWS: Access keys, secret keys, session tokens, account IDs
- AWS Resources: EC2, S3, RDS, Lambda ARNs, VPC IDs
- Azure: Subscription IDs, client secrets, resource IDs
- Google Cloud: API keys, service account keys, project IDs
💳 Financial & Payment Services
- Credit card numbers (Visa, MasterCard, Amex, Discover)
- Stripe: Secret keys, publishable keys, webhook secrets
- PayPal: Access tokens, client IDs
- Square: Access tokens, application IDs
- Bank account numbers (US routing numbers, IBAN)
🤖 AI Provider Credentials
- OpenAI: API keys, organization IDs
- Anthropic/Claude: API keys
- Google AI: Gemini API keys, Vertex AI tokens
- Hugging Face: Access tokens, API keys
- Other AI: Groq, Perplexity, Replicate, Together AI
🔐 Authentication & Security
- JWT tokens, Bearer tokens
- OAuth access tokens, refresh tokens
- API keys in headers (X-API-Key, Authorization)
- Session IDs, CSRF tokens
- Generic secret patterns in environment variables
🔧 Developer Tools & Services
- GitHub: Personal access tokens, app tokens
- Slack: Bot tokens, webhook URLs, app secrets
- Discord: Bot tokens, webhook URLs
- Analytics: Google Analytics, Mixpanel, Amplitude
- Monitoring: Datadog, New Relic, Sentry keys
🗄️ Database & Storage
- Database connection strings (PostgreSQL, MySQL, MongoDB)
- File Storage: S3 bucket URLs, Azure Blob Storage
- CDN: CloudFront URLs, Azure CDN
- Redis connection strings, Elasticsearch URLs
🔑 Cryptographic Materials
- RSA private keys, SSH private keys
- EC private keys, DSA private keys
- X.509 certificates, PGP private key blocks
- JSON Web Keys (JWK), PKCS#8 keys
🌐 Network & Location
- IPv4/IPv6 addresses, MAC addresses
- Geographic coordinates (latitude/longitude)
- Private network ranges, subnet masks
- URL patterns with embedded secrets
📱 Communication Services
- Messaging: Twilio, SendGrid, Mailgun keys
- Social Media: Twitter, Facebook, Instagram tokens
- Email Services: Mailchimp, Postmark, SparkPost
- SMS/Voice: Nexmo, Plivo, MessageBird
🛠️ Infrastructure & DevOps
- Container Registries: Docker Hub, ECR, GCR tokens
- CI/CD: Jenkins, GitLab CI, CircleCI tokens
- Deployment: Vercel, Netlify, Heroku tokens
- Monitoring: PagerDuty, Datadog, New Relic
🏢 Enterprise & Business
- CRM: Salesforce, HubSpot tokens
- E-commerce: Shopify, WooCommerce keys
- Business Tools: Slack, Microsoft Teams tokens
- Analytics: Google Analytics, Adobe Analytics
🎯 Generic Patterns
- UUID v4, Generic IDs
- Base64 encoded secrets
- Hex-encoded keys (32, 64, 128 bit)
- Custom secret patterns in configuration files
🔍 URL & Reference Patterns
- URLs with embedded tokens
- Database connection URIs
- API endpoints with keys
- Webhook URLs with secrets
💾 Version Control & Code
- Git repository URLs with tokens
- Package manager tokens (npm, PyPI)
- Container registry credentials
- Code hosting platform tokens
Pattern Accuracy Levels
Control detection sensitivity to balance between security and false positives:
High Accuracy
- Most specific patterns with minimal false positives
- Examples: AWS access keys with AKIA prefix, specific API key formats
- Best for production environments
Medium Accuracy (Default)
- Balanced detection with reasonable false positive rates
- Examples: Generic API keys, common secret patterns
- Good for most use cases
Low Accuracy
- Broadest detection, may have higher false positive rates
- Examples: Generic IDs, loose pattern matching
- Useful for comprehensive scanning
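One plausible reading of these levels is that each pattern carries its own accuracy tag and the requested level acts as a minimum: 'high' runs only high-accuracy patterns, 'medium' adds medium ones, and 'low' runs everything. The sketch below implements that reading together with onlyPatterns and excludePatterns. It is an assumption, not confirmed library behavior. `selectPatterns`, the `awsAccessKeyId` name, the demo regexes, and treating untagged patterns as 'medium' are all hypothetical.

```typescript
type Accuracy = 'high' | 'medium' | 'low';

interface PatternDef {
  name: string;
  regex: RegExp;
  matchAccuracy?: Accuracy; // treated as 'medium' when omitted (assumption)
}

interface SelectOptions {
  matchAccuracy?: Accuracy;
  onlyPatterns?: string[];
  excludePatterns?: string[];
}

const rank: Record<Accuracy, number> = { low: 0, medium: 1, high: 2 };

// Hypothetical helper: decide which patterns run for a given set of options.
function selectPatterns(all: PatternDef[], options: SelectOptions = {}): PatternDef[] {
  const threshold = rank[options.matchAccuracy ?? 'medium'];
  return all.filter((p) => {
    if (options.onlyPatterns && !options.onlyPatterns.includes(p.name)) return false;
    if (options.excludePatterns?.includes(p.name)) return false;
    // Keep patterns whose own accuracy is at least the requested level.
    return rank[p.matchAccuracy ?? 'medium'] >= threshold;
  });
}

// Simplified demo patterns, for illustration only.
const demo: PatternDef[] = [
  { name: 'awsAccessKeyId', regex: /\bAKIA[0-9A-Z]{16}\b/, matchAccuracy: 'high' },
  { name: 'email', regex: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/ },
  { name: 'genericId', regex: /\b[0-9a-f]{16,}\b/i, matchAccuracy: 'low' },
];

console.log(selectPatterns(demo, { matchAccuracy: 'high' }).map((p) => p.name));
// only 'awsAccessKeyId'
console.log(selectPatterns(demo, { matchAccuracy: 'low' }).map((p) => p.name));
// all three names
```

Under this reading, stricter levels also run fewer regexes per input.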
// Use high accuracy for production
const prodResult = mask(text, { matchAccuracy: 'high' });
// Use medium accuracy for development
const devResult = mask(text, { matchAccuracy: 'medium' });
// Use low accuracy for comprehensive scanning
const scanResult = mask(text, { matchAccuracy: 'low' });
TypeScript Support
Full TypeScript support with complete type definitions:
import { mask, hasSensitiveContent, getPatternMatches } from 'sensitive-data-masker';
import type { MaskResult, MaskingOptions } from 'sensitive-data-masker';
// Type-safe masking options
const options: MaskingOptions = {
maskChar: '#',
matchAccuracy: 'high',
excludePatterns: ['uuid']
};
const result: MaskResult = mask(text, options);
Real-World Examples
Log File Sanitization
import { mask } from 'sensitive-data-masker';
const logEntry = `
[2024-01-15 10:30:45] INFO User [email protected] logged in
[2024-01-15 10:31:12] DEBUG API call with key sk-1234567890abcdef
[2024-01-15 10:31:15] ERROR Payment failed for card 4111-1111-1111-1111
[2024-01-15 10:31:20] WARN SSN in request: 123-45-6789
`;
const sanitized = mask(logEntry);
console.log(sanitized.output);
// [2024-01-15 10:30:45] INFO User **[email protected]** logged in
// [2024-01-15 10:31:12] DEBUG API call with key **-1234567890ab**
// [2024-01-15 10:31:15] ERROR Payment failed for card **11-1111-1111-11**
// [2024-01-15 10:31:20] WARN SSN in request: **3-45-67**
console.log(sanitized.found);
// { email: 1, openaiApiKey: 1, creditCard: 1, ssn: 1 }
Configuration File Security
const config = `
DATABASE_URL=postgresql://user:password123@localhost:5432/db
OPENAI_API_KEY=sk-1234567890abcdef1234567890abcdef
STRIPE_SECRET_KEY=sk_live_abcdef123456
ADMIN_EMAIL=[email protected]
JWT_SECRET=super-secret-key-123
`;
const result = mask(config);
console.log(result.output);
// DATABASE_URL=postgresql://user:**ssword1**@localhost:5432/db
// OPENAI_API_KEY=**-1234567890abcdef1234567890ab**
// STRIPE_SECRET_KEY=**_live_abcdef12**
// ADMIN_EMAIL=**[email protected]**
// JWT_SECRET=**per-secret-key-1**
Multi-Environment Setup
import { mask } from 'sensitive-data-masker';
// Production: Mask everything with high accuracy
const prodResult = mask(sensitiveData, { matchAccuracy: 'high' });
// Development: Allow test emails but mask real API keys
const devResult = mask(sensitiveData, {
matchAccuracy: 'medium',
excludePatterns: ['email']
});
// Testing: Only mask financial data
const testResult = mask(sensitiveData, {
onlyPatterns: ['creditCard', 'bankAccount', 'ssn'],
matchAccuracy: 'high'
});
Data Pipeline Processing
import { hasSensitiveContent, mask } from 'sensitive-data-masker';
// Check if data needs processing
function processBatch(records: string[]) {
const results = records.map(record => {
if (hasSensitiveContent(record)) {
const masked = mask(record, { matchAccuracy: 'high' });
return {
data: masked.output,
hadSensitiveData: true,
patternsFound: Object.keys(masked.found)
};
}
return { data: record, hadSensitiveData: false };
});
return results;
}
Performance Considerations
- Optimized Regex Engine: Patterns are compiled and cached on first use
- Single-Pass Processing: Efficient string traversal with minimal overhead
- Memory Efficient: No unnecessary string copies or allocations
- Pattern Filtering: Use onlyPatterns when you know which types to look for
- Accuracy Optimization: Higher accuracy modes are faster due to more specific patterns
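The "compiled and cached on first use" point can be pictured as a lazy cache keyed by pattern source. The sketch below is a generic illustration of that technique, not the library's implementation. `getRegex`, `compiled`, and `compileCount` are hypothetical names.

```typescript
// Hypothetical lazy cache: compile each pattern source once, on first use.
const compiled = new Map<string, RegExp>();
let compileCount = 0; // exists only to make the caching observable in this demo

function getRegex(source: string, flags = 'g'): RegExp {
  const key = `${flags}:${source}`;
  let regex = compiled.get(key);
  if (regex === undefined) {
    regex = new RegExp(source, flags);
    compiled.set(key, regex);
    compileCount++;
  }
  // Global regexes keep state in lastIndex; reset it so each caller starts at 0.
  regex.lastIndex = 0;
  return regex;
}

getRegex('\\d{3}-\\d{2}-\\d{4}');
getRegex('\\d{3}-\\d{2}-\\d{4}');
console.log(compileCount); // 1
```

Sharing one global regex instance is only safe because lastIndex is reset on every lookup; without that, a second scan could silently start mid-string.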
// Optimize for specific use cases
const emailsOnly = mask(text, { onlyPatterns: ['email'] }); // Faster
const highAccuracy = mask(text, { matchAccuracy: 'high' }); // Faster, fewer false positives
const comprehensive = mask(text, { matchAccuracy: 'low' }); // Slower, more thorough
Security Best Practices
- Always mask before logging: Ensure sensitive data is masked before writing to logs
- Use appropriate accuracy: Higher accuracy for production, lower for development/testing
- Store results securely: The matches array contains original sensitive values
- Regular updates: Keep the library updated for new pattern definitions
- Test your patterns: Verify masking works correctly with your specific data formats
- Environment-specific config: Use different settings for dev/staging/production
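The first and third practices can be combined in a small logging wrapper that forwards only the masked output and never touches the matches array. This is a minimal sketch. `createSafeLogger`, `MaskFn`, and the stand-in `demoMask` are hypothetical; in real use the library's mask function would be passed in instead.

```typescript
// Shape modelled on the README's mask() result; only `output` is used here.
type MaskFn = (input: string) => { output: string };

// Hypothetical wrapper: every message passes through the masker before reaching the sink.
function createSafeLogger(maskFn: MaskFn, sink: (line: string) => void = console.log) {
  return (message: string): void => {
    // Only the masked text is forwarded; original values are never stored or logged.
    sink(maskFn(message).output);
  };
}

// Stand-in masker for the demo, fully masking SSN-shaped values.
const demoMask: MaskFn = (input) => ({
  output: input.replace(/\b\d{3}-\d{2}-\d{4}\b/g, '***-**-****'),
});

const lines: string[] = [];
const log = createSafeLogger(demoMask, (line) => lines.push(line));
log('SSN in request: 123-45-6789');
console.log(lines[0]); // "SSN in request: ***-**-****"
```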
Development
Prerequisites
- Node.js >= 18.12.0
- Yarn or npm
Setup
git clone https://github.com/bgauryy/sensitive-data-mask.git
cd sensitive-data-mask
yarn install
Commands
yarn build # Build the library
yarn dev # Build in watch mode
yarn lint # Run ESLint
yarn test # Run tests
yarn typecheck # Run TypeScript compiler checks
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Adding New Patterns
- Choose the appropriate category file in src/regexes/
- Add your pattern following the existing structure:
{
name: 'myPattern',
regex: /your-regex-here/gi,
description: 'Description of what this detects',
matchAccuracy: 'medium' // optional: 'high', 'medium', or 'low'
}
- Run tests to ensure no regressions
- Submit a PR with a clear description
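Before opening a PR, it can help to check a new pattern against samples that should and should not match. The sketch below shows one way to do that outside the repository's test suite. `checkPattern`, `NewPattern`, and the fictional `acmeToken` pattern are hypothetical and exist only for illustration.

```typescript
interface NewPattern {
  name: string;
  regex: RegExp;
  description: string;
  matchAccuracy?: 'high' | 'medium' | 'low';
}

// Fictional pattern used only for this illustration.
const myPattern: NewPattern = {
  name: 'acmeToken',
  regex: /\bacme_[a-z0-9]{24}\b/gi,
  description: 'Token for the fictional Acme service',
  matchAccuracy: 'high',
};

// Returns the samples that behaved unexpectedly; an empty array means all checks passed.
function checkPattern(p: NewPattern, shouldMatch: string[], shouldNotMatch: string[]): string[] {
  const matches = (s: string): boolean => {
    p.regex.lastIndex = 0; // /g regexes remember their position between test() calls
    return p.regex.test(s);
  };
  return [
    ...shouldMatch.filter((s) => !matches(s)),
    ...shouldNotMatch.filter((s) => matches(s)),
  ];
}

const failures = checkPattern(
  myPattern,
  ['token=acme_0123456789abcdefghijklmn'],
  ['acme_short', 'plain text'],
);
console.log(failures.length); // 0
```

Negative samples matter as much as positive ones here, since an overly loose regex is what pushes a pattern toward the low-accuracy tier.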
License
MIT © guybary
Security
If you discover a security vulnerability, please email [email protected] instead of using the issue tracker.
Made with ❤️ for developers who care about data security
