data-redactor-core
v1.0.7
Client-side data redaction tool for securing sensitive information before sending to AI systems
Data Redactor
A powerful, client-side data redaction tool for securing sensitive information before sending it to AI systems or external services. It demonstrates that AI can be used securely with proper input sanitization.
Live Demo
https://data-redactor-ui.vercel.app/
Overview
Data Redactor is a monorepo containing three packages:
| Package | Description | Published |
|---------|-------------|-----------|
| data-redactor-core | Core redaction engine | npm v1.0.5 |
| ui | Vanilla JS web interface | Vercel |
| api | Bun REST API for community patterns | Local/Self-hosted |
In the web UI and the core library, all redaction happens 100% client-side: no data is sent to a server unless you opt into the API server's /api/redact endpoint.
Features
Redaction Strategies
| Strategy | Description | Example |
|----------|-------------|---------|
| Token | Replace with typed placeholders | [email protected] → [EMAIL_1] |
| Mask | Replace with mask character, preserve structure | [email protected] → ****@*****.*** |
| Format-Preserving | Replace with realistic fake data | [email protected] → [email protected] |
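The Token and Mask strategies in the table above can be sketched as follows. This is a minimal illustration, not the data-redactor-core implementation. `TokenSketch` and `maskSketch` are hypothetical names. Tokens are assumed to be numbered per type in order of first appearance, so a repeated value keeps its token. Masking is assumed to replace only letters and digits, so separators such as `@`, `.` and `-` survive.

```typescript
// Hypothetical sketch of the Token and Mask strategies; not the library's code.

/** Fill a token template such as "[{TYPE}_{INDEX}]". */
function applyTokenFormat(format: string, type: string, index: number): string {
  return format.replace("{TYPE}", type).replace("{INDEX}", String(index));
}

class TokenSketch {
  // type -> (original value -> token); each type keeps its own counter.
  private readonly byType = new Map<string, Map<string, string>>();
  private readonly format: string;

  constructor(format = "[{TYPE}_{INDEX}]") {
    this.format = format;
  }

  tokenFor(type: string, original: string): string {
    let seen = this.byType.get(type);
    if (seen === undefined) {
      seen = new Map<string, string>();
      this.byType.set(type, seen);
    }
    // Reuse the existing token so repeated values stay consistent.
    const existing = seen.get(original);
    if (existing !== undefined) return existing;
    const token = applyTokenFormat(this.format, type, seen.size + 1);
    seen.set(original, token);
    return token;
  }
}

/** Mask letters and digits; keep all other characters when preserveStructure is true. */
function maskSketch(value: string, maskChar = "*", preserveStructure = true): string {
  if (!preserveStructure) return maskChar.repeat(value.length);
  // A replacer function keeps maskChar literal even if it is "$".
  return value.replace(/[A-Za-z0-9]/g, () => maskChar);
}
```

With this design, the second distinct email becomes `[EMAIL_2]` and a repeated one keeps `[EMAIL_1]`. `maskSketch("user@email.com")` yields `****@*****.***`.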
Built-in Pattern Detection
| Category | Patterns |
|----------|----------|
| Network | IPv4 (with CIDR), IPv6, MAC Address, Hostname/FQDN |
| Personal | Email, Phone (incl. vanity), SSN, Names (8,849+ name database) |
| Financial | Credit Card (13-19 digits), Credit Card Last 4 |
| Business | Ticket/Case Numbers |
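The sketch below shows how two of these categories might be detected. The regular expressions and helper names (`IPV4_CIDR`, `luhnValid`, `findCardNumbers`) are assumptions and not the patterns shipped in data-redactor-core. The Luhn checksum is a common way to filter out false-positive card numbers, but the table above does not say the library uses it.

```typescript
// Hypothetical detectors for two built-in categories; not the library's regexes.

// One IPv4 octet: 0-255 without leading zeros.
const IPV4_OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";

// Four octets plus an optional /0 to /32 CIDR suffix.
const IPV4_CIDR = new RegExp(
  `\\b${IPV4_OCTET}(?:\\.${IPV4_OCTET}){3}(?:/(?:3[0-2]|[12]?\\d))?\\b`,
  "g",
);

/** Luhn checksum over a string of digits. */
function luhnValid(digits: string): boolean {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

/** 13-19 digits, optionally separated by single spaces or hyphens, that pass Luhn. */
function findCardNumbers(text: string): string[] {
  const candidates = text.match(/\b(?:\d[ -]?){12,18}\d\b/g) ?? [];
  return candidates.filter((c) => luhnValid(c.replace(/[ -]/g, "")));
}
```

`String.prototype.match` resets `lastIndex` on a global regex, so sharing `IPV4_CIDR` across calls is safe here.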
Pattern Builder (New in v1.0.5)
Visual tool to create custom regex patterns from sample data:
- Mark Selection - Highlight text in your sample to mark what should be matched
- Multi-Sample Support - Add multiple samples to refine pattern accuracy
- Auto-Generation - Automatically generates optimized regex from marked text
- Pattern Explanation - Human-readable breakdown of what the pattern matches
- Live Testing - Test generated patterns against sample data in real-time
- One-Click Add - Add patterns directly to your configuration
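The marking-and-generalizing workflow above can be approximated with a run-length approach. Each marked sample is split into runs of uppercase letters, lowercase letters, digits, and literal characters, and the run lengths are widened across samples. This is a hypothetical sketch: `buildPattern` and its helpers are not library functions. It returns `null` when samples have different shapes, and the real Pattern Builder may handle that case differently.

```typescript
// Hypothetical sketch of sample-driven regex generation; not generateFromSample().
type CharClass = "upper" | "lower" | "digit" | "literal";

interface Run {
  cls: CharClass;
  text: string; // the literal character, used only when cls === "literal"
  min: number;
  max: number;
}

function classify(ch: string): CharClass {
  if (ch >= "A" && ch <= "Z") return "upper";
  if (ch >= "a" && ch <= "z") return "lower";
  if (ch >= "0" && ch <= "9") return "digit";
  return "literal";
}

/** Split a sample into runs; consecutive letters/digits of one class merge. */
function toRuns(sample: string): Run[] {
  const runs: Run[] = [];
  for (const ch of sample) {
    const cls = classify(ch);
    const last = runs[runs.length - 1];
    if (last !== undefined && cls !== "literal" && last.cls === cls) {
      last.min += 1;
      last.max += 1;
    } else {
      runs.push({ cls, text: ch, min: 1, max: 1 });
    }
  }
  return runs;
}

/** Widen run lengths across two samples; null if their shapes differ. */
function mergeRuns(a: Run[], b: Run[]): Run[] | null {
  if (a.length !== b.length) return null;
  const merged: Run[] = [];
  for (let i = 0; i < a.length; i++) {
    const x = a[i];
    const y = b[i];
    if (x.cls !== y.cls || (x.cls === "literal" && x.text !== y.text)) return null;
    merged.push({ cls: x.cls, text: x.text, min: Math.min(x.min, y.min), max: Math.max(x.max, y.max) });
  }
  return merged;
}

const CLASS_SOURCE: Record<CharClass, string> = { upper: "[A-Z]", lower: "[a-z]", digit: "\\d", literal: "" };

function renderRun(r: Run): string {
  if (r.cls === "literal") return r.text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const q = r.min === r.max ? (r.min === 1 ? "" : `{${r.min}}`) : `{${r.min},${r.max}}`;
  return CLASS_SOURCE[r.cls] + q;
}

function buildPattern(samples: string[]): string | null {
  if (samples.length === 0) return null;
  let runs: Run[] | null = toRuns(samples[0]);
  for (const s of samples.slice(1)) {
    if (runs === null) break;
    runs = mergeRuns(runs, toRuns(s));
  }
  return runs === null ? null : runs.map(renderRun).join("");
}
```

For the single sample `ABC-12345`, this produces `[A-Z]{3}-\d{5}`. Adding `XY-678` widens it to `[A-Z]{2,3}-\d{3,5}`.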
Community Patterns (New in v1.0.5)
Browse, share, and vote on community-contributed regex patterns:
- Pattern Library - Discover patterns submitted by other users
- Voting System - Upvote/downvote patterns to help surface the best ones
- Category Filtering - Filter by identifier, financial, healthcare, infrastructure, personal
- One-Click Use - Add community patterns to your configuration instantly
- Submit Your Own - Share useful patterns with the community
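The voting and filtering features above suggest one way to surface the best patterns: rank by net score within a category, and reject any pattern whose regex does not compile before adding it. The `CommunityPattern` shape and the `rankPatterns`/`compiles` helpers below are assumptions for illustration. The API's actual response fields may differ.

```typescript
// Hypothetical shapes and helpers; field names are assumptions, not the API schema.
type Category = "identifier" | "financial" | "healthcare" | "infrastructure" | "personal";

interface CommunityPattern {
  id: string;
  name: string;
  regex: string;
  category: Category;
  upvotes: number;
  downvotes: number;
}

/** Sort by net votes (descending), optionally restricted to one category. */
function rankPatterns(patterns: CommunityPattern[], category?: Category): CommunityPattern[] {
  return patterns
    .filter((p) => category === undefined || p.category === category)
    .sort((a, b) => {
      const diff = b.upvotes - b.downvotes - (a.upvotes - a.downvotes);
      // Break ties alphabetically so the order is stable across requests.
      return diff !== 0 ? diff : a.name.localeCompare(b.name);
    });
}

/** Reject a pattern whose regex would throw when compiled. */
function compiles(regex: string, flags = "g"): boolean {
  try {
    new RegExp(regex, flags);
    return true;
  } catch {
    return false;
  }
}
```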
Presets
Pre-configured pattern sets for common use cases:
| Preset | Description |
|--------|-------------|
| strict-ai | Maximum redaction for AI/LLM inputs |
| minimal | Light redaction, preserves readability |
| logs | Optimized for log file sanitization |
| financial | Focus on financial data (accounts, cards) |
| healthcare | HIPAA-focused (MRN, NPI, patient info) |
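A preset lookup like getPreset()/hasPreset() can be sketched as a small registry. This is a hypothetical illustration. Only two of the five presets are modeled, the pattern selections inside them are invented, and the `*Sketch` function names are not the library's exports. The design point is that the lookup returns a deep copy, so callers can tweak a preset without mutating the shared one. It also uses an own-property check, so names like `"toString"` are not mistaken for presets.

```typescript
// Hypothetical preset registry; the pattern selections are invented for illustration.
type Strategy = "token" | "mask";

interface PatternSetting {
  enabled: boolean;
  strategy?: Strategy;
}

interface PresetConfig {
  patterns: Record<string, PatternSetting>;
}

type PresetName = "strict-ai" | "minimal";

const PRESETS: Record<PresetName, PresetConfig> = {
  "strict-ai": {
    patterns: {
      email: { enabled: true, strategy: "token" },
      phone: { enabled: true, strategy: "token" },
      ipv4: { enabled: true, strategy: "token" },
    },
  },
  minimal: {
    patterns: {
      email: { enabled: true, strategy: "mask" },
      phone: { enabled: false },
      ipv4: { enabled: false },
    },
  },
};

function hasPresetSketch(name: string): name is PresetName {
  // hasOwnProperty avoids treating inherited keys like "toString" as presets.
  return Object.prototype.hasOwnProperty.call(PRESETS, name);
}

function getPresetSketch(name: string): PresetConfig {
  if (!hasPresetSketch(name)) throw new Error(`Unknown preset: ${name}`);
  // Return a deep copy so callers can adjust it without mutating the shared preset.
  return JSON.parse(JSON.stringify(PRESETS[name])) as PresetConfig;
}
```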
Extensibility
- Custom Patterns - Define your own regex patterns with configurable strategies
- Custom Entities - Specify exact values to redact (company names, project names, etc.)
- Regex Builder - Programmatic pattern generation from samples
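Custom entities are fixed strings, so one way to match them is to escape each value and join the results into a single alternation. The sketch below is a hypothetical helper, not the library's code. It assumes the longest value should win when one entity is a prefix of another. It uses lookarounds instead of `\b` so that values ending in symbols, such as `C++`, still match as whole terms.

```typescript
// Hypothetical helper for turning custom entity values into one regex.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/** Match any listed value as a whole term; returns null when nothing is listed. */
function buildEntityRegex(values: string[]): RegExp | null {
  const unique = Array.from(new Set(values.map((v) => v.trim()).filter((v) => v.length > 0)));
  if (unique.length === 0) return null;
  // Longest first, so "Acme Corp Europe" is preferred over its prefix "Acme Corp".
  unique.sort((a, b) => b.length - a.length);
  // (?<!\w) and (?!\w) act like word boundaries but also work next to symbols.
  return new RegExp(`(?<!\\w)(?:${unique.map(escapeRegExp).join("|")})(?!\\w)`, "g");
}
```

With the entities `Acme Corp`, `Acme Corp Europe` and `C++`, the text "Acme Corp Europe uses C++; Acme Corporation does not." yields the matches `Acme Corp Europe` and `C++`. "Acme Corporation" is not matched.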
Engine Features
- Deterministic redaction (same input → same output within session)
- Overlap detection and resolution
- Configurable token format per pattern type
- Configurable mask character
- Import/Export JSON configurations
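Overlap detection matters when two patterns claim the same text, for example an email address whose domain also matches the hostname pattern. The sketch below resolves such conflicts with a leftmost-longest policy, then rebuilds the output in a single left-to-right pass. The policy, the `Match` shape, and both function names are assumptions. The library may rank overlapping matches differently, for example by pattern priority.

```typescript
// Hypothetical overlap resolution; the leftmost-longest policy is an assumption.
interface Match {
  start: number; // inclusive
  end: number; // exclusive
  type: string;
}

/** Keep the leftmost match, prefer the longer one on equal starts, drop anything overlapping a kept match. */
function resolveOverlaps(matches: Match[]): Match[] {
  const sorted = [...matches].sort(
    (a, b) => a.start - b.start || (b.end - b.start) - (a.end - a.start),
  );
  const kept: Match[] = [];
  for (const m of sorted) {
    // Kept spans are disjoint and sorted, so only the last one can overlap m.
    const last = kept[kept.length - 1];
    if (last !== undefined && m.start < last.end) continue;
    kept.push(m);
  }
  return kept;
}

/** Rebuild the text left to right, replacing each surviving match. */
function applyMatches(text: string, matches: Match[], replace: (m: Match) => string): string {
  let out = "";
  let cursor = 0;
  for (const m of resolveOverlaps(matches)) {
    out += text.slice(cursor, m.start) + replace(m);
    cursor = m.end;
  }
  return out + text.slice(cursor);
}
```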
Packages
data-redactor-core
The core TypeScript redaction engine. Zero browser dependencies - works in Node.js and browser environments.
Key exports:
- DataRedactor - Main redaction class
- ConfigLoader - Configuration loading and validation
- DEFAULT_CONFIG - Default configuration with all patterns enabled
- getPreset() / hasPreset() - Preset configuration helpers
- generateFromSample() / refineFromSamples() - Regex builder utilities
- Pattern classes: IPv4Pattern, EmailPattern, NamePattern, etc.
- Strategy classes: TokenStrategy, MaskStrategy, FormatPreservingStrategy
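DataRedactor's redact() returns a mapping of original values to tokens alongside the redacted text, as shown in the Basic Example under Usage. A common follow-up is restoring those values in an AI response. The helper below is a hypothetical sketch, not a library export. It assumes the mapping shape `{ originalValue: token }` and is only meaningful for the token strategy, because masked values are not unique.

```typescript
// Hypothetical helper; assumes mapping is { originalValue: token }, as in the Basic Example.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function restoreTokens(text: string, mapping: Record<string, string>): string {
  const tokenToOriginal = new Map<string, string>();
  for (const [original, token] of Object.entries(mapping)) {
    tokenToOriginal.set(token, original);
  }
  if (tokenToOriginal.size === 0) return text;
  // Longest tokens first so a token is never shadowed by one of its prefixes.
  const alternatives = Array.from(tokenToOriginal.keys())
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp);
  // One pass: restored values are never re-scanned for tokens,
  // and the replacer function keeps "$" in original values literal.
  return text.replace(new RegExp(alternatives.join("|"), "g"), (t) => tokenToOriginal.get(t) ?? t);
}
```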
UI
Vanilla JavaScript web application (no framework dependencies) with five main views:
- Pattern Detection - Toggle patterns on/off, select strategies per pattern
- JSON Editor - Full configuration editing with validation
- Output Format - Interactive per-pattern testing with live preview of all strategies
- Pattern Builder - Visual tool to create custom regex patterns from sample data
- Community - Browse and use community-contributed patterns
UI Features:
- Mobile-responsive design with optimized touch targets
- Collapsible accordion sections for better organization
- Dark mode support
- Keyboard shortcuts for common actions
API Server
Bun-powered REST API for community patterns and feedback:
| Endpoint | Method | Description |
|----------|--------|-------------|
| /api/health | GET | Health check |
| /api/redact | POST | Redact text (server-side option) |
| /api/presets | GET | List available presets |
| /api/patterns | GET | List community patterns |
| /api/patterns | POST | Submit a new pattern |
| /api/patterns/:id | GET | Get pattern details |
| /api/patterns/:id/vote | POST | Vote on a pattern |
| /api/patterns/:id/use | POST | Mark pattern as used |
| /api/feedback | GET/POST | Feedback collection |
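A small typed client for the pattern endpoints in the table above could look like this sketch. The class name `PatternApiClient`, the `{ direction }` vote body, and the empty body for `/use` are assumptions, since the request and response formats are not documented here. The fetch function is injected, so the global `fetch` in Bun, Node 18+ or a browser can be passed in, and tests can pass a stub.

```typescript
// Hypothetical client for the endpoints listed above; request/response bodies are assumptions.
interface HttpResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<HttpResponseLike>;

class PatternApiClient {
  private readonly baseUrl: string;
  private readonly fetchImpl: FetchLike;

  constructor(baseUrl: string, fetchImpl: FetchLike) {
    this.baseUrl = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
    this.fetchImpl = fetchImpl;
  }

  private async request(path: string, method = "GET", body?: unknown): Promise<unknown> {
    const init: { method: string; headers?: Record<string, string>; body?: string } = { method };
    if (body !== undefined) {
      init.headers = { "Content-Type": "application/json" };
      init.body = JSON.stringify(body);
    }
    const res = await this.fetchImpl(`${this.baseUrl}${path}`, init);
    if (!res.ok) throw new Error(`${method} ${path} failed with HTTP ${res.status}`);
    return res.json();
  }

  health() {
    return this.request("/api/health");
  }

  listPatterns() {
    return this.request("/api/patterns");
  }

  getPattern(id: string) {
    return this.request(`/api/patterns/${encodeURIComponent(id)}`);
  }

  // Assumed body shape: { direction: "up" | "down" }.
  vote(id: string, direction: "up" | "down") {
    return this.request(`/api/patterns/${encodeURIComponent(id)}/vote`, "POST", { direction });
  }

  markUsed(id: string) {
    return this.request(`/api/patterns/${encodeURIComponent(id)}/use`, "POST", {});
  }
}
```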
Database: MongoDB Atlas, which works both locally and on Vercel. Set the MONGODB_URI environment variable.
Installation
```bash
# Install the core package
npm install data-redactor-core
# Or use bun
bun add data-redactor-core
```
Usage
Basic Example
```typescript
import { DataRedactor } from 'data-redactor-core';
const redactor = new DataRedactor();
const text = "Contact [email protected] at 555-123-4567";
const result = redactor.redact(text);
console.log(result.redactedText);
// "Contact [EMAIL_1] at [PHONE_1]"
console.log(result.mapping);
// { "[email protected]": "[EMAIL_1]", "555-123-4567": "[PHONE_1]" }
```
Using Presets
```typescript
import { DataRedactor, getPreset } from 'data-redactor-core';
// Use a preset configuration
const redactor = new DataRedactor(getPreset('strict-ai'));
// Or for healthcare compliance
const hipaaRedactor = new DataRedactor(getPreset('healthcare'));
```
Custom Configuration
```typescript
import { DataRedactor } from 'data-redactor-core';
const config = {
  patterns: {
    email: { enabled: true, strategy: 'mask' },
    phone: { enabled: true, strategy: 'token' },
    ipv4: { enabled: false }
  },
  formatOptions: {
    tokenFormat: '[{TYPE}_{INDEX}]',
    maskChar: '*',
    preserveStructure: true
  }
};
const redactor = new DataRedactor(config);
```
Custom Patterns
```typescript
import { DataRedactor } from 'data-redactor-core';
const config = {
  patterns: {
    custom: [
      {
        name: 'caseId',
        regex: 'CASE-\\d{6}', // JS string escape: the regex source is CASE-\d{6}
        strategy: 'token',
        flags: 'gi'
      }
    ]
  }
};
const redactor = new DataRedactor(config);
const text = "Please reference CASE-123456 in your response";
const result = redactor.redact(text);
console.log(result.redactedText);
// "Please reference [CASEID_1] in your response"
```
Regex Builder (Programmatic)
```typescript
import { generateFromSample, refineFromSamples } from 'data-redactor-core';
// Generate pattern from a single sample
const result = generateFromSample('ABC-12345', {
  wordBoundaries: true,
  caseInsensitive: false
});
console.log(result.regex);
// "[A-Z]{3}-\\d{5}"
// Refine with multiple samples
const refined = refineFromSamples(
  ['ABC-12345', 'XYZ-67890', 'DEF-11111'],
  { wordBoundaries: true }
);
```
Custom Entities
Redact specific values like company names, project names, or customer names:
```typescript
import { DataRedactor } from 'data-redactor-core';
const config = {
  customEntities: {
    companyNames: ["Acme Corp", "Globex Corporation"],
    projectNames: ["Project Phoenix", "Operation Sunrise"],
    customerNames: ["John Smith", "Jane Doe"]
  }
};
const redactor = new DataRedactor(config);
const text = "Acme Corp is working on Project Phoenix with John Smith";
const result = redactor.redact(text);
console.log(result.redactedText);
// "[COMPANYNAMES_1] is working on [PROJECTNAMES_1] with [CUSTOMERNAMES_1]"
```
Customizing Token Format
```typescript
import { DataRedactor } from 'data-redactor-core';
const config = {
  formatOptions: {
    tokenFormat: '<{TYPE}:{INDEX}>', // Default: '[{TYPE}_{INDEX}]'
    maskChar: '#',                   // Default: '*'
    preserveStructure: true          // Default: true
  },
  patterns: {
    email: { enabled: true, strategy: 'token' },
    phone: { enabled: true, strategy: 'mask' }
  }
};
const redactor = new DataRedactor(config);
const text = "Email: [email protected] Phone: 555-1234";
const result = redactor.redact(text);
console.log(result.redactedText);
// "Email: <EMAIL:1> Phone: ###-####"
```
Loading Configuration from File (Node.js)
```typescript
import { DataRedactor, ConfigLoader } from 'data-redactor-core';
// Load from JSON file
const config = ConfigLoader.loadFromFile('./my-config.json');
const redactor = new DataRedactor(config);
// Or get default config
const defaultConfig = ConfigLoader.getDefault();
// Validate config
const validation = ConfigLoader.validateConfig(config);
if (!validation.valid) {
  console.error('Config errors:', validation.errors);
}
```
Development
```bash
bun install        # Install dependencies (also builds core)
bun dev            # Run both UI and API dev servers
bun dev:ui         # Run UI dev server with hot reload
bun dev:api        # Run API server only
bun run build      # Build everything (core + UI); plain `bun build` runs Bun's bundler instead
bun build:core     # Build core library only
bun build:ui       # Build UI for static deployment
bun lint           # Run ESLint
bun typecheck      # Run TypeScript type checking
bun format         # Run Prettier
```
Project Structure
```
data-redactor/
├── package.json          # Root package config
├── tsconfig.json         # TypeScript config
├── build-ui.js           # UI bundler script (Bun.build)
├── dev.ts                # Combined dev server runner
├── vercel.json           # Vercel deployment config
├── dist/                 # Built UI (static files for deployment)
├── packages/
│   ├── core/src/         # Redaction engine source (TypeScript)
│   │   ├── index.ts      # Main exports
│   │   ├── engine.ts     # Core redaction logic
│   │   ├── config.ts     # Configuration handling
│   │   ├── presets.ts    # Preset configurations
│   │   ├── patterns/     # Pattern implementations
│   │   ├── regex-builder/ # Pattern generation from samples
│   │   └── scenarios/    # Context-aware redaction scenarios
│   ├── ui/               # Vanilla JS UI source
│   │   ├── index.html
│   │   ├── main.js
│   │   └── styles.css
│   └── api/              # REST API server
│       ├── server.ts     # Main server entry
│       ├── routes/       # API route handlers
│       └── db/           # SQLite database client
├── config-examples/
└── examples/
    └── tampermonkey-redactor.js # Browser userscript example
```
Tech Stack
Latest versions as of 11/29/2025
| Category | Package | Version |
|----------|---------|---------|
| Runtime | Bun | 1.3+ |
| UI | Vanilla JavaScript | ES2022 |
| Build | tsup (core), Bun.build (UI) | ^8 |
| Language | TypeScript (core) | ^5 |
| Database | MongoDB Atlas | ^7 |
| Name Data | common-last-names | ^1 |
| | datasets-male-first-names-en | ^1 |
| | datasets-female-first-names-en | ^1 |
| Deploy | Vercel (static) | - |
License
MIT
Author
Matthew Goluba - @goobz22
Contributing
Contributions welcome! See open issues for planned features.
Ways to Contribute
- Submit Patterns - Use the Pattern Builder to create and submit useful patterns
- Vote on Patterns - Help surface the best community patterns
- Report Issues - Found a bug or false positive? Open an issue
- Feature Requests - Ideas for new patterns or features? We'd love to hear them
