pii-guard-node-mini
v1.1.0
Single-file Node.js/TypeScript PII masking engine — detection + masking + tokenization, with streaming, Express middleware, risk scoring, and structure-aware (JSON/HTML/Markdown) helpers
PII Masker (single-file TypeScript utility)
A self-contained, dependency-free (runtime) PII/secret detection + masking engine designed to be copied directly into other Node.js/TypeScript projects.
What this is for
- Mask PII before sending text to LLMs, logs, telemetry, or analytics.
- Detect and transform common PII/secrets using regex + lightweight validation/heuristics.
- Provide configurable masking strategies per PII type.
- Optionally use tokenization for reversible masking (useful for round-tripping LLM responses). Tokens can be restored by the same instance, or by another instance after exportVault() / importVault().
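The regex-plus-validation idea above can be sketched for card numbers: a loose regex proposes candidates and a Luhn checksum filters them. This is a minimal illustration, not the library's detector; luhnValid, findCardNumbers, and the candidate pattern are hypothetical names and choices.

```typescript
// Illustrative sketch only: a regex finds card-number candidates, then a Luhn
// checksum rejects digit runs that cannot be valid card numbers.

interface CardMatch {
  value: string; // matched text, including spaces/dashes
  start: number; // inclusive offset into the input
  end: number; // exclusive offset into the input
}

// Luhn checksum: double every second digit from the right, subtract 9 from
// doubled values above 9, and require the total to be divisible by 10.
function luhnValid(digits: string): boolean {
  if (!/^\d+$/.test(digits)) return false;
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

// Candidate: 13 to 19 digits, optionally separated by single spaces or dashes.
const CARD_CANDIDATE = /\b(?:\d[ -]?){12,18}\d\b/g;

function findCardNumbers(text: string): CardMatch[] {
  const out: CardMatch[] = [];
  for (const m of text.matchAll(CARD_CANDIDATE)) {
    const start = m.index ?? 0;
    // Strip separators before running the checksum.
    if (luhnValid(m[0].replace(/[ -]/g, ""))) {
      out.push({ value: m[0], start, end: start + m[0].length });
    }
  }
  return out;
}
```

The checksum step is what keeps order numbers and other long digit runs from being masked as cards.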
What's new in 1.1.0
All additions are backward-compatible.
- 16 new detectors: Driver License (US), Aadhaar (Verhoeff), PAN, UK NIN, UK postcode, Canadian SIN (Luhn), VIN (ISO 3779), CVV, card expiry, NPI, DEA, and brand-specific secret detectors for Stripe, GitHub, Slack, Twilio, OpenAI, Google API keys.
- Batch APIs: batchMask() / batchDetect().
- Structure-aware masking: maskJSON(), maskHTML(), maskMarkdown() (skips tags / <script> / fenced code / inline code).
- Streaming: maskStream() returns a Node stream.Transform for very large inputs.
- Express/Connect middleware: createMiddleware({ fields, responseBody, attachReport, skip }).
- Risk scoring: riskScore(text) → { score, level, breakdown, dominantType }.
- Audit & explain: explain(), diff(), sanitizeForLog(), optional in-memory audit log.
- Vault TTL: vaultTTL + pruneVault() for time-limited tokens.
- Multi-locale: run multiple locale-aware detectors in one instance via locales: [...].
- Pattern overrides: replace built-in regex via patternOverrides.
- New presets: gdpr, india, developer. Expanded hipaa (adds NPI/DEA) and pci-dss (adds CVV/expiry).
Install / Add to your project
This asset is intentionally shipped as a single file.
Option A — copy the file
Copy pii-masker.ts into your project (e.g. src/utils/pii-masker.ts).
If your project uses TypeScript, make sure Node types are available:
npm i -D @types/node
This file uses Node APIs (crypto, Buffer, and stream for maskStream()). It's meant for Node.js runtimes (or bundlers configured to polyfill Node APIs).
Option B — install from npm
npm i pii-guard-node-mini
If you copied the file instead of installing from npm, replace import paths like "pii-guard-node-mini" with your local path (e.g. "./utils/pii-masker").
Quick start
import createPIIMasker, { PIIType, MaskingStrategy } from "pii-guard-node-mini";
const masker = createPIIMasker({
preset: "balanced",
logLevel: "silent",
strategies: {
[PIIType.CREDIT_CARD]: MaskingStrategy.REDACT,
},
});
const input = "Email me at jane@company.com. Card: 4111 1111 1111 1111";
const { maskedText, entities } = masker.mask(input);
console.log(maskedText);
console.log(entities);
API overview
createPIIMasker(userConfig?) returns an object with:
Core
- mask(text) → { maskedText, entities, maskMap } (may also include warnings / truncated)
- maskObject(obj, fieldPaths?) → { masked, entities, maskMap } (may include warnings)
- detect(text) → DetectedEntity[]
- unmask(text) → string (works with tokens produced by this instance, or any vault imported via importVault)
- addDetector(type, fn) → register a custom detector
- addStrategy(name, fn) → register a custom strategy
- updateConfig(partial) → hot-update instance config
- getReport() → basic usage metrics
- clearVault() → clears the token vault (affects unmask)
- exportVault() → Record<string, string> of token → original for external persistence
- importVault(entries) → loads token → original mappings produced elsewhere
- resetReport() → clears counters
- getConfig() → resolved config snapshot
Batch & structure-aware (new in 1.1.0)
- batchMask(texts) / batchDetect(texts) → process arrays in one call
- maskJSON(jsonText) → parse JSON, mask string leaves, re-serialize
- maskHTML(html) → mask text nodes only (tags / <script> / <style> preserved)
- maskMarkdown(md) → mask prose while leaving fenced code blocks and inline `code` intact
Streaming & middleware (new in 1.1.0)
- maskStream(opts?) → returns a Node stream.Transform for large files / pipes
- createMiddleware(opts?) → Express/Connect-style (req, res, next) adapter
Risk & audit (new in 1.1.0)
- riskScore(text) → { score, level, breakdown, dominantType, entityCount }
- explain(text) → human-readable multi-line audit string
- diff(text) → span-level [{ start, end, original, masked, type }]
- sanitizeForLog(value) → masks a string or deep-masks an object (passes primitives through)
- pruneVault() → evicts expired vault entries (with vaultTTL)
- getAuditLog() / clearAuditLog() → in-memory audit log (requires auditLog: true)
Presets
Presets pre-configure detectors + defaults.
const strict = createPIIMasker({ preset: "strict" });
const balanced = createPIIMasker({ preset: "balanced" });
const minimal = createPIIMasker({ preset: "minimal" });
const hipaa = createPIIMasker({ preset: "hipaa" }); // + NPI, DEA
const pci = createPIIMasker({ preset: "pci-dss" }); // + CVV, expiry
const gdpr = createPIIMasker({ preset: "gdpr" }); // EU-focused
const india = createPIIMasker({ preset: "india" }); // Aadhaar, PAN
const dev = createPIIMasker({ preset: "developer" }); // secrets only
Notes:
- Preset values can still be overridden by passing your own config fields.
- "HIPAA" / "PCI-DSS" / "GDPR" preset names are practical bundles; you should still validate your usage for your environment.
Configuration (EngineConfig)
You pass a Partial<EngineConfig> to createPIIMasker().
Common knobs:
Production/enterprise recommended defaults
By default, results include entities (with original values) and maskMap (original → masked).
In production pipelines this can accidentally re-introduce sensitive data into logs.
Recommended configuration:
import createPIIMasker, { MaskingStrategy } from "pii-guard-node-mini";
const masker = createPIIMasker({
preset: "balanced",
// Output controls (recommended for production)
includeMaskMap: false,
includeEntities: true,
entityValueMode: "none", // don't return raw matches
includeMaskedValueInEntities: true, // safe to include the replacement
// Safety limits
maxTextLength: 200_000,
maxEntities: 2_000,
onLimitExceeded: "truncate", // or "throw" if you prefer hard-fail
// Recommended for LLM flows
defaultStrategy: MaskingStrategy.REDACT,
});
Notes:
- If you set entityValueMode: "none", DetectedEntity.value becomes an empty string.
- If you need original values for debugging, enable them only in dev/test.
Output controls (data minimization)
These options are designed to prevent accidental re-introduction of sensitive data via outputs.
- includeMaskMap: if true, maskMap contains original → masked mappings.
- includeEntities: if false, entities will be an empty list.
- entityValueMode:
  - "original" (default): DetectedEntity.value is the raw match.
  - "masked": DetectedEntity.value becomes the replacement value.
  - "none": DetectedEntity.value becomes an empty string.
- includeMaskedValueInEntities: includes DetectedEntity.masked (the replacement) in the entity.
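A sketch of how these output controls could combine for a single entity. RawEntity, ShapedEntity, and shapeEntity are hypothetical names; the field names mirror the DetectedEntity fields described here, but the library's real types may carry more fields.

```typescript
// Hypothetical sketch of the documented output controls; not library code.
type EntityValueMode = "original" | "masked" | "none";

interface RawEntity {
  type: string;
  value: string; // raw match
  masked: string; // replacement produced by the strategy
  start: number;
  end: number;
  confidence: number;
}

interface ShapedEntity {
  type: string;
  value: string;
  start: number;
  end: number;
  confidence: number;
  masked?: string;
}

function shapeEntity(
  raw: RawEntity,
  mode: EntityValueMode, // ~ entityValueMode
  includeMaskedValue: boolean, // ~ includeMaskedValueInEntities
): ShapedEntity {
  // "original" keeps the raw match, "masked" swaps in the replacement,
  // "none" blanks the value but keeps type and position.
  const value = mode === "original" ? raw.value : mode === "masked" ? raw.masked : "";
  const shaped: ShapedEntity = {
    type: raw.type,
    value,
    start: raw.start,
    end: raw.end,
    confidence: raw.confidence,
  };
  if (includeMaskedValue) shaped.masked = raw.masked;
  return shaped;
}
```

With mode "none" and includeMaskedValue false, only the type, offsets, and confidence leave the function, which matches the "keep entity locations/types only" example.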
Example (mask safely, keep entity locations/types only):
const masker = createPIIMasker({
includeMaskMap: false,
includeEntities: true,
entityValueMode: "none",
includeMaskedValueInEntities: false,
});
confidenceThreshold
Higher values mean fewer matches (fewer false positives); lower values mean more aggressive masking.
const masker = createPIIMasker({ confidenceThreshold: 0.7 });
defaultStrategy and strategies
Set the global default masking behavior, and override per type.
import createPIIMasker, { MaskingStrategy, PIIType } from "pii-guard-node-mini";
const masker = createPIIMasker({
defaultStrategy: MaskingStrategy.PARTIAL_MASK,
strategies: {
[PIIType.SSN]: MaskingStrategy.REDACT,
[PIIType.CREDIT_CARD]: MaskingStrategy.REDACT,
[PIIType.API_KEY]: MaskingStrategy.REDACT,
},
});
allowList
Values that should not be masked even if they look like PII.
const masker = createPIIMasker({
allowList: ["example.com", "localhost"],
});
denyList
Terms that must always be masked.
const masker = createPIIMasker({
denyList: [
{ term: "ProjectX", type: "CUSTOM" },
{ term: "InternalCodeWord", type: "CUSTOM" },
],
});
detectorsEnabled
Run only a subset of detectors.
import createPIIMasker, { PIIType } from "pii-guard-node-mini";
const masker = createPIIMasker({
detectorsEnabled: new Set([PIIType.EMAIL, PIIType.PHONE]),
});
locale
Affects some patterns/heuristics.
const maskerUS = createPIIMasker({ locale: "US" });
const maskerUK = createPIIMasker({ locale: "UK" });
const maskerIN = createPIIMasker({ locale: "IN" });
hashSalt
Only used by HASH strategy.
const masker = createPIIMasker({
defaultStrategy: "HASH",
hashSalt: "your-app-specific-salt",
});
Limits: maxTextLength, maxEntities, onLimitExceeded
These controls are designed for untrusted inputs (logs, user text, LLM output) to prevent worst-case performance.
const masker = createPIIMasker({
maxTextLength: 100_000,
maxEntities: 1_000,
onLimitExceeded: "truncate", // or "throw"
});
const res = masker.mask(veryLargeText);
if (res.warnings?.length) {
// handle warnings in your telemetry
}
Output controls: includeMaskMap, includeEntities, entityValueMode
const masker = createPIIMasker({
includeMaskMap: false,
includeEntities: true,
entityValueMode: "masked", // or "none" in production
includeMaskedValueInEntities: true,
});
logLevel
const masker = createPIIMasker({ logLevel: "warn" });
Masking strategies
Available MaskingStrategy values:
- REDACT → [REDACTED_<TYPE>]
- PARTIAL_MASK → keep some structure (e.g., last 4 digits)
- HASH → stable salted hash token like [HASH:abcd1234...] (salt set via hashSalt)
- TOKENIZE → <<PII_deadbeef>>, with the mapping stored in the in-memory vault when enableReversibility: true
- REPLACE_FAKE → replace with realistic fake values (best for demos)
- CUSTOM_FN → call a registered custom function
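To make the first three strategies concrete, here is a hypothetical stand-alone version. It assumes HMAC-SHA256 keyed with the salt for HASH and a keep-last-4 rule for PARTIAL_MASK; the library's own hash algorithm, token length, and partial-mask rules are not documented here and may differ.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical stand-ins for three built-in strategies; not library code.
type SketchStrategy = "REDACT" | "PARTIAL_MASK" | "HASH";

function applyStrategy(strategy: SketchStrategy, type: string, value: string, salt: string): string {
  switch (strategy) {
    case "REDACT":
      // Drop the value entirely; keep only the entity type.
      return `[REDACTED_${type}]`;
    case "PARTIAL_MASK": {
      // Keep the last 4 characters so the value stays recognizable.
      const keep = Math.min(4, value.length);
      return "*".repeat(value.length - keep) + value.slice(value.length - keep);
    }
    case "HASH": {
      // Keyed hash: equal inputs give equal tokens, and without the salt an
      // attacker cannot precompute hashes of likely values.
      const digest = createHmac("sha256", salt).update(value).digest("hex");
      return `[HASH:${digest.slice(0, 16)}]`;
    }
  }
}
```

HASH is stable only as long as the salt is stable, which is why hashSalt should be fixed per application rather than generated per process.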
Custom detectors
Add your own detector for domain-specific identifiers.
import { createPIIMasker, type DetectedEntity } from "pii-guard-node-mini";
const masker = createPIIMasker({});
masker.addDetector("EMPLOYEE_ID", (text) => {
  // Collect every match (a single exec() call only returns the first one).
  const entities: DetectedEntity[] = [];
  for (const m of text.matchAll(/E-\d{4,}/g)) {
    const start = m.index ?? 0;
    entities.push({
      type: "EMPLOYEE_ID",
      value: m[0],
      start,
      end: start + m[0].length,
      confidence: 1,
    });
  }
  return entities;
});
masker.updateConfig({
strategies: { EMPLOYEE_ID: "REDACT" },
});
Custom strategies
Register a named strategy and assign it per type.
import { createPIIMasker } from "pii-guard-node-mini";
const masker = createPIIMasker({});
masker.addStrategy("KEEP_LAST_2", (entity) =>
entity.value.replace(/.(?=.{2})/g, "*"),
);
masker.updateConfig({
strategies: {
API_KEY: "KEEP_LAST_2",
},
});
Masking objects (maskObject) and fieldPaths
By default, maskObject() deep-traverses and masks all string values.
If you pass fieldPaths, only matching paths are masked.
- Exact: user.email
- Wildcard segment: users.*.email
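One plausible way to evaluate these paths, assuming dot-separated segments where * matches exactly one segment and array indices count as segments (so users.*.email reaches every element of users). pathMatches and maskFields are hypothetical helpers, not exports of pii-guard-node-mini.

```typescript
// Hypothetical sketch of fieldPaths matching; not the library's code.
function pathMatches(pattern: string, path: string[]): boolean {
  const parts = pattern.split(".");
  // Same depth, and every segment is either "*" or an exact key match.
  return parts.length === path.length && parts.every((p, i) => p === "*" || p === path[i]);
}

function maskFields(
  value: unknown,
  patterns: string[],
  mask: (s: string) => string,
  path: string[] = [],
): unknown {
  if (typeof value === "string") {
    // Only string leaves whose full path matches a pattern are masked.
    return patterns.some((p) => pathMatches(p, path)) ? mask(value) : value;
  }
  if (Array.isArray(value)) {
    // Array indices become path segments ("0", "1", ...).
    return value.map((item, i) => maskFields(item, patterns, mask, [...path, String(i)]));
  }
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, maskFields(v, patterns, mask, [...path, k])]),
    );
  }
  return value; // numbers, booleans, null, undefined pass through
}
```

Under these assumptions, the example below masks both email fields and leaves both note fields untouched.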
const input = {
users: [
{ email: "jane@company.com", note: "keep this" },
{ email: "john@company.com", note: "keep this" },
],
};
const out = masker.maskObject(input, ["users.*.email"]);
Reversible masking (tokenization + unmask)
If you want to restore original values (e.g., when an LLM responds with tokens), use TOKENIZE and set enableReversibility: true.
import { createPIIMasker, MaskingStrategy } from "pii-guard-node-mini";
const masker = createPIIMasker({
enableReversibility: true,
defaultStrategy: MaskingStrategy.TOKENIZE,
});
const masked = masker.mask("Call me at +1 555 123 4567").maskedText;
const restored = masker.unmask(masked);
Important:
- Reversal only works for values tokenized by the same masker instance, unless you persist the vault with exportVault() and load it elsewhere with importVault().
- If you call clearVault(), those mappings are lost.
Tokenization behavior note:
- If enableReversibility: false, the engine still produces token-looking placeholders, but it does not store mappings (so unmask() cannot restore them, and tokens may not be deterministic).
- If you need deterministic tokens, enable reversibility and keep the instance alive for the conversation/session.
Determinism: tokenizationDeterministic
tokenizationDeterministic: true means the same original value becomes the same token within the same instance.
Notes:
- Determinism requires storing mappings, so it only applies when enableReversibility: true.
- If you disable reversibility, tokens are generated but not stored.
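The interaction between reversibility and determinism can be sketched with a two-map vault: the forward map makes unmask() possible, and the reverse map is what lets a repeated value reuse its token. TokenVault is a hypothetical class; only the <<PII_xxxxxxxx>> token shape is taken from the TOKENIZE description.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical sketch of an in-memory token vault; not the library's code.
class TokenVault {
  private readonly tokenToOriginal = new Map<string, string>();
  private readonly originalToToken = new Map<string, string>();
  private readonly storeMappings: boolean; // ~ enableReversibility
  private readonly deterministic: boolean; // ~ tokenizationDeterministic

  constructor(storeMappings: boolean, deterministic: boolean) {
    this.storeMappings = storeMappings;
    this.deterministic = deterministic;
  }

  tokenize(original: string): string {
    // Determinism needs the reverse map, so it only applies when storing.
    if (this.storeMappings && this.deterministic) {
      const existing = this.originalToToken.get(original);
      if (existing !== undefined) return existing;
    }
    let token: string;
    do {
      token = `<<PII_${randomBytes(4).toString("hex")}>>`;
    } while (this.tokenToOriginal.has(token)); // avoid rare collisions
    if (this.storeMappings) {
      this.tokenToOriginal.set(token, original);
      this.originalToToken.set(original, token);
    }
    return token;
  }

  unmask(text: string): string {
    // Unknown tokens (never stored, or cleared) are left as-is.
    return text.replace(/<<PII_[0-9a-f]{8}>>/g, (t) => this.tokenToOriginal.get(t) ?? t);
  }
}
```

Because nothing is stored when reversibility is off, a non-storing vault can neither reuse tokens nor restore them, which is the behavior the notes above describe.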
Cross-instance / external vault persistence
The token vault is in-memory by default and lost when the process exits. Use exportVault() and importVault() to persist the token → original mappings yourself (file, Redis, DB, KV, etc.) and reverse tokens later from any other instance or process.
import createPIIMasker, { MaskingStrategy } from "pii-guard-node-mini";
// Producer: tokenize with reversibility ON so the vault stores mappings.
const producer = createPIIMasker({
enableReversibility: true,
defaultStrategy: MaskingStrategy.TOKENIZE,
});
const { maskedText } = producer.mask(
"Call me at +1 555 123 4567 or email jane@company.com",
);
// Persist the vault (here: just JSON-serialize it).
const vaultJson = JSON.stringify(producer.exportVault());
// store `vaultJson` in your DB / KV / file, keyed by request id
// ---- later, in a different instance/process ----
const consumer = createPIIMasker({
// `enableReversibility` does NOT need to be true on the consumer —
// importing a vault is enough to enable unmask() for those tokens.
enableReversibility: false,
});
consumer.importVault(JSON.parse(vaultJson));
const restored = consumer.unmask(maskedText);
// restored === "Call me at +1 555 123 4567 or email jane@company.com"
Notes:
- exportVault() returns {} when the vault is empty.
- importVault() merges entries into the current vault; call clearVault() first if you want a clean slate.
- unmask() is a no-op when reversibility is off and no vault has been imported (it logs a warning and returns the input unchanged).
- The exported object contains raw PII values: encrypt it at rest, scope access, and delete it as soon as it is no longer needed.
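For the "encrypt it at rest" advice, a minimal sketch using Node's built-in AES-256-GCM. encryptVault and decryptVault are hypothetical helpers, and the key is assumed to be 32 random bytes obtained from your own secret store.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical helpers: AES-256-GCM with a fresh random 12-byte IV per blob.
// Base64 blob layout: iv (12 bytes) | auth tag (16 bytes) | ciphertext.
function encryptVault(entries: Record<string, string>, key: Buffer): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(JSON.stringify(entries), "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16 bytes by default
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptVault(blob: string, key: Buffer): Record<string, string> {
  const raw = Buffer.from(blob, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // final() throws if the key or data is wrong
  const plaintext = Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
  return JSON.parse(plaintext) as Record<string, string>;
}
```

Usage would then look like encryptVault(producer.exportVault(), key) before storing, and consumer.importVault(decryptVault(blob, key)) after loading. GCM authenticates the data, so a wrong key or a modified blob makes decryption throw instead of returning garbage.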
Built-in detectors (high level)
This library uses pattern-based detectors. Built-in coverage includes:
Core
- Email, phone, SSN, credit card (Luhn), IP, DOB, US address
- Person names (heuristic), passport (US/UK), IBAN
- AWS keys/secrets (heuristic), generic API keys (entropy), JWT, URLs with auth, MAC addresses
- Bank account / routing (conservative: requires context and routing checksum where possible)
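The "requires context and routing checksum" rule for bank routing numbers might look like this. The ABA weights (3, 7, 1) are the standard ones; the keyword list, the 30-character look-behind window, and the function names are assumptions for illustration.

```typescript
// Illustrative sketch: ABA routing-number checksum plus a context gate.
function abaChecksumValid(routing: string): boolean {
  if (!/^\d{9}$/.test(routing)) return false;
  const d = Array.from(routing, Number);
  // Weights 3, 7, 1 repeat across the nine digits.
  const sum = 3 * (d[0] + d[3] + d[6]) + 7 * (d[1] + d[4] + d[7]) + (d[2] + d[5] + d[8]);
  return sum % 10 === 0;
}

// Assumed context keywords; a real detector would likely use a longer list.
const ROUTING_CONTEXT = /\b(?:routing|aba|rtn)\b/i;

function findRoutingNumbers(text: string, window = 30): string[] {
  const found: string[] = [];
  for (const m of text.matchAll(/\b\d{9}\b/g)) {
    const start = m.index ?? 0;
    // Look only at the text just before the candidate for a context keyword.
    const before = text.slice(Math.max(0, start - window), start);
    if (ROUTING_CONTEXT.test(before) && abaChecksumValid(m[0])) found.push(m[0]);
  }
  return found;
}
```

Requiring both signals is what keeps arbitrary nine-digit IDs from being masked as routing numbers.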
Jurisdictional / domain (new in 1.1.0)
- US driver license (state-pattern + context)
- Aadhaar (Verhoeff checksum) and PAN (India)
- UK National Insurance Number + UK postcode
- Canadian SIN (Luhn-checked)
- VIN (ISO 3779 checksum)
- CVV / card expiry (context-gated)
- NPI (NPPES Luhn) and DEA number (DEA checksum) — for healthcare
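As an example of a checksum-gated detector, here is the VIN check digit (position 9), using the transliteration table and position weights from the North American VIN standard. vinCheckDigitValid is a hypothetical name; the library's VIN detector may apply additional rules.

```typescript
// Letter values from the VIN transliteration table (I, O, Q are never used).
const VIN_VALUES: Record<string, number> = {
  A: 1, B: 2, C: 3, D: 4, E: 5, F: 6, G: 7, H: 8,
  J: 1, K: 2, L: 3, M: 4, N: 5, P: 7, R: 9,
  S: 2, T: 3, U: 4, V: 5, W: 6, X: 7, Y: 8, Z: 9,
};
// Position weights; position 9 (the check digit itself) has weight 0.
const VIN_WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2];

function vinCheckDigitValid(vin: string): boolean {
  const v = vin.toUpperCase();
  if (!/^[A-HJ-NPR-Z0-9]{17}$/.test(v)) return false;
  let sum = 0;
  for (let i = 0; i < 17; i++) {
    const c = v.charAt(i);
    const value = c >= "0" && c <= "9" ? Number(c) : VIN_VALUES[c];
    sum += value * VIN_WEIGHTS[i];
  }
  const remainder = sum % 11;
  // A remainder of 10 is written as "X".
  return v.charAt(8) === (remainder === 10 ? "X" : String(remainder));
}
```

A random 17-character alphanumeric string passes this check only about one time in eleven, which is what makes the checksum useful for suppressing false positives.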
Brand-specific secret detectors (new in 1.1.0)
- Stripe (sk_live_… / sk_test_…)
- GitHub tokens (ghp_…, gho_…, ghu_…, ghs_…, ghr_…, and fine-grained PATs github_pat_…)
- Slack (xoxb-…, xoxa-…, etc.)
- Twilio Account SID (AC…)
- OpenAI keys (sk-…, sk-proj-…)
- Google API keys (AIza…)
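Brand-specific detectors can anchor on fixed prefixes, while the generic API-key detector listed under Core relies on entropy. The sketch below combines both; the prefix patterns, their minimum lengths, and the 3.5 bits-per-character threshold are illustrative assumptions, not the library's actual rules.

```typescript
// Shannon entropy in bits per character: random-looking strings score high,
// repetitive or dictionary-like strings score low.
function shannonEntropy(s: string): number {
  const chars = Array.from(s);
  const counts = new Map<string, number>();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let h = 0;
  for (const c of counts.values()) {
    const p = c / chars.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Illustrative prefix patterns (lengths are assumptions, not vendor specs).
const PREFIX_PATTERNS: Array<[string, RegExp]> = [
  ["STRIPE_KEY", /\bsk_(?:live|test)_[A-Za-z0-9]{16,}/g],
  ["GITHUB_TOKEN", /\bgh[pousr]_[A-Za-z0-9]{36,}/g],
  ["GOOGLE_API_KEY", /\bAIza[0-9A-Za-z_-]{35}/g],
];

interface SecretHit {
  type: string;
  value: string;
}

function findSecrets(text: string, minEntropy = 3.5): SecretHit[] {
  const hits: SecretHit[] = [];
  const seen = new Set<string>();
  // 1) Prefix-anchored matches: high confidence, no entropy check needed.
  for (const [type, re] of PREFIX_PATTERNS) {
    for (const m of text.matchAll(re)) {
      hits.push({ type, value: m[0] });
      seen.add(m[0]);
    }
  }
  // 2) Any other long token must look random enough to count as a key.
  for (const m of text.matchAll(/[A-Za-z0-9_-]{20,}/g)) {
    if (!seen.has(m[0]) && shannonEntropy(m[0]) >= minEntropy) {
      hits.push({ type: "GENERIC_API_KEY", value: m[0] });
    }
  }
  return hits;
}
```

The entropy gate is what stops long ordinary words such as "internationalization" from being treated as keys.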
Reporting
const masker = createPIIMasker({ preset: "balanced" });
masker.mask("Email: jane@company.com");
masker.mask("Card: 4111 1111 1111 1111");
console.log(masker.getReport());
Batch operations
Process arrays of strings in a single call.
const m = createPIIMasker({ preset: "balanced" });
const results = m.batchMask([
"Email me at jane@company.com",
"Card: 4111 1111 1111 1111",
"no pii here",
]);
// results: MaskingResult[] - same order as input
const detections = m.batchDetect([
"Email me at jane@company.com",
"no pii here",
]);
// detections: DetectedEntity[][] - same order as input
Structure-aware masking
Three helpers that understand the syntax of common inputs.
maskJSON(jsonText)
Parses a JSON string, masks every string leaf, and re-serializes. Falls back to plain mask() if the input isn't valid JSON.
const out = m.maskJSON(
JSON.stringify({ user: { email: "jane@company.com" }, count: 5 }),
);
// out.maskedText === '{"user":{"email":"[REDACTED_EMAIL]"},"count":5}'
maskHTML(html)
Tokenizes HTML-ish input with a lightweight regex scanner and masks only text nodes. Tags, attributes, and the contents of <script> / <style> are preserved verbatim. No DOM parser is used, so the file stays dependency-free.
m.maskHTML(`<p>Email: <b>jane@company.com</b></p>`);
// → "<p>Email: <b>[REDACTED_EMAIL]</b></p>"
maskMarkdown(md)
Protects fenced code blocks (```) and inline `code` spans, then masks the surrounding prose.
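A sketch of the protect-then-mask approach, assuming fenced blocks are delimited by triple backticks and inline code by single backticks. maskProseOnly is a hypothetical helper; split() with a capturing group keeps the code spans at odd indices so they can be passed through untouched.

```typescript
// Hypothetical sketch; not the library's maskMarkdown implementation.
// Alternative 1: a fenced block (three backticks ... three backticks).
// Alternative 2: an inline code span on a single line.
const CODE_SPANS = /(`{3}[\s\S]*?`{3}|`[^`\n]*`)/;

function maskProseOnly(md: string, mask: (prose: string) => string): string {
  return md
    .split(CODE_SPANS)
    // Even indices are prose, odd indices are the captured code spans.
    .map((part, i) => (i % 2 === 0 ? mask(part) : part))
    .join("");
}
```

Any string-to-string masking function can be plugged in, for example (s) => m.mask(s).maskedText.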
const md =
"Reach me: jane@company.com\n\n```\nconst e = 'jane@company.com';\n```";
m.maskMarkdown(md).maskedText;
// prose email is masked; the code block keeps the literal email
Streaming large inputs
maskStream() returns a Node.js stream.Transform so you can mask multi-GB logs / files without loading them into memory.
import { createReadStream, createWriteStream } from "fs";
import createPIIMasker from "pii-guard-node-mini";
const m = createPIIMasker({ preset: "balanced" });
createReadStream("input.log")
.pipe(m.maskStream({ emitEntities: true }))
.on("entity", (entities) => {
// optional per-chunk metrics
})
.pipe(createWriteStream("masked.log"));
Options:
- bufferBoundary (default: /[\n\r.!?]\s/): regex used to find a safe split point so detections don't get cut across chunks.
- maxBufferSize (default: 64 KiB): hard cap before the stream forces a flush.
- emitEntities: emit 'entity' events for observability.
maskStream() throws in non-Node environments where the stream module is unavailable.
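The bufferBoundary / maxBufferSize behavior can be sketched as a small Transform. splitAtSafeBoundary and createBoundaryMaskStream are hypothetical names, and this version measures maxBufferSize in UTF-16 code units rather than bytes.

```typescript
import { Transform, type TransformCallback } from "node:stream";
import { StringDecoder } from "node:string_decoder";

// Returns [textSafeToMaskNow, textToKeepBuffering].
function splitAtSafeBoundary(buffer: string, boundary: RegExp, maxBufferSize: number): [string, string] {
  const flags = boundary.flags.includes("g") ? boundary.flags : boundary.flags + "g";
  let cut = -1;
  for (const m of buffer.matchAll(new RegExp(boundary.source, flags))) {
    cut = (m.index ?? 0) + m[0].length; // end of the last boundary match
  }
  if (cut > 0) return [buffer.slice(0, cut), buffer.slice(cut)];
  // No safe split point yet: force a flush once the buffer is too large.
  if (buffer.length >= maxBufferSize) return [buffer, ""];
  return ["", buffer];
}

function createBoundaryMaskStream(
  mask: (text: string) => string,
  boundary: RegExp = /[\n\r.!?]\s/,
  maxBufferSize = 64 * 1024, // UTF-16 code units in this sketch, not bytes
): Transform {
  const decoder = new StringDecoder("utf8"); // keeps multi-byte chars intact across chunks
  let pending = "";
  return new Transform({
    transform(chunk: Buffer, _encoding: BufferEncoding, callback: TransformCallback) {
      pending += decoder.write(chunk);
      const [ready, rest] = splitAtSafeBoundary(pending, boundary, maxBufferSize);
      pending = rest;
      // Passing undefined pushes nothing for this chunk.
      callback(null, ready.length > 0 ? mask(ready) : undefined);
    },
    flush(callback: TransformCallback) {
      pending += decoder.end();
      callback(null, pending.length > 0 ? mask(pending) : undefined);
    },
  });
}
```

For example, createReadStream("input.log").pipe(createBoundaryMaskStream((s) => m.mask(s).maskedText)).pipe(createWriteStream("masked.log")) would reproduce the earlier pipeline with this sketch in place of maskStream(). Splitting only at the last boundary keeps a value that straddles two chunks in the buffer until it is complete.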
Express / Connect middleware
createMiddleware() returns a standard (req, res, next) handler.
import express from "express";
import createPIIMasker from "pii-guard-node-mini";
const m = createPIIMasker({ preset: "balanced" });
const app = express();
app.use(express.json());
app.use(
m.createMiddleware({
fields: ["body", "query", "params"], // default
responseBody: true, // mask outgoing res.json bodies
attachReport: true, // res.locals.piiReport
skip: (req) => req.path === "/healthz",
}),
);
app.post("/log", (req, res) => {
// req.body is already masked here
// res.locals.piiReport contains { entityCount, byType }
res.json({ ok: true });
});
Risk scoring
riskScore(text) returns a 0–100 score and a qualitative bucket. Useful as a pre-LLM gate, a router signal, or for compliance dashboards.
const r = m.riskScore("SSN: 123-45-6789 and card 4111 1111 1111 1111");
// r.score → e.g. 35
// r.level → "high" ("low" | "medium" | "high" | "critical")
// r.breakdown → { SSN: 22.5, CREDIT_CARD: 25.0 }
// r.dominantType → "CREDIT_CARD"
// r.entityCount → 2
Tune weights with riskWeights:
import createPIIMasker, { PIIType } from "pii-guard-node-mini";
const m = createPIIMasker({
riskWeights: { [PIIType.EMAIL]: 12 },
});
Explain & diff
console.log(m.explain("Email jane@company.com or call +1 555 123 4567"));
// Detected 2 entities:
// - [EMAIL] at pos 6-22 (confidence: 1.00) → j**e@c*****y.com
// - [PHONE] at pos 31-46 (confidence: 0.95) → ************4567
m.diff("Email jane@company.com");
// [{ start: 6, end: 22, original: "jane@company.com", masked: "...", type: "EMAIL" }]
sanitizeForLog(value)
Convenience wrapper — masks a string, deep-masks an object, passes primitives through.
logger.info({ payload: m.sanitizeForLog(req.body) });
Vault TTL
When using TOKENIZE with enableReversibility: true, you can auto-expire entries via vaultTTL (in milliseconds). Expired tokens are evicted lazily: on the next unmask() call, the next tokenization, or an explicit pruneVault() call.
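Lazy expiry can be sketched with a map that stores an expiry timestamp per token and prunes on every read and write. ExpiringVault is a hypothetical class with an injectable clock so the example does not have to wait minutes; it is not the library's vault.

```typescript
// Hypothetical sketch of lazy TTL eviction; not the library's code.
class ExpiringVault {
  private readonly entries = new Map<string, { original: string; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(token: string, original: string): void {
    this.prune(); // lazy eviction on write
    this.entries.set(token, { original, expiresAt: this.now() + this.ttlMs });
  }

  get(token: string): string | undefined {
    this.prune(); // lazy eviction on read
    return this.entries.get(token)?.original;
  }

  // Removes expired entries and returns how many were evicted.
  prune(): number {
    const t = this.now();
    let evicted = 0;
    for (const [token, entry] of this.entries) {
      if (entry.expiresAt <= t) {
        this.entries.delete(token); // deleting during Map iteration is safe
        evicted++;
      }
    }
    return evicted;
  }
}
```

Lazy eviction avoids background timers, at the cost of expired values staying in memory until the next vault operation; calling prune periodically bounds that window.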
import createPIIMasker, { MaskingStrategy } from "pii-guard-node-mini";
const m = createPIIMasker({
enableReversibility: true,
defaultStrategy: MaskingStrategy.TOKENIZE,
vaultTTL: 5 * 60 * 1000, // 5 minutes
});
const { maskedText } = m.mask("Call me at +1 555 123 4567");
// ... 6 minutes later ...
m.pruneVault(); // → number evicted
m.unmask(maskedText); // returns maskedText unchanged because the token expired
Audit log
Set auditLog: true to keep an in-memory record of every mask() / maskObject() / detect() call. Disabled by default to avoid memory growth on long-running processes.
const m = createPIIMasker({ auditLog: true });
m.mask("Email: jane@company.com");
m.getAuditLog();
// [
// { timestamp, operation: 'mask', inputLength, entityCount, byType, warnings? },
// ...
// ]
m.clearAuditLog();
Multi-locale
Run locale-aware detectors for several jurisdictions in the same instance:
import createPIIMasker, { PIIType } from "pii-guard-node-mini";
const m = createPIIMasker({
locales: ["US", "IN", "UK"],
detectorsEnabled: new Set([
PIIType.SSN,
PIIType.AADHAAR,
PIIType.UK_NIN,
PIIType.UK_POSTCODE,
]),
});
The legacy single-locale locale: 'US' field still works (and is used when locales is unset).
Pattern overrides
Replace a built-in regex without writing a full detector:
import createPIIMasker, { PIIType } from "pii-guard-node-mini";
const m = createPIIMasker({
patternOverrides: {
EMAIL: /[\w.+-]+@(?:secret|internal)\.co\b/g,
},
detectorsEnabled: new Set([PIIType.EMAIL]),
});
Note: overrides mutate the shared PATTERNS table at engine construction, so the last writer wins across instances. For per-instance isolation, register a custom detector instead.
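A per-instance alternative is to turn a regex into a detector function, so each masker keeps its own pattern. regexDetector is a hypothetical helper, and DetectedEntityLike only mirrors the fields used in the custom-detector example (type, value, start, end, confidence).

```typescript
interface DetectedEntityLike {
  type: string;
  value: string;
  start: number;
  end: number;
  confidence: number;
}

// Builds a detector that owns its own RegExp copy, so no other instance can
// change the pattern it uses.
function regexDetector(type: string, pattern: RegExp, confidence = 1): (text: string) => DetectedEntityLike[] {
  // matchAll() requires the global flag; add it if missing.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const re = new RegExp(pattern.source, flags);
  return (text: string) =>
    Array.from(text.matchAll(re), (m) => {
      const start = m.index ?? 0;
      return { type, value: m[0], start, end: start + m[0].length, confidence };
    });
}
```

For example, masker.addDetector("INTERNAL_EMAIL", regexDetector("INTERNAL_EMAIL", /[\w.+-]+@(?:secret|internal)\.co\b/)) should register the pattern on that one instance only. matchAll() iterates over an internal copy of the regex, so repeated calls do not share lastIndex state.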
License
MIT License — see LICENSE.
