anonyma

v1.0.0

Published

3 months ago

TypeScript-first PII detection & anonymization: 27 built-in detectors (email, SSN, IBAN, and more), 8 strategies (mask, redact, hash, AES-256 encrypt, tokenize, pseudonymize, generalize, synthesize), 6 compliance presets (GDPR, HIPAA, CCPA, PCI-DSS, SOX,

anonyma

A modern, zero-dependency TypeScript library for PII detection and data anonymization — built for the AI era.

Features

🔍 Detect 27 PII categories — email, phone, SSN, credit card, IBAN, passport, address, VIN, API keys, crypto wallets, and more
🛡️ 8 anonymization strategies — mask, redact, pseudonymize, hash (SHA-256), generalize, tokenize, encrypt (AES-GCM), synthesize
🔁 Reversible tokenization — tokenize() / detokenize() for round-trip fidelity
🤖 LLM pipeline helpers: sanitizeForLLM() / restoreFromLLM() replace PII with reversible tokens before a prompt reaches the model and restore it in the response
📋 Compliance presets — built-in gdpr, hipaa, ccpa, pci-dss, sox, and ferpa presets
🌍 Locale-aware detection — locale flags for US, UK, EU, CA, AU, BR, IN, CN, JP, KR, ZA, and global
⚡ Batch processing — anonymizeBatch(), anonymizeBatchAsync(), tokenizeBatch(), detectBatch()
🌊 Streaming support — WHATWG TransformStream wrappers (createAnonymizeStream(), createTokenizeStream())
🎯 Field-level record anonymization — dot-notation paths for nested objects
🌳 Deep object anonymization — anonymizeObject() recursively cleans entire JSON trees
✅ PII presence check — hasPII() with early-exit for fast gating
🔎 Aggressive mode — expanded, permissive patterns for obfuscated PII
🧩 Custom patterns & detectors — inject ad-hoc RegExp patterns or fully replace per-category detectors
🚫 Allowlist support — skip known-safe values by exact string or RegExp pattern
🔢 Confidence threshold — filter out low-confidence matches
🔌 Plugin architecture — extend detectors, strategies, and validators via AnonymaPlugin
🔒 Encryption — reversible AES-GCM encrypt() / decrypt() strategy (Web Crypto API)
🎭 Synthesis — format-preserving synthetic data replacement (deterministic, seeded)
✔️ Checksum validators — luhn, verhoeff, nhsMod11, cpfChecksum, vinChecksum, deaChecksum, ibanMod97, ninoValid, aadhaarFormat via "anonyma/validators"
⚡ Zero runtime dependencies — Zod is an optional peer dependency
🌲 Tree-shakeable — import only what you use
🤖 AI-ready — OpenAI/MCP tool definitions and Zod schemas included
📦 Dual ESM + CJS — works everywhere Node.js ≥ 18 runs
🔒 Strict TypeScript — no any, full declaration files

Installation

npm install anonyma
# or
pnpm add anonyma
# or
yarn add anonyma

For runtime validation and AI schema features, also install the optional peer dependency:

npm install zod

Quick Start

import {
  anonymize, anonymizeAsync, detect, hasPII,
  anonymizeObject, anonymizeRecord, createAnonymizer,
  tokenize, tokenizeAsync, detokenize, sanitizeForLLM, restoreFromLLM,
  anonymizeBatch, anonymizeBatchAsync,
} from "anonyma";

// ── Detect PII ─────────────────────────────────────────────────────────────
const matches = detect("Contact alice@example.com or call 555-867-5309.");
// [
//   { category: "email", value: "alice@example.com", start: 8,  end: 25, confidence: 0.99 },
//   { category: "phone", value: "555-867-5309",      start: 34, end: 46, confidence: 0.9  },
// ]

// ── Fast PII presence check (stops at first match) ─────────────────────────
if (hasPII(commentText)) {
  return { error: "Comment contains personal information." };
}

// ── Anonymize free text (default: redact all PII) ──────────────────────────
const { text } = anonymize("Contact alice@example.com or call 555-867-5309.");
// "Contact [REDACTED] or call [REDACTED]."

// ── Async anonymization (required for hash/encrypt strategies) ─────────────
const { text: hashed } = await anonymizeAsync("alice@example.com", {
  defaultStrategy: { strategy: "hash", pepper: "my-pepper" },
});
// "5f3e4b3a9c1d8f2a"

// ── Consistent token mapping ───────────────────────────────────────────────
anonymize("From alice@example.com (alice@example.com)", { consistentTokens: true }).text;
// "From EMAIL_1 (EMAIL_1)"

// ── Global replacement ─────────────────────────────────────────────────────
anonymize("alice@example.com / 555-867-5309", { globalReplacement: "***" }).text;
// "*** / ***"

// ── Aggressive mode — catch obfuscated PII ────────────────────────────────
anonymize("user [at] example [dot] com", { aggressive: true }).text;
// "[REDACTED]"

// ── Custom pattern ─────────────────────────────────────────────────────────
anonymize("Order ACME-001234 confirmed.", {
  customPatterns: [{ pattern: /\bACME-\d{6}\b/g, label: "[ORDER_ID]" }],
  rules: [],
}).text;
// "Order [ORDER_ID] confirmed."

// ── Deep object anonymization ──────────────────────────────────────────────
anonymizeObject({
  user: { email: "alice@example.com", phone: "555-867-5309" },
  notes: ["Call later", "IP: 192.168.1.1"],
});
// {
//   user: { email: "[REDACTED]", phone: "[REDACTED]" },
//   notes: ["Call later", "[REDACTED]"],
// }

// ── Custom strategy per category ───────────────────────────────────────────
const { text: masked } = anonymize("alice@example.com and 192.168.1.1", {
  rules: [
    { category: "email", strategy: { strategy: "mask", keepLeading: 1, keepTrailing: 3 } },
    { category: "ipv4",  strategy: { strategy: "redact", label: "[IP REMOVED]" } },
  ],
});
// "a*************com and [IP REMOVED]"

// ── Format-preserving mask ─────────────────────────────────────────────────
anonymize("123-45-6789", {
  rules: [{ category: "ssn", strategy: { strategy: "mask", preserveFormat: true } }],
}).text;
// "000-00-0000"

// ── Enable only specific categories ───────────────────────────────────────
anonymize("alice@example.com, call 555-867-5309", {
  enabledCategories: { email: true },
}).text;
// "[REDACTED], call 555-867-5309"

// ── Allowlist — skip known-safe values ────────────────────────────────────
anonymize("noreply@example.com or alice@example.com", {
  allowlist: ["noreply@example.com"],
}).text;
// "noreply@example.com or [REDACTED]"

// ── Confidence threshold ──────────────────────────────────────────────────
anonymize("maybe a name here", { confidenceThreshold: 0.8 }).text;
// Low-confidence matches are left untouched

// ── Compliance presets ─────────────────────────────────────────────────────
anonymize(medicalNote, { preset: "hipaa" }).text;
anonymize(userData,    { preset: "gdpr"  }).text;

// ── Locale-aware detection ─────────────────────────────────────────────────
anonymize("NHS: 943 476 5919", { locales: ["uk"] }).text;
// "[REDACTED]"

// ── Anonymize object fields ─────────────────────────────────────────────────
anonymizeRecord(
  { name: "Alice", email: "alice@example.com", age: "27" },
  {
    email: { strategy: { strategy: "redact" } },
    age:   { strategy: { strategy: "generalize" } },  // 27 → "20-29"
  }
);
// { name: "Alice", email: "[REDACTED]", age: "20-29" }

// ── Reusable anonymizer ─────────────────────────────────────────────────────
const anonymizer = createAnonymizer({
  categories: ["email", "phone"],
  defaultStrategy: { strategy: "pseudonymize", seed: "my-secret" },
  consistentTokens: true,
});
anonymizer.anonymize("alice@example.com").text;
// "id_3a7f1c2b9e4d0f1a"
anonymizer.hasPII("no pii here"); // false

// ── Reversible tokenization ────────────────────────────────────────────────
const { text: tokenized, mapping } = tokenize("alice@example.com called 555-867-5309");
// text: "[EMAIL_0001] called [PHONE_0001]"
const { text: restored } = detokenize(tokenized, mapping);
// "alice@example.com called 555-867-5309"

// ── LLM pipeline — sanitize then restore ──────────────────────────────────
const { text: sanitized, mapping: llmMapping } = sanitizeForLLM("Send invoice to alice@example.com");
// "Send invoice to [EMAIL_0001]"
const llmResponse = await callLLM(sanitized);
const { text: final } = restoreFromLLM(llmResponse, llmMapping);
// Tokens in the LLM's response are swapped back to the original values

// ── Batch processing ───────────────────────────────────────────────────────
const results = anonymizeBatch(["alice@example.com", "192.0.2.1", "123-45-6789"]);
for (const r of results) {
  if (r.ok) console.log(r.value.text);
  else console.error(`Item ${r.index} failed:`, r.error.message);
}

// ── AES-GCM encryption (reversible) ───────────────────────────────────────
import { encrypt, decrypt } from "anonyma";
const ciphertext = await encrypt("alice@example.com", { passphrase: "s3cr3t" });
// "base64:<iv>:<ciphertext>"
const original = await decrypt(ciphertext, { passphrase: "s3cr3t" });
// "alice@example.com"

// ── Synthetic data replacement ────────────────────────────────────────────
anonymize("alice@example.com", {
  defaultStrategy: { strategy: "synthesize", seed: "project-x" },
}).text;
// a different, structurally valid email address (deterministic for a given seed)

Strategies

| Strategy | Description | Async | Deterministic | Reversible |
|----------------|---------------------------------------------------------------------|:-----:|:-------------:|:----------:|
| redact | Replace with [REDACTED] label (customizable) | ❌ | ✅ | ❌ |
| mask | Replace inner chars with *; optional format-preserving mode | ❌ | ✅ | ❌ |
| pseudonymize | Replace with a hex pseudonym (seeded or random) | ❌ | ✅† | ❌ |
| hash | SHA-256 one-way hash with optional pepper (Web Crypto) | ✅ | ✅ | ❌ |
| generalize | Replace numbers with a bucket range (e.g. 27 → 20-29) | ❌ | ✅ | ❌ |
| tokenize | Replace with a reversible placeholder token | ❌ | ✅ | ✅ |
| encrypt | AES-256-GCM encryption via passphrase or raw key (Web Crypto) | ✅ | ❌‡ | ✅ |
| synthesize | Format-preserving synthetic replacement (seeded, no real PII) | ❌ | ✅† | ❌ |

† Deterministic when seed is provided. ‡ Random IV per encryption; decryptable with the same key.
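The mask and generalize rows describe simple string transformations. As a rough illustration only, the sketch below shows one way such transformations could work. The names maskSketch and generalizeSketch, their parameters, and the bucket width of 10 are assumptions made for this example, not anonyma's implementation.

```typescript
// Illustrative sketches; names and option shapes are hypothetical, not anonyma's API.

/** Keep `keepLeading` and `keepTrailing` characters and replace the rest with `*`. */
function maskSketch(value: string, keepLeading = 0, keepTrailing = 0): string {
  const chars = Array.from(value);
  // If the kept parts would cover the whole value, mask everything instead.
  if (keepLeading + keepTrailing >= chars.length) {
    return "*".repeat(chars.length);
  }
  const hidden = chars.length - keepLeading - keepTrailing;
  return (
    chars.slice(0, keepLeading).join("") +
    "*".repeat(hidden) +
    chars.slice(chars.length - keepTrailing).join("")
  );
}

/** Replace a non-negative integer with its bucket range for the given width. */
function generalizeSketch(n: number, width = 10): string {
  const low = Math.floor(n / width) * width;
  return `${low}-${low + width - 1}`;
}

console.log(maskSketch("secret", 1, 1)); // "s****t"
console.log(generalizeSketch(27));       // "20-29"
```

Both functions are pure, so the same input always produces the same output, which matches the Deterministic column for these two strategies.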

Detected PII Categories

Personal Information

| Category | Examples | Validator / Notes |
|-------------------|---------------------------------------------|-------------------------------------|
| email | alice@example.com | RFC 5321 regex |
| phone | +1 (555) 867-5309, 415.555.2671 | Multi-format regex |
| ssn | 123-45-6789 | Regex + exclusions |
| name | Dear Alice Smith, Patient: John Doe | Heuristic (greeting/title context) |
| date-of-birth | 1990-04-15, April 15, 1990 | Multi-format regex |
| address | 123 Main St, Springfield, IL 62701 | Pattern + keyword heuristic |
| passport | A12345678, P1234567 | Country-specific patterns |
| drivers-license | D123-4567-8901, F123-456-78-910-1 | Multi-state/country patterns |
| national-id | Aadhaar, NHS number, NINO, CPF, etc. | Country-specific patterns |

Financial

| Category | Examples | Validator / Notes |
|--------------------|-------------------------------------------|-------------------------------------|
| credit-card | 4111 1111 1111 1111 | Luhn algorithm |
| iban | GB82 WEST 1234 5698 7654 32 | MOD-97 (ISO 13616) |
| bank-account | Routing + account number pairs | Pattern regex |
| cryptocurrency | BTC, ETH, XRP wallet addresses | Format regex per chain |
| tax-id | EIN 12-3456789, VAT GB123456789 | Multi-country patterns |

Healthcare

| Category | Examples | Validator / Notes |
|-------------------|---------------------------------------------|-------------------------------------|
| medical-record | MRN: 1234567, MR#00456789 | Keyword + pattern |
| health-insurance | Subscriber ID, Group/Member # | Pattern regex |
| prescription | Rx# 1234567, DEA numbers | Pattern + DEA checksum |

Digital Identity

| Category | Examples | Validator / Notes |
|----------------|------------------------------------------------|------------------------------|
| ipv4 | 192.168.1.1, 10.0.0.0/8 | Octet-range regex |
| ipv6 | 2001:0db8::8a2e:0370:7334 | Regex |
| url | https://example.com/path?q=1 | http/https (scheme-required) |
| api-key | Bearer tokens, AWS keys, GitHub PATs, etc. | Pattern regex per provider |
| social-media | @username, profile URLs, handles | Pattern regex |

Vehicles & Transportation

| Category | Examples | Validator / Notes |
|-------------------|---------------------------------------|------------------------------|
| vin | 1HGCM82633A004352 | VIN checksum (pos 9) |
| license-plate | ABC-1234, AB12CDE | Multi-country patterns |
| tracking-number | FedEx, UPS, USPS, DHL tracking codes | Pattern regex per carrier |
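
For readers unfamiliar with the "VIN checksum (pos 9)" note, here is a minimal sketch of the North American check-digit rule: each character is mapped to a number, multiplied by a positional weight, and the sum modulo 11 must equal the ninth character (with 10 written as X). The function name vinCheckSketch is hypothetical; this is not the library's vinChecksum source.

```typescript
// Sketch of the VIN position-9 check digit; not anonyma's implementation.
const VIN_VALUES: Record<string, number> = {
  A: 1, B: 2, C: 3, D: 4, E: 5, F: 6, G: 7, H: 8,
  J: 1, K: 2, L: 3, M: 4, N: 5, P: 7, R: 9,
  S: 2, T: 3, U: 4, V: 5, W: 6, X: 7, Y: 8, Z: 9,
};
const VIN_WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2];

function vinCheckSketch(vin: string): boolean {
  const v = vin.toUpperCase();
  // A VIN has 17 characters and never uses the letters I, O, or Q.
  if (!/^[A-HJ-NPR-Z0-9]{17}$/.test(v)) return false;

  let sum = 0;
  for (let i = 0; i < 17; i++) {
    const ch = v.charAt(i);
    const value = ch >= "0" && ch <= "9" ? Number(ch) : (VIN_VALUES[ch] ?? 0);
    sum += value * (VIN_WEIGHTS[i] ?? 0);
  }

  // A remainder of 10 is written as "X" in position 9.
  const remainder = sum % 11;
  const expected = remainder === 10 ? "X" : String(remainder);
  return v.charAt(8) === expected;
}

console.log(vinCheckSketch("1HGCM82633A004352")); // true
```

A checksum like this lets a detector reject 17-character strings that merely look like VINs, which reduces false positives compared with a regex alone.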

Government & Legal

| Category | Examples | Validator / Notes |
|-----------------------|---------------------------------------|------------------------------|
| case-number | 2023-CV-001234, CR-2022-5678 | Pattern regex |
| company-registration | EIN, CRN, SIREN, ABN, etc. | Multi-country patterns |

Compliance Presets

Built-in presets pre-configure which categories are detected and which default strategy is applied.

| Preset | Categories Covered | Default Strategy |
|------------|--------------------------------------------------------------------------|-------------------|
| gdpr | All personal, financial, healthcare, and digital identity categories | pseudonymize |
| hipaa | All 18 HIPAA Safe Harbor PHI identifiers | redact |
| ccpa | Consumer identifiers, financial data, online activity | redact |
| pci-dss | Credit card, cardholder name, bank account, address | mask (last 4) |
| sox | Financial identifiers for audit trails | redact |
| ferpa | Student PII: name, SSN, DOB, address | redact |

import { anonymize, getPreset, PRESET_REGISTRY } from "anonyma";

// Apply a preset
anonymize(text, { preset: "hipaa" });

// Extend a preset — add API key detection on top of GDPR
anonymize(text, { preset: "gdpr", enabledCategories: { "api-key": true } });

// Inspect a preset's configuration
const hipaa = getPreset("hipaa");
console.log(hipaa.categories); // [...18 categories...]

Reversible Tokenization

tokenize() replaces PII with opaque placeholder tokens and returns a mapping for lossless restoration.

import { tokenize, detokenize } from "anonyma";

const { text, mapping, tokens } = tokenize("alice@example.com called 555-867-5309", {
  format: "bracket",  // "[EMAIL_0001]", "[PHONE_0001]" (default)
  // format: "angle",   // "<Email_1>", "<Phone_1>" (LLM-friendly)
  // format: "custom", tokenTemplate: (cat, n) => `{{${cat}_${n}}}`,
  deterministic: true, // same value → same token (default: true)
});
// text: "[EMAIL_0001] called [PHONE_0001]"

const { text: restored } = detokenize(text, mapping);
// "alice@example.com called 555-867-5309"
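
To make the round trip concrete, the following sketch shows the general idea behind deterministic, reversible tokenization: each distinct value gets one token, and a token-to-value mapping allows lossless restoration. It is a simplified illustration under stated assumptions: tokenizeSketch and detokenizeSketch are hypothetical names, the mapping is a plain object, and only a basic email regex is handled.

```typescript
// Hypothetical helpers; a real detector covers far more than this email regex.
type TokenMapping = Record<string, string>; // token -> original value

function tokenizeSketch(input: string): { text: string; mapping: TokenMapping } {
  const mapping: TokenMapping = {};
  const seen = new Map<string, string>(); // original value -> token
  const text = input.replace(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, (value) => {
    let token = seen.get(value);
    if (token === undefined) {
      // First time this value appears: allocate the next numbered token.
      token = `[EMAIL_${String(seen.size + 1).padStart(4, "0")}]`;
      seen.set(value, token);
      mapping[token] = value;
    }
    return token;
  });
  return { text, mapping };
}

function detokenizeSketch(text: string, mapping: TokenMapping): string {
  // Swap back every bracket token that has an entry; leave unknown tokens untouched.
  return text.replace(/\[[A-Z]+_\d+\]/g, (token) => mapping[token] ?? token);
}

const input = "alice@example.com wrote to bob@example.org, cc alice@example.com";
const tokenizedSketch = tokenizeSketch(input);
console.log(tokenizedSketch.text);
// "[EMAIL_0001] wrote to [EMAIL_0002], cc [EMAIL_0001]"
console.log(detokenizeSketch(tokenizedSketch.text, tokenizedSketch.mapping) === input); // true
```

Reusing the same token for a repeated value is what keeps references consistent across a document, at the cost of revealing that two placeholders refer to the same underlying value.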

LLM Pipeline Helpers

Sanitize user input before sending it to a language model, then restore PII from the model's response.

import { sanitizeForLLM, restoreFromLLM } from "anonyma";

// 1. Replace PII with bracket tokens before the LLM sees it
const { text: prompt, mapping } = sanitizeForLLM("Send invoice to alice@example.com");
// prompt: "Send invoice to [EMAIL_0001]"

// 2. Call your LLM
const response = await callLLM(prompt);
// response: "I have sent the invoice to [EMAIL_0001]."

// 3. Restore original values in the LLM's response
const { text: final, unresolved } = restoreFromLLM(response, mapping);
// final: "I have sent the invoice to alice@example.com."
// unresolved: [] (any tokens the LLM dropped are reported here)
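
The unresolved list is worth a closer look, because a model may paraphrase or drop a token. The sketch below shows one plausible way to restore values and report the tokens that never appeared in the response. It assumes a plain token-to-value object and bracket tokens shaped like [EMAIL_0001]; restoreSketch is a hypothetical name, not the library's restoreFromLLM.

```typescript
// Hypothetical sketch of restore-with-reporting; not anonyma's implementation.
function restoreSketch(
  response: string,
  mapping: Record<string, string>,
): { text: string; unresolved: string[] } {
  const found = new Set<string>();
  // Single pass over the response so restored values are never re-scanned.
  const text = response.replace(/\[[A-Z]+_\d+\]/g, (token) => {
    const original = mapping[token];
    if (original === undefined) return token; // not one of our tokens
    found.add(token);
    return original;
  });
  // Any token we handed to the model but did not see again is unresolved.
  const unresolved = Object.keys(mapping).filter((token) => !found.has(token));
  return { text, unresolved };
}

const restoredSketch = restoreSketch("I have sent the invoice to [EMAIL_0001].", {
  "[EMAIL_0001]": "alice@example.com",
  "[PHONE_0001]": "555-867-5309",
});
console.log(restoredSketch.text);       // "I have sent the invoice to alice@example.com."
console.log(restoredSketch.unresolved); // ["[PHONE_0001]"]
```

A non-empty unresolved list is a useful signal to log or to retry the request, since it means some sanitized value is missing from the final answer.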

Batch Processing

Process arrays of strings efficiently. Each item is independent — failures don't abort the batch.

import { anonymizeBatch, anonymizeBatchAsync, tokenizeBatch, detectBatch } from "anonyma";

// Sync batch
const results = anonymizeBatch(["alice@example.com", "192.0.2.1", "bad\x00input"]);
for (const r of results) {
  if (r.ok) console.log(r.value.text);
  else console.error(`Item ${r.index}:`, r.error.message);
}

// Async batch (supports hash/encrypt strategies, runs in parallel)
const asyncResults = await anonymizeBatchAsync(texts, {
  defaultStrategy: { strategy: "hash", pepper: "secret" },
  concurrency: 10,
});

// Batch tokenization
const tokenResults = tokenizeBatch(texts, { format: "bracket" });

// Batch detection only
const detections = detectBatch(texts);
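
The "failures don't abort the batch" behaviour can be pictured as a per-item try/catch that returns a discriminated union. The sketch below is an assumption-laden illustration: BatchResult and batchSketch are hypothetical names, and the result shape is inferred from the ok, value, index, and error fields used in the loop above.

```typescript
// Hypothetical result shape, inferred from the fields used in the README examples.
type BatchResult<T> =
  | { ok: true; index: number; value: T }
  | { ok: false; index: number; error: Error };

function batchSketch<I, O>(
  items: readonly I[],
  fn: (item: I) => O,
): BatchResult<O>[] {
  return items.map((item, index): BatchResult<O> => {
    try {
      return { ok: true, index, value: fn(item) };
    } catch (err) {
      // Normalize anything thrown into an Error so callers can read .message.
      const error = err instanceof Error ? err : new Error(String(err));
      return { ok: false, index, error };
    }
  });
}

// Stand-in for a per-item operation that rejects control characters.
const shout = (s: string): string => {
  if (s.includes("\x00")) throw new Error("control character in input");
  return s.toUpperCase();
};

for (const r of batchSketch(["first", "second", "bad\x00input"], shout)) {
  if (r.ok) console.log(r.value);
  else console.error(`Item ${r.index}:`, r.error.message);
}
```

Returning results in input order with an explicit index means callers can match failures back to their source rows without extra bookkeeping.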

Streaming

Process data through WHATWG TransformStream pipelines (Node ≥ 18 / browsers with Streams API).

import {
  createAnonymizeStream,
  createAnonymizeStreamAsync,
  createTokenizeStream,
} from "anonyma/stream";

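// Anonymize line by line
// Note: ReadableStream.from() needs Node.js ≥ 20.6; on Node 18, build the source with new ReadableStream({ start(controller) { ... } })
const readable = ReadableStream.from(lines);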
const anonymized = readable.pipeThrough(
  createAnonymizeStream({ defaultStrategy: { strategy: "mask" } })
);
for await (const { text } of anonymized) {
  process.stdout.write(text + "\n");
}

// Async stream (for hash/encrypt strategies)
const asyncStream = createAnonymizeStreamAsync({ preset: "hipaa" });

// Tokenize stream
const tokenStream = createTokenizeStream({ format: "bracket" });
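
As background on what a TransformStream wrapper does, here is a small self-contained sketch that wraps a synchronous per-chunk function in a WHATWG TransformStream and pipes a stream of lines through it. createMapStreamSketch, redactDigits, and collect are hypothetical helpers for illustration, and the digit-replacing function is only a stand-in for real anonymization.

```typescript
// Sketch: wrap a synchronous per-chunk function in a WHATWG TransformStream.
function createMapStreamSketch<I, O>(fn: (chunk: I) => O): TransformStream<I, O> {
  return new TransformStream<I, O>({
    transform(chunk, controller) {
      controller.enqueue(fn(chunk));
    },
  });
}

// Stand-in for an anonymizer: replaces every digit with "#".
const redactDigits = (line: string): string => line.replace(/\d/g, "#");

// Drain a stream into an array using a reader (works on Node 18 and in browsers).
async function collect<T>(stream: ReadableStream<T>): Promise<T[]> {
  const out: T[] = [];
  const reader = stream.getReader();
  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    out.push(result.value);
  }
  return out;
}

async function demo(): Promise<string[]> {
  const source = new ReadableStream<string>({
    start(controller) {
      controller.enqueue("call 555-867-5309");
      controller.enqueue("no digits here");
      controller.close();
    },
  });
  return collect(source.pipeThrough(createMapStreamSketch(redactDigits)));
}

demo().then((lines) => console.log(lines));
// [ "call ###-###-####", "no digits here" ]
```

Because each chunk is transformed independently, this pattern suits line-oriented input; PII that spans a chunk boundary would need buffering, which this sketch does not attempt.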

Encryption Strategy

Reversible AES-256-GCM encryption using the Web Crypto API. Requires Node.js ≥ 18.

import { anonymizeAsync, decrypt, encrypt } from "anonyma";

// Encrypt with a passphrase (key derived via PBKDF2)
const ciphertext = await encrypt("alice@example.com", { passphrase: "s3cr3t" });
// "base64:<iv>:<ciphertext>"

// Or supply raw key bytes (16 or 32 bytes for AES-128/256)
const ct = await encrypt("alice@example.com", { keyBytes: myKeyBytes, encoding: "hex" });

// Decrypt
const original = await decrypt(ciphertext, { passphrase: "s3cr3t" });
// "alice@example.com"

// Use as a strategy inside anonymize
const { text } = await anonymizeAsync("alice@example.com", {
  rules: [{ category: "email", strategy: { strategy: "encrypt", passphrase: "s3cr3t" } }],
});

Synthesis Strategy

Replace PII with structurally valid, format-preserving synthetic data. Deterministic when a seed is provided.

import { anonymize, synthesize } from "anonyma";

synthesize("alice@example.com", { category: "email", seed: "project-x" });
// a synthetic email address, identical on every run with this seed

anonymize("Call 555-867-5309 or email alice@example.com", {
  defaultStrategy: { strategy: "synthesize", seed: "my-seed" },
}).text;
// "Call +1-312-408-7291 or email <synthetic address>"

Validators

Standalone checksum and format validators are exported from "anonyma/validators". They are used internally by detectors but can also be used directly.

import {
  luhn,         // Luhn (ISO/IEC 7812) — credit cards
  verhoeff,     // Verhoeff — Indian Aadhaar
  nhsMod11,     // NHS mod-11 — UK NHS numbers
  cpfChecksum,  // CPF — Brazilian tax IDs
  vinChecksum,  // VIN position-9 check digit
  deaChecksum,  // DEA number check — US prescriptions
  ibanMod97,    // MOD-97 — IBAN (ISO 13616)
  ninoValid,    // NINO format — UK National Insurance
  aadhaarFormat,// Aadhaar format — Indian national ID
} from "anonyma/validators";

luhn("4111111111111111");  // true
ibanMod97("GB82WEST12345698765432");  // true
nhsMod11("943-476-5919");   // true
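
For context on what two of these checks compute, the sketch below implements the Luhn and IBAN MOD-97 algorithms from their public definitions. These are independent illustrations with hypothetical names (luhnSketch, ibanMod97Sketch), not the code shipped in "anonyma/validators", and the accepted input formats are assumptions.

```typescript
// Independent sketches of two public checksum algorithms; not anonyma's source.

/** Luhn: double every second digit from the right, sum digits, total must be divisible by 10. */
function luhnSketch(input: string): boolean {
  const digits = input.replace(/[\s-]/g, "");
  if (!/^\d+$/.test(digits)) return false;
  let sum = 0;
  let doubleIt = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}

/** IBAN MOD-97: move the first four characters to the end, map letters to 10..35, remainder must be 1. */
function ibanMod97Sketch(input: string): boolean {
  const iban = input.replace(/\s/g, "").toUpperCase();
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(iban)) return false;
  const rearranged = iban.slice(4) + iban.slice(0, 4);
  let remainder = 0;
  for (let i = 0; i < rearranged.length; i++) {
    const ch = rearranged.charAt(i);
    // Letters expand to two digits (A = 10 ... Z = 35); digits stay as they are.
    const expanded = ch >= "A" ? String(ch.charCodeAt(0) - 55) : ch;
    for (let j = 0; j < expanded.length; j++) {
      // Fold digit by digit so the number never exceeds safe integer range.
      remainder = (remainder * 10 + Number(expanded.charAt(j))) % 97;
    }
  }
  return remainder === 1;
}

console.log(luhnSketch("4111 1111 1111 1111"));              // true
console.log(ibanMod97Sketch("GB82 WEST 1234 5698 7654 32")); // true
```

Folding the remainder one digit at a time avoids BigInt and keeps the MOD-97 check usable in any JavaScript runtime.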

Plugin Architecture

Extend anonyma with custom detectors, strategies, and validators using AnonymaPlugin.

import { createAnonymizer } from "anonyma";
import type { AnonymaPlugin } from "anonyma";

const myPlugin: AnonymaPlugin = {
  name: "my-plugin",
  detectors: {
    // Override or add custom category detection
    "employee-id": (text) => {
      const matches = [];
      for (const m of text.matchAll(/\bEMP-\d{6}\b/g)) {
        matches.push({ category: "employee-id", value: m[0], start: m.index!, end: m.index! + m[0].length, confidence: 0.95 });
      }
      return matches;
    },
  },
};

const anonymizer = createAnonymizer({ plugins: [myPlugin] });
anonymizer.anonymize("Employee EMP-001234 called.").text;
// "Employee [REDACTED] called."

AI Integration

OpenAI / Anthropic Function Calling

import {
  ANONYMIZE_TOOL_DEFINITION,
  DETECT_TOOL_DEFINITION,
  HAS_PII_TOOL_DEFINITION,
  ANONYMIZE_OBJECT_TOOL_DEFINITION,
} from "anonyma/schemas";

// Pass directly to the tools array:
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  tools: [
    { type: "function", function: ANONYMIZE_TOOL_DEFINITION },
    { type: "function", function: HAS_PII_TOOL_DEFINITION },
    { type: "function", function: ANONYMIZE_OBJECT_TOOL_DEFINITION },
  ],
  messages: [{ role: "user", content: "Anonymize: alice@example.com" }],
});

MCP Tool Definition

import { ANONYMA_MANIFEST } from "anonyma/schemas";

// Use the capability manifest to let an AI agent discover anonyma's tools:
const manifest = ANONYMA_MANIFEST;
console.log(manifest.capabilities.anonymize.strategies);

Runtime Validation with Zod

import { AnonymizeOptionsSchema } from "anonyma/schemas";
import { anonymize } from "anonyma";

// Validate untrusted input (e.g. from an HTTP request):
const opts = AnonymizeOptionsSchema.parse(req.body.options);
const result = anonymize(req.body.text as string, opts);

Subpath Imports

| Import path | Contents |
|--------------------------|---------------------------------------------------------------------------|
| "anonyma" | Core API (no Zod dependency) |
| "anonyma/detectors" | All detectors + DETECTOR_REGISTRY + AGGRESSIVE_DETECTOR_REGISTRY |
| "anonyma/schemas" | Zod schemas, JSON schemas, AI/MCP tool definitions (requires zod) |
| "anonyma/validators" | Standalone checksum validators (Luhn, MOD-97, NHS, VIN, etc.) |
| "anonyma/stream" | WHATWG TransformStream wrappers (Node ≥ 18 / browsers) |
| "anonyma/crypto" | Low-level Web Crypto utilities |

API Documentation

Full API reference: docs/api.md

Requirements

Node.js ≥ 18 (for hash, encrypt, and streaming features — Web Crypto + TransformStream)
TypeScript ≥ 5.0 (optional)
Zod ≥ 3.23 (optional peer-dependency for "anonyma/schemas")

Contributing

See CONTRIBUTING.md.

License

MIT — see LICENSE.
