scrubtext

v0.1.1

Published

a day ago

Redact secrets and PII (emails, credit cards, API keys, JWTs, private keys…) from text — zero dependencies.

Downloads

251

Vulnerabilities: 0 High, 0 Medium, 0 Low

billdaddy

redact pii secrets privacy security mask sanitize logs llm gdpr credit-card api-key

scrubtext

Redact secrets and PII from text — emails, credit cards, API keys, JWTs, private keys, and more — with zero dependencies.

Sensitive data leaks through the cracks — into log files, error trackers, analytics events, and increasingly into LLM prompts. scrubtext is a tiny, dependency-free library that finds and removes secrets and PII before that happens.

import { redact } from "scrubtext";

redact("Charge card 4242 4242 4242 4242 for jane@example.com");
// → "Charge card [CREDIT_CARD] for [EMAIL]"

redact("AWS key AKIAIOSFODNN7EXAMPLE leaked", { strategy: "mask" });
// → "AWS key ******************** leaked"

Why scrubtext?

- Zero dependencies. Pure regex + validators. Runs anywhere (Node, edge, browser, Workers) with no native bindings and no cold-start penalty.
- Low false positives. Credit cards are checked with the Luhn algorithm, IPv4 octets are range-validated, and SSN area numbers are sanity-checked.
- Built for the LLM era. Scrub user input and tool output before it reaches a model, or before model output reaches your logs.
- Extensible. Add your own detectors (employee IDs, internal URLs, anything regexable) and choose how matches are replaced.
- ESM + CJS + types, plus a CLI for shell pipelines.
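The validators mentioned above are small enough to sketch. The following is an illustration of the two techniques (Luhn checksum and octet range check), not scrubtext's actual source; the function names `passesLuhn` and `isValidIpv4` are hypothetical.

```typescript
// Hypothetical validators for illustration only; not scrubtext's internals.

/** Returns true if the digits in `candidate` pass the Luhn checksum. */
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/\D/g, ""); // drop spaces and dashes
  if (digits.length < 13 || digits.length > 19) return false;

  let sum = 0;
  let shouldDouble = false; // the rightmost digit is never doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (shouldDouble) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of d
    }
    sum += d;
    shouldDouble = !shouldDouble;
  }
  return sum % 10 === 0;
}

/** Returns true if `candidate` is four dot-separated octets, each 0 to 255. */
function isValidIpv4(candidate: string): boolean {
  const parts = candidate.split(".");
  if (parts.length !== 4) return false;
  return parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
}
```

A check like this is what keeps a random 16-digit order number from being reported as a card: the regex finds the candidate, and the validator decides whether it is plausible.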

What it detects

| Type              | Notes                                        |
| ----------------- | -------------------------------------------- |
| `email`           |                                              |
| `credit_card`     | 13–19 digits, Luhn-validated                 |
| `ssn`             | US format, invalid area numbers rejected     |
| `phone`           | International and US formats                 |
| `ipv4` / `ipv6`   | IPv4 octets range-checked                    |
| `mac_address`     |                                              |
| `jwt`             | `header.payload.signature`                   |
| `aws_access_key`  | `AKIA…`, `ASIA…`, etc.                       |
| `github_token`    | `ghp_`, `gho_`, `ghu_`, `ghs_`, `ghr_`       |
| `openai_key`      | `sk-…`, `sk-proj-…`                          |
| `slack_token`     | `xoxb-`, `xoxp-`, …                          |
| `private_key`     | PEM blocks (RSA / EC / OpenSSH / PGP / DSA)  |
| `url_credentials` | `scheme://user:pass@host`                    |
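As an example of what "invalid area numbers rejected" can mean for the `ssn` row, here is a minimal sketch. It assumes the dashed `AAA-GG-SSSS` form and the commonly documented rule that areas 000, 666, and 900–999 are never issued; `hasPlausibleSsnArea` is a hypothetical name, and scrubtext's own check may differ.

```typescript
// Hypothetical SSN sanity check; scrubtext's actual rules may be stricter or looser.

/** Returns true if `candidate` looks like AAA-GG-SSSS with an issuable area number. */
function hasPlausibleSsnArea(candidate: string): boolean {
  const match = /^(\d{3})-(\d{2})-(\d{4})$/.exec(candidate);
  if (match === null) return false; // not in the dashed US format

  const area = Number(match[1]);
  // Areas 000, 666, and 900-999 are not assigned.
  return area !== 0 && area !== 666 && area < 900;
}
```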

Install

npm install scrubtext
# or: pnpm add scrubtext  /  yarn add scrubtext  /  bun add scrubtext

API

`redact(text, options?): string`

Returns `text` with every finding replaced. By default, each match is replaced with a `[LABEL]` tag such as `[EMAIL]`.

redact("ssh in as root:[email protected]");
// → "ssh in as [URL_CREDENTIALS]"   (note: scheme required for URL creds)

`redactWithReport(text, options?): { text, findings }`

Same as redact, but also returns what was removed — handy for audit logs and metrics.

const { text, findings } = redactWithReport(input);
metrics.increment("pii.redacted", findings.length);
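For metrics, a per-type breakdown is often more useful than a single total. The sketch below assumes each finding carries a `type` field (as in the `findSecrets` example in this README); the `Finding` interface and `countByType` helper are written out here for illustration and are not part of the documented API.

```typescript
// Assumed shape, based on the fields shown in this README's findSecrets example.
interface Finding {
  type: string;
  label: string;
  value: string;
  start: number;
  end: number;
}

/** Counts findings per detector type without ever touching the matched values. */
function countByType(findings: Finding[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const f of findings) {
    counts.set(f.type, (counts.get(f.type) ?? 0) + 1);
  }
  return counts;
}
```

Counting by `type` rather than logging `value` keeps the audit trail itself free of the data that was just removed.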

`findSecrets(text, options?): Finding[]`

Scan without modifying. Returns findings sorted by position, with overlaps resolved (a JWT is never also reported as a generic token).

findSecrets("card 4242 4242 4242 4242");
// → [{ type: "credit_card", label: "CREDIT_CARD", value: "...", start: 5, end: 24 }]
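How overlaps are resolved is not specified here. One plausible policy, sketched below as an assumption, is "earliest start wins, and on a tie the longer match wins". The `Finding` shape mirrors the example output above, with `end` treated as exclusive; `resolveOverlaps` is a hypothetical helper.

```typescript
// Assumed shape, mirroring the findSecrets example output (end is exclusive).
interface Finding {
  type: string;
  label: string;
  value: string;
  start: number;
  end: number;
}

/** Keeps a non-overlapping subset: earliest start first, longest match on ties. */
function resolveOverlaps(findings: Finding[]): Finding[] {
  const sorted = [...findings].sort(
    (a, b) => a.start - b.start || (b.end - b.start) - (a.end - a.start),
  );

  const kept: Finding[] = [];
  let lastEnd = -1; // exclusive end of the last kept finding
  for (const f of sorted) {
    if (f.start >= lastEnd) {
      kept.push(f);
      lastEnd = f.end;
    }
    // Otherwise f overlaps a finding that was already kept, so it is dropped.
  }
  return kept;
}
```

Under this policy a JWT spanning characters 0 to 30 suppresses a generic-token match at 10 to 30, which is consistent with the behaviour described above.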

Replacement strategies

| Strategy            | Result for `4242 4242 4242 4242`        |
| ------------------- | --------------------------------------- |
| `"label"` (default) | `[CREDIT_CARD]`                         |
| `"mask"`            | `*******************`                   |
| `"partial"`         | `***************4242` (keeps last 4)    |
| `(finding) => …`    | anything you return                     |

redact(input, { strategy: "partial", keepLast: 4, maskChar: "•" });
redact(input, { strategy: (f) => `[redacted:${f.type}]` });
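To make the table concrete, here is a minimal sketch of how each strategy could turn a finding into its replacement string. It is an illustration under assumptions, not the library's code: `replacementFor` and `ReplaceOptions` are hypothetical names, and the defaults (`keepLast: 4`, `maskChar: "*"`) are inferred from the table rather than confirmed.

```typescript
// Assumed shape, mirroring the findSecrets example output.
interface Finding {
  type: string;
  label: string;
  value: string;
  start: number;
  end: number;
}

type Strategy = "label" | "mask" | "partial" | ((f: Finding) => string);

interface ReplaceOptions {
  strategy?: Strategy;
  keepLast?: number; // assumed default: 4
  maskChar?: string; // assumed default: "*"
}

/** Computes the replacement text for one finding. */
function replacementFor(f: Finding, opts: ReplaceOptions = {}): string {
  const { strategy = "label", keepLast = 4, maskChar = "*" } = opts;

  if (typeof strategy === "function") return strategy(f);
  if (strategy === "label") return `[${f.label}]`;
  if (strategy === "mask") return maskChar.repeat(f.value.length);

  // "partial": mask everything except the trailing `keepLast` characters.
  const keep = Math.max(0, Math.min(keepLast, f.value.length));
  const visible = f.value.slice(f.value.length - keep);
  return maskChar.repeat(f.value.length - keep) + visible;
}
```

Note that in this sketch spaces inside the match are masked too, which is why a 19-character card number yields 15 mask characters followed by `4242`.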

Custom detectors & allowlists

import { redact, defaultDetectors } from "scrubtext";

redact(text, {
  // Add to the built-ins:
  extraDetectors: [
    { type: "employee_id", label: "EMPLOYEE_ID", pattern: /\bEMP-\d{5}\b/g },
  ],
  // Never touch known-safe values:
  allowlist: ["support@example.com"],
});

// …or replace the built-in set entirely:
redact(text, { detectors: defaultDetectors.filter((d) => d.type !== "phone") });

A detector is just:

interface Detector {
  type: string;
  label: string;
  pattern: RegExp;            // must be global (g flag)
  validate?: (match: string) => boolean; // optional false-positive filter
}
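To show why the `g` flag matters and where `validate` fits, here is a sketch of applying a single detector to a string. `runDetector` is a hypothetical helper written for this example; the library's own scanning loop is not documented here and may work differently.

```typescript
interface Detector {
  type: string;
  label: string;
  pattern: RegExp; // must be global (g flag)
  validate?: (match: string) => boolean;
}

// Assumed shape, mirroring the findSecrets example output (end is exclusive).
interface Finding {
  type: string;
  label: string;
  value: string;
  start: number;
  end: number;
}

/** Collects every match of one detector, skipping those its validator rejects. */
function runDetector(text: string, detector: Detector): Finding[] {
  // Without the g flag, exec() would return the same first match forever.
  if (!detector.pattern.global) {
    throw new TypeError(`Detector "${detector.type}" must use the g flag`);
  }
  // Work on a copy so the caller's regex keeps its own lastIndex.
  const re = new RegExp(detector.pattern.source, detector.pattern.flags);

  const findings: Finding[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    const value = m[0];
    if (value === "") {
      re.lastIndex++; // step past empty matches to avoid an infinite loop
      continue;
    }
    if (detector.validate && !detector.validate(value)) continue;
    findings.push({
      type: detector.type,
      label: detector.label,
      value,
      start: m.index,
      end: m.index + value.length,
    });
  }
  return findings;
}
```

The regex proposes candidates and `validate` gets the final say, so a detector can stay permissive in its pattern and strict in its validator.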

CLI

# Redact (reads stdin or a file)
cat app.log | scrubtext
scrubtext --strategy partial secrets.env

# Scan only — list findings, don't modify
cat app.log | scrubtext scan
cat app.log | scrubtext scan --json

scrubtext [file]            Redact text (stdin if no file)
scrubtext scan [file]       List findings without modifying
  --strategy, -s <s>   label | mask | partial   (default: label)
  --keep-last <n>      Trailing chars kept by "partial"
  --mask-char <c>      Character used by mask/partial
  --json               (scan only) emit findings as JSON

A note on guarantees

Regex-based redaction is a strong, fast first line of defence, not a formal guarantee. Free-form names, postal addresses, and novel secret formats can slip through. For regulated workloads, combine scrubtext with review and the extraDetectors hook for your domain-specific identifiers.

Contributors ✨

This project follows the all-contributors specification. Contributions of any kind are welcome — code, docs, bug reports, ideas, reviews! See the emoji key for how each contribution is recognized, and open a PR or issue to get involved.

