scrubtext
v0.1.1
Redact secrets and PII from text — emails, credit cards, API keys, JWTs, private keys, and more — with zero dependencies.
Sensitive data leaks through the cracks — into log files, error trackers, analytics
events, and increasingly into LLM prompts. scrubtext is a tiny, dependency-free
library that finds and removes secrets and PII before that happens.
import { redact } from "scrubtext";
redact("Charge card 4242 4242 4242 4242 for [email protected]");
// → "Charge card [CREDIT_CARD] for [EMAIL]"
redact("AWS key AKIAIOSFODNN7EXAMPLE leaked", { strategy: "mask" });
// → "AWS key ******************** leaked"
Why scrubtext?
- Zero dependencies. Pure regex + validators. Runs anywhere — Node, edge, browser, Workers — with no native bindings and no cold-start penalty.
- Low false positives. Credit cards are checked with the Luhn algorithm, IPv4 octets are range-validated, and SSN area numbers are sanity-checked.
- Built for the LLM era. Scrub user input and tool output before it reaches a model, or before model output reaches your logs.
- Extensible. Add your own detectors (employee IDs, internal URLs, anything regexable) and choose how matches are replaced.
- ESM + CJS + types, plus a CLI for shell pipelines.
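To make the Luhn check mentioned in the list above concrete, here is a minimal TypeScript sketch. It is not scrubtext's source: the function name `passesLuhn` is hypothetical, and the 13 to 19 digit bound is borrowed from the detection table in this README.

```typescript
// Illustrative Luhn checksum; an assumption about how such a validator
// could look, not the library's actual implementation.
function passesLuhn(candidate: string): boolean {
  // Keep digits only, so "4242 4242 4242 4242" and "4242-4242-..." both work.
  const digits = candidate.replace(/\D/g, "");
  if (digits.length < 13 || digits.length > 19) return false;

  let sum = 0;
  let doubleNext = false; // the rightmost digit is never doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of the product
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}
```

A validator like this rejects most random digit runs (order numbers, timestamps) that merely look like card numbers, which is where the reduction in false positives comes from.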
What it detects
| Type | Notes |
| ----------------- | -------------------------------------------- |
| email | |
| credit_card | 13–19 digits, Luhn-validated |
| ssn | US format, invalid area numbers rejected |
| phone | International and US formats |
| ipv4 / ipv6 | IPv4 octets range-checked |
| mac_address | |
| jwt | header.payload.signature |
| aws_access_key | AKIA…, ASIA…, etc. |
| github_token | ghp_, gho_, ghu_, ghs_, ghr_ |
| openai_key | sk-…, sk-proj-… |
| slack_token | xoxb-, xoxp-, … |
| private_key | PEM blocks (RSA / EC / OpenSSH / PGP / DSA) |
| url_credentials | scheme://user:pass@host |
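The range and area checks noted in the table can be sketched as small predicates. The names `isValidIpv4` and `hasValidSsnArea` are hypothetical, and the exact rules scrubtext applies may differ; the SSN rule below uses the area numbers that are never issued (000, 666, and 900 to 999).

```typescript
// Illustrative validators; assumptions, not scrubtext's actual source.

// Accept a dotted quad only when every octet is 0..255.
function isValidIpv4(candidate: string): boolean {
  const parts = candidate.split(".");
  if (parts.length !== 4) return false;
  return parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
}

// Accept AAA-GG-SSSS only when the area (first three digits) is issuable.
function hasValidSsnArea(candidate: string): boolean {
  const match = /^(\d{3})-\d{2}-\d{4}$/.exec(candidate);
  if (!match) return false;
  const area = Number(match[1]);
  return area !== 0 && area !== 666 && area < 900;
}
```

Checks like these run after the regex has matched, so the pattern can stay simple while the validator filters out look-alikes such as version strings or malformed IDs.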
Install
npm install scrubtext
# or: pnpm add scrubtext / yarn add scrubtext / bun add scrubtext
API
redact(text, options?): string
Returns the text with every finding replaced. By default, each match is replaced with a [LABEL] tag.
redact("ssh in as root:[email protected]");
// → "ssh in as [URL_CREDENTIALS]" (note: scheme required for URL creds)
redactWithReport(text, options?): { text, findings }
Same as redact, but also returns what was removed — handy for audit logs and metrics.
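For audit logs it is often more useful to report counts per type than a single total. A small sketch, assuming findings carry the fields shown in this README's findSecrets example (`type`, `label`, `value`, `start`, `end`); the local `Finding` interface and the `countByType` helper are hypothetical and not part of the library:

```typescript
// Local mirror of the finding shape; an assumption based on the README.
interface Finding {
  type: string;
  label: string;
  value: string;
  start: number;
  end: number;
}

// Tally findings per detector type, e.g. { email: 2, credit_card: 1 }.
function countByType(findings: Finding[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const f of findings) {
    counts[f.type] = (counts[f.type] ?? 0) + 1;
  }
  return counts;
}
```

Logging only the counts, and never `value`, keeps the audit trail itself free of the data that was just redacted.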
const { text, findings } = redactWithReport(input);
metrics.increment("pii.redacted", findings.length);
findSecrets(text, options?): Finding[]
Scan without modifying. Returns findings sorted by position, with overlaps resolved (a JWT is never also reported as a generic token).
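One plausible way to get "sorted by position, with overlaps resolved" is a sort followed by a single greedy pass. The policy below (earliest start wins, longer match wins on a tie) is an assumption for illustration; scrubtext may prioritise differently. `Span` and `resolveOverlaps` are hypothetical names.

```typescript
// Minimal span type; `end` is exclusive, as in the README's example.
interface Span {
  type: string;
  start: number;
  end: number;
}

// Sort by start (longest first on ties), then drop any span that
// begins before the previously kept span has ended.
function resolveOverlaps(spans: Span[]): Span[] {
  const sorted = [...spans].sort(
    (a, b) => a.start - b.start || (b.end - b.start) - (a.end - a.start),
  );
  const kept: Span[] = [];
  let lastEnd = 0;
  for (const s of sorted) {
    if (s.start >= lastEnd) {
      kept.push(s);
      lastEnd = s.end;
    }
  }
  return kept;
}
```

With this policy, a generic token match that sits inside a longer JWT match is discarded, so each character range is reported once.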
findSecrets("card 4242 4242 4242 4242");
// → [{ type: "credit_card", label: "CREDIT_CARD", value: "...", start: 5, end: 24 }]
Replacement strategies
| Strategy | Result for 4242 4242 4242 4242 |
| ------------------ | --------------------------------------- |
| "label" (default)| [CREDIT_CARD] |
| "mask" | ******************* |
| "partial" | ***************4242 (keeps last 4) |
| (finding) => … | anything you return |
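The "mask" and "partial" rows can be reproduced with two short helpers. This is a sketch of the behaviour the table describes, not the library's code; the names `mask` and `partial` and their default arguments are assumptions.

```typescript
// Replace every character (spaces included) with the mask character.
function mask(value: string, maskChar = "*"): string {
  return maskChar.repeat(value.length);
}

// Mask everything except the trailing `keepLast` characters.
function partial(value: string, keepLast = 4, maskChar = "*"): string {
  // Clamp so a short value or a negative keepLast cannot break the slice.
  const keep = Math.max(0, Math.min(keepLast, value.length));
  const cut = value.length - keep;
  return maskChar.repeat(cut) + value.slice(cut);
}
```

Because the output has the same length as the input, column alignment in logs is preserved, at the cost of revealing how long the secret was.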
redact(input, { strategy: "partial", keepLast: 4, maskChar: "•" });
redact(input, { strategy: (f) => `[redacted:${f.type}]` });
Custom detectors & allowlists
import { redact, defaultDetectors } from "scrubtext";
redact(text, {
// Add to the built-ins:
extraDetectors: [
{ type: "employee_id", label: "EMPLOYEE_ID", pattern: /\bEMP-\d{5}\b/g },
],
// Never touch known-safe values:
allowlist: ["[email protected]"],
});
// …or replace the built-in set entirely:
redact(text, { detectors: defaultDetectors.filter((d) => d.type !== "phone") });
A detector is just:
interface Detector {
type: string;
label: string;
pattern: RegExp; // must be global (g flag)
validate?: (match: string) => boolean; // optional false-positive filter
}
CLI
# Redact (reads stdin or a file)
cat app.log | scrubtext
scrubtext --strategy partial secrets.env
# Scan only — list findings, don't modify
cat app.log | scrubtext scan
cat app.log | scrubtext scan --json
scrubtext [file]          Redact text (stdin if no file)
scrubtext scan [file]     List findings without modifying
  --strategy, -s <s>      label | mask | partial (default: label)
  --keep-last <n>         Trailing chars kept by "partial"
  --mask-char <c>         Character used by mask/partial
  --json                  (scan only) emit findings as JSON
A note on guarantees
Regex-based redaction is a strong, fast first line of defence, not a formal guarantee.
Free-form names, postal addresses, and novel secret formats can slip through. For
regulated workloads, combine scrubtext with review and the extraDetectors hook for
your domain-specific identifiers.
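As an illustration of a domain-specific detector, the sketch below pairs a pattern with a `validate` filter. The `Detector` interface is copied from this README; the rule that `EMP-00000` is a reserved placeholder and the `scan` helper are hypothetical, with `scan` serving only as a stand-in so the detector can be exercised without the library.

```typescript
interface Detector {
  type: string;
  label: string;
  pattern: RegExp; // must be global (g flag)
  validate?: (match: string) => boolean; // optional false-positive filter
}

// Hypothetical rule: EMP-00000 is a placeholder, not a real employee ID.
const employeeId: Detector = {
  type: "employee_id",
  label: "EMPLOYEE_ID",
  pattern: /\bEMP-\d{5}\b/g,
  validate: (m) => m !== "EMP-00000",
};

// Stand-in scanner (not scrubtext): collect matches that pass validate.
function scan(text: string, detector: Detector): string[] {
  const matches = text.match(detector.pattern) ?? [];
  return matches.filter((m) => !detector.validate || detector.validate(m));
}
```

An object of this shape could then be passed through the extraDetectors option shown earlier in this README.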
Contributors ✨
This project follows the all-contributors specification. Contributions of any kind are welcome — code, docs, bug reports, ideas, reviews! See the emoji key for how each contribution is recognized, and open a PR or issue to get involved.
License
MIT © Tung Tran
