redactory
v0.2.0
Redactory
Redactory is a privacy‑first utility for detecting and removing personally identifiable information (PII) from text files. It transforms documentation into AI-ready web content without exposing PII or export-controlled data. All processing happens on your machine unless you set the AZURE_BLOB_SAS_URL environment variable, in which case scrubbed files are uploaded to Azure Blob Storage.
Features
- Detects common PII types (EMAIL, PHONE, SSN and ICD10 codes) using regular expressions
- Policy‑driven actions to MASK, REDACT or ALLOW detected entities
- Streaming transform for processing large files
- Command line interface with scrub, preview, ingest and policy validate
- Optional upload of scrubbed files to Azure Blob Storage
- Converts documentation into AI-ready web content without exposing PII or export-controlled data
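To make the detection and action model above more concrete, here is a minimal, self-contained sketch of regex-based detection combined with MASK, REDACT and ALLOW actions. It is an illustration only and not Redactory's implementation: the patterns, the `mask` and `applyPolicy` functions, the `[REDACTED:TYPE]` placeholder format and the `char`/`keepLast` options are all assumptions made for the example.

```javascript
// Hypothetical patterns; production detectors need to be considerably stricter.
const PATTERNS = {
  EMAIL: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  SSN: /\b\d{3}-\d{2}-\d{4}\b/g,
};

// Replace all but the last `keepLast` characters with `maskChar`.
function mask(value, maskChar = '*', keepLast = 4) {
  const visible = keepLast > 0 ? value.slice(-keepLast) : '';
  return maskChar.repeat(value.length - visible.length) + visible;
}

// Apply a per-type action to every match. Types with no action, or with
// ALLOW, are left untouched (an assumption of this sketch).
function applyPolicy(text, actions, maskOptions = {}) {
  const { char = '*', keepLast = 4 } = maskOptions;
  let result = text;
  for (const [type, pattern] of Object.entries(PATTERNS)) {
    const action = actions[type];
    if (action === 'REDACT') {
      result = result.replace(pattern, `[REDACTED:${type}]`);
    } else if (action === 'MASK') {
      result = result.replace(pattern, (match) => mask(match, char, keepLast));
    }
  }
  return result;
}

const actions = { EMAIL: 'MASK', SSN: 'REDACT' };
console.log(applyPolicy('Mail jane@example.com, SSN 123-45-6789', actions));
```

Keeping the patterns and the actions in separate tables mirrors the idea of a policy file: the detection rules stay fixed while the handling of each entity type can change without touching code.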
Installation
npm install redactory
Quick Start
Create a policy file describing which entity types to detect and how they should be handled. An example policy is included in this repository:
version: 1
entityTypes:
  - EMAIL
  - PHONE
  - SSN
  - ICD10
actions:
  EMAIL: MASK
  PHONE: MASK
  SSN: REDACT
  ICD10: ALLOW
thresholds:
  default: 0.7
  SSN: 0.9
mask:
  char: "*"
  keepLast: 4
fallback: BLOCK
CLI usage
Build the project and run the CLI with npx:
npm run build
npx redactory scrub synthetic-data/sample.txt
Available commands:
- scrub <file> – redact a file in place
- preview <file> – show a diff of changes without modifying the file
- ingest <dir> – scrub all .txt, .html and .json files in a directory
- policy validate <file> – verify a policy file is valid
If the AZURE_BLOB_SAS_URL environment variable is set, scrubbed files will automatically be uploaded to Azure Blob Storage and the resulting blob URL will be printed.
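The optional upload step could look roughly like the sketch below. It is not Redactory's actual code: it assumes that AZURE_BLOB_SAS_URL holds a container-level SAS URL, that Node.js 18 or later is available for the global `fetch`, and the helper names `blobUrlFor` and `uploadIfConfigured` are hypothetical. The request follows the Azure Blob Storage Put Blob REST operation, which requires the `x-ms-blob-type` header.

```javascript
// Build the blob URL by appending the blob name to the container path,
// keeping the SAS query string intact.
function blobUrlFor(containerSasUrl, blobName) {
  const url = new URL(containerSasUrl);
  const base = url.pathname.replace(/\/$/, '');
  url.pathname = `${base}/${encodeURIComponent(blobName)}`;
  return url;
}

// Upload only when the environment variable is set; otherwise do nothing,
// so the content never leaves the machine.
async function uploadIfConfigured(blobName, content, env = process.env) {
  const sasUrl = env.AZURE_BLOB_SAS_URL;
  if (!sasUrl) return null;

  const target = blobUrlFor(sasUrl, blobName);
  const response = await fetch(target, {
    method: 'PUT',
    headers: { 'x-ms-blob-type': 'BlockBlob', 'Content-Type': 'text/plain' },
    body: content,
  });
  if (!response.ok) {
    throw new Error(`Upload failed with HTTP ${response.status}`);
  }
  // Design choice of this sketch: return the URL without the SAS token
  // so the secret does not end up in logs.
  return `${target.origin}${target.pathname}`;
}
```

Treating the absence of the variable as "do nothing" keeps local-only processing as the default, which matches the behavior described above.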
Programmatic API
You can also use Redactory from your own Node.js code:
import { Scrubber, loadPolicy } from 'redactory';
const policy = loadPolicy('policy.yaml');
const scrubber = new Scrubber(policy);
const { result } = scrubber.scrub('Contact me at user@example.com');
console.log(result);
Using an ONNX NER model
Redactory can optionally load an ONNX model to detect entities
using machine learning. A CPU build of onnxruntime-node is installed automatically
during npm install. To use the GPU build, set the environment variable
ONNXRUNTIME_GPU=1 before installing. After obtaining a vocabulary mapping of tokens
to IDs, provide the model path and vocabulary when constructing the Scrubber:
import fs from 'fs';
import { Scrubber, loadPolicy } from 'redactory';
const policy = loadPolicy('policy.yaml');
const vocab = JSON.parse(fs.readFileSync('vocab.json', 'utf8'));
const scrubber = new Scrubber(policy, {
  ner: { modelPath: 'ner-model.onnx', vocab }
});
const { result, entities } = scrubber.scrub('Alice met Bob.');
console.log(result, entities);
Streaming API
Redactory can scrub data from Node.js streams using the scrubStream helper:
import { Scrubber, loadPolicy, scrubStream } from 'redactory';
import { Readable } from 'stream';
const policy = loadPolicy('policy.yaml');
const scrubber = new Scrubber(policy);
const input = Readable.from(['Contact me at user@example.com']);
// pipe the redacted output elsewhere
const redacted = scrubStream(input, scrubber);
redacted.on('data', chunk => process.stdout.write(chunk));
This makes it easy to pipe the redacted output into other streams such as file writes or network uploads.
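One subtlety of streaming redaction is that an entity can be split across two chunks. The sketch below shows one way to handle this by buffering incomplete lines before scrubbing. It is an assumption about how such a transform could work, not a description of how scrubStream is implemented; the `scrubLines` generator and the stand-in `scrubText` function are hypothetical names, and chunks are assumed to be strings.

```javascript
// Stand-in for a real scrubber: redacts anything that looks like an SSN.
const scrubText = (text) =>
  text.replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[REDACTED]');

// Re-chunk an (async) iterable of strings on newline boundaries so that a
// match split across two chunks is still seen as a whole.
async function* scrubLines(chunks, scrub = scrubText) {
  let pending = '';
  for await (const chunk of chunks) {
    pending += chunk;
    const lastNewline = pending.lastIndexOf('\n');
    if (lastNewline === -1) continue; // no complete line yet
    yield scrub(pending.slice(0, lastNewline + 1));
    pending = pending.slice(lastNewline + 1);
  }
  if (pending) yield scrub(pending); // flush the trailing partial line
}

// The SSN below is deliberately split across the first two chunks.
async function demo() {
  const parts = [];
  for await (const out of scrubLines(['SSN 123-4', '5-6789\nnext ', 'line'])) {
    parts.push(out);
  }
  return parts.join('');
}

demo().then((text) => console.log(text));
```

Because Node.js readable streams are async iterable and `Readable.from` accepts an async generator, a generator like this can be placed between a source stream and a destination stream. The trade-off is that entities spanning a line break would still be missed.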
Synthetic Test Data
A sample file containing fabricated sensitive data lives in synthetic-data/sample.txt. Try scrubbing it with the CLI to see the output.
Development
Compile the TypeScript sources and run the test suite:
npm run build
npm test
License
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
Contributing
We welcome community contributions! Feel free to open an issue or submit a pull request on GitHub if you discover problems or have improvements to share.
