epsteiner
v1.0.1
Over-redact text and Word documents in government classified-document style
epsteiner
Over-redact documents in government classified-document style. Transforms your files into heavily redacted releases where most content is hidden.
Features
- Redact .txt and .docx files
- Configurable redaction ratio (default: 90%)
- Deterministic output via seed
- Word-level or span-level redaction
- Multiple mask styles
- CLI and programmatic API
Installation
npm install epsteiner
CLI Usage
Basic usage:
npx epsteiner document.docx
With options:
npx epsteiner input.docx \
  --ratio 0.92 \
  --mode span \
  --seed classified \
  --mask █
CLI Options
- --ratio, -r <number>: Fraction of content to redact (0.0-1.0, default: 0.9)
- --mode, -m <mode>: Redaction granularity, word or span (default: word)
- --keep-words, -k <number>: Keep exactly N words unredacted (overrides ratio)
- --seed, -s <string>: Seed for deterministic output
- --mask <type>: Mask style, one of █, ▇, or [REDACTED] (default: █)
Output files are written next to the input file with .redacted added before the extension (for example, report.txt becomes report.redacted.txt).
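The naming rule above can be sketched as a small string helper. `redactedPath` is a hypothetical name and not part of the epsteiner API, and its handling of files without an extension is an assumption, since this README does not cover that case.

```typescript
// Hypothetical helper (not exported by epsteiner): predicts the output path
// by inserting ".redacted" before the file extension.
function redactedPath(inputPath: string): string {
  // Position of the last path separator (POSIX or Windows), or -1 if none.
  const slash = Math.max(inputPath.lastIndexOf('/'), inputPath.lastIndexOf('\\'));
  const dot = inputPath.lastIndexOf('.');
  // Assumption: with no extension (or a leading-dot file name), append the suffix.
  if (dot <= slash + 1) return `${inputPath}.redacted`;
  return `${inputPath.slice(0, dot)}.redacted${inputPath.slice(dot)}`;
}
```

In practice there is little need to recompute the path: the programmatic examples in this README show `redact` resolving to the output path.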
Programmatic API
import { redact } from 'epsteiner';
await redact('input.docx', {
  ratio: 0.9,
  mode: 'word',
  seed: 'classified',
  mask: '█'
});
API Options
interface RedactionOptions {
  ratio?: number;                   // 0.0-1.0, default: 0.9
  mode?: 'word' | 'span';           // default: 'word'
  keepWords?: number;               // overrides ratio if specified
  seed?: string;                    // default: random
  mask?: '█' | '▇' | '[REDACTED]';  // default: '█'
}
Mode Comparison
Word mode (default):
The █████ brown ███ jumps ████ the lazy ███
Span mode:
The ████████████ jumps ████████████████
Span mode groups consecutive redactions into longer blocks for a more dramatic effect.
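One way to picture the difference between the two modes is the sketch below. `renderRedaction` and its parameters are hypothetical and are not taken from the package source. The sketch assumes a single-character mask and one mask character per hidden character, so the block lengths it produces may differ from the package's actual output.

```typescript
type Mode = 'word' | 'span';

// Hypothetical renderer: `redacted` holds the zero-based indices of the words to hide.
function renderRedaction(text: string, redacted: Set<number>, mode: Mode, mask = '█'): string {
  // Split into alternating word and whitespace tokens, keeping the whitespace.
  const tokens = text.split(/(\s+)/);
  let wordIndex = 0;
  // flags[i] is true when tokens[i] is a word selected for redaction.
  const flags = tokens.map((tok) => {
    if (tok === '' || /^\s+$/.test(tok)) return false;
    return redacted.has(wordIndex++);
  });
  return tokens
    .map((tok, i) => {
      if (flags[i]) return mask.repeat(tok.length);
      // Span mode: whitespace between two redacted words is masked as well,
      // which fuses neighbouring redactions into one block.
      const isSpace = /^\s+$/.test(tok);
      if (mode === 'span' && isSpace && flags[i - 1] && flags[i + 1]) {
        return mask.repeat(tok.length);
      }
      return tok;
    })
    .join('');
}
```

With "The quick brown fox" and words 1 and 2 selected, word mode masks "quick" and "brown" separately, while span mode also masks the space between them and yields a single 11-character block.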
Deterministic Output
Use the same seed to produce identical redactions:
await redact('document.docx', { seed: 'foia-2024' });
This is useful for:
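- Reproducible redactions across runs
- Version control, since re-running with the same input and options does not change the output
- Collaborative work, where everyone needs the same redacted file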
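The sketch below illustrates how a string seed can make word selection repeatable. It is not the package's actual algorithm, which this README does not describe. The FNV-1a hash, the mulberry32-style generator, the Fisher-Yates shuffle, and the names `hashSeed`, `mulberry32`, and `selectWordsToRedact` are all assumptions chosen for illustration, as are the rounding and clamping rules.

```typescript
// FNV-1a: turns a string seed into a 32-bit unsigned integer.
function hashSeed(seed: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < seed.length; i++) {
    h ^= seed.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// mulberry32-style PRNG: same state in, same sequence of floats in [0, 1) out.
function mulberry32(state: number): () => number {
  return () => {
    state = (state + 0x6d2b79f5) | 0;
    let t = Math.imul(state ^ (state >>> 15), 1 | state);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface SelectionOptions {
  ratio?: number;     // fraction of words to redact, default 0.9
  keepWords?: number; // overrides ratio when present
  seed: string;
}

// Returns the zero-based indices of the words to redact.
function selectWordsToRedact(totalWords: number, opts: SelectionOptions): Set<number> {
  const { ratio = 0.9, keepWords, seed } = opts;
  // Assumption: round the ratio-based count and clamp to [0, totalWords].
  const wanted = keepWords !== undefined ? totalWords - keepWords : Math.round(totalWords * ratio);
  const count = Math.min(totalWords, Math.max(0, wanted));
  const rand = mulberry32(hashSeed(seed));
  const indices = Array.from({ length: totalWords }, (_, i) => i);
  // Fisher-Yates shuffle driven by the seeded generator.
  for (let i = indices.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = indices[i]!;
    indices[i] = indices[j]!;
    indices[j] = tmp;
  }
  return new Set(indices.slice(0, count));
}
```

Because every random draw comes from the seeded generator, two calls with the same word count, options, and seed return the same set of indices.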
Keep Exact Word Count
Instead of a ratio, specify exactly how many words to keep:
await redact('document.txt', { keepWords: 10 });
Examples
Redact a text file with a high ratio
npx epsteiner report.txt --ratio 0.95
Redact a DOCX with span mode
npx epsteiner memo.docx --mode span --seed meeting-notes
Redact keeping exactly 20 words
npx epsteiner briefing.txt --keep-words 20
Use an alternative mask
npx epsteiner document.txt --mask "[REDACTED]"
Programmatic Usage Examples
Basic redaction
import { redact } from 'epsteiner';
const outputPath = await redact('classified.docx');
console.log(`Redacted file: ${outputPath}`);
Custom configuration
await redact('sensitive.docx', {
  ratio: 0.85,
  mode: 'span',
  seed: 'v1',
  mask: '▇'
});
Batch processing
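import { redact } from 'epsteiner';
import { readdir } from 'fs/promises';
const files = await readdir('./documents');
// Skip earlier outputs (*.redacted.docx) so a re-run does not redact them again.
const docxFiles = files.filter(
  f => f.endsWith('.docx') && !f.endsWith('.redacted.docx')
);
// Process files one at a time; the fixed seed keeps each result reproducible.
for (const file of docxFiles) {
  await redact(`./documents/${file}`, {
    ratio: 0.92,
    seed: 'batch-2024'
  });
}
Format Support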
TXT
- Replaces redacted words with block characters
- Preserves all whitespace and line breaks
- Maintains punctuation placement
DOCX
- Operates on text runs
- Preserves document structure
- Maintains formatting and styles
- Does not break headings or tables
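To illustrate why working on text runs leaves formatting intact, here is a minimal sketch that rewrites only the text inside `<w:t>` elements of a WordprocessingML string, the kind of XML stored in a .docx archive's word/document.xml. The function names `mapDocxText` and `maskAll` are hypothetical, and the regex approach is an assumption made for brevity. The package may well use a real XML parser.

```typescript
// Hypothetical helper: applies `transform` to the text of every <w:t> element
// and leaves all other markup (run properties, tables, headings) untouched.
function mapDocxText(documentXml: string, transform: (text: string) => string): string {
  // Matches <w:t> or <w:t attr="..."> but not other tags such as <w:tab/> or <w:tbl>.
  const textElement = /(<w:t(?:\s[^>]*)?>)([^<]*)(<\/w:t>)/g;
  return documentXml.replace(
    textElement,
    (_match: string, open: string, text: string, close: string) => open + transform(text) + close
  );
}

// Masks every non-whitespace character. XML entities such as &amp; are masked
// character by character, which keeps the XML valid but changes the visible length.
const maskAll = (text: string): string => text.replace(/\S/g, '█');
```

Since only the contents of `<w:t>` change, sibling elements such as `<w:rPr>` (run formatting) pass through unchanged.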
How It Works
- Text Extraction: Extracts text while preserving structure
- Tokenization: Splits content into words, whitespace, and punctuation
- Selection: Uses seeded random selection to choose words to redact
- Rendering: Applies format-specific redaction rendering
- Output: Writes redacted file to disk
The core engine is format-agnostic. All file formats use the same redaction logic, ensuring consistent behavior across TXT and DOCX.
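The tokenization step can be sketched as follows. This is an illustrative tokenizer, not the package's own: `Token`, `TokenKind`, and `tokenize` are hypothetical names, and the word pattern is ASCII-only for brevity, so accented letters would be classified as punctuation here.

```typescript
type TokenKind = 'word' | 'space' | 'punct';

interface Token {
  kind: TokenKind;
  text: string;
}

// Splits text into words, whitespace runs, and single punctuation characters.
// Every character lands in exactly one token, so joining the tokens
// reproduces the input, which is what lets whitespace and punctuation survive.
function tokenize(input: string): Token[] {
  // Group 1: word (with internal apostrophes or hyphens); group 2: whitespace;
  // group 3: any other single non-whitespace character.
  const pattern = /([A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*)|(\s+)|([^\s])/g;
  const tokens: Token[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(input)) !== null) {
    if (m[1] !== undefined) tokens.push({ kind: 'word', text: m[1] });
    else if (m[2] !== undefined) tokens.push({ kind: 'space', text: m[2] });
    else tokens.push({ kind: 'punct', text: m[0] });
  }
  return tokens;
}
```

A format-agnostic core of this shape only needs each format to supply its text and to accept the rewritten text back. Redaction then touches `word` tokens and passes `space` and `punct` tokens through unchanged.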
License
MIT
