@pii-mask/nlp
v0.2.1
NLP extension for PII masking. Detects names and places using compromise.
Optional NLP extension for @pii-mask/core. Uses compromise to detect person names and places in unstructured, free-form text, where regex-based detectors can't help.
Installation
pnpm add @pii-mask/nlp compromise
Peer dependency: @pii-mask/core
pnpm add @pii-mask/core
When to Use This
The built-in person-name and address detectors in @pii-mask/core work via key-name heuristics — they fire when the object key is name, firstName, address, etc. This works well for structured data (JSON, CSV) where field names are known.
@pii-mask/nlp solves a different problem: detecting person names and places in the value itself, using NLP entity recognition. Use it when:
- You have freeform text (support tickets, chat logs, LLM outputs)
- Field names are generic or absent (plain text, unlabeled arrays)
- You need to catch names that aren't in a known-keys list
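To make the distinction concrete, here is a minimal sketch of a key-name heuristic. It is not the @pii-mask/core implementation; `NAME_KEYS` and `firesOnKey` are hypothetical names used only for illustration. The point is that a check like this inspects the key and never the value, so a name embedded in free-form text goes unnoticed.

```typescript
// Hypothetical illustration; not the actual @pii-mask/core code.
// A key-name heuristic looks only at the object key.
const NAME_KEYS = new Set(['name', 'firstname', 'lastname', 'fullname']);

function firesOnKey(key: string): boolean {
  return NAME_KEYS.has(key.toLowerCase());
}

const record = {
  firstName: 'Jane',
  notes: 'Spoke with Jane Doe about her refund',
};

// Only `firstName` is flagged; the name inside `notes` is missed
// because the key `notes` says nothing about the value's content.
const flaggedKeys = Object.keys(record).filter(firesOnKey);
console.log(flaggedKeys);
```

The `notes` field is the kind of value an NLP-based detector is meant to cover.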
Quick Start
import { createMasker } from '@pii-mask/core';
import { buildCompromiseDetectors } from '@pii-mask/nlp';
const nlpDetectors = buildCompromiseDetectors({
confidence: 0.7,
entities: ['Person', 'Place'],
});
const masker = createMasker({
mode: 'redact',
extend: nlpDetectors,
});
const { result } = masker.maskString('John Smith lives in New York');
// → '[REDACTED] lives in [REDACTED]'
API Reference
buildCompromiseDetectors(options?)
Builds an array of PIIDetector objects for use with createMasker({ extend: [...] }). These detectors use compromise's NLP engine to identify entities.
import { buildCompromiseDetectors } from '@pii-mask/nlp';
const detectors = buildCompromiseDetectors({
confidence: 0.7,
entities: ['Person', 'Place'],
customLexicon: {
'Acme Corp': 'Organization',
},
});
Options
| Option | Type | Default | Description |
| --------------- | ------------------------------------------------------- | --------------------- | ------------------------------------------------------------------------------------------------------- |
| confidence | number | 0.7 | Minimum ratio of detected entity text to total text length (0–1). Higher values reduce false positives. |
| entities | Array<'Person' \| 'Place' \| 'Organization'> | ['Person', 'Place'] | Entity types to detect |
| customLexicon | Record<string, 'Person' \| 'Place' \| 'Organization'> | {} | Additional terms to teach compromise |
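The confidence option is described as a ratio of detected entity text to total text length. The sketch below shows one way such a ratio could be computed. It is an assumption for illustration only: `entityRatio` and `passesThreshold` are hypothetical helpers, and the package's real computation (for example, how whitespace or multiple matches are counted, or whether the comparison is inclusive) may differ.

```typescript
// Hypothetical sketch of a "detected text / total text" ratio check.
// The package's real computation may differ.
function entityRatio(text: string, matches: string[]): number {
  if (text.length === 0) return 0;
  const matchedChars = matches.reduce((sum, m) => sum + m.length, 0);
  return matchedChars / text.length;
}

// Assumes an inclusive comparison against the threshold.
function passesThreshold(
  text: string,
  matches: string[],
  confidence = 0.7,
): boolean {
  return entityRatio(text, matches) >= confidence;
}

// The whole value is a name: ratio 8 / 8 = 1.
console.log(passesThreshold('Jane Doe', ['Jane Doe'])); // true
// The name is 8 of 17 characters: ratio is about 0.47.
console.log(passesThreshold('Thanks, Jane Doe!', ['Jane Doe'])); // false
```

Under this reading, raising the threshold favors values that consist mostly of the entity itself.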
Returns
PIIDetector[] — pass directly to createMasker({ extend: [...] }).
The returned detectors:
- Do not self-register in the global registry — they are consumer-supplied extensions only
- Use getOrCreateToken and getOrCreateLabel from @pii-mask/core for consistent tokenization
- Support all six masking modes
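"Consistent tokenization" presumably means that the same input value maps to the same label every time it appears. The sketch below illustrates that idea with a small in-memory registry. `LabelRegistry` and its method signature are hypothetical; the real signatures of getOrCreateToken and getOrCreateLabel in @pii-mask/core are not shown in this README.

```typescript
// Hypothetical sketch; the real getOrCreateLabel signature in
// @pii-mask/core is not documented in this README.
class LabelRegistry {
  private labels = new Map<string, string>();
  private counters = new Map<string, number>();

  getOrCreateLabel(prefix: string, value: string): string {
    const key = `${prefix}:${value}`;
    const existing = this.labels.get(key);
    if (existing !== undefined) return existing;

    // First time this value is seen: allocate the next number for the prefix.
    const next = (this.counters.get(prefix) ?? 0) + 1;
    this.counters.set(prefix, next);
    const label = `${prefix}_${next}`;
    this.labels.set(key, label);
    return label;
  }
}

const registry = new LabelRegistry();
console.log(registry.getOrCreateLabel('PERSON', 'Jane Doe')); // PERSON_1
console.log(registry.getOrCreateLabel('PERSON', 'John Smith')); // PERSON_2
console.log(registry.getOrCreateLabel('PERSON', 'Jane Doe')); // PERSON_1
console.log(registry.getOrCreateLabel('PLACE', 'Paris')); // PLACE_1
```

Sharing one registry between core and NLP detectors is what keeps labels stable across a document; a local reimplementation would start its own counters.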
buildPersonDetector(confidence?)
Builds a single person-name detector. Use this when you only need person detection without places.
import { createMasker } from '@pii-mask/core';
import { buildPersonDetector } from '@pii-mask/nlp';
const personDetector = buildPersonDetector(0.8);
const masker = createMasker({
mode: 'pseudonymize',
extend: [personDetector],
});
const { result } = masker.maskString('Jane Doe');
// result → 'PERSON_1'
Parameters
- confidence (optional, default: 0.7): minimum ratio of detected entity text to total text length (0–1)
buildPlaceDetector()
Builds a single place/location detector.
import { createMasker } from '@pii-mask/core';
import { buildPlaceDetector } from '@pii-mask/nlp';
const placeDetector = buildPlaceDetector();
const masker = createMasker({
mode: 'redact',
extend: [placeDetector],
});
const { result } = masker.maskString('Paris');
// result → '[REDACTED]'
Detector IDs
| ID | Entity | Description |
| ------------ | ------ | ---------------------------------------------- |
| nlp-person | Person | Names detected by compromise's .people() |
| nlp-place | Place | Locations detected by compromise's .places() |
Custom Lexicon
Teach compromise additional terms using the customLexicon option. This uses compromise's object-style nlp.extend({ words: ... }) API.
const detectors = buildCompromiseDetectors({
customLexicon: {
Wakanda: 'Place',
JARVIS: 'Person',
'Stark Industries': 'Organization',
},
});
Masking Modes
All NLP detectors support every masking mode:
| Mode | Person output | Place output |
| -------------- | ------------------------------- | --------------------------------- |
| mask | J*** (first char + asterisks) | P***** (first char + asterisks) |
| redact | [REDACTED] | [REDACTED] |
| pseudonymize | PERSON_1 | PLACE_1 |
| anonymize | PERSON_1 | PLACE_1 |
| tokenize | <<PII_a1b2c3d4>> | <<PII_e5f6a7b8>> |
| substitute | Random full name via faker | Random city name via faker |
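As a rough illustration of the mask and redact rows, the sketch below keeps the first character and hides the rest. It is not the package's code: `maskKeepFirst` and `redact` are hypothetical helpers, and the assumption of one asterisk per hidden character may not match the real output length.

```typescript
// Hypothetical sketch of the `mask` and `redact` rows; not the package's code.
// Assumes one asterisk per hidden character.
function maskKeepFirst(value: string): string {
  if (value.length <= 1) return value;
  return value.charAt(0) + '*'.repeat(value.length - 1);
}

// Redaction discards the value entirely and returns a fixed marker.
function redact(_value: string): string {
  return '[REDACTED]';
}

console.log(maskKeepFirst('Jane')); // J***
console.log(redact('Jane')); // [REDACTED]
```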
Combining with Core Detectors
NLP detectors complement core detectors rather than replace them. Use both together:
import { createMasker } from '@pii-mask/core';
import { buildCompromiseDetectors } from '@pii-mask/nlp';
const masker = createMasker({
mode: 'redact',
extend: buildCompromiseDetectors(),
});
// Core detectors catch structured PII (emails, SSNs, etc.)
// NLP detectors catch names and places in freeform text
const { result } = masker.maskObject({
email: 'user@example.com', // caught by core 'email' detector
notes: 'Spoke with Jane Doe', // caught by NLP 'nlp-person' detector
});
Important Notes
- Never import @pii-mask/nlp from core, cli, or react. NLP is always a consumer-supplied peer, never a dependency of other packages.
- Token generation uses generateToken, getOrCreateToken, and getOrCreateLabel from @pii-mask/core, never local reimplementations.
- The compromise library adds ~200KB to your bundle. Only install this package if you need NLP-based detection.
License
MIT
