indian-pii

v0.1.1

Detect, validate & mask Indian PII (Aadhaar, PAN, GSTIN, UPI, IFSC) in JavaScript — zero-dependency, real checksums, for redaction & KYC.

indian-pii — detect, validate & mask Indian PII for JavaScript

Detect, validate, and mask Indian personally identifiable information — Aadhaar, PAN, GSTIN, UPI, Voter ID, Passport, IFSC, and 11 more — with real checksums, zero dependencies, and first-class TypeScript types. Works in both Node and the browser.

npm install indian-pii

Why

JavaScript has plenty of generic PII libraries, but almost none understand Indian identifiers — and the few that do only check shape (a regex), not authenticity. A 12-digit number is not an Aadhaar unless its Verhoeff check digit is valid; a 15-character string is not a GSTIN unless its mod-36 digit and state code check out. In the DPDP Act era, teams building redaction, logging hygiene, KYC, and consent tooling need accurate, checksum-backed detection they can run client-side or server-side without pulling in a dependency tree. indian-pii does exactly that — and nothing else.

Quick start

import { detect, mask } from "indian-pii";

detect("My PAN is ABCPK1234Z");
// [{ type: 'pan', value: 'ABCPK1234Z', index: 10, valid: true, confidence: 0.8 }]

mask("My PAN is ABCPK1234Z"); // "My PAN is AXXXXXXXXZ"

Use cases

  • KYC & onboarding — validate Aadhaar, PAN, GSTIN, IFSC, and other bank/tax identifiers with genuine checksum and structure checks before you trust user input.
  • Redaction & log scrubbing — automatically find and mask Indian PII in application logs, support tickets, analytics events, and outbound payloads.
  • DPDP Act compliance — minimise and de-identify personal data in line with India's Digital Personal Data Protection era, on the client or the server.
  • ID-document processing (OCR) — extract PII from scanned Aadhaar/PAN cards via the optional indian-pii/image layer and redact it with pixel-accurate boxes.

API reference

detect(text, options?) → DetectionResult[]

Scans text and returns non-overlapping hits. Where spans overlap, the highest-confidence hit wins (checksum-validated beats structure-only).

| Param | Type | Description |
|-------|------|-------------|
| text | string | Text to scan. Non-string input returns []. |
| options.types | string[] | Restrict to these detector ids. |
| options.requireValid | boolean | Only return hits that pass validation. |
| options.contextWindow | number | Chars each side searched for a keyword (default 40). |

Returns an array of:

interface DetectionResult {
  type: string;       // detector id, e.g. "pan"
  value: string;      // matched substring
  index: number;      // offset in text
  valid: boolean;     // passed real validation (checksum/structure)
  confidence: number; // 0–1; checksum-validated hits score highest
}

detect("GSTIN 27AAPFU0939F1ZV, call 9876543210");
// [
//   { type: 'gstin', value: '27AAPFU0939F1ZV', index: 6,  valid: true, confidence: 0.99 },
//   { type: 'mobile_in', value: '9876543210',   index: 28, valid: true, confidence: 0.9 }
// ]
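
The overlap rule described for detect() (highest confidence wins, results never overlap) can be sketched as a greedy pass over candidate hits. This is an illustration only: resolveOverlaps and the tie-breaking order are assumptions, not the library's internals.

```javascript
// Hypothetical helper, not part of indian-pii's public API.
function resolveOverlaps(hits) {
  // Strongest evidence first; ties go to the earlier, then the longer match.
  const ranked = [...hits].sort(
    (a, b) =>
      b.confidence - a.confidence ||
      a.index - b.index ||
      b.value.length - a.value.length
  );
  const kept = [];
  for (const hit of ranked) {
    const start = hit.index;
    const end = start + hit.value.length;
    // Two half-open spans overlap when each starts before the other ends.
    const clashes = kept.some(
      (k) => start < k.index + k.value.length && k.index < end
    );
    if (!clashes) kept.push(hit);
  }
  // Report survivors in reading order.
  return kept.sort((a, b) => a.index - b.index);
}

const candidates = [
  { type: "uan", value: "234567890124", index: 8, valid: true, confidence: 0.9 },
  { type: "aadhaar", value: "234567890124", index: 8, valid: true, confidence: 0.99 },
  { type: "pan", value: "ABCPK1234Z", index: 30, valid: true, confidence: 0.8 },
];
console.log(resolveOverlaps(candidates).map((h) => h.type)); // [ 'aadhaar', 'pan' ]
```

Ranking before filtering is what lets a checksum-validated hit displace a structure-only hit on the same span.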

validate(type, value) → boolean

Strictly validates a single value (checksum or structure). Returns false for unknown types and non-string input — never throws.

validate("gstin", "27AAPFU0939F1ZV"); // true
validate("card",  "4111111111111111"); // true  (Luhn + Visa IIN + length)
validate("upi_vpa", "[email protected]");  // false (that is an email, not a VPA)

maskValue(type, value) → string

Masks one value using that type's rule. Unknown types return the input unchanged; non-string input returns "".

maskValue("aadhaar", "2345 6789 0124"); // "XXXX XXXX 0124"
maskValue("card", "4111111111111111");  // "XXXXXXXXXXXX1111"
maskValue("unknown", "keep me");        // "keep me"
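
A "keep the last four" rule like the Aadhaar and card outputs above can be sketched as follows. maskKeepLast is a hypothetical helper written for this example; the library's per-type rules may be implemented differently.

```javascript
// Hypothetical helper: mask every letter/digit except the last `keep`,
// leaving separators (spaces, hyphens) in place so the shape survives.
function maskKeepLast(value, keep = 4, maskChar = "X") {
  if (typeof value !== "string") return "";
  const isAlnum = (ch) => /[A-Za-z0-9]/.test(ch);
  const chars = [...value];
  const total = chars.filter(isAlnum).length;
  let seen = 0;
  return chars
    .map((ch) => {
      if (!isAlnum(ch)) return ch; // separator: copy through unchanged
      seen += 1;
      return seen <= total - keep ? maskChar : ch;
    })
    .join("");
}

console.log(maskKeepLast("2345 6789 0124"));    // "XXXX XXXX 0124"
console.log(maskKeepLast("4111111111111111"));  // "XXXXXXXXXXXX1111"
console.log(maskKeepLast("12-3456-7890-1230")); // "XX-XXXX-XXXX-1230"
```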

mask(text, options?) → string

Returns a copy of text with every detected value replaced by its masked form. Accepts the same options as detect, plus maskChar (default "X").

mask("Card 4111 1111 1111 1111");          // "Card XXXX XXXX XXXX 1111"
mask("PAN ABCPK1234Z", { maskChar: "•" }); // "PAN A••••••••Z"
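
Replacing detected spans inside a longer string is easiest from right to left, so earlier offsets stay valid even if a masked value changes length. A minimal sketch, assuming hits shaped like DetectionResult; applyMasks and keepEnds are hypothetical names, not library exports.

```javascript
// Hypothetical helper: splice a masked form over each hit, last hit first.
function applyMasks(text, hits, maskFn) {
  let out = text;
  const ordered = [...hits].sort((a, b) => b.index - a.index);
  for (const hit of ordered) {
    const end = hit.index + hit.value.length;
    out = out.slice(0, hit.index) + maskFn(hit) + out.slice(end);
  }
  return out;
}

// Example rule: keep the first and last character, mask the middle.
const keepEnds = (hit) =>
  hit.value[0] + "X".repeat(hit.value.length - 2) + hit.value.slice(-1);

const sample = "PAN ABCPK1234Z, PAN AAAPL1234C";
const sampleHits = [
  { type: "pan", value: "ABCPK1234Z", index: 4 },
  { type: "pan", value: "AAAPL1234C", index: 20 },
];
console.log(applyMasks(sample, sampleHits, keepEnds));
// "PAN AXXXXXXXXZ, PAN AXXXXXXXXC"
```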

detectors → Detector[]

The full registry of 18 detectors. Each exposes { id, label, category, severity, regex, validate, mask, contextHints }.

import { detectors } from "indian-pii";
detectors.map((d) => d.id);
// ['aadhaar','card','gstin','abha','cin','pan','tan','voter_id','passport',
//  'driving_licence','ifsc','demat','upi_vpa','uan','mobile_in','micr','din','pincode']

Images / OCR

An optional, dependency-free layer at the subpath indian-pii/image lets you pull PII out of images. It does not perform OCR and ships no model — you run OCR yourself (e.g. with Tesseract.js), hand the result to this layer, and it feeds the text back into the same core detect() engine and maps every hit to pixel boxes for redaction.

import { createWorker } from "tesseract.js";          // YOUR OCR dependency
import { fromTesseract, detectInImage, redactBoxes } from "indian-pii/image";

const worker = await createWorker("eng");
const { data } = await worker.recognize(imageFile);   // run OCR
await worker.terminate();

const ocr = fromTesseract(data);                       // normalise OCR output
const results = detectInImage(ocr);                    // detect PII + boxes
//  [{ type:'aadhaar', value:'2345 6789 0124', valid:true,
//     bbox:{x,y,width,height}, boxes:[...3 word boxes], ocrConfidence:0.8, ... }]

const ctx = canvas.getContext("2d");                   // browser canvas or node-canvas
redactBoxes(ctx, results);                             // paint opaque boxes over PII

detectInImage(ocr, options?) → ImageDetectionResult[]

Reconstructs scan text from ocr.words (joined by a single-character joiner, default " "), runs core detect(), then maps each hit back to the OCR words its character range overlaps.

| Param | Type | Description |
|-------|------|-------------|
| ocr | OcrResult | { words: OcrWord[], imageWidth?, imageHeight? } |
| options.types | string[] | Restrict to these detector ids. |
| options.requireValid | boolean | Only return hits that pass validation. |
| options.contextWindow | number | Chars each side searched for a keyword (default 40). |
| options.joiner | string | One character joined between words (default " "). Throws RangeError if not length 1. |

Each result is a core DetectionResult plus bbox (union box), boxes (the per-word boxes the hit spans), and ocrConfidence (mean confidence of those words, when available). Hits that land only on joiner characters are skipped.
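
The mapping from a character range back to word boxes can be sketched like this. The helper names (joinWords, boxesForHit, unionBox) and the word shape { text, bbox } are assumptions made for the example, not the library's internals.

```javascript
// Join OCR words into one string and remember each word's character range.
function joinWords(words, joiner = " ") {
  if (joiner.length !== 1) throw new RangeError("joiner must be one character");
  let text = "";
  const ranges = [];
  words.forEach((word, i) => {
    if (i > 0) text += joiner;
    ranges.push({ start: text.length, end: text.length + word.text.length, word });
    text += word.text;
  });
  return { text, ranges };
}

// Smallest rectangle that contains every box.
function unionBox(boxes) {
  const x = Math.min(...boxes.map((b) => b.x));
  const y = Math.min(...boxes.map((b) => b.y));
  const right = Math.max(...boxes.map((b) => b.x + b.width));
  const bottom = Math.max(...boxes.map((b) => b.y + b.height));
  return { x, y, width: right - x, height: bottom - y };
}

// Words whose range overlaps [index, index + length); null if only joiners were hit.
function boxesForHit(ranges, index, length) {
  const end = index + length;
  const boxes = ranges
    .filter((r) => r.start < end && r.end > index)
    .map((r) => r.word.bbox);
  return boxes.length ? { boxes, bbox: unionBox(boxes) } : null;
}

const words = [
  { text: "Aadhaar", bbox: { x: 20, y: 50, width: 70, height: 20 } },
  { text: "2345", bbox: { x: 100, y: 50, width: 40, height: 20 } },
  { text: "6789", bbox: { x: 150, y: 50, width: 40, height: 20 } },
  { text: "0124", bbox: { x: 200, y: 50, width: 40, height: 20 } },
];
const { text, ranges } = joinWords(words);
// Suppose a detector reported a hit at index 8 with length 14 ("2345 6789 0124").
const located = boxesForHit(ranges, 8, 14);
console.log(located.bbox); // { x: 100, y: 50, width: 140, height: 20 }
```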

fromTesseract(data) → OcrResult

A pure transform (imports nothing) over the data object returned by Tesseract.js worker.recognize(). Converts corner coords {x0,y0,x1,y1} to top-left {x,y,width,height}, rescales confidence 0–100 → 0–1, drops empty-text words, and tolerates missing/null input.
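
A transform of this kind might look like the sketch below. It assumes the OCR data carries a words array whose items have text, confidence (0 to 100), and bbox { x0, y0, x1, y1 }; the output field names are illustrative and not confirmed by this README.

```javascript
// Hypothetical re-implementation for illustration; not the library's source.
function normaliseTesseract(data) {
  const words = Array.isArray(data?.words) ? data.words : [];
  return {
    words: words
      // Drop null entries, blank text, and words without a box.
      .filter((w) => w && typeof w.text === "string" && w.text.trim() !== "" && w.bbox)
      .map((w) => ({
        text: w.text,
        // Corner coordinates become a top-left origin plus a size.
        bbox: {
          x: w.bbox.x0,
          y: w.bbox.y0,
          width: w.bbox.x1 - w.bbox.x0,
          height: w.bbox.y1 - w.bbox.y0,
        },
        // Rescale 0-100 to 0-1; leave undefined when the engine gave none.
        confidence: typeof w.confidence === "number" ? w.confidence / 100 : undefined,
      })),
  };
}

const ocrData = {
  words: [
    { text: "2345", confidence: 80, bbox: { x0: 10, y0: 20, x1: 50, y1: 40 } },
    { text: "  ", confidence: 50, bbox: { x0: 60, y0: 20, x1: 70, y1: 40 } },
  ],
};
console.log(normaliseTesseract(ocrData).words);
// [ { text: '2345', bbox: { x: 10, y: 20, width: 40, height: 20 }, confidence: 0.8 } ]
```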

redactBoxes(ctx, results, options?) → number

Paints opaque rectangles over detected PII and returns how many were drawn. ctx only needs fillStyle and fillRect (the structural Fill2D interface), so it works with the browser CanvasRenderingContext2D and node-canvas without importing either. Options: color (default "#000"), padding (default 2), perWord (default false — fill the union box; set true for individual word boxes).
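
Because only fillStyle and fillRect are needed, the painting step is easy to exercise without a real canvas. The sketch below uses a hypothetical paintBoxes function and a recording stub in place of a 2D context; it illustrates the structural-interface idea rather than the library's code.

```javascript
// Hypothetical painter: fills one padded rectangle per box and returns the count.
function paintBoxes(ctx, boxes, { color = "#000", padding = 2 } = {}) {
  ctx.fillStyle = color;
  let drawn = 0;
  for (const b of boxes) {
    ctx.fillRect(
      b.x - padding,
      b.y - padding,
      b.width + 2 * padding,
      b.height + 2 * padding
    );
    drawn += 1;
  }
  return drawn;
}

// Any object with these two members satisfies the contract.
const calls = [];
const fakeCtx = { fillStyle: "", fillRect: (...args) => calls.push(args) };

const count = paintBoxes(fakeCtx, [{ x: 10, y: 10, width: 20, height: 5 }]);
console.log(count, calls); // 1 [ [ 8, 8, 24, 9 ] ]
```

The same call works unchanged with a browser CanvasRenderingContext2D, since it exposes both members.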

Accuracy caveat. OCR is imperfect — a misread digit changes the value, so a checksum that was valid can fail (and, rarely, a wrong value can coincidentally pass). A clean detectInImage pass is not proof of a real, active identifier; treat it as a redaction aid, not verification.

Object detection (future seam)

The types RegionDetector and ObjectRegion define a contract for detecting non-text regions (faces, signatures, QR codes). This is intentionally unimplemented — core ships no ML model to keep the zero-dependency promise. Bring your own detector that satisfies RegionDetector if you need it.

Detector table

All sample values below are fabricated for documentation. Validation is the strongest check each detector applies; Gated marks loose patterns that only fire in free text with a nearby keyword (or a self-identifying token).

| id | Format | Validation | Gated | Example |
|----|--------|-----------|:----:|---------|
| aadhaar | 12 digits, 1st 2–9 | checksum (Verhoeff) | | 2345 6789 0124 |
| pan | AAAAA9999A | structure (4th = holder type) | | ABCPK1234Z |
| voter_id | AAA9999999 | structure (EPIC) | | ABC1234567 |
| passport | A9999999 | structure ([A-PR-WY]) | | P1234567 |
| driving_licence | SS RR YYYY NNNNNNN | structure (state + length) | | MH1220110012345 |
| upi_vpa | name@psp | structure (known psp / no-dot) | ✓* | ramesh@oksbi |
| ifsc | BANK0BRANCH | structure (5th char 0) | | SBIN0001234 |
| micr | 9 digits | structure | ✓ | MICR 400002007 |
| demat | IN+14 / 16 digits | structure | ✓* | IN30001012345678 |
| card | 13–19 digits | checksum (Luhn + IIN + length) | | 4111 1111 1111 1111 |
| gstin | 15 chars | checksum (mod-36 + state) | | 27AAPFU0939F1ZV |
| tan | AAAA99999A | structure | | MUMA12345B |
| cin | 21 chars | structure (6 segments) | | U72200KA2011PTC123456 |
| din | 8 digits | structure | ✓ | DIN 01234567 |
| uan | 12 digits | structure | ✓ | UAN 100123456789 |
| abha | 14 digits | checksum (Verhoeff) | | 12-3456-7890-1230 |
| mobile_in | [6-9] + 9 digits | structure | ✓* | +91 98765 43210 |
| pincode | 6 digits, non-zero start | structure | ✓ | 560001 |

* Context-gated, but a self-identifying token bypasses the gate: a known UPI handle (name@oksbi), an IN-prefixed demat id, or a +91-prefixed mobile is flagged even without a nearby keyword.
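
Context gating can be pictured as "match the loose shape, then look for a keyword within N characters on either side". The sketch below is an assumption about how such a gate could work; hasNearbyKeyword, findGated, and the keyword list are hypothetical, and the self-identifying bypass is left out for brevity.

```javascript
// Hypothetical helpers for illustration; indian-pii's internals may differ.
function hasNearbyKeyword(text, start, end, keywords, windowSize = 40) {
  const before = text.slice(Math.max(0, start - windowSize), start).toLowerCase();
  const after = text.slice(end, end + windowSize).toLowerCase();
  return keywords.some((k) => before.includes(k) || after.includes(k));
}

function findGated(text, pattern, keywords, windowSize = 40) {
  const hits = [];
  for (const m of text.matchAll(pattern)) {
    const end = m.index + m[0].length;
    // Only report the loose match when a hint word sits inside the window.
    if (hasNearbyKeyword(text, m.index, end, keywords, windowSize)) {
      hits.push({ value: m[0], index: m.index });
    }
  }
  return hits;
}

// Loose shape: any standalone run of exactly nine digits.
const NINE_DIGITS = /(?<!\d)\d{9}(?!\d)/g;

console.log(findGated("MICR 400002007", NINE_DIGITS, ["micr"]));
// [ { value: '400002007', index: 5 } ]
console.log(findGated("value 400002007 here", NINE_DIGITS, ["micr"])); // []
```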

Per-detector examples

Every example uses a fake value. For context-gated detectors the detect() input includes the keyword that the gate requires.

1. aadhaar — checksum (Verhoeff)

detect("Aadhaar 2345 6789 0124");
// [{ type: 'aadhaar', value: '2345 6789 0124', index: 8, valid: true, confidence: 0.99 }]
validate("aadhaar", "2345 6789 0124"); // true
validate("aadhaar", "2345 6789 0123"); // false (bad checksum)
maskValue("aadhaar", "2345 6789 0124"); // "XXXX XXXX 0124"
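
For readers who want to see what a Verhoeff check involves, here is a standalone sketch of the standard algorithm (dihedral-group multiplication table plus a position-dependent permutation). It is independent of indian-pii, and verhoeffOk is a name invented for this example; the library additionally enforces length and first-digit rules.

```javascript
// Multiplication table of the dihedral group D5.
const D = [
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
  [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
  [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
  [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
  [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
  [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
  [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
  [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
  [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
  [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
];
// Permutation applied according to digit position (cycle length 8).
const P = [
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
  [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
  [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
  [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
  [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
  [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
  [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
  [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
];

// True when the digit string (check digit last) passes the Verhoeff check.
function verhoeffOk(value) {
  if (typeof value !== "string") return false;
  const digits = value.replace(/[\s-]/g, "");
  if (!/^\d+$/.test(digits)) return false;
  let c = 0;
  // Walk from the rightmost digit; position 0 is the check digit itself.
  for (let i = 0; i < digits.length; i++) {
    const digit = Number(digits[digits.length - 1 - i]);
    c = D[c][P[i % 8][digit]];
  }
  return c === 0;
}

console.log(verhoeffOk("2345 6789 0124")); // true
console.log(verhoeffOk("2345 6789 0123")); // false
```

Verhoeff catches every single-digit error and every adjacent transposition, which is why it suits identifiers that are typed by hand.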

2. pan — structure

detect("PAN ABCPK1234Z");
// [{ type: 'pan', value: 'ABCPK1234Z', index: 4, valid: true, confidence: 0.8 }]
validate("pan", "ABCPK1234Z"); // true
validate("pan", "ABCDK1234Z"); // false (4th char not a holder type)
maskValue("pan", "ABCPK1234Z"); // "AXXXXXXXXZ"
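
A structure check of this kind can be approximated with a shape regex plus a holder-type lookup. The letter set below is the commonly documented list of PAN holder types and is an assumption here; the set the library accepts may differ, and panStructureOk is a hypothetical name.

```javascript
// Assumed holder-type letters for the 4th character (e.g. P = individual, C = company).
const PAN_HOLDER_TYPES = new Set(["A", "B", "C", "F", "G", "H", "J", "L", "P", "T"]);

// Hypothetical structure check: AAAAA9999A shape plus a known 4th character.
function panStructureOk(value) {
  if (typeof value !== "string") return false;
  const pan = value.trim().toUpperCase();
  return /^[A-Z]{5}[0-9]{4}[A-Z]$/.test(pan) && PAN_HOLDER_TYPES.has(pan[3]);
}

console.log(panStructureOk("ABCPK1234Z")); // true
console.log(panStructureOk("ABCDK1234Z")); // false (D is not in the assumed set)
```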

3. voter_id — structure

detect("EPIC ABC1234567");
// [{ type: 'voter_id', value: 'ABC1234567', index: 5, valid: true, confidence: 0.8 }]
validate("voter_id", "ABC1234567"); // true
validate("voter_id", "AB1234567");  // false
maskValue("voter_id", "ABC1234567"); // "ABCXXXXXXX"

4. passport — structure

detect("Passport P1234567");
// [{ type: 'passport', value: 'P1234567', index: 9, valid: true, confidence: 0.8 }]
validate("passport", "P1234567"); // true
validate("passport", "Q1234567"); // false (Q/X/Z not allowed as 1st char)
maskValue("passport", "P1234567"); // "PXXXXX67"

5. driving_licence — structure

detect("DL MH1220110012345");
// [{ type: 'driving_licence', value: 'MH1220110012345', index: 3, valid: true, confidence: 0.8 }]
validate("driving_licence", "MH1220110012345"); // true
validate("driving_licence", "ZZ1220110012345"); // false (bad state code)
maskValue("driving_licence", "MH1220110012345"); // "MH12XXXXXXXXX45"

6. upi_vpa — structure (self-identifies on known handle)

detect("Pay ramesh@oksbi");
// [{ type: 'upi_vpa', value: 'ramesh@oksbi', index: 4, valid: true, confidence: 0.9 }]
validate("upi_vpa", "ramesh@oksbi");     // true
validate("upi_vpa", "[email protected]"); // false (email, not a VPA)
maskValue("upi_vpa", "ramesh@oksbi"); // "rXXXXX@oksbi"

7. ifsc — structure

detect("IFSC SBIN0001234");
// [{ type: 'ifsc', value: 'SBIN0001234', index: 5, valid: true, confidence: 0.8 }]
validate("ifsc", "SBIN0001234"); // true
validate("ifsc", "SBIN1001234"); // false (5th char must be 0)
maskValue("ifsc", "SBIN0001234"); // "SBINXXXXXXX"

8. micr — structure (context-gated)

detect("MICR 400002007");
// [{ type: 'micr', value: '400002007', index: 5, valid: true, confidence: 0.9 }]
detect("value 400002007 here"); // [] — no "MICR" keyword nearby
validate("micr", "400002007"); // true
maskValue("micr", "400002007"); // "XXXXXX007"

9. demat — structure (CDSL 16-digit form is gated; NSDL IN form self-identifies)

detect("Demat IN30001012345678");
// [{ type: 'demat', value: 'IN30001012345678', index: 6, valid: true, confidence: 0.9 }]
validate("demat", "IN30001012345678"); // true (NSDL)
validate("demat", "1234567890123456");  // true (CDSL)
maskValue("demat", "IN30001012345678"); // "INXXXXXXXXXX5678"

10. card — checksum (Luhn + IIN + length)

detect("Card 4111 1111 1111 1111");
// [{ type: 'card', value: '4111 1111 1111 1111', index: 5, valid: true, confidence: 0.99 }]
validate("card", "4111111111111111"); // true
validate("card", "1234567812345670"); // false (Luhn ok but no real IIN)
maskValue("card", "4111 1111 1111 1111"); // "XXXX XXXX XXXX 1111"
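
The Luhn part of the card check is small enough to show in full. This standalone sketch (luhnOk is a name chosen for the example) also demonstrates why Luhn alone is not enough: 1234567812345670 passes it, so a further issuer-prefix check is what rejects that value.

```javascript
// Standard Luhn check over 13-19 digits; spaces and hyphens are ignored.
function luhnOk(value) {
  if (typeof value !== "string") return false;
  const digits = value.replace(/[\s-]/g, "");
  if (!/^\d{13,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    // Double every second digit counting from the right, folding 10-18 back to 1-9.
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

console.log(luhnOk("4111 1111 1111 1111")); // true
console.log(luhnOk("1234567812345670"));    // true (Luhn-valid, but not a real issuer prefix)
console.log(luhnOk("4111111111111112"));    // false
```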

11. gstin — checksum (mod-36 + state code)

detect("GSTIN 27AAPFU0939F1ZV");
// [{ type: 'gstin', value: '27AAPFU0939F1ZV', index: 6, valid: true, confidence: 0.99 }]
validate("gstin", "27AAPFU0939F1ZV"); // true
validate("gstin", "27AAPFU0939F1ZX"); // false (bad check digit)
maskValue("gstin", "27AAPFU0939F1ZV"); // "27XXXXXXXXXXXXV"
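
The GSTIN check character is usually described as a Luhn-style "mod 36" scheme over the alphabet 0-9A-Z. The sketch below follows that description and reproduces the check character of the sample above; whether the library computes it exactly this way is an assumption, and it covers the checksum only, not the state-code check.

```javascript
const GSTIN_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Compute the 15th character from the first 14 (null if a character is out of alphabet).
function gstinCheckChar(first14) {
  let factor = 2; // the rightmost of the 14 characters is doubled
  let sum = 0;
  for (let i = first14.length - 1; i >= 0; i--) {
    const code = GSTIN_ALPHABET.indexOf(first14[i]);
    if (code < 0) return null;
    const product = code * factor;
    // Add the two base-36 "digits" of the product.
    sum += Math.floor(product / 36) + (product % 36);
    factor = factor === 2 ? 1 : 2;
  }
  return GSTIN_ALPHABET[(36 - (sum % 36)) % 36];
}

function gstinChecksumOk(value) {
  if (typeof value !== "string" || value.length !== 15) return false;
  const upper = value.toUpperCase();
  return gstinCheckChar(upper.slice(0, 14)) === upper[14];
}

console.log(gstinCheckChar("27AAPFU0939F1Z"));   // "V"
console.log(gstinChecksumOk("27AAPFU0939F1ZV")); // true
console.log(gstinChecksumOk("27AAPFU0939F1ZX")); // false
```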

12. tan — structure

detect("TAN MUMA12345B");
// [{ type: 'tan', value: 'MUMA12345B', index: 4, valid: true, confidence: 0.8 }]
validate("tan", "MUMA12345B"); // true
validate("tan", "MUM12345B");  // false
maskValue("tan", "MUMA12345B"); // "MUMAXXXXXB"

13. cin — structure (all 6 segments)

detect("CIN U72200KA2011PTC123456");
// [{ type: 'cin', value: 'U72200KA2011PTC123456', index: 4, valid: true, confidence: 0.8 }]
validate("cin", "U72200KA2011PTC123456"); // true
validate("cin", "U72200ZZ2011PTC123456"); // false (bad state code)
maskValue("cin", "U72200KA2011PTC123456"); // "UXXXXXXXXXXXXXX123456"

14. din — structure (context-gated)

detect("DIN 01234567");
// [{ type: 'din', value: '01234567', index: 4, valid: true, confidence: 0.9 }]
detect("ref 01234567 here"); // [] — no "DIN"/"director" keyword nearby
validate("din", "01234567"); // true
maskValue("din", "01234567"); // "XXXXXX67"

15. uan — structure (context-gated)

detect("UAN 100123456789");
// [{ type: 'uan', value: '100123456789', index: 4, valid: true, confidence: 0.9 }]
detect("number 100123456789 here"); // [] — no "UAN"/"PF" keyword nearby
validate("uan", "100123456789"); // true
maskValue("uan", "100123456789"); // "XXXXXXXX6789"

16. abha — checksum (Verhoeff)

detect("ABHA 12-3456-7890-1230");
// [{ type: 'abha', value: '12-3456-7890-1230', index: 5, valid: true, confidence: 0.99 }]
validate("abha", "12-3456-7890-1230"); // true
validate("abha", "12-3456-7890-1234"); // false (bad checksum)
maskValue("abha", "12-3456-7890-1230"); // "XX-XXXX-XXXX-1230"

17. mobile_in — structure (context-gated; +91 self-identifies)

detect("Call +91 98765 43210");
// [{ type: 'mobile_in', value: '+91 98765 43210', index: 5, valid: true, confidence: 0.9 }]
detect("id 9876543210 here"); // [] — no +91 and no phone keyword nearby
validate("mobile_in", "9876543210"); // true
maskValue("mobile_in", "+91 98765 43210"); // "+XX XXXXX X3210"

18. pincode — structure (context-gated)

detect("PIN code 560001");
// [{ type: 'pincode', value: '560001', index: 9, valid: true, confidence: 0.9 }]
detect("order 560001 shipped"); // [] — no PIN/postal/address keyword nearby
validate("pincode", "560001"); // true
maskValue("pincode", "560001"); // "5XXXXX"

Usage in Node and the browser

The package ships ESM, CommonJS, and TypeScript declarations.

Node (ESM) / bundlers / browsers:

import { detect, validate, mask } from "indian-pii";

Node (CommonJS):

const { detect, validate, mask } = require("indian-pii");

Browser via a CDN (no build step):

<script type="module">
  import { mask } from "https://esm.sh/indian-pii";
  console.log(mask("PAN ABCPK1234Z")); // "PAN AXXXXXXXXZ"
</script>

Try it yourself

Paste this into a file and run it with Node (node try.mjs) after installing:

import { detect, validate, mask } from "indian-pii";

const input = process.argv[2] ?? "PAN ABCPK1234Z, GSTIN 27AAPFU0939F1ZV";
console.log("detected:", detect(input));
console.log("masked:  ", mask(input));
console.log("valid PAN?", validate("pan", "ABCPK1234Z"));

node try.mjs "Aadhaar 2345 6789 0124 and card 4111 1111 1111 1111"

The repo also ships runnable demos in examples/ — after npm run build, run:

node examples/detect-demo.js
node examples/validate-demo.js
node examples/mask-demo.js

Testing

The suite (run with Vitest) covers, for every one of the 18 detectors, at least three valid samples and three invalid ones (wrong checksum, wrong structure, value embedded in a longer string, and empty/null input). It also covers the engine itself: context gating (loose patterns are not bare-matched in free text), UPI-vs-email separation, boundary safety, overlap de-duplication (checksum-validated wins), the requireValid and types options, masking output, and input safety.

npm test

Reading the output: Vitest prints one line per test file with a ✓ (all passing) or ✗ (a failure), then a summary like Tests 55 passed (55). A green summary with zero failures means every detector and engine behaviour is verified.

Honest limitations

  • Format-valid ≠ real. Detection and validation verify format and checksums only. A value that validates is well-formed — it is not proof that the identifier was issued, is active, or belongs to anyone. Use this for redaction, privacy hygiene, and input sanity checks; never as proof of identity or as a substitute for an authoritative verification service.
  • Context-gated detectors trade recall for precision. Loose patterns (MICR, DIN, UAN, mobile, pincode, and the bare CDSL demat form) only fire in free text when a related keyword is nearby. To check such a value directly, call validate(type, value) — it never requires context.
  • Structure-only detectors can over-match in free text. Identifiers without a checksum (e.g. passport, voter ID) are validated by shape; a coincidental matching token may be flagged. Use requireValid and the valid flag to decide how strict your pipeline should be.

Robustness

  • Zero runtime dependencies — nothing to audit downstream.
  • Browser + Node, ESM + CJS, tree-shakeable ("sideEffects": false).
  • Input-safe: null, undefined, and non-string input never throw.
  • Boundary-safe — values glued inside a longer alphanumeric run are ignored.
  • ReDoS-safe — all patterns are linear with bounded quantifiers.
  • Normalization — spaces/hyphens stripped and case folded where formats allow.
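
Two of the properties above, boundary safety and normalization, can be illustrated with plain regular expressions. This is a sketch under assumptions: the pattern and helper names are hypothetical and are not taken from the library's detector registry.

```javascript
// PAN-shaped token that is not glued to other letters or digits on either side.
// Every quantifier is bounded, so matching stays linear in the input length.
const PAN_IN_TEXT = /(?<![A-Za-z0-9])[A-Z]{5}[0-9]{4}[A-Z](?![A-Za-z0-9])/g;

// Hypothetical helper: list standalone matches with their offsets.
function findStandalone(text, pattern) {
  return [...text.matchAll(pattern)].map((m) => ({ value: m[0], index: m.index }));
}

// Hypothetical helper: strip separators and fold case before validating.
function normalise(value) {
  return value.replace(/[\s-]/g, "").toUpperCase();
}

console.log(findStandalone("PAN: ABCPK1234Z.", PAN_IN_TEXT));
// [ { value: 'ABCPK1234Z', index: 5 } ]
console.log(findStandalone("refXABCPK1234Z9", PAN_IN_TEXT)); // []
console.log(normalise("2345-6789 0124")); // "234567890124"
```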

FAQ

Is any 12-digit number a valid Aadhaar? No. A real Aadhaar's last digit is a Verhoeff check digit and it never starts with 0 or 1, so most random 12-digit numbers fail validation. indian-pii enforces the Verhoeff checksum, not just the length.

How do I validate a PAN or GSTIN checksum in JavaScript? Call validate("pan", value) to check PAN structure and its holder-type character, and validate("gstin", value) to verify the 15-character structure, the state code, and the GSTN mod-36 check digit — both with zero dependencies, in Node or the browser.

Can it validate IFSC, TAN, CIN, ABHA, UAN, voter ID, or passport numbers too? Yes — see the detector table. Some identifiers are checksum-validated (Aadhaar, GSTIN, payment card, ABHA), others are structure-validated, and a few loose patterns are context-gated to avoid false positives.

Does a passing result mean the identifier is real or active? No. Validation checks format and checksums only — it confirms a value is well-formed, not that it was issued or belongs to anyone. Never use it as proof of identity.

Does it work in the browser? Yes. It is zero-dependency, ships ESM + CommonJS + TypeScript types, and is tree-shakeable.

Can it detect PII inside images? Yes, via the optional indian-pii/image subpath: you run OCR yourself (e.g. with Tesseract.js) and it maps detected PII back to pixel boxes for redaction. No OCR engine or model is bundled.

Roadmap

  • More identifiers (ration card, ESIC, NPS/PRAN, FASTag, bank-account heuristics).
  • Locale-aware name & address heuristics.
  • Optional OCR / document-parsing modules (Aadhaar/PAN card images, PDFs).
  • Configurable redaction policies and streaming/large-text scanning.

Contributing

Issues and pull requests are welcome. Please add or update tests for any detector change (valid and invalid cases) and keep the library dependency-free. Run npm test and npm run typecheck before opening a PR.

License

MIT © 2026 Chandrabhan Shekhawat / Gigai Kripa Services