@lnpg/unicode-scrub
v1.0.0
unicode-scrub
Scan and sanitise source files by removing invisible, dangerous, and unwanted Unicode characters before build.
unicode-scrub is a lightweight CLI and library that protects codebases from:
- invisible Unicode characters (zero-width spaces, joiners, BOMs)
- bidirectional control characters (Trojan Source class issues)
- non-breaking spaces and copy-paste artifacts
It is designed to run safely in build pipelines and CI.
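The three categories above can be illustrated with a small detection pass. This is a minimal sketch, not unicode-scrub's actual implementation: the function name `scanText`, the `Issue` shape, and the exact code points in each class are assumptions chosen for illustration.

```typescript
// Hypothetical sketch: the categories mirror the list above, but the exact
// code points that unicode-scrub checks are an assumption.
type IssueKind = "invisible" | "bidi" | "nbsp";

interface Issue {
  kind: IssueKind;
  codePoint: string; // e.g. "U+200B"
  line: number;      // 1-based
  column: number;    // 1-based, counted in UTF-16 code units
}

const CLASSES: ReadonlyArray<[IssueKind, RegExp]> = [
  // Zero-width space, non-joiner, joiner, word joiner, BOM
  ["invisible", /[\u200B\u200C\u200D\u2060\uFEFF]/],
  // Bidirectional embeddings, overrides, and isolates (Trojan Source class)
  ["bidi", /[\u202A-\u202E\u2066-\u2069]/],
  // Non-breaking space and narrow no-break space
  ["nbsp", /[\u00A0\u202F]/],
];

function scanText(text: string): Issue[] {
  const issues: Issue[] = [];
  text.split("\n").forEach((lineText, lineIndex) => {
    for (let col = 0; col < lineText.length; col++) {
      const ch = lineText[col];
      for (const [kind, pattern] of CLASSES) {
        if (pattern.test(ch)) {
          const hex = ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0");
          issues.push({ kind, codePoint: "U+" + hex, line: lineIndex + 1, column: col + 1 });
          break; // each character belongs to at most one class
        }
      }
    }
  });
  return issues;
}
```

Reporting the line and column alongside the code point matters here because these characters are invisible in most editors, so a position is the only practical way to locate them.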
Supported file types
By default, unicode-scrub scans:
- TypeScript / JavaScript (.ts, .tsx, .js, .jsx)
- SCSS / CSS (.scss, .css)
- HTML (.html)
- PHP (.php)
- Markdown (.md)
Markdown is treated differently from code: visible typography (curly quotes, dashes) is preserved.
Installation
Run directly with npx:
npx unicode-scrub "src/**/*"
Or install locally:
npm install --save-dev @lnpg/unicode-scrub
Usage
Scan only (no changes)
unicode-scrub "src/**/*"
Scans files and reports issues. Exits with a non-zero status if any issues are found.
Fix files in place
unicode-scrub --fix "src/**/*"
Removes or normalises offending characters and rewrites files safely.
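As a rough picture of what "removes or normalises" could mean for a single file's contents, here is a minimal sketch. The function name `scrubText` and the policy it applies (delete invisible and bidirectional control characters, turn no-break spaces into ordinary spaces) are assumptions for illustration; the tool's real rules may differ.

```typescript
// Hypothetical sketch of a fix pass; the real --fix behaviour may differ.

// Characters assumed to be removed outright: zero-width characters, BOM,
// and bidirectional control characters.
const REMOVE = /[\u200B\u200C\u200D\u2060\uFEFF\u202A-\u202E\u2066-\u2069]/g;

// Characters assumed to be normalised to a plain space (U+0020).
const NO_BREAK_SPACES = /[\u00A0\u202F]/g;

function scrubText(text: string): string {
  return text.replace(REMOVE, "").replace(NO_BREAK_SPACES, " ");
}
```

Text that contains none of these characters comes back unchanged, so running the pass twice gives the same result as running it once.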
Typical build integration
{
  "scripts": {
    "prebuild": "unicode-scrub \"src/**/*\"",
    "build": "vite build"
  }
}
This ensures Unicode hygiene before every build.
Options
| Option | Description |
| ------ | ----------- |
| --fix | Rewrite files in place. |
| --config <path> | Path to JSON config file. |
| --extensions <list> | Comma-separated extensions to include. |
| --ignore <glob> | Ignore patterns (repeatable). |
| --json | Output machine-readable JSON. |
| --max-issues <n> | Limit the number of issues printed. |
| --fail-on-issues | Exit non-zero if issues are found (default: true). |
Configuration
Optional JSON configuration file:
{
  "normaliseSpaces": true,
  "byExt": {
    "md": {
      "normaliseSpaces": true
    }
  }
}
- Global options apply to all file types.
- byExt allows per-extension overrides.
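One way to read the relationship between global options and byExt is as a shallow merge in which the per-extension entry wins. The sketch below assumes that precedence and uses hypothetical names (`ScrubOptions`, `ScrubConfig`, `resolveOptions`); only `normaliseSpaces` and `byExt` come from the configuration example above.

```typescript
// Hypothetical sketch of per-extension option resolution.
// Only `normaliseSpaces` and `byExt` appear in the README; the rest is assumed.
interface ScrubOptions {
  normaliseSpaces?: boolean;
}

interface ScrubConfig extends ScrubOptions {
  byExt?: Record<string, ScrubOptions>;
}

function resolveOptions(config: ScrubConfig, filePath: string): ScrubOptions {
  const { byExt = {}, ...globalOptions } = config;
  // Take the text after the last dot as the extension, without the dot.
  // A path with no usable extension simply matches no override.
  const dot = filePath.lastIndexOf(".");
  const ext = dot >= 0 ? filePath.slice(dot + 1).toLowerCase() : "";
  // Per-extension settings override global ones (assumed precedence).
  return { ...globalOptions, ...(byExt[ext] ?? {}) };
}
```

Under this reading, a config with a global `normaliseSpaces: true` and an `md` override of `false` would normalise spaces everywhere except in Markdown files.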
What this tool does not do
- It does not parse ASTs.
- It does not lint syntax.
