npm-malware-scanner
v0.0.3
Published
Real-time malware scanner for npm packages
Maintainers
Readme
NPM Malware Scanner
Real-time malware scanner for npm packages. Detects install scripts, shell access, obfuscated code, network access, filesystem access, and typosquatting attacks.
External Resources & Research
How many hours did you spend? Roughly 5 hours.
Did you have adequate time to work on the code submission? For an alpha version, I think so.
Did you use any AI coding tools to assist with coding? Yes, ChatGPT and Claude.
Did you leverage external resources? Yes. This project was built using industry best practices and research from security experts. Google, StackOverflow, and reference documentation were also used for research.
Key Resources Used
Supply Chain Security:
- Socket.dev Documentation - Alert types and detection strategies
- npm Security Best Practices - Understanding npm security model
- OWASP Top 10 for CI/CD - CI/CD security risks
Static Analysis Techniques:
- Babel Parser Documentation - AST parsing for JavaScript/TypeScript
- ESLint Source Code - Pattern matching and code analysis techniques
- Shannon Entropy - Obfuscation detection using information theory
Typosquatting Research:
- Levenshtein Distance Algorithm - String similarity measurement
- Typosquatting on PyPI - Academic research on package name attacks
- npm Typosquatting Attacks - Real-world examples
npm Registry APIs:
- npm Registry API - Package metadata and download
- CouchDB Changes Feed - Real-time monitoring
Notable CVEs & Attacks:
- CVE-2021-44906 - Minimist prototype pollution
- event-stream incident - Malicious dependency injection
- ua-parser-js attack - Cryptocurrency miner in popular package
Why These Resources?
- Socket.dev - Understand the product we're building towards
- Academic papers - Proven algorithms for typosquat detection
- Real CVEs - Learn from actual attacks to build better detectors
- npm APIs - Official documentation for reliable integration
- Open source projects - Learn from battle-tested implementations (ESLint, Babel)
Installation
# Global installation
npm install -g npm-malware-scanner
# Or use directly with npx
npx npm-malware-scanner express 4.18.2Usage
Scan a Package
npm-scanner <package-name> <version>
# Examples
npm-scanner express 4.18.2
npm-scanner axios 1.6.0Live Monitoring
Monitor the npm registry feed in real-time:
npm-scanner --liveCI/CD Integration
The scanner automatically detects CI/CD environments and adapts output format.
GitHub Actions:
- name: Security Scan
run: npm-scanner express 4.18.2Other CI Systems:
CI=true npm-scanner express 4.18.2See CI-CD-INTEGRATION.md for detailed integration guides.
Detection Capabilities
Install Scripts
Identifies packages with lifecycle scripts that execute arbitrary code:
- `preinstall`, `install`, `postinstall`
- `preuninstall`, `uninstall`, `postuninstall`
Severity: High
Network Access
Detects packages making network requests:
- Node.js modules: `http`, `https`, `net`, `dgram`, `dns`
- Browser APIs: `fetch`, `XMLHttpRequest`, `WebSocket`, `EventSource`
- Popular libraries: `axios`, `node-fetch`, `got`, `superagent`, `request`
Severity: Medium
Typosquatting
Identifies packages with names similar to popular packages using Levenshtein distance.
Severity: High
Architecture
src/
├── cli.ts # CLI entry point
├── scanner.ts # Scan orchestration
├── types.ts # TypeScript interfaces
├── detectors/
│ ├── install-scripts.ts # Lifecycle script detection
│ ├── network-access.ts # Network access detection (AST + regex)
│ └── typosquat.ts # Typosquat detection
├── npm/
│ ├── registry.ts # Package fetching & extraction
│ └── feed.ts # Live feed monitoring
└── utils/
├── logger.ts # Output formatting
└── environment.ts # CI/CD detectionDesign Decisions
Static Analysis Only
Choice: Analyze code without execution
Rationale: Safe, fast (~500ms per package), effective for most threats
Tradeoff: Cannot detect runtime behavior or heavily obfuscated code
Hybrid Detection (AST + Regex)
Choice: Combine AST parsing with regex patterns
Rationale: AST for accuracy, regex for obfuscated/dynamic code
Tradeoff: Slightly slower but more comprehensive
Popular Packages for Typosquat
Choice: Compare only against top npm packages
Rationale: Fast, practical, low false positives
Tradeoff: Misses typosquats of less popular packages
Extending the Scanner
Adding a New Detector
Create a detector file:
// src/detectors/my-detector.ts
import { Alert, DetectorResult } from '../types';
export class MyDetector {
static async detect(packagePath: string): Promise<DetectorResult> {
const alerts: Alert[] = [];
// Your detection logic
return { alerts };
}
}Register in `src/scanner.ts`:
import { MyDetector } from './detectors/my-detector';
const [installScriptResult, networkAccessResult, typosquatResult, myResult] =
await Promise.all([
InstallScriptDetector.detect(packageInfo.extractedPath),
NetworkAccessDetector.detect(packageInfo.extractedPath),
TyposquatDetector.detect(packageName),
MyDetector.detect(packageInfo.extractedPath), // Add here
]);
alerts.push(...myResult.alerts);Development
# Clone and setup
git clone https://github.com/socket-security/npm-scanner
cd npm-scanner
pnpm install
# Build
pnpm build
# Run tests
pnpm test
# Test with coverage
pnpm test:coverage
# Test a package
pnpm start express 4.18.2
# Test in CI mode
CI=true pnpm start express 4.18.2Testing
The project includes comprehensive unit tests for all detectors:
# Run all tests
pnpm test
# Watch mode
pnpm test:watch
# Coverage report
pnpm test:coverageTest Coverage:
- Install script detection
- Shell access detection (child_process, exec, spawn)
- Obfuscation detection (entropy analysis)
- Network access detection (http, fetch, axios, etc.)
- Filesystem access detection (fs module operations)
- Typosquat detection (Levenshtein distance)
- Edge cases and error handling
Known Limitations
- Static analysis only - Cannot detect runtime behavior
- No dependency scanning - Only scans the target package
- Obfuscation - Heavily obfuscated code may evade detection
- False positives - Legitimate packages may trigger alerts (e.g., HTTP clients)
Performance
- Single package scan: 500ms - 2s
- Network detection: 100-500ms
- Typosquat check: ~50ms
- Live mode throughput: 1-2 packages/second
Contributing
Contributions welcome! Areas of interest:
- New detectors (shell access, crypto mining, data exfiltration)
- Performance improvements
- Better obfuscation detection
- Additional CI/CD integrations
License
MIT
