pastapolice
v1.0.3
Detect copy-paste and semantically similar code in TypeScript/JavaScript projects
PastaPolice
A CLI tool to detect copy-paste code (duplicate pasta) in your TypeScript/JavaScript projects.
Features
- Semantic Detection (default): Finds semantically similar functions using AST-based normalization (different variable names, same logic)
- Syntactic Detection: Finds exact duplicate code blocks using line-based hashing
Installation
Using Bun (recommended)
bun install -g pastapolice
Using npm
npm install -g pastapolice
Usage
pastapolice <path> [options]
Options
| Option | Alias | Description | Default |
|--------|-------|-------------|---------|
| --min-lines | -m | Minimum lines for a code block | 5 |
| --syntactic | -s | Use syntactic (line-based) detection | false |
Examples
# Scan current directory (semantic mode)
pastapolice .
# Scan with custom minimum lines
pastapolice . -m 10
# Syntactic mode (exact duplicates)
pastapolice . --syntactic
# Scan specific directory
pastapolice ./src -m 3
Modes
Semantic Detection (Default)
Uses AST-based normalization to find functions that do the same thing but have different variable names, formatting, or minor syntax differences.
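As an illustration, the two functions below differ only in naming and formatting. Both functions and their names are hypothetical, and the assumption is that normalization works as described under How It Works, in which case a pair like this would be expected to show up as a semantic duplicate.

```typescript
// Hypothetical example input: same logic, different identifiers and layout.
function sumPrices(items: number[]): number {
  let total = 0;
  for (const price of items) {
    total += price;
  }
  return total;
}

// Same control flow and operations as sumPrices, with renamed variables
// and a more compact loop body.
function addScores(values: number[]): number {
  let acc = 0;
  for (const v of values) { acc += v; }
  return acc;
}
```

Once identifiers are replaced by positional placeholders and literals by a common marker, the two bodies should reduce to the same sequence.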
Syntactic Detection
Finds exact duplicate code blocks by comparing normalized lines. Good for catching literal copy-paste.
How It Works
Semantic Mode
- Parses TypeScript files using the TypeScript compiler API
- Extracts function declarations, methods, arrow functions, and function expressions
- Normalizes by removing comments and replacing:
  - Identifiers → VAR1, VAR2, VAR3...
  - String/numeric literals → LIT
- Hashes normalized representation
- Groups functions with identical hashes as semantic duplicates
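The steps above can be sketched roughly as follows. This is not PastaPolice's implementation: to stay dependency-free it swaps the TypeScript compiler API for a small regex tokenizer and uses a 32-bit FNV-1a hash. The names `normalize`, `fnv1a`, `groupDuplicates`, and `KEYWORDS` are all hypothetical.

```typescript
// Keywords are kept verbatim so control flow still distinguishes functions.
// (Assumption: a short illustrative list, not the full language.)
const KEYWORDS = new Set([
  "function", "return", "const", "let", "var",
  "if", "else", "for", "while", "of", "in", "new",
]);

// Replace identifiers with VAR1, VAR2, ... (numbered by first appearance)
// and string/numeric literals with LIT, after stripping comments.
function normalize(source: string): string {
  // Naive comment removal: does not handle comment-like text inside strings.
  const withoutComments = source
    .replace(/\/\*[\s\S]*?\*\//g, "")
    .replace(/\/\/.*$/gm, "");

  // Strings, numbers, identifiers/keywords, then any single punctuation char.
  const tokenPattern =
    /"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'|\d+(?:\.\d+)?|[A-Za-z_$][\w$]*|[^\s\w]/g;

  const names = new Map<string, string>();
  const out: string[] = [];
  for (const token of withoutComments.match(tokenPattern) ?? []) {
    if (/^["'\d]/.test(token)) {
      out.push("LIT");
    } else if (/^[A-Za-z_$]/.test(token) && !KEYWORDS.has(token)) {
      if (!names.has(token)) names.set(token, `VAR${names.size + 1}`);
      out.push(names.get(token)!);
    } else {
      out.push(token);
    }
  }
  return out.join(" ");
}

// 32-bit FNV-1a over UTF-16 code units; a stand-in for whatever hash the tool uses.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, 32-bit
  }
  return (hash >>> 0).toString(16);
}

// Group function names whose normalized source hashes to the same value.
function groupDuplicates(functions: { name: string; source: string }[]): string[][] {
  const groups = new Map<string, string[]>();
  for (const fn of functions) {
    const key = fnv1a(normalize(fn.source));
    const group = groups.get(key) ?? [];
    group.push(fn.name);
    groups.set(key, group);
  }
  return Array.from(groups.values()).filter((group) => group.length > 1);
}
```

Numbering placeholders by first appearance is what makes renamed variables collapse to the same string. A tokenizer cannot tell type names from variables or see scopes, which is one reason an AST-based approach is the more robust choice.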
Syntactic Mode
- Scans each file with a sliding window of consecutive lines
- Normalizes lines (removes whitespace)
- Hashes using xxhash64
- Groups identical hashes as duplicates
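A minimal sketch of the sliding-window idea, under several assumptions: the window size equals the minimum line count, blank lines are skipped, and the normalized window text itself serves as the map key in place of an xxhash64 digest. `InputFile`, `Block`, and `findDuplicateBlocks` are hypothetical names, not part of PastaPolice.

```typescript
interface InputFile {
  path: string;
  text: string;
}

interface Block {
  path: string;
  startLine: number; // 1-based line where the window starts
}

// Slide a window of `minLines` normalized lines over each file and
// group windows whose normalized content is identical.
function findDuplicateBlocks(files: InputFile[], minLines = 5): Block[][] {
  const seen = new Map<string, Block[]>();

  for (const file of files) {
    // Normalize: strip all whitespace, remember original line numbers,
    // and drop lines that end up empty (assumption, see lead-in).
    const lines = file.text
      .split("\n")
      .map((raw, index) => ({ text: raw.replace(/\s+/g, ""), line: index + 1 }))
      .filter((entry) => entry.text !== "");

    for (let i = 0; i + minLines <= lines.length; i++) {
      // The joined window stands in for its hash; a real tool would hash it.
      const key = lines
        .slice(i, i + minLines)
        .map((entry) => entry.text)
        .join("\n");
      const blocks = seen.get(key) ?? [];
      blocks.push({ path: file.path, startLine: lines[i].line });
      seen.set(key, blocks);
    }
  }

  // Only keys seen more than once are duplicates.
  return Array.from(seen.values()).filter((blocks) => blocks.length > 1);
}
```

With this approach a duplicated region longer than the window produces several overlapping groups, so a complete tool would likely merge adjacent matches before reporting them.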
Development
# Install dependencies
bun install
# Run directly
bun run src/index.ts .
# Build
bun run build
License
MIT
