unslop
v0.1.7
Detect duplicated code, dead code, and anti-patterns in AI-generated codebases
Standalone CLI tool for detecting duplicated code, dead code, inlined utilities, and semantic anti-patterns in AI-generated codebases. No AI/LLM in the detection pipeline — deterministic analysis only.
Install
go install github.com/unslop/unslop/cmd/unslop@latest
Or build from source:
make build
Requires Go 1.22+ and CGO (for tree-sitter).
Usage
# Scan current directory
unslop .
# Run directly with npx (no go install)
npx unslop .
# Verbose text output (full findings)
unslop --verbose .
# Changed-only review (uncommitted files vs existing code)
unslop --changed-only .
# Scan specific paths
unslop ./src ./lib
# JSON output
unslop --format json .
# SARIF output (for CI integration)
unslop --format sarif .
# List built-in rules and defaults
unslop --list-rules
# With config file
unslop --config .unslop.yaml .
Slop Score is shown in terminal text output (--format text, default). JSON and SARIF outputs remain unchanged in v1.
Default text output is compact (priority summary + slop score). Use --verbose for the full finding list.
Use --changed-only to focus on uncommitted files and prioritize reuse against existing code.
What It Detects
Quick Reference
| Category | Engine | Reliability |
| -------------------------------------- | --------- | :---------: |
| Identical constants across files | Clone | 99% |
| Copy-pasted functions (same names) | Clone | 99% |
| Copy-pasted functions (renamed params) | Clone | 95% |
| Reformatted JSX components | Clone | 90% |
| Similar functions (small edits) | Clone | 80% |
| Equivalent regex patterns | Clone | 95% |
| Cross-package export matches | Clone | 99% |
| a>b?a:b → Math.max | Oxlint | 99% |
| Dead branches | Oxlint | 90% |
| Unreachable code | Oxlint | 99% |
| Inlined utilities | Oxlint | 95% |
| Dead exports | Clone | 95% |
| Complexity budget breaches | Practices | 90% |
Rules Reference
9 rules across 3 engines (3 clone, 5 linter, 1 best practices). Use unslop --list-rules to see defaults for your config.
Engine 1: Clone Detection (3 rules)
Tree-sitter parses source into a CST, then a language plugin normalizes it to a language-agnostic tree with alpha-renamed identifiers (a, b, c instead of real names). All three rules operate on these normalized trees.
exact-clone — Tier A, Error
SHA-256 fingerprint of the entire normalized tree. Two fragments with identical hashes are exact clones.
- Algorithm: deterministic S-expression serialization (Kind:Label child1 child2 ...) → SHA-256. O(n) grouping by hash.
- Minimum size: 50 tokens, 8 nodes.
- Filters: same-file duplicates, import-linked pairs, rule boilerplate scaffolding (<30 line span in rules scaffold paths), repeated framework idioms (3+ occurrences with diverse names across 2+ files).
near-clone — Tier A, Warning
Finds near-miss duplicates via suffix tree / LCS analysis on linearized token sequences.
- Linearization: pre-order traversal of normalized tree → flat token sequence with ^ sentinel tokens marking end-of-children.
- Bucketing: logarithmic buckets by token count (20–39 → bucket 0, 40–79 → bucket 1, etc.) to avoid O(n²) all-pairs comparison. Each sequence is also placed in the adjacent bucket (+1) so matches across a bucket boundary are still found.
- Algorithm: rolling DP longest common substring (two rows, O(n) space), capped at 220,000 DP cells per pair. Falls back to bounded longest common subsequence for edited clones with insertions/deletions.
- Threshold: 80% similarity (matchLen / max(len(A), len(B))).
- Limits: max 10,000 comparisons, min 50 tokens.
- Post-processing: deterministic one-to-one matching per file pair — sorts by similarity descending, prefers same-name pairs, greedy assignment (each fragment matched at most once per file pair). Excludes declarative clones (different names + no control flow or ≤3 statements).
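A minimal sketch of the bucketing, the two-row longest-common-substring DP, and the similarity ratio described above. Function names are hypothetical, tokens are plain strings for simplicity, and the subsequence fallback and post-processing are not shown; only the 220,000-cell cap and the formula come from the description.

```go
package main

import "fmt"

// maxDPCells mirrors the per-pair cap quoted above.
const maxDPCells = 220000

// bucket maps a token count to a logarithmic bucket:
// 20-39 -> 0, 40-79 -> 1, 80-159 -> 2, and so on.
// Counts below 20 return -1.
func bucket(tokens int) int {
	if tokens < 20 {
		return -1
	}
	b := 0
	for upper := 40; tokens >= upper; upper *= 2 {
		b++
	}
	return b
}

// longestCommonSubstring returns the length of the longest contiguous
// token run shared by a and b. It keeps only two DP rows, so memory is
// O(len(b)). It reports false when the pair exceeds the cell cap.
func longestCommonSubstring(a, b []string) (int, bool) {
	if len(a)*len(b) > maxDPCells {
		return 0, false
	}
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	best := 0
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			if a[i-1] == b[j-1] {
				curr[j] = prev[j-1] + 1
				if curr[j] > best {
					best = curr[j]
				}
			} else {
				curr[j] = 0
			}
		}
		prev, curr = curr, prev // reuse the old row on the next iteration
	}
	return best, true
}

// similarity is matchLen / max(len(A), len(B)).
func similarity(matchLen, lenA, lenB int) float64 {
	longer := lenA
	if lenB > longer {
		longer = lenB
	}
	if longer == 0 {
		return 0
	}
	return float64(matchLen) / float64(longer)
}

func main() {
	a := []string{"Function", "a", "Return", "b", "^"}
	b := []string{"Function", "a", "Return", "b", "c"}
	if n, ok := longestCommonSubstring(a, b); ok {
		fmt.Printf("match=%d similarity=%.2f\n", n, similarity(n, len(a), len(b)))
	}
	fmt.Println("bucket for 45 tokens:", bucket(45))
}
```

Dividing by the longer sequence keeps a short fragment from scoring highly just because it is fully contained in a much larger one.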
structural-similarity — Tier A, Warning
PQ-gram tree profiling for structural similarity between same-kind fragments.
- Algorithm: PQ-gram with p=2 ancestors, q=3 siblings. Each gram = stem of p ancestor labels + base of q sibling labels. Similarity via Sørensen–Dice coefficient: 2 * |intersection| / (|A| + |B|).
- Eligibility: same fragment Kind only (function-to-function, class-to-class), different files, min 8 nodes, size ratio ≤ 1.8×.
- Thresholds (variable by file type):
  - Both production files: 70%
  - One test + one production: 86%
  - Both test files: 90%
- Limits: max 50,000 comparisons. Profiles cached per fragment.
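For readers unfamiliar with PQ-grams, the sketch below builds a profile with the textbook shift-register procedure and compares two profiles with the Dice coefficient. It is an illustration under assumptions: the Node type and function names are hypothetical, profiles are treated as bags (multisets), and unslop's own implementation may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical normalized tree node.
type Node struct {
	Label    string
	Children []*Node
}

const filler = "*" // dummy label used to pad stems and bases

// filled returns a register of n filler labels.
func filled(n int) []string {
	r := make([]string, n)
	for i := range r {
		r[i] = filler
	}
	return r
}

// shift returns a copy of reg with the oldest label dropped and label appended.
func shift(reg []string, label string) []string {
	out := make([]string, len(reg))
	copy(out, reg[1:])
	out[len(reg)-1] = label
	return out
}

// profile returns the bag of pq-grams of the tree as gram -> count.
func profile(root *Node, p, q int) map[string]int {
	bag := map[string]int{}
	var walk func(n *Node, stem []string)
	walk = func(n *Node, stem []string) {
		stem = shift(stem, n.Label)
		base := filled(q)
		emit := func() {
			bag[strings.Join(stem, ",")+"|"+strings.Join(base, ",")]++
		}
		if len(n.Children) == 0 {
			emit() // a leaf contributes one gram with an all-filler base
			return
		}
		for _, c := range n.Children {
			base = shift(base, c.Label)
			emit()
			walk(c, stem)
		}
		for k := 1; k < q; k++ { // slide the window past the last child
			base = shift(base, filler)
			emit()
		}
	}
	walk(root, filled(p))
	return bag
}

// dice computes 2 * |A ∩ B| / (|A| + |B|) over two bags.
func dice(a, b map[string]int) float64 {
	inter, total := 0, 0
	for g, ca := range a {
		total += ca
		if cb := b[g]; cb < ca {
			inter += cb
		} else {
			inter += ca
		}
	}
	for _, cb := range b {
		total += cb
	}
	if total == 0 {
		return 0
	}
	return 2 * float64(inter) / float64(total)
}

func main() {
	t1 := &Node{Label: "a", Children: []*Node{{Label: "b"}, {Label: "c"}}}
	t2 := &Node{Label: "a", Children: []*Node{{Label: "b"}, {Label: "d"}}}
	fmt.Printf("dice=%.3f\n", dice(profile(t1, 2, 3), profile(t2, 2, 3)))
}
```

Caching the profile per fragment, as noted in the limits above, means each tree is walked once no matter how many pairs it takes part in.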
Engine 2: Linter Integration (5 rules)
Wraps external linters (e.g. oxlint) run as subprocesses with configurable timeouts. Parses JSON output into findings. Non-fatal — a missing linter produces a warning, not an error.
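The subprocess pattern described here might look roughly like the sketch below. The runLinter function and its return shape are hypothetical; the linter's JSON is kept as raw bytes because its schema is not described here, and treating a timeout as an error is a choice of this sketch rather than documented behavior.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"os/exec"
	"time"
)

// runLinter runs an external linter with a timeout and returns its raw JSON output.
// A missing binary produces a warning string and a nil error (non-fatal).
func runLinter(command string, args []string, timeout string) (json.RawMessage, string, error) {
	d, err := time.ParseDuration(timeout)
	if err != nil {
		return nil, "", fmt.Errorf("bad timeout %q: %w", timeout, err)
	}
	path, err := exec.LookPath(command)
	if err != nil {
		return nil, fmt.Sprintf("linter %q not found, skipping", command), nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()

	// Linters commonly exit non-zero when they report findings, so the
	// exit status alone is not treated as a failure; stdout is checked instead.
	raw, runErr := exec.CommandContext(ctx, path, args...).Output()
	if ctx.Err() == context.DeadlineExceeded {
		return nil, "", fmt.Errorf("linter %q timed out after %s", command, d)
	}
	if !json.Valid(raw) {
		return nil, "", fmt.Errorf("linter %q did not produce valid JSON (run error: %v)", command, runErr)
	}
	return json.RawMessage(raw), "", nil
}

func main() {
	// Command, args, and timeout mirror the oxlint entry in the example config.
	out, warning, err := runLinter("oxlint", []string{"--format", "json"}, "60s")
	switch {
	case err != nil:
		fmt.Println("error:", err)
	case warning != "":
		fmt.Println("warning:", warning)
	default:
		fmt.Printf("received %d bytes of JSON\n", len(out))
	}
}
```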
inlined-utility — Tier A, Warning
Hand-rolled code that could use a standard library or utility call (e.g. a > b ? a : b → Math.max).
dead-code — Tier A, Error, Gateable
Unreachable code blocks that can never execute.
dead-branch — Tier A, Warning
Conditional branches (if/else, ternary) that can never be taken.
duplication — Tier A, Warning
Duplicate imports, exports, or repeated patterns detected by the linter.
dead-export — Tier A, Warning
Exported symbols that nothing imports.
Engine 3: Best Practices (1 rule)
In-process deterministic checks — pattern matching and metric computation. No external dependencies.
complexity-budget — Tier A, Warning
Flags functions that exceed thresholds on 2+ of 3 metrics simultaneously.
| Metric | Strict | Balanced (default) | Lenient |
| -------------------------------------- | -----: | -----------------: | ------: |
| Decision count (if/loop/switch/select) | ≥10 | ≥12 | ≥14 |
| Max nesting depth | ≥3 | ≥4 | ≥5 |
| Source lines | ≥75 | ≥90 | ≥110 |
- Functions under 20 lines are always excluded.
- Headroom for special paths: tooling/infrastructure paths get +20 decisions, +3 nesting, +80 lines. Script functions (/scripts/, .mjs/.cjs, main/walk) get +14/+2/+40. Orchestration functions (/cmd/, /routes/, /handlers/, names starting with handle/route/serve) get +2/+1/+25.
- Also fires on single-metric extreme outliers: decisions ≥ threshold+8, nesting ≥ threshold+2, or lines ≥ threshold+40 (when some complexity is also present).
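The core 2-of-3 check can be sketched as below. Type and function names are hypothetical; the threshold values come from the table above, while path headroom and the single-metric outlier rule are deliberately left out to keep the example small.

```go
package main

import "fmt"

// Thresholds holds the per-profile limits from the table above.
type Thresholds struct{ Decisions, Nesting, Lines int }

var profiles = map[string]Thresholds{
	"strict":   {10, 3, 75},
	"balanced": {12, 4, 90},
	"lenient":  {14, 5, 110},
}

// FuncMetrics is a hypothetical per-function measurement.
type FuncMetrics struct{ Decisions, Nesting, Lines int }

// exceedsBudget reports whether at least two of the three metrics
// meet or exceed their thresholds. Functions under 20 lines never fire.
func exceedsBudget(m FuncMetrics, t Thresholds) bool {
	if m.Lines < 20 {
		return false
	}
	breaches := 0
	if m.Decisions >= t.Decisions {
		breaches++
	}
	if m.Nesting >= t.Nesting {
		breaches++
	}
	if m.Lines >= t.Lines {
		breaches++
	}
	return breaches >= 2
}

func main() {
	t := profiles["balanced"]
	fmt.Println(exceedsBudget(FuncMetrics{Decisions: 13, Nesting: 4, Lines: 50}, t)) // two breaches
	fmt.Println(exceedsBudget(FuncMetrics{Decisions: 13, Nesting: 2, Lines: 50}, t)) // one breach
}
```

Requiring two breaches means a long but flat function, or a short but branchy one, is not flagged on its own.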
Slop Score
A 0–100 weighted composite of findings and structural smell metrics.
Category weights (base, before profile multiplier):
| Category | Weight | Cap |
| --------------------- | -----: | --: |
| exact-clone | 9 | 18 |
| near-clone | 6 | 18 |
| structural-similarity | 5 | 18 |
| dead-code | 5 | 18 |
| complexity-budget | 5 | 14 |
| inlined-utility | 4 | 18 |
| duplication | 4 | 18 |
| dead-export | 4 | 18 |
| dead-branch | 3 | 18 |
Profile multipliers: strict = 1.2×, balanced = 1.0×, lenient = 0.8×.
Per-finding formula: points = weight × profileMultiplier × (0.6 + 0.4 × confidence) × (1.0 + min(0.5, 0.1 × (locations - 1))). Category sums are capped, then total finding points capped at 70.
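As a worked example of the formula, the sketch below computes points for a single finding and applies a category cap. The function names are hypothetical; the weights, multipliers, and caps are the ones listed above.

```go
package main

import (
	"fmt"
	"math"
)

// findingPoints applies the per-finding formula:
// weight × profileMultiplier × (0.6 + 0.4 × confidence) × (1.0 + min(0.5, 0.1 × (locations - 1))).
func findingPoints(weight, profileMultiplier, confidence float64, locations int) float64 {
	locationBonus := math.Min(0.5, 0.1*float64(locations-1))
	return weight * profileMultiplier * (0.6 + 0.4*confidence) * (1.0 + locationBonus)
}

// categoryPoints sums the points of one category and applies its cap.
func categoryPoints(points []float64, limit float64) float64 {
	sum := 0.0
	for _, p := range points {
		sum += p
	}
	return math.Min(sum, limit)
}

func main() {
	// One exact-clone finding (weight 9), balanced profile (1.0×),
	// full confidence, two locations: 9 × 1.0 × 1.0 × 1.1.
	one := findingPoints(9, 1.0, 1.0, 2)
	fmt.Printf("single finding: %.1f\n", one)

	// Three such findings would sum past the exact-clone cap of 18.
	fmt.Printf("category total: %.1f\n", categoryPoints([]float64{one, one, one}, 18))
}
```

The location bonus saturates at six locations, so a clone repeated in twenty files counts the same per finding as one repeated in six.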
Smell metrics (up to 30 points):
- wrapper-function-density (45% weight) — single-statement wrapper functions as a ratio of total functions.
- trivial-declaration-density (35% weight) — declarations with ≤6 nodes and no control flow.
- reused-fragment-names (20% weight) — function/class/variable names appearing in 3+ distinct files.
Score bands:
| Range | Band |
| -----: | -------- |
| 0–14 | Minimal |
| 15–34 | Low |
| 35–59 | Moderate |
| 60–79 | High |
| 80–100 | Severe |
Architecture
Three artifact engines run in parallel and are evaluated through a rule catalog:
- Clone Detection (in-process) — tree-sitter parsing → pre-normalization (cached) → alpha-renaming → hash/suffix-tree/PQ-gram comparison
- Semantic Analysis (subprocess) — external linter diagnostics
- Best Practices (in-process) — deterministic clean-code checks
Each rule is a first-class catalog entry with defaults and per-rule controls.
Built on Semgrep's model: tree-sitter CST → language-specific normalizer → generic normalized tree → language-agnostic analysis.
Language Support
- TypeScript/TSX — built-in (ships with binary)
- JavaScript/JSX — built-in (ships with binary)
- More languages via YAML rules and external linter integration
Community Plugin SDK
Community plugin authors should use:
- Go package: github.com/unslop/unslop/pkg/sdk
The SDK exposes stable plugin interfaces/normalized types/registry helpers and
is licensed under Apache-2.0 so third parties can build
custom/community plugins.
Note: the stock unslop binary only includes built-in plugins. External
plugins are loaded when they are linked into a custom binary.
We recommend keeping community plugins in separate repositories.
SDK stability policy: pkg/sdk/STABILITY.md.
Configuration
Create .unslop.yaml in your project root. The v2 config format requires version: 2:
version: 2
analysis:
  ignore:
    - "vendor/"
    - "node_modules/"
  extensions: [".ts", ".tsx", ".go"]
  languages: ["typescript", "go"]
  changed_only: false
engines:
  clone:
    min_tokens: 50
    similarity_threshold: 0.8
    max_suffix_pairs: 10000
    max_pqgram_pairs: 50000
  linters:
    oxlint:
      enabled: true
      command: oxlint
      args: ["--format", "json"]
      timeout: "60s"
  practices:
    enabled: true
    profile: balanced
    ignore_tests: true
    max_findings_per_rule: 200
rules:
  defaults:
    severity: warning
    gateable: false
  exact-clone:
    severity: error
    gateable: true
  complexity-budget:
    paths:
      include: ["apps/**"]
      exclude: ["scripts/**"]
gates:
  max_score: 35
  fail_on_rules: ["exact-clone"]
  fail_on_tiers: ["A"]
slop_score:
  profile: balanced
  top_contributors: 5
Development
make test # Run all tests with race detector
make bench # Run benchmarks
make lint # Run golangci-lint
make build # Build binary to bin/unslop
# Full npm release (version sync + tests + all-platform build + publish)
make release-npm VERSION=0.1.3
# Same flow without publishing (sanity check)
make release-npm VERSION=0.1.3 RELEASE_ARGS="--dry-run"
License
- Repository and SDK: Apache-2.0
- See LICENSE and NOTICE
Contribution terms and CLA:
- CONTRIBUTING.md
- docs/legal/CLA.md
- docs/legal/LICENSING.md
