unslop

v0.1.7

Published

18 days ago

Detect duplicated code, dead code, and anti-patterns in AI-generated codebases


code-quality linter clone-detection ai-code static-analysis

unslop

Standalone CLI tool for detecting duplicated code, dead code, inlined utilities, and semantic anti-patterns in AI-generated codebases. No AI/LLM in the detection pipeline — deterministic analysis only.

Install

```shell
go install github.com/unslop/unslop/cmd/unslop@latest
```

Or build from source:

```shell
make build
```

Requires Go 1.22+ and CGO (for tree-sitter).

Usage

```shell
# Scan current directory
unslop .

# Run directly with npx (no go install)
npx unslop .

# Verbose text output (full findings)
unslop --verbose .

# Changed-only review (uncommitted files vs existing code)
unslop --changed-only .

# Scan specific paths
unslop ./src ./lib

# JSON output
unslop --format json .

# SARIF output (for CI integration)
unslop --format sarif .

# List built-in rules and defaults
unslop --list-rules

# With config file
unslop --config .unslop.yaml .
```

The Slop Score appears in terminal text output (`--format text`, the default). JSON and SARIF outputs remain unchanged in v1. Default text output is compact: a priority summary plus the Slop Score. Use `--verbose` for the full finding list, or `--changed-only` to focus on uncommitted files and prioritize reuse against existing code.

What It Detects

Quick Reference

| Category | Engine | Reliability |
| -------------------------------------- | --------- | :---------: |
| Identical constants across files | Clone | 99% |
| Copy-pasted functions (same names) | Clone | 99% |
| Copy-pasted functions (renamed params) | Clone | 95% |
| Reformatted JSX components | Clone | 90% |
| Similar functions (small edits) | Clone | 80% |
| Equivalent regex patterns | Clone | 95% |
| Cross-package export matches | Clone | 99% |
| `a > b ? a : b` → `Math.max` | Oxlint | 99% |
| Dead branches | Oxlint | 90% |
| Unreachable code | Oxlint | 99% |
| Inlined utilities | Oxlint | 95% |
| Dead exports | Clone | 95% |
| Complexity budget breaches | Practices | 90% |

Rules Reference

9 rules across 3 engines. Run `unslop --list-rules` to see the defaults for your config.

Engine 1: Clone Detection (3 rules)

Tree-sitter parses source into a CST, then a language plugin normalizes it to a language-agnostic tree with alpha-renamed identifiers (a, b, c instead of real names). All three rules operate on these normalized trees.
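To make the normalization step concrete, here is a minimal Go sketch of alpha-renaming. The `Node` type, its `Kind`/`Label` fields, the `"ident"` kind, and the single-scope renaming are assumptions for illustration, not unslop's actual normalized types or scoping rules. Each distinct identifier receives the next placeholder in pre-order, so repeated uses of one name map to the same placeholder.

```go
package main

import "fmt"

// Node is a hypothetical normalized-tree node for illustration;
// it is not the SDK's actual type.
type Node struct {
	Kind     string // e.g. "func", "ident", "return"
	Label    string // identifier or literal text; empty for structural nodes
	Children []*Node
}

// placeholder returns the i-th alpha name: a, b, ..., z, a1, b1, ...
func placeholder(i int) string {
	name := string(rune('a' + i%26))
	if i >= 26 {
		name += fmt.Sprint(i / 26)
	}
	return name
}

// AlphaRename replaces identifier labels with placeholders in first-occurrence
// (pre-order) order, so fragments that differ only in naming become identical.
// ASSUMPTION: one renaming scope per fragment.
func AlphaRename(root *Node) {
	names := map[string]string{}
	var walk func(n *Node)
	walk = func(n *Node) {
		if n.Kind == "ident" {
			p, ok := names[n.Label]
			if !ok {
				p = placeholder(len(names))
				names[n.Label] = p
			}
			n.Label = p
		}
		for _, c := range n.Children {
			walk(c)
		}
	}
	walk(root)
}

func main() {
	// pick(x, y) { return x }: the parameter x is reused in the body
	fn := &Node{Kind: "func", Children: []*Node{
		{Kind: "ident", Label: "pick"},
		{Kind: "ident", Label: "x"},
		{Kind: "ident", Label: "y"},
		{Kind: "return", Children: []*Node{{Kind: "ident", Label: "x"}}},
	}}
	AlphaRename(fn)
	fmt.Println(fn.Children[0].Label, fn.Children[1].Label, fn.Children[2].Label,
		fn.Children[3].Children[0].Label) // a b c b
}
```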

`exact-clone` — Tier A, Error

SHA-256 fingerprint of the entire normalized tree. Two fragments with identical hashes are exact clones.

Algorithm: deterministic S-expression serialization (Kind:Label child1 child2 ...) → SHA-256. O(n) grouping by hash.
Minimum size: 50 tokens, 8 nodes.
Filters (excluded from results): same-file duplicates, import-linked pairs, rule boilerplate scaffolding (spans under 30 lines in rule scaffold paths), and repeated framework idioms (3+ occurrences with diverse names across 2+ files).
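A minimal sketch of the exact-clone pipeline under assumed types: serialize each normalized tree to a `(Kind:Label child ...)` S-expression, hash it with SHA-256, and group fragments by digest in a single pass. The `Node` type and the fragment IDs are hypothetical, and the size minimums and filters listed above are omitted. A production serializer would also need to escape labels that contain spaces or parentheses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// Node is a hypothetical alpha-renamed normalized-tree node.
type Node struct {
	Kind, Label string
	Children    []*Node
}

// serialize writes a deterministic S-expression: (Kind:Label child1 child2 ...).
func serialize(n *Node, b *strings.Builder) {
	b.WriteString("(" + n.Kind + ":" + n.Label)
	for _, c := range n.Children {
		b.WriteString(" ")
		serialize(c, b)
	}
	b.WriteString(")")
}

// Fingerprint returns the hex-encoded SHA-256 of the serialized tree.
func Fingerprint(n *Node) string {
	var b strings.Builder
	serialize(n, &b)
	sum := sha256.Sum256([]byte(b.String()))
	return hex.EncodeToString(sum[:])
}

// GroupExact buckets fragment IDs by fingerprint in one pass over the
// fragments and keeps only groups with 2+ members (exact-clone candidates).
func GroupExact(frags map[string]*Node) map[string][]string {
	groups := map[string][]string{}
	for id, n := range frags {
		h := Fingerprint(n)
		groups[h] = append(groups[h], id)
	}
	for h, ids := range groups {
		if len(ids) < 2 {
			delete(groups, h)
			continue
		}
		sort.Strings(ids) // map iteration order is random; sort for stable output
	}
	return groups
}

func main() {
	leaf := func(k, l string) *Node { return &Node{Kind: k, Label: l} }
	f1 := &Node{Kind: "func", Children: []*Node{leaf("ident", "a"), leaf("return", "")}}
	f2 := &Node{Kind: "func", Children: []*Node{leaf("ident", "a"), leaf("return", "")}}
	f3 := &Node{Kind: "func", Children: []*Node{leaf("ident", "b")}}
	for _, ids := range GroupExact(map[string]*Node{"x.ts#f1": f1, "y.ts#f2": f2, "z.ts#f3": f3}) {
		fmt.Println(ids)
	}
}
```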

`near-clone` — Tier A, Warning

Finds near-miss duplicates via suffix tree / LCS analysis on linearized token sequences.

Linearization: pre-order traversal of normalized tree → flat token sequence with ^ sentinel tokens marking end-of-children.
Bucketing: logarithmic buckets by token count (20–39 → bucket 0, 40–79 → bucket 1, and so on) avoid O(n²) all-pairs comparison. Each sequence is also placed in the next bucket up (+1) so matches across a bucket boundary are still found.
Algorithm: rolling DP longest common substring (two rows, O(n) space), capped at 220,000 DP cells per pair. Falls back to bounded longest common subsequence for edited clones with insertions/deletions.
Threshold: 80% similarity (matchLen / max(len(A), len(B))).
Limits: max 10,000 comparisons, min 50 tokens.
Post-processing: deterministic one-to-one matching per file pair. Candidates are sorted by similarity (descending), same-name pairs are preferred, and assignment is greedy, so each fragment is matched at most once per file pair. Declarative clones (different names, and either no control flow or ≤3 statements) are excluded.
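The near-clone steps can be sketched in Go as follows. This is an illustration under assumptions, not unslop's implementation: `Node` is hypothetical, the `^` sentinel is emitted only after nodes that have children (one reading of "end-of-children"), and the fallback to bounded longest common subsequence for edited clones is omitted.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Node is a hypothetical alpha-renamed normalized-tree node.
type Node struct {
	Kind, Label string
	Children    []*Node
}

// Linearize emits one token per node in pre-order and a "^" sentinel after
// the children of every interior node, so nesting survives flattening.
func Linearize(n *Node, out []string) []string {
	out = append(out, n.Kind+":"+n.Label)
	if len(n.Children) == 0 {
		return out
	}
	for _, c := range n.Children {
		out = Linearize(c, out)
	}
	return append(out, "^")
}

// Bucket maps a token count to a log2 bucket: 20-39 → 0, 40-79 → 1,
// 80-159 → 2, and so on. Counts below 20 return -1.
func Bucket(tokens int) int {
	if tokens < 20 {
		return -1
	}
	return bits.Len(uint(tokens/20)) - 1
}

// longestCommonRun finds the longest common substring of a and b with a
// rolling two-row DP (O(len(b)) space). It gives up (ok=false) after maxCells cells.
func longestCommonRun(a, b []string, maxCells int) (best int, ok bool) {
	prev := make([]int, len(b)+1)
	cur := make([]int, len(b)+1)
	cells := 0
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			cells++
			if cells > maxCells {
				return best, false
			}
			if a[i-1] == b[j-1] {
				cur[j] = prev[j-1] + 1
				if cur[j] > best {
					best = cur[j]
				}
			} else {
				cur[j] = 0
			}
		}
		prev, cur = cur, prev
	}
	return best, true
}

// Similarity returns matchLen / max(len(a), len(b)).
func Similarity(a, b []string, maxCells int) (float64, bool) {
	m, ok := longestCommonRun(a, b, maxCells)
	longest := len(a)
	if len(b) > longest {
		longest = len(b)
	}
	if !ok || longest == 0 {
		return 0, ok
	}
	return float64(m) / float64(longest), true
}

func main() {
	a := []string{"func:", "ident:a", "return:", "ident:b", "^", "^"}
	b := []string{"func:", "ident:a", "return:", "ident:a", "^", "^"}
	s, _ := Similarity(a, b, 220_000)
	fmt.Printf("similarity %.2f, report: %v\n", s, s >= 0.8)
	fmt.Println(Bucket(45), Bucket(120))
}
```

In the example, a single differing token splits the longest common run in half, which is the kind of edited clone the subsequence fallback exists to catch.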

`structural-similarity` — Tier A, Warning

PQ-gram tree profiling for structural similarity between same-kind fragments.

Algorithm: PQ-gram with p=2 ancestors, q=3 siblings. Each gram = stem of p ancestor labels + base of q sibling labels. Similarity via Sørensen–Dice coefficient: 2 * |intersection| / (|A| + |B|).
Eligibility: same fragment Kind only (function-to-function, class-to-class), different files, min 8 nodes, size ratio ≤ 1.8×.
Thresholds (variable by file type):
- Both production files: 70%
- One test + one production: 86%
- Both test files: 90%
Limits: max 50,000 comparisons. Profiles cached per fragment.
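A sketch of pq-gram profiling with Sørensen–Dice similarity, following the standard pq-gram construction: the tree is padded with `*` dummy nodes (p-1 ancestors above the root, q-1 dummy siblings on each side of every child list, and q dummy children under each leaf), and each gram pairs a stem of p labels with a base of q child labels. The `Node` type is hypothetical and its labels would typically be node kinds; the eligibility checks and comparison limit are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical normalized-tree node; Label would typically be the node kind.
type Node struct {
	Label    string
	Children []*Node
}

const dummy = "*" // padding label for the extended tree

// Profile returns the pq-gram multiset of a tree (gram key → count).
func Profile(root *Node, p, q int) map[string]int {
	bag := map[string]int{}
	anc := make([]string, p-1)
	for i := range anc {
		anc[i] = dummy
	}
	collect(root, anc, q, bag)
	return bag
}

func collect(n *Node, anc []string, q int, bag map[string]int) {
	// stem = p-1 ancestor labels followed by the anchor's own label
	stem := append(append([]string(nil), anc...), n.Label)
	base := make([]string, q)
	for i := range base {
		base[i] = dummy
	}
	emit := func() { bag[strings.Join(stem, "/")+"|"+strings.Join(base, ",")]++ }
	if len(n.Children) == 0 {
		emit() // a leaf gets q dummy children: exactly one gram
		return
	}
	// slide a width-q window over the children, padded by q-1 dummies per side
	for _, c := range n.Children {
		base = append(base[1:], c.Label)
		emit()
		collect(c, stem[1:], q, bag)
	}
	for i := 1; i < q; i++ {
		base = append(base[1:], dummy)
		emit()
	}
}

// Dice computes 2·|A∩B| / (|A|+|B|) over multisets of grams.
func Dice(a, b map[string]int) float64 {
	inter, total := 0, 0
	for k, ca := range a {
		total += ca
		if cb := b[k]; cb < ca {
			inter += cb
		} else {
			inter += ca
		}
	}
	for _, cb := range b {
		total += cb
	}
	if total == 0 {
		return 0
	}
	return 2 * float64(inter) / float64(total)
}

// Threshold returns the similarity cutoff by file type (from the list above).
func Threshold(aIsTest, bIsTest bool) float64 {
	switch {
	case aIsTest && bIsTest:
		return 0.90
	case aIsTest || bIsTest:
		return 0.86
	default:
		return 0.70
	}
}

func main() {
	leaf := func(l string) *Node { return &Node{Label: l} }
	t1 := &Node{Label: "a", Children: []*Node{leaf("b"), leaf("c")}}
	t2 := &Node{Label: "a", Children: []*Node{leaf("b"), leaf("d")}}
	d := Dice(Profile(t1, 2, 3), Profile(t2, 2, 3))
	fmt.Printf("dice %.3f, similar: %v\n", d, d >= Threshold(false, false))
}
```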

Engine 2: Linter Integration (5 rules)

Wraps external linters (e.g. oxlint) run as subprocesses with configurable timeouts. Parses JSON output into findings. Non-fatal — a missing linter produces a warning, not an error.

`inlined-utility` — Tier A, Warning

Hand-rolled code that could use a standard library or utility call (e.g. `a > b ? a : b` → `Math.max`).

`dead-code` — Tier A, Error, Gateable

Unreachable code blocks that can never execute.

`dead-branch` — Tier A, Warning

Conditional branches (if/else, ternary) that can never be taken.

`duplication` — Tier A, Warning

Duplicate imports, exports, or repeated patterns detected by the linter.

`dead-export` — Tier A, Warning

Exported symbols that nothing imports.

Engine 3: Best Practices (1 rule)

In-process deterministic checks — pattern matching and metric computation. No external dependencies.

`complexity-budget` — Tier A, Warning

Flags functions that exceed thresholds on 2+ of 3 metrics simultaneously.

| Metric | Strict | Balanced (default) | Lenient |
| -------------------------------------- | -----: | -----------------: | ------: |
| Decision count (if/loop/switch/select) | ≥10 | ≥12 | ≥14 |
| Max nesting depth | ≥3 | ≥4 | ≥5 |
| Source lines | ≥75 | ≥90 | ≥110 |

Functions under 20 lines are always excluded.
Headroom for special paths: tooling/infrastructure paths get +20 decisions, +3 nesting, +80 lines. Script functions (/scripts/, .mjs/.cjs, main/walk) get +14/+2/+40. Orchestration functions (/cmd/, /routes/, /handlers/, names starting with handle/route/serve) get +2/+1/+25.
Also fires on single-metric extreme outliers: decisions ≥ threshold+8, nesting ≥ threshold+2, or lines ≥ threshold+40 (when some complexity is also present).
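The budget check can be sketched in Go using the profile thresholds and headroom values listed above. Two interpretation choices are assumptions rather than documented behavior: headroom is added before the outlier offsets are applied, and "some complexity is also present" for the line-count outlier is modeled as at least one decision point.

```go
package main

import "fmt"

// Metrics holds per-function measurements or thresholds.
type Metrics struct{ Decisions, Nesting, Lines int }

// Profile thresholds from the table above.
var profiles = map[string]Metrics{
	"strict":   {Decisions: 10, Nesting: 3, Lines: 75},
	"balanced": {Decisions: 12, Nesting: 4, Lines: 90},
	"lenient":  {Decisions: 14, Nesting: 5, Lines: 110},
}

// Headroom for special paths, from the text above.
var (
	toolingHeadroom       = Metrics{Decisions: 20, Nesting: 3, Lines: 80}
	scriptHeadroom        = Metrics{Decisions: 14, Nesting: 2, Lines: 40}
	orchestrationHeadroom = Metrics{Decisions: 2, Nesting: 1, Lines: 25}
)

// ExceedsBudget reports whether a function breaches the complexity budget.
func ExceedsBudget(m Metrics, profile string, headroom Metrics) bool {
	if m.Lines < 20 {
		return false // functions under 20 lines are always excluded
	}
	base := profiles[profile]
	t := Metrics{base.Decisions + headroom.Decisions, base.Nesting + headroom.Nesting, base.Lines + headroom.Lines}
	hits := 0
	if m.Decisions >= t.Decisions {
		hits++
	}
	if m.Nesting >= t.Nesting {
		hits++
	}
	if m.Lines >= t.Lines {
		hits++
	}
	if hits >= 2 {
		return true // 2+ of 3 metrics over threshold
	}
	// Single-metric extreme outliers.
	if m.Decisions >= t.Decisions+8 || m.Nesting >= t.Nesting+2 {
		return true
	}
	// ASSUMPTION: "some complexity is also present" = at least one decision point.
	return m.Lines >= t.Lines+40 && m.Decisions > 0
}

func main() {
	fmt.Println(ExceedsBudget(Metrics{13, 5, 60}, "balanced", Metrics{}))      // 2 of 3 over
	fmt.Println(ExceedsBudget(Metrics{13, 5, 60}, "balanced", toolingHeadroom)) // within tooling headroom
	fmt.Println(ExceedsBudget(Metrics{13, 5, 60}, "balanced", scriptHeadroom))
}
```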

Slop Score

A 0–100 weighted composite of findings and structural smell metrics.

Category weights (base, before profile multiplier):

| Category | Weight | Cap |
| --------------------- | -----: | --: |
| exact-clone | 9 | 18 |
| near-clone | 6 | 18 |
| structural-similarity | 5 | 18 |
| dead-code | 5 | 18 |
| complexity-budget | 5 | 14 |
| inlined-utility | 4 | 18 |
| duplication | 4 | 18 |
| dead-export | 4 | 18 |
| dead-branch | 3 | 18 |

Profile multipliers: strict = 1.2×, balanced = 1.0×, lenient = 0.8×.

Per-finding formula: `points = weight × profileMultiplier × (0.6 + 0.4 × confidence) × (1.0 + min(0.5, 0.1 × (locations - 1)))`. Points are summed per category and each sum is clipped to that category's cap; the total finding points are then capped at 70.

Smell metrics (up to 30 points):

- wrapper-function-density (45% weight): single-statement wrapper functions as a ratio of total functions.
- trivial-declaration-density (35% weight): declarations with ≤6 nodes and no control flow.
- reused-fragment-names (20% weight): function/class/variable names appearing in 3+ distinct files.

Score bands:

| Range | Band |
| -----: | -------- |
| 0–14 | Minimal |
| 15–34 | Low |
| 35–59 | Moderate |
| 60–79 | High |
| 80–100 | Severe |
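The smell half and the band lookup, as a sketch. The README gives only the 30-point budget and the 45/35/20 weights, so the mapping below is an assumption: each metric is treated as a ratio in [0, 1], the weighted sum is scaled to 30 points, and the final score is rounded to the nearest integer. The band boundaries follow the table above.

```go
package main

import (
	"fmt"
	"math"
)

// clamp01 keeps a ratio inside [0, 1].
func clamp01(x float64) float64 { return math.Max(0, math.Min(1, x)) }

// SmellPoints ASSUMES each metric is a ratio in [0, 1] and that the 30-point
// budget is split by the documented weights (45% / 35% / 20%).
func SmellPoints(wrapperDensity, trivialDensity, reusedNameRatio float64) float64 {
	return 30 * (0.45*clamp01(wrapperDensity) + 0.35*clamp01(trivialDensity) + 0.20*clamp01(reusedNameRatio))
}

// Score combines finding points (capped at 70) with smell points (capped at 30).
// ASSUMPTION: the result is rounded to the nearest integer.
func Score(findingPoints, smell float64) int {
	return int(math.Round(math.Min(70, findingPoints) + math.Min(30, smell)))
}

// Band maps a 0-100 score to its label.
func Band(score int) string {
	switch {
	case score >= 80:
		return "Severe"
	case score >= 60:
		return "High"
	case score >= 35:
		return "Moderate"
	case score >= 15:
		return "Low"
	default:
		return "Minimal"
	}
}

func main() {
	s := Score(18, SmellPoints(0.2, 0.3, 0.1))
	fmt.Println(s, Band(s))
}
```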

Architecture

Three artifact engines run in parallel and are evaluated through a rule catalog:

- Clone Detection (in-process): tree-sitter parsing → pre-normalization (cached) → alpha-renaming → hash/suffix-tree/PQ-gram comparison
- Semantic Analysis (subprocess): external linter diagnostics
- Best Practices (in-process): deterministic clean-code checks
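A sketch of running the engines in parallel with `sync.WaitGroup` and merging their findings in a deterministic order. The `Engine` interface, `Finding` type, and `stubEngine` are hypothetical stand-ins, not unslop's internal API; rule-catalog evaluation (per-rule severity, paths, gating) would then run over the merged list.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Finding is a hypothetical minimal finding record.
type Finding struct{ Rule, File string }

// Engine is a HYPOTHETICAL interface for illustration only.
type Engine interface {
	Name() string
	Run(files []string) []Finding
}

// stubEngine reports one finding per file for a fixed rule.
type stubEngine struct{ name, rule string }

func (s stubEngine) Name() string { return s.name }

func (s stubEngine) Run(files []string) []Finding {
	out := make([]Finding, 0, len(files))
	for _, f := range files {
		out = append(out, Finding{Rule: s.rule, File: f})
	}
	return out
}

// RunAll starts each engine in its own goroutine and merges results once all finish.
func RunAll(engines []Engine, files []string) []Finding {
	results := make([][]Finding, len(engines)) // one slot per engine: no shared writes
	var wg sync.WaitGroup
	for i, e := range engines {
		wg.Add(1)
		go func(i int, e Engine) {
			defer wg.Done()
			results[i] = e.Run(files)
		}(i, e)
	}
	wg.Wait()
	var all []Finding
	for _, r := range results {
		all = append(all, r...)
	}
	// Sort so output does not depend on goroutine scheduling.
	sort.Slice(all, func(a, b int) bool {
		if all[a].Rule != all[b].Rule {
			return all[a].Rule < all[b].Rule
		}
		return all[a].File < all[b].File
	})
	return all
}

func main() {
	engines := []Engine{
		stubEngine{name: "clone", rule: "exact-clone"},
		stubEngine{name: "linters", rule: "dead-code"},
		stubEngine{name: "practices", rule: "complexity-budget"},
	}
	for _, f := range RunAll(engines, []string{"src/b.ts", "src/a.ts"}) {
		fmt.Println(f.Rule, f.File)
	}
}
```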

Each rule is a first-class catalog entry with defaults and per-rule controls.

Built on Semgrep's model: tree-sitter CST → language-specific normalizer → generic normalized tree → language-agnostic analysis.

Language Support

- TypeScript/TSX: built-in (ships with binary)
- JavaScript/JSX: built-in (ships with binary)
- More languages via YAML rules and external linter integration

Community Plugin SDK

Community plugin authors should use:

Go package: github.com/unslop/unslop/pkg/sdk

The SDK exposes stable plugin interfaces, normalized types, and registry helpers. It is licensed under Apache-2.0 so third parties can build custom and community plugins.

Note: the stock unslop binary includes only the built-in plugins. External plugins are loaded only when they are linked into a custom binary. Community plugins should live in separate repositories. SDK stability policy: pkg/sdk/STABILITY.md.

Configuration

Create `.unslop.yaml` in your project root. Config format v2 requires `version: 2`:

```yaml
version: 2

analysis:
  ignore:
    - "vendor/"
    - "node_modules/"
  extensions: [".ts", ".tsx", ".go"]
  languages: ["typescript", "go"]
  changed_only: false

engines:
  clone:
    min_tokens: 50
    similarity_threshold: 0.8
    max_suffix_pairs: 10000
    max_pqgram_pairs: 50000
  linters:
    oxlint:
      enabled: true
      command: oxlint
      args: ["--format", "json"]
      timeout: "60s"
  practices:
    enabled: true
    profile: balanced
    ignore_tests: true
    max_findings_per_rule: 200

rules:
  defaults:
    severity: warning
    gateable: false
  exact-clone:
    severity: error
    gateable: true
  complexity-budget:
    paths:
      include: ["apps/**"]
      exclude: ["scripts/**"]

gates:
  max_score: 35
  fail_on_rules: ["exact-clone"]
  fail_on_tiers: ["A"]

slop_score:
  profile: balanced
  top_contributors: 5
```

Development

```shell
make test      # Run all tests with race detector
make bench     # Run benchmarks
make lint      # Run golangci-lint
make build     # Build binary to bin/unslop

# Full npm release (version sync + tests + all-platform build + publish)
make release-npm VERSION=0.1.3

# Same flow without publishing (sanity check)
make release-npm VERSION=0.1.3 RELEASE_ARGS="--dry-run"
```

License

Repository and SDK: Apache-2.0
See LICENSE and NOTICE

Contribution terms and CLA:

CONTRIBUTING.md
docs/legal/CLA.md
docs/legal/LICENSING.md
