duphunt

v0.1.0

Published

a day ago

Find duplicate files by content (SHA-256) — zero install, cross-platform. npx duphunt . — no brew/apt. Zero dependencies.

yyfjj

duplicate duplicates dedupe finder hash sha256 cli filesystem cleanup disk

Find duplicate files by content, anywhere, with nothing to install. The established duplicate finders (fdupes, jdupes, rdfind, fclones) are native binaries that you first have to install with brew, apt, or cargo, which is not always possible on a locked-down box, a colleague's laptop, a CI runner, or a container. duphunt runs wherever Node or Python is already available: use npx duphunt . for the Node build, or pip install duphunt for the Python build. Zero dependencies, no network.

$ npx duphunt ~/Downloads

2 duplicate group(s), 5 files, 8.1 MB reclaimable

  4.1 MB × 2   4.1 MB reclaimable
    /Users/me/Downloads/invoice.pdf
    /Users/me/Downloads/invoice (1).pdf

  2.0 MB × 3   4.0 MB reclaimable
    /Users/me/Downloads/clip.mp4
    /Users/me/Downloads/clip-copy.mp4
    /Users/me/Downloads/old/clip.mp4

Groups are sorted biggest-waste-first, so the files worth deleting are at the top.

How it works

1. Group by size. Two files of different sizes can't be identical, so files with a unique size are never even read.
2. Hash the collisions. Within each size group, each file is SHA-256 hashed (streamed in 64 KB chunks, so multi-GB files don't blow up memory).
3. Report identical content. Files with the same hash are true byte-for-byte duplicates, grouped and ranked by reclaimable space.
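
The three steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not duphunt's actual source: the names `sha256_of` and `find_duplicates` are hypothetical, and the group fields simply mirror the `--json` output documented later in this README.

```python
import hashlib
import os
from collections import defaultdict

CHUNK_SIZE = 64 * 1024  # 64 KB, the chunk size described above


def sha256_of(path):
    """Hash a file in fixed-size chunks so memory use stays flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths, min_size=1):
    """Return duplicate groups sorted by reclaimable bytes, largest first.

    Hypothetical helper for illustration only.
    """
    # Step 1: group by size; a file with a unique size is never opened.
    by_size = defaultdict(list)
    for path in paths:
        size = os.path.getsize(path)
        if size >= min_size:
            by_size[size].append(path)

    groups = []
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue
        # Step 2: hash only the files whose size collides.
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[sha256_of(path)].append(path)
        # Step 3: files sharing a hash are reported as one group.
        for digest, dupes in by_hash.items():
            if len(dupes) > 1:
                groups.append({
                    "hash": digest,
                    "size": size,
                    "count": len(dupes),
                    "wasted": size * (len(dupes) - 1),
                    "paths": sorted(dupes),
                })

    groups.sort(key=lambda group: group["wasted"], reverse=True)
    return groups
```

Grouping by size first means the expensive step (reading and hashing file contents) only runs on files that could possibly match.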

It reports — it never deletes. You decide what to remove.

Usage

duphunt                         # scan the current directory
duphunt ~/Downloads ~/Desktop   # scan several roots at once
duphunt a.jpg b.jpg c.jpg       # or just compare specific files
duphunt . --json                # machine-readable
duphunt . --min-size 1048576    # ignore files under 1 MB
duphunt . --exit-code           # exit 1 if any duplicates exist (CI gate)

Options

| Flag | Effect |
|------|--------|
| --json | Emit { groups, summary } as JSON (raw byte sizes, full paths) |
| --quiet | Print only the one-line summary |
| --min-size <n> | Ignore files smaller than n bytes (default 1, which skips empty files) |
| --follow | Follow symlinks (default: skip them, to avoid loops and double-counting) |
| --exit-code | Exit 1 when duplicates are found (for CI gates) |
| -v, --version | Print version |
| -h, --help | Show help |

Notes

- Empty files are skipped by default (they all hash alike and are rarely what you mean); pass --min-size 0 to include them.
- Symlinks are skipped unless --follow, so a symlinked tree won't be double-counted or loop forever.
- Each physical file is counted once. Repeated or overlapping roots and symlink aliases (even under --follow) are de-duplicated by real path, so they never inflate the results, while genuine hard links still surface.
- Same tool, two builds. The Node and Python builds hash with SHA-256 and produce identical results, so use whichever your environment already has.
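
The symlink and real-path behavior described in these notes could be implemented roughly as follows. This is a sketch under assumptions, not duphunt's own code: `collect_files` is a hypothetical name, and resolving paths with `os.path.realpath` is one plausible way to get "counted once by real path".

```python
import os


def collect_files(roots, follow=False):
    """Return each physical file once, identified by its resolved path.

    Hypothetical helper for illustration only.
    """
    seen_files = set()
    seen_dirs = set()
    stack = list(roots)
    while stack:
        path = stack.pop()
        if os.path.islink(path) and not follow:
            continue  # default: symlinks are skipped entirely
        real = os.path.realpath(path)
        if os.path.isdir(real):
            if real in seen_dirs:
                continue  # repeated root, overlapping root, or symlink loop
            seen_dirs.add(real)
            for name in os.listdir(real):
                stack.append(os.path.join(real, name))
        elif os.path.isfile(real):
            # A symlink alias resolves to the same real path, so it is not
            # added twice. Hard links have distinct paths and are both kept.
            seen_files.add(real)
    return sorted(seen_files)
```

Tracking resolved directories as well as files is what keeps a symlink that points back at an ancestor from looping forever under `follow=True`.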

`--json` shape

{
  "groups": [
    { "hash": "9f86d0…", "size": 4300000, "count": 2, "wasted": 4300000,
      "paths": ["/a/invoice.pdf", "/b/invoice (1).pdf"] }
  ],
  "summary": { "groups": 1, "files": 2, "wasted": 4300000 }
}
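
A script can consume this shape directly. The sketch below is illustrative: `removal_candidates` is a hypothetical helper, and keeping the first path of each group is an arbitrary policy chosen for the example, not something duphunt prescribes. In the sample above, `wasted` equals `size × (count − 1)`, that is, every copy except one.

```python
import json


def removal_candidates(report_text):
    """List every path except the first in each duplicate group.

    Hypothetical helper; which copy to keep is your decision.
    """
    report = json.loads(report_text)
    candidates = []
    for group in report["groups"]:
        keep, *extras = group["paths"]  # arbitrary choice: keep the first
        candidates.extend(extras)
    return candidates, report["summary"]["wasted"]
```

One way to feed it is to save the output of `duphunt . --json` to a file and pass the file's contents as `report_text`. The function only lists candidates, in keeping with the tool's report-only behavior.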

Exit codes

| Code | Meaning |
|------|---------|
| 0 | success (the default, even when duplicates are found) |
| 1 | duplicates found and --exit-code was passed |
| 2 | error (bad option, missing path) |

By default duphunt is a viewer and exits 0; add --exit-code to gate a pipeline on it.
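
As an illustration of gating a pipeline, the sketch below wraps the CLI from Python and maps the exit codes in the table to labels. It assumes `duphunt` is on the PATH (for example after `pip install duphunt`); `describe_exit` and `run_gate` are hypothetical names introduced for this example.

```python
import subprocess

# Meanings taken from the exit-code table, for a run with --exit-code.
EXIT_MEANINGS = {0: "no duplicates", 1: "duplicates found", 2: "error"}


def describe_exit(code):
    """Translate a duphunt exit code into a short label."""
    return EXIT_MEANINGS.get(code, "unknown")


def run_gate(root="."):
    """Run duphunt with --quiet --exit-code and describe the outcome.

    Assumes the duphunt executable is installed and on the PATH.
    """
    completed = subprocess.run(
        ["duphunt", root, "--quiet", "--exit-code"],
        capture_output=True,
        text=True,
    )
    return describe_exit(completed.returncode), completed.stdout.strip()
```

A CI step could call `run_gate(".")` and fail the job on anything other than `"no duplicates"`, treating `"error"` separately from `"duplicates found"`.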

License

MIT
