duphunt
v0.1.0
Find duplicate files by content (SHA-256) — zero install, cross-platform. npx duphunt . — no brew/apt. Zero dependencies.
duphunt
Find duplicate files by content — anywhere, with nothing to install. The
great duplicate finders (fdupes, jdupes, rdfind, fclones) are native
binaries you have to brew/apt/cargo install first — which you can't always
do on a locked-down box, a colleague's laptop, a CI runner, or a container.
duphunt runs the moment you have Node or Python: npx duphunt . or
pip install duphunt. Zero dependencies, no network.
$ npx duphunt ~/Downloads
2 duplicate group(s), 5 files, 8.1 MB reclaimable
4.1 MB × 2 4.1 MB reclaimable
/Users/me/Downloads/invoice.pdf
/Users/me/Downloads/invoice (1).pdf
2.0 MB × 3 4.0 MB reclaimable
/Users/me/Downloads/clip.mp4
/Users/me/Downloads/clip-copy.mp4
/Users/me/Downloads/old/clip.mp4

Groups are sorted biggest-waste-first, so the files worth deleting are at the top.
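The "reclaimable" figures in the sample output are consistent with keeping one copy per group and counting the rest as waste. A minimal sketch of that arithmetic and the ranking follows. The byte sizes and the names `groups` and `reclaimable` are illustrative assumptions, not duphunt internals.

```python
# Hypothetical group records: file size in bytes and number of identical copies.
groups = [
    {"size": 2_000_000, "count": 3},  # e.g. the three clip.mp4 copies
    {"size": 4_100_000, "count": 2},  # e.g. the two invoice PDFs
]


def reclaimable(group):
    # Keeping one copy frees the space used by all the other copies.
    return group["size"] * (group["count"] - 1)


# Biggest waste first, matching the ordering described above.
ranked = sorted(groups, key=reclaimable, reverse=True)
total = sum(reclaimable(g) for g in groups)
```

With these assumed sizes, the PDF group ranks first (4.1 MB reclaimable), the video group second (4.0 MB), and the total is 8.1 MB.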
How it works
- Group by size. Two files of different sizes can't be identical, so files with a unique size are never even read.
- Hash the collisions. Within each size group, each file is SHA-256 hashed (streamed in 64 KB chunks, so multi-GB files don't blow up memory).
- Report identical content. Files with the same hash are true byte-for-byte duplicates, grouped and ranked by reclaimable space.
It reports — it never deletes. You decide what to remove.
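The three steps above can be sketched in a few lines of Python. This is an illustration of the technique under stated assumptions, not duphunt's actual source: the function names `sha256_of` and `find_duplicates` are hypothetical, and the result fields simply mirror the documented --json shape.

```python
import hashlib
import os
from collections import defaultdict

CHUNK = 64 * 1024  # 64 KB, the chunk size described above


def sha256_of(path):
    """Stream a file through SHA-256 so large files never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def find_duplicates(paths, min_size=1):
    """Return duplicate groups, largest reclaimable space first."""
    # Step 1: group by size; files below min_size (empty by default) are ignored.
    by_size = defaultdict(list)
    for path in paths:
        size = os.path.getsize(path)
        if size >= min_size:
            by_size[size].append(path)

    groups = []
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue  # unique size: cannot be a duplicate, so it is never read
        # Step 2: hash only the files whose sizes collide.
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[sha256_of(path)].append(path)
        # Step 3: identical hashes form a duplicate group.
        for digest, same_hash in by_hash.items():
            if len(same_hash) > 1:
                groups.append({
                    "hash": digest,
                    "size": size,
                    "count": len(same_hash),
                    "wasted": size * (len(same_hash) - 1),
                    "paths": sorted(same_hash),
                })
    groups.sort(key=lambda g: g["wasted"], reverse=True)
    return groups
```

Like the tool itself, the sketch only reports; nothing in it removes a file.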
Usage
duphunt # scan the current directory
duphunt ~/Downloads ~/Desktop # scan several roots at once
duphunt a.jpg b.jpg c.jpg # or just compare specific files
duphunt . --json # machine-readable
duphunt . --min-size 1048576 # ignore files under 1 MB
duphunt . --exit-code # exit 1 if any duplicates exist (CI gate)
Options
| Flag | Effect |
|------|--------|
| --json | Emit { groups, summary } as JSON (raw byte sizes, full paths) |
| --quiet | Print only the one-line summary |
| --min-size <n> | Ignore files smaller than n bytes (default 1 — skips empty files) |
| --follow | Follow symlinks (default: skip them, to avoid loops and double-counting) |
| --exit-code | Exit 1 when duplicates are found (for CI gates) |
| -v, --version | Print version |
| -h, --help | Show help |
Notes
- Empty files are skipped by default (they all hash alike and are rarely what you mean); pass --min-size 0 to include them.
- Symlinks are skipped unless --follow, so a symlinked tree won't be double-counted or loop forever.
- Each physical file is counted once. Repeated or overlapping roots and symlink aliases (even under --follow) are de-duplicated by real path, so they never inflate the results, while genuine hard links still surface.
- Same tool, two builds. The Node and Python builds hash with SHA-256 and produce identical results; use whichever your environment already has.
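One way to get the "each physical file is counted once" behaviour is to key every candidate by its resolved real path. The sketch below assumes that approach; `collect_files` and its `follow` parameter are hypothetical names, and the real tool may walk directories differently.

```python
import os


def collect_files(roots, follow=False):
    """Return each physical file once, identified by its real path."""
    seen_files = set()
    seen_dirs = set()
    files = []

    def add(path):
        if os.path.islink(path) and not follow:
            return  # symlinks are skipped unless following is requested
        real = os.path.realpath(path)
        if real in seen_files or not os.path.isfile(real):
            return  # repeated root, overlapping root, or symlink alias
        seen_files.add(real)
        files.append(real)

    for root in roots:
        if os.path.islink(root) and not follow:
            continue
        if not os.path.isdir(root):
            add(root)
            continue
        for dirpath, dirnames, filenames in os.walk(root, followlinks=follow):
            real_dir = os.path.realpath(dirpath)
            if real_dir in seen_dirs:
                dirnames[:] = []  # already visited: prune to avoid loops
                continue
            seen_dirs.add(real_dir)
            for name in filenames:
                add(os.path.join(dirpath, name))
    return files
```

Two hard links to the same data have different real paths, so both are kept here and can still be reported as duplicates of each other, whereas a symlink resolves to the path of its target and collapses into it.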
--json shape
{
"groups": [
{ "hash": "9f86d0…", "size": 4300000, "count": 2, "wasted": 4300000,
"paths": ["/a/invoice.pdf", "/b/invoice (1).pdf"] }
],
"summary": { "groups": 1, "files": 2, "wasted": 4300000 }
}
Exit codes
| Code | Meaning |
|------|---------|
| 0 | success (default — even when duplicates are found) |
| 1 | duplicates found and --exit-code was passed |
| 2 | error (bad option, missing path) |
By default duphunt is a viewer and exits 0; add --exit-code to gate a
pipeline on it.
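For scripting, the --json shape and the exit codes can be handled together. The sketch below parses a report and interprets a status code. The sample report is illustrative (its hash is a made-up placeholder), and `redundant_copies` and `describe_exit` are hypothetical helper names rather than part of duphunt.

```python
import json

# Illustrative report in the { groups, summary } shape; values are assumptions.
SAMPLE_REPORT = """
{
  "groups": [
    {"hash": "0000", "size": 4300000, "count": 2, "wasted": 4300000,
     "paths": ["/a/invoice.pdf", "/b/invoice (1).pdf"]}
  ],
  "summary": {"groups": 1, "files": 2, "wasted": 4300000}
}
"""


def redundant_copies(report_text):
    """List every path except the first in each group, as candidates to review."""
    report = json.loads(report_text)
    extras = []
    for group in report["groups"]:
        extras.extend(group["paths"][1:])
    return extras


def describe_exit(code):
    """Map an exit status to the meaning given in the table above."""
    meanings = {
        0: "success",
        1: "duplicates found (--exit-code was passed)",
        2: "error (bad option, missing path)",
    }
    return meanings.get(code, "unknown status")
```

In a pipeline, the report text would typically come from the captured standard output of `duphunt . --json`, and the status from the process return code.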
License
MIT
