Dedupr

Quick and dirty Node.js tool to find (and delete) duplicate files. Not the fastest, not the most feature-packed, not the most popular out there, and no voodoo involved. But it works well enough, and can be used directly via the command line or imported as a library in your Node.js app.

Features

  • Can be used via command line or imported as a library
  • Find duplicates based on file size, content hash and, optionally, filenames (see the sketch after this list)
  • Supports most common hashing algorithms (SHA1, SHA256, SHA512, MD4, MD5, etc.)
  • Delete duplicates automatically (optional)
  • Results are exported to a JSON file
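
Conceptually, two files are duplicates when they share the same size and (partial) content hash, plus the same filename when that option is enabled. A rough sketch of such a grouping key in plain Node.js (illustrative only, not dedupr's actual code; it also hashes the whole file rather than just the first and last chunk):

import crypto from "crypto"
import fs from "fs"
import path from "path"

// Build a grouping key from size + content hash (+ filename when enabled).
// Files that produce the same key are considered duplicates.
const duplicateKey = (filePath, hashAlgorithm = "sha256", useFilename = false) => {
    const size = fs.statSync(filePath).size
    const hash = crypto.createHash(hashAlgorithm).update(fs.readFileSync(filePath)).digest("hex")
    return useFilename ? `${size}-${hash}-${path.basename(filePath)}` : `${size}-${hash}`
}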

How to install

To install globally on your machine please use:

$ npm install dedupr -g

Or to install locally on your current project:

$ npm install dedupr --save

Command line usage

$ dedupr [options] folders

Detect duplicates on some user folders, using defaults:

$ dedupr /home/joe /home/karen /home/sara

Detect duplicates in all users' home folders, with verbose mode activated:

$ dedupr -v /home

Delete duplicates in the logged-in user's home folder, using the "fast" hashing preset:

$ dedupr -d --fast ~/

Delete duplicate images, considering filenames and in reverse order, hashing only the first and last 1KB of each file with MD5 (hash size 2KB):

$ dedupr -o /home/custom-dedupr-output.json \
         -e jpg gif png bmp \
         -s 2 \
         -h md5 \
         -f -r -d \
         ~/photos ~/camera ~/pictures ~/downloads
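
For comparison, the same run expressed as library options, mapping each flag to the option documented below (an illustrative sketch; the boolean values and expanded paths are assumptions):

const options = {
    output: "/home/custom-dedupr-output.json",  // -o
    extensions: ["jpg", "gif", "png", "bmp"],   // -e
    hashSize: 2,                                // -s
    hashAlgorithm: "md5",                       // -h
    filename: true,                             // -f
    reverse: true,                              // -r
    delete: true,                               // -d
    // tilde shorthand expanded manually, assuming user "joe"
    folders: ["/home/joe/photos", "/home/joe/camera", "/home/joe/pictures", "/home/joe/downloads"]
}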

Importing as a library

import Dedupr from "dedupr"
// const Dedupr = require("dedupr").default

const options = {
    // log progress and results to the console
    console: true,
    // folders to scan, in order of precedence
    folders: ["/home/user1/photos", "/home/user2/photos", "/home/user3/photos"],
    // hashing algorithm to use
    hashAlgorithm: "sha1"
}

const dedupr = new Dedupr(options)
await dedupr.run()

console.dir(dedupr.results)

Options

console

Enable or disable logging to the console. Enabled by default when using via the command line, but not when using it as a library / programmatically.

folders

List of folders that should be scanned. On the command line, these are the last arguments to be passed. If any of these folders do not exist, the tool will throw an exception.

By default, duplicates are detected in alphabetical / ascending order. If you pass /folderA, /folderB and /folderC, in that order, duplicates will be flagged in /folderB and /folderC only. If a file is present in both /folderB and /folderC, the one inside /folderC will be flagged. This behaviour can be changed with the reverse option.

extensions -e

Array of file extensions that should be included. Defaults to all files.

output -o

Save results to the specified output file. Defaults to dedupr.json in the current folder. To disable saving the output, set this to false.

reverse -r

Sort folders and files in descending order (Z to A to 0). By default, they are sorted alphabetically (ascending). Please note that this also reverses the order of the passed folders, so the very last occurrence of a file will be the non-duplicate.

filename -f

In addition to the hash value and file size, also consider the filename when finding duplicates, meaning files with identical contents but different filenames won't be marked as duplicates. Default is false.

verbose -v

Activate verbose mode with extra logging. Defaults to false.

delete -d

Delete duplicates. Only the very first occurrence of a file will remain (or very last, if reverse is set). Use with caution!
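
A cautious workflow is to run without -d first, review the flagged duplicates in the output JSON (dedupr.json by default), and only then repeat the run with -d:

$ dedupr ~/photos ~/downloads
$ cat dedupr.json
$ dedupr -d ~/photos ~/downloads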

Advanced options

parallel -p

How many files should be hashed in parallel. Defaults to 3.

hashSize -s

How many kilobytes should be hashed from each file? Defaults to 2048, meaning it hashes the first and last 1 MB of each file. Depending on your use case and the available CPU power, you might want to reduce this value.
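
To illustrate what partial hashing looks like (a sketch, not dedupr's actual implementation; note that the hypothetical helper below takes bytes, while -s is in kilobytes):

import crypto from "crypto"
import fs from "fs"

// Hash only the first and last `half` bytes of a file.
const partialHash = async (filePath, algorithm, half) => {
    const hash = crypto.createHash(algorithm)
    const { size } = await fs.promises.stat(filePath)
    const handle = await fs.promises.open(filePath, "r")
    try {
        // read the head of the file
        const head = Buffer.alloc(Math.min(half, size))
        await handle.read(head, 0, head.length, 0)
        hash.update(head)
        // read the tail, skipping whatever the head already covered
        if (size > half) {
            const tail = Buffer.alloc(Math.min(half, size - half))
            await handle.read(tail, 0, tail.length, size - tail.length)
            hash.update(tail)
        }
    } finally {
        await handle.close()
    }
    return hash.digest("hex")
}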

hashAlgorithm -h

Which hashing algorithm should be used. Default is sha256. Some of the other possible values: "sha1", "sha512", "md5", "blake2b512".
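
The algorithms actually available depend on the OpenSSL build behind your Node.js; you can list them with crypto.getHashes() (whether dedupr accepts every one of them is not guaranteed):

import crypto from "crypto"

// Print every hash algorithm this Node.js build supports
console.log(crypto.getHashes())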

Shortcut options

Please note that the options below will always override the hashSize and hashAlgorithm values.

--crazyfast

Same as -s 4 -h sha1. Hashes the first and last 2KB of files, using SHA1. Use with caution, as this might catch some false positives.

--veryfast

Same as -s 64 -h sha1. Hashes the first and last 32KB of files, using SHA1.

--faster

Same as -s 512 -h sha1. Hashes the first and last 256KB of files, using SHA1.

--fast

Same as -s 1024 -h sha256. Hashes the first and last 512KB of files, using SHA256.

--safe

Same as -s 32768 -h sha256. Hashes the first and last 16MB of files, using SHA256. Might be a bit slow if you have many media files.

--safer

Same as -s 131072 -h sha512. Hashes the first and last 64MB of files, using SHA512. Very slow if you have many media files.

Need help?

Post an issue here.