© 2026 – Pkg Stats / Ryan Hefner

dupdup

v1.1.0

Command-line tool for indexing directories and finding duplicate files or comparing directories using SQLite

Downloads: 24

Dupdup

dupdup is a command-line tool for indexing directories and finding duplicate files or comparing directories. It uses SQLite databases to store file metadata (paths, sizes, modification times, MD5 hashes) for fast lookups and comparisons.

Overview

Dupdup helps you:

  • Find duplicate files within a directory
  • Compare files across different directories
  • Verify backups by checking if files exist in multiple locations
  • Search for files across indexed directories
  • Manage file indexes efficiently with smart incremental updates

Key Features

  • Fast Indexing: Uses SQLite databases for efficient storage and queries
  • Smart Updates: Skips unchanged files (compares size and mtime) to avoid recomputing hashes
  • MD5 Hashing: Uses MD5 hashes for content-based file comparison
  • Multiple Root Directories: Support for indexing multiple root directories in a single index
  • Path Filtering: Ignore specific paths or path segments during comparison
  • Schema Versioning: Automatic database schema migration

Installation

npm install

Usage

Create an Index

Create a new, empty index file. After creating it, use add-path to add directories to the index.

./dupdup create-index emby.idx

Add Root Directories

Add one or more root directories to an index and index every file under them.

./dupdup add-path emby.idx /my/data/libraries/photos
./dupdup add-path emby.idx /path/to/dir1 /path/to/dir2

Update an Index

Update an existing index. Files whose path, size, and modification time are unchanged are skipped, so MD5 hashes are computed only for files that are new or have changed.

./dupdup update-index emby.idx
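The skip rule can be sketched in Python as follows. This is a minimal illustration under assumptions, not dupdup's own implementation: the Entry record, the refresh() function, and storing mtime as nanoseconds are all hypothetical. The stored MD5 is reused only when both size and mtime match; otherwise the supplied hash function is called.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    """Hypothetical per-file record stored in the index, looked up by relative path."""
    size: int
    mtime: int  # modification time, e.g. os.stat(path).st_mtime_ns
    md5: str


def refresh(
    stored: Optional[Entry],
    size: int,
    mtime: int,
    hash_file: Callable[[], str],
) -> Tuple[Entry, bool]:
    """Return the up-to-date entry and whether a hash had to be computed.

    When the stored size and mtime both match the current values, the stored
    MD5 is reused and hash_file is never called. New files (stored is None)
    and changed files are hashed.
    """
    if stored is not None and stored.size == size and stored.mtime == mtime:
        return stored, False
    return Entry(size, mtime, hash_file()), True
```

In a real walk, size and mtime would come from os.stat() and hash_file would stream the file through an MD5 hasher. Because the check looks only at metadata, a file rewritten without any change to its size or mtime would keep its old hash; that is the usual trade-off of this kind of incremental update.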

Remove Root Directory

Remove a root directory from an index. This deletes every entry for files under that root directory from the index.

./dupdup remove-path emby.idx /my/data/libraries/photos

Compare Directories

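Find files in one index (the copy, given second) that don't exist in another index (the master, given first). Files are matched by both size and MD5 hash, so a copy file counts as present if the master contains a file with identical content, whatever its name or location.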

./dupdup compare emby-master.idx emby-copy.idx
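A rough Python sketch of the comparison, assuming each index is a SQLite file with a files table holding rel_path, size and md5 (the table layout, the open_indexes() and missing_from_master() helpers, and the copy_idx schema name are hypothetical). The copy index is attached to the master's connection, and a NOT EXISTS query returns the copy files whose (size, md5) pair appears nowhere in the master.

```python
import sqlite3
from typing import List


def open_indexes(master_path: str, copy_path: str) -> sqlite3.Connection:
    """Open the master index and attach the copy index as schema 'copy_idx'."""
    conn = sqlite3.connect(master_path)
    conn.execute("ATTACH DATABASE ? AS copy_idx", (copy_path,))
    for db in ("main", "copy_idx"):
        # Minimal hypothetical schema: only the columns the comparison needs.
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {db}.files "
            "(rel_path TEXT, size INTEGER, md5 TEXT)"
        )
        conn.execute(
            f"CREATE INDEX IF NOT EXISTS {db}.files_size_md5 ON files (size, md5)"
        )
    return conn


def missing_from_master(conn: sqlite3.Connection) -> List[str]:
    """Copy-side paths with no file of equal size and MD5 anywhere in the master."""
    rows = conn.execute(
        """
        SELECT c.rel_path
        FROM copy_idx.files AS c
        WHERE NOT EXISTS (
            SELECT 1 FROM main.files AS m
            WHERE m.size = c.size AND m.md5 = c.md5
        )
        ORDER BY c.rel_path
        """
    ).fetchall()
    return [row[0] for row in rows]
```

With an index on (size, md5), each NOT EXISTS probe is an index lookup rather than a scan of the whole master table.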

Compare files in a specific subpath of the copy index. If the path ends with /, it's treated as a directory and all files under that directory are compared:

./dupdup compare emby-master.idx emby-copy.idx:some/path
./dupdup compare emby-master.idx emby-copy.idx:some/dir/

Ignore specific paths or path segments:

./dupdup compare emby-master.idx emby-copy.idx --ignore .DS_Store --ignore node_modules
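The subpath and --ignore options can be read as two filters over relative paths. The Python sketch below is one plausible interpretation, not the tool's documented behavior: a subpath without a trailing / is assumed to mean a single exact file, and an ignore pattern is assumed to match either any single path segment or a leading path prefix. The function names are hypothetical.

```python
from typing import Iterable, List


def in_subpath(rel_path: str, subpath: str) -> bool:
    """A trailing '/' selects a whole directory; otherwise assume an exact path."""
    if subpath.endswith("/"):
        return rel_path.startswith(subpath)
    return rel_path == subpath


def is_ignored(rel_path: str, patterns: Iterable[str]) -> bool:
    segments = rel_path.split("/")
    for pattern in patterns:
        p = pattern.strip("/")
        # Segment match, e.g. ".DS_Store" or "node_modules" at any depth.
        if p in segments:
            return True
        # Path match: the pattern names this file or one of its parent directories.
        if rel_path == p or rel_path.startswith(p + "/"):
            return True
    return False


def select_files(
    paths: Iterable[str], subpath: str = "", ignore: Iterable[str] = ()
) -> List[str]:
    ignore_list = list(ignore)
    return [
        p
        for p in paths
        if (not subpath or in_subpath(p, subpath)) and not is_ignored(p, ignore_list)
    ]
```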

Find a File

Search for files by name or path in an index. The search command is an alias for find.

Find by exact basename match:

./dupdup find filename.txt index.idx
./dupdup search filename.txt index.idx

Find by hash (locate a file from the source index in the target index):

./dupdup find source.idx:filename.txt target.idx

Search options:

  • -i, --ignore-case: Ignore case when searching
  • -s, --substring: Search for substring matches instead of exact matches

Examples:

./dupdup find -i -s photo index.idx
./dupdup search path/to/dir/ index.idx
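A Python sketch of how the name-matching modes might combine, assuming -i compares case-insensitively and -s switches from exact basename equality to a substring test on the basename; the path-based search in the last example is not modeled. find_same_content() illustrates the hash mode under the assumption that a file is located by its (size, md5) pair. All names here are hypothetical.

```python
import posixpath
from typing import Iterable, List, Tuple


def name_matches(
    rel_path: str, query: str, ignore_case: bool = False, substring: bool = False
) -> bool:
    """Compare the query against the file's basename only."""
    name = posixpath.basename(rel_path)
    if ignore_case:
        name, query = name.casefold(), query.casefold()
    return query in name if substring else name == query


def find(
    paths: Iterable[str], query: str, ignore_case: bool = False, substring: bool = False
) -> List[str]:
    return [p for p in paths if name_matches(p, query, ignore_case, substring)]


def find_same_content(
    target: Iterable[Tuple[str, int, str]], size: int, md5: str
) -> List[str]:
    """Paths in the target index whose size and MD5 equal the source file's."""
    return [path for path, s, h in target if s == size and h == md5]
```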

List Duplicates

Find duplicate files within a single index (files with the same size and MD5 hash).

./dupdup list-duplicates emby.idx
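Duplicate detection maps naturally onto a GROUP BY ... HAVING query. A minimal Python sketch, assuming a files table with rel_path, size and md5 columns (a hypothetical layout, not dupdup's documented schema):

```python
import sqlite3
from typing import Dict, List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS files (rel_path TEXT, size INTEGER, md5 TEXT);
CREATE INDEX IF NOT EXISTS files_size_md5 ON files (size, md5);
"""


def list_duplicates(conn: sqlite3.Connection) -> Dict[Tuple[int, str], List[str]]:
    """Group paths that share both size and MD5, keeping only groups of 2 or more."""
    rows = conn.execute(
        """
        SELECT f.size, f.md5, f.rel_path
        FROM files AS f
        JOIN (
            SELECT size, md5 FROM files
            GROUP BY size, md5
            HAVING COUNT(*) > 1
        ) AS d ON f.size = d.size AND f.md5 = d.md5
        ORDER BY f.size DESC, f.md5, f.rel_path
        """
    ).fetchall()
    groups: Dict[Tuple[int, str], List[str]] = {}
    for size, md5, path in rows:
        groups.setdefault((size, md5), []).append(path)
    return groups
```

Each key in the returned dictionary identifies one set of identical files; ordering by size descending lists the groups that waste the most space first.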

Merge Indices

Merge one or more source indices into a destination index. This combines all files and root directories from the source indices into the destination.

If the destination index already exists, you must use the -f or --overwrite flag to allow merging into it.

./dupdup merge dest.idx source1.idx source2.idx
./dupdup merge -f dest.idx source1.idx source2.idx
./dupdup merge --overwrite dest.idx source1.idx source2.idx
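Because each index numbers its root directories independently, a merge has to re-link every file to the destination's id for the same root path. A Python sketch under assumptions: root paths are unique in a roots table, a file already present at the same root and relative path is replaced by the incoming row, and the table, column and function names are hypothetical. The -f/--overwrite check is not modeled.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS {db}.roots (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS {db}.files (
    root_id  INTEGER NOT NULL,
    rel_path TEXT NOT NULL,
    size     INTEGER NOT NULL,
    md5      TEXT NOT NULL,
    UNIQUE (root_id, rel_path)
);
"""


def merge_from(conn: sqlite3.Connection, src: str = "src") -> None:
    """Copy roots and files from an attached source schema into main."""
    with conn:  # one transaction per source
        conn.execute(f"INSERT OR IGNORE INTO main.roots (path) SELECT path FROM {src}.roots")
        conn.execute(
            f"""
            INSERT OR REPLACE INTO main.files (root_id, rel_path, size, md5)
            SELECT mr.id, f.rel_path, f.size, f.md5
            FROM {src}.files AS f
            JOIN {src}.roots AS sr ON sr.id = f.root_id
            JOIN main.roots AS mr ON mr.path = sr.path
            """
        )


def merge_file(conn: sqlite3.Connection, source_path: str) -> None:
    conn.commit()  # SQLite refuses ATTACH inside an open transaction
    conn.execute("ATTACH DATABASE ? AS src", (source_path,))
    try:
        merge_from(conn, "src")
    finally:
        conn.execute("DETACH DATABASE src")
```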

Index Information

Show statistics about an index, including its root directories, file counts, and the last indexed timestamp.

./dupdup info emby.idx
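The statistics reported by info could be gathered with two queries. A Python sketch assuming hypothetical roots, files and metadata tables and a last_indexed metadata key:

```python
import sqlite3
from typing import List, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS roots (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
CREATE TABLE IF NOT EXISTS files (root_id INTEGER NOT NULL, rel_path TEXT NOT NULL,
                                  size INTEGER, md5 TEXT);
CREATE TABLE IF NOT EXISTS metadata (key TEXT PRIMARY KEY, value TEXT);
"""


def index_info(conn: sqlite3.Connection) -> Tuple[List[Tuple[str, int]], Optional[str]]:
    """Return (root path, file count) pairs and the last indexed timestamp, if any."""
    per_root = conn.execute(
        """
        SELECT r.path, COUNT(f.root_id)
        FROM roots AS r LEFT JOIN files AS f ON f.root_id = r.id
        GROUP BY r.id, r.path
        ORDER BY r.path
        """
    ).fetchall()
    row = conn.execute("SELECT value FROM metadata WHERE key = 'last_indexed'").fetchone()
    return per_root, (row[0] if row else None)
```

The LEFT JOIN keeps roots that currently have no files; COUNT(f.root_id) ignores the resulting NULLs, so such roots report 0.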

Upgrade Database Schema

Upgrade an index database to the current schema version. Other commands check the schema version automatically and prompt you to run this command if the database schema is outdated.

./dupdup upgrade-schema emby.idx
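One common way to implement this check is SQLite's PRAGMA user_version combined with an ordered set of migration steps. The Python sketch below assumes that approach; how dupdup actually stores its version is not documented here, and the two migration steps are illustrative placeholders, not the tool's real history. Each step runs in its own transaction together with its version bump.

```python
import sqlite3
from typing import Callable, Dict

CURRENT_VERSION = 3  # the README describes schema version 3

# Placeholder migrations: MIGRATIONS[n] upgrades a version-n database to n + 1.
MIGRATIONS: Dict[int, Callable[[sqlite3.Connection], object]] = {
    1: lambda c: c.execute("ALTER TABLE files ADD COLUMN basename TEXT"),
    2: lambda c: c.execute(
        "CREATE TABLE IF NOT EXISTS roots (id INTEGER PRIMARY KEY, path TEXT UNIQUE)"
    ),
}


def schema_version(conn: sqlite3.Connection) -> int:
    return conn.execute("PRAGMA user_version").fetchone()[0]


def require_current(conn: sqlite3.Connection) -> None:
    """What other commands would do: refuse to run on an outdated index."""
    v = schema_version(conn)
    if v < CURRENT_VERSION:
        raise RuntimeError(
            f"index schema is v{v}, expected v{CURRENT_VERSION}; run upgrade-schema"
        )


def upgrade(conn: sqlite3.Connection) -> int:
    v = schema_version(conn)
    while v < CURRENT_VERSION:
        with conn:  # commits on success, rolls back if the step fails
            conn.execute("BEGIN")
            MIGRATIONS[v](conn)
            conn.execute(f"PRAGMA user_version = {v + 1}")
        v += 1
    return v
```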

How It Works

  1. Indexing: The tool walks through a directory tree, computing MD5 hashes for files and storing metadata in a SQLite database.

  2. Smart Updates: When updating an index, files with unchanged size and modification time are skipped, avoiding unnecessary hash computations.

  3. Comparison: Files are treated as identical only when both their size and their MD5 hash match.

  4. Database Schema: Each index file stores:

    • Multiple root directories (schema version 3)
    • Relative path, basename, size, modification time, and MD5 hash for each file
    • Indexes on (size, md5, root_id) and (size, mtime, root_id) for fast queries
    • Metadata including last indexed timestamp
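Putting these steps together, an index with the layout described above could be created and filled as in this Python sketch. The table, column and index names, the use of PRAGMA user_version, and the index_root() and md5_of() helpers are assumptions; only the stored fields and the two composite indexes come from the description above. Paths are stored relative to the root with / separators.

```python
import hashlib
import os
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS roots (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS files (
    root_id  INTEGER NOT NULL REFERENCES roots(id),
    rel_path TEXT NOT NULL,
    basename TEXT NOT NULL,
    size     INTEGER NOT NULL,
    mtime    INTEGER NOT NULL,
    md5      TEXT NOT NULL,
    PRIMARY KEY (root_id, rel_path)
);
CREATE INDEX IF NOT EXISTS files_size_md5   ON files (size, md5, root_id);
CREATE INDEX IF NOT EXISTS files_size_mtime ON files (size, mtime, root_id);
CREATE TABLE IF NOT EXISTS metadata (key TEXT PRIMARY KEY, value TEXT);
PRAGMA user_version = 3;
"""


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never read into memory at once."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def index_root(conn: sqlite3.Connection, root: str) -> int:
    """Walk root, hash every file, store one row per file; return the file count."""
    root = os.path.abspath(root)
    count = 0
    with conn:
        conn.execute("INSERT OR IGNORE INTO roots (path) VALUES (?)", (root,))
        root_id = conn.execute("SELECT id FROM roots WHERE path = ?", (root,)).fetchone()[0]
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                if not os.path.isfile(full):  # skip sockets, broken links, etc.
                    continue
                st = os.stat(full)
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                conn.execute(
                    "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?, ?)",
                    (root_id, rel, name, st.st_size, st.st_mtime_ns, md5_of(full)),
                )
                count += 1
        conn.execute(
            "INSERT OR REPLACE INTO metadata (key, value) VALUES ('last_indexed', ?)",
            (time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),),
        )
    return count
```

Usage would be along the lines of conn = sqlite3.connect("emby.idx"); conn.executescript(SCHEMA); index_root(conn, "/my/data/libraries/photos"). Putting size first in both composite indexes lets duplicate and unchanged-file lookups narrow by size before comparing the hash or mtime.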