dupdup
v1.1.0
Command-line tool for indexing directories and finding duplicate files or comparing directories using SQLite
Dupdup
dupdup is a command-line tool for indexing directories and finding duplicate files or comparing directories. It uses SQLite databases to store file metadata (paths, sizes, modification times, MD5 hashes) for fast lookups and comparisons.
Overview
Dupdup helps you:
- Find duplicate files within a directory
- Compare files across different directories
- Verify backups by checking if files exist in multiple locations
- Search for files across indexed directories
- Manage file indexes efficiently with smart incremental updates
Key Features
- Fast Indexing: Uses SQLite databases for efficient storage and queries
- Smart Updates: Skips unchanged files (compares size and mtime) to avoid recomputing hashes
- MD5 Hashing: Uses MD5 hashes for content-based file comparison
- Multiple Root Directories: Support for indexing multiple root directories in a single index
- Path Filtering: Ignore specific paths or path segments during comparison
- Schema Versioning: Automatic database schema migration
Installation
npm install
Usage
Create an Index
Create a new, empty index file. After creating it, use add-path to add the directories you want indexed.
./dupdup create-index emby.idx
Add Root Directories
Add one or more root directories to an index and build the index. This will index all files in the specified directories.
./dupdup add-path emby.idx /my/data/libraries/photos
./dupdup add-path emby.idx /path/to/dir1 /path/to/dir2
Update an Index
Update an existing index. Files whose path, size, and modification time are unchanged are skipped, so MD5 hashes are computed only for files that need updating.
./dupdup update-index emby.idx
Remove Root Directory
Remove a root directory from an index. This will delete all files associated with that root directory from the index.
./dupdup remove-path emby.idx /my/data/libraries/photos
Compare Directories
Find files in one index (copy) that don't exist in another index (master). Files are matched by both size and MD5 hash.
./dupdup compare emby-master.idx emby-copy.idx
Compare files in a specific subpath of the copy index. If the path ends with /, it's treated as a directory and all files under that directory are compared:
./dupdup compare emby-master.idx emby-copy.idx:some/path
./dupdup compare emby-master.idx emby-copy.idx:some/dir/
Ignore specific paths or path segments:
./dupdup compare emby-master.idx emby-copy.idx --ignore .DS_Store --ignore node_modules
Find a File
Search for files by name or path in an index. The search command is an alias for find.
Find by exact basename match:
./dupdup find filename.txt index.idx
./dupdup search filename.txt index.idx
Find by hash (locate a file from source index in target index):
./dupdup find source.idx:filename.txt target.idx
Search options:
- -i, --ignore-case: Ignore case when searching
- -s, --substring: Search for substring matches instead of exact matches
Examples:
./dupdup find -i -s photo index.idx
./dupdup search path/to/dir/ index.idx
List Duplicates
Find duplicate files within a single index (files with the same size and MD5 hash).
./dupdup list-duplicates emby.idx
Merge Indices
Merge one or more source indices into a destination index. This combines all files and root directories from the source indices into the destination.
If the destination index already exists, you must use the -f or --overwrite flag to allow merging into it.
./dupdup merge dest.idx source1.idx source2.idx
./dupdup merge -f dest.idx source1.idx source2.idx
./dupdup merge --overwrite dest.idx source1.idx source2.idx
Index Information
Get statistics about an index including root directories, file counts, and last indexed timestamp.
./dupdup info emby.idx
Upgrade Database Schema
Upgrade an index database to the current schema version. Commands will automatically check schema version and prompt you to run this command if the database schema is outdated.
./dupdup upgrade-schema emby.idx
How It Works
Indexing: The tool walks through a directory tree, computing MD5 hashes for files and storing metadata in a SQLite database.
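The indexing step can be illustrated with a short Python sketch. This is not dupdup's own code: the function names `md5_of_file` and `scan_tree` and the record keys are hypothetical, chosen to mirror the metadata listed in this README (relative path, basename, size, modification time, MD5 hash).

```python
import hashlib
import os
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Stream a file through MD5 so large files are never loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_tree(root):
    """Yield one metadata record per regular file under root.

    The record layout is an assumption made for illustration only.
    """
    root = Path(root)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full = Path(dirpath) / name
            if not full.is_file():
                continue  # skip broken symlinks and other non-regular entries
            info = full.stat()
            yield {
                "relative_path": full.relative_to(root).as_posix(),
                "basename": name,
                "size": info.st_size,
                "mtime": info.st_mtime,
                "md5": md5_of_file(full),
            }
```

Reading in fixed-size chunks keeps memory use flat regardless of file size, which matters when a tree contains large media files.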
Smart Updates: When updating an index, files with unchanged size and modification time are skipped, avoiding unnecessary hash computations.
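A minimal Python sketch of that skip decision follows. It is an illustration under assumptions, not the tool's implementation: `needs_rehash` and `plan_update` are hypothetical names, and the index is modeled as a plain dictionary keyed by path.

```python
def needs_rehash(indexed, size, mtime):
    """Return True when a file's MD5 has to be (re)computed.

    indexed is the stored record for the same path, or None if the path
    has not been indexed yet.
    """
    if indexed is None:
        return True  # new file: nothing to reuse
    return indexed["size"] != size or indexed["mtime"] != mtime


def plan_update(index, on_disk):
    """Split the paths found on disk into (to_hash, unchanged).

    index:   dict mapping path -> {"size", "mtime", "md5"}
    on_disk: dict mapping path -> (size, mtime) from a cheap stat() pass
    """
    to_hash, unchanged = [], []
    for path in sorted(on_disk):
        size, mtime = on_disk[path]
        if needs_rehash(index.get(path), size, mtime):
            to_hash.append(path)
        else:
            unchanged.append(path)
    return to_hash, unchanged
```

Only the paths in `to_hash` require reading file contents; everything else is settled from the stat() results alone.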
Comparison: Files are compared using both size and MD5 hash to ensure accurate duplicate detection.
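Matching on the pair (size, MD5) can be sketched in Python as below. The helper names `missing_from_master` and `group_duplicates` and the record keys are hypothetical; the sketch only mirrors the behavior this README describes for compare and list-duplicates.

```python
from collections import defaultdict


def missing_from_master(master, copy):
    """Return the records in copy whose (size, md5) pair is absent from master."""
    master_keys = {(f["size"], f["md5"]) for f in master}
    return [f for f in copy if (f["size"], f["md5"]) not in master_keys]


def group_duplicates(files):
    """Group paths by (size, md5) and keep only groups with more than one path."""
    groups = defaultdict(list)
    for f in files:
        groups[(f["size"], f["md5"])].append(f["path"])
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

Because matching ignores paths, a file that was renamed or moved in the copy still counts as present in the master as long as its content is identical.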
Database Schema: Each index file stores:
- Multiple root directories (schema version 3)
- Relative path, basename, size, modification time, and MD5 hash for each file
- Indexes on (size, md5, root_id) and (size, mtime, root_id) for fast queries
- Metadata including last indexed timestamp
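To make the layout concrete, here is a hedged Python sketch of a SQLite schema with the same shape. The table names (`roots`, `files`, `metadata`), the index names, and the column types are assumptions; only the column names `size`, `md5`, `mtime`, and `root_id` and the two composite indexes come from the description above. The actual dupdup schema may differ.

```python
import sqlite3

# Hypothetical schema: names and types are illustrative, not dupdup's own.
SCHEMA = """
CREATE TABLE IF NOT EXISTS roots (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS files (
    id            INTEGER PRIMARY KEY,
    root_id       INTEGER NOT NULL REFERENCES roots(id),
    relative_path TEXT NOT NULL,
    basename      TEXT NOT NULL,
    size          INTEGER NOT NULL,
    mtime         REAL NOT NULL,
    md5           TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS files_size_md5_root ON files (size, md5, root_id);
CREATE INDEX IF NOT EXISTS files_size_mtime_root ON files (size, mtime, root_id);
CREATE TABLE IF NOT EXISTS metadata (
    key   TEXT PRIMARY KEY,
    value TEXT
);
"""


def open_index(path=":memory:"):
    """Open (or create) an index database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def find_duplicates(conn):
    """Return (size, md5, copies) for every content key stored more than once."""
    return conn.execute(
        """
        SELECT size, md5, COUNT(*) AS copies
        FROM files
        GROUP BY size, md5
        HAVING COUNT(*) > 1
        ORDER BY size DESC, md5
        """
    ).fetchall()
```

A composite index that leads with `size` and `md5` lets a grouping query like the one in `find_duplicates` run against the index rather than scanning and sorting the whole table.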
