Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
121 votes • 14 answers • 85515 views
Find duplicate files
Is it possible to find duplicate files on my disk which are bit-for-bit identical but have different file names?
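A minimal sketch of one common approach, assuming GNU coreutils and file names without newlines: checksum every file once and print each group of identical checksums.

```
# hash every file once, then show checksum groups with more than one member
find . -type f -exec md5sum {} + |
  sort |
  uniq -w32 --all-repeated=separate
```

Dedicated tools such as `fdupes -r .` do the same job (with a size pre-check) if installing a package is an option.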
student
(18865 rep)
Apr 4, 2013, 01:18 PM
• Last activity: Jul 10, 2025, 03:09 PM
1 vote • 0 answers • 66 views
How can I find multiple duplicates of media files, sort and back them up, and delete the rest?
I have a 4 TB hard drive containing pictures, sounds, and videos from the last 15 years.
These files were copied onto this drive from various sources, including hard drives, cameras, phones, CD-ROMs, DVDs, USB sticks, SD cards, and downloads.
The files come in formats such as JPEG, PNG, GIF, SVG, VOB, MP4, MPEG, MOV, AVI, SWF, WMV, FLV, 3GP, WAV, WMA, AAC, and OGG. Over the years, the files have been copied back and forth between different file systems, including FAT, exFAT, NTFS, HFS+/APFS, and ext3/ext4. Currently, the hard drive uses the ext4 file system.
There are folders and files that appear multiple times (duplicates, triplicates, or even more).
The problem is that the folder and filenames are not always identical.
For example:
1. A folder named "bilder_2012" might appear elsewhere as "backup_bilder_2012" or "media_2012_backup_2016".
2. In some cases, newer folders contain additional files that were not present in the older versions.
3. The files themselves may have inconsistent names, such as "bild1", "bild2" in one folder and "bilder2018(1)", "bilder2018(2)" in another.
What I Want to Achieve:
1. Sort and clean up the files: remove all duplicates and copy the remaining files to a new hard drive.
2. Identify the original copies: is there a way to determine which version of a file is the earliest/original?
3. Preserve the original folder names: for example, I know that "bilder_2012" was the first name given to a folder, and I would like to keep that name if possible.
4. Standardize file naming: after copying, I would like the files to follow a consistent naming scheme, such as folder "bilder2012" with files "bilder2012(1).jpeg", "bilder2012(2).jpeg", etc.
Is there a way to automate this process while ensuring the oldest/original files are preserved and duplicates are safely removed?
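As a starting point for the deduplication step only, here is a hedged sketch (GNU tools, bash, file names without tabs or newlines; `/mnt/media` is a placeholder): for every checksum that occurs more than once it prints all copies except the one with the oldest modification time, without deleting anything. Which folder name counts as the "original" is a policy decision a script cannot infer, so renaming is left to a later pass.

```
#!/bin/bash
# for each duplicated checksum, list every copy except the oldest one
find /mnt/media -type f -print0 |
while IFS= read -r -d '' f; do
    printf '%s\t%s\t%s\n' "$(sha256sum < "$f" | cut -d' ' -f1)" \
                          "$(stat -c %Y "$f")" "$f"
done |
sort -t$'\t' -k1,1 -k2,2n |        # group by checksum, oldest mtime first
awk -F'\t' 'seen[$1]++ { print $3 }'
```

Spawning two processes per file is slow on 4 TB; a dedicated tool (fdupes, rmlint, jdupes) will be much faster, but the listing above makes the selection rule explicit.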
Bernd Kunze
(11 rep)
Jun 21, 2025, 09:49 AM
• Last activity: Jun 25, 2025, 07:26 AM
0 votes • 1 answer • 331 views
Find and delete all duplicate files by hash
As the title suggests, I'm looking to check a bunch of files on a Linux system, and keep only one of each hash. For the files, the filename is irrelevant, the only important part is the hash itself.
I did find this question which partly answers my question in that it finds all the duplicates.
https://superuser.com/questions/487810/find-all-duplicate-files-by-md5-hash
The above linked question has this as an answer.
find . -type f -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
Any ideas/suggestions on how to add deleting to this answer?
I guess I could use something like php/python to parse the output and split the files into groups by the blank line, then skip the first entry in each group if the file exists, and then delete the rest if they exist.
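One hedged way to bolt deletion onto that pipeline without a separate parser, assuming a GNU userland and file names without newlines: keep the first file seen for each checksum and hand every later one to `rm`.

```
# keep the first file per md5, delete the rest
find . -type f -exec md5sum {} + |
  sort |
  awk 'seen[$1]++ { sub(/^[0-9a-f]+  /, ""); print }' |
  xargs -d '\n' -r rm -v --
```

Running it once with `rm` replaced by `echo rm` shows what would be removed before committing.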
AeroMaxx
(227 rep)
Nov 22, 2024, 02:58 PM
• Last activity: Nov 22, 2024, 04:37 PM
1 vote • 1 answer • 150 views
Skip .app folders when using "fdupes" with the option "--recurse"
I am using `fdupes` to print the list of duplicate files in a certain folder, with the option `--recurse`.
However, since I am using macOS, the recursing process regards Mac apps (which appear to be folders ending with `.app`) as folders and recurses into them, thus producing lots of unwanted extra information on duplicated files. Is there some way to skip such folders?
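As far as I know fdupes has no exclusion flag, so a hedged workaround is to let `find` prune the `.app` bundles and hand the remaining directories to `fdupes` without `--recurse`:

```
# enumerate every directory except .app bundles (and anything inside them),
# then let fdupes compare the files sitting in those directories
find . -type d -name '*.app' -prune -o -type d -print0 |
  xargs -0 fdupes
```

One caveat: with a very large number of directories, xargs may split them across several fdupes invocations, and duplicates spanning the split would be missed.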
Jinwen
(113 rep)
Apr 7, 2024, 09:04 PM
• Last activity: Apr 8, 2024, 04:31 PM
1 vote • 3 answers • 914 views
Detect duplicate folders with identical content
I often have folders with different names but the same content.
For example, I copy a folder to another location, for ease of access, and then I forget to delete the copy.
How can I detect the duplicate folders, with the same content?
For detecting duplicate files, I use Czkawka, but did not find a similar tool for duplicate folders.
Similar questions:
detect duplicate folders but ignoring the presence of empty folders
https://stackoverflow.com/questions/43560796/how-to-find-duplicate-directories
detect duplicate folders but the folder name matters
https://unix.stackexchange.com/questions/288591/find-and-list-duplicate-directories
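A rough sketch of one way to do it by hand, assuming GNU tools, bash, and path names without newlines: fingerprint each directory by hashing the sorted list of (file hash, relative name) pairs inside it, then group directories whose fingerprints match.

```
#!/bin/bash
# print groups of immediate subdirectories with identical contents
for dir in */; do
    fp=$(cd "$dir" &&
         find . -type f -print0 | sort -z | xargs -0 sha256sum |
         sha256sum | cut -d' ' -f1)
    printf '%s  %s\n' "$fp" "$dir"
done | sort | uniq -w64 --all-repeated=separate
```

This only compares the immediate subdirectories of the current directory; for a whole tree, feed `find . -type d` into the same loop instead of `*/`.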
Ilario Gelmetti
(108 rep)
Sep 22, 2023, 12:46 PM
• Last activity: Oct 21, 2023, 08:47 PM
209 votes • 20 answers • 78136 views
Is there an easy way to replace duplicate files with hardlinks?
I'm looking for an easy way (a command or series of commands, probably involving `find`) to find duplicate files in two directories, and replace the files in one directory with hardlinks of the files in the other directory.
Here's the situation: This is a file server which multiple people store audio files on, each user having their own folder. Sometimes multiple people have copies of the exact same audio files. Right now, these are duplicates. I'd like to make it so they're hardlinks, to save hard drive space.
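rdfind, util-linux's hardlink and jdupes can all do this directly, but as a hedged illustration of the idea with nothing beyond coreutils (file names without leading spaces or newlines, both directories on the same filesystem, paths are placeholders):

```
# for every group of identical files, hard-link the later ones to the first
find /srv/audio/alice /srv/audio/bob -type f -exec md5sum {} + | sort |
while read -r hash file; do
    if [ "$hash" = "${prev_hash:-}" ]; then
        ln -f -- "$prev_file" "$file"   # replace the duplicate with a hard link
    else
        prev_hash=$hash; prev_file=$file
    fi
done
```

Whichever path happens to sort first in a group becomes the kept copy; the dedicated tools give finer control over that choice.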
Josh
(8728 rep)
Oct 12, 2010, 07:23 PM
• Last activity: Jun 7, 2023, 03:16 PM
3 votes • 2 answers • 2978 views
How to use `rmlint` to remove duplicates only from one location and leave all else untouched?
I have two locations `/path/to/a` and `/path/to/b`. I need to find duplicate files in both paths and remove only the items in `/path/to/b`. `rmlint` generates quite a large removal script, but it contains entries from both paths (and even empty folders) for removal.
I ran `rmlint` with the following arguments to obtain this result, which I thought would yield ONLY `/path/to/a` being selected for removal:
rmlint -g -e -S p /path/to/a /path/to/b
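I can't vouch for an exact rmlint flag combination, so as a hedged plain-shell alternative (GNU tools, newline-free names) that only ever touches `/path/to/b`:

```
# collect the checksums present anywhere under /path/to/a ...
find /path/to/a -type f -exec sha256sum {} + | cut -c1-64 | sort -u > /tmp/a.sums

# ... and delete files under /path/to/b whose checksum appears in that list
find /path/to/b -type f -print0 |
while IFS= read -r -d '' f; do
    h=$(sha256sum "$f" | cut -c1-64)
    grep -qxF "$h" /tmp/a.sums && rm -v -- "$f"
done
```

rmlint's manual also describes tagging preferred paths with a `//` separator plus `--keep-all-tagged`, which may be the intended route; the local man page is the authority there.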
ylluminate
(686 rep)
Mar 14, 2021, 06:06 PM
• Last activity: Mar 12, 2023, 01:36 AM
0 votes • 1 answer • 135 views
How can you list a directory using the inode not the directory name? I have the same directory name appearing twice with different inodes
When I do a directory listing of a python installation, the `include` directory appears twice and each one has a different `inode`.
╰─○ ls -i1
2282047 bin
2641630 include
2642559 include
2282048 lib
2641850 share
I assume that their contents may be different as the inodes are different.
Is there a way to use the `ls` command with the inode rather than the directory name, so I can check them individually?
When I execute `ls include`, I have no idea which directory is listed.
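A hedged way to select a directory by inode rather than by name, assuming GNU findutils and that the listing above was taken in the current directory:

```
# list the contents of whichever entry here has inode 2641630
find . -maxdepth 1 -inum 2641630 -exec ls -lai {} +
# repeat with 2642559 to inspect the other "include"
```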
vfclists
(7909 rep)
Oct 11, 2022, 03:22 AM
• Last activity: Oct 11, 2022, 04:10 AM
0 votes • 1 answer • 938 views
Find Duplicate Files, but Specify a Directory to Keep
I am working on de-cluttering a company shared drive, and looking to remove duplicates.
Is there any duplicate finding program that allows you to specify which directory's duplicates are to be removed?
I would like to be able to do:
fdupes -rdN some_Folder master_folder
so that it preferentially keeps the duplicates in one folder over the other folder.
This involves thousands of files, so doing it by hand is not really an option. If rdfind's results file is the only way to do it, what's the best way to use that file?
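One hedged option, assuming rdfind is available: rdfind ranks paths in command-line order, so listing the master folder first should make its copies the ones that survive, and a `-dryrun` pass shows the plan before anything is deleted.

```
# preview which files under some_Folder would go as duplicates of master_folder
rdfind -dryrun true -deleteduplicates true master_folder some_Folder

# the same command with -dryrun false actually deletes them
rdfind -dryrun false -deleteduplicates true master_folder some_Folder
```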
George Coffey
(1 rep)
Jun 10, 2022, 07:16 PM
• Last activity: Jun 10, 2022, 08:11 PM
22 votes • 8 answers • 15825 views
case-insensitive search of duplicate file-names
Is there a way to find all files in a directory with duplicate filenames, regardless of the casing (upper-case and/or lower-case)?
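A minimal sketch, assuming GNU find and file names without newlines: fold every name to lower case and report the ones that occur more than once.

```
# names that appear more than once when case is ignored (single directory)
find . -maxdepth 1 -type f -printf '%f\n' |
  tr '[:upper:]' '[:lower:]' |
  sort | uniq -d
```

Drop `-maxdepth 1` to cover a whole tree, and feed a reported name back through `find . -iname NAME` to see the actual paths.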
lamcro
(923 rep)
Oct 18, 2011, 07:02 PM
• Last activity: Dec 17, 2021, 04:16 PM
2 votes • 1 answer • 442 views
Remove all duplicate image files except for 1
I have a folder of images that contain quite a bit of duplicates, I'd like to remove all duplicates except for one.
Upon Googling I found this clever script from this post that succinctly does *almost* what I want it to do:
#!/bin/sh -eu
find "${1:-.}" -type f ! -empty -print0 | xargs -0 md5 -r | \
awk '$1 in a{sub("^.{33}","");printf "%s\0",$0}a[$1]+=1{}' | \
xargs -0 rm -v --
Unfortunately I am still fairly green when it comes to UNIX shell scripting so I'm not sure what the actual commands/flags for each piece are doing here so I am unable to modify it for my specific needs.
From my understanding:
`find "${1:-.}" -type f ! -empty -print0` - searches the current directory for non-empty files and prints the file names. (not sure what the piece `"${1:-.}"` means though)
`| xargs -0 md5 -r` - Pipes the results above (via the `xargs -0` command?) into the `md5` command to get the md5 hash signature of each file (`-r` reverses the output to make it a single line?)
`awk '$1 in a{sub("^.{33}","");printf "%s\0",$0}a[$1]+=1{}'` - This is where I get lost..
- `$1 in a{sub("^.{33}","")` - takes the input up until the first whitespace character and replaces the first 33 characters from the start of the string with nothing (`sub("^.{33}","")`)
- `printf "%s\0"` - format prints the entire string
- `a{...,$0}` - I'm not sure what this does
- `a[$1]+=1{}` - Not sure either
`xargs -0 rm -v --` - Pipes the results to the `rm` command, printing each file name via `-v`, but I'm not sure what the syntax `--` is for.
When I run this, it outputs like this: `./test3.jpg./test2.jpg./test.jpg: No such file or directory`, so there must be a formatting issue.
My question is:
1. Can this be modified to remove all files except 1?
2. Can someone help explain the gaps in what the commands/syntax means as I've outlined above?
I'm sure this is probably easy for someone who knows UNIX well but unfortunately that person is not me. Thank you in advance!
For context: I'm running this in ZSH in macOS BigSur 11.
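As written, the pipeline already keeps one file per checksum: `a[$1]+=1{}` records each hash after the `$1 in a` test has run, so only the second and later occurrences ever reach `rm`. For something easier to follow, and to avoid pushing NUL bytes through awk (a step not every awk implementation handles well), here is a hedged rewrite assuming macOS's `md5 -r` prints `checksum path` and that file names contain no newlines:

```
#!/bin/sh
# keep the first file seen for each checksum, delete later ones
find "${1:-.}" -type f ! -empty -print0 |
  xargs -0 md5 -r |
  sort |
  while read -r hash file; do
    if [ "$hash" = "${prev:-}" ]; then
      rm -v -- "$file"      # same checksum as a file already kept
    else
      prev=$hash
    fi
  done
```

On the syntax questions: `"${1:-.}"` expands to the script's first argument, or `.` if none was given, and `--` marks the end of `rm`'s options so file names starting with `-` are not mistaken for flags.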
cdouble.bhuck
(123 rep)
Dec 12, 2021, 05:26 PM
• Last activity: Dec 12, 2021, 10:32 PM
100 votes • 3 answers • 157012 views
What's the quickest way to find duplicated files?
I found this command used to find duplicated files but it was quite long and made me confused. For example, if I remove `-printf "%s\n"`, nothing came out. Why was that? Besides, why have they used `xargs -I{} -n1`?
Is there any easier way to find duplicated files?
[4a-o07-d1:root/798]#find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
0bee89b07a248e27c83fc3d5951213c1 ./test1.txt
0bee89b07a248e27c83fc3d5951213c1 ./test2.txt
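To unpack why each piece is there, the same pipeline laid out with comments (a hedged annotation, assuming GNU findutils): the whole first stage exists only to emit file sizes for the later stages to group on, which is why removing `-printf "%s\n"` makes it produce nothing.

```
find . -not -empty -type f -printf '%s\n' |          # emit every file's size
  sort -rn | uniq -d |                               # keep sizes occurring at least twice
  xargs -I{} -n1 find . -type f -size {}c -print0 |  # re-find the files of each such size
  xargs -0 md5sum |                                  # hash only those candidates
  sort | uniq -w32 --all-repeated=separate           # group identical checksums
```

`xargs -I{} -n1` runs one inner `find -size {}c` per duplicated size, so files with a unique size are never hashed. The shortest "easier way" is usually a dedicated tool such as `fdupes -r .`.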
The One
(5112 rep)
Apr 20, 2016, 02:46 AM
• Last activity: Nov 3, 2021, 08:26 AM
1 vote • 1 answer • 827 views
How to use rmlint to merge two large folders?
In exploring options to merge two folders, I've come across a very powerful tool known as `rmlint`. It has some useful documentation (and a Gentle Guide).
I have a scenario that I previously mentioned and to which I received some great answers:
https://unix.stackexchange.com/q/628685/47012
I was leaning towards the `rdfind` answer, but as I was researching it a bit I stumbled upon `rmlint` and found the developer's discussion on duplicate isolation to be quite elucidating.
While reviewing all of this I found a couple interesting arguments:
--merge-directories --honour-dir-layout
I thus tried an incantation as follows:
rmlint -T "bi,bl,df,dd" --progress --merge-directories --honour-dir-layout A B
Unfortunately, the saved command that I'm to execute is rather enormous given my large scenario, and I haven't really been able to isolate a manageable smaller subset to test on and establish any degree of confidence before firing this up. I tried to find a way to do a trial run, so that it would print out what it would be doing rather than just handing me a script that emulates the actions to be taken, but I'm not finding this option (maybe I'm just bleary-eyed and overlooking it?).
I therefore thought I could and should pose a question here to this end:
**Has anyone had any success at merging duplicate data sets with `rmlint` and, if so, what arguments would you suggest to merge two folders such that the goals of my earlier question may be reasonably met?**
*To briefly restate: the ultimate goal is to get everything that is unique to B into A, while deleting everything in B that is already present in A; anything that has a data-contents conflict (i.e., non-unique contents) between A and B should be left in both for manual comparison, such that it will be relatively easy to find these in B after execution.*
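As a hedged point of reference, the merge policy itself is small enough to spell out in plain bash (GNU tools, bash 4+, newline-free names, placeholder paths), independent of whatever rmlint would generate:

```
#!/bin/bash
# delete files in B whose content already exists anywhere in A,
# move files unique to B into A when their relative path is free,
# leave name-conflicts with different content in both trees
set -eu
A=/path/to/A  B=/path/to/B          # placeholder paths

declare -A in_a
while IFS= read -r -d '' f; do
    in_a[$(sha256sum "$f" | cut -c1-64)]=1
done < <(find "$A" -type f -print0)

while IFS= read -r -d '' f; do
    h=$(sha256sum "$f" | cut -c1-64)
    rel=${f#"$B"/}
    if [[ ${in_a[$h]:-} ]]; then
        rm -- "$f"                              # content already present in A
    elif [[ ! -e "$A/$rel" ]]; then
        mkdir -p "$A/$(dirname "$rel")"
        mv -- "$f" "$A/$rel"                    # unique to B: merge into A
    fi                                          # else: conflict, keep in both
done < <(find "$B" -type f -print0)
```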
ylluminate
(686 rep)
Jan 13, 2021, 06:27 AM
• Last activity: Jan 13, 2021, 05:20 PM
2 votes • 2 answers • 662 views
Delete duplicates from another directory recursively
(N.B. There are many similar questions (e.g. [here](https://stackoverflow.com/q/32651413/575530), [here](https://unix.stackexchange.com/q/524310/160404), [here](https://stackoverflow.com/q/21337587/575530), and [here](https://stackoverflow.com/q/32489574/575530)) but they either assume that the directory structure is one-deep, or the answers are more complex multi-line scripts.)
This is my situation:
.
├── to_keep
│ ├── a
│ │ └── duplicate1.txt
│ └── b
│ ├── duplicate2.txt
│ └── unique1.txt
└── to_purge
├── c
│ └── duplicate1.txt
└── d
├── duplicate2.txt
└── unique2.txt
Is there a simple one line script that will use the basenames found in `to_keep` (and its sub-directories) and remove files with the same name from `to_purge` (and its sub-directories)?
The two attempts I have made both fail.
(In both I have used `find -print` to test the command, with the intention of swapping it to `find -delete` when it is working.)
The first uses `$()`:
find ./to_purge/ -print -name $(find ./to_keep/ -type f -printf "%f\n")
find: paths must precede expression: `duplicate2.txt'
The second uses `xargs`:
find ./to_keep/ -type f -printf "%f\n" | xargs --max-args=1 find ./to_purge/ -print -name
./to_purge/
./to_purge/c
./to_purge/c/duplicate1.txt
./to_purge/d
./to_purge/d/duplicate2.txt
./to_purge/d/unique2.txt
./to_purge/
./to_purge/c
./to_purge/c/duplicate1.txt
./to_purge/d
./to_purge/d/duplicate2.txt
./to_purge/d/unique2.txt
./to_purge/
./to_purge/c
./to_purge/c/duplicate1.txt
./to_purge/d
./to_purge/d/duplicate2.txt
./to_purge/d/unique2.txt
Neither attempt works. What have I got wrong?
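One likely culprit in the second attempt is operator order: `find` evaluates its expression left to right, so `-print -name X` prints every file before the name test is applied. A hedged one-liner with the test ahead of the action (GNU findutils, basenames free of glob characters and newlines):

```
# for each basename under to_keep, print same-named files under to_purge
# (swap -print for -delete once the output looks right)
find ./to_keep -type f -printf '%f\0' |
  xargs -0 -I{} find ./to_purge -type f -name {} -print
```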
dumbledad
(121 rep)
Jan 9, 2021, 07:11 AM
• Last activity: Jan 12, 2021, 02:51 PM
1 vote • 3 answers • 5369 views
What is the most efficient way to find duplicate files?
I have a number of folders with a few million files (amounting to a few TB) in total. I wish to find duplicates across all files. The output ideally is a simple list of dupes - I will process them further with my own scripts.
I know that there is an `fdupes` command which apparently uses "file sizes and MD5 signatures" to compare files.
What is unclear to me is whether files that are unique in size are read (and their hash computed) which I do not want. With the sheer amount of data in my situation care needs to be taken not to do any more disk I/O than absolutely necessary. Also, the amount of temporary space used ought to be minimal.
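A hedged sketch of the size-first strategy (GNU find/awk/xargs, file names without tabs or newlines): only files whose size occurs more than once are ever opened and hashed, and nothing is written to disk beyond the output list.

```
# print checksum groups, reading only files that share their size with another file
find . -type f -printf '%s\t%p\n' |
  awk -F'\t' '
      { n[$1]++ }
      n[$1] == 1 { first[$1] = $2; next }   # remember the first file of each size
      n[$1] == 2 { print first[$1] }        # its size just became non-unique
      { print $2 }' |
  xargs -d '\n' -r md5sum |
  sort | uniq -w32 --all-repeated=separate
```

fdupes reportedly applies the same size pre-check before hashing, but the pipeline above makes the behaviour explicit and produces a plain list for further scripting.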
Ned64
(9256 rep)
Feb 28, 2020, 06:59 PM
• Last activity: Dec 6, 2020, 09:20 PM
30 votes • 5 answers • 14709 views
Finding duplicate files and replace them with symlinks
I'm trying to find a way to check inside a given directory for duplicate files (even with different names) and replace them with symlinks pointing to the first occurrence. I've tried with `fdupes` but it just lists those duplicates.
That's the context: I'm customizing an icon theme to my liking, and I've found that many icons, even if they have different names and different locations inside their parent folder, and are used for different purposes, basically are just the same picture. Since applying the same modification twenty or thirty times is redundant when just one is really necessary, I want to keep just one image and symlink all the others.
As an example, if I run `fdupes -r ./` inside the directory `testdir`, it might return to me the following results:
./file1.png
./file2.png
./subdir1/anotherfile.png
./subdir1/subdir2/yetanotherfile.png
Given this output, I'd like to keep just the file `file1.png`, delete all the others and replace them with symlinks pointing to it, while maintaining all original file names. So `file2.png` will retain its name, but will become a link to `file1.png` instead of being a duplicate.
Those links should not point to an absolute path, but should be relative to the parent `testdir` directory; i.e. `yetanotherfile.png` will point to `../../file1.png`, not to `/home/testuser/.icons/testdir/file1.png`.
I'm interested both in solutions that involve a GUI and CLI. It is not mandatory to use `fdupes`; I've cited it because it is a tool that I know, but I'm open to solutions that use other tools as well.
I'm pretty sure that a bash script to handle all of this should not be that difficult to create, but I'm not expert enough to find out how to write it myself.
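A hedged sketch of the CLI route, assuming GNU coreutils (for `ln --relative`) and paths without tabs or newlines: parse fdupes' blank-line-separated groups and turn every file after the first into a relative symlink to the first.

```
#!/bin/bash
# replace each later member of an fdupes group with a relative symlink to the first
fdupes -r ./ |
  awk 'BEGIN { RS = ""; FS = "\n" }        # one record per blank-line-separated group
       { for (i = 2; i <= NF; i++) print $1 "\t" $i }' |
  while IFS=$'\t' read -r keep dupe; do
      ln -sfr -- "$keep" "$dupe"           # -r makes the link relative to its own directory
  done
```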
Sekhemty
(924 rep)
Sep 14, 2014, 02:33 PM
• Last activity: Jun 30, 2020, 10:06 PM
1 vote • 0 answers • 523 views
Copy unique files to new directory?
I have a number of folders with my various media (e.g. photos, music) from different points in time. The different folders have some of the same content (e.g. a photo might be in 2 folders), but should be mostly unique. There are no guarantees on the filename in different folders - e.g. a photo might be present as `A/foo.png` and `B/bar.png`. Alternatively, `A/baz.png` and `B/baz.png` might not be the same file.
I'm looking for some way to consolidate all of the media into a single, flat folder, with duplicates removed. Ideally, some tracking of where the files originally came from would be nice (e.g. knowing that `output/001.png` came from `A/baz.png`, etc), but this isn't strictly necessary. There are a lot (1M+ files), so the faster the better :).
I originally tried to just copy all of the files from the folders into a new folder, but this took a long time, and would only deduplicate if the filenames are identical, which isn't true in this case. I think there might be some way to get this command to go faster with `xargs -P` but I wasn't sure how.
find . -type f -exec cp {} \;
A two stage system or similar is fine - e.g. first flatten and rename all of the files into a new folder so that they all have unique filenames, and then filter out duplicates. I have the storage space to do that, I'm just not sure how to do it.
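A hedged sketch (bash 4+, GNU tools, file names without newlines; `A B` and the naming scheme are placeholders): hash every file, copy only the first file seen for each checksum into a flat `output/` directory under a sequential name, and keep a manifest of where each copy came from.

```
#!/bin/bash
# copy one file per unique checksum into ./output, recording sources in manifest.tsv
set -eu
mkdir -p output
declare -A seen
i=0
find A B -type f -print0 | while IFS= read -r -d '' f; do
    h=$(sha256sum "$f" | cut -c1-64)
    [[ ${seen[$h]:-} ]] && continue          # content already copied once
    seen[$h]=1
    i=$((i + 1))
    base=${f##*/}
    case $base in *.*) ext=.${base##*.} ;; *) ext= ;; esac   # keep the extension, if any
    new=$(printf '%06d%s' "$i" "$ext")
    cp -- "$f" "output/$new"
    printf '%s\t%s\n' "$new" "$f" >> manifest.tsv
done
```

With 1M+ files the per-file `sha256sum` spawn dominates the runtime; hashing in batches (`find ... -exec sha256sum {} +`) and joining the result against a file list would be the next optimisation.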
Vasu
(111 rep)
May 27, 2020, 06:49 AM
• Last activity: May 27, 2020, 06:59 AM
0 votes • 2 answers • 1750 views
Compare large directories recursively - but ignoring sub-directories - compare two backups - with gui
I've got two very old backups of a friend's computer. They were simply copied into a folder each on an external hard drive. Both are about 300 GB in size and the contents are very much alike, but not identical, and the folder structure is different. I want to free that space and make one single backup of those two. I think about 90% of the files are duplicates, but I don't want to miss the files that are not.
So what I need is a program that compares the files in two directories with all their subdirectories, but ignoring the subdirectory structure.
All files within Folder A should be compared with all files in Folder B.
All exact duplicates in Folder B should be marked/moved(/deleted). I will handle the remains in Folder B manually.
I've tried meld, and I've tried Gnome Commander (I'm using Xubuntu with XFCE).
I would enjoy a GUI solution but I should be able to handle the terminal and scripts too.
I thought it may be possible to build a file list for both sides and pipe these to some diff program, but how to do it exactly is beyond my capabilities.
Well, looking forward to your answers,
turtle purple
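A hedged script-level sketch (GNU tools, newline-free names, placeholder paths): rather than deleting outright, move every file in Folder B whose exact content also exists somewhere in Folder A into a separate review folder, preserving B's structure.

```
#!/bin/bash
# move exact duplicates out of B into B_dupes for review; unique files stay in B
set -eu
A=/media/backup/A  B=/media/backup/B  OUT=/media/backup/B_dupes   # placeholder paths

find "$A" -type f -exec sha256sum {} + | cut -c1-64 | sort -u > /tmp/a.sums

find "$B" -type f -print0 | while IFS= read -r -d '' f; do
    h=$(sha256sum "$f" | cut -c1-64)
    if grep -qxF "$h" /tmp/a.sums; then
        rel=${f#"$B"/}
        mkdir -p "$OUT/$(dirname "$rel")"
        mv -- "$f" "$OUT/$rel"
    fi
done
```

On the GUI side, duplicate finders such as Czkawka or FSlint can be pointed at both folders and show the matching pairs in a result list.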
Turtlepurple
(103 rep)
Jan 26, 2020, 01:10 PM
• Last activity: May 23, 2020, 10:04 PM
0 votes • 1 answer • 59 views
move files from directory A to directory B, without duplicating any files
I am looking to take files from directory B and copy them into directory A. However, I know that there are some same-named files in each. I would only like to have the "non-matching" file names copied into directory A.
Thanks
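A minimal sketch, assuming GNU coreutils or rsync and that the duplicates are identified by name alone; both variants skip files whose names already exist in A.

```
# copy only files from B whose names do not already exist in A
cp -n B/* A/

# recursive equivalent
rsync -av --ignore-existing B/ A/
```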
stormctr2
(65 rep)
Apr 16, 2020, 03:11 PM
• Last activity: Apr 16, 2020, 03:25 PM
0 votes • 0 answers • 70 views
Create duplicate files on 2 different locations?
It is tough to find information on my idea, as people normally are looking to find and remove duplicates, not the other way around.
I have an application running to control a heating system.
It uses a file based database to store configuration and sensor data.
There is a backup routine implemented in the software which is running on a daily basis.
My question: is it possible to intervene from outside and clone the live output to an external storage? In case of a hardware failure, the loss of data would then be absolutely minimal instead of up to 24 hours.
I was hoping it would be as easy as using symbolic links.
You may wonder if it is necessary and worth the effort. Yes, it is. These heating systems are of gigantic dimensions and need real-time data, e.g. for efficiency, to keep fuel consumption as low as possible. We are talking about several tons of fuel per hour.
Cheers
Jan
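A hedged sketch using inotify-tools and rsync (both assumed to be installable; paths are placeholders): watch the database directory and re-sync it to the external storage whenever something inside it changes, instead of waiting for the daily backup.

```
#!/bin/bash
# mirror the live database directory to external storage on every change
SRC=/var/lib/heating-db  DEST=/mnt/external/heating-db    # placeholder paths

inotifywait -m -r -e close_write,create,delete,move "$SRC" |
while read -r _event; do
    rsync -a --delete "$SRC"/ "$DEST"/       # each change triggers a re-sync
done
```

A burst of writes queues several redundant rsync runs, so some batching (a short sleep while draining events) is the usual refinement; whether the application's database files are internally consistent mid-write is a separate question a file-level copy cannot answer.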
Jan S
(57 rep)
Sep 12, 2019, 07:33 AM
Showing page 1 of 20 total questions