Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
121 votes • 14 answers • 85515 views
Find duplicate files
Is it possible to find duplicate files on my disk which are bit-for-bit identical but have different file names?
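A minimal sketch of one common approach, assuming GNU coreutils and file names without newlines: checksum every file once and print each group of identical checksums.

```
# hash every file once, then show checksum groups with more than one member
find . -type f -exec md5sum {} + |
  sort |
  uniq -w32 --all-repeated=separate
```

Dedicated tools such as `fdupes -r .` do the same job (with a size pre-check) if installing a package is an option.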
student
(18865 rep)
Apr 4, 2013, 01:18 PM
• Last activity: Jul 10, 2025, 03:09 PM
1 vote • 0 answers • 66 views
How can I find multiple duplicates of media files, sort and back them up, and delete the rest?
I have a 4 TB hard drive containing pictures, sounds, and videos from the last 15 years.
These files were copied onto this drive from various sources, including hard drives, cameras, phones, CD-ROMs, DVDs, USB sticks, SD cards, and downloads.
The files come in formats such as JPEG, PNG, GIF, SVG, VOB, MP4, MPEG, MOV, AVI, SWF, WMV, FLV, 3GP, WAV, WMA, AAC, and OGG. Over the years, the files have been copied back and forth between different file systems, including FAT, exFAT, NTFS, HFS+/APFS, and ext3/ext4. Currently, the hard drive uses the ext4 file system.
There are folders and files that appear multiple times (duplicates, triplicates, or even more).
The problem is that the folder and filenames are not always identical.
For example:
1. A folder named "bilder_2012" might appear elsewhere as "backup_bilder_2012" or "media_2012_backup_2016".
2. In some cases, newer folders contain additional files that were not present in the older versions.
3. The files themselves may have inconsistent names, such as "bild1", "bild2" in one folder and "bilder2018(1)", "bilder2018(2)" in another.
What I Want to Achieve:
1. Sort and clean up the files: remove all duplicates and copy the remaining files to a new hard drive.
2. Identify the original copies: is there a way to determine which version of a file is the earliest/original?
3. Preserve the original folder names: for example, I know that "bilder_2012" was the first name given to a folder, and I would like to keep that name if possible.
4. Standardize file naming: after copying, I would like the files to follow a consistent naming scheme, such as folder "bilder2012" with files "bilder2012(1).jpeg", "bilder2012(2).jpeg", etc.
Is there a way to automate this process while ensuring the oldest/original files are preserved and duplicates are safely removed?
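As a starting point for the deduplication step only, here is a hedged sketch (GNU tools, bash, file names without tabs or newlines; `/mnt/media` is a placeholder): for every checksum that occurs more than once it prints all copies except the one with the oldest modification time, without deleting anything. Which folder name counts as the "original" is a policy decision a script cannot infer, so renaming is left to a later pass.

```
#!/bin/bash
# for each duplicated checksum, list every copy except the oldest one
find /mnt/media -type f -print0 |
while IFS= read -r -d '' f; do
    printf '%s\t%s\t%s\n' "$(sha256sum < "$f" | cut -d' ' -f1)" \
                          "$(stat -c %Y "$f")" "$f"
done |
sort -t$'\t' -k1,1 -k2,2n |        # group by checksum, oldest mtime first
awk -F'\t' 'seen[$1]++ { print $3 }'
```

Spawning two processes per file is slow on 4 TB; a dedicated tool (fdupes, rmlint, jdupes) will be much faster, but the listing above makes the selection rule explicit.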
Bernd Kunze
(11 rep)
Jun 21, 2025, 09:49 AM
• Last activity: Jun 25, 2025, 07:26 AM
0 votes • 1 answer • 331 views
Find and delete all duplicate files by hash
As the title suggests, I'm looking to check a bunch of files on a Linux system, and keep only one of each hash. For the files, the filename is irrelevant, the only important part is the hash itself.
I did find this question which partly answers my question in that it finds all the duplicates.
https://superuser.com/questions/487810/find-all-duplicate-files-by-md5-hash
The above linked question has this as an answer.
find . -type f -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
Any ideas/suggestions on how to add deleting to this answer?
I guess I could use something like php/python to parse the output and split the files into groups by the blank line, then skip the first entry in each group if the file exists, and then delete the rest if they exist.
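One hedged way to bolt deletion onto that pipeline without a separate parser, assuming a GNU userland and file names without newlines: keep the first file seen for each checksum and hand every later one to `rm`.

```
# keep the first file per md5, delete the rest
find . -type f -exec md5sum {} + |
  sort |
  awk 'seen[$1]++ { sub(/^[0-9a-f]+  /, ""); print }' |
  xargs -d '\n' -r rm -v --
```

Running it once with `rm` replaced by `echo rm` shows what would be removed before committing.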
AeroMaxx
(227 rep)
Nov 22, 2024, 02:58 PM
• Last activity: Nov 22, 2024, 04:37 PM
1 vote • 1 answer • 150 views
Skip .app folders when using "fdupes" with the option "--recurse"
I am using `fdupes` to print the list of duplicate files in a certain folder, with the option `--recurse`.
However, since I am using macOS, the recursing process regards Mac apps (which appear to be folders ending with `.app`) as folders and recurses into them, thus producing lots of unwanted extra information on duplicated files. Is there some way to skip such folders?
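As far as I know fdupes has no exclusion flag, so a hedged workaround is to let `find` prune the `.app` bundles and hand the remaining directories to `fdupes` without `--recurse`:

```
# enumerate every directory except .app bundles (and anything inside them),
# then let fdupes compare the files sitting in those directories
find . -type d -name '*.app' -prune -o -type d -print0 |
  xargs -0 fdupes
```

One caveat: with a very large number of directories, xargs may split them across several fdupes invocations, and duplicates spanning the split would be missed.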
Jinwen
(113 rep)
Apr 7, 2024, 09:04 PM
• Last activity: Apr 8, 2024, 04:31 PM
1 vote • 3 answers • 914 views
Detect duplicate folders with identical content
I often have folders with different names but the same content.
For example, I copy a folder to another location, for ease of access, and then I forget to delete the copy.
How can I detect the duplicate folders, with the same content?
For detecting duplicate files, I use Czkawka, but did not find a similar tool for duplicate folders.
Similar questions:
detect duplicate folders but ignoring the presence of empty folders
https://stackoverflow.com/questions/43560796/how-to-find-duplicate-directories
detect duplicate folders but the folder name matters
https://unix.stackexchange.com/questions/288591/find-and-list-duplicate-directories
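A rough sketch of one way to do it by hand, assuming GNU tools, bash, and path names without newlines: fingerprint each directory by hashing the sorted list of (file hash, relative name) pairs inside it, then group directories whose fingerprints match.

```
#!/bin/bash
# print groups of immediate subdirectories with identical contents
for dir in */; do
    fp=$(cd "$dir" &&
         find . -type f -print0 | sort -z | xargs -0 sha256sum |
         sha256sum | cut -d' ' -f1)
    printf '%s  %s\n' "$fp" "$dir"
done | sort | uniq -w64 --all-repeated=separate
```

This only compares the immediate subdirectories of the current directory; for a whole tree, feed `find . -type d` into the same loop instead of `*/`.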
Ilario Gelmetti
(108 rep)
Sep 22, 2023, 12:46 PM
• Last activity: Oct 21, 2023, 08:47 PM
209 votes • 20 answers • 78136 views
Is there an easy way to replace duplicate files with hardlinks?
I'm looking for an easy way (a command or series of commands, probably involving `find`) to find duplicate files in two directories, and replace the files in one directory with hardlinks of the files in the other directory.
Here's the situation: This is a file server which multiple people store audio files on, each user having their own folder. Sometimes multiple people have copies of the exact same audio files. Right now, these are duplicates. I'd like to make it so they're hardlinks, to save hard drive space.
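rdfind, util-linux's hardlink and jdupes can all do this directly, but as a hedged illustration of the idea with nothing beyond coreutils (file names without leading spaces or newlines, both directories on the same filesystem, paths are placeholders):

```
# for every group of identical files, hard-link the later ones to the first
find /srv/audio/alice /srv/audio/bob -type f -exec md5sum {} + | sort |
while read -r hash file; do
    if [ "$hash" = "${prev_hash:-}" ]; then
        ln -f -- "$prev_file" "$file"   # replace the duplicate with a hard link
    else
        prev_hash=$hash; prev_file=$file
    fi
done
```

Whichever path happens to sort first in a group becomes the kept copy; the dedicated tools give finer control over that choice.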
Josh
(8728 rep)
Oct 12, 2010, 07:23 PM
• Last activity: Jun 7, 2023, 03:16 PM
3 votes • 2 answers • 2978 views
How to use `rmlint` to remove duplicates only from one location and leave all else untouched?
I have two locations `/path/to/a` and `/path/to/b`. I need to find duplicate files in both paths and remove only the items in `/path/to/b`. `rmlint` generates quite a large removal script, but it contains entries from both paths (and even empty folders) for removal.
I ran `rmlint` with the following arguments to obtain this result, which I thought would yield ONLY `/path/to/a` being selected for removal:
rmlint -g -e -S p /path/to/a /path/to/b
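I can't vouch for an exact rmlint flag combination, so as a hedged plain-shell alternative (GNU tools, newline-free names) that only ever touches `/path/to/b`:

```
# collect the checksums present anywhere under /path/to/a ...
find /path/to/a -type f -exec sha256sum {} + | cut -c1-64 | sort -u > /tmp/a.sums

# ... and delete files under /path/to/b whose checksum appears in that list
find /path/to/b -type f -print0 |
while IFS= read -r -d '' f; do
    h=$(sha256sum "$f" | cut -c1-64)
    grep -qxF "$h" /tmp/a.sums && rm -v -- "$f"
done
```

rmlint's manual also describes tagging preferred paths with a `//` separator plus `--keep-all-tagged`, which may be the intended route; the local man page is the authority there.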
ylluminate
(686 rep)
Mar 14, 2021, 06:06 PM
• Last activity: Mar 12, 2023, 01:36 AM
0 votes • 1 answer • 135 views
How can you list a directory using the inode not the directory name? I have the same directory name appearing twice with different inodes
When I do a directory listing of a python installation, the `include` directory appears twice and each one has a different `inode`.
╰─○ ls -i1
2282047 bin
2641630 include
2642559 include
2282048 lib
2641850 share
I assume that their contents may be different as the inodes are different.
Is there a way to use the `ls` command with the inode rather than the directory name, so I can check them individually?
When I execute `ls include`, I have no idea which directory is listed.
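A hedged way to select a directory by inode rather than by name, assuming GNU findutils and that the listing above was taken in the current directory:

```
# list the contents of whichever entry here has inode 2641630
find . -maxdepth 1 -inum 2641630 -exec ls -lai {} +
# repeat with 2642559 to inspect the other "include"
```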
vfclists
(7909 rep)
Oct 11, 2022, 03:22 AM
• Last activity: Oct 11, 2022, 04:10 AM
0 votes • 1 answer • 938 views
Find Duplicate Files, but Specify a Directory to Keep
I am working on de-cluttering a company shared drive, and looking to remove duplicates.
Is there any duplicate finding program that allows you to specify which directory's duplicates are to be removed?
I would like to be able to do:
fdupes -rdN some_Folder master_folder
so that it preferentially keeps the duplicates in one folder over the other folder.
This involves thousands of files, so doing it by hand is not really an option. If rdfind's results file is the only way to do it, what's the best way to use that file?
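One hedged option, assuming rdfind is available: rdfind ranks paths in command-line order, so listing the master folder first should make its copies the ones that survive, and a `-dryrun` pass shows the plan before anything is deleted.

```
# preview which files under some_Folder would go as duplicates of master_folder
rdfind -dryrun true -deleteduplicates true master_folder some_Folder

# the same command with -dryrun false actually deletes them
rdfind -dryrun false -deleteduplicates true master_folder some_Folder
```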
George Coffey
(1 rep)
Jun 10, 2022, 07:16 PM
• Last activity: Jun 10, 2022, 08:11 PM
22 votes • 8 answers • 15825 views
case-insensitive search of duplicate file-names
Is there a way to find all files in a directory with duplicate filenames, regardless of the casing (upper-case and/or lower-case)?
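A minimal sketch, assuming GNU find and file names without newlines: fold every name to lower case and report the ones that occur more than once.

```
# names that appear more than once when case is ignored (single directory)
find . -maxdepth 1 -type f -printf '%f\n' |
  tr '[:upper:]' '[:lower:]' |
  sort | uniq -d
```

Drop `-maxdepth 1` to cover a whole tree, and feed a reported name back through `find . -iname NAME` to see the actual paths.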
lamcro
(923 rep)
Oct 18, 2011, 07:02 PM
• Last activity: Dec 17, 2021, 04:16 PM
2 votes • 1 answer • 442 views
Remove all duplicate image files except for 1
I have a folder of images that contain quite a bit of duplicates, I'd like to remove all duplicates except for one.
Upon Googling I found this clever script from this post that succinctly does *almost* what I want it to do:
#!/bin/sh -eu
find "${1:-.}" -type f ! -empty -print0 | xargs -0 md5 -r | \
awk '$1 in a{sub("^.{33}","");printf "%s\0",$0}a[$1]+=1{}' | \
xargs -0 rm -v --
Unfortunately I am still fairly green when it comes to UNIX shell scripting so I'm not sure what the actual commands/flags for each piece are doing here so I am unable to modify it for my specific needs.
From my understanding:
`find "${1:-.}" -type f ! -empty -print0` - searches the current directory for non-empty files and prints the file names. (not sure what the piece `"${1:-.}"` means though)
`| xargs -0 md5 -r` - Pipes the results above (via the `xargs -0` command?) into the `md5` command to get the md5 hash signature of each file (`-r` reverses the output to make it a single line?)
`awk '$1 in a{sub("^.{33}","");printf "%s\0",$0}a[$1]+=1{}'` - This is where I get lost..
- `$1 in a{sub("^.{33}","")` - takes the input up until the first whitespace character and replaces the first 33 characters from the start of the string with nothing (`sub("^.{33}","")`)
- `printf "%s\0"` - format prints the entire string
- `a{...,$0}` - I'm not sure what this does
- `a[$1]+=1{}` - Not sure either
`xargs -0 rm -v --` - Pipes the results to the `rm` command, printing each file name via `-v`, but I'm not sure what the syntax `--` is for.
When I run this, it outputs like this: `./test3.jpg./test2.jpg./test.jpg: No such file or directory`, so there must be a formatting issue.
My question is:
1. Can this be modified to remove all files except 1?
2. Can someone help explain the gaps in what the commands/syntax means as I've outlined above?
I'm sure this is probably easy for someone who knows UNIX well but unfortunately that person is not me. Thank you in advance!
For context: I'm running this in ZSH in macOS BigSur 11.
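As written, the pipeline already keeps one file per checksum: `a[$1]+=1{}` records each hash after the `$1 in a` test has run, so only the second and later occurrences ever reach `rm`. For something easier to follow, and to avoid pushing NUL bytes through awk (a step not every awk implementation handles well), here is a hedged rewrite assuming macOS's `md5 -r` prints `checksum path` and that file names contain no newlines:

```
#!/bin/sh
# keep the first file seen for each checksum, delete later ones
find "${1:-.}" -type f ! -empty -print0 |
  xargs -0 md5 -r |
  sort |
  while read -r hash file; do
    if [ "$hash" = "${prev:-}" ]; then
      rm -v -- "$file"      # same checksum as a file already kept
    else
      prev=$hash
    fi
  done
```

On the syntax questions: `"${1:-.}"` expands to the script's first argument, or `.` if none was given, and `--` marks the end of `rm`'s options so file names starting with `-` are not mistaken for flags.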
cdouble.bhuck
(123 rep)
Dec 12, 2021, 05:26 PM
• Last activity: Dec 12, 2021, 10:32 PM
100 votes • 3 answers • 157012 views
What's the quickest way to find duplicated files?
I found this command used to find duplicated files but it was quite long and made me confused. For example, if I remove `-printf "%s\n"`, nothing came out. Why was that? Besides, why have they used `xargs -I{} -n1`?
Is there any easier way to find duplicated files?
[4a-o07-d1:root/798]#find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
0bee89b07a248e27c83fc3d5951213c1 ./test1.txt
0bee89b07a248e27c83fc3d5951213c1 ./test2.txt
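To unpack why each piece is there, the same pipeline laid out with comments (a hedged annotation, assuming GNU findutils): the whole first stage exists only to emit file sizes for the later stages to group on, which is why removing `-printf "%s\n"` makes it produce nothing.

```
find . -not -empty -type f -printf '%s\n' |          # emit every file's size
  sort -rn | uniq -d |                               # keep sizes occurring at least twice
  xargs -I{} -n1 find . -type f -size {}c -print0 |  # re-find the files of each such size
  xargs -0 md5sum |                                  # hash only those candidates
  sort | uniq -w32 --all-repeated=separate           # group identical checksums
```

`xargs -I{} -n1` runs one inner `find -size {}c` per duplicated size, so files with a unique size are never hashed. The shortest "easier way" is usually a dedicated tool such as `fdupes -r .`.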
The One
(5112 rep)
Apr 20, 2016, 02:46 AM
• Last activity: Nov 3, 2021, 08:26 AM
1 vote • 1 answer • 827 views
How to use rmlint to merge two large folders?
In exploring options to merge two folders, I've come across a very powerful tool known as `rmlint`. It has some useful documentation (and a Gentle Guide).
I have a scenario that I previously mentioned and to which I received some great answers:
https://unix.stackexchange.com/q/628685/47012
I was leaning towards the `rdfind` answer, but as I was researching it a bit I stumbled upon `rmlint` and found the developer's discussion on duplicate isolation to be quite elucidating.
While reviewing all of this I found a couple interesting arguments:
--merge-directories --honour-dir-layout
I thus tried an incantation as follows:
rmlint -T "bi,bl,df,dd" --progress --merge-directories --honour-dir-layout A B
Unfortunately, the saved command that I'm to execute is rather enormous given my large scenario, and I haven't really been able to isolate a manageable smaller subset to test on and establish any degree of confidence before firing this up. I tried to find a way to do a trial run, so that it would print out what it would be doing rather than just handing me a script that emulates the actions to be taken, but I'm not finding this option (maybe I'm just bleary-eyed and overlooking it?).
I therefore thought I could and should pose a question here to this end:
**Has anyone had any success at merging duplicate data sets with `rmlint` and, if so, what arguments would you suggest to merge two folders such that the goals of my earlier question may be reasonably met?**
*To briefly restate: the ultimate goal is to get everything that is unique to B into A, while deleting everything in B that is already present in A; anything that has a data-contents conflict (i.e., non-unique contents) between A and B should be left in both for manual comparison, such that it will be relatively easy to find these in B after execution.*
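As a hedged point of reference, the merge policy itself is small enough to spell out in plain bash (GNU tools, bash 4+, newline-free names, placeholder paths), independent of whatever rmlint would generate:

```
#!/bin/bash
# delete files in B whose content already exists anywhere in A,
# move files unique to B into A when their relative path is free,
# leave name-conflicts with different content in both trees
set -eu
A=/path/to/A  B=/path/to/B          # placeholder paths

declare -A in_a
while IFS= read -r -d '' f; do
    in_a[$(sha256sum "$f" | cut -c1-64)]=1
done < <(find "$A" -type f -print0)

while IFS= read -r -d '' f; do
    h=$(sha256sum "$f" | cut -c1-64)
    rel=${f#"$B"/}
    if [[ ${in_a[$h]:-} ]]; then
        rm -- "$f"                              # content already present in A
    elif [[ ! -e "$A/$rel" ]]; then
        mkdir -p "$A/$(dirname "$rel")"
        mv -- "$f" "$A/$rel"                    # unique to B: merge into A
    fi                                          # else: conflict, keep in both
done < <(find "$B" -type f -print0)
```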
ylluminate
(686 rep)
Jan 13, 2021, 06:27 AM
• Last activity: Jan 13, 2021, 05:20 PM
2 votes • 2 answers • 662 views
Delete duplicates from another directory recursively
(N.B. There are many similar questions (e.g. [here](https://stackoverflow.com/q/32651413/575530), [here](https://unix.stackexchange.com/q/524310/160404), [here](https://stackoverflow.com/q/21337587/575530), and [here](https://stackoverflow.com/q/32489574/575530)) but they either assume that the directory structure is one-deep, or the answers are more complex multi-line scripts.)
This is my situation:
.
├── to_keep
│ ├── a
│ │ └── duplicate1.txt
│ └── b
│ ├── duplicate2.txt
│ └── unique1.txt
└── to_purge
├── c
│ └── duplicate1.txt
└── d
├── duplicate2.txt
└── unique2.txt
Is there a simple one line script that will use the basenames found in `to_keep` (and its sub-directories) and remove files with the same name from `to_purge` (and its sub-directories)?
The two attempts I have made both fail.
(In both I have used `find -print` to test the command, with the intention of swapping it to `find -delete` when it is working.)
The first uses `$()`:
find ./to_purge/ -print -name $(find ./to_keep/ -type f -printf "%f\n")
find: paths must precede expression: `duplicate2.txt'
The second uses `xargs`:
find ./to_keep/ -type f -printf "%f\n" | xargs --max-args=1 find ./to_purge/ -print -name
./to_purge/
./to_purge/c
./to_purge/c/duplicate1.txt
./to_purge/d
./to_purge/d/duplicate2.txt
./to_purge/d/unique2.txt
./to_purge/
./to_purge/c
./to_purge/c/duplicate1.txt
./to_purge/d
./to_purge/d/duplicate2.txt
./to_purge/d/unique2.txt
./to_purge/
./to_purge/c
./to_purge/c/duplicate1.txt
./to_purge/d
./to_purge/d/duplicate2.txt
./to_purge/d/unique2.txt
Neither attempt works. What have I got wrong?
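One likely culprit in the second attempt is operator order: `find` evaluates its expression left to right, so `-print -name X` prints every file before the name test is applied. A hedged one-liner with the test ahead of the action (GNU findutils, basenames free of glob characters and newlines):

```
# for each basename under to_keep, print same-named files under to_purge
# (swap -print for -delete once the output looks right)
find ./to_keep -type f -printf '%f\0' |
  xargs -0 -I{} find ./to_purge -type f -name {} -print
```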
dumbledad
(121 rep)
Jan 9, 2021, 07:11 AM
• Last activity: Jan 12, 2021, 02:51 PM
1 vote • 3 answers • 5369 views
What is the most efficient way to find duplicate files?
I have a number of folders with a few million files (amounting to a few TB) in total. I wish to find duplicates across all files. The output ideally is a simple list of dupes - I will process them further with my own scripts.
I know that there is an `fdupes` command which apparently uses "file sizes and MD5 signatures" to compare files.
What is unclear to me is whether files that are unique in size are read (and their hash computed) which I do not want. With the sheer amount of data in my situation care needs to be taken not to do any more disk I/O than absolutely necessary. Also, the amount of temporary space used ought to be minimal.
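A hedged sketch of the size-first strategy (GNU find/awk/xargs, file names without tabs or newlines): only files whose size occurs more than once are ever opened and hashed, and nothing is written to disk beyond the output list.

```
# print checksum groups, reading only files that share their size with another file
find . -type f -printf '%s\t%p\n' |
  awk -F'\t' '
      { n[$1]++ }
      n[$1] == 1 { first[$1] = $2; next }   # remember the first file of each size
      n[$1] == 2 { print first[$1] }        # its size just became non-unique
      { print $2 }' |
  xargs -d '\n' -r md5sum |
  sort | uniq -w32 --all-repeated=separate
```

fdupes reportedly applies the same size pre-check before hashing, but the pipeline above makes the behaviour explicit and produces a plain list for further scripting.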
Ned64
(9256 rep)
Feb 28, 2020, 06:59 PM
• Last activity: Dec 6, 2020, 09:20 PM
30 votes • 5 answers • 14709 views
Finding duplicate files and replace them with symlinks
I'm trying to find a way to check inside a given directory for duplicate files (even with different names) and replace them with symlinks pointing to the first occurrence. I've tried with `fdupes` but it just lists those duplicates.
That's the context: I'm customizing an icon theme to my liking, and I've found that many icons, even if they have different names and different locations inside their parent folder, and are used for different purposes, basically are just the same picture. Since applying the same modification twenty or thirty times is redundant when just one is really necessary, I want to keep just one image and symlink all the others.
As an example, if I run `fdupes -r ./` inside the directory `testdir`, it might return to me the following results:
./file1.png
./file2.png
./subdir1/anotherfile.png
./subdir1/subdir2/yetanotherfile.png
Given this output, I'd like to keep just the file `file1.png`, delete all the others and replace them with symlinks pointing to it, while maintaining all original file names. So `file2.png` will retain its name, but will become a link to `file1.png` instead of being a duplicate.
Those links should not point to an absolute path, but should be relative to the parent `testdir` directory; i.e. `yetanotherfile.png` will point to `../../file1.png`, not to `/home/testuser/.icons/testdir/file1.png`.
I'm interested both in solutions that involve a GUI and CLI. It is not mandatory to use `fdupes`; I've cited it because it is a tool that I know, but I'm open to solutions that use other tools as well.
I'm pretty sure that a bash script to handle all of this should not be that difficult to create, but I'm not expert enough to find out how to write it myself.
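A hedged sketch of the CLI route, assuming GNU coreutils (for `ln --relative`) and paths without tabs or newlines: parse fdupes' blank-line-separated groups and turn every file after the first into a relative symlink to the first.

```
#!/bin/bash
# replace each later member of an fdupes group with a relative symlink to the first
fdupes -r ./ |
  awk 'BEGIN { RS = ""; FS = "\n" }        # one record per blank-line-separated group
       { for (i = 2; i <= NF; i++) print $1 "\t" $i }' |
  while IFS=$'\t' read -r keep dupe; do
      ln -sfr -- "$keep" "$dupe"           # -r makes the link relative to its own directory
  done
```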
Sekhemty
(924 rep)
Sep 14, 2014, 02:33 PM
• Last activity: Jun 30, 2020, 10:06 PM
1 vote • 0 answers • 523 views
Copy unique files to new directory?
I have a number of folders with my various media (e.g. photos, music) from different points in time. The different folders have some of the same content (e.g. a photo might be in 2 folders), but should be mostly unique. There are no guarantees on the filename in different folders - e.g. a photo might be present as `A/foo.png` and `B/bar.png`. Alternatively, `A/baz.png` and `B/baz.png` might not be the same file.
I'm looking for some way to consolidate all of the media into a single, flat folder, with duplicates removed. Ideally, some tracking of where the files originally came from would be nice (e.g. knowing that `output/001.png` came from `A/baz.png`, etc), but this isn't strictly necessary. There are a lot (1M+ files), so the faster the better :).
I originally tried to just copy all of the files from the folders into a new folder, but this took a long time, and would only deduplicate if the filenames are identical, which isn't true in this case. I think there might be some way to get this command to go faster with `xargs -P` but I wasn't sure how.
find . -type f -exec cp {} \;
A two stage system or similar is fine - e.g. first flatten and rename all of the files into a new folder so that they all have unique filenames, and then filter out duplicates. I have the storage space to do that, I'm just not sure how to do it.
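A hedged sketch (bash 4+, GNU tools, file names without newlines; `A B` and the naming scheme are placeholders): hash every file, copy only the first file seen for each checksum into a flat `output/` directory under a sequential name, and keep a manifest of where each copy came from.

```
#!/bin/bash
# copy one file per unique checksum into ./output, recording sources in manifest.tsv
set -eu
mkdir -p output
declare -A seen
i=0
find A B -type f -print0 | while IFS= read -r -d '' f; do
    h=$(sha256sum "$f" | cut -c1-64)
    [[ ${seen[$h]:-} ]] && continue          # content already copied once
    seen[$h]=1
    i=$((i + 1))
    base=${f##*/}
    case $base in *.*) ext=.${base##*.} ;; *) ext= ;; esac   # keep the extension, if any
    new=$(printf '%06d%s' "$i" "$ext")
    cp -- "$f" "output/$new"
    printf '%s\t%s\n' "$new" "$f" >> manifest.tsv
done
```

With 1M+ files the per-file `sha256sum` spawn dominates the runtime; hashing in batches (`find ... -exec sha256sum {} +`) and joining the result against a file list would be the next optimisation.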
Vasu
(111 rep)
May 27, 2020, 06:49 AM
• Last activity: May 27, 2020, 06:59 AM
0 votes • 2 answers • 1750 views
Compare large directories recursively - but ignoring sub-directories - compare two backups - with gui
I've got two very old backups of a friend's computer. They were simply copied into a folder each on an external hard drive. Both are about 300 GB in size and the contents are very much alike, but not identical, and the folder structure is different. I want to free that space and make one single backup of those two. I think about 90% of the files are duplicates, but I don't want to miss the files that are not.
So what I need is a program that compares the files in two directories with all their subdirectories, but ignoring the subdirectory structure.
All files within Folder A should be compared with all files in Folder B.
All exact duplicates in Folder B should be marked/moved(/deleted). I will handle the remains in Folder B manually.
I've tried meld, and I've tried Gnome Commander (I'm using Xubuntu with XFCE).
I would enjoy a GUI solution but I should be able to handle the terminal and scripts too.
I thought it may be possible to build a file list for both sides and pipe these to some diff program, but how to do it exactly is beyond my capabilities.
Well, looking forward to your answers,
turtle purple
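A hedged script-level sketch (GNU tools, newline-free names, placeholder paths): rather than deleting outright, move every file in Folder B whose exact content also exists somewhere in Folder A into a separate review folder, preserving B's structure.

```
#!/bin/bash
# move exact duplicates out of B into B_dupes for review; unique files stay in B
set -eu
A=/media/backup/A  B=/media/backup/B  OUT=/media/backup/B_dupes   # placeholder paths

find "$A" -type f -exec sha256sum {} + | cut -c1-64 | sort -u > /tmp/a.sums

find "$B" -type f -print0 | while IFS= read -r -d '' f; do
    h=$(sha256sum "$f" | cut -c1-64)
    if grep -qxF "$h" /tmp/a.sums; then
        rel=${f#"$B"/}
        mkdir -p "$OUT/$(dirname "$rel")"
        mv -- "$f" "$OUT/$rel"
    fi
done
```

On the GUI side, duplicate finders such as Czkawka or FSlint can be pointed at both folders and show the matching pairs in a result list.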
Turtlepurple
(103 rep)
Jan 26, 2020, 01:10 PM
• Last activity: May 23, 2020, 10:04 PM
0 votes • 1 answer • 59 views
move files from directory A to directory B, without duplicating any files
I am looking to take files from directory B and copy them into directory A. However, I know that there are some same-named files in each. I would only like to have the "non-matching" file names copied into directory A.
Thanks
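A minimal sketch, assuming GNU coreutils or rsync and that the duplicates are identified by name alone; both variants skip files whose names already exist in A.

```
# copy only files from B whose names do not already exist in A
cp -n B/* A/

# recursive equivalent
rsync -av --ignore-existing B/ A/
```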
stormctr2
(65 rep)
Apr 16, 2020, 03:11 PM
• Last activity: Apr 16, 2020, 03:25 PM
0 votes • 0 answers • 70 views
Create duplicate files on 2 different locations?
It is tough to find information on my idea, as people normally are looking to find and remove duplicates, not the other way around.
I have an application running to control a heating system.
It uses a file based database to store configuration and sensor data.
There is a backup routine implemented in the software which is running on a daily basis.
My question: is it possible to intervene from outside and clone the live output to an external storage? In case of a hardware failure, the loss of data would then be absolutely minimal instead of up to 24 hours.
I was hoping it would be as easy as using symbolic links.
You may wonder if it is necessary and worth the effort. Yes, it is. These heating systems are of gigantic dimensions and need real-time data, e.g. for efficiency, to keep fuel consumption as low as possible. We are talking about several tons of fuel per hour.
Cheers
Jan
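A hedged sketch using inotify-tools and rsync (both assumed to be installable; paths are placeholders): watch the database directory and re-sync it to the external storage whenever something inside it changes, instead of waiting for the daily backup.

```
#!/bin/bash
# mirror the live database directory to external storage on every change
SRC=/var/lib/heating-db  DEST=/mnt/external/heating-db    # placeholder paths

inotifywait -m -r -e close_write,create,delete,move "$SRC" |
while read -r _event; do
    rsync -a --delete "$SRC"/ "$DEST"/       # each change triggers a re-sync
done
```

A burst of writes queues several redundant rsync runs, so some batching (a short sleep while draining events) is the usual refinement; whether the application's database files are internally consistent mid-write is a separate question a file-level copy cannot answer.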
Jan S
(57 rep)
Sep 12, 2019, 07:33 AM
Showing page 1 of 20 total questions