Find duplicate files based on first few characters of filename
I am looking for a way in a Linux shell, preferably bash, to find duplicate files based on the first few letters of their filenames.
**Where this would be useful:**
I build mod packs for Minecraft. As of 1.14.4, Forge no longer errors if a pack contains duplicate mods with different versions. It simply stops the oldest versions from running. A script to help find these duplicates would be very useful.
Example listing:
minecolonies-0.13.312-beta-universal.jar
minecolonies-0.13.386-alpha-universal.jar
By being able to identify the dupes quickly, I can keep the client pack small.
**More information as requested**
There is no specific format. However, as you can see, there are at least 2 prevailing formats. Further, there is no standard in the community about which characters to use or avoid. Some use spaces (ick), some use [] (also ick), some use _'s (more ick), some use -'s (preferred, but what can you do).
https://gist.github.com/be3cc9a77150194476b2000cb8ee16e5 has a sample list of mod filenames. It has been cleaned, so there are no dupes in it.
https://gist.github.com/b0ac1e03145e893e880da45cf08ebd7a contains a sample where I deliberately made duplicates. It is an exaggeration of what happens from time to time.
**Deeper Explanation**
I realize this might be resource-heavy to do.
I would like to specify an arbitrary slice range (start to finish) to sample from every filename, find duplicates based on that slice, and then highlight them. I don't need the script to actually delete them.
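To make the slice idea concrete, here is a minimal bash sketch of the kind of thing I mean. The function name `find_slice_dupes`, the zero-based start offset, and the output layout are all assumptions for illustration, and it assumes filenames contain no tabs or newlines.

```bash
#!/usr/bin/env bash
# find_slice_dupes START LENGTH [DIR]   (hypothetical helper, not an existing tool)
#   START  - zero-based offset of the slice within each filename (assumption)
#   LENGTH - number of characters to compare
#   DIR    - directory to scan (defaults to the current directory)
# Prints each slice shared by two or more files, followed by those files indented.
find_slice_dupes() {
    local start=$1 length=$2 dir=${3:-.}
    local path name key

    for path in "$dir"/*; do
        [ -f "$path" ] || continue          # regular files only
        name=${path##*/}                    # strip the directory part
        key=${name:start:length}            # the slice used for comparison
        [ -n "$key" ] || continue           # name too short for this slice
        printf '%s\t%s\n' "$key" "$name"
    done |
    LC_ALL=C sort |                         # byte order keeps equal keys adjacent
    awk -F '\t' '
        function flush() { if (n > 1) printf "%s\n%s", prev, buf }
        $1 != prev { flush(); prev = $1; n = 0; buf = "" }
        { n++; buf = buf "  " $2 "\n" }
        END { flush() }
    '
}
```

For example, `find_slice_dupes 0 8 mods` would compare the first 8 characters of every file in a `mods` directory and list only the groups with more than one member. Nothing is deleted or renamed.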
**Extra Credit**
The script would present a menu for files that it suspects match the duplication criterion allowing for easy deleting or renaming.
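A rough sketch of the menu part, using bash's built-in `select`. The function name `pick_file` and the "skip this group" entry are hypothetical; the function only reports the chosen filename on stdout and leaves the actual delete or rename to the caller.

```bash
#!/usr/bin/env bash
# pick_file FILE...   (hypothetical helper)
# Shows a numbered menu of the given files plus a "skip" entry.
# Prints the chosen filename on stdout and returns 0; returns 1 on skip or end of input.
# The menu and prompt go to stderr, so stdout carries only the choice.
pick_file() {
    local choice
    local PS3='File to act on (number): '

    select choice in "$@" "skip this group"; do
        if [ -z "$choice" ]; then
            echo "Not a valid number, try again." >&2
        elif [ "$REPLY" -eq $(( $# + 1 )) ]; then
            return 1                        # the extra last entry means "skip"
        else
            printf '%s\n' "$choice"
            return 0
        fi
    done
    return 1                                # input ended without a selection
}
```

A caller could then do something like `f=$(pick_file a.jar b.jar) && rm -i -- "$f"`, so that nothing is removed without a confirmation prompt.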
Asked by Kreezxil
(75 rep)
Oct 29, 2020, 04:43 PM
Last activity: Aug 17, 2022, 09:45 AM