Find duplicate files based on first few characters of filename

2 votes

2 answers

1210 views

I am looking for a way in a Linux shell, preferably bash, to find duplicate files based on the first few letters of their filenames.

**Where this would be useful:** I build mod packs for Minecraft. As of 1.14.4, Forge no longer errors if a pack contains several versions of the same mod. It simply stops the oldest versions from running. A script to help find these duplicates would be very useful. Example listing:

minecolonies-0.13.312-beta-universal.jar   
minecolonies-0.13.386-alpha-universal.jar

By quickly identifying the dupes, I can keep the client pack small.

**More information as requested** There is no specific format. However, as you can see, there are at least 2 prevailing formats. Further, there is no standard in the community about which characters to use or not use. Some use spaces (ick), some use [] (also ick), some use _'s (more ick), some use -'s (preferred, but what can you do).

https://gist.github.com/be3cc9a77150194476b2000cb8ee16e5 contains a sample mods list of the filenames. It has been cleaned, so there are no dupes in it. https://gist.github.com/b0ac1e03145e893e880da45cf08ebd7a contains a sample where I deliberately made duplicates. It is an exaggeration of what happens from time to time.

**Deeper Explanation** I realize this might be resource heavy to do. I would like to arbitrarily specify a slice range (start to finish) to sample from every filename, find duplicates based on that slice, and then highlight the duplicates. I don't need the script to actually delete them.

**Extra Credit** The script would present a menu of the files it suspects match the duplication criterion, allowing for easy deleting or renaming.
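To make the slice idea concrete, here is a minimal bash sketch of the kind of script being asked for. The function name `find_slice_dupes` and its argument order (directory, slice start, slice length) are hypothetical, chosen only for illustration. It assumes filenames contain no tab or newline characters, compares slices case-sensitively, and only prints suspected duplicates; it never deletes or renames anything.

```shell
#!/usr/bin/env bash
# Sketch only: list files in a directory whose names share the same
# character slice. find_slice_dupes is a hypothetical helper name.
# Assumes filenames contain no tabs or newlines.

find_slice_dupes() {
    local dir=${1:-.} start=${2:-0} length=${3:-8}
    local path name key prev_key='' prev_name='' in_group=0

    # Emit "slice<TAB>filename" for every regular file in the directory.
    for path in "$dir"/*; do
        [[ -f $path ]] || continue
        name=${path##*/}              # strip the directory part
        key=${name:start:length}      # the slice used for comparison
        [[ -n $key ]] || continue     # name too short to have this slice
        printf '%s\t%s\n' "$key" "$name"
    done | LC_ALL=C sort | while IFS=$'\t' read -r key name; do
        # After sorting, equal slices are adjacent; print each group once.
        if [[ $key == "$prev_key" ]]; then
            (( in_group )) || printf '%s\n' "$prev_name"
            printf '%s\n' "$name"
            in_group=1
        else
            in_group=0
        fi
        prev_key=$key prev_name=$name
    done
}
```

For example, `find_slice_dupes ./mods 0 8` (where `./mods` is a placeholder path) would compare the first 8 characters of each filename and print every file whose slice matches another file's. A longer slice gives fewer false positives but can miss mods whose names diverge early.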

Asked by Kreezxil (75 rep)

Oct 29, 2020, 04:43 PM
Last activity: Aug 17, 2022, 09:45 AM
