
How to programmatically deduplicate files into hard links while maintaining the time stamps of the containing directories?

Continuing https://unix.stackexchange.com/a/22822 , how can files, given as a list, be deduplicated into hard links while maintaining the time stamps of their directories? Unfortunately, hardlink changes the time stamps of the containing directories:
$ mkdir d1
$ mkdir d2
$ mkdir d3
$ echo "content" > d1/f1
$ echo "content" > d2/f2
$ echo "content" > d3/f3
$ ls -la --full-time d1 d2 d3
d1:
total 4
drwxr-xr-x  2 username username  60 2025-04-23 17:26:18.624828807 +0200 .
drwxrwxrwt 29 root     root     820 2025-04-23 17:26:07.397001442 +0200 ..
-rw-r--r--  1 username username   8 2025-04-23 17:26:18.624828807 +0200 f1

d2:
total 4
drwxr-xr-x  2 username username  60 2025-04-23 17:26:26.016715230 +0200 .
drwxrwxrwt 29 root     root     820 2025-04-23 17:26:07.397001442 +0200 ..
-rw-r--r--  1 username username   8 2025-04-23 17:26:26.016715230 +0200 f2

d3:
total 4
drwxr-xr-x  2 username username  60 2025-04-23 17:26:29.296664852 +0200 .
drwxrwxrwt 29 root     root     820 2025-04-23 17:26:07.397001442 +0200 ..
-rw-r--r--  1 username username   8 2025-04-23 17:26:29.296664852 +0200 f3
$ hardlink -v -c -M -O -y memcmp d1/f1 d2/f2 d3/f3
Linking /tmp/d1/f1 to /tmp/d2/f2 (-8 B)
Linking /tmp/d1/f1 to /tmp/d3/f3 (-8 B)
Mode:                     real
Method:                   memcmp
Files:                    3
Linked:                   2 files
Compared:                 0 xattrs
Compared:                 2 files
Saved:                    16 B
Duration:                 0.000165 seconds
$ ls -la --full-time d1 d2 d3
d1:
total 4
drwxr-xr-x  2 username username  60 2025-04-23 17:26:18.624828807 +0200 .
drwxrwxrwt 29 root     root     820 2025-04-23 17:27:19.631893228 +0200 ..
-rw-r--r--  3 username username   8 2025-04-23 17:26:18.624828807 +0200 f1

d2:
total 4
drwxr-xr-x  2 username username  60 2025-04-23 17:28:45.922576280 +0200 .
drwxrwxrwt 29 root     root     820 2025-04-23 17:27:19.631893228 +0200 ..
-rw-r--r--  3 username username   8 2025-04-23 17:26:18.624828807 +0200 f2

d3:
total 4
drwxr-xr-x  2 username username  60 2025-04-23 17:28:45.922576280 +0200 .
drwxrwxrwt 29 root     root     820 2025-04-23 17:27:19.631893228 +0200 ..
-rw-r--r--  3 username username   8 2025-04-23 17:26:18.624828807 +0200 f3
As we see, two files have been replaced with hard links, which is good. However, the time stamps of d2 and d3 have been updated. That's NOT what we want. Ideally, we'd like to have a command that gets a list of files from
find /media/my_NTFS_drive -type f -size "$(stat -c %s -- original_file)c" -exec cmp -s original_file {} \; -exec ls -t {} + 2>/dev/null
and converts them into hard links to original_file. Since all hard links to a file share one inode and therefore one set of time stamps, those time stamps should be set to the oldest among the time stamps of original_file and its copies. The time stamps of the directories containing original_file and its copies have to be retained. Clearly, all this has to be automated. (No doubt it can be done with manual inspection and touch. From a user's viewpoint, it could be done with just another switch to hardlink. As the task seems rather standard, my hope is that in the past decades someone has already written a standalone program, perhaps even a shell script.)
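
The requirements above could be sketched in bash roughly as follows. This is a minimal sketch under stated assumptions, not a finished tool: it assumes bash and GNU coreutils, and that original_file and all copies sit on one filesystem (hard links cannot cross filesystems). The function name relink_keep_dir_times and the .relink.$$ temporary suffix are made up for illustration. Each directory's access and modification times are saved with touch -r before relinking and restored afterwards. The change time (ctime) cannot be set this way and will still be updated.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash and GNU coreutils, all files on one filesystem.
# relink_keep_dir_times is a hypothetical helper name.

# Usage: relink_keep_dir_times ORIGINAL COPY...
relink_keep_dir_times() {
    local original=$1 copy dir stamp oldest tmp
    shift
    stamp=$(mktemp) || return 1     # scratch file holding a directory's times
    oldest=$(mktemp) || return 1    # scratch file holding the oldest file times
    touch -r "$original" -- "$oldest"

    for copy in "$@"; do
        # Skip names that already share the inode, and anything not identical.
        if [ "$original" -ef "$copy" ]; then continue; fi
        if ! cmp -s -- "$original" "$copy"; then
            printf 'skipping %s: differs from %s\n' "$copy" "$original" >&2
            continue
        fi
        # Remember the older times before the copy's inode is replaced.
        if [ "$copy" -ot "$oldest" ]; then touch -r "$copy" -- "$oldest"; fi

        dir=$(dirname -- "$copy")
        tmp=$copy.relink.$$
        touch -r "$dir" -- "$stamp"         # save the directory's atime/mtime
        # Link under a temporary name, then rename over the copy, so the
        # copy is never missing if the link step fails.
        if ln -- "$original" "$tmp"; then
            mv -f -- "$tmp" "$copy"
        fi
        touch -r "$stamp" -- "$dir"         # restore the directory's atime/mtime
    done

    # All linked names share one inode; give it the oldest times found.
    touch -r "$oldest" -- "$original"
    rm -f -- "$stamp" "$oldest"
}
```

With the example above, the call would be relink_keep_dir_times d1/f1 d2/f2 d3/f3. For the find-based list, the matching paths could be collected NUL-separated (find ... -print0) into a bash array and passed as the COPY arguments. Because the function re-checks every copy with cmp, a path that slipped into the list by mistake is skipped rather than relinked.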
Asked by AlMa1r (1 rep)
Apr 23, 2025, 03:38 PM
Last activity: Apr 23, 2025, 06:49 PM