How to programmatically deduplicate files into hard links while maintaining the time stamps of the containing directories?
Following up on https://unix.stackexchange.com/a/22822, how can I deduplicate files, given as a list, into hard links while maintaining the time stamps of their directories? Unfortunately, hardlink changes the modification times of the directories that contain the replaced files:
$ mkdir d1
$ mkdir d2
$ mkdir d3
$ echo "content" > d1/f1
$ echo "content" > d2/f2
$ echo "content" > d3/f3
$ ls -la --full-time d1 d2 d3
d1:
total 4
drwxr-xr-x 2 username username 60 2025-04-23 17:26:18.624828807 +0200 .
drwxrwxrwt 29 root root 820 2025-04-23 17:26:07.397001442 +0200 ..
-rw-r--r-- 1 username username 8 2025-04-23 17:26:18.624828807 +0200 f1

d2:
total 4
drwxr-xr-x 2 username username 60 2025-04-23 17:26:26.016715230 +0200 .
drwxrwxrwt 29 root root 820 2025-04-23 17:26:07.397001442 +0200 ..
-rw-r--r-- 1 username username 8 2025-04-23 17:26:26.016715230 +0200 f2

d3:
total 4
drwxr-xr-x 2 username username 60 2025-04-23 17:26:29.296664852 +0200 .
drwxrwxrwt 29 root root 820 2025-04-23 17:26:07.397001442 +0200 ..
-rw-r--r-- 1 username username 8 2025-04-23 17:26:29.296664852 +0200 f3
$ hardlink -v -c -M -O -y memcmp d1/f1 d2/f2 d3/f3
Linking /tmp/d1/f1 to /tmp/d2/f2 (-8 B)
Linking /tmp/d1/f1 to /tmp/d3/f3 (-8 B)
Mode: real
Method: memcmp
Files: 3
Linked: 2 files
Compared: 0 xattrs
Compared: 2 files
Saved: 16 B
Duration: 0.000165 seconds
$ ls -la --full-time d1 d2 d3
d1:
total 4
drwxr-xr-x 2 username username 60 2025-04-23 17:26:18.624828807 +0200 .
drwxrwxrwt 29 root root 820 2025-04-23 17:27:19.631893228 +0200 ..
-rw-r--r-- 3 username username 8 2025-04-23 17:26:18.624828807 +0200 f1

d2:
total 4
drwxr-xr-x 2 username username 60 2025-04-23 17:28:45.922576280 +0200 .
drwxrwxrwt 29 root root 820 2025-04-23 17:27:19.631893228 +0200 ..
-rw-r--r-- 3 username username 8 2025-04-23 17:26:18.624828807 +0200 f2

d3:
total 4
drwxr-xr-x 2 username username 60 2025-04-23 17:28:45.922576280 +0200 .
drwxrwxrwt 29 root root 820 2025-04-23 17:27:19.631893228 +0200 ..
-rw-r--r-- 3 username username 8 2025-04-23 17:26:18.624828807 +0200 f3
As we can see, d2/f2 and d3/f3 have been replaced with hard links to d1/f1 (all three names now show a link count of 3), which is good.
However, the modification times of d2 and d3 have been updated (both now read 17:28:45). That's NOT what we want.
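One way to undo this side effect, sketched here under the assumption of GNU coreutils and bash, is to copy a directory's atime and mtime onto a temporary reference file with touch -r before the link is made, and copy them back afterwards. The helper name with_dir_times is hypothetical. The directory's ctime is still updated, because it cannot be set from user space.

```shell
#!/usr/bin/env bash
# Sketch, assuming GNU coreutils: run a command that modifies entries of
# DIR, then put DIR's atime and mtime back. DIR's ctime still changes.
set -euo pipefail

# with_dir_times DIR CMD [ARGS...]   (hypothetical helper name)
with_dir_times() {
    local dir=$1; shift
    local ref rc=0
    ref=$(mktemp)                  # reference file in $TMPDIR (assumed to differ from DIR)
    touch -r "$dir" -- "$ref"      # copy DIR's atime/mtime onto the reference file
    "$@" || rc=$?                  # run the command, remember its exit status
    touch -r "$ref" -- "$dir"      # copy the saved times back onto DIR
    rm -f -- "$ref"
    return "$rc"
}
```

For example, with_dir_times d2 ln -f d1/f1 d2/f2 would replace d2/f2 with a hard link to d1/f1 and leave the mtime of d2 as it was before.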
Ideally, we'd like a command that takes the list of files produced by
find /media/my_NTFS_drive -type f -size $(ls -la -- original_file | cut -d' ' -f5)c -exec cmp -s original_file {} \; -exec ls -t {} + 2>/dev/null
and converts them into hard links to original_file. Since hard links share one inode, and therefore one set of file time stamps, those should be set to the oldest among the time stamps of original_file and its copies. The time stamps of the directories containing original_file and its copies must be retained. Clearly, all this has to be automated. (No doubt it can be done by manual inspection and touch. From a user's viewpoint, it could be just one more switch to hardlink. As the task seems rather standard, I hope that someone has already written a standalone program for it in the past decades, perhaps even a shell script.)
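The behaviour described above could be sketched roughly as follows. This assumes bash, GNU coreutils, that all files are on one file system, and that every file passed in already has the same content as the original (as the cmp test in the find command ensures). The function name dedup_keep_dir_times is hypothetical. It first copies the times of the file with the oldest mtime (found with bash's -ot test) onto a reference file, then replaces each copy with ln -f while saving and restoring its directory's times via touch -r, and finally stamps the shared inode with the saved oldest times.

```shell
#!/usr/bin/env bash
# Sketch, assuming GNU coreutils, bash, one file system, and that every FILE
# already has the same content as ORIGINAL (e.g. checked with cmp).
set -euo pipefail

# dedup_keep_dir_times ORIGINAL FILE...   (hypothetical helper name)
# Replaces each FILE with a hard link to ORIGINAL, keeps the atime/mtime of
# every directory involved, and gives the shared inode the times of the
# file with the oldest mtime among ORIGINAL and the FILEs.
dedup_keep_dir_times() {
    local orig=$1; shift
    local work f dir
    work=$(mktemp -d)   # scratch dir, assumed not to be one of the FILEs' dirs

    # 1. Copy the times of the file with the oldest mtime onto a reference file.
    touch -r "$orig" -- "$work/oldest"
    for f in "$@"; do
        if [ "$f" -ot "$work/oldest" ]; then
            touch -r "$f" -- "$work/oldest"
        fi
    done

    # 2. Link each FILE to ORIGINAL, saving and restoring its directory's times.
    for f in "$@"; do
        if [ "$f" -ef "$orig" ]; then
            continue                        # already the same inode
        fi
        dir=$(dirname -- "$f")
        touch -r "$dir" -- "$work/dir"      # save the directory's atime/mtime
        ln -f -- "$orig" "$f"               # replace FILE with a hard link
        touch -r "$work/dir" -- "$dir"      # restore the directory's atime/mtime
    done

    # 3. All names now share one inode: give it the oldest times.
    #    The kernel still updates ctimes; those cannot be set from user space.
    touch -r "$work/oldest" -- "$orig"
    rm -rf -- "$work"
}
```

A shell function cannot be called from find -exec, so one possible way to feed it (bash 4.4 or later, NUL-separated names) is:
mapfile -d '' files < <(find /media/my_NTFS_drive -type f -size "$(stat -c %s -- original_file)c" -exec cmp -s original_file {} \; -print0)
dedup_keep_dir_times original_file "${files[@]}"
If the list contains original_file itself, the -ef check simply skips it.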
Asked by AlMa1r
Apr 23, 2025, 03:38 PM
Last activity: Apr 23, 2025, 06:49 PM