
How to get `rsync --link-dest=` hard-link moved/renamed files

I am trying to set up rsnapshot for backing up a remote server. However, I realize that my issue is with rsync (rsnapshot's back-end), not with rsnapshot itself. Thus I am focusing the question on rsync.

My goal is to make periodic snapshots of a remote server, while using hard links to avoid duplicating unchanged files on disk. The process is very nicely explained in the blog post "Easy Automated Snapshot-Style Backups with Linux and Rsync".

As long as the original files are not renamed or moved around, everything works as expected:
# Build the original hierarchy we want to track.
mkdir original
echo "Hello, World!" > original/file1

# Make two snapshots.
rsync -a original/ snapshot1/
rsync -a --link-dest=../snapshot1 original/ snapshot2/

# Check the inode numbers.
ls -i1 snapshot{1,2}/*
The last command above shows that the two snapshot copies of file1 share the same inode number. So far so good.

The problem is that at some point I may need to rename or reorganize the files within the original hierarchy. Then the hard-linking fails:
# Rename a file.
mv original/file1 original/renamed1

# Make a third snapshot.
rsync -a --link-dest=../snapshot2 --fuzzy --fuzzy original/ snapshot3/

# Check the inode numbers.
ls -i1 snapshot{1,2,3}/*
The test above shows snapshot3/renamed1 has a different inode number: it is a fresh copy. I expected the repeated --fuzzy option to hard-link the file despite its changed name, as per the manual:
--fuzzy, -y
        This option tells rsync that it should look for a basis file for
        any  destination  file  that  is missing.  The current algorithm
        looks in the same directory as the destination file for either a
        file  that  has  an  identical  size  and  modified-time,  or  a
        similarly-named file.  If found, rsync uses the fuzzy basis file
        to try to speed up the transfer.

        If  the  option is repeated, the fuzzy scan will also be done in
        any  matching  alternate  destination   directories   that   are
        specified via --compare-dest, --copy-dest, or --link-dest.
Note: as per this answer, I tried replacing, in the rsync command line, original/ by localhost:$PWD/original/. It made no difference.

Why does rsync fail to hard-link here? Is there a way to convince it to do it? If not, any suggested workaround?

------------------------------------------------------------------

**Edit**: As suggested by @meuh, I tried adding the option --debug=FUZZY2. It printed the messages:
fuzzy size/modtime match for ../snapshot2/file1
fuzzy basis selected for renamed1: ../snapshot2/file1
I then tried syncing a larger file (a ~15 MB copy of the Linux kernel) through ssh (rsync source = localhost:$PWD/original/) with the options -vvv --debug=FUZZY2. This gave the same messages as above, followed by many more messages asserting hash matches and, at the end:
total: matches=3868  hash_hits=3868  false_alarms=0 data=0
Here is the (almost) complete rsync debug output:
opening connection using: ssh localhost rsync --server --sender -vvvlogDtpre.iLsfxCIvu . /home/edgar/tmp/rsnapshot/original/  (8 args)
receiving incremental file list
server_sender starting pid=42683
[sender] make_file(.,*,0)
[sender] pushing local filters for /home/edgar/tmp/rsnapshot/original/
[sender] make_file(renamed1,*,2)
send_file_list done
send_files starting
recv_file_name(.)
recv_file_name(renamed1)
received 2 names
recv_file_list done
get_local_name count=2 snapshot3/
created directory snapshot3
generator starting pid=42643
delta-transmission enabled
recv_generator(.,0)
./ is uptodate
recv_files(2) starting
set modtime, atime of . to (1743753557) 2025/04/04 09:59:17, (1743753866) 2025/04/04 10:04:26
recv_generator(.,1)
recv_generator(renamed1,2)
[generator] make_file(../snapshot2/file1,*,1)
fuzzy size/modtime match for ../snapshot2/file1
fuzzy basis selected for renamed1: ../snapshot2/file1
generating and sending sums for 2
count=3868 rem=2560 blength=3864 s2length=2 flength=14944648
generate_files phase=1
send_files(2, /home/edgar/tmp/rsnapshot/original/renamed1)
send_files mapped /home/edgar/tmp/rsnapshot/original/renamed1 of size 14944648
calling match_sums /home/edgar/tmp/rsnapshot/original/renamed1
built hash table
hash search b=3864 len=14944648
match at 0 last_match=0 j=0 len=3864 n=0
match at 3864 last_match=3864 j=1 len=3864 n=0
match at 7728 last_match=7728 j=2 len=3864 n=0
[...snip many lines, identical but for the numbers...]
match at 14934360 last_match=14934360 j=3865 len=3864 n=0
match at 14938224 last_match=14938224 j=3866 len=3864 n=0
match at 14942088 last_match=14942088 j=3867 len=2560 n=0
done hash search
sending file_sum
false_alarms=0 hash_hits=3868 matches=3868
sender finished /home/edgar/tmp/rsnapshot/original/renamed1
recv_files(renamed1)
renamed1
recv mapped ../snapshot2/file1 of size 14944648
got file_sum
set modtime, atime of .renamed1.l4pR7E to (1743753557) 2025/04/04 09:59:17, (1743753866) 2025/04/04 10:04:26
renaming .renamed1.l4pR7E to renamed1
set modtime, atime of . to (1743753557) 2025/04/04 09:59:17, (1743753866) 2025/04/04 10:04:26
send_files phase=1
recv_files phase=1
generate_files phase=2
send_files phase=2
send files finished
total: matches=3868  hash_hits=3868  false_alarms=0 data=0
recv_files phase=2
recv_files finished
generate_files phase=3
generate_files finished

sent 23,258 bytes  received 249,330 bytes  545,176.00 bytes/sec
total size is 14,944,648  speedup is 54.83
client_run2 waiting on 42644
[generator] _exit_cleanup(code=0, file=main.c, line=1865): about to call exit(0)
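The figures in the log are consistent with every block of the file having been matched. As a quick sanity check, assuming that `count` is the number of checksum blocks, `blength` the block length, and `rem` the length of the final short block (this is an interpretation of the log fields, not something the output states), the matched blocks add up to the full file length:

```shell
# Values copied from the "count=... rem=... blength=... flength=..." log line.
count=3868 rem=2560 blength=3864 flength=14944648

# Assumption: all blocks but the last are full-size; the last holds the remainder.
covered=$(( (count - 1) * blength + rem ))
echo "bytes covered by matched blocks: $covered"

if [ "$covered" -eq "$flength" ]; then
    echo "matched blocks cover the whole file"
fi
```

Together with `data=0` in the final totals line, this suggests that no literal file data had to be sent.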
It looks to me like rsync did notice that original/renamed1 was identical to snapshot2/file1, and used this fact to speed up the transfer. But it is still unclear to me why it chose to copy snapshot2/file1 (maybe chunk by chunk) instead of hard-linking it.
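For anyone reproducing this, the inode comparison done by eye with `ls -i1` can also be scripted. The following is a minimal sketch, assuming bash; the function name `same_inode` is made up for illustration and is not part of rsync or rsnapshot:

```shell
# Hypothetical helper (not part of rsync or rsnapshot): succeeds when both
# paths exist and refer to the same inode on the same device.
same_inode() {
    [ "$1" -ef "$2" ]
}

# Example use with the paths from the tests above.
if same_inode snapshot2/file1 snapshot3/renamed1; then
    echo "hard-linked"
else
    echo "separate copies"
fi
```

The `-ef` test compares device and inode numbers, so it also fails if either path is missing.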
Asked by Edgar Bonet (221 rep)
Apr 3, 2025, 02:20 PM
Last activity: Apr 4, 2025, 08:25 AM