I use `rsnapshot` to take regular, automated snapshots of the home directory on my desktop (which runs Ubuntu on `sda`) and save them to a spare internal hard drive (`sdb`). Occasionally, I manually copy (via `rsync`) the contents of `sdb` to an external USB SSD (call it `sdc`). `sdc` also contains older manual backups of my files that predate my adoption of `rsnapshot`, so there are many files on `sdc` that are duplicates of the incoming `rsnapshot` files.
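For context, the snapshot side of this setup amounts to something like the following `rsnapshot.conf` excerpt (paths and retention values are assumptions for illustration; rsnapshot requires tabs, not spaces, between fields):

```
# Hypothetical /etc/rsnapshot.conf excerpt (fields must be tab-separated)
snapshot_root	/mnt/sdb/snapshots/
retain	daily	7
retain	weekly	4
backup	/home/	localhost/
```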
I recently discovered the `rdfind` tool (with the `-makehardlinks` option), which lets me greatly reduce the disk usage on `sdc` by running `rdfind` after I manually `rsync` snapshots from `sdb` to `sdc`. However, this entails redundant I/O, because I first `rsync` the files over from `sdb` (writing about 250 GB) and *then* run `rdfind` (freeing nearly the same 250 GB).
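For reference, the current two-step workflow looks roughly like this (the mount points are assumptions; `-H` preserves the hard links that `rsnapshot` snapshots rely on):

```bash
# Step 1: copy snapshots from sdb to sdc, preserving hard links (-H)
# and attributes (-a). Mount points are placeholders for illustration.
rsync -aH --info=progress2 /mnt/sdb/snapshots/ /mnt/sdc/snapshots/

# Step 2: deduplicate on sdc by replacing identical files with hard links.
rdfind -makehardlinks true /mnt/sdc/snapshots/
```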
In principle, it should be possible to run something like `rdfind` *before* `rsync` to check hashes and determine which files from `sdb` need to be written out and which can be hard links, but how?
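Conceptually, the behavior I am after would look something like the sketch below. This is purely illustrative, not a working solution: the paths, the hash choice, and the lack of collision and metadata handling are all assumptions.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: index existing files on sdc by content hash,
# then walk sdb, hard-linking duplicates and copying only new content.
set -euo pipefail

src=/mnt/sdb/snapshots   # assumed mount point of the internal drive
dst=/mnt/sdc/snapshots   # assumed mount point of the external SSD

# Build a hash -> path index of what already exists on the destination.
declare -A by_hash
while read -r hash path; do
    by_hash[$hash]=$path
done < <(find "$dst" -type f -exec sha256sum {} +)

# Walk the source tree; link when a content match exists, copy otherwise.
while IFS= read -r -d '' file; do
    rel=${file#"$src"/}
    hash=$(sha256sum "$file" | awk '{print $1}')
    mkdir -p "$dst/$(dirname "$rel")"
    if [[ -e "$dst/$rel" ]]; then
        continue                               # already present at this path
    elif [[ -n ${by_hash[$hash]:-} ]]; then
        ln "${by_hash[$hash]}" "$dst/$rel"     # duplicate: hard link, no data written
    else
        cp --preserve=all "$file" "$dst/$rel"  # new content: copy the bytes
        by_hash[$hash]="$dst/$rel"
    fi
done < <(find "$src" -type f -print0)
```

A real tool would presumably also preserve the hard-link structure within the source snapshots, which this sketch ignores.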
- I am seeking generic solutions for the Linux ecosystem, but distro- or filesystem-specific answers are also welcome.
- My desktop runs Ubuntu 22.04 and both `sdb` and `sdc` use BTRFS.
- My question is distinct from [this one](https://unix.stackexchange.com/questions/186004/deduplication-tool-for-rsync) because it concerns deduplication *between* the origin and destination, not just at the origin.
- I am aware of the implications of hardlinking files.
Asked by Max
(203 rep)
Feb 25, 2024, 09:43 PM
Last activity: Mar 1, 2024, 09:12 PM