I use `rsnapshot` to take regular, automated snapshots of the home directory on my desktop (which runs Ubuntu on `sda`) and save them to a spare internal hard drive (`sdb`). Occasionally, I manually copy (via `rsync`) the contents of `sdb` to an external USB SSD (call it `sdc`). `sdc` also contains older manual backups of my files that predate my adoption of `rsnapshot`, so there are many files on `sdc` that are duplicates of the incoming `rsnapshot` files.
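For context, the snapshot side of this setup amounts to something like the following `rsnapshot.conf` excerpt (paths and retention values are assumptions for illustration; rsnapshot requires tabs, not spaces, between fields):

```
# Hypothetical /etc/rsnapshot.conf excerpt (fields must be tab-separated)
snapshot_root	/mnt/sdb/snapshots/
retain	daily	7
retain	weekly	4
backup	/home/	localhost/
```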
I recently discovered the `rdfind` tool (with the `-makehardlinks` option), which lets me greatly reduce the disk usage on `sdc` by running `rdfind` after I manually `rsync` snapshots from `sdb` to `sdc`. However, this entails redundant I/O, because I first `rsync` the files over from `sdb` (writing about 250 GB) and *then* run `rdfind` (freeing nearly the same 250 GB).
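For reference, the current two-step workflow looks roughly like this (the mount points are assumptions; `-H` preserves the hard links that `rsnapshot` snapshots rely on):

```bash
# Step 1: copy snapshots from sdb to sdc, preserving hard links (-H)
# and attributes (-a). Mount points are placeholders for illustration.
rsync -aH --info=progress2 /mnt/sdb/snapshots/ /mnt/sdc/snapshots/

# Step 2: deduplicate on sdc by replacing identical files with hard links.
rdfind -makehardlinks true /mnt/sdc/snapshots/
```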
In principle, it should be possible to run something like `rdfind` *before* `rsync` to check hashes and determine which files from `sdb` need to be written out and which can be hard links, but how?
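Conceptually, the behavior I am after would look something like the sketch below. This is purely illustrative, not a working solution: the paths, the hash choice, and the lack of collision and metadata handling are all assumptions.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: index existing files on sdc by content hash,
# then walk sdb, hard-linking duplicates and copying only new content.
set -euo pipefail

src=/mnt/sdb/snapshots   # assumed mount point of the internal drive
dst=/mnt/sdc/snapshots   # assumed mount point of the external SSD

# Build a hash -> path index of what already exists on the destination.
declare -A by_hash
while read -r hash path; do
    by_hash[$hash]=$path
done < <(find "$dst" -type f -exec sha256sum {} +)

# Walk the source tree; link when a content match exists, copy otherwise.
while IFS= read -r -d '' file; do
    rel=${file#"$src"/}
    hash=$(sha256sum "$file" | awk '{print $1}')
    mkdir -p "$dst/$(dirname "$rel")"
    if [[ -e "$dst/$rel" ]]; then
        continue                               # already present at this path
    elif [[ -n ${by_hash[$hash]:-} ]]; then
        ln "${by_hash[$hash]}" "$dst/$rel"     # duplicate: hard link, no data written
    else
        cp --preserve=all "$file" "$dst/$rel"  # new content: copy the bytes
        by_hash[$hash]="$dst/$rel"
    fi
done < <(find "$src" -type f -print0)
```

A real tool would presumably also preserve the hard-link structure within the source snapshots, which this sketch ignores.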
- I am seeking generic solutions for the Linux ecosystem, but distro- or filesystem-specific answers are also welcome.
- My desktop runs Ubuntu 22.04 and both `sdb` and `sdc` use BTRFS.
- My question is distinct from [this one](https://unix.stackexchange.com/questions/186004/deduplication-tool-for-rsync) because it concerns deduplication *between* the origin and destination, not just at the origin.
- I am aware of the implications of hardlinking files.
Asked by Max
(203 rep)
Feb 25, 2024, 09:43 PM
Last activity: Mar 1, 2024, 09:12 PM