I have a btrfs volume, which I create regular snapshots of. The snapshots are rotated, the oldest being one year old. As a consequence, deleting large files may not actually free up the space for a year after the deletion.
About a year ago I copied the partition to a bigger drive but still kept the old one around.
Now the new drive has become corrupted, so that the only way to get the data out is `btrfs-restore`. As far as I know, the data on the new drive should still fit on the old, smaller drive, and files do not really change much (at most, some new ones get added or some deleted, but the overhead from a year’s worth of snapshots should not be large). So I decided to restore the data onto the old drive.
However, the restored data filled up the old drive much more quickly than I expected. I suspect this has to do with the implementation of btrfs:
* Create a large file.
* Create a snapshot of the volume. Space usage will not change because both files (the original one and the one in the snapshot) refer to the same extent on the disk for their payload. Modifying one of the two files would, however, increase space usage due to the copy-on-write nature of btrfs.
* Overwrite the large file with identical content. I suspect space usage increases by the size of the file because btrfs does not realize the content is unchanged: it allocates new blocks for the rewritten file and fills them with the same data, leaving two identical copies in two separate sets of extents.
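The scenario above can be reproduced with a sketch like the following. The mount point `/mnt/test` and the file sizes are hypothetical, and the commands need root and a scratch btrfs filesystem, so treat this as an illustration rather than something to run on real data:

```shell
# Create a large file on the btrfs volume (1 GiB of random data).
dd if=/dev/urandom of=/mnt/test/big.bin bs=1M count=1024

# Snapshot the subvolume; space usage stays flat because the
# snapshot shares the file's extents with the original.
btrfs subvolume snapshot /mnt/test /mnt/test/snap1

# Overwrite the file with byte-identical content; btrfs does not
# compare the incoming writes against the existing blocks, so it
# allocates fresh extents for the rewritten file.
cp /mnt/test/snap1/big.bin /tmp/big.bin
cp /tmp/big.bin /mnt/test/big.bin

# Expect roughly 2 GiB of usage now instead of 1 GiB.
btrfs filesystem du -s /mnt/test
```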
Does btrfs offer a mechanism to revert this by finding files which are “genetically related” (i.e. descended from the same file by copying it and/or snapshotting the subvolume on which it resides), identical in content but stored in separate sets of blocks, and turning them back into reflinks so space can be freed up?
Asked by user149408
(1515 rep)
Jul 26, 2022, 05:23 PM
Last activity: Feb 3, 2025, 01:52 PM