(Almost-)Atomic way to merge 2 folders

1 vote
1 answer
288 views
I'm exploring ways to add some consistency to a file deployment operation. The current situation:

*A current version folder which contains approximately 100K different files:*

```
/current
    /path1
        file1
        file2
        ...
    /path2
        fileX
        ...
```

*An update folder which contains around 100 files:*

```
/update
    /path1
        file1
    /path2
        fileX
        ...
```

The goal is to deploy all the files from the update folder into the current folder, and I insist on the "all": either none of the files are replicated if an error occurs during the operation, or all of them are deployed and the operation is flagged as successful.

In an ideal world, I'm looking for an "atomic" rsync that returns a failure or success exit code depending on what happened, and guarantees that the system sees the current directory as the new version the instant the operation completes (no intermediate state visible during the copy, even in the event of a power cut or similar). From my understanding, atomic directory-merge operations are not available on most UNIX systems, so the ideal case is clearly out of reach; I'm trying to approximate it as closely as possible.

I've explored several solutions:

- `cp -al` to mirror the current directory into a tmp directory, then copy all the files from the update directory into it, then remove current and rename tmp to current (sketched below).
- rsync (the most pertinent so far) with the `--link-dest` option, to create an intermediate folder of hard links to the current directory's files. Basically the same as the previous case, but probably much cleaner since it doesn't require any `cp` (also sketched below).
- atomic-rsync, an existing Perl script which supposedly does this kind of operation, but which ends up keeping only the files present in the update directory and discarding the rest of the current folder.

The first two solutions seem to work, but I have no confidence using either in a real production case: creating 100K hard links might be very slow or otherwise costly.

I also know that snapshots would be a very consistent solution and there are plenty of options for that, but it's not acceptable in my case due to disk size (the disk is ~70GB and the current folder already takes ~60GB).

I've run out of options; is there a (better) way to achieve this goal?
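For reference, a minimal sketch of the `cp -al` variant as I picture it (the `/srv/*` paths are placeholders, not my real layout):

```sh
#!/bin/sh
set -e  # abort on the first failing command

cur=/srv/current          # placeholder paths
upd=/srv/update
tmp=/srv/current.tmp.$$

# Hard-link mirror of the current tree: fast, no file data is copied.
cp -al "$cur" "$tmp"

# Overlay the update. --remove-destination makes cp unlink the staged
# hard link and create a fresh file, rather than writing through the
# shared inode and silently modifying the original under /current.
cp -a --remove-destination "$upd"/. "$tmp"/

# The swap. Each rename(2) is atomic on its own, but the pair is not:
# a crash between the two mv calls leaves no /srv/current at all.
mv "$cur" "$cur.old"
mv "$tmp" "$cur"
rm -rf "$cur.old"
```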
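And the `--link-dest` variant (same placeholder paths; I pass `--link-dest` an absolute path, since a relative one would be resolved against the destination directory):

```sh
#!/bin/sh
set -e

cur=/srv/current          # placeholder paths
upd=/srv/update
tmp=/srv/current.tmp.$$

# Stage: every file identical to its counterpart under --link-dest is
# hard-linked rather than copied, so this pass is metadata-only.
rsync -a --link-dest="$cur" "$cur"/ "$tmp"/

# Overlay the update. By default rsync writes each file to a temporary
# name and renames it into place, which breaks the staged hard link
# instead of writing through it (so --inplace must NOT be used here).
rsync -a "$upd"/ "$tmp"/

# Same non-atomic two-step swap as above.
mv "$cur" "$cur.old"
mv "$tmp" "$cur"
rm -rf "$cur.old"
```

In both cases the exposure window shrinks to the two final renames, but it doesn't disappear, and that's exactly the gap I'd like to close.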
Asked by Bil5 (113 rep)
Jun 20, 2021, 10:27 AM
Last activity: Jun 21, 2021, 04:33 AM