I'm exploring the different ways to add some consistency to a file deployment operation.
The current situation is:

*A `current` version folder which contains approximately 100K different files*
- /current
  - /path1
    - file1
    - file2
    - ...
  - /path2
    - fileX
    - ...
*An `update` folder which contains around 100 files*
- /update
  - /path1
    - file1
  - /path2
    - fileX
  - ...
The final goal is to send all the files from the `update` folder to the `current` folder, and I insist on the "all": either none of the files should be replicated if there is an error during the operation, or all of them should be deployed in order to flag the operation as successful.
In an ideal world, the scenario I'm looking for would be an "atomic" `rsync` which would return either a failure or a success exit code depending on what happened during the operation, and would ensure that the system sees the original `current` directory instantly (after the `rsync` operation) as the newer version, with no intermediate state exposed during the copy by a potential power cut or whatsoever.
From my understanding, such atomic multi-file operations are not available on most UNIX systems, so I can consider that the ideal case will clearly not be reached. I'm trying to approximate this behavior as closely as possible.
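To make the target concrete, this is the exit-status contract I would ideally want, written with a hypothetical `atomic-deploy` command (the name is made up here purely to illustrate the desired all-or-nothing behavior):

```bash
# "atomic-deploy" does not exist; it only names the behavior I am after
if atomic-deploy /update/ /current/; then
    echo "all update files are now visible in /current"
else
    echo "deployment failed, /current was left untouched" >&2
    exit 1
fi
```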
I explored different solutions for this:
- `cp -al` to mirror the `current` directory into a `tmp` directory, then copy all the files from the `update` directory into it, then remove `current` and rename `tmp` to `current`.
- `rsync` (so far the most pertinent) using the `--link-dest` option to create an intermediate folder made of hard links to the `current` directory's files. Basically the same as the previous case, but probably much cleaner as it doesn't require any `cp` (see the sketch after this list).
- `atomic-rsync`: I came across an existing Perl script, `atomic-rsync`, which supposedly does this kind of operation, but it ends up taking into account only the files present in the `update` directory and discarding the `current` folder's files that are not part of the update (the delta).
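A minimal sketch of the hard-link-and-swap idea behind the first two options (assuming `/current` and `/update` sit on the same filesystem, which both hard links and `rename()` require; the staging path name is just illustrative):

```bash
#!/bin/sh
set -e  # abort on the first error so a half-built staging tree is never promoted

staging=/current.tmp.$$

# 1. Mirror the current tree as ~100K hard links
#    (cp -al /current "$staging" would do the same job)
rsync -a --link-dest=/current /current/ "$staging"/

# 2. Overlay the ~100 update files; if this step fails, /current is still untouched
rsync -a /update/ "$staging"/

# 3. Promote the staging tree; note that the swap is two rename()s, not one atomic step
mv /current /current.old
mv "$staging" /current
rm -rf /current.old
```

The remaining weak spot is step 3: a crash between the two `mv` calls leaves no `/current` at all, which is exactly the kind of intermediate state I would like to avoid or at least minimize.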
The first two solutions seem to work, but I have little confidence in using either of them in a real production use case, the problem being that creating 100K hard links might be very slow or somehow costly/useless.
I also know that a very consistent solution would be to use snapshots, and there are plenty of options for that, but it is not acceptable in my case because of the disk size (the disk is ~70GB and the `current` folder already takes ~60GB).
I have run out of options within my knowledge; would there be any (better) way to achieve the expected goal?
Asked by Bil5 (113 rep)
Jun 20, 2021, 10:27 AM
Last activity: Jun 21, 2021, 04:33 AM