Parallelise rsync using GNU Parallel

40 votes
8 answers
132444 views
I have been using an rsync script to synchronize data on one host with the data on another host. The data consists of numerous small files that add up to almost 1.2TB.

To sync those files, I have been using the rsync command as follows:

    rsync -avzm --stats --human-readable --include-from proj.lst /data/projects REMOTEHOST:/data/

The contents of proj.lst are as follows:

    + proj1
    + proj1/*
    + proj1/*/*
    + proj1/*/*/*.tar
    + proj1/*/*/*.pdf
    + proj2
    + proj2/*
    + proj2/*/*
    + proj2/*/*/*.tar
    + proj2/*/*/*.pdf
    ...
    ...
    ...
    - *

As a test, I picked two of those projects (8.5GB of data) and executed the command above. Being a sequential process, it took 14 minutes and 58 seconds to complete. So, for 1.2TB of data, it would take several hours.

If I could run multiple rsync processes in parallel (using &, xargs or parallel), it would save me time.

I tried the command below with parallel (after changing to the source directory) and it took 12 minutes 37 seconds to execute:

    parallel --will-cite -j 5 rsync -avzm --stats --human-readable {} REMOTEHOST:/data/ ::: .

I expected this to take about a fifth of the time, but it didn't. I think I'm going wrong somewhere.

How can I run multiple rsync processes in order to reduce the execution time?
Asked by Mandar Shinde (3374 rep)
Mar 13, 2015, 06:51 AM
Last activity: Jul 26, 2025, 08:00 PM
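
A note on the parallel command in the question: `::: .` supplies exactly one argument, so parallel starts exactly one rsync job no matter what `-j 5` says. The following is a minimal sketch, not a tested solution for this data set, of feeding one argument per top-level project directory instead. REMOTEHOST, /data/projects and the rsync flags are taken from the question; the variable names SRC, DEST, JOBS and the function names list_projects and sync_projects are hypothetical. It leaves out the --include-from filter, as the question's parallel attempt did, and any real speed-up will still depend on disk, CPU and network limits.

```shell
#!/bin/sh
# Sketch only: one rsync job per top-level project directory.
# SRC, DEST and JOBS are hypothetical names; defaults mirror the question.
SRC="${SRC:-/data/projects}"
DEST="${DEST:-REMOTEHOST:/data/}"
JOBS="${JOBS:-5}"

# Print the name of each immediate subdirectory of $1, one per line.
list_projects() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue      # skip the unexpanded glob when $1 has no subdirectories
        d=${d%/}                     # drop the trailing slash
        printf '%s\n' "${d##*/}"     # keep only the last path component
    done
}

# Run one rsync per project directory, at most $JOBS at a time.
# parallel reads its arguments from stdin here, so each line becomes one job.
sync_projects() {
    (
        cd "$SRC" || exit 1
        list_projects . | parallel --will-cite -j "$JOBS" \
            rsync -avzm --stats --human-readable {} "$DEST"
    )
}
```

After sourcing the file, calling `sync_projects` would start the transfers; running `list_projects /data/projects` on its own first shows which jobs would be created. Splitting by top-level directory is only one possible granularity: if a few projects hold most of the data, the jobs will be unevenly sized.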