
Simultaneous copy from multiple sources without fragmenting destination

I regularly need to copy large datasets from several smaller drives to a larger one. Lately my destination has been a WD Easystore 12TB external USB 3.0 hard drive. Copying all the files one source at a time takes about 3 days, and the destination drive sits idle most of that time, waiting for reads from the source. Running a cp from each source at the same time gets the copy under 20 hours, but it leaves most of the files fragmented.

There is a patch for cp that adds a preallocate option, but it only works on filesystems that support the fallocate system call, and ntfs-3g does not. Rsync has a preallocate option, but it fails with "rsync: do_fallocate" "Operation not supported (95)", presumably for the same reason. I tried dd with a block size larger than the file size, hoping that if the write didn't happen until the entire file was already in memory, the allocation would be contiguous, but the files still ended up fragmented. I also tried ntfsfallocate to preallocate space for all the files (about 12 hours for 23k files), but cp does not appear to use the existing allocation when overwriting a file.

Is there a Linux NTFS driver, for any distribution, that supports fallocate? Other suggestions?
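A minimal Python sketch of one way to keep the parallel reads but avoid interleaved writes: one thread per source drive reads each file completely into memory, and a single shared lock ensures only one file at a time is written to the destination. The function names (`copy_file_serialized`, `copy_tree_serialized`, `copy_sources`) are hypothetical, the sketch assumes every file fits in RAM, and whether ntfs-3g then allocates each file contiguously is an assumption, not something this code guarantees.

```python
import os
import shutil
import threading
from concurrent.futures import ThreadPoolExecutor

# One lock shared by all workers: only one file is written to the destination
# at a time, so writes coming from different sources never interleave.
_dest_lock = threading.Lock()


def copy_file_serialized(src_path: str, dst_path: str) -> None:
    """Read the whole source file, then write it in one go under the lock.

    Assumption: the file fits in memory. Larger files would need spooling.
    """
    with open(src_path, "rb") as src:
        data = src.read()  # slow source read; runs concurrently across drives
    os.makedirs(os.path.dirname(dst_path) or ".", exist_ok=True)
    with _dest_lock:  # destination writes are serialized, one file at a time
        with open(dst_path, "wb") as dst:
            dst.write(data)
            dst.flush()
            os.fsync(dst.fileno())  # push this file out before the next one starts
    shutil.copystat(src_path, dst_path)  # keep timestamps and permission bits


def copy_tree_serialized(src_root: str, dst_root: str) -> int:
    """Copy every regular file under src_root to dst_root; return the file count."""
    count = 0
    for dirpath, _dirs, files in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        for name in files:
            src_path = os.path.join(dirpath, name)
            dst_path = os.path.normpath(os.path.join(dst_root, rel, name))
            copy_file_serialized(src_path, dst_path)
            count += 1
    return count


def copy_sources(pairs):
    """pairs: list of (source_root, destination_root); one thread per source."""
    with ThreadPoolExecutor(max_workers=max(1, len(pairs))) as pool:
        futures = [pool.submit(copy_tree_serialized, s, d) for s, d in pairs]
        return [f.result() for f in futures]
```

For example, `copy_sources([("/mnt/src1", "/mnt/easystore/src1"), ("/mnt/src2", "/mnt/easystore/src2")])` would copy two drives concurrently; the mount points are placeholders. The design trades memory for ordering: source reads still overlap, which is where the time goes, while the destination only ever sees one file being written at a time.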
Asked by Pascal (323 rep)
Feb 6, 2020, 04:11 AM
Last activity: Feb 19, 2020, 07:39 PM