wget — download multiple files over multiple nodes on a cluster
1 vote · 3 answers · 1047 views
Hi there, I'm trying to download a large number of files at once; 279 to be precise. These are large BAM files (~90 GB each). The cluster where I'm working has several nodes and fortunately I can allocate multiple instances at once.
Given this situation, I would like to know whether I can use `wget` from a batch file (see example below) to assign each download to a separate node to be carried out independently.
**batch_file.txt**
-O DNK07.bam
-O mixe0007.bam
-O IHW9118.bam
.
.
In principle, this would not only speed things up but also prevent the run from failing, since the wall-time for this job is 24 h, which won't be enough to download all of those files consecutively on a single machine.
This is what my BASH script looks like:
#!/bin/bash
#
#SBATCH --nodes=279 --ntasks=1 --cpus-per-task=1
#SBATCH --time=24:00:00
#SBATCH --mem=10gb
#
#SBATCH --job-name=download
#SBATCH --output=sgdp.out
##SBATCH --array=[1-279]%279
#
#SBATCH --partition=
#SBATCH --qos=
#
#SBATCH --account=
#NAMES=$1
#d=$(sed -n "$SLURM_ARRAY_TASK_ID"p $NAMES)
wget -i sgdp-download-list.txt
As you can see, I was thinking of using an array job (not sure whether it will work); alternatively, I thought about allocating 279 nodes, hoping SLURM would have been clever enough to send each download to a separate node (not sure about that either...). If you are aware of a way to do this efficiently, any suggestion is welcome.
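For reference, this is a rough sketch of the array-job version I have in mind, assuming sgdp-download-list.txt holds one download per line (a URL followed by its -O filename, as in batch_file.txt above):
#!/bin/bash
#
#SBATCH --job-name=download
#SBATCH --output=sgdp_%a.out
#SBATCH --ntasks=1 --cpus-per-task=1
#SBATCH --mem=10gb
#SBATCH --time=24:00:00
#SBATCH --array=1-279
#
# Each array task reads its own line from the list and runs a single wget,
# so the 279 downloads run as independent tasks instead of one serial loop.
NAMES=sgdp-download-list.txt
line=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$NAMES")
# $line is left unquoted on purpose so the URL and "-O filename" are passed
# to wget as separate arguments.
wget $line
Here %a in the --output name is the array task ID, so each download would write its own log file rather than all tasks clobbering a single sgdp.out. I'm not sure this is the right approach on our cluster, which is exactly my question.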
Thanks in advance!
Asked by Matteo
(209 rep)
Oct 23, 2023, 11:48 AM
Last activity: Dec 4, 2023, 06:36 PM