
wget — download multiple files over multiple nodes on a cluster

1 vote
3 answers
1047 views
Hi there, I'm trying to download a large number of files at once; 279 to be precise. These are large BAM files (~90 GB each). The cluster where I'm working has several nodes and, fortunately, I can allocate multiple instances at once. Given this situation, I would like to know whether I can use wget with a batch file (*see* example below) to assign each download to a separate node to be carried out independently.
**batch_file.txt**
-O DNK07.bam
-O mixe0007.bam
-O IHW9118.bam
.
.
In principle, this would not only speed things up but also prevent the run from failing, since the wall-time limit for this job is 24 h, which won't be enough to download all those files consecutively on a single machine. This is what my BASH script looks like:
#!/bin/bash
#
#SBATCH --nodes=279 --ntasks=1 --cpus-per-task=1
#SBATCH --time=24:00:00
#SBATCH --mem=10gb
#
#SBATCH --job-name=download
#SBATCH --output=sgdp.out
##SBATCH --array=[1-279]%279
#
#SBATCH --partition=
#SBATCH --qos=
#
#SBATCH --account=

#NAMES=$1
#d=$(sed -n "$SLURM_ARRAY_TASK_ID"p $NAMES)

wget -i sgdp-download-list.txt
As you can see, I was thinking of using an array job (not sure whether it will work); alternatively, I thought about allocating 279 nodes and hoping SLURM would be clever enough to send each download to a separate node (not sure about that either...). If you are aware of a way to do this efficiently, any suggestion is welcome. Thanks in advance!
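In case it helps to see what I mean, below is a rough sketch of the array version I was considering (not tested; it assumes sgdp-download-list.txt has one URL per line, and the partition/qos/account lines from my script above are omitted):
#!/bin/bash
#
#SBATCH --ntasks=1 --cpus-per-task=1
#SBATCH --time=24:00:00
#SBATCH --mem=10gb
#
#SBATCH --job-name=download
#SBATCH --output=sgdp_%A_%a.out
#SBATCH --array=1-279

# Pick the line of the URL list that corresponds to this array task
# (assumes one URL per line in sgdp-download-list.txt).
URL=$(sed -n "${SLURM_ARRAY_TASK_ID}p" sgdp-download-list.txt)

# Download only that one file on this task's node.
# If the list also carries a "-O name" after each URL, drop the quotes
# so the option is passed through to wget.
wget "$URL"
Would something along these lines actually spread the 279 downloads across separate nodes, or is there a better way to do it?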
Asked by Matteo (209 rep)
Oct 23, 2023, 11:48 AM
Last activity: Dec 4, 2023, 06:36 PM