job array with conditional statements

I'm working with a job array whose script may be executed several times, so I use if statements to make sure that only the steps still missing are processed on each run. As part of the script I need to rename the folders before the last command, so on subsequent runs I want the if statements to check the renamed folders as well; however, I can't work out the logic to do so. Below is an example of the folders the array visits on the first run:
/path/to/INLUP_00165
/path/to/INLUP_00169
/path/to/INLUP_00208
/path/to/INLUP_00214
/path/to/INLUP_00228
/path/to/INLUP_00245
/path/to/INLUP_00393
/path/to/INLUP_00418
which will become, after the first execution, the following:
/path/to/INLUP_00165-35
/path/to/INLUP_00169-27
/path/to/INLUP_00208-35
/path/to/INLUP_00214-32
/path/to/INLUP_00228-32
/path/to/INLUP_00245-34
/path/to/INLUP_00393-29
/path/to/INLUP_00418-32
This is the script I'm using
#!/bin/bash
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=12
#SBATCH --time=200:00:00
#SBATCH --mem=80gb
#
#SBATCH --job-name=name1
#SBATCH --output=name_1.out
#SBATCH --array=[1-8]%8
#
#SBATCH --partition=bigmem
#SBATCH --exclude=node5

NAMES=$1
h=$(sed -n "$SLURM_ARRAY_TASK_ID"p $NAMES)

#load modules

ID="$(echo ${h}/*.fastq.gz | cut -f 1 -d '.' | sed 's#/path/to/INLUP_[0-9]\+/##')"
readarray -t cond  $h/log.txt; find . -maxdepth 1 -type f,l -not -name '*.filt.fastq.gz' -not -name '*.txt' -not -name '*.bin' -delete

		HIGH=$(grep '\[M::ha_analyze_count\] highest\:' $h/log.txt | tail -1 | sed 's#\[M::ha_analyze_count\] highest\: count\[##' | sed 's#\] = [0-9]\+$##') #highest peak
		LOW=$(grep '\[M::ha_analyze_count\] lowest\:' $h/log.txt | tail -1 | sed 's#\[M::ha_analyze_count\] lowest\: count\[##' | sed 's#\] = [0-9]\+$##') #lowest peak
		SUFFIX="$(echo $(( ($HIGH - $LOW) "*" 3/4 + $LOW )))" #estimated homozygous coverage

		mv $h $h-${SUFFIX}

		HOM_COV="$(echo ${h} | sed 's#/path/to/INLUP_[0-9]\+-##')"

		last command tool &> $h-${SUFFIX}/log_param.txt
fi
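For reference, the SUFFIX arithmetic in the script can be checked in isolation. This is a minimal sketch with hypothetical peak values (40 and 20 are made up for illustration and are not taken from any real log.txt); it only shows how the integer expression evaluates.

```shell
#!/bin/bash
# Hypothetical peak positions, chosen only for illustration.
HIGH=40
LOW=20

# Bash arithmetic is integer-only and evaluates * and / left to right,
# so (HIGH - LOW) is multiplied by 3 before being divided by 4.
SUFFIX=$(( (HIGH - LOW) * 3 / 4 + LOW ))

echo "$SUFFIX"   # prints 35
```

Inside $(( )) the variables can be written without $ and the * does not need quoting.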
With the next code block I tried to parse the number after the - in the new folder names and store it in an array, so that I can check it element by element: the first element would belong to the first folder, and so on.
readarray -t cond < <(
    for filename in INLUP_00*
    do 
        printf "$filename \n" | sed 's#INLUP_[0-9]\+-##'
    done
)
How can I link this to the file I feed as input, which is unchanged and still contains the original paths to the folders? Maybe through the way a path is associated with the task ID, as in h=$(sed -n "$SLURM_ARRAY_TASK_ID"p $NAMES). Let me know, and thanks in advance!
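One possible way to keep the input file unchanged is to resolve each original path to whichever folder currently exists, instead of building a separate array of suffixes. The following is a minimal sketch, not a tested solution for this pipeline. It assumes a renamed folder is always the original path followed by - and a number, and that at most one such folder exists per sample. The helper names resolve_dir and suffix_of are hypothetical and do not appear in the original script.

```shell
#!/bin/bash
# Hypothetical helper: print the folder that currently exists for an
# original path, preferring the renamed form <orig>-<number>.
resolve_dir() {
    local orig=$1
    local matches
    shopt -s nullglob               # an unmatched glob expands to nothing
    matches=( "$orig"-[0-9]*/ )     # trailing / restricts matches to directories
    shopt -u nullglob
    if (( ${#matches[@]} > 0 )); then
        printf '%s\n' "${matches[0]%/}"   # first match, trailing / removed
    elif [[ -d $orig ]]; then
        printf '%s\n' "$orig"             # not renamed yet
    else
        return 1                          # neither form exists
    fi
}

# Hypothetical helper: print the part after "<orig>-" when the folder has
# been renamed; print nothing and return 1 otherwise.
suffix_of() {
    local orig=$1 dir=$2
    [[ $dir == "$orig"-* ]] || return 1
    printf '%s\n' "${dir#"$orig"-}"
}
```

In the job script, h would still be read from the input file by task ID exactly as before, followed by something like dir=$(resolve_dir "$h") and, if needed, HOM_COV=$(suffix_of "$h" "$dir"). An if statement can then test whether suffix_of succeeds to decide if the rename step has already been done for that task.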
Asked by Matteo (209 rep)
Nov 10, 2024, 05:26 PM