Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

1 vote
1 answer
1934 views
How can I submit multiple R jobs at once?
I have an R script which runs over multiple files, say file=1 to 50. I usually submit repeated jobs, say 5 times with 10 files each time, by changing the number in the R script. So, how can I submit the 5 jobs at once without submitting the job 5 times? In addition, I want to update the **default.out** and **errorfile** for each job. Sample bash code:
#!/bin/bash

#PBS -l nodes=1:ppn=20,walltime=05:00:00

#PBS -m e
#PBS -o default.out
#PBS -e errorfile

module load R/4.0

Rscript ~/r_script1.R
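A minimal sketch of one way to do this, hedged: save the PBS script above as, say, r_job.pbs (that name and the BATCH variable are illustrative), then submit it in a loop, giving each job its own output/error files and passing the batch number through -v so the R script can select its 10 files from it instead of a hard-coded number.
#!/bin/bash
# submit 5 copies of the same PBS script, one per batch of 10 files
for b in 1 2 3 4 5; do
    qsub -v BATCH="$b" -o "default.$b.out" -e "errorfile.$b" r_job.pbs
done
Inside the job, the R script could read the batch number with Sys.getenv("BATCH").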
b_takhel (21 rep)
May 22, 2021, 07:54 PM • Last activity: Feb 13, 2025, 02:08 PM
1 vote
1 answer
119 views
qsub-like behavior for a slurm cluster
I recently switched to Slurm and am looking for a job submission tool that behaves similarly to qsub: 1. It takes input through a pipe 2. It prints the output to stdout. Example:
for n in `seq 1 10`; do
    echo "echo $n" | qsub
done
should send each echo command to the cluster, and the output should be 1..10, presumably in random order. So far I can: 1. send jobs with sbatch in parallel, but I am not sure how to get the output to stdout 2. send jobs with srun, but then it operates sequentially, one by one. Any suggestions?
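A small sketch of two Slurm-side approaches, assuming a reasonably recent Slurm (file names and option values are illustrative, not a drop-in qsub replacement): sbatch --wrap submits one-liners in parallel with per-job output files you can cat afterwards, while backgrounding srun keeps output on stdout but still runs the steps concurrently.
# parallel submission; each job writes out.<jobid>.txt
for n in $(seq 1 10); do
    sbatch --output "out.%j.txt" --wrap "echo $n"
done

# or: keep stdout, but run the srun steps concurrently from one shell
for n in $(seq 1 10); do
    srun -n1 echo "$n" &
done
wait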
LazyCat (188 rep)
Mar 6, 2024, 02:01 AM • Last activity: Mar 8, 2024, 04:04 PM
0 votes
0 answers
132 views
Make a job depend on another but only if timeout
When submitting jobs with qsub, we can make sure that a job only starts after another. More than that, we can execute it depending on the status of that other job: perhaps only run the new job if the other one fails, or only if it exits OK. But in my case, I want to start a job when the other one has "failed" with a status of CANCELLED or TIMEOUT. The use case is that I sometimes have long training runs and our sysadmin only allows jobs of 32 hours. If a job "times out", that means the training run was not finished, so a new job should take up where the timed-out job left off. The usual syntax is as follows:
qsub myjob.pbs -W depend=afterok:<job_id>
Or afterany, afternotok, etc. Is there a way to make this work for specific statuses, so in my case the TIMEOUT status of an ended job?
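Not a definitive answer, but a sketch of a common workaround, assuming the training script can drop a completion marker (the DONE file, $WORKDIR and the script names are made up for illustration): chain the follow-up job with afterany, and let it decide at run time whether there is anything left to resume.
jid=$(qsub train.pbs)
qsub -W depend=afterany:"$jid" resume.pbs

# inside resume.pbs: bail out if the previous run actually finished
if [ -f "$WORKDIR/DONE" ]; then
    echo "previous run completed, nothing to resume"
    exit 0
fi
# otherwise restart training from the last checkpoint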
Bram Vanroy (183 rep)
Jul 25, 2023, 12:05 PM • Last activity: Jul 26, 2023, 03:22 PM
0 votes
2 answers
366 views
Pass a variable that contains a comma as a -v option to qsub
After seeing the reactions on [Stack Overflow](https://stackoverflow.com/questions/76616176/pass-a-variable-that-contains-a-comma-as-a-v-option-to-qsub) to this question, and the unfamiliarity with qsub there, I believe that U&L is better suited for it. In qsub, we can pass environment variables (a comma-separated list of envar=value pairs) like so:
info="This is some info"
qsub -v INFO=$info script.pbs
However, this becomes problematic when $info contains a comma.
info="This is some info, and here is some more!"
qsub -v INFO=$info script.pbs
This will trigger an error like so:
> ERROR: -v: variable ' and here is some more!' is not set in environment variables.
I have also tried encapsulating info, INFO="$info", leading to the same issue. How can I pass $info correctly, even if it contains one or more commas? The same question holds for newlines, which always end up passed incorrectly (the backslash gets removed). Perhaps an interesting observation is that when I echo -e $info, I get the output that I expect. The error is triggered in the qsub command specifically.
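One robust workaround, sketched under the assumption that encoding the value is acceptable (the INFO_B64 name is invented here): base64-encode the string before handing it to -v, so the value contains no commas, spaces or newlines, and decode it again inside the job.
info="This is some info, and here is some more!"
qsub -v INFO_B64="$(printf '%s' "$info" | base64 -w0)" script.pbs

# inside script.pbs: recover the original value, commas and newlines included
INFO=$(printf '%s' "$INFO_B64" | base64 -d)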
Bram Vanroy (183 rep)
Jul 4, 2023, 11:24 PM • Last activity: Jul 6, 2023, 07:59 AM
1 vote
1 answer
162 views
Shell Variable Expansion in qsub command through drmaa
I am running a bulk job submission to SGE (Sun Grid Engine) using the Python drmaa bindings. For the bulk submission I am submitting a Python script that takes one argument and is command-line executable through a shebang. To parameterize the bulk submission I am setting environment variables to propagate to the Python script through the -v option. I am trying to do an indirect variable expansion in my zsh environment based on the $TASK_ID/$SGE_TASK_ID environment variable that SGE exports during job submission. As a minimal reproducible example of the indirect variable expansion, I am trying something like this, which works in my shell:
export foo1=2
export num=1

echo $(tmp=foo$num; echo ${(P)tmp})
which produces 2. The example script, job_script.py:
#! /usr/bin/python
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("input_path", type=os.path.realpath)

def main(input_path):
    # do stuff
    ...

if __name__ == "__main__":
    args = parser.parse_args()
    input_path = args.input_path
    main(input_path)
The example drmaa submittal script
import os

# add path to libs
os.environ["DMRAA_LIBRARY_PATH"] = "path to DMRAA shared object"
os.environ["SGE_ROOT"] = "path to SGE root directory"
import drmaa

input_dir_suffixes = [1, 2, 5, 7, 10, 11]

INPUT_BASE_DIR = "/home/mel/input_data"

base_qsub_options = {
    "P": "project",
    "q": "queue",
    "b": "y", # means is an executable
    "shell": "y", # start up shell
}
native_specification = " ".join(f"-{k} {v}" for k,v in base_qsub_options.items())
remote_command = "job_script.py"

num_task_ids = len(input_dir_suffixes)
task_start = 1
task_stop = num_task_ids + 1
task_step = 1
task_id_zip = zip(range(1, num_task_ids + 1), input_dir_suffixes) 
task_id_env_vars = {
   f"TASK_ID_{task_id}_SUFFIX": str(suffix) for task_id, suffix in task_id_zip 
}

io_task_id = r"$(tmp=SUFFIX_TASK_ID_$TASK_ID; echo ${(P)tmp})"
arg_task_id = r"$(tmp=SUFFIX_TASK_ID_$SGE_TASK_ID; echo ${(P)tmp})"

with drmaa.Session() as session:
    
    template = session.createJobTemplate()
    template.nativeSpecification = native_specification
    template.remoteCommand = remote_command
    template.jobEnvironment = task_id_env_vars
    template.outputPath = f":{INPUT_BASE_DIR}/output/{io_task_id}.o"
    template.outputPath = f":{INPUT_BASE_DIR}/error/{io_task_id}.e"

    args_list = [f"{INPUT_BASE_DIR}/data{arg_task_id}"]
    template.args = args_list
    session.runBulkJobs(template, task_start, task_stop - 1, task_step)
    session.deleteJobTemplate(template)
Apologies if there is a syntax error; I had to hand-copy this, as it is on a different system. With the submission done, if I do a qstat -j on the job number I get the following settings displayed:
sge_o_shell:         /usr/bin/zsh
stderr_path_list:    NONE::/home/mel/input_data/error_log/$(tmp=SUFFIX_TASK_ID_$TASK_ID; echo ${(P)tmp}).e
stdout_path_list:    NONE::/home/mel/input_data/output_log/$(tmp=SUFFIX_TASK_ID_$TASK_ID; echo ${(P)tmp}).o
job_args:            /home/mel/input_data/data$(tmp=SUFFIX_TASK_ID$SGE_TASK_ID; echo ${(P)tmp})
script_file:         job_script.py

env_list: 
SUFFIX_TASK_ID_1=1,SUFFIX_TASK_ID_2=2,SUFFIX_TASK_ID_3=5,SUFFIX_TASK_ID_4=7,SUFFIX_TASK_ID_5=10,SUFFIX_TASK_ID_6=11
The error logs and output logs do get created, but there is only a partial expansion. Examples:
$(tmp=SUFFIX_TASK_ID1; echo ${(P)tmp}).e
$(tmp=SUFFIX_TASK_ID1; echo ${(P)tmp}).o
If we cat the error logs we see Illegal variable name. Is what I am trying to do possible? I am presuming something somewhere is not invoking my zsh correctly.
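The Illegal variable name message looks like a csh-style error, and the output/error paths only receive SGE's own pseudo-variable substitution ($TASK_ID and friends), not shell command substitution, so the $(...) trick is unlikely to ever run there. A minimal sketch of a sidestep (run_task.sh is a hypothetical wrapper name): do the indirect lookup inside the job, where SGE_TASK_ID is actually set, and make that wrapper the remoteCommand instead of job_script.py.
#!/bin/bash
# run_task.sh - resolve the per-task suffix at run time, then call the real script
var="SUFFIX_TASK_ID_${SGE_TASK_ID}"
suffix="${!var}"          # bash indirect expansion; the zsh equivalent is ${(P)var}
exec ./job_script.py "/home/mel/input_data/data${suffix}"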
Melendowski (111 rep)
May 31, 2023, 10:09 PM • Last activity: Jun 4, 2023, 11:45 AM
1 vote
1 answer
185 views
Will `qsub` run my jobs sequentially?
Assume a script including the following content is passed to qsub as qsub myscript.sh:
#PBS -N Job_name
#PBS -l walltime=10:30,mem=320kb
#PBS -m be
#
step1 arg1 arg2
step2 arg3 arg4
Will step1 and step2 run in parallel over different nodes or sequentially on the allocated resource?
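For reference, a sketch of the two behaviours: as written, the commands run one after another, like any shell script, on the single allocation; to run them concurrently within the same job (assuming step1 and step2 are independent) they would have to be backgrounded explicitly.
#PBS -N Job_name
#PBS -l walltime=10:30,mem=320kb
#PBS -m be
# run both steps concurrently on the allocated node, then wait for both
step1 arg1 arg2 &
step2 arg3 arg4 &
wait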
zzkr (155 rep)
Aug 24, 2022, 02:08 PM • Last activity: Aug 24, 2022, 08:09 PM
0 votes
2 answers
1588 views
Syntax for number of cores in a Sun Grid Engine job file
I want to use the HPC of my university to qsub an array job of **3** tasks. Each task runs a Matlab code which uses a solver (MOSEK) that exploits multiple **threads** to solve an optimization problem. A parameter can control the number of threads we want the solver to use. The maximum number of threads allowed should never exceed the number of cores. Suppose I want the solver to use **4 threads**. Hence, I should ensure that each task is assigned to a machine with at least 4 cores free. How can I request that in the bash file? How should I count, in turn, the memory usage (i.e., should I declare the memory per core or the total memory)? At the moment this is my bash file:
#$ -S /bin/bash
#$ -l h_vmem=18G
#$ -l tmem=18G
#$ -l h_rt=480:0:0
#$ -cwd
#$ -j y
#Run 3 tasks
#$ -t 1-3
#$ -N try
date
hostname
#Output the Task ID
echo "Task ID is $SGE_TASK_ID"
matlab -nodisplay -nodesktop -nojvm -nosplash -r "main_1; ID = $SGE_TASK_ID; f_1; exit"
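A hedged sketch of the usual request: cores are asked for through a parallel environment (the PE name smp is a common convention but site-specific, so check qconf -spl), and since h_vmem/tmem are normally enforced per slot on SGE, the per-task memory gets divided across the slots.
#$ -S /bin/bash
#$ -pe smp 4          # 4 slots (cores) on one host; the PE name is an assumption
#$ -l h_vmem=4.5G     # per-slot limit: 4 x 4.5G = 18G for the whole task
#$ -l tmem=4.5G
#$ -l h_rt=480:0:0
#$ -cwd
#$ -j y
#$ -t 1-3
#$ -N try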
Star (125 rep)
Sep 9, 2020, 01:50 PM • Last activity: Jun 26, 2021, 04:21 PM
0 votes
2 answers
351 views
Bash script for reading multiple files
I have multiple R scripts to run (up to 3, i.e. tr1.R, tr2.R, tr3.R).
The bash script for running a single script is given below:
#!/bin/bash
#PBS -l nodes=1:ppn=10,walltime=00:05:00
#PBS -M 
#PBS -m e
module load R/4.0
Rscript ~/tr1.R
I tried the following as suggested by @cas
#!/bin/bash
#PBS -l nodes=1:ppn=10,walltime=00:05:00
#PBS -M 
#PBS -m e
module load R/4.0
Rscript ~/tr"$i".R
Further, the job is submitted using
for i in {1..3} ; do
  qsub -o "default.$i.out" -e "errorfile$i" -v i script.sh
done
This could not read Rscript ~/tr"$i".R.
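A likely explanation, offered as a sketch rather than a definitive answer: with -v i (no value), qsub copies i from its own environment, and a plain loop variable in the submitting shell is not exported, so the job sees an empty $i. Passing the value explicitly avoids that; script.sh below is the job script shown above.
for i in {1..3}; do
    qsub -o "default.$i.out" -e "errorfile$i" -v i="$i" script.sh
done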
b_takhel (21 rep)
May 23, 2021, 09:20 PM • Last activity: Jun 1, 2021, 06:04 AM
0 votes
2 answers
1164 views
How to use "qstat" and "grep" to list lines containing a range of numbers?
To monitor the job status in clusters, qstat is used to output lines like this:
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 146767 2.75000 REMD       xxxxxx      Rr    03/26/2021 10:58:17 Arya@node-c11b-027.kathleen.uc   160
 146811 2.75000 REMD       xxxxxx      r     03/26/2021 11:37:48 Arya@node-c11b-043.kathleen.uc   160
 146862 2.25862 REMD       xxxxxx      Rq    03/26/2021 06:24:39                                  160
 146911 2.19397 REMD       xxxxxx      Rq    03/26/2021 11:37:20                                  160
 146768 0.00000 REMD       xxxxxx      hqw   03/13/2021 14:47:35                                  160
 146769 0.00000 REMD       xxxxxx      hqw   03/13/2021 14:47:35                                  160
 146770 0.00000 REMD       xxxxxx      hqw   03/13/2021 14:47:36                                  160
The first element of each line is the job ID. Is there a way to show the lines for a particular range of jobs, e.g. how to only show the jobs from 146868 to 146927? It seems that grep is needed.
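grep is awkward for a numeric range; a small sketch using awk compares the first column directly (the job numbers are the ones from the question, and the first two lines are kept as the header):
qstat | awk 'NR <= 2 || ($1 >= 146868 && $1 <= 146927)'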
lanselibai (143 rep)
Mar 26, 2021, 12:00 PM • Last activity: Mar 26, 2021, 03:10 PM
0 votes
1 answer
656 views
How can I run a script from a specific node and activate a conda environment?
I have a python program that runs inside a conda environment installed on a specific node of a cluster. I would like to submit it via qsub but need some help. My script is:
#!/bin/bash
source conda activate myenv
python3.6 myprogram.py
I have already tried:
- ssh **node** 'export SGE_ROOT=/usr/local/run/ge2011.11; /usr/local/run/ge2011.11/bin/linux-x64/qsub script.sh', but it says Unable to run job: denied: host "**node**" is no submit host. Exiting
- qsub cwd -V qu=**node** script.sh, but it says Unable to read script because of error: error opening cwd;error opening qu=**node**
Thanks!
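A sketch of one common arrangement, with the caveat that the resource name and the conda install path are assumptions for your site: submit from a submit host (not via ssh to the node), pin the job to that node with a host resource request, and activate the environment inside the job script itself.
#!/bin/bash
# script.sh - activate the environment on the execution host, then run the program
source /path/to/miniconda3/etc/profile.d/conda.sh   # assumed conda location
conda activate myenv
python3.6 myprogram.py

# submitted from a submit host with something like:
# qsub -cwd -V -l hostname=node script.sh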
TYSH (37 rep)
Dec 12, 2020, 02:03 PM • Last activity: Feb 18, 2021, 09:52 AM
0 votes
0 answers
3140 views
How can I cancel all waiting jobs with qsub?
I am running a lot of jobs with qsub: some are running, some are waiting. Is there a way to cancel all the jobs for a given user which are queued/waiting without giving the individual job IDs?
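A sketch, since the exact command depends on which scheduler sits behind qsub (treat the state letters as assumptions for your setup): on Grid Engine, filter qstat for the qw state; on PBS/Torque, qselect can pick queued jobs directly.
# Grid Engine: delete only this user's queued/waiting jobs
qdel $(qstat -u "$USER" | awk '$5 == "qw" {print $1}')

# PBS/Torque: select this user's queued jobs and delete them
qselect -u "$USER" -s Q | xargs -r qdel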
user443699 (133 rep)
Dec 13, 2020, 09:12 PM
0 votes
1 answer
806 views
Syntax for memory request in a Sun Grid Engine job file
I'm submitting a Matlab job on the cluster of my university using qsub after having logged in to a node using ssh. The job runs out of memory. This is the advice I received to fix my issue: "**Possible solutions are run on a bigger machine or buy more RAM**." What does this mean in practice for my bash file? Which lines of the bash file control the size of the machine or the RAM? At the moment, in my bash file (see below) I request vmem and tmem. Is either of these RAM?
#$ -S /bin/bash
#$ -l h_vmem=18G
#$ -l tmem=18G
#$ -l h_rt=480:0:0
#$ -cwd
#$ -j y
#Run 600 tasks where each task has a different $SGE_TASK_ID ranging from 1 to 600
#$ -t 1-600
#$ -N try
date
hostname
#Output the Task ID
echo "Task ID is $SGE_TASK_ID"
matlab -nodisplay -nodesktop -nojvm -nosplash -r "main_1; ID = $SGE_TASK_ID; f_1; exit"
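For what it's worth, a short sketch of the lines that matter, with an illustrative number: h_vmem is the hard virtual-memory limit and tmem is typically a site-defined memory complex, so "more RAM" in practice means raising those requests, and "a bigger machine" means a node able to satisfy the larger request.
#$ -l h_vmem=32G   # raise the hard virtual-memory limit (32G is only an example)
#$ -l tmem=32G     # raise the memory request to match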
Star (125 rep)
Sep 9, 2020, 11:01 AM • Last activity: Sep 9, 2020, 02:24 PM
0 votes
1 answer
481 views
Name is nonexistent or not a directory
I'm running an array job (400 Matlab R2018b tasks) in the HPC of my university. After having qsub the .sh file in the terminal, the 400 tasks start but they are immediately killed. In the .o file of each task no errors are reported, only the following warning:
Warning: Name is nonexistent or not a directory: /share/apps/.../NAG/mbl6a24dnl/mex.a64
________
In case it might be useful to know: I call the MOSEK solver in my Matlab .m file. In particular, at the beginning of my Matlab .m file I have
addpath /share/apps/mosek-9.2/9.2/toolbox/r2015aom
Also, this is my .sh file:
#$ -S /bin/bash
#$ -l h_vmem=7G
#$ -l tmem=7G
#$ -l h_rt=480:0:0
#$ -cwd
#$ -j y
#Run 400 tasks
#$ -t 1-400
#$ -N count2
date
hostname
#Output the Task ID
echo "Task ID is $SGE_TASK_ID"
/share/apps/matlabR2018b -nodisplay -nodesktop -nojvm -nosplash -r "main; ID = $SGE_TASK_ID; f; exit"
Provided that I have contacted the administrator to ask about this (no reply yet), is there anything I can do in the immediate term to fix it and run my code?
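As a quick check while waiting for the administrator, a small sketch (the path is the one from the addpath line): verify from inside a job that the directory is actually visible on the compute nodes, since a path that exists on the login node but not on the execution nodes would produce exactly this kind of warning.
# add near the top of the .sh file, before launching Matlab
ls -ld /share/apps/mosek-9.2/9.2/toolbox/r2015aom || echo "MOSEK path missing on $(hostname)"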
Star (125 rep)
May 9, 2020, 08:37 AM • Last activity: May 9, 2020, 02:11 PM
0 votes
1 answers
411 views
Export commands works in interactive mode, but produces error message in script
I need to export an environment variable to run a program. I am able to do that successfully in interactive mode. However, when I try to export an environment variable as part of a bash shell script, I get this error message: export: Command not found. In interactive mode, when I type in the following command, it works:
export GT_DIR=/cluster/home/SD/
But when I include the export command as part of the shell script, it does not work. I.e.,
#!/bin/bash
export GT_DIR=/cluster/home/SD/
gives the error message: export: Command not found. When I type in echo $SHELL, I get /bin/bash. Why is the export command working in interactive mode but not when I try to submit it as a script?
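export: Command not found. is a csh/tcsh error, which suggests the batch system is starting the script with a C shell despite the #!/bin/bash line. A sketch of the usual fix, assuming an SGE- or PBS-style qsub: tell the scheduler explicitly which shell should interpret the job.
#!/bin/bash
#$ -S /bin/bash        # Grid Engine: force bash as the job shell
##PBS -S /bin/bash     # PBS/Torque equivalent, uncomment if that is your scheduler
export GT_DIR=/cluster/home/SD/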
SD23Nov18 (5 rep)
Nov 23, 2018, 09:40 PM • Last activity: Nov 24, 2018, 06:20 PM
0 votes
1 answer
397 views
How to run the same command to execute a file in multiple directories?
I want to do the following. I have a set of directories, e.g.: 400K 500K 600K and so on. In each directory I have a "run.pbs" file that I want to submit through batch with "qsub run.pbs". I was doing something like:
for var in "@/run.pbs"
do
    qsub run.pbs
done
I made this script based on some searching I did online. However, after running the script I get an error indicating that run.pbs cannot be found. So I am wondering whether I am missing a step, such that the script is not accessing each directory. My script is in the same directory where the subdirectories (400K 500K ...) are. The path would be something like: /home/d/user/sims/study/temperatures Thanks! Edit: The run.pbs is as follows:
#!/bin/bash
#PBS -N name_of_simulation
#PBS -l nodes=1:ppn=20
#PBS -l walltime=120:00:00
#PBS -A name_of_allocation
#PBS -j
# cd to working directory
cd $PBS_O_WORKDIR
module load module1
module load module2
module load module3
mpirun -np 20 nameofprogram < input_file.in
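A minimal sketch of the loop as it is usually written: iterate over the subdirectories and change into each one before calling qsub, so run.pbs is found and each job starts in its own directory.
#!/bin/bash
# run from /home/d/user/sims/study/temperatures
for d in */; do
    ( cd "$d" && qsub run.pbs )   # subshell, so we drop back to the top directory each time
done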
dareToDiffer07 (125 rep)
Nov 1, 2018, 02:44 PM • Last activity: Nov 21, 2018, 11:14 PM
0 votes
1 answer
124 views
Submitting HPC jobs within an HPC job
I have a large script which relies on input arguments (with getopts). One of these arguments is a directory containing files (all named *bam). This script has 2 parts:
- Part 1: based on the input *bam files, calculate one specific number. To be clear, the result is one single number, NOT one number per file.
- Part 2: using the number found in part 1, perform a series of operations on each *bam file.
Now, originally, part 1 was very quick, computationally speaking. So my setup was:
- Run the script in a terminal: bash script.sh
- Within script.sh, for part 2, make an HPC job submission for each file
However, now that I need to analyze many more files than originally planned, I am realising that part 1 will also be computationally heavy, so I need to also run this on the HPC. So my question is:
- Is it possible to submit an HPC job which submits jobs in it?
- In other words, can I submit script.sh as a job and still have it submit jobs in its part 2?
To be clear, here is an example of what my script might look like:
#!/usr/bin/bash
# PART 0: accept all input arguments
USAGE() { echo "Usage: bash $0 [-b ] [-o ] [-c ]" 1>&2; exit 1; }
if (($# == 0)); then
    USAGE
fi
# Use getopts to accept each argument
while getopts ":b:o:c:h" opt
do
    case $opt in
        b ) BAMFILES=$OPTARG ;;
        o ) OUTDIR=$OPTARG ;;
        c ) CHROMLEN=$OPTARG ;;
        h ) USAGE ;;
        \? ) echo "Invalid option: -$OPTARG exiting" >&2
             exit
             ;;
        : ) echo "Option -$OPTARG requires an argument" >&2
            exit
            ;;
    esac
done
# PART1: calculate this unique number
NUMBER=0
for i in $(ls $BAMFILES/*.bam)
do
    make some calculations on each file to obtain a number
    ...
    keep only the smallest found number and assign its value to $NUMBER
done
echo "Final number is ${NUMBER} "
# PART2: Using $NUMBER that we found above, submit a job for each *bam file
for i in $(ls $BAMFILES/*bam)
do
    if [ ! -f ${OUTDIR}/${SAMPLE}.bw ]; then
        command=" command -options -b $NUMBER $i"
        echo $command | qsub -V -cwd -o $OUTDIR -e $OUTDIR -l tmem=6G -l h_vmem=6G -l h_rt=3600 -N result_${SAMPLE}
    fi
done
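Whether a running job may itself call qsub depends on whether the compute nodes are configured as submit hosts, so treat this as a sketch rather than a guarantee. An alternative that sidesteps the question is to split the work into two job scripts (part1.pbs and part2.pbs are hypothetical wrappers around the two halves) and chain them with a dependency from the login node:
# PBS/Torque style: part1 writes $NUMBER to a file; part2 only starts if part1 succeeded
jid=$(qsub part1.pbs)
qsub -W depend=afterok:"$jid" part2.pbs

# Grid Engine style (-terse makes qsub print only the job id; -hold_jid waits for it to finish)
jid=$(qsub -terse part1.pbs)
qsub -hold_jid "$jid" part2.pbs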
mf94 (219 rep)
Aug 21, 2018, 04:55 PM • Last activity: Aug 21, 2018, 11:27 PM
1 vote
2 answers
529 views
different results on terminal vs qsub submission
I'm trying to run a command on the terminal and also submit it to the cluster but I am getting different results. When I type on the terminal this : $ for i in *_1.fastq.gz; do echo $i >> t.txt; zcat $i | \ grep "GCTGGCAAAAAGAAGGTAACATGTTTT" >> t.txt ; echo >> t.txt ; done I get the output like this adrenal_4a_ERR315335_1.fastq.gz GCANAGGCCAATGTTGGTGCTGGCAAAAAGAAGGTAACATGTTTTAAGGAACTATGTAGCATAGTGTCTTAACACCTCAGTAAAGAGATCGGAAGAGCACA adrenal_4a_ERR315452_1.fastq.gz GCAAAGGCCAATGTTGGTGCTGGCAAAAAGAAGGTAACATGTTTTAAGGAACTATGTAGCATAGTGTCTTAACACCTCAGTAAAGAGATCGGAAGAGCACA CAAGAACAGAATGAAGAAAGTCAGGGGGACTGCAAAGGCCAATGTTGGTGCTGGCAAAAAGAAGGTAACATGTTTTAAGAAACTATGTAGCATAGTGTCTT adrenal_4c_ERR315392_1.fastq.gz adrenal_4c_ERR315450_1.fastq.gz and so on.. This is what the output is expected. When I submit the same command to the HPC cluster via qsub I'm getting a completely different result: $ qsub -l h_vmem=4G -cwd -j y -b y -N n_tr -R y \ "for i in *_1.fastq.gz; do echo $i >> t.txt; zcat $i | \ grep "GCTGGCAAAAAGAAGGTAACATGTTTT" >> t.txt ; echo >> t.txt ; done" adrenal_4a_ERR315452_1.fastq.gz GCAAAGGCCAATGTTGGTGCTGGCAAAAAGAAGGTAACATGTTTTAAGGAACTATGTAGCATAGTGTCTTAACACCTCAGTAAAGAGATCGGAAGAGCACA CAAGAACAGAATGAAGAAAGTCAGGGGGACTGCAAAGGCCAATGTTGGTGCTGGCAAAAAGAAGGTAACATGTTTTAAGAAACTATGTAGCATAGTGTCTT adrenal_4a_ERR315452_1.fastq.gz appendix_4a_ERR315437_1.fastq.gz GCAAAGGCCAATGTTGGTGCTGGCAAAAAGAAGGTAACATGTTTTAAGGAACTATGTAGCATAGTGTCTTAACACCTCAGTAAAGAGATCGGAAGAGCACA CAAGAACAGAATGAAGAAAGTCAGGGGGACTGCAAAGGCCAATGTTGGTGCTGGCAAAAAGAAGGTAACATGTTTTAAGAAACTATGTAGCATAGTGTCTT adrenal_4a_ERR315452_1.fastq.gz GCAAAGGCCAATGTTGGTGCTGGCAAAAAGAAGGTAACATGTTTTAAGGAACTATGTAGCATAGTGTCTTAACACCTCAGTAAAGAGATCGGAAGAGCACA CAAGAACAGAATGAAGAAAGTCAGGGGGACTGCAAAGGCCAATGTTGGTGCTGGCAAAAAGAAGGTAACATGTTTTAAGAAACTATGTAGCATAGTGTCTT adrenal_4a_ERR315452_1.fastq.gz GGACTGCAAAGGCCAATGTTGGTGCTGGCAAAAAGAAGGTAACATGTTTTAAGAAACTATGTAGCATAGTGTCTTAACACCTCAGTAAAGAGATCGGAAGA appendix_4a_ERR315465_1.fastq.gz GCAAAGGCCAATGTTGGTGCTGGCAAAAAGAAGGTAACATGTTTTAAGGAACTATGTAGCATAGTGTCTTAACACCTCAGTAAAGAGATCGGAAGAGCACA CAAGAACAGAATGAAGAAAGTCAGGGGGACTGCAAAGGCCAATGTTGGTGCTGGCAAAAAGAAGGTAACATGTTTTAAGAAACTATGTAGCATAGTGTCTT adrenal_4a_ERR315452_1.fastq.gz appendix_4b_ERR315345_1.fastq.gz GCAAAGGCCAATGTTGGTGCTGGCAAAAAGAAGGTAACATGTTTTAAGGAACTATGTAGCATAGTGTCTTAACACCTCAGTAAAGAGATCGGAAGAGCACA CAAGAACAGAATGAAGAAAGTCAGGGGGACTGCAAAGGCCAATGTTGGTGCTGGCAAAAAGAAGGTAACATGTTTTAAGAAACTATGTAGCATAGTGTCTT adrenal_4a_ERR315452_1.fastq.gz GCAAAGGCCAATGTTGGTGCTGGCAAAAAGAAGGTAACATGTTTTAAGGAACTATGTAGCATAGTGTCTTAACACCTCAGTAAAGAGATCGGAAGAGCACA CAAGAACAGAATGAAGAAAGTCAGGGGGACTGCAAAGGCCAATGTTGGTGCTGGCAAAAAGAAGGTAACATGTTTTAAGAAACTATGTAGCATAGTGTCTT What is it that I'm doing wrong here?
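A plausible explanation, offered as a sketch: because the whole command is wrapped in double quotes, the submitting shell expands $i (and consumes the inner quotes) at submission time, so every iteration on the cluster reuses whatever $i last held in your interactive shell, which would match the repeated adrenal_4a_ERR315452 lines. Keeping the loop in a file (loop.sh is a made-up name) leaves all expansion to the execution host:
cat > loop.sh <<'EOF'
for i in *_1.fastq.gz; do
    echo "$i" >> t.txt
    zcat "$i" | grep "GCTGGCAAAAAGAAGGTAACATGTTTT" >> t.txt
    echo >> t.txt
done
EOF
qsub -l h_vmem=4G -cwd -j y -N n_tr -R y loop.sh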
user3138373 (2589 rep)
Jul 9, 2018, 05:55 PM • Last activity: Jul 9, 2018, 06:16 PM
0 votes
1 answer
554 views
using top to identify cpu core number with qsub pbspro
I have a shared memory server with PBSpro installed as the job manager. There are 320 cores total, and PBS is configured so there is 1 job queue having 30 of the 32 cpus, so 300 physical cores to be divided up among users, with 2 cpus or 20 cores left for head-node operation. Some software will run on or parallelize over 100+ cores, and I'd like to verify that N processes from user A are on unique and separate core numbers from the M processes from user B. If I use top I can:
- hit F to get into the sort menu
- hit J to sort by P = Last CPU used (SMP)
This gives me a P column in the top output holding the core number, but on a 300-core system I can only get around 70-80 rows before I run out of screen, on a 1920x1200 monitor. I can start shrinking the font size in the terminal window, but in the end I won't be able to see up to 300 rows, not that I could even visually process all that with top updating every 1..3 seconds. My goal is to quickly and easily:
- verify users are running stuff in the job queue within the correct core numbers, and not on the head node
- verify for a given cpu core that is at 100%, or anything over 50%, that only one process from one user is running on it
I want to make sure that if user A with programA.x is on core #234, user B with anything.x is NOT on core #234. What's the best way to do this, when a single-image shared memory server has many cores?
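A sketch of a non-interactive alternative (the 50% threshold is just an example): ps can report the processor each process is currently assigned to via the psr field, which is easier to script than top and is not limited by screen height.
# core number, pid, owner, %cpu and command for every busy process, sorted by core
ps -eo psr,pid,user,pcpu,comm --sort=psr | awk 'NR == 1 || $4 > 50'
Piping that through awk or sort a second time makes it easy to flag any core that hosts processes from more than one user.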
ron (8647 rep)
Mar 28, 2018, 08:24 PM • Last activity: Apr 17, 2018, 11:44 AM
3 votes
3 answers
2175 views
Are there any disadvantages to using qsub to run tasks all the time?
When I'm running a task on a computer network, that is. I've just started to realize that if I qsub *any* task, then the task won't hog up my terminal, and I can do other things on the same terminal (which is quite useful even if the task only takes a single minute to finish). I then run qstat to see which tasks have finished and which ones haven't. http://pubs.opengroup.org/onlinepubs/009604599/utilities/qsub.html is a good explanation of qsub.
InquilineKea (6432 rep)
Jan 26, 2012, 06:02 AM • Last activity: Sep 12, 2017, 11:53 PM
0 votes
1 answer
394 views
awk not behaving with qsub
I am trying to run a command which is something like this:
intersectBed -a yeast.v2.bed -b cov.txt -wa -wb | awk -v OFS="\t" '{print $7,$8,$9,$6,$11,$10}' > out.txt
out.txt looks like this:
chrI 151006 151096 0
chrI 142253 142619 53
chrI 87387 87500 8
I am working on the cluster, and when I qsub the above command (that is, submit it to the cluster) I get the out.txt file like this:
chrIt151006t151096t0
chrIt142253t142619t53
chrIt87387t87500t8
The command line I am using with qsub is this:
qsub -l h_vmem=4G -cwd -j y -b y -N test "intersectBed -a yeast.v2.bed -b cov.txt -wa -wb | awk -v OFS="\t" '{print \$7,\$8,\$9,\$6,\$11,\$10}' > out.txt"
As you can see, I have to escape each column ($) with a backslash so that the shell does not consider it one of its own variables. But somehow the tab does not work. Can anyone tell me what is going on here? Of course I can use sed 's/t/\t/g' after the awk command, but I need to understand what is going on and why it does not work. Thanks in advance
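A likely culprit, phrased as a sketch: the unescaped inner quotes around \t terminate the outer double-quoted string, so the submitting shell strips the backslash and the remote awk receives OFS=t, a literal t, which is exactly what appears between the columns. Escaping the inner quotes, like the \$ columns, carries \t intact to awk:
qsub -l h_vmem=4G -cwd -j y -b y -N test \
  "intersectBed -a yeast.v2.bed -b cov.txt -wa -wb | awk -v OFS=\"\t\" '{print \$7,\$8,\$9,\$6,\$11,\$10}' > out.txt"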
user3138373 (2589 rep)
Mar 15, 2017, 04:01 PM • Last activity: Mar 15, 2017, 04:30 PM