Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

1 vote
1 answer
162 views
Shell Variable Expansion in qsub command through drmaa
I am running a bulk job submission to SGE (Sun Grid Engine) using python drmaa bindings. For the bulk submission I am submitting a Python script that takes one argument and is command-line executable via a shebang. To properly parameterize the bulk submission I am setting environment variables, propagated to the Python script through the -v option. I am trying to do an indirect variable expansion in my zsh environment based on the $TASK_ID/$SGE_TASK_ID environment variable that SGE exports during job submission. As a minimal reproducible example of the indirect variable expansion, I am trying to do something like this, which works in my shell:
export foo1=2
export num=1

echo $(tmp=foo$num; echo ${(P)tmp})
which produces 2.

The example script, job_script.py:
#! /usr/bin/python
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("input_path", type=os.path.realpath)

def main(input_path):
    # do stuff
    ...

if __name__ == "__main__":
    args = parser.parse_args()  # must be called, not referenced
    input_path = args.input_path
    main(input_path)
The example drmaa submission script:
import os

# add path to libs
os.environ["DMRAA_LIBRARY_PATH"] = "path to DMRAA shared object"
os.environ["SGE_ROOT"] = "path to SGE root directory"
import drmaa

input_dir_suffixes = [1, 2, 5, 7, 10, 11]

INPUT_BASE_DIR = "/home/mel/input_data"

base_qsub_options = {
    "P": "project",
    "q": "queue",
    "b": "y", # means is an executable
    "shell": "y", # start up shell
}
native_specification = " ".join(f"-{k} {v}" for k,v in base_qsub_options.items())
remote_command = "job_script.py"

num_task_ids = len(input_dir_suffixes)
task_start = 1
task_stop = num_task_ids + 1
task_step = 1
task_id_zip = zip(range(1, num_task_ids + 1), input_dir_suffixes) 
task_id_env_vars = {
   f"TASK_ID_{task_id}_SUFFIX": str(suffix) for task_id, suffix in task_id_zip 
}

io_task_id = r"$(tmp=SUFFIX_TASK_ID_$TASK_ID; echo ${(P)tmp})"
arg_task_id = r"$(tmp=SUFFIX_TASK_ID_$SGE_TASK_ID; echo ${(P)tmp})"

with drmaa.Session() as session:
    
    template = session.createJobTemplate()
    template.nativeSpecification = native_specification
    template.remoteCommand = remote_command
    template.jobEnvironment = task_id_env_vars
    template.outputPath = f":{INPUT_BASE_DIR}/output/{io_task_id}.o"
    template.errorPath = f":{INPUT_BASE_DIR}/error/{io_task_id}.e"

    args_list = [f"{INPUT_BASE_DIR}/data{arg_task_id}"]
    template.args = args_list
    session.runBulkJobs(template, task_start, task_stop - 1, task_step)
    session.deleteJobTemplate(template)
Apologies if there is a syntax error; I had to hand-copy this, as it's on a different system. With the submission done, if I do a qstat -j on the job number I get the following settings displayed:
sge_o_shell:         /usr/bin/zsh
stderr_path_list:    NONE::/home/mel/input_data/error_log/$(tmp=SUFFIX_TASK_ID_$TASK_ID; echo ${(P)tmp}).e
stdout_path_list:    NONE::/home/mel/input_data/output_log/$(tmp=SUFFIX_TASK_ID_$TASK_ID; echo ${(P)tmp}).o
job_args:            /home/mel/input_data/data$(tmp=SUFFIX_TASK_ID$SGE_TASK_ID; echo ${(P)tmp})
script_file:         job_script.py

env_list: 
SUFFIX_TASK_ID_1=1,SUFFIX_TASK_ID_2=2,SUFFIX_TASK_ID_3=5,SUFFIX_TASK_ID_4=7,SUFFIX_TASK_ID_5=10,SUFFIX_TASK_ID_6=11
The error and output logs do get created, but there is only a partial expansion. Examples:
$(tmp=SUFFIX_TASK_ID1; echo ${(P)tmp}).e
$(tmp=SUFFIX_TASK_ID1; echo ${(P)tmp}).o
If we cat the error logs we see Illegal variable name. Is what I am trying to do possible? I am presuming something somewhere is not invoking my zsh correctly.
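A minimal diagnostic sketch (my own addition, not from the question): ${(P)name} is zsh-only syntax, so one first step is to submit a trivial probe job and confirm which shell actually executes the command on the node:

#!/bin/sh
# probe.sh - report which interpreter SGE hands the command to
echo "argv0: $0"
ps -p $$ -o pid= -o comm=

If the .o file reports sh or csh rather than zsh, the ${(P)tmp} expansion can never work there; csh in particular rejects such syntax with exactly the "Illegal variable name" error seen in the logs.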
Melendowski (111 rep)
May 31, 2023, 10:09 PM • Last activity: Jun 4, 2023, 11:45 AM
0 votes
0 answers
32 views
Small TaskQueue shared on two computers
There are two computers with 12 physical cores each. I want to set up computers A and B such that:

- A will accept jobs (via ssh) and distribute them among A and B (more or less intelligently)
- if possible, I'd like to block 4 cores on each computer as a "personal requirement"

Jobs are expected to be either Python scripts or executables written in C++ (which can involve MPI code). I have read of Slurm and the Sun Grid Engine, but those seem a bit too powerful/complicated for this use case (I don't want to spend a week reading how to do it and troubleshooting). Is there an easier solution that satisfies the requirements?
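One lightweight possibility (my suggestion, not from the question) is GNU Parallel as a minimal cross-machine queue; the n/host form caps each host at n concurrent jobs, which leaves 4 of the 12 cores free:

# sketch: run a list of job scripts, at most 8 at a time per machine
# (hostname and job scripts are placeholders; assumes a shared filesystem,
# otherwise see parallel's --transferfile option)
parallel -S 8/localhost -S 8/computerB --wd . ::: ./job1.sh ./job2.sh ./job3.sh

Accepting jobs over ssh would still need a small wrapper, e.g. appending commands to a file that a long-running parallel invocation consumes.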
infinitezero (207 rep)
Mar 14, 2022, 03:12 PM • Last activity: Mar 14, 2022, 04:38 PM
0 votes
2 answers
1592 views
Syntax for number of cores in a Sun Grid Engine job file
I want to use the HPC of my university to qsub an array job of **3** tasks. Each task runs a Matlab code which uses a solver (MOSEK) that exploits multiple **threads** to solve an optimization problem. A parameter can control the number of threads we want the solver to use. The maximum number of threads allowed should never exceed the number of cores. Suppose I want the solver to use **4 threads**. Hence, I should ensure that each task is assigned to a machine with at least 4 cores free. How can I request that in the bash file? How should I count, in turn, the memory usage (i.e., should I declare the memory per core or the total memory)? At the moment this is my bash file:

#$ -S /bin/bash
#$ -l h_vmem=18G
#$ -l tmem=18G
#$ -l h_rt=480:0:0
#$ -cwd
#$ -j y

#Run 3 tasks
#$ -t 1-3
#$ -N try

date
hostname

#Output the Task ID
echo "Task ID is $SGE_TASK_ID"

matlab -nodisplay -nodesktop -nojvm -nosplash -r "main_1; ID = $SGE_TASK_ID; f_1; exit"
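A hedged sketch of the usual approach (PE names are site-specific; qconf -spl lists what exists): cores are requested as slots through a parallel environment, and per-slot limits such as h_vmem then apply to each slot, so totals get divided by the slot count:

#$ -pe smp 4        # 4 slots on one host; "smp" is an assumed PE name
#$ -l h_vmem=4.5G   # h_vmem is commonly enforced per slot: 4 x 4.5G = 18G total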
Star (125 rep)
Sep 9, 2020, 01:50 PM • Last activity: Jun 26, 2021, 04:21 PM
0 votes
1 answer
807 views
Syntax for memory request in a Sun Grid Engine job file
I'm submitting a Matlab job in the cluster of my university using qsub after having logged in a node using ssh. The job runs out of memory. This is the advice I received to fix my issue: "**Possible solutions are run on a bigger machine or buy more RAM**." What does this mean in practice for my bash file? Which lines of the bash file control the size of the machine or the RAM? At the moment, in my bash file (see below) I request vmem and tmem. Is either of these RAM?

#$ -S /bin/bash
#$ -l h_vmem=18G
#$ -l tmem=18G
#$ -l h_rt=480:0:0
#$ -cwd
#$ -j y

#Run 600 tasks where each task has a different $SGE_TASK_ID ranging from 1 to 600
#$ -t 1-600
#$ -N try

date
hostname

#Output the Task ID
echo "Task ID is $SGE_TASK_ID"

matlab -nodisplay -nodesktop -nojvm -nosplash -r "main_1; ID = $SGE_TASK_ID; f_1; exit"
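For orientation, a hedged note (site configurations differ): h_vmem is usually a hard cap on the job's virtual memory, i.e. effectively the RAM request, and the job is killed when it exceeds it; tmem is a site-specific consumable typically kept equal to it. A sketch of the change the advice implies:

#$ -l h_vmem=32G   # raise the hard virtual-memory limit (32G is an assumed value)
#$ -l tmem=32G     # keep the site's memory consumable in step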
Star (125 rep)
Sep 9, 2020, 11:01 AM • Last activity: Sep 9, 2020, 02:24 PM
1 vote
0 answers
77 views
Determine slot ID for a running job
On a compute node with multiple slots, are the running jobs each explicitly assigned a slot ID as they start, and if so how can the user or submission script see it? To see the job ID, one can use the $JOB_ID environment variable within the submission script. What about the slot number? I looked for slot information using qstat -j but the information about the job does not contain any information about which of the slots the job is using. I was hoping there would be an integer variable related to the slot number. EDIT: in the general case, a job might be assigned multiple slots if it is parallelized, so in this case the list of slot IDs could be determined.
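As far as I know, standard Grid Engine does not expose a per-slot ID, but it does export placement information; a sketch of what a job script can inspect (standard SGE environment variables):

echo "job $JOB_ID was granted $NSLOTS slot(s)"
# for parallel jobs, the granted hosts and slot counts per host are listed here:
[ -n "$PE_HOSTFILE" ] && cat "$PE_HOSTFILE"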
feedMe (219 rep)
Feb 6, 2019, 11:47 AM • Last activity: Feb 7, 2019, 09:27 AM
0 votes
1 answer
240 views
Accessing Job ID during gridengine submission
I am using a bash script to submit jobs to gridengine. Is there a way for the script to know the job ID assigned to it by the scheduler?
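Two sketches, one for each side (the -terse flag exists in SGE-derived engines): inside the running job the scheduler exports $JOB_ID, and at submission time qsub -terse prints only the job ID so the submitting script can capture it:

jobid=$(qsub -terse job.sh)   # submitting side
echo "submitted as job $jobid"
# inside job.sh itself:
echo "I am job $JOB_ID"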
feedMe (219 rep)
Feb 7, 2019, 08:30 AM • Last activity: Feb 7, 2019, 09:20 AM
1 vote
2 answers
2206 views
Qsub to any node with more than n cores available
I have a program that is parallelized using MPI. It thinks that it is able to run across multiple nodes on our (CentOS 6.6)-based HPC grid, when in actual fact it only runs successfully on multiple cores *of the same compute node*. E.g. if I qsub a job to the grid asking for 20 cores, and Grid Engine decides to split it over two different nodes, the program fails. However, if there is a node with 20 cores available, and Grid Engine sends it all to that one, the program runs successfully. The qsub script contains the command

#$ -pe mpi 20

to select the number of cores. So at the moment, I do a qstat -f -u "*" to manually identify a compute node with 20 available cores, and submit to that node with

qsub -q general.q@node-X-X

What I am looking for is a way to tell Grid Engine to wait and only submit the job to a single compute node that has the required number of available cores. This will allow me to automate my job submission. I am considering writing a bash script to parse the qstat -f -u "*" output, but there must be a more elegant solution. I have looked through the qsub manual but am unable to find a suitable flag or command line argument. I'm not able to modify the program itself at this time and I am not a system administrator. Here is some information on the different software versions I have available:

ompi_info | grep gridengine
MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.2)

Grid engine version is: OGS/GE 2011.11p1
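One possibility worth checking (an assumption about site configuration, not something given in the question): how slots are spread across hosts is decided by the parallel environment's allocation_rule, and a PE with allocation_rule $pe_slots packs all requested slots onto a single host:

qconf -sp mpi    # show the PE definition; look at the allocation_rule line
qconf -spl       # list all PEs; sites often provide one (e.g. "smp") with
                 # allocation_rule $pe_slots for single-node jobs

If such a PE exists, requesting #$ -pe smp 20 would make the scheduler hold the job until one node has 20 free slots.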
feedMe (219 rep)
May 15, 2017, 09:15 AM • Last activity: Oct 22, 2018, 09:52 AM
0 votes
1 answer
74 views
SSH connections difficulties
I'm using the RED HAT 5.9 OS on my grid, having 3 machines: 1 head node (known as ilmn-qm.ilmn) and 2 compute nodes (aka compute-00-00 and compute-00-01). **Problem is that I can't use SSH from either one of the compute nodes.** I tried:

1) SSH FROM and TO the head node works perfectly.
2) SSH from the head node to the compute nodes works.
3) Vice versa, SSH from the compute nodes to the head node works as well.
4) The head node is defined as gateway:

[root@compute-00-01 ~]# route
Kernel IP routing table
Destination   Gateway        Genmask         Flags Metric Ref  Use Iface
172.20.22.0   *              255.255.255.0   U     0      0      0 eth1
172.20.20.0   *              255.255.255.0   U     0      0      0 eth0
169.254.0.0   *              255.255.0.0     U     0      0      0 eth0
default       ilmn-qm.ilmn   0.0.0.0         UG    0      0      0 eth0

5) I've checked that IPv4 forwarding is enabled on the head node:

cat /etc/sysctl.conf
# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled. See sysctl(8) and
# sysctl.conf(5) for more details.

# Controls IP packet forwarding
net.ipv4.ip_forward = 1

# Controls source route verification
net.ipv4.conf.default.rp_filter = 1

# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0

# Controls whether core dumps will append the PID to the core filename
# Useful for debugging multi-threaded applications
kernel.core_uses_pid = 1

# Controls the use of TCP syncookies
net.ipv4.tcp_syncookies = 1

# Controls the maximum size of a message, in bytes
kernel.msgmnb = 65536

# Controls the default maxmimum size of a mesage queue
kernel.msgmax = 65536

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296

and yet any ssh attempt ends up with:

ssh: connect to host 132.68.107.69 port 22: Connection timed out

From the head node:

root@ilmn-qm ~ # ip a show
1: lo: mtu 16436 qdisc noqueue
   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
   inet 127.0.0.1/8 scope host lo
   inet6 ::1/128 scope host
      valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
   link/ether f0:4d:a2:0b:2d:b9 brd ff:ff:ff:ff:ff:ff
   inet 132.68.106.1/28 brd 132.68.106.15 scope global eth0
   inet6 fe80::f24d:a2ff:fe0b:2db9/64 scope link
      valid_lft forever preferred_lft forever
3: eth1: mtu 1500 qdisc pfifo_fast qlen 1000
   link/ether f0:4d:a2:0b:2d:bb brd ff:ff:ff:ff:ff:ff
   inet 172.20.20.5/24 brd 172.20.20.255 scope global eth1
   inet6 fe80::f24d:a2ff:fe0b:2dbb/64 scope link
      valid_lft forever preferred_lft forever
4: eth2: mtu 1500 qdisc pfifo_fast qlen 1000
   link/ether f0:4d:a2:0b:2d:bd brd ff:ff:ff:ff:ff:ff
   inet 172.20.21.2/24 brd 172.20.21.255 scope global eth2
   inet6 fe80::f24d:a2ff:fe0b:2dbd/64 scope link
      valid_lft forever preferred_lft forever
5: eth3: mtu 1500 qdisc noop qlen 1000
   link/ether f0:4d:a2:0b:2d:bf brd ff:ff:ff:ff:ff:ff
6: sit0: mtu 1480 qdisc noop
   link/sit 0.0.0.0 brd 0.0.0.0

root@ilmn-qm ~ # ip route show
132.68.106.0/28 dev eth0 proto kernel scope link src 132.68.106.1
172.20.21.0/24 dev eth2 proto kernel scope link src 172.20.21.2
172.20.20.0/24 dev eth1 proto kernel scope link src 172.20.20.5
169.254.0.0/16 dev eth2 scope link
default via 132.68.106.14 dev eth0

From compute-00-00:

[root@compute-00-00 ~]# ip a show
1: lo: mtu 16436 qdisc noqueue
   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
   inet 127.0.0.1/8 scope host lo
   inet6 ::1/128 scope host
      valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
   link/ether f0:4d:a2:0b:2d:c2 brd ff:ff:ff:ff:ff:ff
   inet 172.20.20.6/24 brd 172.20.20.255 scope global eth0
   inet6 fe80::f24d:a2ff:fe0b:2dc2/64 scope link
      valid_lft forever preferred_lft forever
3: eth1: mtu 1500 qdisc pfifo_fast qlen 1000
   link/ether f0:4d:a2:0b:2d:c4 brd ff:ff:ff:ff:ff:ff
   inet 172.20.22.6/24 brd 172.20.22.255 scope global eth1
4: eth2: mtu 1500 qdisc pfifo_fast qlen 1000
   link/ether f0:4d:a2:0b:2d:c6 brd ff:ff:ff:ff:ff:ff
5: eth3: mtu 1500 qdisc pfifo_fast qlen 1000
   link/ether f0:4d:a2:0b:2d:c8 brd ff:ff:ff:ff:ff:ff
6: sit0: mtu 1480 qdisc noop
   link/sit 0.0.0.0 brd 0.0.0.0

[root@compute-00-00 ~]# ip route show
172.20.22.0/24 dev eth1 proto kernel scope link src 172.20.22.6
172.20.20.0/24 dev eth0 proto kernel scope link src 172.20.20.6
169.254.0.0/16 dev eth1 scope link
default via 172.20.20.5 dev eth0

From compute-00-01:

[root@compute-00-01 ~]# ip a show
1: lo: mtu 16436 qdisc noqueue
   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
   inet 127.0.0.1/8 scope host lo
   inet6 ::1/128 scope host
      valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
   link/ether 84:2b:2b:f9:9e:11 brd ff:ff:ff:ff:ff:ff
   inet 172.20.20.7/24 brd 172.20.20.255 scope global eth0
   inet6 fe80::862b:2bff:fef9:9e11/64 scope link
      valid_lft forever preferred_lft forever
3: eth1: mtu 1500 qdisc pfifo_fast qlen 1000
   link/ether 84:2b:2b:f9:9e:13 brd ff:ff:ff:ff:ff:ff
   inet 172.20.22.7/24 brd 172.20.22.255 scope global eth1
4: eth2: mtu 1500 qdisc pfifo_fast qlen 1000
   link/ether 84:2b:2b:f9:9e:15 brd ff:ff:ff:ff:ff:ff
5: eth3: mtu 1500 qdisc pfifo_fast qlen 1000
   link/ether 84:2b:2b:f9:9e:17 brd ff:ff:ff:ff:ff:ff
6: sit0: mtu 1480 qdisc noop
   link/sit 0.0.0.0 brd 0.0.0.0

[root@compute-00-01 ~]# ip route show
172.20.22.0/24 dev eth1 proto kernel scope link src 172.20.22.7
172.20.20.0/24 dev eth0 proto kernel scope link src 172.20.20.7
169.254.0.0/16 dev eth0 scope link
default via 172.20.20.5 dev eth0
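Since ip_forward only routes packets and the compute nodes sit on private 172.20.x addresses, replies from public hosts such as 132.68.x have no route back to them unless the head node also performs NAT. A hedged sketch of the commonly missing piece (to be verified against the actual firewall setup):

# on the head node: masquerade compute-node traffic leaving the public interface
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
service iptables save   # persist across reboots on RHEL 5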
hamaor (3 rep)
Feb 6, 2018, 11:52 AM • Last activity: Feb 16, 2018, 11:50 AM
0 votes
1 answer
622 views
Grid engine/cluster management and job scheduler for Debian/ubuntu
I need to perform a large amount of computation on something resembling a cluster; the hardware and the OS are identical (the OS is Ubuntu) but no central management software or grid engine is installed. Web searches turn up mostly outdated or proprietary software. I hope my question is not too general, but: what are the cluster management and job scheduling options for Debian and its derivatives? For the general management of the cluster I use cssh, but this approach is not very efficient when it comes to job scheduling and monitoring. I have experience using the venerable Sun Grid Engine, RIP. Thanks for reading this!
lazaraza (3 rep)
Jul 15, 2017, 09:18 AM • Last activity: Aug 5, 2017, 09:44 PM
1 vote
1 answer
173 views
Stack screen output into columns to make use of screen width and avoid scrolling
I often use gridengine's qstat command on our HPC cluster, but since I have many jobs running on the cluster the output is too long to fit on my screen, and I end up doing a lot of scrolling to see the upper section of the output. My terminal has enough space for two columns, so it would be nice if the output could flow into columns and be shown side by side.

**Example using simple data file:** Obviously this should be general to any screen output, so to illustrate, here is a simpler example. My file data1.txt contains 100 lines of "This is a test".

>> cat data1.txt
This is a test
This is a test
This is a test
This is a test
(etc. until 100th line)
>>

**Desired output:**

>> cat data1.txt | something | something_else -n 2
This is a test    This is a test
This is a test    This is a test
This is a test    This is a test
This is a test    This is a test
(etc. until 50 rows)

Of course, it would be nice to specify any arbitrary number of columns. The only similar question/answer that I found was this one, but I'm hoping there is a simpler way to do this in one line using pipes and no script files.
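A one-line sketch with a standard tool: pr reflows stdin into a fixed number of columns (filling down, then across), with -t suppressing its page headers:

cat data1.txt | pr -2 -t -w "$COLUMNS"   # 2 columns; assumes $COLUMNS is exported
qstat | pr -3 -t -w 240                  # any arbitrary column count and width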
feedMe (219 rep)
Feb 22, 2017, 04:12 PM
1 vote
1 answer
268 views
Generalising Grid Engine qsub job file for multiple programs and input file names
I am using Grid Engine on a Linux cluster. I am running many jobs with different programs and different input files. I don't want to create multiple specific job scripts for each pair of program and input file. Instead I want to be able to specify the program name and the input file on the qsub line. Therefore I can use

qsub job.sh <program> <input file>

where job.sh takes two arguments. This works fine. But there is another twist: my programs are located in a very, very long directory which I don't want to type every time I submit a job - so aliases are an obvious choice. So I want to do something like

qsub job.sh <program alias> <input file>

I initially set the alias in my .bashrc but was getting the error:

: command not found

So I set the alias in submit.sh. But I am getting the same error. Thoughts on how I can get the command qsub job.sh $1 $2 to accept aliases also?
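Aliases are only expanded in interactive shells, so neither qsub nor the job script will see them; a hedged sketch of the usual substitute, an exported variable (names are placeholders):

# in ~/.bashrc:
export PROGDIR=/the/very/very/long/path/to/programs
# then submit with:
qsub job.sh "$PROGDIR/myprog" input.dat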
cyuut (11 rep)
Jan 19, 2017, 07:11 PM • Last activity: Jan 22, 2017, 05:04 PM
0 votes
1 answer
182 views
Grid Engine for program that needs X11 but doesn't require user input
I have a bash script that calls an executable (some commercial software) in "batch mode". On the command line, if X is available the program runs to completion and then quits, but if not, the program hangs. I think this because:

- It works over VNC
- It doesn't work over ssh if ssh -X has not been specified
- It works over ssh if -X has been specified
- It doesn't work with Grid Engine. When I qsub the script it just stays on status 'r' indefinitely and I cannot see any output in the .sh.o.XXX or .sh.e.XXX files

The upshot is, I want to submit this script to Grid Engine, but I can't! The program never asks for user input when in the so-called "batch mode". Is there some way to provide an X environment in Grid Engine, just to allow the program to complete on its own? I guess one problem is that, since I cannot see the source code, it is difficult to see exactly what the program is asking for.
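A common workaround is a virtual framebuffer, which satisfies programs that merely open a display without showing anything; a sketch assuming Xvfb is installed on the compute nodes (the executable name is a placeholder):

# inside the submitted job script:
xvfb-run -a ./commercial_program -batch input.cfg

xvfb-run starts a throwaway X server, points DISPLAY at it, and tears it down when the program exits.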
feedMe (219 rep)
Jun 15, 2016, 08:03 AM • Last activity: Dec 15, 2016, 10:30 AM
0 votes
0 answers
345 views
submitting a job array script on SGE
I am trying to make a job array script to do a particular task for several files. Let's assume as a start only 2 fastq files, named abc.fastq and def.fastq.

#!/bin/bash
file=$(ls -1 *.fastq | tail -n +${SGE_TASK_ID} | head -1)
filename=${file%.fastq}
awk 'NR % 2 == 0{print substr($1,7,100)};NR % 2 ==1' $file > ${filename}_BR.fastq

I submitted the script as:

qsub -t 1-2:1 -cwd -j y -N array_job ./jobarray.sh

But only 1 file is processed, which is abc.fastq. What happened to the def.fastq file? I provided the -t parameter with 2 jobs, and SGE_TASK_ID is used in my script. Hope to hear from you guys soon.
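A small diagnostic sketch (my addition): log the task ID and the selected file into each task's output, and pick the Nth file with sed, which reads slightly more directly than tail | head:

echo "task $SGE_TASK_ID running on $(hostname)"
file=$(ls -1 *.fastq | sed -n "${SGE_TASK_ID}p")
echo "selected file: $file"

If the second task's .o file never appears at all, the task was not scheduled, which points at the qsub call rather than the script.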
user3138373 (2589 rep)
Sep 2, 2015, 04:36 PM • Last activity: Aug 25, 2016, 01:39 AM
1 vote
2 answers
6913 views
How do I check if a job is running on a cluster using the job name (CentOS)
I am running a bash script to submit multiple jobs. A job should only be submitted if it is not already running or queued. I want to use an if statement inside my bash script to simply check if "job123" is already running or in the queue. I have tried different options with qstat and qstatus but I can't seem to check by job name. How can this information be retrieved? Also, these outputs are just strings; I did not have any luck using grep either, but I think there must be a specific command.
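A sketch of the name-based check (assuming a Grid Engine whose qstat -j accepts a job name and returns non-zero when no such job exists; note that plain qstat truncates long names, which is why grepping it often fails):

if qstat -j "job123" >/dev/null 2>&1; then
    echo "job123 already queued or running - skipping"
else
    qsub -N job123 job123.sh
fi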
Herman Toothrot (353 rep)
Aug 1, 2016, 03:00 PM • Last activity: Aug 2, 2016, 10:56 AM
0 votes
1 answer
2177 views
How to tell the memory usage of each background job
I am working on SGE, and I am logged on to it. I use qlogin -l mf=30G so as to get onto one compute node. I am running 2 jobs on this compute node in the background:

[1] 4408 Running /apps1/sratoolkit/2.3.5-2/bin/fastq-dump --split-files SRR1660.sra &
[2] 4415 Running /apps1/sratoolkit/2.3.5-2/bin/fastq-dump --split-files SRR1661.sra &

I want to know how much memory each of my background jobs is consuming out of the 30G I assigned in the beginning. Is there a command to find that out? Thanks
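A sketch using ps (standard options; RSS is resident memory actually in RAM, VSZ the virtual size, both in KiB):

ps -o pid,rss,vsz,comm -p 4408,4415
# or watch the values update every 5 seconds:
watch -n 5 'ps -o pid,rss,vsz,comm -p 4408,4415'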
user3138373 (2589 rep)
Mar 24, 2015, 04:32 PM • Last activity: Mar 24, 2015, 04:41 PM
3 votes
1 answer
1266 views
stdout redirect. sh: resource temporarily unavailable
I have large batches of bash processes. Each bash script invokes executables which have their stdout redirected to distinct log files. About 5% of the runs end up with:

_sh: [name of log]: Resource temporarily unavailable_

I tried to reduce the number of jobs running in parallel, but the error still persisted on some of the bash scripts.

### Additional info: ###

- Ubuntu 14.04 LTS running on a VM using ESXi
- Happens on a new partition, allocated with gparted and LVM (new logical volume consisting of the entire partition)
- The LV is exported using nfs-kernel-server
- The LV is also shared to Windows using Samba
- The LV is formatted using ext4
- I have admin rights on this machine

### More detailed info ###

- Everything is run in a cluster, using Sun-Grid-Engine
- There are 4 virtual machines: m1, m2, m3, m4
- m1 runs sge master, sge exec, and ldap server
- m2, m3, m4 run sge exec
- m3 runs nfs-kernel-server, exporting a _home_ folder sitting in a logical volume (using LVM) that uses a partition on a local disk, to m1, m2, m4
- m3 has a soft link to the _home_ folder
- m1, m2, m4 mount the _home_ folder through fstab, so all machines end up pointing to the same _home_ folder
- m3, m2, m4 run ldap clients, connecting to m1
- All jobs are submitted to the cluster through m1 (configured as a submission host)
- Jobs fail exclusively on m3 (which exports the disk). Most of the jobs on m3 pass, though. Failures are random, but consistently on m3 alone.
- m3 also shares the _home_ via Samba to Windows clients

Any help would be greatly appreciated :) (how to debug, which logs are relevant, how to get more info out of the system, etc...) Thank you in advance!
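"Resource temporarily unavailable" is errno EAGAIN; a sketch for pinpointing which syscall returns it (my addition, standard strace usage):

# run one failing batch script under strace, following children, and keep the log
strace -f -o /tmp/batch.trace sh ./batch_script.sh
grep -n EAGAIN /tmp/batch.trace

Seeing whether the EAGAIN comes from open, write, or fork narrows the cause down to file-handle limits, NFS behaviour, or process limits respectively.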
lev haikin (131 rep)
Dec 31, 2014, 07:47 AM • Last activity: Jan 5, 2015, 09:56 AM
2 votes
2 answers
2093 views
Remotely compile and run program using ssh and screen
I'm trying to compile and run a program remotely. However, I'd like to do this within a screen, and I'd also like to run this using grid engine on another node after I ssh. Currently I have:

ssh me@server screen -R session 'qlogin; cd path; mvn options program'

This basically works, but I get a message saying that I must be connected to a terminal. I read about this and added the -t option to ssh. With that, my command breaks: it seems like I ssh over, screen starts, then it doesn't know about the "mvn" command and terminates my session. I'm wondering why this is happening and how to correctly launch jobs from my local machine, within a screen, on a remote node while using grid engine.
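Part of the problem is that qlogin opens its own interactive session, so cd and mvn run only after it exits, back on the login node. A hedged sketch using qrsh (Grid Engine's non-interactive remote shell) inside a detached screen (paths and options are placeholders):

ssh me@server "screen -dmS session qrsh 'cd path && mvn options program'"
# reattach later to watch it:
ssh -t me@server screen -r session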
akobre01 (121 rep)
Aug 8, 2013, 10:35 PM • Last activity: Sep 18, 2014, 04:20 AM
2 votes
1 answer
10558 views
/usr/bin/xterm Xt error: Can't open display: /usr/bin/xterm: DISPLAY is not set?
I'm trying to submit a job to a school server (HPC) with:

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -o ./out_$JOB_ID.txt
#$ -e ./err_$JOB_ID.txt
#$ -notify
#$ -pe orte 1

date
pwd

##################################
RESULT_DIR=~/Results
SCRIPT_FILE=sample_job
##################################

. /etc/profile
. /etc/bashrc

module load packages/comsol/4.4
module load packages/matlab/r2012b

comsol server matlab "sample_job, exit" -nodesktop -mlnosplash

/bin/uname -a
mkdir $RESULT_DIR/$name
cp *.csv $RESULT_DIR/$name

The job aborts saying:

Sun Jun 8 14:20:21 EDT 2014
COMSOL 4.4 (Build: 150) started listening on port 2036
Use the console command 'close' to exit the program
/usr/bin/xterm Xt error: Can't open display:
/usr/bin/xterm: DISPLAY is not set
Program_did_not_exit_normally
Exception:
    com.comsol.util.exceptions.FlException: Program did not exit normally
Messages:
    Program did not exit normally
Stack trace:
    at com.comsol.mli.application.a.a(Unknown Source)
    at com.comsol.mli.application.MatlabApplication.doStart(Unknown Source)
    at com.comsol.util.application.ComsolApplication.doStart(Unknown Source)
    at com.comsol.util.application.ComsolApplication.doRun(Unknown Source)
    at com.comsol.bridge.Bridge$2.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
ERROR: Could not start COMSOL Application.
See log file: /home/.comsol/v44/logs/server2.log
java.lang.IllegalStateException: Shutdown in progress
    at java.lang.ApplicationShutdownHooks.add(Unknown Source)
    at java.lang.Runtime.addShutdownHook(Unknown Source)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:699)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:322)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:451)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at com.comsol.util.application.ServerApplication.a(Unknown Source)
    at com.comsol.util.application.ServerApplication.a(Unknown Source)
    at com.comsol.util.application.ServerApplication.a(Unknown Source)
    at com.comsol.util.application.ServerApplication.main(Unknown Source)

What might be the reason, and how should I fix it?
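The trace shows COMSOL trying to launch MATLAB through /usr/bin/xterm, which needs an X display that batch nodes don't have. A hedged sketch of one workaround, giving the job a dummy display via Xvfb (assuming Xvfb is installed on the compute nodes):

# before the comsol line in the job script:
Xvfb :99 -screen 0 1024x768x16 &
export DISPLAY=:99
comsol server matlab "sample_job, exit" -nodesktop -mlnosplash
kill %1   # stop the dummy X server afterwards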
Sibbs Gambling (1746 rep)
Jun 8, 2014, 06:33 PM • Last activity: Jun 8, 2014, 09:58 PM
3 votes
1 answer
2112 views
State meanings of compute nodes
I submitted a job to a Linux cluster which uses the SGE job scheduler. The job state is qw for a long time, so I inspected the states of the compute nodes using qstat -f. I found that many nodes were labelled with states "d", "adu" and "E". I wonder what these states mean. The Grid Engine man pages list these states for filtering queue instances ( -qs {a|c|d|o|s|u|A|C|D|E|S} ), but give no further explanation of their meaning. What do the states mean?
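For reference, the scheduler can explain some of these itself; a sketch (the -explain option exists in SGE-derived engines, and the state letters combine, e.g. "adu" is three states at once):

qstat -f -explain E   # annotate Error-state queues with the reason; also accepts a, A, c
qstat -j 12345        # placeholder job id: the "scheduling info" section shows why it sits in qw
# common state letters (standard SGE; worth verifying against the local man page):
#   a = load-threshold alarm    d = disabled by an admin (qmod -d)
#   u = host unknown/unreachable (execd not reporting)
#   E = error state             s = suspended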
Dejian (828 rep)
May 16, 2014, 01:53 PM • Last activity: May 16, 2014, 02:10 PM
2 votes
1 answer
8156 views
What is the difference between qsub and ./
Can anyone tell me the difference between the following ways of submitting a script:

$ qsub script_name.sh

and

./script_name.sh

What are the differences between the above two ways of submitting a job to a cluster? Also, how come I sometimes need to type

$ chmod +x script_name.sh

...before I can type ./script_name.sh to submit a job? How come sometimes I just need to type qsub script_name.sh? Sorry, I'm not very familiar with Unix.
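A sketch contrasting the two (standard Unix/SGE behaviour): direct execution runs the script immediately on the current machine and needs the execute bit, while qsub hands the file to the scheduler, which runs it later on a compute node, so no execute bit is required:

chmod +x script_name.sh   # needed once before direct execution
./script_name.sh          # runs now, here, as an ordinary program
qsub script_name.sh       # queued; Grid Engine runs it on some node later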
john_w (153 rep)
Feb 25, 2014, 11:00 PM • Last activity: Feb 25, 2014, 11:18 PM