Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
1
votes
1
answers
162
views
Shell Variable Expansion in qsub command through drmaa
I am running a bulk job submission to SGE (Sun Grid Engine) using the python drmaa bindings.
For the bulk submission I am submitting a Python script that takes one argument and is command-line executable through a shebang. To properly parameterize the bulk submission I am setting environment variables that propagate to the Python script through the `-v` option. I am trying to do an indirect variable expansion in my zsh environment based on the `$TASK_ID`/`$SGE_TASK_ID` environment variable that SGE exports during job submission.
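For reference, this is roughly what the `-v` propagation looks like on a plain qsub line (a hedged sketch; the variable names anticipate the ones set in the drmaa script below):
# Sketch: pass environment variables with -v and submit an array
# of 6 tasks with -t, mirroring what the drmaa template does.
qsub -t 1-6 -v SUFFIX_TASK_ID_1=1,SUFFIX_TASK_ID_2=2 job_script.py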
As a minimal reproducible example of the indirect variable expansion I am trying to do something like this, which works in my shell.
export foo1=2
export num=1
echo $(tmp=foo$num; echo ${(P)tmp})
which produces `2`.
The example script `job_script.py`:
#!/usr/bin/python
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("input_path", type=os.path.realpath)

def main(input_path):
    # do stuff
    ...

if __name__ == "__main__":
    args = parser.parse_args()
    input_path = args.input_path
    main(input_path)
The example drmaa submission script:
import os
# add path to libs
os.environ["DRMAA_LIBRARY_PATH"] = "path to DRMAA shared object"
os.environ["SGE_ROOT"] = "path to SGE root directory"
import drmaa

input_dir_suffixes = [1, 2, 5, 7, 10, 11]
INPUT_BASE_DIR = "/home/mel/input_data"
base_qsub_options = {
    "P": "project",
    "q": "queue",
    "b": "y",      # command is an executable
    "shell": "y",  # start up a shell
}
native_specification = " ".join(f"-{k} {v}" for k, v in base_qsub_options.items())
remote_command = "job_script.py"
num_task_ids = len(input_dir_suffixes)
task_start = 1
task_stop = num_task_ids + 1
task_step = 1
task_id_zip = zip(range(1, num_task_ids + 1), input_dir_suffixes)
task_id_env_vars = {
    f"SUFFIX_TASK_ID_{task_id}": str(suffix) for task_id, suffix in task_id_zip
}
io_task_id = r"$(tmp=SUFFIX_TASK_ID_$TASK_ID; echo ${(P)tmp})"
arg_task_id = r"$(tmp=SUFFIX_TASK_ID_$SGE_TASK_ID; echo ${(P)tmp})"
with drmaa.Session() as session:
    template = session.createJobTemplate()
    template.nativeSpecification = native_specification
    template.remoteCommand = remote_command
    template.jobEnvironment = task_id_env_vars
    template.outputPath = f":{INPUT_BASE_DIR}/output/{io_task_id}.o"
    template.errorPath = f":{INPUT_BASE_DIR}/error/{io_task_id}.e"
    args_list = [f"{INPUT_BASE_DIR}/data{arg_task_id}"]
    template.args = args_list
    session.runBulkJobs(template, task_start, task_stop - 1, task_step)
    session.deleteJobTemplate(template)
Apologies if there is a syntax error; I had to hand-copy this, as it is on a different system.
With the submission done, if I do a `qstat -j` on the job number I get the following settings displayed:
sge_o_shell: /usr/bin/zsh
stderr_path_list: NONE::/home/mel/input_data/error_log/$(tmp=SUFFIX_TASK_ID_$TASK_ID; echo ${(P)tmp}).e
stdout_path_list: NONE::/home/mel/input_data/output_log/$(tmp=SUFFIX_TASK_ID_$TASK_ID; echo ${(P)tmp}).o
job_args: /home/mel/input_data/data$(tmp=SUFFIX_TASK_ID$SGE_TASK_ID; echo ${(P)tmp})
script_file: job_script.py
env_list:
SUFFIX_TASK_ID_1=1,SUFFIX_TASK_ID_2=2,SUFFIX_TASK_ID_3=5,SUFFIX_TASK_ID_4=7,SUFFIX_TASK_ID_5=10,SUFFIX_TASK_ID_6=11
Error logs and output logs do get created, but their names show only a partial expansion. Examples:
$(tmp=SUFFIX_TASK_ID_1; echo ${(P)tmp}).e
$(tmp=SUFFIX_TASK_ID_1; echo ${(P)tmp}).o
If we `cat` the error logs we see `Illegal variable name`.
Is what I am trying to do possible? I am presuming that something somewhere is not invoking my zsh correctly.
Melendowski
(111 rep)
May 31, 2023, 10:09 PM
• Last activity: Jun 4, 2023, 11:45 AM
0
votes
0
answers
32
views
Small TaskQueue shared on two computers
There are two computers with 12 physical cores each. Computer A should accept jobs and distribute them among A and B.
I want to set up computers A and B such that
- A will accept jobs (via ssh) and distribute them among A and B (more or less intelligently)
- if possible I'd like to block 4 cores on each computer as a "personal requirement"
Jobs are expected to be either Python scripts or executables written in C++ (possibly involving MPI code).
I have read about Slurm and the Sun Grid Engine, but those seem a bit too powerful/complicated for this use case (I don't want to spend a week reading how to do it and troubleshooting). Is there an easier solution that satisfies the requirements?
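(As a point of comparison, a minimal sketch of how light this can be with GNU Parallel, assuming passwordless ssh from A to B and a shared filesystem; the hostname and job list are made up:)
# Sketch: GNU Parallel as a poor man's queue. '8/...' caps each
# host at 8 concurrent jobs, leaving 4 of the 12 cores free;
# ':' means "run locally on A" without ssh.
parallel -S '8/:,8/computerB' python {} ::: jobs/*.py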
infinitezero
(207 rep)
Mar 14, 2022, 03:12 PM
• Last activity: Mar 14, 2022, 04:38 PM
0
votes
2
answers
1592
views
Syntax for number of cores in a Sun Grid Engine job file
I want to use the HPC of my university to `qsub` an array job of **3** tasks.
Each task runs Matlab code which uses a solver (MOSEK) that exploits multiple **threads** to solve an optimization problem. A parameter controls the number of threads we want the solver to use; the number of threads should never exceed the number of cores.
Suppose I want the solver to use **4 threads**. Hence, I should ensure that each task is assigned to a machine with at least 4 cores free. How can I request that in the bash file? How should I count, in turn, the memory usage (i.e., should I declare the memory per core or the total memory)?
At the moment this is my bash file
#$ -S /bin/bash
#$ -l h_vmem=18G
#$ -l tmem=18G
#$ -l h_rt=480:0:0
#$ -cwd
#$ -j y
#Run 3 tasks
#$ -t 1-3
#$ -N try
date
hostname
#Output the Task ID
echo "Task ID is $SGE_TASK_ID"
matlab -nodisplay -nodesktop -nojvm -nosplash -r "main_1; ID = $SGE_TASK_ID; f_1; exit"
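(For reference, a hedged sketch of how a per-task core request is usually expressed: slots are requested through a parallel environment with `-pe`. The PE name below, smp, is an assumption; `qconf -spl` lists the ones your cluster actually defines. Memory limits such as h_vmem are commonly enforced per slot, so totals get divided by the slot count, but site policies vary:)
#$ -S /bin/bash
# Request 4 slots on one machine via a shared-memory PE
# ("smp" is an assumed, site-specific name; check qconf -spl).
#$ -pe smp 4
# If h_vmem is per slot, 18G total becomes 18G/4 = 4.5G per slot.
#$ -l h_vmem=4.5G
#$ -t 1-3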
Star
(125 rep)
Sep 9, 2020, 01:50 PM
• Last activity: Jun 26, 2021, 04:21 PM
0
votes
1
answers
807
views
Syntax for memory request in a Sun Grid Engine job file
I'm submitting a Matlab job on the cluster of my university using `qsub` after having logged in to a node using `ssh`.
The job runs out of memory. This is the advice I received to fix my issue: "**Possible solutions are run on a bigger machine or buy more RAM.**"
What does this mean in practice for my bash file? Which lines of the bash file control the size of the machine or the RAM? At the moment, in my bash file (see below) I request `vmem` and `tmem`. Is either of these RAM?
#$ -S /bin/bash
#$ -l h_vmem=18G
#$ -l tmem=18G
#$ -l h_rt=480:0:0
#$ -cwd
#$ -j y
#Run 600 tasks where each task has a different $SGE_TASK_ID ranging from 1 to 600
#$ -t 1-600
#$ -N try
date
hostname
#Output the Task ID
echo "Task ID is $SGE_TASK_ID"
matlab -nodisplay -nodesktop -nojvm -nosplash -r "main_1; ID = $SGE_TASK_ID; f_1; exit"
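(For what it's worth, a hedged sketch of the lines that would change: h_vmem is SGE's hard virtual-memory limit, and tmem looks like a site-specific memory complex, so both presumably count as "RAM" requests here. Raising them only helps if some node can actually grant that much:)
# Sketch: request more memory per task; the job stays queued
# until a node with this much free memory is available.
#$ -l h_vmem=32G
#$ -l tmem=32G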
Star
(125 rep)
Sep 9, 2020, 11:01 AM
• Last activity: Sep 9, 2020, 02:24 PM
1
votes
0
answers
77
views
Determine slot ID for a running job
On a compute node with multiple slots, are the running jobs each explicitly assigned a slot ID as they start, and if so how can the user or submission script see it?
To see the job ID, one can use the `$JOB_ID` environment variable within the submission script. What about the slot number?
I looked for slot information using `qstat -j`, but the information about the job does not contain anything about which of the slots the job is using. I was hoping there would be an integer variable related to the slot number.
EDIT: in the general case, a job might be assigned multiple slots if it is parallelized, so in that case it would be a list of slot IDs to determine.
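(For reference, a minimal sketch of what a parallel job can see about its allocation, using the standard SGE variables `$NSLOTS` and `$PE_HOSTFILE`; neither is a per-slot ID, but together they give the slot counts per host:)
# Sketch: inspect the slot allocation from inside a -pe job.
echo "Job $JOB_ID got $NSLOTS slots"
# Each hostfile line: host, slot count, queue, processor range.
cat "$PE_HOSTFILE"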
feedMe
(219 rep)
Feb 6, 2019, 11:47 AM
• Last activity: Feb 7, 2019, 09:27 AM
0
votes
1
answers
240
views
Accessing Job ID during gridengine submission
I am using a bash script to submit jobs to gridengine.
Is there a way for the script to know the job ID assigned to it by the scheduler?
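(Two hedged possibilities, assuming a reasonably standard gridengine: the scheduler exports `$JOB_ID` into the running job, and at submission time `qsub -terse` prints just the assigned ID:)
# Inside the job script itself:
echo "running as job $JOB_ID"

# Or capture the ID when submitting; -terse makes qsub print
# only the job ID instead of the usual sentence.
jobid=$(qsub -terse job.sh)
echo "submitted as $jobid"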
feedMe
(219 rep)
Feb 7, 2019, 08:30 AM
• Last activity: Feb 7, 2019, 09:20 AM
1
votes
2
answers
2206
views
Qsub to any node with more than n cores available
I have a program that is parallelized using MPI. It thinks that it is able to run across multiple nodes on our (CentOS 6.6)-based HPC grid, when in actual fact it only runs successfully on multiple cores *of the same compute node*.
e.g. if I `qsub` a job to the grid asking for 20 cores and Grid Engine decides to split it over two different nodes, the program fails. However, if there is a node with 20 cores available and Grid Engine sends it all to that one node, the program runs successfully. The qsub script contains the command `#$ -pe mpi 20` to select the number of cores.
So at the moment I do a `qstat -f -u "*"` to manually identify a compute node with 20 available cores, and submit to that node with `qsub -q general.q@node-X-X`.
What I am looking for is a way to tell Grid Engine to wait and only submit the job to a single compute node that has the required number of available cores. This would allow me to automate my job submission.
I am considering writing a bash script to parse the `qstat -f -u "*"` output, but there must be a more elegant solution. I have looked through the qsub manual but am unable to find a suitable flag or command-line argument.
I'm not able to modify the program itself at this time and I am not a system administrator.
Here is some information on the different software versions I have available:
MPI/gridengine info:
> ompi_info | grep gridengine
MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.2)
Grid engine version is: OGS/GE 2011.11p1
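(For context, a hedged sketch of the usual approach: request a parallel environment whose allocation_rule is $pe_slots, which forces all slots onto a single host, so the job simply waits until one node has 20 free. The PE name below is an assumption:)
# Sketch: check what PEs exist and how they allocate slots.
qconf -spl          # list parallel environments
qconf -sp smp       # look for: allocation_rule  $pe_slots
# Then submit against such a PE ("smp" is an assumed name):
qsub -pe smp 20 myjob.sh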
feedMe
(219 rep)
May 15, 2017, 09:15 AM
• Last activity: Oct 22, 2018, 09:52 AM
0
votes
1
answers
74
views
SSH connections difficulties
I'm using the Red Hat 5.9 OS on my grid, which has 3 machines:
1 head node (known as ilmn-qm.ilmn) and 2 compute nodes (aka compute-00-00 and compute-00-01).
**The problem is that I can't use SSH to outside hosts from either one of the compute nodes.**
I tried:
1) SSH from and to the head node works perfectly.
2) SSH from the head node to the compute nodes works.
3) Vice versa, SSH from the compute nodes to the head node works as well.
4) The head node is defined as the gateway:
[root@compute-00-01 ~]# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
172.20.22.0 * 255.255.255.0 U 0 0 0 eth1
172.20.20.0 * 255.255.255.0 U 0 0 0 eth0
169.254.0.0 * 255.255.0.0 U 0 0 0 eth0
default ilmn-qm.ilmn 0.0.0.0 UG 0 0 0 eth0
5) I've checked that IPv4 forwarding is enabled on the head node:
cat /etc/sysctl.conf
# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled. See sysctl(8) and
# sysctl.conf(5) for more details.
# Controls IP packet forwarding
net.ipv4.ip_forward = 1
# Controls source route verification
net.ipv4.conf.default.rp_filter = 1
# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0
# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0
# Controls whether core dumps will append the PID to the core filename
# Useful for debugging multi-threaded applications
kernel.core_uses_pid = 1
# Controls the use of TCP syncookies
net.ipv4.tcp_syncookies = 1
# Controls the maximum size of a message, in bytes
kernel.msgmnb = 65536
# Controls the default maxmimum size of a mesage queue
kernel.msgmax = 65536
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736
# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296
and yet any ssh attempt ends up with:
ssh: connect to host 132.68.107.69 port 22: Connection timed out
from Head node:
root@ilmn-qm ~ # ip a show
1: lo: mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
link/ether f0:4d:a2:0b:2d:b9 brd ff:ff:ff:ff:ff:ff
inet 132.68.106.1/28 brd 132.68.106.15 scope global eth0
inet6 fe80::f24d:a2ff:fe0b:2db9/64 scope link
valid_lft forever preferred_lft forever
3: eth1: mtu 1500 qdisc pfifo_fast qlen 1000
link/ether f0:4d:a2:0b:2d:bb brd ff:ff:ff:ff:ff:ff
inet 172.20.20.5/24 brd 172.20.20.255 scope global eth1
inet6 fe80::f24d:a2ff:fe0b:2dbb/64 scope link
valid_lft forever preferred_lft forever
4: eth2: mtu 1500 qdisc pfifo_fast qlen 1000
link/ether f0:4d:a2:0b:2d:bd brd ff:ff:ff:ff:ff:ff
inet 172.20.21.2/24 brd 172.20.21.255 scope global eth2
inet6 fe80::f24d:a2ff:fe0b:2dbd/64 scope link
valid_lft forever preferred_lft forever
5: eth3: mtu 1500 qdisc noop qlen 1000
link/ether f0:4d:a2:0b:2d:bf brd ff:ff:ff:ff:ff:ff
6: sit0: mtu 1480 qdisc noop
link/sit 0.0.0.0 brd 0.0.0.0
root@ilmn-qm ~ # ip route show
132.68.106.0/28 dev eth0 proto kernel scope link src 132.68.106.1
172.20.21.0/24 dev eth2 proto kernel scope link src 172.20.21.2
172.20.20.0/24 dev eth1 proto kernel scope link src 172.20.20.5
169.254.0.0/16 dev eth2 scope link
default via 132.68.106.14 dev eth0
from compute-00-00:
[root@compute-00-00 ~]# ip a show
1: lo: mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
link/ether f0:4d:a2:0b:2d:c2 brd ff:ff:ff:ff:ff:ff
inet 172.20.20.6/24 brd 172.20.20.255 scope global eth0
inet6 fe80::f24d:a2ff:fe0b:2dc2/64 scope link
valid_lft forever preferred_lft forever
3: eth1: mtu 1500 qdisc pfifo_fast qlen 1000
link/ether f0:4d:a2:0b:2d:c4 brd ff:ff:ff:ff:ff:ff
inet 172.20.22.6/24 brd 172.20.22.255 scope global eth1
4: eth2: mtu 1500 qdisc pfifo_fast qlen 1000
link/ether f0:4d:a2:0b:2d:c6 brd ff:ff:ff:ff:ff:ff
5: eth3: mtu 1500 qdisc pfifo_fast qlen 1000
link/ether f0:4d:a2:0b:2d:c8 brd ff:ff:ff:ff:ff:ff
6: sit0: mtu 1480 qdisc noop
link/sit 0.0.0.0 brd 0.0.0.0
[root@compute-00-00 ~]# ip route show
172.20.22.0/24 dev eth1 proto kernel scope link src 172.20.22.6
172.20.20.0/24 dev eth0 proto kernel scope link src 172.20.20.6
169.254.0.0/16 dev eth1 scope link
default via 172.20.20.5 dev eth0
from compute-00-01:
[root@compute-00-01 ~]# ip a show
1: lo: mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 84:2b:2b:f9:9e:11 brd ff:ff:ff:ff:ff:ff
inet 172.20.20.7/24 brd 172.20.20.255 scope global eth0
inet6 fe80::862b:2bff:fef9:9e11/64 scope link
valid_lft forever preferred_lft forever
3: eth1: mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 84:2b:2b:f9:9e:13 brd ff:ff:ff:ff:ff:ff
inet 172.20.22.7/24 brd 172.20.22.255 scope global eth1
4: eth2: mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 84:2b:2b:f9:9e:15 brd ff:ff:ff:ff:ff:ff
5: eth3: mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 84:2b:2b:f9:9e:17 brd ff:ff:ff:ff:ff:ff
6: sit0: mtu 1480 qdisc noop
link/sit 0.0.0.0 brd 0.0.0.0
[root@compute-00-01 ~]# ip route show
172.20.22.0/24 dev eth1 proto kernel scope link src 172.20.22.7
172.20.20.0/24 dev eth0 proto kernel scope link src 172.20.20.7
169.254.0.0/16 dev eth0 scope link
default via 172.20.20.5 dev eth0
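(In case it helps the diagnosis: the failing target 132.68.107.69 is reached from the compute nodes only via the head node's default route, and ip_forward alone does not rewrite their private 172.20.20.x source addresses. A hedged sketch of the NAT rule one would expect on the head node for that to work, assuming eth0 is its external interface:)
# Sketch: masquerade compute-node traffic leaving through the
# head node's external interface so replies can route back.
iptables -t nat -A POSTROUTING -s 172.20.20.0/24 -o eth0 -j MASQUERADE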
hamaor
(3 rep)
Feb 6, 2018, 11:52 AM
• Last activity: Feb 16, 2018, 11:50 AM
0
votes
1
answers
622
views
Grid engine/cluster management and job scheduler for Debian/ubuntu
I need to perform a large amount of computation on something resembling a cluster; the hardware and the OS are identical (the OS is Ubuntu), but no central management software or grid engine is installed. Web searches turn up mostly outdated or proprietary software.
I hope my question is not too general, but: what are the cluster management and job scheduling options for Debian and its derivatives?
For the general management of the cluster I use cssh, but this approach is not very efficient when it comes to job scheduling and monitoring. I have experience using the venerable Sun Grid Engine (RIP).
Thanks for reading this!
lazaraza
(3 rep)
Jul 15, 2017, 09:18 AM
• Last activity: Aug 5, 2017, 09:44 PM
1
votes
1
answers
173
views
Stack screen output into columns to make use of screen width and avoid scrolling
I often use gridengine's `qstat` command on our HPC cluster, but since I have many jobs running the output is too long to fit on my screen, and I end up doing a lot of scrolling to see the upper section of the output. My terminal has enough space for two columns, so it would be nice if the output could flow into columns and be shown side by side.
**Example using simple data file:**
Obviously this should be general to any screen output so to illustrate here is a simpler example:
My file `data1.txt` contains 100 lines of `This is a test`.
>> cat data1.txt
This is a test
This is a test
This is a test
This is a test
This is a test
This is a test
This is a test
This is a test
This is a test
This is a test
This is a test
This is a test
This is a test
This is a test
This is a test
This is a test
This is a test
This is a test
This is a test
This is a test
(etc. until 100th line)
>>
**Desired output:**
>> cat data1.txt | something | something_else -n 2
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
This is a test This is a test
(etc. until 50 rows)
Of course, it would be nice to specify any arbitrary number of columns.
The only similar question/answer that I found was this one, but I'm hoping there is a simpler way to do this in one line using pipes and no script files.
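(For what it's worth, a minimal sketch with the classic pr utility: -t suppresses page headers, the column count is a flag, and -l sets the page length so all 100 lines land on one "page":)
# Sketch: 2 columns of 50 rows each (2 x 50 = 100 lines per page).
cat data1.txt | pr -t -2 -l 50
# Arbitrary column count, e.g. 4 columns of 25 rows:
cat data1.txt | pr -t -4 -l 25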
feedMe
(219 rep)
Feb 22, 2017, 04:12 PM
1
votes
1
answers
268
views
Generalising Grid Engine qsub job file for multiple programs and input file names
I am using Grid Engine on a Linux cluster. I am running many jobs with different programs and different input files. I don't want to create multiple specific job scripts for each pair of program and input file. Instead I want to be able to specify the program name and the input file on the `qsub` line.
Therefore I can use `qsub job.sh <program> <input_file>`, where `job.sh` takes two arguments. This works fine. But there is another twist: my programs are located in a very, very long directory path which I don't want to type every time I submit a job, so aliases are an obvious choice.
So I want to do something like `qsub job.sh <program_alias> <input_file>`.
I initially set the alias in my `.bashrc` but was getting the error `: command not found`.
So I set the alias in `submit.sh`, but I am getting the same error.
Thoughts on how I can get the command `qsub job.sh $1 $2` to accept aliases as well?
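(For context: aliases are expanded only in interactive shells, which is presumably why the batch job reports "command not found". A hedged sketch of the usual substitute, an exported variable; the path and names below are made up:)
# Sketch: a variable survives non-interactive shells where an
# alias does not; the shell expands it before qsub ever runs.
export PROGDIR=/very/long/path/to/my/programs
qsub job.sh "$PROGDIR/my_program" input_file.dat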
cyuut
(11 rep)
Jan 19, 2017, 07:11 PM
• Last activity: Jan 22, 2017, 05:04 PM
0
votes
1
answers
182
views
Grid Engine for program that needs X11 but doesn't require user input
I have a bash script that calls an executable (some commercial software) in "batch mode". On the command line, if X is available the program runs to completion and then quits, but if not, the program hangs.
I think this is because:
- It works over VNC.
- It doesn't work over ssh if `ssh -X` has not been specified.
- It works over ssh if `-X` has been specified.
- It doesn't work with Grid Engine. When I qsub the script it just stays in status 'r' indefinitely and I cannot see any output in the .sh.o.XXX or .sh.e.XXX files.
The upshot is, I want to submit this script to Grid Engine, but I can't!
The program never asks for user input when in the so-called "batch mode".
Is there some way to provide an X environment in Grid Engine, just to allow the program to complete on its own?
I guess one problem is that, since I cannot see the source code, it is difficult to see exactly what the program is asking for.
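(One hedged possibility, assuming Xvfb is installed on the execution hosts: wrap the command in xvfb-run, which starts a throwaway virtual X server and sets DISPLAY for the program. The program path and flag below are placeholders:)
# Sketch: give the program a dummy X display; -a picks a free
# display number. Requires the Xvfb package on the exec host.
xvfb-run -a /path/to/commercial_program --batch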
feedMe
(219 rep)
Jun 15, 2016, 08:03 AM
• Last activity: Dec 15, 2016, 10:30 AM
0
votes
0
answers
345
views
submitting a job array script on SGE
I am trying to make a job array script to do a particular task for several files. Let's assume, as a start, only 2 fastq files, named abc.fastq and def.fastq.
#!/bin/bash
file=$(ls -1 *.fastq | tail -n +${SGE_TASK_ID} | head -1)
filename=${file%.fastq}
awk 'NR % 2 == 0{print substr($1,7,100)};NR % 2 ==1' $file > ${filename}_BR.fastq
I submitted the script as:
qsub -t 1-2:1 -cwd -j y -N array_job ./jobarray.sh
But only 1 file is processed, which is abc.fastq. What happened to the def.fastq file? I provided the -t parameter with 2 tasks, and SGE_TASK_ID is used in my script.
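(As a hedged debugging aid, logging each task's view would separate "task 2 never ran" from "task 2 ran but picked the wrong file"; e.g. adding this to jobarray.sh right after $file is computed:)
# Sketch: record what each array task actually sees.
echo "task ${SGE_TASK_ID}: picked '${file}' in $(pwd)" >&2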
Hope to hear from you guys soon.
user3138373
(2589 rep)
Sep 2, 2015, 04:36 PM
• Last activity: Aug 25, 2016, 01:39 AM
1
votes
2
answers
6913
views
How do I check if a job is running on cluster using job name (CentOS)
I am running a bash script to submit multiple jobs. A job is only submitted if it is not already running. I want to use an if statement inside my bash script to simply check whether "job123" is already running or in the queue.
I have tried different options with qstat and qstatus, but I can't seem to check by job name. How can this information be retrieved? These outputs are just strings; I did not have any luck using grep either, but I think there must be a specific command.
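(A hedged sketch of the kind of check I'd expect to work on SGE, where qstat -j accepts a job name or pattern and exits non-zero when nothing matches; verify the exit-status behaviour on your version:)
# Sketch: skip submission if a job with this name already exists.
if qstat -j "job123" >/dev/null 2>&1; then
    echo "job123 is already queued or running, skipping"
else
    qsub -N job123 job123.sh
fi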
Herman Toothrot
(353 rep)
Aug 1, 2016, 03:00 PM
• Last activity: Aug 2, 2016, 10:56 AM
0
votes
1
answers
2177
views
How to tell the memory usage of each background job
I am working on SGE and am logged on to it. I use `qlogin -l mf=30G` to get onto one compute node.
I am running 2 jobs on this compute node in the background:
[1] 4408 Running /apps1/sratoolkit/2.3.5-2/bin/fastq-dump --split-files SRR1660.sra &
[2] 4415 Running /apps1/sratoolkit/2.3.5-2/bin/fastq-dump --split-files SRR1661.sra &
I want to know how much memory each of my background jobs is consuming out of the 30G I requested at the beginning. Is there a command to find that out?
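(A hedged sketch with plain ps, which reports per-process memory; the PIDs are the ones shown above:)
# Sketch: RSS = resident set (actual RAM), VSZ = virtual size,
# both in KiB.
ps -o pid,rss,vsz,comm -p 4408,4415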
Thanks
user3138373
(2589 rep)
Mar 24, 2015, 04:32 PM
• Last activity: Mar 24, 2015, 04:41 PM
3
votes
1
answers
1266
views
stdout redirect. sh: resource temporarily unavailable
I have large batches of bash processes.
Each bash script invokes executables which have their stdout redirected to distinct log files.
About 5% of the runs end up with:
_sh: [name of log]: Resource temporarily unavailable_
I tried reducing the number of jobs running in parallel, but the error still persisted on some of the bash scripts.
### Additional info: ###
- Ubuntu 14.04 LTS running on VM using ESXi
- Happens on a new partition, allocated with gparted and LVM (new logical volume consisting of the entire partition)
- The LV is exported using nfs-kernel-server
- The LV is also shared to windows using Samba
- The LV is formatted using ext4
- I have admin rights on this machine
### More detailed info ###
- Everything is run in a cluster, using Sun-Grid-Engine
- There are 4 virtual machines: m1, m2, m3, m4
- m1 runs sge master, sge exec, and ldap server
- m2, m3, m4 run sge exec
- m3 runs nfs-kernel-server, exporting a _home_ folder sitting in a logical volume (using LVM) that uses a partition on a local disk, to m1, m2, m4
- m3 has a soft link to the _home_ folder
- m1, m2, m4 mount the _home_ folder through fstab, so all machines end up pointing to the same _home_ folder
- m3, m2, m4 run ldap clients, connecting to m1
- All jobs are submitted to the cluster through m1 (configured as a submission host)
- Jobs fail exclusively on m3 (which exports the disk). Most of the jobs on m3 are passing though. Failures are random, but consistently on m3 alone.
- m3 also shares the _home_ via samba to windows clients
Any help would be greatly appreciated :) (how to debug, which logs are relevant, how to get more info out of the system, etc...)
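(One hedged debugging idea, since that message corresponds to errno EAGAIN: strace the failing redirect to see which syscall returns it. The paths below are placeholders:)
# Sketch: trace the opens done while setting up the redirect.
strace -f -e trace=open,openat -o /tmp/redirect.trace \
    sh -c '/path/to/executable > /path/to/logfile'
grep EAGAIN /tmp/redirect.trace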
Thank you in advance!
lev haikin
(131 rep)
Dec 31, 2014, 07:47 AM
• Last activity: Jan 5, 2015, 09:56 AM
2
votes
2
answers
2093
views
Remotely compile and run program using ssh and screen
I'm trying to compile and run a program remotely. However, I'd like to do this within a screen session, and I'd also like to run it using Grid Engine on another node after I ssh. Currently I have:
ssh me@server screen -R session 'qlogin; cd path; mvn options program'
This basically works, but I get a message saying that I must be connected to a terminal. I read about this and added the -t option to ssh. With that, my command breaks: it seems like I ssh over, screen starts, then it doesn't know about the "mvn" command and terminates my session.
I'm wondering why this is happening and how to correctly launch jobs from my local machine, within a screen, on a remote node while using Grid Engine.
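(For context, a hedged variant worth trying: in 'qlogin; cd path; mvn ...' the cd and mvn run only after qlogin exits, and on the login host rather than the allocated node. qlogin is interactive-only, but its sibling qrsh accepts a command, which composes better here; path and options are the placeholders from above:)
# Sketch: qrsh runs the given command on the allocated node.
ssh -t me@server screen -R session \
    qrsh 'cd path && mvn options program'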
akobre01
(121 rep)
Aug 8, 2013, 10:35 PM
• Last activity: Sep 18, 2014, 04:20 AM
2
votes
1
answers
10558
views
usr/bin/xterm Xt error: Can't open display: /usr/bin/xterm: DISPLAY is not set?
I'm trying to submit a job to a school server (HPC) with:
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -o ./out_$JOB_ID.txt
#$ -e ./err_$JOB_ID.txt
#$ -notify
#$ -pe orte 1
date
pwd
##################################
RESULT_DIR=~/Results
SCRIPT_FILE=sample_job
##################################
. /etc/profile
. /etc/bashrc
module load packages/comsol/4.4
module load packages/matlab/r2012b
comsol server matlab "sample_job, exit" -nodesktop -mlnosplash
/bin/uname -a
mkdir $RESULT_DIR/$name
cp *.csv $RESULT_DIR/$name
The job aborts saying:
Sun Jun 8 14:20:21 EDT 2014
COMSOL 4.4 (Build: 150) started listening on port 2036
Use the console command 'close' to exit the program
/usr/bin/xterm Xt error: Can't open display:
/usr/bin/xterm: DISPLAY is not set
Program_did_not_exit_normally
Exception:
com.comsol.util.exceptions.FlException: Program did not exit normally
Messages:
Program did not exit normally
Stack trace:
at com.comsol.mli.application.a.a(Unknown Source)
at com.comsol.mli.application.MatlabApplication.doStart(Unknown Source)
at com.comsol.util.application.ComsolApplication.doStart(Unknown Source)
at com.comsol.util.application.ComsolApplication.doRun(Unknown Source)
at com.comsol.bridge.Bridge$2.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
ERROR: Could not start COMSOL Application. See log file: /home/.comsol/v44/logs/server2.log
java.lang.IllegalStateException: Shutdown in progress
at java.lang.ApplicationShutdownHooks.add(Unknown Source)
at java.lang.Runtime.addShutdownHook(Unknown Source)
at org.apache.catalina.startup.Catalina.start(Catalina.java:699)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:322)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:451)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at com.comsol.util.application.ServerApplication.a(Unknown Source)
at com.comsol.util.application.ServerApplication.a(Unknown Source)
at com.comsol.util.application.ServerApplication.a(Unknown Source)
at com.comsol.util.application.ServerApplication.main(Unknown Source)
What might be the reason and how should I fix it?
Sibbs Gambling
(1746 rep)
Jun 8, 2014, 06:33 PM
• Last activity: Jun 8, 2014, 09:58 PM
3
votes
1
answers
2112
views
stat meanings of computing nodes
I submitted a job to a Linux cluster which uses the SGE job scheduler. The job state was `qw` for a long time, so I inspected the states of the compute nodes using `qstat -f`.
I found that many nodes were labelled with the states "d", "adu" and "E". I wonder what these states mean. The Grid Engine man pages list these states for filtering queue instances (`-qs {a|c|d|o|s|u|A|C|D|E|S}`), but give no further explanation of their meaning.
What do the states mean?
Dejian
(828 rep)
May 16, 2014, 01:53 PM
• Last activity: May 16, 2014, 02:10 PM
2
votes
1
answers
8156
views
what is the difference between qsub and ./
Can anyone tell me the difference between the following ways of submitting a script:
$ qsub script_name.sh
and
$ ./script_name.sh
What are the differences between the above two ways of submitting a job to a cluster?
Also, how come sometimes I need to type:
$ chmod +x script_name.sh
...before I can type `./script_name.sh` to submit a job? And how come sometimes I just need to type `qsub script_name.sh`?
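(A small illustration of the distinction, using the commands from the question:)
# ./ asks the kernel to execute the file directly, right now on
# the login node; that needs the execute bit, hence chmod +x:
chmod +x script_name.sh
./script_name.sh

# qsub only reads the script and hands it to the scheduler to
# run later on a compute node, so no execute bit is required:
qsub script_name.sh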
Sorry I'm not very familiar with Unix.
john_w
(153 rep)
Feb 25, 2014, 11:00 PM
• Last activity: Feb 25, 2014, 11:18 PM