
Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

1 vote
3 answers
2151 views
Option to cancel job by jobname not ID?
Is it possible to delete multiple TORQUE batch jobs with the same name, instead of typing in each individual job number? I do not want to use the `qdel -u username` option, as I have other jobs that I want to spare. There are over 100 individual jobs, so I would rather not type in each job number if there's a quicker option! I found this option online:

```
qdel wc_jobname
```

But it returns the error:

> qdel: illegally formed job identifier: wc_jobname
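A minimal shell sketch of one common workaround: filter `qstat` output by job name and feed the matching IDs to `qdel`. This assumes the default `qstat` layout (two header lines, job ID in column 1, job name in column 2); `wc_jobname` is the example name from the question.

```shell
# Sketch: print the IDs of all jobs whose name matches, from qstat-style
# output. Assumes columns: "JobID  Name  User ..." after two header lines.
jobs_by_name() {
  awk -v name="$1" 'NR > 2 && $2 == name { print $1 }'
}

# On a live cluster (column layout is an assumption -- check `qstat` first):
#   qstat | jobs_by_name wc_jobname | xargs -r qdel
```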
Leucine (11 rep)
Nov 21, 2019, 11:51 AM • Last activity: Jun 9, 2024, 09:39 PM
1 vote
0 answers
699 views
CentOS 7: Python2.7.5 got overwritten and can't use yum or python2
I was installing the Torque queue system on my CentOS 7 server, and after running the config command for the installation, my default python2 version (2.7.5) was overwritten and replaced by Python 2.7.18. Python 2.7.18 had been installed on the server with the intent of turning it into a virtual environment so I could integrate it with the environment module system. That had not been done yet, and Python 2.7.18 was never added to the PATH, nor was it ever invoked. After looking through my bash history, I noticed that my last working yum command was right before running the config command for Torque, and after that point it stopped working. I attempted to reinstall 2.7.5 through RPM, but even after installing all the necessary dependencies and the python2 RPM, I am presented with this error:
```
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
ImportError: No module named site
```
I'm at my wit's end with this and cannot figure out how to get python2 working again; any help would be appreciated.
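A hedged diagnostic sketch: this error usually means the interpreter cannot locate its standard library relative to its prefix. The helper name below is made up, and the `/usr/lib/python2.7` layout is an assumption about stock CentOS 7:

```shell
# Sketch: "ImportError: No module named site" means python cannot find its
# stdlib tree. Check whether site.py exists where a given prefix/version
# pair expects it (assumption: CentOS 7 keeps it under /usr/lib/python2.7).
check_stdlib() {
  prefix=$1; version=$2
  if [ -f "$prefix/lib/python$version/site.py" ]; then
    echo "stdlib OK"
  else
    echo "stdlib missing"
  fi
}

# Typical checks on the broken host:
#   check_stdlib /usr 2.7              # stock CentOS 7 python2
#   which python2                      # is 2.7.18 shadowing /usr/bin/python2?
#   PYTHONHOME=/usr python2 -V         # force the prefix as a quick test
```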
mharper10114 (11 rep)
Feb 12, 2022, 05:41 PM • Last activity: Feb 14, 2022, 08:08 PM
6 votes
1 answer
7205 views
How can a service with PrivateTmp=true access a unix socket in the /tmp directory (e.g. to submit Torque jobs from PHP running in Apache)
We have a webserver that performs scientific calculations submitted by users. The calculations can be long-running, so we use the **Torque** resource manager (aka pbs_server) to distribute/schedule them on a handful of compute nodes. Torque makes use of a unix domain socket in the `/tmp` directory for communication, but the http server (and processes forked from it) can't access the true `/tmp` directory, so to those processes the socket appears to be missing, resulting in an error.

The details:

- The webserver is running Apache, which runs as a service with the systemd property `PrivateTmp=true` set. This causes the service to have its own `/tmp` directory unrelated to the "true" root `/tmp`.
- The jobs are actually submitted from PHP (running in the Apache process). PHP makes a system call to `qsub`, which is a Torque command to submit a job. Because `qsub` is called from PHP, it inherits the "fake" `/tmp` directory from Apache.
- `qsub` internally attempts to connect to the unix socket located at `/tmp/trqauthd-unix`. But since it doesn't see the real `/tmp` directory, it fails with the following error: `Error in connection to trqauthd (15137)-[could not connect to unix socket /tmp/trqauthd-unix: 2]`

The only solution I could achieve was to edit the httpd.service file under systemd and change `PrivateTmp` to false. This DID fix the problem, but I'd rather not do it because (I assume) `PrivateTmp` was set to true for good reason. What I want to know is whether there is any way to have the socket created in a different location, or to somehow make a link to the socket that could be used from within Apache (and its forked processes). Creating a link to the socket is trivial, but it doesn't solve the problem because I don't know of any way to configure `qsub` to look for the socket in a different location. Note that the socket is created by the trqauthd service (a Torque program that performs user authorization for running jobs).
The documentation for trqauthd mentions (in an obscure note) that the location of the socket can be configured, but there is no indication in any of the documentation about how that can be achieved (and more importantly, how to let qsub and other commands know about the new location). Thanks for any suggestions that might help me find a way to submit jobs to Torque from PHP *without* disabling PrivateTmp for Apache.
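One hedged possibility, untested with TORQUE: newer systemd (233 and later) supports `BindPaths=`, which binds a path from the host into the service's private mount namespace, potentially exposing the real socket while leaving `PrivateTmp=true` in place. The drop-in file name is hypothetical; the socket path is taken from the error message above.

```ini
# /etc/systemd/system/httpd.service.d/trqauthd-socket.conf
# Hedged sketch: bind the real trqauthd socket into httpd's private mount
# namespace. The "-" prefix tells systemd to skip the bind if the socket
# does not exist yet (e.g. trqauthd has not started).
[Service]
BindPaths=-/tmp/trqauthd-unix
```

Apply with `systemctl daemon-reload && systemctl restart httpd`. Whether `BindPaths=` can expose a path otherwise hidden by `PrivateTmp=` should be verified against the `systemd.exec(5)` man page for your systemd version.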
drwatsoncode (293 rep)
Nov 10, 2016, 06:57 AM • Last activity: Dec 20, 2018, 07:13 AM
1 vote
0 answers
59 views
Create a file at time 't' after PBS job execution begins
I submit a PBS job for `02:00:00` hours. I need to create a file in the PBS working directory at a specified time ***t*** (say `01:30:00` hours) after the job has begun, to ensure that the job terminates smoothly for any subsequent restart. For example, something as follows:

```
echo "LABORT" > file1.txt
```

I do not want to rely on PBS for the file creation through chaining jobs, since I want to create it exactly at the specified time. Is there a clean and automated way to achieve this while running many such jobs?

EDIT 1: I believe crontab can perform the task, but how do I:

- monitor the jobs under a given username,
- get the job start time and working directory,
- add the time *t* to the start time and pass the modified time and working directory to crontab?
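One cron-free sketch: start a background timer from inside the job script itself, so the start time and working directory are known automatically. The helper name, delay, and workload below are placeholders:

```shell
# Sketch: schedule the abort file from within the job, no cron needed.
schedule_abort_file() {
  delay=$1; file=$2
  # detach stdout/stderr so callers using $(...) are not blocked by the timer
  ( sleep "$delay" && echo "LABORT" > "$file" ) >/dev/null 2>&1 &
  echo $!   # timer PID, so the caller can cancel it
}

# Inside the PBS script (after the #PBS directives):
#   cd "$PBS_O_WORKDIR"
#   timer=$(schedule_abort_file $((90 * 60)) file1.txt)   # t = 01:30:00
#   ./long_running_job                                    # hypothetical workload
#   kill "$timer" 2>/dev/null                             # job finished early
```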
rambalachandran (435 rep)
Dec 21, 2016, 01:13 PM • Last activity: Dec 21, 2016, 01:22 PM
1 vote
1 answer
52 views
How to disable automatic update of the correct GPU count for each MOM node in Torque?
I have a small installation of Torque 4.2.9. It is compiled with the `--enable-nvidia-gpus` option. According to the documentation, when this option is used the nodes file is automatically updated with the correct number of GPUs. Is it possible to switch this off? I ask because I want to temporarily limit the available resources. Maybe there is another way to achieve this?
tljm (11 rep)
Jul 22, 2016, 01:35 PM • Last activity: Aug 4, 2016, 09:23 AM
3 votes
2 answers
4944 views
Running shell job on remote server, closing terminal without closing job
I am running jobs on a remote server using Torque. I currently have an annoying problem. When I run my jobs, this is what I currently do:

1. Log into another computer via TeamViewer
2. From that computer, `ssh` into the remote server: `ssh user@remoteserver.com`
3. There, run my job script `sh verycomplicatedrunscript.sh`, which echoes to the user something like:

```
I am now running job 1...
I am now running job 2...
I am now running job 3...
I am now running job 4...
```

just to show that it is running and what it is running. The only problem is that this takes several days, as the jobs are somewhat large and there are thousands of them. Now, I would like to bypass the TeamViewer step and just `ssh` into the remote server from my own computer. But if I do that, run my job script, and then close the terminal, it kills the process (hence why I am currently running it on the other computer, which I can just leave running for days with the terminal open). Is there any way to execute the script on the remote server and then log out without killing the execution of my job script? Thanks.

**SOLUTION** I ended up using the `screen` option, which is quite frankly amazing. My jobs are now running with:

```
screen sh awesomejobscript.sh option1 option2 option3
```
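The `nohup` alternative to `screen` can be sketched as follows; a trivial command stands in for `sh verycomplicatedrunscript.sh` so the pattern runs end to end:

```shell
# Sketch: detach the script from the terminal so SIGHUP at logout does not
# kill it; all progress lines are captured in run.log.
nohup sh -c 'echo "I am now running job 1..."' > run.log 2>&1 &
wait $!   # in real use you would simply log out instead of waiting

# On the real server:
#   nohup sh verycomplicatedrunscript.sh > run.log 2>&1 &  # survives logout
#   tail -f run.log                                        # watch the progress lines
# Or with screen, which also lets you reattach later:
#   screen -dmS jobs sh verycomplicatedrunscript.sh
#   screen -r jobs
```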
Astrid (167 rep)
Jan 15, 2016, 09:56 AM • Last activity: Jan 17, 2016, 07:06 PM
2 votes
0 answers
128 views
equivalent to 'cpusets' for gpu's
I work with clusters of computers and manage the nodes with Torque and Moab. A user is able to submit a job to a node and request the amount of resources they need:

```
# The following submits the job foo.sh to 1 node, requesting 8 cores and 1 GPU
qsub foo.sh -l nodes=1:ppn=8:gpus=1
```

Because it is possible for a user to take more resources than requested, I've enabled the hwloc library (cpusets) to keep users in check. From what I have found, there is no way to prevent a user from taking more GPUs than they have requested.

**Is there a 'cpuset' equivalent for GPUs?**

*Resources:* Moab documentation, Torque documentation, hwloc documentation
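A hedged sketch of a common soft-fencing approach: there is no kernel-enforced cpuset analogue for GPUs here, but `CUDA_VISIBLE_DEVICES` restricts which devices CUDA programs can see. This assumes Torque writes the assigned GPUs to `$PBS_GPUFILE` as `host-gpuN` lines, which is typical but worth verifying; the helper name is made up.

```shell
# Sketch: turn PBS_GPUFILE-style lines ("node1-gpu0") into a device list
# ("0,2") suitable for CUDA_VISIBLE_DEVICES.
gpus_from_pbs_gpufile() {
  sed 's/.*-gpu//' | paste -sd, -
}

# In a job prologue or wrapper (soft enforcement only -- users can unset it):
#   export CUDA_VISIBLE_DEVICES=$(gpus_from_pbs_gpufile < "$PBS_GPUFILE")
```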
spuder (18573 rep)
Aug 5, 2013, 06:08 PM • Last activity: Nov 13, 2014, 07:49 PM
1 vote
0 answers
182 views
Submitting a script to TORQUE `qsub`: PBS lines before or after shebang line?
I am submitting a script to a cluster that uses TORQUE. Should I add the `#PBS` lines before or after the shebang line?
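For reference, a minimal layout (job name and resource values are placeholders): the shebang stays on line 1 and the `#PBS` directives follow immediately after it, since `qsub` stops scanning for directives at the first executable statement.

```shell
# Sketch: write a minimal TORQUE job script with the shebang first and the
# #PBS directives directly below it, before any shell commands.
cat > myjob.sh <<'EOF'
#!/bin/bash
#PBS -N myjob
#PBS -l nodes=1:ppn=4,walltime=01:00:00
#PBS -j oe

echo "job body starts here"
EOF

# Submit on the cluster with:  qsub myjob.sh
```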
a06e (1817 rep)
Sep 24, 2014, 03:04 PM • Last activity: Sep 26, 2014, 11:07 PM
0 votes
1 answer
179 views
slow down of mpi by Torque
I'm running Torque with Open MPI on a cluster with 30 nodes and 360 cores. I have found that the wall time of `mpirun -np N ~./myjob` and `qsub -l nodes=1:ppn=N mpirun -np N ~./myjob` differs many times over. For small jobs it grows from 1.2 s to 20 s, from 2 s to 37 s, and so on. For larger jobs the difference becomes important. How can I overcome this?
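A first thing to check, sketched below (the cause of the slowdown is an assumption, not established by the question): submit through a proper job script and size `-np` from the allocation Torque actually granted, so MPI ranks are not oversubscribed onto fewer cores than expected. The file names are placeholders.

```shell
# Sketch: a job script that matches -np to the granted allocation.
cat > mpijob.sh <<'EOF'
#!/bin/sh
#PBS -l nodes=1:ppn=12
cd "$PBS_O_WORKDIR"
NP=$(wc -l < "$PBS_NODEFILE")   # cores Torque granted to this job
mpirun -np "$NP" ./myjob        # Open MPI reads $PBS_NODEFILE on its own
EOF

# Submit with:  qsub mpijob.sh
```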
Arsen (1 rep)
Jun 2, 2014, 03:18 PM • Last activity: Sep 26, 2014, 11:06 PM
4 votes
0 answers
1442 views
Requesting specific nodes with TORQUE qsub?
There's a cluster with TORQUE qsub installed. I want to send a job, but I want to make sure that it runs on one of a specific set of nodes. Is it possible to request a list of possible nodes in qsub, so that the job is sent to one of the nodes in the requested set, never to a node outside the set?
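A common approach, sketched below with hypothetical node and property names: tag the allowed nodes with a shared property in the server's nodes file and request that property, so the scheduler picks from that set only.

```
# Sketch. In the server's nodes file, give the allowed nodes a property:
#
#   /var/spool/torque/server_priv/nodes:
#     node01 np=8 fastio
#     node02 np=8 fastio
#     node03 np=8
#
# Then request one node carrying that property:
#
#   qsub -l nodes=1:fastio myjob.sh
#
# Listing hosts explicitly is also possible, but it allocates ALL of them:
#
#   qsub -l nodes=node01+node02 myjob.sh   # requests both, not "one of"
```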
a06e (1817 rep)
Sep 26, 2014, 12:51 PM • Last activity: Sep 26, 2014, 11:06 PM