For some reason, when launching an MPI job with SLURM on a CentOS 8 cluster, SLURM always pins the MPI processes to CPUs starting from CPU 0.
Say a compute node has 128 CPU cores. I launch an MPI job asking for 64 CPUs on that node. Fine: it gets allocated the first 64 cores (the 1st socket) and runs there without problems.
Now, if I submit another 64-CPU MPI job to the same node, SLURM places it on the 1st socket again, so CPUs 0-63 are shared by both jobs while CPUs 64-127 on the 2nd socket are not used at all.
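Each job is submitted roughly like this (a minimal sketch; the job name and binary are placeholders, not my real files):

```
#!/bin/bash
#SBATCH --job-name=mpi64        # placeholder name
#SBATCH --nodes=1               # everything on one 128-core node
#SBATCH --ntasks=64             # 64 MPI ranks, one core each
#SBATCH --ntasks-per-node=64

srun ./my_mpi_app               # placeholder binary
```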
I have played with various MPI parameters to no avail. The only way I was able to assign the two jobs to different sockets was by using rank files with OpenMPI (see the sketch below), but that should not be necessary if SLURM works correctly.
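The rank-file workaround looks roughly like this (the hostname `node01` and the binary name are placeholders; this is a sketch of the approach, not my exact files):

```
# generate a rankfile pinning the 64 ranks of the second job to socket 1
for i in $(seq 0 63); do
    echo "rank $i=node01 slot=1:$i"
done > rankfile_socket1

# launch the second job with explicit rank placement
mpirun -np 64 --rankfile ./rankfile_socket1 ./my_mpi_app
```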
Consumable resources in SLURM are set to CR_Core, and TaskPlugin=task/affinity.
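For reference, the relevant slurm.conf lines (the SelectType value is my assumption, since CR_Core implies a consumable-resource select plugin; the other two lines are as stated above):

```
SelectType=select/cons_res        # assumed; CR_Core requires a consumable-resource select plugin
SelectTypeParameters=CR_Core
TaskPlugin=task/affinity
```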
If I run the same two MPI jobs on the same node without SLURM, the same OpenMPI allocates the CPUs correctly.
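That is, launching both copies directly with something like the following (binary name again a placeholder) spreads them over the cores as expected:

```
# two 64-rank jobs started by hand on the same node, no SLURM involved
mpirun -np 64 --bind-to core ./my_mpi_app &
mpirun -np 64 --bind-to core ./my_mpi_app &
```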
What can make SLURM behave in such a bizarre way?
Asked by Alex P
(81 rep)
Apr 14, 2021, 08:03 PM
Last activity: Apr 14, 2021, 08:37 PM