mpirun kills parallel processes when the SSH connection is lost
When I'm connected via SSH with parallel processes running and the internet connection drops, all of the parallel processes are killed. When I reconnect, I find the following message in the log file:
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 12 in communicator MPI COMMUNICATOR 4 DUP FROM 0
with errorcode 15.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
0:Terminate signal was sent, status=: 15
(rank:0 hostname: pid:2953):ARMCI DASSERT fail. ../../ga-5-4/armci/src/common/signaltrap.c:SigTermHandler():477 cond:0
Distribution:
> Description: Ubuntu 16.04.6 LTS
> Release: 16.04
> Codename: xenial
How can I prevent this crash?
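A common way to keep a job alive after a disconnect is to start it outside the login session, which is what tools such as `nohup`, `setsid`, `screen`, or `tmux` do. The Python sketch below is illustrative only. It assumes, without confirmation from the log, that the shutdown begins with the hangup (SIGHUP) that reaches `mpirun` when the SSH session ends, and that `mpirun` then terminates its ranks with SIGTERM, which would match the signal 15 in the log. `launch_detached` and `run.log` are hypothetical names. `subprocess.Popen(..., start_new_session=True)` makes the child call `setsid()`, so it no longer belongs to the terminal's session or process group.

```python
import subprocess
import sys


def launch_detached(cmd, log_path):
    """Start cmd in a new session so a terminal hangup (SIGHUP) from a
    dropped SSH connection is not delivered to it.

    Hypothetical helper: stdout and stderr are appended to log_path,
    and stdin is /dev/null so nothing depends on the terminal.
    """
    log = open(log_path, "ab")
    try:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # child calls setsid() before exec
        )
    finally:
        log.close()  # the child keeps its own copy of the descriptor
    return proc


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: python launch.py mpirun -np 16 ./my_program
    p = launch_detached(sys.argv[1:], "run.log")
    print(f"started pid {p.pid}")
```

For example, `python launch.py mpirun -np 16 ./my_program` would return immediately and send the job's output to `run.log`; the rank count and program name are placeholders. Because the child leads its own session, closing the SSH terminal should not signal it, although this does not protect against other causes of `MPI_ABORT`.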
Asked by gvd
(111 rep)
Aug 1, 2022, 08:32 PM
Last activity: Aug 1, 2022, 09:44 PM