
mpirun kills parallel processes when the SSH connection is lost

When I'm connected via SSH with parallel processes running and the internet connection drops, all of the parallel processes are killed. When I reconnect, I find the following message in the log file:

    --------------------------------------------------------------------------
    MPI_ABORT was invoked on rank 12 in communicator MPI COMMUNICATOR 4 DUP FROM 0
    with errorcode 15.

    NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
    You may or may not see output from other processes, depending on
    exactly when Open MPI kills them.
    --------------------------------------------------------------------------
    0:Terminate signal was sent, status=: 15
    (rank:0 hostname: pid:2953):ARMCI DASSERT fail. ../../ga-5-4/armci/src/common/signaltrap.c:SigTermHandler():477 cond:0

Distribution:

    Description: Ubuntu 16.04.6 LTS
    Release: 16.04
    Codename: xenial

How can I prevent this crash?
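A minimal sketch of one common workaround, assuming the job dies because the hangup (SIGHUP) from the closed SSH session reaches mpirun, which then tears down its ranks. The log's "status=: 15" only shows that the ranks received SIGTERM, so the SIGHUP link is an inference, not something the log confirms. The idea is to start mpirun in a new session with no controlling terminal, roughly what the shell tools `setsid` or `nohup` (or running inside `screen`/`tmux`) achieve. The helper name `launch_detached` and the log path are hypothetical.

```python
import subprocess


def launch_detached(cmd, log_path):
    """Start cmd in a new session so a terminal hangup is not delivered to it.

    Hypothetical helper; roughly equivalent to
    `setsid cmd > log_path 2>&1 < /dev/null &` in a shell.
    """
    log = open(log_path, "ab")
    try:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # never read from the SSH terminal
            stdout=log,                 # keep output after the disconnect
            stderr=subprocess.STDOUT,
            start_new_session=True,     # child calls setsid(): no controlling tty
        )
    finally:
        log.close()  # the child holds its own copy of the file descriptor
    return proc
```

For example, `launch_detached(["mpirun", "-np", "16", "./solver"], "run.log")` would start the job detached; `./solver` and the rank count are placeholders. Because the child becomes leader of its own session and process group, closing the terminal does not send it SIGHUP, and the launching shell's job control does not forward one either.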
Asked by gvd (111 rep)
Aug 1, 2022, 08:32 PM
Last activity: Aug 1, 2022, 09:44 PM