Hi Kurt,

Without knowing your exact MPI launch command, my crystal ball thinks you might
want to try the --mpi=pmix flag for srun, as documented for Slurm + Open MPI:
https://slurm.schedmd.com/mpi_guide.html#open_mpi
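
As a rough sketch of what that could look like (the binary name ./my_mpi_app
and the task count of 4 are placeholders, not taken from your job): first ask
Slurm which MPI plugin types it offers, then launch with pmix. This assumes
your Slurm installation was built with PMIx support; if pmix does not appear
in the list, the launch line will not work as-is.

```shell
# Placeholders (assumptions): adjust to your own binary and task count.
APP=./my_mpi_app
NTASKS=4
LAUNCH="srun --mpi=pmix -n $NTASKS $APP"

if command -v srun >/dev/null 2>&1; then
    # Show the MPI plugin types this Slurm installation supports.
    srun --mpi=list
    # Launch the job through Slurm's PMIx plugin.
    $LAUNCH
else
    # Not on a Slurm system: just print the command that would be run.
    echo "srun not found; would run: $LAUNCH"
fi
```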

-Joachim
________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Mccall, Kurt E. 
(MSFC-EV41) via users <users@lists.open-mpi.org>
Sent: Thursday, June 15, 2023 11:56:28 PM
To: users@lists.open-mpi.org <users@lists.open-mpi.org>
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mcc...@nasa.gov>
Subject: [OMPI users] OpenMPI crashes with TCP connection error


My job immediately crashes with the error message below. I don’t know where
to begin looking for the cause of the error, or what information to provide to
help you understand it. Maybe you could clue me in 😊.



I am on RedHat 4.18.0, using Slurm 20.11.8 and OpenMPI 4.1.5 compiled with gcc 
8.5.0.

I built OpenMPI with the following “configure” command:

./configure --prefix=/opt/openmpi/4.1.5_gnu --with-slurm --enable-debug

WARNING: Open MPI accepted a TCP connection from what appears to be a
another Open MPI process but cannot find a corresponding process
entry for that peer.

This attempted connection will be ignored; your MPI job may or may not
continue properly.

  Local host: n001
  PID:        985481

