
Sorry to make you resort to divination.   My sbatch command is as follows:

sbatch --ntasks-per-node=24 --nodes=16 --ntasks=384 --job-name $job_name \
    --exclusive --no-kill --verbose $release_dir/script.bash &

--mpi=pmix isn’t an option recognized by sbatch. Is there an alternative?
The Slurm doc you mentioned has the following paragraph. Is it still true with
OpenMPI 4.1.5?

“NOTE: OpenMPI has a limitation that does not support calls to MPI_Comm_spawn() 
from within a Slurm allocation. If you need to use the MPI_Comm_spawn() 
function you will need to use another MPI implementation combined with PMI-2 
since PMIx doesn't support it either.”

I use MPI_Comm_spawn extensively in my application.


Hi Kurt,

Without knowing your exact MPI launch command, my crystal orb thinks you might
want to try the --mpi=pmix flag for srun as documented for slurm+openmpi:
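
A minimal sketch of how that flag might be used inside the batch script itself, since --mpi is an srun option rather than an sbatch one. MPI_APP and the launch helper are hypothetical names used only for illustration. Whether "pmix" is accepted depends on how Slurm was built, which `srun --mpi=list` should show, and Open MPI also needs matching PMIx support.

```shell
#!/bin/bash
# Sketch of the body of a batch script submitted with sbatch.
# MPI_APP is a hypothetical placeholder for the real MPI executable.
MPI_APP="${MPI_APP:-./my_app}"

# Hypothetical helper: keep the PMIx flag in one place.
launch() {
    srun --mpi=pmix "$@"
}

# Only attempt the launch where srun is actually installed.
if command -v srun >/dev/null 2>&1; then
    srun --mpi=list        # print the MPI plugin types this Slurm build supports
    launch "$MPI_APP"      # task counts are inherited from the sbatch allocation
else
    echo "srun not found; would run: srun --mpi=pmix $MPI_APP"
fi
```

The sbatch options (--nodes, --ntasks and so on) stay on the sbatch command line; only the launch line inside the script changes.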

My job immediately crashes with the error message below. I don’t know where
to begin looking for the cause of the error, or what information to provide
to help you understand it. Maybe you could clue me in 😊.

I am on RedHat 4.18.0, using Slurm 20.11.8 and OpenMPI 4.1.5 compiled with gcc 

I built OpenMPI with the following  “configure” command:

./configure --prefix=/opt/openmpi/4.1.5_gnu --with-slurm --enable-debug

WARNING: Open MPI accepted a TCP connection from what appears to be a
another Open MPI process but cannot find a corresponding process
entry for that peer.

This attempted connection will be ignored; your MPI job may or may not
continue properly.

  Local host: n001
  PID:        985481

