Hi,

I am trying to use the MPI_Comm_spawn function in my code, and I am having
trouble with Open MPI 2.0.x when the job is submitted with sbatch (Slurm
batch system). My test program is located here:
http://user.it.uu.se/~anakr367/files/MPI_test/

When I run my code, I get the following error:

OPAL ERROR: Timeout in file
../../../../openmpi-2.0.1/opal/mca/pmix/base/pmix_base_fns.c at line 193
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

   ompi_dpm_dyn_init() failed
   --> Returned "Timeout" (-15) instead of "Success" (0)
--------------------------------------------------------------------------

Interestingly, there is no error when I first allocate nodes with salloc
and then run my program. So the program works fine with Open MPI 1.x +
sbatch/salloc and with Open MPI 2.0.x + salloc, but not with Open MPI
2.0.x + sbatch.
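
For illustration, the two launch modes being compared look roughly like
the sketch below. The binary name ./spawn_test, the file name job.sh, and
the node and task counts are placeholders assumed here, not the actual
values from the test program linked above.

```
# --- job.sh: submitted with "sbatch job.sh" (the failing case with 2.0.x) ---
#!/bin/bash
#SBATCH --nodes=2      # placeholder node count
#SBATCH --ntasks=4     # placeholder: leaves free slots for the spawned processes
# Start one parent process; it calls MPI_Comm_spawn to create the children.
mpirun -np 1 ./spawn_test

# --- interactive allocation (the working case with 2.0.x) ---
# salloc --nodes=2 --ntasks=4 mpirun -np 1 ./spawn_test
```

In both cases the same binary and the same mpirun command line are used;
only the way the Slurm allocation is obtained differs.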

The error was reproduced on three different computer clusters.

Best regards,
Anastasia
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
