Hi,

This email follows up on the thread

"Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch".




https://mail-archive.com/users@lists.open-mpi.org/msg30650.html


We have installed the latest version, v2.0.2, on the cluster that Anastasia Kruchinina
<https://mail-archive.com/users@lists.open-mpi.org/msg30654.html> was running on.


It seems to me that the issue is still not fixed in v2.0.2.


The job script and sample codes can be found at


https://www.pdc.kth.se/~gongjing/files/test_spawn/


The messages we got are:


$ cat error_file.e



Currently Loaded Modulefiles:
[t03n06.pdc.kth.se:39767] OPAL ERROR: Timeout in file base/pmix_base_fns.c at line 193
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)


$ cat output_file.o

--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_dpm_dyn_init() failed
  --> Returned "Timeout" (-15) instead of "Success" (0)
--------------------------------------------------------------------------



Please let me know if you need additional information.


Thanks a lot for your help.


Regards, Jing Gong



_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
