Hi,

I’m using Open MPI 4.0.4, where the Fortran side has been compiled with the
Intel Fortran suite v17.0.6 and the C/C++ side with the GNU suite v4.8.5,
due to requirements I cannot modify.

I am trying to parallelise a Fortran application by dynamically creating
processes on the fly with the “MPI_Comm_spawn” subroutine. The application
starts with a single parent process and reads a file through standard
input, but it looks like the children do not inherit access to that file. I
would like all child processes to inherit the parent’s standard input. I’m
aware that the “-stdin all” argument of the “mpirun” binary might do this,
but I am trying to run the binary without mpirun unless strictly necessary.
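For reference, the mpirun-based launch I am trying to avoid would, if I
understand the documentation correctly, look something like:
mpirun -stdin all -np 1 Binaryname.bin < inputfilename.dat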

So far, I have tried passing a non-null “MPI_Info” to “MPI_Comm_spawn” with
the key “ompi_stdin_target” and the value “all”, but it does not work. I
have also tried other values (none, 0, 1, -1, etc.) without success.
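
For what it’s worth, a stripped-down, self-contained sketch of what I am
attempting is below (the binary name, child count and the use of
MPI_ERRCODES_IGNORE are just for illustration; the real code is in the
subroutine further down):

===========================================================
PROGRAM spawn_stdin_test
    USE mpi
    IMPLICIT NONE

    INTEGER          :: info, parent, intercomm, ierror
    CHARACTER(LEN=1) :: argv(1)

    CALL MPI_Init(ierror)
    CALL MPI_Comm_get_parent(parent, ierror)

    IF (parent .EQ. MPI_COMM_NULL) THEN
        ! Parent: ask Open MPI to forward its stdin to all children
        argv(1) = ''   ! a blank string terminates the argv list
        CALL MPI_Info_create(info, ierror)
        CALL MPI_Info_set(info, "ompi_stdin_target", "all", ierror)

        ! Binary name and child count (2) are placeholders for illustration
        CALL MPI_Comm_spawn("Binaryname.bin", argv, 2, info, 0, &
                            MPI_COMM_WORLD, intercomm, MPI_ERRCODES_IGNORE, ierror)

        CALL MPI_Info_free(info, ierror)
    END IF

    CALL MPI_Finalize(ierror)
END PROGRAM spawn_stdin_test
===========================================================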

Here is the subroutine that provokes the error at the MPI_Comm_spawn call:

===========================================================
SUBROUTINE ADmn_createSpawn(iNumberChilds, iCommBigWorld, iIDMpi, iNumberProcess)
    USE mpi
    IMPLICIT NONE

    !Number of child processes to spawn
    INTEGER :: iNumberChilds
    !ID of the communicator that contains all the processes
    INTEGER :: iCommBigWorld
    !Id number of the current process in iCommBigWorld
    INTEGER :: iIDMpi
    !Total number of processes in iCommBigWorld
    INTEGER :: iNumberProcess

    CHARACTER(LEN=128)          :: command
    CHARACTER(LEN=1)            :: arguments(1)
    CHARACTER(LEN=*), PARAMETER :: key = "ompi_stdin_target", valueMPI = "all"
    INTEGER                     :: iInfoMPI
    INTEGER                     :: iParent, iChild
    INTEGER                     :: iSpawn_error(iNumberChilds)
    INTEGER                     :: bigWorld, iC, iInic, iFinal
    INTEGER                     :: intasks, ierror
    INTEGER                     :: iIDFamiliar = 0
    LOGICAL                     :: FLAG
    !In the full application this is a module variable; declared here so the
    !snippet is self-contained
    INTEGER                     :: iProcessIDInternal

    CALL GET_COMMAND_ARGUMENT(0, command)
    CALL MPI_Comm_get_parent(iParent, ierror)

    IF (iParent .EQ. MPI_COMM_NULL) THEN
        !Parent: spawn the children, asking for stdin to be forwarded to all
        arguments(1) = ''
        iIDFamiliar = 0

        CALL MPI_INFO_CREATE(iInfoMPI, ierror)
        CALL MPI_INFO_SET(iInfoMPI, key, valueMPI, ierror)

        CALL MPI_Comm_spawn(command, arguments, iNumberChilds, iInfoMPI, 0, &
                            MPI_COMM_WORLD, iChild, iSpawn_error, ierror)

        CALL MPI_INTERCOMM_MERGE(iChild, .FALSE., iCommBigWorld, ierror)
    ELSE
        !Child: merge with the parent's communicator
        CALL MPI_COMM_RANK(MPI_COMM_WORLD, iIDFamiliar, ierror)
        iIDFamiliar = iIDFamiliar + 1

        CALL MPI_INTERCOMM_MERGE(iParent, .TRUE., iCommBigWorld, ierror)
    END IF

    CALL MPI_COMM_RANK(iCommBigWorld, iIDMpi, ierror)
    CALL MPI_COMM_SIZE(iCommBigWorld, intasks, ierror)
    iProcessIDInternal = iIDMpi
    iNumberProcess = intasks

END SUBROUTINE ADmn_createSpawn
===========================================================

The binary is executed as:
Binaryname.bin < inputfilename.dat
And here is the segmentation fault produced when passing the MPI_Info
variable to MPI_Comm_spawn:

===========================================================
[sles12sp3-srv:10384] *** Process received signal ***
[sles12sp3-srv:10384] Signal: Segmentation fault (11)
[sles12sp3-srv:10384] Signal code: Address not mapped (1)
[sles12sp3-srv:10384] Failing at address: 0xfffffffe
[sles12sp3-srv:10384] [ 0] /lib64/libpthread.so.0(+0x10c10)[0x7fc6a8dd5c10]
[sles12sp3-srv:10384] [ 1] /usr/local/lib64/libopen-rte.so.40(pmix_server_spawn_fn+0x1052)[0x7fc6aa283232]
[sles12sp3-srv:10384] [ 2] /usr/local/lib64/openmpi/mca_pmix_pmix3x.so(+0x46210)[0x7fc6a602b210]
[sles12sp3-srv:10384] [ 3] /usr/local/lib64/openmpi/mca_pmix_pmix3x.so(pmix_server_spawn+0x7c6)[0x7fc6a60a5ab6]
[sles12sp3-srv:10384] [ 4] /usr/local/lib64/openmpi/mca_pmix_pmix3x.so(+0xb1a2f)[0x7fc6a6096a2f]
[sles12sp3-srv:10384] [ 5] /usr/local/lib64/openmpi/mca_pmix_pmix3x.so(pmix_server_message_handler+0x41)[0x7fc6a6097511]
[sles12sp3-srv:10384] [ 6] /usr/local/lib64/openmpi/mca_pmix_pmix3x.so(OPAL_MCA_PMIX3X_pmix_ptl_base_process_msg+0x1bf)[0x7fc6a610481f]
[sles12sp3-srv:10384] [ 7] /usr/local/lib64/libopen-pal.so.40(opal_libevent2022_event_base_loop+0x8fc)[0x7fc6a9facd6c]
[sles12sp3-srv:10384] [ 8] /usr/local/lib64/openmpi/mca_pmix_pmix3x.so(+0xcf7ce)[0x7fc6a60b47ce]
[sles12sp3-srv:10384] [ 9] /lib64/libpthread.so.0(+0x8724)[0x7fc6a8dcd724]
[sles12sp3-srv:10384] [10] /lib64/libc.so.6(clone+0x6d)[0x7fc6a8b0ce8d]
[sles12sp3-srv:10384] *** End of error message ***

===========================================================

Do you have any idea about what might be happening?

Thank you in advance!

Best regards,
Álvaro
