I'm running into problems trying to spawn MPI processes across multiple nodes 
on a cluster using recent versions of Open MPI. Specifically, using the attached 
Fortran code, compiled with Open MPI 3.1.2 via:

mpif90 test.F90 -o test.exe

and run through a PBS scheduler using the attached test1.pbs, it fails, as can 
be seen in the attached testFAIL.err file.

If I do the same using Open MPI v1.10.3, it works successfully, giving me the 
output in the attached testSUCCESS.err file.

From testing a few different versions of Open MPI, it seems that the behavior 
changed between v1.10.7 and v2.0.4.

Is there some change in options needed to make this work with newer versions of Open MPI?

Output from ompi_info --all is attached. config.log can be found here:

http://users.obs.carnegiescience.edu/abenson/config.log.bz2

Thanks for any help you can offer!

-Andrew

Attachment: ompi_info.log.bz2
Description: application/bzip

program test
  ! Reproducer for MPI_Comm_Spawn across nodes: when launched with no parent,
  ! the process spawns 16 children running this same executable; the children
  ! synchronize among themselves and then with the parent over the
  ! intercommunicator returned by the spawn.
  use MPI
  implicit none
  integer                                             :: status             , spawnStatus         (16), &
       &                                                 childCommunicator  , rank                    , &
       &                                                 parentCommunicator , mpiSize                 , &
       &                                                 processorNameLength, mpiThreadingProvided
  character(len=MPI_Max_Processor_Name), dimension(1) :: processorName
  call MPI_Init_Thread       (MPI_Thread_Multiple,mpiThreadingProvided,status)
  call MPI_Comm_Rank         (MPI_Comm_World     ,rank                ,status)
  call MPI_Comm_Size         (MPI_Comm_World     ,mpiSize             ,status)
  call MPI_Comm_Get_Parent   (parentCommunicator                      ,status)
  call MPI_Get_Processor_Name(processorName(1)   ,processorNameLength ,status)
  if (parentCommunicator == MPI_Comm_Null) then
     ! No parent communicator: this is the originally launched process.
     write (0,*) "parent process: rank, size, processor name = ",rank,mpiSize,trim(processorName(1))
     ! Spawn 16 children running the same binary, with no command-line arguments.
     call MPI_Comm_Spawn('test.exe',[''],16,MPI_INFO_NULL,0,MPI_Comm_World,childCommunicator,spawnStatus,status)
     ! Synchronize with the children across the parent/child intercommunicator.
     call MPI_Barrier  (childCommunicator,status)
     write (0,*) "parent passed interbarrier: rank = ",rank
     call MPI_Comm_Free(childCommunicator,status)
  else
     ! Spawned child: barrier among the children first, then with the parent.
     write (0,*) " child process: rank, size, processor name = ",rank,mpiSize,trim(processorName(1))
     call MPI_Barrier(MPI_Comm_World,status)
     write (0,*) " child passed intrabarrier: rank = ",rank
     call MPI_Barrier(parentCommunicator,status)
     write (0,*) " child passed interbarrier: rank = ",rank
  end if
  call MPI_Finalize(status)
end program

Attachment: test1.pbs
Description: application/shellscript

Attachment: testFAIL.err.bz2
Description: application/bzip

Attachment: testSUCCESS.err.bz2
Description: application/bzip

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users