R might call disconnect more than once if it creates multiple communicators.
Here’s another test case for that behavior:

Attachment: intercomm_create.c
Description: Binary data
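
The attachment isn't reproduced inline. As a rough stand-in, here is a
minimal sketch of a test in that spirit; it assumes the test pairs
MPI_Intercomm_create with MPI_Comm_disconnect between the two halves of
MPI_COMM_WORLD, which the actual attachment may or may not do:

/* Build an inter-communicator between the even and odd ranks of
 * MPI_COMM_WORLD, then disconnect it. With several such communicators,
 * each process calls disconnect once per communicator. Run with at
 * least 2 ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Comm half, inter;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Split the world into even and odd ranks. */
    int color = rank % 2;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &half);

    /* Local leader is rank 0 of each half; the remote leader is world
     * rank 0 (even group) or 1 (odd group). */
    MPI_Intercomm_create(half, 0, MPI_COMM_WORLD, 1 - color, 42, &inter);

    /* One disconnect call per communicator created. */
    MPI_Comm_disconnect(&inter);
    MPI_Comm_free(&half);

    printf("rank %d disconnected\n", rank);
    MPI_Finalize();
    return 0;
}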



> On Jun 4, 2018, at 7:08 AM, Bennet Fauber <ben...@umich.edu> wrote:
> 
> Just out of curiosity, would using Rmpi and/or doMPI help in any way?
> 
> -- bennet
> 
> 
> On Mon, Jun 4, 2018 at 10:00 AM, marcin.krotkiewski
> <marcin.krotkiew...@gmail.com> wrote:
>> Thanks, Ralph!
>> 
>> Your code finishes normally, so I guess the cause lies in R. Running the R
>> code with -mca pmix_base_verbose 1, I see that each rank calls ext2x:client
>> disconnect twice (each PID prints the line twice):
>> 
>> [...]
>>    3 slaves are spawned successfully. 0 failed.
>> [localhost.localdomain:11659] ext2x:client disconnect
>> [localhost.localdomain:11661] ext2x:client disconnect
>> [localhost.localdomain:11658] ext2x:client disconnect
>> [localhost.localdomain:11646] ext2x:client disconnect
>> [localhost.localdomain:11658] ext2x:client disconnect
>> [localhost.localdomain:11659] ext2x:client disconnect
>> [localhost.localdomain:11661] ext2x:client disconnect
>> [localhost.localdomain:11646] ext2x:client disconnect
>> 
>> In your example it's only called once per process.
>> 
>> Do you have any suspicion where the second call comes from? Might this be
>> the reason for the hang?
>> 
>> Thanks!
>> 
>> Marcin
>> 
>> 
>> On 06/04/2018 03:16 PM, r...@open-mpi.org wrote:
>> 
>> Try running the attached example dynamic code - if that works, then the
>> problem likely lies in how R operates.
>> 
>> On Jun 4, 2018, at 3:43 AM, marcin.krotkiewski
>> <marcin.krotkiew...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> I have some problems running R + Rmpi with Open MPI 3.1.0 + PMIx 2.1.1. A
>> simple R script, which starts a few tasks, hangs at the end on disconnect.
>> Here is the script:
>> 
>> library(parallel)
>> numWorkers <- as.numeric(Sys.getenv("SLURM_NTASKS")) - 1
>> myCluster <- makeCluster(numWorkers, type = "MPI")
>> stopCluster(myCluster)
>> 
>> And here is how I run it:
>> 
>> SLURM_NTASKS=5 mpirun -np 1 -mca pml ^yalla -mca mtl ^mxm -mca coll ^hcoll R
>> --slave < mk.R
>> 
>> Notice the -np 1: this is apparently how you start Rmpi jobs, as the ranks
>> are spawned dynamically by R inside the script. I ran into a number of
>> issues here:
>> 
>> 1. With HPCX it seems that dynamic spawning of ranks is not supported, so I
>> had to turn off all of yalla/mxm/hcoll (see the sketch after the error
>> message below):
>> 
>> --------------------------------------------------------------------------
>> Your application has invoked an MPI function that is not supported in
>> this environment.
>> 
>>  MPI function: MPI_Comm_spawn
>>  Reason:       the Yalla (MXM) PML does not support MPI dynamic process
>> functionality
>> --------------------------------------------------------------------------
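>>
>> For context, here is a bare-bones C sketch of the dynamic-process call that
>> this error refers to; the "worker" child binary name is made up, and Rmpi
>> drives the equivalent MPI_Comm_spawn call from inside R:
>>
>> /* Spawn child processes and obtain an inter-communicator to them;
>>  * this MPI_Comm_spawn call is what the Yalla (MXM) PML rejects. */
>> #include <mpi.h>
>>
>> int main(int argc, char **argv)
>> {
>>     MPI_Comm children;
>>
>>     MPI_Init(&argc, &argv);
>>
>>     /* Ask the runtime to launch 4 copies of the (hypothetical)
>>      * "worker" binary. */
>>     MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
>>                    0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);
>>
>>     /* Tear down the connection to the children. */
>>     MPI_Comm_disconnect(&children);
>>     MPI_Finalize();
>>     return 0;
>> }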
>> 
>> 2. When I do that, the program does create a 'cluster' and start the ranks,
>> but hangs in PMIx at MPI_Comm_disconnect. Here is the top of the trace from
>> gdb:
>> 
>> #0  0x00007f66b1e1e995 in pthread_cond_wait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1  0x00007f669eaeba5b in PMIx_Disconnect (procs=procs@entry=0x2e25d20,
>> nprocs=nprocs@entry=10, info=info@entry=0x0, ninfo=ninfo@entry=0) at
>> client/pmix_client_connect.c:232
>> #2  0x00007f669ed6239c in ext2x_disconnect (procs=0x7ffd58322440) at
>> ext2x_client.c:1432
>> #3  0x00007f66a13bc286 in ompi_dpm_disconnect (comm=0x2cc0810) at
>> dpm/dpm.c:596
>> #4  0x00007f66a13e8668 in PMPI_Comm_disconnect (comm=0x2cbe058) at
>> pcomm_disconnect.c:67
>> #5  0x00007f66a16799e9 in mpi_comm_disconnect () from
>> /cluster/software/R-packages/3.5/Rmpi/libs/Rmpi.so
>> #6  0x00007f66b2563de5 in do_dotcall () from
>> /cluster/software/R/3.5.0/lib64/R/lib/libR.so
>> #7  0x00007f66b25a207b in bcEval () from
>> /cluster/software/R/3.5.0/lib64/R/lib/libR.so
>> #8  0x00007f66b25b0fd0 in Rf_eval.localalias.34 () from
>> /cluster/software/R/3.5.0/lib64/R/lib/libR.so
>> #9  0x00007f66b25b2c62 in R_execClosure () from
>> /cluster/software/R/3.5.0/lib64/R/lib/libR.so
>> 
>> Might this also be related to the dynamic rank creation in R?
>> 
>> Thanks!
>> 
>> Marcin
>> 