On May 28, 2014, at 1:18 AM, Gilles Gouaillardet wrote:
> i finally got it :-)
Hooray! Thanks for digging deeper.
>
> /* i previously got it "almost" right ... */
>
> here is what happens on job 2 (with trunk) :
> MPI_Intercomm_create calls ompi_comm_get_rprocs that calls ompi_proc_unpack
>
i finally got it :-)
/* i previously got it "almost" right ... */
here is what happens on job 2 (with trunk) :
MPI_Intercomm_create calls ompi_comm_get_rprocs that calls ompi_proc_unpack
=> ompi_proc_unpack stores job 3 info into opal_dstore_peer
then ompi_comm_get_rprocs calls ompi_proc_set_loc
Hmmm...I did some digging, and as best I can tell, the root cause is that
the second job ("b" in the test program) is never actually calling
connect_accept! It looks like a change may have occurred in Intercomm_create
that causes it not to recognize the need to do so.
Anyone confirm
Hi Gilles
I concur on the typo and fixed it - thanks for catching it. I'll have to look
into the problem you reported, as it has been fixed in the past and was working
the last time I checked. The info required for this 3-way connect/accept is supposed
to be in the modex provided by the common commun
Folks,
while debugging the dynamic/intercomm_create from the ibm test suite, i
found something odd.
i ran *without* any batch manager on a VM (one socket and four cpus)
mpirun -np 1 ./dynamic/intercomm_create
it hangs by default
it works with --mca coll ^ml
basically :
- task 0 spawns task 1
-