Re: [OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work

2007-05-09 Thread Boris Bierbaum
I thought about it again: There's probably no call to dat_ep_query() *because* it returns wrong port numbers and the port numbers saved by the uDAPL BTL code itself are used. I'll leave the debugging to those who know the code ... ;-) Boris Andrew Friedley wrote: > OK, strange but good. Yeah I

Re: [OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work

2007-05-09 Thread Donald Kerr
Looking at that section it appears that we store the port value locally in udapl_addr and use the local copy, so changing the udapl attribute may not be doing anything for the BTL. I will run some tests as well. -DON Andrew Friedley wrote: OK, strange but good. Yeah I wouldn't be surprised

Re: [OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work

2007-05-09 Thread Andrew Friedley
OK, strange but good. Yeah I wouldn't be surprised if something has been changed, though I wouldn't know what, and I don't have time right now to go digging :( Maybe Don Kerr knows something? Andrew Boris Bierbaum wrote: I've run the whole IMB Benchmark Suite on 2, 3, and 4 nodes with 2 pr

Re: [OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work

2007-05-09 Thread Boris Bierbaum
I've run the whole IMB Benchmark Suite on 2, 3, and 4 nodes with 2 processes per node and --mca btl udapl,self. I didn't encounter any problems. The comment above line 197 says that dat_ep_query() returns wrong port numbers (which it does indeed), but I can't find any call to dat_ep_query() in the

Re: [OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work

2007-05-09 Thread Andrew Friedley
You say that fixes the problem; does it work even when running more than one MPI process per node? (That is the case the hack fixes.) Simply doing an mpirun with a -np parameter higher than the number of nodes you have set up should trigger this case, and making sure to use '-mca btl udapl,self

Re: [OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work

2007-05-09 Thread Boris Bierbaum
It has been explained in a different thread on [ofa-general] that the problem lies in a combination of the OpenIB-cma provider not setting the local and remote port numbers on endpoints correctly and Open MPI stepping over the IA to save the port number to circumvent this problem, thereby confusing