I believe Devendar Bureddy nailed the root cause. I am providing his excellent
analysis below:
From Devendar:
Out of curiosity I looked at this issue. Here's my 2 cents:
I think the issue is that the BTL components are opened and closed twice
(ompi_init, yoda), which leads to incorrect usage of var grou
If you weren't at the OMPI developer's meeting last week, you should look at
the wiki to see all the things that were discussed and the decisions that were
made:
https://svn.open-mpi.org/trac/ompi/wiki/Dec13Meeting
One important change is that we decided for v1.7.3 to change OMPI's default
The proposed solution at the bottom is wrong. There aren't two different
BMLs, there's one, and it lives in OMPI.
The solution is to open the BML and BTLs in ompi_mpi_init and not in the
PMLs. I checked, and the BML will deal with add_procs being called
multiple times on the same proc, so just m
I actually asked Ralph to make one more change. The default MCW rank ordering
was to match the mapping. So in the np>2 case, on a 2 socket system, we'd
order like this:
-
[savbu-usnic-a:01121] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
[BB/../../../../../../..][../../../../../../../..
So to be clear:
If nothing is specified, we map-by socket and rank-by slot.
If map-by is specified, we will also rank-by the same object, as that seems
to be the user expectation (based on feedback on the user/devel lists).
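
For anyone who finds it easier to read as code, here is a rough sketch of
the defaulting rule described above. It is not the ORTE implementation:
policy_t and resolve_policy() are hypothetical names made up for this
illustration, the policies are plain strings, and NULL stands for "not given
on the command line". Two assumptions that the text above does not confirm:
"also rank-by" is read as "rank by the same object as map-by", and an
explicit rank-by always takes precedence.

```c
#include <stddef.h>

/* Hypothetical type for this sketch only; not an Open MPI structure. */
typedef struct {
    const char *map_by;   /* effective mapping policy */
    const char *rank_by;  /* effective ranking policy */
} policy_t;

/* Hypothetical helper: resolve the effective policies from what the
 * user specified.  Pass NULL for an option that was not given. */
static policy_t resolve_policy(const char *map_by, const char *rank_by)
{
    policy_t p;

    if (NULL == map_by) {
        /* Nothing specified: map-by socket, rank-by slot. */
        p.map_by  = "socket";
        p.rank_by = (NULL != rank_by) ? rank_by : "slot";
    } else {
        /* map-by specified: ranking follows the mapping.
         * ASSUMPTION: an explicit rank-by still overrides this. */
        p.map_by  = map_by;
        p.rank_by = (NULL != rank_by) ? rank_by : map_by;
    }
    return p;
}
```

Calling resolve_policy(NULL, NULL) gives the socket/slot default, while
resolve_policy("core", NULL) gives core for both mapping and ranking.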
On Dec 17, 2013, at 2:19 PM, Jeff Squyres (jsquyres) wrote:
> I actually asked Ralph
Usual place:
http://www.open-mpi.org/software/ompi/v1.7/
Please test and report problems - we want to release by end of week.
Thanks
Ralph