Re: [OMPI devel] Add multi nic support for ofi MTL using hwloc

2020-03-20 Thread Barrett, Brian via devel
Ok, that makes total sense. I'm leaning towards us fixing this in the OFI MTL rather than making everyone load. I agree with you that it probably doesn't matter, but let's not create a corner case. I'm also going to follow up with the dev who wrote this code, but my guess is that we should

Re: [OMPI devel] Add multi nic support for ofi MTL using hwloc

2020-03-20 Thread Ralph Castain via devel
If you call "hwloc_topology_load", then hwloc merrily does its discovery and slams many-core systems. If you call "opal_hwloc_get_topology", then that is fine - it checks if we already have it, tries to get it from PMIx (using shared mem for hwloc 2.x), and only does the discovery if no other
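
The lookup order described above (reuse a cached topology, then ask the runtime, and only then run local discovery) can be sketched roughly as follows. This is a minimal illustration, not the actual Open MPI code: `topo_t`, `fetch_from_runtime()`, `run_local_discovery()` and `get_topology()` are hypothetical stand-ins for `hwloc_topology_t`, the PMIx lookup, `hwloc_topology_load` and `opal_hwloc_get_topology` respectively.

```c
#include <stddef.h>

/* Hypothetical stand-in for hwloc_topology_t. */
typedef struct { int ncores; } topo_t;

static topo_t *cached_topology = NULL; /* filled in at most once per process */
static int discovery_count = 0;        /* counts expensive local discoveries */

/* Hypothetical: ask the runtime (e.g. via PMIx) for an existing topology.
 * Returns NULL here to simulate a launcher that provides none. */
static topo_t *fetch_from_runtime(void)
{
    return NULL;
}

/* Hypothetical: the expensive path, standing in for full hwloc discovery. */
static topo_t *run_local_discovery(void)
{
    static topo_t local = { 64 };
    discovery_count++;
    return &local;
}

/* Return the topology, doing local discovery only as a last resort. */
static topo_t *get_topology(void)
{
    if (cached_topology != NULL) {
        return cached_topology;               /* already have it */
    }
    cached_topology = fetch_from_runtime();   /* cheap: shared by the runtime */
    if (cached_topology == NULL) {
        cached_topology = run_local_discovery(); /* expensive fallback */
    }
    return cached_topology;
}
```

Under these assumptions, repeated calls to `get_topology()` return the same pointer and trigger at most one discovery per process, which is why calling the wrapper is cheap where calling the discovery routine directly is not.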

Re: [OMPI devel] Add multi nic support for ofi MTL using hwloc

2020-03-20 Thread Barrett, Brian via devel
But it does raise the question: should we call get_topology() for belt and suspenders in OFI? Or will that cause your concerns from the start of this thread? Brian From: Ralph Castain Date: Friday, March 20, 2020 at 9:31 AM To: OpenMPI Devel Cc: "Barrett, Brian" Subject: RE: [EXTERNAL] [OMPI

Re: [OMPI devel] Add multi nic support for ofi MTL using hwloc

2020-03-20 Thread Ralph Castain via devel
https://github.com/open-mpi/ompi/pull/7547 fixes it and has an explanation as to why it wasn't catching us elsewhere in the MPI code. On Mar 20, 2020, at 9:22 AM, Ralph Castain via devel <devel@lists.open-mpi.org> wrote: Odd - the topology object gets filled in during init, well before

Re: [OMPI devel] Add multi nic support for ofi MTL using hwloc

2020-03-20 Thread Ralph Castain via devel
Odd - the topology object gets filled in during init, well before the fence (as it doesn't need the fence, being a purely local op). Let me take a look.
> On Mar 20, 2020, at 9:15 AM, Barrett, Brian wrote:
>
> PMIx folks -
>
> When using mpirun for launching, it looks like opal_hwloc_topology

Re: [OMPI devel] Add multi nic support for ofi MTL using hwloc

2020-03-20 Thread Barrett, Brian via devel
PMIx folks - When using mpirun for launching, it looks like opal_hwloc_topology isn't filled in at the point where we need the information (mtl_ofi_component_init()). This would end up being before the modex fence, since the goal is to figure out which address the process should publish. I'm