Re: [OMPI devel] Question about Open MPI bindings

2016-09-05 Thread r...@open-mpi.org
> On Sep 5, 2016, at 11:25 AM, George Bosilca wrote: > Thanks for all these suggestions. I could get the expected bindings by 1) removing the vm and 2) adding hetero. This is far from an ideal setting, as now I have to make my own machinefile for every single run, or spawn daemons on …

Re: [OMPI devel] Question about Open MPI bindings

2016-09-05 Thread George Bosilca
Indeed. As indicated on the other thread, if I add the novm and hetero and specify both the --bind-to and --map-by, I get the expected behavior. Thanks, George. On Mon, Sep 5, 2016 at 2:14 PM, r...@open-mpi.org wrote: > I didn’t define the default behaviors - I just implemented what everyone …
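For readers reconstructing the working invocation from this thread: --novm and --hetero-nodes are mpirun options of that era of Open MPI, but the application name below is a placeholder and the exact combination is only a sketch of what the discussion describes, not a verified recipe:

$ mpirun -np 3 --novm --hetero-nodes --map-by core --bind-to core \
      --report-bindings ./my_app     # ./my_app is a placeholder application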

Re: [OMPI devel] Question about Open MPI bindings

2016-09-05 Thread George Bosilca
Thanks for all these suggestions. I could get the expected bindings by 1) removing the vm and 2) adding hetero. This is far from an ideal setting, as now I have to make my own machinefile for every single run, or spawn daemons on all the machines on the cluster. Wouldn't it be useful to make the d…
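For context, the machinefile mentioned here is an ordinary Open MPI hostfile: one node per line with an optional slot count. The hostnames come from the thread; the slot counts and the mpirun line are assumptions for illustration:

# hypothetical machinefile listing only the nodes to be used
arc00 slots=20
arc01 slots=20
$ mpirun -np 40 --hostfile ./machinefile --report-bindings ./my_app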

Re: [OMPI devel] Question about Open MPI bindings

2016-09-05 Thread r...@open-mpi.org
I didn’t define the default behaviors - I just implemented what everyone said they wanted, as eventually captured in a Google spreadsheet Jeff posted (which was available and discussed for weeks before being implemented). So the defaults are: * if np <= 2, we map-by core bind-to core * if np > 2, we ma…
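One way to sidestep the np-dependent defaults entirely is to state the mapping and binding policy explicitly on the command line; a minimal sketch (the executable name is a placeholder):

$ mpirun -np 8 --map-by socket --bind-to core --report-bindings ./my_app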

Re: [OMPI devel] Question about Open MPI bindings

2016-09-05 Thread George Bosilca
On Sat, Sep 3, 2016 at 10:34 AM, r...@open-mpi.org wrote: > Interesting - well, it looks like ORTE is working correctly. The map is what you would expect, and so is planned binding. What this tells us is that we are indeed binding (so far as ORTE is concerned) to the correct places. Rank …

Re: [OMPI devel] Question about Open MPI bindings

2016-09-03 Thread r...@open-mpi.org
Ah, indeed - if the node where mpirun is executing doesn’t match the compute nodes, then you must remove that --novm option. Otherwise, we have no way of knowing what the compute node topology looks like. > On Sep 3, 2016, at 4:13 PM, Gilles Gouaillardet wrote: > George, If I unders…

Re: [OMPI devel] Question about Open MPI bindings

2016-09-03 Thread Gilles Gouaillardet
George, if I understand correctly, you are running mpirun on dancer, which has 2 sockets, 4 cores per socket and 2 hwthreads per core, and the orteds are running on arc[00-08], though the tasks only run on arc00, which has 2 sockets, 10 cores per socket and 2 hwthreads per core. To me, it looks like O…
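A quick way to confirm a topology mismatch like this is to compare hwloc's view on the head node and on a compute node; a sketch, with hostnames taken from the thread (lstopo-no-graphics ships with hwloc, and the exact output depends on the hwloc version):

$ lstopo-no-graphics             # on dancer: 2 sockets x 4 cores x 2 hwthreads expected
$ ssh arc00 lstopo-no-graphics   # on arc00: 2 sockets x 10 cores x 2 hwthreads expected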

Re: [OMPI devel] Question about Open MPI bindings

2016-09-03 Thread r...@open-mpi.org
Interesting - well, it looks like ORTE is working correctly. The map is what you would expect, and so is planned binding. What this tells us is that we are indeed binding (so far as ORTE is concerned) to the correct places. Rank 0 is being bound to 0,8, and that is what the OS reports. Rank 1 i…

Re: [OMPI devel] Question about Open MPI bindings

2016-09-03 Thread George Bosilca
$ mpirun -np 3 --tag-output --bind-to core --report-bindings --display-devel-map --mca rmaps_base_verbose 10 true [dancer.icl.utk.edu:17451] [[41198,0],0]: Final mapper priorities [dancer.icl.utk.edu:17451] Mapper: ppr Priority: 90 [dancer.icl.utk.edu:17451] Mapper: seq Priority: 60 …

Re: [OMPI devel] Question about Open MPI bindings

2016-09-03 Thread r...@open-mpi.org
Okay, can you add --display-devel-map --mca rmaps_base_verbose 10 to your cmd line? It sounds like there is something about that topo that is bothering the mapper. > On Sep 2, 2016, at 9:31 PM, George Bosilca wrote: > Thanks Gilles, that's a very useful trick. The bindings reported by ORTE ar…
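Folded into the command already used in this thread, that suggestion looks roughly like the following (true just stands in for the application, as in George's later post):

$ mpirun -np 3 --tag-output --bind-to core --report-bindings \
      --display-devel-map --mca rmaps_base_verbose 10 true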

[OMPI devel] Question about Open MPI bindings

2016-09-02 Thread Gilles Gouaillardet
George, did you mean to write *not* in sync instead? Note the ORTE output is different from the one you posted earlier (though the btls were different). As far as I understand, CPUs_allowed_list should really be 0,20 and 10,30, and in order to match the ORTE output, they should be 0,4 and 10,14. Ch…
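The 0,20-versus-0,4 discrepancy discussed here is the usual logical-versus-physical index question; hwloc can print both numberings for comparison (a sketch, option names per hwloc 1.x):

$ lstopo --only pu       # logical PU indexes, the numbering hwloc/ORTE use
$ lstopo -p --only pu    # physical OS indexes, the numbering /proc/*/status reports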

Re: [OMPI devel] Question about Open MPI bindings

2016-09-02 Thread George Bosilca
Thanks Gilles, that's a very useful trick. The bindings reported by ORTE are in sync with the ones reported by the OS. $ mpirun -np 2 --tag-output --bind-to core --report-bindings grep Cpus_allowed_list /proc/self/status [1,0]:[arc00:90813] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core…

Re: [OMPI devel] Question about Open MPI bindings

2016-09-02 Thread George Bosilca
On Sat, Sep 3, 2016 at 12:18 AM, r...@open-mpi.org wrote: > I’ll dig more later, but just checking offhand, I can’t replicate this on my box, so it may be something in hwloc for that box (or maybe you have some MCA params set somewhere?): Yes, I have 2 MCA parameters set (orte_default_host…
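To check which MCA parameters a build picks up, and from where, ompi_info and the per-user parameter file can be inspected; a sketch, with the parameter name taken from the thread and the file path being the conventional default location:

$ ompi_info --all --level 9 | grep -i default_hostfile
$ cat $HOME/.openmpi/mca-params.conf    # per-user MCA parameter file, if present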

Re: [OMPI devel] Question about Open MPI bindings

2016-09-02 Thread Gilles Gouaillardet
George, I cannot help much with this, I am afraid. My best bet would be to rebuild Open MPI with --enable-debug and an external recent hwloc (iirc hwloc v2 cannot be used in Open MPI yet). You might also want to try mpirun --tag-output --bind-to xxx --report-bindings grep Cpus_allowed_list /proc/s…
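A configure line along the lines Gilles suggests might look like this; the prefix and hwloc path are placeholders, and hwloc 1.11.x is assumed rather than 2.x, which could not yet be used with Open MPI at the time:

$ ./configure --prefix=$HOME/ompi-debug --enable-debug \
      --with-hwloc=/path/to/hwloc-1.11.x
$ make -j 8 install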

Re: [OMPI devel] Question about Open MPI bindings

2016-09-02 Thread r...@open-mpi.org
I’ll dig more later, but just checking offhand, I can’t replicate this on my box, so it may be something in hwloc for that box (or maybe you have some MCA params set somewhere?): $ mpirun -n 2 --bind-to core --report-bindings hostname [rhc001:83938] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: …

[OMPI devel] Question about Open MPI bindings

2016-09-02 Thread George Bosilca
While investigating the ongoing issue with the OMPI messaging layer, I ran into some trouble with process binding. I read the documentation, but I still find this puzzling. Disclaimer: all experiments were done with current master (9c496f7) compiled in optimized mode. The hardware: a single node 20 c…
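For reproducibility, the commit a master build came from can usually be read back from ompi_info; the exact label of the field may differ between versions, so treat this as a sketch:

$ ompi_info | grep -i 'repo revision'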