> On Sep 5, 2016, at 11:25 AM, George Bosilca wrote:
>
> Thanks for all these suggestions. I could get the expected bindings by 1)
> removing the vm and 2) adding hetero. This is far from an ideal setting, as
> now I have to make my own machinefile for every single run, or spawn daemons
> on
Indeed. As indicated on the other thread, if I add the novm and hetero options and
specify both --bind-to and --map-by, I get the expected behavior.
Thanks,
George.
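For reference, a sketch of the kind of command line being described (the spelling --hetero-nodes is assumed here, since the thread only says "hetero", and ./a.out stands in for the actual test program):

$ mpirun -np 3 --hetero-nodes --map-by core --bind-to core --report-bindings ./a.out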
On Mon, Sep 5, 2016 at 2:14 PM, r...@open-mpi.org wrote:
> I didn’t define the default behaviors - I just implemented what everyone
>
Thanks for all these suggestions. I could get the expected bindings by 1)
removing the vm and 2) adding hetero. This is far from an ideal setting, as
now I have to make my own machinefile for every single run, or spawn
daemons on all the machines on the cluster.
Wouldn't it be useful to make the d
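For context, the per-run machinefile mentioned above is just an Open MPI hostfile along these lines (the host names and slot counts are illustrative only):

arc00 slots=20
arc01 slots=20

which would then be passed to mpirun via --hostfile <file>.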
I didn’t define the default behaviors - I just implemented what everyone said
they wanted, as eventually captured in a Google spreadsheet Jeff posted (which
was available and discussed for weeks before being implemented). So the defaults are:
* if np <= 2, we map-by core bind-to core
* if np > 2, we ma
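A quick way to see which of these defaults kicks in on a given system is to let mpirun report the bindings for a trivial executable, e.g.:

$ mpirun -np 2 --report-bindings true
$ mpirun -np 4 --report-bindings true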
On Sat, Sep 3, 2016 at 10:34 AM, r...@open-mpi.org wrote:
> Interesting - well, it looks like ORTE is working correctly. The map is
> what you would expect, and so is planned binding.
>
> What this tells us is that we are indeed binding (so far as ORTE is
> concerned) to the correct places. Rank
Ah, indeed - if the node where mpirun is executing doesn’t match the compute
nodes, then you must remove that --novm option. Otherwise, we have no way of
knowing what the compute node topology looks like.
> On Sep 3, 2016, at 4:13 PM, Gilles Gouaillardet wrote:
>
> George,
>
> If I unders
George,
If I understand correctly, you are running mpirun on dancer, which has
2 sockets, 4 cores per socket, and 2 hwthreads per core,
and the orted daemons are running on arc[00-08], though the tasks only run on arc00,
which has
2 sockets, 10 cores per socket, and 2 hwthreads per core.
To me, it looks like O
Interesting - well, it looks like ORTE is working correctly. The map is what
you would expect, and so is the planned binding.
What this tells us is that we are indeed binding (so far as ORTE is concerned)
to the correct places. Rank 0 is being bound to 0,8, and that is what the OS
reports. Rank 1 i
$ mpirun -np 3 --tag-output --bind-to core --report-bindings \
    --display-devel-map --mca rmaps_base_verbose 10 true
[dancer.icl.utk.edu:17451] [[41198,0],0]: Final mapper priorities
[dancer.icl.utk.edu:17451] Mapper: ppr Priority: 90
[dancer.icl.utk.edu:17451] Mapper: seq Priority: 60
Okay, can you add --display-devel-map --mca rmaps_base_verbose 10 to your cmd
line?
It sounds like there is something about that topo that is bothering the mapper
> On Sep 2, 2016, at 9:31 PM, George Bosilca wrote:
>
> Thanks Gilles, that's a very useful trick. The bindings reported by ORTE ar
George,
Did you mean to write *not* in sync instead?
Note the ORTE output is different from the one you posted earlier
(though the btls were different).
As far as I understand, Cpus_allowed_list should really be
0,20
and
10,30
and in order to match the ORTE output, they should be
0,4
and
10,14
Ch
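One way to check how the OS numbers the hwthreads on arc00, and hence which Cpus_allowed_list values to expect, is to dump the topology with hwloc (which Open MPI uses internally); the -p flag switches the report from logical to physical (OS) indices:

$ lstopo-no-graphics --no-io
$ lstopo-no-graphics --no-io -p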
Thanks Gilles, that's a very useful trick. The bindings reported by ORTE
are in sync with the ones reported by the OS.
$ mpirun -np 2 --tag-output --bind-to core --report-bindings \
    grep Cpus_allowed_list /proc/self/status
[1,0]:[arc00:90813] MCW rank 0 bound to socket 0[core 0[hwt 0]],
socket 0[core
On Sat, Sep 3, 2016 at 12:18 AM, r...@open-mpi.org wrote:
> I’ll dig more later, but just checking offhand, I can’t replicate this on
> my box, so it may be something in hwloc for that box (or maybe you have
> some MCA params set somewhere?):
>
Yes, I have 2 MCA parameters set (orte_default_host
George,
I cannot help much with this, I am afraid.
My best bet would be to rebuild Open MPI with --enable-debug and a recent
external hwloc (IIRC, hwloc v2 cannot be used in Open MPI yet).
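A sketch of such a rebuild; the install prefix and hwloc path are placeholders, and --enable-debug / --with-hwloc are the relevant configure switches:

$ ./configure --prefix=$HOME/ompi-debug --enable-debug --with-hwloc=/path/to/hwloc-1.11
$ make -j 8 install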
You might also want to try
mpirun --tag-output --bind-to xxx --report-bindings grep Cpus_allowed_list
/proc/self/status
I’ll dig more later, but just checking offhand, I can’t replicate this on my
box, so it may be something in hwloc for that box (or maybe you have some MCA
params set somewhere?):
$ mpirun -n 2 --bind-to core --report-bindings hostname
[rhc001:83938] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
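Typical places where MCA parameters can lurk, and quick ways to check them (the file locations below are the usual defaults, not something confirmed in this thread):

$ env | grep OMPI_MCA_
$ cat ~/.openmpi/mca-params.conf
$ cat <installdir>/etc/openmpi-mca-params.conf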
While investigating the ongoing issue with the OMPI messaging layer, I ran into
some trouble with process binding. I read the documentation, but I still
find this puzzling.
Disclaimer: all experiments were done with current master (9c496f7)
compiled in optimized mode. The hardware: a single node 20 c