I think you have confused “slot” with a physical “core”. The two have 
absolutely nothing to do with each other.

A “slot” is nothing more than a scheduling entry in which a process can be 
placed. So when you use --rank-by slot, the ranks are assigned round-robin by 
scheduling entry - i.e., we assign all the ranks on the first node, then all 
the ranks on the next node, and so on.
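
For example (just a sketch - I haven't actually run this on your system), making 
the default explicit on your 4-node/32-core allocation:

$> mpirun -n 128 --map-by socket --rank-by slot --report-bindings true

should reproduce your second output: the mapper lays the procs down round-robin 
across the sockets, and the slot ranking simply numbers them in that order, 
which is why rank 1 ends up on socket 1.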

It doesn’t matter where those ranks are placed, or what core or socket they are 
running on. We just blindly go through and assign the numbers.

If you rank by core, then we cycle across the procs by looking at the core 
number they are bound to, assigning all the procs on a node before moving to 
the next node. If you rank by socket, then we cycle across the procs on a node 
by round-robin of sockets, assigning all the procs on that node before moving 
to the next node. If you then add “span” to that directive, we round-robin by 
socket across all nodes before coming back around to the next proc on the 
first node.
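
If you want to see the difference on your allocation, something like the 
following (again, untested sketches) should do it:

$> mpirun -n 128 --map-by socket --rank-by core --report-bindings true
$> mpirun -n 128 --map-by socket --rank-by socket --report-bindings true

The first should give you back the sequential core numbering from your first 
example (as you already found), while the second should alternate the rank 
numbers between the two sockets of each node before moving on to the next node. 
Adding the span modifier - I believe the syntax would be --rank-by socket:span, 
but check the man page for your version - should instead cycle the rank numbers 
across the sockets of all four nodes first.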

HTH
Ralph


> On Nov 30, 2016, at 11:26 AM, David Shrader <dshra...@lanl.gov> wrote:
> 
> Hello All,
> 
> The man page for mpirun says that the default ranking procedure is 
> round-robin by slot. It doesn't seem to be that straightforward to me, 
> though, and I wanted to ask about the behavior.
> 
> To help illustrate my confusion, here are a few examples where the ranking 
> behavior changed based on the mapping behavior, which doesn't make sense to 
> me, yet. First, here is a simple map by core (using 4 nodes of 32 cpu cores 
> each):
> 
> $> mpirun -n 128 --map-by core --report-bindings true
> [gr0649.localdomain:119614] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
> [B/././././././././././././././././.][./././././././././././././././././.]
> [gr0649.localdomain:119614] MCW rank 1 bound to socket 0[core 1[hwt 0]]: 
> [./B/./././././././././././././././.][./././././././././././././././././.]
> [gr0649.localdomain:119614] MCW rank 2 bound to socket 0[core 2[hwt 0]]: 
> [././B/././././././././././././././.][./././././././././././././././././.]
> ...output snipped...
> 
> Things look as I would expect: ranking happens round-robin through the cpu 
> cores. Now, here's a map by socket example:
> 
> $> mpirun -n 128 --map-by socket --report-bindings true
> [gr0649.localdomain:119926] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
> [B/././././././././././././././././.][./././././././././././././././././.]
> [gr0649.localdomain:119926] MCW rank 1 bound to socket 1[core 18[hwt 0]]: 
> [./././././././././././././././././.][B/././././././././././././././././.]
> [gr0649.localdomain:119926] MCW rank 2 bound to socket 0[core 1[hwt 0]]: 
> [./B/./././././././././././././././.][./././././././././././././././././.]
> ...output snipped...
> 
> Why is rank 1 on a different socket? I know I am mapping by socket in this 
> example, but, fundamentally, nothing should really be different in terms of 
> ranking, correct? The same number of processes are available on each host as 
> in the first example, and available in the same locations. How is "slot" 
> different in this case? If I use "--rank-by core," I recover the output from 
> the first example.
> 
> I thought that maybe "--rank-by slot" might be following something laid down 
> by "--map-by", but the following example shows that isn't completely correct, 
> either:
> 
> $> mpirun -n 128 --map-by socket:span --report-bindings true
> [gr0649.localdomain:119319] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
> [B/././././././././././././././././.][./././././././././././././././././.]
> [gr0649.localdomain:119319] MCW rank 1 bound to socket 1[core 18[hwt 0]]: 
> [./././././././././././././././././.][B/././././././././././././././././.]
> [gr0649.localdomain:119319] MCW rank 2 bound to socket 0[core 1[hwt 0]]: 
> [./B/./././././././././././././././.][./././././././././././././././././.]
> ...output snipped...
> 
> If ranking by slot were somehow following something left over by mapping, I 
> would have expected rank 2 to end up on a different host. So, now I don't 
> know what to expect from using "--rank-by slot." Does anyone have any 
> pointers?
> 
> Thank you for the help!
> David
> 
> -- 
> David Shrader
> HPC-ENV High Performance Computer Systems
> Los Alamos National Lab
> Email: dshrader <at> lanl.gov
> 

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
