I should have said, "I would like to run 128 MPI processes on 2 nodes" and
not 64 like I initially said...

On Sat, 27 Feb 2021, 15:03 Luis Cebamanos, <luic...@gmail.com> wrote:

> Hello OMPI users,
>
> On 128 core nodes, 2 sockets x 64 cores/socket (2 hwthreads/core) , I am
> trying to match the behavior of running with a rankfile with manual
> mapping/ranking/binding.
>
> I would like to run 64 MPI processes on 2 nodes, with 1 MPI process every
> 2 cores. That is, I want to run 32 MPI processes per socket on each of 2
> 128-core nodes. My mapping should be something like:
>
> Node 0
> =====
> rank 0  - core 0
> rank 1  - core 2
> rank 2  - core 4
> ...
> rank 63 - core 126
>
>
> Node 1
> ====
> rank 64  - core 0
> rank 65  - core 2
> rank 66  - core 4
> ...
> rank 127 - core 126
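>
> (In other words, if my arithmetic is right, rank r should land on local core
> 2*(r mod 64) of its node, e.g. rank 65 -> core 2 on node 1.)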
>
> If I use a rankfile:
> rank 0=epsilon102 slot=0
> rank 1=epsilon102 slot=2
> rank 2=epsilon102 slot=4
> rank 3=epsilon102 slot=6
> rank 4=epsilon102 slot=8
> rank 5=epsilon102 slot=10
> ....
> rank 123=epsilon103 slot=118
> rank 124=epsilon103 slot=120
> rank 125=epsilon103 slot=122
> rank 126=epsilon103 slot=124
> rank 127=epsilon103 slot=126
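>
> For reference, a rankfile like this would be passed to mpirun with something
> along the lines of the following, where ./rankfile.txt and ./myapp are just
> placeholder names:
>
> mpirun -np 128 --rankfile ./rankfile.txt --report-bindings ./myapp   # file/app names are placeholders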
>
> My --report-bindings output looks like:
>
> [epsilon102:2635370] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
> [BB/../../..
> /../../../../../../../../../../../../../../../../../../../../../../../../../../.
> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
> ../../../../../../..][../../../../../../../../../../../../../../../../../../../.
> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
> ../../../../../../../../../../../../../../../../../..]
> [epsilon102:2635370] MCW rank 1 bound to socket 0[core 2[hwt 0-1]]:
> [../../BB/..
> /../../../../../../../../../../../../../../../../../../../../../../../../../../.
> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
> ../../../../../../..][../../../../../../../../../../../../../../../../../../../.
> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
> ../../../../../../../../../../../../../../../../../..]
> [epsilon102:2635370] MCW rank 2 bound to socket 0[core 4[hwt 0-1]]:
> [../../../..
> /BB/../../../../../../../../../../../../../../../../../../../../../../../../../.
> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
> ../../../../../../..][../../../../../../../../../../../../../../../../../../../.
> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
> ../../../../../../../../../../../../../../../../../..]
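>
> (So, as far as I can tell, this is what I want: rank 0 is bound to core 0
> only, rank 1 to core 2 only, rank 2 to core 4 only, i.e. a single BB per rank.)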
>
>
> However, I cannot match this report-bindings output by manually using
> --map-by and --bind-to. I had the impression that the following would give
> the same result:
>
> mpirun -np $SLURM_NTASKS --report-bindings --map-by ppr:32:socket:PE=4 --bind-to hwthread
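>
> (My reading of this line, which may be off: ppr:32:socket places 32 ranks on
> each socket, and PE=4 gives every rank 4 processing elements; judging from the
> output below, those 4 elements are counted as hardware threads here, i.e. 2
> full cores per rank with 2 hwthreads per core.)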
>
> But this output is not quite the same:
>
> [epsilon102:2631529] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]]:
> [BB/BB/../../../../../../../../../../../../../../../../../../../.
> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
> ../../../../../../../../../../../../../../../..][../../../../../../../../../../.
> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
> ../../../../../../../../../../../../../../../../../../../../../../../../../../..]
> [epsilon102:2631529] MCW rank 1 bound to socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]]:
> [../../BB/BB/../../../../../../../../../../../../../../../../../.
> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
> ../../../../../../../../../../../../../../../..][../../../../../../../../../../.
> ./../../../../../../../../../../../../../../../../../../../../../../../../../../
> ../../../../../../../../../../../../../../../../../../../../../../../../../../..]
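>
> (So each rank here seems to own two consecutive cores, e.g. rank 1 gets cores
> 2 and 3 and shows BB/BB, whereas with the rankfile rank 1 was pinned to core 2
> alone and showed a single BB.)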
>
> What am I missing to match the rankfile behavior? And regarding performance,
> what difference does it make whether the processes are bound as in the first
> output or as in the second?
>
> Thanks for your help!
> Luis
>
