Hi Ralph,

Thanks for this. However, --map-by ppr:32:socket:PE=2 --bind-to core reports the same binding as --map-by ppr:32:socket:PE=4 --bind-to hwthread:

[epsilon104:2861230] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]]: [BB/BB/../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..]
[epsilon104:2861230] MCW rank 1 bound to socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]]: [../../BB/BB/../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..]
[epsilon104:2861230] MCW rank 2 bound to socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: [../../../../BB/BB/../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..]

And this is still different from the output produced using the rankfile.

Cheers,
Luis

On 28/02/2021 14:06, Ralph Castain via users wrote:
> Your command line is incorrect:
>
> --map-by ppr:32:socket:PE=4 --bind-to hwthread
>
> should be
>
> --map-by ppr:32:socket:PE=2 --bind-to core
>
>
>
>> On Feb 28, 2021, at 5:57 AM, Luis Cebamanos via users
>> <users@lists.open-mpi.org> wrote:
>>
>> I should have said, "I would like to run 128 MPI processes on 2
>> nodes" and not 64 like I initially said...
>>
>> On Sat, 27 Feb 2021, 15:03 Luis Cebamanos, <luic...@gmail.com> wrote:
>>
>> Hello OMPI users,
>>
>> On 128-core nodes, 2 sockets x 64 cores/socket (2 hwthreads/core), I am
>> trying to match the behavior of a rankfile using manual
>> mapping/ranking/binding.
>>
>> I would like to run 64 MPI processes on 2 nodes, 1 MPI process every 2
>> cores. That is, I want to run 32 MPI processes per socket on 2 128-core
>> nodes. My mapping should be something like:
>>
>> Node 0
>> ======
>> rank 0 - core 0
>> rank 1 - core 2
>> rank 2 - core 4
>> ...
>> rank 63 - core 126
>>
>>
>> Node 1
>> ======
>> rank 64 - core 0
>> rank 65 - core 2
>> rank 66 - core 4
>> ...
>> rank 127 - core 126
>>
>> If I use a rankfile:
>> rank 0=epsilon102 slot=0
>> rank 1=epsilon102 slot=2
>> rank 2=epsilon102 slot=4
>> rank 3=epsilon102 slot=6
>> rank 4=epsilon102 slot=8
>> rank 5=epsilon102 slot=10
>> ....
>> rank 123=epsilon103 slot=118
>> rank 124=epsilon103 slot=120
>> rank 125=epsilon103 slot=122
>> rank 126=epsilon103 slot=124
>> rank 127=epsilon103 slot=126
>>
>> My --report-bindings output looks like:
>>
>> [epsilon102:2635370] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..]
>> [epsilon102:2635370] MCW rank 1 bound to socket 0[core 2[hwt 0-1]]: [../../BB/../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..]
>> [epsilon102:2635370] MCW rank 2 bound to socket 0[core 4[hwt 0-1]]: [../../../../BB/../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..]
>>
>>
>> However, I cannot match this report-bindings output by manually using
>> --map-by and --bind-to. I had the impression that this would be
>> the same:
>>
>> mpirun -np $SLURM_NTASKS --report-bindings --map-by ppr:32:socket:PE=4 --bind-to hwthread
>>
>> But this output is not quite the same:
>>
>> [epsilon102:2631529] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]]: [BB/BB/../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..]
>> [epsilon102:2631529] MCW rank 1 bound to socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]]: [../../BB/BB/../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..]
>>
>> What am I missing to match the rankfile behavior? Regarding performance,
>> what difference is there between the first and the second outputs?
>>
>> Thanks for your help!
>> Luis
>>
>
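
For anyone who wants to reproduce the rankfile layout quoted above without typing 128 lines by hand, here is a minimal Python sketch. It assumes the two hostnames from the original message (epsilon102 and epsilon103), 64 ranks per node, and one rank on every second core. The function name make_rankfile and its parameters are hypothetical and are not part of Open MPI; only the "rank N=host slot=M" line format is taken from the rankfile shown in the thread.

```python
def make_rankfile(hosts, ranks_per_host, stride=2):
    """Return rankfile text that places one rank on every `stride`-th core.

    Hypothetical helper for illustration only. Ranks are numbered
    consecutively, host by host, and the slot restarts at 0 on each host,
    matching the layout described in the original message.
    """
    lines = []
    for host_index, host in enumerate(hosts):
        for local_rank in range(ranks_per_host):
            rank = host_index * ranks_per_host + local_rank  # global MPI rank
            slot = local_rank * stride                        # core index on this host
            lines.append(f"rank {rank}={host} slot={slot}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hostnames are the ones quoted in the thread; adjust for your allocation.
    print(make_rankfile(["epsilon102", "epsilon103"], ranks_per_host=64), end="")
```

Redirecting the output to a file gives a rankfile with ranks 0-63 on even-numbered cores of the first host and ranks 64-127 on even-numbered cores of the second.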