Dear Gilles, thanks a lot for your response!
1. You're right, that was my mistake: I forgot to "export" OMP_PROC_BIND in my
job script. Now this example works nearly as expected:
[pascal-1-07:25617] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket
0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]],
socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt
0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core
9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-07:25617] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket
1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]],
socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt
0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core
19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-0-06:02774] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket
0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]],
socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt
0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core
9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-0-06:02774] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket
1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]],
socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt
0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core
19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0004 is on pascal-1-07, Cpus_allowed_list:
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0001(pid 25634),
cpu# 000, 0x00000001, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0002(pid 25634),
cpu# 002, 0x00000004, Cpus_allowed_list: 2
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0003(pid 25634),
cpu# 004, 0x00000010, Cpus_allowed_list: 4
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0004(pid 25634),
cpu# 006, 0x00000040, Cpus_allowed_list: 6
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0005(pid 25634),
cpu# 008, 0x00000100, Cpus_allowed_list: 8
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0006(pid 25634),
cpu# 010, 0x00000400, Cpus_allowed_list: 10
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0007(pid 25634),
cpu# 012, 0x00001000, Cpus_allowed_list: 12
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0008(pid 25634),
cpu# 014, 0x00004000, Cpus_allowed_list: 14
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0009(pid 25634),
cpu# 016, 0x00010000, Cpus_allowed_list: 16
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0010(pid 25634),
cpu# 018, 0x00040000, Cpus_allowed_list: 18
MPI Instance 0002 of 0004 is on pascal-1-07, Cpus_allowed_list:
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0001(pid 25633),
cpu# 001, 0x00000002, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0002(pid 25633),
cpu# 003, 0x00000008, Cpus_allowed_list: 3
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0003(pid 25633),
cpu# 005, 0x00000020, Cpus_allowed_list: 5
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0004(pid 25633),
cpu# 007, 0x00000080, Cpus_allowed_list: 7
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0005(pid 25633),
cpu# 009, 0x00000200, Cpus_allowed_list: 9
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0006(pid 25633),
cpu# 011, 0x00000800, Cpus_allowed_list: 11
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0007(pid 25633),
cpu# 013, 0x00002000, Cpus_allowed_list: 13
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0008(pid 25633),
cpu# 015, 0x00008000, Cpus_allowed_list: 15
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0009(pid 25633),
cpu# 017, 0x00020000, Cpus_allowed_list: 17
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0010(pid 25633),
cpu# 019, 0x00080000, Cpus_allowed_list: 19
MPI Instance 0003 of 0004 is on pascal-0-06, Cpus_allowed_list:
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0001(pid 02787),
cpu# 000, 0x00000001, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0002(pid 02787),
cpu# 002, 0x00000004, Cpus_allowed_list: 2
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0003(pid 02787),
cpu# 004, 0x00000010, Cpus_allowed_list: 4
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0004(pid 02787),
cpu# 006, 0x00000040, Cpus_allowed_list: 6
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0005(pid 02787),
cpu# 008, 0x00000100, Cpus_allowed_list: 8
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0006(pid 02787),
cpu# 010, 0x00000400, Cpus_allowed_list: 10
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0007(pid 02787),
cpu# 012, 0x00001000, Cpus_allowed_list: 12
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0008(pid 02787),
cpu# 014, 0x00004000, Cpus_allowed_list: 14
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0009(pid 02787),
cpu# 016, 0x00010000, Cpus_allowed_list: 16
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0010(pid 02787),
cpu# 018, 0x00040000, Cpus_allowed_list: 18
MPI Instance 0004 of 0004 is on pascal-0-06, Cpus_allowed_list:
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0001(pid 02786),
cpu# 001, 0x00000002, Cpus_allowed_list: 1
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0002(pid 02786),
cpu# 003, 0x00000008, Cpus_allowed_list: 3
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0003(pid 02786),
cpu# 005, 0x00000020, Cpus_allowed_list: 5
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0004(pid 02786),
cpu# 007, 0x00000080, Cpus_allowed_list: 7
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0005(pid 02786),
cpu# 009, 0x00000200, Cpus_allowed_list: 9
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0006(pid 02786),
cpu# 011, 0x00000800, Cpus_allowed_list: 11
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0007(pid 02786),
cpu# 013, 0x00002000, Cpus_allowed_list: 13
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0008(pid 02786),
cpu# 015, 0x00008000, Cpus_allowed_list: 15
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0009(pid 02786),
cpu# 017, 0x00020000, Cpus_allowed_list: 17
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0010(pid 02786),
cpu# 019, 0x00080000, Cpus_allowed_list: 19
One remaining question: why does "Cpus_allowed_list" of each Open MPI process
still list the full range of all cores/hwthreads, while the OpenMP threads only
use CPUs 0-19 (as expected)?
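For completeness, the relevant fragment of my job script now looks roughly like
this (a sketch only; scheduler directives and paths are omitted, and the -x
flags follow your earlier suggestion and may be redundant once the variables
are exported):

```shell
# Export the OpenMP binding variables so the launched ranks inherit them.
export OMP_PROC_BIND=true
export OMP_NUM_THREADS=10

# 4 ranks, 2 per node (one per socket), 10 OpenMP threads each:
mpirun -np 4 --map-by ppr:2:node -x OMP_PROC_BIND -x OMP_NUM_THREADS \
       --mca plm_rsh_agent "qrsh" -report-bindings ./myid
```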
2. I have a different scenario which still doesn't work as expected:
Now I'd like to have 8 MPI processes on 2 nodes -> 4 MPI processes per node ->
2 per socket, each executing one OpenMP job with 5 threads:
mpirun -np 8 --map-by ppr:2:socket --use-hwthread-cpus -report-bindings
--mca plm_rsh_agent "qrsh" ./myid
I'd like to have a binding like this:

                    cores
node 0  socket 0:  0+2+4+6+8   10+12+14+16+18
        socket 1:  1+3+5+7+9   11+13+15+17+19
node 1  socket 0:  0+2+4+6+8   10+12+14+16+18
        socket 1:  1+3+5+7+9   11+13+15+17+19
but, as you can see below, both processes on a socket are again bound to all
cores of that socket, which leads to a situation like:

                    cores
node 0  socket 0:  0+2+4+6+8   0+2+4+6+8
        socket 1:  1+3+5+7+9   1+3+5+7+9
node 1  socket 0:  0+2+4+6+8   0+2+4+6+8
        socket 1:  1+3+5+7+9   1+3+5+7+9

Could you give me a hint how I could improve that?
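In case it helps to know what I already considered: a variant using the "pe"
modifier of --map-by might bind each rank to 5 distinct cores of its socket. I
have not tested it yet, and I am not sure the modifier can be combined with ppr
on our Open MPI version:

```shell
# 8 ranks, 2 per socket, 5 cores ("processing elements") per rank,
# so the two ranks on a socket should get disjoint core sets:
mpirun -np 8 --map-by ppr:2:socket:pe=5 -report-bindings \
       --mca plm_rsh_agent "qrsh" ./myid
```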
[pascal-0-01:01972] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket
0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]],
socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt
0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core
9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-0-01:01972] MCW rank 1 bound to socket 0[core 0[hwt 0-1]], socket
0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]],
socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt
0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core
9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-0-01:01972] MCW rank 2 bound to socket 1[core 10[hwt 0-1]], socket
1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]],
socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt
0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core
19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-0-01:01972] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket
1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]],
socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt
0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core
19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-2-01:18506] MCW rank 4 bound to socket 0[core 0[hwt 0-1]], socket
0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]],
socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt
0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core
9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-2-01:18506] MCW rank 5 bound to socket 0[core 0[hwt 0-1]], socket
0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]],
socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt
0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core
9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-2-01:18506] MCW rank 6 bound to socket 1[core 10[hwt 0-1]], socket
1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]],
socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt
0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core
19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-2-01:18506] MCW rank 7 bound to socket 1[core 10[hwt 0-1]], socket
1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]],
socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt
0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core
19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0008 is on pascal-0-01, Cpus_allowed_list:
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0008 is on pascal-0-01: MP thread #0001(pid 01999),
cpu# 000, 0x00000001, Cpus_allowed_list: 0
MPI Instance 0001 of 0008 is on pascal-0-01: MP thread #0002(pid 01999),
cpu# 002, 0x00000004, Cpus_allowed_list: 2
MPI Instance 0001 of 0008 is on pascal-0-01: MP thread #0003(pid 01999),
cpu# 004, 0x00000010, Cpus_allowed_list: 4
MPI Instance 0001 of 0008 is on pascal-0-01: MP thread #0004(pid 01999),
cpu# 006, 0x00000040, Cpus_allowed_list: 6
MPI Instance 0001 of 0008 is on pascal-0-01: MP thread #0005(pid 01999),
cpu# 008, 0x00000100, Cpus_allowed_list: 8
MPI Instance 0002 of 0008 is on pascal-0-01, Cpus_allowed_list:
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0008 is on pascal-0-01: MP thread #0001(pid 01996),
cpu# 000, 0x00000001, Cpus_allowed_list: 0
MPI Instance 0002 of 0008 is on pascal-0-01: MP thread #0002(pid 01996),
cpu# 002, 0x00000004, Cpus_allowed_list: 2
MPI Instance 0002 of 0008 is on pascal-0-01: MP thread #0003(pid 01996),
cpu# 004, 0x00000010, Cpus_allowed_list: 4
MPI Instance 0002 of 0008 is on pascal-0-01: MP thread #0004(pid 01996),
cpu# 006, 0x00000040, Cpus_allowed_list: 6
MPI Instance 0002 of 0008 is on pascal-0-01: MP thread #0005(pid 01996),
cpu# 008, 0x00000100, Cpus_allowed_list: 8
MPI Instance 0003 of 0008 is on pascal-0-01, Cpus_allowed_list:
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0008 is on pascal-0-01: MP thread #0001(pid 01998),
cpu# 001, 0x00000002, Cpus_allowed_list: 1
MPI Instance 0003 of 0008 is on pascal-0-01: MP thread #0002(pid 01998),
cpu# 003, 0x00000008, Cpus_allowed_list: 3
MPI Instance 0003 of 0008 is on pascal-0-01: MP thread #0003(pid 01998),
cpu# 005, 0x00000020, Cpus_allowed_list: 5
MPI Instance 0003 of 0008 is on pascal-0-01: MP thread #0004(pid 01998),
cpu# 007, 0x00000080, Cpus_allowed_list: 7
MPI Instance 0003 of 0008 is on pascal-0-01: MP thread #0005(pid 01998),
cpu# 009, 0x00000200, Cpus_allowed_list: 9
MPI Instance 0004 of 0008 is on pascal-0-01, Cpus_allowed_list:
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0008 is on pascal-0-01: MP thread #0001(pid 01997),
cpu# 001, 0x00000002, Cpus_allowed_list: 1
MPI Instance 0004 of 0008 is on pascal-0-01: MP thread #0002(pid 01997),
cpu# 003, 0x00000008, Cpus_allowed_list: 3
MPI Instance 0004 of 0008 is on pascal-0-01: MP thread #0003(pid 01997),
cpu# 005, 0x00000020, Cpus_allowed_list: 5
MPI Instance 0004 of 0008 is on pascal-0-01: MP thread #0004(pid 01997),
cpu# 007, 0x00000080, Cpus_allowed_list: 7
MPI Instance 0004 of 0008 is on pascal-0-01: MP thread #0005(pid 01997),
cpu# 009, 0x00000200, Cpus_allowed_list: 9
MPI Instance 0005 of 0008 is on pascal-2-01, Cpus_allowed_list:
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0005 of 0008 is on pascal-2-01: MP thread #0001(pid 18531),
cpu# 000, 0x00000001, Cpus_allowed_list: 0
MPI Instance 0005 of 0008 is on pascal-2-01: MP thread #0002(pid 18531),
cpu# 002, 0x00000004, Cpus_allowed_list: 2
MPI Instance 0005 of 0008 is on pascal-2-01: MP thread #0003(pid 18531),
cpu# 004, 0x00000010, Cpus_allowed_list: 4
MPI Instance 0005 of 0008 is on pascal-2-01: MP thread #0004(pid 18531),
cpu# 006, 0x00000040, Cpus_allowed_list: 6
MPI Instance 0005 of 0008 is on pascal-2-01: MP thread #0005(pid 18531),
cpu# 008, 0x00000100, Cpus_allowed_list: 8
MPI Instance 0006 of 0008 is on pascal-2-01, Cpus_allowed_list:
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0006 of 0008 is on pascal-2-01: MP thread #0001(pid 18530),
cpu# 000, 0x00000001, Cpus_allowed_list: 0
MPI Instance 0006 of 0008 is on pascal-2-01: MP thread #0002(pid 18530),
cpu# 002, 0x00000004, Cpus_allowed_list: 2
MPI Instance 0006 of 0008 is on pascal-2-01: MP thread #0003(pid 18530),
cpu# 004, 0x00000010, Cpus_allowed_list: 4
MPI Instance 0006 of 0008 is on pascal-2-01: MP thread #0004(pid 18530),
cpu# 006, 0x00000040, Cpus_allowed_list: 6
MPI Instance 0006 of 0008 is on pascal-2-01: MP thread #0005(pid 18530),
cpu# 008, 0x00000100, Cpus_allowed_list: 8
MPI Instance 0007 of 0008 is on pascal-2-01, Cpus_allowed_list:
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0007 of 0008 is on pascal-2-01: MP thread #0001(pid 18528),
cpu# 001, 0x00000002, Cpus_allowed_list: 1
MPI Instance 0007 of 0008 is on pascal-2-01: MP thread #0002(pid 18528),
cpu# 003, 0x00000008, Cpus_allowed_list: 3
MPI Instance 0007 of 0008 is on pascal-2-01: MP thread #0003(pid 18528),
cpu# 005, 0x00000020, Cpus_allowed_list: 5
MPI Instance 0007 of 0008 is on pascal-2-01: MP thread #0004(pid 18528),
cpu# 007, 0x00000080, Cpus_allowed_list: 7
MPI Instance 0007 of 0008 is on pascal-2-01: MP thread #0005(pid 18528),
cpu# 009, 0x00000200, Cpus_allowed_list: 9
MPI Instance 0008 of 0008 is on pascal-2-01, Cpus_allowed_list:
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0008 of 0008 is on pascal-2-01: MP thread #0001(pid 18527),
cpu# 001, 0x00000002, Cpus_allowed_list: 1
MPI Instance 0008 of 0008 is on pascal-2-01: MP thread #0002(pid 18527),
cpu# 003, 0x00000008, Cpus_allowed_list: 3
MPI Instance 0008 of 0008 is on pascal-2-01: MP thread #0003(pid 18527),
cpu# 005, 0x00000020, Cpus_allowed_list: 5
MPI Instance 0008 of 0008 is on pascal-2-01: MP thread #0004(pid 18527),
cpu# 007, 0x00000080, Cpus_allowed_list: 7
MPI Instance 0008 of 0008 is on pascal-2-01: MP thread #0005(pid 18527),
cpu# 009, 0x00000200, Cpus_allowed_list: 9
Thanks a lot in advance for your advice, and have a nice Easter!
Ado
On 13.04.2017 08:48, Gilles Gouaillardet wrote:
> Heinz-Ado,
>
>
> it seems the OpenMP runtime did *not* bind the OMP threads at all as
> requested,
>
> and the root cause could be the OMP_PROC_BIND environment variable was not
> propagated
>
> can you try
>
> mpirun -x OMP_PROC_BIND ...
>
> and see if it helps?
>
>
> Cheers,
>
>
> On 4/13/2017 12:23 AM, Heinz-Ado Arnolds wrote:
>> Dear Gilles,
>>
>> thanks for your answer.
>>
>> - compiler: gcc-6.3.0
>> - OpenMP environment vars: OMP_PROC_BIND=true, GOMP_CPU_AFFINITY not set
>> - hyperthread a given OpenMP thread is on: it's printed in the output below
>> as a 3-digit number after the first ",", read by sched_getcpu() in the
>> OpenMP test code
>> - the migration between cores/hyperthreads should be prevented by
>> OMP_PROC_BIND=true
>> - I didn't find migration, but I did find two OpenMP threads sharing one
>> core/hyperthread in example "4"/"MPI Instance 0002": CPUs 011/031 are both on
>> core #11.
>>
>> Are there any hints on how to cleanly propagate the Open MPI binding to the
>> OpenMP threads?
>>
>> Thanks and kind regards,
>>
>> Ado
>>
>> On 12.04.2017 15:40, Gilles Gouaillardet wrote:
>>> That should be a two steps tango
>>> - Open MPI bind a MPI task to a socket
>>> - the OpenMP runtime bind OpenMP threads to cores (or hyper threads) inside
>>> the socket assigned by Open MPI
>>>
>>> which compiler are you using ?
>>> do you set some environment variables to direct OpenMP to bind threads ?
>>>
>>> Also, how do you measure the hyperthread a given OpenMP thread is on ?
>>> is it the hyperthread used at a given time ? If yes, then the thread might
>>> migrate unless it was pinned by the OpenMP runtime.
>>>
>>> If you are not sure, please post the source of your program so we can have
>>> a look
>>>
>>> Last but not least, as long as OpenMP threads are pinned to distinct cores,
>>> you should not worry about them migrating between hyperthreads from the
>>> same core.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Wednesday, April 12, 2017, Heinz-Ado Arnolds
>>> <[email protected]> wrote:
>>>
>>> Dear rhc,
>>>
>>> to make it clearer what I am trying to achieve, I collected some examples
>>> for several combinations of command line options. It would be great if you
>>> could find time to look at them below. The most promising one is example "4".
>>>
>>> I'd like to have 4 MPI jobs starting 1 OpenMP job each with 10 threads,
>>> running on 2 nodes, each having 2 sockets, with 10 cores & 10 hwthreads.
>>> Only 10 cores (no hwthreads) should be used on each socket.
>>>
>>> 4 MPI -> 1 OpenMP with 10 threads each (i.e. 4x10 threads)
>>> 2 nodes, 2 sockets each, 10 cores & 10 hwthreads each
>>>
>>> 1. mpirun -np 4 --map-by ppr:2:node --mca plm_rsh_agent "qrsh"
>>> -report-bindings ./myid
>>>
>>> Machines :
>>> pascal-2-05...DE 20
>>> pascal-1-03...DE 20
>>>
>>> [pascal-2-05:28817] MCW rank 0 bound to socket 0[core 0[hwt 0-1]],
>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core
>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket
>>> 0[core 9[hwt 0-1]]:
>>> [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
>>> [pascal-2-05:28817] MCW rank 1 bound to socket 1[core 10[hwt 0-1]],
>>> socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core
>>> 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]],
>>> socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core
>>> 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]:
>>> [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
>>> [pascal-1-03:19256] MCW rank 2 bound to socket 0[core 0[hwt 0-1]],
>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core
>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket
>>> 0[core 9[hwt 0-1]]:
>>> [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
>>> [pascal-1-03:19256] MCW rank 3 bound to socket 1[core 10[hwt 0-1]],
>>> socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core
>>> 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]],
>>> socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core
>>> 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]:
>>> [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
>>> MPI Instance 0001 of 0004 is on pascal-2-05, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0001(pid
>>> 28833), 018, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0002(pid
>>> 28833), 014, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0003(pid
>>> 28833), 028, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0004(pid
>>> 28833), 012, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0005(pid
>>> 28833), 030, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0006(pid
>>> 28833), 016, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0007(pid
>>> 28833), 038, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0008(pid
>>> 28833), 034, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0009(pid
>>> 28833), 020, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0010(pid
>>> 28833), 022, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0002 of 0004 is on pascal-2-05, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0001(pid
>>> 28834), 007, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0002(pid
>>> 28834), 037, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0003(pid
>>> 28834), 039, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0004(pid
>>> 28834), 035, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0005(pid
>>> 28834), 031, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0006(pid
>>> 28834), 005, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0007(pid
>>> 28834), 027, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0008(pid
>>> 28834), 017, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0009(pid
>>> 28834), 019, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0010(pid
>>> 28834), 029, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0003 of 0004 is on pascal-1-03, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0001(pid
>>> 19269), 012, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0002(pid
>>> 19269), 034, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0003(pid
>>> 19269), 008, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0004(pid
>>> 19269), 038, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0005(pid
>>> 19269), 032, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0006(pid
>>> 19269), 036, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0007(pid
>>> 19269), 020, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0008(pid
>>> 19269), 002, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0009(pid
>>> 19269), 004, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0010(pid
>>> 19269), 006, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0004 of 0004 is on pascal-1-03, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0001(pid
>>> 19268), 005, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0002(pid
>>> 19268), 029, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0003(pid
>>> 19268), 015, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0004(pid
>>> 19268), 007, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0005(pid
>>> 19268), 031, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0006(pid
>>> 19268), 013, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0007(pid
>>> 19268), 037, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0008(pid
>>> 19268), 039, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0009(pid
>>> 19268), 021, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0010(pid
>>> 19268), 023, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>>
>>> I get a distribution across the 4 sockets on 2 nodes as expected, but cores
>>> and their corresponding hwthreads are used simultaneously:
>>> MPI Instance 0001 of 0004: MP thread #0001 runs on CPU 018, MP
>>> thread #0007 runs on CPU 038,
>>> MP thread #0002 runs on CPU 014, MP
>>> thread #0008 runs on CPU 034
>>> according to "lscpu -a -e" CPUs 18/38 resp. 14/34 are the same
>>> physical cores
>>>
>>> 2. mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus -bind-to
>>> hwthread --mca plm_rsh_agent "qrsh" -report-bindings ./myid
>>>
>>> Machines :
>>> pascal-1-05...DE 20
>>> pascal-2-05...DE 20
>>>
>>> I get this warning:
>>>
>>> WARNING: a request was made to bind a process. While the system
>>> supports binding the process itself, at least one node does NOT
>>> support binding memory to the process location.
>>>
>>> Node: pascal-1-05
>>>
>>> Open MPI uses the "hwloc" library to perform process and memory
>>> binding. This error message means that hwloc has indicated that
>>> processor binding support is not available on this machine.
>>>
>>> On OS X, processor and memory binding is not available at all
>>> (i.e., the OS does not expose this functionality).
>>>
>>> On Linux, lack of the functionality can mean that you are on a
>>> platform where processor and memory affinity is not supported in
>>> Linux itself, or that hwloc was built without NUMA and/or processor
>>> affinity support. When building hwloc (which, depending on your
>>> Open MPI installation, may be embedded in Open MPI itself), it is
>>> important to have the libnuma header and library files available.
>>> Different Linux distributions package these files under different
>>> names; look for packages with the word "numa" in them. You may also
>>> need a developer version of the package (e.g., with "dev" or "devel"
>>> in the name) to obtain the relevant header files.
>>>
>>> If you are getting this message on a non-OS X, non-Linux platform,
>>> then hwloc does not support processor / memory affinity on this
>>> platform. If the OS/platform does actually support processor /
>>> memory affinity, then you should contact the hwloc maintainers:
>>> https://github.com/open-mpi/hwloc.
>>>
>>> This is a warning only; your job will continue, though performance
>>> may be degraded.
>>>
>>> and these results:
>>>
>>> [pascal-1-05:33175] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
>>> [B./../../../../../../../../..][../../../../../../../../../..]
>>> [pascal-1-05:33175] MCW rank 1 bound to socket 0[core 0[hwt 1]]:
>>> [.B/../../../../../../../../..][../../../../../../../../../..]
>>> [pascal-2-05:28916] MCW rank 2 bound to socket 0[core 0[hwt 0]]:
>>> [B./../../../../../../../../..][../../../../../../../../../..]
>>> [pascal-2-05:28916] MCW rank 3 bound to socket 0[core 0[hwt 1]]:
>>> [.B/../../../../../../../../..][../../../../../../../../../..]
>>> MPI Instance 0001 of 0004 is on pascal-1-05, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0001(pid
>>> 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0002(pid
>>> 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0003(pid
>>> 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0004(pid
>>> 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0005(pid
>>> 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0006(pid
>>> 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0007(pid
>>> 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0008(pid
>>> 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0009(pid
>>> 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0010(pid
>>> 33193), 000, Cpus_allowed_list: 0
>>> MPI Instance 0002 of 0004 is on pascal-1-05, Cpus_allowed_list:
>>> 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0001(pid
>>> 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0002(pid
>>> 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0003(pid
>>> 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0004(pid
>>> 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0005(pid
>>> 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0006(pid
>>> 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0007(pid
>>> 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0008(pid
>>> 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0009(pid
>>> 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0010(pid
>>> 33192), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-2-05, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0001(pid
>>> 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0002(pid
>>> 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0003(pid
>>> 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0004(pid
>>> 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0005(pid
>>> 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0006(pid
>>> 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0007(pid
>>> 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0008(pid
>>> 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0009(pid
>>> 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0010(pid
>>> 28930), 000, Cpus_allowed_list: 0
>>> MPI Instance 0004 of 0004 is on pascal-2-05, Cpus_allowed_list:
>>> 20
>>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0001(pid
>>> 28929), 020, Cpus_allowed_list: 20
>>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0002(pid
>>> 28929), 020, Cpus_allowed_list: 20
>>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0003(pid
>>> 28929), 020, Cpus_allowed_list: 20
>>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0004(pid
>>> 28929), 020, Cpus_allowed_list: 20
>>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0005(pid
>>> 28929), 020, Cpus_allowed_list: 20
>>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0006(pid
>>> 28929), 020, Cpus_allowed_list: 20
>>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0007(pid
>>> 28929), 020, Cpus_allowed_list: 20
>>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0008(pid
>>> 28929), 020, Cpus_allowed_list: 20
>>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0009(pid
>>> 28929), 020, Cpus_allowed_list: 20
>>> MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0010(pid
>>> 28929), 020, Cpus_allowed_list: 20
>>>
>>> Only 2 logical CPUs are used per node (0 and 20), and those are the
>>> two hyperthreads of the same physical core.
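For reference, on this box the sibling relationship is simple arithmetic; a minimal sketch (my own helper, assuming the lscpu numbering quoted further down, where logical CPUs c and c+20 share a core) shows why CPUs 0 and 20 collide:

```python
# Hypothetical helper: map a logical CPU id to its physical core index,
# assuming the lscpu layout quoted below (CPUs 0-19 and 20-39 are the
# two hyperthreads of physical cores 0-19).
def physical_core(cpu, cores_per_node=20):
    return cpu % cores_per_node

# Ranks pinned to logical CPUs 0 and 20 land on the same physical core:
print(physical_core(0), physical_core(20))  # prints "0 0"
```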
>>>
>>> 3. mpirun -np 4 --use-hwthread-cpus -bind-to hwthread --mca
>>> plm_rsh_agent "qrsh" -report-bindings ./myid
>>>
>>> Machines :
>>> pascal-1-03...DE 20
>>> pascal-2-02...DE 20
>>>
>>> I get a warning again:
>>>
>>> WARNING: a request was made to bind a process. While the system
>>> supports binding the process itself, at least one node does NOT
>>> support binding memory to the process location.
>>>
>>> Node: pascal-1-03
>>>
>>> Open MPI uses the "hwloc" library to perform process and memory
>>> binding. This error message means that hwloc has indicated that
>>> processor binding support is not available on this machine.
>>>
>>> On OS X, processor and memory binding is not available at all
>>> (i.e.,
>>> the OS does not expose this functionality).
>>>
>>> On Linux, lack of the functionality can mean that you are on a
>>> platform where processor and memory affinity is not supported in
>>> Linux
>>> itself, or that hwloc was built without NUMA and/or processor
>>> affinity
>>> support. When building hwloc (which, depending on your Open MPI
>>> installation, may be embedded in Open MPI itself), it is important
>>> to
>>> have the libnuma header and library files available. Different
>>> linux
>>> distributions package these files under different names; look for
>>> packages with the word "numa" in them. You may also need a
>>> developer
>>> version of the package (e.g., with "dev" or "devel" in the name) to
>>> obtain the relevant header files.
>>>
>>> If you are getting this message on a non-OS X, non-Linux platform,
>>> then hwloc does not support processor / memory affinity on this
>>> platform. If the OS/platform does actually support processor /
>>> memory
>>> affinity, then you should contact the hwloc maintainers:
>>> https://github.com/open-mpi/hwloc.
>>>
>>> This is a warning only; your job will continue, though performance
>>> may
>>> be degraded.
>>>
>>> and these results:
>>>
>>> [pascal-1-03:19345] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
>>> [B./../../../../../../../../..][../../../../../../../../../..]
>>> [pascal-1-03:19345] MCW rank 1 bound to socket 1[core 10[hwt 0]]:
>>> [../../../../../../../../../..][B./../../../../../../../../..]
>>> [pascal-1-03:19345] MCW rank 2 bound to socket 0[core 0[hwt 1]]:
>>> [.B/../../../../../../../../..][../../../../../../../../../..]
>>> [pascal-1-03:19345] MCW rank 3 bound to socket 1[core 10[hwt 1]]:
>>> [../../../../../../../../../..][.B/../../../../../../../../..]
>>> MPI Instance 0001 of 0004 is on pascal-1-03, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0001(pid
>>> 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0002(pid
>>> 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0003(pid
>>> 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0004(pid
>>> 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0005(pid
>>> 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0006(pid
>>> 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0007(pid
>>> 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0008(pid
>>> 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0009(pid
>>> 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0010(pid
>>> 19373), 000, Cpus_allowed_list: 0
>>> MPI Instance 0002 of 0004 is on pascal-1-03, Cpus_allowed_list:
>>> 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0001(pid
>>> 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0002(pid
>>> 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0003(pid
>>> 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0004(pid
>>> 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0005(pid
>>> 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0006(pid
>>> 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0007(pid
>>> 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0008(pid
>>> 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0009(pid
>>> 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0010(pid
>>> 19372), 001, Cpus_allowed_list: 1
>>> MPI Instance 0003 of 0004 is on pascal-1-03, Cpus_allowed_list:
>>> 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0001(pid
>>> 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0002(pid
>>> 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0003(pid
>>> 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0004(pid
>>> 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0005(pid
>>> 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0006(pid
>>> 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0007(pid
>>> 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0008(pid
>>> 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0009(pid
>>> 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0010(pid
>>> 19370), 020, Cpus_allowed_list: 20
>>> MPI Instance 0004 of 0004 is on pascal-1-03, Cpus_allowed_list:
>>> 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0001(pid
>>> 19371), 021, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0002(pid
>>> 19371), 021, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0003(pid
>>> 19371), 021, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0004(pid
>>> 19371), 021, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0005(pid
>>> 19371), 021, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0006(pid
>>> 19371), 021, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0007(pid
>>> 19371), 021, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0008(pid
>>> 19371), 021, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0009(pid
>>> 19371), 021, Cpus_allowed_list: 21
>>> MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0010(pid
>>> 19371), 021, Cpus_allowed_list: 21
>>>
>>> The jobs are scheduled to one machine only.
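The host placement can be tallied mechanically; a small sketch (sample lines copied from the report-bindings output quoted above) confirms all four ranks landed on one host:

```python
import re

# Count "MCW rank ... bound to" lines per hostname to see how ranks
# were distributed across nodes.
lines = [
    "[pascal-1-03:19345] MCW rank 0 bound to socket 0[core 0[hwt 0]]:",
    "[pascal-1-03:19345] MCW rank 1 bound to socket 1[core 10[hwt 0]]:",
    "[pascal-1-03:19345] MCW rank 2 bound to socket 0[core 0[hwt 1]]:",
    "[pascal-1-03:19345] MCW rank 3 bound to socket 1[core 10[hwt 1]]:",
]
ranks_per_host = {}
for line in lines:
    m = re.match(r"\[([^:]+):\d+\] MCW rank (\d+)", line)
    if m:
        ranks_per_host.setdefault(m.group(1), []).append(int(m.group(2)))
print(ranks_per_host)  # {'pascal-1-03': [0, 1, 2, 3]} -- all ranks on one host
```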
>>>
>>> 4. mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus --mca
>>> plm_rsh_agent "qrsh" -report-bindings ./myid
>>>
>>> Machines :
>>> pascal-1-00...DE 20
>>> pascal-3-00...DE 20
>>>
>>> [pascal-1-00:05867] MCW rank 0 bound to socket 0[core 0[hwt 0-1]],
>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core
>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket
>>> 0[core 9[hwt 0-1]]:
>>> [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
>>> [pascal-1-00:05867] MCW rank 1 bound to socket 1[core 10[hwt 0-1]],
>>> socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core
>>> 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]],
>>> socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core
>>> 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]:
>>> [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
>>> [pascal-3-00:07501] MCW rank 2 bound to socket 0[core 0[hwt 0-1]],
>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core
>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket
>>> 0[core 9[hwt 0-1]]:
>>> [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
>>> [pascal-3-00:07501] MCW rank 3 bound to socket 1[core 10[hwt 0-1]],
>>> socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core
>>> 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]],
>>> socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core
>>> 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]:
>>> [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
>>> MPI Instance 0001 of 0004 is on pascal-1-00, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0001(pid
>>> 05884), 034, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0002(pid
>>> 05884), 038, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0003(pid
>>> 05884), 002, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0004(pid
>>> 05884), 008, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0005(pid
>>> 05884), 036, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0006(pid
>>> 05884), 000, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0007(pid
>>> 05884), 004, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0008(pid
>>> 05884), 006, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0009(pid
>>> 05884), 030, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0010(pid
>>> 05884), 032, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0002 of 0004 is on pascal-1-00, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0001(pid
>>> 05883), 031, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0002(pid
>>> 05883), 017, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0003(pid
>>> 05883), 027, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0004(pid
>>> 05883), 039, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0005(pid
>>> 05883), 011, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0006(pid
>>> 05883), 033, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0007(pid
>>> 05883), 015, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0008(pid
>>> 05883), 021, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0009(pid
>>> 05883), 003, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0010(pid
>>> 05883), 025, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0003 of 0004 is on pascal-3-00, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0001(pid
>>> 07513), 016, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0002(pid
>>> 07513), 020, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0003(pid
>>> 07513), 022, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0004(pid
>>> 07513), 018, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0005(pid
>>> 07513), 012, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0006(pid
>>> 07513), 004, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0007(pid
>>> 07513), 008, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0008(pid
>>> 07513), 006, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0009(pid
>>> 07513), 030, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0010(pid
>>> 07513), 034, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> MPI Instance 0004 of 0004 is on pascal-3-00, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0001(pid
>>> 07514), 017, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0002(pid
>>> 07514), 025, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0003(pid
>>> 07514), 029, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0004(pid
>>> 07514), 003, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0005(pid
>>> 07514), 033, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0006(pid
>>> 07514), 001, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0007(pid
>>> 07514), 007, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0008(pid
>>> 07514), 039, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0009(pid
>>> 07514), 035, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0010(pid
>>> 07514), 031, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>>
>>> This distribution looks very good with the combination of options
>>> "--map-by ppr:2:node --use-hwthread-cpus", with one exception: looking
>>> at "MPI Instance 0002", you'll find that "MP thread #0001" runs on CPU
>>> 031 while "MP thread #0005" runs on CPU 011, and 011/031 are the two
>>> hyperthreads of the same physical core.
>>> All the others are placed perfectly! Is this my mistake, or might
>>> there be a small remaining binding problem in Open MPI?
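That collision can be double-checked mechanically; here's a small sketch (my own checker, assuming the c / c+20 sibling numbering from the lscpu listing quoted below) that scans the reported per-thread placements for SMT-sibling clashes:

```python
# Hypothetical checker: flag MP threads that landed on SMT siblings of
# the same physical core, assuming the lscpu numbering quoted below
# (logical CPUs c and c+20 share physical core c).
def core_collisions(cpus, cores_per_node=20):
    first_seen = {}
    clashes = []
    for tid, cpu in enumerate(cpus, start=1):
        core = cpu % cores_per_node
        if core in first_seen:
            clashes.append((first_seen[core], tid, core))
        first_seen[core] = tid
    return clashes

# Per-thread CPU placements reported for MPI Instance 0002 (#0001..#0010):
threads = [31, 17, 27, 39, 11, 33, 15, 21, 3, 25]
print(core_collisions(threads))  # [(1, 5, 11)]: threads 1 and 5 share core 11
```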
>>>
>>> I'd appreciate any hint very much!
>>>
>>> Kind regards,
>>>
>>> Ado
>>>
>>> On 11.04.2017 01:36, [email protected] wrote:
>>> > I’m not entirely sure I understand your reference to “real cores”.
>>> When we bind you to a core, we bind you to all the HT’s that comprise that
>>> core. So, yes, with HT enabled, the binding report will list things by HT,
>>> but you’ll always be bound to the full core if you tell us bind-to core
>>> >
>>> > The default binding directive is bind-to socket when more than 2
>>> processes are in the job, and that’s what you are showing. You can override
>>> that by adding "-bind-to core" to your cmd line if that is what you desire.
>>> >
>>> > If you want to use individual HTs as independent processors, then
>>> “--use-hwthread-cpus -bind-to hwthreads” would indeed be the right
>>> combination.
>>> >
>>> >> On Apr 10, 2017, at 3:55 AM, Heinz-Ado Arnolds
>>> <[email protected]> wrote:
>>> >>
>>> >> Dear OpenMPI users & developers,
>>> >>
>>> >> I'm trying to distribute my jobs (with SGE) to a machine with a
>>> certain number of nodes, each node having 2 sockets, each socket having 10
>>> cores & 10 hyperthreads. I like to use only the real cores, no
>>> hyperthreading.
>>> >>
>>> >> lscpu -a -e
>>> >>
>>> >> CPU NODE SOCKET CORE L1d:L1i:L2:L3
>>> >> 0 0 0 0 0:0:0:0
>>> >> 1 1 1 1 1:1:1:1
>>> >> 2 0 0 2 2:2:2:0
>>> >> 3 1 1 3 3:3:3:1
>>> >> 4 0 0 4 4:4:4:0
>>> >> 5 1 1 5 5:5:5:1
>>> >> 6 0 0 6 6:6:6:0
>>> >> 7 1 1 7 7:7:7:1
>>> >> 8 0 0 8 8:8:8:0
>>> >> 9 1 1 9 9:9:9:1
>>> >> 10 0 0 10 10:10:10:0
>>> >> 11 1 1 11 11:11:11:1
>>> >> 12 0 0 12 12:12:12:0
>>> >> 13 1 1 13 13:13:13:1
>>> >> 14 0 0 14 14:14:14:0
>>> >> 15 1 1 15 15:15:15:1
>>> >> 16 0 0 16 16:16:16:0
>>> >> 17 1 1 17 17:17:17:1
>>> >> 18 0 0 18 18:18:18:0
>>> >> 19 1 1 19 19:19:19:1
>>> >> 20 0 0 0 0:0:0:0
>>> >> 21 1 1 1 1:1:1:1
>>> >> 22 0 0 2 2:2:2:0
>>> >> 23 1 1 3 3:3:3:1
>>> >> 24 0 0 4 4:4:4:0
>>> >> 25 1 1 5 5:5:5:1
>>> >> 26 0 0 6 6:6:6:0
>>> >> 27 1 1 7 7:7:7:1
>>> >> 28 0 0 8 8:8:8:0
>>> >> 29 1 1 9 9:9:9:1
>>> >> 30 0 0 10 10:10:10:0
>>> >> 31 1 1 11 11:11:11:1
>>> >> 32 0 0 12 12:12:12:0
>>> >> 33 1 1 13 13:13:13:1
>>> >> 34 0 0 14 14:14:14:0
>>> >> 35 1 1 15 15:15:15:1
>>> >> 36 0 0 16 16:16:16:0
>>> >> 37 1 1 17 17:17:17:1
>>> >> 38 0 0 18 18:18:18:0
>>> >> 39 1 1 19 19:19:19:1
>>> >>
>>> >> How do I have to choose the options & parameters of mpirun to
>>> achieve this behavior?
>>> >>
>>> >> mpirun -np 4 --map-by ppr:2:node --mca plm_rsh_agent "qrsh"
>>> -report-bindings ./myid
>>> >>
>>> >> distributes to
>>> >>
>>> >> [pascal-1-04:35735] MCW rank 0 bound to socket 0[core 0[hwt 0-1]],
>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core
>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket
>>> 0[core 9[hwt 0-1]]:
>>> [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
>>> >> [pascal-1-04:35735] MCW rank 1 bound to socket 1[core 10[hwt 0-1]],
>>> socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core
>>> 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]],
>>> socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core
>>> 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]:
>>> [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
>>> >> [pascal-1-03:00787] MCW rank 2 bound to socket 0[core 0[hwt 0-1]],
>>> socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
>>> 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core
>>> 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket
>>> 0[core 9[hwt 0-1]]:
>>> [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
>>> >> [pascal-1-03:00787] MCW rank 3 bound to socket 1[core 10[hwt 0-1]],
>>> socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core
>>> 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]],
>>> socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core
>>> 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]:
>>> [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
>>> >> MPI Instance 0001 of 0004 is on
>>> pascal-1-04,pascal-1-04.MPA-Garching.MPG.DE, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> >> MPI Instance 0002 of 0004 is on
>>> pascal-1-04,pascal-1-04.MPA-Garching.MPG.DE, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> >> MPI Instance 0003 of 0004 is on
>>> pascal-1-03,pascal-1-03.MPA-Garching.MPG.DE, Cpus_allowed_list:
>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>> >> MPI Instance 0004 of 0004 is on
>>> pascal-1-03,pascal-1-03.MPA-Garching.MPG.DE, Cpus_allowed_list:
>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>> >>
>>> >> i.e.: 2 nodes: ok, 2 sockets: ok, different set of cores: ok, but
>>> uses all hwthreads
>>> >>
>>> >> I have tried several combinations of --use-hwthread-cpus, --bind-to
>>> hwthreads, but didn't find the right combination.
>>> >>
>>> >> I'd be grateful for any hints!
>>> >>
>>> >> Thanks a lot in advance,
>>> >>
>>> >> Heinz-Ado Arnolds
>>> >> _______________________________________________
>>> >> users mailing list
>>> >> [email protected]
>>> >> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>>>
