Hi,
I've been following this thread because it may be relevant to our setup.

Is there a drawback to having orte_hetero_nodes=1 as the default MCA parameter? Is there a reason why the most generic case is not assumed?

Maxime Boissonneault

On 2014-06-20 13:48, Ralph Castain wrote:
Put "orte_hetero_nodes=1" in your default MCA param file - uses can override by 
setting that param to 0
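For reference, a sketch of what that could look like (file locations vary by installation; the paths below are the usual per-user and system-wide spots):

```
# System-wide default: <prefix>/etc/openmpi-mca-params.conf
# Per-user default:    $HOME/.openmpi/mca-params.conf
orte_hetero_nodes = 1
```

A user can then override it for a single job, e.g. with "mpirun --mca orte_hetero_nodes 0 ..." or by exporting OMPI_MCA_orte_hetero_nodes=0 in the environment.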


On Jun 20, 2014, at 10:30 AM, Brock Palen <bro...@umich.edu> wrote:

Perfection!  That appears to do it for our standard case.

Now I know how to set MCA options by env var or config file.  How can I make 
this the default, which a user can then override?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 20, 2014, at 1:21 PM, Ralph Castain <r...@open-mpi.org> wrote:

I think I begin to grok at least part of the problem. If you are assigning 
different cpus on each node, then you'll need to tell us that by setting 
--hetero-nodes; otherwise we won't have any way to report that back to mpirun 
for its binding calculation.
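For the archives, the command-line equivalent of the MCA parameter (a sketch; the flag exists in the 1.7/1.8 series):

```
mpirun --hetero-nodes --report-bindings -np 64 ./a.out
```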

Otherwise, we expect that the cpuset of the first node we launch a daemon onto 
(or where mpirun is executing, if we are only launching local to mpirun) 
accurately represents the cpuset on every node in the allocation.

We still might well have a bug in our binding computation - but the above will 
definitely impact what you said the user did.

On Jun 20, 2014, at 10:06 AM, Brock Palen <bro...@umich.edu> wrote:

Extra data point if I do:

[brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core hostname
--------------------------------------------------------------------------
A request was made to bind to that would result in binding more
processes than cpus on a resource:

  Bind to:         CORE
  Node:            nyx5513
  #processes:  2
  #cpus:          1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--------------------------------------------------------------------------

[brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime
13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
[brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind --get
0x00000010
0x00001000
[brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513
nyx5513
nyx5513

Interesting: if I force bind-to core, MPI barfs saying there is only 1 cpu 
available, while PBS says it gave it two; and if I run (this is all inside an 
interactive job) hwloc-bind --get just on that node, I get what I expect.

Is there a way to get a map of what MPI thinks it has on each host?
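(A sketch of one way that may show this, using mpirun's own reporting options:)

```
mpirun --display-allocation --display-map -np 64 ./a.out
```

--display-allocation prints the node/slot list mpirun believes it received, and --display-map prints the planned placement of each rank before launch.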




On Jun 20, 2014, at 12:38 PM, Brock Palen <bro...@umich.edu> wrote:

I was able to produce it in my test.

orted affinity set by cpuset:
[root@nyx5874 ~]# hwloc-bind --get --pid 103645
0x0000c002

This mask (cpus 1, 14, 15), which spans sockets, matches the cpuset set up by the 
batch system.
[root@nyx5874 ~]# cat /dev/cpuset/torque/12719806.nyx.engin.umich.edu/cpus
1,14-15
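As an aside, these hwloc bitmasks can be decoded by hand; a minimal Python sketch (my own helper, not part of hwloc) that walks the bits:

```python
def mask_to_cpus(mask):
    """Return the PU (logical CPU) indices set in a hwloc-style bitmask."""
    cpus = []
    bit = 0
    while mask:
        if mask & 1:       # lowest bit set -> this PU is in the mask
            cpus.append(bit)
        mask >>= 1
        bit += 1
    return cpus

print(mask_to_cpus(0x0000C002))  # → [1, 14, 15]
print(mask_to_cpus(0x00008000))  # → [15]
```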

The ranks, though, were then all bound to the same core:

[root@nyx5874 ~]# hwloc-bind --get --pid 103871
0x00008000
[root@nyx5874 ~]# hwloc-bind --get --pid 103872
0x00008000
[root@nyx5874 ~]# hwloc-bind --get --pid 103873
0x00008000

Which is core 15:

report-bindings gave me:
You can see how on a few nodes the ranks were all bound to the same core, the 
last one in each case.  I only gave you the results for the host nyx5874.

[nyx5526.engin.umich.edu:23726] MCW rank 0 is not bound (or bound to all 
available processors)
[nyx5878.engin.umich.edu:103925] MCW rank 8 is not bound (or bound to all 
available processors)
[nyx5533.engin.umich.edu:123988] MCW rank 1 is not bound (or bound to all 
available processors)
[nyx5879.engin.umich.edu:102808] MCW rank 9 is not bound (or bound to all 
available processors)
[nyx5874.engin.umich.edu:103645] MCW rank 41 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5874.engin.umich.edu:103645] MCW rank 42 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5874.engin.umich.edu:103645] MCW rank 43 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5888.engin.umich.edu:117400] MCW rank 11 is not bound (or bound to all 
available processors)
[nyx5786.engin.umich.edu:30004] MCW rank 19 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5786.engin.umich.edu:30004] MCW rank 18 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5594.engin.umich.edu:33884] MCW rank 24 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5594.engin.umich.edu:33884] MCW rank 25 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5594.engin.umich.edu:33884] MCW rank 26 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5798.engin.umich.edu:53026] MCW rank 59 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5798.engin.umich.edu:53026] MCW rank 60 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5798.engin.umich.edu:53026] MCW rank 56 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5798.engin.umich.edu:53026] MCW rank 57 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5798.engin.umich.edu:53026] MCW rank 58 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5545.engin.umich.edu:88170] MCW rank 2 is not bound (or bound to all 
available processors)
[nyx5613.engin.umich.edu:25229] MCW rank 31 is not bound (or bound to all 
available processors)
[nyx5880.engin.umich.edu:01406] MCW rank 10 is not bound (or bound to all 
available processors)
[nyx5770.engin.umich.edu:86538] MCW rank 6 is not bound (or bound to all 
available processors)
[nyx5613.engin.umich.edu:25228] MCW rank 30 is not bound (or bound to all 
available processors)
[nyx5577.engin.umich.edu:65949] MCW rank 4 is not bound (or bound to all 
available processors)
[nyx5607.engin.umich.edu:30379] MCW rank 14 is not bound (or bound to all 
available processors)
[nyx5544.engin.umich.edu:72960] MCW rank 47 is not bound (or bound to all 
available processors)
[nyx5544.engin.umich.edu:72959] MCW rank 46 is not bound (or bound to all 
available processors)
[nyx5848.engin.umich.edu:04332] MCW rank 33 is not bound (or bound to all 
available processors)
[nyx5848.engin.umich.edu:04333] MCW rank 34 is not bound (or bound to all 
available processors)
[nyx5544.engin.umich.edu:72958] MCW rank 45 is not bound (or bound to all 
available processors)
[nyx5858.engin.umich.edu:12165] MCW rank 35 is not bound (or bound to all 
available processors)
[nyx5607.engin.umich.edu:30380] MCW rank 15 is not bound (or bound to all 
available processors)
[nyx5544.engin.umich.edu:72957] MCW rank 44 is not bound (or bound to all 
available processors)
[nyx5858.engin.umich.edu:12167] MCW rank 37 is not bound (or bound to all 
available processors)
[nyx5870.engin.umich.edu:33811] MCW rank 7 is not bound (or bound to all 
available processors)
[nyx5582.engin.umich.edu:81994] MCW rank 5 is not bound (or bound to all 
available processors)
[nyx5848.engin.umich.edu:04331] MCW rank 32 is not bound (or bound to all 
available processors)
[nyx5557.engin.umich.edu:46654] MCW rank 50 is not bound (or bound to all 
available processors)
[nyx5858.engin.umich.edu:12166] MCW rank 36 is not bound (or bound to all 
available processors)
[nyx5799.engin.umich.edu:67802] MCW rank 22 is not bound (or bound to all 
available processors)
[nyx5799.engin.umich.edu:67803] MCW rank 23 is not bound (or bound to all 
available processors)
[nyx5556.engin.umich.edu:50889] MCW rank 3 is not bound (or bound to all 
available processors)
[nyx5625.engin.umich.edu:95931] MCW rank 53 is not bound (or bound to all 
available processors)
[nyx5625.engin.umich.edu:95930] MCW rank 52 is not bound (or bound to all 
available processors)
[nyx5557.engin.umich.edu:46655] MCW rank 51 is not bound (or bound to all 
available processors)
[nyx5625.engin.umich.edu:95932] MCW rank 54 is not bound (or bound to all 
available processors)
[nyx5625.engin.umich.edu:95933] MCW rank 55 is not bound (or bound to all 
available processors)
[nyx5866.engin.umich.edu:16306] MCW rank 40 is not bound (or bound to all 
available processors)
[nyx5861.engin.umich.edu:22761] MCW rank 61 is not bound (or bound to all 
available processors)
[nyx5861.engin.umich.edu:22762] MCW rank 62 is not bound (or bound to all 
available processors)
[nyx5861.engin.umich.edu:22763] MCW rank 63 is not bound (or bound to all 
available processors)
[nyx5557.engin.umich.edu:46652] MCW rank 48 is not bound (or bound to all 
available processors)
[nyx5557.engin.umich.edu:46653] MCW rank 49 is not bound (or bound to all 
available processors)
[nyx5866.engin.umich.edu:16304] MCW rank 38 is not bound (or bound to all 
available processors)
[nyx5788.engin.umich.edu:02465] MCW rank 20 is not bound (or bound to all 
available processors)
[nyx5597.engin.umich.edu:68071] MCW rank 27 is not bound (or bound to all 
available processors)
[nyx5775.engin.umich.edu:27952] MCW rank 17 is not bound (or bound to all 
available processors)
[nyx5866.engin.umich.edu:16305] MCW rank 39 is not bound (or bound to all 
available processors)
[nyx5788.engin.umich.edu:02466] MCW rank 21 is not bound (or bound to all 
available processors)
[nyx5775.engin.umich.edu:27951] MCW rank 16 is not bound (or bound to all 
available processors)
[nyx5597.engin.umich.edu:68073] MCW rank 29 is not bound (or bound to all 
available processors)
[nyx5597.engin.umich.edu:68072] MCW rank 28 is not bound (or bound to all 
available processors)
[nyx5552.engin.umich.edu:30481] MCW rank 12 is not bound (or bound to all 
available processors)
[nyx5552.engin.umich.edu:30482] MCW rank 13 is not bound (or bound to all 
available processors)





On Jun 20, 2014, at 12:20 PM, Brock Palen <bro...@umich.edu> wrote:

Got it,

I have the input from the user and am testing it out.

It probably has less to do with Torque and more to do with cpusets.

I'm working on producing it myself also.




On Jun 20, 2014, at 12:18 PM, Ralph Castain <r...@open-mpi.org> wrote:

Thanks - I'm just trying to reproduce one problem case so I can look at it. Given that I 
don't have access to a Torque machine, I need to "fake" it.


On Jun 20, 2014, at 9:15 AM, Brock Palen <bro...@umich.edu> wrote:

In this case they are a single socket, but as you can see it could be 
either, depending on the job.




On Jun 19, 2014, at 2:44 PM, Ralph Castain <r...@open-mpi.org> wrote:

Sorry, I should have been clearer - I was asking if cores 8-11 are all on one 
socket, or span multiple sockets


On Jun 19, 2014, at 11:36 AM, Brock Palen <bro...@umich.edu> wrote:

Ralph,

It was a large job spread across.  Our system allows users to ask for 'procs' 
which are laid out in any format.

The list:

[nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
[nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
[nyx5409:11][nyx5411:11][nyx5412:3]
This shows that nyx5406 had 2 cores, nyx5427 also 2, and nyx5411 had 11.

They could be spread across any number of socket configurations.  We start very lax 
("user requests X procs") and then the user can add stricter requirements 
from there.  We support mostly serial users, and users can colocate on nodes.

That is good to know; I think we would want to make 'bind to 
core' our default, except for our few users who use hybrid mode.

Our CPU set tells you what cores the job is assigned.  So in the problem case 
provided, the cpuset/cgroup shows only cores 8-11 are available to this job on 
this node.




On Jun 18, 2014, at 11:10 PM, Ralph Castain <r...@open-mpi.org> wrote:

The default binding option depends on the number of procs - it is bind-to core for 
np=2, and bind-to socket for np > 2. You never said, but should I assume you 
ran 4 ranks? If so, then we should be trying to bind-to socket.

I'm not sure what your cpuset is telling us - are you binding us to a socket? 
Are some cpus in one socket, and some in another?

It could be that the cpuset + bind-to socket is resulting in some odd behavior, 
but I'd need a little more info to narrow it down.


On Jun 18, 2014, at 7:48 PM, Brock Palen <bro...@umich.edu> wrote:

I have started using 1.8.1 for some codes (meep in this case) and it sometimes 
works fine, but in a few cases I am seeing ranks being given overlapping CPU 
assignments, though not always.

Example job, default binding options (so by-core right?):

Assigned nodes, the one in question is nyx5398, we use torque CPU sets, and use 
TM to spawn.

[nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
[nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
[nyx5409:11][nyx5411:11][nyx5412:3]

[root@nyx5398 ~]# hwloc-bind --get --pid 16065
0x00000200
[root@nyx5398 ~]# hwloc-bind --get --pid 16066
0x00000800
[root@nyx5398 ~]# hwloc-bind --get --pid 16067
0x00000200
[root@nyx5398 ~]# hwloc-bind --get --pid 16068
0x00000800

[root@nyx5398 ~]# cat /dev/cpuset/torque/12703230.nyx.engin.umich.edu/cpus
8-11

So Torque claims the cpuset set up for the job has 4 cores, but as you can see, 
pairs of ranks were given identical bindings.

I checked the pids; they were part of the correct cpuset. I also checked orted:

[root@nyx5398 ~]# hwloc-bind --get --pid 16064
0x00000f00
[root@nyx5398 ~]# hwloc-calc --intersect PU 16064
ignored unrecognized argument 16064

[root@nyx5398 ~]# hwloc-calc --intersect PU 0x00000f00
8,9,10,11

Which is exactly what I would expect.

So ummm, I'm lost as to why this might happen.  What else should I check?  Like I 
said, not all jobs show this behavior.




_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/06/24672.php


--
---------------------------------
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Ph.D. in physics
