Thanks Gilles!!

On Wed, Sep 16, 2015 at 9:21 PM, Gilles Gouaillardet <gil...@rist.or.jp>
wrote:

> Ralph,
>
> you can reproduce this with master by manually creating a cpuset with fewer
> cores than are available, and invoking mpirun with -bind-to core from within
> the cpuset.
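>
> For reference, a rough sketch of such a reproduction, assuming the cpuset
> pseudo-filesystem is mounted at /dev/cpuset (as on Patrick's system); the
> cpuset name "ompi_test" and the ./a.out binary are just placeholders:
>
> # mkdir /dev/cpuset/ompi_test                    (create a new cpuset)
> # echo 4-7 > /dev/cpuset/ompi_test/cpuset.cpus   (allow only cores 4-7)
> # echo 0 > /dev/cpuset/ompi_test/cpuset.mems     (mems must be set as well)
> # echo $$ > /dev/cpuset/ompi_test/tasks          (move this shell into the cpuset)
> # mpirun -np 8 -bind-to core ./a.out             (launch from within the cpuset)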
>
> i made PR 904 https://github.com/open-mpi/ompi/pull/904
>
> Brice,
>
> can you please double-check that the hwloc_bitmap_isincluded invocation is
> correct ?
> Another way to fix this could be to always set opal_hwloc_base_cpu_set.
>
> Cheers,
>
> Gilles
>
>
>
>
> On 9/16/2015 11:47 PM, Ralph Castain wrote:
>
> As I said, if you don’t provide an explicit slot count in your hostfile,
> we default to allowing oversubscription. We don’t have OAR integration in
> OMPI, and so mpirun isn’t recognizing that you are running under a resource
> manager - it thinks this is just being controlled by a hostfile.
>
> If you want us to error out on oversubscription, you can either add the
> flag you identified, or simply change your hostfile to:
>
> frog53 slots=4
>
> Either will work.
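>
> For example (a sketch; the hostfile name "myhostfile" and the executable
> are placeholders):
>
> # cat myhostfile
> frog53 slots=4
> # mpirun -np 5 --hostfile myhostfile -bind-to core ./a.out
>
> With slots=4 declared, requesting 5 processes should abort with an error
> about there not being enough slots, instead of doubling up processes on
> the cores.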
>
>
> On Sep 16, 2015, at 1:00 AM, Patrick Begou <
> patrick.be...@legi.grenoble-inp.fr> wrote:
>
> Thanks to all for your answers; I've added some details about the tests I
> have run. See below.
>
>
> Ralph Castain wrote:
>
> Not precisely correct. It depends on the environment.
>
> If there is a resource manager allocating nodes, or you provide a hostfile
> that specifies the number of slots on the nodes, or you use -host, then we
> default to no-oversubscribe.
>
> I'm using a batch scheduler (OAR).
> # cat /dev/cpuset/oar/begou_7955553/cpuset.cpus
> 4-7
>
> So 4 cores are allowed. The nodes have two eight-core CPUs.
>
> Node file contains:
> # cat $OAR_NODEFILE
> frog53
> frog53
> frog53
> frog53
>
> # mpirun --hostfile $OAR_NODEFILE -bind-to core location.exe
> is okay (my test code shows one process on each core):
> (process 3) thread is now running on PU logical index 1 (OS/physical index
> 5) on system frog53
> (process 0) thread is now running on PU logical index 3 (OS/physical index
> 7) on system frog53
> (process 1) thread is now running on PU logical index 0 (OS/physical index
> 4) on system frog53
> (process 2) thread is now running on PU logical index 2 (OS/physical index
> 6) on system frog53
>
> # mpirun -np 5 --hostfile $OAR_NODEFILE -bind-to core location.exe
> oversubscribes, with:
> (process 0) thread is now running on PU logical index 3 (OS/physical index
> 7) on system frog53
> (process 1) thread is now running on PU logical index 1 (OS/physical index
> 5) on system frog53
> (*process 3*) thread is now running on PU logical index *2 (OS/physical
> index 6)* on system frog53
> (process 4) thread is now running on PU logical index 0 (OS/physical index
> 4) on system frog53
> (*process 2*) thread is now running on PU logical index *2 (OS/physical
> index 6)* on system frog53
> This is not allowed with OpenMPI 1.7.3
>
> I can increase up to the maximum core count of this first processor (8
> cores):
> # mpirun -np 8 --hostfile $OAR_NODEFILE -bind-to core location.exe |grep
> 'thread is now running on PU'
> (process 5) thread is now running on PU logical index 1 (OS/physical index
> 5) on system frog53
> (process 7) thread is now running on PU logical index 3 (OS/physical index
> 7) on system frog53
> (process 4) thread is now running on PU logical index 0 (OS/physical index
> 4) on system frog53
> (process 6) thread is now running on PU logical index 2 (OS/physical index
> 6) on system frog53
> (process 2) thread is now running on PU logical index 1 (OS/physical index
> 5) on system frog53
> (process 0) thread is now running on PU logical index 2 (OS/physical index
> 6) on system frog53
> (process 1) thread is now running on PU logical index 0 (OS/physical index
> 4) on system frog53
> (process 3) thread is now running on PU logical index 0 (OS/physical index
> 4) on system frog53
>
> But I cannot overload beyond the 8 cores (the maximum core count of one CPU).
> # mpirun -np 9 --hostfile $OAR_NODEFILE -bind-to core location.exe
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
>
>    Bind to:     CORE
>    Node:        frog53
>    #processes:  2
>    #cpus:       1
>
> You can override this protection by adding the "overload-allowed"
> option to your binding directive.
>
> Now if I add *--nooversubscribe* the problem doesn't exist anymore (no
> more than 4 processes, one on each core). So it looks like the default
> behavior is a no-oversubscribe based on the core count of the socket ???
>
> Again, with 1.7.3 this problem doesn't occur at all.
>
> Patrick
>
>
>
> If you provide a hostfile that doesn’t specify slots, then we use the
> number of cores we find on each node, and we allow oversubscription.
>
> What is being described sounds like more of a bug than an intended
> feature. I’d need to know more about it, though, to be sure. Can you tell
> me how you are specifying this cpuset?
>
>
> On Sep 15, 2015, at 4:44 PM, Matt Thompson <fort...@gmail.com> wrote:
>
> Looking at the Open MPI 1.10.0 man page:
>
>    https://www.open-mpi.org/doc/v1.10/man1/mpirun.1.php
>
> it looks like perhaps -oversubscribe (which was an option) is now the
> default behavior. Instead we have:
>
> *-nooversubscribe, --nooversubscribe* Do not oversubscribe any nodes;
> error (without starting any processes) if the requested number of processes
> would cause oversubscription. This option implicitly sets "max_slots" equal
> to the "slots" value for each node.
>
> It also looks like -map-by has a way to implement it as well (see man
> page).
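>
> For example, either of these should enforce it (syntax as given in the man
> page; worth double-checking against 1.10.0):
>
> mpirun -np 8 --nooversubscribe --bind-to core ./my_application
> mpirun -np 8 --map-by core:NOOVERSUBSCRIBE ./my_application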
>
> Thanks for letting me/us know about this. On a system of mine I sort of
> depend on the -nooversubscribe behavior!
>
> Matt
>
>
>
> On Tue, Sep 15, 2015 at 11:17 AM, Patrick Begou <patrick.be...@legi.grenoble-inp.fr> wrote:
>
>> Hi,
>>
>> I'm running OpenMPI 1.10.0 built with the Intel 2015 compilers on a Bullx
>> system.
>> I have some trouble with the bind-to core option when using a cpuset.
>> If the cpuset contains fewer than all the cores of a CPU (e.g. 4 cores
>> allowed on an 8-core CPU), OpenMPI 1.10.0 allows these cores to be
>> overloaded up to the maximum number of cores of the CPU.
>> With this config, and because the cpuset only allows 4 cores, I can reach
>> 2 processes/core if I use:
>>
>> mpirun -np 8 --bind-to core my_application
>>
>> OpenMPI 1.7.3 doesn't show the problem in the same situation:
>> mpirun -np 8 --bind-to-core my_application
>> returns:
>> *A request was made to bind to that would result in binding more
>> processes than cpus on a resource*
>> and that's okay of course.
>>
>>
>> Is there a way to avoid this overloading with OpenMPI 1.10.0?
>>
>> Thanks
>>
>> Patrick
>>
>> --
>> ===================================================================
>> |  Equipe M.O.S.T.         |                                      |
>> |  Patrick BEGOU           | mailto:patrick.be...@grenoble-inp.fr |
>> |  LEGI                    |                                      |
>> |  BP 53 X                 | Tel 04 76 82 51 35                   |
>> |  38041 GRENOBLE CEDEX    | Fax 04 76 82 52 71                   |
>> ===================================================================
>>
>>
>
>
>
> --
> Matt Thompson
>
> Man Among Men
> Fulcrum of History
>