Thanks Gilles!!
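For anyone who wants to reproduce the report below outside of a batch system, Gilles's recipe further down amounts to something like this sketch (assuming a cgroup-v1 cpuset mounted under /sys/fs/cgroup/cpuset; the group name, core range, and application name are only placeholders):

    # as root: carve out a cpuset that exposes only cores 4-7
    mkdir /sys/fs/cgroup/cpuset/smallset
    echo 4-7 > /sys/fs/cgroup/cpuset/smallset/cpuset.cpus
    echo 0   > /sys/fs/cgroup/cpuset/smallset/cpuset.mems
    echo $$  > /sys/fs/cgroup/cpuset/smallset/tasks    # move the current shell into it
    # mpirun started from this shell inherits the restricted cpuset
    mpirun -np 8 -bind-to core ./my_application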
On Wed, Sep 16, 2015 at 9:21 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> Ralph,
>
> you can reproduce this with master by manually creating a cpuset with fewer cores than available, and invoking mpirun with -bind-to core from within the cpuset.
>
> I made PR 904 https://github.com/open-mpi/ompi/pull/904
>
> Brice,
>
> can you please double-check that the hwloc_bitmap_isincluded invocation is correct?
> Another way to fix this could be to always set opal_hwloc_base_cpu_set.
>
> Cheers,
>
> Gilles
>
>
> On 9/16/2015 11:47 PM, Ralph Castain wrote:
>
> As I said, if you don't provide an explicit slot count in your hostfile, we default to allowing oversubscription. We don't have OAR integration in OMPI, and so mpirun isn't recognizing that you are running under a resource manager - it thinks this is just being controlled by a hostfile.
>
> If you want us to error out on oversubscription, you can either add the flag you identified, or simply change your hostfile to:
>
> frog53 slots=4
>
> Either will work.
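For reference, a minimal sketch of that second fix (the hostfile name below is only illustrative; location.exe is Patrick's test binary). With the slot count pinned to 4, requesting more ranks without explicitly allowing oversubscription should be refused rather than silently doubling processes onto the four allowed cores:

    cat hostfile_frog53
    frog53 slots=4
    # fits: one rank per allowed core
    mpirun -np 4 --hostfile hostfile_frog53 -bind-to core location.exe
    # more ranks than slots: should now abort instead of oversubscribing
    mpirun -np 8 --hostfile hostfile_frog53 -bind-to core location.exe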
>
> On Sep 16, 2015, at 1:00 AM, Patrick Begou <patrick.be...@legi.grenoble-inp.fr> wrote:
>
> Thanks all for your answers, I've added some details about the tests I have run. See below.
>
>
> Ralph Castain wrote:
>
> Not precisely correct. It depends on the environment.
>
> If there is a resource manager allocating nodes, or you provide a hostfile that specifies the number of slots on the nodes, or you use -host, then we default to no-oversubscribe.
>
> I'm using a batch scheduler (OAR).
> # cat /dev/cpuset/oar/begou_7955553/cpuset.cpus
> 4-7
>
> So 4 cores allowed. Nodes have two eight-core CPUs.
>
> Node file contains:
> # cat $OAR_NODEFILE
> frog53
> frog53
> frog53
> frog53
>
> # mpirun --hostfile $OAR_NODEFILE -bind-to core location.exe
> is okay (my test code shows one process on each core):
> (process 3) thread is now running on PU logical index 1 (OS/physical index 5) on system frog53
> (process 0) thread is now running on PU logical index 3 (OS/physical index 7) on system frog53
> (process 1) thread is now running on PU logical index 0 (OS/physical index 4) on system frog53
> (process 2) thread is now running on PU logical index 2 (OS/physical index 6) on system frog53
>
> # mpirun -np 5 --hostfile $OAR_NODEFILE -bind-to core location.exe
> oversubscribes with:
> (process 0) thread is now running on PU logical index 3 (OS/physical index 7) on system frog53
> (process 1) thread is now running on PU logical index 1 (OS/physical index 5) on system frog53
> (*process 3*) thread is now running on PU logical index *2 (OS/physical index 6)* on system frog53
> (process 4) thread is now running on PU logical index 0 (OS/physical index 4) on system frog53
> (*process 2*) thread is now running on PU logical index *2 (OS/physical index 6)* on system frog53
> This is not allowed with OpenMPI 1.7.3.
>
> I can increase up to the maximum core count of this first processor (8 cores):
> # mpirun -np 8 --hostfile $OAR_NODEFILE -bind-to core location.exe | grep 'thread is now running on PU'
> (process 5) thread is now running on PU logical index 1 (OS/physical index 5) on system frog53
> (process 7) thread is now running on PU logical index 3 (OS/physical index 7) on system frog53
> (process 4) thread is now running on PU logical index 0 (OS/physical index 4) on system frog53
> (process 6) thread is now running on PU logical index 2 (OS/physical index 6) on system frog53
> (process 2) thread is now running on PU logical index 1 (OS/physical index 5) on system frog53
> (process 0) thread is now running on PU logical index 2 (OS/physical index 6) on system frog53
> (process 1) thread is now running on PU logical index 0 (OS/physical index 4) on system frog53
> (process 3) thread is now running on PU logical index 0 (OS/physical index 4) on system frog53
>
> But I cannot overload beyond the 8 cores (the maximum core count of one CPU):
> # mpirun -np 9 --hostfile $OAR_NODEFILE -bind-to core location.exe
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
>
>    Bind to:     CORE
>    Node:        frog53
>    #processes:  2
>    #cpus:       1
>
> You can override this protection by adding the "overload-allowed"
> option to your binding directive.
>
> Now if I add *--nooversubscribe* the problem doesn't exist anymore (no more than 4 processes, one on each core). So it looks as if the default behavior were no-oversubscribe based on the socket's core count rather than on the cpuset?
>
> Again, with 1.7.3 this problem doesn't occur at all.
>
> Patrick
>
>
> If you provide a hostfile that doesn't specify slots, then we use the number of cores we find on each node, and we allow oversubscription.
>
> What is being described sounds like more of a bug than an intended feature. I'd need to know more about it, though, to be sure. Can you tell me how you are specifying this cpuset?
>
>
> On Sep 15, 2015, at 4:44 PM, Matt Thompson <fort...@gmail.com> wrote:
>
> Looking at the Open MPI 1.10.0 man page:
>
> https://www.open-mpi.org/doc/v1.10/man1/mpirun.1.php
>
> it looks like perhaps -oversubscribe (which was an option) is now the default behavior. Instead we have:
>
> *-nooversubscribe, --nooversubscribe*
> Do not oversubscribe any nodes; error (without starting any processes) if the requested number of processes would cause oversubscription. This option implicitly sets "max_slots" equal to the "slots" value for each node.
>
> It also looks like -map-by has a way to implement it as well (see man page).
>
> Thanks for letting me/us know about this. On a system of mine I sort of depend on the -nooversubscribe behavior!
>
> Matt
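Putting Matt's two suggestions side by side for Patrick's setup (a sketch only; the -map-by modifier spelling is taken from the cited man page and may need checking against the installed 1.10.0):

    # explicit flag: refuse to start more processes than available slots
    mpirun -np 8 --nooversubscribe --hostfile $OAR_NODEFILE -bind-to core location.exe
    # same intent expressed as a mapping modifier, per the mpirun man page
    mpirun -np 8 --map-by core:NOOVERSUBSCRIBE --hostfile $OAR_NODEFILE -bind-to core location.exe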
>
> On Tue, Sep 15, 2015 at 11:17 AM, Patrick Begou <patrick.be...@legi.grenoble-inp.fr> wrote:
>
>> Hi,
>>
>> I'm running OpenMPI 1.10.0 built with the Intel 2015 compilers on a Bullx system.
>> I have some trouble with the bind-to core option when using a cpuset.
>> If the cpuset covers fewer than all the cores of a CPU (e.g. 4 cores allowed on an 8-core CPU), OpenMPI 1.10.0 allows these cores to be overloaded up to the maximum number of cores of the CPU.
>> With this configuration, and because the cpuset only allows 4 cores, I can reach 2 processes per core if I use:
>>
>> mpirun -np 8 --bind-to core my_application
>>
>> OpenMPI 1.7.3 doesn't show the problem in the same situation:
>> mpirun -np 8 --bind-to-core my_application
>> returns:
>> *A request was made to bind to that would result in binding more*
>> *processes than cpus on a resource*
>> and that's okay of course.
>>
>> Is there a way to avoid this overloading with OpenMPI 1.10.0?
>>
>> Thanks
>>
>> Patrick
>>
>> --
>> ===================================================================
>> | Equipe M.O.S.T.         |                                       |
>> | Patrick BEGOU           | mailto:patrick.be...@grenoble-inp.fr  |
>> | LEGI                    |                                       |
>> | BP 53 X                 | Tel 04 76 82 51 35                    |
>> | 38041 GRENOBLE CEDEX    | Fax 04 76 82 52 71                    |
>> ===================================================================
>
>
> --
> Matt Thompson
>
> Man Among Men
> Fulcrum of History