Hi, FYI: according to this thread http://www.open-mpi.org/community/lists/users/2011/07/16931.php, Open MPI uses its own binding scheme, and core binding is enabled by default as of version 1.8.
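If the idea is to let grid engine (rather than Open MPI) own the placement, a first step might be to switch Open MPI's own binding off, so that two concurrent jobs at least stop pinning to the same cores. This is only a sketch of the relevant mpirun flags, and it trades the double-binding problem for no binding at all; "./my_app" is a placeholder for your application:

```shell
# Sketch only: --bind-to none disables Open MPI's default core binding
# (enabled since 1.8), so concurrent jobs no longer pin to identical
# cores; --report-bindings prints what mpirun actually did.
# "./my_app" is a placeholder application name.
mpirun --bind-to none --report-bindings -np 6 ./my_app
```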
Regards

On Thu, Dec 11, 2014 at 10:08 AM, Michael Würsch <[email protected]> wrote:
> Hello,
>
> With OGS/GE 2011.11 and Open MPI 1.8.3 we have a problem with core/memory
> binding when multiple Open MPI jobs run on the same machine.
>
> "qsub -binding linear:1 job" works fine if the job is the only one running
> on the machine. As hwloc-ps and numastat show, each MPI thread is bound to
> one core and allocates memory belonging to the socket that contains that
> core.
>
> However, when two or more jobs run on the same machine, "-binding linear:1"
> causes them to be bound to the same cores. For instance, when two jobs with
> 6 MPI threads each are started on a 12-core machine (2 x Xeon L5640,
> hyper-threading switched off), each of the two jobs is bound to these cores:
>
> [lx012:16840] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././.][./././././.]
> [lx012:16840] MCW rank 1 bound to socket 1[core 6[hwt 0]]: [./././././.][B/././././.]
> [lx012:16840] MCW rank 2 bound to socket 0[core 1[hwt 0]]: [./B/./././.][./././././.]
> [lx012:16840] MCW rank 3 bound to socket 1[core 7[hwt 0]]: [./././././.][./B/./././.]
> [lx012:16840] MCW rank 4 bound to socket 0[core 2[hwt 0]]: [././B/././.][./././././.]
> [lx012:16840] MCW rank 5 bound to socket 1[core 8[hwt 0]]: [./././././.][././B/././.]
> ("mpirun --report-bindings" output)
>
> Thus each MPI thread gets only 50% of a core, and the remaining 6 cores are
> not used.
>
> This is clearly not what we want. Is there a communication problem between
> grid engine and Open MPI? We do not fully understand how the communication
> is supposed to work. The machines file created by grid engine contains only
> machine names, but no information about which cores to use on those
> machines.
>
> One could fix the binding by specifying explicitly (as parameters or in a
> machine file) which cores should be used by mpirun. However, grid engine
> seems to provide only the core on which the first MPI thread should run.
> When "qsub -binding env linear:1" is used, grid engine sets SGE_BINDING
> to 0 for the first job, 6 for the second, 1 for the third, 7 for the
> fourth, and so on. However, to construct a machine file for Open MPI one
> needs to know all the cores the job is supposed to use.
>
> How can we force grid engine and Open MPI to manage core binding in a
> reasonable way?
>
> Maybe we are missing some Open MPI setting we are not aware of (I thought
> binding is enabled by default).
> If you need to know anything about our queue settings, I can tell you.
>
> Thank you
> Michael
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
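For reference, if the full core list for a job were known, an Open MPI rankfile could be generated with a few lines of shell and passed to mpirun. This is only a sketch: the host name `lx012` and the core list in CORES are assumptions, since (as described above) SGE's "-binding env" exports only the first core in SGE_BINDING, and obtaining the full list is exactly the open problem here.

```shell
#!/bin/sh
# Sketch: build an Open MPI rankfile from a known list of cores.
# HOST and CORES are placeholders -- SGE only exports the first core
# in SGE_BINDING, so getting the full list is the unsolved part.
HOST=lx012
CORES="0 6 1 7 2 8"

rank=0
: > rankfile                      # truncate/create the rankfile
for core in $CORES; do
    # Open MPI rankfile syntax: "rank N=host slot=core"
    echo "rank $rank=$HOST slot=$core" >> rankfile
    rank=$((rank + 1))
done
```

One would then start the job with "mpirun -rf rankfile ...", which binds each rank to the listed core on the listed host.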
