On Tue, Nov 16, 2010 at 12:23 PM, Terry Dontje <terry.don...@oracle.com> wrote:
> On 11/16/2010 01:31 PM, Reuti wrote:
>
> Hi Ralph,
>
> On 16.11.2010 at 15:40, Ralph Castain wrote:
>
> 2. have SGE bind procs it launches to -all- of those cores. I believe SGE
> does this automatically to constrain the procs to running on only those
> cores.
>
> This is another "bug/feature" in SGE: it's a matter of discussion whether
> the shepherd should get exactly one core for each call (in case you use
> more than one `qrsh` per node), or *all* cores assigned (which we need
> right now, as the processes in Open MPI will be forks of the orte daemon).
> I filed an issue about such a situation a long time ago, and a
> "limit_to_one_qrsh_per_host yes/no" setting in the PE definition would do
> (this setting should then also change the core allocation of the master
> process):
> http://gridengine.sunsource.net/issues/show_bug.cgi?id=1254
>
> I believe this is indeed the crux of the issue
>
> Fantastic to share the same view.
>
> FWIW, I think I agree too.
>
> 3. tell OMPI to --bind-to-core.
>
> In other words, tell SGE to allocate a certain number of cores on each
> node, but to bind each proc to all of them (i.e., don't bind a proc to a
> specific core). I'm pretty sure that is a standard SGE option today (at
> least, I know it used to be). I don't believe any patch or devel work is
> required (to either SGE or OMPI).
>
> When you use a fixed allocation_rule and a matching -binding request it
> will work today. But any other case won't be distributed in the correct
> way.
>
> Is it possible to not include the -binding request? If SGE is told to use
> a fixed allocation_rule, and to allocate (for example) 2 cores/node, then
> won't the orted see itself bound to two specific cores on each node?
>
> When you leave out the -binding, all jobs are allowed to run on any core.
>
> We would then be okay as the spawned children of orted would inherit its
> binding.
> Just don't tell mpirun to bind the processes, and the threads of those
> MPI procs will be able to operate across the provided cores.
>
> Or does SGE only allocate 2 cores/node in that case (i.e., allocate, but
> no -binding given), but doesn't bind the orted to any two specific cores?
> If so, then that would be a problem as the orted would think itself
> unconstrained. If I understand the thread correctly, you're saying that
> this is what happens today - true?
>
> Exactly. It won't apply any binding at all and the orted would think of
> itself as unlimited, i.e. limited only by the number of slots it should
> use thereon.
>
> So I guess I have a question for Ralph. I thought, and this might be
> mixing in some of the ideas Jeff and I have been talking about, that when
> an RM executes the orted with a bound set of resources (i.e. cores), the
> orted would bind the individual processes to a subset of the bound
> resources. Is this not really the case for the 1.4.X branch? I believe it
> is the case for the trunk based on Jeff's refactoring.

You are absolutely correct, Terry, and the 1.4 release series does include
the proper code. The point here, though, is that SGE binds the orted to a
single core, even though other cores are also allocated. So the orted
detects an external binding of one core, and binds all its children to that
same core.

What I had suggested to Reuti was to not include the -binding flag to SGE in
the hopes that SGE would then bind the orted to all the allocated cores.
However, as I feared, SGE in that case doesn't bind the orted at all - and
so we assume the entire node is available for our use.

This is an SGE issue. We need them to bind the orted to -all- the allocated
cores (and only those cores) in order for us to operate correctly.

> --
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com
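
For anyone following the thread, here is a rough Python sketch of the three
cases discussed above. It is only an illustration and is not taken from the
Open MPI or SGE sources: the function names, the 8-core node, and the
round-robin placement are all assumptions made for the example. The only
real system call used is Python's os.sched_getaffinity(), which is available
on Linux.

```python
import os


def current_affinity():
    """Cores this process may run on (Linux); falls back to all CPUs elsewhere."""
    if hasattr(os, "sched_getaffinity"):
        return set(os.sched_getaffinity(0))
    # Assumption: CPU ids are contiguous starting at 0 on platforms
    # without sched_getaffinity.
    return set(range(os.cpu_count() or 1))


def externally_bound(affinity, online):
    """True if the launcher restricted us to a proper subset of the node's cores."""
    return set(affinity) < set(online)


def assign_cores(affinity, nprocs):
    """Hypothetical policy: place children round-robin over the inherited cores."""
    cores = sorted(affinity)
    return [cores[i % len(cores)] for i in range(nprocs)]


# Hypothetical 8-core node on which the scheduler allocated 2 cores.
ONLINE = set(range(8))

# Case 1: -binding given, daemon bound to a single core.
# Both children end up on that one core.
case_single = assign_cores({0}, 2)

# Case 2: no -binding, daemon not bound at all.
# The daemon sees the whole node and may use cores it was never allocated.
case_unbound = assign_cores(ONLINE, 4)

# Case 3: what the thread asks for, daemon bound to all allocated cores.
case_wanted = assign_cores({0, 1}, 2)
```

In this sketch case 1 stacks every child on one core, case 2 is not detected
as bound at all (externally_bound() returns False), and only case 3 gives
one child per allocated core.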