Hi all,

> On 11/15/2010 02:11 PM, Reuti wrote: 
>> Just to give my understanding of the problem: 
>>> 
>>>>> Sorry, I am still trying to grok all your email as what the problem you 
>>>>> are trying to solve. So is the issue is trying to have two jobs having 
>>>>> processes on the same node be able to bind there processes on different 
>>>>> resources. Like core 1 for the first job and core 2 and 3 for the 2nd 
>>>>> job? 
>>>>> 
>>>>> --td 
> You can't get 2 slots on a machine, as it's limited by the core count to one 
> here, so such a slot allocation shouldn't occur at all. 

So to clarify: the current -binding <binding_strategy>:<binding_amount>
allocates binding_amount cores to each sge_shepherd process associated with a
job_id. There appears to be only one sge_shepherd process per job_id per
execution node, so all child processes of the job on that node run on these
allocated cores, irrespective of the number of slots the job has been granted
there.

I agree with Reuti that the binding_amount parameter should specify a maximum
number of bound cores per node, with the actual number determined by the
number of slots granted on each node.  FWIW, an alternative approach might be
to add another binding_type ('slot', say) that automatically allocates one
core per slot, as sketched below.
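To illustrate the 'slot' idea (hypothetical syntax -- this binding_type
doesn't exist yet), something like

    qsub -pe mpi 8 -binding slot job.sh

would bind as many cores as slots granted on each node: a node granted 3
slots gets 3 bound cores, a node granted 1 slot gets 1.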

Of course, a complex situation might arise if a user submits a combined 
MPI/multithreaded job, but then I guess we're into the realm of setting 
allocation_rule.
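For instance (just a sketch -- the PE name and values are made up), a hybrid
job with 4 threads per MPI rank could use a PE with a fixed allocation rule:

    pe_name            mpi4
    slots              999
    allocation_rule    4
    control_slaves     TRUE
    job_is_first_task  FALSE

and be submitted with something like "qsub -pe mpi4 8 -binding linear:4
job.sh", so each node receives exactly 4 slots and the shepherd there is
bound to 4 cores -- i.e. binding_amount and the per-node slot count are kept
in step by hand, which is what a 'slot' binding_type would do automatically.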

Would it be worth creating a patch for this?  I don't know much about the
internals of SGE -- would it be hard to do?  I don't have much time to
dedicate to it, but I could put some effort in if necessary...

Chris


--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778