On 14.12.2011 at 10:19, William Hay wrote:

> On 13 December 2011 23:46, Gowtham <g...@mtu.edu> wrote:
>> 
>> In some of our Rocks 5.4.2 clusters running SGE
>> 6.2u5, I have been noticing the load average on
>> several compute nodes being significantly higher
>> than others when all cores/processors in all
>> compute nodes involved are doing about the same
>> amount of work.
>> 
>> When I run
>> 
>>   qconf -sq all.q
>> 
>> I see
>> 
>>   load_thresholds  np_load_avg=1.75
>> 
>> 
>> Reading some documentation, I learned that
>> np_load_avg=1.75 means a given node can
>> support up to NCORES * 1.75 load average
>> before SGE stops assigning newer jobs
>> (correct me if I'm wrong please).
>> 
>> If possible, I'd love to keep the
>> load average approximately equal to
>> the number of cores available on a given
>> compute node. In other words, SGE should
>> not assign any more jobs to a given
>> compute node with NCORES processors if
>> 
>>   load average is greater than or equal to
>>   NCORES * 1.10, irrespective of whether
>>   all NCORES are in use or not
>> 
>> How would I go about achieving it? Is it
>> as simple as changing 'np_load_avg' to 1.10?
>> Am I missing something?
>> 
> That should do it.  One thing to watch out for is that under Linux
> load average includes some processes in uninterruptible sleep as well
> as the run queue.  Your node can therefore have a high load average
> even with idle processors because it can include things waiting on
> disk or other hardware.
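
To see how much of a node's load comes from tasks waiting on I/O rather than from busy cores, one can count process states directly. The following is a minimal sketch, assuming a Linux /proc filesystem; the function names are made up for illustration. It looks only at each process's main thread, whereas the kernel's load average counts every task, so treat the result as a rough indication.

```python
import glob


def task_state(stat_text):
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so cut at the last ')' instead of splitting on spaces.
    fields = stat_text[stat_text.rfind(")") + 1:].split()
    return fields[0] if fields else ""


def count_load_contributors(stat_texts):
    """Count runnable (R) and uninterruptible-sleep (D) entries."""
    counts = {"R": 0, "D": 0}
    for text in stat_texts:
        state = task_state(text)
        if state in counts:
            counts[state] += 1
    return counts


def read_proc_stats():
    """Read every per-process stat file; processes may exit during the scan."""
    texts = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as fh:
                texts.append(fh.read())
        except OSError:
            continue
    return texts


if __name__ == "__main__":
    print(count_load_contributors(read_proc_stats()))
```

A node whose D count is high while its R count stays below the number of cores is probably showing an I/O-driven load average rather than CPU oversubscription.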

Yep, I'd like to point to one of my earlier posts:

http://comments.gmane.org/gmane.comp.clustering.gridengine.users/21620

What about using a load sensor that checks the SGE complex "cpu" instead (it
ranges from 0 to 100) and triggers at 95 or higher?

OTOH: Having slots = cores would make the load_thresholds setting superfluous.
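
For reference, the arithmetic behind the two np_load_avg values discussed in this thread can be sketched as follows. This assumes that np_load_avg is the load average divided by the number of cores, and that the threshold counts as exceeded when the value is greater than or equal to it; the function names and the 8-core node are only illustrative.

```python
def np_load_avg(load_avg, ncores):
    """Load average normalized by the number of cores."""
    return load_avg / ncores


def max_load_before_alarm(ncores, threshold):
    """Raw load average at which the normalized value reaches the threshold."""
    return ncores * threshold


def over_threshold(load_avg, ncores, threshold):
    """True once the normalized load reaches the threshold (assumed >=)."""
    return np_load_avg(load_avg, ncores) >= threshold


# Hypothetical 8-core node with the two thresholds from the thread.
print(max_load_before_alarm(8, 1.75))  # 14.0
print(max_load_before_alarm(8, 1.10))  # 8.8
print(over_threshold(9.0, 8, 1.10))    # True
print(over_threshold(9.0, 8, 1.75))    # False
```

With slots = cores, a fully occupied node running one busy process per slot sits at a normalized load of about 1.0, so neither threshold would normally be reached by CPU work alone.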

-- Reuti


> William
> 
>> Thanks for your time and help.
>> 
>> Best,
>> g
>> 
>> --
>> Gowtham
>> Information Technology Services
>> Michigan Technological University
>> 
>> (906) 487/3593
>> http://www.it.mtu.edu/
>> 
>> _______________________________________________
>> users mailing list
>> users@gridengine.org
>> https://gridengine.org/mailman/listinfo/users
>> 
>> 
> 

