On 13 December 2011 23:46, Gowtham <g...@mtu.edu> wrote:
>
> In some of our Rocks 5.4.2 clusters running SGE
> 6.2u5, I have been noticing the load average on
> several compute nodes being significantly higher
> than others when all cores/processors in all
> compute nodes involved are doing about the same
> amount of work.
>
> When I run
>
>   qconf -sq all.q
>
> I see
>
>   load_thresholds  np_load_avg=1.75
>
>
> Reading some documentation, I learned that
> np_load_avg=1.75 means a given node can
> support up to NCORES * 1.75 load average
> before SGE stops assigning new jobs
> (correct me if I'm wrong please).
>
> If possible, I'd love to maintain the
> load avg to be approximately equal to
> the number of cores available on given
> compute node. In other words, SGE should
> not assign any more jobs to a given
> compute node with NCORES processors if
>
>   load average is greater than/equal to
>   NCORES * 1.10, irrespective of whether
>   all NCORES are in use or not
>
> How would I go about achieving it? Is it
> as simple as changing 'np_load_avg' to 1.10?
> Am I missing something?
>
That should do it.  One thing to watch out for: on Linux the load
average counts processes in uninterruptible sleep as well as those on
the run queue.  A node can therefore show a high load average even with
idle processors, because the figure includes processes waiting on disk
or other hardware.
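
If you want to see how much of a node's load comes from blocked rather
than running processes, something along these lines is a rough sketch.
The function names (np_load, over_threshold, count_state, report_node)
are made up for illustration and are not part of SGE. I am also assuming
that the 5-minute figure in /proc/loadavg is the one that corresponds to
SGE's load_avg, and that the reported core count matches what SGE uses
as num_proc, so treat the output as approximate.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of SGE.

# np_load LOAD NCORES: print the load average divided by the core count,
# which is the normalization np_load_avg refers to.
np_load() {
    awk -v load="$1" -v n="$2" 'BEGIN { printf "%.2f\n", load / n }'
}

# over_threshold LOAD NCORES THRESHOLD: exit status 0 when the normalized
# load is at or above THRESHOLD, non-zero otherwise.
over_threshold() {
    awk -v load="$1" -v n="$2" -v t="$3" 'BEGIN { exit !(load / n >= t) }'
}

# count_state LETTER: read process state codes on stdin (one per line)
# and print how many start with LETTER (R = runnable, D = uninterruptible).
count_state() {
    awk -v s="$1" 'substr($1, 1, 1) == s { c++ } END { print c + 0 }'
}

# report_node: print a short summary for the local Linux node.
report_node() {
    # Assumption: field 2 (5-minute average) matches SGE's load_avg.
    load=$(cut -d' ' -f2 /proc/loadavg)
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    echo "load=$load cores=$cores np_load=$(np_load "$load" "$cores")"
    echo "runnable (R):        $(ps -eo state= 2>/dev/null | count_state R)"
    echo "uninterruptible (D): $(ps -eo state= 2>/dev/null | count_state D)"
    if over_threshold "$load" "$cores" 1.10; then
        echo "normalized load is at or above 1.10"
    fi
}

# Only attempt the live report where /proc/loadavg and ps are available.
if [ -r /proc/loadavg ] && command -v ps >/dev/null 2>&1; then
    report_node
fi
```

A node with a high np_load and a large D count but few R processes is
most likely waiting on I/O rather than short of CPU.  The threshold
itself can be edited with "qconf -mq all.q" by changing the
load_thresholds line to np_load_avg=1.10.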

William

> Thanks for your time and help.
>
> Best,
> g
>
> --
> Gowtham
> Information Technology Services
> Michigan Technological University
>
> (906) 487/3593
> http://www.it.mtu.edu/
>
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
>
>
