On some of our Rocks 5.4.2 clusters running SGE
6.2u5, I have noticed that the load average on
several compute nodes is significantly higher
than on others, even though all cores on all
of the compute nodes involved are doing about
the same amount of work.

When I run

  qconf -sq all.q

I see

  load_thresholds  np_load_avg=1.75

From the documentation, I gather that
np_load_avg is the load average divided by
the number of processors, so np_load_avg=1.75
lets a node reach a load average of up to
NCORES * 1.75 before SGE stops assigning
new jobs to it (please correct me if I'm
wrong).
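To make that arithmetic concrete, here is a small shell sketch. It only
multiplies the core count by the np_load_avg value, as described above;
the function name load_ceiling is hypothetical and the core counts are
illustrative, not our actual hardware.

```shell
#!/bin/sh
# load_ceiling NCORES FACTOR
# Prints the raw load average at which a node with NCORES cores
# reaches a normalized load (np_load_avg) of FACTOR.
# Hypothetical helper, for illustration only.
load_ceiling() {
    awk -v n="$1" -v f="$2" 'BEGIN { printf "%.2f\n", n * f }'
}

# Raw load ceilings under the current threshold of 1.75
for ncores in 8 12 16; do
    echo "$ncores cores: load average up to $(load_ceiling "$ncores" 1.75)"
done
```

For an 8-core node this gives 14.00, i.e. the node can be running
well over its core count before the threshold kicks in.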

If possible, I'd like to keep the load
average roughly equal to the number of
cores on each compute node. In other
words, SGE should not assign any more jobs
to a compute node with NCORES cores if

  its load average is greater than or equal
  to NCORES * 1.10, regardless of whether
  all NCORES cores are in use or not.
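The rule above can be written as a small check. This is only a sketch
of the rule itself, not of how SGE evaluates load_thresholds: it
assumes a Linux node, uses the 5-minute field of /proc/loadavg as a
stand-in for the load value SGE looks at, and gets NCORES from
getconf; the function name accepts_jobs is hypothetical.

```shell
#!/bin/sh
# accepts_jobs LOAD NCORES FACTOR
# Succeeds (exit 0) while LOAD < NCORES * FACTOR, fails otherwise.
# Floating-point comparison, so values right at the boundary may
# round either way. Hypothetical helper, for illustration only.
accepts_jobs() {
    awk -v l="$1" -v n="$2" -v f="$3" 'BEGIN { exit !(l < n * f) }'
}

# Check the local node against the proposed 1.10 factor (Linux only:
# the second field of /proc/loadavg is the 5-minute load average).
if [ -r /proc/loadavg ]; then
    read -r _ load5 _ < /proc/loadavg
    ncores=$(getconf _NPROCESSORS_ONLN)
    if accepts_jobs "$load5" "$ncores" 1.10; then
        echo "load $load5 on $ncores cores: below threshold, would accept jobs"
    else
        echo "load $load5 on $ncores cores: at/above threshold, would stop"
    fi
fi
```

Dividing both sides by NCORES turns the rule into "load average /
NCORES >= 1.10", which is the normalized form np_load_avg uses.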

How would I go about achieving this? Is it
as simple as changing 'np_load_avg' to 1.10?
Am I missing something?

Thanks for your time and help.

Best,
g

--
Gowtham
Information Technology Services
Michigan Technological University

(906) 487/3593
http://www.it.mtu.edu/

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users