In some of our Rocks 5.4.2 clusters running SGE 6.2u5, I have noticed that the load average on several compute nodes is significantly higher than on others, even though all cores/processors on all compute nodes involved are doing about the same amount of work.

When I run qconf -sq all.q, I see

    load_thresholds np_load_avg=1.75

Reading some documentation, I learned that np_load_avg=1.75 means a given node can sustain a load average of up to NCORES * 1.75 before SGE stops assigning new jobs to it (please correct me if I'm wrong).

If possible, I'd love to keep the load average approximately equal to the number of cores available on a given compute node. In other words, SGE should not assign any more jobs to a compute node with NCORES processors if its load average is greater than or equal to NCORES * 1.10, irrespective of whether all NCORES are in use or not.

How would I go about achieving this? Is it as simple as changing 'np_load_avg' to 1.10? Am I missing something?

Thanks for your time and help.

Best,
g

--
Gowtham
Information Technology Services
Michigan Technological University
(906) 487/3593
http://www.it.mtu.edu/
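
For reference, the arithmetic behind the question can be sketched in a few lines of Python. This is only an illustration of the reading described above (np_load_avg taken as the load average divided by the number of cores, with the threshold tripping at "greater than or equal to"); it is not SGE code, and the helper names load_threshold and accepts_new_jobs are made up for this example.

```python
def load_threshold(ncores: int, np_load_avg: float) -> float:
    """Absolute load average at which the threshold is reached.

    Assumes np_load_avg is the load average normalized by core count,
    so the absolute limit is simply ncores * np_load_avg.
    """
    return ncores * np_load_avg


def accepts_new_jobs(load_avg: float, ncores: int, np_load_avg: float) -> bool:
    """True while the node's load average is still below the threshold.

    The ">=" cutoff follows the wording of the question and is an
    assumption, not something taken from the SGE sources.
    """
    return load_avg < load_threshold(ncores, np_load_avg)


if __name__ == "__main__":
    # Compare the current setting (1.75) with the proposed one (1.10)
    # for a few hypothetical node sizes.
    for ncores in (8, 16):
        current = load_threshold(ncores, 1.75)
        proposed = load_threshold(ncores, 1.10)
        print(f"{ncores} cores: current limit {current}, proposed limit {proposed}")
```

Under these assumptions, an 8-core node with a load average of 9.0 would still be eligible for new jobs at np_load_avg=1.75 but not at np_load_avg=1.10.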