[gridengine users] SGE PE scheduler problem, doesn't pick least used nodes ?

Alex Phillips Wed, 16 Mar 2011 03:34:45 -0700

Dear List,

We have a cluster of 1920 cores spread over 160 nodes (12 cores/node). We only run one code in one queue, with jobs of between 48 and 256 cores using an mpi pe.

When benchmarking our code we found a 14-15% speedup by running on 6 cores/node, compared with 12 cores/node. We also found that if we ran on 6 cores/node, with a second job on the other 6 cores/node, we still have a 5-6% speedup.

So I have configured our mpi pe with allocation_rule = 6, and this works. However, as the cluster fills up, the scheduler is starting a second job on some nodes before all the nodes are busy.

How can we configure the scheduler to run one job on all the nodes before starting a second job?

I have tried defining the number of slots as a complex value on the execution hosts, and I’ve tried -np_load_avg, np_load_avg, slots, and -slots as the load_formula, but I can’t get it to work.

I’ve read http://blogs.sun.com/sgrell/entry/grid_engine_scheduler_hacks_least but I can’t set the allocation rule to $pe_slots, as we only want to run on 6 cores/node, not 12.
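To make the placement we are after concrete, here is a toy Python model of it. This is only a sketch of the desired behaviour, not SGE code and not how the SGE scheduler actually chooses hosts. The function name place_job and the list-of-counters representation are made up for illustration; the only numbers taken from our setup are 160 nodes, 12 slots per node, and 6 slots per node per job.

```python
SLOTS_PER_NODE = 12  # from the post: 12 cores per node
CHUNK = 6            # from the post: allocation_rule = 6


def place_job(used, cores):
    """Hypothetical helper: place a job on the least-used nodes first.

    used  -- list with the number of slots already in use on each node
             (modified in place when the job is placed)
    cores -- total cores requested; must be a multiple of CHUNK, since a
             fixed allocation rule gives every host exactly CHUNK slots

    Returns the list of chosen node indices, or None if the job does not fit.
    """
    if cores <= 0 or cores % CHUNK != 0:
        raise ValueError("cores must be a positive multiple of %d" % CHUNK)
    nodes_needed = cores // CHUNK

    # Only nodes with room for one more chunk qualify.
    candidates = [i for i, u in enumerate(used) if SLOTS_PER_NODE - u >= CHUNK]
    # Least-used first; sorted() is stable, so ties keep node order.
    candidates.sort(key=lambda i: used[i])

    if len(candidates) < nodes_needed:
        return None

    chosen = candidates[:nodes_needed]
    for i in chosen:
        used[i] += CHUNK
    return chosen


if __name__ == "__main__":
    used = [0] * 160  # 160 empty nodes
    for _ in range(20):  # 20 jobs x 48 cores = 8 nodes each = 160 nodes
        place_job(used, 48)
    print(sorted(set(used)))     # every node carries exactly one job
    print(place_job(used, 48))   # only now does a node get a second job
```

In this model no node receives a second job until every node already holds one, which is the ordering we would like the real scheduler to follow.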

Any suggestions?
Regards,
*Alex Phillips*
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
