Dear List,
We have a cluster of 1920 cores spread over 160 nodes (12 cores/node). We only run one code, in one queue, with jobs of 48 to 256 cores using an MPI parallel environment (PE).

When benchmarking our code, we found a 14-15% speedup by running on 6 cores/node compared with 12 cores/node. Even when we ran on 6 cores/node with a second job on the other 6 cores of each node, we still got a 5-6% speedup.

So I have configured our MPI PE with allocation_rule = 6, and this works. However, as the cluster fills up, the scheduler starts a second job on some nodes before every node has a job. How can we configure the scheduler to place one job on every node before starting a second job on any node?

I have tried defining the number of slots as a complex value on the execution hosts, and I've tried -np_load_avg, np_load_avg, slots, and -slots as the load_formula, but I can't get it to work. I've read _http://blogs.sun.com/sgrell/entry/grid_engine_scheduler_hacks_least_ but I can't set the allocation rule to $pe_slots, as we only want to run on 6 cores/node, not 12.
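To make the wanted placement order concrete, here is a toy model in Python. It is a hypothetical sketch, not Grid Engine's scheduler and not a configuration fix: the names place_job and count_early_sharing are made up for illustration. It assumes 160 nodes of 12 cores, every job taking exactly 6 cores per node (mirroring allocation_rule = 6), and job sizes that are multiples of 6. A "spread" policy (empty nodes first) is compared with a "pack" policy (half-used nodes first), counting jobs that double up on a node while another node is still idle.

```python
"""Toy model of "one job per node before any node gets a second job".

Hypothetical sketch only: this is NOT Grid Engine's scheduler, just an
illustration of spread vs. pack placement with 6 cores/node per job.
"""

CORES_PER_NODE = 12
CHUNK = 6  # cores per node per job, mirroring allocation_rule = 6


def place_job(free, job_cores, spread=True):
    """Pick nodes for one job and update `free` in place.

    Returns the chosen node indices, or None if the job does not fit.
    """
    if job_cores % CHUNK:
        raise ValueError("job size must be a multiple of CHUNK in this model")
    nodes_needed = job_cores // CHUNK
    candidates = [i for i, f in enumerate(free) if f >= CHUNK]
    if len(candidates) < nodes_needed:
        return None
    # spread: most free cores first, so empty nodes are used before
    # half-used ones; pack: fewest free cores first. Python's sort is
    # stable (also with reverse=True), so ties keep node order.
    candidates.sort(key=lambda i: free[i], reverse=spread)
    chosen = candidates[:nodes_needed]
    for i in chosen:
        free[i] -= CHUNK
    return chosen


def count_early_sharing(job_sizes, n_nodes=160, spread=True):
    """Count jobs that doubled up on a node while an empty node stayed idle."""
    free = [CORES_PER_NODE] * n_nodes
    early = 0
    for size in job_sizes:
        chosen = place_job(free, size, spread)
        if chosen is None:
            break  # cluster full; a real scheduler would queue the job
        # A chosen node with 0 free cores was already half-used before.
        doubled = any(free[i] == 0 for i in chosen)
        idle_left = any(f == CORES_PER_NODE for f in free)
        if doubled and idle_left:
            early += 1
    return early


if __name__ == "__main__":
    jobs = [48, 48, 48]
    print("spread:", count_early_sharing(jobs, spread=True))
    print("pack:  ", count_early_sharing(jobs, spread=False))
```

For three 48-core jobs, the spread policy reports 0 early-shared jobs, while the pack policy puts the second job on the same 8 nodes as the first although 152 nodes are still empty, which is the behaviour described above.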
Any suggestions?
Regards,
*Alex Phillips*
_______________________________________________
users mailing list
https://gridengine.org/mailman/listinfo/users
