Hi,

On 06.05.2014 at 18:45, Roberto Nunnari wrote:

> I'm running a small cluster using Oracle Grid Engine 6.2u7.
>
> At times it happens that one user submits a job that requires several
> resources (-pe, -l mem_free, etc).
>
> For instance, user A submits a job X requiring 32 slots out of 100 available.
> The other users keep submitting serial jobs, filling up all the slots and
> always having more jobs waiting in the queue.
>
> The serial jobs will get ahead of job X and be scheduled as soon as one slot
> is available, and job X will be waiting in the queue forever and never get to
> run until no more serial jobs are submitted and 32 slots are
> available.
>
> I would like the scheduler to also consider how long the job has been waiting
> in the queue, and possibly also the users' historical resource usage,
> as returned by qacct -o username
>
> What are the possible solutions to this problem?

You can also look into Resource Reservation, so that the parallel job collects the necessary resources while waiting:

http://www.gridengine.info/2006/05/31/resource-reservation-prevents-parallel-job-starvation/

For certain jobs, the reservation request could also be added by a JSV:

http://arc.liv.ac.uk/SGE/howto/sge-configs.html#_avoiding_starvation_resource_reservation_backfilling_a_id_rr_a

-- Reuti

NB: Either supply a feasible value for h_rt for all jobs, or set default_duration in the scheduler configuration to a sensible default other than INFINITY. Otherwise the scheduler can't compute any backfilling/resource reservation, as it judges INFINITY to be smaller than INFINITY, and new serial jobs might still slip in.
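
As a rough sketch of how Resource Reservation and a finite runtime could be put together: the script names (job_x.sh, serial_job.sh), the PE name "mpi" and all numeric values below are placeholders I made up for illustration, not values from your cluster, so adjust them to your setup.

```shell
# Sketch only: script names, the PE name "mpi" and all values are placeholders.

# 1. Check the current scheduler settings. With max_reservation at 0,
#    no reservations are made at all.
qconf -ssconf | egrep 'max_reservation|default_duration'

# 2. Edit the scheduler configuration (opens an editor) and set, for example:
#      max_reservation   32
#      default_duration  8:00:00
qconf -msconf

# 3. Submit the parallel job with a reservation request (-R y) and a
#    runtime limit, so the scheduler can plan when its slots become free.
qsub -R y -pe mpi 32 -l h_rt=24:00:00 job_x.sh

# 4. Serial jobs that state a realistic h_rt can still be backfilled into
#    the gaps, as long as they finish before the reservation starts.
qsub -l h_rt=0:30:00 serial_job.sh
```

Whether 32 is a sensible max_reservation depends on how many jobs should be allowed to hold reservations at once; a small value is usually enough when only a few parallel jobs are affected.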
