Hi,

On 06.05.2014 at 18:45, Roberto Nunnari wrote:

> I'm running a small cluster using Oracle Grid Engine 6.2u7
> 
> At times it happens that one user submits a job that requires several 
> resources (-pe, -l mem_free, etc).
> 
> For instance, user A submits a job X requiring 32 slots out of the 100
> available. The other users keep submitting serial jobs, filling up all the
> slots and always having more jobs waiting in the queue.
> 
> The serial jobs get ahead of job X and are scheduled as soon as a single
> slot becomes available. Job X waits in the queue forever and never gets to
> run until no more serial jobs are submitted and 32 slots are free.
> 
> I would like the scheduler to also consider how long a job has been waiting
> in the queue, and possibly also each user's historic resource usage, as
> returned by qacct -o username.
> 
> What are the possible solutions to this problem?

You can also look into Resource Reservation, so that the parallel job collects 
the necessary resources while waiting:

http://www.gridengine.info/2006/05/31/resource-reservation-prevents-parallel-job-starvation/
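
As a rough sketch, a reservation is requested per job with -R y at submission
time. The PE name "mpi", the run time and the script name below are only
placeholders, not taken from the original post. Also note that max_reservation
in the scheduler configuration (qconf -msconf) has to be greater than its
default of 0, otherwise no reservation is done at all:

```shell
# Ask the scheduler to reserve slots for this 32-slot parallel job while it waits.
# "mpi", 12:00:00 and job_x.sh are placeholders; adjust them to your setup.
qsub -R y -pe mpi 32 -l h_rt=12:00:00 job_x.sh
```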

For certain jobs the reservation request (-R y) could also be added
automatically by a JSV (job submission verifier):

http://arc.liv.ac.uk/SGE/howto/sge-configs.html#_avoiding_starvation_resource_reservation_backfilling_a_id_rr_a
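
A minimal sketch of such a JSV, assuming the shell JSV helper library shipped
under $SGE_ROOT/util/resources/jsv and assuming that every job requesting a PE
should get a reservation (this policy is only an example, adapt the condition
as needed):

```shell
#!/bin/sh
# Sketch of a JSV that switches on resource reservation for parallel jobs.

jsv_on_start()
{
   return
}

jsv_on_verify()
{
   # pe_name is only present when the job was submitted with -pe
   if [ "`jsv_is_param pe_name`" = "true" ]; then
      jsv_set_param R y
      jsv_correct "Reservation enabled for parallel job"
   else
      jsv_accept "Job is accepted"
   fi
   return
}

# Helper functions (jsv_is_param, jsv_set_param, ...) come from this include
. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh

jsv_main
```

It could be used on the client side with qsub -jsv /path/to/script, or for all
submissions via jsv_url in the global configuration (qconf -mconf).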

-- Reuti

NB: Either supply a feasible h_rt value for every job, or set default_duration
in the scheduler configuration to a sensible default other than INFINITY.
Otherwise the scheduler can't compute any sensible backfilling/resource
reservation: it judges INFINITY to be smaller than INFINITY, so new serial
jobs might still slip in.
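
To check what is currently set, something along these lines should do. Which
value is sensible for default_duration (e.g. 8:00:00 instead of INFINITY)
depends entirely on your typical job length, so treat any number as an
assumption:

```shell
# Show the scheduler settings relevant for reservation and backfilling
qconf -ssconf | egrep 'max_reservation|default_duration'

# Change them in an editor session
qconf -msconf
```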