Hi Loris,

Slurm's main scheduling algorithm is based purely on priority. It starts with the highest-priority job and keeps working down the pending queue for as long as each job can be started. As soon as it reaches a job that cannot be started, the algorithm stops. If you have more than one partition, the scheduler then carries on, but only with jobs submitted to the other partitions, and so on.
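A small Python sketch may make the stopping rule of this priority-ordered pass clearer. It is a toy model, not Slurm's implementation. The Job class, the main_schedule function and the flat "idle CPUs per partition" view are all hypothetical. Nodes, memory and limits are ignored.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int
    partition: str
    cpus: int


def main_schedule(jobs, free_cpus):
    """Toy model of the main scheduler's priority-ordered pass.

    free_cpus maps a partition name to its idle CPU count (hypothetical,
    flat model with no nodes, memory or limits). Returns started job names.
    """
    free = dict(free_cpus)  # work on a copy; the caller's dict is untouched
    started, blocked = [], set()
    for job in sorted(jobs, key=lambda j: j.priority, reverse=True):
        if job.partition in blocked:
            continue  # a higher-priority job in this partition is stuck
        available = free.get(job.partition, 0)
        if job.cpus <= available:
            free[job.partition] = available - job.cpus
            started.append(job.name)
        else:
            # The first job that cannot start ends the pass for its
            # partition; jobs in other partitions are still tried.
            blocked.add(job.partition)
    return started


jobs = [
    Job("a", priority=100, partition="p1", cpus=8),
    Job("b", priority=90, partition="p1", cpus=32),
    Job("c", priority=80, partition="p1", cpus=4),
    Job("d", priority=70, partition="p2", cpus=16),
]
print(main_schedule(jobs, {"p1": 16, "p2": 16}))  # ['a', 'd']
```

In this model job c would fit into the 8 CPUs still idle in p1. It is never considered, because the higher-priority job b already ended the pass for that partition. Job d in p2 is still started.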
If you have backfill scheduling configured, jobs get a second chance to be scheduled. In this algorithm the priority queue is traversed, and the highest-priority jobs that cannot be started get a kind of reservation for the resources they need. The algorithm keeps going, though. A lower-priority job can still be started if resources are available and running it does not delay any of the higher-priority jobs ahead of it. This is why the jobs' time limits are critical. A job with a short time limit can easily fill the "holes" left by the scheduler. So in your case I would venture (assuming you have backfill configured) that the jobs from user C cannot start because of their long time limit.

However, there are other parameters worth looking at. The backfill algorithm is very time-consuming, so it does not traverse the full queue. How far it goes is configurable, so you might want to tune this value for your system. Another interesting option is to set the NoReserve flag on a specific QOS. With this flag, jobs in that QOS do not get a reservation when the backfill algorithm processes them. This is an important decision, as it can let lower-priority jobs overtake higher-priority ones, so use it with care. We use this flag on a system with peaks of tens of thousands of jobs.

I hope this helps.

On 02/05/2014 11:42 AM, Loris Bennett wrote:
> "Loris Bennett" <loris.benn...@fu-berlin.de>
> writes:
>
>> We do already use weighting, but my understanding was that this would
>> only affect the order in which resources are assigned and not prevent a
>> job from starting even when resources are available.
>>
>> I assume that there is some valid reason for a job waiting, but it is
>> not apparent to me. I guess it would be helpful if it were possible to
>> see exactly what resources a job is waiting for, but I haven't come
>> across a way to do that.
>>
> The situation of jobs not starting despite resources being available
> has occurred again.
>
> - User A with the highest priority jobs has reached her running job
>   limit, so no more of her jobs can start. Her jobs have a time limit
>   of 2 days.
>
> - User B with the next highest priority jobs needs more memory than is
>   available on the free node, so his job can't start there. His jobs
>   have a time limit of 3 days.
>
> - User C is next in line and needs all the CPUs of the node, but very
>   little memory. It seems that his job should start, but it doesn't.
>   His jobs have a time limit of 3 days.
>
> Should User B's job prevent User C's job from starting? Or is it
> because User C's time limit is greater than that of User A? I can sort
> of see why a lower priority job with a long run-time maybe shouldn't
> start before a higher priority, short run-time job which is being held
> back due to the running job limit, but is this really what is going on?
>
> Regards
>
> Loris

WARNING / LEGAL TEXT: This message is intended only for the use of the individual or entity to which it is addressed and may contain information which is privileged, confidential, proprietary, or exempt from disclosure under applicable law. If you are not the intended recipient or the person responsible for delivering the message to the intended recipient, you are strictly prohibited from disclosing, distributing, copying, or in any way using this message. If you have received this communication in error, please notify the sender and destroy and delete any copies you may have received. http://www.bsc.es/disclaimer
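The backfill behaviour described in the reply can be illustrated with a similar toy model. This is a simplified sketch under stated assumptions, not Slurm's backfill plugin:

- The whole system is treated as a single pool of CPUs.
- Time limits are in hours.
- The names Pending, earliest_start, backfill, max_job_test and no_reserve are all hypothetical.

The max_job_test argument loosely mirrors the bf_max_job_test option of SchedulerParameters in slurm.conf, which limits how many pending jobs backfill examines. The no_reserve field imitates the effect of a QOS with the NoReserve flag.

```python
from dataclasses import dataclass, replace


@dataclass
class Pending:
    name: str
    cpus: int
    time_limit: int           # hours
    no_reserve: bool = False  # hypothetical stand-in for a NoReserve QOS


def earliest_start(profile, total, cpus, length):
    """Earliest t such that `cpus` CPUs are free during [t, t + length).

    profile holds (start, end, cpus) intervals already committed, either
    running jobs or reservations. Usage only changes at interval starts
    and ends, so the candidate start times are 0 and every interval end.
    """
    for t in sorted({0} | {end for _, end, _ in profile}):
        # Peak usage inside the window occurs at t or at an interval start.
        points = {t} | {s for s, _, _ in profile if t < s < t + length}
        if all(sum(c for s, e, c in profile if s <= p < e) + cpus <= total
               for p in points):
            return t
    raise ValueError("job needs more CPUs than the system has")


def backfill(running, pending, total, max_job_test=None):
    """running: (cpus, remaining_hours) pairs; pending: priority-ordered.

    Returns (started job names, {job name: reserved start time}).
    max_job_test=None examines the whole queue (hypothetical parameter).
    """
    profile = [(0, remaining, cpus) for cpus, remaining in running]
    started, reserved = [], {}
    for job in pending[:max_job_test]:  # possibly only part of the queue
        t = earliest_start(profile, total, job.cpus, job.time_limit)
        if t == 0:
            started.append(job.name)
            profile.append((0, job.time_limit, job.cpus))
        elif not job.no_reserve:
            # Hold the resources so lower-priority jobs cannot delay it.
            reserved[job.name] = t
            profile.append((t, t + job.time_limit, job.cpus))
    return started, reserved


running = [(8, 10)]  # 8 of 16 CPUs busy for another 10 hours
queue = [
    Pending("big", cpus=16, time_limit=24),
    Pending("short", cpus=4, time_limit=8),
    Pending("long", cpus=4, time_limit=48),
]
print(backfill(running, queue, total=16))
# (['short'], {'big': 10, 'long': 34})
print(backfill(running, [replace(queue[0], no_reserve=True)] + queue[1:], total=16))
# (['short', 'long'], {})
```

In the first call "long" has idle CPUs available right now. Its 48-hour limit would overlap the reservation held for "big", so it has to wait. "short" finishes before that reservation starts and fills the hole. This matches the time-limit explanation for user C, within the limits of the model.

In the second call "big" gets no reservation. "long" then overtakes it, which is the trade-off the reply warns about.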