On 5/2/20 1:44 pm, Antony Cleave wrote:

Hi, from what you are describing it sounds like jobs are backfilling in front and stopping the large jobs from starting.

We use a feature that SchedMD implemented for us called "bf_min_prio_reserve", which lets you set a priority threshold below which Slurm won't make a forward reservation for a job (so such a job can only start if it can start right now without delaying other jobs).

https://slurm.schedmd.com/slurm.conf.html#OPT_bf_min_prio_reserve

So if you can arrange your local priority system so that large jobs are over that threshold and smaller jobs are below it (or whatever suits your use case) then you should have a way to let these large jobs get a reliable start time without smaller jobs pushing them back in time.
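
As a rough sketch of what that looks like in slurm.conf (the threshold of 100000 is a made-up value for illustration, not a recommendation; pick one that sits between the priorities your large and small jobs actually get):

```
# slurm.conf (fragment)
# bf_min_prio_reserve only matters when the backfill scheduler is in use.
SchedulerType=sched/backfill

# Illustrative threshold: pending jobs with a priority below 100000 get no
# forward reservation; jobs at or above it are reserved resources as usual.
# SchedulerParameters is a comma-separated list, so if you already set it,
# append this option to the existing line rather than adding a second one.
SchedulerParameters=bf_min_prio_reserve=100000
```

Running "sprio" on your pending jobs should show what priorities they currently have, which helps in choosing a sensible threshold.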

There's some useful background from the bug where this was implemented:

https://bugs.schedmd.com/show_bug.cgi?id=2565

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
