Hello,

A colleague intimated that larger jobs tend to get starved out on our Slurm cluster. It's not a busy time at the moment, so it's difficult to test this properly, but back in November it was not unusual for a larger job to have to wait up to a week to start.
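For what it's worth, once things pick up again I was planning to quantify the wait from the accounting records with something along these lines (dates are placeholders; the gap between Submit and Start gives the queue wait per job):

  sacct -a -X -S <start> -E <end> --format=JobID,NNodes,Submit,Start,State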
I've extracted the key scheduling configuration from our slurm.conf (below) and would appreciate your comments, please. Even at the busiest of times we notice many small compute jobs executing on the cluster, started either by the main scheduler or by backfill. Looking at the scheduling configuration, do you think I'm favouring small jobs too much? That is, for example, should I increase PriorityWeightJobSize to encourage larger jobs to run? I was very keen not to starve out small and medium jobs, but perhaps there is too much emphasis on them in our setup.

My colleague is from a Moab background, and in that respect he was surprised not to see nodes being reserved for pending jobs. It could be that Slurm works differently, making efficient use of the cluster by backfilling more aggressively than Moab; certainly we see a great deal of backfill activity. Does anyone understand the mechanism Slurm uses to reserve nodes/resources for jobs, or know where to look for that type of information?

The relevant settings are:

SchedulerType=sched/backfill
SchedulerParameters=bf_window=3600,bf_resolution=180,bf_max_job_user=4
SelectType=select/cons_res
SelectTypeParameters=CR_Core
FastSchedule=1
PriorityFavorSmall=NO
PriorityFlags=DEPTH_OBLIVIOUS,SMALL_RELATIVE_TO_TIME,FAIR_TREE
PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0
PriorityWeightFairshare=1000000
PriorityWeightAge=100000
PriorityWeightPartition=0
PriorityWeightJobSize=100000
PriorityWeightQOS=10000
PriorityMaxAge=7-0

Best regards,
David
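P.S. To make the weights above easier to compare: as I understand the multifactor priority plugin, a job's priority is roughly the weighted sum

  Job_priority =  PriorityWeightAge       * age_factor
                + PriorityWeightFairshare * fairshare_factor
                + PriorityWeightJobSize   * jobsize_factor
                + PriorityWeightPartition * partition_factor
                + PriorityWeightQOS       * qos_factor

with each factor normalised to the range 0.0-1.0, and 'sprio -l' shows the per-job breakdown of these terms, which is how I've been comparing the weights.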