Hello everyone,

I have set up Slurm (19.05) to use suspension, so that jobs in a lower-priority queue can be pre-empted by jobs in a higher-priority queue. However, suspended jobs aren't resuming as I would expect. When a higher-priority job completes and frees up resources for the lower-priority jobs, the suspended jobs often don't resume; instead, another pending job in the lower-priority queue gets launched. Furthermore, even when resources are actually available, the suspended jobs do not resume.
My best guess is that this is an interaction with task/core affinity. I use the cgroup task plugin to ensure that a job cannot use more cores than it requested. I suspect that the suspended job was bound to, e.g., core 1 when it was launched. In the scenarios above, the job that pre-empted it was then presumably also bound to core 1. When, e.g., core 2 later becomes available, the suspended job cannot be placed on that core because it is still bound to core 1, so core 2 remains idle even though a job is waiting.

For my applications, I do care about constraining the total number of cores a job uses to what it requested (so that a job cannot accidentally consume more resources than requested and thus affect other jobs on the node), but I don't care which cores it runs on. (These are typically single-thread/single-core jobs, so they don't need careful core placement.)

So I was wondering:
a) Is my interpretation of what is happening in scheduling correct?
b) Can task/core affinity be reset on suspend/resume, so that a suspended job can resume on any of the available cores on the same node?
c) Can one use cgroups to constrain the number of cores without core affinity?

Thank you,
Kai Krueger
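The suspected sequence of events (a job stays pinned to the core it was given at launch, the pre-empting job takes that core, and a different core frees up later) can be sketched as a small toy model. This is only an illustration of the hypothesis, not Slurm's actual scheduler logic: the classes `Job` and `Node` and the methods `can_resume_pinned` and `can_resume_anywhere` are hypothetical names chosen for this sketch. It contrasts "resume only on the exact cores bound at launch" with "resume on any free cores, constraining only the core count".

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    # Hypothetical: cores the job was pinned to when first launched.
    # In this toy model the pinning never changes for the job's lifetime.
    bound_cores: frozenset
    suspended: bool = False


@dataclass
class Node:
    cores: frozenset
    # Maps each occupied core to the name of the job running on it.
    running: dict = field(default_factory=dict)

    def free_cores(self) -> set:
        return set(self.cores) - set(self.running)

    def can_resume_pinned(self, job: Job) -> bool:
        # Suspected behaviour: a suspended job may only resume on the
        # exact cores it was bound to at launch.
        return set(job.bound_cores) <= self.free_cores()

    def can_resume_anywhere(self, job: Job) -> bool:
        # Desired behaviour: only the number of cores is constrained,
        # so any sufficiently large set of free cores will do.
        return len(job.bound_cores) <= len(self.free_cores())


def simulate():
    node = Node(cores=frozenset({1, 2}))

    # Two single-core low-priority jobs start, one on each core.
    low = Job("low", frozenset({1}))
    node.running[1] = low.name
    node.running[2] = "other-low"

    # A high-priority job pre-empts "low": "low" is suspended and the
    # high-priority job is placed on the core that "low" was using.
    low.suspended = True
    node.running[1] = "high"

    # Later, "other-low" completes and core 2 becomes idle.
    del node.running[2]

    return node, low


node, low = simulate()
print("free cores:", node.free_cores())
print("resume if pinned:", node.can_resume_pinned(low))
print("resume if only the count matters:", node.can_resume_anywhere(low))
```

In this model core 2 is free, yet the pinned check fails because core 1 is still occupied by the high-priority job, while a count-only check would let the suspended job resume. That mirrors the two outcomes the post asks about: core 2 staying idle, or being handed to a fresh pending job instead.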