Hello,

Is there a way for users to specify their own concurrent-job limits in
Slurm?

We use Slurm on a fairly small cluster with only tens of users who
mostly know each other. So an "honor system", although not perfect,
mostly works, and it often reflects the social demands of the
department better than a rigidly enforced policy.

We also often care quite strongly about latency until jobs start,
which is why we want to limit concurrency: to ensure there are free
slots available on the cluster so that newly submitted jobs can start
quickly, even at the expense of overall throughput. In fact we would
like to limit concurrency at an even finer granularity than per-user.
E.g., a single user might start many jobs for project A and later start
more jobs for project B, wishing for project B's jobs to start before
all of project A's jobs have completed.
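The closest existing mechanism we are aware of is the job-array
throttle, which only covers the special case where all of a project's
jobs fit into a single array (script names and the job ID below are
just placeholders):

```shell
# Submit 100 tasks for project A, but allow at most 10 to run at once;
# the "%10" suffix is sbatch's standard array throttle.
sbatch --array=1-100%10 --job-name=projA projA.sh

# The throttle can even be adjusted later on a pending/running array:
scontrol update JobId=<array_job_id> ArrayTaskThrottle=20
```

This does not help, however, when the jobs are heterogeneous or
submitted incrementally rather than as one array.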

So ideally, a user would be able to create the equivalent of their own
(temporary) "sub-QoS" into which they can submit jobs and for which they
can specify additional constraints, such as the maximum number of jobs
that may run concurrently within that group. It would of course also
respect the administratively set limits of the parent QoS, for when the
honor system does not work.
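With existing tooling, something similar can only be set up by an
administrator, not by the user themselves, which is the part we are
missing. A sketch of what we imagine, using standard sacctmgr commands
(the QoS and user names are just examples):

```shell
# Admin creates a per-project QoS with a concurrency cap...
sacctmgr add qos projB
sacctmgr modify qos projB set MaxJobsPerUser=8

# ...grants a user access to it...
sacctmgr modify user kai set qos+=projB

# ...and the user then submits jobs into it:
sbatch --qos=projB job.sh
```

What we would like is for users to be able to do the first two steps
themselves, within whatever outer limits the admins have set.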


At the moment we use explicit nodelists as a substitute for a
user-created QoS, limiting the concurrency of a set of jobs to the
number of cores on those nodes. But that is very inflexible, and if the
scheduler has already placed other jobs on those nodes, the user gets
less concurrency than desired while other nodes sit idle.
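Concretely, the workaround looks like this (node names are examples
from our cluster):

```shell
# Pin a batch of project-A jobs to two fixed nodes, so at most
# (cores on node01 + node02) of them run at once:
sbatch --nodelist=node[01-02] projA.sh
```

The cap comes only indirectly from the hardware on the listed nodes,
which is exactly the inflexibility we want to get away from.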


Thank you for any suggestions of how best to achieve this.

Kai
