If you decide to go with the single-partition model, you can use the
"Weight" parameter in slurm.conf to cause the standard nodes to be
preferentially used over the high-mem and GPU nodes.  That way, jobs
only end up on high-mem or GPU nodes if they request a lot of memory or
a GPU, or if the cluster is very busy.
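
For example, something along these lines in slurm.conf (the node names,
counts and sizes are made up; nodes with lower Weight values are
allocated first):

  NodeName=node[001-100] CPUs=16 RealMemory=64000  Weight=1
  NodeName=himem[01-04]  CPUs=16 RealMemory=512000 Weight=10
  NodeName=gpu[01-08]    CPUs=16 RealMemory=64000  Gres=gpu:2 Weight=20
  PartitionName=batch Nodes=node[001-100],himem[01-04],gpu[01-08] Default=YES State=UP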

On Wed, 1 Feb 2017, Loris Bennett wrote:

Hi David,

Baker D.J. <d.j.ba...@soton.ac.uk> writes:

Hello,

This is hopefully a very simple set of questions for someone. I'm evaluating
slurm with a view to replacing our existing torque/moab system, and I've been
reading about defining partitions and QoSs. I like the idea of being able to use
a QoS to throttle user activity -- for example to set maxcpus/user, maxjobs/user
and maxnodes/user, etc, etc. Also I'm going to define a very simple set of
partitions to reflect the different types of nodes in the cluster. For example:

Batch - normal compute nodes

Highmem - high memory nodes

Gpu - GPU nodes

We have a similar range of hardware, albeit with three different
categories of memory, but we decided against setting these up as
separate partitions.  The disadvantage is that small memory jobs can
potentially clog up the large memory nodes; the advantage is that small
memory jobs can use the large memory nodes if they would otherwise be
empty.

So presumably it makes sense to associate the 'normal' QOS with the batch queue
and define throttling limits as needed. Then define corresponding QoSs for the
highmem and gpu partitions. In this respect, do the QOS definitions override any
definitions on the PartitionName line? For example, does the QOS MaxWall override
MaxTime?

The hierarchy of the limits is given here:

https://slurm.schedmd.com/resource_limits.html
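
As a concrete (made-up) illustration, the same kind of time limit can be
set in both places, and the hierarchy described on that page decides
which one applies to a given job:

  # In slurm.conf
  PartitionName=highmem Nodes=himem[01-04] MaxTime=7-00:00:00

  # On the QOS, via sacctmgr
  sacctmgr modify qos where name=highmem set MaxWall=3-00:00:00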

However, unless you have specific needs, having limits defined on both
the partitions and QOS might be overkill.  If, as you say later, you
have a heterogeneous job mix, you probably also have a heterogeneous
user base, some of whom might find the setup confusing.  For that
reason, I would start with a fairly simple configuration and only add to
that as the need arises.

Also I suspect I'll need to define a test queue with a high level of throttling
to enable users to get a limited number of small test jobs through the system
quickly. In this respect does it make sense for my batch and test partitions to
overlap either partially or completely? At any one time the test partition will
only take a few resources out of the pool of normal compute nodes?

We originally had a separate test partition, but have now moved to a
'short' QOS on the main batch partition which increases the priority for
a limited number of jobs with a short maximum run-time.  If you have
overlapping batch and test partitions, the batch jobs can clog the test
nodes, although you could have different priorities for each partition.
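
If it helps, setting up such a QOS looks roughly like this (the name and
the values are illustrative, not our exact settings):

  sacctmgr add qos short
  sacctmgr modify qos where name=short set Priority=1000 MaxWall=01:00:00 MaxJobsPerUser=2

The QOS then has to be added to the relevant associations' QOS lists so
that users can request it with, e.g., "sbatch --qos=short ...".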

Another issue is that we do have a large mix of small and large jobs. In our
torque/moab cluster we make use of the XFACTOR component to make sure that small
jobs don't get starved out of the system. I don't think there is an analog of
this parameter in slurm, and so I need to understand how to enable smaller jobs
to compete with the larger jobs and not get starved out. Using slurm I
understand that the backfill mechanism and priority flags like
PriorityFavorSmall=NO and SMALL_RELATIVE_TO_TIME can help the situation. What
are your thoughts?

We also have a very heterogeneous job mix, but don't have any problem
with small jobs starving.  On the contrary, as we share nodes, small
jobs with moderate memory requirements have an advantage, as there are
always a few cores available somewhere in the cluster, even when it is
quite full.  For this reason we favour large jobs slightly.
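
For reference, the parameters you mention all live in slurm.conf; a
sketch, with weights that are purely illustrative and depend on local
policy, would be:

  SchedulerType=sched/backfill
  PriorityType=priority/multifactor
  PriorityFavorSmall=NO                 # NO gives larger jobs the larger job-size factor
  PriorityFlags=SMALL_RELATIVE_TO_TIME  # job-size factor is divided by the time limit
  PriorityWeightJobSize=1000
  PriorityWeightAge=1000
  PriorityWeightFairshare=10000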

Your advice on the above points would be appreciated, please.

Best regards,

David

Cheers,

Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de

Tom Payerle
IT-ETI-EUS                              paye...@umd.edu
4254 Stadium Dr                         (301) 405-6135
University of Maryland
College Park, MD 20742-4111
