Hello,

What I’m looking for is a way for a node to remain in the same 
partition, with the same QoS(es), but only be chosen if a particular 
capability is requested. This is because we are rolling something (an OS 
upgrade) out slowly to a small batch of nodes at first, and then to more and 
more over time, and do not want to interrupt users’ workflows: we want them to 
default to the ‘current’ nodes and only land on the ‘special’ ones if 
requested. (At a certain point the ‘special’ ones will become the majority and 
we’d swap the behaviour.)

Slurm has the well-known Features option that can be set on a node (or nodes):

> A comma-delimited list of arbitrary strings indicative of some characteristic 
> associated with the node. There is no value or count associated with a 
> feature at this time, a node either has a feature or it does not. A desired 
> feature may contain a numeric component indicating, for example, processor 
> speed but this numeric component will be considered to be part of the feature 
> string. Features are intended to be used to filter nodes eligible to run jobs 
> via the --constraint argument. By default a node has no features. Also see 
> Gres for being able to have more control such as types and count. Using 
> features is faster than scheduling against GRES but is limited to Boolean 
> operations.


        https://slurm.schedmd.com/slurm.conf.html#OPT_Features
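
For concreteness, my understanding is that the tagging would look something 
like this in slurm.conf (the feature name ‘newos’ and the node names are made 
up for illustration):

        # slurm.conf: tag only the upgraded nodes with a feature string;
        # they stay in the same partition(s) as everything else
        NodeName=node[01-08] Features=newos
        PartitionName=batch Nodes=node[01-32] Default=YES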

So if there are (a bunch of) partitions, and nodes within those partitions, a 
job can be submitted to a partition and run on any available node, or even be 
requested to run on a particular node (--nodelist). With the above (and 
--constraint / --prefer), a particular subset of nodes can be requested. But 
(AIUI) that subset also remains generally available to everyone, even if a 
particular feature is not requested.
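
In other words (hypothetical submissions against the config sketched above):

        # lands only on the 'newos' nodes
        sbatch --constraint=newos job.sh

        # but a plain submission can still be scheduled onto those same
        # 'newos' nodes: the feature does not fence them off
        sbatch job.sh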

Is there a way to tell Slurm not to schedule a job on a node UNLESS a flag or 
option is set? Or is it necessary to set up new partition(s) or QoS(es)? I see 
that AllowAccounts (and AllowGroups) apply only to partitions, and not 
(AFAICT) on a per-node basis.
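
The only mechanism I can think of so far is injecting a default constraint 
from a job_submit.lua plugin. Below is a rough, untested sketch of the idea, 
assuming we also tag the untouched nodes (say Features=oldos), but I’d prefer 
something built-in if it exists:

        -- job_submit.lua (sketch, not tested): assumes untouched nodes
        -- carry Features=oldos and the upgraded ones Features=newos.
        function slurm_job_submit(job_desc, part_list, submit_uid)
            local f = job_desc.features
            if f == nil or f == "" then
                -- No constraint given: pin the job to the 'current' nodes.
                job_desc.features = "oldos"
            elseif not f:find("oldos", 1, true) and not f:find("newos", 1, true) then
                -- A constraint was given but mentions neither feature;
                -- AND-ing like this may need care if the user's
                -- expression already uses '|'.
                job_desc.features = f .. "&oldos"
            end
            return slurm.SUCCESS
        end

        function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
            return slurm.SUCCESS
        end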

We’re currently on 22.05.x, but upgrading is fine.

Regards,
David
