Hi Nathan,
On 01/08/2016 at 13:46, Nathan Harper wrote:
Hi,

We are trying to get to the bottom of some TRES limits we have in place, to work out whether what we see is expected behaviour. We have two QoS configured, 'low' and 'normal'. Normal is the default QoS and applies limits at the association level. The low QoS has its own TRES limits applied to it, and is preemptable by 'normal'. For example:

account engineering has GrpTRES=node=20
QoS low has GrpTRES=node=50

A user from engineering has a 15 node job running, leaving 5 nodes available in their association limit. He then submits a 50 node job using the low QoS, and the job also starts.
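That's due to the limits hierarchy (http://slurm.schedmd.com/resource_limits.html): the QOS GrpTRES "overrides" the association GrpTRES, so the 50 node job is checked against the low QOS limit of 50 nodes rather than against the association limit of 20.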
He then submits a 5 node job using the normal QoS, however it is held for AssocGrpNodeLimit, despite there being 5 nodes available for engineering/normal.
If the first 15 node job and the second 50 node job are both already running under the engineering account (and if the normal QOS has no GrpTRES of its own), the account is using 65 nodes, so the association's GrpTRES=node=20 has already been reached and the hold reason seems reasonable to me.
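To make the accounting concrete, here is a small toy model in Python of the reasoning above. It is only a sketch, not Slurm code: the function names are made up, it assumes the normal QOS has no GrpTRES, and it assumes every running job counts toward the association's usage whichever QOS it runs under.

```python
# Toy model of the limit check discussed above. NOT Slurm source code;
# all names here are hypothetical and the logic is deliberately simplified.

ASSOC_GRP_NODES = 20                         # account engineering: GrpTRES=node=20
QOS_GRP_NODES = {"low": 50, "normal": None}  # assumed: normal has no GrpTRES


def check(job_nodes, qos, assoc_used, qos_used):
    """Return a hold reason, or None if the job may start."""
    qos_limit = QOS_GRP_NODES[qos]
    if qos_limit is not None:
        # A limit set on the QOS takes precedence over the association limit.
        if qos_used[qos] + job_nodes > qos_limit:
            return "QOSGrpNodeLimit"
        return None
    # No QOS limit: fall back to the association limit.
    if assoc_used + job_nodes > ASSOC_GRP_NODES:
        return "AssocGrpNodeLimit"
    return None


def run_scenario():
    """Replay the three submissions from the question, in order."""
    assoc_used = 0
    qos_used = {"low": 0, "normal": 0}
    results = []
    for nodes, qos in [(15, "normal"), (50, "low"), (5, "normal")]:
        reason = check(nodes, qos, assoc_used, qos_used)
        if reason is None:
            # Assumption: a started job is charged to the association's
            # usage regardless of the QOS it runs under.
            assoc_used += nodes
            qos_used[qos] += nodes
        results.append((nodes, qos, reason))
    return results


if __name__ == "__main__":
    for nodes, qos, reason in run_scenario():
        print(nodes, qos, reason or "starts")
```

In this model the 15 node and 50 node jobs start, and the final 5 node job under normal is held with AssocGrpNodeLimit because the association is already charged 65 nodes.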
Best, Rémi