Hello Miguel, 

Setting the limit on only one QOS does work, but it prevents users from using 
several QOS, and with them all the multi-QOS possibilities. 

I'm thinking about how to deal with this and whether a workaround can be set up 
in our environment. 
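
For instance (a rough sketch only, reusing the dci account and the QOS names from my 
tests), restricting the account to a single QOS carrying the limit would look like: 

sacctmgr modify account dci set QOS=support 
sacctmgr modify qos support set GrpTRESMins=cpu=4100 

but then users of dci lose access to interactif, petit, gros and the other QOS, which 
is exactly what I would like to avoid. 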

Thanks for all your help. 

Best regards, 
Gérard Gil 

Département Calcul Intensif 
Centre Informatique National de l'Enseignement Superieur 
950, rue de Saint Priest 
34097 Montpellier CEDEX 5 
FRANCE 

tel : (334) 67 14 14 14 
fax : (334) 67 52 37 63 
web : http://www.cines.fr 

> De: "Gérard Gil" <gerard....@cines.fr>
> À: "Miguel Oliveira" <miguel.olive...@uc.pt>
> Cc: "Slurm-users" <slurm-users@lists.schedmd.com>
> Envoyé: Vendredi 1 Juillet 2022 11:03:42
> Objet: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage

> Hi Miguel,

> As far as I understand, GrpTRESMins=cpu=N(4227) is not the limit of the QOS,
> despite its name, but the RawUsage of the QOS expressed in minutes instead of the
> seconds accounted in RawUsage.
> When I reset the QOS RawUsage to 0, GrpTRESMins=cpu is also reset to 0.
> Each time a job completes using this QOS, RawUsage and GrpTRESMins=cpu are
> increased by the usage of that job (in seconds and minutes respectively).
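
> For instance, a quick way to watch that counter (something along these lines, with
> support as the QOS name and assuming the qos= filter of scontrol show assoc_mgr):
> scontrol -o show assoc_mgr qos=support | grep -o "GrpTRESMins=cpu=[^,]*"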

> So I first need to try setting the limit on one QOS, as you told me, then set the
> same limit on all QOS and see how Slurm handles all those limits with a single
> account.
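
> For that second step, something along these lines should do it (QOS names taken
> from my configuration; -i skips sacctmgr's confirmation prompt):
> for q in normal interactif petit gros court long special support visu; do
>     sacctmgr -i modify qos $q set GrpTRESMins=cpu=4100
> done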

> I think this should be the last test to understand the complete behavior of
> GrpTRESMins.

> I'll let you know the result, but only in a while because of the holidays.

> Thanks a lot for all your help.

> Best,
> Gérard

>> De: "Miguel Oliveira" <miguel.olive...@uc.pt>
>> À: "Gérard Gil" <gerard....@cines.fr>
>> Cc: "Slurm-users" <slurm-users@lists.schedmd.com>
>> Envoyé: Jeudi 30 Juin 2022 21:33:46
>> Objet: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage

>> Hi Gérard,

>> Let's see if I understood this right. You have a user on the account dci and you
>> have put a GrpTRESMins limit on it (cpu=4100).
>> From the output it looks like that user is associated with the QoS toto.
>> However, the limit is set on the association and not on the QoS:

>>> GrpTRESMins=cpu=N(4227)

>> You need to remove the limit from the association and put it on the QoS.
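
>> For example (a sketch only, assuming the account is dci and taking support as the
>> QoS the jobs actually run with; setting a TRES to -1 clears the limit):
>> sacctmgr modify account dci set GrpTRESMins=cpu=-1
>> sacctmgr modify qos support set GrpTRESMins=cpu=4100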

>> Hope that helps,

>> MAO

>>> On 30 Jun 2022, at 19:12, gerard....@cines.fr wrote:

>>> Hi Miguel,

>>> I finally found the time to test the QOS NoDecay configuration versus the
>>> GrpTRESMins account limit.

>>> Here is my benchmark:

>>> 1) Initialize the benchmark configuration
>>> - reset all RawUsage (on the QOS and on the account; see the reset commands
>>>   sketched after this list)
>>> - set a limit on the account GrpTRESMins
>>> - run several jobs with a controlled elapsed CPU time on one QOS
>>> - reset the account RawUsage
>>> - set the account GrpTRESMins limit below the QOS RawUsage
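
>>> For the resets, commands of this form should do (a sketch; dci and support being
>>> my account and the QOS under test):
>>> sacctmgr modify account dci set RawUsage=0
>>> sacctmgr modify qos support set RawUsage=0
>>> sacctmgr modify account dci set GrpTRESMins=cpu=4100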

>>> Here is the initial state before running the benchmark:

>>> toto@login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,rawusage
>>> Account User GrpTRESRaw GrpTRESMins RawUsage
>>> dci cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0 cpu=4100 0

>>> Account RawUsage = 0
>>> GrpTRESMins cpu=4100

>>> toto@login1:~/TEST$ scontrol -o show assoc_mgr | grep "^QOS" | grep support
>>> QOS=support(8) UsageRaw=253632.000000 GrpJobs=N(0) GrpJobsAccrue=N(0)
>>> GrpSubmitJobs=N(0) GrpWall=N(132.10)
>>> GrpTRES=cpu=N(0),mem=N(0),energy=N(0),node=2106(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0)
>>> GrpTRESMins=cpu=N(4227),mem=N(7926000),energy=N(0),node=N(132),billing=N(4227),fs/disk=N(0),vmem=N(0),pages=N(0)
>>> GrpTRESRunMins=cpu=N(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0)
>>> MaxWallPJ=1440 MaxTRESPJ=node=700 MaxTRESPN= MaxTRESMinsPJ= MinPrioThresh=
>>> MinTRESPJ= PreemptMode=OFF Priority=10 Account Limits= dci={MaxJobsPA=N(0)
>>> MaxJobsAccruePA=N(0) MaxSubmitJobsPA=N(0)
>>> MaxTRESPA=cpu=N(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0)}
>>> User Limits= 1145={MaxJobsPU=N(0) MaxJobsAccruePU=N(0) MaxSubmitJobsPU=N(0)
>>> MaxTRESPU=cpu=N(0),mem=N(0),energy=N(0),node=2106(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0)}

>>> QOS support RawUsage = 253632 s, i.e. 4227 min

>>> Since the QOS support RawUsage is above GrpTRESMins, SLURM should prevent any job
>>> from starting for this account if it works as expected.

>>> 2) Run the benchmark to check that the GrpTRESMins limit is enforced against the
>>> QOS RawUsage

>>> toto@login1:~/TEST$ sbatch TRESMIN.slurm
>>> Submitted batch job 3687

>>> toto@login1:~/TEST$ squeue
>>> JOBID ADMIN_COMM MIN_MEMOR SUBMIT_TIME PRIORITY PARTITION QOS USER STATE TIME_LIMIT TIME NODES REASON START_TIME
>>> 3687 BDW28 60000M 2022-06-30T19:36:42 1100000 bdw28 support toto RUNNING 5:00 0:02 1 None 2022-06-30T19:36:42

>>> The job is running even though GrpTRESMins is below the QOS support RawUsage.

>>> Is there anything wrong with my control process that invalidates the result?

>>> Thanks

>>> Gérard


>>>> De: "gerard gil" < [ mailto:gerard....@cines.fr | gerard....@cines.fr ] >
>>>> À: "Slurm-users" < [ mailto:slurm-users@lists.schedmd.com |
>>>> slurm-users@lists.schedmd.com ] >
>>>> Envoyé: Mercredi 29 Juin 2022 19:13:56
>>>> Objet: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage

>>>> Hi Miguel,

>>>>> If I understood you correctly your goal was to limit the number of minutes each
>>>>> project can run. By associating each project to a slurm account with a nodecay
>>>>> QoS then you will have achieved your goal.

>>>> Here is what I want to do:

>>>> "All jobs submitted to an account regardless the QOS they use have to be
>>>> constrained to a number of minutes set by the limit associated with that
>>>> account (and not to QOS)."

>>>>> Try a project with a very small limit and you will see that it won't run.

>>>> I already tested the GrpTRESMins limit and can confirm it works as expected.
>>>> Then I saw the decay effect on GrpTRESRaw (which I first thought was the right
>>>> metric to look at) and tried to find a way to work around it.

>>>> It's really very important for me to trust it, so I need a deterministic test to
>>>> prove it.

>>>> I'm testing this GrpTRESMins limit with NoDecay set on the QOS, resetting all
>>>> RawUsage (account and QOS), to be sure it works as I expect.
>>>> I print the account GrpTRESRaw (in minutes) at the end of my test jobs, then set
>>>> a new GrpTRESMins limit and see how it behaves.
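
>>>> (For reference, the value I look at comes from an sshare call like the one in my
>>>> tests below, e.g. sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage .)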

>>>> I'll report back on the results. I hope it works.

>>>>> You don't have to add anything.
>>>>> Each QoS will accumulate its respective usage, i.e., the usage of all users on
>>>>> that account. Users can even be on different accounts (projects) and charge the
>>>>> respective project with the parameter --account on sbatch.

>>>> If SLURM does this itself to manage the limit, I would also like to obtain the
>>>> current RawUsage for an account.
>>>> Do you know how to get it?

>>>>> The GrpTRESMins is always changed on the QoS with a command like:

>>>>> sacctmgr modify qos where qos=... set GrpTRESMins=cpu=….

>>>> That's right if you want to set a limit on a QOS.
>>>> But I don't know whether the same limit value also applies to all the other QOS,
>>>> nor what happens if I apply the same limit to every QOS.
>>>> Is my account limit then the sum of all the QOS limits?

>>>> Actually I'm setting the limit on the account using the command:

>>>> sacctmgr modify account myaccount set grptresmins=cpu=60000 qos=...

>>>> With this setting I saw that the limit is set on the account and not on the QOS:
>>>> the sacctmgr show QOS command shows an empty GrpTRESMins field for all QOS.
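
>>>> To double-check where the limit ends up, something like this should show both
>>>> sides (a sketch, with myaccount as in the command above):
>>>> sacctmgr show assoc where account=myaccount format=Account,User,QOS,GrpTRESMins
>>>> sacctmgr show qos format=Name,Flags,GrpTRESMins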

>>>> Thanks again for your help.
>>>> I hope I'm close to getting the answer to my issue.

>>>> Best,
>>>> Gérard

>>>>> De: "Miguel Oliveira" < [ mailto:miguel.olive...@uc.pt | 
>>>>> miguel.olive...@uc.pt ]
>>>>> >
>>>>> À: "Slurm-users" < [ mailto:slurm-users@lists.schedmd.com |
>>>>> slurm-users@lists.schedmd.com ] >
>>>>> Envoyé: Mercredi 29 Juin 2022 01:28:58
>>>>> Objet: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage

>>>>> Hi Gérard,

>>>>> If I understood you correctly, your goal is to limit the number of minutes each
>>>>> project can run. By associating each project with a Slurm account that has a
>>>>> NoDecay QoS, you will have achieved your goal.
>>>>> Try a project with a very small limit and you will see that it won’t run.

>>>>> You don't have to add anything. Each QoS will accumulate its respective usage,
>>>>> i.e., the usage of all users on that account. Users can even be on different
>>>>> accounts (projects) and charge the respective project with the parameter
>>>>> --account on sbatch.
>>>>> The GrpTRESMins is always changed on the QoS with a command like:

>>>>> sacctmgr modify qos where qos=... set GrpTRESMins=cpu=….

>>>>> Hope that makes sense!

>>>>> Best,

>>>>> MAO

>>>>>> On 28 Jun 2022, at 18:30, gerard....@cines.fr wrote:

>>>>>> Hi Miguel,

>>>>>> OK, I didn't know this command.

>>>>>> I'm not sure I understand how it works with regard to my goal.
>>>>>> I used the following command, inspired by the one you gave me, and I obtain a
>>>>>> UsageRaw for each QOS.

>>>>>> scontrol -o show assoc_mgr accounts=myaccount users=" "

>>>>>> Do I have to sum up all the QOS RawUsage values to obtain the RawUsage of
>>>>>> myaccount with NoDecay?
>>>>>> If I set GrpTRESMins for an account and not for a QOS, does SLURM sum up these
>>>>>> QOS RawUsage values to check whether the GrpTRESMins account limit is reached?

>>>>>> Thanks again for your precious help.

>>>>>> Gérard

>>>>>>> De: "Miguel Oliveira" < [ mailto:miguel.olive...@uc.pt | 
>>>>>>> miguel.olive...@uc.pt ]
>>>>>>> >
>>>>>>> À: "Slurm-users" < [ mailto:slurm-users@lists.schedmd.com |
>>>>>>> slurm-users@lists.schedmd.com ] >
>>>>>>> Envoyé: Mardi 28 Juin 2022 17:23:18
>>>>>>> Objet: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage

>>>>>>> Hi Gérard,

>>>>>>> The way you are checking is against the association, and as such it ought to
>>>>>>> be decreasing in order to be used by fairshare appropriately.
>>>>>>> The counter that does not decrease is on the QoS, not the association. You can
>>>>>>> check that with:

>>>>>>> scontrol -o show assoc_mgr | grep "^QOS=<account> "

>>>>>>> That ought to give you two numbers. The first is the limit, or N for no limit,
>>>>>>> and the second, in parentheses, is the usage.
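
>>>>>>> For example, a field such as GrpTRESMins=cpu=N(1234) would mean no cpu-minutes
>>>>>>> limit (N) and 1234 minutes of usage accrued so far (illustrative numbers only).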

>>>>>>> Hope that helps.

>>>>>>> Best,

>>>>>>> Miguel Afonso Oliveira

>>>>>>>> On 28 Jun 2022, at 08:58, gerard....@cines.fr wrote:

>>>>>>>> Hi Miguel,

>>>>>>>> I modified my test configuration to evaluate the effect of NoDecay.

>>>>>>>> I modified all QOS, adding the NoDecay flag.
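
>>>>>>>> (For each QOS this amounts to something of this form, e.g. for the support
>>>>>>>> QOS: sacctmgr modify qos support set Flags=NoDecay )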

>>>>>>>> toto@login1:~/TEST$ sacctmgr show QOS
>>>>>>>> Name Priority GraceTime Preempt PreemptExemptTime PreemptMode Flags UsageThres UsageFactor GrpTRES GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit GrpWall MaxTRES MaxTRESPerNode MaxTRESMins MaxWall MaxTRESPU MaxJobsPU MaxSubmitPU MaxTRESPA MaxJobsPA MaxSubmitPA MinTRES
>>>>>>>> normal 0 00:00:00 cluster NoDecay 1.000000
>>>>>>>> interactif 10 00:00:00 cluster NoDecay 1.000000 node=50 node=22 1-00:00:00 node=50
>>>>>>>> petit 4 00:00:00 cluster NoDecay 1.000000 node=1500 node=22 1-00:00:00 node=300
>>>>>>>> gros 6 00:00:00 cluster NoDecay 1.000000 node=2106 node=700 1-00:00:00 node=700
>>>>>>>> court 8 00:00:00 cluster NoDecay 1.000000 node=1100 node=100 02:00:00 node=300
>>>>>>>> long 4 00:00:00 cluster NoDecay 1.000000 node=500 node=200 5-00:00:00 node=200
>>>>>>>> special 10 00:00:00 cluster NoDecay 1.000000 node=2106 node=2106 5-00:00:00 node=2106
>>>>>>>> support 10 00:00:00 cluster NoDecay 1.000000 node=2106 node=700 1-00:00:00 node=2106
>>>>>>>> visu 10 00:00:00 cluster NoDecay 1.000000 node=4 node=700 06:00:00 node=4

>>>>>>>> I submitted a bunch of jobs to check that NoDecay is effective, and I noticed
>>>>>>>> that RawUsage as well as GrpTRESRaw cpu is still decreasing.

>>>>>>>> toto@login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage
>>>>>>>> Account User GrpTRESRaw GrpTRESMins RawUsage
>>>>>>>> dci cpu=6932,mem=12998963,energy=0,node=216,billing=6932,fs/disk=0,vmem=0,pages=0 cpu=17150 415966
>>>>>>>> toto@login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage
>>>>>>>> Account User GrpTRESRaw GrpTRESMins RawUsage
>>>>>>>> dci cpu=6931,mem=12995835,energy=0,node=216,billing=6931,fs/disk=0,vmem=0,pages=0 cpu=17150 415866
>>>>>>>> toto@login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage
>>>>>>>> Account User GrpTRESRaw GrpTRESMins RawUsage
>>>>>>>> dci cpu=6929,mem=12992708,energy=0,node=216,billing=6929,fs/disk=0,vmem=0,pages=0 cpu=17150 415766

>>>>>>>> Is there something I forgot to do?

>>>>>>>> Best,
>>>>>>>> Gérard

>>>>>>>> Best regards,
>>>>>>>> Gérard Gil

>>>>>>>> Département Calcul Intensif
>>>>>>>> Centre Informatique National de l'Enseignement Superieur
>>>>>>>> 950, rue de Saint Priest
>>>>>>>> 34097 Montpellier CEDEX 5
>>>>>>>> FRANCE

>>>>>>>> tel : (334) 67 14 14 14
>>>>>>>> fax : (334) 67 52 37 63
>>>>>>>> web : http://www.cines.fr

>>>>>>>>> De: "Gérard Gil" < [ mailto:gerard....@cines.fr | gerard....@cines.fr 
>>>>>>>>> ] >
>>>>>>>>> À: "Slurm-users" < [ mailto:slurm-users@lists.schedmd.com |
>>>>>>>>> slurm-users@lists.schedmd.com ] >
>>>>>>>>> Cc: "slurm-users" < [ mailto:slurm-us...@schedmd.com | 
>>>>>>>>> slurm-us...@schedmd.com ]
>>>>>>>>> >
>>>>>>>>> Envoyé: Vendredi 24 Juin 2022 14:52:12
>>>>>>>>> Objet: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage

>>>>>>>>> Hi Miguel,

>>>>>>>>> Good !!

>>>>>>>>> I'll try these options on all existing QOS and see if everything works as
>>>>>>>>> expected.
>>>>>>>>> I'll let you know the results.

>>>>>>>>> Thanks a lot

>>>>>>>>> Best,
>>>>>>>>> Gérard

>>>>>>>>> ----- Original Message -----

>>>>>>>>>> De: "Miguel Oliveira" < [ mailto:miguel.olive...@uc.pt | 
>>>>>>>>>> miguel.olive...@uc.pt ]
>>>>>>>>>> >
>>>>>>>>>> À: "Slurm-users" < [ mailto:slurm-users@lists.schedmd.com |
>>>>>>>>>> slurm-users@lists.schedmd.com ] >
>>>>>>>>>> Cc: "slurm-users" < [ mailto:slurm-us...@schedmd.com | 
>>>>>>>>>> slurm-us...@schedmd.com ]
>>>>>>>>>> >
>>>>>>>>>> Envoyé: Vendredi 24 Juin 2022 14:07:16
>>>>>>>>>> Objet: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>>>>>>>>>> Hi Gérard,

>>>>>>>>>> I believe so. All our accounts correspond to one project and all 
>>>>>>>>>> have an
>>>>>>>>>> associated QoS with NoDecay and DenyOnLimit. This is enough to 
>>>>>>>>>> restrict usage
>>>>>>>>>> on each individual project.
>>>>>>>>>> You only need these flags on the QoS. The association will carry on 
>>>>>>>>>> as usual and
>>>>>>>>>> fairshare will not be impacted.
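
>>>>>>>>>> Roughly, the per-project setup looks something like this (a sketch only,
>>>>>>>>>> with project1 and a 100000 cpu-minute budget as placeholders):
>>>>>>>>>> sacctmgr add qos project1
>>>>>>>>>> sacctmgr modify qos project1 set Flags=NoDecay,DenyOnLimit GrpTRESMins=cpu=100000
>>>>>>>>>> sacctmgr modify account project1 set QOS=project1 DefaultQOS=project1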

>>>>>>>>>> Hope that helps,

>>>>>>>>>> Miguel Oliveira

>>>>>>>>>>> On 24 Jun 2022, at 12:56, gerard....@cines.fr wrote:

>>>>>>>>>>> Hi Miguel,

>>>>>>>>>>>> Why not? You can have multiple QoSs and you have other techniques 
>>>>>>>>>>>> to change
>>>>>>>>>>>> priorities according to your policies.
>>>>>>>>>>> Does this answer my question?

>>>>>>>>>>> "If all configured QOS use NoDecay, we can take advantage of the 
>>>>>>>>>>> FairShare
>>>>>>>>>>> priority with Decay and all jobs GrpTRESRaw with NoDecay ?"

>>>>>>>>>>> Thanks

>>>>>>>>>>> Best,
>>>>>>>>>>> Gérard
