Hi Gérard,

Let's see if I understood this right. You have a user on the account dci and you
have put a GrpTRESMins limit (cpu=4100) on it.
From the output it looks like that user is associated with the QoS toto.
However, the limit is set on the association and not on the QoS:

> GrpTRESMins=cpu=N(4227)

You need to remove the limit from the association and put it on the QoS.
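For example (a sketch only, not a definitive recipe: the account and QoS names are the ones from your outputs, and `-i` merely skips the confirmation prompt):

```shell
# Clear the GrpTRESMins limit on the association (cpu=-1 removes the limit)
sacctmgr -i modify account dci set GrpTRESMins=cpu=-1
# Set the equivalent limit on the QoS instead
sacctmgr -i update qos toto set GrpTRESMins=cpu=4100
```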

Hope that helps,

MAO


> On 30 Jun 2022, at 19:12, gerard....@cines.fr wrote:
> 
> Hi Miguel,
> 
> I finally found the time to test the QOS NoDecay configuration against the
> GrpTRESMins account limit.
> 
> Here is my benchmark:
> 
> 
> 1) Initialize the benchmark configuration
>    - reset all RawUsage (on QOS and account)
>    - set a limit on Account GrpTRESMins
>    - run several jobs with a controlled elapsed CPU time on a QOS.
>    - reset account RawUsage 
>    - set a limit on Account GrpTRESMins under the QOS RawUsage
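> The steps above could be scripted roughly as follows (a sketch; the account/QOS names are assumed from this thread, and the RawUsage reset syntax is the one documented in the sacctmgr man page):

```shell
# 1) Reset all RawUsage (on QOS and account)
sacctmgr -i modify qos support set RawUsage=0
sacctmgr -i modify account dci set RawUsage=0
# 2) Set a limit on the account GrpTRESMins
sacctmgr -i modify account dci set GrpTRESMins=cpu=4100
# 3) Run jobs with a controlled elapsed CPU time on the QOS
sbatch --qos=support TRESMIN.slurm
```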
> 
> Here is the initial state before running the benchmark:
> 
> toto@login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,rawusage
> Account  User  GrpTRESRaw                                                      GrpTRESMins  RawUsage
> -------  ----  --------------------------------------------------------------  -----------  --------
> dci            cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0  cpu=4100     0
> 
> 
> 
> Account            RawUsage = 0 
> GrpTRESMins    cpu=4100
> 
> 
> 
> toto@login1:~/TEST$ scontrol -o show assoc_mgr | grep "^QOS" | grep support
> QOS=support(8) UsageRaw=253632.000000 GrpJobs=N(0) GrpJobsAccrue=N(0) GrpSubmitJobs=N(0) GrpWall=N(132.10)
> GrpTRES=cpu=N(0),mem=N(0),energy=N(0),node=2106(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0)
> GrpTRESMins=cpu=N(4227),mem=N(7926000),energy=N(0),node=N(132),billing=N(4227),fs/disk=N(0),vmem=N(0),pages=N(0)
> GrpTRESRunMins=cpu=N(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0)
> MaxWallPJ=1440 MaxTRESPJ=node=700 MaxTRESPN= MaxTRESMinsPJ= MinPrioThresh= MinTRESPJ= PreemptMode=OFF Priority=10
> Account Limits= dci={MaxJobsPA=N(0) MaxJobsAccruePA=N(0) MaxSubmitJobsPA=N(0) MaxTRESPA=cpu=N(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0)}
> User Limits= 1145={MaxJobsPU=N(0) MaxJobsAccruePU=N(0) MaxSubmitJobsPU=N(0) MaxTRESPU=cpu=N(0),mem=N(0),energy=N(0),node=2106(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0)}
> 
> QOS support RawUsage = 253632 s, i.e. 4227 min
> 
> 
> QOS support RawUsage > GrpTRESMins, so SLURM should prevent this account from
> starting a job if it works as expected.
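> As a sanity check, the comparison can be made explicit (values copied from the outputs above; plain Python, just arithmetic):

```python
# Values taken from the scontrol/sshare outputs above
usage_raw_seconds = 253632            # QOS "support" UsageRaw, in seconds
usage_minutes = usage_raw_seconds // 60
grp_tres_mins_cpu = 4100              # account GrpTRESMins limit, in minutes
print(usage_minutes)                       # 4227
print(usage_minutes > grp_tres_mins_cpu)   # True: usage already exceeds the limit
```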
> 
> 
> 
> 2) Run the benchmark to check that the GrpTRESMins limit is enforced against
> the QOS RawUsage
> 
> 
> toto@login1:~/TEST$ sbatch TRESMIN.slurm 
> Submitted batch job 3687
> 
> 
> toto@login1:~/TEST$ squeue
> JOBID  ADMIN_COMM  MIN_MEMOR  SUBMIT_TIME          PRIORITY  PARTITION  QOS      USER  STATE    TIME_LIMIT  TIME  NODES  REASON  START_TIME
> 3687   BDW28       60000M     2022-06-30T19:36:42  1100000   bdw28      support  toto  RUNNING  5:00        0:02  1      None    2022-06-30T19:36:42
> 
> 
> The job is running even though GrpTRESMins is below the QOS support RawUsage.
> 
> 
> 
> Is there anything wrong with my control process that invalidates the result?
> 
> 
> Thanks
> 
> Gérard 
> 
> 
> From: "gerard gil" <gerard....@cines.fr>
> To: "Slurm-users" <slurm-users@lists.schedmd.com>
> Sent: Wednesday 29 June 2022 19:13:56
> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
> Hi Miguel,
> 
> >If I understood you correctly, your goal is to limit the number of minutes
> >each project can run. By associating each project with a slurm account that
> >has a nodecay QoS you will achieve that goal.
> 
> Here is what I want to do:
> 
> "All jobs submitted to an account, regardless of the QOS they use, have to be
> constrained to a number of minutes set by the limit associated with that
> account (and not with a QOS)."
> 
> 
> >Try a project with a very small limit and you will see that it won’t run
> 
> I already tested the GrpTRESMins limit and confirmed it works as expected.
> Then I saw the decay effect on GrpTRESRaw (which I first thought was the
> right metric to look at) and tried to find a way to fix it.
> 
> It's really important for me to trust it, so I need a deterministic test to
> prove it.
> 
> I'm testing this GrpTRESMins limit with NoDecay set on the QOS, resetting all
> RawUsage (Account and QOS) to be sure it works as I expect.
> I print the account GrpTRESRaw (in min) at the end of my test jobs to set new
> GrpTRESMins limits and see how it behaves.
> 
> I'll report back on the results. I hope it works.
> 
> 
> > You don’t have to add anything.
> >Each QoS will accumulate its respective usage, i.e, the usage of all users 
> >on that account. Users can even be on different accounts (projects) and 
> >charge the respective project with the parameter --account on sbatch.
> 
> If SLURM does this to manage the limit, I would also like to obtain the
> current RawUsage for an account.
> Do you know how to get it?
> 
> 
> 
> >The GrpTRESMins is always changed on the QoS with a command like:
> >
> >sacctmgr update qos where qos=... set GrpTRESMin=cpu=….
> 
> That's right if you want to set a limit on a QOS.
> But I don't think the same limit value will also apply to all the other QOS,
> even if I apply the same limit to every QOS.
> Is my account limit the sum of all the QOS limits?
> 
> 
> Actually I'm setting the limit on the Account using this command:
> 
> sacctmgr modify account myaccount set grptresmins=cpu=60000 qos=...
> 
> With this setting I saw the limit is set on the account and not on the QOS.
> The command  sacctmgr show QOS  shows an empty GrpTRESMins field for every QOS.
> 
> 
> Thanks again for your help.
> I hope I'm close to getting the answer to my issue.
> 
> Best,
> Gérard 
> 
> From: "Miguel Oliveira" <miguel.olive...@uc.pt>
> To: "Slurm-users" <slurm-users@lists.schedmd.com>
> Sent: Wednesday 29 June 2022 01:28:58
> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
> Hi Gérard,
> 
> If I understood you correctly, your goal is to limit the number of minutes
> each project can run. By associating each project with a slurm account that
> has a nodecay QoS you will achieve that goal.
> Try a project with a very small limit and you will see that it won’t run.
> 
> You don't have to add anything. Each QoS will accumulate its respective
> usage, i.e., the usage of all users on that account. Users can even be on
> different accounts (projects) and charge the respective project with the
> parameter --account on sbatch.
> The GrpTRESMins is always changed on the QoS with a command like:
> 
> sacctmgr update qos where qos=... set GrpTRESMin=cpu=….
> 
> Hope that makes sense!
> 
> Best,
> 
> MAO
> 
> On 28 Jun 2022, at 18:30, gerard....@cines.fr wrote:
> 
> Hi Miguel,
> 
> OK, I didn't know that command.
> 
> I'm not sure I understand how it works with respect to my goal.
> I used the following command, inspired by the one you gave me, and I obtain a
> UsageRaw for each QOS.
> 
> scontrol -o show assoc_mgr accounts=myaccount users=" "
> 
> 
> Do I have to sum up all the QOS RawUsage values to obtain the RawUsage of
> myaccount with NoDecay?
> If I set GrpTRESMins for an Account and not for a QOS, does SLURM take care of
> summing these QOS RawUsage values to check whether the GrpTRESMins account
> limit is reached?
> 
> Thanks again for your invaluable help.
> 
> Gérard 
> 
> From: "Miguel Oliveira" <miguel.olive...@uc.pt>
> To: "Slurm-users" <slurm-users@lists.schedmd.com>
> Sent: Tuesday 28 June 2022 17:23:18
> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
> Hi Gérard,
> 
> The way you are checking is against the association, and as such it ought to
> be decreasing in order to be used appropriately by fair share.
> The counter that does not decrease is on the QoS, not the association.
> You can check that with:
> 
> scontrol -o show assoc_mgr | grep "^QOS=<account>"
> 
> That ought to give you two numbers. The first is the limit, or N for no
> limit, and the second, in parentheses, is the usage.
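> For completeness, those limit(usage) pairs can be pulled apart mechanically; a minimal sketch (the helper name and sample strings are illustrative, taken from outputs earlier in this thread):

```python
import re

def parse_tres_pairs(s):
    """Parse a TRES string like 'cpu=N(4227),node=2106(0)' into
    {name: (limit, usage)}; limit is None when Slurm prints 'N'."""
    pairs = {}
    for name, limit, usage in re.findall(r'([\w/]+)=(\w+)\((\d+(?:\.\d+)?)\)', s):
        pairs[name] = (None if limit == 'N' else int(limit), float(usage))
    return pairs

print(parse_tres_pairs("cpu=N(4227),node=2106(0)"))
# {'cpu': (None, 4227.0), 'node': (2106, 0.0)}
```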
> 
> Hope that helps.
> 
> Best,
> 
> Miguel Afonso Oliveira
> 
> On 28 Jun 2022, at 08:58, gerard....@cines.fr wrote:
> 
> Hi Miguel,
> 
> 
> I modified my test configuration to evaluate the effect of NoDecay.
> 
> 
> 
> 
> I modified all QOS by adding the NoDecay flag.
> 
> 
> toto@login1:~/TEST$ sacctmgr show QOS
>       Name   Priority  GraceTime    Preempt   PreemptExemptTime PreemptMode   
>                                  Flags UsageThres UsageFactor       GrpTRES   
> GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit     GrpWall       MaxTRES 
> MaxTRESPerNode   MaxTRESMins     MaxWall     MaxTRESPU MaxJobsPU MaxSubmitPU  
>    MaxTRESPA MaxJobsPA MaxSubmitPA       MinTRES 
> ---------- ---------- ---------- ---------- ------------------- ----------- 
> ---------------------------------------- ---------- ----------- ------------- 
> ------------- ------------- ------- --------- ----------- ------------- 
> -------------- ------------- ----------- ------------- --------- ----------- 
> ------------- --------- ----------- ------------- 
>     normal          0   00:00:00                                    cluster   
>                                NoDecay               1.000000                 
>                                                                               
>                                                                               
>                                          
> interactif         10   00:00:00                                    cluster   
>                                NoDecay               1.000000       node=50   
>                                                               node=22         
>                       1-00:00:00       node=50                                
>                                          
>      petit          4   00:00:00                                    cluster   
>                                NoDecay               1.000000     node=1500   
>                                                               node=22         
>                       1-00:00:00      node=300                                
>                                          
>       gros          6   00:00:00                                    cluster   
>                                NoDecay               1.000000     node=2106   
>                                                              node=700         
>                       1-00:00:00      node=700                                
>                                          
>      court          8   00:00:00                                    cluster   
>                                NoDecay               1.000000     node=1100   
>                                                              node=100         
>                         02:00:00      node=300                                
>                                          
>       long          4   00:00:00                                    cluster   
>                                NoDecay               1.000000      node=500   
>                                                              node=200         
>                       5-00:00:00      node=200                                
>                                          
>    special         10   00:00:00                                    cluster   
>                                NoDecay               1.000000     node=2106   
>                                                             node=2106         
>                       5-00:00:00     node=2106                                
>                                          
>    support         10   00:00:00                                    cluster   
>                                NoDecay               1.000000     node=2106   
>                                                              node=700         
>                       1-00:00:00     node=2106                                
>                                          
>       visu         10   00:00:00                                    cluster   
>                                NoDecay               1.000000        node=4   
>                                                              node=700         
>                         06:00:00        node=4                       
> 
> 
> 
> I submitted a bunch of jobs to check the NoDecay effect and I noticed that
> RawUsage as well as GrpTRESRaw cpu are still decreasing.
> 
> 
> toto@login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage
> Account  User  GrpTRESRaw                                                                     GrpTRESMins  RawUsage
> dci            cpu=6932,mem=12998963,energy=0,node=216,billing=6932,fs/disk=0,vmem=0,pages=0  cpu=17150    415966
> toto@login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage
> dci            cpu=6931,mem=12995835,energy=0,node=216,billing=6931,fs/disk=0,vmem=0,pages=0  cpu=17150    415866
> toto@login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage
> dci            cpu=6929,mem=12992708,energy=0,node=216,billing=6929,fs/disk=0,vmem=0,pages=0  cpu=17150    415766
> 
> 
> Is there something I forgot to do?
> 
> 
> Best,
> Gérard
> 
> Best regards,
> Gérard Gil
> 
> Département Calcul Intensif
> Centre Informatique National de l'Enseignement Superieur
> 950, rue de Saint Priest
> 34097 Montpellier CEDEX 5
> FRANCE
> 
> tel :  (334) 67 14 14 14
> fax : (334) 67 52 37 63
> web : http://www.cines.fr <http://www.cines.fr/>
> 
> From: "Gérard Gil" <gerard....@cines.fr>
> To: "Slurm-users" <slurm-users@lists.schedmd.com>
> Cc: "slurm-users" <slurm-us...@schedmd.com>
> Sent: Friday 24 June 2022 14:52:12
> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
> Hi Miguel,
>  
>  Good!
>  
>  I'll try these options on all existing QOS and see if everything works as
>  expected.
>  I'll let you know the results.
>  
>  
>  Thanks a lot
>  
>  Best,
>  Gérard
>  
>  
>  ----- Original Message -----
>  From: "Miguel Oliveira" <miguel.olive...@uc.pt>
>  To: "Slurm-users" <slurm-users@lists.schedmd.com>
>  Cc: "slurm-users" <slurm-us...@schedmd.com>
>  Sent: Friday 24 June 2022 14:07:16
>  Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
> 
>  
> Hi Gérard,
>  
>  I believe so. All our accounts correspond to one project and all have an
>  associated QoS with NoDecay and DenyOnLimit. This is enough to restrict
>  usage on each individual project.
>  You only need these flags on the QoS. The association will carry on as usual
>  and fairshare will not be impacted.
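>  For example, both flags can be set in one command (a sketch; the QoS name "proj_qos" is a placeholder, and `-i` skips the confirmation prompt):

```shell
# Hypothetical per-project QoS: enable NoDecay and DenyOnLimit together
sacctmgr -i update qos proj_qos set Flags=NoDecay,DenyOnLimit
```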
>  
>  Hope that helps,
>  
>  Miguel Oliveira
>  
> On 24 Jun 2022, at 12:56, gerard....@cines.fr wrote:
>  
>  Hi Miguel,
>  
> Why not? You can have multiple QoSs and you have other techniques to change
>  priorities according to your policies.
> 
>  
>  Does this answer my question?
>  
>  "If all configured QOS use NoDecay, we can take advantage of the FairShare
>  priority with Decay and  all jobs GrpTRESRaw with NoDecay ?"
>  
>  Thanks
>  
>  Best,
> 
>  Gérard
> 
> 
> 
> 
> 
