Hello Mathew,

Just to check the basics: did you wait a few minutes before executing sshare? 
As far as I remember, the RawUsage value is updated every 5 minutes by default 
(the PriorityCalcPeriod interval), so these erroneous values might come from a 
measurement taken while the jobs were still running.
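You can check the effective interval and then re-run sshare after it has
elapsed, for example (300 s matches the 5-minute default):

    scontrol show config | grep -i PriorityCalcPeriod
    sleep 300 && sshare -a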

Regards,
Lech


On 26.11.2014 at 12:14, Roshan Mathew <r.t.mat...@bath.ac.uk> wrote:

> My fairshare test scenario - as it stands, the fairshare is not distributed 
> correctly.
> 
> *Accounts*
> 
> [root@slurm-login slurm-scripts]# sacctmgr list accounts
>    Account                Descr                  Org 
> ---------- -------------------- -------------------- 
>   premium1      primary account                  root 
>   premium2      primary account                  root
>       root default root account                  root
> 
> *Users*
> 
> [root@slurm-login slurm-scripts]# sacctmgr list users
>       User   Def Acct     Admin 
> ---------- ---------- --------- 
>      mm339   premium1      None 
>      sy223   premium2      None 
> 
> *Initial Shares*
> 
> [root@slurm-login slurm-scripts]# sshare
>              Account       User Raw Shares Norm Shares   Raw Usage Effectv Usage  FairShare 
> -------------------- ---------- ---------- ----------- ----------- ------------- ---------- 
> root                                          1.000000           0      1.000000   0.500000 
>  premium1                               50    0.500000           0      0.000000   1.000000 
>  premium2                               50    0.500000           0      0.000000   1.000000
> 
> 
> *Job script*
> 
> [root@slurm-login slurm-scripts]# cat stress.slurm 
> #!/bin/bash
> 
> #SBATCH --nodes=1
> #SBATCH --ntasks=1
> #SBATCH --job-name=stress
> #SBATCH --time=10
> #SBATCH --output=stress.%j-out
> #SBATCH --error=stress.%j-out
> 
> time /opt/shared/apps/stress/app/bin/stress --cpu 1 --timeout 1m
> 
> 
> *Job Submission* 
> 
> [root@slurm-login slurm-scripts]# runuser sy223 -c 'sbatch stress.slurm'
> Submitted batch job 2
> [root@slurm-login slurm-scripts]# runuser mm339 -c 'sbatch stress.slurm'
> Submitted batch job 3
> 
> 
> *SACCT Information*
> 
> [root@slurm-login slurm-scripts]# sacct --format Jobid,jobname,partition,account,alloccpus,state,exitcode,cputimeraw
>        JobID    JobName  Partition    Account  AllocCPUS      State ExitCode CPUTimeRAW 
> ------------ ---------- ---------- ---------- ---------- ---------- -------- ---------- 
> 2                stress      batch   premium2          1  COMPLETED      0:0         60 
> 2.batch           batch              premium2          1  COMPLETED      0:0         60 
> 3                stress      batch   premium1          1  COMPLETED      0:0         60 
> 3.batch           batch              premium1          1  COMPLETED      0:0         60
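> (For reference, CPUTimeRAW is AllocCPUS multiplied by elapsed seconds, i.e. 
> 1 x 60 = 60 for each job here.)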
> 
> 
> *SSHARE Information*
> 
> [root@slurm-login slurm-scripts]# sshare
>              Account       User Raw Shares Norm Shares   Raw Usage Effectv Usage  FairShare 
> -------------------- ---------- ---------- ----------- ----------- ------------- ---------- 
> root                                          1.000000          79      1.000000   0.500000 
>  premium1                               50    0.500000          36      0.455696   0.531672 
>  premium2                               50    0.500000          43      0.544304   0.470215
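> Note that 36 + 43 = 79, matching the root row, so the accounts' usage does 
> sum consistently.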
> 
> 
> *Slurm.conf - priority/multifactor*
> 
> # Activate the Multi-factor Job Priority Plugin with decay
> PriorityType=priority/multifactor
> 
> # apply no decay
> PriorityDecayHalfLife=0
> PriorityCalcPeriod=1
> # reset usage after 1 month
> PriorityUsageResetPeriod=MONTHLY
> 
> # The larger the job, the greater its job size priority.
> PriorityFavorSmall=NO
> 
> # The job's age factor reaches 1.0 after waiting in the
> # queue for 2 weeks.
> PriorityMaxAge=14-0
> 
> # This next group determines the weighting of each of the
> # components of the Multi-factor Job Priority Plugin.
> # The default value for each of the following is 1.
> PriorityWeightAge=0
> PriorityWeightFairshare=100
> PriorityWeightJobSize=0
> PriorityWeightPartition=0
> PriorityWeightQOS=0 # don't use the qos factor
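> As a sanity check on how these weights combine per pending job, sprio can 
> show the per-factor breakdown, for example:
> 
> sprio -l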
> 
> 
> *Questions*
> 
> 1. Given that I have set PriorityDecayHalfLife=0, i.e. no decay applied at 
> any stage, shouldn't both jobs have the same Raw Usage reported by sshare?
> 
> 2. Also, shouldn't CPUTimeRAW in sacct be the same as Raw Usage in sshare?
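> For reference, each job allocated 1 CPU for 60 seconds, so with no decay I 
> would expect each account to converge on RawUsage = 1 x 60 = 60 (and root on 
> 120), rather than the 36 and 43 shown above.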
> 
> 
> From: Skouson, Gary B <gary.skou...@pnnl.gov>
> Sent: 25 November 2014 21:09
> To: slurm-dev
> Subject: [slurm-dev] Re: [ sshare ] RAW Usage
>  
> I believe that the fairshare usage data is kept by slurmctld in memory.  As far as 
> I could tell from the code, it should be checkpointing the info to the 
> assoc_usage file wherever Slurm is saving state information (StateSaveLocation). 
> I couldn't find any docs on that; you'd have to check the code for more information.
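> If slurmdbd with MySQL is behind the accounting, the rolled-up numbers also 
> land in per-cluster usage tables; a quick peek (a sketch, assuming the 
> database is named slurm_acct_db and the cluster is named mycluster):
> 
> mysql slurm_acct_db -e 'SELECT * FROM mycluster_assoc_usage_hour_table LIMIT 5;'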
>  
> However, if you just want to see what was used, you can get the raw usage 
> using sacct.  For example, for a given job, you can do something like:
>  
> sacct -X -a -j 1182128 --format Jobid,jobname,partition,account,alloccpus,state,exitcode,cputimeraw
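> For per-account rollups, sreport can report raw seconds as well, for example 
> (adjust the dates):
> 
> sreport -t Seconds cluster AccountUtilizationByUser start=2014-11-25 end=2014-11-26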
>  
> -----
> Gary Skouson
>  
>  
> From: Roshan Mathew [mailto:r.t.mat...@bath.ac.uk] 
> Sent: Tuesday, November 25, 2014 9:51 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: [ sshare ] RAW Usage
>  
> Thanks Ryan,
>  
> Is this value stored anywhere in the SLURM accounting DB? I could not find 
> any value for the job that corresponds to this raw usage.
>  
> Roshan
> From: Ryan Cox <ryan_...@byu.edu>
> Sent: 25 November 2014 17:43
> To: slurm-dev
> Subject: [slurm-dev] Re: [ sshare ] RAW Usage
>  
> Raw usage is a long double and the time added by jobs can be off by a few 
> seconds.  You can take a look at _apply_new_usage() in 
> src/plugins/priority/multifactor/priority_multifactor.c to see exactly what 
> happens.
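> Roughly (a sketch, not the exact code path), each PriorityCalcPeriod the 
> plugin does something like
> 
> usage = usage * 0.5^(PriorityCalcPeriod / PriorityDecayHalfLife) + cpu_seconds_run_this_period
> 
> with the decay factor treated as 1 (no decay) when PriorityDecayHalfLife=0.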
> 
> Ryan
> 
> On 11/25/2014 10:34 AM, Roshan Mathew wrote:
> Hello SLURM users,
>  
> http://slurm.schedmd.com/sshare.html
> Raw Usage
> The number of cpu-seconds of all the jobs that charged the account by the 
> user. This number will decay over time when PriorityDecayHalfLife is defined.
> I am getting different Raw Usage values for the same job every time it is 
> executed. The job I am using is a CPU stress test that runs for 1 minute.
>  
> It would be very useful to understand the formula for how this RAW Usage is 
> calculated when we are using the plugin PriorityType=priority/multifactor.
>  
> Snip of my slurm.conf file:
>  
> # Activate the Multi-factor Job Priority Plugin with decay
> PriorityType=priority/multifactor
> 
> # apply no decay
> PriorityDecayHalfLife=0
> 
> PriorityCalcPeriod=1
> PriorityUsageResetPeriod=MONTHLY
> 
> # The larger the job, the greater its job size priority.
> PriorityFavorSmall=NO
> 
> # The job's age factor reaches 1.0 after waiting in the
> # queue for 2 weeks.
> PriorityMaxAge=14-0
> 
> # This next group determines the weighting of each of the
> # components of the Multi-factor Job Priority Plugin.
> # The default value for each of the following is 1.
> PriorityWeightAge=0
> PriorityWeightFairshare=100
> PriorityWeightJobSize=0
> PriorityWeightPartition=0
> PriorityWeightQOS=0 # don't use the qos factor
>  
> Thanks!
> 
> 
> 
