My fairshare test scenario. As it stands, the fairshare is not distributed correctly.
*Accounts*

[root@slurm-login slurm-scripts]# sacctmgr list accounts
   Account                Descr                  Org
---------- -------------------- --------------------
  premium1      primary account                 root
  premium2      primary account                 root
      root default root account                 root

*Users*

[root@slurm-login slurm-scripts]# sacctmgr list users
      User   Def Acct     Admin
---------- ---------- ---------
     mm339   premium1      None
     sy223   premium2      None

*Initial Shares*

[root@slurm-login slurm-scripts]# sshare
             Account       User Raw Shares Norm Shares   Raw Usage Effectv Usage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root                                          1.000000           0      1.000000   0.500000
 premium1                               50    0.500000           0      0.000000   1.000000
 premium2                               50    0.500000           0      0.000000   1.000000

*Job script*

[root@slurm-login slurm-scripts]# cat stress.slurm
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --job-name=stress
#SBATCH --time=10
#SBATCH --output=stress.%j-out
#SBATCH --error=stress.%j-out
time /opt/shared/apps/stress/app/bin/stress --cpu 1 --timeout 1m

*Job Submission*

[root@slurm-login slurm-scripts]# runuser sy223 -c 'sbatch stress.slurm'
Submitted batch job 2
[root@slurm-login slurm-scripts]# runuser mm339 -c 'sbatch stress.slurm'
Submitted batch job 3

*SACCT Information*

[root@slurm-login slurm-scripts]# sacct --format Jobid,jobname,partition,account,alloccpus,state,exitcode,cputimeraw
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode CPUTimeRAW
------------ ---------- ---------- ---------- ---------- ---------- -------- ----------
2                stress      batch   premium2          1  COMPLETED      0:0         60
2.batch           batch              premium2          1  COMPLETED      0:0         60
3                stress      batch   premium1          1  COMPLETED      0:0         60
3.batch           batch              premium1          1  COMPLETED      0:0         60

*SSHARE Information*

[root@slurm-login slurm-scripts]# sshare
             Account       User Raw Shares Norm Shares   Raw Usage Effectv Usage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root                                          1.000000          79      1.000000   0.500000
 premium1                               50    0.500000          36      0.455696   0.531672
 premium2                               50    0.500000          43      0.544304   0.470215

*Slurm.conf - priority/multifactor*

# Activate the Multi-factor Job Priority Plugin with decay
PriorityType=priority/multifactor
# apply no decay
PriorityDecayHalfLife=0
PriorityCalcPeriod=1
# reset usage after 1 month
PriorityUsageResetPeriod=MONTHLY
# The larger the job, the greater its job size priority.
PriorityFavorSmall=NO
# The job's age factor reaches 1.0 after waiting in the
# queue for 2 weeks.
PriorityMaxAge=14-0
# This next group determines the weighting of each of the
# components of the Multi-factor Job Priority Plugin.
# The default value for each of the following is 1.
PriorityWeightAge=0
PriorityWeightFairshare=100
PriorityWeightJobSize=0
PriorityWeightPartition=0
PriorityWeightQOS=0 # don't use the qos factor

*Questions*

1. Given that I have set PriorityDecayHalfLife=0, i.e. no decay is applied at any stage, shouldn't both jobs have the same Raw Usage reported by sshare?
2. Also, shouldn't CPUTimeRAW in sacct be the same as Raw Usage in sshare?

________________________________
From: Skouson, Gary B <gary.skou...@pnnl.gov>
Sent: 25 November 2014 21:09
To: slurm-dev
Subject: [slurm-dev] Re: [ sshare ] RAW Usage

I believe the share data shown by sshare is kept by slurmctld in memory. As far as I could tell from the code, it should be checkpointing that info to the assoc_usage file wherever Slurm saves its state information. I couldn't find any docs on that; you'd have to check the code for more information.

However, if you just want to see what was used, you can get the raw usage from sacct. For example, for a given job, you can do something like:

sacct -X -a -j 1182128 --format Jobid,jobname,partition,account,alloccpus,state,exitcode,cputimeraw

-----
Gary Skouson

From: Roshan Mathew [mailto:r.t.mat...@bath.ac.uk]
Sent: Tuesday, November 25, 2014 9:51 AM
To: slurm-dev
Subject: [slurm-dev] Re: [ sshare ] RAW Usage

Thanks Ryan,

Is this value stored anywhere in the SLURM accounting DB?
I could not find any value for the job that corresponds to this Raw Usage.

Roshan

________________________________
From: Ryan Cox <ryan_...@byu.edu>
Sent: 25 November 2014 17:43
To: slurm-dev
Subject: [slurm-dev] Re: [ sshare ] RAW Usage

Raw usage is a long double, and the time added by jobs can be off by a few seconds. You can take a look at _apply_new_usage() in src/plugins/priority/multifactor/priority_multifactor.c to see exactly what happens.

Ryan

On 11/25/2014 10:34 AM, Roshan Mathew wrote:

Hello SLURM users,

http://slurm.schedmd.com/sshare.html

Raw Usage
The number of cpu-seconds of all the jobs that charged the account by the user. This number will decay over time when PriorityDecayHalfLife is defined.

I am getting a different Raw Usage value for the same job every time it is executed. The job I am using is a CPU stress test that runs for 1 minute.

It would be very useful to understand the formula used to calculate this Raw Usage when using the plugin PriorityType=priority/multifactor.

Snip of my slurm.conf file:

# Activate the Multi-factor Job Priority Plugin with decay
PriorityType=priority/multifactor
# apply no decay
PriorityDecayHalfLife=0
PriorityCalcPeriod=1
PriorityUsageResetPeriod=MONTHLY
# The larger the job, the greater its job size priority.
PriorityFavorSmall=NO
# The job's age factor reaches 1.0 after waiting in the
# queue for 2 weeks.
PriorityMaxAge=14-0
# This next group determines the weighting of each of the
# components of the Multi-factor Job Priority Plugin.
# The default value for each of the following is 1.
PriorityWeightAge=0
PriorityWeightFairshare=100
PriorityWeightJobSize=0
PriorityWeightPartition=0
PriorityWeightQOS=0 # don't use the qos factor

Thanks!
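As a cross-check on the sshare figures in this thread, the Effectv Usage and FairShare columns can be recomputed from Raw Usage. The sketch below assumes the classic (pre-Fair Tree) multifactor formula FairShare = 2^(-EffectvUsage / NormShares), and assumes that for an account directly under root, EffectvUsage is that account's Raw Usage divided by root's Raw Usage. Both assumptions are consistent with the sshare output above (36 / 79 = 0.455696), but they are not confirmed anywhere in the thread. The helper name `fairshare` is hypothetical and is not a Slurm command.

```bash
#!/bin/bash
# Sketch only: recompute sshare's "Effectv Usage" and "FairShare" columns.
# Assumptions (not confirmed by this thread):
#   - classic multifactor fairshare: FairShare = 2^(-EffectvUsage / NormShares)
#   - for an account directly under root:
#       EffectvUsage = RawUsage(account) / RawUsage(root)
# "fairshare" is a hypothetical helper name, not a Slurm command.

# Arguments: label, account raw usage, root raw usage, normalized shares
fairshare() {
    awk -v label="$1" -v raw="$2" -v total="$3" -v norm="$4" 'BEGIN {
        # Effective usage is 0 when nothing has been charged yet.
        eff = (total > 0) ? raw / total : 0
        printf "%s %.6f %.6f\n", label, eff, 2 ^ (-eff / norm)
    }'
}

# Raw Usage values from the second sshare listing (root = 36 + 43 = 79).
fairshare premium1 36 79 0.5
fairshare premium2 43 79 0.5

# Hypothetical case: each account charged the 60 cpu-seconds that
# sacct reports as CPUTimeRAW (root = 120).
fairshare premium1 60 120 0.5
fairshare premium2 60 120 0.5
```

The first two lines print `premium1 0.455696 0.531672` and `premium2 0.544304 0.470215`, matching sshare, and zero usage gives `0.000000 1.000000` as in the initial listing. Under these assumptions the FairShare values follow correctly from the recorded usage; the discrepancy the questions point at lies in Raw Usage itself (36 and 43 versus 60 cpu-seconds each). Had both accounts been charged 60, both would sit at 0.500000.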