Hello Mathew, just to check the basics: did you wait a few minutes before running sshare? As far as I remember, the RawUsage value is updated every 5 minutes by default, so these erroneous values might be caused by a measurement taken while the jobs were still running.
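[Editorial aside, not part of the original thread: the effect Lech describes can be illustrated with a toy simulation. If usage is credited only at discrete calc-period boundaries, a snapshot taken mid-window sees nothing yet. The function below is a hypothetical sketch with illustrative numbers, not Slurm's actual accounting code.]

```python
# Sketch: usage credited only at calc-period boundaries, so a snapshot
# taken before the first boundary undercounts a job that already ran.
# All numbers are illustrative assumptions, not Slurm internals.

def raw_usage_at(snapshot_s, job_runtime_s=60, calc_period_s=300):
    """CPU-seconds credited by snapshot time, assuming one CPU and
    usage applied only at calc-period boundaries."""
    credited = 0
    for boundary in range(calc_period_s, snapshot_s + 1, calc_period_s):
        # Credit the run time that fell inside the window ending here.
        window_start = boundary - calc_period_s
        run_in_window = max(0, min(job_runtime_s, boundary) - window_start)
        credited += run_in_window
    return credited

print(raw_usage_at(60))    # snapshot before the first boundary -> 0
print(raw_usage_at(300))   # after the first boundary -> 60
```

With the poster's PriorityCalcPeriod=1 (one minute) the window would be 60 seconds rather than 300, but the sampling effect is the same.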
Regards,
Lech

On 26.11.2014 at 12:14, Roshan Mathew <r.t.mat...@bath.ac.uk> wrote:

> My fairshare test scenario - as it stands, the fairshare is not distributed correctly.
>
> *Accounts*
>
> [root@slurm-login slurm-scripts]# sacctmgr list accounts
>    Account                Descr                  Org
> ---------- -------------------- --------------------
>   premium1      primary account                 root
>   premium2      primary account                 root
>       root default root account                 root
>
> *Users*
>
> [root@slurm-login slurm-scripts]# sacctmgr list users
>       User   Def Acct     Admin
> ---------- ---------- ---------
>      mm339   premium1      None
>      sy223   premium2      None
>
> *Initial Shares*
>
> [root@slurm-login slurm-scripts]# sshare
>              Account       User Raw Shares Norm Shares   Raw Usage Effectv Usage  FairShare
> -------------------- ---------- ---------- ----------- ----------- ------------- ----------
> root                                          1.000000           0      1.000000   0.500000
>  premium1                              50     0.500000           0      0.000000   1.000000
>  premium2                              50     0.500000           0      0.000000   1.000000
>
> *Job script*
>
> [root@slurm-login slurm-scripts]# cat stress.slurm
> #!/bin/bash
>
> #SBATCH --nodes=1
> #SBATCH --ntasks=1
> #SBATCH --job-name=stress
> #SBATCH --time=10
> #SBATCH --output=stress.%j-out
> #SBATCH --error=stress.%j-out
>
> time /opt/shared/apps/stress/app/bin/stress --cpu 1 --timeout 1m
>
> *Job Submission*
>
> [root@slurm-login slurm-scripts]# runuser sy223 -c 'sbatch stress.slurm'
> Submitted batch job 2
> [root@slurm-login slurm-scripts]# runuser mm339 -c 'sbatch stress.slurm'
> Submitted batch job 3
>
> *SACCT Information*
>
> [root@slurm-login slurm-scripts]# sacct --format Jobid,jobname,partition,account,alloccpus,state,exitcode,cputimeraw
>        JobID    JobName  Partition    Account  AllocCPUS      State ExitCode CPUTimeRAW
> ------------ ---------- ---------- ---------- ---------- ---------- -------- ----------
> 2                stress      batch   premium2          1  COMPLETED      0:0         60
> 2.batch           batch              premium2          1  COMPLETED      0:0         60
> 3                stress      batch   premium1          1  COMPLETED      0:0         60
> 3.batch           batch              premium1          1  COMPLETED      0:0         60
> *SSHARE Information*
>
> [root@slurm-login slurm-scripts]# sshare
>              Account       User Raw Shares Norm Shares   Raw Usage Effectv Usage  FairShare
> -------------------- ---------- ---------- ----------- ----------- ------------- ----------
> root                                          1.000000          79      1.000000   0.500000
>  premium1                              50     0.500000          36      0.455696   0.531672
>  premium2                              50     0.500000          43      0.544304   0.470215
>
> *Slurm.conf - priority/multifactor*
>
> # Activate the Multi-factor Job Priority Plugin with decay
> PriorityType=priority/multifactor
>
> # apply no decay
> PriorityDecayHalfLife=0
> PriorityCalcPeriod=1
> # reset usage after 1 month
> PriorityUsageResetPeriod=MONTHLY
>
> # The larger the job, the greater its job size priority.
> PriorityFavorSmall=NO
>
> # The job's age factor reaches 1.0 after waiting in the
> # queue for 2 weeks.
> PriorityMaxAge=14-0
>
> # This next group determines the weighting of each of the
> # components of the Multi-factor Job Priority Plugin.
> # The default value for each of the following is 1.
> PriorityWeightAge=0
> PriorityWeightFairshare=100
> PriorityWeightJobSize=0
> PriorityWeightPartition=0
> PriorityWeightQOS=0 # don't use the qos factor
>
> *Questions*
>
> 1. Given that I have set PriorityDecayHalfLife=0, i.e. no decay applied at any stage, shouldn't both jobs have the same Raw Usage reported by sshare?
>
> 2. Also, shouldn't CPUTimeRAW in sacct be the same as Raw Usage in sshare?


> From: Skouson, Gary B <gary.skou...@pnnl.gov>
> Sent: 25 November 2014 21:09
> To: slurm-dev
> Subject: [slurm-dev] Re: [ sshare ] RAW Usage
>
> I believe that the fair share data is kept by slurmctld in memory. As far as I could tell from the code, it should be checkpointing the info to the assoc_usage file wherever Slurm is saving state information. I couldn't find any docs on that; you'd have to check the code for more information.
>
> However, if you just want to see what was used, you can get the raw usage using sacct.
> For example, for a given job, you can do something like:
>
> sacct -X -a -j 1182128 --format Jobid,jobname,partition,account,alloccpus,state,exitcode,cputimeraw
>
> -----
> Gary Skouson
>
>
> From: Roshan Mathew [mailto:r.t.mat...@bath.ac.uk]
> Sent: Tuesday, November 25, 2014 9:51 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: [ sshare ] RAW Usage
>
> Thanks Ryan,
>
> Is this value stored anywhere in the SLURM accounting DB? I could not find any value for the job that corresponds to this raw usage.
>
> Roshan
>
> From: Ryan Cox <ryan_...@byu.edu>
> Sent: 25 November 2014 17:43
> To: slurm-dev
> Subject: [slurm-dev] Re: [ sshare ] RAW Usage
>
> Raw usage is a long double, and the time added by jobs can be off by a few seconds. You can take a look at _apply_new_usage() in src/plugins/priority/multifactor/priority_multifactor.c to see exactly what happens.
>
> Ryan
>
> On 11/25/2014 10:34 AM, Roshan Mathew wrote:
>
> Hello SLURM users,
>
> http://slurm.schedmd.com/sshare.html
>
> Raw Usage
> The number of cpu-seconds of all the jobs that charged the account by the user. This number will decay over time when PriorityDecayHalfLife is defined.
>
> I am getting different Raw Usage values for the same job every time it is executed. The job I am using is a CPU stress test for 1 minute.
>
> It would be very useful to understand the formula for how this Raw Usage is calculated when we are using the plugin PriorityType=priority/multifactor.
>
> Snip of my slurm.conf file:
>
> # Activate the Multi-factor Job Priority Plugin with decay
> PriorityType=priority/multifactor
>
> # apply no decay
> PriorityDecayHalfLife=0
>
> PriorityCalcPeriod=1
> PriorityUsageResetPeriod=MONTHLY
>
> # The larger the job, the greater its job size priority.
> PriorityFavorSmall=NO
>
> # The job's age factor reaches 1.0 after waiting in the
> # queue for 2 weeks.
> PriorityMaxAge=14-0
>
> # This next group determines the weighting of each of the
> # components of the Multi-factor Job Priority Plugin.
> # The default value for each of the following is 1.
> PriorityWeightAge=0
> PriorityWeightFairshare=100
> PriorityWeightJobSize=0
> PriorityWeightPartition=0
> PriorityWeightQOS=0 # don't use the qos factor
>
> Thanks!
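[Editorial note, not part of the original thread: while the Raw Usage discrepancy is the open question, the FairShare column in the sshare output earlier in the thread is internally consistent with the classic fairshare formula F = 2^(-EffectvUsage/NormShares) from the Slurm multifactor priority documentation. A quick check against the posted numbers:]

```python
# Check sshare's FairShare column against F = 2 ** (-effective_usage / norm_shares),
# using the values from the sshare output earlier in the thread.
def fairshare_factor(effective_usage, norm_shares):
    return 2 ** (-effective_usage / norm_shares)

print(fairshare_factor(0.455696, 0.5))  # premium1: ~0.5317 (sshare shows 0.531672)
print(fairshare_factor(0.544304, 0.5))  # premium2: ~0.4702 (sshare shows 0.470215)
```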