My fairshare test scenario - as it stands, fairshare is not being distributed
correctly.

*Accounts*

[root@slurm-login slurm-scripts]# sacctmgr list accounts
   Account                Descr                  Org
---------- -------------------- --------------------
  premium1      primary account                  root
  premium2      primary account                  root
      root default root account                  root

*Users*

[root@slurm-login slurm-scripts]# sacctmgr list users
      User   Def Acct     Admin
---------- ---------- ---------
     mm339   premium1      None
     sy223   premium2      None

*Initial Shares*

[root@slurm-login slurm-scripts]# sshare
             Account       User Raw Shares Norm Shares   Raw Usage Effectv Usage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root                                          1.000000           0      1.000000   0.500000
 premium1                               50    0.500000           0      0.000000   1.000000
 premium2                               50    0.500000           0      0.000000   1.000000
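(With equal raw shares, each account's Norm Shares works out to
50/(50+50) = 0.5, and with zero Raw Usage both accounts start at the maximum
fairshare factor of 1.0, as expected.)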


*Job script*

[root@slurm-login slurm-scripts]# cat stress.slurm
#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --job-name=stress
#SBATCH --time=10
#SBATCH --output=stress.%j-out
#SBATCH --error=stress.%j-out

time /opt/shared/apps/stress/app/bin/stress --cpu 1 --timeout 1m
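(stress pins one CPU for one minute, so each job should charge roughly
AllocCPUS x elapsed = 1 x 60 = 60 cpu-seconds, which matches the CPUTimeRAW
reported by sacct below.)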


*Job Submission*

[root@slurm-login slurm-scripts]# runuser sy223 -c 'sbatch stress.slurm'
Submitted batch job 2
[root@slurm-login slurm-scripts]# runuser mm339 -c 'sbatch stress.slurm'
Submitted batch job 3


*SACCT Information*

[root@slurm-login slurm-scripts]# sacct --format Jobid,jobname,partition,account,alloccpus,state,exitcode,cputimeraw
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode CPUTimeRAW
------------ ---------- ---------- ---------- ---------- ---------- -------- ----------
2                stress      batch   premium2          1  COMPLETED      0:0         60
2.batch           batch              premium2          1  COMPLETED      0:0         60
3                stress      batch   premium1          1  COMPLETED      0:0         60
3.batch           batch              premium1          1  COMPLETED      0:0         60


*SSHARE Information*

[root@slurm-login slurm-scripts]# sshare
             Account       User Raw Shares Norm Shares   Raw Usage Effectv Usage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root                                          1.000000          79      1.000000   0.500000
 premium1                               50    0.500000          36      0.455696   0.531672
 premium2                               50    0.500000          43      0.544304   0.470215
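For what it's worth, the FairShare column is self-consistent with the classic
fairshare formula F = 2^(-EffectvUsage/NormShares) (my assumption, from the
multifactor priority docs), so the factors do follow from the usage numbers:

[root@slurm-login slurm-scripts]# echo 'e(-(0.455696/0.5)*l(2))' | bc -l   # premium1, ~0.531672
[root@slurm-login slurm-scripts]# echo 'e(-(0.544304/0.5)*l(2))' | bc -l   # premium2, ~0.470215

It is only the Raw Usage split (36 vs 43) that looks wrong to me.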


*Slurm.conf - priority/multifactor*

# Activate the Multi-factor Job Priority Plugin with decay
PriorityType=priority/multifactor

# apply no decay
PriorityDecayHalfLife=0
PriorityCalcPeriod=1
# reset usage after 1 month
PriorityUsageResetPeriod=MONTHLY

# The larger the job, the greater its job size priority.
PriorityFavorSmall=NO

# The job's age factor reaches 1.0 after waiting in the
# queue for 2 weeks.
PriorityMaxAge=14-0

# This next group determines the weighting of each of the
# components of the Multi-factor Job Priority Plugin.
# The default value for each of the following is 1.
PriorityWeightAge=0
PriorityWeightFairshare=100
PriorityWeightJobSize=0
PriorityWeightPartition=0
PriorityWeightQOS=0 # don't use the qos factor
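With all the other weights zeroed, the multifactor sum should collapse to just
the fairshare term, i.e. Job_priority = PriorityWeightFairshare * FairShare
(assuming the standard formula from the multifactor docs), so after the runs
above:

[root@slurm-login slurm-scripts]# echo '100 * 0.531672' | bc   # next premium1 job: ~53
[root@slurm-login slurm-scripts]# echo '100 * 0.470215' | bc   # next premium2 job: ~47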


*Questions*

1. Given that I have set PriorityDecayHalfLife=0, i.e. no decay is applied at
any stage, shouldn't both jobs have the same Raw Usage reported by sshare
(instead of the 36 and 43 shown above)?

2. Also, shouldn't CPUTimeRAW in sacct (60 for each job) be the same as Raw
Usage in sshare?



________________________________
From: Skouson, Gary B <gary.skou...@pnnl.gov>
Sent: 25 November 2014 21:09
To: slurm-dev
Subject: [slurm-dev] Re: [ sshare ] RAW Usage

I believe that the fairshare data is kept by slurmctld in memory.  As far as I
could tell from the code, it should be checkpointing the info to the
assoc_usage file wherever Slurm is saving state information.  I couldn't find
any docs on that; you'd have to check the code for more information.
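
If you want to look at it on disk, the checkpoint should be under whatever
StateSaveLocation points to in slurm.conf (the path below is just an example,
not necessarily yours):

scontrol show config | grep StateSaveLocation
ls -l /var/spool/slurmctld/assoc_usage   # assuming StateSaveLocation=/var/spool/slurmctld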

However, if you just want to see what was used, you can get the raw usage using 
sacct.  For example, for a given job, you can do something like:

sacct -X -a -j 1182128 --format Jobid,jobname,partition,account,alloccpus,state,exitcode,cputimeraw
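
(-X collapses the output to one line per job allocation, dropping the .batch
step rows, and -a includes jobs from all users.)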

-----
Gary Skouson


From: Roshan Mathew [mailto:r.t.mat...@bath.ac.uk]
Sent: Tuesday, November 25, 2014 9:51 AM
To: slurm-dev
Subject: [slurm-dev] Re: [ sshare ] RAW Usage


Thanks Ryan,

Is this value stored anywhere in the Slurm accounting DB? I could not find any
value for the job that corresponds to this Raw Usage.

Roshan

________________________________
From: Ryan Cox <ryan_...@byu.edu>
Sent: 25 November 2014 17:43
To: slurm-dev
Subject: [slurm-dev] Re: [ sshare ] RAW Usage

Raw usage is a long double and the time added by jobs can be off by a few 
seconds.  You can take a look at _apply_new_usage() in 
src/plugins/priority/multifactor/priority_multifactor.c to see exactly what 
happens.
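
Roughly, and this is a toy sketch rather than the real C code, the decay
thread wakes up every PriorityCalcPeriod and charges each running job for the
slice of wall time since the previous pass, something like:

#!/bin/bash
# Toy model of the accounting pass (a simplification, not the actual code).
job_start=25; job_end=85; alloc_cpus=1   # hypothetical 60 s, 1-CPU job
raw_usage=0; last_pass=0
for pass in 60 120; do                   # passes at t=60 and t=120
    s=$(( job_start > last_pass ? job_start : last_pass ))
    e=$(( job_end < pass ? job_end : pass ))
    (( e > s )) && raw_usage=$(( raw_usage + alloc_cpus * (e - s) ))
    last_pass=$pass
done
echo "charged $raw_usage cpu-seconds"    # charged 60 cpu-seconds

The real pass boundaries don't line up with the job's own start and end
records, and the accumulator is a long double, which is why the totals can
drift a few seconds from CPUTimeRAW.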

Ryan
On 11/25/2014 10:34 AM, Roshan Mathew wrote:
Hello SLURM users,

From http://slurm.schedmd.com/sshare.html:

    Raw Usage
        The number of cpu-seconds of all the jobs that charged the account by
        the user. This number will decay over time when PriorityDecayHalfLife
        is defined.
I am getting different Raw Usage values for the same job every time it is
executed. The job I am using is a CPU stress test that runs for 1 minute.

It would be very useful to understand the formula for how this Raw Usage is
calculated when using the priority/multifactor plugin.

Snippet of my slurm.conf file:

# Activate the Multi-factor Job Priority Plugin with decay
PriorityType=priority/multifactor

# apply no decay
PriorityDecayHalfLife=0

PriorityCalcPeriod=1
PriorityUsageResetPeriod=MONTHLY

# The larger the job, the greater its job size priority.
PriorityFavorSmall=NO

# The job's age factor reaches 1.0 after waiting in the
# queue for 2 weeks.
PriorityMaxAge=14-0

# This next group determines the weighting of each of the
# components of the Multi-factor Job Priority Plugin.
# The default value for each of the following is 1.
PriorityWeightAge=0
PriorityWeightFairshare=100
PriorityWeightJobSize=0
PriorityWeightPartition=0
PriorityWeightQOS=0 # don't use the qos factor

Thanks!



