Trey,
In http://slurm.schedmd.com/fair_tree.html#fairshare, take a look at the
definition of "S". Basically, normalized shares only matter between
sibling associations, and the siblings' values sum to 1.0. If an
association has no siblings, its value is 1.0. If each of four
siblings in an account has the same Raw Shares value (as defined in
sacctmgr), the normalized shares value for each is 0.25. This is
because the Level Fairshare calculations are done only within an
account, comparing siblings to each other. Note that Norm Usage is still
presented in sshare but is not used in the calculations.
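As a rough illustration of the sibling-only normalization described above, here is a minimal Python sketch. The helper name normalized_shares is hypothetical (it is not part of Slurm), and the raw share values for the children of "science" are copied from listing [2] in the quoted message below.

```python
def normalized_shares(raw_shares):
    """Map each sibling to its raw shares divided by the siblings' total."""
    total = sum(raw_shares.values())
    return {name: shares / total for name, shares in raw_shares.items()}


# Four siblings with equal Raw Shares each get 0.25.
equal = normalized_shares({"a": 1, "b": 1, "c": 1, "d": 1})

# An association with no siblings always gets 1.0.
only_child = normalized_shares({"hepx": 1})

# Children of "science", raw shares taken from listing [2].
science = normalized_shares(
    {"acad": 10, "chem": 10, "iamcs": 10,
     "math-dept": 20, "physics": 700, "stat": 10}
)

for name, value in science.items():
    print(f"{name:10s} {value:.6f}")
```

The printed values should line up with the Norm Shares column that sshare reports for those accounts under FAIR_TREE, for example 700/760 for physics.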
The sshare manpage has a section about the Fair Tree modifications to
existing columns:
http://slurm.schedmd.com/sshare.html#SECTION_FAIR_TREE%20MODIFICATIONS
Ryan
On 06/03/2015 02:47 PM, Trey Dockendorf wrote:
FAIR_TREE in SLURM 14.11
My site is currently on 14.03.10 and we are evaluating and testing
14.11.7 as well as moving from
PriorityFlags=DEPTH_OBLIVIOUS,SMALL_RELATIVE_TO_TIME to using
PriorityFlags=FAIR_TREE,SMALL_RELATIVE_TO_TIME.
Our account hierarchy is very deep and is intended to represent the
org structure of departments and research organizations that are using
our cluster [1]. We were able to make the normalized share ratio
match up so all non-stakeholders were equal (0.000323) and all
stakeholders had the correct ratio based on their contributions to the
cluster. The Shares value assigned represents CPUs funded. All the
CPUs no longer belonging to stakeholders were given to the "mgmt"
group so that the Shares given to the top level (tamu) had a
meaningful value when divided up amongst all the accounts.
While testing FAIR_TREE I noticed the normalized shares were
drastically different [2]. In particular, the current stakeholders
(idhmc and hepx) both ended up with 1.0. I'm guessing this is due to
their having no sibling accounts.
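To make the difference concrete, the sketch below reproduces both Norm Shares values for hepx. It assumes the values in [1] come from multiplying each level's sibling fraction down the path from root; that is inferred from the posted numbers, not taken from the Slurm source. The variable names are illustrative only, and the sibling totals (3098 at the top level, 2996 under tamu, 760 under science) are summed from the Raw Shares column in [1].

```python
from math import prod

# (own raw shares, total raw shares of self plus siblings) at each level
# of the path root -> tamu -> science -> physics -> hepx, read off [1].
path_to_hepx = [(3096, 3098), (760, 2996), (700, 760), (700, 700)]

# Assumption: the 14.03-style value is the product of the sibling
# fractions along the whole path.
cumulative = prod(own / total for own, total in path_to_hepx)

# Under FAIR_TREE only the fraction among direct siblings is reported.
own, total = path_to_hepx[-1]
sibling_local = own / total

print(f"{cumulative:.6f}")     # 0.233494, the hepx value in [1]
print(f"{sibling_local:.6f}")  # 1.000000, the hepx value in [2]
```

Because hepx is the only child of physics, its sibling-local fraction is 1.0 regardless of its raw share value, which is why it reads 1.0 in [2] even with Raw Shares set to 1.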
The docs for FAIR_TREE only describe the formula used to calculate the
Level FairShare. Does the method for calculating normalized shares
change for FAIR_TREE? Is the hierarchy we are using not a good fit for
FAIR_TREE? The description and benefits of FAIR_TREE appeal to our
use case, so modifying our hierarchy is within the realm of things I'm
willing to change.
Any advice on migrating into FAIR_TREE is more than welcome. Right
now I've been running "sleep" jobs under different UIDs to simulate
usage to try and work out how we may need to adjust things for a
migration to FAIR_TREE.
I used the attached spreadsheet to work out the share values we are
using with 14.03.10.
Thanks,
- Trey
[1]:
Account              User       Raw Shares Norm Shares   Raw Usage Effectv Usage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root                                          1.000000   114089982      1.000000   0.870551
 root                root                1    0.000323           0      0.000000   1.000000
 grid                                    1    0.000323        3688      0.000032   0.986174
  cms                                   10    0.000269        3688      0.000027   0.986155
  suragrid                               1    0.000027           0      0.000000   1.000000
 tamu                                 3096    0.999354   114086294      0.999968   0.870477
  agriculture                           20    0.006671        2697      0.000024   0.999507
   aglife                                1    0.003336        2697      0.000012   0.999507
   genomics                              1    0.003336           0      0.000000   1.000000
  engineering                           10    0.003336           0      0.000000   1.000000
   pete                                  1    0.003336           0      0.000000   1.000000
  general                               10    0.003336        5542      0.000049   0.997977
  geo                                   10    0.003336           2      0.000000   0.999999
   atmo                                  1    0.003336           2      0.000000   0.999999
  liberalarts                          128    0.042696           0      0.000000   1.000000
   idhmc                                 1    0.042696           0      0.000000   1.000000
  mgmt                                2058    0.686472          16      0.000000   1.000000
  science                              760    0.253508   114078034      0.999895   0.578806
   acad                                 10    0.003336           0      0.000000   1.000000
   chem                                 10    0.003336           0      0.000000   1.000000
   iamcs                                10    0.003336     3506549      0.030649   0.279777
   math-dept                            20    0.006671    11735411      0.102422   0.119035
    math                                10    0.003336    11735411      0.102422   0.014169
    secant                              10    0.003336           0      0.000000   1.000000
   physics                             700    0.233494    98836073      0.919795   0.579205
    hepx                               700    0.233494    98836073      0.919795   0.579205
   stat                                 10    0.003336           0      0.000000   1.000000
    carroll                             10    0.003336           0      0.000000   1.000000
[2]:
Account              User       Raw Shares Norm Shares   Raw Usage Effectv Usage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root                                          0.000000       53229      1.000000
 root                root                1    0.000323           0      0.000000   1.000000
 grid                                    1    0.000323           0      0.000000
  cms                                   10    0.909091           0      0.000000
  suragrid                               1    0.090909           0      0.000000
 tamu                                 3096    0.999354       53229      1.000000
  agriculture                           20    0.006676           0      0.000000
   aglife                                1    0.500000           0      0.000000
   genomics                              1    0.500000           0      0.000000
  engineering                           10    0.003338           0      0.000000
   pete                                  1    1.000000           0      0.000000
  general                               10    0.003338        6326      0.118860
  geo                                   10    0.003338           0      0.000000
   atmo                                  1    1.000000           0      0.000000
  liberalarts                          128    0.042724       13122      0.246522
   idhmc                                 1    1.000000       13122      1.000000
  mgmt                                2058    0.686916       20984      0.394237
  science                              760    0.253672       12795      0.240382
   acad                                 10    0.013158           0      0.000000
   chem                                 10    0.013158           0      0.000000
   iamcs                                10    0.013158           0      0.000000
   math-dept                            20    0.026316           0      0.000000
    math                                10    0.500000           0      0.000000
    secant                              10    0.500000           0      0.000000
   physics                             700    0.921053       12795      1.000000
    hepx                                 1    1.000000       12795      1.000000
   stat                                 10    0.013158           0      0.000000
    carroll                              1    1.000000           0      0.000000
=============================
Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: treyd...@tamu.edu
Jabber: treyd...@tamu.edu
--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University