Re: [slurm-dev] Re: FAIR_TREE in SLURM 14.11
Ryan,
Thanks. That clarification, plus the slide I found at
http://slurm.schedmd.com/SUG14/fair_tree.pdf, gave me the insight I
needed. With our deep hierarchy, the goal is for a deeply nested
account to be treated as a sibling of accounts elsewhere in the
tree. The key was setting FairShare=parent on the accounts that
contain no users and exist only for organizational purposes.
Once I made that change, the normalized shares of the accounts that
contain users looked correct. The basic idea is that all accounts
containing users end up being treated as siblings under the root
account, and FairShare=parent achieves exactly that in 14.11.
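A minimal sketch of that effect, under stated assumptions rather than from Slurm's source: an account with FairShare=parent is modeled as transparent, so its children compete with the siblings of the nearest ancestor that has numeric shares. The hierarchy is inferred from the sshare listings in this thread, the single share for the root user is taken from the earlier listings, and the helper names are hypothetical.

```python
# Illustrative model only, not Slurm code. Assumption: an account whose
# shares are "parent" is transparent, so its children compete with the
# siblings of the nearest ancestor that has numeric shares.

PARENT = "parent"

# (name, raw_shares, children); hierarchy inferred from the listings.
TREE = ("root", None, [
    ("root-user", 1, []),  # assumed: the root user holds 1 share
    ("grid", PARENT, [("cms", 10, []), ("suragrid", 10, [])]),
    ("tamu", PARENT, [
        ("agriculture", PARENT, [("aglife", 10, []), ("genomics", 10, [])]),
        ("engineering", PARENT, [("pete", 10, [])]),
        ("general", 10, []),
        ("geo", PARENT, [("atmo", 10, [])]),
        ("liberalarts", PARENT, [("idhmc", 128, [])]),
        ("mgmt", 2058, []),
        ("science", PARENT, [
            ("acad", 10, []), ("chem", 10, []), ("iamcs", 10, []),
            ("math-dept", PARENT, [("math", 10, []), ("secant", 10, [])]),
            ("physics", PARENT, [("hepx", 700, [])]),
            ("stat", PARENT, [("carroll", 10, [])]),
        ]),
    ]),
])


def effective_siblings(node):
    """Collect (name, shares) pairs that compete directly under node."""
    _, _, children = node
    found = []
    for child in children:
        name, shares, _ = child
        if shares == PARENT:
            found.extend(effective_siblings(child))  # look through it
        else:
            found.append((name, shares))
    return found


def normalized_shares(node):
    """Divide each effective sibling's shares by the siblings' total."""
    siblings = effective_siblings(node)
    total = sum(shares for _, shares in siblings)
    return {name: shares / total for name, shares in siblings}


norm = normalized_shares(TREE)
for name in ("general", "idhmc", "hepx", "mgmt"):
    print(f"{name:8s} {norm[name]:.6f}")
```

With the resulting total of 3017 raw shares, this reproduces the Norm Shares column of the sshare -l output: 0.003315, 0.042426, 0.232019 and 0.682135.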
The only issue I'm seeing may be a non-issue, but I'd like some
clarification to understand it better. I ran 100 5-minute jobs each as
"test-idhmc" (stakeholder) and "test-hepx" (stakeholder), and 50
5-minute jobs as "test-general" (non-stakeholder). I looked at the
sshare -l output for these 3 users and noticed that, while the
FairShare values are correctly ordered (test-hepx > test-idhmc >
test-general), the ratio between test-hepx and test-idhmc is not
proportional to their accounts' shares. The hepx account has 700
shares while the idhmc account has 128 shares. The ratio of the
accounts' normalized shares is correct, and so is the Level FS of each
account. The calculated FairShare for test-hepx is not much higher
than for test-idhmc, but it is still greater, so in the end I guess
that's what matters.
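One possible explanation, sketched under assumptions: the Fair Tree documentation describes the final FairShare factor as rank-based (roughly rank / number of user associations, after sorting by Level FS), which preserves ordering but not ratios. The numbers below are back-computed from the sshare -l output, not read from the cluster: the total of 636 user associations and the per-account user counts are inferences, and the function name is hypothetical.

```python
# Illustrative check only, not Slurm code. Assumption: after sorting by
# Level FS, Fair Tree hands out FairShare = rank / user_count, with the
# best-placed user at rank == user_count and the worst at rank == 1.

USER_COUNT = 636  # inferred from the smallest factor: 1 / 0.001572

# Accounts with usage, worst Level FS first, with their user counts.
# Counts are inferred from the test users' Norm Shares (1/0.007874,
# 1/0.062500, 1/0.012821), assuming one raw share per user.
ACCOUNTS_WORST_FIRST = [("general", 127), ("idhmc", 16), ("hepx", 78)]


def rank_test_users(accounts):
    """Rank of each account's test user, the only user with usage.

    Within an account the test user sorts below its idle peers, so it
    takes the lowest rank of that account's block.
    """
    ranks = {}
    users_below = 0
    for account, n_users in accounts:
        ranks[f"test-{account}"] = users_below + 1
        users_below += n_users
    return ranks


ranks = rank_test_users(ACCOUNTS_WORST_FIRST)
fairshare = {user: rank / USER_COUNT for user, rank in ranks.items()}
for user, factor in fairshare.items():
    print(f"{user:13s} rank={ranks[user]:3d} fairshare={factor:.6f}")
```

Under these assumptions the ranks come out as 1, 128 and 144, which give 0.001572, 0.201258 and 0.226415, matching the FairShare column. If that reading is right, the 700:128 share ratio shows up in Norm Shares and Level FS, while the final FairShare values only reflect each user's position in the sorted list.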
Thanks,
- Trey
# sshare -u test-hepx,test-idhmc,test-general -l
Account      User          Raw Shares  Norm Shares  Raw Usage  Norm Usage  Effectv Usage  FairShare  Level FS  GrpCPUMins  CPURunMins
-----------  ------------  ----------  -----------  ---------  ----------  -------------  ---------  --------  ----------  ----------
root                                      0.000000      11217                   1.000000                                            0
grid                           parent     0.000000          0    0.000000       0.000000                                            0
cms                                10     0.003315          0    0.000000       0.000000                  inf                       0
suragrid                           10     0.003315          0    0.000000       0.000000                  inf                       0
tamu                           parent     0.000000      11217    1.000000       1.000000                                            0
agriculture                    parent     0.000000          0    0.000000       0.000000                                            0
aglife                             10     0.003315          0    0.000000       0.000000                  inf                       0
genomics                           10     0.003315          0    0.000000       0.000000                  inf                       0
engineering                    parent     0.000000          0    0.000000       0.000000                                            0
pete                               10     0.003315          0    0.000000       0.000000                  inf                       0
general                            10     0.003315       2262    0.201728       0.201728             0.016431                       0
general      test-general           1     0.007874       2262    0.201728       1.000000   0.001572  0.007874                       0
geo                            parent     0.000000          0    0.000000       0.000000                                            0
atmo                               10     0.003315          0    0.000000       0.000000                  inf                       0
liberalarts                    parent     0.000000       4507    0.401794       0.401794                                            0
idhmc                             128     0.042426       4507    0.401794       0.401794             0.105592                       0
idhmc        test-idhmc             1     0.062500       4507    0.401794       1.000000   0.201258  0.062500                       0
mgmt                             2058     0.682135          0    0.000000       0.000000                  inf                       0
science                        parent     0.000000       4447    0.396478       0.396478                                            0
acad                               10     0.003315          0    0.000000       0.000000                  inf                       0
chem                               10     0.003315          0    0.000000       0.000000                  inf                       0
iamcs                              10     0.003315          0    0.000000       0.000000                  inf                       0
math-dept                      parent     0.000000          0    0.000000       0.000000                                            0
math                               10     0.003315          0    0.000000       0.000000                  inf                       0
secant                             10     0.003315          0    0.000000       0.000000                  inf                       0
physics                        parent     0.000000       4447    0.396478       0.396478                                            0
hepx                              700     0.232019       4447    0.396478       0.396478             0.585199                       0
hepx         test-hepx              1     0.012821       4447    0.396478       1.000000   0.226415  0.012821                       0
stat                           parent     0.000000          0    0.000000       0.000000                                            0
carroll                            10     0.003315          0    0.000000       0.000000                  inf                       0
=============================
Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: treyd...@tamu.edu
Jabber: treyd...@tamu.edu
On Thu, Jun 4, 2015 at 11:51 AM, Ryan Cox <ryan_...@byu.edu> wrote:
Trey,
In http://slurm.schedmd.com/fair_tree.html#fairshare, take a look
at the definition for "S". Basically, the normalized shares value
only matters between sibling associations, and the siblings' values
sum to 1.0. If an association has no siblings, its value is 1.0. If
each of the four siblings in an account has the same Raw Shares
value (as defined in sacctmgr), the normalized shares value for
each is 0.25. This is because the Level Fairshare calculations are
only done within an account, comparing siblings to each other. Note
that Norm Usage is still presented in sshare but is not used in the
calculations.
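As a rough illustration of "S" (a sketch, not Slurm's code): each association's raw shares are divided by the sum over its siblings, and Level FS is then S divided by that association's fraction of its siblings' usage. The "science" share values are taken from listing [2] in the original message quoted below; the function names and the four-sibling sample are made up for the example.

```python
import math


def normalized_shares(raw_shares):
    """Divide each sibling's raw shares by the siblings' total."""
    total = sum(raw_shares.values())
    return {name: shares / total for name, shares in raw_shares.items()}


def level_fs(norm_share, usage_fraction):
    """Level FS = S / U; treated as inf when there is no usage."""
    if usage_fraction == 0:
        return math.inf
    return norm_share / usage_fraction


# Four siblings with equal raw shares each get 0.25.
four_equal = normalized_shares({"a": 10, "b": 10, "c": 10, "d": 10})

# An association with no siblings gets 1.0 regardless of raw shares.
only_child = normalized_shares({"hepx": 1})

# Children of "science" as listed in [2].
science = normalized_shares({
    "acad": 10, "chem": 10, "iamcs": 10,
    "math-dept": 20, "physics": 700, "stat": 10,
})

print(four_equal["a"], only_child["hepx"])
print(f"physics S = {science['physics']:.6f}")
print("physics Level FS =", level_fs(science["physics"], 1.0))
print("acad Level FS    =", level_fs(science["acad"], 0.0))
```

This matches the 0.921053 and 0.013158 values shown for physics and acad in [2], and it is why a lone child account such as hepx or idhmc shows 1.0 there.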
The sshare manpage has a section about the Fair Tree modifications
to existing columns:
http://slurm.schedmd.com/sshare.html#SECTION_FAIR_TREE%20MODIFICATIONS
Ryan
On 06/03/2015 02:47 PM, Trey Dockendorf wrote:
My site is currently on 14.03.10 and we are evaluating and
testing 14.11.7 as well as moving from
PriorityFlags=DEPTH_OBLIVIOUS,SMALL_RELATIVE_TO_TIME to using
PriorityFlags=FAIR_TREE,SMALL_RELATIVE_TO_TIME.
Our account hierarchy is very deep and is intended to represent
the org structure of departments and research organizations that
are using our cluster [1]. We were able to make the normalized
share ratio match up so all non-stakeholders were equal
(0.000323) and all stakeholders had the correct ratio based on
their contributions to the cluster. The Shares value assigned
represents CPUs funded. All the CPUs no longer belonging to
stakeholders were given to the "mgmt" group so that the Shares
given to the top level (tamu) had a meaningful value when divided
up amongst all the accounts.
While testing FAIR_TREE I noticed the normalized shares were
drastically different [2]. In particular, the current
stakeholders (idhmc and hepx) both ended up with 1.0. I'm
guessing this is due to their having no sibling accounts.
The docs for FAIR_TREE only describe the formula used to
calculate the Level FairShare. Does the method for calculating
normalized shares change for FAIR_TREE? Is the hierarchy we are
using not a good fit for FAIR_TREE? The description and benefits
of FAIR_TREE appeal to our use case, so modifying our hierarchy
is within the realm of things I'm willing to change.
Any advice on migrating into FAIR_TREE is more than welcome.
Right now I've been running "sleep" jobs under different UIDs to
simulate usage to try and work out how we may need to adjust
things for a migration to FAIR_TREE.
I used the attached spreadsheet to work out the share values we
are using with 14.03.10.
Thanks,
- Trey
[1]:
Account      User  Raw Shares  Norm Shares  Raw Usage  Effectv Usage  FairShare
-----------  ----  ----------  -----------  ---------  -------------  ---------
root                              1.000000  114089982       1.000000   0.870551
root         root           1     0.000323          0       0.000000   1.000000
grid                        1     0.000323       3688       0.000032   0.986174
cms                        10     0.000269       3688       0.000027   0.986155
suragrid                    1     0.000027          0       0.000000   1.000000
tamu                     3096     0.999354  114086294       0.999968   0.870477
agriculture                20     0.006671       2697       0.000024   0.999507
aglife                      1     0.003336       2697       0.000012   0.999507
genomics                    1     0.003336          0       0.000000   1.000000
engineering                10     0.003336          0       0.000000   1.000000
pete                        1     0.003336          0       0.000000   1.000000
general                    10     0.003336       5542       0.000049   0.997977
geo                        10     0.003336          2       0.000000   0.999999
atmo                        1     0.003336          2       0.000000   0.999999
liberalarts               128     0.042696          0       0.000000   1.000000
idhmc                       1     0.042696          0       0.000000   1.000000
mgmt                     2058     0.686472         16       0.000000   1.000000
science                   760     0.253508  114078034       0.999895   0.578806
acad                       10     0.003336          0       0.000000   1.000000
chem                       10     0.003336          0       0.000000   1.000000
iamcs                      10     0.003336    3506549       0.030649   0.279777
math-dept                  20     0.006671   11735411       0.102422   0.119035
math                       10     0.003336   11735411       0.102422   0.014169
secant                     10     0.003336          0       0.000000   1.000000
physics                   700     0.233494   98836073       0.919795   0.579205
hepx                      700     0.233494   98836073       0.919795   0.579205
stat                       10     0.003336          0       0.000000   1.000000
carroll                    10     0.003336          0       0.000000   1.000000
[2]:
Account      User  Raw Shares  Norm Shares  Raw Usage  Effectv Usage  FairShare
-----------  ----  ----------  -----------  ---------  -------------  ---------
root                              0.000000      53229       1.000000
root         root           1     0.000323          0       0.000000   1.000000
grid                        1     0.000323          0       0.000000
cms                        10     0.909091          0       0.000000
suragrid                    1     0.090909          0       0.000000
tamu                     3096     0.999354      53229       1.000000
agriculture                20     0.006676          0       0.000000
aglife                      1     0.500000          0       0.000000
genomics                    1     0.500000          0       0.000000
engineering                10     0.003338          0       0.000000
pete                        1     1.000000          0       0.000000
general                    10     0.003338       6326       0.118860
geo                        10     0.003338          0       0.000000
atmo                        1     1.000000          0       0.000000
liberalarts               128     0.042724      13122       0.246522
idhmc                       1     1.000000      13122       1.000000
mgmt                     2058     0.686916      20984       0.394237
science                   760     0.253672      12795       0.240382
acad                       10     0.013158          0       0.000000
chem                       10     0.013158          0       0.000000
iamcs                      10     0.013158          0       0.000000
math-dept                  20     0.026316          0       0.000000
math                       10     0.500000          0       0.000000
secant                     10     0.500000          0       0.000000
physics                   700     0.921053      12795       1.000000
hepx                        1     1.000000      12795       1.000000
stat                       10     0.013158          0       0.000000
carroll                     1     1.000000          0       0.000000
--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University