Hi guys,

We've just set up our new cluster and are facing some issues regarding fairshare calculation.
Our slurm.conf directives for priority calculation are defined as follows:

PriorityType=priority/multifactor
PriorityFlags=MAX_TRES
PriorityDecayHalfLife=14-0
PriorityFavorSmall=NO
PriorityMaxAge=14-0
PriorityWeightAge=1000
PriorityWeightJobSize=1000
PriorityWeightPartition=10000000
PriorityWeightQOS=10000000
PriorityWeightTRES=CPU=2000,Mem=4000
PriorityWeightFairshare=100000
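
For context, our understanding from the priority/multifactor documentation is that these weights combine roughly as below (the exact formula may vary slightly between Slurm versions):

Job_priority = site_factor
             + PriorityWeightAge       * age_factor
             + PriorityWeightFairshare * fairshare_factor
             + PriorityWeightJobSize   * job_size_factor
             + PriorityWeightPartition * partition_factor
             + PriorityWeightQOS       * qos_factor
             + SUM(PriorityWeightTRES[t] * tres_factor[t] for each TRES t)
             - nice_factor

So with PriorityWeightFairshare=100000, a fairshare_factor stuck at 0 removes the fairshare contribution from every job's priority.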

The partition we are submitting our jobs to is set up as follows:

PartitionName=mypart Priority=1000 TRESBillingWeights="CPU=1.0,Mem=0.25G" Default=YES MaxTime=96:0:0 DefMemPerCPU=5333 Nodes=node[001-036] MaxNodes=20
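
As a sanity check on the billing weights (the job shape here is purely illustrative, not one of our real jobs): with PriorityFlags=MAX_TRES, our understanding is that a job is billed the maximum of its weighted TRES on a node, so a hypothetical 1-node job asking for 4 CPUs and 20G of memory should bill as:

billing = MAX( CPUs * 1.0, Mem * 0.25 per GB )
        = MAX( 4 * 1.0,    20 * 0.25 )
        = MAX( 4, 5 )
        = 5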

Whenever we take a look at the fairshare values using sshare -l, we see the following output:

Account     User      RawShares  NormShares    RawUsage  NormUsage  EffectvUsage  FairShare    LevelFS  GrpTRESMins  TRESRunMins
----------  --------  ---------  ----------  ----------  ---------  ------------  ---------  ---------  -----------  ------------------------------
root                          1    0.000000   268724597   0.000000      0.000000                                     cpu=1098201,mem=5856709132,en+
root        root              1    0.100000           0   0.000000      0.000000   0.000000                          cpu=0,mem=0,energy=0,node=0,b+
group1                        1    0.100000           0   0.000000      0.000000                                     cpu=0,mem=0,energy=0,node=0,b+
group2                        1    0.100000           0   0.000000      0.000000                                     cpu=0,mem=0,energy=0,node=0,b+
group3                        1    0.100000   268724597   0.000000      0.000000                                     cpu=1098201,mem=5856709132,en+
group4                        1    0.100000           0   0.000000      0.000000                                     cpu=0,mem=0,energy=0,node=0,b+
group5                        1    0.100000           0   0.000000      0.000000                                     cpu=0,mem=0,energy=0,node=0,b+
group6                        1    0.100000           0   0.000000      0.000000                                     cpu=0,mem=0,energy=0,node=0,b+
group7                        1    0.100000           0   0.000000      0.000000                                     cpu=0,mem=0,energy=0,node=0,b+
group8                        1    0.100000           0   0.000000      0.000000                                     cpu=0,mem=0,energy=0,node=0,b+
group9                        1    0.100000           0   0.000000      0.000000                                     cpu=0,mem=0,energy=0,node=0,b+

We think it is really weird that the FairShare value is 0 for the root account and blank ("NULL") for all other groups, even for the one with the greatest raw usage.
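
For reference, the share tree as slurmdbd sees it can be listed with the command below; every account/user association that the fair-tree code walks should show up there:

$ sacctmgr show assoc tree format=cluster,account,user,share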

Looking at the data for our users, we see the following:

Account     User      RawShares  NormShares    RawUsage  EffectvUsage  FairShare
----------  --------  ---------  ----------  ----------  ------------  ---------
root                          1    0.000000   268983721      0.000000
root        root              1    0.100000           0      0.000000   0.000000
group3                        1    0.100000   268983721      0.000000
group3      user1             1    0.090909    12109374      0.000000   0.000000
group3      user2             1    0.090909           0      0.000000   0.000000
group3      user3             1    0.090909           0      0.000000   0.000000
group3      user4             1    0.090909           0      0.000000   0.000000
group3      user5             1    0.090909           0      0.000000   0.000000
group3      user6             1    0.090909           0      0.000000   0.000000
group3      user7             1    0.090909           0      0.000000   0.000000
group3      user8             1    0.090909   208824597      0.000000   0.000000
group3      user9             1    0.090909           0      0.000000   0.000000
group3      user10            1    0.090909           0      0.000000   0.000000
group3      user11            1    0.090909    48049750      0.000000   0.000000
group4                        1    0.100000           0      0.000000
group4      user13            1    0.000000      499452      0.000000   0.000000
group5                        1    0.100000           0      0.000000
group5      user14            1    0.000000     1539603      0.000000   0.000000

This is weird behavior, since user1, user8, user11, user13 and user14 are the ones with the highest RawUsage, yet the FairShare value is the same for all of them, including the users that have not yet submitted any job.
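
If we read the fair-tree documentation correctly, each association's LevelFS should be its normalized shares divided by its effective usage at that level, i.e. LevelFS = NormShares / EffectvUsage. As a purely illustrative example, if user8's EffectvUsage were being computed (RawUsage relative to the account, 208824597 / 268983721 ≈ 0.776), we would expect roughly:

LevelFS(user8) = 0.090909 / 0.776 ≈ 0.117

Instead, EffectvUsage is stuck at 0.000000 for everyone, which matches the errors below.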

We also noticed the following error messages, which appear with some regularity in the slurmctld log:

[2024-03-07T16:38:13.260] error: _append_list_to_array: unable to append NULL list to assoc list.
[2024-03-07T16:38:13.260] error: _calc_tree_fs: unable to calculate fairshare on empty tree

The errors above look like they are coming from: https://github.com/SchedMD/slurm/blob/b11bf689b270f1f5dfe4b0cd54c4fa84b4af315b/src/plugins/priority/multifactor/fair_tree.c#L337

Are we missing any setting in slurm.conf? This is kind of strange, because we have another cluster with pretty much the same configuration, and there the FairShare is calculated without any problems.
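
In case anyone wants to compare, this is roughly how the effective priority settings of two clusters can be dumped and diffed (host names are placeholders):

$ ssh good-cluster 'scontrol show config' | grep -i priority > good.txt
$ ssh new-cluster  'scontrol show config' | grep -i priority > new.txt
$ diff good.txt new.txt
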
Any help would be appreciated.



--
Cumprimentos / Best Regards,
Zacarias Benta

LIP/INCD @ UMINHO
 ----------------------------------------------
/ Use linux, and may the source be with you.  /
----------------------------------------------
                \  __
                -=(o '.
                   '.-.\
                   /|  \\
                   '|  ||
                    _\_):,_
