Just to add a +1 for this bug. We're seeing this as well with 2.4.1 on three
different clusters with slightly different priority weights etc.

We have accounting turned on.

Our PriorityCalcPeriod is 1.

It only seems to start happening after a period of time; jobs submitted
beforehand still show a normal Age value -- only newly submitted jobs jump
immediately to the full PriorityWeightAge.

Restarting the slurmctld seems to make things go back to normal -- i.e. "old"
jobs keep their existing age, "new" jobs start counting time as normal, and
any jobs submitted after that behave normally too.


To give an idea, here's the sprio output from before the restart. My job
(96715) is showing the full PriorityWeightAge, whereas the other jobs were
queued up before the weirdness started.

$ sprio -l
  JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
  96483    userA     170770      13719      44277      12774     100000          0      0
  96526    userA     169615      12565      44277      12774     100000          0      0
  96706    userB     107184        573        407       6204     100000          0      0
  96715    paddy     391711     100000     171182      20529     100000          0      0

Then restart slurmctld, and a few minutes later:

$ sprio -l
  JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
  96483    userA     171134      14012      44348      12774     100000          0      0
  96526    userA     169979      12858      44348      12774     100000          0      0
  96706    userB     107482        866        411       6204     100000          0      0
  96715    paddy     291822         47     171246      20529     100000          0      0


Thanks,
Paddy


On Wed, Sep 05, 2012 at 04:10:06AM -0600, Lennart Karlsson wrote:

> 
> On 09/04/2012 04:22 PM, Miguel Méndez wrote:
> > Hi Lennart,
> >
> > I have some questions for you so I can help you:
> >
> > Have you tried to set DebugFlags=Priority in slurm.conf to get some more
> > info about priorities on slurmctld.log?
> >
> > Are your priorities being recalculated every "PriorityCalcPeriod" (in
> > slurm.conf as well, default is 5 min)? If not, do you have Accounting
> > enabled?
> 
> Hi Miguel,
> 
> And thanks for trying to help me!
> 
> Yes, I have configured
> 
>    PriorityCalcPeriod=5
> 
> in the slurm.conf file.
> 
> I do not understand your question about whether I have Accounting enabled.
> I have no such configuration variable in my slurm.conf file. I run
> 
> 
> I have now tried your suggestion to set DebugFlags=Priority,
> so now I can rewrite my question in a new way.
> 
> In slurm.conf, I have configured
> PriorityMaxAge=14-0
> PriorityWeightAge=20160
> 
> The plan behind this configuration is to start with an age
> value of zero and get approximately one priority point added
> for each minute that the job has been waiting, up to a
> maximum of 20160.
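[For reference, the documented multifactor behaviour is that the age factor
is the job's wait time divided by PriorityMaxAge, capped at 1.0, and the
weighted age priority is that factor times PriorityWeightAge. A minimal
sketch of the expectation described above, assuming exactly that formula:]

```python
# Expected age-priority behaviour under priority/multifactor, assuming
# age_factor = wait_time / PriorityMaxAge, capped at 1.0 (the documented rule).

PRIORITY_MAX_AGE_MIN = 14 * 24 * 60   # PriorityMaxAge=14-0 -> 20160 minutes
PRIORITY_WEIGHT_AGE = 20160           # PriorityWeightAge from slurm.conf

def weighted_age(wait_minutes):
    """Weighted age priority for a job that has waited wait_minutes."""
    factor = min(wait_minutes / PRIORITY_MAX_AGE_MIN, 1.0)
    return factor * PRIORITY_WEIGHT_AGE

# With these settings each minute of waiting adds about one priority point,
# saturating at 20160 after 14 days -- the "plan" described above.
```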
> 
> This has worked for a long time and usually still does.
> But sometimes it goes seriously wrong, with a new job starting
> at an age value of 20160 instead.
> 
> This can be seen with the sprio command and also with Priority
> debugging on:
> 
> [2012-09-05T10:43:37] Weighted Age priority is 1.000000 * 20160 = 20160.00
> [2012-09-05T10:43:37] Weighted Fairshare priority is 10.000000 * 10000 = 100000.00
> [2012-09-05T10:43:37] Weighted JobSize priority is 0.001616 * 104 = 0.17
> [2012-09-05T10:43:37] Weighted Partition priority is 0.000000 * 0 = 0.00
> [2012-09-05T10:43:37] Weighted QOS priority is 0.000000 * 400000 = 0.00
> [2012-09-05T10:43:37] Job 2182878 priority: 20160.00 + 100000.00 + 0.17 + 0.00 + 0.00 - 0 = 120160.17
> 
> The job was submitted 2012-09-05T10:42:22, so it should have a weighted
> age priority of zero or one, but for some unknown reason it got the
> maximum value instead.
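[A quick sanity check of that claim, using the submit time and the log
timestamp quoted above and assuming the standard age formula
wait_time / PriorityMaxAge:]

```python
from datetime import datetime

submit = datetime(2012, 9, 5, 10, 42, 22)   # submit time from the report
calc = datetime(2012, 9, 5, 10, 43, 37)     # timestamp of the debug log line
max_age_s = 14 * 24 * 3600                  # PriorityMaxAge=14-0 in seconds

wait_s = (calc - submit).total_seconds()    # 75 seconds in the queue
factor = min(wait_s / max_age_s, 1.0)
weighted = factor * 20160                   # PriorityWeightAge

# A 75-second-old job should score roughly 1.25 age points,
# nowhere near the 20160.00 the log reports.
```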
> 
> Here is a job that behaves normally, as expected:
> [2012-09-05T10:44:17] Weighted Age priority is 0.000000 * 20160 = 0.00
> [2012-09-05T10:44:17] Weighted Fairshare priority is 6.000000 * 10000 = 60000.00
> [2012-09-05T10:44:17] Weighted JobSize priority is 0.002874 * 104 = 0.30
> [2012-09-05T10:44:17] Weighted Partition priority is 0.000000 * 0 = 0.00
> [2012-09-05T10:44:17] Weighted QOS priority is 0.000000 * 400000 = 0.00
> [2012-09-05T10:44:17] Job 2182879 priority: 0.00 + 60000.00 + 0.30 + 0.00 + 0.00 - 0 = 60000.30
> 
> This job was submitted 2012-09-05T10:44:17, so the weighted age
> priority is zero, as expected.
> 
> Here is an example for a job that has waited for some time:
> [2012-09-05T00:07:31] Weighted Age priority is 0.004721 * 20160 = 95.17
> [2012-09-05T00:07:31] Weighted Fairshare priority is 10.000000 * 10000 = 100000.00
> [2012-09-05T00:07:31] Weighted JobSize priority is 0.002874 * 104 = 0.30
> [2012-09-05T00:07:31] Weighted Partition priority is 0.000000 * 0 = 0.00
> [2012-09-05T00:07:31] Weighted QOS priority is 0.300000 * 400000 = 120000.00
> [2012-09-05T00:07:31] Job 2178648 priority: 95.17 + 100000.00 + 0.30 + 0.00 + 120000.00 - 0 = 220095.47
> 
> Submit time was 2012-09-04T22:32:08, so the Weighted Age
> priority works as intended in this case.
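[The same check applied to this healthy job, again assuming the formula
wait_time / PriorityMaxAge; the small gap between the computed factor and
the logged 0.004721 corresponds to the age being sampled some seconds
before the log timestamp, which is consistent with periodic recalculation:]

```python
from datetime import datetime

submit = datetime(2012, 9, 4, 22, 32, 8)   # submit time from the report
calc = datetime(2012, 9, 5, 0, 7, 31)      # timestamp of the debug log line
max_age_s = 14 * 24 * 3600                 # PriorityMaxAge=14-0 in seconds

factor = (calc - submit).total_seconds() / max_age_s   # ~0.00473
weighted = factor * 20160                              # ~95.4 points

# The log's 0.004721 * 20160 = 95.17 matches this to within seconds of
# wait time, so this job's age priority is behaving as configured.
```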
> 
> This is version 2.4.1 of SLURM. (If the Fairshare priorities look
> strange, do not worry: they are intended to be that way, but that is
> another story.)
> 
> Full slurm.conf configuration is at the bottom of this e-mail,
> with line numbers added.
> 
> Cheers,
> -- Lennart Karlsson
>      UPPMAX, Uppsala University, Sweden
>      http://www.uppmax.uu.se
> 
> ==============================================
>       1  ControlMachine=kalkyl2
>       2  AuthType=auth/munge
>       3  CacheGroups=0
>       4  CryptoType=crypto/munge
>       5  EnforcePartLimits=YES
>       6  Epilog=/etc/slurm/slurm.epilog
>       7  JobCredentialPrivateKey=/etc/slurm/slurm.key
>       8  JobCredentialPublicCertificate=/etc/slurm/slurm.cert
>       9  JobRequeue=0
>      10  MaxJobCount=1000000
>      11  MpiDefault=none
>      12  Proctracktype=proctrack/cgroup
>      13  Prolog=/etc/slurm/slurm.prolog
>      14  PropagateResourceLimits=RSS
>      15  ReturnToService=0
>      16  SallocDefaultCommand="/usr/bin/srun -n1 -N1 --pty --preserve-env --mpi=none -Q $SHELL"
>      17  SchedulerParameters=default_queue_depth=5000,bf_window=10080,max_job_bf=5000,bf_interval=120
>      18  SlurmctldPidFile=/var/run/slurmctld.pid
>      19  SlurmctldPort=6817
>      20  SlurmdPidFile=/var/run/slurmd.pid
>      21  SlurmdPort=6818
>      22  SlurmdSpoolDir=/var/spool/slurmd
>      23  SlurmUser=slurm
>      24  StateSaveLocation=/usr/local/slurm-state
>      25  SwitchType=switch/none
>      26  TaskPlugin=task/cgroup
>      27  TaskProlog=/etc/slurm/slurm.taskprolog
>      28  TopologyPlugin=topology/tree
>      29  TmpFs=/scratch
>      30  TrackWCKey=yes
>      31  TreeWidth=20
>      32  UsePAM=1
>      33  HealthCheckInterval=1800
>      34  HealthCheckProgram=/etc/slurm/slurm.healthcheck
>      35  InactiveLimit=0
>      36  KillWait=600
>      37  MessageTimeout=60
>      38  ResvOverRun=UNLIMITED
>      39  MinJobAge=43200
>      40  SlurmctldTimeout=300
>      41  SlurmdTimeout=1200
>      42  Waittime=0
>      43  FastSchedule=1
>      44  MaxMemPerCPU=3072
>      45  SchedulerType=sched/backfill
>      46  SchedulerPort=7321
>      47  SelectType=select/cons_res
>      48  SelectTypeParameters=CR_Core_Memory
>      49  PriorityType=priority/multifactor
>      50  PriorityDecayHalfLife=0
>      51  PriorityCalcPeriod=5
>      52  PriorityUsageResetPeriod=MONTHLY
>      53  PriorityFavorSmall=NO
>      54  PriorityMaxAge=14-0
>      55  PriorityWeightAge=20160
>      56  PriorityWeightFairshare=10000
>      57  PriorityWeightJobSize=104
>      58  PriorityWeightPartition=0
>      59  PriorityWeightQOS=400000
>      60  AccountingStorageEnforce=associations,limits,qos
>      61  AccountingStorageHost=kalkyl2
>      62  AccountingStoragePort=7031
>      63  AccountingStorageType=accounting_storage/slurmdbd
>      64  ClusterName=kalkyl
>      65  DebugFlags=NO_CONF_HASH,Priority
>      66  JobCompLoc=/etc/slurm/slurm_jobcomp_logger
>      67  JobCompType=jobcomp/script
>      68  JobAcctGatherFrequency=30
>      69  JobAcctGatherType=jobacct_gather/linux
>      70  SlurmctldDebug=3
>      71  SlurmctldLogFile=/var/log/slurm/slurmctld.log
>      72  SlurmdDebug=3
>      73  SlurmdLogFile=/var/log/slurm/slurmd.log
>      74  NodeName=DEFAULT Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN TmpDisk=100000
>      75
>      76  NodeName=q[1-16]    RealMemory=72000 Feature=fat,mem72GB,ibsw1   Weight=3
>      77  NodeName=q[17-32]   RealMemory=48000 Feature=fat,mem48GB,ibsw1   Weight=2
>      78  NodeName=q[33-64]   RealMemory=24000 Feature=thin,mem24GB,ibsw2  Weight=1
>      79  NodeName=q[65-96]   RealMemory=24000 Feature=thin,mem24GB,ibsw3  Weight=1
>      80  NodeName=q[97-108]  RealMemory=24000 Feature=thin,mem24GB,ibsw4  Weight=1
>      81  NodeName=q[109-140] RealMemory=24000 Feature=thin,mem24GB,ibsw5  Weight=1
>      82  NodeName=q[141-172] RealMemory=24000 Feature=thin,mem24GB,ibsw6  Weight=1
>      83  NodeName=q[173-204] RealMemory=24000 Feature=thin,mem24GB,ibsw7  Weight=1
>      84  NodeName=q[205-216] RealMemory=24000 Feature=thin,mem24GB,ibsw8  Weight=1
>      85
>      86  NodeName=q[217-232] RealMemory=24000 Feature=thin,mem24GB,ibsw4  Weight=1
>      87
>      88  NodeName=q[233-252] RealMemory=24000 Feature=thin,mem24GB,ibsw8  Weight=1
>      89  NodeName=q[253-284] RealMemory=24000 Feature=thin,mem24GB,ibsw9  Weight=1
>      90  NodeName=q[285-316] RealMemory=24000 Feature=thin,mem24GB,ibsw10 Weight=1
>      91  NodeName=q[317-348] RealMemory=24000 Feature=thin,mem24GB,ibsw11 Weight=1
>      92
>      93  PartitionName=all Nodes=q[1-348] Shared=EXCLUSIVE DefaultTime=00:00:01 MaxTime=14400 State=DOWN
>      94  PartitionName=core Nodes=q[45-348] Default=YES Shared=NO MaxTime=14400 MaxNodes=1 State=UP
>      95  PartitionName=node Nodes=q[1-32,45-348] Shared=EXCLUSIVE DefaultTime=00:00:01 MaxTime=14400 State=UP
>      96  PartitionName=devel Nodes=q[33-44] Shared=EXCLUSIVE DefaultTime=00:00:01 MaxTime=60 MaxNodes=4 State=UP
> 

-- 
Paddy Doyle
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
Phone: +353-1-896-3725
http://www.tchpc.tcd.ie/
