I am trying to use the backfill scheduler, without success.
I just want to test it with the simplest possible configuration* (see
slurm.conf at the end).
The cluster has 7 homogeneous nodes with 12 CPUs per node.
I submit three jobs and expect the last one to be backfilled, but it stays pending:
$ sbatch --nice=0 -N 5 -c 12 --time-min="09:00" --time="10:00" ~/slurm/job.sh
Submitted batch job 65574
$ sbatch --nice=0 -N 5 -c 12 --time-min="09:00" --time="10:00" ~/slurm/job.sh
Submitted batch job 65575
$ sbatch --nice=0 -N 1 -c 12 --time-min="00:40" --time="01:00" ~/slurm/job.sh
Submitted batch job 65576
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
65575 prod.q job.sh hummelm PD 0:00 5 (Resources)
65576 prod.q job.sh hummelm PD 0:00 1 (Priority)
65574 prod.q job.sh hummelm R 7:54 5 OGSE[1-5]
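To make my expectation concrete, here is a rough sketch of the check I assumed the backfill scheduler performs. This is only my simplified mental model, not Slurm's actual algorithm; the function names and the node-count-only view of the cluster are my own assumptions. Times are in seconds, reading "10:00" as 10 minutes and "01:00" as 1 minute.

```python
def earliest_start(total_nodes, running, needed):
    """Seconds until `needed` nodes are free.

    running: list of (nodes, remaining_seconds) for jobs currently running,
    where remaining_seconds is the time limit minus the elapsed time.
    """
    free = total_nodes - sum(nodes for nodes, _ in running)
    if free >= needed:
        return 0
    # Release nodes in the order the running jobs hit their time limits.
    for nodes, remaining in sorted(running, key=lambda job: job[1]):
        free += nodes
        if free >= needed:
            return remaining
    raise ValueError("job can never fit on this cluster")


def can_backfill(total_nodes, running, reserved_nodes, cand_nodes, cand_limit):
    """True if the candidate can start now without delaying the reserved job."""
    free_now = total_nodes - sum(nodes for nodes, _ in running)
    if cand_nodes > free_now:
        return False
    start = earliest_start(total_nodes, running, reserved_nodes)
    # Case 1: the candidate is done before the reserved job is due to start.
    if cand_limit <= start:
        return True
    # Case 2: the candidate is still running then, so there must be enough
    # nodes left over for the reserved job anyway.
    still_busy = sum(nodes for nodes, rem in running if rem > start)
    return total_nodes - still_busy - cand_nodes >= reserved_nodes


# My situation: 65574 holds 5 of 7 nodes, 7:54 elapsed of a 10:00 limit.
running = [(5, 10 * 60 - (7 * 60 + 54))]
# 65575 needs 5 nodes; 65576 needs 1 node for at most 60 seconds.
print(can_backfill(7, running, reserved_nodes=5, cand_nodes=1, cand_limit=60))
```

Under this model the third job fits on one of the two idle nodes and finishes before 65575 could start, which is why I expected it to run immediately.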
I hope someone here can point out the mistake I've made. Thanks.
(* slurm.conf)
ControlMachine=OGSE1
#
AuthType=auth/munge
CryptoType=crypto/munge
MailProg=/bin/mail
MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
StateSaveLocation=/var/spool
SwitchType=switch/none
TaskPlugin=task/none
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SchedulerParameters=bf_interval=20,bf_resolution=10
SchedulerPort=7321
##### Round robin select for nodes
#SelectType=select/cons_res
#SelectTypeParameters=CR_LLN
#
#
# JOB PRIORITY
PriorityType=priority/multifactor
PriorityWeightPartition=1000
############
#
#Preemption
#PreemptMode=REQUEUE
#PreemptType=preempt/partition_prio
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
DebugFlags=Backfill
JobCompType=jobcomp/none
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=6
SlurmdDebug=1
#
# COMPUTE NODES
NodeName=OGSE[1-7] CPUs=12 State=UNKNOWN
PartitionName=prod.q Nodes=OGSE[1-7] Default=YES MaxTime="01:00:00" State=UP Priority=10
PartitionName=urgent.q Nodes=OGSE[1-7] Default=NO MaxTime="01:00:00" State=UP Priority=20