I think I've found the problem: it comes from a comparison between integers of
different types. Here is a patch which solves the problem:

--- slurm-slurm-13-12-0-0pre4/src/plugins/sched/backfill/backfill.c.org 2013-11-18 17:56:09.741413223 +0100
+++ slurm-slurm-13-12-0-0pre4/src/plugins/sched/backfill/backfill.c     2013-11-18 17:57:42.903468026 +0100
@@ -712,7 +712,7 @@
                        continue;       /* started in other partition */
                if (!avail_front_end(job_ptr))
                        continue;       /* No available frontend for this job */
-               if (job_ptr->array_task_id != (uint16_t) NO_VAL) {
+               if (job_ptr->array_task_id != (uint32_t) NO_VAL) {
                        if (reject_array_job_id == job_ptr->array_job_id)
                                continue;  /* already rejected array element */
                        /* assume reject whole array for now, clear if OK */


 
Regards,

[@@ THALES GROUP INTERNAL @@]

From: HUMMEL Michel [mailto:[email protected]] 
Sent: Monday, 18 November 2013 16:32
To: slurm-dev
Subject: [slurm-dev] unable to configure backfill

I am trying the backfill scheduler without success.
I just want to test it with the simplest configuration possible* (see
slurm.conf at the end):
7 homogeneous nodes, 12 CPUs per node.

I submit three jobs and the last should be backfilled, but … :
$ sbatch --nice=0 -N 5  -c 12 --time-min="09:00" --time="10:00" ~/slurm/job.sh 
Submitted batch job 65574
$ sbatch --nice=0 -N 5  -c 12 --time-min="09:00" --time="10:00" ~/slurm/job.sh
Submitted batch job 65575
$ sbatch --nice=0 -N 1  -c 12 --time-min="00:40" --time="01:00" ~/slurm/job.sh
Submitted batch job 65576
$ squeue 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             65575    prod.q   job.sh  hummelm PD       0:00      5 (Resources)
             65576    prod.q   job.sh  hummelm PD       0:00      1 (Priority)
             65574    prod.q   job.sh  hummelm  R       7:54      5 OGSE[1-5]

I hope someone here can show me the error I've made. Thanks.

(* slurm.conf )
ControlMachine=OGSE1
#
AuthType=auth/munge
CryptoType=crypto/munge
MailProg=/bin/mail
MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
StateSaveLocation=/var/spool
SwitchType=switch/none
TaskPlugin=task/none

InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SchedulerParameters=bf_interval=20,bf_resolution=10
SchedulerPort=7321
##### Round robin select for nodes
#SelectType=select/cons_res
#SelectTypeParameters=CR_LLN
#
#
# JOB PRIORITY
PriorityType=priority/multifactor
PriorityWeightPartition=1000
############
#
#Preemption
#PreemptMode=REQUEUE
#PreemptType=preempt/partition_prio
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none

ClusterName=cluster
DebugFlags=Backfill
JobCompType=jobcomp/none
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=6
SlurmdDebug=1
#
# COMPUTE NODES
NodeName=OGSE[1-7] CPUs=12 State=UNKNOWN
PartitionName=prod.q Nodes=OGSE[1-7] Default=YES MaxTime="01:00:00" State=UP Priority=10
PartitionName=urgent.q Nodes=OGSE[1-7] Default=NO MaxTime="01:00:00" State=UP Priority=20



