Hi guys,

I've been testing job preemption and found a bug in the implementation of
the PreemptMode=off option for partitions. The examples at
http://slurm.schedmd.com/preempt.html show non-preemptable partitions
(PreemptMode=off) only when the option is set on the highest-priority
partition. In our environment we wanted partitions like this instead:
PartitionName=priousers      [...] Priority=30 PreemptMode=suspend
PartitionName=users          [...] Priority=20 PreemptMode=suspend Shared=FORCE:1
PartitionName=external-users [...] Priority=10 PreemptMode=off Shared=FORCE:1

So external-users has the lowest priority, but its jobs should never be
preempted. Without the patch, jobs from the external-users partition were
considered for preemption and finally killed in slurmctld/gang.c:671:
    if (rc != SLURM_SUCCESS) {
            rc = job_signal(job_ptr->job_id, SIGKILL, 0, 0, true);
            if (rc == SLURM_SUCCESS)
                    info("preempted job %u had to be killed",
                         job_ptr->job_id);
            else {
                    info("preempted job %u kill failure %s",
                         job_ptr->job_id, slurm_strerror(rc));
            }
    }


because they cannot be suspended, requeued, etc.: they are not in a
partition whose PreemptMode allows any of those actions.
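To make the failure path concrete, here is a simplified model of the fallback quoted above. It is a sketch under assumptions, not gang.c itself: the enums, try_preempt() and preempt_victim() are hypothetical stand-ins, and it assumes that a mode with no preemption mechanism (PreemptMode=off) leaves rc at an error value, so the job falls through to SIGKILL.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for Slurm's preempt modes. */
enum preempt_mode { MODE_OFF, MODE_SUSPEND, MODE_REQUEUE, MODE_CANCEL };

/* Hypothetical outcome of trying to preempt one job. */
enum preempt_action { ACT_SUSPENDED, ACT_REQUEUED, ACT_CANCELLED, ACT_KILLED };

/* Try the mechanism configured for the victim's partition.
 * Returns true on success, false if the mode offers no mechanism. */
static bool try_preempt(enum preempt_mode mode, enum preempt_action *out)
{
        switch (mode) {
        case MODE_SUSPEND: *out = ACT_SUSPENDED; return true;
        case MODE_REQUEUE: *out = ACT_REQUEUED;  return true;
        case MODE_CANCEL:  *out = ACT_CANCELLED; return true;
        default:           return false; /* MODE_OFF: nothing to try */
        }
}

/* Models the "if (rc != SLURM_SUCCESS) job_signal(..., SIGKILL, ...)"
 * fallback: a job chosen as a victim that its own mode cannot
 * preempt is killed instead. */
enum preempt_action preempt_victim(enum preempt_mode mode)
{
        enum preempt_action act;

        if (!try_preempt(mode, &act))
                act = ACT_KILLED;
        return act;
}
```

In this model a PreemptMode=off job only survives if it is never chosen as a victim, which is why the fix belongs in candidate selection rather than in the fallback.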

The change is needed in the select plugin in use. In our case that's
cons_res, but from reading the code, the problem looks similar in the other
plugins.
I've changed a condition in
plugins/select/cons_res/job_test.c:2326 to:
    if ((p_ptr->part_ptr->priority <= jp_ptr->part_ptr->priority) &&
        (p_ptr->part_ptr->preempt_mode != PREEMPT_MODE_OFF))
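A minimal model of the patched condition, assuming (as the surrounding cons_res loop appears to do) that a partition matching it is skipped, i.e. its running jobs are treated as preemptable and their cores as available to the new job. struct part and partition_is_reclaimable() are hypothetical simplifications, not Slurm's part_record.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for Slurm's partition data. */
enum preempt_mode { MODE_OFF, MODE_SUSPEND, MODE_REQUEUE, MODE_CANCEL };

struct part {
        const char *name;
        int priority;
        enum preempt_mode preempt_mode;
};

/* Patched check: jobs in partition p may give up their resources to a
 * job from job_part only if p does not have a higher priority AND p is
 * preemptable at all (its PreemptMode is not off). */
bool partition_is_reclaimable(const struct part *p,
                              const struct part *job_part)
{
        return p->priority <= job_part->priority &&
               p->preempt_mode != MODE_OFF;
}
```

With the three partitions above, a priousers job can reclaim cores held by users jobs, but no job can reclaim cores held by external-users jobs, even though external-users has the lowest priority.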

This is probably the only change needed for correct behaviour (I tested on a
workstation with 3 partitions), but I'd also recommend an additional change
to how the preemptable_candidates list is created.

It's true that select_cons_res.c (line 1558) already checks, implicitly,
whether a preemptable candidate is a PREEMPT_MODE_OFF job:
                        mode = slurm_job_preempt_mode(tmp_job_ptr);
                        if ((mode != PREEMPT_MODE_REQUEUE)    &&
                            (mode != PREEMPT_MODE_CHECKPOINT) &&
                            (mode != PREEMPT_MODE_CANCEL))
                                continue;       /* can't remove job */
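The existing check can be modelled as a predicate on the victim's preempt mode. This is a hypothetical simplification (job_can_be_removed() and the enum are not Slurm names); it shows that a PREEMPT_MODE_OFF job is rejected only at this stage, after it has already been placed on the candidate list.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for Slurm's preempt modes. */
enum preempt_mode {
        MODE_OFF, MODE_SUSPEND, MODE_REQUEUE, MODE_CHECKPOINT, MODE_CANCEL
};

/* Same test as the snippet above: only requeue, checkpoint and cancel
 * jobs are treated as removable; every other mode, including
 * PREEMPT_MODE_OFF, makes the loop "continue". */
bool job_can_be_removed(enum preempt_mode mode)
{
        return mode == MODE_REQUEUE ||
               mode == MODE_CHECKPOINT ||
               mode == MODE_CANCEL;
}
```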

I think it's more efficient not to add jobs from partitions with
PREEMPT_MODE_OFF to the preemptable_candidates list at all, which can be
done in plugins/preempt/partition_prio/preempt_partition_prio.c:113:
    /* also skip jobs in non-preemptable (PreemptMode=off) partitions */
    if ((job_p->part_ptr == NULL) ||
        (job_p->part_ptr->priority >= job_ptr->part_ptr->priority) ||
        (job_p->part_ptr->preempt_mode == PREEMPT_MODE_OFF))
            continue;
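A self-contained sketch of building the candidate list with the extra check applied up front. struct part, struct job and build_candidates() are hypothetical simplifications of Slurm's job/partition records and list API; the point is only that OFF-mode jobs never enter the list, so later stages (and the gang.c kill fallback) never see them.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Slurm's records. */
enum preempt_mode { MODE_OFF, MODE_SUSPEND, MODE_REQUEUE, MODE_CANCEL };

struct part {
        int priority;
        enum preempt_mode preempt_mode;
};

struct job {
        unsigned int job_id;
        const struct part *part_ptr; /* may be NULL */
};

/* Collect the jobs the pending job may preempt into out[] and return
 * how many were found. Note the "==": a single "=" would assign
 * MODE_OFF to the partition instead of comparing against it. */
size_t build_candidates(const struct job *jobs, size_t n,
                        const struct job *preemptor,
                        const struct job **out)
{
        size_t count = 0;

        for (size_t i = 0; i < n; i++) {
                const struct job *j = &jobs[i];

                if (j->part_ptr == NULL ||
                    j->part_ptr->priority >= preemptor->part_ptr->priority ||
                    j->part_ptr->preempt_mode == MODE_OFF)
                        continue; /* not preemptable by this job */
                out[count++] = j;
        }
        return count;
}
```

For a pending priousers job, a running users job is a candidate, while a running external-users job is filtered out immediately.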



I've attached a patch with my changes; it also adds some extra debug output
under the appropriate debug flags. I was working on 2.6.7, but a quick review
of the code shows no changes in this area in 14.03-rc1.

cheers,
marcin
===================
Marcin Stolarek
Interdisciplinary Centre for Mathematical and Computational Modelling (ICM),
University of Warsaw, Poland

Attachment: patch