(most likely in the next year). My reaction is that Slurm very rarely
provides an estimated start time for a job. I understand that this is not
possible for jobs on hold and dependent jobs.

it's also not possible if both running and queued jobs lack definite termination times; do yours?

my understanding is the following:
the main scheduler does not perform forward planning.
that is, it is opportunistic.  it walks the list of priority-sorted
pending jobs, starting any which can run on currently free
(or preemptable) resources.
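
a minimal sketch of that opportunistic pass, as I describe it above. this is not Slurm's code: the function name, the (name, nodes, priority) tuples and the plain node count are all made up for illustration, and the real scheduler also deals with partitions, limits and preemption.

```python
def opportunistic_pass(free_nodes, pending):
    """Start whatever fits right now; no look-ahead at all.

    pending: list of (name, nodes, priority) tuples (hypothetical layout).
    Returns the names of the jobs started in this pass, in priority order.
    """
    started = []
    # walk pending jobs from highest to lowest priority
    for name, nodes, _priority in sorted(pending, key=lambda job: -job[2]):
        if nodes <= free_nodes:
            # fits on currently free resources: start it immediately
            free_nodes -= nodes
            started.append(name)
        # otherwise the job just stays pending; nothing is planned for it
    return started


if __name__ == "__main__":
    queue = [("big", 8, 100), ("mid", 3, 50), ("small", 2, 10), ("tiny", 1, 1)]
    print(opportunistic_pass(4, queue))
```

note that nothing here ever computes a future start time, which is why a pass like this cannot produce an estimate for "big".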

the backfill scheduler is a secondary, asynchronous loop that tries hard
not to interfere with the main scheduler (severely throttles itself)
and tries to assign tentative start times to pending jobs.

the main issue with forward scheduling is that if high-priority jobs become
runnable (submitted, off hold, dependency-satisfied), then most of the
(tentative) start times probably need to be removed.
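
to make the invalidation problem concrete, here is a toy forward planner. it is only a sketch under strong assumptions: a single pool of identical nodes, every job carrying a definite time limit, and hypothetical names (plan_start_times, fits) that have nothing to do with backfill.c.

```python
def fits(reservations, total_nodes, start, end, nodes):
    """True if `nodes` more nodes are free during the whole of [start, end)."""
    # usage only changes where a reservation begins, so check those points
    points = {start} | {s for s, _, _ in reservations if start < s < end}
    for p in points:
        used = sum(n for s, e, n in reservations if s <= p < e)
        if used + nodes > total_nodes:
            return False
    return True


def plan_start_times(total_nodes, running, pending, now=0):
    """Give every pending job a tentative start time, highest priority first.

    running: list of (nodes, end_time)
    pending: list of (name, nodes, time_limit, priority)
    """
    reservations = [(now, end, nodes) for nodes, end in running]
    starts = {}
    for name, nodes, limit, _priority in sorted(pending, key=lambda j: -j[3]):
        # resources can only free up now or when some reservation ends
        candidates = sorted({now} | {e for _, e, _ in reservations if e > now})
        for t in candidates:
            if fits(reservations, total_nodes, t, t + limit, nodes):
                starts[name] = t
                reservations.append((t, t + limit, nodes))
                break
    return starts


if __name__ == "__main__":
    running = [(4, 10)]                      # all 4 nodes busy until t=10
    pending = [("a", 4, 5, 10), ("b", 2, 5, 5)]
    print(plan_start_times(4, running, pending))
    # a high-priority job shows up and the earlier plan is stale
    pending.append(("c", 4, 20, 100))
    print(plan_start_times(4, running, pending))
```

in this toy, "a" and "b" are first planned for t=10 and t=15; once "c" arrives they move to t=30 and t=35, so every estimate handed out before that was wrong. it also shows why definite termination times matter: without end_time and time_limit there is nothing to plan against.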

a quick look at plugins/sched/backfill/backfill.c indicates that things are /complicated/ ;)

we (ComputeCanada) don't see a lot of forward start times either.

I also would welcome discussion of how to tune the backfill scheduler!
I suspect that in order to work well, it needs a particular distribution
of job priorities.

regards, mark hahn.
