Hi.
I usually create a maintenance reservation with the IGNORE_JOBS flag, so
that new jobs can't interfere with it. Then I contact the owners of the
running jobs to warn 'em I'll kill their jobs if needed.
Actually that's useful only for nodes that allow unlimited-time jobs:
for the others it's sufficient to plan in advance (if the max run time is
24h, then the reservation should be created more than 24h in advance).
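As a rough sketch of that arithmetic (assuming GNU date; the function
name and the sample times are made up for illustration), the latest
moment to create the reservation can be worked out like this:

```shell
#!/bin/sh
# Hypothetical helper: print the latest time at which the reservation
# should be created, given the maintenance start and the longest
# allowed run time in hours. Assumes GNU date (-d and @epoch syntax).
# Times are treated as UTC to keep DST changes out of the calculation.
latest_creation() {
    start_epoch=$(date -u -d "$1" +%s) || return 1
    date -u -d "@$((start_epoch - $2 * 3600))" +%Y-%m-%dT%H:%M:%S
}

# Example values only: maintenance at 08:00 on 15 Nov, max run time 24h.
latest_creation "2021-11-15T08:00:00" 24
```

Creating the reservation a bit earlier than the printed time leaves
some margin for jobs that start just before it is in place.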
Just my $.02
Diego
On 07/11/2021 13:45, Carsten Beyer wrote:
Hi Ahmad,
You could use squeue -h -t r --format="%i %e" | sort -k2 to get a list
of all running jobs sorted by their end time.
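To pick out only the jobs that would still be running when the
maintenance starts, that output can be filtered on its second column.
A minimal sketch, assuming the end times are printed as ISO-style
timestamps (YYYY-MM-DDTHH:MM:SS) so that plain string comparison orders
them correctly; the function name and the sample job lines are
hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read "jobid endtime" lines on stdin and print the
# IDs of jobs whose end time is later than the given cutoff.
# The comparison is a string comparison, which matches chronological
# order only for fixed-width timestamps like 2021-11-15T08:00:00.
jobs_ending_after() {
    awk -v cutoff="$1" '$2 > cutoff { print $1 }'
}

# Demo with made-up data; on a real cluster the input would come from:
#   squeue -h -t r --format="%i %e" | jobs_ending_after "2021-11-15T08:00:00"
printf '%s\n' \
    "101 2021-11-14T10:00:00" \
    "102 2021-11-16T09:00:00" |
    jobs_ending_after "2021-11-15T08:00:00"
```

Any value in the second column that is not a timestamp and starts with
a letter compares greater than the cutoff in this sketch, so such jobs
are listed too and can be checked by hand.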
We normally use a maintenance reservation that starts at the time of the
maintenance (or with some lead time before it) to get the system free of
jobs. That makes things easier than draining: if you drain your cluster,
no new jobs can start, whereas with the reservation, jobs with a shorter
wallclock time can still be backfilled until the reservation/maintenance
starts. You can create the reservation at any time, but no later than
"<starttime maintenance> minus <longest MaxTime of partition>", e.g.
    scontrol create reservation=<name> starttime=<starttime> \
        duration=<duration> user=root flags=maint nodes=ALL
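With concrete values filled in, that could look like the sketch below.
It only prints the command instead of running it, so the result can be
checked first. The helper name, the reservation name, the start time
and the duration of 480 (a bare number is read by scontrol as minutes)
are example values, not taken from a real setup:

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) the scontrol command for a
# maintenance reservation on all nodes.
# Arguments: name, start time, duration, optional flags (default: maint).
build_reservation_cmd() {
    name=$1 start=$2 duration=$3 flags=${4:-maint}
    printf 'scontrol create reservation=%s starttime=%s duration=%s user=root flags=%s nodes=ALL\n' \
        "$name" "$start" "$duration" "$flags"
}

# Example values only.
build_reservation_cmd maint_nov 2021-11-15T08:00:00 480
```

A fourth argument overrides the flags, e.g. maint,ignore_jobs if the
reservation should be created regardless of jobs already running.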
Hope that helps a little bit,
Carsten
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786