Hi.

I usually create a maintenance reservation with the IGNORE_JOBS flag, so that new jobs cannot interfere with it. Then I contact the job owners to warn them that I'll kill their jobs if needed. Actually, that's only useful for nodes that allow jobs with unlimited run time: for the others it's sufficient to plan ahead (if the max run time is 24h, the reservation should be created more than 24h in advance).
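As a rough sketch of what that looks like, the helper below only prints the scontrol command so it can be reviewed before being run as root. The function name and the example values (reservation name, start time, duration in minutes) are made up for illustration, and combining MAINT with IGNORE_JOBS is my assumption about a typical setup.

```shell
# Hypothetical helper: prints (does not execute) a reservation command.
# $1 = reservation name, $2 = start time, $3 = duration in minutes.
maint_res_cmd() {
    printf 'scontrol create reservation=%s starttime=%s duration=%s user=root flags=maint,ignore_jobs nodes=ALL\n' \
        "$1" "$2" "$3"
}

# Example values are placeholders, not taken from a real cluster.
maint_res_cmd maint_nov 2021-11-20T08:00:00 480
```

Once the printed line looks right, it can be pasted into a root shell on the controller.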

Just my $.02

Diego

On 07/11/2021 13:45, Carsten Beyer wrote:
Hi Ahmad,

you could use squeue -h -t r --format="%i %e" | sort -k2 to get a list of all running jobs sorted by their end time.
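Building on that squeue output, a small filter can list only the jobs that would still be running when the maintenance starts. This is a minimal sketch: the function name is hypothetical, it assumes end times are printed as YYYY-MM-DDTHH:MM:SS (so plain string comparison orders them correctly), and it treats any non-timestamp value as "no known end" and reports that job too.

```shell
# Hypothetical helper: reads "<jobid> <endtime>" lines on stdin and prints
# the IDs of jobs whose end time is at or after the given start time.
# $1 = maintenance start, e.g. 2021-11-20T08:00:00
jobs_past_start() {
    awk -v start="$1" '
        # Force string comparison; ISO-style timestamps sort lexicographically.
        ($2 "") >= (start "") || $2 !~ /^[0-9]/ { print $1 }
    '
}

# On a live cluster (not executed here):
#   squeue -h -t r --format="%i %e" | jobs_past_start 2021-11-20T08:00:00
```

The owners of the listed jobs are the ones worth contacting ahead of time.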

We normally use a maintenance reservation whose start time is the start of the maintenance (or somewhat earlier) to get the system free of jobs. That makes things easier than draining: if you drain your cluster, no new jobs can start, whereas with the reservation, jobs with a shorter wallclock time can still be backfilled until the reservation/maintenance starts. You can put the reservation into the system at any time, but no later than "<starttime maintenance> minus <longest MaxTime of partition>", e.g.

scontrol create reservation=<name> starttime=<starttime> duration=<duration>  user=root flags=maint nodes=ALL
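To make the "<starttime maintenance> minus <longest MaxTime of partition>" rule concrete, here is a small sketch that computes the latest moment at which the reservation should be created. It assumes GNU date, works in UTC to avoid daylight-saving surprises, and takes the longest MaxTime in whole hours; the function name and the example date are placeholders.

```shell
# Hypothetical helper: latest creation time for the reservation.
# $1 = maintenance start (any format GNU date accepts), $2 = longest MaxTime in hours.
latest_creation() {
    start_epoch=$(date -u -d "$1" +%s) || return 1
    # Subtract the longest allowed job run time from the maintenance start.
    date -u -d "@$((start_epoch - $2 * 3600))" '+%Y-%m-%dT%H:%M:%S'
}

# Maintenance at 08:00 on 20 Nov with a 24h MaxTime:
latest_creation "2021-11-20 08:00" 24   # prints 2021-11-19T08:00:00
```

Creating the reservation at or before that time means no job that starts afterwards can be scheduled to run past the maintenance start.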

Hope that helps a little bit,

Carsten


--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
