Re: [slurm-users] Reservation vs. Draining for Maintenance?

Ole Holm Nielsen Thu, 06 Aug 2020 11:07:22 -0700

On 06-08-2020 19:13, Jason Simms wrote:

Later this month, I will have to bring down, patch, and reboot all nodes in our cluster for maintenance. The two options available to set nodes into a maintenance mode seem to be either: 1) creating a system-wide reservation, or 2) setting all nodes into a DRAIN state.
I'm not sure it really matters either way, but is there any preference one way or the other? Any gotchas I should be aware of?

I'd recommend using a reservation because you can define a specific maintenance period way ahead of time. You ought to create the reservation well in advance: at least as far ahead of the maintenance start as the largest MaxTime of any partition in slurm.conf, so that no jobs are still running when the reservation sets in. Jobs can then continue to run until the very last minute!
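
To make the timing concrete, here is a rough Python sketch of the arithmetic. The maintenance date, the partition MaxTime values, the 8-hour duration and the reservation name are all made-up examples, not values from any real cluster, and the parser does not handle UNLIMITED/INFINITE limits. The printed scontrol line follows the maintenance-reservation form shown in the Slurm reservations guide; please check it against your own Slurm version before using it.

```python
from datetime import datetime, timedelta


def parse_slurm_time(text):
    """Convert a Slurm time limit such as '7-00:00:00' or '12:00:00' to a timedelta.

    Assumption: only numeric limits are handled; 'UNLIMITED' raises ValueError.
    """
    days = 0
    if "-" in text:
        # days-hours[:minutes[:seconds]]
        day_part, rest = text.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in rest.split(":")]
        fields += [0] * (3 - len(fields))
        hours, minutes, seconds = fields
    else:
        fields = [int(f) for f in text.split(":")]
        if len(fields) == 1:      # minutes
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:    # minutes:seconds
            hours, minutes, seconds = 0, fields[0], fields[1]
        else:                     # hours:minutes:seconds
            hours, minutes, seconds = fields
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def latest_creation_time(maintenance_start, max_times):
    """Latest moment to create the reservation so that no job can overlap it."""
    longest = max(parse_slurm_time(t) for t in max_times)
    return maintenance_start - longest


# Hypothetical example values: adjust to your own site.
maintenance_start = datetime(2020, 8, 24, 8, 0)
partition_max_times = ["2-00:00:00", "7-00:00:00", "12:00:00"]

deadline = latest_creation_time(maintenance_start, partition_max_times)
print("Create the reservation before:", deadline.isoformat())

# Command to run as root (printed only, not executed here).
print(
    "scontrol create reservation ReservationName=maintenance "
    f"StartTime={maintenance_start.isoformat()} Duration=8:00:00 "
    "Users=root Flags=MAINT,IGNORE_JOBS Nodes=ALL"
)
```

With these example numbers the longest MaxTime is 7 days, so the reservation has to exist a full week before the maintenance starts.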


I have some notes on reservations in
https://wiki.fysik.dtu.dk/niflheim/SLURM#resource-reservation

Draining nodes is a bad idea, IMHO, because you'll have a lot of drained nodes from now until your maintenance period, causing lost resources.

The way I prefer to do upgrades is actually neither 1) nor 2). I make rolling (minor) upgrades of the compute node OS and firmware while the cluster is in full production in order to avoid lost resources. I will post my upgrade script to this list in a separate message.


/Ole
