On 8/25/21 10:48 AM, Julien Tailleur wrote:
> We have been running a computing cluster using Slurm since 2016, which I installed back then with some help from others. I was quite late on upgrades and decided to upgrade the cluster from Debian Stretch, which ships Slurm 16.05.9, to Debian Bullseye, which ships Slurm 20.11.7.

SchedMD documents that a single upgrade step may span at most 2 major versions, see https://slurm.schedmd.com/quickstart_admin.html#upgrade. So you would have to go through 16.05 -> 17.02 -> 18.08 -> 20.02 -> 20.11 (21.08 will be out soon). I don't know whether you can find Debian packages for these old versions.
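The "at most 2 major versions per step" rule can be checked mechanically. The following is a minimal sketch, not a SchedMD tool: `invalid_steps` is a hypothetical helper, and the `RELEASES` list is my assumption of the ordered Slurm major releases between 16.05 and 20.11. It reports every step of a proposed path that jumps too far.

```python
# Hypothetical helper: check that a proposed Slurm upgrade path never jumps
# more than MAX_JUMP major releases in a single step.

# Assumed ordered list of Slurm major releases from 16.05 to 20.11.
RELEASES = ["16.05", "17.02", "17.11", "18.08", "19.05", "20.02", "20.11"]
MAX_JUMP = 2  # "at most 2 major versions" per upgrade step


def invalid_steps(path, releases=RELEASES, max_jump=MAX_JUMP):
    """Return the (from, to) steps in `path` that break the upgrade rule."""
    index = {version: i for i, version in enumerate(releases)}
    bad = []
    for old, new in zip(path, path[1:]):
        jump = index[new] - index[old]
        # A step must move forward, by at most max_jump releases.
        if not 0 < jump <= max_jump:
            bad.append((old, new))
    return bad


if __name__ == "__main__":
    proposed = ["16.05", "17.02", "18.08", "20.02", "20.11"]
    print(invalid_steps(proposed))            # []
    print(invalid_steps(["16.05", "20.11"]))  # [('16.05', '20.11')]
```

The path above passes with jumps of 1, 2, 2 and 1 releases, while the direct Stretch-to-Bullseye jump from 16.05 to 20.11 spans 6 releases and is rejected.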

I have collected some Slurm upgrading information in
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm
It's written for CentOS, but the Slurm parts would be the same.

> While the upgrade of the system itself went smoothly, Slurm is broken. Of course, that's the stage at which I thought "Oh, I should have checked whether the upgrade is supposed to be harmless"... Now that the self-bashing is rightfully done, I would be very happy with some help! I hesitate between two strategies: removing Slurm completely and doing a fresh installation, or trying to save what can be saved... I am tempted by the former, since I remember suffering a bit to get the installation right in the first place...

A usable database dump from the old 16.05 installation is vital! You could start again with Slurm 16.05 and upgrade in the 4 steps indicated above.
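A dump of the accounting database is typically made with `mysqldump`. Below is a minimal sketch that only builds and prints the command; `dump_command` is a hypothetical helper, and the database name `slurm_acct_db` (the slurmdbd default for `StorageLoc`), the `root` user and the output file name are assumptions to adjust to your `slurmdbd.conf`.

```python
import shlex
import subprocess

# Assumption: default accounting database name used by slurmdbd (StorageLoc).
DB_NAME = "slurm_acct_db"


def dump_command(outfile, db=DB_NAME, user="root"):
    """Build a mysqldump argv for a consistent dump of the accounting DB."""
    return [
        "mysqldump",
        f"--user={user}",
        "--password",            # no value given, so mysqldump prompts for it
        "--single-transaction",  # consistent snapshot of InnoDB tables
        "--databases", db,       # adds CREATE DATABASE / USE statements
        f"--result-file={outfile}",
    ]


if __name__ == "__main__":
    cmd = dump_command("slurm_acct_db-16.05.sql")
    print(shlex.join(cmd))
    # Uncomment to actually run it; stopping slurmdbd first is the cautious choice:
    # subprocess.run(cmd, check=True)
```

Printing the command instead of running it lets you review it before touching a database you cannot afford to lose.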

Beware of potential database issues:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#database-upgrade-from-slurm-17-02-and-older

If the 4-step upgrade doesn't work, starting from scratch seems to be the only option :-( My Slurm Wiki page may be of some help: https://wiki.fysik.dtu.dk/niflheim/SLURM

/Ole
