On 8/25/21 10:48 AM, Julien Tailleur wrote:
> We have been running a computing cluster using Slurm since 2016, which
> I installed back then with some help from others. I was pretty late on
> upgrades and decided to upgrade the cluster to Debian Bullseye, which
> runs Slurm 20.11.7, starting from Stretch, which runs Slurm 16.05.9.

SchedMD documents that a single upgrade may span at most 2 major
versions, see https://slurm.schedmd.com/quickstart_admin.html#upgrade.
So you would have to go through 16.05 -> 17.02 -> 18.08 -> 20.02 -> 20.11
(21.08 will be out soon). I don't know whether you can find Debian
packages for these old versions.

I have collected some Slurm upgrading information in
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm
It's written for CentOS, but the Slurm parts would be the same.
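
The "at most 2 major versions" rule can be checked mechanically. Below
is a minimal Python sketch, not an official tool: the release list is
written down from memory of the Slurm release history and should be
verified against SchedMD's release notes, and the helper name
is_valid_path is made up for this illustration.

```python
# Slurm major releases in chronological order, 16.05 through 20.11.
# Assumption: this list is complete; verify it against SchedMD's docs.
RELEASES = ["16.05", "17.02", "17.11", "18.08", "19.05", "20.02", "20.11"]

# Documented limit: one upgrade may span at most 2 major releases.
MAX_JUMP = 2


def is_valid_path(path, releases=RELEASES, max_jump=MAX_JUMP):
    """Return True if every step in path moves forward by 1..max_jump releases."""
    positions = [releases.index(version) for version in path]
    return all(
        0 < later - earlier <= max_jump
        for earlier, later in zip(positions, positions[1:])
    )


if __name__ == "__main__":
    chain = ["16.05", "17.02", "18.08", "20.02", "20.11"]
    print("4-step chain valid:", is_valid_path(chain))
    print("direct jump valid:", is_valid_path(["16.05", "20.11"]))
```

The chain given above passes this check, while a direct jump from 16.05
to 20.11 spans 6 major releases and fails it.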

> While the update of the system itself went smoothly, Slurm is broken.
> Of course, that's the stage at which I thought "Oh, I should have
> checked if the upgrade is supposed to be harmless"... Now that the
> self-bashing is rightfully done, I would be very happy with some help!
> I hesitate between two strategies: removing Slurm completely and doing
> a completely new installation, or trying to save what can be saved...
> I am tempted by the latter, since I remember suffering a bit to get
> the installation right in the first place...

A usable database dump from the old 16.05 installation is vital! You
could start again with Slurm 16.05 and upgrade in 4 steps as indicated
above. Beware of potential database issues:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#database-upgrade-from-slurm-17-02-and-older
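
Taking a fresh dump before each upgrade step could look roughly like the
Python sketch below. It assumes the accounting database lives in
MySQL/MariaDB under Slurm's default name slurm_acct_db (check StorageLoc
in slurmdbd.conf) and that mysqldump can authenticate, for example via
~/.my.cnf. The function names and the output filename are made up for
this illustration.

```python
import subprocess


def build_dump_command(database="slurm_acct_db"):
    """Build the mysqldump command line for the Slurm accounting database."""
    # --single-transaction gives a consistent snapshot of InnoDB tables;
    # --databases also writes CREATE DATABASE / USE statements to the dump.
    return ["mysqldump", "--single-transaction", "--databases", database]


def dump_to_file(outfile, database="slurm_acct_db"):
    """Run mysqldump and write its output to outfile; raise if it fails."""
    with open(outfile, "wb") as handle:
        subprocess.run(build_dump_command(database), stdout=handle, check=True)


if __name__ == "__main__":
    # Hypothetical filename; stop slurmdbd before dumping, and keep one
    # dump per Slurm version so any step can be retried.
    dump_to_file("slurm_acct_db-16.05.sql")
```
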
If the 4-step upgrade doesn't work, starting from scratch seems to be
the only option :-( My Slurm Wiki page may be of some help:
https://wiki.fysik.dtu.dk/niflheim/SLURM

/Ole