On 04-02-2022 08:59, Bjørn-Helge Mevik wrote:
> Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> writes:
>
>> As Brian Andrus said, you must upgrade Slurm by at most 2 major
>> versions, and that includes slurmd's as well!  Don't do a "direct
>> upgrade" of slurmd by more than 2 versions!
>
> That should only be an issue if you have running jobs during the
> upgrade, shouldn't it?  As I understand it, without any running jobs,
> you can do pretty much what you want on the compute nodes.  Or am I
> missing something here?

I think that Slurm's communication protocol is incompatible when the versions differ by more than 2 major releases, so in that case the slurmd daemons may lose contact with the slurmctld, whether or not any jobs are running.
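
To make the "at most 2 major versions" rule concrete, here is a minimal sketch in Python that checks the gap between two versions. The release list and the function names are my own assumptions for illustration, not anything shipped with Slurm; please verify the list against the Slurm release notes before relying on it.

```python
# Assumption: an ordered list of Slurm major releases (YY.MM), oldest first.
# It is illustrative only; extend it and check it against the release notes.
MAJOR_RELEASES = ["18.08", "19.05", "20.02", "20.11", "21.08"]


def major(version):
    """Return the major release of a full version, e.g. '20.11.8' -> '20.11'."""
    return ".".join(version.split(".")[:2])


def releases_apart(old, new, releases=MAJOR_RELEASES):
    """Number of major releases between old and new (negative for a downgrade).

    Raises ValueError if either version is not in the release list.
    """
    return releases.index(major(new)) - releases.index(major(old))


def upgrade_is_safe(old, new, max_gap=2, releases=MAJOR_RELEASES):
    """True if new is at most max_gap major releases ahead of old."""
    return 0 <= releases_apart(old, new, releases) <= max_gap


if __name__ == "__main__":
    for old, new in [("19.05.5", "20.11.8"), ("18.08.9", "21.08.5")]:
        verdict = "OK" if upgrade_is_safe(old, new) else "too far, upgrade in steps"
        print(f"{old} -> {new}: {verdict}")
```

The same check applies to the slurmd version on a node versus the slurmctld version, since the daemons have to be able to talk to each other.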

In my experience, it's not a problem to upgrade slurmd while the nodes are running jobs: upgrade the slurmd RPM, and slurmd will restart itself and attach to the running jobs. There are probably cases where this will cause job crashes, so please heed the information collected in the Wiki page https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-on-centos-7
In particular, there may be some issues with MPI applications, as mentioned in the Wiki.

/Ole
