On Friday, 18 December 2020 10:10:19 AM PST Jason Simms wrote:
> Thanks to several helpful members on this list, I think I have a much better
> handle on how to upgrade Slurm. Now my question is, do most of you upgrade
> with each major release?
We do, though not immediately and not without a degree of testing on our test systems. One of the big reasons for us upgrading is that we've usually paid for features in Slurm that we need. For example, in 20.11 that includes scrontab, so users won't be tied to favourite login nodes, as well as the experimental RPC queue code, because of the large number of RPCs our systems need to cope with.

I also keep an eye out for discussions of what other sites find with new releases, so I'm following the current concerns about 20.11 and the change in behaviour for job steps that do the following (expanding NVIDIA's example slightly):

#SBATCH --exclusive
#SBATCH -N2
srun --ntasks-per-node=1 python multi_node_launch.py

If I'm reading the bugs correctly, this fails in 20.11 because that srun no longer gets all the allocated resources and instead just gets the default of --cpus-per-task=1. That also affects things like mpirun in OpenMPI built with Slurm support: it effectively calls "srun orted", and that "orted" launches the MPI ranks, so in 20.11 it only has access to a single core for them all to fight over.

Again, that's if I'm interpreting the bugs correctly! I don't currently have a test system that's free to try 20.11 on, but hopefully early in the new year I'll be able to test this to see how much of an impact it is going to have and how we will manage it.

https://bugs.schedmd.com/show_bug.cgi?id=10383
https://bugs.schedmd.com/show_bug.cgi?id=10489

All the best,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
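For reference, here is one untested way to sidestep the implicit --cpus-per-task=1 default described above: request the CPUs explicitly on the srun line. This is a sketch under assumptions, not a confirmed fix for the linked bugs. It relies on SLURM_CPUS_ON_NODE, which Slurm sets in the batch job environment. Because that value reflects the node the batch script runs on, the sketch assumes both allocated nodes have the same CPU count.

```shell
#!/bin/bash
#SBATCH --exclusive
#SBATCH -N2

# Untested sketch: give the single task on each node every CPU on that node,
# rather than relying on the implicit default of one CPU per task.
# Assumption: SLURM_CPUS_ON_NODE (taken from the batch node) matches the CPU
# count of every allocated node, i.e. the nodes are homogeneous.
srun --ntasks-per-node=1 --cpus-per-task="${SLURM_CPUS_ON_NODE}" python multi_node_launch.py
```

Whether this is enough for the OpenMPI "srun orted" case, where mpirun builds the srun line itself, is not covered by this sketch.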