On 03-02-2022 16:37, Nathan Smith wrote:
Yes, we are running slurmdbd. We could arrange enough downtime to do an incremental upgrade of major versions as Brian Andrus suggested, at least on the slurmctld and slurmdbd systems. The slurmds I would just do a direct upgrade once the scheduler work was completed.

As Brian Andrus said, each Slurm upgrade step may span at most 2 major versions, and that applies to the slurmds as well! Don't do a "direct upgrade" of slurmd across more than 2 major versions!
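
To illustrate the two-version rule, here is a minimal sketch that computes one possible sequence of intermediate releases between 17.02 and 21.08. The function name `upgrade_path` and the `max_jump` parameter are made up for this illustration and are not part of Slurm; the release list covers only the major releases from 17.02 through 21.08.

```python
# Slurm major releases from 17.02 through 21.08, in release order.
MAJOR_RELEASES = ["17.02", "17.11", "18.08", "19.05", "20.02", "20.11", "21.08"]


def upgrade_path(current, target, max_jump=2):
    """Return the releases to install in order, each at most
    max_jump major releases ahead of the previous one.

    Hypothetical helper for illustration only; not a Slurm tool.
    """
    start = MAJOR_RELEASES.index(current)
    end = MAJOR_RELEASES.index(target)
    if end < start:
        raise ValueError("target release is older than current release")
    path = []
    pos = start
    while pos < end:
        # Jump as far as allowed, but never past the target.
        pos = min(pos + max_jump, end)
        path.append(MAJOR_RELEASES[pos])
    return path


print(upgrade_path("17.02", "21.08"))  # ['18.08', '20.02', '21.08']
```

Under these assumptions, getting from 17.02 to 21.08 takes three upgrade rounds, and the same sequence would apply to slurmdbd, slurmctld and the slurmds. Other intermediate stops are possible as long as no single step spans more than 2 major versions.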

I recommend separate physical servers for slurmdbd and slurmctld. Then you can upgrade slurmdbd without taking the cluster offline. It's OK for slurmdbd to be down for many hours, since slurmctld caches the state information in the meantime.

I've described the Slurm upgrade process in detail in my Wiki page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm

Since you start from 17.02, you have to be extremely cautious when upgrading the database! See the Wiki page for details. Make sure to test the database upgrade on a test server, using a database dump, instead of on the real slurmdbd server.
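
As a rough sketch of such a dry run, the snippet below only builds and prints the shell commands for each step; it does not execute anything. The database name `slurm_acct_db`, the dump file name, and the helper `dump_test_plan` are assumptions for illustration, so adjust them to your site. It also assumes the test server already has the newer slurmdbd installed with a slurmdbd.conf pointing at the local test database.

```python
import shlex


def dump_test_plan(db_name="slurm_acct_db", dump_file="slurm_acct_db.sql"):
    """Return (where, command) pairs for a test database conversion.

    Hypothetical helper: it only formats commands, it runs nothing.
    """
    # Dump the accounting database on the production database host.
    dump = ["mysqldump", "--single-transaction", "--databases", db_name]
    # Load the dump into the database server on the test machine.
    restore = ["mysql"]
    # Run the newer slurmdbd in the foreground with verbose logging,
    # so that the database conversion can be watched as it happens.
    convert = ["slurmdbd", "-D", "-vvv"]
    quoted_file = shlex.quote(dump_file)
    return [
        ("production database host", shlex.join(dump) + " > " + quoted_file),
        ("test server", shlex.join(restore) + " < " + quoted_file),
        ("test server", shlex.join(convert)),
    ]


for where, command in dump_test_plan():
    print(f"[{where}] {command}")
```

Timing the conversion step on the test server also gives an estimate of how long the real slurmdbd would be down during each upgrade round.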

I hope this helps.

/Ole

*From:* slurm-users <slurm-users-boun...@lists.schedmd.com> *On Behalf Of *Brian Haymore
*Sent:* Wednesday, February 2, 2022 1:51 PM
*To:* slurm-us...@schedmd.com; Slurm User Community List <slurm-users@lists.schedmd.com>
*Subject:* [EXTERNAL] Re: [slurm-users] Upgrade from 17.02.11 to 21.08.2 and state information

Are you running slurmdbd in your current setup? If you are, then the upgrade path might need additional consideration when moving this far across versions.

--
Brian D. Haymore
University of Utah
Center for High Performance Computing
155 South 1452 East RM 405
Salt Lake City, Ut 84112
Phone: 801-558-1150, Fax: 801-585-5366
http://bit.ly/1HO1N2C

------------------------------------------------------------------------

*From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Nathan Smith <smina...@ohsu.edu>
*Sent:* Wednesday, February 2, 2022 2:38 PM
*To:* slurm-us...@schedmd.com
*Subject:* [slurm-users] Upgrade from 17.02.11 to 21.08.2 and state information


The "Upgrades" section of the quick-start guide [0] warns:

 > Slurm permits upgrades to a new major release from the past two major
 > releases, which happen every nine months (e.g. 20.02.x or 20.11.x to
 > 21.08.x) without loss of jobs or other state information. State
 > information from older versions will not be recognized and will be
 > discarded, resulting in loss of all running and pending jobs.

We are planning for an upgrade from 17.02.11 to 21.08.2. As a part of
our upgrade procedure we'd be bringing the scheduler to full stop, so
the loss of running and pending jobs would not be a concern. Is there
anything more to state information than running and pending jobs? For
example, would the JobID count revert to 1 in the case of such an
upgrade?

[0] https://slurm.schedmd.com/quickstart_admin.html#upgrade

--
Nathan Smith
Research Systems Engineer
Advanced Computing Center
Oregon Health & Science University

