On 06/26/2017 01:24 PM, Loris Bennett wrote:
We have also upgraded in place continuously from 2.2.4 to currently
16.05.10 without any problems. As I mentioned previously, it can be
handy to make a copy of the statesave directory, once the daemons have
been stopped.
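
The statesave copy mentioned above could be scripted roughly as follows. This is only a sketch: the function name `backup_state_dir` and both paths are hypothetical, the real directory is whatever `StateSaveLocation` in slurm.conf points to, and it assumes the daemons have already been stopped.

```python
import shutil
import time
from pathlib import Path


def backup_state_dir(state_dir: str, backup_root: str) -> Path:
    """Copy state_dir into a new timestamped directory under backup_root.

    Assumption: the Slurm daemons are stopped, so the files are not
    changing while they are copied.
    """
    src = Path(state_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"state save directory not found: {src}")

    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"

    # copytree keeps permission bits and timestamps, but not file ownership.
    shutil.copytree(src, dest, symlinks=True)
    return dest
```

Because ownership is not preserved, restoring from such a copy would need the files handed back to the Slurm user afterwards.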
However, if you want to know how long the upgrade might take, then yours
is a good approach. What is your use case here? Do you want to inform
the users about the length of the outage with regard to job submission?
I want to be 99.9% sure that upgrading (my first one) will actually
work. I also want to know roughly how long the slurmdbd will be down so
that the cluster doesn't kill all jobs due to timeouts. Better to be
safe than sorry.
I don't expect to inform the users, since the operation is expected to
run smoothly without troubles for user jobs.
Thanks,
Ole
Re: [slurm-dev] Dry run upgrade procedure for the slurmdbd database
We did it in place, and it worked as it says on the tin. It was less painful
than I expected. TBH, your procedures are admirable, but you shouldn't
worry - it's a relatively smooth process.
cheers
L.
------
"Mission Statement: To provide hope and inspiration for collective action, to build
collective power, to achieve collective transformation, rooted in grief and rage but
pointed towards vision and dreams."
- Patrisse Cullors, Black Lives Matter founder
On 26 June 2017 at 20:04, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:
We're planning to upgrade Slurm 16.05 to 17.02 soon. The most critical step
seems to me to be the upgrade of the slurmdbd database, which may also take
tens of minutes.
I thought it would be a good idea to test the slurmdbd database upgrade locally on a
drained compute node in order to verify both correctness and the time required.
I've developed the dry run upgrade procedure documented in the Wiki page
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm
Question 1: Would people who have real-world Slurm upgrade experience kindly
offer comments on this procedure?
My testing was actually successful, and the database conversion took less
than 5 minutes in our case.
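
One way to record such a timing during a dry run is to wrap the step in a small timer. This is a generic sketch, not part of the Wiki procedure: `time_command` is a hypothetical helper, and the command passed to it is a placeholder for whatever step is being measured.

```python
import subprocess
import sys
import time
from typing import Sequence, Tuple


def time_command(argv: Sequence[str]) -> Tuple[int, float]:
    """Run argv and return (exit code, elapsed wall-clock seconds)."""
    start = time.monotonic()
    completed = subprocess.run(list(argv), check=False)
    elapsed = time.monotonic() - start
    return completed.returncode, elapsed


if __name__ == "__main__":
    # Placeholder command; substitute the real dry-run step here.
    code, seconds = time_command([sys.executable, "-c", "pass"])
    print(f"exit code {code}, took {seconds:.1f} s")
```

A time measured on a test node is only indicative, since the production database server may be faster or slower.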
A crucial step is starting the slurmdbd manually after the upgrade. But how
can we be sure that the database conversion has been 100% completed?
Question 2: Can anyone confirm that the output "slurmdbd: debug2: Everything
rolled up" indeed signifies that conversion is complete?
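
Watching for that message can be automated by scanning saved slurmdbd output for the string. Whether the message really proves that the conversion has finished is exactly the open question here, so this sketch only reports that the line appeared; `find_marker` is a hypothetical helper.

```python
from typing import Iterable, Optional

# Message quoted in the question above; what it signifies is unconfirmed.
MARKER = "Everything rolled up"


def find_marker(lines: Iterable[str], marker: str = MARKER) -> Optional[int]:
    """Return the 1-based number of the first line containing marker, else None."""
    for number, line in enumerate(lines, start=1):
        if marker in line:
            return number
    return None
```

With the daemon's output saved to a file, passing the open file object to `find_marker` returns the line number of the first match, or `None` if the message never appeared.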
Thanks,
Ole
--
Ole Holm Nielsen
PhD, Manager of IT services
Department of Physics, Technical University of Denmark,
Building 307, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Tel: (+45) 4525 3187 / Mobile (+45) 5180 1620