On 06/26/2017 01:24 PM, Loris Bennett wrote:
We have also upgraded in place continuously from 2.2.4 to currently
16.05.10 without any problems.  As I mentioned previously, it can be
handy to make a copy of the statesave directory, once the daemons have
been stopped.
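
A minimal sketch of the statesave copy mentioned above, assuming the daemons are already stopped and that you pass in the directory your slurm.conf names as StateSaveLocation. The function name `backup_state_dir` and both path arguments are hypothetical and only serve as illustration:

```python
import shutil
import time
from pathlib import Path


def backup_state_dir(state_dir: str, backup_root: str) -> Path:
    """Copy state_dir into a new timestamped directory under backup_root.

    Hypothetical helper: run it only after the daemons have been stopped,
    so that the state files are not changing during the copy.
    """
    src = Path(state_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"state directory not found: {src}")

    # A timestamp in the name keeps successive backups apart.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"

    # copytree raises FileExistsError if dest already exists,
    # so an earlier backup is never overwritten.
    shutil.copytree(src, dest, symlinks=True)
    return dest
```

Calling `backup_state_dir(state_dir, backup_root)` with your own two paths returns the location of the new copy, which can be restored by hand if the upgrade has to be rolled back.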

However, if you want to know how long the upgrade might take, then yours
is a good approach.  What is your use case here?  Do you want to inform
the users about the length of the outage with regard to job submission?

I want to be 99.9% sure that the upgrade (my first one) will actually work. I also want to know roughly how long slurmdbd will be down, so that the cluster doesn't kill all jobs due to timeouts. Better safe than sorry.

I don't plan to inform the users, since the operation is expected to run smoothly without any trouble for user jobs.

Thanks,
Ole

Re: [slurm-dev] Dry run upgrade procedure for the slurmdbd database

We did it in place, worked as noted on the tin. It was less painful
than I expected. TBH, your procedures are admirable, but you shouldn't
worry - it's a relatively smooth process.

cheers
L.

------
"Mission Statement: To provide hope and inspiration for collective action, to build 
collective power, to achieve collective transformation, rooted in grief and rage but 
pointed towards vision and dreams."

- Patrisse Cullors, Black Lives Matter founder

On 26 June 2017 at 20:04, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:

  We're planning to upgrade Slurm 16.05 to 17.02 soon. The most critical step 
seems to me to be the upgrade of the slurmdbd database, which may also take 
tens of minutes.

  I thought it would be a good idea to test the slurmdbd database upgrade locally 
on a drained compute node in order to verify both correctness and the time required.

  I've developed the dry run upgrade procedure documented in the Wiki page 
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm

  Question 1: Would people who have real-world Slurm upgrade experience kindly 
offer comments on this procedure?

  My testing was actually successful, and the database conversion took less 
than 5 minutes in our case.

  A crucial step is starting the slurmdbd manually after the upgrade. But how 
can we be sure that the database conversion has been 100% completed?

  Question 2: Can anyone confirm that the output "slurmdbd: debug2: Everything 
rolled up" indeed signifies that conversion is complete?
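
  For anyone who saves the output of the manually started slurmdbd to a file, the check for that message could be scripted roughly as follows. This is only a sketch: the function name `first_marker_line` is hypothetical, and finding the line shows only that the message was printed. Whether it really means the conversion is complete is exactly the open question above.

```python
from typing import Iterable, Optional

# Text quoted in Question 2; whether it signals a finished conversion
# is an assumption that this script does not verify.
MARKER = "Everything rolled up"


def first_marker_line(lines: Iterable[str], marker: str = MARKER) -> Optional[int]:
    """Return the 1-based number of the first line containing marker, or None."""
    for number, line in enumerate(lines, start=1):
        if marker in line:
            return number
    return None
```

  Since an open file object yields its lines when iterated, it can be passed straight to `first_marker_line`; a result of `None` means the message has not appeared in the saved output so far.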

  Thanks,
  Ole




--
Ole Holm Nielsen
PhD, Manager of IT services
Department of Physics, Technical University of Denmark,
Building 307, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Tel: (+45) 4525 3187 / Mobile (+45) 5180 1620
