Brian / Christopher, that looks like a good process. Thanks, guys. I will do some testing and let you know.

If I mark a partition down and it has running jobs, what happens to those jobs? Do they keep running?

Sid Young
W: https://off-grid-engineering.com
W: (personal) https://sidyoung.com/
W: (personal) https://z900collector.wordpress.com/

On Tue, Feb 1, 2022 at 3:27 PM Brian Andrus <toomuc...@gmail.com> wrote:

> One possibility:
>
> Sounds like your concern is folks with interactive jobs from the login
> node that are running under screen/tmux.
>
> That being the case, you need running jobs to end and not allow new users
> to start tmux sessions.
>
> Definitely doing 'scontrol update state=down partition=xxxx' for each
> partition. Also:
>
> touch /etc/nologin
>
> That will prevent new logins.
>
> Send a message to all active folks
>
> wall "system going down at XX:XX, please end your sessions"
>
> Then wait for folks to drain off your login node and do your stuff.
>
> When done, remove the /etc/nologin file and folks will be able to login
> again.
>
> Brian Andrus
>
> On 1/31/2022 9:18 PM, Sid Young wrote:
>
> On Tue, Feb 1, 2022 at 3:02 PM Christopher Samuel <ch...@csamuel.org>
> wrote:
>
>> On 1/31/22 4:41 pm, Sid Young wrote:
>>
>> > I need to replace a faulty DIMM chip in our login node so I need to stop
>> > new jobs being kicked off while letting the old ones end.
>> >
>> > I thought I would just set all nodes to drain to stop new jobs from
>> > being kicked off...
>>
>> That would basically be the way, but is there any reason why compute
>> jobs shouldn't start whilst the login node is down?
>>
>
> My concern was to keep the running jobs going and stop new jobs, so when
> the last running job ends, I could reboot the login node knowing that any
> terminal windows "screen"/"tmux" sessions would effectively have ended as
> the job(s) had now ended
>
> I'm not sure if there was an accepted procedure or best practice way to
> tackle shutting down the Login node for this use case.
>
> On the bright side I am down to two jobs left so any day now :)
>
> Sid
>
>> All the best,
>> Chris
>> --
>> Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
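
For reference, the steps Brian lists above could be wrapped up roughly as follows. This is only a sketch under a few assumptions, not a tested procedure: the helper names (run, begin_maintenance, end_maintenance) and the example partition names are made up for illustration, and it uses the PartitionName=... State=... spelling of the scontrol update command. It defaults to a dry run that only prints what it would do. As far as I understand the Slurm documentation, a partition in state DOWN starts no queued jobs while jobs already running in it continue, but please check that against your Slurm version.

```shell
#!/bin/sh
# Sketch of a login-node maintenance window, following the steps quoted above.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# run: hypothetical helper; prints the command in dry-run mode, otherwise executes it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# begin_maintenance WHEN PARTITION...
# Marks each named partition DOWN, blocks new logins, and warns logged-in users.
begin_maintenance() {
    when="$1"
    shift
    for part in "$@"; do
        run scontrol update PartitionName="$part" State=DOWN
    done
    run touch /etc/nologin
    run wall "system going down at $when, please end your sessions"
}

# end_maintenance PARTITION...
# Reverses the steps above once the work on the login node is finished.
end_maintenance() {
    for part in "$@"; do
        run scontrol update PartitionName="$part" State=UP
    done
    run rm -f /etc/nologin
}
```

After sourcing the file, something like `begin_maintenance 18:00 debug batch` prints the commands for two hypothetical partitions named debug and batch, and `end_maintenance debug batch` prints the reverse steps. Running with DRY_RUN=0 would need root (or Slurm operator rights for the scontrol calls).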