Re: [slurm-users] How to checkout a slurm node?

2021-11-12 Thread Gerhard Strangar
Joe Teumer wrote:
> However, if the user needs to reboot the node, set BIOS settings, etc. then
> `salloc` automatically terminates the allocation when the new shell is exited.

What kind of BIOS settings would a user need to change?

Re: [slurm-users] How to checkout a slurm node?

2021-11-12 Thread gilles
Hey Joe,

Have you considered using a reservation? An operator can reserve a (set of) nodes for a given time, and as a user, you would simply submit your jobs within this reservation. Depending on your system configuration, a node might be marked as down if you reboot it, and an operator
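A minimal sketch of that workflow (the reservation name, node name, and user name here are hypothetical, and the exact options accepted depend on your Slurm version and your privileges):

    # Operator: reserve node042 for user joe for three hours
    scontrol create reservation reservationname=joe_node users=joe nodes=node042 starttime=now duration=03:00:00

    # User: submit or run work inside the reservation
    sbatch --reservation=joe_node job.sh
    srun --reservation=joe_node --pty bash

    # Operator: release the reservation when the user is done
    scontrol delete reservationname=joe_node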

Re: [slurm-users] How to checkout a slurm node?

2021-11-12 Thread Brian Andrus
I don't think Slurm does what you think it does. It manages the resources and the schedule, not the actual hardware of a node. You are likely looking for something more along the lines of a hypervisor (if you are doing VMs) or remote KVM (since you are mentioning BIOS access).

Brian Andrus

On 11/12/2021 2:00

[slurm-users] Slurm BoF and booth at SC21

2021-11-12 Thread Tim Wickberg
The Slurm Birds-of-a-Feather session will be held virtually on Thursday, November at 12:15 - 1:15pm (Central). This is conducted through the SC21 HUBB platform, and you will need to have registered in some capacity through the conference to be able to participate live. We'll be reviewing the

[slurm-users] How to checkout a slurm node?

2021-11-12 Thread Joe Teumer
Hello! What is the best way for a user to check out a Slurm node? Unfortunately, the `salloc` command doesn't appear to meet this need.

Command: `salloc --nodelist some_node --time 3:00:00`

This gives the user a new shell, and the user can use `srun` to start an interactive session. However, if the user needs
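For reference, the pattern being described is roughly the following (the node name and time limit are placeholders):

    # Request an allocation on a specific node for three hours
    salloc --nodelist=some_node --time=3:00:00

    # From the shell salloc opens, launch an interactive shell on the allocated node
    srun --pty bash

    # Exiting the salloc shell ends the allocation, which is why rebooting the
    # node or changing BIOS settings (as described above) does not fit this model.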

Re: [slurm-users] enable_configless, srun and DNS vs. hosts file

2021-11-12 Thread Paul Brunk
Hi: We run configless. If we add a node to slurm.conf and don't restart slurmd on our submit nodes, then attempts to submit to that new node will get the error you saw. Restarting slurmd on the submit node fixes it. This is the documented behavior (adding nodes needs slurmd restarted
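As a sketch of the setup and fix described above (the service name and exact parameters may differ on your site):

    # slurm.conf on the slurmctld host, enabling configless mode:
    SlurmctldParameters=enable_configless

    # After the new node has been added to slurm.conf on the controller,
    # restart slurmd on each submit node so it picks up the updated configuration:
    systemctl restart slurmd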