On 25 October 2016 at 08:42, Tuo Chen Peng <tp...@nvidia.com> wrote:
> Hello all,
>
> This is my first post in the mailing list - nice to join the community!
>
Welcome!

> I have a general question regarding slurm partition change:
>
> If I move one node from one partition to the other, will it cause any
> impact to the jobs that are still running on other nodes, in both
> partitions?

No, it shouldn't, depending on how you execute the plan...

> But we would like to do this without interrupting existing, running jobs.
>
> What would be the safe way to do this?
>
> And here’s my plan:
>
> (1) drain the node in the main partition for the move, and only drain that
> node - keep other nodes available for job submission.
>
> (2) move the node from the main partition to the short job partition
>
> (2.1) update slurm.conf on both the control node and the node to be moved,
> so that this node is listed under the short job partition
>
> (2.2) run scontrol update on both the control node and the node just moved,
> to let slurm pick up the configuration change.
>
> (3) the node should now be in the short job partition; set it back to the
> normal / idle state.
>
> Is “scontrol update” the right command to use in this case?
>
> Does anyone see any impact / concern with the above sequence?
>
> I’m mostly worried about whether such a partition change could cause
> users’ existing jobs to be killed or fail for some reason.

Looks correct except for 2.2 - my understanding is that you would need to
restart the slurmctld process (`systemctl restart slurm`) at this point -
which is the point at which the slurm "head" node picks up the changes to
slurm.conf - and then run `scontrol reconfigure` to distribute that change
to the nodes.

Cheers
L.
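
P.S. A rough sketch of how that sequence might look on the command line,
assuming a node called node01 being moved from a "main" partition to a
"short" partition (the node, partition and service names are placeholders -
adjust for your site, e.g. the slurmctld unit may just be called "slurm"):

    # Drain the node so no new jobs start on it; running jobs keep going
    scontrol update NodeName=node01 State=DRAIN Reason="move to short partition"

    # Once node01 is idle, edit slurm.conf on the controller (and keep the
    # copy on the compute nodes in sync), e.g.:
    #   PartitionName=main  Nodes=node[02-10]        State=UP ...
    #   PartitionName=short Nodes=node01,node[11-20] State=UP ...

    # Restart slurmctld so the controller picks up the new slurm.conf
    systemctl restart slurmctld

    # Push the new configuration out to the slurmd daemons
    scontrol reconfigure

    # Return the node to service in its new partition
    scontrol update NodeName=node01 State=RESUME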