On 25 October 2016 at 08:42, Tuo Chen Peng <tp...@nvidia.com> wrote:

> Hello all,
>
> This is my first post in the mailing list - nice to join the community!
>

Welcome!


>
>
> I have a general question regarding a Slurm partition change:
>
> If I move one node from one partition to the other, will it have any
> impact on the jobs that are still running on other nodes, in both
> partitions?
>
>
No, it shouldn't, as long as you execute the plan carefully...


> But we would like to do this without interrupting existing, running jobs.
>
> What would be the safe way to do this?
>
>
>
> And here’s my plan:
>
> (1) Drain the node in the main partition for the move, and only drain that
> node - keep the other nodes available for job submission.
>
> (2) Move the node from the main partition to the short job partition:
>
> (2.1) Update slurm.conf on both the control node and the node to be moved,
> so that this node is listed under the short job partition.
>
> (2.2) Run scontrol update on both the control node and the node just moved,
> to let Slurm pick up the configuration change.
>
> (3) The node should now be in the short job partition; set the node back to
> the normal / idle state.
>
>
>
> Is “scontrol update” the right command to use in this case?
>
> Does anyone see any impact / concern in the above sequence?
>
> I’m mostly worried about whether such a partition change could cause
> users’ existing jobs to be killed or fail for some reason.
>

Looks correct except for 2.2 - my understanding is that you would need to
restart the slurmctld process (`systemctl restart slurm`) at this point -
which is when the Slurm "head" node picks up the changes to slurm.conf -
and then run `scontrol reconfigure` to distribute that change to the
compute nodes. Roughly, the sequence would look like the sketch below.
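For reference, here is a minimal sketch of the whole sequence. The node name
(node042) and partition names (main, short) are placeholders, and the
controller unit is called slurmctld on many installs rather than slurm, so
adjust for your site:

    # (1) Drain only the node being moved; other nodes keep accepting jobs.
    scontrol update NodeName=node042 State=DRAIN Reason="move to short partition"

    # Wait until it shows as "drained", i.e. no jobs are still running on it.
    sinfo -n node042 -o "%N %T"

    # (2) Edit slurm.conf on the controller and on node042 (or the shared copy)
    # so node042 moves from the main partition's Nodes= list to short's, e.g.:
    #   PartitionName=main  Nodes=node[001-041] ...
    #   PartitionName=short Nodes=node[042-050] ...

    # (2.1) Restart slurmctld so the controller rereads slurm.conf
    # (the unit may be named slurm, slurmctld or slurm-llnl depending on distro).
    systemctl restart slurmctld

    # (2.2) Push the new configuration out to the slurmd daemons.
    scontrol reconfigure

    # (3) Return the node to service in its new partition.
    scontrol update NodeName=node042 State=RESUME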


Cheers
L.
