The task of adding or removing nodes from Slurm is well documented and discussed in SchedMD presentations; please see my Wiki page: https://wiki.fysik.dtu.dk/niflheim/SLURM#add-and-remove-nodes
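
In short, the documented recipe looks roughly like this - a sketch only, assuming systemd units and that you distribute slurm.conf yourself:

    # 1. Edit slurm.conf on the controller (add/remove the NodeName and
    #    Partition entries), then copy the file to all nodes.
    # 2. Restart the daemons - 'scontrol reconfigure' alone has
    #    historically not been enough when adding nodes:
    systemctl restart slurmctld      # on the controller
    systemctl restart slurmd         # on every compute node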

/Ole


On 04-05-2021 14:47, Tina Friedrich wrote:
Not sure if that's changed, but aren't there cases where 'scontrol reconfigure' isn't sufficient? (Like adding nodes?)

But yes, that's my point exactly; it is a pretty basic day-to-day task to update slurm.conf, not some daunting operation that requires a downtime or anything like it. (I remember this requirement to update the config file everywhere & restart everything sounding like a major task requiring announcements & downtimes when I started with SLURM - coming from Grid Engine - and it took me a while to figure out, and trust, that an update to slurm.conf is a very minor task, and not really a risky one. :) )

Tina

On 04/05/2021 13:32, Sid Young wrote:
You can push a new conf file and issue an "scontrol reconfigure" on the fly... I do it on our cluster as needed: the nodes first, then the login nodes, then the Slurm controller. You are making a huge issue of a very basic task...
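
Roughly like this - clush is just the push tool I'm using as an example (the hostlist is made up, substitute whatever you have):

    clush -w node[001-100] --copy slurm.conf --dest /etc/slurm/slurm.conf
    scontrol reconfigure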

Sid


On Tue, 4 May 2021, 22:28 Tina Friedrich, <tina.friedr...@it.ox.ac.uk> wrote:

    Hello,

    a lot of people already gave very good answers to how to tackle this.

    Still, I thought it worth pointing this out - you said 'you need to
    basically shut down slurm, update the slurm.conf file, then restart'.
    That makes it sound like a major operation with lots of prep required.

    It's not like that at all. Updating slurm.conf is not a major operation.

    There's absolutely no reason to shut things down first & then change
    the file. You can edit the file / ship out a new version (however you
    like) and then restart the daemons.

    The daemons do not have to all be restarted simultaneously. It is of
    no consequence if they're running with out-of-sync config files for a
    bit, really. (There's a flag you can set if you want to suppress the
    warning - the 'NO_CONF_HASH' debug flag, I think.)
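
    For example, in slurm.conf:

        DebugFlags=NO_CONF_HASH

    (that stops the complaints about differing slurm.conf files while the
    new version propagates).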

    Restarting the daemons (slurmctld, slurmd, ...) is safe. It does not
    require cluster downtime or anything.

    I control slurm.conf using configuration management; the config
    management process restarts the appropriate daemon (slurmctld, slurmd,
    slurmdbd) if the file changed. This certainly never happens at the
    same time; there's splay in that. It doesn't even necessarily happen
    on the controller first, or anything like that.
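
    For illustration, the effect of a run is roughly this on each host
    (the /srv/cm staging path is made up):

        # restart the right daemon only if the staged copy differs
        if ! cmp -s /srv/cm/slurm.conf /etc/slurm/slurm.conf; then
            cp /srv/cm/slurm.conf /etc/slurm/slurm.conf
            systemctl restart slurmd   # or slurmctld/slurmdbd on those hosts
        fi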

    What I'm trying to get across - I have a feeling this 'updating the
    cluster-wide config file' and 'file must be the same on all nodes' is
    a lot less of a procedure (and a lot less strict) than you currently
    imagine it to be :)

    Tina

    On 27/04/2021 19:35, David Henkemeyer wrote:
     > Hello,
     >
     > I'm new to Slurm (coming from PBS), and so I will likely have a few
     > questions over the next several weeks, as I work to transition my
     > infrastructure from PBS to Slurm.
     >
     > My first question has to do with *adding nodes to Slurm*. According
     > to the FAQ (and other articles I've read), you need to basically
     > shut down slurm, update the slurm.conf file *on all nodes in the
     > cluster*, then restart slurm.
     >
     > - Why do all nodes need to know about all other nodes?  From what
     > I have read, Slurm does a checksum comparison of the slurm.conf
     > file across all nodes.  Is this the only reason all nodes need to
     > know about all other nodes?
     > - Can I create a symlink that points <sysconfdir>/slurm.conf to a
     > slurm.conf file on an NFS mount point, which is mounted on all the
     > nodes?  This way, I would only need to update a single file, then
     > restart Slurm across the entire cluster. (See the sketch after
     > this list.)
     > - Any additional help/resources for adding/removing nodes to Slurm
     > would be much appreciated.  Perhaps there is a "toolkit" out there
     > to automate some of these operations (which is what I already have
     > for PBS, and will create for Slurm, if something doesn't already
     > exist).
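     >
     > For the symlink idea above, I mean something like this on every
     > node (with /nfs/slurm as a stand-in for the real mount point):
     >
     >     ln -s /nfs/slurm/slurm.conf /etc/slurm/slurm.conf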
     >
     > Thank you all,
     >
     > David

    --
    Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
