[slurm-dev] Re: problem starting slurm on stateless node

Marcin Stolarek Wed, 12 Aug 2015 14:39:43 -0700

2015-08-12 19:46 GMT+02:00 Trevor Gale <tre...@snowhaven.com>:

> Thank you for your reply!
>
> I found that the error was being caused by the /var/log/* directories being
> excluded, as well as the hostname being changed on the node when I switched
> to Warewulf. I thought about using the file store to provision the
> slurm.conf, but I ended up adding it to my NFS exports and just mounting
> it. I am using a separate network for NFS/WW so my applications still have
> exclusive use of the IB.
>
> Thanks,
> Trevor
>
> On Aug 12, 2015, at 12:51 PM, James Armstrong <j.armstr...@rockfield.co.uk>
> wrote:
>
> Trevor,
>   I also have a warewulf provisioned cluster, and I have noticed that the
> default rule when creating a vnfs is to exclude all /var/log/* directories
> (/etc/warewulf/vnfs.conf) and I don't think the slurmd executable will
> create it if it doesn't exist. I have implemented various solutions from
> editing the init.d slurm script to create the log directory to using the
> wwsh file system to create it. The simplest solution is to just edit the
> slurm.conf to point the log file to somewhere else (an NFS mount) or not
> have it at all. I would recommend against having it write to somewhere on
> the VNFS as this will be a drain on your available RAM.
>
>   You also need to add /etc/slurm/slurm.conf to the wwsh file system to
> keep all your nodes updated.
>
> Hope this helps
>
> James.
>
> ------------------------------
> *From: *"Trevor Gale" <tre...@snowhaven.com>
> *To: *"slurm-dev" <slurm-dev@schedmd.com>
> *Sent: *Friday, 7 August, 2015 5:21:25 PM
> *Subject: *[slurm-dev] problem starting slurm on stateless node
>
>
> Hello all,
>
> I’m working on a small test cluster (2 nodes linked with eth and IB) and
> am trying to install slurm on them. I have installed Slurm numerous times
> on a normal system but I am having issues starting the slurm service on the
> compute node. I am using Warewulf to boot my nodes statelessly, so I
> installed munge and slurm in a chroot on my head node and then provisioned
> it to my compute node. When my compute node boots, the munge daemon is
> running, but when I try to start the slurm daemon I get no output. Also, if
> I query the status of the slurm daemon I get no output. However, if I run
> “slurmd -C” I see the expected output of all the resources on my node. My
> head node's ctl daemon is running but it cannot connect to the compute node's
> daemon. I also have the exact same users and slurm.conf on both nodes (they are
> mounted with NFS).
>
Can't you just start slurmd in the foreground and check why it's not starting?
A failure to start is probably what you are calling "no output".
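
A minimal sketch of that check, assuming slurm.conf sits at
/etc/slurm/slurm.conf (the path mentioned earlier in the thread) and uses the
conventional capitalisation for its *LogFile keys. The helper name
missing_log_dirs is hypothetical; it only lists log directories that are
absent on the node, which is the failure James describes. The slurmd options
in the trailing comments (-D for foreground, -v for verbosity) are the ones
documented in the slurmd man page.

```shell
# Hypothetical helper: print the parent directory of every Slurm*LogFile
# entry in the given slurm.conf whose directory does not exist on this node.
missing_log_dirs() {
    conf=$1
    # Extract the value of lines such as "SlurmdLogFile=/var/log/slurm/slurmd.log".
    # Assumes conventionally capitalised keys and one key per line.
    sed -n 's/^[[:space:]]*Slurm[a-z]*LogFile[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$conf" |
    while IFS= read -r logfile; do
        dir=$(dirname -- "$logfile")
        # Report only directories that are missing.
        [ -d "$dir" ] || printf '%s\n' "$dir"
    done
}

# Usage on the compute node (path assumed, adjust to your setup):
#   missing_log_dirs /etc/slurm/slurm.conf
#   slurmd -D -vvv    # stay in the foreground and print errors to the terminal
```

If the first command prints a directory, creating it (or pointing the log
file elsewhere in slurm.conf) before starting slurmd is worth trying first.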

cheers,
marcin




> My slurm.conf specifies to create log files in /var/log/slurm, but this
> folder was not created even though the slurm daemon appears to be running.
> I’m guessing there is some sort of issue with ownership of the slurm files
> that is causing this. When I installed munge I had to go through and fix
> the owners on a number of files and directories. Does anyone have any
> indication of what files might cause this?
>
> Thanks,
> Trevor
>
>
>
