2015-08-12 19:46 GMT+02:00 Trevor Gale <tre...@snowhaven.com>:

> Thank you for your reply!
>
> I found that the error was being caused by the /var/log/* directories being
> excluded, as well as the hostname being changed on the node when I switched
> to Warewulf. I thought about using the file store to provision the
> slurm.conf, but I ended up adding it to my NFS exports and just mounting
> it. I am using a separate network for NFS/WW so my applications still have
> exclusive use of the IB.
>
> Thanks,
> Trevor
>
> On Aug 12, 2015, at 12:51 PM, James Armstrong <j.armstr...@rockfield.co.uk>
> wrote:
>
> Trevor,
> I also have a Warewulf-provisioned cluster, and I have noticed that the
> default rule when creating a VNFS is to exclude all /var/log/* directories
> (/etc/warewulf/vnfs.conf), and I don't think the slurmd executable will
> create the log directory if it doesn't exist. I have implemented various
> solutions, from editing the init.d slurm script to create the log directory
> to using the wwsh file system to create it. The simplest solution is to just
> edit the slurm.conf to point the log file to somewhere else (an NFS mount)
> or not have it at all. I would recommend against having it write to
> somewhere on the VNFS, as this will be a drain on your available RAM.
>
> You also need to add /etc/slurm/slurm.conf to the wwsh file system to
> keep all your nodes updated.
>
> Hope this helps
>
> James.
>
> ------------------------------
> From: "Trevor Gale" <tre...@snowhaven.com>
> To: "slurm-dev" <slurm-dev@schedmd.com>
> Sent: Friday, 7 August, 2015 5:21:25 PM
> Subject: [slurm-dev] problem starting slurm on stateless node
>
> Hello all,
>
> I’m working on a small test cluster (2 nodes linked with eth and IB) and
> am trying to install slurm on them. I have installed Slurm numerous times
> on a normal system but I am having issues starting the slurm service on the
> compute node. I am using Warewulf to boot my nodes statelessly, so I
> installed munge and slurm in a chroot on my head node and then provisioned
> it to my compute node. When my compute node boots, the munge daemon is
> running, but when I try to start the slurm daemon I get no output. Also, if
> I query the status of the slurm daemon I get no output. However, if I run
> “slurmd -C” I see the expected output of all the resources on my node. My
> head node's ctl daemon is running but it cannot connect to the compute
> node's daemon.
> I also have the same users and slurm.conf on both nodes (they are
> mounted with NFS).
>

Can't you just start slurmd in the foreground and check why it's not
starting? That is probably what you are calling "no output".

cheers,
marcin

> My slurm.conf specifies to create log files in /var/log/slurm, but this
> folder was not created even though the slurm daemon appears to be running.
> I’m guessing there is some sort of issue with ownership of the slurm files
> that is causing this. When I installed munge I had to go through and fix
> the owners on a number of files and directories. Does anyone have any
> indication of what files might cause this?
>
> Thanks,
> Trevor
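
For anyone hitting the same problem, here is a rough sketch of one of the workarounds discussed in this thread: creating the slurmd log directory on a stateless node where /var/log/* was excluded from the VNFS. The function names are hypothetical, and the match on SlurmdLogFile is deliberately simple (case-sensitive, one setting per line), so treat it as a starting point rather than a complete solution.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Slurm or Warewulf.

# Print the directory that holds the slurmd log named in a slurm.conf file.
# Commented-out lines are ignored; if the key appears twice, the last wins.
slurmd_log_dir() {
    conf=$1
    logfile=$(sed -n 's/^[[:space:]]*SlurmdLogFile[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$conf" | tail -n 1)
    # Fail if the file does not set SlurmdLogFile at all.
    [ -n "$logfile" ] || return 1
    dirname "$logfile"
}

# Create that directory if it is missing (does nothing if it already exists).
ensure_slurmd_log_dir() {
    dir=$(slurmd_log_dir "$1") || return 1
    mkdir -p "$dir"
}
```

On a compute node this might be used as `ensure_slurmd_log_dir /etc/slurm/slurm.conf && slurmd -D -vvv`, using the slurm.conf path from James's message. The -D option keeps slurmd in the foreground and each -v raises verbosity, so a startup error shows up on the terminal, as Marcin suggests. Keep in mind James's caveat that logging into the VNFS consumes RAM; pointing SlurmdLogFile at an NFS mount avoids that.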