Based on the slurmd log excerpt, slurmd is failing because your slurm.conf sets SlurmdUser=slurm, but the daemon is being started as a different user. slurmd normally runs as root (it must launch jobs as other users), while slurmctld runs as the unprivileged slurm user. In your config file, set SlurmUser=slurm and comment out (or remove) the SlurmdUser=slurm line, then restart slurmd as root on every node.
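As a sketch, and assuming the default config path /etc/slurm/slurm.conf (adjust if your build installs it elsewhere), the relevant lines would look something like:

```ini
# /etc/slurm/slurm.conf (same copy on the head node and all compute nodes)

# slurmctld runs as the unprivileged slurm user:
SlurmUser=slurm

# slurmd needs root privileges to launch jobs as other users,
# so leave SlurmdUser at its default (root) by commenting this out:
#SlurmdUser=slurm
```

Remember that slurm.conf must be identical on all nodes, so propagate the change and restart slurmd everywhere after editing.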
Otherwise, for further troubleshooting, please attach your slurmctld (from the head node) and slurmdbd log files.

On Thu, Nov 5, 2015 at 12:08 AM, Dennis Mungai <[email protected]> wrote:

> Hello there,
>
> We recently deployed SLURM for a Bioinformatics cluster at KEMRI-Wellcome
> Trust, Kilifi, Kenya, and after following the setup guide and the online
> configurator (to build the configuration file), here are the errors we ran
> into:
>
> 1. None of the slurmd daemons on either node will start up.
> 2. Apparently, slurmdbd starts up correctly and allowed us to register
>    the cluster.
>
> Here's the debug information available at the moment:
>
> 1. An excerpt from the logs:
>
> less /var/log/slurm/slurmd.log | tail
>
> [2015-11-04T22:33:01.629] fatal: You are running slurmd as something other
> than user slurm(564). If you want to run as this user add SlurmdUser=root
> to the slurm.conf file.
> [2015-11-04T22:36:22.663] Node configuration differs from hardware:
> CPUs=64:64(hw) Boards=1:1(hw) SocketsPerBoard=64:4(hw)
> CoresPerSocket=1:8(hw) ThreadsPerCore=1:2(hw)
> [2015-11-04T22:36:22.663] Message aggregation disabled
> [2015-11-04T22:36:22.664] Resource spec: Reserved system memory limit not
> configured for this node
> [2015-11-04T23:00:17.659] Slurmd shutdown completing
> [2015-11-04T23:05:38.092] Node configuration differs from hardware:
> CPUs=64:64(hw) Boards=1:1(hw) SocketsPerBoard=64:4(hw)
> CoresPerSocket=1:8(hw) ThreadsPerCore=1:2(hw)
> [2015-11-04T23:05:38.098] Message aggregation disabled
> [2015-11-04T23:05:38.111] error: _cpu_freq_cpu_avail: Could not open
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
> [2015-11-04T23:05:38.113] Resource spec: Reserved system memory limit not
> configured for this node
> [2015-11-04T23:05:38.127] fatal: You are running slurmd as something other
> than user slurm(564). If you want to run as this user add SlurmdUser=root
> to the slurm.conf file.
> The same message appears on the other three nodes as well.
>
> scontrol ping returns:
>
> Slurmctld(primary/backup) at kenbo-cen05/(NULL) are UP/DOWN
>
> sinfo returns:
>
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> debug*       up       5:00      1  down* kenbo-cen05
> highmem      up   infinite      4  down* kenbo-cen[05-08]
> batch        up   infinite      4  down* kenbo-cen[05-08]
> longrun      up   infinite      4  down* kenbo-cen[05-08]
>
> My configuration file and the init.d scripts for both slurm and slurmdbd
> are attached below for your perusal.
>
> Your assistance will be highly appreciated.
>
> Regards,
>
> Dennis Mungai.
>
> ______________________________________________________________________
> This e-mail contains information which is confidential. It is intended
> only for the use of the named recipient. If you have received this e-mail
> in error, please let us know by replying to the sender, and immediately
> delete it from your system. Please note, that in these circumstances, the
> use, disclosure, distribution or copying of this information is strictly
> prohibited. KEMRI-Wellcome Trust Programme cannot accept any
> responsibility for the accuracy or completeness of this message as it has
> been transmitted over a public network. Although the Programme has taken
> reasonable precautions to ensure no viruses are present in emails, it
> cannot accept responsibility for any loss or damage arising from the use
> of the email or attachments. Any views expressed in this message are those
> of the individual sender, except where the sender specifically states them
> to be the views of KEMRI-Wellcome Trust Programme.
> ______________________________________________________________________

--
*James Oguya*
