Based on the excerpt from your slurmd logs, slurmd is failing because it is
being started as a user other than the one named in your configuration.
In slurm.conf, set SlurmUser=slurm and comment out the SlurmdUser=slurm
line — slurmd normally runs as root, which is SlurmdUser's default.
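
A minimal sketch of the relevant slurm.conf lines (the rest of your file
stays as it is):

```ini
# slurmctld and slurmdbd run as the unprivileged slurm user
SlurmUser=slurm
# slurmd needs root privileges on the compute nodes;
# SlurmdUser defaults to root, so comment this out (or remove it)
#SlurmdUser=slurm
```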

Otherwise, for further troubleshooting, please attach your slurmctld (from
the head node) and slurmdbd log files.
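
Separately, the excerpt also reports "Node configuration differs from
hardware": slurm.conf declares 64 sockets with 1 core and 1 thread each,
while the hardware reports 4 sockets x 8 cores x 2 threads. You can print
the values slurmd actually detects with `slurmd -C` and copy them into the
node definition. A sketch of a NodeName line consistent with the (hw)
counts in your log (hostnames taken from your sinfo output; add a
RealMemory value appropriate to your nodes):

```ini
# 4 sockets x 8 cores/socket x 2 threads/core = 64 logical CPUs,
# matching the (hw) values slurmd reported
NodeName=kenbo-cen[05-08] Sockets=4 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN
```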

On Thu, Nov 5, 2015 at 12:08 AM, Dennis Mungai <[email protected]>
wrote:

>
>
> Hello there,
>
>
>
> We recently deployed SLURM for a Bioinformatics cluster at KEMRI-Wellcome
> Trust, Kilifi, Kenya, and after following the setup guide and the online
> configurator (to build the configuration file), here are the errors we ran
> into:
>
>
>
> 1.       None of the slurmd daemons on either node will start up.
>
> 2.       Apparently, slurmdbd starts up correctly and allowed us to
> register the cluster.
>
> Here’s the debug information available at the moment:
>
> 1.       An excerpt from the logs:
>
>
>
> less /var/log/slurm/slurmd.log | tail
>
> [2015-11-04T22:33:01.629] fatal: You are running slurmd as something other
> than user slurm(564).  If you want to run as this user add SlurmdUser=root
> to the slurm.conf file.
>
> [2015-11-04T22:36:22.663] Node configuration differs from hardware:
> CPUs=64:64(hw) Boards=1:1(hw) SocketsPerBoard=64:4(hw)
> CoresPerSocket=1:8(hw) ThreadsPerCore=1:2(hw)
>
> [2015-11-04T22:36:22.663] Message aggregation disabled
>
> [2015-11-04T22:36:22.664] Resource spec: Reserved system memory limit not
> configured for this node
>
> [2015-11-04T23:00:17.659] Slurmd shutdown completing
>
> [2015-11-04T23:05:38.092] Node configuration differs from hardware:
> CPUs=64:64(hw) Boards=1:1(hw) SocketsPerBoard=64:4(hw)
> CoresPerSocket=1:8(hw) ThreadsPerCore=1:2(hw)
>
> [2015-11-04T23:05:38.098] Message aggregation disabled
>
> [2015-11-04T23:05:38.111] error: _cpu_freq_cpu_avail: Could not open
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
>
> [2015-11-04T23:05:38.113] Resource spec: Reserved system memory limit not
> configured for this node
>
> [2015-11-04T23:05:38.127] fatal: You are running slurmd as something other
> than user slurm(564).  If you want to run as this user add SlurmdUser=root
> to the slurm.conf file.
>
>
>
> The same message appears on the other three nodes as well.
>
>
>
> scontrol ping returns:
>
>
>
> Slurmctld(primary/backup) at kenbo-cen05/(NULL) are UP/DOWN
>
>
>
> Sinfo returns:
>
>
>
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
>
> debug*       up       5:00      1  down* kenbo-cen05
>
> highmem      up   infinite      4  down* kenbo-cen[05-08]
>
> batch        up   infinite      4  down* kenbo-cen[05-08]
>
> longrun      up   infinite      4  down* kenbo-cen[05-08]
>
>
>
> My configuration file and the init.d scripts for both slurm and slurmdbd
> are attached below for your perusal.
>
>
>
> Your assistance will be highly appreciated.
>
>
>
> Regards,
>
>
>
> Dennis Mungai.
>
>
>



-- 
*James Oguya*
