Re: [slurm-users] Problem with permisions. CentOS 7.8

Ole Holm Nielsen Tue, 02 Jun 2020 03:06:07 -0700

Hi Ferran,

Please install Slurm software in the standard way, see
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation

It seems that you have some unusual way to manage your Linux systems. In Stockholm and Sweden there are many Slurm experts at the HPC centers who might be able to help you more directly.


Best regards,
Ole

On 6/2/20 11:58 AM, Ferran Planas Padros wrote:

I did a fresh installation with the EPEL repo, installing munge from it, and it worked. Having the slurm user own munge was definitely a problem, but that is the setup we have on CentOS 6. Now I've learnt my lesson for future installations, thanks to everyone!

Now I have a follow-up question, if you don't mind. I am now trying to run slurm, and it crashes:



[root@roos21 ~]# systemctl status slurm.service
● slurm.service - LSB: slurm daemon management
Loaded: loaded (/etc/rc.d/init.d/slurm; bad; vendor preset: disabled)
Active: failed (Result: protocol) since Tue 2020-06-02 11:45:33 CEST; 3min 33s ago
Docs: man:systemd-sysv-generator(8)

Jun 02 11:45:33 roos21.organ.su.se systemd[1]: Starting LSB: slurm daemon management...
Jun 02 11:45:33 roos21.organ.su.se slurm[18223]: starting slurmd: [OK]
Jun 02 11:45:33 roos21.organ.su.se systemd[1]: Can't open PID file /var/run/slurmctld.pid (yet?) after start: No such file or directory
Jun 02 11:45:33 roos21.organ.su.se systemd[1]: Failed to start LSB: slurm daemon management.
Jun 02 11:45:33 roos21.organ.su.se systemd[1]: Unit slurm.service entered failed state.
Jun 02 11:45:33 roos21.organ.su.se systemd[1]: slurm.service failed.

The thing is that this is a computing node, not the master node, so slurmctld is not installed. Why do I get this error?
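
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the description "LSB: slurm daemon management" and the Docs line pointing to systemd-sysv-generator(8) suggest that systemd generated slurm.service from the SysV script /etc/rc.d/init.d/slurm. If the comment header of that script declares a slurmctld PID file, systemd would wait for a file that a compute node never writes. The sketch below lists the PID files a script declares; the function name list_declared_pidfiles is hypothetical, and the script path is the one shown in the Loaded: line.

```shell
#!/bin/sh
# Hypothetical helper: print every PID file declared in the comment header
# of a SysV init script (lines of the form "# pidfile: /path/to/file").
list_declared_pidfiles() {
    # $1: path to the init script, e.g. /etc/rc.d/init.d/slurm
    sed -n 's/^#[[:space:]]*pidfile:[[:space:]]*//p' "$1"
}

# Example (path taken from the systemctl output; verify it on your node):
#   list_declared_pidfiles /etc/rc.d/init.d/slurm
```

If /var/run/slurmctld.pid shows up in the output on a compute node, starting slurmd through a native slurmd.service unit instead of the generated slurm.service may avoid the failure. That is a suggestion to verify, not something established in this thread.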

Many thanks, and my apologies for these rather simple questions. I am a newbie at this.



Best,

Ferran

--------------------------------------------------------------------------

*From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Renata Maria Dart <ren...@slac.stanford.edu>

*Sent:* Friday, May 29, 2020 6:33:58 PM
*To:* ole.h.niel...@fysik.dtu.dk; Slurm User Community List
*Subject:* Re: [slurm-users] Problem with permisions. CentOS 7.8
Hi, don't know if this might be your problem but I ran into an issue
on centos 7.8 where /var/run/munge was not being created at boot time
because I didn't have the munge user in the local password file.  I
have the munge user in AD and once the system is up I can start munge
successfully, but AD wasn't available early enough during boot for the
munge startup to see it.  I added these lines to the munge systemctl
file:

PermissionsStartOnly=true
ExecStartPre=-/usr/bin/mkdir -m 0755 -p /var/run/munge
ExecStartPre=-/usr/bin/chown -R munge:munge /var/run/munge

and my system now starts munge up fine during a reboot.

Renata
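
A minimal sketch of one way to apply the three lines above without editing the packaged unit file: put them in a systemd drop-in. The function name write_munge_dropin, the file name override.conf, and the drop-in directory in the example are assumptions and not part of Renata's message.

```shell
#!/bin/sh
# Hypothetical helper: write the three lines above into a systemd drop-in
# file instead of editing the packaged munge.service directly.
write_munge_dropin() {
    # $1: drop-in directory (assumed: /etc/systemd/system/munge.service.d)
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
PermissionsStartOnly=true
ExecStartPre=-/usr/bin/mkdir -m 0755 -p /var/run/munge
ExecStartPre=-/usr/bin/chown -R munge:munge /var/run/munge
EOF
}

# Example, run as root:
#   write_munge_dropin /etc/systemd/system/munge.service.d
#   systemctl daemon-reload
#   systemctl restart munge
```

A drop-in survives package updates that replace /usr/lib/systemd/system/munge.service, which is the main reason to prefer it over editing that file in place.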

On Fri, 29 May 2020, Ole Holm Nielsen wrote:

Hi Ferran,

When you have a CentOS 7 system with the EPEL repo enabled, and you have
installed the munge RPM from EPEL, then things should be working correctly.

Since systemctl tells you that Munge service didn't start correctly, then it
seems to me that you have a problem in the general configuration of your CentOS
7 system.  You should check /var/log/messages and "journalctl -xe" for munge
errors.  It is really hard for other people to guess what may be wrong in your
system.
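
The checks suggested above can be sketched as a short shell helper. It assumes the munge directories sit at the usual packaged locations (only /var/run/munge and /var/log/munge are named in this thread; /etc/munge and /var/lib/munge are assumptions) and that the daemon is meant to run as user munge. The function name check_dir_owner is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a directory exists and is owned by the
# expected user. Prints one line per call and returns non-zero on a problem.
check_dir_owner() {
    # $1: directory path, $2: expected owner name
    if [ ! -d "$1" ]; then
        echo "MISSING $1"
        return 1
    fi
    owner=$(stat -c '%U' "$1")   # GNU stat, as shipped with CentOS
    mode=$(stat -c '%a' "$1")
    if [ "$owner" = "$2" ]; then
        echo "OK $1 owner=$owner mode=$mode"
    else
        echo "WRONG-OWNER $1 owner=$owner expected=$2 mode=$mode"
        return 1
    fi
}

# Example (directory locations are assumptions, adjust to your system):
#   for d in /etc/munge /var/lib/munge /var/log/munge /var/run/munge; do
#       check_dir_owner "$d" munge
#   done
#   journalctl -u munge --no-pager | tail -n 20
```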

My 2 cents worth: Maybe you could make a fresh CentOS 7.8 installation on a
test system and install the Munge service (and nothing else) according to
instructions in https://wiki.fysik.dtu.dk/niflheim/Slurm_installation.  This
*really* has got to work!

/Ole


On 29-05-2020 10:23, Ferran Planas Padros wrote:

Hello everyone,


Here is everything I've done.


- About Ole's answer:

Yes, we have slurm as the user to control munge. Following your comment, I
have changed the ownership of the munge files and tried to start munge as
munge user. However, it also failed.

Also, I first installed munge from a repository. I've seen your suggestion of
installing from EPEL, so I uninstalled it and installed it again. Same result.

- About SELinux: It is disabled.

- The output of ps -ef | grep munge is:


root      5340  5153  0 10:18 pts/0    00:00:00 grep --color=auto munge


- The output of munge -n is:


Failed to access "/var/run/munge/munge.socket.2": No such file or directory


- Same for unmunge


- Output for sudo systemctl status --full munge


● munge.service - MUNGE authentication service
Loaded: loaded (/usr/lib/systemd/system/munge.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2020-05-29 10:15:52 CEST; 4min 18s ago
Docs: man:munged(8)
Process: 5333 ExecStart=/usr/sbin/munged (code=exited, status=1/FAILURE)

May 29 10:15:52 roos21.organ.su.se systemd[1]: Starting MUNGE authentication service...
May 29 10:15:52 roos21.organ.su.se systemd[1]: munge.service: control process exited, code=exited status=1
May 29 10:15:52 roos21.organ.su.se systemd[1]: Failed to start MUNGE authentication service.
May 29 10:15:52 roos21.organ.su.se systemd[1]: Unit munge.service entered failed state.
May 29 10:15:52 roos21.organ.su.se systemd[1]: munge.service failed.


- Regarding NTP, I get this message:


Unable to talk to NTP daemon. Is it running?


It is the same message I get on the nodes that DO work. All nodes are
synchronized in time and date with the central node.


------------------------------------------------------------------------
*From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Ole
Holm Nielsen <ole.h.niel...@fysik.dtu.dk>
*Sent:* Friday, May 29, 2020 9:56:10 AM
*To:* slurm-users@lists.schedmd.com
*Subject:* Re: [slurm-users] Problem with permisions. CentOS 7.8
On 29-05-2020 08:46, Sudeep Narayan Banerjee wrote:

also check:
a) whether NTP has been setup and communicating with master node
b) iptables may be flushed (iptables -L)
c) SELinux set to disabled; to check:
getenforce
vim /etc/sysconfig/selinux
(change SELINUX=enforcing to SELINUX=disabled and save the file and reboot)


There is no reason to disable SELinux for running the Munge service.
It's a pretty bad idea to lower the security just for the sake of
convenience!

/Ole

On Fri, May 29, 2020 at 12:08 PM Sudeep Narayan Banerjee
<snbaner...@iitgn.ac.in <mailto:snbaner...@iitgn.ac.in>> wrote:

     I have not checked on CentOS 7.8.
     a) if the /var/run/munge folder does not exist, then please double check
     whether munge has been installed or not
     b) as root or a sudo user, run:
     ps -ef | grep munge
     kill -9 <PID> //where PID is the process ID of munge (if the
     process is running at all); else

     which munged
     /etc/init.d/munge start

     please let me know the output of:

     $ munge -n
     $ munge -n | unmunge
     $ sudo systemctl status --full munge

     Thanks & Regards,
     Sudeep Narayan Banerjee
     System Analyst | Scientist B
     Indian Institute of Technology Gandhinagar
     Gujarat, INDIA


     On Fri, May 29, 2020 at 11:55 AM Bjørn-Helge Mevik
     <b.h.me...@usit.uio.no <mailto:b.h.me...@usit.uio.no>> wrote:

         Ferran Planas Padros <ferran.pad...@su.se
         <mailto:ferran.pad...@su.se>> writes:

          > I run the command as slurm user, and the /var/log/munge
         folder does belong to slurm.

         For security reasons, I strongly advise that you run munged as a
         separate user, which is unprivileged and not used for anything else.

         --
         Regards,
         Bjørn-Helge Mevik, dr. scient,
         Department for Research Computing, University of Oslo
