Hi;

As the noki user, could you try to read the /var/run/slurm-llnl/slurmd.pid and /var/run/slurm-llnl/slurmctld.pid files? Are these files present, readable and writable? Maybe the parent directories do not have read/execute permission.
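
Something like the sketch below could be used for that check. The function names check_pid_file and show_parents are made up for this example; the paths are the ones from slurm.conf in this thread. It only reports what the current user can see, so it should be run as noki.

```shell
#!/bin/sh
# Hypothetical helper: report whether a path exists and whether the
# current user can read and write it.
check_pid_file() {
    f="$1"
    if [ ! -e "$f" ]; then
        echo "$f: missing"
        return 1
    fi
    r=no
    w=no
    [ -r "$f" ] && r=yes
    [ -w "$f" ] && w=yes
    echo "$f: readable=$r writable=$w"
}

# Hypothetical helper: list the permissions of a directory and of each
# parent directory up to (but not including) the root.
show_parents() {
    d="$1"
    while [ -n "$d" ] && [ "$d" != "/" ] && [ "$d" != "." ]; do
        ls -ld "$d" 2>/dev/null || echo "$d: missing"
        d=$(dirname "$d")
    done
}

check_pid_file /var/run/slurm-llnl/slurmd.pid || true
check_pid_file /var/run/slurm-llnl/slurmctld.pid || true
show_parents /var/run/slurm-llnl
```

If you are logged in as another user, saving this to a file (say check.sh, a name chosen only for this example) and running "sudo -u noki sh check.sh" should give the view of the noki user.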

Regards;

Ahmet M.


On 19.06.2019 07:26, Noki Lee wrote:
Hi, slurm-users and Ahmet

I already tried

chown -R noki:root /var/run/slurm-llnl

before I posted it.

When I first saw these messages, I applied the above command and restarted the daemons.
The same problem remained after that, so I posted it.
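
For reference, the ownership after the chown can be confirmed with a small check like the sketch below. The function name owner_is is hypothetical, and "stat -c" assumes GNU coreutils, as shipped with Ubuntu. The expected owner "noki" is taken from SlurmUser in the slurm.conf quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the path is owned by the given user.
# Assumes GNU coreutils stat (the -c option).
owner_is() {
    path="$1"
    user="$2"
    actual=$(stat -c '%U' "$path" 2>/dev/null) || {
        echo "$path: cannot stat"
        return 2
    }
    if [ "$actual" = "$user" ]; then
        echo "$path: owned by $user"
    else
        echo "$path: owned by $actual, expected $user"
        return 1
    fi
}

# Paths and user name as discussed in this thread.
owner_is /var/run/slurm-llnl noki || true
owner_is /var/run/slurm-llnl/slurmctld.pid noki || true
owner_is /var/run/slurm-llnl/slurmd.pid noki || true
```

Running it again after restarting the daemons would show whether the ownership changed when the PID files were rewritten.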

Regards,

Noki.

On Wed, Jun 19, 2019 at 12:24 PM mercan <ahmet.mer...@uhem.itu.edu.tr> wrote:

    Hi;

    Sorry, as you can see, I made a mistake again. I wrote two different
    directories:

    "The owner of the /var/run/slurm-llnl directory and the
    slurmctld.pid and slurmd.pid files should be "noki" user.

    chown -R noki:root /var/spool/slurm-llnl"

    You should run:

    chown -R noki:root /var/run/slurm-llnl

    Regards;

    Ahmet M.


    On 19.06.2019 05:55, Noki Lee wrote:
    > Hi, slurm-users and mercan.
    >
    > I tried what you said.
    > noki@noki-System-Product-Name:~$ sudo chown -R noki:root /var/spool/slurm-llnl/
    > noki@noki-System-Product-Name:/var/spool/slurm-llnl$ ls -l
    > total 92
    > -rw------- 1 noki root 198 Jun 19 11:36 assoc_mgr_state
    > -rw------- 1 noki root 198 Jun 18 20:31 assoc_mgr_state.old
    > -rw------- 1 noki root  10 Jun 19 11:36 assoc_usage
    > -rw------- 1 noki root  10 Jun 18 20:31 assoc_usage.old
    > -rw-r--r-- 1 noki root   5 Jun 11 21:15 clustername
    > -rw------- 1 noki root  15 Jun 19 11:36 fed_mgr_state
    > -rw------- 1 noki root  15 Jun 18 20:31 fed_mgr_state.old
    > -rw------- 1 noki root  35 Jun 19 11:36 job_state
    > -rw------- 1 noki root  35 Jun 18 20:31 job_state.old
    > -rw------- 1 noki root  38 Jun 19 11:36 last_config_lite
    > -rw------- 1 noki root  38 Jun 19  2019 last_config_lite.old
    > -rw------- 1 noki root 109 Jun 19 11:36 layouts_state_base
    > -rw------- 1 noki root 109 Jun 18 20:31 layouts_state_base.old
    > -rw------- 1 noki root 194 Jun 19 11:36 node_state
    > -rw------- 1 noki root 194 Jun 18 20:31 node_state.old
    > -rw------- 1 noki root 142 Jun 19 11:36 part_state
    > -rw------- 1 noki root 142 Jun 18 20:31 part_state.old
    > -rw------- 1 noki root  10 Jun 19 11:36 qos_usage
    > -rw------- 1 noki root  10 Jun 18 20:31 qos_usage.old
    > -rw------- 1 noki root  35 Jun 19 11:36 resv_state
    > -rw------- 1 noki root  35 Jun 18 20:31 resv_state.old
    > -rw------- 1 noki root  31 Jun 19 11:36 trigger_state
    > -rw------- 1 noki root  31 Jun 18 20:31 trigger_state.old
    > Whether or not I restart both slurmd and slurmctld, slurmctld is fine
    > but slurmd still shows the same issue.
    > Below are the owners and groups after restarting both slurmd and
    > slurmctld:
    > noki@noki-System-Product-Name:~$ sudo chown -R noki:root /var/spool/slurm-llnl/
    > noki@noki-System-Product-Name:/var/spool/slurm-llnl$ ls -l
    > total 92
    > -rw------- 1 noki noki 198 Jun 19 11:40 assoc_mgr_state
    > -rw------- 1 noki root 198 Jun 19 11:36 assoc_mgr_state.old
    > -rw------- 1 noki noki  10 Jun 19 11:40 assoc_usage
    > -rw------- 1 noki root  10 Jun 19 11:36 assoc_usage.old
    > -rw-r--r-- 1 noki root   5 Jun 11 21:15 clustername
    > -rw------- 1 noki noki  15 Jun 19 11:40 fed_mgr_state
    > -rw------- 1 noki root  15 Jun 19 11:36 fed_mgr_state.old
    > -rw------- 1 noki noki  35 Jun 19 11:40 job_state
    > -rw------- 1 noki root  35 Jun 19 11:36 job_state.old
    > -rw------- 1 noki noki  38 Jun 19 11:40 last_config_lite
    > -rw------- 1 noki root  38 Jun 19 11:36 last_config_lite.old
    > -rw------- 1 noki noki 109 Jun 19 11:40 layouts_state_base
    > -rw------- 1 noki root 109 Jun 19 11:36 layouts_state_base.old
    > -rw------- 1 noki noki 194 Jun 19 11:40 node_state
    > -rw------- 1 noki root 194 Jun 19 11:36 node_state.old
    > -rw------- 1 noki noki 142 Jun 19 11:40 part_state
    > -rw------- 1 noki root 142 Jun 19 11:36 part_state.old
    > -rw------- 1 noki noki  10 Jun 19 11:40 qos_usage
    > -rw------- 1 noki root  10 Jun 19 11:36 qos_usage.old
    > -rw------- 1 noki noki  35 Jun 19 11:40 resv_state
    > -rw------- 1 noki root  35 Jun 19 11:36 resv_state.old
    > -rw------- 1 noki noki  31 Jun 19 11:40 trigger_state
    > -rw------- 1 noki root  31 Jun 19 11:36 trigger_state.old
    > Do you think I need to change the permissions with chmod?
    >
    > Regards,
    >
    > On Tue, Jun 18, 2019 at 9:27 PM mercan <ahmet.mer...@uhem.itu.edu.tr> wrote:
    >
    >     Hi;
    >
    >     I did not notice
    >
    >     SlurmUser=noki
    >
    >     line. The owner of the /var/run/slurm-llnl directory and the
    >     slurmctld.pid and slurmd.pid files should be "noki" user.
    >
    >     chown -R noki:root /var/spool/slurm-llnl
    >
    >     Regards;
    >
    >     Ahmet M.
    >
    >
    >     On 18.06.2019 15:15, mercan wrote:
    >     > Hi;
    >     >
    >     > The owner of the /var/run/slurm-llnl directory and the slurmctld.pid
    >     > and slurmd.pid files should be the "slurm" user. Your files are owned
    >     > by root and noki.
    >     >
    >     > chown -R slurm:slurm /var/spool/slurm-llnl
    >     >
    >     >
    >     > Regards;
    >     >
    >     > Ahmet M.
    >     >
    >     >
    >     > On 18.06.2019 15:03, Noki Lee wrote:
    >     >>
    >     >> Although SLURM works fine for submitting, running, and queueing
    >     >> jobs, I get the minor error below.
    >     >>
    >     >> sudo systemctl status slurmd
    >     >>
    >     >> Jun 12 10:20:40 noki-System-Product-Name systemd[1]: slurmd.service:
    >     >> Can't open PID file /var/run/slurm-llnl/slurmd.pid (yet?) after
    >     >> start: No such file or directory
    >     >>
    >     >> sudo systemctl status slurmctld
    >     >>
    >     >> Jun 12 10:20:40 noki-System-Product-Name systemd[1]: slurmd.service:
    >     >> Can't open PID file /var/run/slurm-llnl/slurmd.pid (yet?) after
    >     >> start: No such file or directory
    >     >>
    >     >> I followed the installation guide at
    >     >>
    >     >> ftp://www.microway.com/pub/pub/for-customer/SDSU-Training/Webinar_2_Slurm_II--Ubuntu16.04_and_18.04.pdf
    >     >>
    >     >> Could this problem come from the ownership of the slurm.conf file?
    >     >>
    >     >> Here are my slurm.conf and the ownership of slur*.pid:
    >     >>
    >     >> # slurm.conf file generated by configurator easy.html.
    >     >> # Put this file on all nodes of your cluster.
    >     >> # See the slurm.conf man page for more information.
    >     >> #
    >     >> ControlMachine=noki-System-Product-Name
    >     >> #ControlAddr=
    >     >> #
    >     >> #MailProg=/bin/mail
    >     >> MpiDefault=none
    >     >> #MpiParams=ports=#-#
    >     >> ProctrackType=proctrack/pgid
    >     >> ReturnToService=1
    >     >> SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
    >     >> #SlurmctldPort=6817
    >     >> SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
    >     >> #SlurmdPort=6818
    >     >> SlurmdSpoolDir=/var/spool/slurmd
    >     >> SlurmUser=noki
    >     >> #SlurmdUser=root
    >     >> StateSaveLocation=/var/spool/slurm-llnl
    >     >> SwitchType=switch/none
    >     >> TaskPlugin=task/none
    >     >> #
    >     >> #
    >     >> # TIMERS
    >     >> #KillWait=30
    >     >> #MinJobAge=300
    >     >> #SlurmctldTimeout=120
    >     >> #SlurmdTimeout=300
    >     >> #
    >     >> #
    >     >> # SCHEDULING
    >     >> FastSchedule=1
    >     >> SchedulerType=sched/backfill
    >     >> SelectType=select/linear
    >     >> #SelectTypeParameters=
    >     >> #
    >     >> #
    >     >> # LOGGING AND ACCOUNTING
    >     >> AccountingStorageType=accounting_storage/none
    >     >> ClusterName=linux
    >     >> #JobAcctGatherFrequency=30
    >     >> JobAcctGatherType=jobacct_gather/none
    >     >> #SlurmctldDebug=3
    >     >> SlurmctldLogFile=/var/log/slurm-llnl/SlurmctldLogFile
    >     >> #SlurmdDebug=3
    >     >> SlurmdLogFile=/var/log/slurm-llnl/SlurmdLogFile
    >     >> #
    >     >> #
    >     >> # COMPUTE NODES
    >     >> NodeName=noki-System-Product-Name CPUs=4 RealMemory=6963 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
    >     >> PartitionName=debug Nodes=noki-System-Product-Name Default=YES MaxTime=INFINITE State=UP
    >     >>
    >     >> $ ls -l /var/run/slurm-llnl/
    >     >> total 8
    >     >> -rw-r--r-- 1 noki root 6 Jun 12 10:20 slurmctld.pid
    >     >> -rw-r--r-- 1 root root 6 Jun 12 10:20 slurmd.pid
    >     >>
    >     >
    >

