Dear Nousheen,

I guess there is something missing in your installation - probably your slurm.conf?

Do you have logging enabled for slurmctld? If so, what do you see in that log?
Or what do you get if you run slurmctld manually in the foreground, like this:

/usr/local/sbin/slurmctld -D
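
For the log question above, a minimal sketch of one way to look at it. It assumes the log path /var/log/slurmctld.log (the SlurmctldLogFile value in the slurm.conf quoted further down) and the unit name slurmctld.service; the helper name show_ctld_log is hypothetical and not part of Slurm:

```shell
# show_ctld_log [LOGFILE] [LINES]: print the tail of the slurmctld log if it
# is readable; otherwise point at the systemd journal as a fallback.
# The default path is an assumption taken from SlurmctldLogFile in the
# posted slurm.conf.
show_ctld_log() {
    logfile="${1:-/var/log/slurmctld.log}"
    lines="${2:-50}"
    if [ -r "$logfile" ]; then
        tail -n "$lines" "$logfile"
    else
        echo "cannot read $logfile; try: journalctl -u slurmctld.service -n $lines --no-pager" >&2
        return 1
    fi
}
```

If the log file does not exist at all, that is a hint in itself: the daemon may be exiting before it can open the file, in which case the journal or the foreground run is the place to look.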

Regards,
Hermann

On 1/31/22 6:08 AM, Nousheen wrote:
Dear Jeffrey,

Thank you for your response. I have followed the steps as instructed. After copying the files to their respective locations, the "systemctl status slurmctld.service" command gives me the following error:

(base) [nousheen@exxact system]$ systemctl daemon-reload
(base) [nousheen@exxact system]$ systemctl enable slurmctld.service
(base) [nousheen@exxact system]$ systemctl start slurmctld.service
(base) [nousheen@exxact system]$ systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2022-01-31 10:04:31 PKT; 3s ago
  Process: 18114 ExecStart=/usr/local/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
  Main PID: 18114 (code=exited, status=1/FAILURE)

Jan 31 10:04:31 exxact systemd[1]: Started Slurm controller daemon.
Jan 31 10:04:31 exxact systemd[1]: slurmctld.service: main process exited, code=exited, status=1/FAILURE
Jan 31 10:04:31 exxact systemd[1]: Unit slurmctld.service entered failed state.
Jan 31 10:04:31 exxact systemd[1]: slurmctld.service failed.

Kindly guide me. Thank you so much for your time.

Best Regards,
Nousheen Parvaiz

On Thu, Jan 27, 2022 at 8:25 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:

    The missing file error has nothing to do with Slurm.  The systemctl
    command is part of the system's service management.

    The error message indicates that you haven’t copied the
    slurmd.service file on your compute node to /etc/systemd/system or
    /usr/lib/systemd/system.  /etc/systemd/system is usually used when a
    user adds a new service to a machine.

    Depending on your version of Linux you may also need to run
    systemctl daemon-reload to activate slurmd.service within
    systemd.

    Once slurmd.service is copied over, the systemctl command should
    work just fine.

    Remember:

        slurmd.service    - Only on compute nodes
        slurmctld.service - Only on your cluster management node
        slurmdbd.service  - Only on your cluster management node
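
The copy step described above can be sketched roughly as follows. This is only an illustration: the helper name install_unit is hypothetical, and the source path in the usage comment is an assumption about where the build left the generated unit file, so adjust it to your own tree.

```shell
# install_unit SRC [DEST_DIR]: copy a systemd unit file into a unit directory.
# DEST_DIR defaults to /etc/systemd/system, the location named above for
# services added by a user.
install_unit() {
    src="$1"
    dest_dir="${2:-/etc/systemd/system}"
    if [ ! -f "$src" ]; then
        echo "unit file not found: $src" >&2
        return 1
    fi
    cp "$src" "$dest_dir/"
}

# Hypothetical usage on a compute node, run as root (the source path is an
# assumption, not taken from the thread):
#   install_unit etc/slurmd.service
#   systemctl daemon-reload
#   systemctl enable slurmd.service
```

The same helper would apply to slurmctld.service on the management node; only the file name changes.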

    *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> *On Behalf Of* Nousheen
    *Sent:* Thursday, January 27, 2022 3:54 AM
    *To:* Slurm User Community List <slurm-users@lists.schedmd.com>
    *Subject:* [slurm-users] systemctl enable slurmd.service Failed to
    execute operation: No such file or directory

    Hello everyone,

    I am installing Slurm on CentOS 7 following this tutorial:
    https://www.slothparadise.com/how-to-install-slurm-on-centos-7-cluster/

    I am at the step where we start Slurm, but it gives me the following
    error:

    [root@exxact slurm-21.08.5]# systemctl enable slurmd.service
    Failed to execute operation: No such file or directory

    I have run the command to check if Slurm is configured properly:

    [root@exxact slurm-21.08.5]# slurmd -C
    NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=6
    ThreadsPerCore=2 RealMemory=31889
    UpTime=19-16:06:00

    I am new to this and unable to understand the problem. Kindly help
    me resolve this.

    My slurm.conf file is as follows:

    # slurm.conf file generated by configurator.html.
    # Put this file on all nodes of your cluster.
    # See the slurm.conf man page for more information.
    #
    ClusterName=cluster194
    SlurmctldHost=192.168.60.194
    #SlurmctldHost=
    #
    #DisableRootJobs=NO
    #EnforcePartLimits=NO
    #Epilog=
    #EpilogSlurmctld=
    #FirstJobId=1
    #MaxJobId=67043328
    #GresTypes=
    #GroupUpdateForce=0
    #GroupUpdateTime=600
    #JobFileAppend=0
    #JobRequeue=1
    #JobSubmitPlugins=lua
    #KillOnBadExit=0
    #LaunchType=launch/slurm
    #Licenses=foo*4,bar
    #MailProg=/bin/mail
    #MaxJobCount=10000
    #MaxStepCount=40000
    #MaxTasksPerNode=512
    MpiDefault=none
    #MpiParams=ports=#-#
    #PluginDir=
    #PlugStackConfig=
    #PrivateData=jobs
    ProctrackType=proctrack/cgroup
    #Prolog=
    #PrologFlags=
    #PrologSlurmctld=
    #PropagatePrioProcess=0
    #PropagateResourceLimits=
    #PropagateResourceLimitsExcept=
    #RebootProgram=
    ReturnToService=1
    SlurmctldPidFile=/var/run/slurmctld.pid
    SlurmctldPort=6817
    SlurmdPidFile=/var/run/slurmd.pid
    SlurmdPort=6818
    SlurmdSpoolDir=/var/spool/slurmd
    SlurmUser=nousheen
    #SlurmdUser=root
    #SrunEpilog=
    #SrunProlog=
    StateSaveLocation=/home/nousheen/Documents/SILICS/slurm-21.08.5/slurmctld
    SwitchType=switch/none
    #TaskEpilog=
    TaskPlugin=task/affinity
    #TaskProlog=
    #TopologyPlugin=topology/tree
    #TmpFS=/tmp
    #TrackWCKey=no
    #TreeWidth=
    #UnkillableStepProgram=
    #UsePAM=0
    #
    #
    # TIMERS
    #BatchStartTimeout=10
    #CompleteWait=0
    #EpilogMsgTime=2000
    #GetEnvTimeout=2
    #HealthCheckInterval=0
    #HealthCheckProgram=
    InactiveLimit=0
    KillWait=30
    #MessageTimeout=10
    #ResvOverRun=0
    MinJobAge=300
    #OverTimeLimit=0
    SlurmctldTimeout=120
    SlurmdTimeout=300
    #UnkillableStepTimeout=60
    #VSizeFactor=0
    Waittime=0
    #
    #
    # SCHEDULING
    #DefMemPerCPU=0
    #MaxMemPerCPU=0
    #SchedulerTimeSlice=30
    SchedulerType=sched/backfill
    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core
    #
    #
    # JOB PRIORITY
    #PriorityFlags=
    #PriorityType=priority/basic
    #PriorityDecayHalfLife=
    #PriorityCalcPeriod=
    #PriorityFavorSmall=
    #PriorityMaxAge=
    #PriorityUsageResetPeriod=
    #PriorityWeightAge=
    #PriorityWeightFairshare=
    #PriorityWeightJobSize=
    #PriorityWeightPartition=
    #PriorityWeightQOS=
    #
    #
    # LOGGING AND ACCOUNTING
    #AccountingStorageEnforce=0
    #AccountingStorageHost=
    #AccountingStoragePass=
    #AccountingStoragePort=
    AccountingStorageType=accounting_storage/none
    #AccountingStorageUser=
    #AccountingStoreFlags=
    #JobCompHost=
    #JobCompLoc=
    #JobCompPass=
    #JobCompPort=
    JobCompType=jobcomp/none
    #JobCompUser=
    #JobContainerType=job_container/none
    JobAcctGatherFrequency=30
    JobAcctGatherType=jobacct_gather/none
    SlurmctldDebug=info
    SlurmctldLogFile=/var/log/slurmctld.log
    SlurmdDebug=info
    SlurmdLogFile=/var/log/slurmd.log
    #SlurmSchedLogFile=
    #SlurmSchedLogLevel=
    #DebugFlags=
    #
    #
    # POWER SAVE SUPPORT FOR IDLE NODES (optional)
    #SuspendProgram=
    #ResumeProgram=
    #SuspendTimeout=
    #ResumeTimeout=
    #ResumeRate=
    #SuspendExcNodes=
    #SuspendExcParts=
    #SuspendRate=
    #SuspendTime=
    #
    #
    # COMPUTE NODES
    NodeName=linux[1-32] CPUs=11 State=UNKNOWN
    PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP

    Best Regards,
    Nousheen Parvaiz

