Dear Ole and Hermann,

I have now reinstalled Slurm from scratch, following this link:

The error remains the same. Kindly guide me: where will I find this cred/munge plugin? Please help me resolve this issue.

[root@exxact slurm]# slurmd -C
NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=31889
UpTime=0-22:06:45

[root@exxact slurm]# systemctl enable slurmctld.service
[root@exxact slurm]# systemctl start slurmctld.service
[root@exxact slurm]# systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2022-02-01 09:46:20 PKT; 8s ago
  Process: 27530 ExecStart=/usr/local/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 27530 (code=exited, status=1/FAILURE)

Feb 01 09:46:20 exxact systemd[1]: Started Slurm controller daemon.
Feb 01 09:46:20 exxact systemd[1]: slurmctld.service: main process exited, ...RE
Feb 01 09:46:20 exxact systemd[1]: Unit slurmctld.service entered failed state.
Feb 01 09:46:20 exxact systemd[1]: slurmctld.service failed.

[root@exxact slurm]# /usr/local/sbin/slurmctld -D
slurmctld: slurmctld version 21.08.5 started on cluster cluster194
slurmctld: error: Couldn't find the specified plugin name for cred/munge looking at all files
slurmctld: error: cannot find cred plugin for cred/munge
slurmctld: error: cannot create cred context for cred/munge
slurmctld: fatal: slurm_cred_creator_ctx_create((null)): Operation not permitted

Best Regards,
Nousheen Parvaiz
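The "cannot find cred plugin for cred/munge" errors above usually mean that slurmctld cannot locate cred_munge.so in its plugin directory, which is what happens when Slurm is configured and compiled on a host that lacks the MUNGE development headers. A minimal check-and-rebuild sketch, assuming a source build with the default /usr/local prefix (the plugin directory may be lib64/slurm on some systems, and the package names assume CentOS 7 with EPEL enabled):

    # check whether the MUNGE credential plugin was built and installed
    ls /usr/local/lib/slurm/ | grep munge
    # a working build should list cred_munge.so (and auth_munge.so)

    # if cred_munge.so is missing, install the MUNGE headers, then rebuild and reinstall Slurm
    yum install -y munge munge-libs munge-devel
    cd slurm-21.08.5
    ./configure --prefix=/usr/local    # check the configure output to confirm MUNGE was found
    make -j"$(nproc)" && make install

    # MUNGE itself must be running on every node before slurmctld/slurmd will start
    systemctl enable --now munge

With the plugin in place and munged running, rerunning /usr/local/sbin/slurmctld -D as above should get past the cred/munge errors, though other configuration issues may still surface.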
On Tue, Feb 1, 2022 at 9:06 AM Nousheen <nousheenparv...@gmail.com> wrote:

> Dear Ole,
>
> Thank you for your response.
> I am doing it again using your suggested link.
>
> Best Regards,
> Nousheen Parvaiz
>
> On Mon, Jan 31, 2022 at 2:07 PM Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:
>
>> Hi Nousheen,
>>
>> I recommend you again to follow the steps for installing Slurm on a CentOS 7 cluster:
>> https://wiki.fysik.dtu.dk/niflheim/Slurm_installation
>>
>> Maybe you will need to start installation from scratch, but the steps are guaranteed to work if followed correctly.
>>
>> IHTH,
>> Ole
>>
>> On 1/31/22 06:23, Nousheen wrote:
>> > The same error shows up on the compute node, as follows:
>> >
>> > [root@c103008 ~]# systemctl enable slurmd.service
>> > [root@c103008 ~]# systemctl start slurmd.service
>> > [root@c103008 ~]# systemctl status slurmd.service
>> > ● slurmd.service - Slurm node daemon
>> >    Loaded: loaded (/etc/systemd/system/slurmd.service; enabled; vendor preset: disabled)
>> >    Active: failed (Result: exit-code) since Mon 2022-01-31 00:22:42 EST; 2s ago
>> >   Process: 11505 ExecStart=/usr/local/sbin/slurmd -D -s $SLURMD_OPTIONS (code=exited, status=203/EXEC)
>> >  Main PID: 11505 (code=exited, status=203/EXEC)
>> >
>> > Jan 31 00:22:42 c103008 systemd[1]: Started Slurm node daemon.
>> > Jan 31 00:22:42 c103008 systemd[1]: slurmd.service: main process exited, code=exited, status=203/EXEC
>> > Jan 31 00:22:42 c103008 systemd[1]: Unit slurmd.service entered failed state.
>> > Jan 31 00:22:42 c103008 systemd[1]: slurmd.service failed.
>> >
>> > Best Regards,
>> > Nousheen Parvaiz
>> >
>> > On Mon, Jan 31, 2022 at 10:08 AM Nousheen <nousheenparv...@gmail.com> wrote:
>> >
>> > Dear Jeffrey,
>> >
>> > Thank you for your response. I have followed the steps as instructed.
>> > After copying the files to their respective locations, the "systemctl status slurmctld.service" command gives me an error as follows:
>> >
>> > (base) [nousheen@exxact system]$ systemctl daemon-reload
>> > (base) [nousheen@exxact system]$ systemctl enable slurmctld.service
>> > (base) [nousheen@exxact system]$ systemctl start slurmctld.service
>> > (base) [nousheen@exxact system]$ systemctl status slurmctld.service
>> > ● slurmctld.service - Slurm controller daemon
>> >    Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
>> >    Active: failed (Result: exit-code) since Mon 2022-01-31 10:04:31 PKT; 3s ago
>> >   Process: 18114 ExecStart=/usr/local/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
>> >  Main PID: 18114 (code=exited, status=1/FAILURE)
>> >
>> > Jan 31 10:04:31 exxact systemd[1]: Started Slurm controller daemon.
>> > Jan 31 10:04:31 exxact systemd[1]: slurmctld.service: main process exited, code=exited, status=1/FAILURE
>> > Jan 31 10:04:31 exxact systemd[1]: Unit slurmctld.service entered failed state.
>> > Jan 31 10:04:31 exxact systemd[1]: slurmctld.service failed.
>> >
>> > Kindly guide me. Thank you so much for your time.
>> >
>> > Best Regards,
>> > Nousheen Parvaiz
>> >
>> > On Thu, Jan 27, 2022 at 8:25 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:
>> >
>> > The missing file error has nothing to do with Slurm. The systemctl command is part of the system's service management.
>> >
>> > The error message indicates that you haven't copied the slurmd.service file on your compute node to /etc/systemd/system or /usr/lib/systemd/system. /etc/systemd/system is usually used when a user adds a new service to a machine.
>> >
>> > Depending on your version of Linux you may also need to do a systemctl daemon-reload to activate the slurmd.service within systemd.
>> >
>> > Once slurmd.service is copied over, the systemctl command should work just fine.
>> >
>> > Remember:
>> > slurmd.service - Only on compute nodes
>> > slurmctld.service – Only on your cluster management node
>> > slurmdbd.service – Only on your cluster management node
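A sketch of the unit-file installation step described above, assuming the daemons were built from the slurm-21.08.5 source tree (a source build generates the .service files under etc/ in the build directory; the exact paths here are assumptions):

    # management node: install the controller unit file
    cp slurm-21.08.5/etc/slurmctld.service /etc/systemd/system/
    # each compute node: install the node-daemon unit file
    cp slurm-21.08.5/etc/slurmd.service /etc/systemd/system/

    systemctl daemon-reload             # let systemd pick up the new unit files
    systemctl enable --now slurmctld    # management node only
    systemctl enable --now slurmd       # compute nodes only

Note also that the status=203/EXEC failure quoted earlier in the thread typically means systemd could not execute /usr/local/sbin/slurmd on that compute node at all, i.e. the slurmd binary itself is missing or not installed there, not just the unit file.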
>> > From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Nousheen
>> > Sent: Thursday, January 27, 2022 3:54 AM
>> > To: Slurm User Community List <slurm-users@lists.schedmd.com>
>> > Subject: [slurm-users] systemctl enable slurmd.service Failed to execute operation: No such file or directory
>> >
>> > Hello everyone,
>> >
>> > I am installing slurm on Centos 7 following tutorial:
>> > https://www.slothparadise.com/how-to-install-slurm-on-centos-7-cluster/
>> >
>> > I am at the step where we start slurm but it gives me the following error:
>> >
>> > [root@exxact slurm-21.08.5]# systemctl enable slurmd.service
>> > Failed to execute operation: No such file or directory
>> >
>> > I have run the command to check if slurm is configured properly:
>> >
>> > [root@exxact slurm-21.08.5]# slurmd -C
>> > NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=31889
>> > UpTime=19-16:06:00
>> >
>> > I am new to this and unable to understand the problem. Kindly help me resolve this.
>> >
>> > My slurm.conf file is as follows:
>> >
>> > # slurm.conf file generated by configurator.html.
>> > # Put this file on all nodes of your cluster.
>> > # See the slurm.conf man page for more information.
>> > #
>> > ClusterName=cluster194
>> > SlurmctldHost=192.168.60.194
>> > #SlurmctldHost=
>> > #
>> > #DisableRootJobs=NO
>> > #EnforcePartLimits=NO
>> > #Epilog=
>> > #EpilogSlurmctld=
>> > #FirstJobId=1
>> > #MaxJobId=67043328
>> > #GresTypes=
>> > #GroupUpdateForce=0
>> > #GroupUpdateTime=600
>> > #JobFileAppend=0
>> > #JobRequeue=1
>> > #JobSubmitPlugins=lua
>> > #KillOnBadExit=0
>> > #LaunchType=launch/slurm
>> > #Licenses=foo*4,bar
>> > #MailProg=/bin/mail
>> > #MaxJobCount=10000
>> > #MaxStepCount=40000
>> > #MaxTasksPerNode=512
>> > MpiDefault=none
>> > #MpiParams=ports=#-#
>> > #PluginDir=
>> > #PlugStackConfig=
>> > #PrivateData=jobs
>> > ProctrackType=proctrack/cgroup
>> > #Prolog=
>> > #PrologFlags=
>> > #PrologSlurmctld=
>> > #PropagatePrioProcess=0
>> > #PropagateResourceLimits=
>> > #PropagateResourceLimitsExcept=
>> > #RebootProgram=
>> > ReturnToService=1
>> > SlurmctldPidFile=/var/run/slurmctld.pid
>> > SlurmctldPort=6817
>> > SlurmdPidFile=/var/run/slurmd.pid
>> > SlurmdPort=6818
>> > SlurmdSpoolDir=/var/spool/slurmd
>> > SlurmUser=nousheen
>> > #SlurmdUser=root
>> > #SrunEpilog=
>> > #SrunProlog=
>> > StateSaveLocation=/home/nousheen/Documents/SILICS/slurm-21.08.5/slurmctld
>> > SwitchType=switch/none
>> > #TaskEpilog=
>> > TaskPlugin=task/affinity
>> > #TaskProlog=
>> > #TopologyPlugin=topology/tree
>> > #TmpFS=/tmp
>> > #TrackWCKey=no
>> > #TreeWidth=
>> > #UnkillableStepProgram=
>> > #UsePAM=0
>> > #
>> > #
>> > # TIMERS
>> > #BatchStartTimeout=10
>> > #CompleteWait=0
>> > #EpilogMsgTime=2000
>> > #GetEnvTimeout=2
>> > #HealthCheckInterval=0
>> > #HealthCheckProgram=
>> > InactiveLimit=0
>> > KillWait=30
>> > #MessageTimeout=10
>> > #ResvOverRun=0
>> > MinJobAge=300
>> > #OverTimeLimit=0
>> > SlurmctldTimeout=120
>> > SlurmdTimeout=300
>> > #UnkillableStepTimeout=60
>> > #VSizeFactor=0
>> > Waittime=0
>> > #
>> > #
>> > # SCHEDULING
>> > #DefMemPerCPU=0
>> > #MaxMemPerCPU=0
>> > #SchedulerTimeSlice=30
>> > SchedulerType=sched/backfill
>> > SelectType=select/cons_tres
>> > SelectTypeParameters=CR_Core
>> > #
>> > #
>> > # JOB PRIORITY
>> > #PriorityFlags=
>> > #PriorityType=priority/basic
>> > #PriorityDecayHalfLife=
>> > #PriorityCalcPeriod=
>> > #PriorityFavorSmall=
>> > #PriorityMaxAge=
>> > #PriorityUsageResetPeriod=
>> > #PriorityWeightAge=
>> > #PriorityWeightFairshare=
>> > #PriorityWeightJobSize=
>> > #PriorityWeightPartition=
>> > #PriorityWeightQOS=
>> > #
>> > #
>> > # LOGGING AND ACCOUNTING
>> > #AccountingStorageEnforce=0
>> > #AccountingStorageHost=
>> > #AccountingStoragePass=
>> > #AccountingStoragePort=
>> > AccountingStorageType=accounting_storage/none
>> > #AccountingStorageUser=
>> > #AccountingStoreFlags=
>> > #JobCompHost=
>> > #JobCompLoc=
>> > #JobCompPass=
>> > #JobCompPort=
>> > JobCompType=jobcomp/none
>> > #JobCompUser=
>> > #JobContainerType=job_container/none
>> > JobAcctGatherFrequency=30
>> > JobAcctGatherType=jobacct_gather/none
>> > SlurmctldDebug=info
>> > SlurmctldLogFile=/var/log/slurmctld.log
>> > SlurmdDebug=info
>> > SlurmdLogFile=/var/log/slurmd.log
>> > #SlurmSchedLogFile=
>> > #SlurmSchedLogLevel=
>> > #DebugFlags=
>> > #
>> > #
>> > # POWER SAVE SUPPORT FOR IDLE NODES (optional)
>> > #SuspendProgram=
>> > #ResumeProgram=
>> > #SuspendTimeout=
>> > #ResumeTimeout=
>> > #ResumeRate=
>> > #SuspendExcNodes=
>> > #SuspendExcParts=
>> > #SuspendRate=
>> > #SuspendTime=
>> > #
>> > #
>> > # COMPUTE NODES
>> > NodeName=linux[1-32] CPUs=11 State=UNKNOWN
>> > PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>> >
>> > Best Regards,
>> > Nousheen Parvaiz
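As a closing illustration grounded only in output already quoted in this thread: slurmd -C reports NodeName=exxact with CPUs=12, while the compute-node section above declares NodeName=linux[1-32] with CPUs=11. A node entry consistent with the slurmd -C output would look roughly like this (a sketch, not a change proposed anywhere in the thread):

    NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=31889 State=UNKNOWN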