>Your running job is requesting 6 CPUs per node (4 nodes, 6 CPUs per node). That means 6 CPUs are being used on node hpc.
>Your queued job is requesting 5 CPUs per node (4 nodes, 5 CPUs per node). In total, if it was running, that would require 11 CPUs on node hpc. But hpc only has 10 cores, so it can't run.
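(For the record, the per-node numbers can be checked with something along these lines; the squeue format string is just one way to pull the CPU count per job:

   $ scontrol show node hpc | grep -Eo 'CPUAlloc=[0-9]+|CPUTot=[0-9]+'
   $ squeue -w hpc -o '%.8i %.10T %.5C %R'    # jobs allocated on hpc; %C = CPUs per job

so 6 CPUs already allocated plus 5 requested is indeed 11 on a 10-CPU node.)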
Right... I changed that, but the job is still in the pending state. I modified /etc/slurm/slurm.conf as below:

# grep hpc /etc/slurm/slurm.conf
NodeName=hpc NodeAddr=10.1.1.1 CPUs=11

# for i in {0..2}; do scontrol show node compute-0-$i | grep RealMemory; done && scontrol show node hpc | grep RealMemory
   RealMemory=64259 AllocMem=1024 FreeMem=57116 Sockets=32 Boards=1
   RealMemory=120705 AllocMem=1024 FreeMem=66403 Sockets=32 Boards=1
   RealMemory=64259 AllocMem=1024 FreeMem=39966 Sockets=32 Boards=1
   RealMemory=64259 AllocMem=1024 FreeMem=49189 Sockets=11 Boards=1

# for i in {0..2}; do scontrol show node compute-0-$i | grep CPUTot; done && scontrol show node hpc | grep CPUTot
   CPUAlloc=6 CPUTot=32 CPULoad=5.18
   CPUAlloc=6 CPUTot=32 CPULoad=18.94
   CPUAlloc=6 CPUTot=32 CPULoad=5.41
   CPUAlloc=6 CPUTot=11 CPULoad=5.21

But the job is still pending:

$ scontrol show -d job 129
JobId=129 JobName=qe-fb
   UserId=mahmood(1000) GroupId=mahmood(1000) MCS_label=N/A
   Priority=1751 Nice=0 Account=fish QOS=normal WCKey=*default
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:00 TimeLimit=30-00:00:00 TimeMin=N/A
   SubmitTime=2019-12-17T15:00:37 EligibleTime=2019-12-17T15:00:37
   AccrueTime=2019-12-17T15:00:37
   StartTime=Unknown EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2019-12-17T15:00:38
   Partition=SEA AllocNode:Sid=hpc.scu.ac.ir:14534
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=4-4 NumCPUs=20 NumTasks=20 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=20,mem=40G,node=4,billing=20
   Socks/Node=* NtasksPerN:B:S:C=5:0:*:* CoreSpec=*
   MinCPUsNode=5 MinMemoryNode=10G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/mahmood/qe/f_borophene/slurm_qe.sh
   WorkDir=/home/mahmood/qe/f_borophene
   StdErr=/home/mahmood/qe/f_borophene/my_fb.log
   StdIn=/dev/null
   StdOut=/home/mahmood/qe/f_borophene/my_fb.log
   Power=

>I'm not aware of any nodes, that have 32, or even 10 sockets. Are you sure, you want to use the cluster like that?

Marcus, I have installed Slurm via the Slurm roll on Rocks. All 4 nodes are dual-socket Opteron 6282 machines with the following specs:

   Thread(s) per core:    2
   Core(s) per socket:    8
   Socket(s):             2

I set CPUs=11 for the head node on purpose, so that jobs do not fully utilize the head node. For example, compute-0-0 is:

$ scontrol show node compute-0-0
NodeName=compute-0-0 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=6 CPUTot=32 CPULoad=5.15
   AvailableFeatures=rack-0,32CPUs
   ActiveFeatures=rack-0,32CPUs
   Gres=(null)
   NodeAddr=10.1.1.254 NodeHostName=compute-0-0
   OS=Linux 3.10.0-1062.1.2.el7.x86_64 #1 SMP Mon Sep 30 14:19:46 UTC 2019
   RealMemory=64259 AllocMem=1024 FreeMem=57050 Sockets=32 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=444124 Weight=20511900 Owner=N/A MCS_label=N/A
   Partitions=CLUSTER,WHEEL,SEA
   BootTime=2019-10-10T19:01:38 SlurmdStartTime=2019-12-17T13:50:37
   CfgTRES=cpu=32,mem=64259M,billing=47
   AllocTRES=cpu=6,mem=1G
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

Regards,
Mahmood
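P.S. Since the real topology is 2 sockets x 8 cores x 2 threads (32 CPUs) per box, I guess the node definitions should describe that instead of Sockets=32 / CoresPerSocket=1. The following is only an untested sketch of what I have in mind; the CoreSpecCount value for keeping some head-node cores away from jobs is just a placeholder, and RealMemory would be set per node from the values shown above:

   # /etc/slurm/slurm.conf (sketch, not applied yet)
   NodeName=compute-0-[0-2] Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
   NodeName=hpc NodeAddr=10.1.1.1 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 CoreSpecCount=4

   # presumably followed by pushing the new config to the daemons:
   scontrol reconfigure
   # (or restarting slurmctld on hpc and slurmd on all nodes)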