Bingo! You were right, I was asking for more cores than were available (our highmem nodes have fewer than our standard nodes). I was so convinced that the problem was related to my upgrading the OS on those nodes that it never crossed my mind that it was something as straightforward as that.
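For anyone else hitting this: a quick way to cross-check what the job asked for per node against what the partition's nodes can actually provide (standard Slurm commands; the jobid and partition name are the ones from this thread):

```shell
# What did the pending job request? NumNodes, NumCPUs, MinCPUsNode
# and MinMemoryNode are the fields to compare against the hardware.
scontrol show job 10860160 | grep -E 'NumNodes|NumCPUs|MinCPUsNode|MinMemory'

# What can each node in the partition provide?
# %N = node name, %c = CPUs per node, %m = memory per node (MB).
sinfo -p highmem -N -o '%N %c %m'
```

If any per-node value in the first command exceeds the corresponding column in the second, the job can never start on that partition and Slurm reports PartitionConfig.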
Thanks for your help.

On Wed, Sep 29, 2021 at 7:49 PM Paul Brunk <pbr...@uga.edu> wrote:

> Hello Byron:
>
> I'm guessing that your job is asking for more HW than the highmem_p
> has in it, or more cores or RAM within a node than any of the nodes
> have, or something like that. 'scontrol show job 10860160' might
> help. You can also look in slurmctld.log for that jobid.
>
> --
> Paul Brunk, system administrator
> Georgia Advanced Computing Resource Center
> Enterprise IT Svcs, the University of Georgia
>
> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> *On Behalf Of* byron
> *Sent:* Wednesday, September 29, 2021 10:35
> *To:* Slurm User Community List <slurm-users@lists.schedmd.com>
> *Subject:* [slurm-users] job stuck as pending - reason "PartitionConfig"
>
> Hi
>
> When I try to submit a job to one of our partitions it just stays
> pending with the reason "PartitionConfig". Can someone point me in
> the right direction for how to troubleshoot this? I'm a bit stumped.
> Some details of the setup:
>
> The version is 19.05.7
>
> This is the job that is stuck in state pending:
>
>    JOBID PARTITION     NAME  USER ST TIME NODES NODELIST(REASON)
> 10860160   highmem MooseBen byron PD 0:00    16 (PartitionConfig)
>
> $ sinfo -p highmem
> PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
> highmem   up    infinite      1 drain intel-0012
> highmem   up    infinite     19 idle  intel-[0001-0011,0013-0020]
>
> The output from scontrol show part:
>
> PartitionName=highmem
>    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
>    AllocNodes=ALL Default=NO QoS=N/A
>    DefaultTime=02:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
>    MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
>    Nodes=intel-00[01-20]
>    PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=EXCLUSIVE
>    OverTimeLimit=NONE PreemptMode=REQUEUE
>    State=UP TotalCPUs=320 TotalNodes=20 SelectTypeParameters=NONE
>    JobDefaults=(null)
>    DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
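A sanity check you can do straight from the scontrol output above: TotalCPUs=320 spread over TotalNodes=20 means each highmem node has at most 16 cores (assuming the nodes are uniform, as the identical intel-NNNN naming suggests), so any job asking for more than 16 cores per node will pend with PartitionConfig:

```shell
# Values taken from 'scontrol show part highmem' in this thread.
TOTAL_CPUS=320
TOTAL_NODES=20

# Average cores per node; an upper bound per node if nodes are uniform.
echo $(( TOTAL_CPUS / TOTAL_NODES ))   # prints 16
```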