Re: [slurm-users] Munge decode failing on new node

2020-04-19 Thread Chris Samuel
On Friday, 17 April 2020 2:22:00 PM PDT Dean Schulze wrote:
> Both work. The only discrepancy is that the slurm controller output had
> these two lines:
>
> UID: ??? (1000)
> GID: ??? (1000)
>
> Like the controller doesn't know the username for UID 1000. What does thi

Re: [slurm-users] One node is not used by slurm

2020-04-19 Thread Renfro, Michael
Someone else might see more than I do, but from what you’ve posted, it’s clear that compute-0-0 will be used only after other lower-weighted nodes are too full to accept a particular job. I assume you’ve already submitted a set of jobs requesting enough resources to fill up all the nodes, and t
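The weight behaviour described above can be illustrated with a small sketch. It assumes the documented Slurm rule that, all else being equal, a job is placed on the lowest-weight node that satisfies its request. The `Node` class, the `pick_node` helper, and the second node `compute-0-1` with its weight are hypothetical and only for illustration; the real scheduler considers many more factors.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    weight: int      # Slurm "Weight": lower values are preferred
    free_cpus: int   # CPUs currently unallocated on the node


def pick_node(nodes, cpus_needed):
    """Return the lowest-weight node that can hold the job, or None."""
    candidates = [n for n in nodes if n.free_cpus >= cpus_needed]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.weight)


nodes = [
    Node("compute-0-0", 20511900, 32),  # weight and CPU count from the post
    Node("compute-0-1", 1000, 32),      # hypothetical lower-weighted node
]

first = pick_node(nodes, 8)    # the lower-weighted node wins while it has room
nodes[1].free_cpus = 0         # simulate compute-0-1 filling up
second = pick_node(nodes, 8)   # only now does compute-0-0 get chosen
print(first.name, second.name)
```

In this model compute-0-0 stays idle until every lower-weighted node is too full for the job, which matches the symptom reported in the thread.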

[slurm-users] One node is not used by slurm

2020-04-19 Thread Mahmood Naderan
Hi, Although compute-0-0 is included in a partition, I have noticed that no job is offloaded there automatically. If someone intentionally writes --nodelist=compute-0-0, it works fine. # grep -r compute-0-0 . ./nodenames.conf.new:NodeName=compute-0-0 NodeAddr=10.1.1.254 CPUs=32 Weight=20511900 Fea

Re: [slurm-users] Munge decode failing on new node

2020-04-19 Thread Brian Andrus
I see potentially 2 things you should likely do: 1. Run ntpd on your nodes. You can even have them sync with your master. 2. Sync your user data on the nodes too, even if that is just ensuring /etc/passwd and /etc/group are the same on them all. While ntp is not required for slurm, the time sy
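One way to check the second suggestion is to compare the UID-to-name mappings of two passwd files. The following is a minimal sketch, not a tool from the thread: the helper names `parse_passwd` and `uid_mismatches`, the user `alice`, and the sample file contents are all hypothetical. A UID that has no entry on one host is the kind of gap that would produce output like `UID: ??? (1000)`.

```python
def parse_passwd(text):
    """Map UID -> username from passwd-format text (name:pw:uid:gid:...)."""
    uid_to_name = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed lines
        uid_to_name[int(fields[2])] = fields[0]
    return uid_to_name


def uid_mismatches(reference_text, other_text):
    """Return {uid: (reference_name, other_name_or_None)} where they differ."""
    ref = parse_passwd(reference_text)
    other = parse_passwd(other_text)
    return {
        uid: (name, other.get(uid))
        for uid, name in ref.items()
        if other.get(uid) != name
    }


# Hypothetical contents of /etc/passwd on a compute node and on the controller.
node_passwd = "root:x:0:0:root:/root:/bin/bash\nalice:x:1000:1000::/home/alice:/bin/bash\n"
controller_passwd = "root:x:0:0:root:/root:/bin/bash\n"

diff = uid_mismatches(node_passwd, controller_passwd)
print(diff)
```

In practice the two inputs would be read from copies of each host's `/etc/passwd`; the same comparison applies to `/etc/group` using the GID field.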