Hi Ahmet,

On 6/16/20 11:27 AM, mercan wrote:
Did you check the /var/log/messages file for errors? systemd logs to this file instead of the slurmctld log file.

Ahmet M.

The syslog reports the same errors from slurmctld as are being reported by every Slurm 20.02 command.

I have found a workaround: in slurm.conf, replace "Boards=1 SocketsPerBoard=2" with "Sockets=2" on the NodeName lines and reconfigure the daemons. For some reason 20.02 doesn't handle "Boards" configurations correctly.

Any site with "Boards" in slurm.conf should reconfigure to "Sockets" before installing/upgrading to 20.02.
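For sites with many NodeName lines, the workaround above could be scripted. This is a minimal sketch, not a tested tool: the function names are hypothetical, and it assumes "Boards=1" and "SocketsPerBoard=N" appear next to each other in that order, as in our NodeName line. Lines with any other Boards value are deliberately left alone, because only the single-board case was tested here.

```python
import re

# Matches the adjacent pair "Boards=<n> SocketsPerBoard=<m>" (assumed order).
_BOARDS_RE = re.compile(r"\bBoards=(\d+)\s+SocketsPerBoard=(\d+)\b")


def boards_to_sockets(line: str) -> str:
    """Rewrite 'Boards=1 SocketsPerBoard=N' as 'Sockets=N' on a NodeName line.

    Non-NodeName lines, lines without the pair, and lines where Boards
    is not 1 are returned unchanged.
    """
    if not line.lstrip().startswith("NodeName="):
        return line
    m = _BOARDS_RE.search(line)
    if m is None or m.group(1) != "1":
        return line
    return line[: m.start()] + f"Sockets={m.group(2)}" + line[m.end():]


def rewrite_conf(text: str) -> str:
    """Apply boards_to_sockets to every line of a slurm.conf text."""
    return "\n".join(boards_to_sockets(line) for line in text.split("\n"))


if __name__ == "__main__":
    before = ("NodeName=a[001-140] Weight=10001 Boards=1 SocketsPerBoard=2 "
              "CoresPerSocket=4 ThreadsPerCore=1")
    print(boards_to_sockets(before))
```

Keep a backup of slurm.conf, review the diff by hand, and then reconfigure the daemons as described above.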

It may be a good idea to track updates to bug https://bugs.schedmd.com/show_bug.cgi?id=9241

Best regards,
Ole

On 16.06.2020 11:12, Ole Holm Nielsen wrote:
Today we upgraded the controller node from 19.05 to 20.02.3, and immediately all Slurm commands (on the controller node) give error messages for all partitions:

# sinfo --version
sinfo: error: NodeNames=a[001-140] CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.
(lines deleted)
slurm 20.02.3

In slurm.conf we have defined NodeName like:

NodeName=a[001-140] Weight=10001 Boards=1 SocketsPerBoard=2 CoresPerSocket=4 ThreadsPerCore=1 ...

According to the slurm.conf manual the CPUs should then be calculated automatically:

"If CPUs is omitted, its default will be set equal to the product of Boards, Sockets, CoresPerSocket, and ThreadsPerCore."
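As a sanity check on what CPUs should default to, here is a minimal sketch of the product quoted from the manual. The function names are hypothetical. It assumes missing Boards, CoresPerSocket and ThreadsPerCore values default to 1, and it treats Sockets as a per-board count when Boards is given, which is my reading of the manual rather than a reimplementation of Slurm's own logic.

```python
def parse_node_params(line: str) -> dict:
    """Split a NodeName line into key=value pairs; bare tokens are skipped."""
    params = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            params[key] = value
    return params


def default_cpus(line: str) -> int:
    """Boards * SocketsPerBoard (or Sockets) * CoresPerSocket * ThreadsPerCore."""
    p = parse_node_params(line)
    boards = int(p.get("Boards", "1"))
    # SocketsPerBoard and Sockets are alternatives; fall back to 1 if neither is set.
    sockets_per_board = int(p.get("SocketsPerBoard", p.get("Sockets", "1")))
    cores = int(p.get("CoresPerSocket", "1"))
    threads = int(p.get("ThreadsPerCore", "1"))
    return boards * sockets_per_board * cores * threads


if __name__ == "__main__":
    line = ("NodeName=a[001-140] Weight=10001 Boards=1 SocketsPerBoard=2 "
            "CoresPerSocket=4 ThreadsPerCore=1 ...")
    print(default_cpus(line))
```

For our NodeName line this gives 1 * 2 * 4 * 1 = 8, whereas the 20.02.3 error message shows CPUs=1.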

Has anyone else seen this error with Slurm 20.02?

I wonder if there is a problem with specifying SocketsPerBoard instead of Sockets?  The slurm.conf manual doesn't seem to prefer one over the other.

I've opened a bug https://bugs.schedmd.com/show_bug.cgi?id=9241
