On 10/28/22 08:30, Richard Chang wrote:
Yes, the system is an HPE Cray EX, and I am trying to use switch/hpe_slingshot.

I see that Slurm 22.05 has added support for "switch/hpe_slingshot" with HPE Slingshot systems:

> SwitchType
> Identifies the type of switch or interconnect used for application communications. Acceptable values include "switch/cray_aries" for Cray systems, "switch/hpe_slingshot" for HPE Slingshot systems and "switch/none" for switches not requiring special processing for job launch or termination (Ethernet, and InfiniBand). The default value is "switch/none". All Slurm daemons, commands and running jobs must be restarted for a change in SwitchType to take effect. If running jobs exist at the time slurmctld is restarted with a new value of SwitchType, records of all jobs in any state may be lost.
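
As a minimal sketch, assuming the rest of your slurm.conf is already in place, the setting described in that excerpt would be a single line. I have not tested this on an EX system, and it does not by itself address the slurmctld shutdown you reported:

```
# slurm.conf fragment (sketch; value taken from the man page excerpt above)
SwitchType=switch/hpe_slingshot
```

Per the same excerpt, all Slurm daemons, commands and running jobs must be restarted for a change in SwitchType to take effect.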

You probably need to contact your HPE support people. A support contract with SchedMD is highly recommended when you have a complex setup with very new technology. See https://www.schedmd.com/support.php

/Ole

On 10/28/2022 11:21 AM, Ole Holm Nielsen wrote:
On 10/28/22 07:35, Richard Chang wrote:
I have observed that when I specify a switch type in slurm.conf and that switch type is not present on the slurmctld node, slurmctld panics and shuts down. Is this expected? My slurmctld node doesn't have the switch type, but the compute nodes do. How can I set it up so that it can use the feature without breaking Slurm?

What is your SwitchType line in slurm.conf? The manual page seems to describe what you have observed:

SwitchType
              Identifies the type of switch or interconnect used for
              application communications. Acceptable values include
              "switch/cray_aries" for Cray systems, "switch/none" for switches
              not requiring special processing for job launch or termination
              (Ethernet, and InfiniBand) and The default value is
              "switch/none". All Slurm daemons, commands and running jobs
              must be restarted for a change in SwitchType to take effect. If
              running jobs exist at the time slurmctld is restarted with a new
              value of SwitchType, records of all jobs in any state may be
              lost.

Why do you want to use this configuration?  Is your system a Cray?
