And a follow-up question: Does topology.conf need to be on all the
nodes, or just the Slurm controller? It's not clear from that web page.
I would assume only the controller needs it.
Prentice
On 1/17/19 4:49 PM, Prentice Bisbal wrote:
From https://slurm.schedmd.com/topology.html:
Note that compute nodes on switches that lack a common parent switch
can be used, but no job will span leaf switches without a common
parent (unless the TopologyParam=TopoOptional option is used). For
example, it is legal to remove the line "SwitchName=s4
Switches=s[0-3]" from the above topology.conf file. In that case, no
job will span more than four compute nodes on any single leaf switch.
This configuration can be useful if one wants to schedule multiple
physical clusters as a single logical cluster under the control of a
single slurmctld daemon.
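For reference, here's a minimal sketch of the topology.conf that
example describes (the switch names s0-s4 come from the doc; the node
names and four-nodes-per-leaf layout are my own illustration):

    SwitchName=s0 Nodes=tux[00-03]
    SwitchName=s1 Nodes=tux[04-07]
    SwitchName=s2 Nodes=tux[08-11]
    SwitchName=s3 Nodes=tux[12-15]
    # Removing this top-level line leaves s0-s3 with no common parent,
    # so no job will span more than one four-node leaf switch:
    SwitchName=s4 Switches=s[0-3]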
My current environment falls into the category of multiple physical
clusters being treated as a single logical cluster under the control
of a single slurmctld daemon. At least, that's my goal.
In my environment, I have two "clusters" connected by their own separate
IB fabrics, and one "cluster" connected with 10 GbE. I have a fourth
cluster connected with only 1 GbE. For this fourth cluster, we don't
want jobs to span nodes, due to the slow performance of 1 GbE. (This
cluster is intended for serial and low-core-count parallel jobs.) If I
just leave those nodes out of the topology.conf file, will that have
the desired effect of not allocating multi-node jobs to those nodes,
or will it result in an error of some sort?
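To make that concrete, here's roughly the topology.conf I have in mind
(all switch and node names below are made up for illustration):

    # IB cluster A: its own leaf/spine tree, no parent shared with B
    SwitchName=iba-leaf1 Nodes=a[01-16]
    SwitchName=iba-leaf2 Nodes=a[17-32]
    SwitchName=iba-spine Switches=iba-leaf[1-2]
    # IB cluster B: separate fabric, separate tree
    SwitchName=ibb-leaf1 Nodes=b[01-16]
    SwitchName=ibb-spine Switches=ibb-leaf1
    # 10 GbE cluster: one flat switch
    SwitchName=tengig Nodes=c[01-24]
    # 1 GbE nodes (d[01-12]) intentionally omitted -- this is the
    # part I'm unsure about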