Hi Max,

Thanks so much for your fast response with a solution! I didn't know that NetworkManager (falsely) claims that the network is online as soon as the first interface comes up :-(

Your solution of a wait-for-interfaces Systemd service makes a lot of sense, and I'm going to try it out.

Best regards,
Ole

On 10/30/23 14:30, Max Rutkowski wrote:
Hi,

We're not using Omni-Path, but we also had issues with Infiniband taking too long to come up and slurmd failing to start because of it.

Our solution was to implement a little wait-for-interface systemd service which delays the network.target until the ib interface has come up.

Our discovery was that NetworkManager triggers the network-online.target as soon as the first interface is connected.
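[Editor's sketch, not part of Max's original mail: this behaviour can be inspected on any NetworkManager-based system, since network-online.target is reached once NetworkManager-wait-online.service (which runs nm-online) exits; exact output varies by distribution.]

```shell
# Show what network-online.target actually pulls in
systemctl list-dependencies network-online.target

# Show what NetworkManager-wait-online runs (typically nm-online)
systemctl cat NetworkManager-wait-online.service

# Check per-device connection state as NetworkManager sees it
nmcli device status
```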

I've put the solution we use on my GitHub: https://github.com/maxlxl/network.target_wait-for-interfaces

You may need to make small adjustments, but it's pretty straightforward in general.
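[Editor's sketch of the general idea, not the actual contents of Max's repo: a oneshot unit that network-online.target both pulls in (WantedBy=) and waits for (Before=). The unit name, interface name, and timeout are assumptions; adjust to taste.]

```ini
# /etc/systemd/system/wait-for-ib0.service  (hypothetical name)
[Unit]
Description=Block network-online.target until ib0 is up
# Ordering Before= network-online.target means anything that waits for
# "network online" (e.g. slurmd) also waits for this unit to finish.
Before=network-online.target
After=NetworkManager.service

[Service]
Type=oneshot
RemainAfterExit=yes
# Poll the kernel's view of the link state for up to 60 seconds.
ExecStart=/bin/sh -c 'for i in $(seq 1 60); do [ "$(cat /sys/class/net/ib0/operstate 2>/dev/null)" = up ] && exit 0; sleep 1; done; exit 1'

[Install]
WantedBy=network-online.target
```

Enable it with `systemctl enable wait-for-b0.service`; the WantedBy= line makes network-online.target pull the unit in, and Before= makes the target wait for it.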


Kind regards
Max

On 30.10.23 13:50, Ole Holm Nielsen wrote:
I'm fighting this strange scenario where slurmd is started before the Infiniband/OPA network is fully up.  The Node Health Check (NHC) executed by slurmd then fails the node (as it should).  This happens only on EL8 Linux (AlmaLinux 8.8) nodes, whereas our CentOS 7.9 nodes with Infiniband/OPA network work without problems.

Question: Does anyone know how to reliably delay the start of the slurmd Systemd service until the Infiniband/OPA network is fully up?

Note: Our Infiniband/OPA network fabric is Omni-Path 100 Gbit/s, not Mellanox IB.  On AlmaLinux 8.8 we use the in-distro OPA drivers since the CornelisNetworks drivers are not available for RHEL 8.8.
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620

The details:

The slurmd service is started by the service file /usr/lib/systemd/system/slurmd.service after the "network-online.target" has been reached.

It seems that NetworkManager reports "network-online.target" BEFORE the Infiniband/OPA device ib0 is actually up, and this seems to be the cause of our problems!

Here are some important sequences of events from the syslog showing that the network goes online before the Infiniband/OPA network (hfi1_0 adapter) is up:

Oct 30 13:01:40 d064 systemd[1]: Reached target Network is Online.
(lines deleted)
Oct 30 13:01:41 d064 slurmd[2333]: slurmd: error: health_check failed: rc:1 output:ERROR:  nhc:  Health check failed: check_hw_ib:  No IB port is ACTIVE (LinkUp 100 Gb/sec).
(lines deleted)
Oct 30 13:01:41 d064 kernel: hfi1 0000:4b:00.0: hfi1_0: 8051: Link up
Oct 30 13:01:41 d064 kernel: hfi1 0000:4b:00.0: hfi1_0: set_link_state: current GOING_UP, new INIT (LINKUP)
Oct 30 13:01:41 d064 kernel: hfi1 0000:4b:00.0: hfi1_0: physical state changed to PHYS_LINKUP (0x5), phy 0x50

I tried to delay the NetworkManager "network-online.target" by setting a wait on the ib0 device and reboot, but that seems to be ignored:

$ nmcli -p connection modify "System ib0" connection.wait-device-timeout 20
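[Editor's note: per the nm-settings documentation, connection.wait-device-timeout is given in milliseconds and is only honored by NetworkManager-wait-online.service, which may explain why a value of 20 appears to be ignored. A sketch of setting and verifying it, reusing the connection name above:]

```shell
# Value is in milliseconds: 20000 = 20 seconds.
nmcli connection modify "System ib0" connection.wait-device-timeout 20000

# Verify the stored value.
nmcli -g connection.wait-device-timeout connection show "System ib0"

# The timeout only matters if NetworkManager-wait-online is enabled.
systemctl is-enabled NetworkManager-wait-online.service
```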

I'm hoping that other sites using Omni-Path have seen this and maybe can share a fix or workaround?

Of course we could remove the Infiniband check in Node Health Check (NHC), but that would not really be acceptable during operations.
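[Editor's sketch of a less invasive workaround: gate slurmd itself on the interface state with a systemd drop-in, leaving the packaged slurmd.service and the NHC check untouched. The drop-in path, interface name, and timeout are assumptions.]

```ini
# /etc/systemd/system/slurmd.service.d/wait-ib0.conf  (hypothetical drop-in)
[Service]
# Hold slurmd back until ib0 reports "up", for at most 120 seconds.
ExecStartPre=/bin/sh -c 'for i in $(seq 1 120); do [ "$(cat /sys/class/net/ib0/operstate 2>/dev/null)" = up ] && exit 0; sleep 1; done; exit 1'
```

Run `systemctl daemon-reload` after creating the drop-in; if the interface never comes up, slurmd fails to start and the node stays out of service, which is the same net effect as the NHC check but without the health-check error.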

Thanks for sharing any insights,
Ole

--
Max Rutkowski
IT-Services und IT-Betrieb
Tel.: +49 (0)331/6264-2341
E-Mail: max.rutkow...@gfz-potsdam.de
___________________________________

Helmholtz-Zentrum Potsdam
*Deutsches GeoForschungsZentrum GFZ*
Stiftung des öff. Rechts Land Brandenburg
Telegrafenberg, 14473 Potsdam
