Hi Ole,

Yes, it's very similar. I've also put our systemd unit file online at 
https://gist.github.com/wpoely86/cf88e8e41ee885677082a7b08e12ae11

And we add it as a dependency for slurmd:

$ cat /etc/systemd/system/slurmd.service.d/wait.conf

[Service]
Environment="CUDA_DEVICE_ORDER=PCI_BUS_ID"
LimitMEMLOCK=infinity

[Unit]
After=waitforib.service
Requires=munge.service
Wants=waitforib.service


So far this has worked flawlessly.
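
For readers who cannot open the gist, here is a minimal sketch of what a
oneshot unit of this kind could look like. It is not a copy of the gist: the
script path /usr/local/sbin/waitforib.sh and the Description text are
assumptions made for illustration only.

```ini
# /etc/systemd/system/waitforib.service
# Hypothetical sketch, NOT the contents of the gist linked above.
[Unit]
Description=Wait for an InfiniBand port to become ACTIVE

[Service]
# oneshot: units ordered After= this one wait until the script has exited
Type=oneshot
# Assumed install location of the wait script
ExecStart=/usr/local/sbin/waitforib.sh
# Keep the unit "active" after the script exits successfully
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

Since the drop-in uses Wants= rather than Requires=, slurmd should still be
started if the wait script gives up and exits non-zero; After= only delays
slurmd until the script has finished.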


Ward



On 2/11/2023 09:28, Ole Holm Nielsen wrote:
Hi Ward,

Thanks a lot for the feedback!  The method of probing 
/sys/class/infiniband/*/ports/*/state is also used in the NHC script 
lbnl_hw.nhc and has the advantage of not depending on the nmcli command from 
the NetworkManager package.

Can I ask how you run your script as a service during the systemd boot 
process, perhaps similar to Max's solution in 
https://github.com/maxlxl/network.target_wait-for-interfaces ?

Thanks,
Ole

On 11/1/23 20:09, Ward Poelmans wrote:
We have a slightly different script to do the same. It only relies on /sys:

# Search for InfiniBand devices and wait until
# at least one reports that it is ACTIVE

if [[ ! -d /sys/class/infiniband ]]
then
     logger "No infiniband found"
     exit 0
fi

ports=$(ls /sys/class/infiniband/*/ports/*/state)

for (( count = 0; count < 300; count++ ))
do
     for port in ${ports}; do
         if grep -q ACTIVE "$port"; then
             logger "Infiniband online at $port"
             exit 0
         fi
     done
     sleep 1
done

logger "Failed to find an active infiniband interface"
exit 1
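
To try the polling logic on a machine without InfiniBand, the same check can
be wrapped in a function that takes the directory to scan and the number of
attempts as arguments. This is only a sketch: the function name wait_for_ib
and both parameters are illustrative and not part of the script above. It
also re-expands the glob on every pass instead of listing the ports once.

```shell
#!/bin/bash
# Sketch only: wait_for_ib and its two parameters are illustrative names,
# not part of the original script.
wait_for_ib() {
    local root="${1:-/sys/class/infiniband}"   # directory to scan
    local tries="${2:-300}"                    # number of one-second attempts
    local count port

    # No InfiniBand directory at all: nothing to wait for.
    [[ -d "$root" ]] || return 0

    for (( count = 0; count < tries; count++ )); do
        # Re-expand the glob on each pass so ports that appear late are seen.
        for port in "$root"/*/ports/*/state; do
            # Skip the literal pattern when the glob matches nothing.
            [[ -e "$port" ]] || continue
            if grep -q ACTIVE "$port"; then
                echo "$port"
                return 0
            fi
        done
        sleep 1
    done
    return 1
}
```

Called without arguments it scans /sys/class/infiniband for up to 300
seconds, like the script above; pointing it at a scratch directory that
mimics the <device>/ports/<n>/state layout lets you exercise both the
success and the timeout path.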
