Ole,

Look at the NetworkManager-wait-online.service man page below (from RHEL 8.8).
Maybe your IB interfaces aren't properly configured in NetworkManager. The ***
markers were added by me.

"NetworkManager-wait-online.service blocks until NetworkManager logs "startup
complete" and announces startup complete on D-Bus. How long that takes depends
on the network and the NetworkManager configuration. If it takes longer than
expected, then the reasons need to be investigated in NetworkManager.

There are various reasons what affects NetworkManager reaching "startup
complete" and how long NetworkManager-wait-online.service blocks.

·   In general, ***startup complete is not reached as long as NetworkManager
    is busy activating a device and as long as there are profiles in
    activating state***. During boot, NetworkManager starts autoactivating
    suitable profiles that are ***configured to autoconnect***. If activation
    fails, NetworkManager might retry right away (depending on
    connection.autoconnect-retries setting). While trying and retrying,
    NetworkManager is busy until all profiles and devices either reached an
    activated or disconnected state and no further events are expected.

    ***Basically, as long as there are devices and connections in activating
    state visible with nmcli device and nmcli connection, startup is still
    pending.***"
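
The check described in the last paragraph can be sketched roughly as below. This is only an illustration, assuming the terse output of `nmcli -t -f DEVICE,STATE device` (one `device:state` pair per line, with devices still being activated reported as `connecting (...)`). The function name `activating_devices` is made up here; it just filters text on stdin, so it can also be tried on saved output.

```shell
#!/bin/sh
# Hypothetical helper: print the devices whose state starts with
# "connecting", i.e. devices NetworkManager is still activating.
# Input on stdin: DEVICE:STATE lines in nmcli terse format.
# Assumption: device names contain no ':' characters.
activating_devices() {
    awk -F: '$2 ~ /^connecting/ { print $1 }'
}

# Typical use on a node running NetworkManager:
#   nmcli -t -f DEVICE,STATE device | activating_devices
```

If this prints ib0 (or whatever your IB/OPA device is called) during boot, the man page suggests startup should still be pending at that point.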



-----Original Message-----
From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Ole Holm
Nielsen
Sent: Wednesday, November 1, 2023 05:19
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] RES: How to delay the start of slurmd until
Infiniband/OPA network is fully up?

Hi Paulo,

On 11/1/23 01:12, Paulo Jose Braga Estrela wrote:
> I think that you should use NetworkManager-wait-online.service in RHEL 8.
> Take a look at its man page. It only allows the system to reach network-online
> after all network interfaces are online. So, if your OPA interfaces are
> managed by NetworkManager, you can use it.

Unfortunately, NetworkManager-wait-online.service returns as soon as one
network interface is up.  It doesn't wait for any other networks, including
the Infiniband/OPA network :-(

You can see that the NetworkManager-wait-online.service file executes:

ExecStart=/usr/bin/nm-online -s -q

and this is causing our problems with Infiniband/OPA networks.  This is the 
reason why we need Max's workaround wait-for-interfaces.service.
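
To illustrate the general idea of waiting for one specific interface, here is a minimal sketch. It is not Max's actual implementation. It assumes the fabric shows up as an ordinary network device (ib0 is just an example name) and that the kernel's operstate file is a good enough signal; it says nothing about whether an IP address has been configured. The function name and arguments are hypothetical.

```shell
#!/bin/sh
# wait_for_iface IFACE [TIMEOUT_SECONDS] [SYSFS_NET_DIR]
# Hypothetical helper: poll the kernel's operstate file until IFACE
# reports "up". Returns 0 when the interface is up, 1 on timeout.
wait_for_iface() {
    iface=$1
    timeout=${2:-60}
    netdir=${3:-/sys/class/net}
    elapsed=0
    while :; do
        # operstate does not exist until the driver registers the device
        state=$(cat "$netdir/$iface/operstate" 2>/dev/null || true)
        if [ "$state" = "up" ]; then
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Example: wait_for_iface ib0 120 || echo "ib0 did not come up" >&2
```

The third argument only exists so the function can be tried against a scratch directory instead of the real /sys/class/net.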

/Ole


> -----Original Message-----
> From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Ole
> Holm Nielsen
> Sent: Tuesday, October 31, 2023 07:00
> To: Slurm User Community List <slurm-users@lists.schedmd.com>
> Subject: Re: [slurm-users] How to delay the start of slurmd until
> Infiniband/OPA network is fully up?
>
> Hi Jeffrey,
>
> On 10/30/23 20:15, Jeffrey R. Lang wrote:
>> The service is available in RHEL 8 via the EPEL package repository as
>> systemd-networkd, i.e. systemd-networkd.x86_64    253.4-1.el8    epel
>
> Thanks for the info.  We can install the systemd-networkd RPM from the EPEL 
> repo as you suggest.
>
> I tried to understand the properties of systemd-networkd before implementing 
> it in our compute nodes.  While there are lots of networkd man-pages, it's 
> harder to find an overview of the actual properties of networkd.  This is 
> what I found:
>
> * Networkd is a service included in recent versions of Systemd.  It seems to 
> be an alternative to NetworkManager.
>
> * Red Hat has stated that systemd-networkd is NOT going to be implemented in 
> RHEL 8 or 9.
>
> * Comparing systemd-networkd and NetworkManager:
> https://fedoracloud.readthedocs.io/en/latest/networkd.html
>
> * Networkd is described in the Wikipedia article
> https://en.wikipedia.org/wiki/Systemd
>
> While networkd seems to be really nifty, I hesitate to replace NetworkManager
> with networkd on our EL8 and EL9 systems because this is an unsupported and
> only lightly tested setup, and it may require additional work to keep our
> systems up-to-date in the future.
>
> It seems to me that Max Rutkowski's solution in
> https://github.com/maxlxl/network.target_wait-for-interfaces is less 
> intrusive than converting to systemd-networkd.
>
> Best regards,
> Ole
>
>
>> -----Original Message-----
>> From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf
>> Of Ole Holm Nielsen
>> Sent: Monday, October 30, 2023 1:56 PM
>> To: slurm-users@lists.schedmd.com
>> Subject: Re: [slurm-users] How to delay the start of slurmd until 
>> Infiniband/OPA network is fully up?
>>
>> Hi Jens,
>>
>> Thanks for your feedback:
>>
>> On 30-10-2023 15:52, Jens Elkner wrote:
>>> Actually there is no need for such a script since
>>> /lib/systemd/systemd-networkd-wait-online should be able to handle it.
>>
>> It seems that systemd-networkd exists in Fedora 38, but not in RHEL 8
>> and clones, AFAICT.
