Date: Thursday, February 11, 2021 at 3:17 PM
To: "nathan.crawf...@uci.edu" <nathan.crawf...@uci.edu>,
Lustre User Discussion Mailing List <lustre-discuss@lists.lustre.org>
Subject: Re: [lustre-discuss] LNET
, and that feature is not slated to arrive until after 2.14
> (afaik).
>
>
>
> Chris Horn
>
>
>
> *From: *lustre-discuss on
> behalf of "Spitz, Cory James"
> *Date: *Thursday, February 11, 2021 at 3:17 PM
> *To: *"nathan.crawf...@uci.edu" , Lu
Hi Colin,
I've checked the performance/error counters and run the in-OS-repo
version of ibdiagnet. Apart from a couple of nodes with known failing
cables/HCAs (not involved in the LNet connection problems), the fabric was
healthy. It did pick up that the IPoIB partition was still at 20 Gbit/s from
wh
Discussion
Mailing List
Subject: Re: [lustre-discuss] LNET IB intermittent connection
Resent-From:
Resent-Date: Thursday, February 11, 2021 at 3:17 PM
Hi, Nate.
You asked, “can LNET be easily configured to go over the @tcp connection when
the @o2ib flakes out?”
Yes, you can use LNet Multi-Rail for that, and it _is_ covered in the “fine
manual”, chapter 16 ☺
https://doc.lustre.org/lustre_manual.xhtml#lnetmr
-Cory
On 2/10/21, 4:54 PM, "lustre-
Hi Nathan,
Have you examined the underlying fabric to ensure it's functioning
correctly?
https://www.mellanox.com/products/adapter-software/infiniband-management-and-monitoring-tools
might interest you
-cf
On Wed, Feb 10, 2021 at 3:54 PM Nathan Crawford wrote:
Hi All,
I've recently been seeing a lot of LNet-over-InfiniBand
connection-lost/-restored errors and am trying to find the cause and/or
tune the system to better cope. There is a lot of stuff on the wiki (
https://wiki.lustre.org/Lustre_Resiliency:_Understanding_Lustre_Message_Loss_and_Tuning_