[lustre-discuss] Random drop off OST from clients

2023-10-05 Thread Lixin Liu
Hi, Recently we have frequently seen OSTs being dropped at random by some client nodes. We have 4 Lustre filesystems with a total of 126 OSTs. All clients run the 2.15.3 client on CentOS 7. The servers run Lustre 2.12.8 on CentOS 7 (3 filesystems) and 2.15.3 on Alma 8.8. Failures can happen from both versions of ser

Re: [lustre-discuss] Lnet errors

2023-10-05 Thread Jeff Johnson
I couldn't say exactly, but:
- Your net is o2ib1. Is there an o2ib0?
- Are you routing? If so, lnet routing or IB routing? Any issues with the routers or routing?
- Verify the stability of lnet and the fabric path between client and server in the messages above using a tool like lne

[lustre-discuss] Lnet errors

2023-10-05 Thread Alastair Basden via lustre-discuss
Hi, Lustre 2.12.2. We are seeing lots of errors on the servers such as:
Oct 5 11:16:48 oss04 kernel: LNetError: 6414:0:(lib-move.c:2955:lnet_resend_pending_msgs_locked()) Error sending PUT to 12345-172.19.171.15@o2ib1: -125
Oct 5 11:16:48 oss04 kernel: LustreError: 6414:0:(events.c:450:serv
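
When there are lots of these messages, it can help to tally them per peer NID to see whether the errors cluster on a few clients. The following is a minimal sketch, not a Lustre tool: the regular expression is written only against the one complete LNetError line quoted above, and the helper names (`parse_lnet_send_errors`, `count_by_nid`) are hypothetical. Other LNet messages may need a different pattern. The trailing number is a negated errno; on Linux, 125 is ECANCELED.

```python
import re
from collections import Counter

# Assumption: matches only the "Error sending <OP> to <NID>: <rc>" form
# seen in the quoted log line; other LNetError formats are ignored.
LNET_SEND_ERROR = re.compile(
    r"LNetError: \d+:\d+:"
    r"\((?P<file>[\w.-]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"Error sending (?P<op>\w+) to (?P<nid>\S+): (?P<rc>-?\d+)"
)


def parse_lnet_send_errors(text):
    """Return one dict per matching LNetError found in the given log text."""
    records = []
    for match in LNET_SEND_ERROR.finditer(text):
        record = match.groupdict()
        record["rc"] = int(record["rc"])  # negated errno, e.g. -125
        records.append(record)
    return records


def count_by_nid(records):
    """Count parsed errors per peer NID."""
    return Counter(record["nid"] for record in records)


# The complete log line from the post above, used as sample input.
sample = (
    "Oct 5 11:16:48 oss04 kernel: LNetError: "
    "6414:0:(lib-move.c:2955:lnet_resend_pending_msgs_locked()) "
    "Error sending PUT to 12345-172.19.171.15@o2ib1: -125"
)

records = parse_lnet_send_errors(sample)
for nid, count in count_by_nid(records).most_common():
    print(nid, count)
```

In practice the input would be the server's syslog text rather than the single sample string; a NID that dominates the tally is a reasonable place to start checking the fabric path.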