Hi,
Recently, we have frequently seen OSTs being dropped at random by some client nodes.
We have 4 Lustre filesystems with 126 OSTs in total. All clients run the 2.15.3
client on CentOS 7.
The servers run Lustre 2.12.8 on CentOS 7 (3 filesystems) and 2.15.3 on Alma 8.8.
Failures can happen from both versions of ser
I couldn't say exactly, but:
- Your net is o2ib1. Is there an o2ib0?
- Are you routing? If so, is it LNet routing or IB routing? Are there any issues
with the routers or routing?
- Verify the stability of LNet and the fabric path between client and
server in the messages above using a tool like lne
Hi,
We are running Lustre 2.12.2.
We are seeing lots of errors on the servers, such as:
Oct 5 11:16:48 oss04 kernel: LNetError: 6414:0:(lib-move.c:2955:lnet_resend_pending_msgs_locked()) Error sending PUT to 12345-172.19.171.15@o2ib1: -125
Oct 5 11:16:48 oss04 kernel: LustreError: 6414:0:(events.c:450:serv
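
When there are lots of these, it can help to pull the peer NID and return code out of each LNetError entry and count them per peer. Below is a minimal sketch of that, assuming the entries follow the layout of the single complete LNetError sample quoted above. The regex, the function name `parse_lnet_error` and the field names are my own and not part of Lustre. Reading the leading `12345-` as a peer process-id prefix is also an assumption.

```python
import re

# Assumed layout, inferred only from the one complete LNetError sample above.
# Group names (pid, file, line, func, msg, peer_pid, nid, rc) are hypothetical labels.
LNET_ERROR = re.compile(
    r"LNetError:\s+(?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.-]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+"
    r"(?P<msg>.*?)\s+(?P<peer_pid>\d+)-(?P<nid>[\w.:]+@\w+):\s+(?P<rc>-?\d+)\s*$"
)


def parse_lnet_error(text):
    """Return a dict of fields from one LNetError entry, or None if it does not match."""
    flat = " ".join(text.split())  # undo mail-client line wrapping
    match = LNET_ERROR.search(flat)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("pid", "line", "rc"):
        fields[key] = int(fields[key])  # numeric fields as integers
    return fields


# The sample is the entry from the message, wrapped as it arrived by mail.
sample = """Oct 5 11:16:48 oss04 kernel: LNetError:
6414:0:(lib-move.c:2955:lnet_resend_pending_msgs_locked()) Error sending PUT to
12345-172.19.171.15@o2ib1: -125"""

entry = parse_lnet_error(sample)
print(entry["nid"], entry["rc"])  # prints: 172.19.171.15@o2ib1 -125
```

On Linux, errno 125 is ECANCELED, so the -125 here would mean the send was cancelled. Grouping the parsed entries by `nid` should show whether the errors are concentrated on a few peers or spread across the fabric.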