Re: [lustre-discuss] MDT hanging

2021-03-09 Thread Simon Guilbault via lustre-discuss
Hi, One failure that the ZFS pacemaker resource does not seem to pick up is when MMP fails due to some problem with the SAS bus. We added this short script, running as a systemd daemon, to do a failover when this happens. The other check in this script is using NHC, mostly to check if t

[lustre-discuss] MDT hanging

2021-03-09 Thread Christopher Mountford via lustre-discuss
Hi, We've had a couple of MDT hangs on 2 of our Lustre filesystems after updating to 2.12.6 (though I'm sure I've seen this exact behaviour on previous versions). The symptoms are a gradually increasing load on the affected MDS, processes doing I/O on the filesystem blocking indefinitely, showin