I'm replying to my own message, as it appears we have "fixed" the issue. We restarted all OSD hosts and all the presumed-lost data reappeared. It's likely that some OSDs were stuck unreachable but were somehow never flagged as down in the cluster.
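
For what it's worth, here is a rough way that condition might be spotted: a minimal sketch (assuming python3-rados is installed and /etc/ceph/ceph.conf plus an admin keyring are readable; adjust paths for your cluster) that asks the monitors which OSDs are "up" and then probes each one's public address. An OSD reported "up" in the map that refuses a TCP connection would be a candidate for being stuck unreachable without ever being flagged down.

#!/usr/bin/env python3
import json
import socket

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    # "osd dump" returns per-OSD state plus the public address.
    ret, out, errs = cluster.mon_command(
        json.dumps({'prefix': 'osd dump', 'format': 'json'}), b'')
    osds = json.loads(out)['osds']
finally:
    cluster.shutdown()

for osd in osds:
    if not osd['up']:
        continue  # already flagged down, nothing to check
    # public_addr may look like "10.0.0.1:6801/12345" or carry a
    # "v1:"/"v2:" prefix depending on the release; adjust as needed.
    addr = osd['public_addr'].split('/')[0]
    if addr.split(':')[0] in ('v1', 'v2'):
        addr = addr.split(':', 1)[1]
    host, port = addr.rsplit(':', 1)
    try:
        socket.create_connection((host, int(port)), timeout=3).close()
    except OSError:
        print(f"osd.{osd['osd']} is marked up but {host}:{port} "
              f"is unreachable")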

On 8/3/21 8:15 PM, J-P Methot wrote:
Hi,

We've encountered this issue on Ceph Pacific, with an OpenStack Wallaby cluster hooked to it. Essentially, we're slowly pushing this setup into production, so while testing it we encountered this oddity. My colleague wanted to do some network redundancy tests, so he manually shut down an RBD-backed VM in OpenStack and then started shutting down network switches. This didn't go well and caused instability on the network, with potential packet loss. When he fixed the problem, he started the VM back up and its filesystem was corrupt and non-recoverable. There was no activity from Ceph clients while the tests were going on. There are no errors in Ceph status, and no missing PGs or objects are reported. As far as Ceph is concerned, there is no issue, despite this RBD mysteriously getting corrupted.

So to recap:
1. Clean shutdown of the VM in OpenStack.

2. Network tests cause downtime and packet loss.

3. VM is started again but fails to boot; black screen in the console.

4. Investigation shows the XFS filesystem on the VM's sda1 is unrecoverable by xfs_repair.

So, my question is: when there is no client activity, can data in the cluster still become corrupted and unrecoverable because of network instability, or is the cause something else?
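
One way to answer that empirically would be to checksum the idle image before and after the next round of network tests and compare. A minimal sketch, assuming python3-rados and python3-rbd are installed; the pool and image names ("volumes", "volume-xyz") are placeholders for your own:

#!/usr/bin/env python3
import hashlib

import rados
import rbd

CHUNK = 4 * 1024 * 1024  # read the image in 4 MiB chunks

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('volumes')
    try:
        # Open read-only so the check itself generates no writes.
        with rbd.Image(ioctx, 'volume-xyz', read_only=True) as image:
            digest = hashlib.sha256()
            size = image.size()
            for offset in range(0, size, CHUNK):
                digest.update(image.read(offset, min(CHUNK, size - offset)))
        print(digest.hexdigest())
    finally:
        ioctx.close()
finally:
    cluster.shutdown()

If the two digests differ while no client I/O was possible, that would point at the cluster rather than the guest.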

--
Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
