Hi All,

We have a 3-node HCI cluster with Gluster 2+1 volumes (two data bricks plus an 
arbiter per volume).
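
For context, each volume is a replica-3 arbiter-1 layout, created roughly like 
this (the volume name, hostnames and brick paths below are placeholders, not 
our real ones):

    gluster volume create engine replica 3 arbiter 1 \
        host1:/gluster_bricks/engine/engine \
        host2:/gluster_bricks/engine/engine \
        host3:/gluster_bricks/engine/engine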

The first node had a hardware memory failure, which caused file corruption on 
the engine LV, and the server would only boot into maintenance mode.

For some reason glusterd wouldn't start, one of the volumes became 
inaccessible, and its storage domain went offline. This caused multiple VMs to 
go into a paused or shut-down state.
We put the host into maintenance mode and then shut it down in an attempt to 
let Gluster continue across the remaining 2 nodes (one being the arbiter). 
Unfortunately this didn't work.

The solution was to do the following (rough commands are sketched after the 
list):
1. Remove the contents of /var/lib/glusterd except for glusterd.info
2. Start glusterd
3. Peer probe one of the other 2 peers
4. Restart glusterd
5. Cross fingers and toes
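
On the failed node, this worked out to roughly the following commands 
(<peer-hostname> is a placeholder for one of the two healthy nodes):

    systemctl stop glusterd
    # keep a copy of glusterd.info, wipe everything else, then restore it
    cp /var/lib/glusterd/glusterd.info /root/glusterd.info.bak
    rm -rf /var/lib/glusterd/*
    cp /root/glusterd.info.bak /var/lib/glusterd/glusterd.info
    systemctl start glusterd
    gluster peer probe <peer-hostname>
    systemctl restart glusterd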

Although this was a successful outcome, I would like to know why losing 1 
Gluster peer caused the outage of a single storage domain, and therefore 
outages of the VMs with disks on that storage domain.

Kind Regards

Simon...