Hi all,

We have a 3-node HCI cluster with Gluster 2+1 (replica 2 + arbiter) volumes.
The first node had a hardware memory failure that corrupted the engine LV, and the server would only boot into maintenance mode. For some reason glusterd wouldn't start, one of the volumes became inaccessible, and its storage domain went offline. This caused multiple VMs to go into a paused or shut-down state.

I put the host into maintenance mode and then shut it down, in the hope that Gluster would continue across the remaining 2 nodes (one being the arbiter). Unfortunately that didn't work.

The solution was to do the following on the failed node:

1. Remove the contents of /var/lib/glusterd except for glusterd.info
2. Start glusterd
3. Peer probe one of the other 2 peers
4. Restart glusterd
5. Cross fingers and toes

Although this was a successful outcome, I would like to know why losing one Gluster peer caused the outage of a single storage domain, and therefore outages of the VMs with disks on that storage domain.

Kind Regards
Simon...
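For anyone hitting the same situation, the steps above can be sketched as a small shell script. This is only an illustration of what I ran, not an official procedure: the peer hostname "node2" is a placeholder for one of your own healthy peers, and the script defaults to a dry run that just prints the commands so nothing destructive happens by accident.

```shell
#!/bin/sh
# Sketch of the glusterd recovery steps from the post.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"        # dry run: show the command instead of running it
    else
        "$@"
    fi
}

recover_glusterd() {
    peer=${1:-node2}       # hypothetical healthy peer -- substitute your own
    # 1. Remove everything under /var/lib/glusterd except glusterd.info,
    #    which holds this node's UUID and must survive.
    run find /var/lib/glusterd -mindepth 1 -maxdepth 1 \
        ! -name glusterd.info -exec rm -rf {} +
    # 2. Start glusterd with the now-empty local configuration.
    run systemctl start glusterd
    # 3. Probe a healthy peer so glusterd pulls peer and volume
    #    definitions back from the rest of the cluster.
    run gluster peer probe "$peer"
    # 4. Restart glusterd so it loads the synced configuration.
    run systemctl restart glusterd
}

recover_glusterd "$@"
```

Keeping glusterd.info is the important part: it preserves the node's UUID, so after the peer probe the other nodes recognise it as the same peer rather than a new one.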