[ovirt-users] Standard operating procedure for a node failure on HCI required

Thomas Hoberg Thu, 01 Apr 2021 15:14:40 -0700

oVirt may have started as a vSphere 'look-alike', but it graduated to a Nutanix 
'clone', at least in terms of marketing.


IMHO that means the 3-node hyperconverged default oVirt setup (2 replicas and 1 
arbiter) deserves special love in terms of documenting failure scenarios. 

3-node HCI is supposed to protect you against the long-term effects of any 
single point of failure. There is no protection against the loss of dynamic 
state or session data, but stateless services should recover or resume: that's 
what it's all about.

Sadly, what I find missing in the oVirt and Gluster documentation is an SOP 
(standard operating procedure) to follow during a late-night or early-morning 
on-call wakeup when one of those three HCI nodes has failed, whether 
dramatically or via a 'brown-out', e.g. where only the storage part was 
actually lost.

My impression is that the oVirt and Gluster teams are barely talking, but in 
HCI that's fatal.

And I sure can't find those recovery procedures, not even in the commercial 
Red Hat documentation.

So please, either add them or show me where I missed them.
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/QZFFH2U2RM2R3POGHXUZ3MLI4FB4BVLL/
