Hello, I would like to ask whether SFEX is a valid mechanism for exclusive disk access control in Pacemaker clusters running on VMware.
My client is planning to configure Pacemaker HA clusters on several VMware vSphere 6.5 hosts. Each HA cluster consists of two VM nodes, one active and one standby, placed on two different ESX hosts, with shared LVM disk resources. For exclusive disk control and fencing under Pacemaker, our IT vendor proposes SFEX (Shared Disk File EXclusiveness) together with fence_vmware_soap (to reset the failing node via vCenter).

My concern is the case where an ESX host hangs for more than a minute, for example because of intermittent hardware failures, so that fence_vmware_soap does not work. The standby node would then be forced to take over the disk resources with SFEX. If the hung node eventually comes back, however, the I/Os that were queued on the previously active node just before the ESX host hung would be flushed to the disk and corrupt the disk resources that were already taken over via SFEX, because there is no SCSI persistent reservation and no valid hardware watchdog timer for VMs on VMware.

So I think SFEX is valid only when combined with IPMI-based STONITH, for bare-metal servers or even for VMware hosts, and that on VMware we should use fence_scsi, with recent SPC-3 compliant disk storage, together with fence_vmware_soap. Am I right?

In addition, is the combination of fence_scsi and fence_vmware_soap sufficiently proven in production environments on RHEL 7.x on VMware?

Thank you for any responses.

Satoshi
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org