Please see the other thread on DEV that proposes a fix for KVM HA: [DISCUSS] KVM HA with IPMI Fencing
----
We propose the following solution that, in our understanding, should cover all
use cases and provide a fencing mechanism.

NOTE: The proposed IPMI fencing is just a script. If you are using HP hardware
with iLO, it could be an iLO executable with specific parameters. In theory,
this can be *any* script, not just IPMI.

Please take a few minutes to read this through, to avoid duplicate effort.

Proposed FS below:
----------------
https://cwiki.apache.org/confluence/display/CLOUDSTACK/KVM+HA+with+IPMI+Fencing

On 10/12/15 12:54 AM, Frank Louwers wrote:
>
>> On 10 Oct 2015, at 12:35, Remi Bergsma <rberg...@schubergphilis.com> wrote:
>>
>> Can you please explain what the issue is with KVM HA? In my tests, HA starts
>> all VMs just fine without the hypervisor coming back. At least that is on
>> current 4.6. Assuming a cluster of multiple nodes of course. It will then do
>> a neighbor check from another host in the same cluster.
>>
>> Also, malfunctioning NFS leads to corruption and therefore we fence a box
>> when the shared storage is unreliable. Combining primary and secondary NFS
>> is not a good idea for production in my opinion.
>
> Well, it depends how you look at it, and what your situation is.
>
> If you use 1 NFS export as primary storage (and only NFS), then yes, the
> system works as one would expect, and doesn’t need to be fixed.
>
> However, HA is “not functioning” in any of these scenarios:
>
> - you don’t use NFS as your only primary storage
> - you use more than one NFS primary storage
>
> Even worse: imagine you only use local storage as primary storage, but have 1
> NFS configured (as the UI “wizard” forces you to configure one). You don’t
> have any active VM configured on that NFS primary storage. You then perform
> maintenance on the NFS storage, and take it offline…
>
> All your hosts will then reboot, resulting in major downtime that’s
> completely unnecessary. There’s not even an option to disable this at this
> point… We’ve removed the reboot instructions from the HA script on all our
> instances…
>
> Regards,
>
> Frank
>
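
To illustrate the note above that fencing is "just a script", here is a minimal sketch of what such a pluggable fencing hook could look like. It is not taken from the proposed FS: the function names, the BMC address, the user name, and the password file path are hypothetical placeholders. The only real tool assumed is ipmitool, with its standard -I, -H, -U, and -f options and the "chassis power off" command.

```python
import subprocess
from typing import Callable, List, Sequence


def ipmi_fence_command(bmc_host: str, user: str, password_file: str) -> List[str]:
    """Build an ipmitool command line that powers a host off via its BMC.

    All arguments are placeholders supplied by the caller; -f makes
    ipmitool read the password from a file instead of the command line.
    """
    return [
        "ipmitool", "-I", "lanplus",
        "-H", bmc_host,
        "-U", user,
        "-f", password_file,
        "chassis", "power", "off",
    ]


def fence(command: Sequence[str],
          run: Callable[[Sequence[str]], int] = subprocess.call) -> bool:
    """Run any fencing command and report whether it succeeded.

    The command is opaque to this function, so an iLO executable or any
    other script could be passed instead of ipmitool. Returns True only
    when the command exits with status 0; a missing executable counts
    as a failed fence rather than raising.
    """
    try:
        return run(list(command)) == 0
    except OSError:
        return False
```

Because fence() only looks at the exit status, HA logic built on top of it would stay the same whichever fencing script is configured. Treating a missing or failing script as "not fenced" is the conservative choice here, since restarting VMs elsewhere without a confirmed fence risks the kind of corruption mentioned in the quoted thread.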