On Wed, 2014-04-09 at 16:19 +0100, Jonathan Gowar wrote:
> The HA instance has not yet come back online, I'll wait to see what
> happens with this.
I left this for another hour, still nothing. I had left 1 of the 2 hypervisor machines switched off during testing, and then the remaining host went into an alert state. I then powered the switched-off host back up, and CS registered it as up. The other machine stayed in an alert state. I restarted the CS agent; it made no difference, so I rebooted the host. It has come back, still in an alert state, with these messages: "Lost connection to the server. Dealing with the remaining commands...". Next I restarted CS management, and now both hosts are offline: one still in its alert state, the other disconnected. All non-HA VMs have stopped, and the 1 HA VM that has been trying to restart all this time is still trying to (bless!). The system VMs have now entered the Starting state, so they're not happy either.

So, to recap concisely: all of this has resulted from 1 deliberately induced power failure on 1 hypervisor.

Events
* lost power to hv-1
* ssvm + proxy migrate and start
* virtual router + HA guest fail to start
* .... waits
* after an hour, virtual router starts
* .... waits longer
* after another hour, HA guest fails to start
* power on hv-1, which comes up in running state
* hv-2 changes to alert state
* restart agent on hv-2
* restart management service on the management server
* reboot hv-2
* hv-2 returns in alert state
* hv-1 disconnects

Current state
* all guests have stopped
* HA guest is STILL trying to start
* hv-1 is attempting to connect
* hv-2 alerting
* virtual router running
* ssvm + proxy running

Seems like quite a horrific fallout from losing 1 hypervisor.

Jon