On Wed, 2014-04-09 at 16:19 +0100, Jonathan Gowar wrote:
> The HA instance has not yet come back online, I'll wait to see what
> happens with this.

I left this for another hour, still nothing.  I had left 1 of 2
hypervisor machines switched off during the testing, and then the
remaining host went to alert state.

I then powered up the host that was off, and CS registered it as up.  The
other machine stayed in an alert state.  I restarted the CS agent, which
made no difference, so I rebooted it.  It has come back, still in an alert
state, with these messages: "Lost connection to the server. Dealing with
the remaining commands...".

Next I restarted CS management, and now both hosts are offline, 1
still in its alert state, the other disconnected.  All non-HA VMs have
stopped, and the 1 HA VM that's been trying to restart all this time is
still trying to (bless!).

The System VMs have now entered Starting states, so they're not happy
either.

So, to recap concisely: this has been the result of 1 (invoked) power
failure on 1 hypervisor.


Events

* lost power to hv-1
* ssvm + proxy migrate and start
* virt router + ha guest fail to start
* .... waits
* after an hour virt router starts
* .... waits longer
* after another hour ha guest fails to start
* power on hv-1, comes up in running state
* hv-2 changes to alert state
* restart agent on hv-2
* restart management service on management server
* reboot hv-2
* hv-2 returns with alert state
* hv-1 disconnects

Current state

* all guests have stopped
* ha guest is STILL trying to start
* hv-1 is attempting to connect
* hv-2 alerting
* virtual router running
* ssvm + proxy running

Seems like quite a horrific fallout in the event of losing 1
hypervisor.

Jon
