Re: Recover VM after KVM host down (and HA not working) ?

Jean-Francois Nadeau Wed, 27 Dec 2017 16:07:13 -0800

Hmm, could this be the culprit?

WARN  [c.c.h.KVMInvestigator] (AgentTaskPool-10:ctx-694feb6c)
(logid:160220c5) Agent investigation was requested on host
Host[-4-Routing], but host does not support investigation because it has no
NFS storage. Skipping investigation.


The primary storage is NFS.
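
To see how often the investigator bails out like this, the warnings can be pulled out of the management server log with a small script. This is only a rough sketch: it assumes each log entry sits on a single line in the layout quoted above (the mail client wrapped the excerpts), and the names `parse_line` and `skipped_investigations` are made up for illustration, not part of CloudStack.

```python
import re

# Layout assumed from the excerpts in this thread:
# [timestamp] LEVEL [logger] (thread:ctx) (logid:xxxxxxxx) message
LOG_LINE = re.compile(
    r"^(?:(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+)?"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]*)\)\s+"
    r"\(logid:(?P<logid>[0-9a-f]+)\)\s+"
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def skipped_investigations(lines):
    """Yield parsed entries where KVMInvestigator reports skipping a host."""
    for line in lines:
        entry = parse_line(line)
        if (
            entry
            and entry["logger"].endswith("KVMInvestigator")
            and "Skipping investigation" in entry["msg"]
        ):
            yield entry


if __name__ == "__main__":
    # The warning from this message, joined back onto one line.
    sample = (
        "WARN  [c.c.h.KVMInvestigator] (AgentTaskPool-10:ctx-694feb6c) "
        "(logid:160220c5) Agent investigation was requested on host "
        "Host[-4-Routing], but host does not support investigation because "
        "it has no NFS storage. Skipping investigation."
    )
    for hit in skipped_investigations([sample]):
        print(hit["logid"], hit["msg"])
```

In practice the list passed to `skipped_investigations` would be an open file handle on the management server log rather than a hard-coded sample.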

On Sat, Dec 23, 2017 at 10:14 AM, Jean-Francois Nadeau <
[email protected]> wrote:

> Clearly the management server doesn't realize the instance on the failed
> host is not running...  but the host is in Alert state and powered down,
> and missing NFS heartbeats.
>
> 2017-12-23 14:57:52,427 DEBUG [c.c.h.Status] (AgentTaskPool-10:ctx-694feb6c)
> (logid:160220c5) Transition:[Resource state = Enabled, Agent event =
> AgentDisconnected, Host id = 4, name = r62-i122-36-01.domain.com]
> 2017-12-23 14:58:24,487 DEBUG [c.c.c.CapacityManagerImpl]
> (CapacityChecker:ctx-66fbe484) (logid:1f53cd63) Found 1 VMs on host 4
> 2017-12-23 14:58:24,495 DEBUG [c.c.c.CapacityManagerImpl]
> (CapacityChecker:ctx-66fbe484) (logid:1f53cd63) Found 0 VM, not running on
> host 4
>
> Next step ?
>
> On Sat, Dec 23, 2017 at 9:49 AM, Jean-Francois Nadeau <
> [email protected]> wrote:
>
>> I'd really like to get to the bottom of this. It does sound like the
>> behavior mentioned in
>> https://issues.apache.org/jira/browse/CLOUDSTACK-5582 but that should be
>> long fixed.
>>
>> One suspect log entry (may be unrelated) I noticed is this recurring
>> exception in the manager logs:
>>
>> ERROR [c.c.v.UserVmManagerImpl] (UserVm-ipfetch-3:ctx-d4c44c2b)
>> (logid:16dd70ad) Caught the Exception in VmIpFetchTask
>>
>> Which I guess is caused by the use of an external DHCP, so the manager
>> fails to determine a running VM's IP. Which brings me to my next question:
>> how is a VM marked for HA actually monitored?
>>
>>
>> On Sat, Dec 23, 2017 at 3:38 AM, Eric Green <[email protected]>
>> wrote:
>>
>>> If all else fails, change its state to the correct state in the MySQL
>>> database and restart the management service. Sadly that is the only way I
>>> could do it when my CloudStack got confused and stuck an instance in an
>>> intermediate state where I couldn't do anything with it.
>>>
>>> On Dec 22, 2017 at 9:09 AM, Jean-Francois Nadeau <[email protected]>
>>> wrote:
>>>
>>> Good morning,
>>>
>>> New to ACS and doing a POC with 4.10 on CentOS 7 and KVM.
>>>
>>> I'm trying to recover VMs after a host failure (powered off from OOB).
>>>
>>> Primary storage is NFS and IPMI is configured for the KVM hosts. The zone
>>> is in advanced mode with VLAN separation, and I created a shared network
>>> with no services since I wish to use an external DHCP.
>>>
>>> First, say I don't have a compute offering with HA enabled and a KVM host
>>> goes down... I can't put it in maintenance mode while it is down, and
>>> disabling it has no effect on the state of the lost VMs. The VM stays in
>>> the Running state according to the manager. What should I do to force a
>>> restart on the remaining healthy hosts?
>>>
>>> Then I enabled IPMI on all KVM hosts and attempted the same experiment
>>> with a compute offering with HA enabled. Same result. The manager does see
>>> the host as disconnected and powered off but takes no action. I am
>>> certainly missing something here. Please help!
>>>
>>> Regards,
>>>
>>> Jean-Francois
>>>
>>
>>
>
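
For the database fix Eric suggested, the update itself would be a single statement. The sketch below is hedged on several points. It runs against an in-memory SQLite table as a stand-in for the real MySQL `cloud` database. It assumes a `vm_instance` table with `instance_name` and `state` columns. The instance name `i-2-10-VM` is purely hypothetical. Whether other columns also need changing is not something this thread establishes, so back up the database and stop the management service before trying anything like it.

```python
import sqlite3


def mark_vm_stopped(conn, instance_name):
    """Flip a VM from Running to Stopped; return the number of rows changed.

    Assumption: table and column names (vm_instance, instance_name, state)
    are taken on trust, not confirmed anywhere in this thread.
    """
    cur = conn.execute(
        "UPDATE vm_instance SET state = 'Stopped' "
        "WHERE instance_name = ? AND state = 'Running'",
        (instance_name,),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # Stand-in database so the sketch runs without a CloudStack install.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vm_instance (instance_name TEXT, state TEXT)")
    conn.execute("INSERT INTO vm_instance VALUES ('i-2-10-VM', 'Running')")
    changed = mark_vm_stopped(conn, "i-2-10-VM")
    print("rows changed:", changed)
```

The `state = 'Running'` guard keeps the statement from touching a VM that has already moved to another state. With a MySQL driver the placeholder would normally be `%s` rather than `?`.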
