Hmm could this be the culprit ?

WARN [c.c.h.KVMInvestigator] (AgentTaskPool-10:ctx-694feb6c) (logid:160220c5) Agent investigation was requested on host Host[-4-Routing], but host does not support investigation because it has no NFS storage. Skipping investigation.

The primary storage is NFS.

On Sat, Dec 23, 2017 at 10:14 AM, Jean-Francois Nadeau <the.jfnad...@gmail.com> wrote:

> Clearly the management server doesn't realize the instance on the failed
> host is not running... but the host is in Alert state and powered down,
> and missing NFS heartbeats.
>
> 2017-12-23 14:57:52,427 DEBUG [c.c.h.Status] (AgentTaskPool-10:ctx-694feb6c)
> (logid:160220c5) Transition:[Resource state = Enabled, Agent event =
> AgentDisconnected, Host id = 4, name = r62-i122-36-01.domain.com]
> 2017-12-23 14:58:24,487 DEBUG [c.c.c.CapacityManagerImpl]
> (CapacityChecker:ctx-66fbe484) (logid:1f53cd63) Found 1 VMs on host 4
> 2017-12-23 14:58:24,495 DEBUG [c.c.c.CapacityManagerImpl]
> (CapacityChecker:ctx-66fbe484) (logid:1f53cd63) Found 0 VM, not running on
> host 4
>
> Next step ?
>
> On Sat, Dec 23, 2017 at 9:49 AM, Jean-Francois Nadeau <
> the.jfnad...@gmail.com> wrote:
>
>> I'd really like to get to the bottom of this. It does sound like the
>> behavior mentioned in https://issues.apache.org/jira/browse/CLOUDSTACK-5582
>> but should be long fixed.
>>
>> One suspect log entry (may be unrelated) I noticed is this recurring
>> exception in the manager logs:
>>
>> ERROR [c.c.v.UserVmManagerImpl] (UserVm-ipfetch-3:ctx-d4c44c2b)
>> (logid:16dd70ad) Caught the Exception in VmIpFetchTask
>>
>> Which I guess is caused by the use of an external DHCP so manager fails
>> to determine a running VM IP. Which brings me to my next question....
>> how is a VM marked for HA actually monitored ?
>>
>> On Sat, Dec 23, 2017 at 3:38 AM, Eric Green <eric.lee.gr...@gmail.com>
>> wrote:
>>
>>> If all else fails, change its state to the correct state in the MySQL
>>> database and restart the management service. Sadly that is the only way
>>> I could do it when my Cloudstack got confused and stuck an instance in an
>>> intermediate state where I couldn't do anything with it.
>>>
>>> On Dec 22, 2017 at 9:09 AM, <Jean-Francois Nadeau <the.jfnad...@gmail.com>>
>>> wrote:
>>>
>>> Good morning,
>>>
>>> New to ACS and doing a POC with 4.10 on CentOS 7 and KVM.
>>>
>>> I'm trying to recover VMs after a host failure (powered off from OOB).
>>>
>>> Primary storage is NFS and IPMI is configured for the KVM hosts. Zone is
>>> advanced mode with VLAN separation and created a shared network with no
>>> services since I wish to use an external DHCP.
>>>
>>> First, say I don't have a compute offering with HA enabled and a KVM host
>>> goes down... I can't put it in maintenance mode while down and disabling
>>> it has no effect on the state of the lost VMs. VM stays in running state
>>> according to manager. What should I do to force restart on remaining
>>> healthy hosts ?
>>>
>>> Then I enabled IPMI on all KVM hosts and attempted the same experiment
>>> with a compute offering with HA enabled. Same result. Manager does see the
>>> host as disconnected and powered off but takes no action. I'm certainly
>>> missing something here. Please help !
>>>
>>> Regards,
>>>
>>> Jean-Francois
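
For reference, Eric's last-resort suggestion earlier in the thread (fix the instance state by hand in MySQL, then restart the management service) could be sketched roughly as below. This is only an illustration, not a tested procedure: it assumes the instance state lives in a `vm_instance` table with `state` and `instance_name` columns, which should be checked against the actual schema before touching anything. The helper name `build_state_fix`, the `ALLOWED_STATES` subset and the instance name `i-2-10-VM` are hypothetical. The code only builds a parameterized statement and does not connect to any database.

```python
# Sketch only: builds (but never executes) the manual state fix.
# ASSUMPTION: a `vm_instance` table with `state` and `instance_name` columns;
# verify this against your own CloudStack database before using it.

# Hypothetical safe subset of target states for a manual correction.
ALLOWED_STATES = {"Running", "Stopped"}


def build_state_fix(instance_name, new_state):
    """Return a parameterized UPDATE statement and its arguments."""
    if not instance_name:
        raise ValueError("instance_name must not be empty")
    if new_state not in ALLOWED_STATES:
        raise ValueError(f"refusing to set unexpected state: {new_state!r}")
    # %s placeholders follow the DB-API 'format' paramstyle used by common
    # MySQL drivers; the values are passed separately, never interpolated.
    sql = "UPDATE vm_instance SET state = %s WHERE instance_name = %s"
    return sql, (new_state, instance_name)


if __name__ == "__main__":
    # "i-2-10-VM" is a made-up instance name for illustration.
    statement, args = build_state_fix("i-2-10-VM", "Stopped")
    print(statement)
    print(args)
```

Restricting the target states and keeping the values out of the SQL string is a deliberate choice here, since a typo in a hand-written UPDATE is hard to undo. Back up the database first, and restart the management service afterwards as Eric describes.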