I'd really like to get to the bottom of this. It does sound like the behavior described in https://issues.apache.org/jira/browse/CLOUDSTACK-5582, but that should have been fixed long ago.
One suspect log entry (possibly unrelated) I noticed is this recurring exception in the manager logs:

ERROR [c.c.v.UserVmManagerImpl] (UserVm-ipfetch-3:ctx-d4c44c2b) (logid:16dd70ad) Caught the Exception in VmIpFetchTask

I guess it is caused by the use of an external DHCP, so the manager fails to determine the IP of a running VM. Which brings me to my next question: how is a VM marked for HA actually monitored?

On Sat, Dec 23, 2017 at 3:38 AM, Eric Green <eric.lee.gr...@gmail.com> wrote:

> If all else fails, change its state to the correct state in the MySQL
> database and restart the management service. Sadly that is the only way I
> could do it when my CloudStack got confused and stuck an instance in an
> intermediate state where I couldn't do anything with it.
>
> On Dec 22, 2017 at 9:09 AM, <Jean-Francois Nadeau <the.jfnad...@gmail.com>>
> wrote:
>
> Good morning,
>
> New to ACS and doing a POC with 4.10 on CentOS 7 and KVM.
>
> I'm trying to recover VMs after a host failure (powered off from OOB).
>
> Primary storage is NFS and IPMI is configured for the KVM hosts. The zone is
> in advanced mode with VLAN separation, and I created a shared network with no
> services since I wish to use an external DHCP.
>
> First, say I don't have a compute offering with HA enabled and a KVM host
> goes down... I can't put it in maintenance mode while it is down, and
> disabling it has no effect on the state of the lost VMs. The VMs stay in the
> running state according to the manager. What should I do to force a restart
> on the remaining healthy hosts?
>
> Then I enabled IPMI on all KVM hosts and attempted the same experiment
> with a compute offering with HA enabled. Same result. The manager does see
> the host as disconnected and powered off but takes no action. I am certainly
> missing something here. Please help!
>
> Regards,
>
> Jean-Francois