Hi,
Over the past couple of days we have seen unusual behavior in our private cloud environment, which is built on CloudStack 4.2: VMs stop unexpectedly on the hosts. The VMs were launched with offerings that have *HAEnable=False*. Has anyone else encountered similar behavior?

The following are excerpts from the management server logs:

WARN [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to stop a VM due to Resource [Host:1] is unreachable: Host 1: Host with specified id is not in the right state: Alert
WARN [apache.cloudstack.alerts] (DirectAgent-25:) alertType:: 8 // dataCenterId:: 1 // podId:: 1 // clusterId:: null // message:: VM (name: InsightsDev01, id: 23) stopped unexpectedly on host id:1, availability zone id:1, pod id:1
INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-11) Processing HAWork[11-HA-24-Running-Investigating]
INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-11) HA on VM[User|InsightsDev02]
INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-11) SimpleInvestigator found VM[User|InsightsDev02]to be alive? null
WARN [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to stop vm, agent unavailable: com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Host with specified id is not in the right state: Alert
WARN [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to actually stop VM[User|InsightsDev01] but continue with release because it's a force stop
INFO [xen.resource.XenServer56FP1Resource] (DirectAgent-44:) Fence command for VM i-5-20-VM
INFO [cloud.ha.HighAvailabilityManagerImpl] (DirectAgent-25:) Schedule vm for HA: VM[User|InsightsDev01]
WARN [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to stop a VM due to Resource [Host:1] is unreachable: Host 1: Host with specified id is not in the right state: Alert
WARN [apache.cloudstack.alerts] (DirectAgent-25:) alertType:: 8 // dataCenterId:: 1 // podId:: 1 // clusterId:: null // message:: VM (name: InsightJenkins01, id: 25) stopped unexpectedly on host id:1, availability zone id:1, pod id:1
INFO [xen.resource.XenServer56FP1Resource] (DirectAgent-107:) Fence command for VM r-8-VM
WARN [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to stop vm, agent unavailable: com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Host with specified id is not in the right state: Alert
WARN [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to actually stop VM[User|InsightJenkins01] but continue with release because it's a force stop
INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-9) Fencer null returned true
ERROR [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-9) Terminating HAWork[9-HA-20-Running-Investigating]
com.cloud.utils.exception.CloudRuntimeException: Caught exception even though it should be handled.
        at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:485)
        at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:831)
Caused by: com.cloud.exception.ConcurrentOperationException: VM is being operated on.
        at com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1189)
        at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:476)
        ... 1 more
INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-6) Fencer null returned true
INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-12) Processing HAWork[12-HA-25-Running-Investigating]
ERROR [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-6) Terminating HAWork[6-HA-8-Running-Investigating]
com.cloud.utils.exception.CloudRuntimeException: Caught exception even though it should be handled.
        at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:485)
        at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:831)
Caused by: com.cloud.exception.ConcurrentOperationException: VM is being operated on.
        at com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1189)
        at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:476)
        ... 1 more
INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-12) HA on VM[User|InsightJenkins01]
INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-12) VM VM[User|InsightJenkins01] has been changed. Current State = Stopped Previous State = Running last updated = 5 previous updated = 3
INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-12) Completed HAWork[12-HA-25-Running-Investigating]
INFO [xen.resource.XenServer56FP1Resource] (DirectAgent-316:) Fence command for VM r-10-VM
INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-13) Processing HAWork[13-HA-26-Running-Investigating]
INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-14) Processing HAWork[14-HA-27-Running-Investigating]
INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-13) HA on VM[User|InsightJenkins02]
INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-14) HA on VM[User|InsightsQA01]
INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-13) SimpleInvestigator found VM[User|InsightJenkins02]to be alive? null
INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-14) SimpleInvestigator found VM[User|InsightsQA01]to be alive? null
INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-7) Fencer null returned true
ERROR [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-7) Terminating HAWork[7-HA-10-Running-Investigating]

Thanks.

--
- Md. Iliyas Shirol