The log says that host 1 is in the 'Alert' state. Is this the host where the VMs got stopped? Was there any issue with the host, such as it going down or losing connectivity with the management server (MS)? Please share the full logs if possible.
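
If it helps narrow things down before sending the full logs, here is a rough sketch that groups the "stopped unexpectedly" alerts by host and collects the host states reported in the "unreachable" warnings. The two patterns are taken only from the excerpts quoted below; the function name `summarize` is my own, and the log path in the usage note is an assumption about a default 4.2 install, so adjust both to your setup.

```python
import re
import sys
from collections import defaultdict

# Patterns modelled on the log excerpts quoted in this thread (not on any
# documented CloudStack log format, so treat them as assumptions).
STOPPED = re.compile(
    r"VM \(name: (?P<vm>[^,]+), id: (?P<id>\d+)\) "
    r"stopped unexpectedly on host id:(?P<host>\d+)"
)
HOST_STATE = re.compile(
    r"Resource \[Host:(?P<host>\d+)\] is unreachable: "
    r".*?not in the right state: (?P<state>\w+)"
)


def summarize(lines):
    """Return (VM names stopped per host id, reported states per host id)."""
    stopped = defaultdict(list)
    states = defaultdict(set)
    for line in lines:
        match = STOPPED.search(line)
        if match:
            stopped[match.group("host")].append(match.group("vm"))
        match = HOST_STATE.search(line)
        if match:
            states[match.group("host")].add(match.group("state"))
    return dict(stopped), {host: sorted(s) for host, s in states.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_ha.py <path-to-management-server.log>
    with open(sys.argv[1], errors="replace") as handle:
        stopped_by_host, states_by_host = summarize(handle)
    for host, vms in stopped_by_host.items():
        print(f"host {host}: {len(vms)} unexpected stop(s): {', '.join(vms)}")
    for host, host_states in states_by_host.items():
        print(f"host {host}: reported state(s): {', '.join(host_states)}")
```

On a default install the management server log is usually at /var/log/cloudstack/management/management-server.log (an assumption on my part; check your own path). If every unexpected stop maps to the one host that is also reported in 'Alert', that points at the host or its connectivity rather than at the individual VMs.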
-Koushik

On 24-Dec-2013, at 2:46 PM, iliyas shirol <iliyas.shi...@gmail.com> wrote:

> Hi,
>
> There has been unusual behavior in our private cloud environment, built
> using CloudStack-4.2, for the past couple of days. We have been observing
> that VMs stop unexpectedly on the hosts. The VMs were launched with
> offerings having *HAEnable=False*. Has anyone else encountered similar
> behavior?
>
> Following are excerpts of the logs from the management server:
>
> WARN [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to stop a VM due to Resource [Host:1] is unreachable: Host 1: Host with specified id is not in the right state: Alert
> WARN [apache.cloudstack.alerts] (DirectAgent-25:) alertType:: 8 // dataCenterId:: 1 // podId:: 1 // clusterId:: null // message:: VM (name: InsightsDev01, id: 23) stopped unexpectedly on host id:1, availability zone id:1, pod id:1
> INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-11) Processing HAWork[11-HA-24-Running-Investigating]
> INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-11) HA on VM[User|InsightsDev02]
> INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-11) SimpleInvestigator found VM[User|InsightsDev02]to be alive? null
> WARN [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to stop vm, agent unavailable: com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Host with specified id is not in the right state: Alert
> WARN [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to actually stop VM[User|InsightsDev01] but continue with release because it's a force stop
> INFO [xen.resource.XenServer56FP1Resource] (DirectAgent-44:) Fence command for VM i-5-20-VM
> INFO [cloud.ha.HighAvailabilityManagerImpl] (DirectAgent-25:) Schedule vm for HA: VM[User|InsightsDev01]
> WARN [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to stop a VM due to Resource [Host:1] is unreachable: Host 1: Host with specified id is not in the right state: Alert
> WARN [apache.cloudstack.alerts] (DirectAgent-25:) alertType:: 8 // dataCenterId:: 1 // podId:: 1 // clusterId:: null // message:: VM (name: InsightJenkins01, id: 25) stopped unexpectedly on host id:1, availability zone id:1, pod id:1
> INFO [xen.resource.XenServer56FP1Resource] (DirectAgent-107:) Fence command for VM r-8-VM
> WARN [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to stop vm, agent unavailable: com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Host with specified id is not in the right state: Alert
> WARN [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to actually stop VM[User|InsightJenkins01] but continue with release because it's a force stop
> INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-9) Fencer null returned true
> ERROR [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-9) Terminating HAWork[9-HA-20-Running-Investigating]
> com.cloud.utils.exception.CloudRuntimeException: Caught exception even though it should be handled.
>     at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:485)
>     at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:831)
> Caused by: com.cloud.exception.ConcurrentOperationException: VM is being operated on.
>     at com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1189)
>     at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:476)
>     ... 1 more
> INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-6) Fencer null returned true
> INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-12) Processing HAWork[12-HA-25-Running-Investigating]
> ERROR [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-6) Terminating HAWork[6-HA-8-Running-Investigating]
> com.cloud.utils.exception.CloudRuntimeException: Caught exception even though it should be handled.
>     at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:485)
>     at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:831)
> Caused by: com.cloud.exception.ConcurrentOperationException: VM is being operated on.
>     at com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1189)
>     at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:476)
>     ... 1 more
> INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-12) HA on VM[User|InsightJenkins01]
> INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-12) VM VM[User|InsightJenkins01] has been changed. Current State = Stopped Previous State = Running last updated = 5 previous updated = 3
> INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-12) Completed HAWork[12-HA-25-Running-Investigating]
> INFO [xen.resource.XenServer56FP1Resource] (DirectAgent-316:) Fence command for VM r-10-VM
> INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-13) Processing HAWork[13-HA-26-Running-Investigating]
> INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-14) Processing HAWork[14-HA-27-Running-Investigating]
> INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-13) HA on VM[User|InsightJenkins02]
> INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-14) HA on VM[User|InsightsQA01]
> INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-13) SimpleInvestigator found VM[User|InsightJenkins02]to be alive? null
> INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-14) SimpleInvestigator found VM[User|InsightsQA01]to be alive? null
> INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-7) Fencer null returned true
> ERROR [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-7) Terminating HAWork[7-HA-10-Running-Investigating]
>
> Thanks.
>
> --
> -
> Md. Iliyas Shirol