The log says that host 1 is in the 'Alert' state. Is this the host where the VMs
were stopped? Was there an issue with the host, such as it going down or losing
connectivity with the management server (MS)? Please share the full logs if possible.
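
If it helps when going through the full logs, here is a minimal Python sketch
(an illustration only, not a CloudStack tool) that groups the "stopped
unexpectedly" alerts by host and collects the host states reported in the
"not in the right state" warnings. It assumes each log entry sits on a single
line, as in the raw management-server log rather than the wrapped excerpt
below. The function name summarize and both regexes are hypothetical and only
match the message formats visible in this excerpt.

```python
import re
from collections import defaultdict

# Hypothetical pattern for the alertType 8 message, e.g.
# "VM (name: InsightsDev01, id: 23) stopped unexpectedly on host id:1"
STOP_RE = re.compile(
    r"VM \(name:\s*(?P<name>[^,]+), id: (?P<id>\d+)\) "
    r"stopped unexpectedly on host id:(?P<host>\d+)"
)
# Hypothetical pattern for "Host 1: Host with specified id is not in the right state: Alert"
STATE_RE = re.compile(
    r"Host (?P<host>\d+): Host with specified id is not in the right state: (?P<state>\w+)"
)


def summarize(lines):
    """Return ({host_id: [stopped VM names]}, {host_id: [reported host states]})."""
    stopped = defaultdict(list)
    states = defaultdict(set)
    for line in lines:
        m = STOP_RE.search(line)
        if m:
            stopped[int(m["host"])].append(m["name"].strip())
        m = STATE_RE.search(line)
        if m:
            states[int(m["host"])].add(m["state"])
    return dict(stopped), {host: sorted(s) for host, s in states.items()}


# Two entries from the excerpt, unwrapped onto single lines.
SAMPLE = [
    "WARN  [apache.cloudstack.alerts] (DirectAgent-25:)  alertType:: 8 // "
    "dataCenterId:: 1 // podId:: 1 // clusterId:: null // message:: VM (name: "
    "InsightsDev01, id: 23) stopped unexpectedly on host id:1, availability "
    "zone id:1, pod id:1",
    "WARN  [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to stop "
    "a VM due to Resource [Host:1] is unreachable: Host 1: Host with specified "
    "id is not in the right state: Alert",
]

if __name__ == "__main__":
    # prints ({1: ['InsightsDev01']}, {1: ['Alert']})
    print(summarize(SAMPLE))
```

Fed the whole log file instead of SAMPLE, this gives a quick view of whether
every unexpected stop lines up with a host that was reported in 'Alert'.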

-Koushik

On 24-Dec-2013, at 2:46 PM, iliyas shirol <iliyas.shi...@gmail.com> wrote:

> Hi,
> 
> 
> There has been unusual behavior in our private cloud environment, built
> using CloudStack 4.2, for the past couple of days. We have been observing
> that VMs stop unexpectedly on the hosts. The VMs were launched with
> offerings that have *HAEnable=False*. Has anyone else encountered similar
> behavior?
> 
> The following are excerpts from the management server logs:
> 
> WARN  [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to stop
> a VM due to Resource [Host:1] is unreachable: Host 1: Host with specified
> id is not in the right state: Alert
> WARN  [apache.cloudstack.alerts] (DirectAgent-25:)  alertType:: 8 //
> dataCenterId:: 1 // podId:: 1 // clusterId:: null // message:: VM (name:
> InsightsDev01, id: 23) stopped unexpectedly on host id:1, availability
> zone id:1, pod id:1
> INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-11)
> Processing HAWork[11-HA-24-Running-Investigating]
> INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-11) HA on
> VM[User|InsightsDev02]
> INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-11)
> SimpleInvestigator found VM[User|InsightsDev02]to be alive? null
> WARN  [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to stop
> vm, agent unavailable: com.cloud.exception.AgentUnavailableException:
> Resource [Host:1] is unreachable: Host 1: Host with specified id is not in
> the right state: Alert
> WARN  [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to
> actually stop VM[User|InsightsDev01] but continue with release because
> it's a force stop
> INFO  [xen.resource.XenServer56FP1Resource] (DirectAgent-44:) Fence command
> for VM i-5-20-VM
> INFO  [cloud.ha.HighAvailabilityManagerImpl] (DirectAgent-25:) Schedule vm
> for HA:  VM[User|InsightsDev01]
> WARN  [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to stop
> a VM due to Resource [Host:1] is unreachable: Host 1: Host with specified
> id is not in the right state: Alert
> WARN  [apache.cloudstack.alerts] (DirectAgent-25:)  alertType:: 8 //
> dataCenterId:: 1 // podId:: 1 // clusterId:: null // message:: VM (name:
> InsightJenkins01, id: 25) stopped unexpectedly on host id:1, availability
> zone id:1, pod id:1
> INFO  [xen.resource.XenServer56FP1Resource] (DirectAgent-107:) Fence
> command for VM r-8-VM
> WARN  [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to stop
> vm, agent unavailable: com.cloud.exception.AgentUnavailableException:
> Resource [Host:1] is unreachable: Host 1: Host with specified id is not in
> the right state: Alert
> WARN  [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to
> actually stop VM[User|InsightJenkins01] but continue with release because
> it's a force stop
> INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-9) Fencer
> null returned true
> ERROR [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-9)
> Terminating HAWork[9-HA-20-Running-Investigating]
> com.cloud.utils.exception.CloudRuntimeException: Caught exception even
> though it should be handled.
>        at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:485)
>        at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:831)
> Caused by: com.cloud.exception.ConcurrentOperationException: VM is being
> operated on.
>        at com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1189)
>        at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:476)
>        ... 1 more
> INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-6) Fencer
> null returned true
> INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-12)
> Processing HAWork[12-HA-25-Running-Investigating]
> ERROR [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-6)
> Terminating HAWork[6-HA-8-Running-Investigating]
> com.cloud.utils.exception.CloudRuntimeException: Caught exception even
> though it should be handled.
>        at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:485)
>        at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:831)
> Caused by: com.cloud.exception.ConcurrentOperationException: VM is being
> operated on.
>        at com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1189)
>        at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:476)
>        ... 1 more
> INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-12) HA on
> VM[User|InsightJenkins01]
> INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-12) VM
> VM[User|InsightJenkins01] has been changed.  Current State = Stopped
> Previous State = Running last updated = 5 previous updated = 3
> INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-12)
> Completed HAWork[12-HA-25-Running-Investigating]
> INFO  [xen.resource.XenServer56FP1Resource] (DirectAgent-316:) Fence
> command for VM r-10-VM
> INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-13)
> Processing HAWork[13-HA-26-Running-Investigating]
> INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-14)
> Processing HAWork[14-HA-27-Running-Investigating]
> INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-13) HA on
> VM[User|InsightJenkins02]
> INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-14) HA on
> VM[User|InsightsQA01]
> INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-13)
> SimpleInvestigator found VM[User|InsightJenkins02]to be alive? null
> INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-14)
> SimpleInvestigator found VM[User|InsightsQA01]to be alive? null
> INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-7) Fencer
> null returned true
> ERROR [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-7)
> Terminating HAWork[7-HA-10-Running-Investigating]
> 
> Thanks.
> 
> -- 
> -
> Md. Iliyas Shirol
