Hi,

We have been seeing unusual behavior in our private cloud environment, built
on CloudStack 4.2, for the past couple of days: VMs stop unexpectedly on the
hosts. The VMs were launched with offerings that have *HAEnable=False*. Has
anyone else encountered similar behavior?

Following are excerpts from the management server logs:

WARN  [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to stop
a VM due to Resource [Host:1] is unreachable: Host 1: Host with specified
id is not in the right state: Alert
WARN  [apache.cloudstack.alerts] (DirectAgent-25:)  alertType:: 8 //
dataCenterId:: 1 // podId:: 1 // clusterId:: null // message:: VM (name:
 InsightsDev01, id: 23) stopped unexpectedly on host id:1, availability
zone id:1, pod id:1
INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-11)
Processing HAWork[11-HA-24-Running-Investigating]
INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-11) HA on
VM[User|InsightsDev02]
INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-11)
SimpleInvestigator found VM[User|InsightsDev02]to be alive? null
WARN  [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to stop
vm, agent unavailable: com.cloud.exception.AgentUnavailableException:
Resource [Host:1] is unreachable: Host 1: Host with specified id is
not in the right state: Alert
WARN  [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to
actually stop VM[User|InsightsDev01] but continue with release because
 it's a force stop
INFO  [xen.resource.XenServer56FP1Resource] (DirectAgent-44:) Fence command
for VM i-5-20-VM
INFO  [cloud.ha.HighAvailabilityManagerImpl] (DirectAgent-25:) Schedule vm
for HA:  VM[User|InsightsDev01]
WARN  [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to stop
a VM due to Resource [Host:1] is unreachable: Host 1: Host with specified
id is not in the right state: Alert
WARN  [apache.cloudstack.alerts] (DirectAgent-25:)  alertType:: 8 //
dataCenterId:: 1 // podId:: 1 // clusterId:: null // message:: VM (name:
InsightJenkins01, id: 25) stopped unexpectedly on host id:1, availability
zone id:1, pod id:1
INFO  [xen.resource.XenServer56FP1Resource] (DirectAgent-107:) Fence
command for VM r-8-VM
WARN  [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to stop
vm, agent unavailable: com.cloud.exception.AgentUnavailableException:
Resource [Host:1] is unreachable: Host 1: Host with specified id is not in
the right state: Alert
WARN  [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-25:) Unable to
actually stop VM[User|InsightJenkins01] but continue with release because
it's a force stop
INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-9) Fencer
null returned true
ERROR [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-9)
Terminating HAWork[9-HA-20-Running-Investigating]
com.cloud.utils.exception.CloudRuntimeException: Caught exception even
though it should be handled.
        at
com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:485)
        at
com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:831)
Caused by: com.cloud.exception.ConcurrentOperationException: VM is being
operated on.
        at
com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1189)
        at
com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:476)
        ... 1 more
INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-6) Fencer
null returned true
INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-12)
Processing HAWork[12-HA-25-Running-Investigating]
ERROR [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-6)
Terminating HAWork[6-HA-8-Running-Investigating]
com.cloud.utils.exception.CloudRuntimeException: Caught exception even
though it should be handled.
        at
com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:485)
        at
com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:831)
Caused by: com.cloud.exception.ConcurrentOperationException: VM is being
operated on.
        at
com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1189)
        at
com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:476)
        ... 1 more
INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-12) HA on
VM[User|InsightJenkins01]
INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-12) VM
VM[User|InsightJenkins01] has been changed.  Current State = Stopped
Previous State = Running last updated = 5 previous updated = 3
INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-12)
Completed HAWork[12-HA-25-Running-Investigating]
INFO  [xen.resource.XenServer56FP1Resource] (DirectAgent-316:) Fence
command for VM r-10-VM
INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-13)
Processing HAWork[13-HA-26-Running-Investigating]
INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-14)
Processing HAWork[14-HA-27-Running-Investigating]
INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-13) HA on
VM[User|InsightJenkins02]
INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-14) HA on
VM[User|InsightsQA01]
INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-13)
SimpleInvestigator found VM[User|InsightJenkins02]to be alive? null
INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-14)
SimpleInvestigator found VM[User|InsightsQA01]to be alive? null
INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-7) Fencer
null returned true
ERROR [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-7)
Terminating HAWork[7-HA-10-Running-Investigating]
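
For anyone checking their own logs for the same symptom, here is a minimal Python sketch (not part of CloudStack) that pulls the "stopped unexpectedly" alerts out of management server log text. It assumes the alert message format shown in the excerpts above; the function name `unexpected_stops` and the regex are hypothetical and only cover that one message format.

```python
import re

# Hypothetical pattern, assuming the alert text format seen in the excerpts:
# "VM (name: <name>, id: <id>) stopped unexpectedly on host id:<host>"
# \s+ / \s* tolerate extra spaces and line wraps between tokens.
ALERT_RE = re.compile(
    r"VM \(name:\s*(?P<name>[^,]+),\s*id:\s*(?P<id>\d+)\)\s+stopped\s+"
    r"unexpectedly\s+on\s+host\s+id:\s*(?P<host>\d+)"
)


def unexpected_stops(log_text):
    """Return (vm_name, vm_id, host_id) for each 'stopped unexpectedly' alert."""
    return [
        (m.group("name").strip(), int(m.group("id")), int(m.group("host")))
        for m in ALERT_RE.finditer(log_text)
    ]
```

Passing it the contents of the management server log file gives a quick list of which VMs were affected and on which host, which makes it easier to check whether every unexpected stop lines up with host 1 going into the Alert state.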

Thanks.

-- 
Md. Iliyas Shirol