Hi all. I'm testing HA, and when I power down one node, a VM with an HA-enabled offering does not start on the other node.

Env used: management + 2 hypervisors + NFS server
management/hypervisors: CentOS 6.4 + CS-4.2 (revision 2852) + advanced zone + KVM

I have the following NullPointerException in the logs:

2013-09-10 15:06:48,583 INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-28) Processing HAWork[28-HA-246-Running-Investigating]
2013-09-10 15:06:48,586 INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-28) HA on VM[User|hahaha]
2013-09-10 15:06:48,588 DEBUG [cloud.ha.CheckOnAgentInvestigator] (HA-Worker-0:work-28) Unable to reach the agent for VM[User|hahaha]: Resource [Host:4] is unreachable: Host 4: Host with specified id is not in the right state: Down
2013-09-10 15:06:48,588 INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-28) SimpleInvestigator found VM[User|hahaha]to be alive? null
2013-09-10 15:06:48,593 INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-28) XenServerInvestigator found VM[User|hahaha]to be alive? null
2013-09-10 15:06:48,593 DEBUG [cloud.ha.UserVmDomRInvestigator] (HA-Worker-0:work-28) testing if VM[User|hahaha] is alive
2013-09-10 15:06:48,598 DEBUG [agent.manager.AgentManagerImpl] (HA-Worker-0:work-28) Host with id null doesn't exist
2013-09-10 15:06:48,598 DEBUG [cloud.ha.UserVmDomRInvestigator] (HA-Worker-0:work-28) VM[User|hahaha] could not be pinged, returning that it is unknown
2013-09-10 15:06:48,599 DEBUG [agent.transport.Request] (HA-Worker-0:work-28) Seq 1-1213400199: Sending { Cmd , MgmtId: 161332943028, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.PingTestCommand":{"_routerIp":"169.254.0.190","_privateIp":"10.10.10.17","wait":20}}] }
2013-09-10 15:06:52,804 DEBUG [agent.transport.Request] (HA-Worker-0:work-28) Seq 1-1213400199: Received: { Ans: , MgmtId: 161332943028, via: 1, Ver: v1, Flags: 10, { Answer } }
2013-09-10 15:06:52,804 DEBUG [agent.manager.AgentManagerImpl] (HA-Worker-0:work-28) Details from executing class com.cloud.agent.api.PingTestCommand:
    PING 10.10.10.17 (10.10.10.17): 56 data bytes
    64 bytes from 10.10.10.222: Destination Host Unreachable
    Vr HL TOS Len ID Flg off TTL Pro cks Src Dst Data
    4 5 00 5400 0000 0 0040 40 01 a711 10.10.10.222 10.10.10.17
    --- 10.10.10.17 ping statistics ---
    1 packets transmitted, 0 packets received, 100% packet loss
    Unable to ping the vm, exiting
2013-09-10 15:06:52,804 DEBUG [cloud.ha.UserVmDomRInvestigator] (HA-Worker-0:work-28) VM[User|hahaha] could not be pinged, returning that it is unknown
2013-09-10 15:06:52,804 DEBUG [cloud.ha.UserVmDomRInvestigator] (HA-Worker-0:work-28) Returning null since we're unable to determine state of VM[User|hahaha]
2013-09-10 15:06:52,804 INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-28) null found VM[User|hahaha]to be alive? null
2013-09-10 15:06:52,804 DEBUG [cloud.ha.ManagementIPSystemVMInvestigator] (HA-Worker-0:work-28) Not a System Vm, unable to determine state of VM[User|hahaha] returning null
2013-09-10 15:06:52,804 DEBUG [cloud.ha.ManagementIPSystemVMInvestigator] (HA-Worker-0:work-28) Testing if VM[User|hahaha] is alive
2013-09-10 15:06:52,808 DEBUG [cloud.ha.ManagementIPSystemVMInvestigator] (HA-Worker-0:work-28) Unable to find a management nic, cannot ping this system VM, unable to determine state of VM[User|hahaha] returning null
2013-09-10 15:06:52,808 INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-28) null found VM[User|hahaha]to be alive? null
2013-09-10 15:06:52,812 DEBUG [agent.transport.Request] (HA-Worker-0:work-28) Seq 1-1213400206: Sending { Cmd , MgmtId: 161332943028, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CheckOnHostCommand":{"host":{"guid":"6807c438-876d-3f73-ba01-8ad718fd774d-LibvirtComputingResource","privateNetwork":{"ip":"77.72.128.116","netmask":"255.255.255.240","mac":"00:25:90:36:20:6a","isSecurityGroupEnabled":false},"storageNetwork1":{"ip":"77.72.128.116","netmask":"255.255.255.240","mac":"00:25:90:36:20:6a","isSecurityGroupEnabled":false}},"wait":20}}] }
2013-09-10 15:06:52,921 DEBUG [agent.transport.Request] (HA-Worker-0:work-28) Seq 1-1213400206: Received: { Ans: , MgmtId: 161332943028, via: 1, Ver: v1, Flags: 10, { Answer } }
2013-09-10 15:06:52,921 INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-28) KVMInvestigator found VM[User|hahaha]to be alive? null
2013-09-10 15:06:52,921 DEBUG [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-28) Fencing off VM that we don't know the state of
2013-09-10 15:06:52,921 DEBUG [cloud.ha.XenServerFencer] (HA-Worker-0:work-28) Don't know how to fence non XenServer hosts KVM
2013-09-10 15:06:52,921 INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-28) Fencer null returned null
2013-09-10 15:06:52,926 DEBUG [agent.transport.Request] (HA-Worker-0:work-28) Seq 1-1213400207: Sending { Cmd , MgmtId: 161332943028, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.FenceCommand":{"vmName":"i-2-246-VM","hostGuid":"6807c438-876d-3f73-ba01-8ad718fd774d-LibvirtComputingResource","hostIp":"77.72.128.116","inSeq":false,"wait":0}}] }
2013-09-10 15:06:53,038 DEBUG [agent.transport.Request] (HA-Worker-0:work-28) Seq 1-1213400207: Received: { Ans: , MgmtId: 161332943028, via: 1, Ver: v1, Flags: 10, { FenceAnswer } }
2013-09-10 15:06:53,038 INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-28) Fencer KVMFenceBuilder returned true
2013-09-10 15:06:53,046 DEBUG [cloud.capacity.CapacityManagerImpl] (HA-Worker-0:work-28) VM state transitted from :Running to Stopping with event: StopRequestedvm's original host id: 4 new host id: 4 host id before state transition: 4
2013-09-10 15:06:53,048 DEBUG [cloud.vm.UserVmManagerImpl] (HA-Worker-0:work-28) Collect vm disk statistics from host before stopping Vm
2013-09-10 15:06:53,052 DEBUG [agent.manager.AgentManagerImpl] (HA-Worker-0:work-28) Can not send command com.cloud.agent.api.GetVmDiskStatsCommand due to Host 4 is not up
2013-09-10 15:06:53,054 WARN [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-0:work-28) Unable to stop vm, agent unavailable: com.cloud.exception.AgentUnavailableException: Resource [Host:4] is unreachable: Host 4: Host with specified id is not in the right state: Down
2013-09-10 15:06:53,055 WARN [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-0:work-28) Unable to actually stop VM[User|hahaha] but continue with release because it's a force stop
2013-09-10 15:06:53,058 DEBUG [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-0:work-28) VM[User|hahaha] is stopped on the host. Proceeding to release resource held.
2013-09-10 15:06:53,062 DEBUG [cloud.network.NetworkModelImpl] (HA-Worker-0:work-28) Service SecurityGroup is not supported in the network id=205
2013-09-10 15:06:53,065 DEBUG [cloud.network.NetworkManagerImpl] (HA-Worker-0:work-28) Changing active number of nics for network id=205 on -1
2013-09-10 15:06:53,070 DEBUG [cloud.network.NetworkManagerImpl] (HA-Worker-0:work-28) Asking VirtualRouter to release Nic[942-246-9bc94718-8d0d-4463-83c4-7780cdfbe7d9-10.10.10.17]
2013-09-10 15:06:53,070 DEBUG [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-0:work-28) Successfully released network resources for the vm VM[User|hahaha]
2013-09-10 15:06:53,071 DEBUG [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-0:work-28) Successfully released storage resources for the vm VM[User|hahaha]
2013-09-10 15:06:53,084 DEBUG [cloud.network.NetworkModelImpl] (HA-Worker-0:work-28) Service SecurityGroup is not supported in the network id=205
2013-09-10 15:06:53,088 DEBUG [cloud.network.NetworkModelImpl] (HA-Worker-0:work-28) Service SecurityGroup is not supported in the network id=205
2013-09-10 15:06:53,096 DEBUG [cloud.capacity.CapacityManagerImpl] (HA-Worker-0:work-28) VM state transitted from :Stopping to Stopped with event: OperationSucceededvm's original host..
2013-09-10 15:06:53,114 ERROR [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-28) Terminating HAWork[28-HA-246-Running-Scheduled]
java.lang.NullPointerException
    at com.cloud.storage.VolumeManagerImpl.canVmRestartOnAnotherServer(VolumeManagerImpl.java:2641)
    at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:516)
    at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:831)

Is it a bug?

--
Regards,
Valery
http://protocol.by/slayer