Looks like the KVM investigator is not able to determine the state of the 
agent. Can you share the full log?
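To illustrate the point above, here is a simplified Python model of the behaviour visible in the quoted log. It is hypothetical and only loosely based on the log messages (AbstractInvestigatorImpl, UserVmDomRInvestigator, AgentManagerImpl); it is not CloudStack's actual Java code. Each investigator answers "up", "down", or None ("I don't know"). When every investigator returns None, the manager takes no action, so the host is never marked Down and HA never kicks in.

```python
from enum import Enum
from typing import Callable, Optional, Sequence

# Hypothetical, simplified model; NOT CloudStack's actual implementation.
# An investigator returns True (host up), False (host down) or None ("I don't know").
Investigator = Callable[[str], Optional[bool]]


class Decision(Enum):
    KEEP_UP = "keep host Up"
    MARK_DOWN = "mark host Down and start HA"
    DO_NOTHING = "Agent state cannot be determined, do nothing"


def investigate(host: str, investigators: Sequence[Investigator]) -> Decision:
    """Ask each investigator in turn; the first definite answer wins."""
    for ask in investigators:
        verdict = ask(host)
        if verdict is True:
            return Decision.KEEP_UP
        if verdict is False:
            return Decision.MARK_DOWN
        # None: this investigator has no information, so try the next one.
    # Every investigator said "I don't know": nothing is done.
    return Decision.DO_NOTHING


# A powered-off host that neither investigator can reach.
def ping_via_peer_agent(host: str) -> Optional[bool]:
    return None  # peer agent answered "Unable to ping computing host"


def check_via_user_vm(host: str) -> Optional[bool]:
    return None  # "could not reach agent, could not reach agent's host"


print(investigate("172.16.20.241", [ping_via_peer_agent, check_via_user_vm]).value)
```

In this model, a single investigator that could positively answer "down" (for example, through IPMI or a storage heartbeat) would be enough to move the host out of the "do nothing" branch.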

> -----Original Message-----
> From: Valery Ciareszka [mailto:valery.teres...@gmail.com]
> Sent: Thursday, July 11, 2013 7:47 PM
> To: users
> Subject: cs 4.1 host disconnected status
> 
> Hi all.
> 
> I use the following environment: CS 4.1, KVM, CentOS 6.4
> (management + node1 + node2), and an OpenIndiana NFS server as primary
> and secondary storage. I have the following problem:
> if I switch one hypervisor node off via IPMI (to simulate a server crash), it
> never goes to the Disconnected status in the management server. As a result,
> HA-enabled VMs are not restarted on the other hypervisor node, because the
> management server believes that the dead node is still online.
> 
> 
> I get the following in the management server logs:
> 
> 2013-07-11 10:19:16,153 DEBUG [agent.transport.Request] (AgentManager-Handler-13:null) Seq 19-1133189098: Processing: { Ans: , MgmtId: 161603152803976, via: 19, Ver: v1, Flags: 10, [{"Answer":{"result":false,"details":"Unable to ping computing host, exiting","wait":0}}] }
> 2013-07-11 10:19:16,153 DEBUG [agent.transport.Request] (AgentTaskPool-1:null) Seq 19-1133189098: Received: { Ans: , MgmtId: 161603152803976, via: 19, Ver: v1, Flags: 10, { Answer } }
> 2013-07-11 10:19:16,153 DEBUG [cloud.ha.AbstractInvestigatorImpl] (AgentTaskPool-1:null) host (172.16.20.241) cannot be pinged, returning null ('I don't know')
> 2013-07-11 10:19:16,153 DEBUG [cloud.ha.UserVmDomRInvestigator] (AgentTaskPool-1:null) could not reach agent, could not reach agent's host, returning that we don't have enough information
> 2013-07-11 10:19:16,153 DEBUG [cloud.ha.HighAvailabilityManagerImpl] (AgentTaskPool-1:null) null unable to determine the state of the host. Moving on.
> 2013-07-11 10:19:16,153 DEBUG [cloud.ha.HighAvailabilityManagerImpl] (AgentTaskPool-1:null) null unable to determine the state of the host. Moving on.
> 2013-07-11 10:19:16,153 WARN  [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Agent state cannot be determined, do nothing
> 
> 
> If I power the dead node back on, it goes to the "Connecting" state and then
> to "Up" in the management interface.
> 
> 2013-07-11 13:57:24,311 DEBUG [cloud.host.Status] (Thread-6:null) Ping timeout for host 12, do invstigation
> 2013-07-11 13:58:24,315 DEBUG [cloud.host.Status] (Thread-6:null) Ping timeout for host 12, do invstigation
> 2013-07-11 13:59:24,320 DEBUG [cloud.host.Status] (Thread-6:null) Ping timeout for host 12, do invstigation
> 2013-07-11 13:59:57,239 DEBUG [cloud.host.Status] (AgentConnectTaskPool-5:null) Transition:[Resource state = Enabled, Agent event = AgentConnected, Host id = 12, name = ad112.colobridge.net]
> 2013-07-11 13:59:57,264 DEBUG [cloud.host.Status] (AgentConnectTaskPool-5:null) Agent status update: [id = 12; name = ad112.colobridge.net; old status = Up; event = AgentConnected; new status = Connecting; old update count = 1285; new update count = 1286]
> 2013-07-11 14:00:50,611 DEBUG [cloud.host.Status] (AgentConnectTaskPool-5:null) Transition:[Resource state = Enabled, Agent event = Ready, Host id = 12, name = ad112.colobridge.net]
> 2013-07-11 14:00:50,633 DEBUG [cloud.host.Status] (AgentConnectTaskPool-5:null) Agent status update: [id = 12; name = ad112.colobridge.net; old status = Connecting; event = Ready; new status = Up; old update count = 1286; new update count = 1287]
> 
> 
> If I restart the cloud-management service, the dead node goes to the
> "Disconnected" state in the management interface (there is nothing
> special in the logs in this case).
> 
> If I do nothing, the dead node stays in the "Up" state in the management
> interface indefinitely (I waited 12 hours), and the log keeps repeating
> "Agent state cannot be determined, do nothing".
> 
> I would appreciate it if someone could suggest how to deal with this
> problem.
> 
> --
> Regards,
> Valery
> 
> http://protocol.by/slayer

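When sharing the full log, it may help to pull out every status transition recorded for the affected host. The following is a minimal Python sketch, assuming each log entry sits on a single line as in the raw management-server log file (the excerpts quoted above are wrapped by the mail client) and that the "Agent status update" format matches the quoted lines. The script name and log path in the usage comment are hypothetical.

```python
import re
import sys
from typing import Iterable, Iterator, NamedTuple

# Matches "Agent status update" entries like the ones quoted above.
# Assumes one entry per line, as in the raw (unwrapped) log file.
STATUS_UPDATE = re.compile(
    r"^(?P<ts>\S+ \S+) .*?Agent status update: "
    r"\[id = (?P<id>\d+); name = (?P<name>[^;]+); "
    r"old status = (?P<old>\w+); event = (?P<event>\w+); "
    r"new status = (?P<new>\w+);"
)


class Transition(NamedTuple):
    timestamp: str
    host_id: int
    host_name: str
    old: str
    event: str
    new: str


def transitions(lines: Iterable[str], host_id: int) -> Iterator[Transition]:
    """Yield the recorded status transitions for a single host id."""
    for line in lines:
        m = STATUS_UPDATE.match(line)
        if m and int(m["id"]) == host_id:
            yield Transition(m["ts"], host_id, m["name"], m["old"], m["event"], m["new"])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python host_transitions.py 12 < management-server.log
    for t in transitions(sys.stdin, int(sys.argv[1])):
        print(t.timestamp, f"{t.old} --{t.event}--> {t.new}")
```

In the quoted excerpt, host 12 only goes from Up to Connecting and back to Up once it is powered on again. A transition to Down, Alert or Disconnected after the IPMI power-off is what would be expected, and its absence is what this filter makes easy to confirm across the whole log.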