[KVM] Agent hanging and disconnecting when libvirt doesn't respond

Wido den Hollander Thu, 11 Jul 2013 12:32:52 -0700

Hi,

Over the last two days I saw an incident on a cluster where HA kicked in because a host was marked as down after the Agent disconnected.


The problem was that libvirt didn't respond to the call the Agent was making.

The underlying problem was that the Qemu/KVM process was having some issues and never responded to libvirt over the monitor socket, and in turn libvirt never responded to the Agent.


In the logs I saw:

Ping Interval has gone past 300000.  Attempting to reconnect.

DEBUG [utils.nio.NioConnection] (Agent-Selector:null) Closing socket Socket[addr=/XX.XX.XX.X,port=8250,localport=49098]

[cloud.agent.Agent] (UgentTask-6:null) Lost connection to the server. Dealing with the remaining commands...

[cloud.agent.Agent] (UgentTask-6:null) Cannot connect because we still have 1 commands in progress.

[cloud.agent.Agent] (UgentTask-6:null) Lost connection to the server. Dealing with the remaining commands...

[cloud.agent.Agent] (UgentTask-6:null) Cannot connect because we still have 1 commands in progress.

[cloud.agent.Agent] (UgentTask-6:null) Lost connection to the server. Dealing with the remaining commands...

[cloud.agent.Agent] (UgentTask-6:null) Cannot connect because we still have 1 commands in progress.

This kept going on and on until I restarted the Agent, since that command would never complete while libvirt was blocking.

For scripts we have a timeout, so when qemu-img doesn't complete in time we give up, but for other commands like this we don't have such a timeout.
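To illustrate what a similar timeout around a blocking call could look like, here is a minimal sketch using only java.util.concurrent. The class BoundedCall and the method callWithTimeout are hypothetical names for illustration and are not part of the Agent code; the Callable stands in for whatever libvirt call is being made. Note that cancelling the Future only interrupts the worker thread, which may not unblock a call stuck in native code, so the thread can stay parked until libvirt returns.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not taken from the Agent source.
final class BoundedCall {
    // Daemon threads, so a call that never returns cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "bounded-call");
        t.setDaemon(true);
        return t;
    });

    private BoundedCall() {
    }

    // Runs the call on a worker thread and waits at most timeoutMs for its result.
    static <T> T callWithTimeout(Callable<T> call, long timeoutMs)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = POOL.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Best effort only: an interrupt may not unblock a native call.
            future.cancel(true);
            throw e;
        }
    }
}
```

With something like this the caller gets a TimeoutException it can turn into a failed answer for the command, instead of waiting forever.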

What I did as a test for now is break out of the loop where we wait for any remaining commands and have the Agent reconnect. But I don't know if that is a good decision.
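A less drastic variant of that test would be to keep waiting for the remaining commands, but only up to a deadline. The sketch below assumes the number of in-progress commands is available as an AtomicInteger; ReconnectGate, waitForCommands and both parameters are made-up names for illustration, not the Agent's actual fields or methods.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not taken from the Agent source.
final class ReconnectGate {
    private ReconnectGate() {
    }

    // Returns true if all commands finished before the deadline,
    // false if we gave up while some were still in progress.
    static boolean waitForCommands(AtomicInteger inProgress, long maxWaitMs, long pollMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + maxWaitMs * 1_000_000L;
        while (inProgress.get() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // stop waiting, let the caller reconnect anyway
            }
            Thread.sleep(pollMs);
        }
        return true;
    }
}
```

On a false return the Agent could log the stuck commands and reconnect, so the host is not marked as down because of a single hung call. What should happen to the stuck command afterwards is still an open question.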

We are now assuming that libvirt always responds, but that is not the case. There could be any number of reasons why libvirt can't respond.


Any suggestions on how to handle this case?

Wido
