We found that the secondary storage NFS was not working properly. We then restored the NFS service and rebooted the machine that was in Alert state. Now it works!

On 26/01/2015 16:32, Somesh Naidu wrote:
From the logs it appears that the agent got connected, but I can't say what
happened next. Further logs are needed.

There are quite a few things that you could verify/check, like,
1. netstat shows a connection between mgmt. server (on port 8250) and systemvm.
2. the disk on the systemvm hasn't run out of space.
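
The two checks above can also be scripted. The following is only a minimal sketch in Python, not part of CloudStack: the function names are made up for illustration. Note that it opens a new TCP connection to port 8250 rather than inspecting the existing one that netstat would show, so it only tells you whether the port is reachable from the machine where it runs.

```python
import shutil
import socket


def can_connect(host, port=8250, timeout=5.0):
    """Return True if a new TCP connection to host:port can be opened.

    Hypothetical helper: it tests reachability only, it does not prove
    that the agent's own session with the management server is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def disk_usage_percent(path="/"):
    """Return the percentage of used space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

For example, run on the system VM, `can_connect("manager_ip")` (with the real management server address substituted) should return True, and `disk_usage_percent("/")` should be well below 100.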

You could perform a stop/start of the VM to see if that recovers from the 
situation.

You may also try various other checks including running the health check script 
mentioned here, 
https://cwiki.apache.org/confluence/display/CLOUDSTACK/SSVM,+templates,+Secondary+storage+troubleshooting.

Regards,
Somesh

-----Original Message-----
From: Ugo Vasi [mailto:ugo.v...@procne.it]
Sent: Monday, January 26, 2015 10:05 AM
To: users@cloudstack.apache.org
Subject: agent host in alert state

Hi all,
we have installed CloudStack 4.3.0 in advanced network mode on Ubuntu
systems with only the KVM hypervisor.

Today we received this series of notifications (by email):
   1) Host disconnected, name: agent_name (id:7), availability zone:
zone_name, pod: pod_name
   2) Host is down, name: agent_name (id:7), availability zone:
zone_name, pod: pod_name
   3) Host disconnected, name: agent_name (id:7), availability zone:
zone_name, pod: pod_name
   4) Host is down, name: agent_name (id:7), availability zone:
zone_name, pod: pod_name
   5) Unable to restart vm_name which was running on host name:
agent_name(id:7), availability zone: zone_name, pod: pod_name

The server agent was not shut down nor rebooted and the virtual machines
are still running.

In the agent log I found messages like these:

2015-01-26 15:04:40,728 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
Cannot connect because we still have 5 commands in progress.
2015-01-26 15:04:45,729 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
Lost connection to the server. Dealing with the remaining commands...
2015-01-26 15:04:45,729 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
Cannot connect because we still have 5 commands in progress.
2015-01-26 15:04:50,729 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
Lost connection to the server. Dealing with the remaining commands...
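
To see whether the number of stuck commands ever goes down, the agent log can be scanned for these messages. This is a minimal sketch in Python that assumes the message wording is exactly as in the excerpt above; the function name is made up for illustration.

```python
import re

# Matches the INFO message quoted above and captures the pending count.
PENDING = re.compile(
    r"Cannot connect because we still have (\d+) commands? in progress"
)


def pending_command_counts(log_text):
    """Return the pending-command counts in the order they were logged."""
    return [int(m.group(1)) for m in PENDING.finditer(log_text)]
```

If every value returned stays the same (5 in the excerpt above), the agent is most likely blocked on the same commands each time it tries to reconnect.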


I tried restarting the agent service, and after a few minutes the log says:

2015-01-26 15:05:42,207 INFO  [utils.nio.NioClient]
(Agent-Selector:null) Connecting to manager_ip:8250
2015-01-26 15:05:42,489 INFO  [utils.nio.NioClient]
(Agent-Selector:null) SSL: Handshake done
2015-01-26 15:05:42,489 INFO  [utils.nio.NioClient]
(Agent-Selector:null) Connected to manager_ip:8250

But in the management interface I still see this agent in Alert state.

Any idea how to resolve this problem?




--

  U g o   V a s i    <ugo.v...@procne.it>
  P r o c n e  s.r.l    >)
  via Cotonificio 45  33010 Tavagnacco IT
  phone: +390432486523 fax: +390432486523




