Hello,

Are you sure the hypervisors can connect to cloudstack-management on the
host and port specified in agent.properties?
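
A quick way to check this from a hypervisor is a small script like the one below. It is only a sketch: it assumes agent.properties lives at /etc/cloudstack/agent/agent.properties and uses plain "host" and "port" keys (adjust if your install differs), and the helper names parse_properties and can_connect are made up for illustration. It only tests that a TCP connection can be opened; it says nothing about the SSL handshake or what the agent and management server do afterwards.

```python
import socket


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed location of the agent configuration; change it if needed.
    path = "/etc/cloudstack/agent/agent.properties"
    with open(path) as f:
        props = parse_properties(f.read())
    host = props["host"]
    # 8250 is the port shown in the agent log; used here as a fallback.
    port = int(props.get("port", "8250"))
    state = "reachable" if can_connect(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

Run it on each hypervisor. If it reports "NOT reachable", look at firewalls and routing between the host and the management server first.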

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

----- Original Message -----
> From: "Indra Pramana" <in...@sg.or.id>
> To: users@cloudstack.apache.org
> Sent: Thursday, 31 March, 2016 03:14:59
> Subject: URGENT - CloudStack agent not able to connect to management server

> Dear all,
> 
> We are using CloudStack 4.2.0, KVM hypervisor and Ceph RBD storage. All our
> agents got disconnected from the management server and are unable to connect
> again, despite rebooting the management server and stopping and restarting
> the cloudstack-agent many times.
> 
> We even tried to physically reboot a hypervisor host (sacrificing all the
> running VMs inside) to see if it can reconnect after boot-up, and it's not
> able to reconnect (it stays in the "Connecting" state). Here are the
> excerpts from the logs:
> 
> ====
> 2016-03-31 10:07:49,346 DEBUG [cloud.agent.Agent] (UgentTask-5:null)
> Sending ping: Seq 0-11:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11,
> [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}]
> }
> 2016-03-31 10:07:49,395 DEBUG [cloud.agent.Agent] (Agent-Handler-2:null)
> Received response: Seq 0-11:  { Ans: , MgmtId: 161342671900, via: 75, Ver:
> v1, Flags: 100010,
> [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}]
> }
> 2016-03-31 10:08:49,271 DEBUG [kvm.resource.LibvirtComputingResource]
> (UgentTask-5:null) Executing:
> /usr/share/cloudstack-common/scripts/vm/network/security_group.py
> get_rule_logs_for_vms
> 2016-03-31 10:08:49,350 DEBUG [kvm.resource.LibvirtComputingResource]
> (UgentTask-5:null) Execution is successful.
> 2016-03-31 10:08:49,353 DEBUG [cloud.agent.Agent] (UgentTask-5:null)
> Sending ping: Seq 0-12:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11,
> [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}]
> }
> 2016-03-31 10:08:49,406 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null)
> Received response: Seq 0-12:  { Ans: , MgmtId: 161342671900, via: 75, Ver:
> v1, Flags: 100010,
> [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}]
> }
> 2016-03-31 10:09:49,272 DEBUG [kvm.resource.LibvirtComputingResource]
> (UgentTask-5:null) Executing:
> /usr/share/cloudstack-common/scripts/vm/network/security_group.py
> get_rule_logs_for_vms
> 2016-03-31 10:09:49,345 DEBUG [kvm.resource.LibvirtComputingResource]
> (UgentTask-5:null) Execution is successful.
> 2016-03-31 10:09:49,347 DEBUG [cloud.agent.Agent] (UgentTask-5:null)
> Sending ping: Seq 0-13:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11,
> [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}]
> }
> 2016-03-31 10:09:49,398 DEBUG [cloud.agent.Agent] (Agent-Handler-4:null)
> Received response: Seq 0-13:  { Ans: , MgmtId: 161342671900, via: 75, Ver:
> v1, Flags: 100010,
> [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}]
> }
> ====
> 
> On the existing hypervisor hosts, the agent normally gets stuck at this
> stage, and in the CloudStack GUI we don't see the agent in the "Connecting"
> state; it is in either the "Disconnected" or the "Alert" state.
> 
> ====
> 2016-03-31 07:37:09,819 DEBUG [utils.script.Script] (main:null) Executing:
> /bin/bash -c uname -r
> 2016-03-31 07:37:09,829 DEBUG [utils.script.Script] (main:null) Execution
> is successful.
> 2016-03-31 07:37:09,832 DEBUG [cloud.agent.Agent] (main:null) Adding
> shutdown hook
> 2016-03-31 07:37:09,833 INFO  [cloud.agent.Agent] (main:null) Agent [id =
> 73 : type = LibvirtComputingResource : zone = 6 : pod = 6 : workers = 5 :
> host = 10.x.x.x : port = 8250
> 2016-03-31 07:37:09,856 INFO  [utils.nio.NioClient] (Agent-Selector:null)
> Connecting to 10.x.x.x:8250
> 2016-03-31 07:37:10,178 INFO  [utils.nio.NioClient] (Agent-Selector:null)
> SSL: Handshake done
> 2016-03-31 07:37:10,179 INFO  [utils.nio.NioClient] (Agent-Selector:null)
> Connected to 10.x.x.x:8250
> ====
> 
> We found no other significant or useful entries in either the agent or the
> management server logs.
> 
> Can anyone give a clue as to what the problem could be? We have been trying
> to reconnect for the past couple of hours without any success. Any help is
> greatly appreciated.
> 
> Looking forward to your reply, thank you.
> 
> Cheers.
> 
> -ip-