Hello,

Are you sure the hypervisors can connect to the cloudstack-management server on the host and port specified in agent.properties?
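A quick way to test this from a hypervisor is a small script along the lines below. It is only a sketch under a few assumptions: that the agent configuration lives at /etc/cloudstack/agent/agent.properties, that the management server address is stored under the keys `host` and `port`, and that 8250 (the port shown in your log) is a sensible fallback. The function names are made up for illustration. Note that it only checks plain TCP reachability; it does not exercise the SSL handshake or the agent protocol.

```python
import socket


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_agent_properties(path):
    """Read host/port from a properties file and test TCP reachability.

    The key names 'host' and 'port' and the 8250 fallback are assumptions.
    """
    with open(path) as fh:
        props = parse_properties(fh.read())
    host = props.get("host")
    port = int(props.get("port", "8250"))
    return host, port, can_connect(host, port)
```

On a hypervisor you would call something like `print(check_agent_properties("/etc/cloudstack/agent/agent.properties"))`. If the last element of the result is False, look at firewalls and routing between the hypervisor and the management server before digging into the agent itself.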
--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

----- Original Message -----
> From: "Indra Pramana" <in...@sg.or.id>
> To: users@cloudstack.apache.org
> Sent: Thursday, 31 March, 2016 03:14:59
> Subject: URGENT - CloudStack agent not able to connect to management server
> Dear all,
>
> We are using CloudStack 4.2.0, KVM hypervisor and Ceph RBD storage. All our
> agents got disconnected from the management server and are unable to connect
> again, despite rebooting the management server and stopping and restarting
> the cloudstack-agent many times.
>
> We even tried to physically reboot a hypervisor host (sacrificing all the
> running VMs inside) to see if it can reconnect after boot-up, and it's not
> able to reconnect (it stays in the "Connecting" state). Here are excerpts
> from the logs:
>
> ====
> 2016-03-31 10:07:49,346 DEBUG [cloud.agent.Agent] (UgentTask-5:null)
> Sending ping: Seq 0-11: { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11,
> [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}]
> }
> 2016-03-31 10:07:49,395 DEBUG [cloud.agent.Agent] (Agent-Handler-2:null)
> Received response: Seq 0-11: { Ans: , MgmtId: 161342671900, via: 75, Ver:
> v1, Flags: 100010,
> [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}]
> }
> 2016-03-31 10:08:49,271 DEBUG [kvm.resource.LibvirtComputingResource]
> (UgentTask-5:null) Executing:
> /usr/share/cloudstack-common/scripts/vm/network/security_group.py
> get_rule_logs_for_vms
> 2016-03-31 10:08:49,350 DEBUG [kvm.resource.LibvirtComputingResource]
> (UgentTask-5:null) Execution is successful.
> 2016-03-31 10:08:49,353 DEBUG [cloud.agent.Agent] (UgentTask-5:null)
> Sending ping: Seq 0-12: { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11,
> [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}]
> }
> 2016-03-31 10:08:49,406 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null)
> Received response: Seq 0-12: { Ans: , MgmtId: 161342671900, via: 75, Ver:
> v1, Flags: 100010,
> [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}]
> }
> 2016-03-31 10:09:49,272 DEBUG [kvm.resource.LibvirtComputingResource]
> (UgentTask-5:null) Executing:
> /usr/share/cloudstack-common/scripts/vm/network/security_group.py
> get_rule_logs_for_vms
> 2016-03-31 10:09:49,345 DEBUG [kvm.resource.LibvirtComputingResource]
> (UgentTask-5:null) Execution is successful.
> 2016-03-31 10:09:49,347 DEBUG [cloud.agent.Agent] (UgentTask-5:null)
> Sending ping: Seq 0-13: { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11,
> [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}]
> }
> 2016-03-31 10:09:49,398 DEBUG [cloud.agent.Agent] (Agent-Handler-4:null)
> Received response: Seq 0-13: { Ans: , MgmtId: 161342671900, via: 75, Ver:
> v1, Flags: 100010,
> [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}]
> }
> ====
>
> On the existing hypervisor hosts, the agent normally gets stuck at this
> stage, and in the CloudStack GUI we don't see the agent in the "Connecting"
> state; it is in either the "Disconnected" or the "Alert" state.
>
> ====
> 2016-03-31 07:37:09,819 DEBUG [utils.script.Script] (main:null) Executing:
> /bin/bash -c uname -r
> 2016-03-31 07:37:09,829 DEBUG [utils.script.Script] (main:null) Execution
> is successful.
> 2016-03-31 07:37:09,832 DEBUG [cloud.agent.Agent] (main:null) Adding
> shutdown hook
> 2016-03-31 07:37:09,833 INFO [cloud.agent.Agent] (main:null) Agent [id =
> 73 : type = LibvirtComputingResource : zone = 6 : pod = 6 : workers = 5 :
> host = 10.x.x.x : port = 8250
> 2016-03-31 07:37:09,856 INFO [utils.nio.NioClient] (Agent-Selector:null)
> Connecting to 10.x.x.x:8250
> 2016-03-31 07:37:10,178 INFO [utils.nio.NioClient] (Agent-Selector:null)
> SSL: Handshake done
> 2016-03-31 07:37:10,179 INFO [utils.nio.NioClient] (Agent-Selector:null)
> Connected to 10.x.x.x:8250
> ====
>
> No other significant or useful entries were found in either the agent or
> the management server logs.
>
> Can anyone give a clue as to what the problem could be? We have been trying
> to reconnect for the past couple of hours without any success. Any help is
> greatly appreciated.
>
> Looking forward to your reply, thank you.
>
> Cheers.
>
> -ip-