Hi Sanjeev and Rafael,

Good day to you, and thank you for your replies and advice.

We are getting a new management server and HAProxy load balancers, and we
will see whether this resolves the problem. Thank you.

On Tue, Apr 5, 2016 at 8:24 PM, Rafael Weingärtner <rafaelweingart...@gmail.com> wrote:

> How many hosts (hypervisors) are you managing with a single MS?
>
> If you add new MSs, you need to balance access to them (HTTP 8080 and
> TCP 8250) with something like the HAProxy load balancer.
>
> On Tue, Apr 5, 2016 at 2:09 AM, Sanjeev Neelarapu <sanjeev.neelar...@accelerite.com> wrote:
>
> > Adding an additional management server would definitely help.
> >
> > Best Regards,
> > Sanjeev N
> > Chief Product Engineer, Accelerite
> > Off: +91 40 6722 9368 | EMail: sanjeev.neelar...@accelerite.com
> >
> > -----Original Message-----
> > From: Indra Pramana [mailto:in...@sg.or.id]
> > Sent: Sunday, April 03, 2016 5:14 PM
> > To: users@cloudstack.apache.org
> > Subject: Re: URGENT - CloudStack agent not able to connect to management server
> >
> > Hi Lucian,
> >
> > Good day to you, and thank you for your reply. Apologies for the delay
> > in my reply.
> >
> > Yes, I can confirm that we can access the host and port specified. Based
> > on the logs, the host can connect to the management server, but the
> > follow-up log entries that usually appear after a connection never show
> > up. In the end, we could only reconnect the host after rebooting it,
> > which meant sacrificing all the VMs that were still up and running
> > during the disconnection.
> >
> > At the time the first hypervisor was disconnected, the logs show that
> > the CloudStack management servers were very busy handling the
> > disconnections, trying to fence the hosts and initiating HA for all the
> > affected VMs. Could this have put a strain on the management server,
> > causing it to disconnect all the remaining hosts? Would adding a new
> > management server resolve the problem?
> >
> > Any advice is appreciated.
> >
> > Looking forward to your reply, thank you.
> >
> > Cheers.
> >
> > On Thu, Mar 31, 2016 at 5:28 PM, Nux! <n...@li.nux.ro> wrote:
> >
> > > Hello,
> > >
> > > Are you sure you can connect from the hypervisors to the
> > > cloudstack-management service on the host and port specified in
> > > agent.properties?
> > >
> > > --
> > > Sent from the Delta quadrant using Borg technology!
> > >
> > > Nux!
> > > www.nux.ro
> > >
> > > ----- Original Message -----
> > > > From: "Indra Pramana" <in...@sg.or.id>
> > > > To: users@cloudstack.apache.org
> > > > Sent: Thursday, 31 March, 2016 03:14:59
> > > > Subject: URGENT - CloudStack agent not able to connect to management server
> > > >
> > > > Dear all,
> > > >
> > > > We are using CloudStack 4.2.0, KVM hypervisor and Ceph RBD storage.
> > > > All our agents got disconnected from the management server and were
> > > > unable to reconnect, despite rebooting the management server and
> > > > stopping and restarting the cloudstack-agent many times.
> > > >
> > > > We even tried to physically reboot a hypervisor host (sacrificing all
> > > > the running VMs inside) to see whether it could reconnect after
> > > > boot-up, but it is not able to reconnect (it stays in the
> > > > "Connecting" state).
> > > > Here are excerpts from the logs:
> > > >
> > > > ====
> > > > 2016-03-31 10:07:49,346 DEBUG [cloud.agent.Agent] (UgentTask-5:null) Sending ping: Seq 0-11: { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}] }
> > > > 2016-03-31 10:07:49,395 DEBUG [cloud.agent.Agent] (Agent-Handler-2:null) Received response: Seq 0-11: { Ans: , MgmtId: 161342671900, via: 75, Ver: v1, Flags: 100010, [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}] }
> > > > 2016-03-31 10:08:49,271 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) Executing: /usr/share/cloudstack-common/scripts/vm/network/security_group.py get_rule_logs_for_vms
> > > > 2016-03-31 10:08:49,350 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) Execution is successful.
> > > > 2016-03-31 10:08:49,353 DEBUG [cloud.agent.Agent] (UgentTask-5:null) Sending ping: Seq 0-12: { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}] }
> > > > 2016-03-31 10:08:49,406 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null) Received response: Seq 0-12: { Ans: , MgmtId: 161342671900, via: 75, Ver: v1, Flags: 100010, [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}] }
> > > > 2016-03-31 10:09:49,272 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) Executing: /usr/share/cloudstack-common/scripts/vm/network/security_group.py get_rule_logs_for_vms
> > > > 2016-03-31 10:09:49,345 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) Execution is successful.
> > > > 2016-03-31 10:09:49,347 DEBUG [cloud.agent.Agent] (UgentTask-5:null) Sending ping: Seq 0-13: { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}] }
> > > > 2016-03-31 10:09:49,398 DEBUG [cloud.agent.Agent] (Agent-Handler-4:null) Received response: Seq 0-13: { Ans: , MgmtId: 161342671900, via: 75, Ver: v1, Flags: 100010, [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}] }
> > > > ====
> > > >
> > > > On the existing hypervisor hosts, the agent normally gets stuck at the
> > > > stage shown in the following log, and in the CloudStack GUI we don't
> > > > see the agent in the "Connecting" state; it is either in the
> > > > "Disconnected" or the "Alert" state.
> > > >
> > > > ====
> > > > 2016-03-31 07:37:09,819 DEBUG [utils.script.Script] (main:null) Executing: /bin/bash -c uname -r
> > > > 2016-03-31 07:37:09,829 DEBUG [utils.script.Script] (main:null) Execution is successful.
> > > > 2016-03-31 07:37:09,832 DEBUG [cloud.agent.Agent] (main:null) Adding shutdown hook
> > > > 2016-03-31 07:37:09,833 INFO [cloud.agent.Agent] (main:null) Agent [id = 73 : type = LibvirtComputingResource : zone = 6 : pod = 6 : workers = 5 : host = 10.x.x.x : port = 8250
> > > > 2016-03-31 07:37:09,856 INFO [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.x.x.x:8250
> > > > 2016-03-31 07:37:10,178 INFO [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
> > > > 2016-03-31 07:37:10,179 INFO [utils.nio.NioClient] (Agent-Selector:null) Connected to 10.x.x.x:8250
> > > > ====
> > > >
> > > > No other significant or useful log entries were found in either the
> > > > agent or the management server logs.
> > > >
> > > > Can anyone give a clue as to what the problem could be? We have been
> > > > trying to reconnect for the past couple of hours without success.
> > > > Any help is greatly appreciated.
> > > >
> > > > Looking forward to your reply, thank you.
> > > >
> > > > Cheers.
> > > >
> > > > -ip-
> >
> > DISCLAIMER
> > ==========
> > This e-mail may contain privileged and confidential information which is
> > the property of Accelerite, a Persistent Systems business. It is intended
> > only for the use of the individual or entity to which it is addressed. If
> > you are not the intended recipient, you are not authorized to read,
> > retain, copy, print, distribute or use this message. If you have received
> > this communication in error, please notify the sender and delete all
> > copies of this message. Accelerite, a Persistent Systems business does not
> > accept any liability for virus infected mails.
>
> --
> Rafael Weingärtner
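
Nux's question (can the hypervisors reach the management server on the host and port in agent.properties?) and Rafael's note about HTTP 8080 and TCP 8250 can be checked with a quick script on each hypervisor. What follows is a minimal, hypothetical Python sketch, not part of CloudStack. It assumes agent.properties lives at /etc/cloudstack/agent/agent.properties with `host=` and `port=` keys, tolerates a comma-separated `host=` list (an assumption; a single address works the same way), and tests only plain TCP reachability. A passing result does not prove the management server will finish registering the agent: in the logs in this thread, the TCP connection and SSL handshake on 8250 already succeeded.

```python
# Hypothetical reachability check for a CloudStack KVM host (not part of CloudStack).
# Assumptions: agent config at /etc/cloudstack/agent/agent.properties with
# "host=" and "port=" keys; a comma-separated host list is tolerated.
import socket
from pathlib import Path

DEFAULT_PROPERTIES = "/etc/cloudstack/agent/agent.properties"  # assumed path
UI_API_PORT = 8080  # HTTP port mentioned in the thread; agent port comes from the file


def read_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def management_endpoints(props: dict, extra_ports=(UI_API_PORT,)) -> list:
    """Return (host, port) pairs: the agent port first, then any extra ports."""
    hosts = [h.strip() for h in props.get("host", "").split(",") if h.strip()]
    agent_port = int(props.get("port", "8250"))  # 8250 as seen in the agent log
    ports = [agent_port] + [p for p in extra_ports if p != agent_port]
    return [(h, p) for h in hosts for p in ports]


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a plain TCP connection opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def main(path: str = DEFAULT_PROPERTIES) -> int:
    """Print reachability per endpoint; return 0 only if all are reachable."""
    props = read_properties(Path(path).read_text())
    endpoints = management_endpoints(props)
    if not endpoints:
        print(f"no host= entry found in {path}")
        return 1
    failures = 0
    for host, port in endpoints:
        ok = can_connect(host, port)
        if not ok:
            failures += 1
        print(f"{host}:{port} {'reachable' if ok else 'UNREACHABLE'}")
    return 1 if failures else 0
```

Calling `main()` (or `main("/some/other/agent.properties")`) prints one line per host and port and returns a non-zero value if any endpoint is unreachable. The same per-port TCP check is roughly what a load balancer health check would perform when sitting in front of several management servers.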