Darren, I'm also running 4.5.2 and like the stability we get with it.
For the features we need, 4.5.2 has everything required, so I don't see a huge benefit in upgrading to the latest ACS at the moment. Also, our environments are very large and complex, so an upgrade is not something I can take lightly.

With that said, I do have a small 8-node lab environment I can try the upgrade on. It consists of 4 ESXi and 4 KVM nodes, so it should be a fair test.

Let's wait for Jacob to respond with his test of setting up the IP/netmask for eth1 on the router VM. If it does not help, I'll try to upgrade to see if I can reproduce the issue.

Regards
ilya

On 7/28/16 9:43 PM, Darren Tang wrote:
> Hi ilya:
> I can confirm that issue, please check:
> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
> When we deployed CloudStack (4.6/4.7/4.8) with VMware (5.x/6.0) in a basic
> zone, the VR never leaves the "starting" state. Falling back to 4.5 is
> fine.
> Maybe you can test it yourself.
>
> 2016-07-29 3:24 GMT+08:00 ilya <ilya.mailing.li...@gmail.com>:
>
>> I guess it would help to know what type of zone you use?
>>
>> Is it advanced, isolated VPC or shared network? What type of isolation?
>> Or perhaps basic zone?
>>
>> Lastly, try stopping iptables and restarting the cloud agent (via stop
>> and start).
>>
>> Please see my response in-line.
>>
>> On 7/28/16 6:58 AM, Jacob Seeley wrote:
>>> Hi ilya,
>>>
>>> Funny you brought up debugging the router VM. After I responded yesterday, I did just that and I did find some odd things.
>>> Just to be clear (I think we're on the same page), since I'm not the OP of this thread: the virtual router always gets deployed and it starts up just fine; however, CloudStack reports that it's always stuck in starting. VMs that get deployed ultimately fail. CloudStack reports the router version as UNKNOWN.
>>> Before I provide what I found debugging the router VM, I'll address some of your points.
>>>
>>> ### FOLLOW-UP QUESTIONS ###
>>>
>>> "Another reason would be an issue of hypervisor accessing the NFS mount used for secondary storage."
>>> I don't believe this is an issue. The hypervisor (VMware) does mount the secondary storage via NFS just fine. If this were an issue, I would think the Secondary Storage and Console VMs would not deploy.
>>>
>>> "Use console of vCenter to see what is happening on router vm. You can login locally with root/password and see the content of /var/log/cloud.out file, paste it on pastebin - if it makes no sense to you..."
>>> It looks to me like /var/log/cloud.out is only written when $CLOUD_DEBUG is set to a non-zero-length value in the /etc/init.d/cloud script. As such, there isn't even a /var/log/cloud.out file. Even when I set that variable, I never get anything logged to /var/log/cloud.out. However, there is a /var/log/cloud.log. Here are the contents of that: http://pastebin.com/aaTsRKZE
>>>
>>> "you can also run /etc/init.d/cloud stop and start.. that will give you a fresh start on logs.."
>>> The service is in a failed state. It's worth noting that this service is in a started state on the Console and Secondary Storage VMs.
>>
>> This is concerning - I see you did "sh -x", read on..
>>
>>>
>>> "also, confirm that management server can talk to VR on POD IP
>>> (management) on port 3922.."
>>> It appears this is not an issue; see below:
>>
>> 3922 from MS to VR - this is the SSH daemon on the VR, with private key
>> 8250 from VR to MS - CloudStack Java agent on the VR talking to the MS
>>
>>
>>>
>>> root@r-4-VM:~# telnet 10.70.110.101 8250
>>> Trying 10.70.110.101...
>>> Connected to 10.70.110.101.
>>> Escape character is '^]'.
>>>
>>
>>
>>> ### ROUTER VM DEBUG ###
>>>
>>> Here is what I found when the router VM gets deployed (please tell me if anything seems off):
>>> 2 NICs; only one NIC gets an IP address. CloudStack NIC1 shows an IP address coming from the defaultGuestNetwork.
>>> NIC2 is traffic type Control but has an IP address of 0.0.0.0
>>
>> It is an issue of concern to see 0.0.0.0 assigned to eth1.
>>
>> Let's assume NIC1 is eth0 and NIC2 is eth1.
>>
>> 1) We should not be getting 0.0.0.0 for eth1, aka the control network. This
>> IP should be coming from the POD network range -> when you added a pod,
>> I assume you did it as part of the Add Zone wizard...
>>
>> To see the POD IP range, go to UI
>> Infrastructure, Zones, Your Zone, Physical Network, Physical Network 1
>> (assuming you did not create anything special), Management, IP Ranges ->
>> you should see a range defined there and it should not be 0.0.0.0...
>>
>>> From the CloudStack management server, I cannot SSH into the router VM on NIC1. I've found this is because of iptables rules on the router VM. If I issue a /etc/init.d/iptables-persistent flush on the router VM, I can SSH into the router VM using the SSH key at port 3922.
>>> The service "cloud" is in a failed state. Looking at the cloud init script, I see the following:
>>>
>>> CMDLINE=$(cat /var/cache/cloud/cmdline)
>>>
>>> TYPE="router"
>>> for i in $CMDLINE
>>>   do
>>>     # search for foo=bar pattern and cut out foo
>>>     FIRSTPATTERN=$(echo $i | cut -d= -f1)
>>>     case $FIRSTPATTERN in
>>>       type)
>>>           TYPE=$(echo $i | cut -d= -f2)
>>>           ;;
>>>     esac
>>> done
>>>
>>> The file /var/cache/cloud/cmdline exists; here are the contents:
>>>
>>> template=domP name=r-4-VM eth0ip=10.70.116.75 eth0mask=255.255.255.0 gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21 baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
>>>
>>
>>
>> You can also try updating your /var/cache/cloud/cmdline with proper
>> values for eth1ip=0.0.0.0 eth1mask=0.0.0.0; you can look them up under
>> Infrastructure, Routers, r-4, NICs and look for the control NIC..
>>
>> Then try starting the cloud service..
>>
>> Also, did you enable baremetal support? Can you deploy a zone without
>> baremetal support? Perhaps there is a bug in how IPs are assigned to
>> eth1 (control NIC)...
>>
>>
>>> The previous code suggests that the value of TYPE starts as router but will get set to dhcpsrvr, as indicated by the contents of /var/cache/cloud/cmdline. Is this normal?
>>> Further down the script, I see:
>>>
>>> CLOUDSTACK_HOME="/usr/local/cloud"   <---- Exists
>>> if [ -f $CLOUDSTACK_HOME/systemvm/utils.sh ];   <---- Does not exist. Seems odd!
>>> then
>>>   . $CLOUDSTACK_HOME/systemvm/utils.sh
>>> else
>>>   _failure
>>> fi
>>>
>>> # mkdir -p /var/log/vmops
>>>
>>> start() {
>>>    local pid=$(get_pids)
>>>    if [ "$pid" != "" ]; then
>>>        echo "CloudStack cloud sevice is already running, PID = $pid"
>>>        return 0
>>>    fi
>>>
>>>    echo -n "Starting CloudStack cloud service (type=$TYPE) "
>>>    if [ -f $CLOUDSTACK_HOME/systemvm/run.sh ];   <---- Does not exist. Seems odd!
>>>    then
>>>      if [ "$pid" == "" ]
>>>      then
>>>        (cd $CLOUDSTACK_HOME/systemvm; nohup ./run.sh > $LOG_FILE 2>&1 & )
>>>        pid=$(get_pids)
>>>        echo $pid > /var/run/cloud.pid
>>>      fi
>>>      _success
>>>    else
>>>      _failure
>>>    fi
>>>    echo
>>>    echo 'start' > $CLOUDSTACK_HOME/systemvm/user_request
>>> }
>>>
>>> I see that it sets CLOUDSTACK_HOME to /usr/local/cloud. This folder exists; however, the script then looks for the file /usr/local/cloud/systemvm/utils.sh. This file doesn't exist. It is also supposed to start the script run.sh, but that also doesn't exist.
>>> This seems like a problem to me.
>>> Here you can see a step-through of what happens when I try to start the cloud service:
>>>
>>> sh -x /etc/init.d/cloud start
>>> + ENABLED=0
>>> + [ -e /etc/default/cloud ]
>>> + . /etc/default/cloud
>>> + ENABLED=0
>>> + cat /var/cache/cloud/cmdline
>>> + CMDLINE= template=domP name=r-4-VM eth0ip=10.70.116.75 eth0mask=255.255.255.0 gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21 baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
>>> + [ ! -z ]
>>> + LOG_FILE=/dev/null
>>> + TYPE=router
>>> + cut -d= -f1
>>> + echo template=domP
>>> + FIRSTPATTERN=template
>>> + cut -d= -f1
>>> + echo name=r-4-VM
>>> + FIRSTPATTERN=name
>>> + cut -d= -f1
>>> + echo eth0ip=10.70.116.75
>>> + FIRSTPATTERN=eth0ip
>>> + cut -d= -f1
>>> + echo eth0mask=255.255.255.0
>>> + FIRSTPATTERN=eth0mask
>>> + cut -d= -f1
>>> + echo gateway=10.70.116.1
>>> + FIRSTPATTERN=gateway
>>> + cut -d= -f1
>>> + echo domain=vit.vertitechit.com
>>> + FIRSTPATTERN=domain
>>> + cut -d= -f1
>>> + echo cidrsize=24
>>> + FIRSTPATTERN=cidrsize
>>> + cut -d= -f1
>>> + echo dhcprange=10.70.116.1
>>> + FIRSTPATTERN=dhcprange
>>> + cut -d= -f1
>>> + echo eth1ip=0.0.0.0
>>> + FIRSTPATTERN=eth1ip
>>> + cut -d= -f1
>>> + echo eth1mask=0.0.0.0
>>> + FIRSTPATTERN=eth1mask
>>> + cut -d= -f1
>>> + echo mgmtcidr=10.70.110.0/24
>>> + FIRSTPATTERN=mgmtcidr
>>> + cut -d= -f1
>>> + echo localgw=10.70.116.1
>>> + FIRSTPATTERN=localgw
>>> + cut -d= -f1
>>> + echo sshonguest=true
>>> + FIRSTPATTERN=sshonguest
>>> + cut -d= -f1
>>> + echo type=dhcpsrvr
>>> + FIRSTPATTERN=type
>>> + cut -d= -f2
>>> + echo type=dhcpsrvr
>>> + TYPE=dhcpsrvr
>>> + cut -d= -f1
>>> + echo disable_rp_filter=true
>>> + FIRSTPATTERN=disable_rp_filter
>>> + cut -d= -f1
>>> + echo extra_pubnics=2
>>> + FIRSTPATTERN=extra_pubnics
>>> + cut -d= -f1
>>> + echo dns1=10.70.10.21
>>> + FIRSTPATTERN=dns1
>>> + cut -d= -f1
>>> + echo baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
>>> + FIRSTPATTERN=baremetalnotificationsecuritykey
>>> + cut -d= -f1
>>> + echo baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
>>> + FIRSTPATTERN=baremetalnotificationapikey
>>> + cut -d= -f1
>>> + echo host=10.70.110.101
>>> + FIRSTPATTERN=host
>>> + cut -d= -f1
>>> + echo port=8080
>>> + FIRSTPATTERN=port
>>> + cut -d= -f1
>>> + echo nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
>>> + FIRSTPATTERN=nic_macs
>>> + [ -f /etc/init.d/functions ]
>>> + [ -f ./lib/lsb/init-functions ]
>>> + RETVAL=0
>>> + CLOUDSTACK_HOME=/usr/local/cloud
>>> + [ -f /usr/local/cloud/systemvm/utils.sh ]
>>> + _failure
>>> + [ -f /etc/init.d/functions ]
>>> + echo Failed
>>> Failed
>>> + [ 0 != 0 ]
>>> + exit 0
>>>
>>> Thoughts?
>>>
>>> Jacob Seeley
>>> Sr. Infrastructure Engineer
>>> VertitechIT
>>> 413-268-1631
>>>
>>> www.vertitechit.com
>>>
>>> -----Original Message-----
>>> From: ilya [mailto:ilya.mailing.li...@gmail.com]
>>> Sent: Wednesday, July 27, 2016 8:43 PM
>>> To: users@cloudstack.apache.org
>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>
>>> Hi Jacob
>>>
>>> I gave this a second read. If your issue is that the router VM is in starting mode
>>> but never started, it means the CloudStack agent on the router VM cannot talk to the management server on port 8250 over the POD network.
>>>
>>> Another reason would be an issue with the hypervisor accessing the NFS mount used for secondary storage.
>>>
>>> Use the vCenter console to see what is happening on the router VM. You can log in locally with root/password and see the content of the /var/log/cloud.out file; paste it on pastebin if it makes no sense to you...
>>>
>>> You can also run /etc/init.d/cloud stop and start.. that will give you a fresh start on logs..
>>>
>>> Also, confirm that the management server can talk to the VR on the POD IP
>>> (management) on port 3922..
>>>
>>> Regards
>>> ilya
>>>
>>> On 7/27/16 9:34 AM, Jacob Seeley wrote:
>>>> ilya,
>>>>
>>>> Here are the contents of the secondary storage:
>>>>
>>>> .
>>>> ./template
>>>> ./template/tmpl
>>>> ./template/tmpl/1
>>>> ./template/tmpl/1/8
>>>> ./template/tmpl/1/8/49a4c4ee-ef06-4474-92c3-1d8efb082266.ova
>>>> ./template/tmpl/1/8/template.properties
>>>> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware.ovf
>>>> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware-disk3.vmdk
>>>> ./template/tmpl/1/7
>>>> ./template/tmpl/1/7/template.properties
>>>> ./template/tmpl/1/7/0098d168-4985-3b33-9840-eb5848d2f385.ova
>>>> ./template/tmpl/1/7/CentOS5.3-x86_64.ovf
>>>> ./template/tmpl/1/7/CentOS5.3-x86_64-disk1.vmdk
>>>> ./template/tmpl/1/7/CentOS5.3-x86_64.mf
>>>> ./systemvm
>>>> ./systemvm/systemvm-4.8.0.1.iso
>>>> ./systemvm/.lck-bf162a0100000000
>>>> ./snapshots
>>>> ./volumes
>>>>
>>>> I've noticed that both the Secondary Storage VM and Console Proxy VM mount this ISO and, as stated before, they come up just fine.
>>>>
>>>> Regards,
>>>>
>>>> Jacob Seeley
>>>> Sr. Infrastructure Engineer
>>>> VertitechIT
>>>> 413-268-1631
>>>>
>>>> www.vertitechit.com
>>>>
>>>> -----Original Message-----
>>>> From: ilya [mailto:ilya.mailing.li...@gmail.com]
>>>> Sent: Wednesday, July 27, 2016 3:22 AM
>>>> To: users@cloudstack.apache.org
>>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>>
>>>> Jacob
>>>>
>>>> The upgrade usually occurs through systemvm.iso, which is generated by CloudStack on the first start.
>>>>
>>>> Please show the content of your secondary store, specifically
>>>>
>>>> /mnt/[secondary-storage]/systemvm
>>>>
>>>> Regards
>>>> ilya
>>>>
>>>> On 7/25/16 11:19 AM, Jacob Seeley wrote:
>>>>> Here is a pastebin snippet of the management-server.log -
>>>>> http://pastebin.com/GCLm53Gz
>>>>>
>>>>> Hopefully the relevant data is in there.
>>>>>
>>>>> I made sure to start from scratch for this example. Everything from the vSphere ESXi to the vCenter to the CentOS 7 with CloudStack install is fresh. I deployed a new instance in CloudStack, a VM internally named i-2-3-VM with an IP address of 192.168.0.78. This prompted CloudStack to deploy a VR. The VR is called r-4-VM with an IP address of 192.168.0.79.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Jacob Seeley
>>>>> Sr. Infrastructure Engineer
>>>>> VertitechIT
>>>>> 413-268-1631
>>>>>
>>>>> www.vertitechit.com
>>>>>
>>>>> -----Original Message-----
>>>>> From: Suresh Sadhu [mailto:suresh.sa...@accelerite.com]
>>>>> Sent: Monday, July 25, 2016 1:37 AM
>>>>> To: users@cloudstack.apache.org
>>>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>>>
>>>>> Please upload the logs in the issue.
>>>>>> On Jul 5, 2016, at 8:46 AM, Darren Tang <darrentang...@gmail.com> wrote:
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
>>>>>>
>>>>>> 2016-07-04 19:41 GMT+08:00 Glenn Wagner <glenn.wag...@shapeblue.com>:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> What template are you using to start your first VM? - the default
>>>>>>> vmware template?
>>>>>>> If you look in vCenter, what does the console show you?
>>>>>>>
>>>>>>>
>>>>>>> Glenn
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> glenn.wag...@shapeblue.com
>>>>>>> www.shapeblue.com
>>>>>>> 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West, Cape Town
>>>>>>> 7130 South Africa @shapeblue
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Pascal R. [mailto:repa...@gmail.com]
>>>>>>> Sent: Monday, 04 July 2016 1:26 PM
>>>>>>> To: users@cloudstack.apache.org
>>>>>>> Subject: CS 4.8 VMware - Virtual Router stuck at starting
>>>>>>>
>>>>>>> hi,
>>>>>>>
>>>>>>> we have a CS 4.8 deployment with VMware 5.5.
>>>>>>>
>>>>>>> When trying to launch the first VM, the VS is created. The VS starts
>>>>>>> up, but in CS it is stuck in the "starting" state.
>>>>>>>
>>>>>>> I can't find any useful information in the logs.
>>>>>>>
>>>>>>> any hint?
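
For anyone retracing the eth1 diagnosis from this thread: below is a minimal sketch of the check, assuming a POSIX shell on the router VM. The helper names get_arg and control_nic_ok are hypothetical and are not part of the CloudStack init script; the key names (eth1ip, type) and the path /var/cache/cloud/cmdline are taken from the cmdline contents quoted in the thread. Unlike the cut -d= -f2 in the quoted script, the parameter expansion used here keeps values that themselves contain '='.

```shell
#!/bin/sh
# Hypothetical helper (not part of CloudStack): print the value of KEY from a
# string of space-separated key=value pairs, such as the contents of
# /var/cache/cloud/cmdline on a system VM. Returns 1 if KEY is absent.
get_arg() {
    key=$1
    shift
    # $* is deliberately unquoted so the boot-args string splits on whitespace.
    for pair in $*; do
        case $pair in
            "$key"=*)
                # Strip everything up to and including the first '='.
                printf '%s\n' "${pair#*=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Hypothetical check: succeed only if the control NIC (eth1) was handed a
# usable address, i.e. eth1ip is present, non-empty and not 0.0.0.0.
control_nic_ok() {
    ip=$(get_arg eth1ip "$@") || return 1
    [ -n "$ip" ] && [ "$ip" != "0.0.0.0" ]
}

# Usage on a router VM (path taken from the thread):
#   control_nic_ok "$(cat /var/cache/cloud/cmdline)" || echo "eth1 has no control IP"
```

With the cmdline posted in this thread (eth1ip=0.0.0.0), control_nic_ok returns non-zero, which is the condition ilya flags as the thing to fix before restarting the cloud service.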