Hi, I have performed an upgrade on one of my other production environments, which has three XS7.0 clusters with the latest hotfixes. The upgrade was from 4.11.3 to 4.13.1. The systemVM deployment succeeds in one cluster but fails in another cluster.
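To see at a glance which system VMs are affected, the management server log can be scanned for the keystore/CSR error quoted in this message. The following is only a minimal sketch in Python: the function name `failed_system_vms` is hypothetical, and the only thing taken from the log is the error text itself.

```python
import re
from collections import Counter

# Error text copied from the management server log lines in this thread.
FAILURE = re.compile(
    r"Failed to setup keystore and generate CSR for system vm: (\S+)"
)


def failed_system_vms(lines):
    """Count keystore/CSR setup failures per system VM name.

    `lines` is any iterable of log lines, e.g. an open file object.
    """
    counts = Counter()
    for line in lines:
        match = FAILURE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file to `failed_system_vms` gives a per-VM failure count, which can then be matched against the host or cluster each VM was started on.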
https://drive.google.com/file/d/1eBD20a1WJfe_eYCaTxM5JneMBnnebkd1/view?usp=sharing

You can review the logs from the link above. Below are the systemVMs whose agent failed to come up; the XS tools were installed successfully in them.

2020-09-15 12:37:52,415 ERROR [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-2:ctx-e478e44a job-13124/job-13131 ctx-7af48048) (logid:f96bf120) Failed to setup keystore and generate CSR for system vm: v-753-VM
2020-09-15 12:59:27,658 ERROR [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-15:ctx-5a06ea58 job-13124/job-13158 ctx-61790402) (logid:f96bf120) Failed to setup keystore and generate CSR for system vm: v-758-VM
2020-09-15 13:06:14,731 ERROR [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-24:ctx-4f858410 job-13124/job-13174 ctx-d127150a) (logid:f96bf120) Failed to setup keystore and generate CSR for system vm: v-760-VM
2020-09-15 13:21:37,907 ERROR [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-3:ctx-4c2ba231 job-13175/job-13184 ctx-6228d3cf) (logid:6bd89e55) Failed to setup keystore and generate CSR for system vm: v-761-VM
2020-09-15 13:25:43,696 ERROR [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-5:ctx-4ecf72a0 job-13175/job-13195 ctx-2aa50a59) (logid:6bd89e55) Failed to setup keystore and generate CSR for system vm: v-761-VM

The systemVMs below came up in the cluster and have the agent connected. It's a weird problem; I didn't see any errors in the XS logs.

v-754-VM
s-756-VM

Ammad Ali

On Fri, Oct 9, 2020 at 12:28 PM Andrija Panic <andrija.pa...@gmail.com> wrote:

> This error is a different zone, as I can see (what I shared above).
>
> For the VM which you asked us to check logs - can you see which zone this is in and then:
> - disable the zone,
> - destroy the SSVM (if it is showing as startING or stopING state - make sure to destroy the VM on XenCenter first, then edit the vm_instances table to set "state"="Stopped" for that s-25457-VM) - then destroy in ACS
> - enable the zone again, it should create a brand new SSVM (check that your CPUs are not heavily oversubscribed, as, in theory, this could also be an extremely slow CPU issue)
>
> If the issue continues, please post the log again (to pastebin or elsewhere) - and share the new SSVM name that failed to be configured.
> -- it's worth digging in XS logs to see if you have some other issues, which ACS is not aware of.
>
> Best,
>
> On Fri, 9 Oct 2020 at 09:09, Andrija Panic <andrija.pa...@gmail.com> wrote:
>
> > Ammad,
> >
> > what's your current status with your issue?
> >
> > In logs I can see that there is some issue with SR:
> >
> > 2020-09-10 22:58:48,374 DEBUG [c.c.h.x.r.w.x.CitrixStartCommandWrapper] (DirectAgent-270:ctx-3468556a) (logid:4cc5809d) 1. The VM s-25505-VM is in Starting state.
> > 2020-09-10 22:58:48,375 DEBUG [c.c.h.x.r.CitrixResourceBase] (DirectAgent-270:ctx-3468556a) (logid:4cc5809d) no guest OS type, start it as HVM guest
> > 2020-09-10 22:58:48,390 DEBUG [c.c.h.x.r.CitrixResourceBase] (DirectAgent-270:ctx-3468556a) (logid:4cc5809d) Created VM 3fa168bd-9e29-03df-52bb-a0fe2a53d390 for s-25505-VM
> > 2020-09-10 22:58:48,394 DEBUG [c.c.h.x.r.CitrixResourceBase] (DirectAgent-270:ctx-3468556a) (logid:4cc5809d) PV args are %template=domP%type=secstorage%host=172.16.2.42%port=8250%name=s-25505-VM%zone=10%pod=10%guid=s-25505-VM%workers=5%resource=com.cloud.storage.resource.PremiumSecondaryStorageResource%instance=SecStorage%sslcopy=false%role=templateProcessor%mtu=1500%eth2ip=175.107.206.196%eth2mask=255.255.255.0%gateway=175.107.206.1%public.network.device=eth2%eth0ip=169.254.1.184%eth0mask=255.255.0.0%eth1ip=172.16.2.56%eth1mask=255.255.255.192%mgmtcidr=172.16.0.0/26%localgw=172.16.2.62%private.network.device=eth1%internaldns1=202.163.96.3%internaldns2=202.163.96.4%dns1=202.163.96.3%dns2=202.163.96.4%nfsVersion=null
> > 2020-09-10 22:58:48,396 DEBUG [c.c.h.x.r.CitrixResourceBase] (DirectAgent-270:ctx-3468556a) (logid:4cc5809d) HVM args are template=domP type=secstorage host=172.16.2.42 port=8250 name=s-25505-VM zone=10 pod=10 guid=s-25505-VM workers=5 resource=com.cloud.storage.resource.PremiumSecondaryStorageResource instance=SecStorage sslcopy=false role=templateProcessor mtu=1500 eth2ip=175.107.206.196 eth2mask=255.255.255.0 gateway=175.107.206.1 public.network.device=eth2 eth0ip=169.254.1.184 eth0mask=255.255.0.0 eth1ip=172.16.2.56 eth1mask=255.255.255.192 mgmtcidr=172.16.0.0/26 localgw=172.16.2.62 private.network.device=eth1 internaldns1=202.163.96.3 internaldns2=202.163.96.4 dns1=202.163.96.3 dns2=202.163.96.4 nfsVersion=null
> > 2020-09-10 22:58:48,399 DEBUG [c.c.h.x.r.CitrixResourceBase] (DirectAgent-270:ctx-3468556a) (logid:4cc5809d) Failed to find SR by name 'XenServer Tools', will try to find 'XCP-ng Tools' SR
> > 2020-09-10 22:58:48,400 WARN [c.c.h.x.r.w.x.CitrixStartCommandWrapper] (DirectAgent-270:ctx-3468556a) (logid:4cc5809d) Catch Exception: class com.cloud.utils.exception.CloudRuntimeException due to com.cloud.utils.exception.CloudRuntimeException: There are 0 SRs with name XenServer Tools
> > com.cloud.utils.exception.CloudRuntimeException: There are 0 SRs with name XenServer Tools
> >         at com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.createPatchVbd(CitrixResourceBase.java:1061)
> >
> > Your SSVM can not even start, let alone trying to access it etc.
> > Also, after the upgrade, I saw that the public SSH key was successfully injected into the systemvm.iso, and that ISO is copied to all hosts, as well as the current id_rsa private key - so assuming your SSVM starts successfully, you should be able to access it, as a) you have the id_rsa on each XS host, and b) systemvm.iso does contain a good public_key (inside the authorized_keys file) - so assuming network is UP (169.254...), you should be able to access it via ssh on port 3922.
> >
> > Best,
> >
> > On Mon, 14 Sep 2020 at 11:39, Ammad Syed <syedamma...@gmail.com> wrote:
> >
> >> You guys can check job-1248274 logs or SSVM s-25457-VM that has successfully connected agent state.
> >>
> >> Ammad Ali
> >>
> >> On Mon, Sep 14, 2020 at 2:33 PM Ammad Syed <syedamma...@gmail.com> wrote:
> >>
> >> > Here are the upgrade management server logs.
> >> >
> >> > https://drive.google.com/file/d/17cxh7f-24ibnXCKvUUPqt3p1YTzlMQN8/view?usp=sharing
> >> >
> >> > Ammad Ali
> >> >
> >> > On Mon, Sep 14, 2020 at 2:17 PM Ammad Syed <syedamma...@gmail.com> wrote:
> >> >
> >> >> In addition to previous email there is only one host in one zone where systemVM agent goes up and on all other hosts on that zone agent failed.
> >> >>
> >> >> If you guys need, I can provide management server logs as well.
> >> >>
> >> >> Also is there a way to enable debugging in ACS logs to specifically find out where the problem is?
> >> >>
> >> >> Ammad Ali
> >> >>
> >> >> On Mon, Sep 14, 2020 at 10:39 AM Ammad Syed <syedamma...@gmail.com> wrote:
> >> >>
> >> >>> Hi Pearl,
> >> >>>
> >> >>> I have taken those steps and verified that systemvm.iso is copied to all hosts in all zones.
> >> >>>
> >> >>> I have recreated the systemvm, ssh'd to that host and checked the md5sum of the iso there and on acs. Both were the same.
> >> >>>
> >> >>> However, the host on which the systemvm was working also has the same md5sum of the systemvm iso. The iso is getting copied successfully. The problem looks to be somewhere else.
> >> >>>
> >> >>> I have checked in xenserver logs as well but didn’t find any logs that something has failed.
> >> >>>
> >> >>> Ammad
> >> >>> Sent from my iPhone
> >> >>>
> >> >>> > On 14-Sep-2020, at 9:15 AM, Pearl d'Silva <pearl.dsi...@shapeblue.com> wrote:
> >> >>> >
> >> >>> > Hi Ammad,
> >> >>> >
> >> >>> > Is the understanding right that the steps as mentioned by you in the previous mail have in fact worked on one zone? If that's the case, could you please ensure that all the hosts in all the other zones have the new systemVM iso copied into them by checking the timestamps as well and comparing the checksums against the iso on the management server, so that when the system VMs are recreated, they pick up the new iso.
> >> >>> >
> >> >>> > Thanks,
> >> >>> > Pearl
> >> >>> >
> >> >>> > ________________________________
> >> >>> > From: Ammad Syed <syedamma...@gmail.com>
> >> >>> > Sent: Sunday, September 13, 2020 12:49 PM
> >> >>> > To: users@cloudstack.apache.org <users@cloudstack.apache.org>
> >> >>> > Subject: Re: Cloudstack 4.11.3 to 4.13.1 SystemVMs Error
> >> >>> >
> >> >>> > Hi Andrija,
> >> >>> >
> >> >>> > Here is the permission of mount point in SSVM.
> >> >>> >
> >> >>> > root@s-25437-VM:~# df -h
> >> >>> > Filesystem               Size  Used Avail Use% Mounted on
> >> >>> > udev                     233M     0  233M   0% /dev
> >> >>> > tmpfs                     98M   19M   80M  19% /run
> >> >>> > /dev/xvda5               1.1G  773M  282M  74% /
> >> >>> > tmpfs                    244M     0  244M   0% /dev/shm
> >> >>> > tmpfs                    5.0M     0  5.0M   0% /run/lock
> >> >>> > tmpfs                    244M     0  244M   0% /sys/fs/cgroup
> >> >>> > /dev/xvda1                92M   35M   57M  39% /boot
> >> >>> > /dev/xvda6               435M   21M  410M   5% /var
> >> >>> > /dev/xvda7                75M  1.6M   72M   3% /tmp
> >> >>> > 172.16.10.35:/nfs/KHI02   12T  7.0T  5.1T  59% /mnt/SecStorage/8ea71ccb-e493-3c7e-8bb0-97871f5c2092
> >> >>> > tmpfs                     49M     0   49M   0% /run/user/0
> >> >>> > root@s-25437-VM:~#
> >> >>> > root@s-25437-VM:~#
> >> >>> > root@s-25437-VM:~# cd /mnt/SecStorage/8ea71ccb-e493-3c7e-8bb0-97871f5c2092
> >> >>> > root@s-25437-VM:/mnt/SecStorage/8ea71ccb-e493-3c7e-8bb0-97871f5c2092#
> >> >>> > root@s-25437-VM:/mnt/SecStorage/8ea71ccb-e493-3c7e-8bb0-97871f5c2092# ls -lah
> >> >>> > total 12K
> >> >>> > drwxrwxrwx  5 root root   70 Dec 21  2018 .
> >> >>> > drwxrwxrwx  3 root root 4.0K Sep 10 23:12 ..
> >> >>> > drwxrwxrwx 52 root root 4.0K Aug 13 19:37 snapshots
> >> >>> > drwxrwxrwx  3 root root   26 Jun  7  2013 template
> >> >>> > drwxrwxrwx 98 root root 4.0K Sep  1 12:52 volumes
> >> >>> >
> >> >>> > -Ammad Ali
> >> >>> >
> >> >>> >> On Fri, Sep 11, 2020 at 6:09 PM Andrija Panic <andrija.pa...@gmail.com> wrote:
> >> >>> >>
> >> >>> >> Can you share permissions of your secondary storage (mount it then ls -lah the mount point)
> >> >>> >>
> >> >>> >
> >> >>> > pearl.dsi...@shapeblue.com
> >> >>> > www.shapeblue.com
> >> >>> > 3 London Bridge Street, 3rd floor, News Building, London SE1 9SG UK
> >> >>> > @shapeblue
> >> >>> >
> >> >>> >>> On Fri, 11 Sep 2020 at 01:39, Ammad Syed <syedamma...@gmail.com> wrote:
> >> >>> >>>
> >> >>> >>> Hi Andrija,
> >> >>> >>>
> >> >>> >>> I have performed an upgrade on my production system from 4.11.3 to 4.13.1. I even cleared the tags, but the issue is still there.
> >> >>> >>>
> >> >>> >>> I have four zones, and only in one zone and on a specific host does the systemVM's agent go up; in all other zones and PODs, key injection to the systemVM is still failing. I have checked, the updated ISO is there on all hosts. The md5sum of systemvm.iso is the same on the xen hosts and the acs host.
> >> >>> >>>
> >> >>> >>> Looks like a weird problem. How can I troubleshoot this further? Any advice would be appreciated.
> >> >>> >>>
> >> >>> >>> -Ammad
> >> >>> >>
> >> >>> >> --
> >> >>> >> Andrija Panić
> >> >>> >
> >> >>> > --
> >> >>> > Regards,
> >> >>> > Syed Ammad Ali
> >> >>
> >> >> --
> >> >> Regards,
> >> >> Syed Ammad Ali
> >> >
> >> > --
> >> > Regards,
> >> > Syed Ammad Ali
> >>
> >> --
> >> Regards,
> >> Syed Ammad Ali
> >
> > --
> > Andrija Panić
>
> --
> Andrija Panić

--
Regards,
Syed Ammad Ali
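
The check Pearl suggested earlier in the thread, comparing the systemvm.iso checksum on every host against the copy on the management server, can be sketched as follows. This is only a minimal Python illustration: `md5_of_file`, `mismatched_hosts` and the host names used with them are hypothetical, and collecting the per-host md5sum values (for example over ssh) is left out.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_hosts(reference_md5, host_md5s):
    """Return the hosts whose reported checksum differs from the reference.

    `host_md5s` maps a host name to the checksum string collected on it.
    Comparison ignores surrounding whitespace and letter case.
    """
    reference = reference_md5.strip().lower()
    return sorted(
        host
        for host, value in host_md5s.items()
        if value.strip().lower() != reference
    )
```

In this thread the checksums were already reported as identical on all hosts, so an empty result from `mismatched_hosts` would only confirm that the ISO copy is not the cause.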