Hi Yiping,

Do you have snapshots enabled on the NetApp filer? (It used to show up as a ".snapshot" subdirectory in each directory.) If so, try disabling snapshots - there used to be a bug where the .snapshot directory would confuse CloudStack.

paul.an...@shapeblue.com
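A quick way to check from an ESXi shell (a sketch - the datastore path is just the one that appears later in this thread, so substitute your own):

    # Look for NetApp ".snapshot" directories anywhere under the NFS datastore.
    find /vmfs/volumes/afc5e946-03bfe3c2 -type d -name '.snapshot'
    # NetApp can hide ".snapshot" from directory listings, so also probe one directly:
    ls -a /vmfs/volumes/afc5e946-03bfe3c2/.snapshot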
-----Original Message-----
From: Yiping Zhang <yipzh...@adobe.com.INVALID>
Sent: 05 June 2019 23:38
To: users@cloudstack.apache.org
Subject: Re: Can't start systemVM in a new advanced zone deployment

Hi, Sergey:

I found more logs in vpxa.log (the ESXi hosts use the UTC time zone, so I was looking at the wrong time periods earlier). I have uploaded more logs to pastebin.

From these log entries, it appears that when copying the template to a VM, it tried to open the destination VMDK file and got a "file not found" error.

In the case where CloudStack attempted to create a systemVM, the destination VMDK path it looks for is "<datastore>/<disk-name>/<disk-name>.vmdk"; see the uploaded log at https://pastebin.com/aFysZkTy

In the case where I manually created a new VM from a (different) template in the vCenter UI, the destination VMDK path it looks for is "<datastore>/<VM-NAME>/<VM-NAME>.vmdk"; see the uploaded log at https://pastebin.com/yHcsD8xB

So I am confused as to how the path for the destination VMDK is determined - by CloudStack or by VMware - and how I ended up with this.

Yiping

On 6/5/19, 12:32 PM, "Sergey Levitskiy" <serg...@hotmail.com> wrote:

Some operation logs get transferred to the vCenter log, vpxd.log. It is not straightforward to trace, but VMware will be able to help should you open a case with them.

On 6/5/19, 11:39 AM, "Yiping Zhang" <yipzh...@adobe.com.INVALID> wrote:

Hi, Sergey:

During the time period when I had the problem cloning the template, there are only a few unique entries in vmkernel.log, and they were repeated hundreds or thousands of times across all the CPU cores:

2019-06-02T16:47:00.633Z cpu9:8491061)FSS: 6751: Failed to open file 'hpilo-d0ccb15'; Requested flags 0x5, world: 8491061 [ams-ahs], (Existing flags 0x5, world: 8491029 [ams-main]): Busy
2019-06-02T16:47:49.320Z cpu1:66415)nhpsa: hpsa_vmkScsiCmdDone:6384: Sense data: error code: 0x70, key: 0x5, info:00 00 00 00 , cmdInfo:00 00 00 00 , CmdSN: 0xd5c, worldId: 0x818e8e, Cmd: 0x85, ASC: 0x20, ASCQ: 0x0
2019-06-02T16:47:49.320Z cpu1:66415)ScsiDeviceIO: 2948: Cmd(0x43954115be40) 0x85, CmdSN 0xd5c from world 8490638 to dev "naa.600508b1001c6d77d7dd6a0cc0953df1" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

The device "naa.600508b1001c6d77d7dd6a0cc0953df1" is the local disk on this host.

Yiping

On 6/5/19, 11:15 AM, "Sergey Levitskiy" <serg...@hotmail.com> wrote:

This must be specific to that environment. In full clone mode, ACS simply calls the vSphere API's CloneVM_Task, so basically until cloning of that template succeeds when attempted in the vSphere client, it will keep failing in ACS. Can you post vmkernel.log from your ESX host esx-0001-a-001?
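To rule ACS out, the same API call can also be driven from a shell with the govmomi "govc" CLI (a sketch - govc is a separate tool, not something ACS uses, and the connection settings below are placeholders):

    # Point govc at the vCenter that ACS uses (placeholder values).
    export GOVC_URL='vcenter.example.org'
    export GOVC_USERNAME='administrator@vsphere.local'
    export GOVC_PASSWORD='changeme'
    export GOVC_INSECURE=1
    # Ask vCenter for a full clone of the staged template, left powered off.
    # If this fails the same way, the problem sits between vSphere and the
    # datastore rather than in ACS.
    govc vm.clone -vm 533b6fcf3fa6301aadcc2b168f3f999a -on=false clone-test-01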
On 6/5/19, 8:47 AM, "Yiping Zhang" <yipzh...@adobe.com.INVALID> wrote:

Well, I can always reproduce it in this particular vSphere setup, but in a different ACS + vSphere environment I don't see this problem.

Yiping

On 6/5/19, 1:00 AM, "Andrija Panic" <andrija.pa...@gmail.com> wrote:

Yiping, if you are sure you can reproduce the issue, it would be good to raise a GitHub issue and provide as much detail as possible.

Andrija

On Wed, 5 Jun 2019 at 05:29, Yiping Zhang <yipzh...@adobe.com.invalid> wrote:

Hi, Sergey:

Thanks for the tip. After setting vmware.create.full.clone=false, I was able to create and start system VM instances. However, I feel that the underlying problem still exists and that I am just working around it instead of fixing it, because in my lab CloudStack instance, with the same versions of ACS and vSphere, I still have vmware.create.full.clone=true and everything works as expected.

I did some reading in the VMware docs on full clones vs. linked clones. It seems the best practice is to use full clones for production, especially if there is a high rate of change on the disks. So eventually I need to understand and fix the root cause of this issue. At least for now I am over this hurdle and can move on.

Thanks again,

Yiping
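Sergey's suggestion below can be applied from the UI (Global Settings) or through the API; as a sketch, assuming the CloudMonkey CLI ("cmk") is configured against the management server:

    # Switch ACS to linked clones via the updateConfiguration API call.
    cmk update configuration name=vmware.create.full.clone value=false
    # The setting is picked up after a management-server restart:
    systemctl restart cloudstack-management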
On 6/4/19, 11:13 AM, "Sergey Levitskiy" <serg...@hotmail.com> wrote:

Everything looks good and consistent, including all references in the VMDK and its snapshot. I would try these two routes:

1. Figure out what the vSphere error actually means from the vmkernel log of the ESX host when ACS tries to clone the template. If the same error happens while doing it outside of ACS, then a support case with VMware can be an option.
2. Try using linked clones. This can be done with this global setting and a restart of the management server:

vmware.create.full.clone false

On 6/4/19, 9:57 AM, "Yiping Zhang" <yipzh...@adobe.com.INVALID> wrote:

Hi, Sergey:

Thanks for the help. By now I have dropped and recreated the DB, redeployed this zone multiple times, blown away primary and secondary storage (including all contents on them), or just deleted the template itself from primary storage, multiple times. Every time I ended up with the same error at the same place.

The full management server log - from the point I seeded the systemvmtemplate for VMware, through deploying a new advanced zone and enabling the zone to let CS create the system VMs, to finally disabling the zone to stop the infinite loop of recreating failed system VMs - is posted at pastebin:

https://pastebin.com/c05wiQ3R

Here are the contents of the relevant files for the template on primary storage:

1) /vmfs/volumes:

ls -l /vmfs/volumes/
total 2052
drwxr-xr-x 1 root root    8 Jan  1  1970 414f6a73-87cd6dac-9585-133ddd409762
lrwxr-xr-x 1 root root   17 Jun  4 16:37 42054b8459633172be231d72a52d59d4 -> afc5e946-03bfe3c2  <== this is the NFS datastore for primary storage
drwxr-xr-x 1 root root    8 Jan  1  1970 5cd4b46b-fa4fcff0-d2a1-00215a9b31c0
drwxr-xr-t 1 root root 1400 Jun  3 22:50 5cd4b471-c2318b91-8fb2-00215a9b31c0
drwxr-xr-x 1 root root    8 Jan  1  1970 5cd4b471-da49a95b-bdb6-00215a9b31c0
drwxr-xr-x 4 root root 4096 Jun  3 23:38 afc5e946-03bfe3c2
drwxr-xr-x 1 root root    8 Jan  1  1970 b70c377c-54a9d28a-6a7b-3f462a475f73

2) contents of the template dir on primary storage:

ls -l /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/
total 1154596
-rw------- 1 root root       8192 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a-000001-delta.vmdk
-rw------- 1 root root        366 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk
-rw-r--r-- 1 root root        268 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a-7d5d73de.hlog
-rw------- 1 root root       9711 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a-Snapshot1.vmsn
-rw------- 1 root root 2097152000 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a-flat.vmdk
-rw------- 1 root root        518 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a.vmdk
-rw-r--r-- 1 root root        471 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a.vmsd
-rwxr-xr-x 1 root root       1402 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a.vmtx

3) *.vmdk file content:

cat /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a.vmdk
# Disk DescriptorFile
version=1
encoding="UTF-8"
CID=ecb01275
parentCID=ffffffff
isNativeSnapshot="no"
createType="vmfs"

# Extent description
RW 4096000 VMFS "533b6fcf3fa6301aadcc2b168f3f999a-flat.vmdk"

# The Disk Data Base
#DDB

ddb.adapterType = "lsilogic"
ddb.geometry.cylinders = "4063"
ddb.geometry.heads = "16"
ddb.geometry.sectors = "63"
ddb.longContentID = "1c60ba48999abde959998f05ecb01275"
ddb.thinProvisioned = "1"
ddb.uuid = "60 00 C2 9b 52 6d 98 c4-1f 44 51 ce 1e 70 a9 70"
ddb.virtualHWVersion = "13"

4) *-000001.vmdk content:

cat /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk
# Disk DescriptorFile
version=1
encoding="UTF-8"
CID=ecb01275
parentCID=ecb01275
isNativeSnapshot="no"
createType="vmfsSparse"
parentFileNameHint="533b6fcf3fa6301aadcc2b168f3f999a.vmdk"
# Extent description
RW 4096000 VMFSSPARSE "533b6fcf3fa6301aadcc2b168f3f999a-000001-delta.vmdk"

# The Disk Data Base
#DDB

ddb.longContentID = "1c60ba48999abde959998f05ecb01275"

5) *.vmtx content:

cat /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a.vmtx
.encoding = "UTF-8"
config.version = "8"
virtualHW.version = "8"
nvram = "533b6fcf3fa6301aadcc2b168f3f999a.nvram"
pciBridge0.present = "TRUE"
svga.present = "TRUE"
pciBridge4.present = "TRUE"
pciBridge4.virtualDev = "pcieRootPort"
pciBridge4.functions = "8"
pciBridge5.present = "TRUE"
pciBridge5.virtualDev = "pcieRootPort"
pciBridge5.functions = "8"
pciBridge6.present = "TRUE"
pciBridge6.virtualDev = "pcieRootPort"
pciBridge6.functions = "8"
pciBridge7.present = "TRUE"
pciBridge7.virtualDev = "pcieRootPort"
pciBridge7.functions = "8"
vmci0.present = "TRUE"
hpet0.present = "TRUE"
floppy0.present = "FALSE"
memSize = "256"
scsi0.virtualDev = "lsilogic"
scsi0.present = "TRUE"
ide0:0.startConnected = "FALSE"
ide0:0.deviceType = "atapi-cdrom"
ide0:0.fileName = "CD/DVD drive 0"
ide0:0.present = "TRUE"
scsi0:0.deviceType = "scsi-hardDisk"
scsi0:0.fileName = "533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk"
scsi0:0.present = "TRUE"
displayName = "533b6fcf3fa6301aadcc2b168f3f999a"
annotation = "systemvmtemplate-4.11.2.0-vmware"
guestOS = "otherlinux-64"
toolScripts.afterPowerOn = "TRUE"
toolScripts.afterResume = "TRUE"
toolScripts.beforeSuspend = "TRUE"
toolScripts.beforePowerOff = "TRUE"
uuid.bios = "42 02 f1 40 33 e8 de e5-1a c5 93 2a c9 12 47 61"
vc.uuid = "50 02 5b d9 e9 c9 77 86-28 3e 84 00 22 2b eb d3"
firmware = "bios"
migrate.hostLog = "533b6fcf3fa6301aadcc2b168f3f999a-7d5d73de.hlog"

6) *.vmsd file content:

cat /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a.vmsd
.encoding = "UTF-8"
snapshot.lastUID = "1"
snapshot.current = "1"
snapshot0.uid = "1"
snapshot0.filename = "533b6fcf3fa6301aadcc2b168f3f999a-Snapshot1.vmsn"
snapshot0.displayName = "cloud.template.base"
snapshot0.description = "Base snapshot"
snapshot0.createTimeHigh = "363123"
snapshot0.createTimeLow = "-679076964"
snapshot0.numDisks = "1"
snapshot0.disk0.fileName = "533b6fcf3fa6301aadcc2b168f3f999a.vmdk"
snapshot0.disk0.node = "scsi0:0"
snapshot.numSnapshots = "1"

7) *-Snapshot1.vmsn content:

cat /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-Snapshot1.vmsn

[binary snapshot header, followed by an embedded copy of the same configuration keys as the .vmtx shown in 5) above]

------------

That's all the data on the template VMDK.

Much appreciate your time!

Yiping
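Since all of the descriptor references above check out, one more thing worth trying from an ESXi shell is VMware's own consistency check of the snapshot chain that the clone error points at (a sketch; vmkfstools ships with ESXi):

    # Check the base.vmdk <- -000001.vmdk chain named in the clone error
    # (paths as in the listings above).
    cd /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a
    vmkfstools -e 533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk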
"cloud.template.base" > snapshot0.description = "Base snapshot" > snapshot0.createTimeHigh = "363123" > snapshot0.createTimeLow = "-679076964" > snapshot0.numDisks = "1" > snapshot0.disk0.fileName = "533b6fcf3fa6301aadcc2b168f3f999a.vmdk" > snapshot0.disk0.node = "scsi0:0" > snapshot.numSnapshots = "1" > > 7) *-Snapshot1.vmsn content: > > cat > /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-Snapshot1.vmsn > > ҾSnapshot\?%?cfgFilet%t%.encoding = "UTF-8" > config.version = "8" > virtualHW.version = "8" > nvram = "533b6fcf3fa6301aadcc2b168f3f999a.nvram" > pciBridge0.present = "TRUE" > svga.present = "TRUE" > pciBridge4.present = "TRUE" > pciBridge4.virtualDev = "pcieRootPort" > pciBridge4.functions = "8" > pciBridge5.present = "TRUE" > pciBridge5.virtualDev = "pcieRootPort" > pciBridge5.functions = "8" > pciBridge6.present = "TRUE" > pciBridge6.virtualDev = "pcieRootPort" > pciBridge6.functions = "8" > pciBridge7.present = "TRUE" > pciBridge7.virtualDev = "pcieRootPort" > pciBridge7.functions = "8" > vmci0.present = "TRUE" > hpet0.present = "TRUE" > floppy0.present = "FALSE" > memSize = "256" > scsi0.virtualDev = "lsilogic" > scsi0.present = "TRUE" > ide0:0.startConnected = "FALSE" > ide0:0.deviceType = "atapi-cdrom" > ide0:0.fileName = "CD/DVD drive 0" > ide0:0.present = "TRUE" > scsi0:0.deviceType = "scsi-hardDisk" > scsi0:0.fileName = "533b6fcf3fa6301aadcc2b168f3f999a.vmdk" > scsi0:0.present = "TRUE" > displayName = "533b6fcf3fa6301aadcc2b168f3f999a" > annotation = "systemvmtemplate-4.11.2.0-vmware" > guestOS = "otherlinux-64" > toolScripts.afterPowerOn = "TRUE" > toolScripts.afterResume = "TRUE" > toolScripts.beforeSuspend = "TRUE" > toolScripts.beforePowerOff = "TRUE" > uuid.bios = "42 02 f1 40 33 e8 de e5-1a c5 93 2a c9 12 47 61" > vc.uuid = "50 02 5b d9 e9 c9 77 86-28 3e 84 00 22 2b eb d3" > firmware = "bios" > migrate.hostLog = "533b6fcf3fa6301aadcc2b168f3f999a-7d5d73de.hlog" > > > ------------ > > That's all the data on the template VMDK. > > Much appreciate your time! > > Yiping > > > > On 6/4/19, 9:29 AM, "Sergey Levitskiy" <serg...@hotmail.com> > wrote: > > Have you tried deleting template from PS and let ACS to recopy > it again? If the issue is reproducible we can try to look what is wrong > with VMDK. Please post content of 533b6fcf3fa6301aadcc2b168f3f999a.vmdk , > 533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk and > 533b6fcf3fa6301aadcc2b168f3f999a.vmx (their equitant after ACS finishes > copying template). Also from one of your ESX hosts output of this > ls -al /vmfs/volumes > ls -al /vmfs/volumes/*/533b6fcf3fa6301aadcc2b168f3f999a (their > equitant after ACS finishes copying template) > > Can you also post management server log starting from the > point you unregister and delete template from the vCenter. > > On 6/4/19, 8:37 AM, "Yiping Zhang" <yipzh...@adobe.com.INVALID> > wrote: > > I have manually imported the OVA to vCenter and > successfully cloned a VM instance with it, on the same NFS datastore. > > > On 6/4/19, 8:25 AM, "Sergey Levitskiy" < > serg...@hotmail.com> wrote: > > I would suspect the template is corrupted on the > secondary storage. You can try disabling/enabling link clone feature and > see if it works the other way. > vmware.create.full.clone false > > Also systemVM template might have been generated on a > newer version of vSphere and not compatible with ESXi 6.5. 
On 6/3/19, 5:41 PM, "Yiping Zhang" <yipzh...@adobe.com.INVALID> wrote:

Hi, list:

I am struggling with deploying a new advanced zone using ACS 4.11.2.0 + vSphere 6.5 + NetApp volumes for the primary and secondary storage devices. The initial setup of the CS management server, the seeding of the systemVM template, and the advanced zone deployment all went smoothly. Once I enabled the zone in the web UI, the systemVM template was copied/staged onto the primary storage device, but subsequent VM creations from this template fail with errors:

2019-06-03 18:38:15,764 INFO [c.c.h.v.m.HostMO] (DirectAgent-7:ctx-d01169cb esx-0001-a-001.example.org, job-3/job-29, cmd: CopyCommand) VM 533b6fcf3fa6301aadcc2b168f3f999a not found in host cache
2019-06-03 18:38:17,017 INFO [c.c.h.v.r.VmwareResource] (DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: CopyCommand) VmwareStorageProcessor and VmwareStorageSubsystemCommandHandler successfully reconfigured
2019-06-03 18:38:17,128 INFO [c.c.s.r.VmwareStorageProcessor] (DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: CopyCommand) creating full clone from template
2019-06-03 18:38:17,657 INFO [c.c.h.v.u.VmwareHelper] (DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: CopyCommand) [ignored]failed toi get message for exception: Error caused by file /vmfs/volumes/afc5e946-03bfe3c2/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk
2019-06-03 18:38:17,658 ERROR [c.c.s.r.VmwareStorageProcessor] (DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: CopyCommand) clone volume from base image failed due to Exception: java.lang.RuntimeException
Message: Error caused by file /vmfs/volumes/afc5e946-03bfe3c2/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk

If I try to create a "new VM from template" (533b6fcf3fa6301aadcc2b168f3f999a) in the vCenter UI manually, I receive exactly the same error message. The name of the VMDK file in the error message is a snapshot of the base disk image, but it is not part of the original template OVA on the secondary storage. So, in the process of copying the template from secondary to primary storage, a snapshot got created and the disk became corrupted/unusable.

Much later in the log file there is another error message, "failed to fetch any free public IP address" (for the SSVM, I think). I don't know whether these two errors are related or whether one is the root cause of the other.

The full management server log is uploaded as https://pastebin.com/c05wiQ3R

Any help or insight on what went wrong here is much appreciated.

Thanks,

Yiping