Re: [Xen-API] XCP 1.5 lv cleanup not happening

George Shuklin Thu, 15 Nov 2012 01:33:44 -0800

This is not a bug; you can see the reason in /var/log/SMlog. They just want to be sure the VHD is not in use on any host by mistake.

You can reproduce that kind of 'mistake' by running some VMs on a slave, stopping xapi on that host, and running 'xe vm-reset-powerstate'. The master will think the slave is down and the VM is halted, but the VM will actually continue to operate.

On NFS this is not a really big deal, but on LVM it can corrupt other, unrelated VHDs, because the VM will continue to write to its VHD outside of the LEs actually allocated to the LV.


On 15.11.2012 05:32, Ryan Farrington wrote:

Looks like it extended into 1.5 as well. Guess this is something I will need to test on 1.6 and maybe submit a bug report to get it fixed. I wonder why it was added as a restriction.
Ryan Farrington
Sr Systems Engineer
Email [email protected]
Mobile 972.804.6803
RemitDATA.com

This e-mail may contain confidential or privileged information. If you are not the intended recipient, please erase this e-mail immediately without reading it or sending it to anyone else. I would also appreciate your advising me (by return e-mail) if you have received this e-mail by mistake. Thank you for your assistance.
------------------------------------------------------------------------
From: [email protected] [[email protected]] On Behalf Of George Shuklin [[email protected]]
Sent: Wednesday, November 14, 2012 4:41 PM
To: [email protected]
Subject: Re: [Xen-API] XCP 1.5 lv cleanup not happening

Yep, XCP 1.1 requires all hosts to be online to purge VDIs from an SR (LVM or NFS, it does not matter).
Strangely, XCP 0.5 had no such restriction.


On 15.11.2012 02:08, Ryan Farrington wrote:
A special thanks goes out to felipef for all the help today.

History:
(4) host pool, one in a failed state due to hardware failure
(1) 3.2T data LUN, SR-UUID = aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a
The issue:
The 3.2T data LUN was presenting as 91% utilized and only 33% virtually allocated.
Work log:
Results were confirmed via the XC GUI and via the command line as identified below
xe sr-list params=all uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a
                physical-utilisation ( RO): 3170843492352
                physical-size ( RO): 3457918435328
                virtual size: 1316940152832
                type ( RO): lvmohba
                sm-config (MRO): allocation: thick; use_vhd: true
Further digging found that summing all the VDIs on the SR resulted in the virtual allocation number
                Commands + results:
xe vdi-list sr-uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a params=physical-utilisation --minimal | sed 's/,/ + /g' | bc -l
                physical utilization: 1,210,564,214,784
xe vdi-list sr-uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a params=virtual-size --minimal | sed 's/,/ + /g' | bc -l
                virtual size: 1,316,940,152,832
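The same summation can be wrapped in a small function so it is easy to reuse for both parameters. This is only a sketch: `sum_csv` is a name made up for this example, and it assumes nothing beyond `tr` and `awk` being available on the host.

```shell
# sum_csv: read a comma-separated list of integers on stdin (the format
# produced by `xe ... --minimal`) and print their sum.
# Hypothetical helper, not part of xe or the XCP tooling.
sum_csv() {
    # One value per line, skip empty lines, print the total without exponent.
    tr ',' '\n' | awk 'NF { total += $1 } END { printf "%.0f\n", total }'
}

# Possible usage (not run here):
#   xe vdi-list sr-uuid=<sr-uuid> params=physical-utilisation --minimal | sum_csv
#   xe vdi-list sr-uuid=<sr-uuid> params=virtual-size --minimal | sum_csv
```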
At this point we started looking at the VG to see if there were some LVs that were taking space but not known by xapi
                Command + result:

                vgs
VG                                                 #PV #LV #SN Attr   VSize VFree
VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a   1  33   0 wz--n- 3.14T 267.36G
(lvs --units B | grep aa15042e | while read vg lv flags size; do echo -n "$size +" | sed 's/B//g'; done; echo 0) | bc -l
                3170843492352
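Subtracting the VDI total known to xapi from the LV total gives the space the volume group holds that xapi does not account for. A minimal sketch of that arithmetic follows; the function names are illustrative only, and it assumes a shell with 64-bit integer arithmetic (bash in dom0 should qualify). With the figures from this thread it yields 1960279277568 bytes, about 1825.65 GiB.

```shell
# unaccounted_bytes LV_TOTAL VDI_TOTAL
# Print the difference between the summed LV sizes and the summed
# physical utilisation of the VDIs, both given in bytes.
unaccounted_bytes() {
    echo $(( $1 - $2 ))
}

# bytes_to_gib BYTES
# Convert a byte count to GiB with two decimal places.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

# Possible usage with the numbers reported above:
#   gap=$(unaccounted_bytes 3170843492352 1210564214784)
#   bytes_to_gib "$gap"
```

Not all of that gap is necessarily orphaned VHDs, since the VG also holds volumes such as MGT, so treat the number as an upper bound.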
So at this point we have confirmed that there are in fact LVs not accounted for by xapi. So we look for them
lvs | grep aa15042e | grep VHD | cut -c7-42 | while read uuid; do [ "$(xe vdi-list uuid=$uuid --minimal)" == "" ] && echo $uuid ; done
This returned a long list of UUIDs that did not have a matching entry in xapi
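The loop above calls xe once per LV. A sketch of the same check done as a set comparison is below, assuming the LV names and the VDI UUIDs known to xapi have already been written to two files, one entry per line. `orphan_uuids` and both file arguments are made up for this example.

```shell
# orphan_uuids LV_NAMES_FILE KNOWN_UUIDS_FILE
# Print the UUID of every VHD-<uuid> LV whose UUID does not appear in the
# list of VDI UUIDs known to xapi. Hypothetical helper for illustration.
orphan_uuids() {
    # Keep only VHD-* names, strip the prefix and surrounding whitespace,
    # then drop every UUID that matches a whole line of the known list.
    sed -n 's/^[[:space:]]*VHD-\([^[:space:]]*\).*$/\1/p' "$1" | sort |
        grep -vxF -f "$2" || true
}

# Possible way to produce the two input files (not run here):
#   lvs --noheadings -o lv_name VG_XenStorage-<sr-uuid> > lv_names.txt
#   xe vdi-list sr-uuid=<sr-uuid> params=uuid --minimal | tr ',' '\n' > known.txt
#   orphan_uuids lv_names.txt known.txt
```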
Grabbing one of the UUIDs at random and searching back in the xensource.log we find something strange
[20121113T09:05:32.654Z|debug|xcp-nc-bc1b8|1563388 inet-RPC|SR.scan R:b7ff8ccc6566|dispatcher] Server_helpers.exec exception_handler: Got exception SR_BACKEND_FAILURE_181: [ ; Error in Metadata volume operation for SR. [opterr=VDI delete operation failed for parameters: /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/MGT, c866d910-f52f-4b16-91be-f7c646c621a5. Error: Failed to read file with params [3, 0, 512, 512]. Error: Input/output error]; ]
After a little googling around I found a thread on the Citrix forums (http://forums.citrix.com/thread.jspa?threadID=299275) that pointed me at a process to rebuild the metadata for that specific SR without having to blow away the SR and start fresh.
                Commands
lvrename /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/MGT /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/OLDMGT
xe sr-scan uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a
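Since these two commands touch the SR's metadata volume, it may be worth printing them for review before running anything. The sketch below only echoes the commands for a given SR UUID and executes nothing; `mgt_rebuild_cmds` is a made-up name, and the VG naming pattern is simply the one seen in this thread.

```shell
# mgt_rebuild_cmds SR_UUID
# Print (do not run) the rename and rescan commands used above.
# Renaming MGT rather than removing it keeps the old volume around as OLDMGT.
mgt_rebuild_cmds() {
    vg="/dev/VG_XenStorage-$1"
    echo "lvrename $vg/MGT $vg/OLDMGT"
    echo "xe sr-scan uuid=$1"
}

# Possible usage:
#   mgt_rebuild_cmds aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a
```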
This got rid of the SR_backend errors, but the LVs continued to persist. I started looking in the SMlog and saw lines reporting that the pool was not ready and that the process was exiting
<25168> 2012-11-14 12:27:24.195463      Pool is not ready, exiting
At this point I manually forced the offline node out of the pool, and the SMlog reported a success in the purge process.
                xe host-forget uuid=<down host>
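A quick way to tell whether the cleanup is being skipped for this reason is to count the matching lines in the SMlog. This is a minimal sketch: `count_pool_not_ready` is a hypothetical helper, and the message text is taken from the log line quoted above.

```shell
# count_pool_not_ready LOGFILE
# Print how many lines of the given log contain "Pool is not ready".
count_pool_not_ready() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so the exit status is ignored here.
    grep -c 'Pool is not ready' "$1" || true
}

# Possible usage:
#   count_pool_not_ready /var/log/SMlog
```

A non-zero count that keeps growing while a host is offline would be consistent with the behaviour described in this thread.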



_______________________________________________
Xen-api mailing list
[email protected]
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api
