Looks like it extended into 1.5 as well. Guess this is something I will need 
to test on 1.6 and maybe submit a bug report to get it fixed. I wonder why it 
was added as a restriction.

Ryan Farrington
Sr Systems Engineer
Email [email protected]
Mobile 972.804.6803
RemitDATA.com


This e-mail may contain confidential or privileged information. If you are not 
the intended recipient, please erase this e-mail immediately without reading it 
or sending it to anyone else. I would also appreciate your advising me (by 
return e-mail) if you have received this e-mail by mistake. Thank you for your 
assistance.


________________________________
From: [email protected] [[email protected]] On Behalf 
Of George Shuklin [[email protected]]
Sent: Wednesday, November 14, 2012 4:41 PM
To: [email protected]
Subject: Re: [Xen-API] XCP 1.5 lv cleanup not happening

Yep, XCP 1.1 requires all hosts to be online to purge VDIs from an SR (LVM or 
NFS, does not matter).

Strangely, XCP 0.5 had no such restriction.
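
For anyone checking this on their own pool: a quick way to see whether xapi 
considers every host live before expecting the purge to run (a sketch against 
the xe CLI; I am assuming the host-metrics-live field is exposed on your XCP 
version):

                # The GC will not purge VDIs while any pool member is down
                xe host-list params=uuid,name-label,host-metrics-live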


On 15.11.2012 02:08, Ryan Farrington wrote:
A special thanks goes out to felipef for all the help today.

History:
                (4) host pool – one in a failed state due to hardware failure
                (1) 3.2T data lun – SR-UUID = 
aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a

The issue:
The 3.2T data lun was presenting as 91% utilized but only 33% virtually 
allocated.

Work log:

Results were confirmed via the XC GUI and via the command line as shown below:
                xe sr-list params=all uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a
                                physical-utilisation ( RO): 3170843492352
                                physical-size ( RO): 3457918435328
                                virtual-allocation ( RO): 1316940152832
                                type ( RO): lvmohba
                                sm-config (MRO): allocation: thick; use_vhd: true
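
For reference, the 91% figure falls straight out of those two fields (a quick 
sketch with xe sr-param-get and bc; the UUID is the SR above):

                SR=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a
                UTIL=$(xe sr-param-get uuid=$SR param-name=physical-utilisation)
                SIZE=$(xe sr-param-get uuid=$SR param-name=physical-size)
                # 100 * 3170843492352 / 3457918435328 -> 91.6 (bc truncates)
                echo "scale=1; 100 * $UTIL / $SIZE" | bc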

Further digging found that summing all the VDIs on the SR reproduced the 
virtual allocation number:
                Commands + results:
                                xe vdi-list sr-uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a params=physical-utilisation --minimal | sed 's/,/ + /g' | bc -l
                                physical utilisation: 1,210,564,214,784
                                xe vdi-list sr-uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a params=virtual-size --minimal | sed 's/,/ + /g' | bc -l
                                virtual size: 1,316,940,152,832
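
The same sed-into-bc trick works for any numeric VDI parameter; wrapped up as 
a small helper (hypothetical function name, same xe CLI as above):

                # Sum an arbitrary numeric parameter across all VDIs on an SR
                sum_vdi_param () {
                        xe vdi-list sr-uuid="$1" params="$2" --minimal | sed 's/,/ + /g' | bc -l
                }
                sum_vdi_param aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a virtual-size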

At this point we started looking at the VG to see if there were LVs taking 
space that were not known to xapi
                Command + result:
                                vgs
                                                VG                                                 #PV #LV #SN Attr   VSize VFree
                                                VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a   1  33   0 wz--n- 3.14T 267.36G

(lvs --units B | grep aa15042e | while read vg lv flags size; do echo -n "$size 
+" | sed 's/B//g'; done; echo 0)| bc -l
                                                3170843492352
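
The gap between LVM's view and xapi's view is then the amount of leaked space 
(straight arithmetic on the two sums above):

                # VG usage minus the xapi-known VDI utilisation
                echo '3170843492352 - 1210564214784' | bc    # = 1960279277568, roughly 1.8T of orphaned LVs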

So at this point we have confirmed that there are in fact LVs not accounted 
for by xapi, so we look for them:
                lvs | grep aa15042e | grep VHD | cut -c7-42 | while read uuid; do [ "$(xe vdi-list uuid=$uuid --minimal)" == "" ] && echo $uuid ; done
                This returned a long list of UUIDs that did not have a matching entry in xapi.
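
(The cut -c7-42 just strips the leading "VHD-" prefix so the bare UUID can be 
fed to xe vdi-list.) A variant of the same loop that also reports how much 
space each orphan is holding, for review before anything is removed (a sketch, 
assuming the standard VHD-<uuid> LV naming shown above):

                lvs --noheadings --units B -o lv_name,lv_size VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a | grep VHD- | while read lv size; do
                        uuid=${lv#VHD-}
                        [ -z "$(xe vdi-list uuid=$uuid --minimal)" ] && echo "orphan: $lv $size"
                done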

Grabbing one of the UUIDs at random and searching back in xensource.log, we 
find something strange:
                [20121113T09:05:32.654Z|debug|xcp-nc-bc1b8|1563388 
inet-RPC|SR.scan R:b7ff8ccc6566|dispatcher] Server_helpers.exec 
exception_handler: Got exception SR_BACKEND_FAILURE_181: [ ; Error in Metadata 
volume operation for SR. [opterr=VDI delete operation failed for parameters: 
/dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/MGT, 
c866d910-f52f-4b16-91be-f7c646c621a5. Error: Failed to read file with params 
[3, 0, 512, 512]. Error: Input/output error];  ]

A little googling finally turned up a thread on the Citrix forums 
(http://forums.citrix.com/thread.jspa?threadID=299275) that pointed me at a 
process to rebuild the metadata for that specific SR without having to blow 
away the SR and start fresh.
                Commands:
                                lvrename /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/MGT /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/OLDMGT
                                xe sr-scan uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a
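
The rename is non-destructive, so the old metadata volume can be kept around 
until the rebuilt one checks out (a sketch; OLDMGT is just the name chosen 
above, and I am assuming the scan recreates a fresh MGT on this SR type):

                # Confirm a new MGT volume exists alongside the renamed one
                lvs VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a | grep MGT
                # Only once everything is verified:
                # lvremove /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/OLDMGT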

This got rid of the SR_BACKEND_FAILURE errors, but the LVs continued to 
persist. Looking in the SMlog, I started seeing lines that pointed at the pool 
not being ready and exiting:
                <25168> 2012-11-14 12:27:24.195463      Pool is not ready, 
exiting
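
Tailing the log while kicking off another scan makes it easy to catch that 
check firing (SMlog lives in /var/log on the host):

                tail -f /var/log/SMlog | grep -i 'Pool is not ready'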

At this point I manually forced the offline node out of the pool, and the 
SMlog reported success in the purge process.
                xe host-forget uuid=<down host>
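
After the forget, a rescan should show physical utilisation trending back 
toward the ~1.2T that xapi actually knows about (same SR as above):

                xe sr-scan uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a
                xe sr-param-get uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a param-name=physical-utilisation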




_______________________________________________
Xen-api mailing list
[email protected]
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api