A special thanks goes out to felipef for all the help today.

History:
                (4) host pool - one host in a failed state due to hardware failure
                (1) 3.2T data LUN - SR-UUID = aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a

The issue:
The 3.2T data LUN was presenting as 91% utilized but only 33% virtually
allocated.

Work log:

Results were confirmed via the XenCenter GUI and via the command line as shown
below.
                xe sr-list params=all uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a
                                physical-utilisation ( RO): 3170843492352
                                physical-size ( RO): 3457918435328
                                virtual-allocation ( RO): 1316940152832
                                type ( RO): lvmohba
                                sm-config (MRO): allocation: thick; use_vhd: true
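
As a sanity check, the 91% figure falls straight out of the two fields above;
a minimal sketch with those values pasted in (plain bc arithmetic, nothing
XenServer-specific), which prints 91.69 and matches the utilisation reported
in the GUI:
                echo "scale=2; 100 * 3170843492352 / 3457918435328" | bc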

Further digging found that summing all the VDIs on the SR reproduced the
virtual allocation number.
                Commands + results:
                xe vdi-list sr-uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a params=physical-utilisation --minimal | sed 's/,/ + /g' | bc -l
                                physical utilization: 1,210,564,214,784
                xe vdi-list sr-uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a params=virtual-size --minimal | sed 's/,/ + /g' | bc -l
                                virtual size: 1,316,940,152,832
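
Putting the two views side by side makes the gap explicit; a minimal sketch
(the same commands as above, with bc doing the subtraction) that prints how
many bytes the SR reports as used beyond what its VDIs account for, here
roughly 1.96T:
                SR=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a
                sr_used=$(xe sr-list uuid=$SR params=physical-utilisation --minimal)
                vdi_used=$(xe vdi-list sr-uuid=$SR params=physical-utilisation --minimal | sed 's/,/ + /g' | bc -l)
                echo "$sr_used - $vdi_used" | bc -l
                                1960279277568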

At this point we started looking at the VG to see if there were LVs taking
space that were not known to xapi.
                Command + result:
                vgs
                                VG                                                  #PV #LV #SN Attr   VSize VFree
                                VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a    1  33   0 wz--n- 3.14T 267.36G

                (lvs --units B | grep aa15042e | while read vg lv flags size; do echo -n "$size +" | sed 's/B//g'; done; echo 0) | bc -l
                                3170843492352
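
The same total can be had a little less fragilely by asking lvs for just the
size column; a sketch, assuming the host's LVM2 accepts --noheadings and
-o lv_size (standard options, but treat the exact invocation as illustrative):
                lvs --noheadings --units b -o lv_size VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a | awk '{ gsub(/B/, ""); sum += $1 } END { printf "%.0f\n", sum }'
                # should print the same 3170843492352 as the loop above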

At this point we had confirmed that there are in fact LVs not accounted for by
xapi, so we went looking for them:
                lvs | grep aa15042e | grep VHD | cut -c7-42 | while read uuid; do [ "$(xe vdi-list uuid=$uuid --minimal)" == "" ] && echo $uuid ; done
This returned a long list of UUIDs that did not have a matching entry in xapi.
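
A slightly extended version of the same loop also prints each orphan's size,
so the unaccounted-for space can be attributed LV by LV (a sketch only, with
the same lvs assumptions as above):
                lvs --noheadings --units b -o lv_name,lv_size VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a | grep VHD- | while read name size; do uuid=${name#VHD-}; [ -z "$(xe vdi-list uuid=$uuid --minimal)" ] && echo "$uuid ${size%B}"; done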

Grabbing one of the UUIDs at random and searching back through xensource.log,
we found something strange:
                [20121113T09:05:32.654Z|debug|xcp-nc-bc1b8|1563388 
inet-RPC|SR.scan R:b7ff8ccc6566|dispatcher] Server_helpers.exec 
exception_handler: Got exception SR_BACKEND_FAILURE_181: [ ; Error in Metadata 
volume operation for SR. [opterr=VDI delete operation failed for parameters: 
/dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/MGT, 
c866d910-f52f-4b16-91be-f7c646c621a5. Error: Failed to read file with params 
[3, 0, 512, 512]. Error: Input/output error];  ]
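
For anyone retracing this, the search itself is just a grep across the rotated
logs; a sketch assuming the stock XenServer log location, using the VDI UUID
from the excerpt above (zgrep instead of grep if older rotations are gzipped):
                grep c866d910-f52f-4b16-91be-f7c646c621a5 /var/log/xensource.log*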

After a little googling I finally found a thread on the Citrix forums
(http://forums.citrix.com/thread.jspa?threadID=299275) that pointed me at a
process for rebuilding the metadata of that specific SR without having to blow
away the SR and start fresh.
                Commands:
                lvrename /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/MGT /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/OLDMGT
                xe sr-scan uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a

This got rid of the SR_BACKEND_FAILURE errors, but the orphaned LVs persisted.
Looking in the SMlog, I started seeing lines pointing at the pool not being
ready and the operation exiting:
                <25168> 2012-11-14 12:27:24.195463      Pool is not ready, exiting
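
Those lines are easy to pull straight out of the log; a sketch, again assuming
the stock SM log path on the host:
                grep -n "Pool is not ready" /var/log/SMlog*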

At this point I manually forced the offline node out of the pool, and the SMlog
reported success for the purge process.
                xe host-forget uuid=<down host>
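
For completeness, the failed host's UUID can be read off a host listing before
the forget, and the SR figures re-checked once the purge has run; a sketch of
the surrounding xe calls (these are not from the original session, and if the
purge reclaims the orphans the physical-utilisation should fall back toward
the ~1.2T the VDIs actually account for):
                xe host-list params=uuid,name-label,enabled
                xe sr-scan uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a
                xe sr-list uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a params=physical-utilisation --minimal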
