On 5.3.2014, 16:01, Nir Soffer wrote:
> ----- Original Message -----
>> From: "Boyan Tabakov" <bl...@alslayer.net>
>> To: "Nir Soffer" <nsof...@redhat.com>
>> Cc: users@ovirt.org
>> Sent: Wednesday, March 5, 2014 3:38:25 PM
>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
>>
>> Hello Nir,
>>
>> On Wed Mar 5 14:37:17 2014, Nir Soffer wrote:
>>> ----- Original Message -----
>>>> From: "Boyan Tabakov" <bl...@alslayer.net>
>>>> To: "Nir Soffer" <nsof...@redhat.com>
>>>> Cc: users@ovirt.org
>>>> Sent: Tuesday, March 4, 2014 3:53:24 PM
>>>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
>>>>
>>>> On Tue Mar 4 14:46:33 2014, Nir Soffer wrote:
>>>>> ----- Original Message -----
>>>>>> From: "Nir Soffer" <nsof...@redhat.com>
>>>>>> To: "Boyan Tabakov" <bl...@alslayer.net>
>>>>>> Cc: users@ovirt.org, "Zdenek Kabelac" <zkabe...@redhat.com>
>>>>>> Sent: Monday, March 3, 2014 9:39:47 PM
>>>>>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
>>>>>>
>>>>>> Hi Zdenek, can you look into this strange incident?
>>>>>>
>>>>>> When a user creates a disk on one host (creating a new lv), the lv is
>>>>>> not seen on another host in the cluster.
>>>>>>
>>>>>> Calling multipath -r causes the new lv to appear on the other host.
>>>>>>
>>>>>> Finally, lvs tells us that vg_mda_free is zero - maybe unrelated, but
>>>>>> unusual.
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Boyan Tabakov" <bl...@alslayer.net>
>>>>>>> To: "Nir Soffer" <nsof...@redhat.com>
>>>>>>> Cc: users@ovirt.org
>>>>>>> Sent: Monday, March 3, 2014 9:51:05 AM
>>>>>>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
>>>>>>>>>>>>> Consequently, when creating/booting a VM with the said disk
>>>>>>>>>>>>> attached, the VM fails to start on host2, because host2 can't
>>>>>>>>>>>>> see the LV. Similarly, if the VM is started on host1, it fails
>>>>>>>>>>>>> to migrate to host2. An extract from the host2 log is at the
>>>>>>>>>>>>> end. The LV in question is 6b35673e-7062-4716-a6c8-d5bf72fe3280.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As far as I could quickly track through the vdsm code, there is
>>>>>>>>>>>>> only a call to lvs and not to lvscan or lvchange, so the host2
>>>>>>>>>>>>> LVM doesn't fully refresh.
>>>>>>>>
>>>>>>>> lvs should see any change on the shared storage.
>>>>>>>>
>>>>>>>>>>>>> The only workaround so far has been to restart VDSM on host2,
>>>>>>>>>>>>> which makes it refresh all LVM data properly.
>>>>>>>>
>>>>>>>> When vdsm starts, it calls multipath -r, which ensures that we see
>>>>>>>> all physical volumes.
>>>>>>>>
>>>>>>>>>>>>> When is host2 supposed to pick up any newly created LVs in the
>>>>>>>>>>>>> SD VG? Any suggestions where the problem might be?
>>>>>>>>>>>>
>>>>>>>>>>>> When you create a new lv on the shared storage, the new lv should
>>>>>>>>>>>> be visible on the other host. Let's start by verifying that you
>>>>>>>>>>>> do see the new lv after a disk was created.
>>>>>>>>>>>>
>>>>>>>>>>>> Try this:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Create a new disk, and check the disk uuid in the engine ui
>>>>>>>>>>>> 2. On another machine, run this command:
>>>>>>>>>>>>
>>>>>>>>>>>> lvs -o vg_name,lv_name,tags
>>>>>>>>>>>>
>>>>>>>>>>>> You can identify the new lv using tags, which should contain the
>>>>>>>>>>>> new disk uuid.
>>>>>>>>>>>>
>>>>>>>>>>>> If you don't see the new lv from the other host, please provide
>>>>>>>>>>>> /var/log/messages and /var/log/sanlock.log.
>>>>>>>>>>>
>>>>>>>>>>> Just tried that. The disk is not visible on the non-SPM node.
>>>>>>>>>>
>>>>>>>>>> This means that storage is not accessible from this host.
>>>>>>>>>
>>>>>>>>> Generally, the storage seems accessible ok. For example, if I restart
>>>>>>>>> the vdsmd, all volumes get picked up correctly (become visible in lvs
>>>>>>>>> output and VMs can be started with them).
>>>>>>>>
>>>>>>>> Let's repeat this test, but now, if you do not see the new lv, please
>>>>>>>> run:
>>>>>>>>
>>>>>>>> multipath -r
>>>>>>>>
>>>>>>>> And report the results.
>>>>>>>>
>>>>>>>
>>>>>>> Running multipath -r helped and the disk was properly picked up by the
>>>>>>> second host.
>>>>>>>
>>>>>>> Is running multipath -r safe while the host is not in maintenance mode?
>>>>>>
>>>>>> It should be safe; vdsm uses it in some cases.
>>>>>>
>>>>>>> If yes, as a temporary workaround I can patch vdsmd to run multipath -r
>>>>>>> when e.g. monitoring the storage domain.
>>>>>>
>>>>>> I suggested running multipath as a debugging aid; normally this is not
>>>>>> needed.
>>>>>>
>>>>>> You should see the lv on the shared storage without running multipath.
>>>>>>
>>>>>> Zdenek, can you explain this?
>>>>>>
>>>>>>>>> One warning that I keep seeing in vdsm logs on both nodes is this:
>>>>>>>>>
>>>>>>>>> Thread-1617881::WARNING::2014-02-24
>>>>>>>>> 16:57:50,627::sp::1553::Storage.StoragePool::(getInfo) VG
>>>>>>>>> 3307f6fa-dd58-43db-ab23-b1fb299006c7's metadata size exceeded
>>>>>>>>> critical size: mdasize=134217728 mdafree=0
>>>>>>>>
>>>>>>>> Can you share the output of the command below?
>>>>>>>>
>>>>>>>> lvs -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
>>>>>>>
>>>>>>> Here's the output for both hosts.
>>>>>>>
>>>>>>> host1:
>>>>>>> [root@host1 ~]# lvs -o uuid,name,attr,size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count
>>>>>>>   LV UUID                                LV                                   Attr      LSize VFree   Ext     #Ext Free LV Tags                                                                              VMdaSize VMdaFree #LV #PV
>>>>>>>   jGEpVm-oPW8-XyxI-l2yi-YF4X-qteQ-dm8SqL 3d362bf2-20f4-438d-9ba9-486bd2e8cedf -wi-ao--- 2.00g 114.62g 128.00m 1596  917 IU_0227da98-34b2-4b0c-b083-d42e7b760036,MD_5,PU_f4231952-76c5-4764-9c8b-ac73492ac465 128.00m  0       13  2
>>>>>>
>>>>>> This looks wrong - your vg_mda_free is zero - as vdsm complains.
>>>
>>> Patch http://gerrit.ovirt.org/25408 should solve this issue.
>>>
>>> It may also solve the other issue with the missing lv - I could
>>> not reproduce it yet.
>>>
>>> Can you try to apply this patch and report the results?
>>>
>>> Thanks,
>>> Nir
>>
>> This patch helped, indeed! I tried it on the non-SPM node (as that's
>> the node that I can currently easily put in maintenance) and the node
>> started picking up newly created volumes correctly. I also set
>> use_lvmetad to 0 in the main lvm.conf, because without it manually
>> running e.g. lvs was still using the metadata daemon.
>>
>> I can't confirm yet that this helps with the metadata volume warning,
>> as that warning appears only on the SPM. I'll be able to put the SPM
>> node in maintenance soon and will report later.
>>
>> This issue on Fedora makes me think - is Fedora still a fully supported
>> platform?
>
> It is supported, but probably not tested properly.
>
> Nir
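For the archives, the verification sequence discussed above boils down to
something like the following, run on the host that does not see the new
disk. This is just a sketch: the uuid is the LV from this thread, and the
grep is my own shorthand - substitute the disk uuid shown in the engine UI.

    # check whether the new lv is visible; the lv name and tags carry the uuids
    lvs -o vg_name,lv_name,tags | grep 6b35673e-7062-4716-a6c8-d5bf72fe3280

    # if it is missing, rescan the multipath maps and check again
    multipath -r
    lvs -o vg_name,lv_name,tags | grep 6b35673e-7062-4716-a6c8-d5bf72fe3280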
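The lvm.conf change mentioned above, in case someone else hits this on
Fedora. The unit names are an assumption based on a stock lvm2 install
with systemd; adjust if your packaging differs.

    # in /etc/lvm/lvm.conf, inside the global section:
    use_lvmetad = 0

    # then stop the daemon so cached metadata is no longer served
    systemctl stop lvm2-lvmetad.service lvm2-lvmetad.socket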
Alright! Thanks a lot for the help!

BR,
Boyan