On Sun, May 1, 2016 at 1:35 AM, <nico...@devels.es> wrote:
> On 2016-04-30 23:22, Nir Soffer wrote:
>> On Sun, May 1, 2016 at 12:48 AM, <nico...@devels.es> wrote:
>>> On 2016-04-30 22:37, Nir Soffer wrote:
>>>> On Sat, Apr 30, 2016 at 10:28 PM, Nir Soffer <nsof...@redhat.com> wrote:
>>>>> On Sat, Apr 30, 2016 at 7:16 PM, <nico...@devels.es> wrote:
>>>>>> On 2016-04-30 16:55, Nir Soffer wrote:
>>>>>>> On Sat, Apr 30, 2016 at 11:33 AM, Nicolás <nico...@devels.es> wrote:
>>>>>>>> Hi Nir,
>>>>>>>>
>>>>>>>> On 29/04/16 at 22:34, Nir Soffer wrote:
>>>>>>>>> On Fri, Apr 29, 2016 at 9:17 PM, <nico...@devels.es> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> We're running oVirt 3.6.5.3-1 and lately we're experiencing some issues with some VMs being paused because they're marked as non-responsive. Mostly, after a few seconds they recover, but we want to debug precisely this problem so we can fix it consistently.
>>>>>>>>>>
>>>>>>>>>> Our scenario is the following:
>>>>>>>>>>
>>>>>>>>>> ~495 VMs, of which ~120 are constantly up
>>>>>>>>>> 3 datastores, all of them iSCSI-based:
>>>>>>>>>>   * ds1: 2T, currently has 276 disks
>>>>>>>>>>   * ds2: 2T, currently has 179 disks
>>>>>>>>>>   * ds3: 500G, currently has 65 disks
>>>>>>>>>> 7 hosts: all have mostly the same hardware. CPU and memory are currently very lightly used (< 10%).
>>>>>>>>>>
>>>>>>>>>> ds1 and ds2 are physically the same backend, which exports two 2TB volumes. ds3 is a different storage backend where we're currently migrating some disks from ds1 and ds2.
>>>>>>>>>
>>>>>>>>> What is the storage backend behind ds1 and ds2?
>>>>>>>>
>>>>>>>> The storage backend for ds1 and ds2 is the iSCSI-based HP LeftHand P4000 G2.
>>>>>>>>
>>>>>>>>>> Usually, when VMs become unresponsive, the whole host where they run gets unresponsive too, so that gives a hint about the problem; my bet is the culprit is somewhere on the host side and not on the VMs side.
>>>>>>>>>
>>>>>>>>> Probably the vm became unresponsive because the connection to the host was lost.
>>>>>>>>
>>>>>>>> I forgot to mention that, less commonly, we have situations where the host doesn't get unresponsive but the VMs on it do, and they don't become responsive ever again, so we have to forcibly power them off and start them on a different host. But in this case the connection with the host doesn't ever get lost (so basically the host is Up, but any VM run on it is unresponsive).
>>>>>>>>
>>>>>>>>>> When that happens, the host itself gets non-responsive and is only recoverable after reboot, since it's unable to reconnect.
>>>>>>>>>
>>>>>>>>> Piotr, can you check the engine log and explain why the host is not reconnected?
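>>>>>>>>> It is also worth checking, on the affected host, whether the iSCSI sessions and multipath paths were still up when this happened - a quick check with the standard open-iscsi and multipath tools, for example:
>>>>>>>>>
>>>>>>>>> iscsiadm -m session -P 3 | grep -i 'session state'
>>>>>>>>> multipath -ll
>>>>>>>>>
>>>>>>>>> If the sessions show LOGGED_IN and all paths are active, the connection itself was probably fine and this points back at slow I/O rather than a lost connection.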
>>>>>>>>>> I must say this is not specific to this oVirt version; when we were using v3.6.4 the same happened, and it's also worth mentioning that we've not done any configuration changes and everything had been working quite well for a long time.
>>>>>>>>>>
>>>>>>>>>> We were monitoring our ds1 and ds2 physical backend to see performance, and we suspect we've run out of IOPS, since we're reaching the maximum specified by the manufacturer; probably at certain times the host cannot perform a storage operation within some time limit and it marks VMs as unresponsive. That's why we've set up ds3 and we're migrating ds1 and ds2 to ds3. When we run out of space on ds3 we'll create more smaller volumes to keep migrating.
>>>>>>>>>>
>>>>>>>>>> On the host side, when this happens, we've run repoplot on the vdsm log and I'm attaching the result. Clearly there's a *huge* LVM response time (~30 secs.).
>>>>>>>>>
>>>>>>>>> Indeed the log shows very slow vgck and vgs commands - these are called every 5 minutes for checking the vg health and refreshing the vdsm lvm cache.
>>>>>>>>>
>>>>>>>>> 1. starting vgck
>>>>>>>>>
>>>>>>>>> Thread-96::DEBUG::2016-04-29 13:17:48,682::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgck --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' 5de4a000-a9c4-489c-8eee-10368647c413 (cwd None)
>>>>>>>>>
>>>>>>>>> 2. vgck ends after 55 seconds
>>>>>>>>>
>>>>>>>>> Thread-96::DEBUG::2016-04-29 13:18:43,173::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0
>>>>>>>>>
>>>>>>>>> 3. starting vgs
>>>>>>>>>
>>>>>>>>> Thread-96::DEBUG::2016-04-29 13:17:11,963::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|/dev/mapper/36000eb3a4f1acbc200000000000000b9|/dev/mapper/360014056f0dc8930d744f83af8ddc709|/dev/mapper/WDC_WD5003ABYZ-011FA0_WD-WMAYP0J73DU6|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 5de4a000-a9c4-489c-8eee-10368647c413 (cwd None)
>>>>>>>>>
>>>>>>>>> 4. vgs finished after 37 seconds
>>>>>>>>>
>>>>>>>>> Thread-96::DEBUG::2016-04-29 13:17:48,680::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0
>>>>>>>>>
>>>>>>>>> Zdenek, how do you suggest we debug these slow lvm commands?
>>>>>>>>>
>>>>>>>>> Can you run the following commands on a host in trouble, and on some other hosts in the same timeframe?
>>>>>>>>>
>>>>>>>>> time vgck -vvvv --config ' devices { filter = ['\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' 5de4a000-a9c4-489c-8eee-10368647c413
>>>>>>>>>
>>>>>>>>> time vgs -vvvv --config ' global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' 5de4a000-a9c4-489c-8eee-10368647c413
>>>>>>>>>
>>>>>>>>> Note that I added -vvvv to both commands; this will generate a huge amount of debugging info. Please share the output of these commands.
>>>>>>>>>
>>>>>>>>> You may need to fix the commands. You can always copy and paste directly from the vdsm log into the shell and add the -vvvv flag.
>>>>>>>>
>>>>>>>> Indeed, there seems to be a big difference on hosts 5 and 6. I'm attaching the results of the execution of both commands on all hosts. Both commands show a considerably bigger output on hosts 5 and 6, and also a much longer execution time. Times are also attached in a file called TIMES.
>>>>>>>>
>>>>>>>>>> Our host storage network is correctly configured and on a 1G interface, no errors on the host itself, switches, etc.
>>>>>>>>>
>>>>>>>>> 1G serving about 20-70 vms per host? (495 vms, 120 always up, 7 hosts)?
>>>>>>>>>
>>>>>>>>> Do you have a separate network for management and storage, or do both use this 1G interface?
>>>>>>>>
>>>>>>>> Yes, we have separate networks for management, storage and motion.
>>>>>>>> Storage and motion have 1G each (plus, for storage we use a bond of 2 interfaces in ALB-mode (6)). Currently no host has more than 30 VMs at a time.
>>>>>>>>
>>>>>>>>>> We've also limited storage in QoS to use 10MB/s and 40 IOPS,
>>>>>>>>>
>>>>>>>>> How did you configure this limit?
>>>>>>>>
>>>>>>>> In the Data Center tab, I chose our DC and a QoS sub-tab appears, just like described here: http://www.ovirt.org/documentation/sla/network-qos/
>>>>>>>>
>>>>>>>>>> but this issue still happens, which leads me to wonder whether this is not just an IOPS issue; each host handles cca. 600 LVs. Could this be an issue too? I remark that the LVM response times are low in normal conditions (~1-2 seconds).
>>>>>>>>>
>>>>>>>>> We recommend limiting the number of lvs per vg to 350. If you have 276 disks on ds1, and the disks are using snapshots, you may have too many lvs, which can cause slowdowns in lvm operations.
>>>>>>>>>
>>>>>>>>> Can you share the output of:
>>>>>>>>>
>>>>>>>>> vdsm-tool dump-volume-chains 5de4a000-a9c4-489c-8eee-10368647c413
>>>>>>>>
>>>>>>>> Ok, right now no VG has more than 350 LVs, so I guess this is not currently the problem. Unfortunately, I ran the vdsm-tool command but it didn't end nor provide any output in cca. 1 hour, so I guess it was hung and I stopped it. If you confirm this is a normal execution time I can leave it running whatever time it takes.
>>>>>>>>
>>>>>>>> host5:~# vgs
>>>>>>>>   VG                                   #PV #LV #SN Attr   VSize   VFree
>>>>>>>>   0927c050-6fb6-463c-bb37-8b8da641dcd3   1  63   0 wz--n- 499,62g 206,50g
>>>>>>>>   5de4a000-a9c4-489c-8eee-10368647c413   1 335   0 wz--n-   2,00t 518,62g
>>>>>>>>   b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8   1 215   0 wz--n-   2,00t 495,25g
>>>>>>>>   sys_vg                                 1   6   0 wz--n- 136,44g 122,94g
>>>>>>>>
>>>>>>>>>> I'm attaching the vdsm.log, engine.log and repoplot PDF;
>>>>>>>>>
>>>>>>>>> This is very useful, thanks. Can you also send the vdsm logs and repoplots from the other hosts for the same timeframe?
>>>>>>>>>
>>>>>>>>>> if someone could give a hint on some additional problems in them and shed some light on the above thoughts I'd be very grateful.
>>>>>>>>>
>>>>>>>>> Do you have sar configured on the host? Having sar logs can reveal more info about this timeframe.
>>>>>>>>>
>>>>>>>>> Do you have information about the amount of io from vms during this timeframe? The amount of io on the storage backend during this timeframe?
>>>>>>>>
>>>>>>>> Not currently, but I'll be alert for the next time this happens, I'll check some sar commands related to I/O and I'll provide feedback. Nevertheless, by the time I ran the commands above no machine was unresponsive and I'm still getting such huge execution times.
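>>>>>>>> Something along these lines should capture the I/O picture the next time it happens (a rough sketch; assumes the sysstat package is installed so sar has data to report, and that the device names are matched against the affected LUNs):
>>>>>>>>
>>>>>>>> # live view: per-device utilization, await and queue size, every 5 seconds
>>>>>>>> iostat -xmt 5
>>>>>>>> # after the fact: block-device activity collected by sysstat for today
>>>>>>>> sar -d -p
>>>>>>>> # network side of the storage bond, to rule out a saturated 1G link
>>>>>>>> sar -n DEV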
>>>>>>>> I tried running iotop now and I see there are 2 vgck and vgs processes, each reading at a rate of ~500Kb/s.
>>>>>>>>
>>>>>>>> root 24446 1.3 0.0 57008 6824 ? D< 09:15 0:00 /usr/sbin/lvm vgs --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ 'a|/dev/mapper/36000eb3a4f1acbc20000000000000043|/dev/mapper/36000eb3a4f1acbc200000000000000b9|/dev/mapper/360014056f0dc8930d744f83af8ddc709|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d
>>>>>>>>
>>>>>>>> I'm also attaching the iostat output for host 5 in a file called iostat-host5.
>>>>>>>
>>>>>>> Interesting, all hosts are checking the same storage, but on host5 and host6 the output of vgs and vgck is 10 times bigger - this explains why they take about 10 times longer to run.
>>>>>>>
>>>>>>> $ ls -lh vgs-*
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 221K Apr 30 08:55 vgs-host1
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 239K Apr 30 08:55 vgs-host2
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 228K Apr 30 08:55 vgs-host3
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 288K Apr 30 08:55 vgs-host4
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 2.2M Apr 30 08:55 vgs-host5
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 2.2M Apr 30 08:55 vgs-host6
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 232K Apr 30 08:55 vgs-host7
>>>>>>>
>>>>>>> $ ls -lh vgck-*
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 238K Apr 30 08:55 vgck-host1
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 254K Apr 30 08:55 vgck-host2
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 244K Apr 30 08:55 vgck-host3
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 298K Apr 30 08:55 vgck-host4
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 2.0M Apr 30 08:55 vgck-host5
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 2.0M Apr 30 08:55 vgck-host6
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 248K Apr 30 08:55 vgck-host7
>>>>>>>
>>>>>>> Can you collect the output of "dmsetup table -v" on hosts 1, 5, and 6?
>>>>>>>
>>>>>>> Nir
>>>>>>
>>>>>> Sure, please find attached the results of the command. Once again, for hosts 5 and 6 the size is 10 times bigger than for host 1.
>>>>>
>>>>> I think the issue causing the slow vgs and vgck commands is stale lvs on hosts 5 and 6. This may be the root cause of the paused vms, but I cannot explain how it is related yet.
>>>>>
>>>>> Comparing vgck on host1 and host5 - we can see that vgck opens 213 /dev/mapper/* devices, actually 53 unique devices, most of them opened 4 times. On host5, vgck opens 2489 devices (622 unique). This explains why the operation takes about 10 times longer.
>>>>>
>>>>> Checking the dmsetup table output, we can see that host1 has 53 devices, and host5 622 devices.
>>>>>
>>>>> Checking the device open count, host1 has 15 stale devices (Open count: 0), but host5 has 597 stale devices.
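>>>>> For reference, you can take the same counts directly on a host with something like this (a rough sketch, counting device-mapper devices nothing holds open):
>>>>>
>>>>> # total dm devices vs. devices with open count 0 ("stale")
>>>>> dmsetup info -c --noheadings -o name,open | wc -l
>>>>> dmsetup info -c --noheadings -o name,open | awk '$2 == 0' | wc -l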
>>>>>
>>>>> Leaving stale devices is a known issue, but we never had evidence that it causes trouble except warnings in lvm commands.
>>>>>
>>>>> Please open an ovirt bug about this issue, and include all the files you sent so far in this thread, and all the vdsm logs on host5.
>>>>>
>>>>> To remove the stale devices, you can do:
>>>>>
>>>>> for name in `dmsetup info -c -o open,name | awk '/ 0 / {print $2}'`; do
>>>>>     echo "removing stale device: $name"
>>>>>     dmsetup remove $name
>>>>> done
>>>>
>>>> Note that this cleanup is safe only in maintenance mode, since it will also remove the mapping for the vdsm special lvs (ids, leases, inbox, outbox, master, metadata) that are accessed by vdsm regularly and are not stale.
>>>>
>>>> This is also not safe if you use hosted engine; the hosted engine agent should maintain its vg.
>>>>
>>>> The safest way to remove stale devices is to restart vdsm - it deactivates all unused lvs (except special lvs) during startup.
>>>
>>> Hm, I ran it without even putting the host into maintenance, but I will next time. Fortunately it seems it didn't do any harm and everything is working. I'll probably write a script that will put the host into maintenance, clean up or restart vdsm, and activate it again (this is not a hosted engine) if the number of stale devices is bigger than a threshold.
>>
>> I would not do anything automatic at this point except monitoring this issue.
>>
>> For cleanup - you have 2 options:
>>
>> - host is up - restart vdsm
>> - host in maintenance - remove stale devices
>>
>> Based on local testing, I think the issue is this:
>>
>> - When connecting to storage, all lvs become active - this may be a change in lvm during el 7 development that we did not notice.
>>
>> - Each time vdsm is restarted, all active lvs that should not be active are deactivated during vdsm startup.
>>
>> - When deactivating a storage domain, vdsm does not deactivate the lvs, leaving stale devices.
>>
>> Can you confirm that it works like this on your system?
>>
>> To test this:
>>
>> 1. Put one host into maintenance
>>
>>    You should see some stale devices - the special lvs (ids, leases, master, metadata, inbox, outbox)
>>
>>    And probably 2 OVF_STORE lvs per vg (128m lvs used to store vm ovf)
>>
>> 2. Clean up the stale devices
>>
>> 3. Activate the host
>>
>>    You should see again about 600 stale devices, and lvm commands will probably run much slower, as you reported.
>>
>> 4. Restart vdsm
>>
>>    All the regular lvs should be inactive now; no stale devices should exist, except the special lvs (expected)
>>
>> Nir
>
> Sure, I will test this tomorrow and provide feedback with results.
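To see the effect of each step, you can count the active lvs in the domain vg before and after it - a rough sketch, using the vg uuid from your logs:

# active lvs in the domain vg (5th attr char is 'a' when active);
# after step 4 this should drop to the special lvs only
lvs --noheadings -o lv_name,lv_attr 5de4a000-a9c4-489c-8eee-10368647c413 | awk '$2 ~ /^....a/' | wc -l
# step 4 - restart vdsm
systemctl restart vdsmd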
If you are brave, you can try this patch, perhaps on one of your hosts:
https://gerrit.ovirt.org/56877

Cheers,
Nir
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users