On Sat, Apr 30, 2016 at 11:33 AM, Nicolás <nico...@devels.es> wrote:
> Hi Nir,
>
> El 29/04/16 a las 22:34, Nir Soffer escribió:
>>
>> On Fri, Apr 29, 2016 at 9:17 PM, <nico...@devels.es> wrote:
>>>
>>> Hi,
>>>
>>> We're running oVirt 3.6.5.3-1 and lately we're experiencing some issues
>>> with some VMs being paused because they're marked as non-responsive.
>>> Mostly they recover after a few seconds, but we want to debug this
>>> problem precisely so we can fix it for good.
>>>
>>> Our scenario is the following:
>>>
>>> ~495 VMs, of which ~120 are constantly up
>>> 3 datastores, all of them iSCSI-based:
>>>   * ds1: 2T, currently has 276 disks
>>>   * ds2: 2T, currently has 179 disks
>>>   * ds3: 500G, currently has 65 disks
>>> 7 hosts: all have mostly the same hardware. CPU and memory usage is
>>> currently very low (< 10%).
>>>
>>> ds1 and ds2 are physically the same backend, which exports two 2TB
>>> volumes. ds3 is a different storage backend to which we're currently
>>> migrating some disks from ds1 and ds2.
>>
>> What is the storage backend behind ds1 and ds2?
>
> The storage backend for ds1 and ds2 is the iSCSI-based HP LeftHand P4000 G2.
>
>>> Usually, when VMs become unresponsive, the whole host they run on becomes
>>> unresponsive too, which gives a hint about the problem: my bet is that the
>>> culprit is somewhere on the host side and not on the VM side.
>>
>> Probably the VM became unresponsive because the connection to the host was
>> lost.
>
> I forgot to mention that, less commonly, we also have situations where the
> host does not become unresponsive but the VMs on it do, and they never
> become responsive again, so we have to forcibly power them off and start
> them on a different host. In this case the connection with the host is
> never lost (the host stays Up, but any VM running on it is unresponsive).
>
>>> When that happens, the host itself becomes non-responsive and is only
>>> recoverable after a reboot, since it's unable to reconnect.
>>
>> Piotr, can you check the engine log and explain why the host is not
>> reconnected?
>>
>>> I must say this is not specific to this oVirt version; the same happened
>>> when we were using v3.6.4. It's also worth mentioning that we haven't made
>>> any configuration changes and everything had been working quite well for a
>>> long time.
>>>
>>> We were monitoring the performance of the physical backend behind ds1 and
>>> ds2, and we suspect we've run out of IOPS, since we're reaching the maximum
>>> specified by the manufacturer; probably at certain times the host cannot
>>> complete a storage operation within some time limit and it marks the VMs
>>> as unresponsive. That's why we've set up ds3 and we're migrating ds1 and
>>> ds2 to it. When we run out of space on ds3 we'll create more, smaller
>>> volumes and keep migrating.
>>>
>>> On the host side, when this happens, we've run repoplot on the vdsm log
>>> and I'm attaching the result. Clearly there's a *huge* LVM response time
>>> (~30 secs.).
>>
>> Indeed, the log shows very slow vgck and vgs commands - these are called
>> every 5 minutes to check the VG health and refresh vdsm's lvm cache.
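>>
>> (You can spot these yourself in the log: vdsm logs every lvm invocation and
>> its result under Storage.Misc.excCmd, so something like the following -
>> assuming the default /var/log/vdsm/vdsm.log location - lists the start and
>> end of each run; pairing a command line with the next SUCCESS/FAILED line
>> from the same Thread gives its runtime, which is essentially what repoplot
>> computes for you:
>>
>>   grep 'Storage.Misc.excCmd' /var/log/vdsm/vdsm.log | grep -E 'lvm (vgs|vgck)|SUCCESS|FAILED'
>> )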
>>
>> 1. starting vgck
>>
>> Thread-96::DEBUG::2016-04-29 13:17:48,682::lvm::290::Storage.Misc.excCmd::(cmd)
>> /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgck --config ' devices {
>> preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0
>> disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|'\'',
>> '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1
>> use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } '
>> 5de4a000-a9c4-489c-8eee-10368647c413 (cwd None)
>>
>> 2. vgck ends after 55 seconds
>>
>> Thread-96::DEBUG::2016-04-29 13:18:43,173::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS:
>> <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n';
>> <rc> = 0
>>
>> 3. starting vgs
>>
>> Thread-96::DEBUG::2016-04-29 13:17:11,963::lvm::290::Storage.Misc.excCmd::(cmd)
>> /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices {
>> preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0
>> disable_after_error_count=3 filter = [
>> '\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|/dev/mapper/36000eb3a4f1acbc200000000000000b9|/dev/mapper/360014056f0dc8930d744f83af8ddc709|/dev/mapper/WDC_WD5003ABYZ-011FA0_WD-WMAYP0J73DU6|'\'',
>> '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1
>> use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix
>> --separator '|' --ignoreskippedcluster -o
>> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
>> 5de4a000-a9c4-489c-8eee-10368647c413 (cwd None)
>>
>> 4. vgs finished after 37 seconds
>>
>> Thread-96::DEBUG::2016-04-29 13:17:48,680::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS:
>> <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n';
>> <rc> = 0
>>
>> Zdenek, how do you suggest debugging these slow lvm commands?
>>
>> Can you run the following commands on a host in trouble, and on some other
>> hosts in the same timeframe?
>>
>> time vgck -vvvv --config ' devices { filter =
>> ['\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|'\'', '\''r|.*|'\'' ] }
>> global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1
>> use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } '
>> 5de4a000-a9c4-489c-8eee-10368647c413
>>
>> time vgs -vvvv --config ' global { locking_type=1 prioritise_write_locks=1
>> wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } '
>> 5de4a000-a9c4-489c-8eee-10368647c413
>>
>> Note that I added -vvvv to both commands; this will generate a huge amount
>> of debugging info. Please share the output of these commands.
>>
>> You may need to fix the commands. You can always copy and paste directly
>> from the vdsm log into the shell and add the -vvvv flag.
>
> Indeed, there seems to be a big difference on hosts 5 and 6. I'm attaching
> the results of running both commands on all hosts. Both commands show a much
> bigger output on hosts 5 and 6, and also a much longer execution time. The
> timings are also attached in a file called TIMES.
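>
> (If you want to re-run these yourself: lvm writes the -vvvv debug output to
> stderr, so each run has to be captured along these lines - the --config value
> is the one from your commands above, and the file names are just examples:
>
>   time vgck -vvvv --config '...' 5de4a000-a9c4-489c-8eee-10368647c413 2> vgck-host1
>   time vgs  -vvvv --config '...' 5de4a000-a9c4-489c-8eee-10368647c413 2> vgs-host1
> )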
>
>>> Our host storage network is correctly configured and on a 1G interface;
>>> no errors on the host itself, switches, etc.
>>
>> 1G serving about 20-70 vms per host (495 vms, 120 always up, 7 hosts)?
>>
>> Do you have a separate network for management and storage, or do both use
>> this 1G interface?
>
> Yes, we have separate networks for management, storage and motion. Storage
> and motion have 1G each (and for storage we use a bond of 2 interfaces in
> ALB mode (mode 6)). Currently no host has more than 30 VMs at a time.
>
>>> We've also limited storage in QoS to use 10MB/s and 40 IOPS,
>>
>> How did you configure this limit?
>
> In the Data Center tab, I chose our DC and a QoS sub-tab appears, just as
> described here: http://www.ovirt.org/documentation/sla/network-qos/
>
>>> but this issue still happens, which leads me to wonder whether this is
>>> really just an IOPS issue; each host handles about 600 LVs. Could this be
>>> an issue too? I should note that the LVM response times are low under
>>> normal conditions (~1-2 seconds).
>>
>> We recommend limiting the number of LVs per VG to 350. If you have 276
>> disks on ds1, and the disks are using snapshots, you may have too many LVs,
>> which can cause slowdowns in lvm operations.
>>
>> Can you share the output of:
>>
>> vdsm-tool dump-volume-chains 5de4a000-a9c4-489c-8eee-10368647c413
>
> Ok, right now no VG has more than 350 LVs, so I guess this is not currently
> the problem. Unfortunately, I ran the vdsm-tool command but it neither
> finished nor produced any output in about an hour, so I guess it was hung
> and I stopped it. If you confirm this is a normal execution time I can leave
> it running for however long it takes.
>
> host5:~# vgs
>   VG                                   #PV #LV #SN Attr   VSize   VFree
>   0927c050-6fb6-463c-bb37-8b8da641dcd3   1  63   0 wz--n- 499,62g 206,50g
>   5de4a000-a9c4-489c-8eee-10368647c413   1 335   0 wz--n-   2,00t 518,62g
>   b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8   1 215   0 wz--n-   2,00t 495,25g
>   sys_vg                                 1   6   0 wz--n- 136,44g 122,94g
>
>>> I'm attaching the vdsm.log, engine.log and repoplot PDF;
>>
>> This is very useful, thanks. Can you also send the vdsm logs and repoplots
>> from the other hosts for the same timeframe?
>>
>>> if someone could give a hint about some additional problems in them and
>>> shed some light on the above thoughts I'd be very grateful.
>>
>> Do you have sar configured on the host? Having sar logs can reveal more
>> info about this timeframe.
>>
>> Do you have information about the amount of io from the vms during this
>> timeframe? And the amount of io on the storage backend during this
>> timeframe?
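>>
>> (If sysstat is installed, capturing something like the following on the
>> host during the bad timeframe would be enough to see per-device latency and
>> utilization - exact flags can differ slightly between sysstat versions; the
>> await and %util columns are the interesting ones:
>>
>>   sar -d -p 1 60 > sar-$(hostname).log
>>   iostat -xmt 5 30 > iostat-$(hostname).log
>> )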
>
> Not currently, but I'll be alert for the next time this happens, check some
> sar commands related to I/O and provide feedback. Nevertheless, by the time
> I ran the commands above no machine was unresponsive and I'm still getting
> such huge execution times. I tried running iotop now and I see there are 2
> vgck and vgs processes, each reading at a rate of ~500Kb/s:
>
> root     24446  1.3  0.0  57008  6824 ?  D<  09:15  0:00
> /usr/sbin/lvm vgs --config  devices { preferred_names = ["^/dev/mapper/"]
> ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3
> filter = [ 'a|/dev/mapper/36000eb3a4f1acbc20000000000000043|/dev/mapper/36000eb3a4f1acbc200000000000000b9|/dev/mapper/360014056f0dc8930d744f83af8ddc709|', 'r|.*|' ] }
> global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1
> use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --noheadings
> --units b --nosuffix --separator | --ignoreskippedcluster -o
> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
> b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d
>
> I'm also attaching the iostat output for host 5 in a file called
> iostat-host5.
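By the way, the "D<" state in that ps output means the vgs process is in
uninterruptible sleep, i.e. blocked in the kernel waiting for I/O. A quick,
low-impact way to watch for lvm commands stuck like that (just a sketch, run
it on any host while the periodic checks are running) is:

  watch -n 5 "ps -eo state,pid,etime,args | grep '^D.*lvm'"

Commands staying in D state for more than a few seconds point at the storage
path rather than at lvm itself.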
Interesting: all hosts are checking the same storage, but on host5 and host6
the output of vgs and vgck is 10 times bigger - this explains why they take
about 10 times longer to run.

$ ls -lh vgs-*
-rw-r--r--. 1 nsoffer nsoffer 221K Apr 30 08:55 vgs-host1
-rw-r--r--. 1 nsoffer nsoffer 239K Apr 30 08:55 vgs-host2
-rw-r--r--. 1 nsoffer nsoffer 228K Apr 30 08:55 vgs-host3
-rw-r--r--. 1 nsoffer nsoffer 288K Apr 30 08:55 vgs-host4
-rw-r--r--. 1 nsoffer nsoffer 2.2M Apr 30 08:55 vgs-host5
-rw-r--r--. 1 nsoffer nsoffer 2.2M Apr 30 08:55 vgs-host6
-rw-r--r--. 1 nsoffer nsoffer 232K Apr 30 08:55 vgs-host7

$ ls -lh vgck-*
-rw-r--r--. 1 nsoffer nsoffer 238K Apr 30 08:55 vgck-host1
-rw-r--r--. 1 nsoffer nsoffer 254K Apr 30 08:55 vgck-host2
-rw-r--r--. 1 nsoffer nsoffer 244K Apr 30 08:55 vgck-host3
-rw-r--r--. 1 nsoffer nsoffer 298K Apr 30 08:55 vgck-host4
-rw-r--r--. 1 nsoffer nsoffer 2.0M Apr 30 08:55 vgck-host5
-rw-r--r--. 1 nsoffer nsoffer 2.0M Apr 30 08:55 vgck-host6
-rw-r--r--. 1 nsoffer nsoffer 248K Apr 30 08:55 vgck-host7

Can you collect the output of "dmsetup table -v" on hosts 1, 5 and 6?

Nir
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users