Hello Christian, just a quick round-up:
Do you still see the issue? It stopped for me after removing live snapshots.

On 17.09.2015 07:39, Christian Hailer wrote:
> Hi,
>
> just to get it straight: most of my VMs had one or more existing snapshots.
> Do you think this is currently a problem? If I understand it correctly,
> Markus's BZ concerns only a short period of time while a snapshot is being
> removed, but my VMs stopped responding in the middle of the night without
> any interaction...
> I deleted all the snapshots, just in case :) My system has been running
> fine for nearly three days now. I'm not quite sure, but I think it helped
> that I changed the HDD and NIC of the Windows 2012 VMs to VirtIO devices...
>
> Best regards, Christian
>
> -----Original Message-----
> From: Daniel Helgenberger [mailto:daniel.helgenber...@m-box.de]
> Sent: Tuesday, 15 September 2015 22:24
> To: Markus Stockhausen <stockhau...@collogia.de>; Christian Hailer
> <christ...@hailer.eu>
> Cc: yd...@redhat.com; users@ovirt.org
> Subject: Re: AW: [ovirt-users] Some VMs in status "not responding" in
> oVirt interface
>
> On 15.09.2015 21:31, Markus Stockhausen wrote:
>> Hi Christian,
>>
>> I am thinking of a package similar to this one:
>>
>> qemu-debuginfo.x86_64 2:2.1.3-10.fc21
>>
>> That allows gdb to show information about backtrace symbols. See
>> comment 12 of https://bugzilla.redhat.com/show_bug.cgi?id=1262251
>> It makes the error search much simpler - especially if qemu hangs.
>
> Markus, thanks for the BZ. I think I do see the same issue. Actually, my
> VM (puppetmaster) is currently the only one with a live snapshot, and it
> does a lot of I/O.
>
> Christian, maybe BZ 1262251 is also applicable in your case?
>
> I'll go ahead and delete the live snapshot. If I see this issue again I
> will submit the trace to your BZ.
>
>> Markus
>>
>> **********************************
>>
>> From: Christian Hailer [christ...@hailer.eu]
>> Sent: Tuesday, 15 September 2015 21:24
>> To: Markus Stockhausen; 'Daniel Helgenberger'
>> Cc: yd...@redhat.com; users@ovirt.org
>> Subject: AW: [ovirt-users] Some VMs in status "not responding" in
>> oVirt interface
>>
>> Hi Markus,
>>
>> gdb is available on CentOS 7, but what do you mean by qemu-debug? I
>> installed qemu-kvm-tools, maybe this is the CentOS equivalent?
>>
>> qemu-kvm-tools.x86_64 : KVM debugging and diagnostics tools
>> qemu-kvm-tools-ev.x86_64 : KVM debugging and diagnostics tools
>> qemu-kvm-tools-rhev.x86_64 : KVM debugging and diagnostics tools
>>
>> Regards, Christian
>>
>> From: Markus Stockhausen [mailto:stockhau...@collogia.de]
>> Sent: Tuesday, 15 September 2015 20:40
>> To: Daniel Helgenberger <daniel.helgenber...@m-box.de>
>> Cc: Christian Hailer <christ...@hailer.eu>; yd...@redhat.com;
>> users@ovirt.org
>> Subject: Re: [ovirt-users] Some VMs in status "not responding" in
>> oVirt interface
>>
>> Do you have a chance to install qemu-debug? If yes, I would try a
>> backtrace:
>>
>> gdb -p <qemu-pid>
>> # bt
>>
>> Markus
>>
>> On 15.09.2015 4:15 PM, Daniel Helgenberger
>> <daniel.helgenber...@m-box.de> wrote:
>>
>> Hello,
>>
>> I do not want to hijack the thread, but maybe my issue is related?
>>
>> It might have started with oVirt 3.5.3, but I cannot tell for sure.
>>
>> For me, one VM (foreman) is affected; this is the second time in 14
>> days. I can confirm this as I also lose any network connection to the
>> VM and the ability to connect a console.
>> Also, the only thing which 'fixes' the issue right now is 'kill -9
>> <pid of qemu-kvm process>'.
>>
>> As far as I can tell, the VM became unresponsive at around Sep 15
>> 12:30:01; the engine logged this at 12:34. Nothing obvious in the VDSM
>> logs (see attached).
>>
>> Below is the engine.log part.
>>
>> Versions:
>> ovirt-engine-3.5.4.2-1.el7.centos.noarch
>> vdsm-4.16.26-0.el7.centos
>> libvirt-1.2.8-16.el7_1.3
>>
>> engine.log (12:00 - 13:00):
>>
>> 2015-09-15 12:03:47,949 INFO
>> [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
>> (DefaultQuartzScheduler_Worker-56) [264d502a] HA reservation status
>> for cluster Default is OK
>> 2015-09-15 12:08:02,708 INFO
>> [org.ovirt.engine.core.bll.OvfDataUpdater]
>> (DefaultQuartzScheduler_Worker-89) [2e7bf56e] Attempting to update
>> VMs/Templates Ovf.
>> 2015-09-15 12:08:02,709 INFO
>> [org.ovirt.engine.core.bll.ProcessOvfUpdateForStoragePoolCommand]
>> (DefaultQuartzScheduler_Worker-89) [5e9f4ba6] Running command:
>> ProcessOvfUpdateForStoragePoolCommand internal: true. Entities
>> affected : ID: 00000002-0002-0002-0002-000000000088 Type: l
>> 2015-09-15 12:08:02,780 INFO
>> [org.ovirt.engine.core.bll.ProcessOvfUpdateForStoragePoolCommand]
>> (DefaultQuartzScheduler_Worker-89) [5e9f4ba6] Lock freed to object
>> EngineLock [exclusiveLocks= key: 00000002-0002-0002-0002-000000000088
>> value: OVF_UPDATE
>> 2015-09-15 12:08:47,997 INFO
>> [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
>> (DefaultQuartzScheduler_Worker-21) [3fc854a2] HA reservation status
>> for cluster Default is OK
>> 2015-09-15 12:13:06,998 INFO
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetFileStatsVDSCommand]
>> (org.ovirt.thread.pool-8-thread-48) [50221cdc] START,
>> GetFileStatsVDSCommand( storagePoolId =
>> 00000002-0002-0002-0002-000000000088, ignoreFailoverLimit = false),
>> log id: 1503968
>> 2015-09-15 12:13:07,137 INFO
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetFileStatsVDSCommand]
>> (org.ovirt.thread.pool-8-thread-48) [50221cdc] FINISH,
>> GetFileStatsVDSCommand, return:
>> {pfSense-2.0-RELEASE-i386.iso={status=0, ctime=1432286887.0,
>> size=115709952}, Fedora-15-i686-Live8
>> 2015-09-15 12:13:07,178 INFO
>> [org.ovirt.engine.core.bll.IsoDomainListSyncronizer]
>> (org.ovirt.thread.pool-8-thread-48) [50221cdc] Finished automatic
>> refresh process for ISO file type with success, for storage domain id
>> 84dcb2fc-fb63-442f-aa77-3e84dc7d5a72.
>> 2015-09-15 12:13:48,043 INFO
>> [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
>> (DefaultQuartzScheduler_Worker-87) [4fa1bb16] HA reservation status
>> for cluster Default is OK
>> 2015-09-15 12:18:48,088 INFO
>> [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
>> (DefaultQuartzScheduler_Worker-44) [6345e698] HA reservation status
>> for cluster Default is OK
>> 2015-09-15 12:23:48,137 INFO
>> [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
>> (DefaultQuartzScheduler_Worker-13) HA reservation status for cluster
>> Default is OK
>> 2015-09-15 12:28:48,183 INFO
>> [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
>> (DefaultQuartzScheduler_Worker-76) [154c91d5] HA reservation status
>> for cluster Default is OK
>> 2015-09-15 12:33:48,229 INFO
>> [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
>> (DefaultQuartzScheduler_Worker-36) [27c73ac6] HA reservation status
>> for cluster Default is OK
>> 2015-09-15 12:34:49,432 INFO
>> [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
>> (DefaultQuartzScheduler_Worker-41) [5f2a4b68] VM foreman
>> 8b57ff1d-2800-48ad-b267-fd8e9e2f6fb2 moved from Up --> NotResponding
>> 2015-09-15 12:34:49,578 WARN
>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> (DefaultQuartzScheduler_Worker-41) [5f2a4b68] Correlation ID: null,
>> Call Stack: null, Custom Event ID: -1, Message: VM foreman is not
>> responding.
>> 2015-09-15 12:38:48,273 INFO
>> [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
>> (DefaultQuartzScheduler_Worker-10) [7a800766] HA reservation status
>> for cluster Default is OK
>> 2015-09-15 12:43:48,320 INFO
>> [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
>> (DefaultQuartzScheduler_Worker-42) [440f1c40] HA reservation status
>> for cluster Default is OK
>> 2015-09-15 12:48:48,366 INFO
>> [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
>> (DefaultQuartzScheduler_Worker-70) HA reservation status for cluster
>> Default is OK
>> 2015-09-15 12:53:48,412 INFO
>> [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
>> (DefaultQuartzScheduler_Worker-12) [50221cdc] HA reservation status
>> for cluster Default is OK
>> 2015-09-15 12:58:48,459 INFO
>> [org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
>> (DefaultQuartzScheduler_Worker-3) HA reservation status for cluster
>> Default is OK
>>
>> On 29.08.2015 22:48, Christian Hailer wrote:
>>
>>> Hello,
>>>
>>> last Wednesday I wanted to update my oVirt 3.5 hypervisor. It is a
>>> single CentOS 7 server, so I started by suspending the VMs in order
>>> to set the oVirt engine host to maintenance mode. During the process
>>> of suspending the VMs the server crashed, kernel panic…
>>>
>>> After restarting the server I installed the updates via yum and
>>> restarted the server again. Afterwards, all the VMs could be started
>>> again. Some hours later my monitoring system registered some
>>> unresponsive hosts; I had a look in the oVirt interface, and 3 of
>>> the VMs were in the state “not responding”, marked by a question
>>> mark.
>>>
>>> I tried to shut down the VMs, but oVirt wasn’t able to do so.
>>> I tried to reset the status in the database with the SQL statement
>>>
>>> update vm_dynamic set status = 0
>>>   where vm_guid = (select vm_guid from vm_static
>>>                    where vm_name = 'MYVMNAME');
>>>
>>> but that didn’t help, either. Only rebooting the whole hypervisor
>>> helped… afterwards everything worked again. But only for a few
>>> hours; then one of the VMs entered the “not responding” state
>>> again… again, only a reboot helped. Yesterday it happened again:
>>>
>>> 2015-08-28 17:44:22,664 INFO
>>> [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
>>> (DefaultQuartzScheduler_Worker-60) [4ef90b12] VM DC
>>> 0f3d1f06-e516-48ce-aa6f-7273c33d3491 moved from Up --> NotResponding
>>>
>>> 2015-08-28 17:44:22,692 WARN
>>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>> (DefaultQuartzScheduler_Worker-60) [4ef90b12] Correlation ID: null,
>>> Call Stack: null, Custom Event ID: -1, Message: VM DC is not
>>> responding.
>>>
>>> Does anybody know what I can do? Where should I have a look? Hints
>>> are greatly appreciated!
>>>
>>> Thanks,
>>>
>>> Christian

--
Daniel Helgenberger
m box bewegtbild GmbH

P: +49/30/2408781-22
F: +49/30/2408781-10

ACKERSTR. 19
D-10115 BERLIN

www.m-box.de  www.monkeymen.tv

Geschäftsführer: Martin Retschitzegger / Michaela Göllner
Handelsregister: Amtsgericht Charlottenburg / HRB 112767

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
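Markus's backtrace suggestion above can also be captured non-interactively, so the output can be attached to a BZ in one step. A minimal sketch, assuming gdb and the matching qemu debuginfo package are installed on the host; the output path is just an example:

```shell
# Grab a full backtrace from the (possibly hung) qemu-kvm process in
# batch mode. Attaching briefly stops the process, so use with care on
# a production VM.
pid=$(pgrep -o qemu-kvm || true)      # oldest qemu-kvm process, if any
if [ -n "$pid" ]; then
    gdb -batch -p "$pid" -ex 'thread apply all bt' \
        > "/tmp/qemu-bt-$pid.txt" 2>&1
    echo "backtrace written to /tmp/qemu-bt-$pid.txt"
else
    echo "no qemu-kvm process running"
fi
```

With several VMs on the host, replace `pgrep -o qemu-kvm` with a match on the affected VM's name from the qemu command line.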
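Daniel's 'kill -9' workaround can be narrowed to the one affected VM instead of guessing PIDs. A sketch, assuming the VM name appears after -name on the qemu-kvm command line (it does when libvirt starts the guest, though the exact argument format varies by version); "foreman" is just the example VM from this thread:

```shell
# Force-kill only the qemu-kvm process backing a single unresponsive VM.
# Last resort: the guest dies as if powered off.
vm_name="foreman"   # example VM from this thread
# The [q] bracket keeps pgrep -f from matching this script's own command line.
pid=$(pgrep -of "[q]emu-kvm.*-name.*${vm_name}" || true)
if [ -n "$pid" ]; then
    kill -9 "$pid"
    echo "killed qemu-kvm pid $pid for VM $vm_name"
else
    echo "no qemu-kvm process found for VM $vm_name"
fi
```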
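Before resorting to kill -9, it can be worth checking whether libvirt itself still answers on the host. A sketch using a read-only connection, so it is safe on a production node; the 10-second timeout is an arbitrary choice:

```shell
# If even a read-only "virsh list" hangs, the qemu monitor (or libvirtd)
# is stuck, which matches the NotResponding state the engine reports.
status_msg=$( (timeout 10 virsh -r list --all >/dev/null 2>&1 \
    && echo "libvirt responded") || echo "virsh timed out or failed" )
echo "$status_msg"
```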