On Sun, Mar 13, 2016 at 1:14 PM, Christophe TREFOIS <christophe.tref...@uni.lu> wrote:
> Hi Yaniv,
>
> See my answers / questions below under [CT].
>
> *From:* Yaniv Kaul [mailto:yk...@redhat.com]
> *Sent:* Sunday, 13 March 2016 12:08
> *To:* Christophe TREFOIS <christophe.tref...@uni.lu>
> *Cc:* users <users@ovirt.org>
> *Subject:* Re: [ovirt-users] VM get stuck randomly
>
> On Sun, Mar 13, 2016 at 9:46 AM, Christophe TREFOIS
> <christophe.tref...@uni.lu> wrote:
>
> Dear all,
>
> For a couple of weeks I have had a problem where, at random, one VM
> (not always the same one) becomes completely unresponsive.
> We find this out because our Icinga server complains that the host is
> down.
>
> Upon inspection, we find we can’t open a console to the VM, nor can we
> log in.
>
> I assume 3.6's console feature, or is it Spice/VNC?
>
> *[CT]*
>
> This is 3.5, VNC/Spice yes. Sometimes we can connect, but there’s no
> way to do anything, e.g. type.
>
> In oVirt engine, the VM looks like “up”. The only weird thing is that
> RAM usage shows 0% and CPU usage shows 100% or 75% depending on the
> number of cores.
>
> Any chance there's really something bad going on within the VM?
> Anything in its journal or /var/log/messages or ... depending on the
> OS?
>
> Y.
>
> *[CT]*
>
> *It is possible. It seems to be mostly VMs with Ubuntu 14.04 and the
> latest kernels. I read somewhere (I can’t find it now) that there’s
> perhaps a bug in the 3.x kernel with regards to libvirt / vdsm. But my
> knowledge is too limited to even know where to begin the
> investigation :)*
>
> *In the VM logs, we just see normal VM stuff, then nothing, and then,
> when the VM was rebooted, there are a couple of lines of repeating
> ^@^@^@ characters. But nothing else really.*
>
> *Initially we thought it was a bug with aufs on Docker, but the
> machines getting stuck now don’t run either.*
>
> *From your answer, I deduce that if vdsm or libvirt or spm saw a
> problem with storage / memory / cpu, it would suspend the VM and
> provide that info to ovirt-engine?*
>
> *Since this is not happening, you think it could be related to the
> inside of the VM rather than the oVirt environment, correct?*
>

Either that, or to libvirt/QEMU. I suggest, if possible, upgrading the
components to newer versions first (as Nir suggested).
Y.

>
> *Thank you for your help (especially on a Sunday) :)*
>
> The only way to recover is to force a shutdown of the VM by issuing
> shutdown twice from the engine.
>
> Could you please help me to start debugging this?
> I can provide any logs, but I’m not sure which ones, because I couldn’t
> see anything with ERROR in the vdsm logs on the host.
>
> The host is running
>
> OS Version: RHEL - 7 - 1.1503.el7.centos.2.8
> Kernel Version: 3.10.0 - 229.14.1.el7.x86_64
> KVM Version: 2.1.2 - 23.el7_1.8.1
> LIBVIRT Version: libvirt-1.2.8-16.el7_1.4
> VDSM Version: vdsm-4.16.26-0.el7.centos
> SPICE Version: 0.12.4 - 9.el7_1.3
> GlusterFS Version: glusterfs-3.7.5-1.el7
>
> We use a locally exported gluster as the storage domain (i.e., storage
> is on the same machine, exposed via gluster). No replica.
> We run around 50 VMs on that host.
>
> Thank you for your help in this,
>
> --
> Christophe
>
> _______________________________________________
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users