On Tue, 17 Nov 2020 at 16:01, Anton Louw <anton.l...@voxtelecom.co.za> wrote:
> Hi Sandro,
>
> Have you perhaps seen anything in the SOS report that could shed some
> light on the issues?

Sadly, no. I see it's oVirt Node 4.3.8. I suggest upgrading to at least 4.3.10, and considering upgrading the whole datacenter to 4.4.3.
I had the feeling the watchdog triggered the reboot, but I couldn't find any evidence of it.
I also don't see anything suspicious in the logs.

> Thanks
>
> *Anton Louw*
> *Cloud Engineer: Storage and Virtualization* at *Vox*
> ------------------------------
> *T:* 087 805 0000 | *D:* 087 805 1572
> *M:* N/A
> *E:* anton.l...@voxtelecom.co.za
> *A:* Rutherford Estate, 1 Scott Street, Waverley, Johannesburg
> www.vox.co.za
>
> *From:* Anton Louw
> *Sent:* 16 November 2020 07:30
> *To:* Sandro Bonazzola <sbona...@redhat.com>; Arik Hadas <aha...@redhat.com>; Dominik Holler <dhol...@redhat.com>
> *Cc:* users@ovirt.org; Johan Koen <johan.k...@voxtelecom.co.za>
> *Subject:* RE: [ovirt-users] oVirt Node Crash
>
> I have also attached the SOS report as requested
>
> *From:* Anton Louw
> *Sent:* 16 November 2020 06:54
> *To:* Sandro Bonazzola <sbona...@redhat.com>; Arik Hadas <aha...@redhat.com>; Dominik Holler <dhol...@redhat.com>
> *Cc:* users@ovirt.org; Johan Koen <johan.k...@voxtelecom.co.za>
> *Subject:* RE: [ovirt-users] oVirt Node Crash
>
> Hi Sandro,
>
> Thanks for the response. I logged onto oVirt this morning, and I see the
> node is in an “Unassigned” state. I can ping it, but cannot SSH, so there is
> something that is causing the host to be unresponsive.
>
> On Saturday after I sent the mail, I opened a console to the node, and I
> saw the below entries before logging in:
>
> audit:backlog limit exceeded
>
> I then tried the solution of increasing the buffer size in the audit.rules
> file in /etc/audit/rules.d/, as per below, but it did not resolve the
> issue.
>
> ## First rule - delete all
> -D
>
> ## Increase the buffers to survive stress events.
> ## Make this bigger for busy systems
> -b 8192
>
> ## Set failure mode to syslog
> -f 1
>
> Is it possible to upgrade the node to 4.4 while the engine is still on 4.3?
>
> Thanks
>
> *From:* Sandro Bonazzola <sbona...@redhat.com>
> *Sent:* 13 November 2020 18:39
> *To:* Anton Louw <anton.l...@voxtelecom.co.za>; Arik Hadas <aha...@redhat.com>; Dominik Holler <dhol...@redhat.com>
> *Cc:* users@ovirt.org; Johan Koen <johan.k...@voxtelecom.co.za>
> *Subject:* Re: [ovirt-users] oVirt Node Crash
>
> On Fri, 13 Nov 2020 at 17:37, Sandro Bonazzola <sbona...@redhat.com> wrote:
>
> On Fri, 13 Nov 2020 at 13:38, Anton Louw via Users <users@ovirt.org> wrote:
>
> Hi Everybody,
>
> I have built a new host which has been running fine for the last couple of
> days. I noticed today that the host crashed, but it is not giving me a
> reason as to why.
>
> It happened at 13:45 today, but I have given time before that on the logs
> as well.
>
> Is there something I am missing here?
>
> Not related to the crash, but I see in the logs that 5 out of 20 guests
> have qemu guest agent not responding.
>
> Also you seem to have some issues with some firewalld rules. (Maybe +Dominik
> Holler <dhol...@redhat.com> would like to have a look)
>
> I don't see anything explaining why the host got rebooted.
>
> Still related to the guest agent, I find the following lines a bit alarming:
>
> Nov 13 13:29:34 jb2-node03 libvirtd: 2020-11-13 11:29:34.294+0000: 12603: error : qemuDomainAgentAvailable:9144 : Guest agent is not responding: QEMU guest agent is not connected
> Nov 13 13:29:34 jb2-node03 vdsm[13843]: ERROR Shutdown by QEMU Guest Agent failed#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5304, in qemuGuestAgentShutdown#012 self._dom.shutdownFlags(libvirt.VIR_DOMAIN_SHUTDOWN_GUEST_AGENT)#012 File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 100, in f#012 ret = attr(*args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper#012 ret = f(*args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in wrapper#012 return func(inst, *args, **kwargs)#012 File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2517, in shutdownFlags#012 if ret == -1: raise libvirtError ('virDomainShutdownFlags() failed', dom=self)#012libvirtError: Guest agent is not responding: QEMU guest agent is not connected
> Nov 13 13:29:42 jb2-node03 kernel: vlan0077: port 11(vnet15) entered disabled state
> Nov 13 13:29:42 jb2-node03 kernel: device vnet15 left promiscuous mode
> Nov 13 13:29:42 jb2-node03 kernel: vlan0077: port 11(vnet15) entered disabled state
> Nov 13 13:29:42 jb2-node03 NetworkManager[6027]: <info> [1605266982.6539] device (vnet15): state change: disconnected -> unmanaged (reason 'unmanaged', sys-iface-state: 'removed')
> Nov 13 13:29:42 jb2-node03 NetworkManager[6027]: <info> [1605266982.6550] device (vnet15): released from master device vlan0077
> Nov 13 13:29:42 jb2-node03 libvirtd: 2020-11-13 11:29:42.669+0000: 12557: error : qemuMonitorIO:718 : internal error: End of file from qemu monitor
>
> +Arik Hadas <aha...@redhat.com> any clue?
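A note on reading the vdsm entry quoted above: the `#012` sequences are how rsyslog, with its default control-character escaping, writes a newline (octal 012) when it stores a multi-line message on one line. The Python sketch below shows one way to turn such an entry back into a readable traceback. It is only an illustration: `expand_syslog_escapes` is a hypothetical helper name, the sample is a shortened copy of the vdsm line, and it assumes the default `#` escape prefix is in use on the host.

```python
import re

# rsyslog's default escaping writes a control character as '#' followed by
# three octal digits, e.g. '#012' for a newline.
_ESCAPE = re.compile(r"#([0-7]{3})")


def expand_syslog_escapes(line):
    """Replace '#ooo' escapes of control characters with the real character."""
    def _sub(match):
        code = int(match.group(1), 8)
        # Only undo escapes that stand for control characters; a literal
        # '#123' inside a message is left untouched.
        if code < 32 or code == 127:
            return chr(code)
        return match.group(0)

    return _ESCAPE.sub(_sub, line)


# Shortened copy of the vdsm entry from the thread (middle frames omitted).
sample = (
    "Nov 13 13:29:34 jb2-node03 vdsm[13843]: ERROR Shutdown by QEMU Guest "
    "Agent failed#012Traceback (most recent call last):#012libvirtError: "
    "Guest agent is not responding: QEMU guest agent is not connected"
)

print(expand_syslog_escapes(sample))
```

Run against the full log file line by line, this would show each traceback frame on its own line, which makes it easier to see that the failure is the guest agent shutdown call and not something on the host itself.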
>
> About the crash, can you please provide a full sos report from the host? The
> log you provided is not enough to understand what caused the reported crash.
>
> Also, given python2 is used here, I assume you're on 4.3 or older. I would
> recommend upgrading to 4.4 as soon as practical.
>
> Thanks
>
> *Anton Louw*
> *Cloud Engineer: Storage and Virtualization* at *Vox*
>
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/XMRUDMRBYZKUJQXVPPAEAJIP7N3JPRLY/

--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbona...@redhat.com

*Red Hat respects your work life balance. Therefore there is no need to answer this email out of your office hours. <https://mojo.redhat.com/docs/DOC-1199578>*
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/ALICE2DCOBLTDSDAGJEDM6KY36YKJZTS/