Hi Sandro,

Thanks for the response.

If I upgrade my datacenter to 4.4.3, will I first need to upgrade my engine? The 
only options I see in the datacenter now are:

[inline image image007.jpg not preserved]

Also, if the datacenter is upgraded, will it still be compatible with my other 
hosts, some of which are running 4.3.3?

Thanks


Anton Louw
Cloud Engineer: Storage and Virtualization
______________________________________
D: 087 805 1572 | M: N/A
A: Rutherford Estate, 1 Scott Street, Waverley, Johannesburg
anton.l...@voxtelecom.co.za

www.vox.co.za



From: Sandro Bonazzola <sbona...@redhat.com>
Sent: 19 November 2020 10:00
To: Anton Louw <anton.l...@voxtelecom.co.za>
Cc: Arik Hadas <aha...@redhat.com>; Dominik Holler <dhol...@redhat.com>; 
users@ovirt.org; Johan Koen <johan.k...@voxtelecom.co.za>
Subject: Re: [ovirt-users] oVirt Node Crash



On Tue, 17 Nov 2020 at 16:01, Anton Louw <anton.l...@voxtelecom.co.za> wrote:

Hi Sandro,

Have you perhaps seen anything in the SOS report that could shed some light on 
the issues?

Sadly no. I see it's oVirt Node 4.3.8. I suggest upgrading to at least 4.3.10, 
and considering upgrading the whole datacenter to 4.4.3.
I had the feeling the watchdog triggered the reboot, but I couldn't find any 
evidence.
I also don't see anything suspicious in the logs.




Thanks



From: Anton Louw
Sent: 16 November 2020 07:30
To: Sandro Bonazzola <sbona...@redhat.com>; Arik Hadas <aha...@redhat.com>; 
Dominik Holler <dhol...@redhat.com>
Cc: users@ovirt.org; Johan Koen <johan.k...@voxtelecom.co.za>
Subject: RE: [ovirt-users] oVirt Node Crash

I have also attached the SOS report as requested

From: Anton Louw
Sent: 16 November 2020 06:54
To: Sandro Bonazzola <sbona...@redhat.com>; Arik Hadas <aha...@redhat.com>; 
Dominik Holler <dhol...@redhat.com>
Cc: users@ovirt.org; Johan Koen <johan.k...@voxtelecom.co.za>
Subject: RE: [ovirt-users] oVirt Node Crash

Hi Sandro,

Thanks for the response. I logged onto oVirt this morning, and I see the node 
is in an “Unassigned” state. I can ping it, but cannot SSH, so something is 
causing the host to be unresponsive.

On Saturday, after I sent the mail, I opened a console to the node and saw 
the following entry before logging in:

audit:backlog limit exceeded

I then tried the solution of increasing the buffer size in the audit.rules file 
in /etc/audit/rules.d/, as shown below, but it did not resolve the issue.

## First rule - delete all
-D

## Increase the buffers to survive stress events.
## Make this bigger for busy systems
-b 8192

## Set failure mode to syslog
-f 1
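
To double-check which values a rules file like the one above sets, a small 
sketch along these lines may help. It is only an illustration: the function 
name parse_audit_rules is hypothetical, and it reads the rules text only. It 
does not show what the running kernel is actually using (auditctl -s reports 
that).

```python
def parse_audit_rules(text):
    """Return the backlog limit (-b) and failure mode (-f) set in rules text.

    Hypothetical helper for illustration; comments and blank lines are skipped,
    and a later setting overrides an earlier one.
    """
    settings = {"backlog_limit": None, "failure_mode": None}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if parts[0] == "-b" and len(parts) > 1:
            settings["backlog_limit"] = int(parts[1])
        elif parts[0] == "-f" and len(parts) > 1:
            settings["failure_mode"] = int(parts[1])
    return settings


# The rules quoted in this mail, copied as a string for the example.
RULES = """\
## First rule - delete all
-D

## Increase the buffers to survive stress events.
## Make this bigger for busy systems
-b 8192

## Set failure mode to syslog
-f 1
"""

print(parse_audit_rules(RULES))
```

If the file on disk parses as expected but the message persists, the rules 
may simply not have been reloaded, or the limit may still be too small for 
the load; the sketch cannot distinguish between those cases.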

Is it possible to upgrade the node to 4.4 while the engine is still on 4.3?

Thanks

From: Sandro Bonazzola <sbona...@redhat.com>
Sent: 13 November 2020 18:39
To: Anton Louw <anton.l...@voxtelecom.co.za>; Arik Hadas <aha...@redhat.com>; 
Dominik Holler <dhol...@redhat.com>
Cc: users@ovirt.org; Johan Koen <johan.k...@voxtelecom.co.za>
Subject: Re: [ovirt-users] oVirt Node Crash



On Fri, 13 Nov 2020 at 17:37, Sandro Bonazzola <sbona...@redhat.com> wrote:


On Fri, 13 Nov 2020 at 13:38, Anton Louw via Users <users@ovirt.org> wrote:

Hi Everybody,

I have built a new host which has been running fine for the last couple of 
days. I noticed today that the host crashed, but it is not giving me a reason 
as to why.

It happened at 13:45 today, but I have included the logs from some time before 
that as well.

Is there something I am missing here?

Not related to the crash, but I see in the logs that the QEMU guest agent is 
not responding on 5 out of 20 guests.

Also, you seem to have some issues with some firewalld rules (maybe +Dominik 
Holler <dhol...@redhat.com> would like to have a look).

I don't see anything explaining why the host got rebooted.

Still related to the guest agent, I find the following lines a bit alarming:
Nov 13 13:29:34 jb2-node03 libvirtd: 2020-11-13 11:29:34.294+0000: 12603: error 
: qemuDomainAgentAvailable:9144 : Guest agent is not responding: QEMU guest 
agent is not connected
Nov 13 13:29:34 jb2-node03 vdsm[13843]: ERROR Shutdown by QEMU Guest Agent 
failed#012Traceback (most recent call last):#012  File 
"/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5304, in 
qemuGuestAgentShutdown#012    
self._dom.shutdownFlags(libvirt.VIR_DOMAIN_SHUTDOWN_GUEST_AGENT)#012  File 
"/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 100, in f#012   
 ret = attr(*args, **kwargs)#012  File 
"/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131, 
in wrapper#012    ret = f(*args, **kwargs)#012  File 
"/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in 
wrapper#012    return func(inst, *args, **kwargs)#012  File 
"/usr/lib64/python2.7/site-packages/libvirt.py", line 2517, in 
shutdownFlags#012    if ret == -1: raise libvirtError 
('virDomainShutdownFlags() failed', dom=self)#012libvirtError: Guest agent is 
not responding: QEMU guest agent is not connected
Nov 13 13:29:42 jb2-node03 kernel: vlan0077: port 11(vnet15) entered disabled 
state
Nov 13 13:29:42 jb2-node03 kernel: device vnet15 left promiscuous mode
Nov 13 13:29:42 jb2-node03 kernel: vlan0077: port 11(vnet15) entered disabled 
state
Nov 13 13:29:42 jb2-node03 NetworkManager[6027]: <info>  [1605266982.6539] 
device (vnet15): state change: disconnected -> unmanaged (reason 'unmanaged', 
sys-iface-state: 'removed')
Nov 13 13:29:42 jb2-node03 NetworkManager[6027]: <info>  [1605266982.6550] 
device (vnet15): released from master device vlan0077
Nov 13 13:29:42 jb2-node03 libvirtd: 2020-11-13 11:29:42.669+0000: 12557: error 
: qemuMonitorIO:718 : internal error: End of file from qemu monitor
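
The vdsm traceback above is hard to read because rsyslog writes each embedded 
newline as the escape "#012". A minimal sketch for turning such a message back 
into separate lines follows. The function name unescape_syslog is 
hypothetical, and SAMPLE is an abbreviated copy of the vdsm entry, not the 
full message.

```python
def unescape_syslog(message):
    """Replace rsyslog's '#012' (octal 012, line feed) escapes with newlines."""
    return message.replace("#012", "\n")


# Abbreviated version of the vdsm log entry quoted above (illustration only).
SAMPLE = (
    "ERROR Shutdown by QEMU Guest Agent failed#012"
    "Traceback (most recent call last):#012"
    "libvirtError: Guest agent is not responding: "
    "QEMU guest agent is not connected"
)

for line in unescape_syslog(SAMPLE).splitlines():
    print(line)
```

When applying this to text copied from a mail, the hard line wraps added by 
the mail client would need to be removed first; the sketch assumes one log 
entry per string.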

+Arik Hadas <aha...@redhat.com>, any clue?

About the crash, can you please provide a full sos report from the host? The 
log you provided is not enough to understand what caused the reported crash.

Also, given python2 is used here, I assume you're on 4.3 or older. I would 
recommend upgrading to 4.4 as soon as practical.






Thanks



_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XMRUDMRBYZKUJQXVPPAEAJIP7N3JPRLY/


--

Sandro Bonazzola

MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV

Red Hat EMEA<https://www.redhat.com/>

sbona...@redhat.com<mailto:sbona...@redhat.com>
[https://static.redhat.com/libs/redhat/brand-assets/2/corp/logo--200.png]<https://www.redhat.com/>
Red Hat respects your work life balance. Therefore there is no need to answer 
this email out of your office hours.
<https://mojo.redhat.com/docs/DOC-1199578>






_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/72EPXMTPM4ZDWLRZQ2N3WW2AXWGZ2IQL/
