On Thu, Dec 30, 2021 at 8:02 PM Diggy Mc <d...@bornfree.org> wrote:
>
> I have oVirt Node v4.4.8.3 running on several HP ProLiant Gen8 servers. I
> receive the following error under certain circumstances:
> "An Unrecoverable System Error (NMI) has occurred (iLO application
> watchdog timeout NMI, Service Information: 0x0000002B, 0x00000000)"
>
> When a host starts taking a load (but nowhere near a threshold), I
> encounter the above iLO-logged error and the host locks-up. I have had to
> grossly under-utilize my hosts to avoid this problem. I'm hoping for a
> better fix or work-around.
>
> I've had the same problem beginning with my oVirt 4.3.x hosts, so it isn't
> oVirt version specific.
>
> The little information I could find on the error wasn't helpful. Red Hat
> acknowledges the issue, but limited to shutdown/reboot operations; not
> during "normal" operations.
>
> Anyone else experienced this problem? How did you fix it or work around
> it? I'd like to better utilize my servers if possible.
>
> In advance, thank you to anyone and everyone who offers help.
>

NMI errors are usually hardware related or kernel / system related
(e.g. memory failure, hardware health check watchdog, etc.).
They are not oVirt related per se.
That said, I'm seeing an HPE report with the same NMI service code:
https://community.hpe.com/t5/ProLiant-Servers-ML-DL-SL/Proliant-dl360p-gen8An-Unrecoverable-SystemError-NMI-has/td-p/7043891#.YdHHOduxUik

- Gilboa
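
If it helps narrow down the hardware / kernel side, here is a rough sketch (not a fix) of how one might check a host for NMI activity and for loaded watchdog drivers. It assumes a Linux host exposing /proc/interrupts and /proc/modules. The helper names are made up for illustration, and the module list (hpwdt, iTCO_wdt, softdog) is only my guess at which watchdog drivers could be relevant on this hardware.

```python
from pathlib import Path

# Assumption: watchdog kernel modules worth looking for on this kind of host.
WATCHDOG_MODULES = ("hpwdt", "iTCO_wdt", "softdog")


def nmi_counts(interrupts_text):
    """Return per-CPU NMI counts parsed from /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Numeric fields are per-CPU counters; the trailing words are a label.
            return [int(f) for f in fields[1:] if f.isdigit()]
    return []


def loaded_watchdog_modules(modules_text, names=WATCHDOG_MODULES):
    """Return which of the given module names appear in /proc/modules-style text."""
    loaded = {line.split()[0] for line in modules_text.splitlines() if line.strip()}
    return sorted(n for n in names if n in loaded)


if __name__ == "__main__":
    interrupts = Path("/proc/interrupts")
    modules = Path("/proc/modules")
    if interrupts.exists():
        print("NMI counts per CPU:", nmi_counts(interrupts.read_text()))
    if modules.exists():
        print("Watchdog modules loaded:", loaded_watchdog_modules(modules.read_text()))
```

Running it before and after putting load on a host would show whether the NMI counters move and whether a watchdog driver such as hpwdt is actually loaded. It does not say anything about why the iLO watchdog timed out.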
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/MXADE3ZVXA3VNQISODECP5XQEBEUYA4Y/