Hi! In the last few days I am having problem with Hosted-Engine, it keeps restarting. Sometimes after few minutes, sometimes after few hours.. I haven't done any changes on oVirt or network in that time. Version is 4.4.10.7-1.el8. (this was also installation version)
Here are the logs: Agent.log------------------------------ MainThread::INFO::2022-08-21 09:48:36,200::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineUp (score: 2440) MainThread::INFO::2022-08-21 09:48:36,200::hosted_engine::525::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Best remote host node3.ovirt.example.org (id: 3, score: 2440) MainThread::ERROR::2022-08-21 09:48:46,212::states::398::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Host node3.ovirt.example.org (id 3) score is significantly better than local score, shutting down VM on this host MainThread::INFO::2022-08-21 09:48:46,641::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineUp-EngineStop) sent? ignored MainThread::INFO::2022-08-21 09:48:46,706::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStop (score: 3400) MainThread::INFO::2022-08-21 09:48:46,706::hosted_engine::525::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Best remote host node3.ovirt.example.org (id: 3, score: 3400) MainThread::INFO::2022-08-21 09:48:56,714::hosted_engine::934::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm) Shutting down vm using `/usr/sbin/hosted-engine --vm-shutdown` MainThread::INFO::2022-08-21 09:48:56,871::hosted_engine::941::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm) stdout: MainThread::INFO::2022-08-21 09:48:56,871::hosted_engine::942::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm) stderr: MainThread::ERROR::2022-08-21 09:48:56,871::hosted_engine::950::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm) Engine VM stopped on localhost MainThread::INFO::2022-08-21 09:48:56,880::state_decorators::102::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Timeout set to Sun Aug 21 09:53:56 2022 while transitioning <class 'ovirt_hosted_engine_ha.agent.states.EngineStop'> -> <class 'ovirt_hosted_engine_ha.agent.states.EngineStop'> MainThread::INFO::2022-08-21 09:48:56,959::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStop (score: 3400) MainThread::INFO::2022-08-21 09:49:06,977::states::537::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm not running on local host MainThread::INFO::2022-08-21 09:49:06,983::state_decorators::95::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Timeout cleared while transitioning <class 'ovirt_hosted_engine_ha.agent.states.EngineStop'> -> <class 'ovirt_hosted_engine_ha.agent.states.EngineDown'> MainThread::INFO::2022-08-21 09:49:07,173::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineStop-EngineDown) sent? ignored MainThread::INFO::2022-08-21 09:49:07,795::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineDown (score: 3400) MainThread::INFO::2022-08-21 09:49:16,811::states::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (3400), attempting to start engine VM MainThread::INFO::2022-08-21 09:49:16,998::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineStart) sent? ignored MainThread::INFO::2022-08-21 09:49:17,179::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStart (score: 3400) MainThread::INFO::2022-08-21 09:49:17,195::hosted_engine::895::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Ensuring VDSM state is clear for engine VM MainThread::INFO::2022-08-21 09:49:17,200::hosted_engine::915::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Cleaning state for non-running VM MainThread::INFO::2022-08-21 09:49:18,211::hosted_engine::907::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Vdsm state for VM clean MainThread::INFO::2022-08-21 09:49:18,212::hosted_engine::853::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Starting vm using `/usr/sbin/hosted-engine --vm-start` MainThread::INFO::2022-08-21 09:49:18,814::hosted_engine::862::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stdout: VM in WaitForLaunch MainThread::INFO::2022-08-21 09:49:18,814::hosted_engine::863::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stderr: Command VM.getStats with args {'vmID': 'cc7931ff-8124-4724-9242-abea2ab5bf42'} failed: (code=1, message=Virtual machine does not exist: {'vmId': 'cc7931ff-8124-4724-9242-abea2ab5bf42'}) MainThread::INFO::2022-08-21 09:49:18,814::hosted_engine::875::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Engine VM started on localhost MainThread::INFO::2022-08-21 09:49:18,999::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineStart-EngineStarting) sent? ignored MainThread::INFO::2022-08-21 09:49:19,008::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400) MainThread::INFO::2022-08-21 09:49:29,027::states::741::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up.. MainThread::INFO::2022-08-21 09:49:29,033::state_decorators::102::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Timeout set to Sun Aug 21 09:59:29 2022 while transitioning <class 'ovirt_hosted_engine_ha.agent.states.EngineStarting'> -> <class 'ovirt_hosted_engine_ha.agent.states.EngineStarting'> MainThread::INFO::2022-08-21 09:49:29,109::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400) MainThread::INFO::2022-08-21 09:49:38,121::states::741::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up.. MainThread::INFO::2022-08-21 09:49:38,195::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400) MainThread::INFO::2022-08-21 09:49:48,218::state_decorators::95::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Timeout cleared while transitioning <class 'ovirt_hosted_engine_ha.agent.states.EngineStarting'> -> <class 'ovirt_hosted_engine_ha.agent.states.EngineUp'> MainThread::INFO::2022-08-21 09:49:48,403::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineStarting-EngineUp) sent? ignored MainThread::INFO::2022-08-21 09:49:48,713::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineUp (score: 3400) MainThread::INFO::2022-08-21 09:49:58,725::states::406::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm running on localhost Broker.log------------------------------ Thread-4::INFO::2022-08-21 09:47:59,342::cpu_load_no_engine::142::cpu_load_no_engine.CpuLoadNoEngine::(calculate_load) System load total=0.0241, engine=0.0013, non-engine=0.0228 Thread-3::INFO::2022-08-21 09:48:01,311::mem_free::51::mem_free.MemFree::(action) memFree: 96106 Thread-5::INFO::2022-08-21 09:48:05,612::engine_health::246::engine_health.EngineHealth::(_result_from_stats) VM is up on this host with healthy engine Thread-2::INFO::2022-08-21 09:48:08,591::mgmt_bridge::65::mgmt_bridge.MgmtBridge::(action) Found bridge ovirtmgmt in up state Thread-1::WARNING::2022-08-21 09:48:10,352::network::121::network.Network::(_dns) DNS query failed: ; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> +tries=1 +time=5 +tcp ;; global options: +cmd ;; connection timed out; no servers could be reached Thread-1::WARNING::2022-08-21 09:48:10,352::network::92::network.Network::(action) Failed to verify network status, (4 out of 5) Thread-3::INFO::2022-08-21 09:48:11,389::mem_free::51::mem_free.MemFree::(action) memFree: 96089 Thread-5::INFO::2022-08-21 09:48:15,707::engine_health::246::engine_health.EngineHealth::(_result_from_stats) VM is up on this host with healthy engine Thread-2::INFO::2022-08-21 09:48:18,662::mgmt_bridge::65::mgmt_bridge.MgmtBridge::(action) Found bridge ovirtmgmt in up state Thread-1::WARNING::2022-08-21 09:48:18,879::network::121::network.Network::(_dns) DNS query failed: ; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> +tries=1 +time=5 +tcp ;; global options: +cmd ;; connection timed out; no servers could be reached Thread-3::INFO::2022-08-21 09:48:21,467::mem_free::51::mem_free.MemFree::(action) memFree: 96072 Thread-1::WARNING::2022-08-21 09:48:24,904::network::121::network.Network::(_dns) DNS query failed: ; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> +tries=1 +time=5 +tcp ;; global options: +cmd ;; connection timed out; no servers could be reached Thread-5::INFO::2022-08-21 09:48:25,808::engine_health::246::engine_health.EngineHealth::(_result_from_stats) VM is up on this host with healthy engine Thread-2::INFO::2022-08-21 09:48:28,740::mgmt_bridge::65::mgmt_bridge.MgmtBridge::(action) Found bridge ovirtmgmt in up state Thread-1::WARNING::2022-08-21 09:48:30,416::network::121::network.Network::(_dns) DNS query failed: ; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> +tries=1 +time=5 +tcp ;; global options: +cmd ;; connection timed out; no servers could be reached Thread-1::WARNING::2022-08-21 09:48:30,416::network::92::network.Network::(action) Failed to verify network status, (2 out of 5) Thread-3::INFO::2022-08-21 09:48:31,545::mem_free::51::mem_free.MemFree::(action) memFree: 96064 Thread-5::INFO::2022-08-21 09:48:35,909::engine_health::246::engine_health.EngineHealth::(_result_from_stats) VM is up on this host with healthy engine Thread-1::WARNING::2022-08-21 09:48:35,940::network::121::network.Network::(_dns) DNS query failed: ; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> +tries=1 +time=5 +tcp ;; global options: +cmd ;; connection timed out; no servers could be reached Thread-1::WARNING::2022-08-21 09:48:37,480::network::92::network.Network::(action) Failed to verify network status, (4 out of 5) Thread-2::INFO::2022-08-21 09:48:38,809::mgmt_bridge::65::mgmt_bridge.MgmtBridge::(action) Found bridge ovirtmgmt in up state Thread-3::INFO::2022-08-21 09:48:41,623::mem_free::51::mem_free.MemFree::(action) memFree: 96014 Thread-1::INFO::2022-08-21 09:48:42,549::network::88::network.Network::(action) Successfully verified network status Thread-5::INFO::2022-08-21 09:48:46,011::engine_health::246::engine_health.EngineHealth::(_result_from_stats) VM is up on this host with healthy engine Listener::ERROR::2022-08-21 09:48:46,639::notifications::42::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) (530, b'5.7.1 Authentication required', 'ale...@example.org.hr') At first I thought that it is related to this bugs. https://lists.ovirt.org/archives/list/users@ovirt.org/thread/2HTD5WR43M5MUTEDMM4HRFBADIXEQNB4/ https://bugzilla.redhat.com/show_bug.cgi?id=1984356 But in this oVirt version that bug should already be solved. I was trying to monitor network, but this error keeps happening even if network load is low. I did try to do continuous dig and ping commands on VM-s running on same host as Hosted Engine, and did not have any network problems, not even one connection drop. Any solutions or next steps i should try? _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/GXQUEKBJ5DPNMDJVMRFEE5OJNX2RBEZS/