Right, still down. I've run virsh and it doesn't know anything about the engine vm.
I've restarted the broker and agent services and I still get nothing in virsh->list.

In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:

broker.log:

MainThread::INFO::2020-04-08 20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-08 20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::INFO::2020-04-08 20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-08 20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-08 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::INFO::2020-04-08 20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors

agent.log:

MainThread::ERROR::2020-04-08 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2020-04-08 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2020-04-08 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-node-01.phoelex.com
MainThread::INFO::2020-04-08 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2020-04-08 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2020-04-08 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]

MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down

On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov <hunter86...@yahoo.com> wrote:

> On April 8, 2020 7:47:20 PM GMT+03:00, "Maton, Brett" <mat...@ltresources.co.uk> wrote:
> >On the host you tried to restart the engine on:
> >
> >Add an alias to virsh (authenticates with virsh_auth.conf)
> >
> >alias virsh='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'
> >
> >Then run virsh:
> >
> >virsh
> >
> >virsh # list
> > Id    Name                           State
> >----------------------------------------------------
> > xx    HostedEngine                   Paused
> > xx    **********                     running
> > ...
> > xx    **********                     running
> >
> >HostedEngine should be in the list, try and resume the engine:
> >
> >virsh # resume HostedEngine
> >
> >On Wed, 8 Apr 2020 at 17:28, Shareef Jalloq <shar...@jalloq.co.uk> wrote:
> >
> >> Thanks!
> >>
> >> The status hangs due to, I guess, the VM being down....
> >>
> >> [root@ovirt-node-01 ~]# hosted-engine --vm-start
> >> VM exists and is down, cleaning up and restarting
> >> VM in WaitForLaunch
> >>
> >> but this doesn't seem to do anything.
> >> OK, after a while I get a status of it being barfed...
> >>
> >> --== Host ovirt-node-00.phoelex.com (id: 1) status ==--
> >>
> >> conf_on_shared_storage             : True
> >> Status up-to-date                  : False
> >> Hostname                           : ovirt-node-00.phoelex.com
> >> Host ID                            : 1
> >> Engine status                      : unknown stale-data
> >> Score                              : 3400
> >> stopped                            : False
> >> Local maintenance                  : False
> >> crc32                              : 9c4a034b
> >> local_conf_timestamp               : 523362
> >> Host timestamp                     : 523608
> >> Extra metadata (valid at timestamp):
> >>     metadata_parse_version=1
> >>     metadata_feature_version=1
> >>     timestamp=523608 (Wed Apr 8 16:17:11 2020)
> >>     host-id=1
> >>     score=3400
> >>     vm_conf_refresh_time=523362 (Wed Apr 8 16:13:06 2020)
> >>     conf_on_shared_storage=True
> >>     maintenance=False
> >>     state=EngineDown
> >>     stopped=False
> >>
> >>
> >> --== Host ovirt-node-01.phoelex.com (id: 2) status ==--
> >>
> >> conf_on_shared_storage             : True
> >> Status up-to-date                  : True
> >> Hostname                           : ovirt-node-01.phoelex.com
> >> Host ID                            : 2
> >> Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
> >> Score                              : 0
> >> stopped                            : False
> >> Local maintenance                  : False
> >> crc32                              : 5045f2eb
> >> local_conf_timestamp               : 1737037
> >> Host timestamp                     : 1737283
> >> Extra metadata (valid at timestamp):
> >>     metadata_parse_version=1
> >>     metadata_feature_version=1
> >>     timestamp=1737283 (Wed Apr 8 16:16:17 2020)
> >>     host-id=2
> >>     score=0
> >>     vm_conf_refresh_time=1737037 (Wed Apr 8 16:12:11 2020)
> >>     conf_on_shared_storage=True
> >>     maintenance=False
> >>     state=EngineUnexpectedlyDown
> >>     stopped=False
> >>
> >> On Wed, Apr 8, 2020 at 5:09 PM Maton, Brett <mat...@ltresources.co.uk> wrote:
> >>
> >>> First steps, on one of your hosts as root:
> >>>
> >>> To get information:
> >>> hosted-engine --vm-status
> >>>
> >>> To start the engine:
> >>> hosted-engine --vm-start
> >>>
> >>>
> >>> On Wed, 8 Apr 2020 at 17:00, Shareef Jalloq <shar...@jalloq.co.uk> wrote:
> >>>
> >>>> So my engine has gone down and I can't ssh into it either. If I try to
> >>>> log into the web-ui of the node it is running on, I get redirected because
> >>>> the node can't reach the engine.
> >>>>
> >>>> What are my next steps?
> >>>>
> >>>> Shareef.
> >>>> _______________________________________________
> >>>> Users mailing list -- users@ovirt.org
> >>>> To unsubscribe send an email to users-le...@ovirt.org
> >>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >>>> oVirt Code of Conduct:
> >>>> https://www.ovirt.org/community/about/community-guidelines/
> >>>> List Archives:
> >>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/W7BP57OCIRSW5CDRQWR5MIKJUH3ISLCQ/
> >>>>
> >>>
>
> This has to be resolved:
>
> Engine status : unknown stale-data
>
> Run again 'hosted-engine --vm-status'. If it remains the same, restart
> ovirt-ha-broker.service & ovirt-ha-agent.service
>
> Verify that the engine's storage is available. Then monitor the broker &
> agent logs in /var/log/ovirt-hosted-engine-ha
>
> Best Regards,
> Strahil Nikolov
>
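
For the "monitor the broker & agent logs" step, it may help to filter out the INFO noise and look only at the WARNING and ERROR entries. Below is a minimal sketch in Python, assuming the log lines follow the "Thread::LEVEL::timestamp::module::lineno::logger::(function) message" layout seen in the broker.log and agent.log excerpts in this thread. The names LOG_LINE, problem_entries and SAMPLE are made up for this example and are not part of ovirt-hosted-engine-ha; the sample lines are copied from the excerpts.

```python
import re

# Assumed layout, inferred from the log excerpts quoted in this thread:
#   Thread::LEVEL::YYYY-MM-DD HH:MM:SS,mmm::module::lineno::logger::(func) message
LOG_LINE = re.compile(
    r"^(?P<thread>[\w-]+)::(?P<level>[A-Z]+)::"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>\w+)::(?P<lineno>\d+)::"
    r"(?P<logger>[\w.]+)::\((?P<func>\w+)\) (?P<message>.*)$"
)


def problem_entries(lines, levels=("WARNING", "ERROR")):
    """Yield (level, timestamp, func, message) for entries at the given levels."""
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        # Lines that do not match (e.g. traceback continuation lines) are skipped.
        if match and match.group("level") in levels:
            yield (
                match.group("level"),
                match.group("timestamp"),
                match.group("func"),
                match.group("message"),
            )


# Three lines taken from the excerpts in this thread (hypothetical test input).
SAMPLE = [
    "MainThread::INFO::2020-04-08 20:56:20,197::storage_backends::373::"
    "ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage",
    "MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) "
    "Failed to start necessary monitors",
    "MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::"
    "ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent",
]

if __name__ == "__main__":
    for level, timestamp, func, message in problem_entries(SAMPLE):
        print(level, timestamp, func, message)
```

To run it against the real files, the list could be swapped for an open file object, e.g. problem_entries(open("/var/log/ovirt-hosted-engine-ha/agent.log")), since the function only needs an iterable of lines.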
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/VXQJNRBNBR4UYH3RFL4DPXA2CMKGYW3F/