Right, still down.  I've run virsh and it doesn't know anything about the
engine VM.

I've restarted the broker and agent services and I still get nothing from
'virsh list'.

In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:

broker.log:

MainThread::INFO::2020-04-08
20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
ovirt-hosted-engine-ha broker 2.3.6 started

MainThread::INFO::2020-04-08
20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Searching for submonitors in
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors

MainThread::INFO::2020-04-08
20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor network

MainThread::INFO::2020-04-08
20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor cpu-load-no-engine

MainThread::INFO::2020-04-08
20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor mgmt-bridge

MainThread::INFO::2020-04-08
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor network

MainThread::INFO::2020-04-08
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor cpu-load

MainThread::INFO::2020-04-08
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor engine-health

MainThread::INFO::2020-04-08
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor mgmt-bridge

MainThread::INFO::2020-04-08
20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor cpu-load-no-engine

MainThread::INFO::2020-04-08
20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor cpu-load

MainThread::INFO::2020-04-08
20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor mem-free

MainThread::INFO::2020-04-08
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor storage-domain

MainThread::INFO::2020-04-08
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor storage-domain

MainThread::INFO::2020-04-08
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor mem-free

MainThread::INFO::2020-04-08
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor engine-health

MainThread::INFO::2020-04-08
20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Finished loading submonitors

MainThread::INFO::2020-04-08
20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
Connecting the storage

MainThread::INFO::2020-04-08
20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Connecting storage server

MainThread::INFO::2020-04-08
20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Connecting storage server

MainThread::INFO::2020-04-08
20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Refreshing the storage domain

MainThread::WARNING::2020-04-08
20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
Can't connect vdsm storage: Command StorageDomain.getInfo with args
{'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action:
(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))

MainThread::INFO::2020-04-08
20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
ovirt-hosted-engine-ha broker 2.3.6 started

MainThread::INFO::2020-04-08
20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Searching for submonitors in
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors

agent.log:

MainThread::ERROR::2020-04-08
20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Trying to restart agent

MainThread::INFO::2020-04-08
20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
Agent shutting down

MainThread::INFO::2020-04-08
20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
ovirt-hosted-engine-ha agent 2.3.6 started

MainThread::INFO::2020-04-08
20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
Found certificate common name: ovirt-node-01.phoelex.com

MainThread::INFO::2020-04-08
20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
Initializing ha-broker connection

MainThread::INFO::2020-04-08
20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
Starting monitor network, options {'tcp_t_address': '', 'network_test':
'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}

MainThread::ERROR::2020-04-08
20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
Failed to start necessary monitors

MainThread::ERROR::2020-04-08
20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker:
[Errno 2] No such file or directory, [monitor: 'network', options:
{'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr':
'192.168.1.99'}]


MainThread::ERROR::2020-04-08
20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Trying to restart agent

MainThread::INFO::2020-04-08
20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
Agent shutting down
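
In case it helps anyone sifting through the same logs: below is a minimal
Python sketch, not part of ovirt-hosted-engine-ha, for pulling the non-INFO
records out of broker.log and agent.log.  It assumes that on disk each record
starts on one line in the form
thread::LEVEL::timestamp::module::line::logger::(function) message (the
excerpts above are wrapped by the mail client) and that traceback lines follow
as continuation lines.  RECORD, parse_records and problems are hypothetical
names of my own.

```python
import re

# Assumed on-disk layout of one record (an assumption based on the excerpts):
# thread::LEVEL::YYYY-MM-DD HH:MM:SS,mmm::module::lineno::logger::(function) message
RECORD = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>\w+)::(?P<lineno>\d+)::"
    r"(?P<logger>[\w.]+)::\((?P<function>\w+)\) ?(?P<message>.*)$"
)


def parse_records(lines):
    """Yield one dict per log record.

    Lines that do not start a new record (traceback frames, wrapped text)
    are appended to the message of the record before them.
    """
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = RECORD.match(line)
        if match:
            if current is not None:
                yield current
            current = match.groupdict()
        elif current is not None and line.strip():
            current["message"] += "\n" + line
    if current is not None:
        yield current


def problems(lines, levels=("WARNING", "ERROR", "CRITICAL")):
    """Return only the records whose level is in `levels`."""
    return [record for record in parse_records(lines) if record["level"] in levels]
```

It can be fed an open file, for example
problems(open('/var/log/ovirt-hosted-engine-ha/agent.log')), and each returned
dict carries the timestamp, function and full message, so the storage warning
and the brokerlink traceback stand out from the submonitor noise.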

On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov <hunter86...@yahoo.com>
wrote:

> On April 8, 2020 7:47:20 PM GMT+03:00, "Maton, Brett" <
> mat...@ltresources.co.uk> wrote:
> >On the host you tried to restart the engine on:
> >
> >Add an alias to virsh (authenticates with virsh_auth.conf)
> >
> >alias virsh='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'
> >
> >Then run virsh:
> >
> >virsh
> >
> >virsh # list
> > Id    Name                           State
> >----------------------------------------------------
> > xx    HostedEngine                   Paused
> > xx    **********                     running
> > ...
> > xx     **********                     running
> >
> >HostedEngine should be in the list, try and resume the engine:
> >
> >virsh # resume HostedEngine
> >
> >On Wed, 8 Apr 2020 at 17:28, Shareef Jalloq <shar...@jalloq.co.uk>
> >wrote:
> >
> >> Thanks!
> >>
> >> The status hangs due to, I guess, the VM being down....
> >>
> >> [root@ovirt-node-01 ~]# hosted-engine --vm-start
> >> VM exists and is down, cleaning up and restarting
> >> VM in WaitForLaunch
> >>
> >> but this doesn't seem to do anything.  OK, after a while I get a status of
> >> it being barfed...
> >>
> >> --== Host ovirt-node-00.phoelex.com (id: 1) status ==--
> >>
> >> conf_on_shared_storage             : True
> >> Status up-to-date                  : False
> >> Hostname                           : ovirt-node-00.phoelex.com
> >> Host ID                            : 1
> >> Engine status                      : unknown stale-data
> >> Score                              : 3400
> >> stopped                            : False
> >> Local maintenance                  : False
> >> crc32                              : 9c4a034b
> >> local_conf_timestamp               : 523362
> >> Host timestamp                     : 523608
> >> Extra metadata (valid at timestamp):
> >> metadata_parse_version=1
> >> metadata_feature_version=1
> >> timestamp=523608 (Wed Apr  8 16:17:11 2020)
> >> host-id=1
> >> score=3400
> >> vm_conf_refresh_time=523362 (Wed Apr  8 16:13:06 2020)
> >> conf_on_shared_storage=True
> >> maintenance=False
> >> state=EngineDown
> >> stopped=False
> >>
> >>
> >> --== Host ovirt-node-01.phoelex.com (id: 2) status ==--
> >>
> >> conf_on_shared_storage             : True
> >> Status up-to-date                  : True
> >> Hostname                           : ovirt-node-01.phoelex.com
> >> Host ID                            : 2
> >> Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
> >> Score                              : 0
> >> stopped                            : False
> >> Local maintenance                  : False
> >> crc32                              : 5045f2eb
> >> local_conf_timestamp               : 1737037
> >> Host timestamp                     : 1737283
> >> Extra metadata (valid at timestamp):
> >> metadata_parse_version=1
> >> metadata_feature_version=1
> >> timestamp=1737283 (Wed Apr  8 16:16:17 2020)
> >> host-id=2
> >> score=0
> >> vm_conf_refresh_time=1737037 (Wed Apr  8 16:12:11 2020)
> >> conf_on_shared_storage=True
> >> maintenance=False
> >> state=EngineUnexpectedlyDown
> >> stopped=False
> >>
> >> On Wed, Apr 8, 2020 at 5:09 PM Maton, Brett <mat...@ltresources.co.uk>
> >> wrote:
> >>
> >>> First steps, on one of your hosts as root:
> >>>
> >>> To get information:
> >>> hosted-engine --vm-status
> >>>
> >>> To start the engine:
> >>> hosted-engine --vm-start
> >>>
> >>>
> >>> On Wed, 8 Apr 2020 at 17:00, Shareef Jalloq <shar...@jalloq.co.uk> wrote:
> >>>
> >>>> So my engine has gone down and I can't ssh into it either.  If I try to
> >>>> log into the web-ui of the node it is running on, I get redirected because
> >>>> the node can't reach the engine.
> >>>>
> >>>> What are my next steps?
> >>>>
> >>>> Shareef.
> >>>> _______________________________________________
> >>>> Users mailing list -- users@ovirt.org
> >>>> To unsubscribe send an email to users-le...@ovirt.org
> >>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >>>> oVirt Code of Conduct:
> >>>> https://www.ovirt.org/community/about/community-guidelines/
> >>>> List Archives:
> >>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/W7BP57OCIRSW5CDRQWR5MIKJUH3ISLCQ/
> >>>>
> >>>
>
> This has to be resolved:
>
> Engine status                      : unknown stale-data
>
> Run 'hosted-engine --vm-status' again. If it remains the same, restart
> ovirt-ha-broker.service and ovirt-ha-agent.service.
>
> Verify that the engine's storage is available, then monitor the broker and
> agent logs in /var/log/ovirt-hosted-engine-ha.
>
> Best Regards,
> Strahil Nikolov
>
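
The stale-data check described above could be scripted roughly as follows.
This is only a sketch: it assumes the plain-text layout of
'hosted-engine --vm-status' shown earlier in the thread, and parse_vm_status
and stale_hosts are hypothetical helper names, not part of any oVirt tool.

```python
import re

# Matches section headers such as:
# --== Host ovirt-node-00.phoelex.com (id: 1) status ==--
HOST_HEADER = re.compile(
    r"^--== Host (?P<hostname>\S+) \(id: (?P<host_id>\d+)\) status ==--$"
)


def parse_vm_status(text):
    """Return {hostname: {field: value}} from the text status report.

    Only "name : value" lines are kept; the key=value lines under
    "Extra metadata" are skipped because they contain no " : " separator.
    """
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HOST_HEADER.match(line)
        if header:
            current = hosts.setdefault(header.group("hostname"), {})
            continue
        if current is None or " : " not in line:
            continue
        key, value = line.split(" : ", 1)
        current[key.strip()] = value.strip()
    return hosts


def stale_hosts(hosts):
    """Return the sorted names of hosts whose reported status is out of date."""
    return sorted(
        name
        for name, fields in hosts.items()
        if fields.get("Status up-to-date") == "False"
        or "stale-data" in fields.get("Engine status", "")
    )
```

Given the status output quoted in this thread, stale_hosts would flag
ovirt-node-00.phoelex.com, which is the host whose broker and agent need
attention before the engine can be scheduled again.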
