Hmmm, we're not using IPv6.  Is that the issue?
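
Looking at the quoted log again: 64:ff9b::/96 is the well-known NAT64 prefix (RFC 6052), and the low 32 bits of 64:ff9b::c0a8:13d are 192.168.1.61.  So the extra address looks like a synthesized DNS64 answer rather than anything configured on the host, although that is a guess from the log and not something I've confirmed.  A rough Python 3 sketch to check that reading follows.  The set difference is only my simplification of what the validator appears to do, not the actual ovirt_setup_lib code, and embedded_ipv4 is a made-up helper name:

```python
import ipaddress

# Well-known NAT64/DNS64 prefix defined in RFC 6052.
NAT64_PREFIX = ipaddress.ip_network("64:ff9b::/96")

# Values copied from the hosted-engine setup log quoted in this thread.
resolved = {"64:ff9b::c0a8:13d", "192.168.1.61"}
local = {"192.168.1.61", "fe80::ae1f:6bff:febc:326a"}


def embedded_ipv4(addr):
    """Hypothetical helper: return the IPv4 address embedded in a
    64:ff9b::/96 address, or None if addr is not in that prefix."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip in NAT64_PREFIX:
        # With a /96 prefix the IPv4 address is the low 32 bits.
        return ipaddress.IPv4Address(int(ip) & 0xFFFFFFFF)
    return None


# Assumption: the check fails when a resolved address is not on a local device.
unmapped = sorted(resolved - local)
print(unmapped)
for addr in unmapped:
    print(addr, "->", embedded_ipv4(addr))
```

If that holds, the only address the check trips over is the IPv6 one, and it is just 192.168.1.61 wrapped in the NAT64 prefix.  That would point at whatever resolver the host is using handing back DNS64 answers, since the dig output in the log only shows the A record.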

On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86...@yahoo.com>
wrote:

> On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq <
> shar...@jalloq.co.uk> wrote:
> >Right, I've given up on recovering the HE so want to try and redeploy it.
> >There doesn't seem to be enough information to debug why the broker/agent
> >won't start cleanly.
> >
> >In running 'hosted-engine --deploy', I'm seeing the following error in the
> >setup validation phase:
> >
> >2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human
> >dialog.__logString:204 DIALOG:SEND                 Please provide the
> >hostname of this host on the management network
> >[ovirt-node-00.phoelex.com]:
> >
> >
> >2020-04-14 09:46:12,831+0000 DEBUG
> >otopi.plugins.gr_he_common.network.bridge
> >hostname.getResolvedAddresses:432
> >getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
> >
> >2020-04-14 09:46:12,832+0000 DEBUG
> >otopi.plugins.gr_he_common.network.bridge
> >hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com
> >resolves
> >to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
> >
> >2020-04-14 09:46:12,832+0000 DEBUG
> >otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813
> >execute:
> >['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com',
> >'ANY'],
> >executable='None', cwd='None', env=None
> >
> >2020-04-14 09:46:12,871+0000 DEBUG
> >otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863
> >execute-result: ['/usr/bin/dig', '+noall', '+answer', '
> >ovirt-node-00.phoelex.com', 'ANY'], rc=0
> >
> >2020-04-14 09:46:12,872+0000 DEBUG
> >otopi.plugins.gr_he_common.network.bridge plugin.execute:921
> >execute-output: ['/usr/bin/dig', '+noall', '+answer', '
> >ovirt-node-00.phoelex.com', 'ANY'] stdout:
> >
> >ovirt-node-00.phoelex.com. 86400 IN     A       192.168.1.61
> >
> >
> >2020-04-14 09:46:12,872+0000 DEBUG
> >otopi.plugins.gr_he_common.network.bridge plugin.execute:926
> >execute-output: ['/usr/bin/dig', '+noall', '+answer', '
> >ovirt-node-00.phoelex.com', 'ANY'] stderr:
> >
> >
> >
> >2020-04-14 09:46:12,872+0000 DEBUG
> >otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813
> >execute:
> >('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None
> >
> >2020-04-14 09:46:12,876+0000 DEBUG
> >otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863
> >execute-result: ('/usr/sbin/ip', 'addr'), rc=0
> >
> >2020-04-14 09:46:12,876+0000 DEBUG
> >otopi.plugins.gr_he_common.network.bridge plugin.execute:921
> >execute-output: ('/usr/sbin/ip', 'addr') stdout:
> >
> >1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
> >group
> >default qlen 1000
> >
> >    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> >
> >    inet 127.0.0.1/8 scope host lo
> >
> >       valid_lft forever preferred_lft forever
> >
> >    inet6 ::1/128 scope host
> >
> >       valid_lft forever preferred_lft forever
> >
> >2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
> >ovirtmgmt state UP group default qlen 1000
> >
> >    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
> >
> >3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state
> >DOWN
> >group default qlen 1000
> >
> >    link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
> >
> >4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
> >group
> >default qlen 1000
> >
> >    link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
> >
> >5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
> >default qlen 1000
> >
> >    link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
> >
> >21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
> >state UP group default qlen 1000
> >
> >    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
> >
> >    inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
> >
> >       valid_lft forever preferred_lft forever
> >
> >    inet6 fe80::ae1f:6bff:febc:326a/64 scope link
> >
> >       valid_lft forever preferred_lft forever
> >
> >22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
> >group
> >default qlen 1000
> >
> >    link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff
> >
> >
> >2020-04-14 09:46:12,876+0000 DEBUG
> >otopi.plugins.gr_he_common.network.bridge plugin.execute:926
> >execute-output: ('/usr/sbin/ip', 'addr') stderr:
> >
> >
> >
> >2020-04-14 09:46:12,877+0000 DEBUG
> >otopi.plugins.gr_he_common.network.bridge
> >hostname.getLocalAddresses:251
> >addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']
> >
> >2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception
> >Traceback (most recent call last):
> >  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
> >    not_local_text,
> >  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
> >    addresses=resolvedAddressesAsString
> >RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
> >
> >2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
> >
> >The node I'm running on has an IP address of 192.168.1.61 and resolves
> >correctly.
> >
> >On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shar...@jalloq.co.uk>
> >wrote:
> >
> >> Where should I be checking if there are any files/folders not owned by
> >> vdsm:kvm?  I checked on the mount the HA sits on and it's fine.
> >>
> >> How would I go about checking vdsm can access those images?  If I run
> >> virsh, it lists them and they were running yesterday even though the HA
> >> was down.  I've since restarted both hosts but the broker is still
> >> spitting out the same error (copied below).  How do I find the reason the
> >> broker can't connect to the storage?  The conf file is already at DEBUG
> >> verbosity:
> >>
> >> [handler_logfile]
> >>
> >> class=logging.handlers.TimedRotatingFileHandler
> >>
> >> args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
> >>
> >> level=DEBUG
> >>
> >> formatter=long
> >>
> >> And what are all these .prob-<num> files that are being created?  There
> >> are over 250K of them now on the mount I'm using for the Data domain.
> >> They're all of 0 size and of the form
> >> /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9
> >>
> >> @eevans:  The volume I have the Data Domain on has TBs free.  The HA is
> >> dead so I can't ssh in.  No idea what started these errors and the other
> >> VMs were still running happily although they're on a different Data
> >> Domain.
> >>
> >> Shareef.
> >>
> >> MainThread::INFO::2020-04-10 07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
> >>
> >> MainThread::INFO::2020-04-10 07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
> >>
> >> MainThread::INFO::2020-04-10 07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
> >>
> >> MainThread::INFO::2020-04-10 07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
> >>
> >> MainThread::WARNING::2020-04-10 07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
> >>
> >> (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
> >>
> >> On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov <hunter86...@yahoo.com>
> >> wrote:
> >>
> >>> On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq <
> >>> shar...@jalloq.co.uk> wrote:
> >>> >OK, let's go through this.  I'm looking at the node that at least still
> >>> >has some VMs running.  virsh also tells me that the HostedEngine VM is
> >>> >running but it's unresponsive and I can't shut it down.
> >>> >
> >>> >1. All storage domains exist and are mounted.
> >>> >2. The ha_agent exists:
> >>> >
> >>> >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
> >>> >
> >>> >dom_md  ha_agent  images  master
> >>> >
> >>> >3. There are two links:
> >>> >
> >>> >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
> >>> >
> >>> >total 8
> >>> >
> >>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.lockspace -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
> >>> >
> >>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.metadata -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4
> >>> >
> >>> >4. The services exist but all seem to have some sort of warning:
> >>> >
> >>> >a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: *2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec*
> >>> >
> >>> >b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: *failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory*
> >>> >
> >>> >c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: *ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?*
> >>> >
> >>> >d) Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data
> >>> >
> >>> >Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
> >>> >
> >>> >Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
> >>> >
> >>> >Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
> >>> >
> >>> >5 & 6.  The broker log is continually printing this error:
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
> >>> >ovirt-hosted-engine-ha broker 2.3.6 started
> >>> >
> >>> >MainThread::DEBUG::2020-04-09
> >>>
> >>>
>
> >>08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
> >>> >Running broker
> >>> >
> >>> >MainThread::DEBUG::2020-04-09
> >>>
> >>>
>
> >>08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor)
> >>> >Starting monitor
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >Searching for submonitors in
> >>> >/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker
> >>> >
> >>> >/submonitors
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >Loaded submonitor network
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >Loaded submonitor cpu-load-no-engine
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >Loaded submonitor mgmt-bridge
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >Loaded submonitor network
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >Loaded submonitor cpu-load
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >Loaded submonitor engine-health
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >Loaded submonitor mgmt-bridge
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >Loaded submonitor cpu-load-no-engine
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >Loaded submonitor cpu-load
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >Loaded submonitor mem-free
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >Loaded submonitor storage-domain
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >Loaded submonitor storage-domain
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >Loaded submonitor mem-free
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >Loaded submonitor engine-health
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >Finished loading submonitors
> >>> >
> >>> >MainThread::DEBUG::2020-04-09
> >>>
> >>>
>
> >>08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker)
> >>> >Starting storage broker
> >>> >
> >>> >MainThread::DEBUG::2020-04-09
> >>>
> >>>
>
> >>08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
> >>> >Connecting to VDSM
> >>> >
> >>> >MainThread::DEBUG::2020-04-09
> >>>
> >>>
>
> >>08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug)
> >>> >Creating a new json-rpc connection to VDSM
> >>> >
> >>> >Client localhost:54321::DEBUG::2020-04-09
> >>> >08:07:31,453::concurrent::258::root::(run) START thread
> ><Thread(Client
> >>> >localhost:54321, started daemon 139992488138496)> (func=<bound
> >method
> >>> >Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor
> >object at
> >>> >0x7f528acabc90>>, args=(), kwargs={})
> >>> >
> >>> >Client localhost:54321::DEBUG::2020-04-09
> >>>
> >>>
>
> >>08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected)
> >>> >Stomp connection established
> >>> >
> >>> >MainThread::DEBUG::2020-04-09
> >>> >08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send)
> >Sending
> >>> >response
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
> >>> >Connecting the storage
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >>> >Connecting storage server
> >>> >
> >>> >MainThread::DEBUG::2020-04-09
> >>> >08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send)
> >Sending
> >>> >response
> >>> >
> >>> >MainThread::DEBUG::2020-04-09
> >>> >08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send)
> >Sending
> >>> >response
> >>> >
> >>> >MainThread::DEBUG::2020-04-09
> >>>
> >>>
>
> >>08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path)
> >>> >Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not
> >available
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >>> >Connecting storage server
> >>> >
> >>> >MainThread::DEBUG::2020-04-09
> >>> >08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send)
> >Sending
> >>> >response
> >>> >
> >>> >MainThread::DEBUG::2020-04-09
> >>>
> >>>
>
> >>08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >>> >[{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
> >>> >
> >>> >MainThread::INFO::2020-04-09
> >>>
> >>>
>
> >>08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >>> >Refreshing the storage domain
> >>> >
> >>> >MainThread::DEBUG::2020-04-09
> >>> >08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send)
> >Sending
> >>> >response
> >>> >
> >>> >MainThread::DEBUG::2020-04-09
> >>>
> >>>
>
> >>08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >>> >Error refreshing storage domain: Command StorageDomain.getStats
> >with
> >>> >args
> >>> >{'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
> >>> >
> >>> >(code=350, message=Error in storage domain action:
> >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
> >>> >
> >>> >MainThread::DEBUG::2020-04-09
> >>> >08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send)
> >Sending
> >>> >response
> >>> >
> >>> >MainThread::DEBUG::2020-04-09
> >>>
> >>>
>
> >>08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size)
> >>> >Command StorageDomain.getInfo with args {'storagedomainID':
> >>> >'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
> >>> >
> >>> >(code=350, message=Error in storage domain action:
> >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
> >>> >
> >>> >MainThread::WARNING::2020-04-09
> >>>
> >>>
>
> >>08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
> >>> >Can't connect vdsm storage: Command StorageDomain.getInfo with args
> >>> >{'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
> >>> >
> >>> >(code=350, message=Error in storage domain action:
> >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
> >>> >
> >>> >
> >>> >The UUID it is moaning about is indeed the one that the HA sits on
> >and
> >>> >is
> >>> >the one I listed the contents of in step 2 above.
> >>> >
> >>> >
> >>> >So why can't it see this domain?
> >>> >
> >>> >
> >>> >Thanks, Shareef.
> >>> >
> >>> >On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov
> ><hunter86...@yahoo.com>
> >>> >wrote:
> >>> >
> >>> >> On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq <
> >>> >> shar...@jalloq.co.uk> wrote:
> >>> >> >Don't know if this is useful or not, but I just tried to shut down and
> >>> >> >start another VM on one of the hosts and got the following error:
> >>> >> >
> >>> >> >virsh # start scratch
> >>> >> >
> >>> >> >error: Failed to start domain scratch
> >>> >> >
> >>> >> >error: Network not found: no network with matching name 'vdsm-ovirtmgmt'
> >>> >> >
> >>> >> >Is this not referring to the interface name, as the network is called
> >>> >> >'ovirtmgmt'?
> >>> >> >
> >>> >> >On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq <shar...@jalloq.co.uk>
> >>> >> >wrote:
> >>> >> >
> >>> >> >> Hmmm, virsh tells me the HE is running but it hasn't come up and the
> >>> >> >> agent.log is full of the same errors.
> >>> >> >>
> >>> >> >> On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq <shar...@jalloq.co.uk>
> >>> >> >> wrote:
> >>> >> >>
> >>> >> >>> Ah hah!  Ok, so I've managed to start it using virsh on the second host
> >>> >> >>> but my first host is still dead.
> >>> >> >>>
> >>> >> >>> First of all, what are these 56,317 .prob- files that get dumped to the
> >>> >> >>> NFS mounts?
> >>> >> >>>
> >>> >> >>> Secondly, why doesn't the node mount the NFS directories at boot?  Is
> >>> >> >>> that the issue with this particular node?
> >>> >> >>>
> >>> >> >>> On Wed, Apr 8, 2020 at 11:12 PM <eev...@digitaldatatechs.com> wrote:
> >>> >> >>>
> >>> >> >>>> Did you try virsh list --inactive
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>> Eric Evans
> >>> >> >>>>
> >>> >> >>>> Digital Data Services LLC.
> >>> >> >>>>
> >>> >> >>>> 304.660.9080
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>> *From:* Shareef Jalloq <shar...@jalloq.co.uk>
> >>> >> >>>> *Sent:* Wednesday, April 8, 2020 5:58 PM
> >>> >> >>>> *To:* Strahil Nikolov <hunter86...@yahoo.com>
> >>> >> >>>> *Cc:* Ovirt Users <users@ovirt.org>
> >>> >> >>>> *Subject:* [ovirt-users] Re: ovirt-engine unresponsive - how to rescue?
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>> I've now shut down the VMs on one host and rebooted it but the agent
> >>> >> >>>> service doesn't start.  If I run 'hosted-engine --vm-status' I get:
> >>> >> >>>>
> >>> >> >>>> The hosted engine configuration has not been retrieved from shared
> >>> >> >>>> storage. Please ensure that ovirt-ha-agent is running and the storage
> >>> >> >>>> server is reachable.
> >>> >> >>>>
> >>> >> >>>> and indeed if I list the mounts under /rhev/data-center/mnt, only one of
> >>> >> >>>> the directories is mounted.  I have 3 NFS mounts, one ISO Domain and two
> >>> >> >>>> Data Domains.  Only one Data Domain has mounted and this has lots of .prob
> >>> >> >>>> files in.  So why haven't the other NFS exports been mounted?
> >>> >> >>>>
> >>> >> >>>> Manually mounting them doesn't seem to have helped much either.  I can
> >>> >> >>>> start the broker service but the agent service says no.  Same error as the
> >>> >> >>>> one in my last email.
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>> Shareef.
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>> On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq <shar...@jalloq.co.uk>
> >>> >> >>>> wrote:
> >>> >> >>>>
> >>> >> >>>> Right, still down.  I've run virsh and it doesn't know anything about
> >>> >> >>>> the engine vm.
> >>> >> >>>>
> >>> >> >>>> I've restarted the broker and agent services and I still get nothing in
> >>> >> >>>> virsh->list.
> >>> >> >>>>
> >>> >> >>>> In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>> broker.log:
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
> >>> >> >>>> ovirt-hosted-engine-ha broker 2.3.6 started
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >> >>>> Searching for submonitors in
> >>> >> >>>>
> >>> >>
> >>>
> >>>
>
> >>>/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >> >>>> Loaded submonitor network
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >> >>>> Loaded submonitor cpu-load-no-engine
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >> >>>> Loaded submonitor mgmt-bridge
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >> >>>> Loaded submonitor network
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >> >>>> Loaded submonitor cpu-load
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >> >>>> Loaded submonitor engine-health
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >> >>>> Loaded submonitor mgmt-bridge
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >> >>>> Loaded submonitor cpu-load-no-engine
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >> >>>> Loaded submonitor cpu-load
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >> >>>> Loaded submonitor mem-free
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >> >>>> Loaded submonitor storage-domain
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >> >>>> Loaded submonitor storage-domain
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >> >>>> Loaded submonitor mem-free
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >> >>>> Loaded submonitor engine-health
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >> >>>> Finished loading submonitors
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
> >>> >> >>>> Connecting the storage
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >>> >> >>>> Connecting storage server
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >>> >> >>>> Connecting storage server
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >>> >> >>>> Refreshing the storage domain
> >>> >> >>>>
> >>> >> >>>> MainThread::WARNING::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
> >>> >> >>>> Can't connect vdsm storage: Command StorageDomain.getInfo
> >with
> >>> >args
> >>> >> >>>> {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'}
> >>> >failed:
> >>> >> >>>>
> >>> >> >>>> (code=350, message=Error in storage domain action:
> >>> >> >>>> (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
> >>> >> >>>> ovirt-hosted-engine-ha broker 2.3.6 started
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08
> >>> >> >>>>
> >>> >>
> >>> >>
> >>>
> >>>
>
> >>>20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >> >>>> Searching for submonitors in
> >>> >> >>>>
> >>> >>
> >>>
> >>>
>
> >>>/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>> agent.log:
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>> MainThread::ERROR::2020-04-08 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-node-01.phoelex.com
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
> >>> >> >>>>
> >>> >> >>>> MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
> >>> >> >>>>
> >>> >> >>>> MainThread::ERROR::2020-04-08 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
> >>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
> >>> >> >>>>     return action(he)
> >>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
> >>> >> >>>>     return he.start_monitoring()
> >>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
> >>> >> >>>>     self._initialize_broker()
> >>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
> >>> >> >>>>     m.get('options', {}))
> >>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
> >>> >> >>>>     ).format(t=type, o=options, e=e)
> >>> >> >>>> RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>> MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>> On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov <hunter86...@yahoo.com> wrote:
> >>> >> >>>>
> >>> >> >>>> On April 8, 2020 7:47:20 PM GMT+03:00, "Maton, Brett" <
> >>> >> >>>> mat...@ltresources.co.uk> wrote:
> >>> >> >>>> >On the host you tried to restart the engine on:
> >>> >> >>>> >
> >>> >> >>>> >Add an alias to virsh (authenticates with virsh_auth.conf)
> >>> >> >>>> >
> >>> >> >>>> >alias virsh='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'
> >>> >> >>>> >
> >>> >> >>>> >Then run virsh:
> >>> >> >>>> >
> >>> >> >>>> >virsh
> >>> >> >>>> >
> >>> >> >>>> >virsh # list
> >>> >> >>>> > Id    Name                           State
> >>> >> >>>> >----------------------------------------------------
> >>> >> >>>> > xx    HostedEngine                   Paused
> >>> >> >>>> > xx    **********                     running
> >>> >> >>>> > ...
> >>> >> >>>> > xx     **********                     running
> >>> >> >>>> >
> >>> >> >>>> >HostedEngine should be in the list, try and resume the engine:
> >>> >> >>>> >
> >>> >> >>>> >virsh # resume HostedEngine
> >>> >> >>>> >
> >>> >> >>>> >On Wed, 8 Apr 2020 at 17:28, Shareef Jalloq <shar...@jalloq.co.uk> wrote:
> >>> >> >>>> >
> >>> >> >>>> >> Thanks!
> >>> >> >>>> >>
> >>> >> >>>> >> The status hangs due to, I guess, the VM being down....
> >>> >> >>>> >>
> >>> >> >>>> >> [root@ovirt-node-01 ~]# hosted-engine --vm-start
> >>> >> >>>> >> VM exists and is down, cleaning up and restarting
> >>> >> >>>> >> VM in WaitForLaunch
> >>> >> >>>> >>
> >>> >> >>>> >> but this doesn't seem to do anything.  OK, after a while I get a
> >>> >> >>>> >> status of it being barfed...
> >>> >> >>>> >>
> >>> >> >>>> >> --== Host ovirt-node-00.phoelex.com (id: 1) status ==--
> >>> >> >>>> >>
> >>> >> >>>> >> conf_on_shared_storage             : True
> >>> >> >>>> >> Status up-to-date                  : False
> >>> >> >>>> >> Hostname                           : ovirt-node-00.phoelex.com
> >>> >> >>>> >> Host ID                            : 1
> >>> >> >>>> >> Engine status                      : unknown stale-data
> >>> >> >>>> >> Score                              : 3400
> >>> >> >>>> >> stopped                            : False
> >>> >> >>>> >> Local maintenance                  : False
> >>> >> >>>> >> crc32                              : 9c4a034b
> >>> >> >>>> >> local_conf_timestamp               : 523362
> >>> >> >>>> >> Host timestamp                     : 523608
> >>> >> >>>> >> Extra metadata (valid at timestamp):
> >>> >> >>>> >> metadata_parse_version=1
> >>> >> >>>> >> metadata_feature_version=1
> >>> >> >>>> >> timestamp=523608 (Wed Apr  8 16:17:11 2020)
> >>> >> >>>> >> host-id=1
> >>> >> >>>> >> score=3400
> >>> >> >>>> >> vm_conf_refresh_time=523362 (Wed Apr  8 16:13:06 2020)
> >>> >> >>>> >> conf_on_shared_storage=True
> >>> >> >>>> >> maintenance=False
> >>> >> >>>> >> state=EngineDown
> >>> >> >>>> >> stopped=False
> >>> >> >>>> >>
> >>> >> >>>> >>
> >>> >> >>>> >> --== Host ovirt-node-01.phoelex.com (id: 2) status ==--
> >>> >> >>>> >>
> >>> >> >>>> >> conf_on_shared_storage             : True
> >>> >> >>>> >> Status up-to-date                  : True
> >>> >> >>>> >> Hostname                           : ovirt-node-01.phoelex.com
> >>> >> >>>> >> Host ID                            : 2
> >>> >> >>>> >> Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
> >>> >> >>>> >> Score                              : 0
> >>> >> >>>> >> stopped                            : False
> >>> >> >>>> >> Local maintenance                  : False
> >>> >> >>>> >> crc32                              : 5045f2eb
> >>> >> >>>> >> local_conf_timestamp               : 1737037
> >>> >> >>>> >> Host timestamp                     : 1737283
> >>> >> >>>> >> Extra metadata (valid at timestamp):
> >>> >> >>>> >> metadata_parse_version=1
> >>> >> >>>> >> metadata_feature_version=1
> >>> >> >>>> >> timestamp=1737283 (Wed Apr  8 16:16:17 2020)
> >>> >> >>>> >> host-id=2
> >>> >> >>>> >> score=0
> >>> >> >>>> >> vm_conf_refresh_time=1737037 (Wed Apr  8 16:12:11 2020)
> >>> >> >>>> >> conf_on_shared_storage=True
> >>> >> >>>> >> maintenance=False
> >>> >> >>>> >> state=EngineUnexpectedlyDown
> >>> >> >>>> >> stopped=False
> >>> >> >>>> >>
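The two status blocks above differ mainly in the "Engine status" field: one host reports the plain text "unknown stale-data", the other a JSON object. A minimal Python 3 sketch of how such a block could be read programmatically follows. The function names are hypothetical and not part of any oVirt tool, and it assumes each "key : value" pair sits on a single line, as in the tool's real output rather than in this re-wrapped email.

```python
import json


def parse_engine_status(raw):
    """Parse the value of an 'Engine status' field.

    The value is either a JSON object or free text such as
    'unknown stale-data'; free text is wrapped in a small dict.
    """
    raw = raw.strip()
    try:
        return json.loads(raw)
    except ValueError:
        return {"health": "unknown", "detail": raw}


def parse_host_block(text):
    """Turn the 'key : value' lines of one host block into a dict.

    Lines without a ' : ' separator (for example the key=value
    'Extra metadata' lines) are ignored in this sketch.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    if "Engine status" in fields:
        fields["Engine status"] = parse_engine_status(fields["Engine status"])
    return fields
```

With the second block as input, `parse_host_block(...)["Engine status"]["vm"]` would be the string reported by the host, which makes it easy to compare hosts in a script.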
> >>> >> >>>> >> On Wed, Apr 8, 2020 at 5:09 PM Maton, Brett <mat...@ltresources.co.uk> wrote:
> >>> >> >>>> >>
> >>> >> >>>> >>> First steps, on one of your hosts as root:
> >>> >> >>>> >>>
> >>> >> >>>> >>> To get information:
> >>> >> >>>> >>> hosted-engine --vm-status
> >>> >> >>>> >>>
> >>> >> >>>> >>> To start the engine:
> >>> >> >>>> >>> hosted-engine --vm-start
> >>> >> >>>> >>>
> >>> >> >>>> >>>
> >>> >> >>>> >>> On Wed, 8 Apr 2020 at 17:00, Shareef Jalloq <shar...@jalloq.co.uk> wrote:
> >>> >> >>>> >>>
> >>> >> >>>> >>>> So my engine has gone down and I can't ssh into it either. If I try to
> >>> >> >>>> >>>> log into the web-ui of the node it is running on, I get redirected because
> >>> >> >>>> >>>> the node can't reach the engine.
> >>> >> >>>> >>>>
> >>> >> >>>> >>>> What are my next steps?
> >>> >> >>>> >>>>
> >>> >> >>>> >>>> Shareef.
> >>> >> >>>> >>>> _______________________________________________
> >>> >> >>>> >>>> Users mailing list -- users@ovirt.org
> >>> >> >>>> >>>> To unsubscribe send an email to users-le...@ovirt.org
> >>> >> >>>> >>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >>> >> >>>> >>>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> >>> >> >>>> >>>> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/W7BP57OCIRSW5CDRQWR5MIKJUH3ISLCQ/
> >>> >> >>>> >>>>
> >>> >> >>>> >>>
> >>> >> >>>>
> >>> >> >>>> This has to be resolved:
> >>> >> >>>>
> >>> >> >>>> Engine status                      : unknown stale-data
> >>> >> >>>>
> >>> >> >>>> Run 'hosted-engine --vm-status' again. If it remains the same, restart
> >>> >> >>>> ovirt-ha-broker.service and ovirt-ha-agent.service.
> >>> >> >>>>
> >>> >> >>>> Verify that the engine's storage is available. Then monitor the broker
> >>> >> >>>> and agent logs in /var/log/ovirt-hosted-engine-ha.
> >>> >> >>>>
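When monitoring the broker and agent logs, it can help to filter for ERROR records only. The Python 3 sketch below is an illustration, not an oVirt utility: the record layout (thread, level, timestamp, module, line number, logger, function, message, separated by "::") is inferred from the log excerpts in this thread, and the function names are hypothetical. It assumes each record occupies one line, as in the log files themselves.

```python
import re

# Layout inferred from the excerpts in this thread:
# thread::LEVEL::timestamp::module::lineno::logger::(function) message
LOG_LINE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>\w+)::(?P<lineno>\d+)::"
    r"(?P<logger>[\w.]+)::\((?P<func>\w+)\) (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log record, or None if it does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def errors_only(lines):
    """Yield parsed records whose level is ERROR.

    Lines that do not match the layout (for example traceback
    continuation lines) are skipped.
    """
    for line in lines:
        record = parse_line(line)
        if record and record["level"] == "ERROR":
            yield record
```

For example, `errors_only(open("/var/log/ovirt-hosted-engine-ha/agent.log"))` would iterate over the ERROR records of the agent log; the file name agent.log is taken from the log excerpts in this thread.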
> >>> >> >>>> Best Regards,
> >>> >> >>>> Strahil Nikolov
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >>
> >>> >> Hi Shareef,
> >>> >>
> >>> >> The activation flow in oVirt is more complex than in plain KVM.
> >>> >> The storage domains are mounted during activation of the node (the
> >>> >> HostedEngine activates everything needed).
> >>> >>
> >>> >> Focus on the HostedEngine VM.
> >>> >> Is it running properly ?
> >>> >>
> >>> >> If not, try:
> >>> >> 1. Verify that the storage domain exists
> >>> >> 2. Check if it has an 'ha_agents' directory
> >>> >> 3. Check if the links are OK; if not, you can safely remove the links
> >>> >>
> >>> >> 4. Next check the services are running:
> >>> >> A) sanlock
> >>> >> B) supervdsmd
> >>> >> C) vdsmd
> >>> >> D) libvirtd
> >>> >>
> >>> >> 5. Increase the log level for the broker and agent services:
> >>> >>
> >>> >> cd /etc/ovirt-hosted-engine-ha
> >>> >> vim *-log.conf
> >>> >>
> >>> >> systemctl restart ovirt-ha-broker ovirt-ha-agent
> >>> >>
> >>> >> 6. Check what they are complaining about.
> >>> >> Keep in mind that the agent will keep throwing errors until the broker
> >>> >> stops doing so (the agent depends on the broker), so the broker must be OK
> >>> >> before proceeding with the agent log.
> >>> >>
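Step 4 of the checklist above (confirming that sanlock, supervdsmd, vdsmd and libvirtd are running) can be scripted. The following Python 3 sketch is only an illustration under stated assumptions: the function names are hypothetical, and it relies on `systemctl is-active --quiet`, which exits with status 0 when a unit is active. The check function is passed in as a parameter so that the logic can be exercised without systemd.

```python
import subprocess

# Services named in step 4 of the checklist.
REQUIRED_SERVICES = ("sanlock", "supervdsmd", "vdsmd", "libvirtd")


def systemctl_is_active(unit):
    """Return True if `systemctl is-active --quiet UNIT` exits with status 0."""
    return subprocess.call(["systemctl", "is-active", "--quiet", unit]) == 0


def inactive_services(units=REQUIRED_SERVICES, is_active=systemctl_is_active):
    """Return the units that are not currently active, in the order given."""
    return [unit for unit in units if not is_active(unit)]
```

On a host, `inactive_services()` returning an empty list would suggest all four services are up; anything it returns is worth checking with `systemctl status` before looking at the broker and agent logs.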
> >>> >> About the manual VM start, you need 2 things:
> >>> >>
> >>> >> 1. Define the VM network
> >>> >> # cat vdsm-ovirtmgmt.xml
> >>> >> <network>
> >>> >>   <name>vdsm-ovirtmgmt</name>
> >>> >>   <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
> >>> >>   <forward mode='bridge'/>
> >>> >>   <bridge name='ovirtmgmt'/>
> >>> >> </network>
> >>> >>
> >>> >> [root@ovirt1 HostedEngine-RECOVERY]# virsh net-define vdsm-ovirtmgmt.xml
> >>> >>
> >>> >> 2. Get the VM's XML definition, which can be found in the vdsm log.
> >>> >> Every VM has its configuration printed to the vdsm log on the host
> >>> >> where it starts.
> >>> >> Save to file and then:
> >>> >> A) virsh define myvm.xml
> >>> >> B) virsh start myvm
> >>> >>
> >>> >> It seems there is/was a problem with your NFS shares.
> >>> >>
> >>> >>
> >>> >> Best Regards,
> >>> >> Strahil Nikolov
> >>> >>
> >>>
> >>> Hey Shareef,
> >>>
> >>> Check if there are any files or folders not owned by vdsm:kvm. Something
> >>> like this:
> >>>
> >>> find . -not -user 36 -not  -group 36 -print
> >>>
> >>> Also check if vdsm can access the images in the
> >>> '<vol-mount-point>/images' directories.
> >>>
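The ownership check above can also be done from Python 3, which makes it easier to report exactly which condition failed. This is a minimal sketch, not an oVirt tool: the function names are hypothetical, and the defaults of 36 for both uid and gid are taken from the `find` command quoted above (vdsm:kvm).

```python
import os


def _mismatch(path, uid, gid):
    """Return True if path's owner uid or group gid differs from the expected ones."""
    st = os.lstat(path)  # lstat: inspect symlinks themselves, do not follow them
    return st.st_uid != uid or st.st_gid != gid


def not_owned_by(root, uid=36, gid=36):
    """Yield root and every path below it whose uid or gid is not the expected one."""
    if _mismatch(root, uid, gid):
        yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if _mismatch(path, uid, gid):
                yield path
```

Note that this sketch reports a path when either the owner or the group differs. That is slightly broader than the `find` expression quoted above, where the two `-not` tests are joined by find's implicit AND and therefore only match paths where both differ.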
> >>> Best Regards,
> >>> Strahil Nikolov
> >>>
> >>
>
> And what about the IPv6 address '64:ff9b::c0a8:13d'?
>
> I don't see it in the log output.
>
> Best Regards,
> Strahil Nikolov
>
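One possible reading of that address, not confirmed anywhere in this thread: 64:ff9b::/96 is the well-known NAT64 prefix from RFC 6052, in which the last 32 bits carry an IPv4 address. An address like this can be synthesized by a DNS64 resolver even when nobody has configured IPv6 on the host. The Python 3 sketch below (function name hypothetical) extracts the embedded IPv4 address so it can be compared with the host's A record.

```python
import ipaddress

# Well-known NAT64 prefix defined in RFC 6052.
NAT64_WELL_KNOWN = ipaddress.IPv6Network("64:ff9b::/96")


def embedded_ipv4(addr):
    """Return the IPv4 address embedded in a 64:ff9b::/96 address, else None."""
    ip6 = ipaddress.IPv6Address(addr)
    if ip6 not in NAT64_WELL_KNOWN:
        return None
    # With a /96 prefix the IPv4 address is the low 32 bits.
    return ipaddress.IPv4Address(int(ip6) & 0xFFFFFFFF)


print(embedded_ipv4("64:ff9b::c0a8:13d"))  # 192.168.1.61
```

The result matches the IPv4 address that the deploy log resolved for ovirt-node-00.phoelex.com, which would be consistent with a resolver on the network synthesizing AAAA records. Whether that is what trips the setup validation is a separate question.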
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RIFNS65DOEOAEV6ZUDVQ6OULKAFIHJ5U/