Hmmm, we're not using IPv6. Is that the issue?

On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86...@yahoo.com> wrote:
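A possible reading of the extra address, offered as a sketch rather than a confirmed diagnosis: 64:ff9b::/96 is the well-known NAT64 prefix from RFC 6052, and the low 32 bits of 64:ff9b::c0a8:13d decode to 192.168.1.61. That would be consistent with a DNS64 resolver synthesizing an AAAA answer for the host name even though no global IPv6 address is configured on the host. Whether such a resolver is actually in the path here is an assumption, and the helper name embedded_ipv4 below is hypothetical.

```python
import ipaddress

# Well-known NAT64/DNS64 prefix defined in RFC 6052.
NAT64_PREFIX = ipaddress.ip_network("64:ff9b::/96")


def embedded_ipv4(addr):
    """Return the IPv4 address embedded in a NAT64 address, or None.

    Hypothetical helper for illustration; it is not part of oVirt.
    """
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip in NAT64_PREFIX:
        # The low 32 bits of a 64:ff9b::/96 address carry the IPv4 address.
        return ipaddress.IPv4Address(int(ip) & 0xFFFFFFFF)
    return None


# Addresses reported by getResolvedAddresses in the setup log.
resolved = ["64:ff9b::c0a8:13d", "192.168.1.61"]
for address in resolved:
    print(address, "->", embedded_ipv4(address))
```

If the first address maps back to 192.168.1.61, the validation failure is probably about the resolver returning a synthesized AAAA record that matches no local interface, not about IPv6 being enabled on the node.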
> On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq <shar...@jalloq.co.uk> wrote:
> >Right, I've given up on recovering the HE so want to try and redeploy it.
> >There doesn't seem to be enough information to debug why the broker/agent
> >won't start cleanly.
> >
> >In running 'hosted-engine --deploy', I'm seeing the following error in the
> >setup validation phase:
> >
> >2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Please provide the hostname of this host on the management network [ovirt-node-00.phoelex.com]:
> >
> >2020-04-14 09:46:12,831+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getResolvedAddresses:432 getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
> >
> >2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
> >
> >2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], executable='None', cwd='None', env=None
> >
> >2020-04-14 09:46:12,871+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], rc=0
> >
> >2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stdout:
> >ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61
> >
> >2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stderr:
> >
> >2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None
> >
> >2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr'), rc=0
> >
> >2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr') stdout:
> >1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
> >    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> >    inet 127.0.0.1/8 scope host lo
> >       valid_lft forever preferred_lft forever
> >    inet6 ::1/128 scope host
> >       valid_lft forever preferred_lft forever
> >2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
> >    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
> >3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
> >    link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
> >4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
> >    link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
> >5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
> >    link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
> >21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> >    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
> >    inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
> >       valid_lft forever preferred_lft forever
> >    inet6 fe80::ae1f:6bff:febc:326a/64 scope link
> >       valid_lft forever preferred_lft forever
> >22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
> >    link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff
> >
> >2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr') stderr:
> >
> >2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getLocalAddresses:251 addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']
> >
> >2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception
> >Traceback (most recent call last):
> >  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
> >    not_local_text,
> >  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
> >    addresses=resolvedAddressesAsString
> >RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
> >
> >2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
> >
> >The node I'm running on has an IP address of .61 and resolves correctly.
> >
> >On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shar...@jalloq.co.uk> wrote:
> >
> >> Where should I be checking if there are any files/folder not owned by
> >> vdsm:kvm? I checked on the mount the HA sits on and it's fine.
> >>
> >> How would I go about checking vdsm can access those images? If I run
> >> virsh, it lists them and they were running yesterday even though the HA
> >> was down. I've since restarted both hosts but the broker is still
> >> spitting out the same error (copied below). How do I find the reason the
> >> broker can't connect to the storage? The conf file is already at DEBUG
> >> verbosity:
> >>
> >> [handler_logfile]
> >> class=logging.handlers.TimedRotatingFileHandler
> >> args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
> >> level=DEBUG
> >> formatter=long
> >>
> >> And what are all these .prob-<num> files that are being created? There
> >> are over 250K of them now on the mount I'm using for the Data domain.
> >> They're all of 0 size and of the form,
> >> /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9
> >>
> >> @eevans: The volume I have the Data Domain on has TB's free. The HA is
> >> dead so I can't ssh in. No idea what started these errors and the other
> >> VMs were still running happily although they're on a different Data
> >> Domain.
> >>
> >> Shareef.
> >>
> >> MainThread::INFO::2020-04-10 07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
> >> MainThread::INFO::2020-04-10 07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
> >> MainThread::INFO::2020-04-10 07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
> >> MainThread::INFO::2020-04-10 07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
> >> MainThread::WARNING::2020-04-10 07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
> >> (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
> >>
> >> On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov <hunter86...@yahoo.com> wrote:
> >>
> >>> On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq <shar...@jalloq.co.uk> wrote:
> >>> >OK, let's go through this. I'm looking at the node that at least still
> >>> >has some VMs running. virsh also tells me that the HostedEngine VM is
> >>> >running but it's unresponsive and I can't shut it down.
> >>> >
> >>> >1. All storage domains exist and are mounted.
> >>> >2. The ha_agent exists:
> >>> >
> >>> >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
> >>> >dom_md ha_agent images master
> >>> >
> >>> >3. There are two links
> >>> >
> >>> >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
> >>> >total 8
> >>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.lockspace -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
> >>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.metadata -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4
> >>> >
> >>> >4. The services exist but all seem to have some sort of warning:
> >>> >
> >>> >a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: *2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec*
> >>> >
> >>> >b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: *failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory*
> >>> >
> >>> >c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: *ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?*
> >>> >
> >>> >d) Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data
> >>> >
> >>> >Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
> >>> >
> >>> >Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
> >>> >
> >>> >Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
> >>> >
> >>> >5 & 6. The broker log is continually printing this error:
> >>> >
> >>> >MainThread::INFO::2020-04-09 08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
> >>> >MainThread::DEBUG::2020-04-09 08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Running broker
> >>> >MainThread::DEBUG::2020-04-09 08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor) Starting monitor
> >>> >MainThread::INFO::2020-04-09 08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
> >>> >MainThread::INFO::2020-04-09 08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
> >>> >MainThread::INFO::2020-04-09 08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
> >>> >MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
> >>> >MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
> >>> >MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
> >>> >MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
> >>> >MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
> >>> >MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
> >>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
> >>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
> >>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
> >>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
> >>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
> >>> >MainThread::INFO::2020-04-09 08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
> >>> >MainThread::INFO::2020-04-09 08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
> >>> >MainThread::DEBUG::2020-04-09 08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker) Starting storage broker
> >>> >MainThread::DEBUG::2020-04-09 08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting to VDSM
> >>> >MainThread::DEBUG::2020-04-09 08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug) Creating a new json-rpc connection to VDSM
> >>> >Client localhost:54321::DEBUG::2020-04-09 08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 139992488138496)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7f528acabc90>>, args=(), kwargs={})
> >>> >Client localhost:54321::DEBUG::2020-04-09 08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) Stomp connection established
> >>> >MainThread::DEBUG::2020-04-09 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
> >>> >MainThread::INFO::2020-04-09 08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
> >>> >MainThread::INFO::2020-04-09 08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
> >>> >MainThread::DEBUG::2020-04-09 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
> >>> >MainThread::DEBUG::2020-04-09 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
> >>> >MainThread::DEBUG::2020-04-09 08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path) Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available
> >>> >MainThread::INFO::2020-04-09 08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
> >>> >MainThread::DEBUG::2020-04-09 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
> >>> >MainThread::DEBUG::2020-04-09 08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) [{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
> >>> >MainThread::INFO::2020-04-09 08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
> >>> >MainThread::DEBUG::2020-04-09 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
> >>> >MainThread::DEBUG::2020-04-09 08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Error refreshing storage domain: Command StorageDomain.getStats with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
> >>> >(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
> >>> >MainThread::DEBUG::2020-04-09 08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
> >>> >MainThread::DEBUG::2020-04-09 08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size) Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
> >>> >(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
> >>> >MainThread::WARNING::2020-04-09 08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
> >>> >(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
> >>> >
> >>> >The UUID it is moaning about is indeed the one that the HA sits on and
> >>> >is the one I listed the contents of in step 2 above.
> >>> >
> >>> >So why can't it see this domain?
> >>> >
> >>> >Thanks, Shareef.
> >>> >
> >>> >On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov <hunter86...@yahoo.com> wrote:
> >>> >
> >>> >> On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq <shar...@jalloq.co.uk> wrote:
> >>> >> >Don't know if this is useful or not, but I just tried to shut down
> >>> >> >and start another VM on one of the hosts and get the following error:
> >>> >> >
> >>> >> >virsh # start scratch
> >>> >> >error: Failed to start domain scratch
> >>> >> >error: Network not found: no network with matching name 'vdsm-ovirtmgmt'
> >>> >> >
> >>> >> >Is this not referring to the interface name as the network is called
> >>> >> >'ovirtmgmt'.
> >>> >> >
> >>> >> >On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq <shar...@jalloq.co.uk> wrote:
> >>> >> >
> >>> >> >> Hmmm, virsh tells me the HE is running but it hasn't come up and
> >>> >> >> the agent.log is full of the same errors.
> >>> >> >>
> >>> >> >> On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq <shar...@jalloq.co.uk> wrote:
> >>> >> >>
> >>> >> >>> Ah hah! Ok, so I've managed to start it using virsh on the second
> >>> >> >>> host but my first host is still dead.
> >>> >> >>>
> >>> >> >>> First of all, what are these 56,317 .prob- files that get dumped
> >>> >> >>> to the NFS mounts?
> >>> >> >>>
> >>> >> >>> Secondly, why doesn't the node mount the NFS directories at boot?
> >>> >> >>> Is that the issue with this particular node?
> >>> >> >>>
> >>> >> >>> On Wed, Apr 8, 2020 at 11:12 PM <eev...@digitaldatatechs.com> wrote:
> >>> >> >>>
> >>> >> >>>> Did you try virsh list --inactive
> >>> >> >>>>
> >>> >> >>>> Eric Evans
> >>> >> >>>> Digital Data Services LLC.
> >>> >> >>>> 304.660.9080
> >>> >> >>>>
> >>> >> >>>> *From:* Shareef Jalloq <shar...@jalloq.co.uk>
> >>> >> >>>> *Sent:* Wednesday, April 8, 2020 5:58 PM
> >>> >> >>>> *To:* Strahil Nikolov <hunter86...@yahoo.com>
> >>> >> >>>> *Cc:* Ovirt Users <users@ovirt.org>
> >>> >> >>>> *Subject:* [ovirt-users] Re: ovirt-engine unresponsive - how to rescue?
> >>> >> >>>>
> >>> >> >>>> I've now shut down the VMs on one host and rebooted it but the
> >>> >> >>>> agent service doesn't start. If I run 'hosted-engine --vm-status'
> >>> >> >>>> I get:
> >>> >> >>>>
> >>> >> >>>> The hosted engine configuration has not been retrieved from shared
> >>> >> >>>> storage. Please ensure that ovirt-ha-agent is running and the
> >>> >> >>>> storage server is reachable.
> >>> >> >>>>
> >>> >> >>>> and indeed if I list the mounts under /rhev/data-center/mnt, only
> >>> >> >>>> one of the directories is mounted. I have 3 NFS mounts, one ISO
> >>> >> >>>> Domain and two Data Domains. Only one Data Domain has mounted and
> >>> >> >>>> this has lots of .prob files in. So why haven't the other NFS
> >>> >> >>>> exports been mounted?
> >>> >> >>>>
> >>> >> >>>> Manually mounting them doesn't seem to have helped much either. I
> >>> >> >>>> can start the broker service but the agent service says no. Same
> >>> >> >>>> error as the one in my last email.
> >>> >> >>>>
> >>> >> >>>> Shareef.
> >>> >> >>>>
> >>> >> >>>> On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq <shar...@jalloq.co.uk> wrote:
> >>> >> >>>>
> >>> >> >>>> Right, still down. I've run virsh and it doesn't know anything
> >>> >> >>>> about the engine vm.
> >>> >> >>>>
> >>> >> >>>> I've restarted the broker and agent services and I still get
> >>> >> >>>> nothing in virsh->list.
> >>> >> >>>>
> >>> >> >>>> In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:
> >>> >> >>>>
> >>> >> >>>> broker.log:
> >>> >> >>>>
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
> >>> >> >>>> MainThread::WARNING::2020-04-08 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
> >>> >> >>>> (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
> >>> >> >>>>
> >>> >> >>>> agent.log:
> >>> >> >>>>
> >>> >> >>>> MainThread::ERROR::2020-04-08 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
> >>> >> >>>> MainThread::INFO::2020-04-08 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
> >>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
> >>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-node-01.phoelex.com
> >>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
> >>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
> >>> >> >>>> MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
> >>> >> >>>> MainThread::ERROR::2020-04-08 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
> >>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
> >>> >> >>>>     return action(he)
> >>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
> >>> >> >>>>     return he.start_monitoring()
> >>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
> >>> >> >>>>     self._initialize_broker()
> >>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
> >>> >> >>>>     m.get('options', {}))
> >>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
> >>> >> >>>>     ).format(t=type, o=options, e=e)
> >>> >> >>>> RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]
> >>> >> >>>>
> >>> >> >>>> MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
> >>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
> >>> >> >>>>
> >>> >> >>>> On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov <hunter86...@yahoo.com> wrote:
> >>> >> >>>>
> >>> >> >>>> On April 8, 2020 7:47:20 PM GMT+03:00, "Maton, Brett" <mat...@ltresources.co.uk> wrote:
> >>> >> >>>> >On the host you tried to restart the engine on:
> >>> >> >>>> >
> >>> >> >>>> >Add an alias to virsh (authenticates with virsh_auth.conf)
> >>> >> >>>> >
> >>> >> >>>> >alias virsh='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'
> >>> >> >>>> >
> >>> >> >>>> >Then run virsh:
> >>> >> >>>> >
> >>> >> >>>> >virsh
> >>> >> >>>> >
> >>> >> >>>> >virsh # list
> >>> >> >>>> > Id    Name                           State
> >>> >> >>>> >----------------------------------------------------
> >>> >> >>>> > xx    HostedEngine                   Paused
> >>> >> >>>> > xx    **********                     running
> >>> >> >>>> > ...
> >>> >> >>>> > xx    **********                     running
> >>> >> >>>> >
> >>> >> >>>> >HostedEngine should be in the list, try and resume the engine:
> >>> >> >>>> >
> >>> >> >>>> >virsh # resume HostedEngine
> >>> >> >>>> >
> >>> >> >>>> >On Wed, 8 Apr 2020 at 17:28, Shareef Jalloq <shar...@jalloq.co.uk> wrote:
> >>> >> >>>> >
> >>> >> >>>> >> Thanks!
> >>> >> >>>> >>
> >>> >> >>>> >> The status hangs due to, I guess, the VM being down....
> >>> >> >>>> >>
> >>> >> >>>> >> [root@ovirt-node-01 ~]# hosted-engine --vm-start
> >>> >> >>>> >> VM exists and is down, cleaning up and restarting
> >>> >> >>>> >> VM in WaitForLaunch
> >>> >> >>>> >>
> >>> >> >>>> >> but this doesn't seem to do anything. OK, after a while I get a
> >>> >> >>>> >> status of it being barfed...
> >>> >> >>>> >>
> >>> >> >>>> >> --== Host ovirt-node-00.phoelex.com (id: 1) status ==--
> >>> >> >>>> >>
> >>> >> >>>> >> conf_on_shared_storage             : True
> >>> >> >>>> >> Status up-to-date                  : False
> >>> >> >>>> >> Hostname                           : ovirt-node-00.phoelex.com
> >>> >> >>>> >> Host ID                            : 1
> >>> >> >>>> >> Engine status                      : unknown stale-data
> >>> >> >>>> >> Score                              : 3400
> >>> >> >>>> >> stopped                            : False
> >>> >> >>>> >> Local maintenance                  : False
> >>> >> >>>> >> crc32                              : 9c4a034b
> >>> >> >>>> >> local_conf_timestamp               : 523362
> >>> >> >>>> >> Host timestamp                     : 523608
> >>> >> >>>> >> Extra metadata (valid at timestamp):
> >>> >> >>>> >>     metadata_parse_version=1
> >>> >> >>>> >>     metadata_feature_version=1
> >>> >> >>>> >>     timestamp=523608 (Wed Apr 8 16:17:11 2020)
> >>> >> >>>> >>     host-id=1
> >>> >> >>>> >>     score=3400
> >>> >> >>>> >>     vm_conf_refresh_time=523362 (Wed Apr 8 16:13:06 2020)
> >>> >> >>>> >>     conf_on_shared_storage=True
> >>> >> >>>> >>     maintenance=False
> >>> >> >>>> >>     state=EngineDown
> >>> >> >>>> >>     stopped=False
> >>> >> >>>> >>
> >>> >> >>>> >> --== Host ovirt-node-01.phoelex.com (id: 2) status ==--
> >>> >> >>>> >>
conf_on_shared_storage : True > >>> >> >>>> >> Status up-to-date : True > >>> >> >>>> >> Hostname : > >>> >ovirt-node-01.phoelex.com > >>> >> >>>> >> Host ID : 2 > >>> >> >>>> >> Engine status : {"reason": "bad vm > >>> >status", > >>> >> >>>> >"health": > >>> >> >>>> >> "bad", "vm": "down_unexpected", "detail": "Down"} > >>> >> >>>> >> Score : 0 > >>> >> >>>> >> stopped : False > >>> >> >>>> >> Local maintenance : False > >>> >> >>>> >> crc32 : 5045f2eb > >>> >> >>>> >> local_conf_timestamp : 1737037 > >>> >> >>>> >> Host timestamp : 1737283 > >>> >> >>>> >> Extra metadata (valid at timestamp): > >>> >> >>>> >> metadata_parse_version=1 > >>> >> >>>> >> metadata_feature_version=1 > >>> >> >>>> >> timestamp=1737283 (Wed Apr 8 16:16:17 2020) > >>> >> >>>> >> host-id=2 > >>> >> >>>> >> score=0 > >>> >> >>>> >> vm_conf_refresh_time=1737037 (Wed Apr 8 16:12:11 2020) > >>> >> >>>> >> conf_on_shared_storage=True > >>> >> >>>> >> maintenance=False > >>> >> >>>> >> state=EngineUnexpectedlyDown > >>> >> >>>> >> stopped=False > >>> >> >>>> >> > >>> >> >>>> >> On Wed, Apr 8, 2020 at 5:09 PM Maton, Brett > >>> >> >>>> ><mat...@ltresources.co.uk> > >>> >> >>>> >> wrote: > >>> >> >>>> >> > >>> >> >>>> >>> First steps, on one of your hosts as root: > >>> >> >>>> >>> > >>> >> >>>> >>> To get information: > >>> >> >>>> >>> hosted-engine --vm-status > >>> >> >>>> >>> > >>> >> >>>> >>> To start the engine: > >>> >> >>>> >>> hosted-engine --vm-start > >>> >> >>>> >>> > >>> >> >>>> >>> > >>> >> >>>> >>> On Wed, 8 Apr 2020 at 17:00, Shareef Jalloq > >>> >> ><shar...@jalloq.co.uk> > >>> >> >>>> >wrote: > >>> >> >>>> >>> > >>> >> >>>> >>>> So my engine has gone down and I can't ssh into it > >either. > >>> >If > >>> >> >I > >>> >> >>>> >try to > >>> >> >>>> >>>> log into the web-ui of the node it is running on, I get > >>> >> >redirected > >>> >> >>>> >because > >>> >> >>>> >>>> the node can't reach the engine. > >>> >> >>>> >>>> > >>> >> >>>> >>>> What are my next steps? 
> >>> >> >>>> >>>>
> >>> >> >>>> >>>> Shareef.
> >>> >> >>>> >>>> _______________________________________________
> >>> >> >>>> >>>> Users mailing list -- users@ovirt.org
> >>> >> >>>> >>>> To unsubscribe send an email to users-le...@ovirt.org
> >>> >> >>>> >>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >>> >> >>>> >>>> oVirt Code of Conduct:
> >>> >> >>>> >>>> https://www.ovirt.org/community/about/community-guidelines/
> >>> >> >>>> >>>> List Archives:
> >>> >> >>>> >>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/W7BP57OCIRSW5CDRQWR5MIKJUH3ISLCQ/
> >>> >> >>>> >>>>
> >>> >> >>>> >>>
> >>> >> >>>>
> >>> >> >>>> This has to be resolved:
> >>> >> >>>>
> >>> >> >>>> Engine status : unknown stale-data
> >>> >> >>>>
> >>> >> >>>> Run 'hosted-engine --vm-status' again. If it remains the same,
> >>> >> >>>> restart ovirt-ha-broker.service & ovirt-ha-agent.service
> >>> >> >>>>
> >>> >> >>>> Verify that the engine's storage is available. Then monitor the
> >>> >> >>>> broker & agent logs in /var/log/ovirt-hosted-engine-ha
> >>> >> >>>>
> >>> >> >>>> Best Regards,
> >>> >> >>>> Strahil Nikolov
> >>> >> >>>>
> >>> >>
> >>> >> Hi Shareef,
> >>> >>
> >>> >> The activation flow in oVirt is more complex than in plain KVM.
> >>> >> Mounting of the domains happens during the activation of the node
> >>> >> (the HostedEngine activates everything needed).
> >>> >>
> >>> >> Focus on the HostedEngine VM.
> >>> >> Is it running properly?
> >>> >>
> >>> >> If not, try:
> >>> >> 1. Verify that the storage domain exists
> >>> >> 2. Check if it has an 'ha_agents' directory
> >>> >> 3. Check if the links are OK; if not, you can safely remove the links
> >>> >>
> >>> >> 4. Next check the services are running:
> >>> >> A) sanlock
> >>> >> B) supervdsmd
> >>> >> C) vdsmd
> >>> >> D) libvirtd
> >>> >>
> >>> >> 5. Increase the log level for broker and agent services:
> >>> >>
> >>> >> cd /etc/ovirt-hosted-engine-ha
> >>> >> vim *-log.conf
> >>> >>
> >>> >> systemctl restart ovirt-ha-broker ovirt-ha-agent
> >>> >>
> >>> >> 6. Check what they are complaining about
> >>> >> Keep in mind that the agent will keep throwing errors until the
> >>> >> broker stops doing it (agent depends on broker), so the broker must
> >>> >> be OK before proceeding with the agent log.
> >>> >>
> >>> >> About the manual VM start, you need 2 things:
> >>> >>
> >>> >> 1. Define the VM network
> >>> >> # cat vdsm-ovirtmgmt.xml
> >>> >> <network>
> >>> >>   <name>vdsm-ovirtmgmt</name>
> >>> >>   <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
> >>> >>   <forward mode='bridge'/>
> >>> >>   <bridge name='ovirtmgmt'/>
> >>> >> </network>
> >>> >>
> >>> >> [root@ovirt1 HostedEngine-RECOVERY]# virsh net-define vdsm-ovirtmgmt.xml
> >>> >>
> >>> >> 2. Get an xml definition, which can be found in the vdsm log. Every
> >>> >> VM at start up has its configuration printed out in the vdsm log on
> >>> >> the host it starts on.
> >>> >> Save to file and then:
> >>> >> A) virsh define myvm.xml
> >>> >> B) virsh start myvm
> >>> >>
> >>> >> It seems there is/was a problem with your NFS shares.
> >>> >>
> >>> >>
> >>> >> Best Regards,
> >>> >> Strahil Nikolov
> >>> >>
> >>>
> >>> Hey Shareef,
> >>>
> >>> Check if there are any files or folders not owned by vdsm:kvm.
> >>> Something like this:
> >>>
> >>> find . -not -user 36 -not -group 36 -print
> >>>
> >>> Also check if vdsm can access the images in the
> >>> '<vol-mount-point>/images' directories.
> >>>
> >>> Best Regards,
> >>> Strahil Nikolov
> >>>
> >>
> >
> And the IPv6 address '64:ff9b::c0a8:13d'?
>
> I don't see it in the log output.
>
> Best Regards,
> Strahil Nikolov
>
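A side note on that address: 64:ff9b::/96 is the well-known NAT64 prefix from RFC 6052, and the low 32 bits of an address in that range carry an IPv4 address. So the AAAA answer seen by getResolvedAddresses may have been synthesised by a DNS64 resolver somewhere upstream rather than coming from any IPv6 configuration on the host. That is only a guess; nothing in the logs here confirms it. A minimal Python sketch of the decoding (the function name embedded_ipv4 is made up for illustration):

```python
import ipaddress

# Well-known NAT64 prefix defined in RFC 6052.
NAT64_PREFIX = ipaddress.ip_network("64:ff9b::/96")


def embedded_ipv4(addr):
    """Return the IPv4 address embedded in a 64:ff9b::/96 address, else None.

    Hypothetical helper for illustration; not part of oVirt or otopi.
    """
    ip = ipaddress.ip_address(addr)
    if ip.version != 6 or ip not in NAT64_PREFIX:
        return None
    # With a /96 prefix the IPv4 address is the low 32 bits.
    return ipaddress.IPv4Address(int(ip) & 0xFFFFFFFF)


print(embedded_ipv4("64:ff9b::c0a8:13d"))
```

This prints 192.168.1.61, the same address as the host's A record, which would fit the DNS64 explanation and also why 'dig ... ANY' against the zone shows only the A record.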
_______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/RIFNS65DOEOAEV6ZUDVQ6OULKAFIHJ5U/