On April 15, 2020 2:28:05 PM GMT+03:00, Shareef Jalloq <shar...@jalloq.co.uk> wrote:
>Oh this is painful. It seems to progress if you have both he_force_ipv4
>set and run the deployment with the '--4' switch.
>
>But then I get a failure when the ansible script checks for firewalld-zones
>and doesn't get anything back. Should the deployment flow not be setting
>any zones it needs?
>
>2020-04-15 10:57:25,439+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Get active list of active firewalld zones]
>
>2020-04-15 10:57:26,641+0000 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'stderr_lines': [], u'changed': True, u'end': u'2020-04-15 10:57:26.481202', u'_ansible_no_log': False, u'stdout': u'', u'cmd': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'start': u'2020-04-15 10:57:26.050203', u'delta': u'0:00:00.430999', u'stderr': u'', u'rc': 1, u'invocation': {u'module_args': {u'creates': None, u'executable': None, u'_uses_shell': True, u'strip_empty_ends': True, u'_raw_params': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'removes': None, u'argv': None, u'warn': True, u'chdir': None, u'stdin_add_newline': True, u'stdin': None}}, u'stdout_lines': [], u'msg': u'non-zero return code'}
>
>2020-04-15 10:57:26,741+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -euo pipefail && firewall-cmd --get-active-zones | grep -v \"^\\s*interfaces\"", "delta": "0:00:00.430999", "end": "2020-04-15 10:57:26.481202", "msg": "non-zero return code", "rc": 1, "start": "2020-04-15 10:57:26.050203", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
>
>On Wed, Apr 15, 2020 at 10:23 AM Shareef Jalloq <shar...@jalloq.co.uk> wrote:
>
>> Ha, spoke too soon. It's now stuck in a loop and a google points me at
>> https://bugzilla.redhat.com/show_bug.cgi?id=1746585
>>
>> However, forcing ipv4 doesn't seem to have fixed the loop.
>>
>> On Wed, Apr 15, 2020 at 9:59 AM Shareef Jalloq <shar...@jalloq.co.uk> wrote:
>>
>>> OK, that seems to have fixed it, thanks. Is this a side effect of
>>> redeploying the HE over a first time install? Nothing has changed in our
>>> setup and I didn't need to do this when I initially set up our nodes.
>>>
>>> On Tue, Apr 14, 2020 at 6:55 PM Strahil Nikolov <hunter86...@yahoo.com> wrote:
>>>
>>>> On April 14, 2020 6:17:17 PM GMT+03:00, Shareef Jalloq <shar...@jalloq.co.uk> wrote:
>>>> >Hmmm, we're not using ipv6. Is that the issue?
>>>> >
>>>> >On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86...@yahoo.com> wrote:
>>>> >
>>>> >> On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq <shar...@jalloq.co.uk> wrote:
>>>> >> >Right, I've given up on recovering the HE so want to try and redeploy it.
>>>> >> >There doesn't seem to be enough information to debug why the broker/agent
>>>> >> >won't start cleanly.
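Inline note on the IPv6 question above, since the validation error that follows lists 64:ff9b::c0a8:13d next to 192.168.1.61. 64:ff9b::/96 is the well-known NAT64 prefix from RFC 6052, and the last 32 bits of that address are the same IPv4 address, so it looks as if a DNS64 resolver is synthesising the AAAA answer rather than a real IPv6 record existing (the dig output in the log only shows an A record). That is a guess from the logs, not something confirmed in this thread. A minimal Python sketch of the check; the helper name embedded_ipv4 is made up for illustration and is not part of oVirt:

```python
import ipaddress

# Well-known NAT64 prefix (RFC 6052). DNS64 resolvers build AAAA answers
# by appending the 32-bit IPv4 address to this /96 prefix.
NAT64_WKP = ipaddress.ip_network("64:ff9b::/96")


def embedded_ipv4(addr):
    """Return the IPv4 address embedded in a NAT64 address, else None.

    Hypothetical helper for illustration only.
    """
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip in NAT64_WKP:
        # The low 32 bits of the IPv6 address carry the IPv4 address.
        return ipaddress.IPv4Address(int(ip) & 0xFFFFFFFF)
    return None


# The two addresses reported by getResolvedAddresses in the setup log.
resolved = ["64:ff9b::c0a8:13d", "192.168.1.61"]
for address in resolved:
    print(address, "->", embedded_ipv4(address))
```

Run as is, the first entry decodes to 192.168.1.61 and the second prints None, which would fit the installer seeing an extra IPv6 address that is not configured on any local interface.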
>>>> >> >
>>>> >> >In running 'hosted-engine --deploy', I'm seeing the following error in the
>>>> >> >setup validation phase:
>>>> >> >
>>>> >> >2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Please provide the hostname of this host on the management network [ovirt-node-00.phoelex.com]:
>>>> >> >
>>>> >> >2020-04-14 09:46:12,831+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getResolvedAddresses:432 getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
>>>> >> >
>>>> >> >2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
>>>> >> >
>>>> >> >2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], executable='None', cwd='None', env=None
>>>> >> >
>>>> >> >2020-04-14 09:46:12,871+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], rc=0
>>>> >> >
>>>> >> >2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stdout:
>>>> >> >ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61
>>>> >> >
>>>> >> >2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stderr:
>>>> >> >
>>>> >> >2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None
>>>> >> >
>>>> >> >2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr'), rc=0
>>>> >> >
>>>> >> >2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr') stdout:
>>>> >> >1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>>>> >> >    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>>> >> >    inet 127.0.0.1/8 scope host lo
>>>> >> >       valid_lft forever preferred_lft forever
>>>> >> >    inet6 ::1/128 scope host
>>>> >> >       valid_lft forever preferred_lft forever
>>>> >> >2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
>>>> >> >    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
>>>> >> >3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
>>>> >> >    link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
>>>> >> >4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>>>> >> >    link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
>>>> >> >5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>>>> >> >    link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
>>>> >> >21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>>>> >> >    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
>>>> >> >    inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
>>>> >> >       valid_lft forever preferred_lft forever
>>>> >> >    inet6 fe80::ae1f:6bff:febc:326a/64 scope link
>>>> >> >       valid_lft forever preferred_lft forever
>>>> >> >22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>>>> >> >    link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff
>>>> >> >
>>>> >> >2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr') stderr:
>>>> >> >
>>>> >> >2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getLocalAddresses:251 addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']
>>>> >> >
>>>> >> >2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception
>>>> >> >Traceback (most recent call last):
>>>> >> >  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
>>>> >> >    not_local_text,
>>>> >> >  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
>>>> >> >    addresses=resolvedAddressesAsString
>>>> >> >RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
>>>> >> >
>>>> >> >2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
>>>> >> >
>>>> >> >The node I'm running on has an IP address of .61 and resolves correctly.
>>>> >> >
>>>> >> >On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shar...@jalloq.co.uk> wrote:
>>>> >> >
>>>> >> >> Where should I be checking if there are any files/folder not owned by
>>>> >> >> vdsm:kvm? I checked on the mount the HA sits on and it's fine.
>>>> >> >>
>>>> >> >> How would I go about checking vdsm can access those images? If I run
>>>> >> >> virsh, it lists them and they were running yesterday even though the HA was
>>>> >> >> down. I've since restarted both hosts but the broker is still spitting out
>>>> >> >> the same error (copied below). How do I find the reason the broker can't
>>>> >> >> connect to the storage? The conf file is already at DEBUG verbosity:
>>>> >> >>
>>>> >> >> [handler_logfile]
>>>> >> >> class=logging.handlers.TimedRotatingFileHandler
>>>> >> >> args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
>>>> >> >> level=DEBUG
>>>> >> >> formatter=long
>>>> >> >>
>>>> >> >> And what are all these .prob-<num> files that are being created? There
>>>> >> >> are over 250K of them now on the mount I'm using for the Data domain.
>>>> >> >> They're all of 0 size and of the form,
>>>> >> >> /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9
>>>> >> >>
>>>> >> >> @eevans: The volume I have the Data Domain on has TB's free.
>>>> >> >> The HA is dead so I can't ssh in. No idea what started these errors and the other
>>>> >> >> VMs were still running happily although they're on a different Data Domain.
>>>> >> >>
>>>> >> >> Shareef.
>>>> >> >>
>>>> >> >> MainThread::INFO::2020-04-10 07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
>>>> >> >>
>>>> >> >> MainThread::INFO::2020-04-10 07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>>>> >> >>
>>>> >> >> MainThread::INFO::2020-04-10 07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>>>> >> >>
>>>> >> >> MainThread::INFO::2020-04-10 07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
>>>> >> >>
>>>> >> >> MainThread::WARNING::2020-04-10 07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
>>>> >> >> (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>>>> >> >>
>>>> >> >> On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov <hunter86...@yahoo.com> wrote:
>>>> >> >>
>>>> >> >>> On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq <shar...@jalloq.co.uk> wrote:
>>>> >> >>> >OK, let's go through this.
>>>> >> >>> >I'm looking at the node that at least still has
>>>> >> >>> >some VMs running. virsh also tells me that the HostedEngine VM is running
>>>> >> >>> >but it's unresponsive and I can't shut it down.
>>>> >> >>> >
>>>> >> >>> >1. All storage domains exist and are mounted.
>>>> >> >>> >2. The ha_agent exists:
>>>> >> >>> >
>>>> >> >>> >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
>>>> >> >>> >dom_md ha_agent images master
>>>> >> >>> >
>>>> >> >>> >3. There are two links
>>>> >> >>> >
>>>> >> >>> >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
>>>> >> >>> >total 8
>>>> >> >>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.lockspace -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
>>>> >> >>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.metadata -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4
>>>> >> >>> >
>>>> >> >>> >4. The services exist but all seem to have some sort of warning:
>>>> >> >>> >
>>>> >> >>> >a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: *2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec*
>>>> >> >>> >
>>>> >> >>> >b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: *failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory*
>>>> >> >>> >
>>>> >> >>> >c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: *ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?*
>>>> >> >>> >
>>>> >> >>> >d) Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data
>>>> >> >>> >
>>>> >> >>> >Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
>>>> >> >>> >
>>>> >> >>> >Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
>>>> >> >>> >
>>>> >> >>> >Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
>>>> >> >>> >
>>>> >> >>> >5 & 6.
The broker log is continually printing this error: >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) >>>> >> >>> >ovirt-hosted-engine-ha broker 2.3.6 started >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run) >>>> >> >>> >Running broker >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor) >>>> >> >>> >Starting monitor >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Searching for submonitors in >>>> >> >>> >>/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker >>>> >> >>> > >>>> >> >>> >/submonitors >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor network >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor cpu-load-no-engine >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor mgmt-bridge >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> 
>>>>08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor network >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor cpu-load >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor engine-health >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor mgmt-bridge >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor cpu-load-no-engine >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor cpu-load >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor mem-free >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor storage-domain >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> 
>>>> >>>> >>>>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor storage-domain >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor mem-free >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor engine-health >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Finished loading submonitors >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker) >>>> >> >>> >Starting storage broker >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect) >>>> >> >>> >Connecting to VDSM >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug) >>>> >> >>> >Creating a new json-rpc connection to VDSM >>>> >> >>> > >>>> >> >>> >Client localhost:54321::DEBUG::2020-04-09 >>>> >> >>> >08:07:31,453::concurrent::258::root::(run) START thread >>>> >> ><Thread(Client >>>> >> >>> >localhost:54321, started daemon 139992488138496)> >(func=<bound >>>> >> >method >>>> >> >>> >Reactor.process_requests of ><yajsonrpc.betterAsyncore.Reactor >>>> >> >object at >>>> >> >>> 
>0x7f528acabc90>>, args=(), kwargs={}) >>>> >> >>> > >>>> >> >>> >Client localhost:54321::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) >>>> >> >>> >Stomp connection established >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) >>>> >> >Sending >>>> >> >>> >response >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) >>>> >> >>> >Connecting the storage >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>> >> >>> >Connecting storage server >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) >>>> >> >Sending >>>> >> >>> >response >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) >>>> >> >Sending >>>> >> >>> >response >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path) >>>> >> >>> >Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not >>>> >> >available >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>> >> >>> >Connecting storage server >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> 
>>08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) >>>> >> >Sending >>>> >> >>> >response >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>> >> >>> >[{u'status': 0, u'id': >u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}] >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>> >> >>> >Refreshing the storage domain >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) >>>> >> >Sending >>>> >> >>> >response >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>> >> >>> >Error refreshing storage domain: Command >StorageDomain.getStats >>>> >> >with >>>> >> >>> >args >>>> >> >>> >{'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} >>>> >failed: >>>> >> >>> > >>>> >> >>> >(code=350, message=Error in storage domain action: >>>> >> >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) >>>> >> >Sending >>>> >> >>> >response >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size) >>>> >> >>> >Command StorageDomain.getInfo with args {'storagedomainID': >>>> >> >>> >'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: >>>> >> >>> > >>>> >> >>> >(code=350, message=Error in 
>>>> >> >>> >storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>>>> >> >>> >
>>>> >> >>> >MainThread::WARNING::2020-04-09 08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
>>>> >> >>> >(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>>>> >> >>> >
>>>> >> >>> >The UUID it is moaning about is indeed the one that the HA sits on and is
>>>> >> >>> >the one I listed the contents of in step 2 above.
>>>> >> >>> >
>>>> >> >>> >So why can't it see this domain?
>>>> >> >>> >
>>>> >> >>> >Thanks, Shareef.
>>>> >> >>> >
>>>> >> >>> >On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov <hunter86...@yahoo.com> wrote:
>>>> >> >>> >
>>>> >> >>> >> On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq <shar...@jalloq.co.uk> wrote:
>>>> >> >>> >> >Don't know if this is useful or not, but I just tried to shut down and start
>>>> >> >>> >> >another VM on one of the hosts and got the following error:
>>>> >> >>> >> >
>>>> >> >>> >> >virsh # start scratch
>>>> >> >>> >> >error: Failed to start domain scratch
>>>> >> >>> >> >error: Network not found: no network with matching name 'vdsm-ovirtmgmt'
>>>> >> >>> >> >
>>>> >> >>> >> >Is this not referring to the interface name, as the network is called 'ovirtmgmt'?
>>>> >> >>> >> >
>>>> >> >>> >> >On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq <shar...@jalloq.co.uk> wrote:
>>>> >> >>> >> >
>>>> >> >>> >> >> Hmmm, virsh tells me the HE is running but it hasn't come up and the
>>>> >> >>> >> >> agent.log is full of the same errors.
>>>> >> >>> >> >>
>>>> >> >>> >> >> On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq <shar...@jalloq.co.uk> wrote:
>>>> >> >>> >> >>
>>>> >> >>> >> >>> Ah hah! Ok, so I've managed to start it using virsh on the second host
>>>> >> >>> >> >>> but my first host is still dead.
>>>> >> >>> >> >>>
>>>> >> >>> >> >>> First of all, what are these 56,317 .prob- files that get dumped to the
>>>> >> >>> >> >>> NFS mounts?
>>>> >> >>> >> >>>
>>>> >> >>> >> >>> Secondly, why doesn't the node mount the NFS directories at boot? Is
>>>> >> >>> >> >>> that the issue with this particular node?
>>>> >> >>> >> >>>
>>>> >> >>> >> >>> On Wed, Apr 8, 2020 at 11:12 PM <eev...@digitaldatatechs.com> wrote:
>>>> >> >>> >> >>>
>>>> >> >>> >> >>>> Did you try virsh list --inactive
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> Eric Evans
>>>> >> >>> >> >>>> Digital Data Services LLC.
>>>> >> >>> >> >>>> 304.660.9080
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> *From:* Shareef Jalloq <shar...@jalloq.co.uk>
>>>> >> >>> >> >>>> *Sent:* Wednesday, April 8, 2020 5:58 PM
>>>> >> >>> >> >>>> *To:* Strahil Nikolov <hunter86...@yahoo.com>
>>>> >> >>> >> >>>> *Cc:* Ovirt Users <users@ovirt.org>
>>>> >> >>> >> >>>> *Subject:* [ovirt-users] Re: ovirt-engine unresponsive - how to rescue?
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> I've now shut down the VMs on one host and rebooted it but the agent
>>>> >> >>> >> >>>> service doesn't start. If I run 'hosted-engine --vm-status' I get:
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> The hosted engine configuration has not been retrieved from shared
>>>> >> >>> >> >>>> storage. Please ensure that ovirt-ha-agent is running and the storage
>>>> >> >>> >> >>>> server is reachable.
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> and indeed if I list the mounts under /rhev/data-center/mnt, only one of
>>>> >> >>> >> >>>> the directories is mounted. I have 3 NFS mounts, one ISO Domain and two
>>>> >> >>> >> >>>> Data Domains. Only one Data Domain has mounted and this has lots of .prob
>>>> >> >>> >> >>>> files in. So why haven't the other NFS exports been mounted?
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> Manually mounting them doesn't seem to have helped much either. I can
>>>> >> >>> >> >>>> start the broker service but the agent service says no. Same error as the
>>>> >> >>> >> >>>> one in my last email.
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> Shareef.
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq <shar...@jalloq.co.uk> wrote:
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> Right, still down. I've run virsh and it doesn't know anything about
>>>> >> >>> >> >>>> the engine vm.
>>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> I've restarted the broker and agent services and I >still >>>> >get >>>> >> >>> >> >nothing in >>>> >> >>> >> >>>> virsh->list. >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> In the logs under /var/log/ovirt-hosted-engine-ha I >see >>>> >lots >>>> >> >of >>>> >> >>> >> >errors: >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> broker.log: >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) >>>> >> >>> >> >>>> ovirt-hosted-engine-ha broker 2.3.6 started >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Searching for submonitors in >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor network >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor cpu-load-no-engine >>>> >> >>> >> >>>> >>>> >> 
>>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor mgmt-bridge >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor network >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor cpu-load >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor engine-health >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor mgmt-bridge >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor cpu-load-no-engine >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> 
>> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor cpu-load >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor mem-free >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor storage-domain >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor storage-domain >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor mem-free >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor engine-health >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> 
>>>> >>>> >>>>>20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Finished loading submonitors >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) >>>> >> >>> >> >>>> Connecting the storage >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>> >> >>> >> >>>> Connecting storage server >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>> >> >>> >> >>>> Connecting storage server >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>> >> >>> >> >>>> Refreshing the storage domain >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::WARNING::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) >>>> >> >>> >> >>>> Can't connect vdsm storage: Command >StorageDomain.getInfo >>>> >> >with >>>> >> >>> >args >>>> >> >>> >> >>>> {'storagedomainID': >>>> >'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} >>>> >> >>> >failed: >>>> >> >>> 
>> >>>> >>>> >> >>> >> >>>> (code=350, message=Error in storage domain action: >>>> >> >>> >> >>>> (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) >>>> >> >>> >> >>>> ovirt-hosted-engine-ha broker 2.3.6 started >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Searching for submonitors in >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> agent.log: >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::ERROR::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) >>>> >> >>> >> >>>> Trying to restart agent >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >>>> >> >>>> >>>> >>>>>20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) >>>> >> >>> >> >>>> Agent shutting down >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >>>> >> >>>> >>>> >>>>>20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) >>>> >> >>> >> >>>> ovirt-hosted-engine-ha agent 2.3.6 started >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> 
>>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) >>>> >> >>> >> >>>> Found certificate common name: >ovirt-node-01.phoelex.com >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) >>>> >> >>> >> >>>> Initializing ha-broker connection >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) >>>> >> >>> >> >>>> Starting monitor network, options {'tcp_t_address': >'', >>>> >> >>> >> >'network_test': >>>> >> >>> >> >>>> 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'} >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::ERROR::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) >>>> >> >>> >> >>>> Failed to start necessary monitors >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::ERROR::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) >>>> >> >>> >> >>>> Traceback (most recent call last): >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> File >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>> >> >>> >> >>>> line 131, in _run_agent >>>> >> >>> >> >>>> >>>> >> >>> >> 
>>>> return action(he) >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> File >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>> >> >>> >> >>>> line 55, in action_proper >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> return he.start_monitoring() >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> File >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>> >> >>> >> >>>> line 432, in start_monitoring >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> self._initialize_broker() >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> File >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>> >> >>> >> >>>> line 556, in _initialize_broker >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> m.get('options', {})) >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> File >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", >>>> >> >>> >> >>>> line 89, in start_monitor >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> ).format(t=type, o=options, e=e) >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> RequestError: brokerlink - failed to start monitor >via >>>> >> >>> >> >ovirt-ha-broker: >>>> >> >>> >> >>>> [Errno 2] No such file or directory, [monitor: >'network', >>>> >> >>> >options: >>>> >> >>> >> >>>> {'tcp_t_address': '', 'network_test': 'dns', >>>> >'tcp_t_port': >>>> >> >'', >>>> >> >>> >> >'addr': >>>> >> >>> >> >>>> '192.168.1.99'}] >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::ERROR::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> 
>>>>>20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) >>>> >> >>> >> >>>> Trying to restart agent >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >>>> >> >>>> >>>> >>>>>20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) >>>> >> >>> >> >>>> Agent shutting down >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov >>>> >> >>> >> ><hunter86...@yahoo.com> >>>> >> >>> >> >>>> wrote: >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> On April 8, 2020 7:47:20 PM GMT+03:00, "Maton, >Brett" < >>>> >> >>> >> >>>> mat...@ltresources.co.uk> wrote: >>>> >> >>> >> >>>> >On the host you tried to restart the engine on: >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >Add an alias to virsh (authenticates with >>>> >virsh_auth.conf) >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >alias virsh='virsh -c >>>> >> >>> >> >>>> >>>> >> >>> >>>> >>>>qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf' >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >Then run virsh: >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >virsh >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >virsh # list >>>> >> >>> >> >>>> > Id Name State >>>> >> >>> >> >>>> >>---------------------------------------------------- >>>> >> >>> >> >>>> > xx HostedEngine Paused >>>> >> >>> >> >>>> > xx ********** running >>>> >> >>> >> >>>> > ... >>>> >> >>> >> >>>> > xx ********** running >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >HostedEngine should be in the list, try and resume >the >>>> >> >engine: >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >virsh # resume HostedEngine >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >On Wed, 8 Apr 2020 at 17:28, Shareef Jalloq >>>> >> >>> ><shar...@jalloq.co.uk> >>>> >> >>> >> >>>> >wrote: >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >> Thanks! 
>>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >> The status hangs due to, I guess, the VM being >>>> >down.... >>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >> [root@ovirt-node-01 ~]# hosted-engine --vm-start >>>> >> >>> >> >>>> >> VM exists and is down, cleaning up and restarting >>>> >> >>> >> >>>> >> VM in WaitForLaunch >>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >> but this doesn't seem to do anything. OK, after >a >>>> >while >>>> >> >I >>>> >> >>> >get a >>>> >> >>> >> >>>> >status of >>>> >> >>> >> >>>> >> it being barfed... >>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >> --== Host ovirt-node-00.phoelex.com (id: 1) >status >>>> >==-- >>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >> conf_on_shared_storage : True >>>> >> >>> >> >>>> >> Status up-to-date : False >>>> >> >>> >> >>>> >> Hostname : >>>> >> >>> >ovirt-node-00.phoelex.com >>>> >> >>> >> >>>> >> Host ID : 1 >>>> >> >>> >> >>>> >> Engine status : unknown >>>> >stale-data >>>> >> >>> >> >>>> >> Score : 3400 >>>> >> >>> >> >>>> >> stopped : False >>>> >> >>> >> >>>> >> Local maintenance : False >>>> >> >>> >> >>>> >> crc32 : 9c4a034b >>>> >> >>> >> >>>> >> local_conf_timestamp : 523362 >>>> >> >>> >> >>>> >> Host timestamp : 523608 >>>> >> >>> >> >>>> >> Extra metadata (valid at timestamp): >>>> >> >>> >> >>>> >> metadata_parse_version=1 >>>> >> >>> >> >>>> >> metadata_feature_version=1 >>>> >> >>> >> >>>> >> timestamp=523608 (Wed Apr 8 16:17:11 2020) >>>> >> >>> >> >>>> >> host-id=1 >>>> >> >>> >> >>>> >> score=3400 >>>> >> >>> >> >>>> >> vm_conf_refresh_time=523362 (Wed Apr 8 16:13:06 >2020) >>>> >> >>> >> >>>> >> conf_on_shared_storage=True >>>> >> >>> >> >>>> >> maintenance=False >>>> >> >>> >> >>>> >> state=EngineDown >>>> >> >>> >> >>>> >> stopped=False >>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >> --== Host ovirt-node-01.phoelex.com (id: 2) >status >>>> >==-- >>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >> conf_on_shared_storage : True >>>> >> >>> >> >>>> >> Status up-to-date : True >>>> 
>> >>> >> >>>> >> Hostname : >>>> >> >>> >ovirt-node-01.phoelex.com >>>> >> >>> >> >>>> >> Host ID : 2 >>>> >> >>> >> >>>> >> Engine status : {"reason": >"bad >>>> >vm >>>> >> >>> >status", >>>> >> >>> >> >>>> >"health": >>>> >> >>> >> >>>> >> "bad", "vm": "down_unexpected", "detail": "Down"} >>>> >> >>> >> >>>> >> Score : 0 >>>> >> >>> >> >>>> >> stopped : False >>>> >> >>> >> >>>> >> Local maintenance : False >>>> >> >>> >> >>>> >> crc32 : 5045f2eb >>>> >> >>> >> >>>> >> local_conf_timestamp : 1737037 >>>> >> >>> >> >>>> >> Host timestamp : 1737283 >>>> >> >>> >> >>>> >> Extra metadata (valid at timestamp): >>>> >> >>> >> >>>> >> metadata_parse_version=1 >>>> >> >>> >> >>>> >> metadata_feature_version=1 >>>> >> >>> >> >>>> >> timestamp=1737283 (Wed Apr 8 16:16:17 2020) >>>> >> >>> >> >>>> >> host-id=2 >>>> >> >>> >> >>>> >> score=0 >>>> >> >>> >> >>>> >> vm_conf_refresh_time=1737037 (Wed Apr 8 16:12:11 >>>> >2020) >>>> >> >>> >> >>>> >> conf_on_shared_storage=True >>>> >> >>> >> >>>> >> maintenance=False >>>> >> >>> >> >>>> >> state=EngineUnexpectedlyDown >>>> >> >>> >> >>>> >> stopped=False >>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >> On Wed, Apr 8, 2020 at 5:09 PM Maton, Brett >>>> >> >>> >> >>>> ><mat...@ltresources.co.uk> >>>> >> >>> >> >>>> >> wrote: >>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >>> First steps, on one of your hosts as root: >>>> >> >>> >> >>>> >>> >>>> >> >>> >> >>>> >>> To get information: >>>> >> >>> >> >>>> >>> hosted-engine --vm-status >>>> >> >>> >> >>>> >>> >>>> >> >>> >> >>>> >>> To start the engine: >>>> >> >>> >> >>>> >>> hosted-engine --vm-start >>>> >> >>> >> >>>> >>> >>>> >> >>> >> >>>> >>> >>>> >> >>> >> >>>> >>> On Wed, 8 Apr 2020 at 17:00, Shareef Jalloq >>>> >> >>> >> ><shar...@jalloq.co.uk> >>>> >> >>> >> >>>> >wrote: >>>> >> >>> >> >>>> >>> >>>> >> >>> >> >>>> >>>> So my engine has gone down and I can't ssh into >it >>>> >> >either. 
>>>> >> >>> >> >>>> >>>> If I try to log into the web-ui of the node it is running on, I get redirected because the node can't reach the engine.
>>>> >> >>> >> >>>> >>>>
>>>> >> >>> >> >>>> >>>> What are my next steps?
>>>> >> >>> >> >>>> >>>>
>>>> >> >>> >> >>>> >>>> Shareef.
>>>> >> >>> >> >>>> >>>> _______________________________________________
>>>> >> >>> >> >>>> >>>> Users mailing list -- users@ovirt.org
>>>> >> >>> >> >>>> >>>> To unsubscribe send an email to users-le...@ovirt.org
>>>> >> >>> >> >>>> >>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>>> >> >>> >> >>>> >>>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>>>> >> >>> >> >>>> >>>> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/W7BP57OCIRSW5CDRQWR5MIKJUH3ISLCQ/
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> This has to be resolved:
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> Engine status : unknown stale-data
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> Run again 'hosted-engine --vm-status'. If it remains the same, restart ovirt-ha-broker.service & ovirt-ha-agent.service
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> Verify that the engine's storage is available.
>>>> >> >>> >> >>>> Then monitor the broker & agent logs in /var/log/ovirt-hosted-engine-ha
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> Best Regards,
>>>> >> >>> >> >>>> Strahil Nikolov
>>>> >> >>> >> >>>>
>>>> >> >>> >>
>>>> >> >>> >> Hi Shareef,
>>>> >> >>> >>
>>>> >> >>> >> The activation flow in oVirt is more complex than in plain KVM.
>>>> >> >>> >> Mounting of the domains happens during the activation of the node (the HostedEngine is activating everything needed).
>>>> >> >>> >>
>>>> >> >>> >> Focus on the HostedEngine VM.
>>>> >> >>> >> Is it running properly?
>>>> >> >>> >>
>>>> >> >>> >> If not, try:
>>>> >> >>> >> 1. Verify that the storage domain exists
>>>> >> >>> >> 2. Check if it has an 'ha_agents' directory
>>>> >> >>> >> 3. Check if the links are OK; if not, you can safely remove the links
>>>> >> >>> >>
>>>> >> >>> >> 4. Next, check that the services are running:
>>>> >> >>> >> A) sanlock
>>>> >> >>> >> B) supervdsmd
>>>> >> >>> >> C) vdsmd
>>>> >> >>> >> D) libvirtd
>>>> >> >>> >>
>>>> >> >>> >> 5. Increase the log level for the broker and agent services:
>>>> >> >>> >>
>>>> >> >>> >> cd /etc/ovirt-hosted-engine-ha
>>>> >> >>> >> vim *-log.conf
>>>> >> >>> >>
>>>> >> >>> >> systemctl restart ovirt-ha-broker ovirt-ha-agent
>>>> >> >>> >>
>>>> >> >>> >> 6. Check what they are complaining about
>>>> >> >>> >> Keep in mind that the agent will keep throwing errors until the broker stops doing so (the agent depends on the broker), so the broker must be OK before proceeding with the agent log.
>>>> >> >>> >>
>>>> >> >>> >> About the manual VM start, you need 2 things:
>>>> >> >>> >>
>>>> >> >>> >> 1.
>>>> >> >>> >>    Define the VM network
>>>> >> >>> >> # cat vdsm-ovirtmgmt.xml
>>>> >> >>> >> <network>
>>>> >> >>> >>   <name>vdsm-ovirtmgmt</name>
>>>> >> >>> >>   <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
>>>> >> >>> >>   <forward mode='bridge'/>
>>>> >> >>> >>   <bridge name='ovirtmgmt'/>
>>>> >> >>> >> </network>
>>>> >> >>> >>
>>>> >> >>> >> [root@ovirt1 HostedEngine-RECOVERY]# virsh define vdsm-ovirtmgmt.xml
>>>> >> >>> >>
>>>> >> >>> >> 2. Get an XML definition, which can be found in the vdsm log. Every VM has its configuration printed out at startup in the vdsm log on the host it starts on.
>>>> >> >>> >> Save it to a file and then:
>>>> >> >>> >> A) virsh define myvm.xml
>>>> >> >>> >> B) virsh start myvm
>>>> >> >>> >>
>>>> >> >>> >> It seems there is/was a problem with your NFS shares.
>>>> >> >>> >>
>>>> >> >>> >> Best Regards,
>>>> >> >>> >> Strahil Nikolov
>>>> >> >>> >>
>>>> >> >>>
>>>> >> >>> Hey Shareef,
>>>> >> >>>
>>>> >> >>> Check if there are any files or folders not owned by vdsm:kvm. Something like this:
>>>> >> >>>
>>>> >> >>> find . -not -user 36 -not -group 36 -print
>>>> >> >>>
>>>> >> >>> Also check if vdsm can access the images in the '<vol-mount-point>/images' directories.
>>>> >> >>>
>>>> >> >>> Best Regards,
>>>> >> >>> Strahil Nikolov
>>>> >> >>>
>>>> >> >>
>>>> >>
>>>> >> And the IPv6 address '64:ff9b::c0a8:13d'?
>>>> >> I don't see it in the log output.
>>>> >>
>>>> >> Best Regards,
>>>> >> Strahil Nikolov
>>>> >>
>>>>
>>>> Based on your output, you got a PTR record for IPv4 & IPv6 ... most probably that's the reason.
>>>>
>>>> Set the IPv6 on the interface and try again.
>>>>
>>>> Best Regards,
>>>> Strahil Nikolov
>>>>
>>>
Do you have firewalld up and running on the host?

Best Regards,
Strahil Nikolov
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/UJODBWEEXVZVTKZ3DU6A77FJS46B73DE/
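
One possible reading of the failed "Get active list of active firewalld zones" task quoted earlier in this thread, offered as a sketch rather than a diagnosis: the task pipes `firewall-cmd --get-active-zones` through `grep -v`, and grep exits with status 1 when it selects no lines, so empty output from firewall-cmd would be enough to give rc=1 with empty stdout. The snippet below reproduces that behaviour on simulated text, so it runs without firewalld installed. The `filter_zones` helper and the sample zone text are made up for illustration, and `[[:space:]]` stands in for the `\s` used in the task's command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: applies the same kind of filter as the deployment
# task (drop the indented "interfaces" lines), but reads from stdin so
# no running firewalld is required.
filter_zones() {
    grep -v '^[[:space:]]*interfaces'
}

# Case 1: simulated output with one active zone bound to an interface.
with_zone_rc=0
zones=$(printf 'public\n  interfaces: ovirtmgmt\n' | filter_zones) || with_zone_rc=$?

# Case 2: simulated empty output, i.e. no active zones reported.
empty_rc=0
no_zones=$(printf '' | filter_zones) || empty_rc=$?

echo "one active zone : rc=${with_zone_rc} zones='${zones}'"
echo "no active zones : rc=${empty_rc} zones='${no_zones}'"
```

On the host itself, `systemctl is-active firewalld` and `firewall-cmd --get-active-zones` should show whether the service is up and whether any interface is bound to a zone.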