On Tue, Jan 19, 2021 at 12:59 PM James Freeman <james.free...@a24.io> wrote:
>
> So grateful for your help here - I ran tcpdump on the host, and I saw
> the connection requests to the host from the hosted-engine on 54321/tcp,
> so I was kind of getting there on the whole vdsm thing.
>
> The install just fell over again (same issue - the 120 second timeout
> you described). Taking a step back here, I think something is wrong very
> early on in my upgrade process. My environment is:
>
> 2 x RHEL based hosts (previously RHEL 7 - to be re-installed with RHEL 8
> as per install documentation)
> NFS based storage
> Self-hosted engine
>
> I have been following the documentation here:
>
> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html-single/upgrade_guide/index#SHE_Upgrading_from_4-3
>
> And specifically here:
>
> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html-single/upgrade_guide/index#Upgrading_the_Manager_to_4-4_4-3_SHE
>
> All pre-requisite steps are done - the 4.3 engine was upgraded to the
> latest version before the backup was taken and it was shut down.
>
> Now, I note that on my RHEL 8 host (newly installed), vdsmd is not
> configured or running. The deploy script is not opening the firewall for
> the temporary manager to talk to the host on 54321, but it wouldn't
> matter if it did - even if I were to open up the firewall, there's no
> configured vdsmd running for it to talk to anyway.
>
> I suddenly have the feeling that I've missed an important step that
> would have configured the freshly installed RHEL 8 host for the
> hosted-engine to be installed on - but I can't see what this might be.
> I've been back and forth through the documentation but I can't see where
> vdsmd would have been configured on the host.

This should happen automatically; it does not require a manual step on
your side.

> In short (ignoring all the failed attempts), my commands to install on a
> fresh RHEL 8 host have been:
>
> dnf module reset virt
> dnf module list virt
> dnf module enable virt:8.3
> dnf distro-sync --nobest
> dnf install rhvm-appliance
> reboot
> dnf install ovirt-hosted-engine-setup

Just to make sure, perhaps also try 'dnf install ovirt-host'. If this
does pull in additional requirements, perhaps that's a bug somewhere.
But I do not think this is what is failing for you.

> dnf install firewalld
> systemctl status firewalld
> systemctl enable firewalld
> systemctl start firewalld

I do not think these are needed - the deploy process should do this.
Should be harmless, though.

> systemctl status firewalld
> hosted-engine --deploy --restore-from-file=backup.bck
>
> Am I missing something fundamental, or is there another step that's not
> working where vdsmd would have been configured?

Sorry, I overlooked the fact that this is an upgrade/restore. In this
case, it's expected that the restored engine will not have access to all
the other hosts during deploy, until it's started on the external
network. So I suggest ignoring most errors in engine.log and checking
only those related to the host you deploy on. Also check the
host-deploy/* logs.

For a general overview of the hosted-engine deploy process, you might
want to check 'Simone Tiraboschi - Hosted Engine 4.3 Deep Dive' in:

https://www.ovirt.org/community/archived_conferences_presentations.html

I think these are still the best presentation slides we have on this.

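To narrow engine.log down to the host you deploy on, something like the following might help. It is only a minimal sketch, not part of oVirt: the function name host_errors and the sample lines are hypothetical (the first two are shortened from the log excerpt quoted further down, the third is made up for contrast), and it assumes each engine.log entry sits on a single line with the level written as " ERROR ".

```python
from typing import Iterable, List


def host_errors(lines: Iterable[str], host: str, ip: str = "") -> List[str]:
    """Return the ERROR lines that mention the given host name (or IP, if given)."""
    needles = [n for n in (host, ip) if n]
    return [
        line.rstrip("\n")
        for line in lines
        if " ERROR " in line and any(n in line for n in needles)
    ]


# Illustrative sample only; real engine.log entries are longer.
SAMPLE = [
    "2021-01-19 05:12:11,395-05 INFO  [ReactorClient] Connecting to "
    "rhvh1.example.org/192.168.50.31",
    "2021-01-19 05:12:11,401-05 ERROR [GetCapabilitiesAsyncVDSCommand] "
    "Command 'GetCapabilitiesAsyncVDSCommand(HostName = rhvh1.example.org)' "
    "execution failed: java.net.ConnectException: Connection refused",
    "2021-01-19 05:12:12,000-05 ERROR [HostMonitoring] "
    "Host otherhost.example.org is not responding",  # hypothetical entry
]

for hit in host_errors(SAMPLE, "rhvh1.example.org"):
    print(hit)
```

On a real system you would pass an open file object for engine.log instead of SAMPLE. Note that some related errors (for example the "Unable to RefreshCapabilities" line) do not name the host at all, so treat the filtered output as a starting point rather than the full picture.
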
Good luck,

>
> Many thanks
>
> James
>
> Yedidyah Bar David wrote on 19/01/2021 10:36:
> > On Tue, Jan 19, 2021 at 12:25 PM James Freeman <james.free...@a24.io> wrote:
> >> Thanks Didi
> >>
> >> Great pointer - I have just performed a fresh deploy - am in the
> >> hosted-engine VM, and in /var/log/ovirt-engine/engine.log, I can see the
> >> following 3 lines cycling over and over again:
> >>
> >> 2021-01-19 05:12:11,395-05 INFO
> >> [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp
> >> Reactor) [] Connecting to rhvh1.example.org/192.168.50.31
> >> 2021-01-19 05:12:11,399-05 ERROR
> >> [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
> >> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-96)
> >> [] Unable to RefreshCapabilities: ConnectException: Connection refused
> >> 2021-01-19 05:12:11,401-05 ERROR
> >> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand]
> >> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-96)
> >> [] Command 'GetCapabilitiesAsyncVDSCommand(HostName = rhvh1.example.org,
> >> VdsIdAndVdsVDSCommandParametersBase:{hostId='12057f7e-a4cf-46ec-b563-c1037ba5c62d',
> >> vds='Host[rhvh1.example.org,12057f7e-a4cf-46ec-b563-c1037ba5c62d]'})'
> >> execution failed: java.net.ConnectException: Connection refused
> >>
> >> I can ping 192.168.50.31 and resolve rhvh1.example.org - however I note
> >> that firewalld on the hypervisor host (192.168.50.31) hasn't had
> >> anything allowed through it yet apart from SSH and Cockpit. Is this a
> >> problem, or a red herring?
> > Generally speaking, the deploy process connects first from the engine to
> > the host via ssh (22), then (also) configures firewalld to allow access
> > to vdsm (the oVirt host-side agent, port 54321), and later the engine
> > normally communicates with the host via vdsm.
> >
> > Whether or not all of this worked, depends on exactly how you configured
> > your host's firewalld beforehand.
> >
> > I suggest to start by not touching it, do the deployment, then see what
> > it does/did (and that it worked), then decide how you are going to adapt
> > your policy/tooling/whatever for later deployments, assuming you want to
> > harden your hosts before deploying.
> >
> >> It seems that the hosted-engine is coming up and being installed and
> >> configured ok. The engine health page looks ok (as validated by
> >> Ansible). It looks like the hosted-engine is waiting for something to
> >> happen on the host itself, but this never completed - which I suspect it
> >> never will given that it cannot connect to the host.
> > The deploy process runs on the host, connects to the engine, asks it to
> > add the host, then waits until it sees the host in the engine with status
> > 'Up'. It indeed does not try to further diagnose failures, nor fail more
> > quickly - if it's 'Up' it's quick, if it's not, it will wait for a timeout
> > (120 times * 10 seconds = 20 minutes).
> >
> >> Am I on the right track?
> > You are :-).
> >
> > Good luck and best regards,
> >
> >> Yedidyah Bar David wrote on 19/01/2021 10:06:
> >>> On Tue, Jan 19, 2021 at 11:44 AM <james.free...@a24.io> wrote:
> >>>> Hi all
> >>>>
> >>>> I am in the process of migrating a RHV 4.3 setup to RHV 4.4 and
> >>>> struggling with the setup. I am installing on RHEL 8.3, using settings
> >>>> backed up from the RHV 4.3 install (via 'hosted-engine --deploy
> >>>> --restore-from-file=backup.bck').
> >>>>
> >>>> The install process always fails at the same point for me at the moment,
> >>>> and I can't figure out how to get past it. As far as install progress
> >>>> goes, the local hosted-engine comes up and runs on the node. I have been
> >>>> able to grep for local_vm_ip in the logs, and can SSH into it with the
> >>>> password I set during the setup phase.
> >>>>
> >>>> However the install playbooks always fail with:
> >>>> 2021-01-18 18:38:00,086-0500 ERROR otopi.plugins.gr_he_common.core.misc
> >>>> misc._terminate:167 Hosted Engine deployment failed: please check the
> >>>> logs for the issue, fix accordingly or re-deploy from scratch.
> >>>>
> >>>> Earlier in the logs, I note the following:
> >>>> 2021-01-18 18:34:51,258-0500 ERROR
> >>>> otopi.ovirt_hosted_engine_setup.ansible_utils
> >>>> ansible_utils._process_output:109 fatal: [localhost]: FAILED! =>
> >>>> {"changed": false, "msg": "Host is not up, please check logs, perhaps
> >>>> also on the engine machine"}
> >>>> 2021-01-18 18:37:16,661-0500 ERROR
> >>>> otopi.ovirt_hosted_engine_setup.ansible_utils
> >>>> ansible_utils._process_output:109 fatal: [localhost]: FAILED! =>
> >>>> {"changed": false, "msg": "The system may not be provisioned according
> >>>> to the playbook results: please check the logs for the issue, fix
> >>>> accordingly or re-deploy from scratch.\n"}
> >>>> Traceback (most recent call last):
> >>>> File "/usr/lib/python3.6/site-packages/otopi/context.py", line 132,
> >>>> in _executeMethod
> >>>> method['method']()
> >>>> File
> >>>> "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-ansiblesetup/core/misc.py",
> >>>> line 435, in _closeup
> >>>> raise RuntimeError(_('Failed executing ansible-playbook'))
> >>>> RuntimeError: Failed executing ansible-playbook
> >>>> 2021-01-18 18:37:18,996-0500 ERROR otopi.context
> >>>> context._executeMethod:154 Failed to execute stage 'Closing up': Failed
> >>>> executing ansible-playbook
> >>>> 2021-01-18 18:37:32,421-0500 ERROR
> >>>> otopi.ovirt_hosted_engine_setup.ansible_utils
> >>>> ansible_utils._process_output:109 fatal: [localhost]: UNREACHABLE! =>
> >>>> {"changed": false, "msg": "Failed to connect to the host via ssh: ssh:
> >>>> connect to host rhvm.example.org port 22: No route to host",
> >>>> "skip_reason": "Host localhost is unreachable", "unreachable": true}
> >>>>
> >>>> I find the unreachable message a bit odd, as at this stage all that has
> >>>> happened is that the local hosted-engine has been brought up to be
> >>>> configured, and so it is running on virbr0, not on my actual network. As
> >>>> a result, that DNS address will never resolve, and the IP it resolves to
> >>>> won't be up. I gave the installation script permission to modify the
> >>>> local /etc/hosts but this hasn't improved things.
> >>>>
> >>>> I presume I'm missing something in the install process, or earlier on in
> >>>> the logs, but I've been scanning for errors and possible clues to no
> >>>> avail.
> >>>>
> >>>> Any and all help greatly appreciated!
> >>> Please check/share, on the engine machine under /var/log/ovirt-engine,
> >>> or, if inaccessible, on the host, under
> >>> /var/log/ovirt-hosted-engine-setup/engine-logs-*:
> >>>
> >>> engine.log
> >>>
> >>> host-deploy/*
> >>>
> >>> Good luck and best regards,
> >
>

--
Didi
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/YTDGOGHFVIE7GXFGOJ5PHGGRI4FZE4AK/