On Tue, Jan 19, 2021 at 12:59 PM James Freeman <james.free...@a24.io> wrote:
>
> So grateful for your help here - I ran tcpdump on the host, and I saw
> the connection requests to the host from the hosted-engine on 54321/tcp,
> so I was kind of getting there on the whole vdsm thing.
>
> The install just fell over again (same issue - the 20-minute timeout
> you described). Taking a step back here, I think something is wrong very
> early on in my upgrade process. My environment is:
>
> 2 x RHEL-based hosts (previously RHEL 7 - to be re-installed with RHEL 8
> as per the install documentation)
> NFS based storage
> Self-hosted engine
>
> I have been following the documentation here:
>
> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html-single/upgrade_guide/index#SHE_Upgrading_from_4-3
>
> And specifically here:
>
> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html-single/upgrade_guide/index#Upgrading_the_Manager_to_4-4_4-3_SHE
>
> All prerequisite steps are done - the 4.3 engine was upgraded to the
> latest version before the backup was taken, and it was shut down.
>
> Now, I note that on my RHEL 8 host (newly installed), vdsmd is not
> configured or running. The deploy script is not opening the firewall for
> the temporary manager to talk to the host on 54321, but it wouldn't
> matter if it did - even if I were to open up the firewall, there's no
> configured vdsmd running for it to talk to anyway.
>
> I suddenly have the feeling that I've missed an important step that
> would have configured the freshly installed RHEL 8 host for the
> hosted-engine to be installed on - but I can't see what this might be.
> I've been back and forth through the documentation but I can't see where
> vdsmd would have been configured on the host. In short (ignoring all the

This should happen automatically; it does not require a manual step on your side.
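
If you want to double-check on the host after a failed attempt, something
along these lines should show whether host-deploy got as far as setting up
vdsm (a rough sketch; exact service names may differ slightly):

  rpm -q vdsm                        # should have been installed by host-deploy
  systemctl status vdsmd supervdsmd  # should be active once configured
  ss -tlnp | grep 54321              # vdsmd should be listening here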

> failed attempts), my commands to install on a fresh RHEL 8 host have been:
>
> dnf module reset virt
> dnf module list virt
> dnf module enable virt:8.3
> dnf distro-sync --nobest
> dnf install rhvm-appliance
> reboot
> dnf install ovirt-hosted-engine-setup

Just to make sure, perhaps also try 'dnf install ovirt-host'.
If this pulls in additional packages, that might be a bug
somewhere, but I do not think this is what is failing for you.
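
For example, a rough check (assuming a missing dependency is indeed the
problem; package names are from memory):

  dnf install ovirt-host
  dnf repoquery --requires ovirt-host   # see what it is expected to pull in
  rpm -q vdsm                           # vdsm should end up installed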

> dnf install firewalld
> systemctl status firewalld
> systemctl enable firewalld
> systemctl start firewalld

I do not think these are needed - the deploy process should do this.
They should be harmless, though.
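
If you want to see what the deploy actually opened on the host once it gets
that far, a quick look (the exact firewalld services oVirt adds may vary):

  firewall-cmd --get-active-zones
  firewall-cmd --list-all   # look for 54321/tcp or an ovirt-vdsm style service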

> systemctl status firewalld
> hosted-engine --deploy --restore-from-file=backup.bck
>
> Am I missing something fundamental, or is there another step that's not
> working where vdsmd would have been configured?

Sorry, I ignored the fact that it's an upgrade/restore. In this case,
it's expected that the restored engine will not have access to all the
other hosts during deploy, until it's started on the external network.
So I suggest ignoring most errors in engine.log and checking only those
related to the host you deploy on. Also check the host-deploy/* logs.
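
Something along these lines, for example (the engine-side paths are the
usual ones; adjust to your setup):

  # on the engine VM, or, if it is unreachable, on the host under
  # /var/log/ovirt-hosted-engine-setup/engine-logs-*
  grep -i error /var/log/ovirt-engine/engine.log | grep -i rhvh1
  less /var/log/ovirt-engine/host-deploy/*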

For a general overview of the hosted-engine deploy process, you might
want to check 'Simone Tiraboschi - Hosted Engine 4.3 Deep Dive' in:

https://www.ovirt.org/community/archived_conferences_presentations.html

I think those are still the best presentation slides we have on this.

Good luck,

>
> Many thanks
>
> James
>
> Yedidyah Bar David wrote on 19/01/2021 10:36:
> > On Tue, Jan 19, 2021 at 12:25 PM James Freeman <james.free...@a24.io> wrote:
> >> Thanks Didi
> >>
> >> Great pointer - I have just performed a fresh deploy. I am now in the
> >> hosted-engine VM, and in /var/log/ovirt-engine/engine.log I can see the
> >> following 3 lines cycling over and over again:
> >>
> >> 2021-01-19 05:12:11,395-05 INFO
> >> [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp
> >> Reactor) [] Connecting to rhvh1.example.org/192.168.50.31
> >> 2021-01-19 05:12:11,399-05 ERROR
> >> [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
> >> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-96)
> >> [] Unable to RefreshCapabilities: ConnectException: Connection refused
> >> 2021-01-19 05:12:11,401-05 ERROR
> >> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand]
> >> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-96)
> >> [] Command 'GetCapabilitiesAsyncVDSCommand(HostName = rhvh1.example.org,
> >> VdsIdAndVdsVDSCommandParametersBase:{hostId='12057f7e-a4cf-46ec-b563-c1037ba5c62d',
> >> vds='Host[rhvh1.example.org,12057f7e-a4cf-46ec-b563-c1037ba5c62d]'})'
> >> execution failed: java.net.ConnectException: Connection refused
> >>
> >> I can ping 192.168.50.31 and resolve rhvh1.example.org - however I note
> >> that firewalld on the hypervisor host (192.168.50.31) hasn't had
> >> anything allowed through it yet apart from SSH and Cockpit. Is this a
> >> problem, or a red herring?
> > Generally speaking, the deploy process connects first from the engine to
> > the host via ssh (22), then (also) configures firewalld to allow access
> > to vdsm (the oVirt host-side agent, port 54321), and later the engine
> > normally communicates with the host via vdsm.
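
For reference, a quick way to check both legs from the engine VM (assuming
nc/ncat is available; any equivalent port check works):

  nc -zv rhvh1.example.org 22      # ssh leg used by host-deploy
  nc -zv rhvh1.example.org 54321   # vdsm leg used afterwards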
> >
> > Whether or not all of this worked, depends on exactly how you configured
> > your host's firewalld beforehand.
> >
> > I suggest starting by not touching it: do the deployment, then see what
> > it does/did (and that it worked), and then decide how you are going to
> > adapt your policy/tooling/whatever for later deployments, assuming you
> > want to harden your hosts before deploying.
> >
> >> It seems that the hosted-engine is coming up and being installed and
> >> configured ok. The engine health page looks ok (as validated by
> >> Ansible). It looks like the hosted-engine is waiting for something to
> >> happen on the host itself, but this never completes - and I suspect it
> >> never will, given that it cannot connect to the host.
> > The deploy process runs on the host, connects to the engine, asks it to
> > add the host, then waits until it sees the host in the engine with status
> > 'Up'. It indeed does not try to further diagnose failures, nor fail more
> > quickly - if it's 'Up' it's quick; if it's not, it waits for a timeout
> > (120 retries * 10 seconds = 20 minutes).
> >
> >> Am I on the right track?
> > You are :-).
> >
> > Good luck and best regards,
> >
> >> Yedidyah Bar David wrote on 19/01/2021 10:06:
> >>> On Tue, Jan 19, 2021 at 11:44 AM <james.free...@a24.io> wrote:
> >>>> Hi all
> >>>>
> >>>> I am in the process of migrating a RHV 4.3 setup to RHV 4.4 and 
> >>>> struggling with the setup. I am installing on RHEL 8.3, using settings 
> >>>> backed up from the RHV 4.3 install (via 'hosted-engine --deploy 
> >>>> --restore-from-file=backup.bck').
> >>>>
> >>>> The install process always fails at the same point for me at the moment, 
> >>>> and I can't figure out how to get past it. As far as install progress 
> >>>> goes, the local hosted-engine comes up and runs on the node. I have been 
> >>>> able to grep for local_vm_ip in the logs, and can SSH into it with the 
> >>>> password I set during the setup phase.
> >>>>
> >>>> However the install playbooks always fail with:
> >>>> 2021-01-18 18:38:00,086-0500 ERROR otopi.plugins.gr_he_common.core.misc 
> >>>> misc._terminate:167 Hosted Engine deployment failed: please check the 
> >>>> logs for the issue, fix accordingly or re-deploy from scratch.
> >>>>
> >>>> Earlier in the logs, I note the following:
> >>>> 2021-01-18 18:34:51,258-0500 ERROR 
> >>>> otopi.ovirt_hosted_engine_setup.ansible_utils 
> >>>> ansible_utils._process_output:109 fatal: [localhost]: FAILED! => 
> >>>> {"changed": false, "msg": "Host is not up, please check logs, perhaps 
> >>>> also on the engine machine"}
> >>>> 2021-01-18 18:37:16,661-0500 ERROR 
> >>>> otopi.ovirt_hosted_engine_setup.ansible_utils 
> >>>> ansible_utils._process_output:109 fatal: [localhost]: FAILED! => 
> >>>> {"changed": false, "msg": "The system may not be provisioned according 
> >>>> to the playbook results: please check the logs for the issue, fix 
> >>>> accordingly or re-deploy from scratch.\n"}
> >>>> Traceback (most recent call last):
> >>>>     File "/usr/lib/python3.6/site-packages/otopi/context.py", line 132, 
> >>>> in _executeMethod
> >>>>       method['method']()
> >>>>     File 
> >>>> "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-ansiblesetup/core/misc.py",
> >>>>  line 435, in _closeup
> >>>>       raise RuntimeError(_('Failed executing ansible-playbook'))
> >>>> RuntimeError: Failed executing ansible-playbook
> >>>> 2021-01-18 18:37:18,996-0500 ERROR otopi.context 
> >>>> context._executeMethod:154 Failed to execute stage 'Closing up': Failed 
> >>>> executing ansible-playbook
> >>>> 2021-01-18 18:37:32,421-0500 ERROR 
> >>>> otopi.ovirt_hosted_engine_setup.ansible_utils 
> >>>> ansible_utils._process_output:109 fatal: [localhost]: UNREACHABLE! => 
> >>>> {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: 
> >>>> connect to host rhvm.example.org port 22: No route to host", 
> >>>> "skip_reason": "Host localhost is unreachable", "unreachable": true}
> >>>>
> >>>> I find the unreachable message a bit odd, as at this stage all that has 
> >>>> happened is that the local hosted-engine has been brought up to be 
> >>>> configured, and so it is running on virbr0, not on my actual network. As 
> >>>> a result, that DNS address will never resolve, and the IP it resolves to 
> >>>> won't be up. I gave the installation script permission to modify the 
> >>>> local /etc/hosts but this hasn't improved things.
> >>>>
> >>>> I presume I'm missing something in the install process, or earlier on in 
> >>>> the logs, but I've been scanning for errors and possible clues to no 
> >>>> avail.
> >>>>
> >>>> Any and all help greatly appreciated!
> >>> Please check/share, on the engine machine under /var/log/ovirt-engine,
> >>> or, if inaccessible, on the host, under
> >>> /var/log/ovirt-hosted-engine-setup/engine-logs-*:
> >>>
> >>> engine.log
> >>>
> >>> host-deploy/*
> >>>
> >>> Good luck and best regards,
> >
>


-- 
Didi
