Hi Gianluca,

On Fri, Sep 10, 2021 at 10:04 AM Gianluca Cecchi
<gianluca.cec...@gmail.com> wrote:
>
> On Wed, Sep 1, 2021 at 4:26 PM Gianluca Cecchi
> <gianluca.cec...@gmail.com> wrote:
>>
>> On Wed, Sep 1, 2021 at 4:00 PM Yedidyah Bar David <d...@redhat.com> wrote:
>>>
>>> > So I think there was something wrong with my system, or probably a
>>> > regression on this in 4.4.8.
>>> >
>>> > I see these lines in the ansible steps of the RHV 4.3 -> 4.4 deploy:
>>> >
>>> > [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Remove host used to redeploy]
>>> > [ INFO ] changed: [localhost -> 192.168.222.170]
>>> >
>>> > Possibly this step should remove the host that I'm reinstalling...?
>>>
>>> It should. It removes it from the DB, before adding it again, matching
>>> on the uuid (search the code for unique_id_out if you want the
>>> details). Why do you ask?
>>>
>>> (I didn't follow all of this thread; ignoring the rest for now...)
>>>
>>> Best regards,
>>
>> This is the step I suspect has a regression in 4.4.8 (comparing with
>> 4.4.7) when updating the first hosted-engine host during the upgrade
>> flow while retaining its hostname details.
What's the regression?

>> I'm going to test with the latest async 2 of 4.4.8 and see if it
>> solves the problem. Otherwise I'm going to open a bugzilla and send
>> the logs.

Can you clarify what the bug is?

>>
>> Gianluca
>
> So I tried with 4.4.8 async 2, but hit the same problem:
>
> [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Check actual cluster location]
> [ INFO ] skipping: [localhost]
> [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Enable GlusterFS at cluster level]
> [ INFO ] skipping: [localhost]
> [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Set VLAN ID at datacenter level]
> [ INFO ] skipping: [localhost]
> [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Get active list of active firewalld zones]
> [ INFO ] changed: [localhost]
> [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Configure libvirt firewalld zone]
> [ INFO ] changed: [localhost]
> [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Add host]
> [ INFO ] changed: [localhost]
> [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Include after_add_host tasks files]
> [ INFO ] You can now connect to https://novirt2.localdomain.local:6900/ovirt-engine/
> and check the status of this host and eventually remediate it, please
> continue only when the host is listed as 'up'
> [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : include_tasks]
> [ INFO ] ok: [localhost]
> [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Create temporary lock file]
> [ INFO ] changed: [localhost -> localhost]
> [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Pause execution until
> /tmp/ansible.wy3ichvk_he_setup_lock is removed, delete it once ready to proceed]
>
> The host remains NonResponsive in the local engine, and engine.log
> keeps showing the same error:
>
> 2021-09-10 08:44:51,481+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand]
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) []
> Command 'GetCapabilitiesAsyncVDSCommand(HostName =
novirt2.localdomain.local,
> VdsIdAndVdsVDSCommandParametersBase:{hostId='ca9ff6f7-5a7c-4168-9632-998c52f76cfa',
> vds='Host[novirt2.localdomain.local,ca9ff6f7-5a7c-4168-9632-998c52f76cfa]'})'
> execution failed: java.net.ConnectException: Connection refused
>
> So the initial install/config of novirt2 doesn't start.
>
> The scenario is:
>
> Initial 4.3.10 with 2 hosts (novirt1 and novirt2) and 1 self-hosted
> engine (novmgr); iSCSI-based storage: the hosted_engine storage
> domain and one data storage domain.
>
> This is a nested env, so through snapshots I can retry and repeat the
> steps. novirt1 and novirt2 are two VMs under one oVirt 4.4 env
> composed of a single host and an external engine.
>
> The steps (with 1 VM running on novirt1 and the hosted engine running
> on novirt2 at the beginning):
> . global maintenance
> . stop engine
> . backup
> . shutdown engine VM and scratch novirt2
>
> Actually I simulate the scenario where I deploy novirt2 on new hw,
> which is a clone of the novirt2 VM. I already tested (on a previous
> version of 4.4.8) that it works if I go through a different hostname.

Correct.

> As novirt2 and novirt1 (in 4.3) are VMs running on the same
> hypervisor, I see in their hw details that they have the same serial
> number and the usual random uuid.

Same serial number? Doesn't sound right. Any idea why it's the same?

> novirt1
> uuid B1EF9AFF-D4BD-41A1-B26E-7DD0CC440963
> serial number 00fa984c-d5a1-e811-906e-00163566263e
>
> novirt2
> uuid D584E962-5461-4FA5-AFFA-DB413E17590C
> serial number 00fa984c-d5a1-e811-906e-00163566263e
>
> and the new novirt2, being a clone, has a different uuid (from
> dmidecode):
> uuid: 10b9031d-a475-4b41-a134-bad2ede3cf11
> serial Number: 00fa984c-d5a1-e811-906e-00163566263e
>
> Unfortunately at the moment I cannot try the scenario where I deploy
> the new novirt2 on the same virtual hw, because in the first 4.3
> install I configured the OS disk as 50Gb, and with this size 4.4.8
> complains about insufficient space.
> And having the snapshot active in preview, I cannot resize the disk.
> Eventually I can reinstall 4.3 on an 80Gb disk and try the same,
> maintaining the same hw... but this would imply that in general I
> cannot upgrade using different hw while reusing the same
> hostnames... correct?

Yes. Either reuse a host and keep its name (what we recommend in the
upgrade guide), or use a new host and a new name (backup/restore
guide).

The condition to remove the host prior to adding it is based on
unique_id_out, which is set in this task (see also bz 1642440,
1654697):

    - name: Get host unique id
      shell: |
        if [ -e /etc/vdsm/vdsm.id ]; then
          cat /etc/vdsm/vdsm.id;
        elif [ -e /proc/device-tree/system-id ]; then
          cat /proc/device-tree/system-id; #ppc64le
        else
          dmidecode -s system-uuid;
        fi;
      environment: "{{ he_cmd_lang }}"
      changed_when: true
      register: unique_id_out

So if you want to "make this work", you can set the uuid (either in
your (virtual) BIOS, to affect the /proc value, or in
/etc/vdsm/vdsm.id) to match the one of the old host (the one whose
name you want to reuse). I didn't test this myself, though.
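To illustrate that suggestion, here is a small, untested shell sketch of the same lookup order the task above uses, showing that a pre-seeded vdsm.id wins over the SMBIOS value. The UUID is just the example value from the mail, and the demo deliberately works on a scratch file rather than the real /etc/vdsm/vdsm.id:

```shell
#!/bin/sh
# get_unique_id mirrors the lookup order of the "Get host unique id"
# ansible task: vdsm.id first, then device-tree (ppc64le), then dmidecode.
get_unique_id() {
    vdsm_id_file="$1"    # on a real host this is /etc/vdsm/vdsm.id
    if [ -e "$vdsm_id_file" ]; then
        cat "$vdsm_id_file"
    elif [ -e /proc/device-tree/system-id ]; then
        cat /proc/device-tree/system-id   # ppc64le
    else
        dmidecode -s system-uuid
    fi
}

# Demo on a scratch file: pre-seed the old host's UUID (example value
# from this thread) and show that it is what the deploy would read.
tmpdir=$(mktemp -d)
old_uuid="d584e962-5461-4fa5-affa-db413e17590c"
printf '%s\n' "$old_uuid" > "$tmpdir/vdsm.id"
get_unique_id "$tmpdir/vdsm.id"    # prints the pre-seeded UUID
rm -rf "$tmpdir"
```

On a real redeploy you would write the old UUID to /etc/vdsm/vdsm.id before running hosted-engine --deploy, so the "Remove host used to redeploy" step matches the existing DB row. Again, untested.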
Some time ago we had a similar case, and as a result I filed this doc
bug (fixed since, so the docs are updated - both the admin/upgrade and
backup/restore guides - also for oVirt, despite the bug being on RHV
and despite RHV and oVirt docs not being automatically in sync):

https://bugzilla.redhat.com/show_bug.cgi?id=1921048

> Anyway, if you want to check the generated logs on the local engine
> side and the novirt2 side, here they are:
>
> Contents under /var/log of novmgr (tar.gz format):
> https://drive.google.com/file/d/1e4WwN4D8GDBpsGqwpwM40MGcLISeOzGO/view?usp=sharing
>
> Contents under /var/log of novirt2 (tar.gz format):
> https://drive.google.com/file/d/1uQxlsbPVclW4xcAbCP8dXyIF2HlLqaR-/view?usp=sharing

From the deploy log:

2021-09-10 00:52:00,950+0200 INFO ansible task start {'status': 'OK',
'ansible_type': 'task', 'ansible_playbook':
'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml',
'ansible_task': 'ovirt.ovirt.hosted_engine_setup : Remove host used to redeploy'}
2021-09-10 00:52:00,951+0200 DEBUG ansible on_any args TASK:
ovirt.ovirt.hosted_engine_setup : Remove host used to redeploy kwargs
is_conditional:False
2021-09-10 00:52:00,951+0200 DEBUG ansible on_any args localhost TASK:
ovirt.ovirt.hosted_engine_setup : Remove host used to redeploy kwargs
2021-09-10 00:52:01,688+0200 INFO ansible ok {'status': 'OK',
'ansible_type': 'task', 'ansible_playbook':
'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml',
'ansible_host': 'localhost', 'ansible_task': 'Remove host used to
redeploy', 'task_duration': 0}
2021-09-10 00:52:01,688+0200 DEBUG ansible on_any args
<ansible.executor.task_result.TaskResult object at 0x7f26931196a0> kwargs
2021-09-10 00:52:01,868+0200 DEBUG var changed: host "localhost" var
"db_remove_he_host" type "<class 'dict'>" value: "{
    "changed": true,
    "cmd": [
        "psql",
        "-d",
        "engine",
        "-c",
        "SELECT deletevds(vds_id) FROM (SELECT vds_id FROM vds WHERE upper(vds_unique_id)=upper('10b9031d-a475-4b41-a134-bad2ede3cf11')) t"
    ],
    "delta": "0:00:00.021140",
    "end":
"2021-09-10 00:52:01.467764",
    "failed": false,
    "rc": 0,
    "start": "2021-09-10 00:52:01.446624",
    "stderr": "",
    "stderr_lines": [],
    "stdout": " deletevds \n-----------\n(0 rows)",
    "stdout_lines": [
        " deletevds ",
        "-----------",
        "(0 rows)"
    ]
}"

Meaning, it wasn't removed: the query returned "(0 rows)" because no
host in the vds table has the new host's UUID.

Perhaps, if you do want to open a bug, it should say something like:
"HE deploy should remove the old host based on its name, and not its
UUID". However, it's not completely clear to me that this won't
introduce new regressions.

I admit I didn't completely understand your flow, and especially your
considerations there. If you think the current behavior prevents an
important flow, please clarify.

Best regards,
--
Didi
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/562FRV6U7RCNEUZ5YDICJLN2VJ2OOVQN/