This is the result of "hosted-engine --vm-status" on the first node, which currently runs the hosted-engine:
--== Host ipc1.dc (id: 1) status ==--

Host ID                            : 1
Host timestamp                     : 89980
Score                              : 3400
Engine status                      : {"vm": "up", "health": "good", "detail": "Up"}
Hostname                           : ipc1.dc
Local maintenance                  : False
stopped                            : False
crc32                              : 256cb440
conf_on_shared_storage             : True
local_conf_timestamp               : 89980
Status up-to-date                  : True
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=89980 (Tue Sep 15 16:17:00 2020)
    host-id=1
    score=3400
    vm_conf_refresh_time=89980 (Tue Sep 15 16:17:00 2020)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False

--== Host ipc3.dc (id: 2) status ==--

Host ID                            : 2
Host timestamp                     : 65213
Score                              : 3400
Engine status                      : unknown stale-data
Hostname                           : ipc3.dc
Local maintenance                  : False
stopped                            : False
crc32                              : c4f62c8b
conf_on_shared_storage             : True
local_conf_timestamp               : 65213
Status up-to-date                  : False
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=65213 (Wed Sep 9 11:01:18 2020)
    host-id=2
    score=3400
    vm_conf_refresh_time=65213 (Wed Sep 9 11:01:18 2020)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

--== Host ipc2.dc (id: 3) status ==--

Host ID                            : 3
Host timestamp                     : 93167
Score                              : 3400
Engine status                      : {"vm": "down", "health": "bad", "detail": "unknown", "reason": "vm not running on this host"}
Hostname                           : ipc2.dc
Local maintenance                  : False
stopped                            : False
crc32                              : f02f19b0
conf_on_shared_storage             : True
local_conf_timestamp               : 93167
Status up-to-date                  : True
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=93167 (Tue Sep 15 16:16:58 2020)
    host-id=3
    score=3400
    vm_conf_refresh_time=93167 (Tue Sep 15 16:16:58 2020)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

For the newly added node it is:

"The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable."

But the status of the mentioned services seems to be OK, too. Actually, I've noticed the agent restarting from time to time.

● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-09-15 10:13:11 EDT; 2min 11s ago
 Main PID: 23971 (ovirt-ha-broker)
    Tasks: 11 (limit: 100744)
   Memory: 29.3M
   CGroup: /system.slice/ovirt-ha-broker.service
           └─23971 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker

Sep 15 10:13:11 ipc3.dc systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.

● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-09-15 10:13:22 EDT; 2min 1s ago
 Main PID: 24165 (ovirt-ha-agent)
    Tasks: 2 (limit: 100744)
   Memory: 27.2M
   CGroup: /system.slice/ovirt-ha-agent.service
           └─24165 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent

Sometimes it says:

● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Tue 2020-09-15 10:23:15 EDT; 4s ago
  Process: 28372 ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent (code=exited, status=157)
 Main PID: 28372 (code=exited, status=157)

And sometimes it's:

● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-09-15 10:23:14 EDT; 5min ago
 Main PID: 28370 (ovirt-ha-broker)
    Tasks: 11 (limit: 100744)
   Memory: 29.7M
   CGroup: /system.slice/ovirt-ha-broker.service
           └─28370 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker

Sep 15 10:23:14 ipc3.dc systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Sep 15 10:27:31 ipc3.dc ovirt-ha-broker[28370]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR Failed to start monitoring domain (sd_uuid=e83f0c32-bb91-4909-8e80-6fa974b61968, >
Sep 15 10:27:31 ipc3.dc ovirt-ha-broker[28370]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.Action.start_domain_monitor ERROR Error in RPC call: Failed to start monitoring domain (sd_uuid=e83f0c32-bb>
Sep 15 10:28:02 ipc3.dc ovirt-ha-broker[28370]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.notifications.Notifications ERROR [Errno 111] Connection refused
    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 29, in send_email
        timeout=float(cfg["smtp-timeout"]))
      File "/usr/lib64/python3.6/smtplib.py", line 251, in __init__
        (code, msg) = self.connect(host, port)
      File "/usr/lib64/python3.6/smtplib.py", line 336, in connect
        self.sock = self._get_socket(host, port, self.timeout)
      File "/usr/lib64/python3.6/smtplib.py", line 307, in _get_socket
        self.source_address)
      File "/usr/lib64/python3.6/socket.py", line 724, in create_connection
        raise err
      File "/usr/lib64/python3.6/socket.py", line 713, in create_connection
        sock.connect(sa)
    ConnectionRefusedError: [Errno 111] Connection refused

● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-09-15 10:23:25 EDT; 5min ago
 Main PID: 28520 (ovirt-ha-agent)
    Tasks: 2 (limit: 100744)
   Memory: 27.8M
   CGroup: /system.slice/ovirt-ha-agent.service
           └─28520 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent

Sep 15 10:23:25 ipc3.dc systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Sep 15 10:28:02 ipc3.dc ovirt-ha-agent[28520]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Failed scanning for OVF_STORE due to Command Volume.getInfo with args {'volu>
    (code=100, message=Cannot inquire Lease(name='66b004b7-504c-4376-acc1-27890b17213b', path='/rhev/data-center/mnt/glusterSD/ipc1.dc:_engine/e83f0c32-bb91-4909-8e80->

I think at this point we've even managed to make it worse. Now we have several different problems on all 3 nodes, like:
- HSMGetTaskStatusVDS failed
- SpmStopVDS failed
- HSMGetAllTasksStatusesVDS failed
- Sync errors

We're going to reinstall the whole cluster from scratch. But I think the initial issue/scenario, replacing (adding) a host and making it run the hosted engine, is still not solved at this point.

Thanks and greetings
Marcus

-----Original Message-----
From: Yedidyah Bar David <d...@redhat.com>
Sent: Tuesday, 15 September 2020 14:04
To: Rapsilber, Marcus <marcus.rapsil...@isotravel.com>
Cc: users <users@ovirt.org>
Subject: Re: [ovirt-users] Enable a cluster node to run the hosted engine

On Tue, Sep 15, 2020 at 2:40 PM Rapsilber, Marcus <marcus.rapsil...@isotravel.com> wrote:
>
> I'm not sure if this log file tells anything about why the node
> "ipc3.dc" isn't able to run the hosted engine.
> Today we tried the whole procedure again. But this time we didn't
> install the new node via single node cluster setup. It was a manual
> setup of the cluster storage. When we added the host ("New Host") we
> made sure that "Hosted engine deployment action" was set to "deploy".
> Nevertheless we're still not able to allow the new node to run the
> hosted engine. The grey crown is missing.

What's the output of 'hosted-engine --vm-status' on this host, and on other hosts (that are ok)?

>
> What are the criteria for a host to be able to run the hosted engine?
> Is some special service required?
> Do we have to install another package?
> Or is there an Ansible script that does the required setup?

Generally speaking, it should be fully automatic, if you mark the checkbox in "Add host", and AFAICT, the log you attached looks ok.

Also:

- The host needs to be in the same DC/cluster, needs to have access to the shared storage, etc. You can try to start the services manually, if they are not up:

    systemctl status ovirt-ha-broker ovirt-ha-agent
    systemctl start ovirt-ha-broker ovirt-ha-agent

- and/or check their logs, in /var/log/ovirt-hosted-engine-ha .

Best regards,

>
> Thanks and greetings
> Marcus
>
> -----Original Message-----
> From: Yedidyah Bar David <d...@redhat.com>
> Sent: Tuesday, 15 September 2020 09:33
> To: Rapsilber, Marcus <marcus.rapsil...@isotravel.com>
> Cc: users <users@ovirt.org>
> Subject: Re: [ovirt-users] Enable a cluster node to run the hosted engine
>
> On Tue, Sep 15, 2020 at 10:10 AM Rapsilber, Marcus
> <marcus.rapsil...@isotravel.com> wrote:
> >
> > Hello again,
> >
> > to answer your question, how did I make a clean install and
> > reintegrate the node in the cluster? Maybe my approach was a bit
> > awkward/inconvenient, but this is what I did:
> > - Install CentOS 8
> > - Install oVirt Repository and packages: cockpit-ovirt-dashboard,
> >   vdsm-gluster, ovirt-host
> > - Remove the Gluster bricks of the old node from the
> >   data/engine/vmstore volumes
> > - Process a single cluster node installation on the new node via the
> >   oVirt Dashboard, in order to set up Gluster and the bricks
> >   (hosted-engine setup was skipped)
> > - On the new node: Delete the vmstore/engine/data volumes and the
> >   file metadata in the bricks folder
> > - Added the bricks to the volumes of the existing cluster again
> > - Added the host to the cluster
> >
> > Would you suggest a better approach to set up a new node for an
> > existing cluster?
>
> Sorry, I have no experience with gluster, so can't comment on your
> particular steps, although they sound reasonable.
> The main missing thing is enabling hosted-engine when adding the host
> to the engine.
>
> >
> > At this point I'm not sure if I just overlooked the "hosted engine
> > deployment action" when I added the new host. Unfortunately I
> > cannot try to edit the host anymore since my colleague did another
> > reinstall of the node.
>
> Very well.
>
> If this happens again, please tell us.
>
> Best regards,
>
> >
> > Thanks so far and greetings,
> > Marcus
> >
> > -----Original Message-----
> > From: Yedidyah Bar David <d...@redhat.com>
> > Sent: Monday, 14 September 2020 10:56
> > To: Rapsilber, Marcus <marcus.rapsil...@isotravel.com>
> > Cc: users <users@ovirt.org>
> > Subject: Re: [ovirt-users] Enable a cluster node to run the hosted engine
> >
> > On Mon, Sep 14, 2020 at 11:18 AM <r...@isogmbh.de> wrote:
> > >
> > > Hi there,
> > >
> > > currently my team is evaluating oVirt and we're also testing
> > > several failure scenarios, backup and so on.
> > > One scenario was:
> > > - hyperconverged oVirt cluster with 3 nodes
> > > - self-hosted engine
> > > - simulate the breakdown of one of the nodes by powering it off
> > > - to replace it, make a clean install of a new node and
> > >   reintegrate it in the cluster
> >
> > How exactly did you do that?
> >
> > >
> > > Actually everything worked out fine. The newly installed node and
> > > related bricks (vmstore, data, engine) were added to the existing
> > > Gluster storage and it was added to the oVirt cluster (as host).
> > >
> > > But there's one remaining problem: The new host doesn't have the
> > > grey crown, which means it's unable to run the hosted engine. How
> > > can I achieve that?
> > > I also found out that ovirt-ha-agent and ovirt-ha-broker aren't
> > > started/enabled on that node. The reason is that
> > > /etc/ovirt-hosted-engine/hosted-engine.conf doesn't exist. I guess
> > > this is not only a problem concerning the hosted engine, but also
> > > for HA VMs.
> > >
> >
> > When you add a host to the engine, one of the options in the dialog
> > is to deploy it as a hosted-engine.
> > If you don't, you won't get this crown, nor these services, nor its
> > status in 'hosted-engine --vm-status'.
> >
> > If you didn't, perhaps try to move to maintenance and reinstall,
> > adding this option.
> >
> > If you did choose it, that's perhaps a bug - please check/share
> > relevant logs (e.g. in /var/log/ovirt-engine, including
> > host-deploy/).
> >
> > Best regards,
> >
> > >
> > > Thank you for any advice and greetings, Marcus
> >
> > --
> > Didi
>
> --
> Didi

--
Didi
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/HCR7PBUGOPDIZYLXETAJTCZUG3FMCPZB/
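
When comparing 'hosted-engine --vm-status' across nodes as in this thread, the host with stale data (here ipc3.dc, "Status up-to-date : False") can be picked out of the text automatically. The following is only a minimal sketch, assuming the output has the same "--== Host NAME (id: N) status ==--" headers and "Key : value" lines as the paste at the top of this thread. The names parse_vm_status and stale_hosts are hypothetical helpers for illustration, not part of oVirt, and SAMPLE is a shortened excerpt of the output quoted above.

```python
import re

# Header line format, as it appears in the output pasted in this thread.
HEADER = re.compile(r"^--== Host (?P<name>\S+) \(id: (?P<id>\d+)\) status ==--$")


def parse_vm_status(text):
    """Parse `hosted-engine --vm-status` text into {hostname: {field: value}}."""
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = hosts.setdefault(match.group("name"), {})
            continue
        if current is None or " : " not in line:
            # Skips blank lines and the key=value "Extra metadata" block.
            continue
        key, _, value = line.partition(" : ")
        current[key.strip()] = value.strip()
    return hosts


def stale_hosts(hosts):
    """Return the hostnames whose 'Status up-to-date' field is not 'True'."""
    return sorted(
        name for name, fields in hosts.items()
        if fields.get("Status up-to-date") != "True"
    )


# Shortened excerpt of the status output quoted in this thread.
SAMPLE = """\
--== Host ipc1.dc (id: 1) status ==--

Host ID                            : 1
Engine status                      : {"vm": "up", "health": "good", "detail": "Up"}
Status up-to-date                  : True

--== Host ipc3.dc (id: 2) status ==--

Host ID                            : 2
Engine status                      : unknown stale-data
Status up-to-date                  : False
"""

if __name__ == "__main__":
    print(stale_hosts(parse_vm_status(SAMPLE)))
```

In practice the text would come from running the command on a node that already has the hosted-engine configuration. A host flagged this way still needs the broker and agent logs in /var/log/ovirt-hosted-engine-ha checked to find the actual cause.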