This is the result of "hosted-engine --vm-status" on the first node, which 
currently runs the hosted engine:

--== Host ipc1.dc (id: 1) status ==--

Host ID                            : 1
Host timestamp                     : 89980
Score                              : 3400
Engine status                      : {"vm": "up", "health": "good", "detail": "Up"}
Hostname                           : ipc1.dc
Local maintenance                  : False
stopped                            : False
crc32                              : 256cb440
conf_on_shared_storage             : True
local_conf_timestamp               : 89980
Status up-to-date                  : True
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=89980 (Tue Sep 15 16:17:00 2020)
        host-id=1
        score=3400
        vm_conf_refresh_time=89980 (Tue Sep 15 16:17:00 2020)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False


--== Host ipc3.dc (id: 2) status ==--

Host ID                            : 2
Host timestamp                     : 65213
Score                              : 3400
Engine status                      : unknown stale-data
Hostname                           : ipc3.dc
Local maintenance                  : False
stopped                            : False
crc32                              : c4f62c8b
conf_on_shared_storage             : True
local_conf_timestamp               : 65213
Status up-to-date                  : False
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=65213 (Wed Sep  9 11:01:18 2020)
        host-id=2
        score=3400
        vm_conf_refresh_time=65213 (Wed Sep  9 11:01:18 2020)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False


--== Host ipc2.dc (id: 3) status ==--

Host ID                            : 3
Host timestamp                     : 93167
Score                              : 3400
Engine status                      : {"vm": "down", "health": "bad", "detail": "unknown", "reason": "vm not running on this host"}
Hostname                           : ipc2.dc
Local maintenance                  : False
stopped                            : False
crc32                              : f02f19b0
conf_on_shared_storage             : True
local_conf_timestamp               : 93167
Status up-to-date                  : True
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=93167 (Tue Sep 15 16:16:58 2020)
        host-id=3
        score=3400
        vm_conf_refresh_time=93167 (Tue Sep 15 16:16:58 2020)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False

For the newly added node it says:
"The hosted engine configuration has not been retrieved from shared storage. 
Please ensure that ovirt-ha-agent is running and the storage server is 
reachable. "

But the status of the mentioned services seems to be OK, too. Actually, I've 
noticed them restarting from time to time.
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-09-15 10:13:11 EDT; 2min 11s ago
 Main PID: 23971 (ovirt-ha-broker)
    Tasks: 11 (limit: 100744)
   Memory: 29.3M
   CGroup: /system.slice/ovirt-ha-broker.service
           └─23971 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker

Sep 15 10:13:11 ipc3.dc systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.

● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-09-15 10:13:22 EDT; 2min 1s ago
 Main PID: 24165 (ovirt-ha-agent)
    Tasks: 2 (limit: 100744)
   Memory: 27.2M
   CGroup: /system.slice/ovirt-ha-agent.service
           └─24165 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent

Sometimes it says:
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Tue 2020-09-15 10:23:15 EDT; 4s ago
  Process: 28372 ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent (code=exited, status=157)
 Main PID: 28372 (code=exited, status=157)

And sometimes it's:
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-09-15 10:23:14 EDT; 5min ago
 Main PID: 28370 (ovirt-ha-broker)
    Tasks: 11 (limit: 100744)
   Memory: 29.7M
   CGroup: /system.slice/ovirt-ha-broker.service
           └─28370 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker

Sep 15 10:23:14 ipc3.dc systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Sep 15 10:27:31 ipc3.dc ovirt-ha-broker[28370]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR Failed to start monitoring domain (sd_uuid=e83f0c32-bb91-4909-8e80-6fa974b61968, >
Sep 15 10:27:31 ipc3.dc ovirt-ha-broker[28370]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.Action.start_domain_monitor ERROR Error in RPC call: Failed to start monitoring domain (sd_uuid=e83f0c32-bb>
Sep 15 10:28:02 ipc3.dc ovirt-ha-broker[28370]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.notifications.Notifications ERROR [Errno 111] Connection refused
    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 29, in send_email
        timeout=float(cfg["smtp-timeout"]))
      File "/usr/lib64/python3.6/smtplib.py", line 251, in __init__
        (code, msg) = self.connect(host, port)
      File "/usr/lib64/python3.6/smtplib.py", line 336, in connect
        self.sock = self._get_socket(host, port, self.timeout)
      File "/usr/lib64/python3.6/smtplib.py", line 307, in _get_socket
        self.source_address)
      File "/usr/lib64/python3.6/socket.py", line 724, in create_connection
        raise err
      File "/usr/lib64/python3.6/socket.py", line 713, in create_connection
        sock.connect(sa)
    ConnectionRefusedError: [Errno 111] Connection refused

● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-09-15 10:23:25 EDT; 5min ago
 Main PID: 28520 (ovirt-ha-agent)
    Tasks: 2 (limit: 100744)
   Memory: 27.8M
   CGroup: /system.slice/ovirt-ha-agent.service
           └─28520 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent

Sep 15 10:23:25 ipc3.dc systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Sep 15 10:28:02 ipc3.dc ovirt-ha-agent[28520]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Failed scanning for OVF_STORE due to Command Volume.getInfo with args {'volu>
    (code=100, message=Cannot inquire Lease(name='66b004b7-504c-4376-acc1-27890b17213b', path='/rhev/data-center/mnt/glusterSD/ipc1.dc:_engine/e83f0c32-bb91-4909-8e80->
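
(Side note for anyone following along: a sketch of how one could chase the 
restarts and read the journal lines that are truncated above. The log file 
names are the ovirt-hosted-engine-ha defaults; the shared-config key in the 
last command is an assumption, check "hosted-engine --help" for your version.)

  # Full journal entries, without the ">" truncation seen above
  journalctl -u ovirt-ha-agent -u ovirt-ha-broker --no-pager --since "1 hour ago"

  # The complete tracebacks end up in the HA log files
  tail -n 100 /var/log/ovirt-hosted-engine-ha/agent.log
  tail -n 100 /var/log/ovirt-hosted-engine-ha/broker.log

  # The "Connection refused" is the broker's mail notification failing to
  # reach its SMTP server; the configured target can be inspected with:
  hosted-engine --get-shared-config smtp-server --type=broker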


I think at this point we've even managed to make it worse. Now we've got 
several different problems on all 3 nodes, like:
- HSMGetTaskStatusVDS failed
- SpmStopVDS failed
- HSMGetAllTasksStatusesVDS failed
- Sync errors
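
If anyone wants to dig into these, I assume the corresponding details are in 
the usual logs:

  tail -f /var/log/vdsm/vdsm.log            # on each host
  tail -f /var/log/ovirt-engine/engine.log  # on the engine VM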

We're going to reinstall the whole cluster from scratch.
But I think the initial issue, replacing (adding) a host and enabling it to 
run the hosted engine, is still not solved at this point.

Thanks and greetings
Marcus


-----Original Message-----
From: Yedidyah Bar David <d...@redhat.com> 
Sent: Tuesday, September 15, 2020 14:04
To: Rapsilber, Marcus <marcus.rapsil...@isotravel.com>
Cc: users <users@ovirt.org>
Subject: Re: [ovirt-users] Enable a cluster node to run the hosted engine

On Tue, Sep 15, 2020 at 2:40 PM Rapsilber, Marcus 
<marcus.rapsil...@isotravel.com> wrote:
>
> I'm not sure if this log file tells anything about why the node "ipc3.dc" 
> isn't capable of running the hosted engine.
> Today we tried the whole procedure again, but this time we didn't install the 
> new node via the single-node cluster setup; it was a manual setup of the 
> cluster storage. When we added the host ("New Host") we made sure that "Hosted 
> engine deployment action" was set to "deploy". Nevertheless, we're still not 
> able to get the new node to run the hosted engine. The grey crown is missing.

What's the output of 'hosted-engine --vm-status' on this host, and on other 
hosts (that are ok)?

>
> What are the criteria for a host to be able to run the hosted engine? Is some 
> special service required?
> Do we have to install another package? Or is there an Ansible script that 
> does the required setup?

Generally speaking, it should be fully automatic if you mark the checkbox in 
"Add host", and AFAICT the log you attached looks ok.

Also:

- The host needs to be in the same DC/cluster, needs to have access to the 
shared storage, etc.

You can try to start the services manually, if they are not up:

systemctl status ovirt-ha-broker ovirt-ha-agent
systemctl start ovirt-ha-broker ovirt-ha-agent

- and/or check their logs, in /var/log/ovirt-hosted-engine-ha .
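
For example (assuming the default file names):

grep ERROR /var/log/ovirt-hosted-engine-ha/agent.log | tail -n 20
grep ERROR /var/log/ovirt-hosted-engine-ha/broker.log | tail -n 20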

Best regards,

>
> Thanks and greetings
> Marcus
>
> -----Original Message-----
> From: Yedidyah Bar David <d...@redhat.com>
> Sent: Tuesday, September 15, 2020 09:33
> To: Rapsilber, Marcus <marcus.rapsil...@isotravel.com>
> Cc: users <users@ovirt.org>
> Subject: Re: [ovirt-users] Enable a cluster node to run the hosted engine
>
> On Tue, Sep 15, 2020 at 10:10 AM Rapsilber, Marcus 
> <marcus.rapsil...@isotravel.com> wrote:
> >
> > Hello again,
> >
> > To answer your question of how I made a clean install and reintegrated the 
> > node into the cluster: maybe my approach was a bit awkward/inconvenient, but 
> > this is what I did:
> > - Install CentOS 8
> > - Install oVirt Repository and packages: cockpit-ovirt-dashboard, 
> > vdsm-gluster, ovirt-host
> > - Remove the Gluster bricks of the old node from the 
> > data/engine/vmstore volumes
> > - Perform a single-node cluster installation on the new node via the 
> > oVirt Dashboard, in order to set up Gluster and the bricks 
> > (hosted-engine setup was skipped)
> > - On the new node: Delete the vmstore/engine/data volumes and the 
> > file metadata in the bricks folder
> > - Added the bricks to the volumes of the existing cluster again
> > - Added the host to the cluster
> >
> > Would you suggest a better approach to set up a new node for an existing 
> > cluster?
>
> Sorry, I have no experience with gluster, so can't comment on your particular 
> steps, although they sound reasonable.
> The main missing thing is enabling hosted-engine when adding the host to the 
> engine.
>
> >
> > At this point I'm not sure if I just overlooked the "hosted engine 
> > deployment action" when I added the new host. Unfortunately, I cannot try 
> > to edit the host anymore, since my colleague did another reinstall of the 
> > node.
>
> Very well.
>
> If this happens again, please tell us.
>
> Best regards,
>
> >
> > Thanks so far and greetings,
> > Marcus
> >
> > -----Original Message-----
> > From: Yedidyah Bar David <d...@redhat.com>
> > Sent: Monday, September 14, 2020 10:56
> > To: Rapsilber, Marcus <marcus.rapsil...@isotravel.com>
> > Cc: users <users@ovirt.org>
> > Subject: Re: [ovirt-users] Enable a cluster node to run the hosted engine
> >
> > On Mon, Sep 14, 2020 at 11:18 AM <r...@isogmbh.de> wrote:
> > >
> > > Hi there,
> > >
> > > currently my team is evaluating oVirt and we're also testing several 
> > > failure scenarios, backups, and so on.
> > > One scenario was:
> > > - hyperconverged oVirt cluster with 3 nodes
> > > - self-hosted engine
> > > - simulate the breakdown of one of the nodes by powering it off
> > > - to replace it, make a clean install of a new node and reintegrate 
> > > it into the cluster
> >
> > How exactly did you do that?
> >
> > >
> > > Actually everything worked out fine. The newly installed node and related 
> > > bricks (vmstore, data, engine) were added to the existing Gluster storage 
> > > and the node was added to the oVirt cluster (as a host).
> > >
> > > But there's one remaining problem: The new host doesn't have the grey 
> > > crown, which means it's unable to run the hosted engine. How can I 
> > > enable that?
> > > I also found out that ovirt-ha-agent and ovirt-ha-broker aren't 
> > > started/enabled on that node. The reason is that 
> > > /etc/ovirt-hosted-engine/hosted-engine.conf doesn't exist. I guess this 
> > > is not only a problem for the hosted engine, but also for HA VMs.
> >
> > When you add a host to the engine, one of the options in the dialog is to 
> > deploy it as a hosted-engine.
> > If you don't, you won't get this crown, nor these services, nor its status 
> > in 'hosted-engine --vm-status'.
> >
> > If you didn't, perhaps try to move to maintenance and reinstall, adding 
> > this option.
> >
> > If you did choose it, that's perhaps a bug - please check/share relevant 
> > logs (e.g. in /var/log/ovirt-engine, including host-deploy/).
> >
> > Best regards,
> >
> > >
> > > Thank you for any advice and greetings, Marcus 
> >
> >
> >
> > --
> > Didi
> >
>
>
> --
> Didi
>


--
Didi

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HCR7PBUGOPDIZYLXETAJTCZUG3FMCPZB/
