Hi list!

On a hyperconverged cluster with three hosts I am unable to start the 
ovirt-ha-agent service.

The history:

All three hosts were running CentOS 8, so as a test I upgraded host3 to CentOS 8 
Stream first and left host1, host2 and all VMs untouched. All migrations of VMs 
to host3 then failed with:

```
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:01.0/pcie-root-port'
2021-12-24T00:56:49.428234Z qemu-kvm: load of migration failed: Invalid argument
```

Since I haven't had time to dig into that yet, I decided to roll back the 
upgrade, rebooted host3 into CentOS 8 again and re-installed it through the 
engine. During that process (and the restart of host3) the engine appliance 
became unresponsive and crashed.

The problem:

Currently the ovirt-ha-agent service fails on all hosts with the following 
messages in /var/log/ovirt-hosted-engine-ha/agent.log:

```
MainThread::INFO::2021-12-24 03:56:03,500::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.4.9 started
MainThread::INFO::2021-12-24 03:56:03,516::hosted_engine::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Certificate common name not found, using hostname to identify host
MainThread::INFO::2021-12-24 03:56:03,575::hosted_engine::548::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2021-12-24 03:56:03,576::brokerlink::82::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'addr': 'GATEWAY_IP', 'network_test': 'dns', 'tcp_t_address': '', 'tcp_t_port': ''}
MainThread::ERROR::2021-12-24 03:56:03,577::hosted_engine::564::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
```
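
In case it helps, these are the checks I can run and post output from on any of 
the hosts. The log location and service names below should be the defaults; the 
timestamp in the journalctl call is just an example:

```
# status of the HA broker and agent (the agent needs a working broker connection)
systemctl status ovirt-ha-broker ovirt-ha-agent

# recent broker messages around the time of the agent failure
journalctl -u ovirt-ha-broker --since "2021-12-24 03:50"
tail -n 50 /var/log/ovirt-hosted-engine-ha/broker.log

# overall hosted-engine state as seen from this host
hosted-engine --vm-status
```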

I've stumbled upon [1984262](https://bugzilla.redhat.com/show_bug.cgi?id=1984262), 
but it doesn't seem to apply: all hosts resolve properly, all hosts have proper 
hostnames set, unique /etc/hosts entries and correct A records (in the form 
hostname.subdomain.domain.tld).
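
For completeness, this is roughly how I verified resolution on each host; the 
FQDN and the IP below are placeholders for the real ones:

```
hostname -f                                 # should print the host's FQDN
getent hosts host1.subdomain.domain.tld     # /etc/hosts and resolver lookup
dig +short host1.subdomain.domain.tld       # forward A record
dig +short -x 192.0.2.11                    # reverse lookup for the host's IP (placeholder)
```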

The versions involved are:

```
[root@host2 ~]# rpm -qa ovirt*
ovirt-hosted-engine-setup-2.5.4-2.el8.noarch
ovirt-imageio-daemon-2.3.0-1.el8.x86_64
ovirt-host-dependencies-4.4.9-2.el8.x86_64
ovirt-vmconsole-1.0.9-1.el8.noarch
ovirt-imageio-client-2.3.0-1.el8.x86_64
ovirt-host-4.4.9-2.el8.x86_64
ovirt-python-openvswitch-2.11-1.el8.noarch
ovirt-openvswitch-ovn-host-2.11-1.el8.noarch
ovirt-provider-ovn-driver-1.2.34-1.el8.noarch
ovirt-openvswitch-ovn-2.11-1.el8.noarch
ovirt-release44-4.4.9.2-1.el8.noarch
ovirt-openvswitch-2.11-1.el8.noarch
ovirt-ansible-collection-1.6.5-1.el8.noarch
ovirt-openvswitch-ovn-common-2.11-1.el8.noarch
ovirt-hosted-engine-ha-2.4.9-1.el8.noarch
ovirt-vmconsole-host-1.0.9-1.el8.noarch
ovirt-imageio-common-2.3.0-1.el8.x86_64
```

Any hint on how to fix this would be really appreciated. I'd like to get the 
engine appliance back, then remove host3 and re-initialize it, since this is a 
production cluster (with host1 and host2 replicating the gluster storage and 
host3 acting as an arbiter).
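
In case it matters, my rough plan once the agents come back would be along these 
lines (just a sketch, I'm not sure about the exact order):

```
# take the deployment out of global maintenance so the agents can
# start the engine VM again (assuming maintenance mode is part of the problem)
hosted-engine --set-maintenance --mode=none

# or try to start the engine VM manually on one host and watch its state
hosted-engine --vm-start
hosted-engine --vm-status
```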

Thanks in advance, Martin