Hi, 
I have a hyperconverged ovirt 4.3.10 installation. One host is unresponsive. 
This has happened in the past but most often resolved itself within a few 
minutes. This time it is a different beast. 
The host does not come back. Putting the host in maintenance is not possible 
because "Host is unresponsive but has running virtual machines". Putting the 
host into local maintenance on the local CLI does not help. Global maintenance 
does not have any effect either. The host has been rebooted several times both 
locally and via the web interface on the engine, which worked but did not 
resolve the issue. 
Logs:
/var/log/ovirt-engine/engine.log
2022-06-07 06:40:17,606+02 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default 
task-34) [26f286ba-58d0-44be-96e0-f874057fe603] EVENT_ID: 
GENERIC_ERROR_MESSAGE(14,001), Cannot switch Host to Maintenance mode.
Host still has running VMs on it and is in Non Responsive state.
2022-06-07 06:40:17,606+02 WARN  
[org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-34) 
[26f286ba-58d0-44be-96e0-f874057fe603] Validation of action 
'MaintenanceNumberOfVdss' failed for user admin@internal-authz. Reasons: 
VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_VDS_IS_NOT_RESPONDING_WITH_VMS,$HostNotResponding
 ovirt3.os-s.de
2022-06-07 06:41:23,974+02 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-33) [] EVENT_ID: 
VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ovirt3.os-s.de command Get Host 
Capabilities failed: Message timeout which can be caused by communication issues
2022-06-07 06:41:23,974+02 ERROR 
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] 
(EE-ManagedThreadFactory-engineScheduled-Thread-33) [] Unable to 
RefreshCapabilities: VDSNetworkException: VDSGenericException: 
VDSNetworkException: Message timeout which can be caused by communication issues
2022-06-07 06:44:47,033+02 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-4) [] EVENT_ID: 
VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ovirt3.os-s.de command Get Host 
Capabilities failed: Message timeout which can be caused by communication issues
2022-06-07 06:44:47,034+02 ERROR 
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] 
(EE-ManagedThreadFactory-engineScheduled-Thread-4) [] Unable to 
RefreshCapabilities: VDSNetworkException: VDSGenericException: 
VDSNetworkException: Message timeout which can be caused by communication issues

vdsmd status:

Jun 06 08:51:09 ovirt3.os-s.de vdsm[3567]: WARN Worker blocked: <Worker 
name=jsonrpc/1 running <Task <JsonRpcTask {'params': {}, 'jsonrpc': '2.0', 
'method': u'Host.getStats', 'id': u'ce7ab5d1-3034-4b5a-91c5-9446e17341a2'} at 
0x7fe1646563d0> timeout=60, duration=120.00 at 0x7fe1646562d0> task#=15 at 
0x7fe165f47e50>, traceback:
                                           File: 
"/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
                                             self.__bootstrap_inner()
                                           File: 
"/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
                                             self.run()
                                           File: 
"/usr/lib64/python2.7/threading.py", line 765, in run
                                             self.__target(*self.__args, 
**self.__kwargs)
                                           File: 
"/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 260, in run
                                             ret = func(*args, **kwargs)
                                           File: 
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
                                             self._execute_task()
                                           File: 
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
                                             task()
                                           File: 
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
                                             self._callable()
                                           File: 
"/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 262, in __call__
                                             self._handler(self._ctx, self._req)
                                           File: 
"/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 305, in 
_serveRequest
                                             response = 
self._handle_request(req, ctx)
                                           File: 
"/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 345, in 
_handle_request
                                             res = method(**params)
                                           File: 
"/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 194, in 
_dynamicMethod
                                             result = fn(*methodArgs)
                                           File: "<string>", line 2, in getStats
                                           File: 
"/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
                                             ret = func(*args, **kwargs)
                                           File: 
"/usr/lib/python2.7/site-packages/vdsm/API.py", line 1435, in getStats
                                             multipath=True)}
                                           File: 
"/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 79, in get_stats
                                             ret['haStats'] = _getHaInfo()
                                           File: 
"/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo
                                             stats = 
instance.get_all_stats(timeout=5)
                                           File: 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", 
line 94, in get_all_stats
                                             stats = 
broker.get_stats_from_storage()
                                           File: 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", 
line 146, in get_stats_from_storage
                                             result = self._proxy.get_stats()
                                           File: 
"/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
                                             return self.__send(self.__name, 
args)
                                           File: 
"/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
                                             verbose=self.__verbose
                                           File: 
"/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
                                             return self.single_request(host, 
handler, request_body, verbose)
                                           File: 
"/usr/lib64/python2.7/xmlrpclib.py", line 1301, in single_request
                                             self.send_content(h, request_body)
                                           File: 
"/usr/lib64/python2.7/xmlrpclib.py", line 1448, in send_content
                                             connection.endheaders(request_body)
                                           File: 
"/usr/lib64/python2.7/httplib.py", line 1052, in endheaders
                                             self._send_output(message_body)
                                           File: 
"/usr/lib64/python2.7/httplib.py", line 890, in _send_output
                                             self.send(msg)
                                           File: 
"/usr/lib64/python2.7/httplib.py", line 852, in send
                                             self.connect()
                                           File: 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 
60, in connect
                                             
self.sock.connect(base64.b16decode(self.host))
                                           File: 
"/usr/lib64/python2.7/socket.py", line 224, in meth
                                             return 
getattr(self._sock,name)(*args)
Jun 06 08:51:14 ovirt3.os-s.de vdsm[3567]: WARN Worker blocked: <Worker 
name=periodic/0 running <Task <Operation action=<vdsm.virt.sampling.HostMonitor 
object at 0x7fe16457ba90> at 0x7fe16457bad0> timeout=15, duration=120.00 at 
0x7fe1644c3910> task#=22 at 0x7fe1646e4550>, traceback:
                                           File: 
"/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
                                             self.__bootstrap_inner()
                                           File: 
"/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
                                             self.run()
                                           File: 
"/usr/lib64/python2.7/threading.py", line 765, in run
                                             self.__target(*self.__args, 
**self.__kwargs)
                                           File: 
"/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 260, in run
                                             ret = func(*args, **kwargs)
                                           File: 
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
                                             self._execute_task()
                                           File: 
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
                                             task()
                                           File: 
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
                                             self._callable()
                                           File: 
"/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 186, in __call__
                                             self._func()
                                           File: 
"/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 481, in __call__
                                             stats = 
hostapi.get_stats(self._cif, self._samples.stats())
                                           File: 
"/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 79, in get_stats
                                             ret['haStats'] = _getHaInfo()
                                           File: 
"/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo
                                             stats = 
instance.get_all_stats(timeout=5)
                                           File: 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", 
line 94, in get_all_stats
                                             stats = 
broker.get_stats_from_storage()
                                           File: 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", 
line 146, in get_stats_from_storage
                                             result = self._proxy.get_stats()
                                           File: 
"/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
                                             return self.__send(self.__name, 
args)
                                           File: 
"/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
                                             verbose=self.__verbose
                                           File: 
"/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
                                             return self.single_request(host, 
handler, request_body, verbose)
                                           File: 
"/usr/lib64/python2.7/xmlrpclib.py", line 1301, in single_request
                                             self.send_content(h, request_body)
                                           File: 
"/usr/lib64/python2.7/xmlrpclib.py", line 1448, in send_content
                                             connection.endheaders(request_body)
                                           File: 
"/usr/lib64/python2.7/httplib.py", line 1052, in endheaders
                                             self._send_output(message_body)
                                           File: 
"/usr/lib64/python2.7/httplib.py", line 890, in _send_output
                                             self.send(msg)
                                           File: 
"/usr/lib64/python2.7/httplib.py", line 852, in send
                                             self.connect()
                                           File: 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 
60, in connect
                                             
self.sock.connect(base64.b16decode(self.host))
                                           File: 
"/usr/lib64/python2.7/socket.py", line 224, in meth
                                             return 
getattr(self._sock,name)(*args)

vdsmd log:
2022-06-07 05:02:13,717+0000 INFO  (jsonrpc/3) [vdsm.api] FINISH 
multipath_health return={} from=::1,54704, 
task_id=169b447f-fd52-4a6f-9277-eff9870ffb81 (api:54)
2022-06-07 05:02:13,725+0000 INFO  (jsonrpc/3) [api.host] FINISH getStats 
return={'status': {'message': 'Done', 'code': 0}, 'info': {'cpuStatistics': 
{'20': {'cpuUser': '0.07', 'nodeIndex': 1, 'cpuSys': '0.00', 'cpuIdle': 
'99.93'}, '21': {'cpuUser': '0.00', 'nodeIndex': 1, 'cpuSys': '0.00', 
'cpuIdle': '100.00'}, '22': {'cpuUser': '0.00', 'nodeIndex': 1, 'cpuSys': 
'0.00', 'cpuIdle': '100.00'}, '23': {'cpuUser': '0.00', 'nodeIndex': 1, 
'cpuSys': '0.07', 'cpuIdle': '99.93'}, '1': {'cpuUser': '0.87', 'nodeIndex': 0, 
'cpuSys': '0.60', 'cpuIdle': '98.53'}, '0': {'cpuUser': '0.40', 'nodeIndex': 0, 
'cpuSys': '0.27', 'cpuIdle': '99.33'}, '3': {'cpuUser': '0.27', 'nodeIndex': 0, 
'cpuSys': '0.33', 'cpuIdle': '99.40'}, '2': {'cpuUser': '0.53', 'nodeIndex': 0, 
'cpuSys': '0.40', 'cpuIdle': '99.07'}, '5': {'cpuUser': '0.33', 'nodeIndex': 0, 
'cpuSys': '0.33', 'cpuIdle': '99.34'}, '4': {'cpuUser': '0.33', 'nodeIndex': 0, 
'cpuSys': '0.40', 'cpuIdle': '99.27'}, '7': {'cpuUser': '0.13', 'nodeIndex': 1, 
 'cpuSys': '0.07', 'cpuIdle': '99.80'}, '6': {'cpuUser': '0.13', 'nodeIndex': 
1, 'cpuSys': '0.07', 'cpuIdle': '99.80'}, '9': {'cpuUser': '0.13', 'nodeIndex': 
1, 'cpuSys': '0.13', 'cpuIdle': '99.74'}, '8': {'cpuUser': '0.07', 'nodeIndex': 
1, 'cpuSys': '0.13', 'cpuIdle': '99.80'}, '11': {'cpuUser': '0.20', 
'nodeIndex': 1, 'cpuSys': '0.13', 'cpuIdle': '99.67'}, '10': {'cpuUser': 
'0.20', 'nodeIndex': 1, 'cpuSys': '0.27', 'cpuIdle': '99.53'}, '13': 
{'cpuUser': '0.80', 'nodeIndex': 0, 'cpuSys': '0.47', 'cpuIdle': '98.73'}, 
'12': {'cpuUser': '0.00', 'nodeIndex': 0, 'cpuSys': '0.00', 'cpuIdle': 
'100.00'}, '15': {'cpuUser': '0.07', 'nodeIndex': 0, 'cpuSys': '0.00', 
'cpuIdle': '99.93'}, '14': {'cpuUser': '0.20', 'nodeIndex': 0, 'cpuSys': 
'0.27', 'cpuIdle': '99.53'}, '17': {'cpuUser': '0.07', 'nodeIndex': 0, 
'cpuSys': '0.00', 'cpuIdle': '99.93'}, '16': {'cpuUser': '0.07', 'nodeIndex': 
0, 'cpuSys': '0.00', 'cpuIdle': '99.93'}, '19': {'cpuUser': '1.46', 
'nodeIndex': 1, 'cpuSys': '0.33', 'cpuIdle'
 : '98.21'}, '18': {'cpuUser': '0.27', 'nodeIndex': 1, 'cpuSys': '0.07', 
'cpuIdle': '99.66'}}, 'numaNodeMemFree': {'1': {'memPercent': 18, 'memFree': 
'52935'}, '0': {'memPercent': 17, 'memFree': '53551'}}, 'memShared': 0, 
'haScore': 0, 'thpState': 'always', 'ksmMergeAcrossNodes': True, 'vmCount': 3, 
'memUsed': '3', 'storageDomains': {}, 'incomingVmMigrations': 0, 'network': 
{'bond0': {'rxErrors': '0', 'txErrors': '0', 'speed': '2000', 'rxDropped': '4', 
'name': 'bond0', 'tx': '84766340728', 'txDropped': '0', 'duplex': 'full', 
'sampleTime': 1654578133.717023, 'rx': '102755561238', 'state': 'up'}, 'eno50': 
{'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 
'eno50', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 
1654578133.717023, 'rx': '0', 'state': 'down'}, ';vdsmdummy;': {'rxErrors': 
'0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': ';vdsmdummy;', 
'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1654578133.71702
 3, 'rx': '0', 'state': 'down'}, 'ovirtmgmt': {'rxErrors': '0', 'txErrors': 
'0', 'speed': '1000', 'rxDropped': '122', 'name': 'ovirtmgmt', 'tx': 
'81789466633', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 
1654578133.717023, 'rx': '99028291927', 'state': 'up'}, 'lo': {'rxErrors': '0', 
'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'lo', 'tx': 
'10214215104', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 
1654578133.717023, 'rx': '10214215104', 'state': 'up'}, 'ovs-system': 
{'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 
'ovs-system', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 
1654578133.717023, 'rx': '0', 'state': 'down'}, 'eno49': {'rxErrors': '0', 
'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'eno49', 'tx': '0', 
'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1654578133.717023, 'rx': 
'0', 'state': 'down'}, 'br-int': {'rxErrors': '0', 'txErrors': '0', 'speed': 
'1000', 'rxDropped': '0', 'na
 me': 'br-int', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 
1654578133.717023, 'rx': '0', 'state': 'down'}, 'eno1': {'rxErrors': '0', 
'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'eno1', 'tx': '0', 
'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1654578133.717023, 'rx': 
'0', 'state': 'down'}, 'eno2': {'rxErrors': '0', 'txErrors': '0', 'speed': 
'1000', 'rxDropped': '0', 'name': 'eno2', 'tx': '0', 'txDropped': '0', 
'duplex': 'unknown', 'sampleTime': 1654578133.717023, 'rx': '0', 'state': 
'down'}, 'eno3': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 
'rxDropped': '1', 'name': 'eno3', 'tx': '42493183527', 'txDropped': '0', 
'duplex': 'full', 'sampleTime': 1654578133.717023, 'rx': '20793574', 'state': 
'up'}, 'eno4': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': 
'1', 'name': 'eno4', 'tx': '42273157201', 'txDropped': '0', 'duplex': 'full', 
'sampleTime': 1654578133.717023, 'rx': '102734767664', 'state': 'up'}, 
'genev_sys_6081': 
 {'rxErrors': '0', 'txErrors': '8', 'speed': '1000', 'rxDropped': '0', 'name': 
'genev_sys_6081', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 
'sampleTime': 1654578133.717023, 'rx': '0', 'state': 'up'}}, 'txDropped': 
'128', 'anonHugePages': '326', 'ksmPages': 100, 'elapsedTime': '391.12', 
'cpuLoad': '0.16', 'cpuSys': '0.18', 'diskStats': {'/var/log': {'free': 
'6478'}, '/var/run/vdsm/': {'free': '64294'}, '/tmp': {'free': '906'}}, 
'cpuUserVdsmd': '0.47', 'netConfigDirty': 'False', 'memCommitted': 28672, 
'ksmState': False, 'vmMigrating': 0, 'ksmCpu': 0, 'memAvailable': 126975, 
'bootTime': '1654505219', 'haStats': {'active': True, 'configured': True, 
'score': 0, 'localMaintenance': True, 'globalMaintenance': False}, 'momStatus': 
'active', 'multipathHealth': {}, 'rxDropped': '0', 'outgoingVmMigrations': 0, 
'swapTotal': 32191, 'swapFree': 32191, 'hugepages': defaultdict(<type 'dict'>, 
{1048576: {'resv_hugepages': 0, 'free_hugepages': 0, 'nr_overcommit_hugepages': 
0, 'surplus_hugepages
 ': 0, 'vm.free_hugepages': 0, 'nr_hugepages': 0, 'nr_hugepages_mempolicy': 0}, 
2048: {'resv_hugepages': 0, 'free_hugepages': 0, 'nr_overcommit_hugepages': 0, 
'surplus_hugepages': 0, 'vm.free_hugepages': 0, 'nr_hugepages': 0, 
'nr_hugepages_mempolicy': 0}}), 'dateTime': '2022-06-07T05:02:13 GMT', 
'cpuUser': '0.27', 'memFree': 126719, 'cpuIdle': '99.55', 'vmActive': 0, 
'v2vJobs': {}, 'cpuSysVdsmd': '0.13'}} from=::1,54704 (api:54)
2022-06-07 05:02:13,726+0000 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call 
Host.getStats succeeded in 0.03 seconds (__init__:312)
2022-06-07 05:02:13,973+0000 INFO  (vmrecovery) [vdsm.api] START 
getConnectedStoragePoolsList(options=None) from=internal, 
task_id=65629949-c74d-4f66-846a-1e2c40be9157 (api:48)
2022-06-07 05:02:13,973+0000 INFO  (vmrecovery) [vdsm.api] FINISH 
getConnectedStoragePoolsList return={'poollist': []} from=internal, 
task_id=65629949-c74d-4f66-846a-1e2c40be9157 (api:54)
2022-06-07 05:02:13,973+0000 INFO  (vmrecovery) [vds] recovery: waiting for 
storage pool to go up (clientIF:723)
2022-06-07 05:02:14,468+0000 INFO  (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call 
Host.ping2 succeeded in 0.00 seconds (__init__:312)
2022-06-07 05:02:14,471+0000 INFO  (jsonrpc/5) [vdsm.api] START 
repoStats(domains=[u'5d6276cc-08ab-47f6-81e7-4e64aac3d386']) from=::1,54704, 
task_id=6918f715-bbf4-4c7f-af00-1764294ab665 (api:48)
2022-06-07 05:02:14,472+0000 INFO  (jsonrpc/5) [vdsm.api] FINISH repoStats 
return={} from=::1,54704, task_id=6918f715-bbf4-4c7f-af00-1764294ab665 (api:54)
2022-06-07 05:02:14,472+0000 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call 
Host.getStorageRepoStats succeeded in 0.00 seconds (__init__:312)

Gluster volumes are all online.

systemctl status ovirt-ha-broker:
Jun 06 08:47:08 ovirt3.os-s.de systemd[1]: Started oVirt Hosted Engine High 
Availability Communications Broker.
Jun 06 08:47:59 ovirt3.os-s.de ovirt-ha-broker[1524]: ovirt-ha-broker 
mgmt_bridge.MgmtBridge ERROR Failed to getVdsStats: No 'network' in result

Any ideas? 

Ralf
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XT5WJCZKLIEDNI65IDTBY56VO7X7JOLT/

Reply via email to