Hi,

Last night we had an incident with a failed host. The engine issued a fence but did not restart the VMs that had been running on that node on the other operational hosts. I'd like to know whether this is normal or whether I can tune it somehow.
Here are some relevant logs from engine:

2018-09-05 03:00:51,496+03 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (EE-ManagedThreadFactory-engine-Thread-827644) [] Host 'v3' is not responding. It will stay in Connecting state for a grace period of 63 seconds and after that an attempt to fence the host will be issued.
2018-09-05 03:01:11,945+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher] (EE-ManagedThreadFactory-engineScheduled-Thread-57) [] Failed to fetch vms info for host 'v3' - skipping VMs monitoring.
2018-09-05 03:01:48,028+03 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-827679) [] EVENT_ID: VM_SET_TO_UNKNOWN_STATUS(142), VM vm7 was set to the Unknown status.
2018-09-05 03:02:10,033+03 INFO [org.ovirt.engine.core.bll.pm.StopVdsCommand] (EE-ManagedThreadFactory-engine-Thread-827680) [30369e01] Power-Management: STOP of host 'v3' initiated.
2018-09-05 03:02:55,935+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-827680) [3adcac38] EVENT_ID: VM_WAS_SET_DOWN_DUE_TO_HOST_REBOOT_OR_MANUAL_FENCE(143), Vm vm7 was shut down due to v3 host reboot or manual fence
2018-09-05 03:02:56,018+03 INFO [org.ovirt.engine.core.bll.pm.StopVdsCommand] (EE-ManagedThreadFactory-engine-Thread-827680) [ea0f582] Power-Management: STOP host 'v3' succeeded.
2018-09-05 03:08:20,818+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-91) [326878] EVENT_ID: VDS_DETECTED(13), Status of host v3 was set to Up.
2018-09-05 03:08:23,391+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [] VM '3b1262ef-7fff-40af-b85e-9fd01a4f422b'(vm7) was unexpectedly detected as 'Down' on VDS '4970369d-21c2-467d-9247-c73ca2d71b3e'(v3) (expected on 'null')

As you can see, engine does a fence on node v3.
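To make the timing in these entries easier to follow, here is a minimal Python sketch that splits engine log text into entries and prints the offset of each one from the first. It is only an illustration: the names `ENTRY`, `parse_entries` and `SAMPLE` are my own and not part of oVirt, the regular expression assumes the entry layout seen in the excerpt above, and the `+03` UTC offset is matched but ignored.

```python
import re
from datetime import datetime

# Assumed layout of an entry header, based on the excerpt above:
# "<date> <time>,<millis>+<tz> <LEVEL> [<logger class>]"
ENTRY = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(?P<ms>\d{3})\+\d{2} "
    r"(?P<level>[A-Z]+) +\[(?P<logger>[^\]]+)\]"
)


def parse_entries(text):
    """Return (datetime, level, short logger name, message) for each entry."""
    matches = list(ENTRY.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # An entry's message runs until the next entry header (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        when = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        when = when.replace(microsecond=int(m.group("ms")) * 1000)
        logger = m.group("logger").rsplit(".", 1)[-1]
        entries.append((when, m.group("level"), logger, text[m.end():end].strip()))
    return entries


# Three entries copied from the excerpt above.
SAMPLE = (
    "2018-09-05 03:00:51,496+03 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] "
    "(EE-ManagedThreadFactory-engine-Thread-827644) [] Host 'v3' is not responding. "
    "2018-09-05 03:02:10,033+03 INFO [org.ovirt.engine.core.bll.pm.StopVdsCommand] "
    "(EE-ManagedThreadFactory-engine-Thread-827680) [30369e01] "
    "Power-Management: STOP of host 'v3' initiated. "
    "2018-09-05 03:08:20,818+03 INFO "
    "[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] "
    "(EE-ManagedThreadFactory-engineScheduled-Thread-91) [326878] "
    "EVENT_ID: VDS_DETECTED(13), Status of host v3 was set to Up. "
)

if __name__ == "__main__":
    entries = parse_entries(SAMPLE)
    start = entries[0][0]
    for when, level, logger, message in entries:
        offset = (when - start).total_seconds()
        print(f"+{offset:8.3f}s {level:4} {logger}")
```

On these three entries, the STOP is initiated roughly 78 seconds after the first "not responding" warning, and the host is reported Up about seven and a half minutes after it.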
vm7, as well as the other VMs running on that node, did not restart. Any tips?

The engine is ovirt-engine-4.2.5.3-1.el7.noarch and the host is vdsm-4.20.35-1.el7.x86_64.

Best regards,
Giannis