[ovirt-users] Re: Periodic fall host

Spickiy Nikita Wed, 17 Oct 2018 19:33:02 -0700

I checked the vdsm log. Do I understand correctly that this is the bug?

https://paste.fedoraproject.org/paste/tHwLPSnIKI8Px1XBjGw7UA


[root@ovirt3 vdsm]# find /var/log/vdsm/ -name "vdsm*" -mtime -1 -exec xzgrep --color "2018-10-17.*ERROR" {} \; | sort -k1
2018-10-17 02:04:49,163+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
 (monitor:498)
2018-10-17 02:05:09,159+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
 (monitor:498)
2018-10-17 02:06:19,155+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
 (monitor:498)
2018-10-17 02:06:39,157+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
 (monitor:498)
2018-10-17 02:14:09,158+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
 (monitor:498)
2018-10-17 02:19:19,163+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
 (monitor:498)
2018-10-17 02:20:39,156+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
 (monitor:498)
2018-10-17 02:21:59,161+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
 (monitor:498)
2018-10-17 02:22:19,154+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
 (monitor:498)
2018-10-17 02:29:49,163+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
 (monitor:498)
2018-10-17 02:30:29,158+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
 (monitor:498)
2018-10-17 02:30:49,158+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
 (monitor:498)
2018-10-17 02:32:59,159+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
 (monitor:498)
2018-10-17 08:36:19,087+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/192.168.3.11:_ovirt__newstorage/9ce3b963-660d-4a8c-987e-17550b7b28c7/dom_md/metadata
 (monitor:498)
2018-10-17 08:36:19,138+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/qnap.company.ru:_engine/b10b9091-a66e-4c26-a1bf-a79ce66d4df4/dom_md/metadata
 (monitor:498)
2018-10-17 08:36:19,170+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/10.10.10.254:_ovirt__iso/25647e6d-5b55-4d6a-8c49-04b696aa1109/dom_md/metadata
 (monitor:498)
2018-10-17 08:37:09,681+0300 ERROR (monitor/25647e6) [storage.Monitor] Error 
checking domain 25647e6d-5b55-4d6a-8c49-04b696aa1109 (monitor:424)
2018-10-17 08:37:09,681+0300 ERROR (monitor/b10b909) [storage.Monitor] Error 
checking domain b10b9091-a66e-4c26-a1bf-a79ce66d4df4 (monitor:424)
2018-10-17 08:37:11,928+0300 ERROR (monitor/9ce3b96) [storage.Monitor] Error 
checking domain 9ce3b963-660d-4a8c-987e-17550b7b28c7 (monitor:424)
2018-10-17 10:10:59,159+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
 (monitor:498)
2018-10-17 10:53:28,082+0300 ERROR (periodic/168) [root] failed to retrieve 
Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted 
Engine setup finished? (api:196)
2018-10-17 10:56:37,635+0300 ERROR (periodic/1) [root] failed to retrieve 
Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted 
Engine setup finished? (api:196)
2018-10-17 10:56:37,966+0300 ERROR (jsonrpc/3) [storage.TaskManager.Task] 
(Task='519e3d58-c68e-4ec2-b144-a54442411dc1') Unexpected error (task:875)
2018-10-17 10:56:37,968+0300 ERROR (jsonrpc/3) [storage.Dispatcher] FINISH 
getStorageDomainInfo error=Storage domain does not exist: 
(u'b10b9091-a66e-4c26-a1bf-a79ce66d4df4',) (dispatcher:82)
2018-10-17 11:48:41,457+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
 (monitor:498)
2018-10-17 11:49:31,317+0300 ERROR (monitor/b606bde) [storage.Monitor] Error 
checking domain b606bde9-21da-4c10-b77b-2d8e0c374ea2 (monitor:424)
2018-10-17 11:49:36,837+0300 ERROR (periodic/2) [Executor] Unhandled exception 
in <Task discardable <UpdateVolumes vm=60621bbd-6bfe-454d-9088-be7496b24489 at 
0x7f5b44653190> timeout=30.0, duration=60 at 0x7f5b646d0710> (executor:317)
2018-10-17 11:49:36,837+0300 ERROR (periodic/2) [storage.Dispatcher] FINISH 
getVolumeSize error=Connection timed out (dispatcher:86)
2018-10-17 11:49:36,837+0300 ERROR (periodic/2) [storage.TaskManager.Task] 
(Task='b043e518-308a-45f0-bbfb-6dcb574ffba8') Unexpected error (task:875)
2018-10-17 11:49:36,842+0300 ERROR (periodic/0) [Executor] Unhandled exception 
in <Task discardable <UpdateVolumes vm=ddf00567-3842-4982-8035-5cdd8efa2c36 at 
0x7f5b444c42d0> timeout=30.0, duration=60 at 0x7f5b444c4f50> (executor:317)
2018-10-17 11:49:36,842+0300 ERROR (periodic/0) [storage.Dispatcher] FINISH 
getVolumeSize error=Connection timed out (dispatcher:86)
2018-10-17 11:49:36,842+0300 ERROR (periodic/0) [storage.TaskManager.Task] 
(Task='90f4ae89-b402-4b38-9f1b-ed68e7e92589') Unexpected error (task:875)
2018-10-17 11:49:36,843+0300 ERROR (periodic/3) [Executor] Unhandled exception 
in <Task discardable <UpdateVolumes vm=b40f7d38-daf7-499b-988c-1c2d19d115a5 at 
0x7f5b444c4310> timeout=30.0, duration=60 at 0x7f5b842ae7d0> (executor:317)
2018-10-17 11:49:36,843+0300 ERROR (periodic/3) [storage.Dispatcher] FINISH 
getVolumeSize error=Connection timed out (dispatcher:86)
2018-10-17 11:49:36,843+0300 ERROR (periodic/3) [storage.TaskManager.Task] 
(Task='87116c65-d594-4f4d-a56b-1eea462f68fc') Unexpected error (task:875)
2018-10-17 11:49:36,845+0300 ERROR (periodic/1) [storage.Dispatcher] FINISH 
getVolumeSize error=Connection timed out (dispatcher:86)
2018-10-17 11:49:36,845+0300 ERROR (periodic/1) [storage.TaskManager.Task] 
(Task='9cc7d57c-92a5-484d-9f4c-7481091b1ebe') Unexpected error (task:875)
2018-10-17 11:49:36,846+0300 ERROR (periodic/1) [Executor] Unhandled exception 
in <Task discardable <UpdateVolumes vm=acadd04a-b762-46eb-81b8-bf276758de64 at 
0x7f5b842aef50> timeout=30.0, duration=60 at 0x7f5b444e2590> (executor:317)
2018-10-17 11:50:06,837+0300 ERROR (periodic/4) [Executor] Unhandled exception 
in <Task discardable <UpdateVolumes vm=5147652d-8288-4614-a1fd-51372af6a93f at 
0x7f5b6420e5d0> timeout=30.0, duration=60 at 0x7f5b6420e590> (executor:317)
2018-10-17 11:50:06,837+0300 ERROR (periodic/4) [storage.Dispatcher] FINISH 
getVolumeSize error=Connection timed out (dispatcher:86)
2018-10-17 11:50:06,837+0300 ERROR (periodic/4) [storage.TaskManager.Task] 
(Task='c2782f6b-59c8-4e16-9bc0-99ed697861a8') Unexpected error (task:875)
2018-10-17 11:50:06,840+0300 ERROR (periodic/5) [Executor] Unhandled exception 
in <Task discardable <UpdateVolumes vm=975f16fb-cf7f-4322-bb50-e3fc8b616378 at 
0x7f5b6420ec90> timeout=30.0, duration=60 at 0x7f5b6420e890> (executor:317)
2018-10-17 11:50:06,840+0300 ERROR (periodic/5) [storage.Dispatcher] FINISH 
getVolumeSize error=Connection timed out (dispatcher:86)
2018-10-17 11:50:06,840+0300 ERROR (periodic/5) [storage.TaskManager.Task] 
(Task='b8f9b887-7558-4d1a-a264-af9f8c47b552') Unexpected error (task:875)
2018-10-17 11:50:06,841+0300 ERROR (periodic/6) [Executor] Unhandled exception 
in <Task discardable <UpdateVolumes vm=2d0cf85c-f53c-42fd-bbdb-7f1a6d20e193 at 
0x7f5b6420e610> timeout=30.0, duration=60 at 0x7f5b6420edd0> (executor:317)
2018-10-17 11:50:06,841+0300 ERROR (periodic/6) [storage.Dispatcher] FINISH 
getVolumeSize error=Connection timed out (dispatcher:86)
2018-10-17 11:50:06,841+0300 ERROR (periodic/6) [storage.TaskManager.Task] 
(Task='0359b365-cea7-436b-96c2-98d7efe9470b') Unexpected error (task:875)
2018-10-17 11:50:06,853+0300 ERROR (periodic/7) [storage.TaskManager.Task] 
(Task='91d471ef-853e-4351-b0a8-3712961573d9') Unexpected error (task:875)
2018-10-17 11:50:06,854+0300 ERROR (periodic/7) [Executor] Unhandled exception 
in <Task discardable <UpdateVolumes vm=efd80eea-5bb4-493c-9e88-7ad9f4accd78 at 
0x7f5b6420ebd0> timeout=30.0, duration=60 at 0x7f5b6420e950> (executor:317)
2018-10-17 11:50:06,854+0300 ERROR (periodic/7) [storage.Dispatcher] FINISH 
getVolumeSize error=Connection timed out (dispatcher:86)
2018-10-17 12:02:01,456+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
 (monitor:498)
2018-10-17 12:04:01,456+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
 (monitor:498)
2018-10-17 13:43:31,453+0300 ERROR (check/loop) [storage.Monitor] Error 
checking path 
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
 (monitor:498)
2018-10-17 13:44:27,300+0300 ERROR (monitor/b606bde) [storage.Monitor] Error 
checking domain b606bde9-21da-4c10-b77b-2d8e0c374ea2 (monitor:424)
2018-10-17 13:44:36,992+0300 ERROR (periodic/12) [Executor] Unhandled exception 
in <Task discardable <UpdateVolumes vm=18f90e1d-ad1d-4937-8b71-4f5e09e772a2 at 
0x7f5b6463b910> timeout=30.0, duration=60 at 0x7f5b26e35450> (executor:317)
2018-10-17 13:44:36,992+0300 ERROR (periodic/12) [storage.Dispatcher] FINISH 
getVolumeSize error=Connection timed out (dispatcher:86)
2018-10-17 13:44:36,992+0300 ERROR (periodic/12) [storage.TaskManager.Task] 
(Task='12d1da8e-ba6b-48fb-8ae7-6b304891356d') Unexpected error (task:875)
2018-10-17 13:44:36,995+0300 ERROR (periodic/14) [Executor] Unhandled exception 
in <Task discardable <UpdateVolumes vm=ddf00567-3842-4982-8035-5cdd8efa2c36 at 
0x7f5b26e35610> timeout=30.0, duration=60 at 0x7f5b26cfce90> (executor:317)
2018-10-17 13:44:36,995+0300 ERROR (periodic/14) [storage.Dispatcher] FINISH 
getVolumeSize error=Connection timed out (dispatcher:86)
2018-10-17 13:44:36,995+0300 ERROR (periodic/14) [storage.TaskManager.Task] 
(Task='b198455f-2847-4c8e-821b-6c269bed2411') Unexpected error (task:875)
2018-10-17 13:44:36,996+0300 ERROR (periodic/16) [Executor] Unhandled exception 
in <Task discardable <UpdateVolumes vm=b40f7d38-daf7-499b-988c-1c2d19d115a5 at 
0x7f5b26cfca90> timeout=30.0, duration=60 at 0x7f5b26cfc790> (executor:317)
2018-10-17 13:44:36,996+0300 ERROR (periodic/16) [storage.Dispatcher] FINISH 
getVolumeSize error=Connection timed out (dispatcher:86)
2018-10-17 13:44:36,996+0300 ERROR (periodic/16) [storage.TaskManager.Task] 
(Task='60d13a5b-0b10-4a60-b9ee-05820a4e47a7') Unexpected error (task:875)
2018-10-17 13:44:37,001+0300 ERROR (periodic/17) [storage.TaskManager.Task] 
(Task='6c827331-3373-4ed0-9630-0cea0d989a06') Unexpected error (task:875)
2018-10-17 13:44:37,002+0300 ERROR (periodic/17) [Executor] Unhandled exception 
in <Task discardable <UpdateVolumes vm=acadd04a-b762-46eb-81b8-bf276758de64 at 
0x7f5b26cfc690> timeout=30.0, duration=60 at 0x7f5b26fc9f90> (executor:317)
2018-10-17 13:44:37,002+0300 ERROR (periodic/17) [storage.Dispatcher] FINISH 
getVolumeSize error=Connection timed out (dispatcher:86)
2018-10-17 14:56:02,788+0300 ERROR (migmon/f52e8e78) [root] Unhandled exception 
(logutils:412)
2018-10-17 14:56:02,884+0300 ERROR (migmon/f52e8e78) [root] FINISH thread 
<Thread(migmon/f52e8e78, stopped daemon 140026755127040)> failed 
(concurrent:201)
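
With this many ERROR lines it may be easier to read the output grouped by hour and logger. Below is a minimal Python sketch, not part of vdsm, that assumes the record layout shown above (timestamp, level, thread in parentheses, logger in square brackets). The function name summarize_errors is made up for illustration.

```python
import re
from collections import Counter

# Start of a vdsm ERROR record as it appears in the output above, e.g.
# "2018-10-17 02:04:49,163+0300 ERROR (check/loop) [storage.Monitor] Error ..."
RECORD = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}[+-]\d{4} "
    r"ERROR \((?P<thread>[^)]*)\) \[(?P<logger>[^\]]+)\]"
)


def summarize_errors(lines):
    """Count ERROR records per (hour, logger); continuation lines are skipped."""
    counts = Counter()
    for line in lines:
        match = RECORD.match(line)
        if match:
            hour = match.group("ts")[:13]  # "YYYY-MM-DD HH"
            counts[(hour, match.group("logger"))] += 1
    return counts
```

It could be fed any iterable of lines, for example an open file object for a saved copy of the grep output, and the resulting counter printed in sorted order to see when the storage.Monitor and storage.Dispatcher errors cluster.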

On 17 Oct 2018, at 14:42, Sahina Bose <sab...@redhat.com> wrote:



On Tue, Oct 16, 2018 at 11:39 PM Spickiy Nikita <n.spic...@outlook.com> wrote:
Hi, I have an oVirt instance (4.2.1.6-1.el7.centos) with a cluster that uses 
Gluster. Hosts periodically become non-responsive and the VMs stop responding. 
This usually happens after the message "command GetGlusterVolumeHealInfoVDS 
failed: Message timeout which can be caused by communication issues".

Would increasing the timeout for getting the heal status solve the problem? 
If so, how do I do it?

Part of the log is attached below:

https://paste.fedoraproject.org/paste/8TTzwjMbYk32d7wd7Ix0Pw/raw

2018-10-15 14:44:22,582+03 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler6) [70cfd553] EVENT_ID: 
VDS_BROKER_COMMAND_FAILURE(10,802), VDSM 
ovirt3.example.org command 
GetGlusterVolumeHealInfoVDS failed: Message timeout which can be caused by 
communication issues
2018-10-15 14:44:22,584+03 ERROR 
[org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeHealInfoVDSCommand] 
(DefaultQuartzScheduler6) [70cfd553] Command 
'GetGlusterVolumeHealInfoVDSCommand(HostName = 
ovirt3.example.org, 
GlusterVolumeVDSParameters:{hostId='39215015-2537-4329-921f-c11256f99e04', 
volumeName='domain1'})' execution failed: VDSGenericException: 
VDSNetworkException: Message timeout which can be caused by communication issues
2018-10-15 14:44:22,584+03 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] 
(EE-ManagedThreadFactory-engine-Thread-7) [70cfd553] Host 
'ovirt3.example.org' is not responding. It will 
stay in Connecting state for a grace period of 77 seconds and after that an 
attempt to fence the host will be issued.
2018-10-15 14:44:22,591+03 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engine-Thread-7) [70cfd553] EVENT_ID: 
VDS_HOST_NOT_RESPONDING_CONNECTING(9,008), Host 
ovirt3.example.org is not responding. It will stay 
in Connecting state for a grace period of 77 seconds and after that an attempt 
to fence the host will be issued.
2018-10-15 14:44:54,620+03 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engine-Thread-13) [] EVENT_ID: 
VDS_STORAGE_VDS_STATS_FAILED(189), Host 
ovirt3.example.org reports about one of the Active 
Storage Domains as Problematic.
2018-10-15 14:44:54,827+03 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-46) [6d9504d1] EVENT_ID: 
VDS_SET_NONOPERATIONAL_DOMAIN(522), Host 
ovirt3.example.org cannot access the Storage 
Domain(s) DOMAIN1 attached to the Data Center Default. Setting Host state to 
Non-Operational.
2018-10-15 14:44:54,840+03 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-46) [6d9504d1] EVENT_ID: 
CONNECT_STORAGE_POOL_FAILED(995), Failed to connect Host 
ovirt3.example.org to Storage Pool Default
2018-10-15 14:45:28,698+03 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-87) [] EVENT_ID: 
VM_NOT_RESPONDING(126), VM HostedEngine is not responding.
2018-10-15 14:45:30,296+03 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-72) [] EVENT_ID: 
VM_NOT_RESPONDING(126), VM vm2 is not responding.
2018-10-15 14:45:30,362+03 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-72) [] EVENT_ID: 
VM_NOT_RESPONDING(126), VM vm3 is not responding.
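
The order of events in the engine log above (command timeout, host not responding, storage domain problematic, VMs not responding) can be pulled out mechanically. The following is a rough Python sketch, assuming only the "EVENT_ID: NAME(code)" pattern and the timestamp layout visible in this excerpt. The helper name event_timeline is hypothetical and not part of oVirt.

```python
import re

# "EVENT_ID: NAME(code)"; \s+ also spans the line wrap added by the mail client.
EVENT = re.compile(r"EVENT_ID:\s+(?P<name>[A-Z_]+)\((?P<code>[\d,]+)\)")
# Timestamp at the start of an engine log record, e.g. "2018-10-15 14:44:22,582+03".
STAMP = re.compile(r"^\d{4}-\d{2}-\d{2} (\d{2}:\d{2}:\d{2}),\d{3}", re.MULTILINE)


def event_timeline(text):
    """Return [(time, event_name)] in the order the events appear in text."""
    stamps = [(m.start(), m.group(1)) for m in STAMP.finditer(text)]
    timeline = []
    for event in EVENT.finditer(text):
        time = None
        # Use the last record timestamp that starts before this event.
        for position, stamp in stamps:
            if position > event.start():
                break
            time = stamp
        timeline.append((time, event.group("name")))
    return timeline
```

Run over the whole excerpt as one string, this should list VDS_BROKER_COMMAND_FAILURE first and the VM_NOT_RESPONDING events last, which makes the timeout-then-storage-failure sequence easier to see.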



Can you check the vdsm log to see if you're running into 
https://bugzilla.redhat.com/show_bug.cgi?id=1614430

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XK7YX6FINFOKA7WGK2ST7KGTCICS6M25/

List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OCKSC7U3NWFLSLYEAYRNG3OUNVGVRTQ7/
