Hi all. I have an HCI setup: GlusterFS 3.12, oVirt 4.2.7, 4 nodes.

Yesterday the engine detected 3 VMs as "not responding" (they are marked as HA 
VMs; all of them were located on the ovirtnode1 server).
Two of them were restarted by the engine on other nodes successfully, but one 
was not. All of them got a LOCALIZED message like 'cannot set lock: no free 
space on device' - what is this? Why does the engine get these errors, and why 
do some VMs restart automatically and some not (though the one that didn't was 
successfully restarted by the user via the web UI after some pause)?
Who knows? What do you think? Full engine logs can be uploaded.

From engine.log, the start of the event:
---------------
2019-02-26 17:04:05,308+04 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] VM 
'd546add1-126a-4490-bc83-469bab659854'(openfire.miac) moved from 'Up' --> 
'NotResponding'
2019-02-26 17:04:05,865+04 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] EVENT_ID: 
VM_NOT_RESPONDING(126), VM openfire.miac is not responding.
2019-02-26 17:04:05,865+04 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] VM 
'7a3af2e7-8296-4fe0-ac55-c52a4b1de93f'(e-l-k.miac) moved from 'Up' --> 
'NotResponding'
2019-02-26 17:04:05,894+04 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] EVENT_ID: 
VM_NOT_RESPONDING(126), VM e-l-k.miac is not responding.
2019-02-26 17:04:05,895+04 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] VM 
'de76aa6c-a211-41de-8d85-7d2821c3980d'(tsgr-mon) moved from 'Up' --> 
'NotResponding'
2019-02-26 17:04:05,926+04 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] EVENT_ID: 
VM_NOT_RESPONDING(126), VM tsgr-mon is not responding.
---------------------------
2019-02-26 17:04:22,237+04 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(ForkJoinPool-1-worker-9) [] EVENT_ID: VM_DOWN_ERROR(119), VM openfire.miac is 
down with error. Exit message: VM has been terminated on the host.
....
2019-02-26 17:04:22,374+04 INFO  [org.ovirt.engine.core.bll.VdsEventListener] 
(ForkJoinPool-1-worker-9) [] Highly Available VM went down. Attempting to 
restart. VM Name 'openfire.miac', VM Id 'd546add1-126a-4490-bc83-469bab659854'
...
2019-02-26 17:04:27,737+04 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-5) [] EVENT_ID: 
VM_DOWN_ERROR(119), VM openfire.miac is down with error. Exit message: resource 
busy: Failed to acquire lock: Lease is held by another host.
...
2019-02-26 17:04:28,350+04 INFO  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engine-Thread-2886073) [] EVENT_ID: 
VDS_INITIATED_RUN_VM(506), Trying to restart VM openfire.miac on Host 
ovirtnode6.miac
2019-02-26 17:04:31,841+04 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(ForkJoinPool-1-worker-2) [] EVENT_ID: VM_DOWN_ERROR(119), VM openfire.miac is 
down with error. Exit message: resource busy: Failed to acquire lock: Lease is 
held by another host.
2019-02-26 17:04:31,877+04 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engine-Thread-2886082) [] EVENT_ID: 
VDS_INITIATED_RUN_VM_FAILED(507), Failed to restart VM openfire.miac on Host 
ovirtnode6.miac
...
2019-02-26 17:04:31,994+04 INFO  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engine-Thread-2886082) [] EVENT_ID: 
VDS_INITIATED_RUN_VM(506), Trying to restart VM openfire.miac on Host 
ovirtnode1.miac
.....
2019-02-26 17:04:36,054+04 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(ForkJoinPool-1-worker-9) [] EVENT_ID: VM_DOWN_ERROR(119), VM openfire.miac is 
down with error. Exit message: Не удалось установить блокировку: На устройстве 
не осталось свободного места. [in English: "Failed to set lock: No space left 
on device"]
2019-02-26 17:04:36,054+04 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(ForkJoinPool-1-worker-9) [] add VM 
'd546add1-126a-4490-bc83-469bab659854'(openfire.miac) to rerun treatment
2019-02-26 17:04:36,091+04 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engine-Thread-2886083) [] EVENT_ID: 
VDS_INITIATED_RUN_VM_FAILED(507), Failed to restart VM openfire.miac on Host 
ovirtnode1.miac
-------------------
No more attempts for this VM were made (its state is now 'Down').

The engine tried to restart this VM on some other nodes, got 'Lease is held by 
another host' (is that normal, because the timeout for the lock had not expired 
yet?), and then got the LOCALIZED message (why is it localized while all the 
others are not?) 'cannot set lock: no free space on device'.
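
My guess on the localization, not verified: the 'Exit message' text is passed 
through from the host, and the lock error comes out of libvirt/sanlock via 
strerror(), so it follows the locale the libvirtd process runs under on that 
node, while the other messages are the engine's own English strings. Something 
like this should show which locale libvirtd was started with:

-------------
# dump libvirtd's environment, one variable per line, and look for locale vars
tr '\0' '\n' < /proc/$(pidof libvirtd)/environ | grep -E '^(LANG|LC_)'
-------------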

Which 'device' does it mean? How can I see how much 'free space' it has? The 
gluster volumes are almost empty:
ovirtstor1.miac:/engine       150G  7,5G  143G   5%  /rhev/data-center/mnt/glusterSD/ovirtstor1.miac:_engine
ovirtnode1.miac:/data         5,9T   42G  5,9T   1%  /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_data
ovirtnode1.miac:/vmstore      4,4T  194G  4,2T   5%  /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_vmstore
ovirtnode1.miac:/iso_storage  600G  1,3G  599G   1%  /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_iso__storage
ovirtnode1.miac:/export       5,1T   14G  5,1T   1%  /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_export
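
If the 'device' in that message is sanlock's lease area on the storage domain 
rather than the filesystem itself (my assumption), I suppose it can be checked 
with something like this (paths taken from the mounts above; the storage domain 
UUID directory name will differ):

-------------
# sanlock's own view of lockspaces and resources held on this host
sanlock client status

# lease files of a storage domain live under dom_md/ on the gluster mount
ls -lh /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_vmstore/*/dom_md/
-------------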

Other VMs also got this error about free space:
-------------
2019-02-26 17:05:05,142+04 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(ForkJoinPool-1-worker-11) [] EVENT_ID: VM_DOWN_ERROR(119), VM tsgr-mon is down 
with error. Exit message: Не удалось установить блокировку: На устройстве не 
осталось свободного места. [the same localized message as above]

2019-02-26 17:05:43,526+04 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(ForkJoinPool-1-worker-11) [] VM 
'de76aa6c-a211-41de-8d85-7d2821c3980d'(tsgr-mon) moved from 'PoweringUp' --> 
'Up'
2019-02-26 17:05:43,537+04 INFO  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(ForkJoinPool-1-worker-11) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to 
restart VM tsgr-mon on Host ovirtnode6.miac
----------------
And that is the last message about this VM; its state is now 'Up'.

After that, all VMs were migrated off this node (ovirtnode1), and later in the 
log there are records about its unresponsiveness:

2019-02-26 17:07:43,584+04 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler9) [35e05907] EVENT_ID: 
VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ovirtnode1.miac command 
GetGlusterVolumeHealInfoVDS failed: Message timeout which can be caused by 
communication issues
2019-02-26 17:07:43,592+04 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engine-Thread-2886194) [35e05907] EVENT_ID: 
VDS_HOST_NOT_RESPONDING_CONNECTING(9,008), Host ovirtnode1.miac is not 
responding. It will stay in Connecting state for a grace period of 60 
seconds and after that an attempt to fence the host will be issued.
2019-02-26 17:07:49,684+04 INFO  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-99) [4511cab1] EVENT_ID: 
FENCE_OPERATION_USING_AGENT_AND_PROXY_STARTED(9,020), Executing power 
management status on Host ovirtnode1.miac using Proxy Host ovirtnode5.miac and 
Fence Agent ipmilan:172.16.16.1.
2019-02-26 17:07:49,685+04 INFO  
[org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] 
(EE-ManagedThreadFactory-engineScheduled-Thread-99) [4511cab1] START, 
FenceVdsVDSCommand(HostName = ovirtnode5.miac,  ... action='STATUS', 
agent='FenceAgent:{id='338cbdd9-aab6-484c-a906-21e747a96a84', ... 
type='ipmilan', ip='172.16.16.1', port='null', user='admin', password='***', 
encryptOptions='false', options=''}', policy='null'}), log id: 281e6c
2019-02-26 17:07:49,942+04 INFO  
[org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] 
(EE-ManagedThreadFactory-engineScheduled-Thread-99) [4511cab1] FINISH, 
FenceVdsVDSCommand, return: FenceOperationResult:{status='SUCCESS', 
powerStatus='ON', message=''}, log id: 281e6c

2019-02-26 17:07:56,131+04 INFO  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-99) [317316e4] EVENT_ID: 
SYNC_STORAGE_DEVICES_IN_HOST(4,126), Manually synced the storage devices from 
host ovirtnode1.miac
2019-02-26 17:07:56,550+04 INFO  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-99) [3a88be4f] EVENT_ID: 
VDS_DETECTED(13), Status of host ovirtnode1.miac was set to Up.
2019-02-26 17:07:56,554+04 INFO  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-99) [3a88be4f] EVENT_ID: 
VDS_FENCE_STATUS(496), Host ovirtnode1.miac power management was verified 
successfully.
-------------

After all that, I now try to start the VM that was not restarted automatically:
-------------
2019-02-27 11:23:30,139+04 INFO  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engine-Thread-2921212) 
[8e405cb9-3a3a-42ab-ae05-5aaf40194b30] EVENT_ID: USER_STARTED_VM(153), VM 
openfire.miac was started by admin@internal-authz (Host: ovirtnode1.miac).
.....
2019-02-27 11:24:12,489+04 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(ForkJoinPool-1-worker-1) [] VM 
'd546add1-126a-4490-bc83-469bab659854'(openfire.miac) moved from 'PoweringUp' 
--> 'Up'
2019-02-27 11:24:12,502+04 INFO  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(ForkJoinPool-1-worker-1) [] EVENT_ID: USER_RUN_VM(32), VM openfire.miac 
started on Host ovirtnode1.miac
---------------

It started without problems on the same host, ovirtnode1. Strange.
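
It would probably also be worth checking sanlock's and vdsm's own logs on 
ovirtnode1 around 17:04 to see which device actually returned the ENOSPC; 
something like:

-------------
# sanlock logs to /var/log/sanlock.log, vdsm to /var/log/vdsm/vdsm.log by default
grep '2019-02-26 17:0' /var/log/sanlock.log
grep -E 'd546add1-126a-4490-bc83-469bab659854|No space left' /var/log/vdsm/vdsm.log
-------------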


--
Mike