On February 12, 2020 5:30:42 AM GMT+02:00, adrianquint...@gmail.com wrote:
>Hi,
>I am having a couple of issues with a fresh oVirt 4.3.7 HCI setup with
>3 nodes.
>
>------------------------------------------------------------------------------------------------------------------------------------------------------------
>1.-vdsm is showing the following errors for HOST1 and HOST2 (HOST3
>seems to be ok):
>------------------------------------------------------------------------------------------------------------------------------------------------------------
>     service vdsmd status
>Redirecting to /bin/systemctl status vdsmd.service
>● vdsmd.service - Virtual Desktop Server Manager
>Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor
>preset: enabled)
>  Active: active (running) since Tue 2020-02-11 18:50:28 PST; 28min ago
>Process: 25457 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh
>--pre-start (code=exited, status=0/SUCCESS)
> Main PID: 25549 (vdsmd)
>    Tasks: 76
>   CGroup: /system.slice/vdsmd.service
>           ├─25549 /usr/bin/python2 /usr/share/vdsm/vdsmd
>├─25707 /usr/libexec/ioprocess --read-pipe-fd 52 --write-pipe-fd 51
>--max-threads 10 --max-queued-requests 10
>├─26314 /usr/libexec/ioprocess --read-pipe-fd 92 --write-pipe-fd 86
>--max-threads 10 --max-queued-requests 10
>├─26325 /usr/libexec/ioprocess --read-pipe-fd 96 --write-pipe-fd 93
>--max-threads 10 --max-queued-requests 10
>└─26333 /usr/libexec/ioprocess --read-pipe-fd 102 --write-pipe-fd 101
>--max-threads 10 --max-queued-requests 10
>
>Feb 11 18:50:28 tij-059-ovirt1.grupolucerna.local
>vdsmd_init_common.sh[25457]: vdsm: Running test_space
>Feb 11 18:50:28 tij-059-ovirt1.grupolucerna.local
>vdsmd_init_common.sh[25457]: vdsm: Running test_lo
>Feb 11 18:50:28 tij-059-ovirt1.grupolucerna.local systemd[1]: Started
>Virtual Desktop Server Manager.
>Feb 11 18:50:29 tij-059-ovirt1.grupolucerna.local vdsm[25549]: WARN MOM
>not available.
>Feb 11 18:50:29 tij-059-ovirt1.grupolucerna.local vdsm[25549]: WARN MOM
>not available, KSM stats will be missing.
>Feb 11 18:51:25 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR
>failed to retrieve Hosted Engine HA score
>                                     Traceback (most recent call last):
>File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in
>_getHaInfo...
>Feb 11 18:51:34 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR
>failed to retrieve Hosted Engine HA score
>                                     Traceback (most recent call last):
>File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in
>_getHaInfo...
>Feb 11 18:51:35 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR
>failed to retrieve Hosted Engine HA score
>                                     Traceback (most recent call last):
>File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in
>_getHaInfo...
>Feb 11 18:51:43 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR
>failed to retrieve Hosted Engine HA score
>                                     Traceback (most recent call last):
>File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in
>_getHaInfo...
>Feb 11 18:56:32 tij-059-ovirt1.grupolucerna.local vdsm[25549]: WARN
>ping was deprecated in favor of ping2 and confirmConnectivity
>
>------------------------------------------------------------------------------------------------------------------------------------------------------------
>2.-"gluster vol heal engine info" shows the following, and it never
>finishes healing:
>------------------------------------------------------------------------------------------------------------------------------------------------------------
>[root@host2~]# gluster vol heal engine info
>Brick host1:/gluster_bricks/engine/engine
>/7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta
>
>/7a68956e-3736-46d1-8932-8576f8ee8882/images/b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd/ed569aed-005e-40fd-9297-dd54a1e4946c.meta
>
>Status: Connected
>Number of entries: 2
>
>Brick host2:/gluster_bricks/engine/engine
>/7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta
>
>/7a68956e-3736-46d1-8932-8576f8ee8882/images/b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd/ed569aed-005e-40fd-9297-dd54a1e4946c.meta
>
>Status: Connected
>Number of entries: 2
>
>Brick host3:/gluster_bricks/engine/engine
>Status: Connected
>Number of entries: 0
>
>------------------------------------------------------------------------------------------------------------------------------------------------------------
>3.-Every hour I see the following entries/errors
>------------------------------------------------------------------------------------------------------------------------------------------------------------
>VDSM command SetVolumeDescriptionVDS failed: Could not acquire
>resource. Probably resource factory threw an exception.: ()
>
>------------------------------------------------------------------------------------------------------------------------------------------------------------
>4.- I am also seeing the following pertaining to the engine volume
>------------------------------------------------------------------------------------------------------------------------------------------------------------
>Failed to update OVF disks 86196e10-8103-4b00-bd3e-0f577a8bb5b2, OVF
>data isn't updated on those OVF stores (Data Center Default, Storage
>Domain hosted_storage).
>
>------------------------------------------------------------------------------------------------------------------------------------------------------------
>5.-hosted-engine --vm-status
>------------------------------------------------------------------------------------------------------------------------------------------------------------
>--== Host host1 (id: 1) status ==--
>
>conf_on_shared_storage             : True
>Status up-to-date                  : True
>Hostname                           : host1
>Host ID                            : 1
>Engine status                      : {"reason": "vm not running on this
>host", "health": "bad", "vm": "down", "detail": "unknown"}
>Score                              : 3400
>stopped                            : False
>Local maintenance                  : False
>crc32                              : be592659
>local_conf_timestamp               : 480218
>Host timestamp                     : 480217
>Extra metadata (valid at timestamp):
>       metadata_parse_version=1
>       metadata_feature_version=1
>       timestamp=480217 (Tue Feb 11 19:22:20 2020)
>       host-id=1
>       score=3400
>       vm_conf_refresh_time=480218 (Tue Feb 11 19:22:21 2020)
>       conf_on_shared_storage=True
>       maintenance=False
>       state=EngineDown
>       stopped=False
>
>
>--== Host host3 (id: 2) status ==--
>
>conf_on_shared_storage             : True
>Status up-to-date                  : True
>Hostname                           : host3
>Host ID                            : 2
>Engine status                      : {"health": "good", "vm": "up",
>"detail": "Up"}
>Score                              : 3400
>stopped                            : False
>Local maintenance                  : False
>crc32                              : 1f4a8597
>local_conf_timestamp               : 436681
>Host timestamp                     : 436681
>Extra metadata (valid at timestamp):
>       metadata_parse_version=1
>       metadata_feature_version=1
>       timestamp=436681 (Tue Feb 11 19:22:18 2020)
>       host-id=2
>       score=3400
>       vm_conf_refresh_time=436681 (Tue Feb 11 19:22:18 2020)
>       conf_on_shared_storage=True
>       maintenance=False
>       state=EngineUp
>       stopped=False
>
>
>--== Host host2 (id: 3) status ==--
>
>conf_on_shared_storage             : True
>Status up-to-date                  : True
>Hostname                           : host2
>Host ID                            : 3
>Engine status                      : {"reason": "vm not running on this
>host", "health": "bad", "vm": "down_missing", "detail": "unknown"}
>Score                              : 3400
>stopped                            : False
>Local maintenance                  : False
>crc32                              : ca5c1918
>local_conf_timestamp               : 479644
>Host timestamp                     : 479644
>Extra metadata (valid at timestamp):
>       metadata_parse_version=1
>       metadata_feature_version=1
>       timestamp=479644 (Tue Feb 11 19:22:21 2020)
>       host-id=3
>       score=3400
>       vm_conf_refresh_time=479644 (Tue Feb 11 19:22:22 2020)
>       conf_on_shared_storage=True
>       maintenance=False
>       state=EngineDown
>       stopped=False
>
>
>
>------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>Any ideas on what might be going on?
>_______________________________________________
>Users mailing list -- users@ovirt.org
>To unsubscribe send an email to users-le...@ovirt.org
>Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>oVirt Code of Conduct:
>https://www.ovirt.org/community/about/community-guidelines/
>List Archives:
>https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZIHE4E7RIVLPA3Y5JQ7LI5SXXM474CU4/

The meta file issue is a bug which will soon be fixed.
The easiest way to recover is to compare the contents of the file on all 
bricks, rsync the newest copy (usually only the timestamp inside has 
increased) to the other bricks, and then issue a full heal.
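
Roughly, that recovery could look like this. The path is one of the two entries from your heal-info output; which host holds the newest copy, passwordless ssh between the hosts, and host names are all assumptions, so every command is echo-prefixed as a dry run (drop the echo to actually execute):

```shell
#!/bin/sh
# One of the two files reported by 'gluster vol heal engine info':
META=/gluster_bricks/engine/engine/7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta

# 1. Compare the copy on every brick; the contents usually differ only
#    in the timestamp line:
for h in host1 host2 host3; do
    echo ssh "$h" md5sum "$META"
done

# 2. Assuming host3 (the clean brick) holds the newest copy, push it to
#    the other two bricks:
for h in host1 host2; do
    echo rsync -av "host3:$META" "$h:$META"
done

# 3. Trigger a full heal of the volume:
echo gluster volume heal engine full
```

Repeat step 2 for the second .meta file from the heal-info output as well.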

Try stopping and then starting the ovirt-ha-broker and ovirt-ha-agent services.
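
For example, on the two hosts reporting the HA score errors (again echo-prefixed as a dry run; the host names and ssh access are assumptions):

```shell
#!/bin/sh
for h in host1 host2; do
    # stop the agent first, then the broker ...
    echo ssh "$h" systemctl stop ovirt-ha-agent ovirt-ha-broker
    # ... and start them in the reverse order
    echo ssh "$h" systemctl start ovirt-ha-broker ovirt-ha-agent
done

# afterwards, check that the HA score is being reported again
echo hosted-engine --vm-status
```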

For point 4, I guess you will need to find out whether the OVF store disk 
is actually there and whether there are errors from 'sanlock.service'.
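
Something along these lines should cover both checks. The image UUID is the OVF store disk from your "Failed to update OVF disks" event; the glusterSD mount path is only the usual oVirt layout for a Gluster storage domain, so treat it as an assumption and adjust the server/volume part (echo-prefixed as a dry run):

```shell
#!/bin/sh
# OVF store disk id taken from the "Failed to update OVF disks" event:
OVF_DISK=86196e10-8103-4b00-bd3e-0f577a8bb5b2

# 1. Is the disk present on the hosted_storage domain mount?
echo ls -l "/rhev/data-center/mnt/glusterSD/host1:_engine/*/images/$OVF_DISK/"

# 2. Any errors from sanlock?
echo systemctl status sanlock
echo journalctl -u sanlock --since today
echo sanlock client status
```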


Best Regards,
Strahil Nikolov
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RHQNZYCFJZRYDNRE5M4BQXZFTP2O6WQ7/
