Hi,

I am running a 3-node oVirt cluster with a hosted engine. Unfortunately I had a
small issue on the NFS server that provides the shared storage for the cluster.
During the outage, all VMs went into pause and the cluster itself (hosted
engine) went down. After restoring the NFS service (took 2 minutes), the cluster
did not recover. It seems the HA agent can't make sense of the lockspace
anymore. The agent fails on all nodes.

There are several errors in the logs, but the main one is (I think) in broker.log:

MainThread::WARNING::2022-05-01 22:06:06,085::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command Image.prepare with args {'imageID': 'c1b7e131-bbde-416b-b2a0-de746a039dfd', 'storagepoolID': '00000000-0000-0000-0000-000000000000', 'volumeID': 'b221ea37-2c59-49bf-89f7-83766fb53717', 'storagedomainID': 'e3b467ec-fdfc-4c7a-9725-8a6d1fe18c6d'} failed:
(code=201, message=Volume does not exist: (u'b221ea37-2c59-49bf-89f7-83766fb53717',))

Which is weird because it seems to exist:
ls -l /rhev/data-center/mnt/nas.fritz.box:_mnt_HD_HD__a2_hosted__engine__nas/e3b467ec-fdfc-4c7a-9725-8a6d1fe18c6d/images/c1b7e131-bbde-416b-b2a0-de746a039dfd
total 1049608
-rw-rw----. 1 vdsm kvm 1073741824 May  1 15:35 b221ea37-2c59-49bf-89f7-83766fb53717
-rw-rw----. 1 vdsm kvm    1048576 Dec 21  2020 b221ea37-2c59-49bf-89f7-83766fb53717.lease
-rw-rw-rw-. 1 vdsm kvm        329 Dec 21  2020 b221ea37-2c59-49bf-89f7-83766fb53717.meta

When I try to reinitialize the lockspace (after stopping the agent etc.), I get:
----
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 60, in connect
    self.sock.connect(base64.b16decode(self.host))
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 2] No such file or directory
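
As far as I can tell, this is the error a Unix domain socket client gets when the socket file it connects to is not there. A minimal sketch that reproduces the same errno outside of oVirt (Python 3; the helper name and the socket path are made up for illustration and are not the real broker socket):

```python
import errno
import os
import socket
import tempfile


def try_unix_connect(path):
    """Return None on success, or the errno of a failed AF_UNIX connect."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


# Hypothetical path: a socket file that nobody has created.
missing = os.path.join(tempfile.mkdtemp(), "broker.socket")
result = try_unix_connect(missing)
print(errno.errorcode.get(result), os.strerror(result))
```

If that reading is right, the traceback only says that nothing is listening on the socket the tool expects. It does not say anything about the state of the lockspace itself.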

Is there a way to reinitialize the lockspace manually and also recreate the lockspace volume?

I already tried to recreate the lockspace manually with:
 sanlock direct init -s hosted-engine:0:/rhev/data-center/mnt/nas.fritz.box:_mnt_HD_HD__a2_hosted__engine__nas/e3b467ec-fdfc-4c7a-9725-8a6d1fe18c6d/ha_agent/hosted-engine.lockspace:0
which resulted in:
init done -19

No further info and nothing changed.
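
One guess about the -19: if sanlock reports a negated errno here (an assumption on my part, I have not confirmed it in the sanlock sources), the value can be decoded with a few lines of Python. The helper name below is hypothetical:

```python
import errno
import os


def describe_negated_errno(rv):
    """Map a negative return value to (errno name, message), assuming rv == -errno."""
    code = abs(rv)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


name, text = describe_negated_errno(-19)
print(name, text)
```

On Linux, errno 19 is ENODEV, so under that assumption the init call could not find the device or file it was pointed at.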

With kind regards,
Joost