Hi,

I am running a 3-node oVirt cluster with a hosted engine. Unfortunately, I had a small issue on my NFS server, which provides the shared storage for the cluster. During the outage, all VMs were paused, and the cluster itself (the hosted engine) went down. After I restored the NFS service (which took about 2 minutes), the cluster did not recover. It seems the HA agent can no longer make sense of the lockspace; the agent fails on all nodes.
There are several errors in the logs, but I think the main one is this one in broker.log:

    MainThread::WARNING::2022-05-01 22:06:06,085::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command Image.prepare with args {'imageID': 'c1b7e131-bbde-416b-b2a0-de746a039dfd', 'storagepoolID': '00000000-0000-0000-0000-000000000000', 'volumeID': 'b221ea37-2c59-49bf-89f7-83766fb53717', 'storagedomainID': 'e3b467ec-fdfc-4c7a-9725-8a6d1fe18c6d'} failed: (code=201, message=Volume does not exist: (u'b221ea37-2c59-49bf-89f7-83766fb53717',))

This is odd, because the volume does seem to exist:

    ls -l /rhev/data-center/mnt/nas.fritz.box:_mnt_HD_HD__a2_hosted__engine__nas/e3b467ec-fdfc-4c7a-9725-8a6d1fe18c6d/images/c1b7e131-bbde-416b-b2a0-de746a039dfd
    total 1049608
    -rw-rw----. 1 vdsm kvm 1073741824 May 1 15:35 b221ea37-2c59-49bf-89f7-83766fb53717
    -rw-rw----. 1 vdsm kvm 1048576 Dec 21 2020 b221ea37-2c59-49bf-89f7-83766fb53717.lease
    -rw-rw-rw-. 1 vdsm kvm 329 Dec 21 2020 b221ea37-2c59-49bf-89f7-83766fb53717.meta

When I try to reinitialize the lockspace (after stopping the agent etc.), I get:

    ----
      File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 60, in connect
        self.sock.connect(base64.b16decode(self.host))
      File "/usr/lib64/python2.7/socket.py", line 224, in meth
        return getattr(self._sock,name)(*args)
    socket.error: [Errno 2] No such file or directory

Is there a way to do this manually, and also to recreate the lockspace volume? I already tried to recreate the lockspace manually with:

    sanlock direct init -s hosted-engine:0:/rhev/data-center/mnt/nas.fritz.box:_mnt_HD_HD__a2_hosted__engine__nas/e3b467ec-fdfc-4c7a-9725-8a6d1fe18c6d/ha_agent/hosted-engine.lockspace:0

which resulted in:

    init done -19

There was no further output, and nothing changed.
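For anyone triaging the same symptoms, here is a minimal Python 3 sketch of the checks described above, assuming a Linux host (the hosts in this post run Python 2.7, so it is not drop-in there). It translates the "init done -19" result, on the assumption that sanlock reports a negated errno; it decodes the hex-encoded socket path that unixrpc.py passes to connect() and tests whether anything is listening there, since [Errno 2] on a UNIX-socket connect means the socket file is absent; it mirrors the ls -l check of the volume, .lease and .meta files; and it shows that the -s argument splits into five colon-separated fields rather than the expected name:host_id:path:offset four, because the NFS mount directory name contains a colon. The escape_path() helper and the idea that sanlock accepts a backslash-escaped colon are assumptions to verify against the sanlock man page, not something confirmed here.

```python
import base64
import errno
import os
import re
import socket

# The -s argument exactly as used in the post.
SPEC = ("hosted-engine:0:/rhev/data-center/mnt/nas.fritz.box:_mnt_HD_HD__a2_hosted__engine__nas/"
        "e3b467ec-fdfc-4c7a-9725-8a6d1fe18c6d/ha_agent/hosted-engine.lockspace:0")


def describe_rc(rc):
    """Map a negative return code such as -19 to an errno name and message.
    Assumption: the value is a negated Linux errno."""
    code = -rc
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def decode_rpc_host(host_hex):
    """unixrpc.py connects to base64.b16decode(self.host), so the socket
    path is stored hex-encoded; decoding it shows which file was missing."""
    return base64.b16decode(host_hex).decode("utf-8")


def socket_reachable(path):
    """Try to connect to a UNIX socket and return (ok, errno name).
    ENOENT ([Errno 2]) means no socket file exists at that path."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return True, None
    except OSError as exc:
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


def volume_files(domain_dir, image_id, volume_id):
    """Report which of the volume, .lease and .meta files exist (like ls -l)."""
    base = os.path.join(domain_dir, "images", image_id, volume_id)
    return {suffix or "volume": os.path.isfile(base + suffix)
            for suffix in ("", ".lease", ".meta")}


def split_lockspace(spec):
    """Split on every colon that is not preceded by a backslash."""
    return re.split(r"(?<!\\):", spec)


def escape_path(path):
    """Hypothetical helper: backslash-escape colons inside the path."""
    return path.replace(":", "\\:")


if __name__ == "__main__":
    print("init done -19 ->", describe_rc(-19))
    print("fields as written:", len(split_lockspace(SPEC)))
    name, host_id, rest = SPEC.split(":", 2)
    path, offset = rest.rsplit(":", 1)
    escaped = ":".join([name, host_id, escape_path(path), offset])
    print("fields with escaped path:", len(split_lockspace(escaped)))
```

To check the broker, pass the agent's hex-encoded host value through decode_rpc_host() and then call socket_reachable() on the result; the actual socket path is not given in this post.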
With kind regards,
Joost

_______________________________________________
Users mailing list -- users@ovirt.org
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/E5NDSMACYIAQS4BQIDZC2L6AYYW3BSCQ/