Francesco Romani has uploaded a new change for review.

Change subject: vm: handle missing domains on recovery
......................................................................

vm: handle missing domains on recovery

When VDSM tries to reconnect to libvirt, the domain lookup can fail.
This is especially likely during recovery.

This patch adds an explicit check in the recovery path to make sure
that a VM is either created with a valid libvirt domain handle, or is
reported as Down so the engine can collect its state and explicitly
destroy it.

This patch can stand on its own to increase the robustness of VDSM,
but it is the first of a two-part series that together fixes
bz 1045626.

Change-Id: I00ef12883c8035209de0f273925eb8603d6b6da8
Bug-Url: https://bugzilla.redhat.com/1045626
Signed-off-by: Francesco Romani <from...@redhat.com>
---
M vdsm/vm.py
1 file changed, 17 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/75/25275/1

diff --git a/vdsm/vm.py b/vdsm/vm.py
index af8b3be..bc5e3ba 100644
--- a/vdsm/vm.py
+++ b/vdsm/vm.py
@@ -1891,6 +1891,10 @@
     pass
 
 
+class LibvirtConnectionError(Exception):
+    pass
+
+
 class Vm(object):
     """
     Used for abstracting communication between various parts of the
@@ -2298,6 +2302,15 @@
                     # behaviors on VM start/destroy, because the tuning can be
                     # done automatically according to its statistical data.
                     self.cif.ksmMonitor.adjust()
+            except libvirt.libvirtError as e:
+                # we cannot continue without a libvirt connection handle
+                # to avoid state desync or worse split-brain scenarios.
+                if e.get_error_code() == libvirt.VIR_ERR_NO_DOMAIN:
+                    raise LibvirtConnectionError()
+                if not self.recovering:
+                    raise
+                else:
+                    self.log.info("Skipping errors on recovery", exc_info=True)
             except Exception:
                 if not self.recovering:
                     raise
@@ -2325,6 +2338,10 @@
 
             self.recovering = False
             self.saveState()
+        except LibvirtConnectionError:
+            self.recovering = False
+            # we cannot ever deal with this, not even on recovery.
+            self.setDownStatus(ERROR, "failed to connect to libvirt")
         except Exception as e:
             if self.recovering:
                 self.log.info("Skipping errors on recovery", exc_info=True)

-- 
To view, visit http://gerrit.ovirt.org/25275
To unsubscribe, visit http://gerrit.ovirt.org/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I00ef12883c8035209de0f273925eb8603d6b6da8
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Francesco Romani <from...@redhat.com>
_______________________________________________
vdsm-patches mailing list
vdsm-patches@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches
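
For readers who want to see the control flow of the patch in isolation, here is a minimal standalone sketch. It is not the VDSM code. FakeLibvirtError, SketchVm, set_down_status and the VIR_ERR_NO_DOMAIN placeholder value are all hypothetical stand-ins, used so the example runs without the libvirt bindings. It relies on one Python rule: an exception raised inside an except clause is not caught by the sibling except clauses of the same try, so it reaches the outer handler.

```python
# Placeholder for libvirt.VIR_ERR_NO_DOMAIN; the value is an assumption
# made for this sketch only.
VIR_ERR_NO_DOMAIN = 42


class FakeLibvirtError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError."""

    def __init__(self, code):
        super().__init__("libvirt error %d" % code)
        self._code = code

    def get_error_code(self):
        return self._code


class LibvirtConnectionError(Exception):
    """Raised when no usable domain handle can be obtained."""


class SketchVm:
    """Hypothetical, heavily reduced model of the start/recovery path."""

    def __init__(self, run, recovering):
        self._run = run              # callable that starts or re-attaches the domain
        self.recovering = recovering
        self.status = "WaitForLaunch"
        self.exit_message = None

    def set_down_status(self, message):
        self.status = "Down"
        self.exit_message = message

    def start(self):
        try:
            try:
                self._run()
            except FakeLibvirtError as e:
                if e.get_error_code() == VIR_ERR_NO_DOMAIN:
                    # Missing domain: escalate, never swallow, even on recovery.
                    raise LibvirtConnectionError()
                if not self.recovering:
                    raise
                # Any other libvirt error is tolerated while recovering.
            except Exception:
                if not self.recovering:
                    raise
            self.status = "Up"
            self.recovering = False
        except LibvirtConnectionError:
            self.recovering = False
            self.set_down_status("failed to connect to libvirt")
        except Exception as e:
            self.set_down_status(str(e))


def lookup_fails():
    # Simulates a domain lookup that finds no domain.
    raise FakeLibvirtError(VIR_ERR_NO_DOMAIN)


vm = SketchVm(lookup_fails, recovering=True)
vm.start()
print(vm.status, vm.exit_message)
```

In this sketch a recovering VM whose lookup fails ends up Down instead of being treated as running, which is the behavior the commit message describes. Other libvirt errors are still skipped during recovery.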