Hello Nir Soffer, Federico Simoncelli, Vinzenz Feenstra, Dan Kenigsberg, Michal Skrivanek,
I'd like you to do a code review. Please visit http://gerrit.ovirt.org/24822 to review the following change. Change subject: vm: recover must finish with libvirt connection ...................................................................... vm: recover must finish with libvirt connection bz106336 highlights an scenario on which a Vm can go Up without a proper libvirt connection established (_dom attribute initialized), and this can lead to a split-brain. If a running vm is reported as not running by vdsm, the engine may start another instance of the same vm on another host, which will likely cause data corruption when two vm are writing to the same shared storage. Description of the scenario: * while VM is running, storage becomes blocked or not responding. * the user restarts VDSM while the storage is still blocked. * VDSM does recovery after restart, but given the fact that storage was still not responding, _normalizeVdsmImg() inside buildConfDevices() fails. * due the way exception are handled, this leads to the skip of the libvirt connection step, leaving the _dom attribute at its default value (None), thus * the VM is disconnected from libvirt and out of sync. We cannot avoid this sequence of events, nor we can prohibit users from restarting VDSM, but we can take measures to avoid to go out of sync. When starting a new VM, storage failure/unavailability is fatal; when recovering, we should handle those kinds of error and ensure we do have a libvirt connection on every Vm object. This patch excplitely handle this storage not responsive condition and let the initialization process continue, in order to make sure we obtain a libvirt handle. Bug-Url: https://bugzilla.redhat.com/1063336 Change-Id: I6c33a62c29fc70170b6f802a20b6016b301e96de Signed-off-by: Francesco Romani <from...@redhat.com> Reviewed-on: http://gerrit.ovirt.org/24423 Reviewed-by: Nir Soffer <nsof...@redhat.com> Reviewed-by: Michal Skrivanek <michal.skriva...@redhat.com> Reviewed-by: Dan Kenigsberg <dan...@redhat.com> Reviewed-by: Federico Simoncelli <fsimo...@redhat.com> Reviewed-by: Vinzenz Feenstra <vfeen...@redhat.com> --- M vdsm/vm.py 1 file changed, 20 insertions(+), 9 deletions(-) git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/22/24822/1 diff --git a/vdsm/vm.py b/vdsm/vm.py index 7d4aa51..f2c3473 100644 --- a/vdsm/vm.py +++ b/vdsm/vm.py @@ -1701,6 +1701,10 @@ pass +class StorageUnavailableError(Exception): + pass + + class Vm(object): """ Used for abstracting communication between various parts of the @@ -1825,14 +1829,14 @@ if drv['device'] == 'disk': res = self.cif.irs.getVolumeSize(drv['domainID'], drv['poolID'], drv['imageID'], drv['volumeID']) - try: - drv['truesize'] = res['truesize'] - drv['apparentsize'] = res['apparentsize'] - except KeyError: - self.log.error("Unable to get volume size for %s", - drv['volumeID'], exc_info=True) - raise RuntimeError("Volume %s is corrupted or missing" % - drv['volumeID']) + if res['status']['code'] != 0: + raise StorageUnavailableError("Failed to get size for" + " volume %s", + drv['volumeID']) + + # if a key is missing here, is hsm bug and we cannot handle it. + drv['truesize'] = res['truesize'] + drv['apparentsize'] = res['apparentsize'] else: drv['truesize'] = 0 drv['apparentsize'] = 0 @@ -1939,7 +1943,14 @@ # Normalize vdsm images for drv in devices[DISK_DEVICES]: if isVdsmImage(drv): - self._normalizeVdsmImg(drv) + try: + self._normalizeVdsmImg(drv) + except StorageUnavailableError: + # storage unavailable is not fatal on recovery; + # the storage subsystem monitors the devices + # and will notify when they come up later. + if not self.recovering: + raise # Preserve old behavior. Since libvirt add a memory balloon device # to all guests, we need to specifically request not to add it. -- To view, visit http://gerrit.ovirt.org/24822 To unsubscribe, visit http://gerrit.ovirt.org/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I6c33a62c29fc70170b6f802a20b6016b301e96de Gerrit-PatchSet: 1 Gerrit-Project: vdsm Gerrit-Branch: ovirt-3.3 Gerrit-Owner: Francesco Romani <from...@redhat.com> Gerrit-Reviewer: Dan Kenigsberg <dan...@redhat.com> Gerrit-Reviewer: Federico Simoncelli <fsimo...@redhat.com> Gerrit-Reviewer: Michal Skrivanek <michal.skriva...@redhat.com> Gerrit-Reviewer: Nir Soffer <nsof...@redhat.com> Gerrit-Reviewer: Vinzenz Feenstra <vfeen...@redhat.com> _______________________________________________ vdsm-patches mailing list vdsm-patches@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches