Hello Dan Kenigsberg,

I'd like you to do a code review.  Please visit

    http://gerrit.ovirt.org/31671

to review the following change.

Change subject: vm: detect migration completed on recovery
......................................................................

vm: detect migration completed on recovery

If VDSM is down when a migration completes, it will
miss the migration completion event on the
onLibvirtLifeCycleEvent callback, and thus needlessly
wait the full migration timeout before actually
asking libvirt for the domain state and get back in sync.

BZ1104733 provided a quite complex demonstration
of the fact this scenario is unlikely, but possible.

To handle this situation, we
- connect early to the domain, on recovery.
  We know this is safe because on recovery we iterate on
  very domain list libvirt provided to us moments before,
  so the domain will be present.
- inspect the domain state *before* waiting for migration
  termination, and skip the wait if the domain is detected
  running.

A nice side-effect of this patch is also to clarify a tiny bit the
creation flows in Vm._run().

Change-Id: I451c2a940693842e9bf7c63ccc117e75026bb11b
Bug-Url: https://bugzilla.redhat.com/1104733
Signed-off-by: Francesco Romani <[email protected]>
Reviewed-on: http://gerrit.ovirt.org/28511
Reviewed-by: Dan Kenigsberg <[email protected]>
---
M vdsm/virt/vm.py
1 file changed, 28 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/71/31671/1

diff --git a/vdsm/virt/vm.py b/vdsm/virt/vm.py
index 20cd5e3..43c5cb3 100644
--- a/vdsm/virt/vm.py
+++ b/vdsm/virt/vm.py
@@ -2322,7 +2322,6 @@
                     'exitMessage', ''))
             self.recovering = False
         except MigrationError:
-            # cannot happen during recovery
             self.log.exception("Failed to start a migration destination vm")
             self.setDownStatus(ERROR, vmexitreason.MIGRATION_FAILED)
         except Exception as e:
@@ -3148,8 +3147,10 @@
         try:
             status = self._dom.info()
         except AttributeError:
-            # self._dom may be set to None asynchronously. see _onQemuDeath.
-            # If so, the VM is shutting down or already shut down.
+            # Known reasons for this:
+            # * on migration destination, and migration not yet completed.
+            # * self._dom may be set to None asynchronously (_onQemuDeath).
+            #   If so, the VM is shutting down or already shut down.
             return False
         else:
             return status[0] == libvirt.VIR_DOMAIN_RUNNING
@@ -3303,15 +3304,20 @@
         # We should set this event as a last part of drives initialization
         self._pathsPreparedEvent.set()
 
-        if self.conf.get('migrationDest'):
-            return
-        if not self.recovering:
+        initDomain = 'migrationDest' not in self.conf
+        # we need to complete the initialization, including
+        # domDependentInit, after the migration is completed.
+
+        if not self.recovering and initDomain:
             domxml = hooks.before_vm_start(self._buildCmdLine(), self.conf)
             self.log.debug(domxml)
+
         if self.recovering:
             self._dom = NotifyingVirDomain(
                 self._connection.lookupByUUIDString(self.id),
                 self._timeoutExperienced)
+        elif 'migrationDest' in self.conf:
+            pass  # self._dom will be None until migration ends.
         elif 'restoreState' in self.conf:
             fromSnapshot = self.conf.get('restoreFromSnapshot', False)
             srcDomXML = self.conf.pop('_srcDomXML')
@@ -3346,7 +3352,8 @@
                 hooks.after_device_create(dev._deviceXML, self.conf,
                                           dev.custom)
 
-        self._domDependentInit()
+        if initDomain:
+            self._domDependentInit()
 
     def _updateDevices(self, devices):
         """
@@ -3958,9 +3965,20 @@
             hooks.after_vm_dehibernate(self._dom.XMLDesc(0), self.conf,
                                        {'FROM_SNAPSHOT': fromSnapshot})
         elif 'migrationDest' in self.conf:
-            usedTimeout = self._waitForUnderlyingMigration()
-            self._attachLibvirtDomainAfterMigration(
-                self._incomingMigrationFinished.isSet(), usedTimeout)
+            waitMigration = True
+            if self.recovering:
+                try:
+                    if self._isDomainRunning():
+                        waitMigration = False
+                        self.log.info('migration completed while recovering!')
+                except libvirt.libvirtError:
+                    self.log.exception('migration failed while recovering!')
+                    raise MigrationError()
+            if waitMigration:
+                usedTimeout = self._waitForUnderlyingMigration()
+                self._attachLibvirtDomainAfterMigration(
+                    self._incomingMigrationFinished.isSet(), usedTimeout)
+            # else domain connection already established earlier
             self._domDependentInit()
             del self.conf['migrationDest']
             hooks.after_vm_migrate_destination(self._dom.XMLDesc(0), self.conf)


-- 
To view, visit http://gerrit.ovirt.org/31671
To unsubscribe, visit http://gerrit.ovirt.org/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I451c2a940693842e9bf7c63ccc117e75026bb11b
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: ovirt-3.5
Gerrit-Owner: Francesco Romani <[email protected]>
Gerrit-Reviewer: Dan Kenigsberg <[email protected]>
_______________________________________________
vdsm-patches mailing list
[email protected]
https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches

Reply via email to