Hello Dan Kenigsberg,

I'd like you to do a code review.  Please visit

    http://gerrit.ovirt.org/23939

to review the following change.

Change subject: Avoid going into 'Paused' status during long lasting migrations
......................................................................

Avoid going into 'Paused' status during long lasting migrations

If a migration takes longer than 'migration_timeout', the VM moves
into the 'Paused' state on the destination host.
This patch introduces a separate option, migration_destination_timeout,
which sets an absolute maximum of 6 hours (21600 seconds) for the
destination to wait for the migration to finish.
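The destination-side wait can be sketched with a plain `threading.Event`. That `_incomingMigrationFinished` is such an event is an assumption based on its `wait()`/`isSet()` calls in the patch. `Event.wait(timeout)` returns `False` when the timeout expires without the event being set, which is why the patch checks the event again afterwards. The helper name `wait_for_incoming_migration` and the shortened timeout are hypothetical, chosen so the sketch runs quickly.

```python
import threading

# Hypothetical, shortened stand-in for migration_destination_timeout;
# the patch's real default is 21600 seconds (6 hours).
MIGRATION_DESTINATION_TIMEOUT = 0.1


def wait_for_incoming_migration(finished, timeout):
    """Wait up to `timeout` seconds; return True only if the event was set."""
    # Event.wait() returns the event's flag on exit: False means timed out.
    return finished.wait(timeout)


# Case 1: completion is never signalled, so the wait times out.
stalled = threading.Event()
print(wait_for_incoming_migration(stalled, MIGRATION_DESTINATION_TIMEOUT))  # False

# Case 2: completion is signalled from another thread before the timeout.
done = threading.Event()
threading.Timer(0.01, done.set).start()
print(wait_for_incoming_migration(done, 5))  # True
```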

If, after this timeout, the VM is still PAUSED and the reason is
`migration`, a MigrationError is raised, which leads to the
destruction of the VM on the destination.

In all other cases the previous behaviour is kept: VDSM continues
with its normal tasks without touching the state of the VM. That
means that a VM which is PAUSED for reasons other than migration
stays PAUSED.
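The decision described above can be sketched as a small pure function. It takes the `(state, reason)` pair in the shape returned by libvirt's `virDomain.state(0)`. The constants are redefined locally to mirror libvirt's `virDomainState` and `virDomainPausedReason` values so the sketch runs without the libvirt bindings; the function name `check_after_timeout` is hypothetical.

```python
# Stand-ins mirroring libvirt's enum values, redefined here so the
# sketch runs without the libvirt Python bindings.
VIR_DOMAIN_PAUSED = 3            # virDomainState
VIR_DOMAIN_PAUSED_USER = 1       # virDomainPausedReason
VIR_DOMAIN_PAUSED_MIGRATION = 2  # virDomainPausedReason


class MigrationError(Exception):
    pass


def check_after_timeout(migration_finished, state):
    """Decide what the destination does once the wait has returned.

    `state` is a (state, reason) pair, shaped like the result of
    virDomain.state(0). Only a VM that is still paused because of the
    migration raises MigrationError; in every other case the VM is left
    untouched and initialization continues (the function returns None).
    """
    if migration_finished:
        return
    if (state[0] == VIR_DOMAIN_PAUSED and
            state[1] == VIR_DOMAIN_PAUSED_MIGRATION):
        raise MigrationError('Migration Error - Timed out '
                             '(did not receive success event)')


# A VM paused by the user stays paused; initialization goes on.
check_after_timeout(False, (VIR_DOMAIN_PAUSED, VIR_DOMAIN_PAUSED_USER))

# A VM still paused by the migration is treated as a failed migration.
try:
    check_after_timeout(False, (VIR_DOMAIN_PAUSED, VIR_DOMAIN_PAUSED_MIGRATION))
except MigrationError as e:
    print(e)  # Migration Error - Timed out (did not receive success event)
```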

Change-Id: I6bb1c9ae7ead92093c0d300df7c3567ab20b1e09
Bug-Url: https://bugzilla.redhat.com/1059129
Signed-off-by: Vinzenz Feenstra <[email protected]>
Reviewed-on: http://gerrit.ovirt.org/21963
Reviewed-by: Dan Kenigsberg <[email protected]>
---
M lib/vdsm/config.py.in
M vdsm/vm.py
2 files changed, 30 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/39/23939/1

diff --git a/lib/vdsm/config.py.in b/lib/vdsm/config.py.in
index ff4cebe..c85ab88 100644
--- a/lib/vdsm/config.py.in
+++ b/lib/vdsm/config.py.in
@@ -52,6 +52,9 @@
             'recognized by kvm/qemu if a coma separated list given then a '
             'NIC per device will be created.'),
 
+        ('migration_destination_timeout', '21600',
+            'Maximum time the destination waits for the migration to finish.'),
+
         ('migration_timeout', '300',
             'Maximum time the destination waits since migration is stalled. '
             'Please note, that this is not overall migration timeout. '
diff --git a/vdsm/vm.py b/vdsm/vm.py
index f9b241e..d862839 100644
--- a/vdsm/vm.py
+++ b/vdsm/vm.py
@@ -1695,6 +1695,10 @@
         return m
 
 
+class MigrationError(Exception):
+    pass
+
+
 class Vm(object):
     """
     Used for abstracting communication between various parts of the
@@ -3465,23 +3469,37 @@
             hooks.after_vm_dehibernate(self._dom.XMLDesc(0), self.conf,
                                        {'FROM_SNAPSHOT': fromSnapshot})
         elif 'migrationDest' in self.conf:
-            timeout = config.getint('vars', 'migration_timeout')
-            self.log.debug("Waiting %s seconds for end of migration" % timeout)
+            timeout = config.getint('vars', 'migration_destination_timeout')
+            self.log.debug("Waiting %s seconds for end of migration", timeout)
             self._incomingMigrationFinished.wait(timeout)
+
             try:
                 # Would fail if migration isn't successful,
                 # or restart vdsm if connection to libvirt was lost
                 self._dom = NotifyingVirDomain(
                     self._connection.lookupByUUIDString(self.id),
                     self._timeoutExperienced)
-            except Exception as e:
-                # Improve description of exception
+
                 if not self._incomingMigrationFinished.isSet():
-                    newMsg = ('%s - Timed out '
-                              '(did not receive success event)' %
-                              (e.args[0] if len(e.args) else
-                               'Migration Error'))
-                    e.args = (newMsg,) + e.args[1:]
+                    state = self._dom.state(0)
+                    if state[0] == libvirt.VIR_DOMAIN_PAUSED:
+                        if state[1] == libvirt.VIR_DOMAIN_PAUSED_MIGRATION:
+                            raise MigrationError("Migration Error - Timed out "
+                                                 "(did not receive success "
+                                                 "event)")
+                    self.log.debug("NOTE: incomingMigrationFinished event has "
+                                   "not been set and wait timed out after %d "
+                                   "seconds. Current VM state: %d, reason %d. "
+                                   "Continuing with VM initialization anyway.",
+                                   timeout, state[0], state[1])
+            except libvirt.libvirtError as e:
+                if e.get_error_code() == libvirt.VIR_ERR_NO_DOMAIN:
+                    if not self._incomingMigrationFinished.isSet():
+                        newMsg = ('%s - Timed out '
+                                  '(did not receive success event)' %
+                                  (e.args[0] if len(e.args) else
+                                   'Migration Error'))
+                        e.args = (newMsg,) + e.args[1:]
                 raise
 
             self._domDependentInit()
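
When the domain lookup fails with `VIR_ERR_NO_DOMAIN` before the success event arrived, the patch keeps the original exception but prefixes its first argument with a timeout note before re-raising it. The sketch below reproduces that `e.args` rewrite using a hypothetical `FakeLibvirtError` class in place of `libvirt.libvirtError`, so it runs without libvirt; `annotate_timeout` is likewise a hypothetical helper name, and the constant is assumed to mirror libvirt's `VIR_ERR_NO_DOMAIN` value.

```python
VIR_ERR_NO_DOMAIN = 42  # assumed to mirror libvirt's virErrorNumber value


class FakeLibvirtError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError."""

    def __init__(self, msg, code):
        Exception.__init__(self, msg)
        self._code = code

    def get_error_code(self):
        return self._code


def annotate_timeout(e, migration_finished):
    """Rewrite e.args in place, as the patch's except-branch does.

    The caller is expected to re-raise `e` afterwards.
    """
    if e.get_error_code() == VIR_ERR_NO_DOMAIN and not migration_finished:
        first = e.args[0] if len(e.args) else 'Migration Error'
        new_msg = '%s - Timed out (did not receive success event)' % first
        e.args = (new_msg,) + e.args[1:]
    return e


err = annotate_timeout(FakeLibvirtError('Domain not found', VIR_ERR_NO_DOMAIN),
                       False)
print(err)  # Domain not found - Timed out (did not receive success event)
```

Other libvirt errors, and lookups that fail after the success event was received, propagate with their original message.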


-- 
To view, visit http://gerrit.ovirt.org/23939
To unsubscribe, visit http://gerrit.ovirt.org/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6bb1c9ae7ead92093c0d300df7c3567ab20b1e09
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: ovirt-3.3
Gerrit-Owner: Vinzenz Feenstra <[email protected]>
Gerrit-Reviewer: Dan Kenigsberg <[email protected]>
_______________________________________________
vdsm-patches mailing list
[email protected]
https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches
