Reviewed:  https://review.openstack.org/551302
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b626c0dc7b113365002e743e6de2aeb40121fc81
Submitter: Zuul
Branch:    master
commit b626c0dc7b113365002e743e6de2aeb40121fc81
Author: Matthew Booth <mbo...@redhat.com>
Date:   Fri Mar 9 14:41:49 2018 +0000

    Avoid redundant initialize_connection on source post live migration

    During live migration we update bdm.connection_info for attached
    volumes in pre_live_migration to reflect the new connection on the
    destination node. This means that after migration completes the BDM
    no longer has a reference to the original connection_info to do the
    detach on the source host. To address this, change I3dfb75eb added a
    second call to initialize_connection on the source host to re-fetch
    the source host connection_info before calling disconnect.

    Unfortunately the cinder driver interface does not strictly require
    that multiple calls to initialize_connection will return consistent
    results. Although they normally do in practice, there is at least
    one cinder driver (delliscsi) which doesn't. This results in a
    failure to disconnect on the source host post migration.

    This change avoids the issue entirely by fetching the BDMs prior to
    modification on the destination node. As well as working round this
    specific issue, it also avoids a redundant cinder call in all cases.

    Note that this massively simplifies post_live_migration in the
    libvirt driver. The complexity removed was concerned with
    reconstructing the original connection_info. This required
    considering the cinder v2 and v3 use cases, and reconstructing the
    multipath_id which was written to connection_info by the libvirt
    fibrechannel volume connector on connection. These things are not
    necessary when we just use the original data unmodified.

    Other drivers affected are Xenapi and HyperV. Xenapi doesn't touch
    volumes in post_live_migration, so is unaffected. HyperV did not
    previously account for differences in connection_info between
    source and destination, so was likely previously broken. This
    change should fix it.

    Closes-Bug: #1754716
    Closes-Bug: #1814245
    Change-Id: I0390c9ff51f49b063f736ca6ef868a4fa782ede5

** Changed in: nova
       Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1814245

Title:
  _disconnect_volume incorrectly called for multiattach volumes during
  post_live_migration

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Triaged

Bug description:
  Description
  ===========
  Idc5cecffa9129d600c36e332c97f01f1e5ff1f9f introduced a simple check
  to ensure disconnect_volume is only called when detaching a
  multi-attach volume from the final instance using it on a given host.

  That change, however, doesn't take live migration into account, and
  more specifically the call to _disconnect_volume during
  post_live_migration at the end of the migration from the source. At
  this point the original instance has already moved, so the call to
  objects.InstanceList.get_uuids_by_host will only return one local
  instance that is using the volume instead of two, allowing
  disconnect_volume to be called.

  Depending on the backend being used, this call can succeed, removing
  the connection to the volume for the remaining instance, or os-brick
  can fail in situations where it needs to flush I/O etc. from the
  in-use connection.

  Steps to reproduce
  ==================
  * Launch two instances attached to the same multiattach volume on the
    same host.
  * Live migrate one of these instances to another host.
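  To illustrate why the check misfires, here is a simplified sketch of
  the guard logic described above; 'instance_api' and 'bdm_api' are
  illustrative stand-ins rather than the actual nova code:

      # Simplified, illustrative sketch only -- not the actual nova
      # implementation. 'instance_api' and 'bdm_api' stand in for the
      # real nova objects (e.g. objects.InstanceList and the BDM list).
      def connection_still_shared(volume_id, host, instance_api, bdm_api):
          # Instances currently recorded as running on this host.
          host_uuids = set(instance_api.get_uuids_by_host(host))

          # Attachments of this volume belonging to instances on this host.
          local_attachments = [
              bdm for bdm in bdm_api.get_by_volume(volume_id)
              if bdm.instance_uuid in host_uuids
          ]

          # disconnect_volume is skipped only while more than one local
          # instance uses the volume. During post_live_migration the
          # migrated instance has already been rehomed to the destination,
          # so it no longer appears in host_uuids: the count drops to one
          # and the host connection is torn down even though the remaining
          # instance still needs it.
          return len(local_attachments) > 1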
  Expected result
  ===============
  No calls to disconnect_volume are made and the remaining instance on
  the host is still able to access the multi-attach volume.

  Actual result
  =============
  A call to disconnect_volume is made and the remaining instance is
  unable to access the volume *or* the live migration fails due to
  os-brick failures to disconnect the in-use volume on the host.

  Environment
  ===========
  1. Exact version of OpenStack you are running. See the following
     list for all releases: http://docs.openstack.org/releases/

     master

  2. Which hypervisor did you use?
     (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)

     Libvirt + KVM

  3. Which storage type did you use?
     (For example: Ceph, LVM, GPFS, ...)
     What's the version of that?

     LVM/iSCSI with multipath enabled reproduces the os-brick failure.

  4. Which networking type did you use?
     (For example: nova-network, Neutron with OpenVSwitch, ...)

     N/A

  Logs & Configs
  ==============
  # nova show testvm2
  [..]
  | fault | {"message": "Unexpected error while running command.
  Command: multipath -f 360014054a424982306a4a659007f73b2
  Exit code: 1
  Stdout: u'Jan 28 16:09:29 | 360014054a424982306a4a659007f73b2: map in use\
  Jan 28 16:09:29 | failed to remove multipath map 360014054a424982306a4a", "code": 500, "details": "
    File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 202, in decorated_function
      return function(self, context, *args, **kwargs)
    File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 6299, in _post_live_migration
      migrate_data)
    File \"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py\", line 7744, in post_live_migration
      self._disconnect_volume(context, connection_info, instance)
    File \"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py\", line 1287, in _disconnect_volume
      vol_driver.disconnect_volume(connection_info, instance)
    File \"/usr/lib/python2.7/site-packages/nova/virt/libvirt/volume/iscsi.py\", line 74, in disconnect_volume
      self.connector.disconnect_volume(connection_info['data'], None)
    File \"/usr/lib/python2.7/site-packages/os_brick/utils.py\", line 150, in trace_logging_wrapper
      result = f(*args, **kwargs)
    File \"/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py\", line 274, in inner
      return f(*args, **kwargs)
    File \"/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py\", line 848, in disconnect_volume
      ignore_errors=ignore_errors)
    File \"/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py\", line 885, in _cleanup_connection
      force, exc)
    File \"/usr/lib/python2.7/site-packages/os_brick/initiator/linuxscsi.py\", line 219, in remove_connection
      self.flush_multipath_device(multipath_name)
    File \"/usr/lib/python2.7/site-packages/os_brick/initiator/linuxscsi.py\", line 275, in flush_multipath_device
      root_helper=self._root_helper)
    File \"/usr/lib/python2.7/site-packages/os_brick/executor.py\", line 52, in _execute
      result = self.__execute(*args, **kwargs)
    File \"/usr/lib/python2.7/site-packages/os_brick/privileged/rootwrap.py\", line 169, in execute
      return execute_root(*cmd, **kwargs)
    File \"/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py\", line 207, in _wrap
      return self.channel.remote_call(name, args, kwargs)
    File \"/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py\", line 202, in remote_call
      raise exc_type(*result[2])
  ", "created": "2019-01-28T07:10:09Z"}
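  The fix referenced at the top of this mail avoids the failure above by
  not calling initialize_connection a second time on the source. A
  minimal, simplified sketch of that idea follows; 'volume_api',
  'virt_driver' and the dict-shaped BDMs are illustrative only, not the
  actual compute manager code:

      # Illustrative sketch only -- not the actual nova implementation.
      import copy

      def migrate_volumes(volume_api, virt_driver, bdms, dst_connector):
          # Snapshot the BDMs as they exist on the source, before
          # pre_live_migration on the destination rewrites their
          # connection_info.
          source_bdms = copy.deepcopy(bdms)

          # pre_live_migration (destination): connection_info now
          # describes the connection on the destination host.
          for bdm in bdms:
              bdm['connection_info'] = volume_api.initialize_connection(
                  bdm['volume_id'], dst_connector)

          # ... the live migration itself runs here ...

          # post_live_migration (source): disconnect with the saved
          # source connection_info instead of asking cinder for it
          # again, which not every cinder driver answers consistently.
          for bdm in source_bdms:
              virt_driver.disconnect_volume(bdm['connection_info'])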
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1814245/+subscriptions