Yep. The actual error thrown was "Unable to detach from guest transient domain.", which is now "Unable to detach the device from the live config." in master. That RetryDecorator makes this function a whole lot harder to read, but with your explanation it seems that the detach was actually timing out, which is consistent with the underlying problem we eventually discovered.
Thanks! I'll close this out.

** Changed in: nova
       Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1836212

Title:
  libvirt: Failure to recover from failed detach

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  1020162 ERROR root [req-46fbc6c8-de2c-4afb-9f24-9d75947c9a3c 9ccddbb72e2d42b6ab1a31ad48ea21fb 86bea4eb057b412a98402a1b7e1d9222 - - -] Original exception being dropped: [
    'Traceback (most recent call last):\n',
    ' File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 390, in _try_detach_device\n self.detach_device(conf, persistent=persistent, live=live)\n',
    ' File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 467, in detach_device\n self._domain.detachDeviceFlags(device_xml, flags=flags)\n',
    ' File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit\n result = proxy_call(self._autowrap, f, *args, **kwargs)\n',
    ' File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call\n rv = execute(f, *args, **kwargs)\n',
    ' File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute\n six.reraise(c, e, tb)\n',
    ' File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker\n rv = meth(*args, **kwargs)\n',
    ' File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1194, in detachDeviceFlags\n if ret == -1: raise libvirtError (\'virDomainDetachDeviceFlags() failed\', dom=self)\n',
    'libvirtError: invalid argument: no target device vdb\n']

  This appears to happen because when we call
  detach_device_with_retry(live=True) we ultimately call
  detachDeviceFlags(flags=VIR_DOMAIN_AFFECT_CONFIG |
  VIR_DOMAIN_AFFECT_LIVE). 'no target device' is the error generated
  when libvirt failed to remove the device from CONFIG (persistent).
  This can happen because detachDeviceFlags(flags=VIR_DOMAIN_AFFECT_CONFIG |
  VIR_DOMAIN_AFFECT_LIVE) succeeds, and removes the device from the CONFIG
  domain, as soon as the removal from the LIVE domain has been queued, even
  though the LIVE removal is asynchronous. Consequently, a subsequent check
  may still find the device: it has not yet been (and may never be) removed
  from the LIVE domain, but it is already gone from the CONFIG domain. A
  retry with both flags then fails on the CONFIG removal before libvirt
  attempts the LIVE removal, so the detach never succeeds.
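
For illustration, the sequence described in the bug can be modelled with a toy domain object. This is only a sketch under stated assumptions: FakeDomain, FakeLibvirtError and detach_flags are hypothetical names invented here, the live-side error text is made up, and nothing below is nova or libvirt code. The two flag constants are defined locally with the values libvirt uses for them. The model assumes the persistent definition is checked first and that a live removal is only queued, as the description states.

```python
# Flag values mirror libvirt's VIR_DOMAIN_AFFECT_* constants.
VIR_DOMAIN_AFFECT_LIVE = 1
VIR_DOMAIN_AFFECT_CONFIG = 2


class FakeLibvirtError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError."""


class FakeDomain(object):
    """Toy model: separate persistent (CONFIG) and running (LIVE) devices."""

    def __init__(self, devices):
        self.config = set(devices)
        self.live = set(devices)
        self.pending = set()  # live removals queued but never completed here

    def detachDeviceFlags(self, dev, flags=0):
        # Assumption: the persistent definition is validated first.
        if flags & VIR_DOMAIN_AFFECT_CONFIG and dev not in self.config:
            raise FakeLibvirtError(
                "invalid argument: no target device %s" % dev)
        if flags & VIR_DOMAIN_AFFECT_LIVE and dev not in self.live:
            raise FakeLibvirtError("device %s not in live domain" % dev)
        if flags & VIR_DOMAIN_AFFECT_CONFIG:
            self.config.discard(dev)  # takes effect immediately
        if flags & VIR_DOMAIN_AFFECT_LIVE:
            self.pending.add(dev)     # asynchronous: only queued


def detach_flags(dom, dev, live):
    """Hypothetical retry-safe flag choice: skip CONFIG once it is gone."""
    flags = 0
    if dev in dom.config:
        flags |= VIR_DOMAIN_AFFECT_CONFIG
    if live and dev in dom.live:
        flags |= VIR_DOMAIN_AFFECT_LIVE
    return flags


dom = FakeDomain(["vdb"])
both = VIR_DOMAIN_AFFECT_CONFIG | VIR_DOMAIN_AFFECT_LIVE

# First attempt "succeeds": CONFIG is updated, LIVE removal is only queued.
dom.detachDeviceFlags("vdb", flags=both)

# The device is still visible in LIVE, so the caller retries with both flags.
retry_error = None
try:
    dom.detachDeviceFlags("vdb", flags=both)
except FakeLibvirtError as exc:
    retry_error = str(exc)

print(retry_error)
```

In this model the retry with both flags can never reach the live removal, whereas choosing flags from the current state (as the hypothetical detach_flags does) would retry against LIVE only. Whether that is an appropriate change for nova is not something this sketch establishes.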