Public bug reported: Description =========== Iab5afdf1b5b8d107ea0e5895c24d50712e7dc7b1 [1] ensured that _cleanup_failed_start is always called if we encounter VIF plugging failures in _create_domain_and_network. However this currently leads to any local instance files being removed as cleanup is called with destroy_disks=True.
As such any failures when resuming or restarting an instance will lead to these files being removed and the instance left in an unbootable state. IMHO these files should only be removed when cleaning up after errors hit while initially spawning an instance. Steps to reproduce ================== - Boot an instance using local disks - Stop the instance - Start the instance, causing a timeout or other failure during plug_vifs - Attempt to start the instance again Expected result =============== The local instance files are left on the host if instances are rebooting or resuming. Actual result ============= The local instance files are removed from the host if _cleanup_failed_start is called. Environment =========== 1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/ $ pwd /opt/stack/nova $ git rev-parse HEAD 42222969a21ee28ef4a68bd5ab1ec8a12c4ad126 2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? Libvirt + KVM 2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A 3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) N/A Logs & Configs ============== $ nova boot --image cirros-0.3.4-x86_64-uec --flavor 1 test-boot [..] $ nova stop test-boot $ ll ../data/nova/instances/be6cb386-e005-4fb2-8332-7e0c375ee452/ total 18596 -rw-rw-r--. 1 root root 16699 Jan 31 09:30 console.log -rw-r--r--. 1 root root 10289152 Jan 31 09:30 disk -rw-r--r--. 1 stack libvirtd 257 Jan 31 09:29 disk.info -rw-rw-r--. 1 qemu qemu 4979632 Jan 31 09:29 kernel -rw-rw-r--. 1 qemu qemu 3740163 Jan 31 09:29 ramdisk I used the following change to artificially recreate an issue plugging the VIFs : $ git diff diff --git a/nova/virt/libvirt/driver.py b/nova/virt/libvirt/driver.py index 33e3157..248e960 100644 --- a/nova/virt/libvirt/driver.py +++ b/nova/virt/libvirt/driver.py @@ -5015,6 +5015,7 @@ class LibvirtDriver(driver.ComputeDriver): pause = bool(events) guest = None try: + raise exception.VirtualInterfaceCreateException() with self.virtapi.wait_for_instance_event( instance, events, deadline=timeout, error_callback=self._neutron_failed_callback): $ nova start test-boot Request to start server test-boot has been accepted. $ nova list +--------------------------------------+-----------+---------+------------+-------------+--------------------------------+ | ID | Name | Status | Task State | Power State | Networks | +--------------------------------------+-----------+---------+------------+-------------+--------------------------------+ | be6cb386-e005-4fb2-8332-7e0c375ee452 | test-boot | SHUTOFF | - | Shutdown | public=172.24.4.8, 2001:db8::9 | +--------------------------------------+-----------+---------+------------+-------------+--------------------------------+ $ ll ../data/nova/instances/be6cb386-e005-4fb2-8332-7e0c375ee452/ ls: cannot access '../data/nova/instances/be6cb386-e005-4fb2-8332-7e0c375ee452/': No such file or directory Future attempts to start the instance will fail as a result : $ nova start test-boot Request to start server test-boot has been accepted. $ vi ../logs/n-cpu.log [..] 5353 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server Traceback (most recent call last): 5354 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 155, in _process_incoming 5355 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 5356 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 222, in dispatch 5357 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) 5358 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 192, in _do_dispatch 5359 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server result = func(ctxt, **new_args) 5360 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 75, in wrapped 5361 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server function_name, call_dict, binary) 5362 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 5363 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server self.force_reraise() 5364 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 5365 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb) 5366 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 66, in wrapped 5367 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server return f(self, context, *args, **kw) 5368 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 188, in decorated_function 5369 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server LOG.warning(msg, e, instance=instance) 5370 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 5371 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server self.force_reraise() 5372 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 5373 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb) 5374 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 157, in decorated_function 5375 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server return function(self, context, *args, **kwargs) 5376 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/utils.py", line 685, in decorated_function 5377 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server return function(self, context, *args, **kwargs) 5378 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 216, in decorated_function 5379 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info()) 5380 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 5381 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server self.force_reraise() 5382 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 5383 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb) 5384 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 204, in decorated_function 5385 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server return function(self, context, *args, **kwargs) 5386 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 2524, in start_instance 5387 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server self._power_on(context, instance) 5388 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 2494, in _power_on 5389 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server block_device_info) 5390 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2494, in power_on 5391 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server self._hard_reboot(context, instance, network_info, block_device_info) 5392 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2373, in _hard_reboot 5393 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server block_device_info) 5394 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6931, in _get_instance_disk_info 5395 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server dk_size = int(os.path.getsize(path)) 5396 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib64/python2.7/genericpath.py", line 57, in getsize 5397 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server return os.stat(filename).st_size 5398 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server OSError: [Errno 2] No such file or directory: '/opt/stack/data/nova/instances/be6cb386-e005-4fb2-8332-7e0c375ee452/disk' 5399 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1660647 Title: _cleanup_failed_start aggressively removes local instance files when handling plug_vif failures Status in OpenStack Compute (nova): New Bug description: Description =========== Iab5afdf1b5b8d107ea0e5895c24d50712e7dc7b1 [1] ensured that _cleanup_failed_start is always called if we encounter VIF plugging failures in _create_domain_and_network. However this currently leads to any local instance files being removed as cleanup is called with destroy_disks=True. As such any failures when resuming or restarting an instance will lead to these files being removed and the instance left in an unbootable state. IMHO these files should only be removed when cleaning up after errors hit while initially spawning an instance. Steps to reproduce ================== - Boot an instance using local disks - Stop the instance - Start the instance, causing a timeout or other failure during plug_vifs - Attempt to start the instance again Expected result =============== The local instance files are left on the host if instances are rebooting or resuming. Actual result ============= The local instance files are removed from the host if _cleanup_failed_start is called. Environment =========== 1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/ $ pwd /opt/stack/nova $ git rev-parse HEAD 42222969a21ee28ef4a68bd5ab1ec8a12c4ad126 2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? Libvirt + KVM 2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A 3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) N/A Logs & Configs ============== $ nova boot --image cirros-0.3.4-x86_64-uec --flavor 1 test-boot [..] $ nova stop test-boot $ ll ../data/nova/instances/be6cb386-e005-4fb2-8332-7e0c375ee452/ total 18596 -rw-rw-r--. 1 root root 16699 Jan 31 09:30 console.log -rw-r--r--. 1 root root 10289152 Jan 31 09:30 disk -rw-r--r--. 1 stack libvirtd 257 Jan 31 09:29 disk.info -rw-rw-r--. 1 qemu qemu 4979632 Jan 31 09:29 kernel -rw-rw-r--. 1 qemu qemu 3740163 Jan 31 09:29 ramdisk I used the following change to artificially recreate an issue plugging the VIFs : $ git diff diff --git a/nova/virt/libvirt/driver.py b/nova/virt/libvirt/driver.py index 33e3157..248e960 100644 --- a/nova/virt/libvirt/driver.py +++ b/nova/virt/libvirt/driver.py @@ -5015,6 +5015,7 @@ class LibvirtDriver(driver.ComputeDriver): pause = bool(events) guest = None try: + raise exception.VirtualInterfaceCreateException() with self.virtapi.wait_for_instance_event( instance, events, deadline=timeout, error_callback=self._neutron_failed_callback): $ nova start test-boot Request to start server test-boot has been accepted. $ nova list +--------------------------------------+-----------+---------+------------+-------------+--------------------------------+ | ID | Name | Status | Task State | Power State | Networks | +--------------------------------------+-----------+---------+------------+-------------+--------------------------------+ | be6cb386-e005-4fb2-8332-7e0c375ee452 | test-boot | SHUTOFF | - | Shutdown | public=172.24.4.8, 2001:db8::9 | +--------------------------------------+-----------+---------+------------+-------------+--------------------------------+ $ ll ../data/nova/instances/be6cb386-e005-4fb2-8332-7e0c375ee452/ ls: cannot access '../data/nova/instances/be6cb386-e005-4fb2-8332-7e0c375ee452/': No such file or directory Future attempts to start the instance will fail as a result : $ nova start test-boot Request to start server test-boot has been accepted. $ vi ../logs/n-cpu.log [..] 5353 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server Traceback (most recent call last): 5354 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 155, in _process_incoming 5355 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 5356 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 222, in dispatch 5357 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) 5358 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 192, in _do_dispatch 5359 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server result = func(ctxt, **new_args) 5360 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 75, in wrapped 5361 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server function_name, call_dict, binary) 5362 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 5363 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server self.force_reraise() 5364 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 5365 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb) 5366 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 66, in wrapped 5367 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server return f(self, context, *args, **kw) 5368 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 188, in decorated_function 5369 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server LOG.warning(msg, e, instance=instance) 5370 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 5371 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server self.force_reraise() 5372 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 5373 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb) 5374 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 157, in decorated_function 5375 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server return function(self, context, *args, **kwargs) 5376 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/utils.py", line 685, in decorated_function 5377 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server return function(self, context, *args, **kwargs) 5378 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 216, in decorated_function 5379 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info()) 5380 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 5381 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server self.force_reraise() 5382 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 5383 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb) 5384 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 204, in decorated_function 5385 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server return function(self, context, *args, **kwargs) 5386 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 2524, in start_instance 5387 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server self._power_on(context, instance) 5388 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 2494, in _power_on 5389 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server block_device_info) 5390 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2494, in power_on 5391 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server self._hard_reboot(context, instance, network_info, block_device_info) 5392 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2373, in _hard_reboot 5393 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server block_device_info) 5394 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6931, in _get_instance_disk_info 5395 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server dk_size = int(os.path.getsize(path)) 5396 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib64/python2.7/genericpath.py", line 57, in getsize 5397 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server return os.stat(filename).st_size 5398 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server OSError: [Errno 2] No such file or directory: '/opt/stack/data/nova/instances/be6cb386-e005-4fb2-8332-7e0c375ee452/disk' 5399 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1660647/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp