** Changed in: nova
       Status: Fix Committed => Fix Released

** Changed in: nova
    Milestone: None => liberty-2

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1450594

Title:
  Instance deletion fails sometimes when serial_console is enabled

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) juno series:
  New
Status in OpenStack Compute (nova) kilo series:
  Fix Committed

Bug description:

Nova Version: 2014.2.1

For situations where nova-compute is re-trying an instance delete after the original delete failed, and the serial console feature is enabled, the instance delete fails with:

2015-04-27 16:54:49.900 114127 TRACE nova.compute.manager [instance: 6d117169-4057-4a4a-a0b7-0b12e996caa0]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1179, in cleanup
2015-04-27 16:54:49.900 114127 TRACE nova.compute.manager [instance: 6d117169-4057-4a4a-a0b7-0b12e996caa0]     for host, port in self._get_serial_ports_from_instance(instance):
2015-04-27 16:54:49.900 114127 TRACE nova.compute.manager [instance: 6d117169-4057-4a4a-a0b7-0b12e996caa0]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1197, in _get_serial_ports_from_instance
2015-04-27 16:54:49.900 114127 TRACE nova.compute.manager [instance: 6d117169-4057-4a4a-a0b7-0b12e996caa0]     virt_dom = self._lookup_by_name(instance['name'])
2015-04-27 16:54:49.900 114127 TRACE nova.compute.manager [instance: 6d117169-4057-4a4a-a0b7-0b12e996caa0]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4195, in _lookup_by_name
2015-04-27 16:54:49.900 114127 TRACE nova.compute.manager [instance: 6d117169-4057-4a4a-a0b7-0b12e996caa0]     raise exception.InstanceNotFound(instance_id=instance_name)
2015-04-27 16:54:49.900 114127 TRACE nova.compute.manager [instance: 6d117169-4057-4a4a-a0b7-0b12e996caa0] InstanceNotFound: Instance instance-00000444 could not be found.

Or, said another way, the _get_serial_ports_from_instance call should maybe not raise an exception if the instance cannot be found.

More details/context:

In our particular situation, some instance deletes initially fail because the neutron port delete operation fails or times out. So the VM goes to 'error' and remains in the deleting task_state. However, since the failure is on the port delete, the domain has already been undefined in libvirt.

The first invocation of _delete_instance calls shutdown_instance before any attempt is made to delete the network. shutdown_instance successfully calls driver.destroy, which shuts down the instance and then runs the cleanup action, ignoring any errors around vif removal. This undefines the domain as long as it was successfully shut down.

The next time nova-compute is started, it finds the instance still in the deleting task state, so it re-tries the delete. Part of the cleanup call run by driver.destroy is to remove the serial console. Note: this cleanup already ran and removed the serial console on the first delete, when the domain was successfully undefined. But since the domain is no longer defined in libvirt, the _get_serial_ports_from_instance call fails, and again the entire delete operation fails and stops. This makes it impossible to fully delete the instance.

When the serial console feature is disabled, this delete re-try operation functions correctly and properly cleans up the rest of the instance, and it transitions to deleted.

FWIW, we are also running nova-cells, so the neutron --> nova port notifications do not work/are disabled. Don't know if that's relevant or not.

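The suggestion that the serial port lookup should tolerate a missing domain can be illustrated with a small self-contained sketch. This is not nova code: FakeDriver and its dictionary of domains are hypothetical stand-ins, and InstanceNotFound here is a local class that only mimics nova's exception of the same name. The method names are borrowed from the traceback purely for readability.

```python
class InstanceNotFound(Exception):
    """Local stand-in for nova's InstanceNotFound (assumption, not nova code)."""


class FakeDriver:
    """Hypothetical toy model of the driver behaviour described in the report."""

    def __init__(self, defined_domains):
        # Maps a domain name to its list of (host, port) serial endpoints.
        self._domains = dict(defined_domains)
        self.released = []

    def _lookup_by_name(self, name):
        try:
            return self._domains[name]
        except KeyError:
            raise InstanceNotFound("Instance %s could not be found." % name)

    def _get_serial_ports_from_instance(self, instance):
        # Generator: the lookup (and any exception) happens on first iteration.
        for host, port in self._lookup_by_name(instance["name"]):
            yield host, port

    def cleanup(self, instance):
        try:
            for host, port in self._get_serial_ports_from_instance(instance):
                self.released.append((host, port))
        except InstanceNotFound:
            # The domain was already undefined by an earlier delete attempt,
            # so there are no serial ports left to release.
            pass
        return True


if __name__ == "__main__":
    retry = FakeDriver({})  # domain already undefined in "libvirt"
    print(retry.cleanup({"name": "instance-00000444"}))
```

With the try/except in place, a retried cleanup against an undefined domain completes instead of aborting the whole delete, which matches the behaviour reported when the serial console feature is disabled.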
Steps to reproduce:

- Configure nova-compute with the serial console feature enabled
- Create an instance which has a serial console configured
- Delete that instance, but cause the neutron port delete to fail or time out (via iptables or just shutting off neutron temporarily)
- The instance should now be stuck in the deleting task state
- Restart nova-compute
- During the re-try of the delete operation, the above stack trace results.

Expected result:

Retries of instance deletions in this scenario should succeed, with the same behavior that happens when the serial console feature is disabled.

Proposed Fix:

Under: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L761-L765
Shortly above this, create a variable called isdefined and set it to true. Where we check whether the domain is defined, set isdefined to false if it is not.

Under: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L848-L851
Add a test to see if isdefined is false, and if it is, do not attempt to get the serial console for the nonexistent domain.
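
The isdefined flag from the proposed fix, together with the failed-then-retried delete from the steps to reproduce, can be sketched roughly as follows. This is a minimal model under stated assumptions, not the actual driver.py change: delete_instance, PortDeleteFailed, the network_ok parameter, and the plain dictionary of domains are all hypothetical names introduced for illustration.

```python
class PortDeleteFailed(Exception):
    """Hypothetical stand-in for a neutron port delete failure or timeout."""


def delete_instance(domains, name, network_ok):
    """Toy delete flow: release serial ports, undefine, then delete the port.

    domains maps a domain name to its list of (host, port) serial endpoints.
    Returns the serial endpoints released during this attempt.
    """
    # Assume the domain is defined until the check says otherwise.
    is_defined = True
    if name not in domains:
        is_defined = False

    released = []
    if is_defined:
        # Only query serial ports for a domain that still exists.
        released = list(domains[name])
        del domains[name]  # models undefining the domain

    # The network teardown happens after the domain is already gone.
    if not network_ok:
        raise PortDeleteFailed(name)
    return released


if __name__ == "__main__":
    domains = {"instance-00000444": [("127.0.0.1", 10000)]}
    try:
        delete_instance(domains, "instance-00000444", network_ok=False)
    except PortDeleteFailed:
        print("first attempt failed on port delete; domain already undefined")
    # The retry skips the serial port lookup and completes.
    print(delete_instance(domains, "instance-00000444", network_ok=True))
```

The flag keeps the serial console lookup out of the retry path entirely, rather than catching the exception after the fact; either approach would let the retried delete proceed.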