Reviewed:  https://review.openstack.org/553067
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5f16e714f58336344752305f94451e7c7c55742c
Submitter: Zuul
Branch:    master

commit 5f16e714f58336344752305f94451e7c7c55742c
Author: Matt Riedemann <mriedem...@gmail.com>
Date:   Wed Mar 14 16:43:22 2018 -0400

    libvirt: handle DiskNotFound during update_available_resource

    The update_available_resource periodic task in the compute manager
    eventually calls through to the resource tracker and the virt
    driver's get_available_resource method, which gets the guests
    running on the hypervisor and builds up a set of information about
    the host, including disk information for the active domains.

    However, the periodic task can race with instances being deleted
    concurrently: the hypervisor still reports the domain, but the
    driver has already deleted the instance's backing files as part of
    deleting the instance, so running "qemu-img info" on the
    now-missing disk path fails. When that happens, the entire
    periodic update fails.

    This change detects that specific failure from 'qemu-img info' and
    translates it into a DiskNotFound exception which the driver can
    handle. If the associated instance is undergoing a task state
    transition, such as being moved to another host or being deleted,
    we log a message and continue. If the instance is in a steady
    state (task_state is not set), we consider it a real failure and
    re-raise the exception.

    Note that we could add a deleted=False filter to the instance
    query in _get_disk_over_committed_size_total, but that does not
    help in this case because the hypervisor says the domain is still
    active and the instance is not yet marked deleted in the DB.

    Change-Id: Icec2769bf42455853cbe686fb30fda73df791b25
    Closes-Bug: #1662867

** Changed in: nova
       Status: In Progress => Fix Released
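
For anyone who wants the shape of the fix without opening the patch,
here is a minimal sketch of the approach the commit message describes.
It is not the actual nova code: the function names, the error-string
match, and the task_state lookup are simplified assumptions.

# Minimal sketch of the approach described above; not the actual
# nova patch. Names, the error-string match, and the task_state
# lookup are simplified assumptions.
import subprocess

class DiskNotFound(Exception):
    """The guest's backing file is gone, e.g. due to a racing delete."""

def qemu_img_info(path):
    """Run 'qemu-img info', translating a vanished-file failure into
    DiskNotFound so the caller can decide how to react."""
    try:
        return subprocess.check_output(
            ['qemu-img', 'info', path], stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as exc:
        if b'No such file or directory' in (exc.output or b''):
            raise DiskNotFound(path)
        raise

def disk_over_committed_size_total(domains, task_state_by_uuid):
    """Walk the hypervisor's domains, tolerating disks that vanish
    under a concurrent delete or migration.

    domains: iterable of (instance_uuid, disk_path) pairs.
    task_state_by_uuid: instance uuid -> task_state (None when idle).
    """
    total = 0
    for uuid, disk_path in domains:
        try:
            info = qemu_img_info(disk_path)
        except DiskNotFound:
            if task_state_by_uuid.get(uuid) is not None:
                # Mid-transition (deleting, migrating, ...): log and
                # keep the periodic task alive.
                print('disk %s for %s went away mid-task; skipping'
                      % (disk_path, uuid))
                continue
            # Steady state but the disk is missing: a real problem,
            # so re-raise.
            raise
        total += len(info)  # placeholder; the real code parses sizes
    return total

The key design point is the task_state check: a set task_state means
some operation (delete, migration, resize) owns the instance, so a
missing disk there is expected noise rather than a failure worth
aborting the whole periodic update for.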
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1662867

Title:
  update_available_resource_for_node racing instance deletion

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===========
  The following trace was seen multiple times during a CI run for
  https://review.openstack.org/#/c/383859/ :

  http://logs.openstack.org/09/395709/7/check/gate-tempest-dsvm-full-devstack-plugin-nfs-nv/a4c1057/logs/screen-n-cpu.txt.gz?level=ERROR#_2017-02-07_19_10_25_548
  http://logs.openstack.org/09/395709/7/check/gate-tempest-dsvm-full-devstack-plugin-nfs-nv/a4c1057/logs/screen-n-cpu.txt.gz?level=ERROR#_2017-02-07_19_15_26_004

  In the first example, a request to terminate instance 60b7cb32
  appears to race an existing run of the
  update_available_resource_for_node periodic task:

  req-fa96477b-34d2-4ab6-83bf-24c269ed7c28
  http://logs.openstack.org/09/395709/7/check/gate-tempest-dsvm-full-devstack-plugin-nfs-nv/a4c1057/logs/screen-n-cpu.txt.gz?#_2017-02-07_19_10_25_478

  req-dc60ed89-d3da-45f6-b98c-8f57c767d751
  http://logs.openstack.org/09/395709/7/check/gate-tempest-dsvm-full-devstack-plugin-nfs-nv/a4c1057/logs/screen-n-cpu.txt.gz?#_2017-02-07_19_10_25_548

  Steps to reproduce
  ==================
  Delete an instance while update_available_resource_for_node is
  running (a standalone demo of the resulting failure appears at the
  end of this message).

  Expected result
  ===============
  Either swallow the exception and move on, or lock instances in such
  a way that they cannot be removed while this periodic task is
  running.

  Actual result
  =============
  update_available_resource_for_node fails and stops.

  Environment
  ===========
  1. Exact version of OpenStack you are running. See the following
     list for all releases: http://docs.openstack.org/releases/
     https://review.openstack.org/#/c/383859/ - but it should
     reproduce against master.
  2. Which hypervisor did you use? (For example: Libvirt + KVM,
     Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of
     that?
     Libvirt
  3. Which storage type did you use? (For example: Ceph, LVM,
     GPFS, ...) What's the version of that?
     n/a
  4. Which networking type did you use? (For example: nova-network,
     Neutron with OpenVSwitch, ...)
     n/a

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1662867/+subscriptions
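
As a footnote for anyone triaging similar reports: the failure in the
traces above can be reproduced outside nova in a few lines, assuming
qemu-img is installed. Deleting the "disk" before the info call stands
in for the concurrent instance delete.

# Standalone demo of the failure mode; not nova code.
import os
import subprocess
import tempfile

fd, disk_path = tempfile.mkstemp(suffix='.qcow2')
os.close(fd)
os.unlink(disk_path)  # the racing delete

try:
    subprocess.check_output(['qemu-img', 'info', disk_path],
                            stderr=subprocess.STDOUT)
except subprocess.CalledProcessError as exc:
    # The output ends with "No such file or directory", which is
    # the string the fix keys on to raise DiskNotFound.
    print(exc.returncode, exc.output.decode())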