Reviewed:  https://review.openstack.org/566096
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=80a001989351d3d427c204c8c06cfacc964f2a35
Submitter: Zuul
Branch:    master
commit 80a001989351d3d427c204c8c06cfacc964f2a35
Author: Matt Riedemann <mriedem...@gmail.com>
Date:   Thu May 3 11:21:47 2018 -0400

    Handle @safe_connect returns None side effect in _ensure_resource_provider

    Change I0c4ca6a81f213277fe7219cb905a805712f81e36 added more error
    handling to the _ensure_resource_provider flow but didn't account for
    @safe_connect returning None when calling _create_resource_provider
    in the case that nova-compute is started before placement is running.

    If that happens, we fail with a TypeError during the nova-compute
    startup because we put None in the resource provider cache and then
    later blindly try to use it, because the compute node resource
    provider uuid is in the cache but mapped to None.

    This adds the None check back in _ensure_resource_provider: if None
    is returned from _create_resource_provider, we raise the same
    exception that _create_resource_provider would raise if it couldn't
    create the provider.

    Change-Id: If9e1581db9c1ae14340b787d03c815d243d5a50c
    Closes-Bug: #1767139

** Changed in: nova
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
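For context, the failure mode and the fix described in the commit message can be sketched in a few lines of Python. This is a simplified, hypothetical stand-in, not nova's actual code: the real @safe_connect decorator lives in nova/scheduler/client/report.py, and the PlacementDown and ResourceProviderCreationFailed classes and the cache attribute here are illustrative names only.

```python
import functools


class PlacementDown(ConnectionError):
    """Illustrative stand-in for a failed connection to placement."""


class ResourceProviderCreationFailed(Exception):
    """Illustrative stand-in for the exception the fix raises."""


def safe_connect(f):
    # Mimics the side effect the commit describes: connection errors
    # are swallowed and the decorated method returns None instead.
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except PlacementDown:
            return None
    return wrapper


class ReportClient(object):
    def __init__(self, placement_up=False):
        self.placement_up = placement_up
        self._provider_cache = {}

    @safe_connect
    def _create_resource_provider(self, uuid, name):
        if not self.placement_up:
            # nova-compute started before placement: the decorator
            # turns this into a None return value.
            raise PlacementDown()
        return {'uuid': uuid, 'name': name, 'generation': 0}

    def _ensure_resource_provider(self, uuid, name):
        if uuid in self._provider_cache:
            return self._provider_cache[uuid]
        rp = self._create_resource_provider(uuid, name)
        # The fix: without this None check, None would be cached under
        # the compute node's uuid, and a later rp['generation'] lookup
        # would blow up with a TypeError.
        if rp is None:
            raise ResourceProviderCreationFailed(uuid)
        self._provider_cache[uuid] = rp
        return rp
```

The key point is that a decorator which converts exceptions into a None return moves the error check onto every caller; the fix restores that check at the one call site that populates the cache, so a dead placement service produces a clear exception instead of a poisoned cache entry.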
https://bugs.launchpad.net/bugs/1767139

Title:
  TypeError in _get_inventory_and_update_provider_generation

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  In Progress
Status in OpenStack Compute (nova) queens series:
  In Progress

Bug description:
  Description
  ===========
  Bringing up a new cluster as part of our CI after switching from
  16.1.0 to 16.1.1 on CentOS, I'm seeing this error on some computes:

  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager Traceback (most recent call last):
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6752, in update_available_resource_for_node
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 704, in update_available_resource
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     self._update_available_resource(context, resources)
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     return f(*args, **kwargs)
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 728, in _update_available_resource
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     self._init_compute_node(context, resources)
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 585, in _init_compute_node
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     self._update(context, cn)
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 886, in _update
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     inv_data,
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 64, in set_inventory_for_provider
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     inv_data,
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     return getattr(self.instance, __name)(*args, **kwargs)
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 789, in set_inventory_for_provider
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     self._update_inventory(rp_uuid, inv_data)
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 56, in wrapper
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     return f(self, *a, **k)
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 675, in _update_inventory
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     if self._update_inventory_attempt(rp_uuid, inv_data):
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 562, in _update_inventory_attempt
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     curr = self._get_inventory_and_update_provider_generation(rp_uuid)
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 546, in _get_inventory_and_update_provider_generation
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     if server_gen != my_rp['generation']:
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager TypeError: 'NoneType' object has no attribute '__getitem__'

  The error is persistent for a single run of nova-compute.

  Steps to reproduce
  ==================
  Nodes were started by our CI infrastructure. We start 3 computes and
  a single control node. In 50% of cases, one of the computes comes up
  in this bad state.

  Expected result
  ===============
  Working cluster.

  Actual result
  =============
  At least one of the 3 nodes fails to join the cluster: it's not
  picked up by discover_hosts, and I see the above stack trace repeated
  in the nova-compute logs.

  Environment
  ===========
  1. Exact version of OpenStack you are running (see
     http://docs.openstack.org/releases/ for all releases):

     $ rpm -qa | grep nova
     python-nova-16.1.1-1.el7.noarch
     openstack-nova-common-16.1.1-1.el7.noarch
     python2-novaclient-9.1.1-1.el7.noarch
     openstack-nova-api-16.1.1-1.el7.noarch
     openstack-nova-compute-16.1.1-1.el7.noarch

  2. Which hypervisor did you use (for example: Libvirt + KVM, Libvirt
     + XEN, Hyper-V, PowerKVM, ...)? What's the version of that?

     $ rpm -qa | grep kvm
     libvirt-daemon-kvm-3.2.0-14.el7_4.9.x86_64
     qemu-kvm-common-ev-2.9.0-16.el7_4.14.1.x86_64
     qemu-kvm-ev-2.9.0-16.el7_4.14.1.x86_64

  3. Which storage type did you use (for example: Ceph, LVM, GPFS,
     ...)? What's the version of that?

     Not sure.

  4. Which networking type did you use (for example: nova-network,
     Neutron with OpenVSwitch, ...)?

     Neutron with Calico (I work on Calico; this is our CI system).

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1767139/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp