[Yahoo-eng-team] [Bug 1839920] Re: Macvtap CI fails on Train
** Changed in: nova
   Importance: Undecided => High

** Tags added: train-rc-potential

** Also affects: nova/train
   Importance: Undecided
       Status: New

** Changed in: nova/train
       Status: New => Confirmed

** Changed in: nova/train
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839920

Title:
  Macvtap CI fails on Train

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) train series:
  Confirmed

Bug description:
  The MacVtap CI [1] started to fail after merging commit [2]. We think
  it is related to this libvirt change:
  https://github.com/libvirt/libvirt/commit/b91a33638476cf57d910b6056a8fc11921edd029#diff-28bc83a0c3470bba712dfa6824a79c9d
  which switched from reporting the admin MAC to the effective MAC. The
  problem is that the sriov-nic agent relies on the admin MAC when
  sending RPC to the neutron server; if the MAC and the PCI slot don't
  match, it ignores the port and the VM is stuck in spawn until timeout.

  [1] https://wiki.openstack.org/wiki/ThirdPartySystems/Mellanox_CI
  [2] https://review.opendev.org/#/c/31/

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839920/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1849165] Re: _populate_assigned_resources raises TypeError: argument of type 'NoneType' is not iterable
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22if%20mig.dest_compute%20%3D%3D%20self.host%20and%20'new_resources'%20in%20mig_ctx%3A%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22=7d

** Also affects: nova/train
   Importance: Undecided
       Status: New

** Summary changed:

- _populate_assigned_resources raises TypeError: argument of type 'NoneType' is not iterable
+ _populate_assigned_resources raises "TypeError: argument of type 'NoneType' is not iterable" during active migration

** Changed in: nova/train
   Importance: Undecided => High

** Changed in: nova
       Status: New => Confirmed

** Changed in: nova/train
       Status: New => Confirmed

https://bugs.launchpad.net/bugs/1849165

Title:
  _populate_assigned_resources raises "TypeError: argument of type
  'NoneType' is not iterable" during active migration

Status in OpenStack Compute (nova):
  Confirmed
Status in OpenStack Compute (nova) train series:
  Confirmed

Bug description:
  Seen here:
  https://zuul.opendev.org/t/openstack/build/2b10b4a240b84245bcee3366db93951d/log/logs/screen-n-cpu.txt.gz?severity=4#2675

  Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-compute[26938]: ERROR nova.compute.manager [None req-dd5ddbad-4234-4288-bbab-2c3d20b7f4ad None None] Error updating resources for node ubuntu-bionic-rax-iad-0012404623.: TypeError: argument of type 'NoneType' is not iterable

  Traceback (most recent call last):
    File "/opt/stack/new/nova/nova/compute/manager.py", line 8925, in _update_available_resource_for_node
      startup=startup)
    File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 883, in update_available_resource
      self._update_available_resource(context, resources, startup=startup)
    File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 328, in inner
      return f(*args, **kwargs)
    File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 965, in _update_available_resource
      self._populate_assigned_resources(context, instance_by_uuid)
    File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 482, in _populate_assigned_resources
      if mig.dest_compute == self.host and 'new_resources' in mig_ctx:
  TypeError: argument of type 'NoneType' is not iterable

  This was added late in Train: https://review.opendev.org/#/c/678452/

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1849165/+subscriptions
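The failing check on resource_tracker.py line 482 can be reproduced in isolation. A minimal sketch, assuming only that mig_ctx can be None (the variable name follows the traceback; everything else here is illustrative), plus the obvious defensive guard:

```python
mig_ctx = None  # a migration whose migration_context was never set

# The check from the traceback assumes mig_ctx is always dict-like:
try:
    'new_resources' in mig_ctx
except TypeError as exc:
    print(exc)  # argument of type 'NoneType' is not iterable

# Testing mig_ctx first short-circuits the membership test safely:
has_new_resources = bool(mig_ctx) and 'new_resources' in mig_ctx
print(has_new_resources)  # False
```

With a real migration context the guarded form behaves identically to the original, so it is a plausible shape for a fix.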
[Yahoo-eng-team] [Bug 1849165] [NEW] _populate_assigned_resources raises "TypeError: argument of type 'NoneType' is not iterable" during active migration
Public bug reported:

Seen here:
https://zuul.opendev.org/t/openstack/build/2b10b4a240b84245bcee3366db93951d/log/logs/screen-n-cpu.txt.gz?severity=4#2675

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-compute[26938]: ERROR nova.compute.manager [None req-dd5ddbad-4234-4288-bbab-2c3d20b7f4ad None None] Error updating resources for node ubuntu-bionic-rax-iad-0012404623.: TypeError: argument of type 'NoneType' is not iterable

Traceback (most recent call last):
  File "/opt/stack/new/nova/nova/compute/manager.py", line 8925, in _update_available_resource_for_node
    startup=startup)
  File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 883, in update_available_resource
    self._update_available_resource(context, resources, startup=startup)
  File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 328, in inner
    return f(*args, **kwargs)
  File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 965, in _update_available_resource
    self._populate_assigned_resources(context, instance_by_uuid)
  File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 482, in _populate_assigned_resources
    if mig.dest_compute == self.host and 'new_resources' in mig_ctx:
TypeError: argument of type 'NoneType' is not iterable

This was added late in Train: https://review.opendev.org/#/c/678452/

** Affects: nova
   Importance: High
       Status: Confirmed

** Affects: nova/train
   Importance: High
       Status: Confirmed

** Tags: resource-tracker

https://bugs.launchpad.net/bugs/1849165

Title:
  _populate_assigned_resources raises "TypeError: argument of type
  'NoneType' is not iterable" during active migration

Status in OpenStack Compute (nova):
  Confirmed
Status in OpenStack Compute (nova) train series:
  Confirmed
[Yahoo-eng-team] [Bug 1848514] Re: Booting from volume providing an image fails
Hmm, did something change in Stein on the Cinder side to enforce the
update_volume_admin_metadata policy rule on the os-attach API? I'm not
aware of anything that has changed on the nova side in Stein that would
be related to this.

** Also affects: cinder
   Importance: Undecided
       Status: New

** Tags added: policy volumes

https://bugs.launchpad.net/bugs/1848514

Title:
  Booting from volume providing an image fails

Status in Cinder:
  New
Status in OpenStack Compute (nova):
  New

Bug description:
  Trying to create an instance (booting from volume while specifying an
  image) fails. Running Stein (19.0.1).

  ### When using: ###

  nova boot --flavor FLAVOR_ID --block-device source=image,id=IMAGE_ID,dest=volume,size=10,shutdown=preserve,bootindex=0 INSTANCE_NAME

  ### nova-compute logs: ###

  Instance failed block device setup
  Forbidden: Policy doesn't allow volume:update_volume_admin_metadata to be performed. (HTTP 403) (Request-ID: req-875cc6e1-ffe1-45dd-b942-944166c6040a)

  The full trace: http://paste.openstack.org/raw/784535/

  This is definitely a policy issue. Our cinder policy:

  "volume:update_volume_admin_metadata": "rule:admin_api" (default)

  Using a user with admin credentials works as expected. Is this
  expected? We didn't identify this behaviour previously (before Stein)
  using the same policy for "update_volume_admin_metadata".

  Found an old similar report: https://bugs.launchpad.net/nova/+bug/1661189

To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/1848514/+subscriptions
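For reference, the rule in question is set in cinder's policy file; an operator override would look something like the following (illustrative only: rule:admin_or_owner is a standard cinder base rule, but whether relaxing the admin-only default is appropriate is exactly what this bug needs to answer):

```yaml
# cinder policy override (illustrative sketch, not a recommendation):
# allow the volume owner, not only admin, to update admin metadata,
# which nova's attach flow triggers via the os-attach API.
"volume:update_volume_admin_metadata": "rule:admin_or_owner"
```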
[Yahoo-eng-team] [Bug 1848499] [NEW] powervm driver tests fail with networkx 2.4: "AttributeError: 'DiGraph' object has no attribute 'node'"
Public bug reported:

https://c6fecb2db5c55fa0effa-6229cc6450d9b491384804026d2fbd81.ssl.cf5.rackcdn.com/688980/1/gate/openstack-tox-py36/71a8bdd/testr_results.html.gz

ft1.2: nova.tests.unit.virt.powervm.tasks.test_vm.TestVMTasks.test_power_on_revert_StringException:

Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/mock/mock.py", line 1330, in patched
    return func(*args, **keywargs)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/unit/virt/powervm/tasks/test_vm.py", line 90, in test_power_on_revert
    self.assertRaises(ValueError, tf_eng.run, flow)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/testtools/testcase.py", line 485, in assertRaises
    self.assertThat(our_callable, matcher)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/testtools/testcase.py", line 496, in assertThat
    mismatch_error = self._matchHelper(matchee, matcher, message, verbose)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/testtools/testcase.py", line 547, in _matchHelper
    mismatch = matcher.match(matchee)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/testtools/matchers/_exception.py", line 108, in match
    mismatch = self.exception_matcher.match(exc_info)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/testtools/matchers/_higherorder.py", line 62, in match
    mismatch = matcher.match(matchee)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/testtools/testcase.py", line 475, in match
    reraise(*matchee)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/testtools/_compat3x.py", line 16, in reraise
    raise exc_obj.with_traceback(exc_tb)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/testtools/matchers/_exception.py", line 101, in match
    result = matchee()
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/testtools/testcase.py", line 1049, in __call__
    return self._callable_object(*self._args, **self._kwargs)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/taskflow/engines/helpers.py", line 162, in run
    engine.run()
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/taskflow/engines/action_engine/engine.py", line 247, in run
    for _state in self.run_iter(timeout=timeout):
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/taskflow/engines/action_engine/engine.py", line 271, in run_iter
    self.compile()
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/fasteners/lock.py", line 306, in wrapper
    return f(self, *args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/taskflow/engines/action_engine/engine.py", line 470, in compile
    self._runtime.compile()
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/taskflow/engines/action_engine/runtime.py", line 143, in compile
    metadata['edge_deciders'] = tuple(deciders_it)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/taskflow/engines/action_engine/runtime.py", line 75, in _walk_edge_deciders
    u_node_kind = graph.node[u_node]['kind']
AttributeError: 'DiGraph' object has no attribute 'node'

Seems this is since networkx 2.4 was released 11 hours ago:
https://pypi.org/project/networkx/2.4/

And upper-constraints aren't being honored for some reason:

networkx===2.2;python_version=='2.7'
networkx===2.3;python_version=='3.4'
networkx===2.3;python_version=='3.5'
networkx===2.3;python_version=='3.6'
networkx===2.3;python_version=='3.7'

I guess maybe because they are a transitive dependency of taskflow,
which the powervm driver depends on?

** Affects: nova
   Importance: Critical
       Status: Confirmed

** Tags: gate-failure

https://bugs.launchpad.net/bugs/1848499

Title:
  powervm driver tests fail with networkx 2.4: "AttributeError:
  'DiGraph' object has no attribute 'node'"

Status in OpenStack Compute (nova):
  Confirmed
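The root cause is an API removal: networkx deprecated the DiGraph.node alias to the .nodes view during the 2.x series, and 2.4 dropped the alias, so taskflow's graph.node[...] lookup raises AttributeError. A minimal stand-in (no networkx install required; the class below is a hypothetical model of the 2.4 behavior, not taskflow's code) showing the access pattern that breaks and the supported spelling:

```python
class DiGraph24:
    """Hypothetical minimal model of networkx 2.4's DiGraph: only the
    .nodes view exists; the deprecated .node alias is gone."""
    def __init__(self):
        self.nodes = {}  # node -> attribute dict

g = DiGraph24()
g.nodes['u'] = {'kind': 'task'}

print(g.nodes['u']['kind'])  # task
print(hasattr(g, 'node'))    # False - code using graph.node[...] breaks
```

Code that must span both sides of the removal typically switches to graph.nodes[...], which already works on networkx 2.x releases before 2.4.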
[Yahoo-eng-team] [Bug 1848442] Re: The request method of "os-floating-ips" should be DELETE
That API is for nova-network only, which we are removing, so eventually
that API will just return a 410 response and won't be used anyway.

** Changed in: nova
       Status: New => Won't Fix

https://bugs.launchpad.net/bugs/1848442

Title:
  The request method of "os-floating-ips" should be DELETE

Status in OpenStack Compute (nova):
  Won't Fix

Bug description:
  For bulk-deleting floating IPs, the request method of
  /os-floating-ips-bulk/delete should be DELETE.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1848442/+subscriptions
[Yahoo-eng-team] [Bug 1848373] Re: Instance.save(expected_task_state=) is passed string in many locations
Looks like expected_task_state is pulled from the values dict here:
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/db/sqlalchemy/api.py#L2850

and, if not None, converted to a list here:
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/db/sqlalchemy/api.py#L2857

So I guess that's why things work and I can close this bug - there are
wrong uses of expected_task_state for Instance.save, but the DB API
handles it.

** Changed in: nova
       Status: Triaged => Invalid

https://bugs.launchpad.net/bugs/1848373

Title:
  Instance.save(expected_task_state=) is passed string in many locations

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  I noticed this in some code I was writing when it didn't behave like I
  expected:

  https://review.opendev.org/#/c/627891/63/nova/conductor/tasks/cross_cell_migrate.py@423
  https://review.opendev.org/#/c/688832/2/nova/conductor/tasks/cross_cell_migrate.py@781

  That "works" because strings are iterable, but it's not the intended
  use of that kwarg, which should be None or a list or tuple:

  https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/objects/instance.py#L758

  We have several places that incorrectly pass a string though; here are
  a couple:

  https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/compute/api.py#L3228
  https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/compute/manager.py#L2554
  https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/compute/manager.py#L3103

  The Instance.save() method should probably assert that if the value is
  not None it's not a string type, since the latter is a coding error.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1848373/+subscriptions
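The hazard the report describes can be shown in a few lines: membership against a bare string is substring matching, while membership against a list is an element test, and things only behave because the value is normalized to a list before use (a sketch with illustrative task-state values; the normalization mirrors the DB API's list conversion):

```python
# Passing a string where a list/tuple of task states is expected turns
# the membership test into substring matching:
expected = 'resize_migrating'             # bare string (the coding error)
print('migrat' in expected)               # True  - substring match
print('migrat' in ['resize_migrating'])   # False - element test

# Normalizing a non-None, non-sequence value into a list restores the
# intended element semantics, which is why the wrong call sites still
# work in practice:
value = expected
if value is not None and not isinstance(value, (list, tuple)):
    value = [value]
print('resize_migrating' in value)        # True
```

An assertion in Instance.save() rejecting plain strings, as the report suggests, would surface these call sites as errors instead of silently relying on the normalization.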
[Yahoo-eng-team] [Bug 1848373] [NEW] Instance.save(expected_task_state=) is passed string in many locations
Public bug reported:

I noticed this in some code I was writing when it didn't behave like I
expected:

https://review.opendev.org/#/c/627891/63/nova/conductor/tasks/cross_cell_migrate.py@423
https://review.opendev.org/#/c/688832/2/nova/conductor/tasks/cross_cell_migrate.py@781

That "works" because strings are iterable, but it's not the intended use
of that kwarg, which should be None or a list or tuple:

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/objects/instance.py#L758

We have several places that incorrectly pass a string though; here are a
couple:

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/compute/api.py#L3228
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/compute/manager.py#L2554
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/compute/manager.py#L3103

The Instance.save() method should probably assert that if the value is
not None it's not a string type, since the latter is a coding error.

** Affects: nova
   Importance: Medium
       Status: Triaged

https://bugs.launchpad.net/bugs/1848373

Title:
  Instance.save(expected_task_state=) is passed string in many locations

Status in OpenStack Compute (nova):
  Triaged

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1848373/+subscriptions
[Yahoo-eng-team] [Bug 1848343] Re: MigrationTask rollback can leak allocations for a deleted server
** Also affects: nova/queens
   Importance: Undecided
       Status: New

** Also affects: nova/train
   Importance: Undecided
       Status: New

** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Also affects: nova/rocky
   Importance: Undecided
       Status: New

https://bugs.launchpad.net/bugs/1848343

Title:
  MigrationTask rollback can leak allocations for a deleted server

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) queens series:
  New
Status in OpenStack Compute (nova) rocky series:
  New
Status in OpenStack Compute (nova) stein series:
  New
Status in OpenStack Compute (nova) train series:
  New

Bug description:
  This came up in the cross-cell resize review:
  https://review.opendev.org/#/c/627890/60/nova/conductor/tasks/cross_cell_migrate.py@495

  And I was able to recreate it with a functional test here:
  https://review.opendev.org/#/c/688832/

  That test is doing a cross-cell cold migration, but looking at the code:
  https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L461

  we can hit the same issue for a same-cell resize/cold migrate once we
  have swapped the allocations, so that the source node allocations are
  held by the migration consumer and the instance holds allocations on
  the target node (created by the scheduler):
  https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L328

  If something fails between that point and the cast to prep_resize, the
  task will roll back and revert the allocations, so the target node
  allocations are dropped and the source node allocations are moved back
  to the instance:
  https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L91

  Furthermore, if the instance was deleted while we perform that swap,
  the move_allocations method will recreate the allocations on the
  source node for the now-deleted instance, since we don't assert
  consumer generations during the swap:
  https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/scheduler/client/report.py#L1886

  This results in leaking allocations for the source node since the
  instance is deleted.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1848343/+subscriptions
[Yahoo-eng-team] [Bug 1848343] [NEW] MigrationTask rollback can leak allocations for a deleted server
Public bug reported:

This came up in the cross-cell resize review:
https://review.opendev.org/#/c/627890/60/nova/conductor/tasks/cross_cell_migrate.py@495

And I was able to recreate it with a functional test here:
https://review.opendev.org/#/c/688832/

That test is doing a cross-cell cold migration, but looking at the code:
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L461

we can hit the same issue for a same-cell resize/cold migrate once we
have swapped the allocations, so that the source node allocations are
held by the migration consumer and the instance holds allocations on the
target node (created by the scheduler):
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L328

If something fails between that point and the cast to prep_resize, the
task will roll back and revert the allocations, so the target node
allocations are dropped and the source node allocations are moved back
to the instance:
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L91

Furthermore, if the instance was deleted while we perform that swap, the
move_allocations method will recreate the allocations on the source node
for the now-deleted instance, since we don't assert consumer generations
during the swap:
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/scheduler/client/report.py#L1886

This results in leaking allocations for the source node since the
instance is deleted.

** Affects: nova
   Importance: Undecided
       Status: Triaged

** Tags: cold-migrate placement resize

** Changed in: nova
       Status: New => Triaged

https://bugs.launchpad.net/bugs/1848343

Title:
  MigrationTask rollback can leak allocations for a deleted server

Status in OpenStack Compute (nova):
  Triaged

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1848343/+subscriptions
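The leak sequence in this report can be sketched in miniature (all names below are hypothetical; real Placement moves allocations via its HTTP API, and the check_consumer flag stands in for the missing consumer-generation assertion):

```python
# Minimal sketch of the leak: rolling back an allocation swap recreates
# source-node allocations for an instance deleted mid-flight.
allocations = {}        # consumer id -> {node: resources}
live_consumers = set()  # consumers that still exist

def move_allocations(src, dst, check_consumer=False):
    """Move all of src's allocations onto dst. With check_consumer=True
    the move is refused when dst no longer exists, mimicking what a
    consumer-generation conflict would catch."""
    if check_consumer and dst not in live_consumers:
        raise RuntimeError('consumer %s is gone, refusing to swap' % dst)
    allocations[dst] = allocations.pop(src, {})

# Forward swap: the migration record takes over the source-node usage.
live_consumers.update({'instance', 'migration'})
allocations['instance'] = {'src-node': {'VCPU': 2}}
move_allocations('instance', 'migration')

# The instance is deleted mid-task, then rollback swaps back unguarded:
live_consumers.discard('instance')
move_allocations('migration', 'instance')
print('instance' in allocations)  # True - orphaned allocations remain
```

With check_consumer=True on the rollback swap, the deleted consumer would raise instead of silently reacquiring allocations, which is the general shape of the generation-asserting fix the description points at.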
[Yahoo-eng-team] [Bug 1841481] Re: Race during ironic re-balance corrupts local RT ProviderTree and compute_nodes cache
Hits in ironic multinode jobs: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Skipping%20removal%20of%20allocations%20for%20deleted%20instances%3A%20Failed%20to%20retrieve%20allocations%20for%20resource%20provider%5C%22%20AND%20message%3A%5C%22No%20resource%20provider%20with%20uuid%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22%20AND%20project%3A%5C%22openstack%2Fironic%5C%22=7d We don't have an elastic-recheck query for that since none of the jobs it hits on are voting. ** Also affects: nova/train Importance: Undecided Status: New ** Also affects: nova/pike Importance: Undecided Status: New ** Also affects: nova/ocata Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1841481 Title: Race during ironic re-balance corrupts local RT ProviderTree and compute_nodes cache Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) ocata series: New Status in OpenStack Compute (nova) pike series: New Status in OpenStack Compute (nova) queens series: New Status in OpenStack Compute (nova) rocky series: New Status in OpenStack Compute (nova) stein series: New Status in OpenStack Compute (nova) train series: New Bug description: Seen with an ironic re-balance in this job: https://d01b2e57f0a56cb7edf0-b6bc206936c08bb07a5f77cfa916a2d4.ssl.cf5.rackcdn.com/678298/4/check /ironic-tempest-ipa-wholedisk-direct-tinyipa-multinode/92c65ac/ On the subnode we see the RT detect that the node is moving hosts: Aug 26 18:41:38.818412 ubuntu-bionic-rax-ord-0010443319 nova- compute[747]: INFO nova.compute.resource_tracker [None req-a894abee- a2f1-4423-8ede-2a1b9eef28a4 None None] ComputeNode 61dbc9c7-828b-4c42 -b19c-a3716037965f moving from ubuntu-bionic-rax-ord-0010443317 to ubuntu-bionic-rax-ord-0010443319 On that new host, the ProviderTree cache is getting updated with 
refreshed associations for inventory: Aug 26 18:41:38.881026 ubuntu-bionic-rax-ord-0010443319 nova- compute[747]: DEBUG nova.scheduler.client.report [None req-a894abee- a2f1-4423-8ede-2a1b9eef28a4 None None] Refreshing inventories for resource provider 61dbc9c7-828b-4c42-b19c-a3716037965f {{(pid=747) _refresh_associations /opt/stack/nova/nova/scheduler/client/report.py:761}} aggregates: Aug 26 18:41:38.953685 ubuntu-bionic-rax-ord-0010443319 nova- compute[747]: DEBUG nova.scheduler.client.report [None req-a894abee- a2f1-4423-8ede-2a1b9eef28a4 None None] Refreshing aggregate associations for resource provider 61dbc9c7-828b-4c42-b19c- a3716037965f, aggregates: None {{(pid=747) _refresh_associations /opt/stack/nova/nova/scheduler/client/report.py:770}} and traits - but when we get traits the provider is gone: Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager [None req-a894abee-a2f1-4423-8ede-2a1b9eef28a4 None None] Error updating resources for node 61dbc9c7-828b-4c42-b19c-a3716037965f.: ResourceProviderTraitRetrievalFailed: Failed to get traits for resource provider with UUID 61dbc9c7-828b-4c42-b19c-a3716037965f Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager Traceback (most recent call last): Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 8250, in _update_available_resource_for_node Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager startup=startup) Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 715, in update_available_resource Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager self._update_available_resource(context, resources, 
startup=startup) Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 328, in inner Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager return f(*args, **kwargs) Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 738, in _update_available_resource Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager is_new_compute_node = self._init_compute_node(context, resources) Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File
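The failure mode in the traceback above — the inventory and aggregate refreshes succeeding, then the traits fetch failing because the re-balance deleted the provider mid-refresh — can be sketched with a toy cache. This is an illustration of the corruption pattern only, not nova's actual ProviderTree code:

```python
class ProviderCache:
    """Toy provider cache (illustrative, not nova's ProviderTree) showing
    how a trait-refresh failure can leave a half-populated entry behind."""

    def __init__(self, placement):
        # placement: any object with get_inventory(uuid) / get_traits(uuid)
        self.placement = placement
        self.providers = {}

    def refresh(self, uuid):
        entry = self.providers.setdefault(uuid, {})
        # The inventory refresh succeeds...
        entry["inventory"] = self.placement.get_inventory(uuid)
        # ...but if the provider was deleted by the re-balance race, the
        # traits call fails, and the stale partial entry stays cached.
        entry["traits"] = self.placement.get_traits(uuid)
```

A failed refresh leaves the provider in the cache with inventory but no traits, which is exactly the kind of corrupted local state this bug describes.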
[Yahoo-eng-team] [Bug 1836754] Re: Conflict when deleting allocations for an instance that hasn't finished building
This goes back to Stein because https://review.opendev.org/#/c/591597/ changed the method from using DELETE /allocations/{consumer_id} to the GET/PUT dance.

** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Also affects: nova/train
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1836754

Title:
  Conflict when deleting allocations for an instance that hasn't finished building

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) stein series:
  New
Status in OpenStack Compute (nova) train series:
  New

Bug description:

Description
===========
When deleting an instance that hasn't finished building, we'll sometimes get a 409 from placement such as:

Failed to delete allocations for consumer 6494d4d3-013e-478f-9ac1-37ca7a67b776. Error: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Inventory and/or allocations changed while attempting to allocate: Another thread concurrently updated the data. Please retry your update", "code": "placement.concurrent_update", "request_id": "req-6dcd766b-f5d3-49fa-89f3-02e64079046a"}]}

Steps to reproduce
==================
1. Boot an instance
2. Don't wait for it to become active
3. Delete it immediately

Expected result
===============
The instance deletes successfully.

Actual result
=============
Nova bubbles up that error from Placement.
Logs & Configs
==============
This is being hit at a low rate in various CI tests, logstash query is here:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Inventory%20and%2For%20allocations%20changed%20while%20attempting%20to%20allocate%3A%20Another%20thread%20concurrently%20updated%20the%20data%5C%22%20AND%20filename%3A%5C%22job-output.txt%5C%22

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1836754/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
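For background, the GET/PUT dance mentioned above reads the consumer generation and PUTs back an empty allocation set, so a concurrent update surfaces as a retryable 409. A minimal sketch of the retry loop (hypothetical helper; the injected callables stand in for a real placement client):

```python
def delete_allocations_with_retry(get_allocations, put_allocations,
                                  consumer_id, max_attempts=4):
    """Empty a consumer's allocations via GET/PUT, retrying on a 409.

    get_allocations(consumer_id) -> dict containing "consumer_generation".
    put_allocations(consumer_id, payload) -> HTTP status code.
    """
    status = None
    for _ in range(max_attempts):
        current = get_allocations(consumer_id)
        payload = {
            "allocations": {},  # an empty set removes the allocations
            "consumer_generation": current["consumer_generation"],
        }
        status = put_allocations(consumer_id, payload)
        if status != 409:
            # Anything other than a concurrent-update conflict is final.
            return status
        # 409 => the generation raced; re-read it and try again.
    return status
```

The key point is re-reading the consumer generation before each PUT; blindly re-sending the old generation would keep conflicting.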
[Yahoo-eng-team] [Bug 1737131] Re: Superfluous re-mount attempts with the Quobyte Nova driver and multi-registry volume URLs
** Also affects: nova/queens Importance: Undecided Status: New ** Changed in: nova/queens Status: New => In Progress ** Changed in: nova/queens Importance: Undecided => Low ** Changed in: nova/queens Assignee: (unassigned) => Silvan Kaiser (2-silvan) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1737131 Title: Superfluous re-mount attempts with the Quobyte Nova driver and multi- registry volume URLs Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: In Progress Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Committed Bug description: When using a multi-registry volume URL in the Cinder Quobyte driver the Nova Quobyte driver does not detect existing mounts correctly. Upon trying to mount the given volume the driver fails because the mount already exists: [..] 
2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb) 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/block_device.py", line 389, in attach 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server device_type=self['device_type'], encryption=encryption) 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1248, in attach_volume 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server self._connect_volume(connection_info, disk_info, instance) 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1181, in _connect_volume 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server vol_driver.connect_volume(connection_info, disk_info, instance) 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 274, in inner 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server return f(*args, **kwargs) 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/volume/quobyte.py", line 147, in connect_volume 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server CONF.libvirt.quobyte_client_cfg) 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/volume/quobyte.py", line 61, in mount_volume 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server utils.execute(*command) 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/utils.py", line 229, in execute 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server return processutils.execute(*cmd, **kwargs) 
2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 419, in execute 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server cmd=sanitized_cmd) 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server ProcessExecutionError: Unexpected error while running command. 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server Command: mount.quobyte --disable-xattrs 78.46.57.153:7861,78.46.57.153:7861,78.46.57.153:7861/82000e41-c6ac-4be2-b31a-0543db93767c /mnt/quobyte-volume/531b7439e360bdea0a79870354906cab 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server Exit code: 4 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server Stdout: u'mount.quobyte failed: Unable to initialize mount point\n' 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server Stderr: u"Logging to file /opt/stack/logs/quobyte_client.log.\nfuse: mountpoint is not empty\nfuse: if you are sure this is safe, use the 'nonempty' mount option\n" 2017-12-08 08:32:29.277 25660 ERROR oslo_messaging.rpc.server To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1737131/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
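One way to think about the detection problem: the existing-mount check should key on the mount point itself rather than comparing the full volume URL, since a multi-registry URL such as "hostA:7861,hostB:7861/volume" need not match the source string recorded by the mount helper. A minimal sketch under that assumption (not the actual driver code):

```python
def mount_exists(mountpoint, mounts):
    """Return True if mountpoint is already mounted.

    mounts: iterable of (source, mountpoint) pairs, e.g. parsed from
    /proc/mounts. Matching on the mount point alone avoids the bug above,
    where a multi-registry Quobyte volume URL fails to string-match the
    recorded mount source and triggers a superfluous re-mount attempt.
    """
    return any(mp == mountpoint for _source, mp in mounts)
```

With this check in place, the driver would skip the second mount.quobyte invocation instead of failing with "mountpoint is not empty".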
[Yahoo-eng-team] [Bug 1835400] Re: Issues booting with os_distro=centos7.0
** Also affects: nova/stein Importance: Undecided Status: New ** Also affects: nova/queens Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New ** Changed in: nova/queens Status: New => In Progress ** Changed in: nova/rocky Status: New => In Progress ** Tags added: libvirt ** Changed in: nova/rocky Importance: Undecided => Medium ** Changed in: nova/stein Importance: Undecided => Medium ** Changed in: nova/queens Importance: Undecided => Medium ** Changed in: nova/stein Status: New => In Progress ** Changed in: nova/queens Assignee: (unassigned) => Lee Yarwood (lyarwood) ** Changed in: nova/rocky Assignee: (unassigned) => Lee Yarwood (lyarwood) ** Changed in: nova/stein Assignee: (unassigned) => Lee Yarwood (lyarwood) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1835400 Title: Issues booting with os_distro=centos7.0 Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: In Progress Status in OpenStack Compute (nova) rocky series: In Progress Status in OpenStack Compute (nova) stein series: In Progress Bug description: If we have os_distro=centos this isn't known by os-info, so we get: Cannot find OS information - Reason: (No configuration information found for operating system centos7): OsInfoNotFound: No configuration information found for operating system centos7 If we "fix" it to os_distro=centos7.0 we get: Instance failed to spawn: UnsupportedHardware: Requested hardware 'virtio1.0-net' is not supported by the 'kvm' virt driver This is with Rocky, but was also happening with Queens, I believe. 
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1835400/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1847302] [NEW] doc: need admin guide for the image cache
Public bug reported:

There is no documentation for the image cache, so we should add one to the admin guide. I think a relatively simple beginning would include:

- A high-level description of what an image cache is, where it lives, and the benefits.
- Which compute drivers support the image cache (that's not detailed here either: https://docs.openstack.org/nova/latest/user/support-matrix.html); this is any driver that supports the "has_imagecache" driver capability (currently libvirt, hyperv and vmware).
- The related configuration options; since the options are not in a particular config option group, they are all spread across DEFAULT (moving those to an [imagecache] group would probably be useful as well, outside the docs change).

More advanced topics could be things like known issues/limitations (maybe mdbooth can help here), some of which is probably covered in this spec:
https://specs.openstack.org/openstack/nova-specs/specs/ussuri/approved/image-precache-support.html

** Affects: nova
   Importance: Undecided
       Status: New

** Tags: doc image-cache

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1847302

Title:
  doc: need admin guide for the image cache

Status in OpenStack Compute (nova):
  New

Bug description:
There is no documentation for the image cache, so we should add one to the admin guide. I think a relatively simple beginning would include:

- A high-level description of what an image cache is, where it lives, and the benefits.
- Which compute drivers support the image cache (that's not detailed here either: https://docs.openstack.org/nova/latest/user/support-matrix.html); this is any driver that supports the "has_imagecache" driver capability (currently libvirt, hyperv and vmware).
- The related configuration options; since the options are not in a particular config option group, they are all spread across DEFAULT (moving those to an [imagecache] group would probably be useful as well, outside the docs change).

More advanced topics could be things like known issues/limitations (maybe mdbooth can help here), some of which is probably covered in this spec:
https://specs.openstack.org/openstack/nova-specs/specs/ussuri/approved/image-precache-support.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1847302/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
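For reference while writing the options bullet, the Train-era cache settings live in DEFAULT (plus one libvirt-only option). The names and defaults below are quoted from memory and should be verified against the nova configuration reference before going into the admin guide:

```ini
[DEFAULT]
# How often (seconds) the image cache manager periodic task runs.
image_cache_manager_interval = 2400
# Directory name under instances_path holding cached base images.
image_cache_subdirectory_name = _base
# Whether unused base images should be removed at all.
remove_unused_base_images = True
# Age (seconds) an unused base image must reach before removal.
remove_unused_original_minimum_age_seconds = 86400

[libvirt]
# Age (seconds) for unused resized base images (libvirt driver only).
remove_unused_resized_minimum_age_seconds = 3600
```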
[Yahoo-eng-team] [Bug 1833581] Re: instance stuck in BUILD state if nova-compute is restarted
This is extremely latent but I've marked it going back to at least queens since that's currently our oldest non-extended maintenance branch. ** Also affects: nova/queens Importance: Undecided Status: New ** Also affects: nova/train Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New ** Changed in: nova/queens Status: New => Confirmed ** Changed in: nova/rocky Status: New => Confirmed ** Changed in: nova/train Status: New => Confirmed ** Changed in: nova/stein Status: New => Confirmed ** Changed in: nova/train Importance: Undecided => Critical ** Changed in: nova/stein Importance: Undecided => Low ** Changed in: nova/rocky Importance: Undecided => Low ** Changed in: nova/queens Importance: Undecided => Low ** Changed in: nova/train Importance: Critical => Low -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1833581 Title: instance stuck in BUILD state if nova-compute is restarted Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) queens series: Confirmed Status in OpenStack Compute (nova) rocky series: Confirmed Status in OpenStack Compute (nova) stein series: Confirmed Status in OpenStack Compute (nova) train series: Confirmed Bug description: Description === Instance stuck in BUILD state indefinitely if nova-compute service restarted in the mean time. Even after the instance_build_timeout the instance is not put into ERROR state. 
Steps to reproduce
==================
1) Start 10 VMs in parallel to increase the chance of hitting the bug

$ for NUM in `seq 1 1 10`; do openstack server create --flavor c1 --image cirros-0.4.0-x86_64-disk --availability-zone nova:ubuntu vm$NUM & done

2) When the first instance reaches the BUILD state, restart the nova-compute service

$ sudo systemctl restart devstack@n-cpu.service

3) Observe the instance states after the compute is up again.

Expected result
===============
Instances either in ACTIVE or in ERROR state.

Actual result
=============
Some instances stuck in BUILD state.

Environment
===========
All-in-one devstack built from recent nova master (61558f274842b149044a14bbe7537b9f278035fd).

Logs & Configs
==============
stack@ubuntu:~$ openstack server list
+--------------------------------------+------+--------+------------------------------------+--------------------------+-----------+
| ID                                   | Name | Status | Networks                           | Image                    | Flavor    |
+--------------------------------------+------+--------+------------------------------------+--------------------------+-----------+
| 9ee76601-4a61-4682-86f1-743dac2b05e6 | vm3  | BUILD  |                                    | cirros-0.4.0-x86_64-disk | cirros256 |
| e459beae-ccb5-4781-b938-2dff68e33bf7 | vm9  | ACTIVE | public=2001:db8::181, 172.24.4.44  | cirros-0.4.0-x86_64-disk | cirros256 |
| 562f44db-cd51-4516-bce9-598bd29c6310 | vm10 | ERROR  | public=2001:db8::3a1, 172.24.4.196 | cirros-0.4.0-x86_64-disk | cirros256 |
| 73f1e2c6-78a1-44c5-b178-7adcf9bf58a0 | vm5  | ERROR  | public=2001:db8::21, 172.24.4.177  | cirros-0.4.0-x86_64-disk | cirros256 |
| 1b01acfc-b798-48f9-b808-6cfd0d5cd3fb | vm6  | ERROR  | public=2001:db8::3e1, 172.24.4.20  | cirros-0.4.0-x86_64-disk | cirros256 |
| c709e3bf-9c71-4f64-bad3-e9e07e911f62 | vm7  | ERROR  | public=2001:db8::231, 172.24.4.46  | cirros-0.4.0-x86_64-disk | cirros256 |
| 538d2534-98f1-4e11-9bbb-b4e74bab8c65 | vm4  | ERROR  | public=2001:db8::3e9, 172.24.4.157 | cirros-0.4.0-x86_64-disk | cirros256 |
| ed74eb32-00fe-4f24-9379-c57c04ce9af1 | vm2  | ERROR  | public=2001:db8::f5, 172.24.4.53   | cirros-0.4.0-x86_64-disk | cirros256 |
| 582b5356-4f3d-42ed-937e-966580303af0 | vm8  | ERROR  | public=2001:db8::92, 172.24.4.16   | cirros-0.4.0-x86_64-disk | cirros256 |
| ae36ffca-e4d6-4353-8e7e-41db500a5e0d | vm1  | ERROR  | public=2001:db8::1cf, 172.24.4.203 | cirros-0.4.0-x86_64-disk | cirros256 |
+--------------------------------------+------+--------+------------------------------------+--------------------------+-----------+

stack@ubuntu:~$ openstack server show 9ee76601-4a61-4682-86f1-743dac2b05e6
+-------------------+--------+
| Field             | Value  |
+-------------------+--------+
| OS-DCF:diskConfig | MANUAL
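The missing safety net here is the build-timeout check; conceptually it just compares how long an instance has been in BUILD against instance_build_timeout and is expected to flip overdue instances to ERROR, which is what this bug reports not happening. An illustrative version of that comparison (not nova's actual periodic task code):

```python
from datetime import datetime, timedelta

def find_stuck_builds(instances, timeout_secs, now=None):
    """Return instances sitting in BUILD longer than the build timeout.

    instances: dicts with "vm_state" and "created_at" (a datetime).
    Illustrative only; nova's periodic task also considers task state
    and resets the offenders to ERROR rather than just listing them.
    """
    now = now or datetime.utcnow()
    cutoff = now - timedelta(seconds=timeout_secs)
    return [i for i in instances
            if i["vm_state"] == "building" and i["created_at"] < cutoff]
```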
[Yahoo-eng-team] [Bug 1847131] [NEW] UnboundLocalError: local variable 'cell_uuid' referenced before assignment
Public bug reported: https://review.opendev.org/#/c/684118/ recently merged and is causing an issue because a variable used in the log message isn't in scope: Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server [None req-72524ba6-86bf-479d-a09f-9a9d302f7d2f demo demo] Exception during message handling: UnboundLocalError: local variable 'cell_uuid' referenced before assignment Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server Traceback (most recent call last): Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 235, in inner Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR 
oslo_messaging.rpc.server return func(*args, **kwargs) Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/manager.py", line 214, in select_destinations Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server allocation_request_version, return_alternates) Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/filter_scheduler.py", line 96, in select_destinations Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server allocation_request_version, return_alternates) Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/filter_scheduler.py", line 152, in _schedule Oct 07 07:16:51.372050 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server provider_summaries) Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/filter_scheduler.py", line 494, in _get_all_host_states Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server spec_obj) Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/host_manager.py", line 774, in get_host_states_by_uuids Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server context, cells, compute_uuids=compute_uuids) Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/host_manager.py", line 640, in 
_get_computes_for_cells Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server targeted_operation) Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/context.py", line 449, in scatter_gather_cells Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server cell_uuid, exc_info=True) Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server UnboundLocalError: local variable 'cell_uuid' referenced before assignment Oct 07 07:16:51.374461 ubuntu-bionic-ovh-bhs1-0012185489 nova-scheduler[28235]: ERROR oslo_messaging.rpc.server The fix is here: https://review.opendev.org/#/c/686996/ Apparently we don't have test coverage for that code.
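The bug class is easy to reproduce in isolation: if the exception fires before the loop body binds the variable, an except handler that references it raises UnboundLocalError and masks the real error. A minimal standalone example (not nova's code):

```python
def gather(items, handler):
    """Process items, reporting which one failed.

    Buggy on purpose: if iterating `items` raises before the loop ever
    binds item_id, the except clause references an unbound local,
    raising UnboundLocalError instead of reporting the original error.
    """
    try:
        results = {}
        for item_id in items:
            results[item_id] = handler(item_id)
        return results
    except Exception:
        # Bug: item_id is unbound if iteration failed immediately.
        raise RuntimeError("failed handling %s" % item_id)

def exploding():
    # A generator that raises before yielding anything.
    raise ValueError("boom")
    yield
```

Calling gather(exploding(), ...) raises UnboundLocalError rather than surfacing the ValueError, which mirrors the scheduler traceback above.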
[Yahoo-eng-team] [Bug 1552071] Re: Deleted instances didn't show when calling "nova list --deleted" by non-admin users
To capture what I said in the now abandoned patch:

"This would change something that's not an error to an error, regardless of the weird latent behavior. Because of that, I think this would require a microversion, which means we'd need a spec if we wanted to change this. gmann was compiling a list of random cleanup items for the compute API in an etherpad I believe, and this is something that could probably go in that list as a candidate for something to clean up in a mass cleanup microversion."

** Changed in: nova
   Importance: Undecided => Wishlist

** Changed in: nova
       Status: In Progress => Opinion

** Changed in: nova
     Assignee: huanhongda (hongda) => (unassigned)

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1552071

Title:
  Deleted instances didn't show when calling "nova list --deleted" by non-admin users

Status in OpenStack Compute (nova):
  Opinion

Bug description:
When calling "nova list --deleted" using a non-admin context, no instances in the "DELETED" state are returned:

root@SZX158625:/opt/devstack# nova list --deleted
+--------------------------------------+-------------+--------+------------+-------------+---------------------------------------------------------+
| ID                                   | Name        | Status | Task State | Power State | Networks                                                |
+--------------------------------------+-------------+--------+------------+-------------+---------------------------------------------------------+
| 40bab05f-0692-43df-a8a9-e7c0d58a73bd | test_inject | ACTIVE | -          | Running     | private=10.0.0.13, fdb7:5d7b:6dcd:0:f816:3eff:fe63:b012 |
| ee8907c7-0730-4051-8426-64be44300e70 | test_inject | ACTIVE | -          | Running     | private=10.0.0.14, fdb7:5d7b:6dcd:0:f816:3eff:fe4f:1b32 |
+--------------------------------------+-------------+--------+------------+-------------+---------------------------------------------------------+

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1552071/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1783565] Re: ServerGroupTestV21.test_evacuate_with_anti_affinity_no_valid_host intermittently fails with "Instance compute service state on host2 expected to be down, but it was
We don't seem to be hitting this in the gate anymore so I'm not sure if it's just rare now or if it's resolved some other way: http://status.openstack.org/elastic-recheck/#1783565 I'm marking invalid for now though. We can re-open if necessary. ** Changed in: nova Assignee: Zhenyu Zheng (zhengzhenyu) => (unassigned) ** Changed in: nova Status: In Progress => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1783565 Title: ServerGroupTestV21.test_evacuate_with_anti_affinity_no_valid_host intermittently fails with "Instance compute service state on host2 expected to be down, but it was up." Status in OpenStack Compute (nova): Invalid Bug description: http://logs.openstack.org/32/584032/5/check/nova-tox-functional- py35/7061ec1/job-output.txt.gz#_2018-07-25_03_16_46_462415 18-07-25 03:16:46.418499 | ubuntu-xenial | {5} nova.tests.functional.test_server_group.ServerGroupTestV21.test_evacuate_with_anti_affinity_no_valid_host [14.070214s] ... 
FAILED 2018-07-25 03:16:46.418582 | ubuntu-xenial | 2018-07-25 03:16:46.418645 | ubuntu-xenial | Captured traceback: 2018-07-25 03:16:46.418705 | ubuntu-xenial | ~~~ 2018-07-25 03:16:46.418798 | ubuntu-xenial | b'Traceback (most recent call last):' 2018-07-25 03:16:46.419095 | ubuntu-xenial | b' File "/home/zuul/src/git.openstack.org/openstack/nova/nova/tests/functional/test_server_group.py", line 456, in test_evacuate_with_anti_affinity_no_valid_host' 2018-07-25 03:16:46.419232 | ubuntu-xenial | b" self.admin_api.post_server_action(servers[1]['id'], post)" 2018-07-25 03:16:46.419471 | ubuntu-xenial | b' File "/home/zuul/src/git.openstack.org/openstack/nova/nova/tests/functional/api/client.py", line 294, in post_server_action' 2018-07-25 03:16:46.419602 | ubuntu-xenial | b"'/servers/%s/action' % server_id, data, **kwargs).body" 2018-07-25 03:16:46.419841 | ubuntu-xenial | b' File "/home/zuul/src/git.openstack.org/openstack/nova/nova/tests/functional/api/client.py", line 235, in api_post' 2018-07-25 03:16:46.419975 | ubuntu-xenial | b'return APIResponse(self.api_request(relative_uri, **kwargs))' 2018-07-25 03:16:46.420187 | ubuntu-xenial | b' File "/home/zuul/src/git.openstack.org/openstack/nova/nova/tests/functional/api/client.py", line 213, in api_request' 2018-07-25 03:16:46.420263 | ubuntu-xenial | b'response=response)' 2018-07-25 03:16:46.420545 | ubuntu-xenial | b'nova.tests.functional.api.client.OpenStackApiException: Unexpected status code: {"badRequest": {"message": "Compute service of host2 is still in use.", "code": 400}}' 2018-07-25 03:16:46.420581 | ubuntu-xenial | b'' 2018-07-25 03:16:46.420606 | ubuntu-xenial | 2018-07-25 03:16:46.420654 | ubuntu-xenial | Captured stderr: 2018-07-25 03:16:46.420702 | ubuntu-xenial | 2018-07-25 03:16:46.421102 | ubuntu-xenial | b'/home/zuul/src/git.openstack.org/openstack/nova/.tox/functional-py35/lib/python3.5/site-packages/oslo_db/sqlalchemy/enginefacade.py:350: OsloDBDeprecationWarning: EngineFacade is deprecated; 
please use oslo_db.sqlalchemy.enginefacade' 2018-07-25 03:16:46.421240 | ubuntu-xenial | b' self._legacy_facade = LegacyEngineFacade(None, _factory=self)' 2018-07-25 03:16:46.421623 | ubuntu-xenial | b'/home/zuul/src/git.openstack.org/openstack/nova/.tox/functional-py35/lib/python3.5/site-packages/oslo_db/sqlalchemy/enginefacade.py:350: OsloDBDeprecationWarning: EngineFacade is deprecated; please use oslo_db.sqlalchemy.enginefacade' 2018-07-25 03:16:46.421751 | ubuntu-xenial | b' self._legacy_facade = LegacyEngineFacade(None, _factory=self)' 2018-07-25 03:16:46.422054 | ubuntu-xenial | b"/home/zuul/src/git.openstack.org/openstack/nova/nova/test.py:323: DeprecationWarning: Using class 'MoxStubout' (either directly or via inheritance) is deprecated in version '3.5.0'" 2018-07-25 03:16:46.422174 | ubuntu-xenial | b' mox_fixture = self.useFixture(moxstubout.MoxStubout())' 2018-07-25 03:16:46.422537 | ubuntu-xenial | b'/home/zuul/src/git.openstack.org/openstack/nova/.tox/functional-py35/lib/python3.5/site-packages/paste/deploy/loadwsgi.py:22: DeprecationWarning: Parameters to load are deprecated. Call .resolve and .require separately.' 2018-07-25 03:16:46.422664 | ubuntu-xenial | b' return pkg_resources.EntryPoint.parse("x=" + s).load(False)' 2018-07-25 03:16:46.422928 | ubuntu-xenial | b"/home/zuul/src/git.openstack.org/openstack/nova/nova/db/sqlalchemy/api.py:205: DeprecationWarning: Property 'async_compat' has moved to 'function.async_'" 2018-07-25 03:16:46.423038 | ubuntu-xenial | b' reader_mode = get_context_manager(context).async' 2018-07-25 03:16:46.423301 | ubuntu-xenial |
[Yahoo-eng-team] [Bug 1846777] [NEW] Inefficient/redundant image GETs during large boot from volume server create requests with the same image
Public bug reported: This is demonstrated by this functional test patch: https://review.opendev.org/#/c/686734/ That adds a test which creates a single server create request to create 10 servers and each server has 255 BDMs using the same image and asserts that the API calls GET /v2/images/{image_id} on the same image 2551 times which is pretty inefficient. For the lifetime of the server create request we should be smarter and cache the results of each image we get so we don't make the same redundant calls to the image service. ** Affects: nova Importance: Low Status: Confirmed ** Tags: api performance ** Summary changed: - Inefficient image GET during large boot from volume server create requests + Inefficient/redundant image GETs during large boot from volume server create requests with the same image -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1846777 Title: Inefficient/redundant image GETs during large boot from volume server create requests with the same image Status in OpenStack Compute (nova): Confirmed Bug description: This is demonstrated by this functional test patch: https://review.opendev.org/#/c/686734/ That adds a test which creates a single server create request to create 10 servers and each server has 255 BDMs using the same image and asserts that the API calls GET /v2/images/{image_id} on the same image 2551 times which is pretty inefficient. For the lifetime of the server create request we should be smarter and cache the results of each image we get so we don't make the same redundant calls to the image service. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1846777/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
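The fix suggested above amounts to request-scoped memoization of image lookups. A minimal sketch (hypothetical helper, with a plain callable standing in for the image service client):

```python
def make_cached_image_getter(image_get):
    """Wrap an image-service lookup with a per-request cache.

    image_get(image_id) performs the real GET /v2/images/{image_id};
    the wrapper serves repeat lookups of the same image from a dict
    for the lifetime of one server create request. Illustrative only,
    not the actual nova change.
    """
    cache = {}

    def cached_get(image_id):
        if image_id not in cache:
            cache[image_id] = image_get(image_id)
        return cache[image_id]

    return cached_get
```

With this in place, the 255-BDM-per-server scenario in the bug would hit the image service once per distinct image instead of 2551 times.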
[Yahoo-eng-team] [Bug 1846656] [NEW] Compute API in nova - show/list servers with details says security_groups is required in response but it's optional
Public bug reported: - [x] This doc is inaccurate in this way: This came up in review: https://review.opendev.org/#/c/685927/2//COMMIT_MSG@9 https://docs.openstack.org/api-ref/compute/#show-server-details and https://docs.openstack.org/api-ref/compute/#list-servers-detailed response parameter tables both say that "security_groups" is a required field in the response but that's not true if the server does not have any attached ports which is possible. This is the server view builder code: https://github.com/openstack/nova/blob/867401e575d2b27b9bc63ceda41cd85233545cd5/nova/api/openstack/compute/views/servers.py#L627 Note the key is not in the GET response if the server is not attached to any ports that have security groups. I recreated in devstack by creating a server with no network: $ openstack --os-compute-api-version 2.37 server create --flavor m1.tiny --image cirros-0.4.0-x86_64-disk --nic none --wait vm-no-net And the security_groups key is not in the GET /servers/detail response: $ curl -H "X-Auth-Token: $token" http://10.128.0.6/compute/v2.1/servers/detail | python -m json.tool | grep security_groups % Total% Received % Xferd Average Speed TimeTime Time Current Dload Upload Total SpentLeft Speed 100 1388 100 13880 0 8213 0 --:--:-- --:--:-- --:--:-- 8213 --- Release: on 2019-09-19 17:55:19 SHA: 9ca14e081860b1abcc0d676f253a472028690e29 Source: https://opendev.org/openstack/nova/src/api-ref/source/index.rst URL: https://docs.openstack.org/api-ref/compute/ ** Affects: nova Importance: Low Status: Triaged ** Tags: api-ref doc low-hanging-fruit -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). 
https://bugs.launchpad.net/bugs/1846656 Title: Compute API in nova - show/list servers with details says security_groups is required in response but it's optional Status in OpenStack Compute (nova): Triaged Bug description: - [x] This doc is inaccurate in this way: This came up in review: https://review.opendev.org/#/c/685927/2//COMMIT_MSG@9 https://docs.openstack.org/api-ref/compute/#show-server-details and https://docs.openstack.org/api-ref/compute/#list-servers-detailed response parameter tables both say that "security_groups" is a required field in the response but that's not true if the server does not have any attached ports which is possible. This is the server view builder code: https://github.com/openstack/nova/blob/867401e575d2b27b9bc63ceda41cd85233545cd5/nova/api/openstack/compute/views/servers.py#L627 Note the key is not in the GET response if the server is not attached to any ports that have security groups. I recreated in devstack by creating a server with no network: $ openstack --os-compute-api-version 2.37 server create --flavor m1.tiny --image cirros-0.4.0-x86_64-disk --nic none --wait vm-no-net And the security_groups key is not in the GET /servers/detail response: $ curl -H "X-Auth-Token: $token" http://10.128.0.6/compute/v2.1/servers/detail | python -m json.tool | grep security_groups % Total% Received % Xferd Average Speed TimeTime Time Current Dload Upload Total SpentLeft Speed 100 1388 100 13880 0 8213 0 --:--:-- --:--:-- --:--:-- 8213 --- Release: on 2019-09-19 17:55:19 SHA: 9ca14e081860b1abcc0d676f253a472028690e29 Source: https://opendev.org/openstack/nova/src/api-ref/source/index.rst URL: https://docs.openstack.org/api-ref/compute/ To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1846656/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
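Given the behavior described in this report, clients consuming server-details responses should treat "security_groups" as optional. A minimal illustration, with made-up response bodies:

```python
# Illustrative only: hypothetical server-details response bodies showing
# that "security_groups" may be absent when the server has no ports.
server_with_ports = {'id': 'a', 'security_groups': [{'name': 'default'}]}
server_no_net = {'id': 'b'}  # booted with --nic none; key omitted entirely


def security_group_names(server):
    # .get() with a default avoids a KeyError on networkless servers
    return [sg['name'] for sg in server.get('security_groups', [])]
```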
[Yahoo-eng-team] [Bug 1846559] [NEW] Handling Down Cells in nova - security_groups can be in the response for GET /servers/detail
Public bug reported: - [x] This doc is inaccurate in this way: This came up during a review to remove nova-net usage from functional tests and enhance the neutron fixture used in those tests: https://review.opendev.org/#/c/685927/2/nova/tests/functional/test_servers.py@1264 In summary, GET /servers/detail responses for servers in a down cell may include a "security_groups" key because the API proxies that information from neutron only using the server id (the neutron security group driver finds the ports from that server id and the security groups from the ports). None of the security group information about a server, when using neutron, is cached with the server in the cell database unlike the port information (VIFs i.e. instance.info_cache.network_info). As a result, the doc is wrong for the keys it says can be returned from a GET /servers/detail response in a down cell scenario since it doesn't include 'security_groups'. The linked patch above shows that with the changed sample: https://review.opendev.org/#/c/685927/2/doc/api_samples/servers/v2.69 /servers-details-resp.json Also note that this is not the same for the GET /servers/{server_id} (show) case because that returns from the view builder here: https://github.com/openstack/nova/blob/867401e575d2b27b9bc63ceda41cd85233545cd5/nova/api/openstack/compute/views/servers.py#L210 without including any security group information. Note that fixing the API to be consistent between show and detail would require a microversion and is likely not worth a new microversion of that, a user can get security group information from the networking API directly with something like this: GET /v2.0/ports?device_id==security_groups And from the ports response the client can get the security groups by id. This bug is just to update the down cell API guide docs. 
--- Release: 19.1.0.dev1588 on 2019-09-24 00:12:44 SHA: 2b15e162546ff5aa6458b2d1b2422a775e92b785 Source: https://opendev.org/openstack/nova/src/api-guide/source/down_cells.rst URL: https://docs.openstack.org/api-guide/compute/down_cells.html ** Affects: nova Importance: Medium Status: Confirmed ** Tags: api-guide cells doc -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1846559 Title: Handling Down Cells in nova - security_groups can be in the response for GET /servers/detail Status in OpenStack Compute (nova): Confirmed Bug description: - [x] This doc is inaccurate in this way: This came up during a review to remove nova-net usage from functional tests and enhance the neutron fixture used in those tests: https://review.opendev.org/#/c/685927/2/nova/tests/functional/test_servers.py@1264 In summary, GET /servers/detail responses for servers in a down cell may include a "security_groups" key because the API proxies that information from neutron only using the server id (the neutron security group driver finds the ports from that server id and the security groups from the ports). None of the security group information about a server, when using neutron, is cached with the server in the cell database unlike the port information (VIFs i.e. instance.info_cache.network_info). As a result, the doc is wrong for the keys it says can be returned from a GET /servers/detail response in a down cell scenario since it doesn't include 'security_groups'. 
The linked patch above shows that with the changed sample: https://review.opendev.org/#/c/685927/2/doc/api_samples/servers/v2.69 /servers-details-resp.json Also note that this is not the same for the GET /servers/{server_id} (show) case because that returns from the view builder here: https://github.com/openstack/nova/blob/867401e575d2b27b9bc63ceda41cd85233545cd5/nova/api/openstack/compute/views/servers.py#L210 without including any security group information. Note that fixing the API to be consistent between show and detail would require a microversion and is likely not worth a new microversion of that, a user can get security group information from the networking API directly with something like this: GET /v2.0/ports?device_id==security_groups And from the ports response the client can get the security groups by id. This bug is just to update the down cell API guide docs. --- Release: 19.1.0.dev1588 on 2019-09-24 00:12:44 SHA: 2b15e162546ff5aa6458b2d1b2422a775e92b785 Source: https://opendev.org/openstack/nova/src/api-guide/source/down_cells.rst URL: https://docs.openstack.org/api-guide/compute/down_cells.html To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1846559/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team
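The networking-API workaround mentioned in this report can be sketched as follows. The exact query string in the report is garbled in this archive, so this sketch assumes Neutron's standard `device_id` filter and `fields` selector; `ports_security_groups_url` is a hypothetical helper:

```python
from urllib.parse import urlencode

# Sketch of the direct networking-API lookup described above: filter
# ports by the server's id and select only their security groups.
# Assumes Neutron's standard device_id filter and fields query parameter;
# the helper name is made up for illustration.
def ports_security_groups_url(server_id):
    query = urlencode({'device_id': server_id, 'fields': 'security_groups'})
    return '/v2.0/ports?' + query
```

From the ports in that response the client can then look up each security group by id, as the report notes.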
[Yahoo-eng-team] [Bug 1846532] [NEW] Confusing error message when volume create fails
Public bug reported:

Method `nova.volume.cinder.API#create` accepts `size` as its 3rd argument, but in the wrapper `nova.volume.cinder.translate_volume_exception`, the 3rd parameter is volume_id.

If we hit a cinder exception when creating volumes, with a response body like the one below:

```
{"itemNotFound": {"message": "Volume type with name xxx could not be found.", "code": 404}}
```

we may get an exception in the nova-compute log like this:

```
BuildAbortException: Build of instance xxx aborted: Volume 40 could not be found.
```

Actually, `40` is the volume size, not a volume id. This can be misleading.

** Affects: nova Importance: Medium Assignee: Fan Zhang (fanzhang) Status: In Progress ** Affects: nova/queens Importance: Low Status: Confirmed ** Affects: nova/rocky Importance: Low Status: Confirmed ** Affects: nova/stein Importance: Low Status: Confirmed ** Affects: nova/train Importance: Low Status: Confirmed ** Tags: serviceability volumes ** Also affects: nova/queens Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Also affects: nova/train Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1846532

Title: Confusing error message when volume create fails

Status in OpenStack Compute (nova): In Progress
Status in OpenStack Compute (nova) queens series: Confirmed
Status in OpenStack Compute (nova) rocky series: Confirmed
Status in OpenStack Compute (nova) stein series: Confirmed
Status in OpenStack Compute (nova) train series: Confirmed

Bug description: Method `nova.volume.cinder.API#create` accepts `size` as its 3rd argument, but in the wrapper `nova.volume.cinder.translate_volume_exception`, the 3rd parameter is volume_id.

If we hit a cinder exception when creating volumes, with a response body like the one below:

```
{"itemNotFound": {"message": "Volume type with name xxx could not be found.", "code": 404}}
```

we may get an exception in the nova-compute log like this:

```
BuildAbortException: Build of instance xxx aborted: Volume 40 could not be found.
```

Actually, `40` is the volume size, not a volume id. This can be misleading.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1846532/+subscriptions

-- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
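A minimal sketch of the argument mismatch described in this report. The decorator and class here are simplified stand-ins for nova.volume.cinder's real code, not the actual implementation:

```python
import functools

# Hypothetical sketch of the mismatch described above: a decorator that
# assumes its wrapped function's 3rd positional argument is a volume id,
# applied to create(), whose 3rd argument is actually the size in GiB.
# This is a simplified stand-in, not nova's actual code.
def translate_volume_exception(method):
    @functools.wraps(method)
    def wrapper(self, context, volume_id, *args, **kwargs):
        try:
            return method(self, context, volume_id, *args, **kwargs)
        except KeyError:
            # mislabels whatever was in the 3rd slot as a volume id
            raise RuntimeError('Volume %s could not be found.' % volume_id)
    return wrapper


class API:
    @translate_volume_exception
    def create(self, context, size, name=None):
        # simulate cinder rejecting the request (e.g. unknown volume type)
        raise KeyError('itemNotFound')


try:
    API().create(None, 40, name='vol1')
except RuntimeError as e:
    msg = str(e)
# the error message embeds the size (40) where a volume id belongs
```

The fix direction implied by the report is to only apply such a translation wrapper to methods whose 3rd argument really is a volume id, or to pass the id explicitly.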
[Yahoo-eng-team] [Bug 1846527] [NEW] online_data_migrations docs don't mention using --config-file to run the migrations per cell db
Public bug reported: This came up in the mailing list while answering some questions about when/how various cells v2 and database related commands get run: http://lists.openstack.org/pipermail/openstack- discuss/2019-October/009937.html Recent change https://review.opendev.org/#/c/671298/ was added to the upgrade guide to mention that you can use the --config-file option with the nova-manage db sync command to migrate the cell database schema per cell database, in most cases that being cell0 and cell1. The same is true for the online_data_migrations command since that does data migrations for both the API DB and cell DB, and you would need to run it per cell DB using the --config-file option with a config file whose [database]/connection is configured for a given cell, e.g. cell0 or cell1. So I think the CLI guide should probably be updated for nova-manage and the upgrades guide like in https://review.opendev.org/#/c/671298/. For the CLI guide, it might be useful to just have a generic section about using --config-file per cell database for commands that require a cell database but don't have a kind of --all-cells option like the archive_deleted_rows and purge commands. ** Affects: nova Importance: Undecided Status: New ** Tags: cells doc nova-manage upgrade ** Tags added: upgrade -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). 
https://bugs.launchpad.net/bugs/1846527 Title: online_data_migrations docs don't mention using --config-file to run the migrations per cell db Status in OpenStack Compute (nova): New Bug description: This came up in the mailing list while answering some questions about when/how various cells v2 and database related commands get run: http://lists.openstack.org/pipermail/openstack- discuss/2019-October/009937.html Recent change https://review.opendev.org/#/c/671298/ was added to the upgrade guide to mention that you can use the --config-file option with the nova-manage db sync command to migrate the cell database schema per cell database, in most cases that being cell0 and cell1. The same is true for the online_data_migrations command since that does data migrations for both the API DB and cell DB, and you would need to run it per cell DB using the --config-file option with a config file whose [database]/connection is configured for a given cell, e.g. cell0 or cell1. So I think the CLI guide should probably be updated for nova-manage and the upgrades guide like in https://review.opendev.org/#/c/671298/. For the CLI guide, it might be useful to just have a generic section about using --config-file per cell database for commands that require a cell database but don't have a kind of --all-cells option like the archive_deleted_rows and purge commands. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1846527/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
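A sketch of what such a doc update might show, assuming the conventional cell0/cell1 layout; the config file paths are illustrative and deployment-specific:

```shell
# Illustrative only: run schema and data migrations against each cell
# database by pointing --config-file at a config whose
# [database]/connection targets that cell.
nova-manage --config-file /etc/nova/cell0.conf db sync
nova-manage --config-file /etc/nova/cell0.conf db online_data_migrations
nova-manage --config-file /etc/nova/cell1.conf db sync
nova-manage --config-file /etc/nova/cell1.conf db online_data_migrations
```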
[Yahoo-eng-team] [Bug 1846401] Re: console proxy deployment info was removed from cells v2 layout doc
** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => In Progress ** Changed in: nova/train Importance: Undecided => Low ** Changed in: nova/train Assignee: (unassigned) => Matt Riedemann (mriedem) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1846401 Title: console proxy deployment info was removed from cells v2 layout doc Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: In Progress Bug description: The information about how console proxies need to be deployed in a multi-cell deployment was mistakenly removed in the following commit as part of nova-consoleauth service docs removal: https://github.com/openstack/nova/commit/009fd0f35bcb88acc80f12e69d5fb72c0ee5391f #diff-236824986276093f57fa8ba4d3639e68L322 We need to restore the general information for console proxies using the database for storing token authorizations. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1846401/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1846262] [NEW] Failed resize claim leaves otherwise active instance in ERROR state
Public bug reported: I noticed this while working on a functional test to recreate a bug during resize reschedule: https://review.opendev.org/#/c/686017/

And discussed a bit in IRC: http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-10-01.log.html#t2019-10-01T16:33:27

The issue is that we can start a resize (or cold migration) of a stopped or active (normally active) server and fail a resize claim in the compute service due to some race issue or for resource claims that are not handled by placement yet, like NUMA and PCI devices: https://github.com/openstack/nova/blob/4d18b29c95e3862c68ab41a4c090eb30c32a037a/nova/compute/manager.py#L4527

That ResourceTracker.resize_claim can raise ComputeResourcesUnavailable which is handled here: https://github.com/openstack/nova/blob/4d18b29c95e3862c68ab41a4c090eb30c32a037a/nova/compute/manager.py#L4610

We may try to reschedule but if rescheduling fails, or we don't reschedule, the instance is set to error state by this context manager: https://github.com/openstack/nova/blob/4d18b29c95e3862c68ab41a4c090eb30c32a037a/nova/compute/manager.py#L4592

That will set the instance vm_state to error: https://github.com/openstack/nova/blob/4d18b29c95e3862c68ab41a4c090eb30c32a037a/nova/compute/manager.py#L8809

If we failed a resize claim, there is actually no change in the guest, just like if we failed a cold migration because the scheduler selected the same host and the virt driver does not support that, see: https://github.com/openstack/nova/blob/4d18b29c95e3862c68ab41a4c090eb30c32a037a/nova/compute/manager.py#L4489

If _prep_resize raises InstanceFaultRollback, _error_out_instance_on_exception will handle it differently since https://review.opendev.org/#/c/633212/ and not put the instance into ERROR state, but instead revert the vm_state to its previous value (active or stopped).
If the guest is not changed I don't think the instance should be in ERROR status because of a resize claim failure, but opinions on that differ, e.g.: (11:40:45 AM) mriedem: dansmith: ok, but still, the user shouldn't have to stop and then start to get out of that, or hard reboot, when the thing that failed is a resize claim race (11:41:03 AM) dansmith: mriedem: so maybe it's just stop I'm thinking of.. anyway, I dunno.. it's very annoying as a user to do something, come back later and have it not obvious that the thing has happened, or failed or whatever (11:41:52 AM) dansmith: mriedem: if you're going to retry the operation for them, I agree. if you're not, then being super obvious about what has happened is best, IMHO If we aren't going to automatically handle the resize claim failure and not set the instance to error state, then we should at least have something in the API reference documentation about post-conditions for resize and cold migrate actions such that if the instance is in ERROR state and there is a fault for the resize claim failure, the user can stop/start or hard reboot the server to reset its status. I do think we have some precedence in handling non-error conditions like this though since https://review.opendev.org/#/c/633227/. This is latent behavior so I'm going to mark it low priority but I wanted to make sure we have a bug reported for it. ** Affects: nova Importance: Low Status: Triaged ** Tags: resize -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). 
https://bugs.launchpad.net/bugs/1846262 Title: Failed resize claim leaves otherwise active instance in ERROR state Status in OpenStack Compute (nova): Triaged Bug description: I noticed this while working on a functional test to recreate a bug during resize reschedule: https://review.opendev.org/#/c/686017/ And discussed a bit in IRC: http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack- nova.2019-10-01.log.html#t2019-10-01T16:33:27 The issue is that we can start a resize (or cold migration) of a stopped or active (normally active) server and fail a resize claim in the compute service due to some race issue or for resource claims that are not handled by placement yet, like NUMA and PCI devices: https://github.com/openstack/nova/blob/4d18b29c95e3862c68ab41a4c090eb30c32a037a/nova/compute/manager.py#L4527 That ResourceTracker.resize_claim can raise ComputeResourcesUnavailable which is handled here: https://github.com/openstack/nova/blob/4d18b29c95e3862c68ab41a4c090eb30c32a037a/nova/compute/manager.py#L4610 We may try to reschedule but if rescheduling fails, or we don't reschedule, the instance is set to error state by this context manager: https://github.com/openstack/nova/blob/4d18b29c95e3862c68ab41a4c090eb30c32a037a/nova/compute/manager.py#L4592 That will set the instance vm_state to error: https://github.com/openstack/nova/blob/4d18b29c95e3862c68ab41a4c090eb30c32a037a/nova/compute/manager.py#L8809
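A rough sketch of the behavior discussed in this report, with names simplified from nova.compute.manager (this is not the actual implementation): a generic failure in the guarded block puts the instance into ERROR, while InstanceFaultRollback restores the previous vm_state per https://review.opendev.org/#/c/633212/:

```python
from contextlib import contextmanager

# Simplified stand-ins for nova.compute.manager internals; the instance
# is modeled as a plain dict for illustration.
class InstanceFaultRollback(Exception):
    def __init__(self, inner_exception):
        self.inner_exception = inner_exception


@contextmanager
def error_out_instance_on_exception(instance):
    try:
        yield
    except InstanceFaultRollback as error:
        # special-cased: revert to the previous vm_state, not ERROR
        instance['vm_state'] = instance['previous_vm_state']
        raise error.inner_exception
    except Exception:
        # any other failure (e.g. a resize claim race) lands in ERROR
        instance['vm_state'] = 'error'
        raise


inst = {'vm_state': 'active', 'previous_vm_state': 'active'}
try:
    with error_out_instance_on_exception(inst):
        raise RuntimeError('resize claim failed')
except RuntimeError:
    pass
# inst['vm_state'] is now 'error' even though the guest never changed
```

The report's argument is essentially that a resize claim failure should take the InstanceFaultRollback-style path, since the guest itself is untouched.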
[Yahoo-eng-team] [Bug 1781286] Re: CantStartEngineError in cell conductor during reschedule - get_host_availability_zone up-call
Note for backports: this problem goes back to Pike but we won't be able to backport the fix since it's going to require RPC API version changes.

** No longer affects: nova/pike ** No longer affects: nova/queens ** Changed in: nova Assignee: (unassigned) => Matt Riedemann (mriedem)

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1781286

Title: CantStartEngineError in cell conductor during reschedule - get_host_availability_zone up-call

Status in OpenStack Compute (nova): Triaged

Bug description: In a stable/queens devstack environment with multiple PowerVM compute nodes, every time I see this in the devstack@n-cond-cell1.service logs:

Jul 11 15:48:57 myhostname nova-conductor[3796]: DEBUG nova.conductor.manager [None req-af22375c-f920-4747-bd2f-0de80ee69465 admin admin] Rescheduling: True {{(pid=4108) build_instances /opt/stack/nova/nova/conductor/manager.py:571}}

it is shortly thereafter followed by:

Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server [None req-af22375c-f920-4747-bd2f-0de80ee69465 admin admin] Exception during message handling: CantStartEngineError: No sql_connection parameter is established
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/opt/stack/nova/nova/conductor/manager.py", line 652, in build_instances
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     host.service_host))
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/opt/stack/nova/nova/availability_zones.py", line 95, in get_host_availability_zone
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     key='availability_zone')
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     result = fn(cls, context, *args, **kwargs)
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/opt/stack/nova/nova/objects/aggregate.py", line 541, in get_by_host
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     _get_by_host_from_db(context, host, key=key)]
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 987, in wrapper
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     with self._transaction_scope(context):
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     return self.gen.next()
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 1037, in _transaction_scope
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     context=context) as resource:
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     return self.gen.next()
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-package
[Yahoo-eng-team] [Bug 1846045] [NEW] Docs don't mention running console proxies per cell
Public bug reported: This came up in the mailing list today: http://lists.openstack.org/pipermail/openstack-discuss/2019-September/009827.html

It's not immediately obvious that console proxy services should be run per-cell rather than globally.

One would expect to see something about that here: https://docs.openstack.org/nova/latest/user/cellsv2-layout.html and/or here: https://docs.openstack.org/nova/latest/admin/remote-console-access.html or even in the cells FAQs page: https://docs.openstack.org/nova/latest/user/cells.html#faqs

There was a lot of confusion over the deprecation of the nova-consoleauth service in Rocky and several release notes and workarounds for that: https://docs.openstack.org/nova/stein/configuration/config.html#workarounds.enable_consoleauth

** Affects: nova Importance: Undecided Status: New ** Tags: cells console doc

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1846045

Title: Docs don't mention running console proxies per cell

Status in OpenStack Compute (nova): New

Bug description: This came up in the mailing list today: http://lists.openstack.org/pipermail/openstack-discuss/2019-September/009827.html

It's not immediately obvious that console proxy services should be run per-cell rather than globally.

One would expect to see something about that here: https://docs.openstack.org/nova/latest/user/cellsv2-layout.html and/or here: https://docs.openstack.org/nova/latest/admin/remote-console-access.html or even in the cells FAQs page: https://docs.openstack.org/nova/latest/user/cells.html#faqs

There was a lot of confusion over the deprecation of the nova-consoleauth service in Rocky and several release notes and workarounds for that: https://docs.openstack.org/nova/stein/configuration/config.html#workarounds.enable_consoleauth

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1846045/+subscriptions

-- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
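A hedged sketch of the kind of per-cell configuration such docs could show: a console proxy run for a given cell points its database connection at that cell's database so it can look up stored token authorizations. The connection URL is purely illustrative:

```ini
# Illustrative only: a console proxy (e.g. nova-novncproxy) serving
# cell1 configured with that cell's database, where console token
# authorizations are stored. Hostnames and credentials are made up.
[database]
connection = mysql+pymysql://nova:SECRET@db-host/nova_cell1
```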
[Yahoo-eng-team] [Bug 1845986] Re: SEV does not enable IOMMU on SCSI controller
** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => In Progress ** Changed in: nova/train Assignee: (unassigned) => Boris Bobrov (bbobrov)

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1845986

Title: SEV does not enable IOMMU on SCSI controller

Status in OpenStack Compute (nova): In Progress
Status in OpenStack Compute (nova) train series: In Progress

Bug description: https://review.opendev.org/#/c/644565/ added logic to libvirt/designer.py for enabling iommu for certain devices where virtio is used. This is required for AMD SEV[0]. However it missed the case of a SCSI controller where the model is virtio-scsi, e.g.: As with other virtio devices, here a child element needs to be added to the config when SEV is enabled:

[0] http://specs.openstack.org/openstack/nova-specs/specs/train/approved/amd-sev-libvirt-support.html#proposed-change

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1845986/+subscriptions

-- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
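The XML examples appear to have been stripped from this archived message. A hedged sketch of the missing case, assuming the same iommu driver pattern libvirt uses for other virtio devices (the index attribute is illustrative):

```xml
<!-- Hedged sketch: with SEV enabled, the virtio-scsi controller also
     needs a driver child element with iommu turned on, as other virtio
     devices already get. -->
<controller type='scsi' index='0' model='virtio-scsi'>
  <driver iommu='on'/>
</controller>
```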
[Yahoo-eng-team] [Bug 1845905] Re: vpmem - libvirt.libvirtError: XML error: Invalid value for element or attribute 'maxMemory'
** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Assignee: (unassigned) => Dan Smith (danms) ** Changed in: nova/train Status: New => In Progress ** Changed in: nova/train Importance: Undecided => High ** Changed in: nova Importance: Undecided => High

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1845905

Title: vpmem - libvirt.libvirtError: XML error: Invalid value for element or attribute 'maxMemory'

Status in OpenStack Compute (nova): In Progress
Status in OpenStack Compute (nova) train series: In Progress

Bug description: In Python 3, the division operator returns a floating point value. This resulted in an invalid value for the 'maxMemory' entry of the libvirt domain XML, which expects an integer.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1845905/+subscriptions

-- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
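The failure mode can be illustrated in a few lines; the KiB value and the attribute strings here are made up for illustration, not taken from nova:

```python
# Python 3's "/" always returns a float, which renders as e.g.
# "2097152.0" when embedded into generated XML, while libvirt's
# maxMemory expects an integer. "//" (floor division) keeps an int.
total_kib = 4194304  # illustrative value

bad = total_kib / 2    # float division -> 2097152.0
good = total_kib // 2  # floor division -> 2097152

bad_attr = "<maxMemory unit='KiB'>%s</maxMemory>" % bad    # embeds "2097152.0"
good_attr = "<maxMemory unit='KiB'>%s</maxMemory>" % good  # embeds "2097152"
```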
[Yahoo-eng-team] [Bug 1845146] Re: NUMA aware live migration failed when vCPU pin set
** Also affects: nova/train Importance: High Assignee: Artom Lifshitz (notartom) Status: In Progress ** No longer affects: nova/train ** Also affects: nova/train Importance: High Assignee: Artom Lifshitz (notartom) Status: In Progress ** Changed in: nova/train Assignee: Artom Lifshitz (notartom) => Dan Smith (danms)

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1845146

Title: NUMA aware live migration failed when vCPU pin set

Status in OpenStack Compute (nova): In Progress
Status in OpenStack Compute (nova) train series: In Progress

Bug description:

Description
===========
When the vCPU pin policy is dedicated, NUMA aware live migration may fail.

Steps to reproduce
==================
1. Create two flavors: 2c2g.numa and 4c.4g.numa

(venv) [root@t1 ~]# openstack flavor show 2c2g.numa
| Field                      | Value |
| OS-FLV-DISABLED:disabled   | False |
| OS-FLV-EXT-DATA:ephemeral  | 0 |
| access_project_ids         | None |
| disk                       | 1 |
| id                         | b4a2df98-82c5-4a53-8ba5-4372f20a98bd |
| name                       | 2c2g.numa |
| os-flavor-access:is_public | True |
| properties                 | hw:cpu_policy='dedicated', hw:numa_cpus.0='0', hw:numa_cpus.1='1', hw:numa_mem.0='1024', hw:numa_mem.1='1024', hw:numa_nodes='2' |
| ram                        | 2048 |
| rxtx_factor                | 1.0 |
| swap                       | |
| vcpus                      | 2 |

(venv) [root@t1 ~]# openstack flavor show 4c.4g.numa
| Field                      | Value |
| OS-FLV-DISABLED:disabled   | False |
| OS-FLV-EXT-DATA:ephemeral  | 0 |
| access_project_ids         | None |
| disk                       | 1 |
| id                         | cf53f5ea-c036-4a79-8183-6a2389212d02
[Yahoo-eng-team] [Bug 1845243] Re: Nested 'path' query param in console URL breaks serialproxy
** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => Confirmed ** Changed in: nova/train Importance: Undecided => High -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1845243 Title: Nested 'path' query param in console URL breaks serialproxy Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) rocky series: Confirmed Status in OpenStack Compute (nova) stein series: Confirmed Status in OpenStack Compute (nova) train series: Confirmed Bug description: Description === Change I2ddf0f4d768b698e980594dd67206464a9cea37b changed all console URLs to have the token attached as a nested query parameter inside an outer "path" query parameter, e.g. "?path=?token=***". While this was necessary for NoVNC support, it appears to have broken Ironic serial consoles, which use the nova-serialproxy service, which apparently is not aware that it needs to parse the token in this manner. It uses websockify. To test, I enabled debug mode and added some extra logging in the nova-serialproxy to prove that "token" was empty in this function: https://github.com/openstack/nova/blob/stable/rocky/nova/objects/console_auth_token.py#L143 Steps to reproduce == 1. Have Ironic set up to allow web/serial consoles (https://docs.openstack.org/ironic/pike/admin/console.html). I believe this also requires having nova-serialproxy deployed. 2. Launch an Ironic instance and attempt to access the console via Horizon. Expected result === The serial console loads in the web interface; "Status: Opened" is displayed in the bottom. Console is interactive assuming the node has booted properly. Actual result = The serial console loads, but is blank; "Status: Closed" is displayed in the bottom. nova-serialproxy logs indicate the token was expired or invalid. 
The console never becomes interactive, but does not indicate there is an error in Horizon (at least on my deployment.) Environment === OpenStack Rocky release, deployed with Kolla-Ansible. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1845243/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
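As the report notes, a token wrapped as "?path=?token=***" is invisible to anything that only parses the outer query string. A small standalone illustration of the required two-level parse (this is not nova-serialproxy's actual code; the URL and token are made up):

```python
from urllib.parse import parse_qs, urlparse

# A console URL whose token is nested inside the "path" parameter,
# i.e. "?path=?token=..." with the inner "?" and "=" percent-encoded.
url = "ws://proxy.example:6083/?path=%3Ftoken%3Dsecret123"

outer = parse_qs(urlparse(url).query)
assert "token" not in outer          # a naive proxy finds no token at all

# The token only appears after parsing the *value* of "path" a second time:
inner = parse_qs(outer["path"][0].lstrip("?"))
assert inner["token"] == ["secret123"]
```

This is why websockify-based proxies that only inspect the top-level query string treat the request as unauthenticated.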
[Yahoo-eng-team] [Bug 1841481] Re: Race during ironic re-balance corrupts local RT ProviderTree and compute_nodes cache
** Also affects: nova/queens Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1841481 Title: Race during ironic re-balance corrupts local RT ProviderTree and compute_nodes cache Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) queens series: New Status in OpenStack Compute (nova) rocky series: New Status in OpenStack Compute (nova) stein series: New Bug description: Seen with an ironic re-balance in this job: https://d01b2e57f0a56cb7edf0-b6bc206936c08bb07a5f77cfa916a2d4.ssl.cf5.rackcdn.com/678298/4/check /ironic-tempest-ipa-wholedisk-direct-tinyipa-multinode/92c65ac/ On the subnode we see the RT detect that the node is moving hosts: Aug 26 18:41:38.818412 ubuntu-bionic-rax-ord-0010443319 nova- compute[747]: INFO nova.compute.resource_tracker [None req-a894abee- a2f1-4423-8ede-2a1b9eef28a4 None None] ComputeNode 61dbc9c7-828b-4c42 -b19c-a3716037965f moving from ubuntu-bionic-rax-ord-0010443317 to ubuntu-bionic-rax-ord-0010443319 On that new host, the ProviderTree cache is getting updated with refreshed associations for inventory: Aug 26 18:41:38.881026 ubuntu-bionic-rax-ord-0010443319 nova- compute[747]: DEBUG nova.scheduler.client.report [None req-a894abee- a2f1-4423-8ede-2a1b9eef28a4 None None] Refreshing inventories for resource provider 61dbc9c7-828b-4c42-b19c-a3716037965f {{(pid=747) _refresh_associations /opt/stack/nova/nova/scheduler/client/report.py:761}} aggregates: Aug 26 18:41:38.953685 ubuntu-bionic-rax-ord-0010443319 nova- compute[747]: DEBUG nova.scheduler.client.report [None req-a894abee- a2f1-4423-8ede-2a1b9eef28a4 None None] Refreshing aggregate associations for resource provider 61dbc9c7-828b-4c42-b19c- a3716037965f, aggregates: None 
{{(pid=747) _refresh_associations /opt/stack/nova/nova/scheduler/client/report.py:770}} and traits - but when we get traits the provider is gone: Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager [None req-a894abee-a2f1-4423-8ede-2a1b9eef28a4 None None] Error updating resources for node 61dbc9c7-828b-4c42-b19c-a3716037965f.: ResourceProviderTraitRetrievalFailed: Failed to get traits for resource provider with UUID 61dbc9c7-828b-4c42-b19c-a3716037965f Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager Traceback (most recent call last): Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 8250, in _update_available_resource_for_node Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager startup=startup) Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 715, in update_available_resource Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager self._update_available_resource(context, resources, startup=startup) Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 328, in inner Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager return f(*args, **kwargs) Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 738, in _update_available_resource Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager is_new_compute_node = 
self._init_compute_node(context, resources) Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 561, in _init_compute_node Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager if self._check_for_nodes_rebalance(context, resources, nodename): Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 516, in _check_for_nodes_rebalance Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager self._update(context, cn) Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR
[Yahoo-eng-team] [Bug 1845243] Re: Nested 'path' query param in console URL breaks serialproxy
I know tempest has a novnc console test, I wonder if the same is possible for ironic serial consoles in ironic CI testing so we could avoid these types of regressions in the future? ** Also affects: nova/rocky Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Tags added: console ironic ** Changed in: nova Status: New => Confirmed ** Changed in: nova/rocky Importance: Undecided => High ** Changed in: nova/stein Status: New => Confirmed ** Changed in: nova Importance: Undecided => High ** Tags added: regression ** Changed in: nova/rocky Status: New => Confirmed ** Changed in: nova/stein Importance: Undecided => High -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1845243 Title: Nested 'path' query param in console URL breaks serialproxy Status in OpenStack Compute (nova): Confirmed Status in OpenStack Compute (nova) rocky series: Confirmed Status in OpenStack Compute (nova) stein series: Confirmed To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1845243/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1845291] Re: migration is not rescheduled if the server originally booted with --availability-zone <az>:<host>
This goes back to Newton: https://github.com/openstack/nova/commit/76dfb4ba9fa0fed1350021591956c4e8143b1ce9 ** Changed in: nova Status: New => In Progress ** Changed in: nova Importance: Undecided => Medium ** Also affects: nova/stein Importance: Undecided Status: New ** Also affects: nova/queens Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New ** Also affects: nova/ocata Importance: Undecided Status: New ** Also affects: nova/pike Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1845291 Title: migration is not rescheduled if the server originally booted with --availability-zone <az>:<host> Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) ocata series: New Status in OpenStack Compute (nova) pike series: New Status in OpenStack Compute (nova) queens series: New Status in OpenStack Compute (nova) rocky series: New Status in OpenStack Compute (nova) stein series: New Bug description: Steps to reproduce == 1) Boot a server with --availability-zone <az>:<host>. This will force nova to boot the server on the given host. 2) Try to migrate the server in a situation where the first destination host selected by the scheduler will fail (e.g. move_claim fails) but there are alternate hosts that could support the migration. Expected result === Migration is re-scheduled after the first failure and can succeed on an alternate destination. Actual result = Nova does not try to re-schedule the migration after the first failure. Server goes to ERROR state. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1845291/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1845148] Re: OpenStack Compute (nova) in nova
Do you have the logs? Are there specific errors in the scheduler or conductor logs about NoValidHost? You can trace a request through the logs by the request ID, which is something like "req-", so trace a request and see why the scheduler is filtering out all hosts. I'm closing this as invalid since it's a support request. ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1845148 Title: OpenStack Compute (nova) in nova Status in OpenStack Compute (nova): Invalid Bug description: In OpenStack Stein, I create a new instance but it gives me an error: Exhausted all hosts available for retrying build failures for instance 07e367e0-0a9c-4e6e-b08c-e03cdc54cec4. How could I fix it? To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1845148/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1694844] Re: Boot from volume fails when cross_az_attach=False and volume is provided to nova without an AZ for the instance
** No longer affects: nova/ocata -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1694844 Title: Boot from volume fails when cross_az_attach=False and volume is provided to nova without an AZ for the instance Status in OpenStack Compute (nova): In Progress Bug description: This was recreated with a devstack change: http://logs.openstack.org/74/467674/4/check/gate-tempest-dsvm-neutron- full-ubuntu- xenial/3dbd6e9/logs/screen-n-api.txt.gz#_May_26_02_41_54_584798 In this failing test, Tempest creates a volume: {"volume": {"status": "creating", "user_id": "2256bb66db8741aab58a20367b00bfa2", "attachments": [], "links": [{"href": "https://10.39.38.35:8776/v2/272882ba896341d483982dbcb1fde0f4/volumes /55a7c64a-f7b2-4b77-8f60-c1ccda8e0c30", "rel": "self"}, {"href": "https://10.39.38.35:8776/272882ba896341d483982dbcb1fde0f4/volumes /55a7c64a-f7b2-4b77-8f60-c1ccda8e0c30", "rel": "bookmark"}], "availability_zone": "nova", "bootable": "false", "encrypted": false, "created_at": "2017-05-26T02:41:45.617286", "description": null, "updated_at": null, "volume_type": "lvmdriver-1", "name": "tempest- TestVolumeBootPattern-volume-origin-1984626538", "replication_status": null, "consistencygroup_id": null, "source_volid": null, "snapshot_id": null, "multiattach": false, "metadata": {}, "id": "55a7c64a-f7b2-4b77-8f60-c1ccda8e0c30", "size": 1}} And the AZ on the volume defaults to 'nova' because that's the default AZ in cinder.conf. 
That volume ID is then passed to create the server: {"server": {"block_device_mapping_v2": [{"source_type": "volume", "boot_index": 0, "destination_type": "volume", "uuid": "55a7c64a- f7b2-4b77-8f60-c1ccda8e0c30", "delete_on_termination": true}], "networks": [{"uuid": "da48954d-1f66-427b-892c-a7f2eb1b54a3"}], "imageRef": "", "name": "tempest-TestVolumeBootPattern- server-1371698056", "flavorRef": "42"}} Which fails with the 400 InvalidVolume error because of this check in the API: https://github.com/openstack/nova/blob/f112dc686dadd643410575cc3487cf1632e4f689/nova/volume/cinder.py#L286 The instance is not associated with a host yet so it's not in an aggregate, and since an AZ wasn't specified when creating an instance (and I don't think we want people passing 'nova' as the AZ), it fails when comparing None to 'nova'. This is separate from bug 1497253 and change https://review.openstack.org/#/c/366724/ because in that case Nova is creating the volume during boot from volume and can specify the AZ for the volume. In this bug, the volume already exists and is provided to Nova. We might need to be able to distinguish if the API or compute service is calling check_availability_zone and if so, pass a default AZ in the case of the API if one isn't defined. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1694844/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
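The failing comparison described above can be modelled in a few lines. This is a simplified, hypothetical sketch of the check's logic, not nova's real check_availability_zone implementation:

```python
def check_availability_zone(instance_az, volume_az, cross_az_attach=False):
    """Allow the volume attach only when the AZs match (simplified model).

    Before the instance is scheduled to a host it belongs to no aggregate,
    so instance_az is None, and even a volume in the default 'nova' AZ
    fails the comparison.
    """
    if cross_az_attach:
        return True
    return instance_az == volume_az

# Volume AZ defaulted to 'nova' by cinder; instance not yet scheduled:
assert check_availability_zone(None, "nova") is False   # the 400 InvalidVolume case
assert check_availability_zone("nova", "nova") is True  # works once an AZ is known
```

The direction suggested in the bug is for the API layer to substitute a default AZ rather than comparing against None.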
[Yahoo-eng-team] [Bug 1844929] [NEW] grenade jobs failing due to "Timed out waiting for response from cell" in scheduler
Public bug reported: Seen here: https://zuul.opendev.org/t/openstack/build/d53346210978403f888b85b82b2fe0c7/log/logs/screen-n-sch.txt.gz?severity=3#2368 Sep 22 00:50:54.174385 ubuntu-bionic-ovh-gra1-0011664420 nova- scheduler[18043]: WARNING nova.context [None req- 1929039e-1517-4326-9700-738d4b570ba6 tempest- AttachInterfacesUnderV243Test-2009753731 tempest- AttachInterfacesUnderV243Test-2009753731] Timed out waiting for response from cell 8acfb79b-2e40-4e1c-bc3d-d404dac6db90 Looks like something is causing timeouts reaching cell1 during grenade runs. The only errors I see in the rabbit logs are these for the uwsgi (API) servers: =ERROR REPORT 22-Sep-2019::00:35:30 === closing AMQP connection <0.1511.0> (217.182.141.188:48492 -> 217.182.141.188:5672 - uwsgi:19453:72e08501-61ca-4ade-865e- f0605979ed7d): missed heartbeats from client, timeout: 60s -- It looks like we don't have mysql logs in this grenade run, maybe we need a fix like this somewhere for grenade: https://github.com/openstack/devstack/commit/f92c346131db2c89b930b1a23f8489419a2217dc logstash shows 1101 hits in the last 7 days, since Sept 17 actually: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Timed%20out%20waiting%20for%20response%20from%20cell%5C%22%20AND%20tags%3A%5C%22screen-n-sch.txt%5C%22=7d check and gate queues, all failures. It also appears to only show up on fortnebula and OVH nodes, primarily fortnebula. I wonder if there is a performing/timing issue if those nodes are slower and we aren't waiting for something during the grenade upgrade before proceeding. ** Affects: nova Importance: High Status: Confirmed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). 
https://bugs.launchpad.net/bugs/1844929 Title: grenade jobs failing due to "Timed out waiting for response from cell" in scheduler Status in OpenStack Compute (nova): Confirmed Bug description: Seen here: https://zuul.opendev.org/t/openstack/build/d53346210978403f888b85b82b2fe0c7/log/logs/screen-n-sch.txt.gz?severity=3#2368 Sep 22 00:50:54.174385 ubuntu-bionic-ovh-gra1-0011664420 nova- scheduler[18043]: WARNING nova.context [None req- 1929039e-1517-4326-9700-738d4b570ba6 tempest- AttachInterfacesUnderV243Test-2009753731 tempest- AttachInterfacesUnderV243Test-2009753731] Timed out waiting for response from cell 8acfb79b-2e40-4e1c-bc3d-d404dac6db90 Looks like something is causing timeouts reaching cell1 during grenade runs. The only errors I see in the rabbit logs are these for the uwsgi (API) servers: =ERROR REPORT 22-Sep-2019::00:35:30 === closing AMQP connection <0.1511.0> (217.182.141.188:48492 -> 217.182.141.188:5672 - uwsgi:19453:72e08501-61ca-4ade-865e- f0605979ed7d): missed heartbeats from client, timeout: 60s -- It looks like we don't have mysql logs in this grenade run, maybe we need a fix like this somewhere for grenade: https://github.com/openstack/devstack/commit/f92c346131db2c89b930b1a23f8489419a2217dc logstash shows 1101 hits in the last 7 days, since Sept 17 actually: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Timed%20out%20waiting%20for%20response%20from%20cell%5C%22%20AND%20tags%3A%5C%22screen-n-sch.txt%5C%22=7d check and gate queues, all failures. It also appears to only show up on fortnebula and OVH nodes, primarily fortnebula. I wonder if there is a performing/timing issue if those nodes are slower and we aren't waiting for something during the grenade upgrade before proceeding. 
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1844929/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1763761] Re: CPU topologies in nova - doesn't mention numa specific image properties
** Tags added: low-hanging-fruit ** No longer affects: python-glanceclient -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1763761 Title: CPU topologies in nova - doesn't mention numa specific image properties Status in Glance: Triaged Status in OpenStack Compute (nova): Confirmed Bug description: - [x] This is a doc addition request. This doc only talks about flavor extra specs for specifying numa nodes using the "hw:numa_nodes" flavor extra spec, but it's also possible to define numa nodes using the hw_numa_nodes image property, which coincidentally is also missing from the glance image properties doc: https://docs.openstack.org/python-glanceclient/latest/cli/property- keys.html --- Release: 17.0.0.0rc2.dev694 on 2018-04-13 15:32 SHA: e93be2690754bcba4cb346d4376ce87f94f03303 Source: https://git.openstack.org/cgit/openstack/nova/tree/doc/source/admin/cpu-topologies.rst URL: https://docs.openstack.org/nova/latest/admin/cpu-topologies.html To manage notifications about this bug go to: https://bugs.launchpad.net/glance/+bug/1763761/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1636338] Re: Numa topology not calculated for instance with numa_topology after upgrading to Mitaka
Is this still a problem we need to track? Mitaka is long end of life upstream at this point so I'm not even sure this is a problem on upstream stable branches for which we could backport a fix. ** Changed in: nova Assignee: Stephen Finucane (stephenfinucane) => (unassigned) ** Changed in: nova Status: In Progress => Won't Fix -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1636338 Title: Numa topology not calculated for instance with numa_topology after upgrading to Mitaka Status in OpenStack Compute (nova): Won't Fix Bug description: This is related to this bug https://bugs.launchpad.net/nova/+bug/1596119 After upgrading to Mitaka with the above patch, a new bug surfaced. The bug is related to InstanceNUMACell having cpu_policy set to None. This causes cpu_pinning_requested to always return False. https://github.com/openstack/nova/blob/master/nova/objects/instance_numa_topology.py#L112 This will then trick computes with old NUMA instances into thinking that nothing is pinned, causing new instances with cpu_policy set to CPUAllocationPolicy.DEDICATED to potentially get scheduled on the same NUMA zone. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1636338/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
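The upgrade behaviour described in this report can be modelled with a toy version of the object. The class below is a hypothetical simplification, not nova's actual InstanceNUMACell:

```python
DEDICATED = "dedicated"

class InstanceNUMACell:
    """Toy model of a NUMA cell whose cpu_policy may be None after upgrade."""

    def __init__(self, cpu_policy=None):
        self.cpu_policy = cpu_policy

    @property
    def cpu_pinning_requested(self):
        # With cpu_policy persisted as None (pre-Mitaka data), this is
        # always False, so the cell's pinned CPUs look unpinned.
        return self.cpu_policy == DEDICATED

old_cell = InstanceNUMACell(cpu_policy=None)       # data from before the upgrade
new_cell = InstanceNUMACell(cpu_policy=DEDICATED)  # correctly populated data
assert old_cell.cpu_pinning_requested is False     # the bug: looks unpinned
assert new_cell.cpu_pinning_requested is True
```

Because the old cells report no pinning, new DEDICATED instances can be scheduled onto CPUs that are in fact already pinned.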
[Yahoo-eng-team] [Bug 1840424] Re: glance manpage building in rocky is broken due to missing glance-cache-manage
** Also affects: glance/rocky Importance: Undecided Status: New ** Also affects: glance/stein Importance: Undecided Assignee: Thomas Bechtold (toabctl) Status: In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1840424 Title: glance manpage building in rocky is broken due to missing glance- cache-manage Status in Glance: In Progress Status in Glance rocky series: New Status in Glance stein series: In Progress Bug description: Using the latest commit (b3ff79ffa45f2439d769006fe9eb84ccf5690759) from stable/rocky branch. When trying to build the man pages with: sphinx-build -W -b man doc/source doc/build/man I get: [snipped] looking for now-outdated files... none found pickling environment... done checking consistency... done writing... glance-api.1 { } glance-cache-cleaner.1 { } glance-cache-manage.1 { Exception occurred: File "/home/tom/devel/openstack/glance/.tox/docs/lib/python2.7/site-packages/sphinx/environment/__init__.py", line 782, in get_doctree with open(doctree_filename, 'rb') as f: IOError: [Errno 2] No such file or directory: u'/home/tom/devel/openstack/glance/doc/build/man/.doctrees/cli/glancecachemanage.doctree' The full traceback has been saved in /tmp/sphinx-err-YA1GQ3.log, if you want to report the issue to the developers. This is because commit f126d3b8cc6ea5b8dc45bba52402cadfb4beb041 removed glancecachemanage.rst and the man page building is not tested in CI. To manage notifications about this bug go to: https://bugs.launchpad.net/glance/+bug/1840424/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1814245] Re: _disconnect_volume incorrectly called for multiattach volumes during post_live_migration
** Also affects: nova/pike Importance: Undecided Status: New ** Changed in: nova/pike Status: New => In Progress ** Changed in: nova/pike Assignee: (unassigned) => Matt Riedemann (mriedem) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1814245 Title: _disconnect_volume incorrectly called for multiattach volumes during post_live_migration Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) pike series: In Progress Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Bug description: Description === Idc5cecffa9129d600c36e332c97f01f1e5ff1f9f introduced a simple check to ensure disconnect_volume is only called when detaching a multi-attach volume from the final instance using it on a given host. That change, however, doesn't take live migration (LM) into account, specifically the call to _disconnect_volume during post_live_migration at the end of the migration on the source. At this point the original instance has already moved, so the call to objects.InstanceList.get_uuids_by_host will only return one local instance that is using the volume instead of two, allowing disconnect_volume to be called. Depending on the backend being used, this call can succeed, removing the connection to the volume for the remaining instance, or os-brick can fail in situations where it needs to flush I/O etc. from the in-use connection. Steps to reproduce == * Launch two instances attached to the same multiattach volume on the same host. * LM one of these instances to another host. Expected result === No calls to disconnect_volume are made and the remaining instance on the host is still able to access the multi-attach volume. 
Actual result = A call to disconnect_volume is made and the remaining instance is unable to access the volume *or* the LM fails due to os-brick failures to disconnect the in-use volume on the host. Environment === 1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/ master 2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) Libvirt + KVM 2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? LVM/iSCSI with multipath enabled reproduces the os-brick failure. 3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) N/A Logs & Configs == # nova show testvm2 [..] | fault| {"message": "Unexpected error while running command. | | | Command: multipath -f 360014054a424982306a4a659007f73b2 | | | Exit code: 1 | | | Stdout: u'Jan 28 16:09:29 | 360014054a424982306a4a659007f73b2: map in use\ | | | Jan 28 16:09:29 | failed to remove multipath map 360014054a424982306a4a", "code": 500, "details": " | | | File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 202, in decorated_function | | | return function(self, context, *args, **kwargs) | | | File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 6299, in _post_live_migration | | | migrate_data) | | | File \"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py\", line 7744, in post_live_migration| | | self._disconnect_volume(context, connection_info, instance) | | | File \"/usr/li
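The guard described above can be modelled as a count of local instances still using the volume; during post_live_migration the migrated instance has already left the source host's instance list, so the count is off by one. A simplified, hypothetical sketch (not nova's real code):

```python
def should_disconnect(volume_id, instances_on_host, attachments):
    """Disconnect only if no other local instance still uses the volume."""
    users = [i for i in instances_on_host
             if volume_id in attachments.get(i, ())]
    return len(users) <= 1

attachments = {"vm-a": {"vol-1"}, "vm-b": {"vol-1"}}

# Before the migration both users are local, so disconnect is skipped:
assert should_disconnect("vol-1", ["vm-a", "vm-b"], attachments) is False
# In post_live_migration the migrated vm-a is already gone from the
# source host's instance list; the check sees one user and wrongly
# permits the disconnect, cutting off vm-b:
assert should_disconnect("vol-1", ["vm-b"], attachments) is True
```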
[Yahoo-eng-team] [Bug 1844583] [NEW] tox -e docs fails with "WARNING: RSVG converter command 'rsvg-convert' cannot be run. Check the rsvg_converter_bin setting"
Public bug reported: Since this change: https://github.com/openstack/nova/commit/16b9486bf7e91bfd5dc48297cee9f54b49156c93 Local docs builds fail if you don't have librsvg2-bin installed for the sphinxcontrib-svg2pdfconverter dependency (I'm on Ubuntu 18.04). We should include that in bindep.txt. ** Affects: nova Importance: Low Assignee: Matt Riedemann (mriedem) Status: Confirmed ** Tags: doc ** Changed in: nova Assignee: (unassigned) => Matt Riedemann (mriedem) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1844583 Title: tox -e docs fails with "WARNING: RSVG converter command 'rsvg-convert' cannot be run. Check the rsvg_converter_bin setting" Status in OpenStack Compute (nova): Confirmed Bug description: Since this change: https://github.com/openstack/nova/commit/16b9486bf7e91bfd5dc48297cee9f54b49156c93 Local docs builds fail if you don't have librsvg2-bin installed for the sphinxcontrib-svg2pdfconverter dependency (I'm on Ubuntu 18.04). We should include that in bindep.txt. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1844583/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1844510] Re: openstackNova(Rocky)-launch instacne-NeutronAdminCredentialConfigurationInvalid
Double check the configuration for the [neutron] section in nova.conf against this: https://docs.openstack.org/neutron/rocky/install/controller-install- ubuntu.html#configure-the-compute-service-to-use-the-networking-service Note that the install guide is just a reference, the actual URLs have to make sense for your deployment, e.g. I'm guessing the URL hostname for auth isn't actually "controller". It also looks like you can drop the /v3 suffix on the auth_url. ** Tags added: config neutron ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1844510 Title: openstackNova(Rocky)-launch instacne- NeutronAdminCredentialConfigurationInvalid Status in OpenStack Compute (nova): Invalid Bug description: I try to launch new instance (from horizon as 'admin'). From empty list of instance, click 'launch instance'. [root@controller1 ~]# uname -a Linux controller1 3.10.84-21.fc21.loongson.18.mips64el #1 SMP PREEMPT Tue Apr 16 18:41:34 CST 2019 mips64 mips64 mips64 GNU/Linux [root@controller1 ~]# less /var/log/nova/nova-api.log 2019-09-18 16:42:27.566 2320 ERROR nova.network.neutronv2.api [req-4ff4645f-47be-4c66-bff1-2c8dbb4cca99 5af84d7c91ce4def8dad829fdd707e00 0c71a300399e4d759ef8b9dc6b00accf - default default] Neutron client was not able to generate a valid admin token, please verify Neutron admin credential located in nova.conf: Unauthorized: 401-{u'error': {u'message': u'The request you have made requires authentication.', u'code': 401, u'title': u'Unauthorized'}} 2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi [req-4ff4645f-47be-4c66-bff1-2c8dbb4cca99 5af84d7c91ce4def8dad829fdd707e00 0c71a300399e4d759ef8b9dc6b00accf - default default] Unexpected exception in API method: NeutronAdminCre dentialConfigurationInvalid: Networking client is experiencing an unauthorized exception. 
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi Traceback (most recent call last):
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py", line 801, in wrapped
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi     return f(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi     return func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi     return func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi     return func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi     return func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi     return func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi     return func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi     return func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi     return func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi     return func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi     return func(*args, **kwargs)
  2019-09-18 16:42:27.568 2320 ERROR nova.api.openstack.wsgi   File
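For reference, the [neutron] section of nova.conf that the comment above says to double-check typically looks like the following (this is the Rocky install guide's template; "controller" and NEUTRON_PASS are placeholders that must be replaced with your deployment's values — a 401 like the one in the log usually means one of these credentials is wrong):

```ini
[neutron]
# Service-user credentials nova uses to talk to neutron; these must
# match a valid service user in keystone.
auth_url = http://controller:5000
auth_type = password
project_domain_name = Default
user_domain_name = Default
region_name = RegionOne
project_name = service
username = neutron
password = NEUTRON_PASS
```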
[Yahoo-eng-team] [Bug 1763043] Re: Unnecessary "Instance not resizing, skipping migration" warning in n-cpu logs during live migration
This is no longer valid on master (Train) due to this change: https://review.opendev.org/#/c/634606/86/nova/compute/resource_tracker.py

I'm not sure it's worth trying to do a stable-only change to avoid the warning messages during live migration at this point since they have been around for years.

** Changed in: nova
       Status: In Progress => Invalid

** Changed in: nova
     Assignee: Matt Riedemann (mriedem) => (unassigned)

** No longer affects: nova/queens

** No longer affects: nova/rocky

** Changed in: nova
       Status: Invalid => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1763043

Title:
  Unnecessary "Instance not resizing, skipping migration" warning in
  n-cpu logs during live migration

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  In a 7 day CI run, we have over 40K hits of this warning in the logs:

  http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Instance%20not%20resizing%2C%20skipping%20migration%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22=7d

  http://logs.openstack.org/54/507854/4/gate/legacy-tempest-dsvm-multinode-live-migration/d723002/logs/subnode-2/screen-n-cpu.txt#_Apr_11_13_54_16_225676

  Apr 11 13:54:16.225676 ubuntu-xenial-rax-dfw-0003443206 nova-compute[29642]: WARNING nova.compute.resource_tracker [None req-61a6f9c9-3355-4594-acfa-ebf31ba995aa tempest-LiveMigrationTest-1725408283 tempest-LiveMigrationTest-1725408283] [instance: 6f4923e3-bf1f-4cb7-bd37-00e5d437759e] Instance not resizing, skipping migration.

  That warning was written back in 2012 when resize support was added to the resource tracker: https://review.openstack.org/#/c/15799/

  And since https://review.openstack.org/#/c/226411/ in 2015 it doesn't apply to evacuations.
  We shouldn't see a warning in the nova-compute logs during a normal operation like a live migration, so we really should either just drop this down to debug or remove it completely.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1763043/+subscriptions
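A minimal sketch of the "drop this down to debug" option the report suggests, using a simplified stand-in for the resource tracker's per-migration check (the helper names here are hypothetical; the real logic lives in nova/compute/resource_tracker.py):

```python
import logging

LOG = logging.getLogger("nova.compute.resource_tracker")

# Task states that indicate an in-progress resize (simplified subset).
RESIZE_STATES = ("resize_prep", "resize_migrating",
                 "resize_migrated", "resize_finish")


def _instance_in_resize_state(instance):
    """Return True if the instance is in one of the resize task states."""
    return instance.get("task_state") in RESIZE_STATES


def filter_migration(instance):
    """Simplified stand-in for the tracker's migration filter.

    A live migration legitimately reaches this point without a resize
    task state, so the skip is logged at DEBUG instead of WARNING.
    Returns True if the migration should be tracked as a resize.
    """
    if not _instance_in_resize_state(instance):
        LOG.debug("Instance not resizing, skipping migration.")
        return False
    return True
```

The point of the change is purely the log level: the skip itself is correct behavior for live migrations, so it should not be surfaced as a warning.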
[Yahoo-eng-team] [Bug 1823215] Re: TestInstanceNotificationSampleWithMultipleComputeOldAttachFlow._test_live_migration_force_complete intermittent fails with MismatchError: 6 != 7
*** This bug is a duplicate of bug 1843615 ***
    https://bugs.launchpad.net/bugs/1843615

This was fixed with https://review.opendev.org/#/c/681540/ since I didn't remember we already had a bug for this.

** This bug has been marked a duplicate of bug 1843615
   TestInstanceNotificationSampleWithMultipleCompute.test_multiple_compute_actions intermittently failing since Sept 10, 2019

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1823215

Title:
  TestInstanceNotificationSampleWithMultipleComputeOldAttachFlow._test_live_migration_force_complete
  intermittent fails with MismatchError: 6 != 7

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  Seen here: http://logs.openstack.org/47/638047/9/check/nova-tox-functional/71f64ae/job-output.txt.gz#_2019-04-02_00_07_32_290065

  2019-04-02 00:07:32.290065 | ubuntu-bionic | {2} nova.tests.functional.notification_sample_tests.test_instance.TestInstanceNotificationSampleWithMultipleComputeOldAttachFlow.test_multiple_compute_actions [14.302238s] ...
  FAILED
  2019-04-02 00:07:32.290219 | ubuntu-bionic |
  2019-04-02 00:07:32.290275 | ubuntu-bionic | Captured traceback:
  2019-04-02 00:07:32.290318 | ubuntu-bionic | ~~~
  2019-04-02 00:07:32.290378 | ubuntu-bionic | Traceback (most recent call last):
  2019-04-02 00:07:32.290525 | ubuntu-bionic | File "nova/tests/functional/notification_sample_tests/test_instance.py", line 68, in test_multiple_compute_actions
  2019-04-02 00:07:32.290569 | ubuntu-bionic | action(server)
  2019-04-02 00:07:32.290726 | ubuntu-bionic | File "nova/tests/functional/notification_sample_tests/test_instance.py", line 311, in _test_live_migration_force_complete
  2019-04-02 00:07:32.290822 | ubuntu-bionic | self.assertEqual(6, len(fake_notifier.VERSIONED_NOTIFICATIONS))
  2019-04-02 00:07:32.291011 | ubuntu-bionic | File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/testtools/testcase.py", line 411, in assertEqual
  2019-04-02 00:07:32.291148 | ubuntu-bionic | self.assertThat(observed, matcher, message)
  2019-04-02 00:07:32.291351 | ubuntu-bionic | File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/testtools/testcase.py", line 498, in assertThat
  2019-04-02 00:07:32.291402 | ubuntu-bionic | raise mismatch_error
  2019-04-02 00:07:32.291475 | ubuntu-bionic | testtools.matchers._impl.MismatchError: 6 != 7
  2019-04-02 00:07:32.291497 | ubuntu-bionic |
  2019-04-02 00:07:32.291515 | ubuntu-bionic |
  2019-04-02 00:07:32.291558 | ubuntu-bionic | Captured pythonlogging:
  2019-04-02 00:07:32.291602 | ubuntu-bionic | ~~~
  2019-04-02 00:07:32.291737 | ubuntu-bionic | 2019-04-02 00:07:19,024 WARNING [placement.db_api] TransactionFactory already started, not reconfiguring.
  2019-04-02 00:07:32.291908 | ubuntu-bionic | 2019-04-02 00:07:19,053 INFO [nova.service] Starting conductor node (version 19.1.0)
  2019-04-02 00:07:32.292181 | ubuntu-bionic | 2019-04-02 00:07:19,073 INFO [nova.service] Starting scheduler node (version 19.1.0)
  2019-04-02 00:07:32.292326 | ubuntu-bionic | 2019-04-02 00:07:19,089 INFO [nova.network.driver] Loading network driver 'nova.network.linux_net'
  2019-04-02 00:07:32.292438 | ubuntu-bionic | 2019-04-02 00:07:19,090 INFO [nova.service] Starting network node (version 19.1.0)
  2019-04-02 00:07:32.292606 | ubuntu-bionic | 2019-04-02 00:07:19,118 INFO [nova.virt.driver] Loading compute driver 'fake.FakeLiveMigrateDriver'
  2019-04-02 00:07:32.292820 | ubuntu-bionic | 2019-04-02 00:07:19,118 WARNING [nova.compute.monitors] Excluding nova.compute.monitors.cpu monitor virt_driver. Not in the list of enabled monitors (CONF.compute_monitors).
  2019-04-02 00:07:32.292945 | ubuntu-bionic | 2019-04-02 00:07:19,119 INFO [nova.service] Starting compute node (version 19.1.0)
  2019-04-02 00:07:32.293174 | ubuntu-bionic | 2019-04-02 00:07:19,141 WARNING [nova.compute.manager] No compute node record found for host compute. If this is the first time this service is starting on this host, then you can ignore this warning.
  2019-04-02 00:07:32.293304 | ubuntu-bionic | 2019-04-02 00:07:19,144 WARNING [nova.compute.resource_tracker] No compute node record for compute:fake-mini
  2019-04-02 00:07:32.293484 | ubuntu-bionic | 2019-04-02 00:07:19,148 INFO [nova.compute.resource_tracker] Compute node record created for compute:fake-mini with uuid: 109a2d73-cdf9-4d76-8e6e-74dc79ff7359
  2019-04-02 00:07:32.293687 | ubuntu-bionic | 2019-04-02 00:07:19,187 INFO [placement.requestlog] 127.0.0.1 "GET /placement/resource_providers?in_tree=109a2d73-cdf9-4d76-8e6e-74dc79ff7359" status: 200 len: 26 microversion: 1.14
[Yahoo-eng-team] [Bug 1843615] Re: TestInstanceNotificationSampleWithMultipleCompute.test_multiple_compute_actions intermittently failing since Sept 10, 2019
** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Changed in: nova/stein
       Status: New => Confirmed

** Changed in: nova/stein
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1843615

Title:
  TestInstanceNotificationSampleWithMultipleCompute.test_multiple_compute_actions
  intermittently failing since Sept 10, 2019

Status in OpenStack Compute (nova):
  In Progress

Status in OpenStack Compute (nova) stein series:
  Confirmed

Bug description:
  Seen here: https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_c4c/671072/18/gate/nova-tox-functional/c4ca604/job-output.txt

  2019-09-11 16:01:31.460243 | ubuntu-bionic | {3} nova.tests.functional.notification_sample_tests.test_instance.TestInstanceNotificationSampleWithMultipleCompute.test_multiple_compute_actions [15.126947s] ... FAILED
  2019-09-11 16:01:31.460323 | ubuntu-bionic |
  2019-09-11 16:01:31.460383 | ubuntu-bionic | Captured traceback:
  2019-09-11 16:01:31.460442 | ubuntu-bionic | ~~~
  2019-09-11 16:01:31.460525 | ubuntu-bionic | Traceback (most recent call last):
  2019-09-11 16:01:31.460714 | ubuntu-bionic | File "nova/tests/functional/notification_sample_tests/test_instance.py", line 61, in test_multiple_compute_actions
  2019-09-11 16:01:31.460775 | ubuntu-bionic | action(server)
  2019-09-11 16:01:31.460975 | ubuntu-bionic | File "nova/tests/functional/notification_sample_tests/test_instance.py", line 306, in _test_live_migration_force_complete
  2019-09-11 16:01:31.461065 | ubuntu-bionic | fake_notifier.VERSIONED_NOTIFICATIONS)
  2019-09-11 16:01:31.461297 | ubuntu-bionic | File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/testtools/testcase.py", line 411, in assertEqual
  2019-09-11 16:01:31.461394 | ubuntu-bionic | self.assertThat(observed, matcher, message) 2019-09-11
16:01:31.461628 | ubuntu-bionic | File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/testtools/testcase.py", line 498, in assertThat 2019-09-11 16:01:31.461695 | ubuntu-bionic | raise mismatch_error 2019-09-11 16:01:31.484778 | ubuntu-bionic | testtools.matchers._impl.MismatchError: 6 != 7: [{'priority': 'INFO', 'payload': {'nova_object.namespace': 'nova', 'nova_object.name': 'RequestSpecPayload', 'nova_object.version': '1.1', 'nova_object.data': {'flavor': {'nova_object.namespace': 'nova', 'nova_object.name': 'FlavorPayload', 'nova_object.version': '1.4', 'nova_object.data': {'flavorid': u'a22d5517-147c-4147-a0d1-e698df5cd4e3', 'is_public': True, 'ephemeral_gb': 0, 'vcpus': 1, 'root_gb': 1, 'disabled': False, 'description': None, 'projects': None, 'vcpu_weight': 0, 'memory_mb': 512, 'name': u'test_flavor', 'rxtx_factor': 1.0, 'extra_specs': {'trait:COMPUTE_STATUS_DISABLED': u'forbidden', u'hw:watchdog_action': u'disabled'}, 'swap': 0}}, 'image': {'nova_object.namespace': 'nova', 'nova_object.name': 'ImageMetaPayload', 'nova_object.version': '1.0', 'nova_object.data': {'direct_url': None, 'container_format': u'raw', 'visibility': u'public', 'size': 25165824, 'disk_format': u'raw', 'virtual_size': None, 'protected': False, 'status': u'active', 'updated_at': '2011-01-01T01:02:03Z', 'tags': [u'tag1', u'tag2'], 'name': u'fakeimage123456', 'created_at': '2011-01-01T01:02:03Z', 'min_disk': 0, 'checksum': None, 'owner': None, 'id': u'155d900f-4e14-4e4c-a73d-069cbf4541e6', 'properties': {'nova_object.namespace': 'nova', 'nova_object.name': 'ImageMetaPropsPayload', 'nova_object.version': '1.1', 'nova_object.data': {'hw_architecture': u'x86_64'}}, 'min_ram': 0}}, 'requested_destination': {'nova_object.namespace': 'nova', 'nova_object.name': 'DestinationPayload', 'nova_object.version': '1.0', 'nova_object.data': {'host': u'host2', 'aggregates': None, 'node': u'host2', 'cell': {'nova_object.namespace': 'nova', 'nova_object.name': 
'CellMappingPayload', 'nova_object.version': '2.0', 'nova_object.data': {'disabled': False, 'uuid': u'49bb4305-6acb-4b60-abff-382e2e85108a', 'name': u'cell1', 'security_groups': [u'default'], 'scheduler_hints': {}, 'project_id': u'6f70656e737461636b20342065766572', 'retry': None, 'num_instances': 1, 'instance_group': None, 'force_nodes': None, 'ignore_hosts': [u'compute'], 'force_hosts': None, 'numa_topology': None, 'instance_uuid': u'8d65a36d-36e8-4994-9bdd-89a455166ab9', 'availability_zone': None, 'user_id': u'fake', 'pci_requests': {'nova_object.namespace': 'nova', 'nova_object.name': 'InstancePCIRequestsPayload', 'nova_object.version': '1.0', 'nova_object.data': {'requests': [], 'instance_uuid': u'8d65a36d-36e8-4994-9bdd-89a455166ab9', 'publisher_id': u'nova-scheduler:host2',
[Yahoo-eng-team] [Bug 1843615] [NEW] TestInstanceNotificationSampleWithMultipleCompute.test_multiple_compute_actions intermittently failing since Sept 10, 2019
r', 'launched_at': '2012-10-29T13:42:11Z', 'state': u'active', 'action_initiator_project': u'6f70656e737461636b20342065766572', 'architecture': u'x86_64', 'deleted_at': None, 'host': u'compute', 'availability_zone': u'nova', 'locked': False, 'ip_addresses': [{'nova_object.namespace': 'nova', 'nova_object.name': 'IpPayload', 'nova_object.version': '1.0', 'nova_object.data': {'label': u'private-network', 'meta': {}, 'address': '192.168.1.3', 'device_name': u'tapce531f90-19', 'mac': u'fa:16:3e:4c:2c:30', 'version': 4, 'port_uuid': u'ce531f90-199f-48c0-816c-13e38010b442'}}], 'auto_disk_config': u'MANUAL', 'block_devices': [{'nova_object.namespace': 'nova', 'nova_object.name': 'BlockDevicePayload', 'nova_object.version': '1.0', 'nova_object.data': {'boot_index': None, 'device_name': u'/dev/sdb', 'delete_on_termination': False, 'volume_id': u'a07f71dc-8151-4e7d-a0cc-cd24a3f3', 'tag': None}}], 'node': u'fake-mini', 'request_id': u'req-5b6c791d-5709-4f36-8fbe-c3e02869e35d', 'locked_reason': None, 'tenant_id': u'6f70656e737461636b20342065766572', 'metadata': {}, 'task_state': u'migrating', 'terminated_at': None, 'image_uuid': u'155d900f-4e14-4e4c-a73d-069cbf4541e6', 'display_name': u'some-server', 'updated_at': '2012-10-29T13:42:11Z', 'power_state': u'running', 'user_id': u'fake', 'uuid': u'8d65a36d-36e8-4994-9bdd-89a455166ab9'}}, 'publisher_id': u'nova-compute:compute', 'event_type': u'instance.live_migration_force_complete.end'}, {'priority': 'INFO', 'payload': {'nova_object.namespace': 'nova', 'nova_object.name': 'InstanceActionPayload', 'nova_object.version': '1.8', 'nova_object.data': {'os_type': None, 'flavor': {'nova_object.namespace': 'nova', 'nova_object.name': 'FlavorPayload', 'nova_object.version': '1.4', 'nova_object.data': {'flavorid': u'a22d5517-147c-4147-a0d1-e698df5cd4e3', 'is_public': True, 'ephemeral_gb': 0, 'vcpus': 1, 'root_gb': 1, 'disabled': False, 'description': None, 'projects': None, 'vcpu_weight': 0, 'memory_mb': 512, 'name': u'test_flavor', 
'rxtx_factor': 1.0, 'extra_specs': {u'hw:watchdog_action': u'disabled'}, 'swap': 0}}, 'display_description': u'some-server', 'action_initiator_user': u'admin', 'kernel_id': u'', 'host_name': u'some-server', 'created_at': '2012-10-29T13:42:11Z', 'ramdisk_id': u'', 'key_name': u'my-key', 'fault': None, 'progress': 0, 'reservation_id': u'r-7gm79j0r', 'launched_at': '2012-10-29T13:42:11Z', 'state': u'active', 'action_initiator_project': u'6f70656e737461636b20342065766572', 'architecture': u'x86_64', 'deleted_at': None, 'host': u'compute', 'availability_zone': u'nova', 'locked': False, 'ip_addresses': [{'nova_object.namespace': 'nova', 'nova_object.name': 'IpPayload', 'nova_object.version': '1.0', 'nova_object.data': {'label': u'private-network', 'meta': {}, 'address': '192.168.1.3', 'device_name': u'tapce531f90-19', 'mac': u'fa:16:3e:4c:2c:30', 'version': 4, 'port_uuid': u'ce531f90-199f-48c0-816c-13e38010b442'}}], 'auto_disk_config': u'MANUAL', 'block_devices': [{'nova_object.namespace': 'nova', 'nova_object.name': 'BlockDevicePayload', 'nova_object.version': '1.0', 'nova_object.data': {'boot_index': None, 'device_name': u'/dev/sdb', 'delete_on_termination': False, 'volume_id': u'a07f71dc-8151-4e7d-a0cc-cd24a3f3', 'tag': None}}], 'node': u'fake-mini', 'request_id': u'req-5b6c791d-5709-4f36-8fbe-c3e02869e35d', 'locked_reason': None, 'tenant_id': u'6f70656e737461636b20342065766572', 'metadata': {}, 'task_state': u'migrating', 'terminated_at': None, 'image_uuid': u'155d900f-4e14-4e4c-a73d-069cbf4541e6', 'display_name': u'some-server', 'updated_at': '2012-10-29T13:42:11Z', 'power_state': u'running', 'user_id': u'fake', 'uuid': u'8d65a36d-36e8-4994-9bdd-89a455166ab9'}}, 'publisher_id': u'nova-compute:compute', 'event_type': u'instance.live_migration_post.start'}] The test code is expecting 6 notifications but got 7: self._wait_for_notification( 'instance.live_migration_force_complete.end') # 0. scheduler.select_destinations.start # 1. scheduler.select_destinations.end # 2. 
instance.live_migration_pre.start
      # 3. instance.live_migration_pre.end
      # 4. instance.live_migration_force_complete.start
      # 5. instance.live_migration_force_complete.end
      self.assertEqual(6, len(fake_notifier.VERSIONED_NOTIFICATIONS),
                       fake_notifier.VERSIONED_NOTIFICATIONS)

  The 7th is instance.live_migration_post.start: http://paste.openstack.org/show/775148/ so it appears something has changed when that is sent or we're losing a race with when force complete is triggered? Meaning maybe we don't catch the force complete in time before post live migration starts.

** Affects: nova
   Importance: High
     Assignee: Matt Riedemann (mriedem)
       Status: Confirmed

** Changed in: nova
   Importance: Undecided => High

** Changed in: nova
       Status: New => Confirmed

-- 
You received this bug notification because you are a member o
[Yahoo-eng-team] [Bug 1843098] [NEW] Compute API in nova - host_numa_node field in server topology API is wrong
Public bug reported:

- [x] This doc is inaccurate in this way:

https://docs.openstack.org/api-ref/compute/?expanded=show-server-topology-detail#id401

There is no 'host_numa_node' parameter in the response; it's called 'host_node'.

---
Release: on 2019-08-06 17:29:30
SHA: 3882cc5bb6c74b1df60475b6b7ec907d6ddf54f5
Source: https://opendev.org/openstack/nova/src/api-ref/source/index.rst
URL: https://docs.openstack.org/api-ref/compute/

** Affects: nova
   Importance: High
       Status: Confirmed

** Tags: api-ref

** Changed in: nova
       Status: New => Confirmed

** Changed in: nova
   Importance: Undecided => Medium

** Changed in: nova
   Importance: Medium => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1843098

Title:
  Compute API in nova - host_numa_node field in server topology API is wrong

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  - [x] This doc is inaccurate in this way:

  https://docs.openstack.org/api-ref/compute/?expanded=show-server-topology-detail#id401

  There is no 'host_numa_node' parameter in the response; it's called 'host_node'.

  ---
  Release: on 2019-08-06 17:29:30
  SHA: 3882cc5bb6c74b1df60475b6b7ec907d6ddf54f5
  Source: https://opendev.org/openstack/nova/src/api-ref/source/index.rst
  URL: https://docs.openstack.org/api-ref/compute/

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1843098/+subscriptions
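To illustrate the field-name mismatch reported above, here is a trimmed GET /servers/{server_id}/topology response sketched as a Python dict. The 'host_node' name is the one the report says the API actually returns; the other fields and their values are illustrative assumptions, not taken from the bug:

```python
# Trimmed server topology response body (values are made up).
# The per-NUMA-node host field is 'host_node', not the documented
# 'host_numa_node'.
topology = {
    "nodes": [
        {
            "host_node": 0,      # host NUMA node backing this guest node
            "memory_mb": 1024,   # guest memory placed on this node
            "vcpu_set": [0, 1],  # guest vCPUs placed on this node
        },
    ],
    "pagesize_kb": 4,
}

# A client written against the (incorrect) docs would look for the
# wrong key:
assert "host_numa_node" not in topology["nodes"][0]
assert topology["nodes"][0]["host_node"] == 0
```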
[Yahoo-eng-team] [Bug 1843090] [NEW] ComputeTaskManager._cold_migrate could get a legacy request spec dict from stein computes if rpc pinned and not convert it properly
Public bug reported:

As of this change in Stein https://review.opendev.org/#/c/582417/ the compute service will pass a request spec back to conductor when rescheduling during a resize or cold migration. If the compute RPC API version is pinned below 5.1, however, that request spec will be a legacy dict rather than a full RequestSpec object, so the code here:

https://github.com/openstack/nova/blob/19.0.0/nova/conductor/manager.py#L302-L321

needs to account for the legacy dict case.

** Affects: nova
   Importance: Low
     Assignee: Matt Riedemann (mriedem)
       Status: In Progress

** Affects: nova/stein
   Importance: Low
       Status: Triaged

** Tags: conductor upgrade

** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Changed in: nova/stein
       Status: New => Triaged

** Changed in: nova/stein
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1843090

Title:
  ComputeTaskManager._cold_migrate could get a legacy request spec dict
  from stein computes if rpc pinned and not convert it properly

Status in OpenStack Compute (nova):
  In Progress

Status in OpenStack Compute (nova) stein series:
  Triaged

Bug description:
  As of this change in Stein https://review.opendev.org/#/c/582417/ the compute service will pass a request spec back to conductor when rescheduling during a resize or cold migration. If the compute RPC API version is pinned below 5.1, however, that request spec will be a legacy dict rather than a full RequestSpec object, so the code here:

  https://github.com/openstack/nova/blob/19.0.0/nova/conductor/manager.py#L302-L321

  needs to account for the legacy dict case.
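A minimal sketch of the kind of guard the description calls for, using a simplified stand-in class (the real nova.objects.RequestSpec and its converters are richer; the helper names here are hypothetical):

```python
class RequestSpec:
    """Simplified stand-in for nova.objects.RequestSpec."""

    def __init__(self, ignore_hosts=None):
        self.ignore_hosts = ignore_hosts or []

    @classmethod
    def from_legacy_dict(cls, legacy_spec):
        # Hypothetical converter mirroring what the real object provides:
        # rebuild the object from the legacy dict's filter properties.
        filter_props = legacy_spec.get("filter_properties", {})
        return cls(ignore_hosts=filter_props.get("ignore_hosts"))


def ensure_request_spec(spec_or_dict):
    """Normalize what a rescheduling compute sent back.

    A compute pinned below RPC 5.1 sends a legacy dict; a newer compute
    sends the object. _cold_migrate must handle both before touching
    attributes like ignore_hosts.
    """
    if isinstance(spec_or_dict, dict):
        return RequestSpec.from_legacy_dict(spec_or_dict)
    return spec_or_dict
```

The point is that the normalization happens once, at the boundary where the spec arrives, so the rest of the cold-migrate path can assume an object.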
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1843090/+subscriptions
[Yahoo-eng-team] [Bug 1843058] [NEW] libvirt live migration fails intermittently in grenade live migration job with "error while loading state for instance 0x0 of device 'kvm-tpr-opt'"
Public bug reported:

This may be related to bug 1838309 but I'm not sure so I'm reporting it separately so we can track it in elastic-recheck.

This is the traceback in the nova-compute logs:

Sep 06 01:28:11.837685 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: DEBUG nova.virt.libvirt.driver [None req-e6bcaa2e-aa66-4107-b0c6-9b3976d45c76 None None] [instance: 64689c1f-27b6-4889-8206-3bc458427197] Migration operation thread notification {{(pid=3855) thread_finished /opt/stack/old/nova/nova/virt/libvirt/driver.py:8039}}
Sep 06 01:28:11.838031 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: Traceback (most recent call last):
Sep 06 01:28:11.838031 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 460, in fire_timers
Sep 06 01:28:11.838282 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:     timer()
Sep 06 01:28:11.838282 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 59, in __call__
Sep 06 01:28:11.838561 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:     cb(*args, **kw)
Sep 06 01:28:11.838561 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 175, in _do_send
Sep 06 01:28:11.838774 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:     waiter.switch(result)
Sep 06 01:28:11.838774 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 219, in main
Sep 06 01:28:11.839008 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:     result = function(*args, **kwargs)
Sep 06 01:28:11.839008 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   File "/opt/stack/old/nova/nova/utils.py", line 800, in context_wrapper
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:     return func(*args, **kwargs)
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   File "/opt/stack/old/nova/nova/virt/libvirt/driver.py", line 7711, in _live_migration_operation
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:     LOG.error("Live Migration failure: %s", e, instance=instance)
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:     self.force_reraise()
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:     six.reraise(self.type_, self.value, self.tb)
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   File "/opt/stack/old/nova/nova/virt/libvirt/driver.py", line 7704, in _live_migration_operation
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:     bandwidth=CONF.libvirt.live_migration_bandwidth)
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   File "/opt/stack/old/nova/nova/virt/libvirt/guest.py", line 682, in migrate
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:     destination, params=params, flags=flags)
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 190, in doit
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:     result = proxy_call(self._autowrap, f, *args, **kwargs)
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 148, in proxy_call
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:     rv = execute(f, *args, **kwargs)
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 129, in execute
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:     six.reraise(c, e, tb)
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
Sep 06 01:28:11.841508 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:     rv = meth(*args, **kwargs)
Sep 06 01:28:11.841508 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:   File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 1745, in migrateToURI3
Sep 06 01:28:11.843021 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:     if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
Sep 06 01:28:11.843239 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]:
[Yahoo-eng-team] [Bug 1842985] [NEW] Testing Zero Downtime Upgrade Process in nova - broken reference link
Public bug reported:

- [x] This doc is inaccurate in this way:

The reference link here is broken:
https://docs.openstack.org/nova/latest/contributor/testing/zero-downtime-upgrade.html#zero-downtime-upgrade-process

---
Release: on 2017-09-06 22:01:01
SHA: 4476e6218499bf1ae757973b500acfa59a5a9cbe
Source: https://opendev.org/openstack/nova/src/doc/source/contributor/testing/zero-downtime-upgrade.rst
URL: https://docs.openstack.org/nova/latest/contributor/testing/zero-downtime-upgrade.html

** Affects: nova
   Importance: Low
       Status: Confirmed

** Tags: doc low-hanging-fruit

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1842985

Title:
  Testing Zero Downtime Upgrade Process in nova - broken reference link

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  - [x] This doc is inaccurate in this way:

  The reference link here is broken:
  https://docs.openstack.org/nova/latest/contributor/testing/zero-downtime-upgrade.html#zero-downtime-upgrade-process

  ---
  Release: on 2017-09-06 22:01:01
  SHA: 4476e6218499bf1ae757973b500acfa59a5a9cbe
  Source: https://opendev.org/openstack/nova/src/doc/source/contributor/testing/zero-downtime-upgrade.rst
  URL: https://docs.openstack.org/nova/latest/contributor/testing/zero-downtime-upgrade.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1842985/+subscriptions
[Yahoo-eng-team] [Bug 1838666] Re: lxml 4.4.0 causes failed tests in nova
** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Changed in: nova
   Importance: Undecided => Medium

** Changed in: nova/stein
   Importance: Undecided => Medium

** Changed in: nova/stein
       Status: New => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1838666

Title:
  lxml 4.4.0 causes failed tests in nova

Status in OpenStack Compute (nova):
  Fix Released

Status in OpenStack Compute (nova) stein series:
  Confirmed

Bug description:
  It looks like it's just an ordering issue for the elements that are returned. See https://review.opendev.org/673848 for details on the failure (you can depend on it for testing fixes as well).

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1838666/+subscriptions
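Since the failure is an element-ordering issue, one general way to make such tests robust (a sketch, not the actual nova fix) is to compare parsed XML with sibling and attribute order treated as insignificant, instead of comparing serialized strings:

```python
import xml.etree.ElementTree as ET


def xml_equal_unordered(a, b):
    """Compare two XML documents, ignoring sibling and attribute order.

    This kind of comparison keeps tests stable across serializer
    changes like the element reordering introduced by lxml 4.4.0.
    """
    def norm(elem):
        # Normalize an element into a nested, order-independent tuple:
        # (tag, sorted attributes, stripped text, sorted children).
        return (elem.tag,
                tuple(sorted(elem.attrib.items())),
                (elem.text or "").strip(),
                tuple(sorted(norm(child) for child in elem)))

    return norm(ET.fromstring(a)) == norm(ET.fromstring(b))


# Same content, different sibling order: considered equal.
assert xml_equal_unordered('<r><a x="1"/><b/></r>', '<r><b/><a x="1"/></r>')
```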
[Yahoo-eng-team] [Bug 1842087] [NEW] _check_can_migrate_pci in the LiveMigrationTask has host agnostic validation that is redundant/expensive
Public bug reported: This PCI validation code in the live migration task in conductor is run per possible dest host for the migration: https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L212-L228 But is host agnostic, meaning if I have 100 possible dest hosts for the live migration and an instance with a flavor-defined pci request, it's going to fail that validation the same way 100 times. That validation should be pulled up to a point before we even start asking the scheduler for hosts, e.g. like the numa live migration support: https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L85 ** Affects: nova Importance: Low Status: Triaged ** Tags: conductor live-migration -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1842087 Title: _check_can_migrate_pci in the LiveMigrationTask has host agnostic validation that is redundant/expensive Status in OpenStack Compute (nova): Triaged Bug description: This PCI validation code in the live migration task in conductor is run per possible dest host for the migration: https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L212-L228 But is host agnostic, meaning if I have 100 possible dest hosts for the live migration and an instance with a flavor-defined pci request, it's going to fail that validation the same way 100 times. That validation should be pulled up to a point before we even start asking the scheduler for hosts, e.g. 
like the numa live migration support: https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L85 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1842087/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
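The refactor the report asks for can be sketched as follows; all names here are illustrative, not nova's actual conductor code. The point is simply that a host-agnostic check belongs before the scheduling loop, so a doomed request fails once instead of once per candidate host:

```python
class MigrationPreCheckError(Exception):
    pass

def check_can_migrate_pci(instance):
    # Host-agnostic validation: e.g. reject flavor-defined PCI
    # requests that can never be live-migrated (hypothetical rule).
    if instance.get('pci_requests'):
        raise MigrationPreCheckError('cannot live-migrate with PCI requests')

def select_destination(candidate_hosts, instance, host_passes):
    # Run the host-agnostic check exactly once, up front...
    check_can_migrate_pci(instance)
    # ...and keep only genuinely per-host checks inside the loop.
    for host in candidate_hosts:
        if host_passes(host, instance):
            return host
    raise MigrationPreCheckError('no valid host found')
```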
[Yahoo-eng-team] [Bug 1842081] [NEW] Error during ComputeManager._cleanup_running_deleted_instances: VirtDriverNotReady: Virt driver is not ready. (ironic)
Public bug reported: Seeing this on start of nova-compute with ironic when ironic-api isn't yet available: Aug 24 01:06:39.710754 ubuntu-bionic-rax-iad-0010410623 nova- compute[7945]: ERROR nova.virt.ironic.driver [None req- 9542c6c8-a038-45f5-bd18-e18f83c17755 None None] An unknown error has occurred when trying to get the list of nodes from the Ironic inventory. Error: StrictVersion instance has no attribute 'version' Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova- compute[7945]: ERROR oslo_service.periodic_task [None req- 9542c6c8-a038-45f5-bd18-e18f83c17755 None None] Error during ComputeManager._cleanup_running_deleted_instances: VirtDriverNotReady: Virt driver is not ready. Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova- compute[7945]: ERROR oslo_service.periodic_task Traceback (most recent call last): Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova- compute[7945]: ERROR oslo_service.periodic_task File "/usr/local/lib/python2.7/dist-packages/oslo_service/periodic_task.py", line 222, in run_periodic_tasks Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova- compute[7945]: ERROR oslo_service.periodic_task task(self, context) Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova- compute[7945]: ERROR oslo_service.periodic_task File "/opt/stack/nova/nova/compute/manager.py", line 8369, in _cleanup_running_deleted_instances Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova- compute[7945]: ERROR oslo_service.periodic_task for instance in self._running_deleted_instances(context): Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova- compute[7945]: ERROR oslo_service.periodic_task File "/opt/stack/nova/nova/compute/manager.py", line 8423, in _running_deleted_instances Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova- compute[7945]: ERROR oslo_service.periodic_task instances = self._get_instances_on_driver(context, filters) Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 
nova- compute[7945]: ERROR oslo_service.periodic_task File "/opt/stack/nova/nova/compute/manager.py", line 634, in _get_instances_on_driver Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova- compute[7945]: ERROR oslo_service.periodic_task driver_uuids = self.driver.list_instance_uuids() Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova- compute[7945]: ERROR oslo_service.periodic_task File "/opt/stack/nova/nova/virt/ironic/driver.py", line 685, in list_instance_uuids Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova- compute[7945]: ERROR oslo_service.periodic_task fields=['instance_uuid'], limit=0) Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova- compute[7945]: ERROR oslo_service.periodic_task File "/opt/stack/nova/nova/virt/ironic/driver.py", line 656, in _get_node_list Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova- compute[7945]: ERROR oslo_service.periodic_task raise exception.VirtDriverNotReady() Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova- compute[7945]: ERROR oslo_service.periodic_task VirtDriverNotReady: Virt driver is not ready. Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova- compute[7945]: ERROR oslo_service.periodic_task Looks like this is due to https://review.opendev.org/#/c/657132/ in Train where the _cleanup_running_deleted_instances periodic task runs immediately on startup of the nova-compute service which could be before the hypervisor (in this case ironic) is ready. This doesn't really break anything, but it's an ugly traceback in the logs that could be avoided. We should handle the VirtDriverNotReady error and return from the periodic. ** Affects: nova Importance: Low Assignee: Matt Riedemann (mriedem) Status: Triaged ** Tags: compute ironic -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). 
https://bugs.launchpad.net/bugs/1842081 Title: Error during ComputeManager._cleanup_running_deleted_instances: VirtDriverNotReady: Virt driver is not ready. (ironic) Status in OpenStack Compute (nova): Triaged Bug description: Seeing this on start of nova-compute with ironic when ironic-api isn't yet available: Aug 24 01:06:39.710754 ubuntu-bionic-rax-iad-0010410623 nova- compute[7945]: ERROR nova.virt.ironic.driver [None req- 9542c6c8-a038-45f5-bd18-e18f83c17755 None None] An unknown error has occurred when trying to get the list of nodes from the Ironic inventory. Error: StrictVersion instance has no attribute 'version' Aug 24 01:06:39.711672 ubuntu-bionic-rax-iad-0010410623 nova- compute[7945]: ERROR oslo_service.periodic_task [None req- 9542c6c8-a038-45f5-bd18-e18f83c17755 None None] Error during ComputeManager._cleanup_running_deleted_instances: VirtDriverNotReady: Virt driver is not ready
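The fix suggested above ("handle the VirtDriverNotReady error and return from the periodic") can be sketched minimally as below; the names are illustrative stand-ins for nova's classes, not the real implementation:

```python
import logging

LOG = logging.getLogger(__name__)

class VirtDriverNotReady(Exception):
    """Raised by a virt driver that cannot serve requests yet."""

def cleanup_running_deleted_instances(driver):
    """Periodic-task body: skip quietly if the driver isn't ready."""
    try:
        uuids = driver.list_instance_uuids()
    except VirtDriverNotReady:
        # The backend (e.g. ironic-api) is still starting up; just
        # wait for the next periodic run instead of tracebacking.
        LOG.debug('Virt driver not ready; deferring cleanup.')
        return []
    # ...the real cleanup of the returned instances would go here...
    return uuids
```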
[Yahoo-eng-team] [Bug 1842061] [NEW] Compute schedulers in nova - AggregateInstanceExtraSpecsFilter docs are not clear
Public bug reported: - [x] This is a doc addition request. The description for the AggregateInstanceExtraSpecsFilter filter is not clear: https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregateinstanceextraspecsfilter (note it's also described here: https://docs.openstack.org/nova/latest/user/filter-scheduler.html) It's not clear what aggregate_instance_extra_specs is used for. Note that further down in the document there are some examples: https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#example-specify-compute- hosts-with-ssds So I guess based on that, it means you would just add metadata to a host aggregate like foo=bar and then tie a flavor to that by setting an extra spec of aggregate_instance_extra_specs:foo=bar on the flavor. But what about other standard extra specs like hide_hypervisor_id, you can't put the aggregate_instance_extra_specs prefix on that in the flavor since it would break the extra spec for the actual code that checks for that standard extra spec. Does that mean the flavor has to have both the scoped and unscoped spec? Or that the filter will handle the unscoped spec? It would be nice to have the documentation on the filter itself explain this and give examples of how to use it, for both a standard and custom flavor extra spec (note the latter has an example linked above for the ssd example). This originally came up while triaging bug 1841932 and trying to make sense of the filter (it's not very clear even by looking at the code). --- Release: on 2019-08-22 20:13:47 SHA: 0882ea69ea0c46cf97ecd5a1ec49a3007f293c28 Source: https://opendev.org/openstack/nova/src/doc/source/admin/configuration/schedulers.rst URL: https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html ** Affects: nova Importance: Undecided Status: New ** Tags: doc -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). 
https://bugs.launchpad.net/bugs/1842061 Title: Compute schedulers in nova - AggregateInstanceExtraSpecsFilter docs are not clear Status in OpenStack Compute (nova): New Bug description: - [x] This is a doc addition request. The description for the AggregateInstanceExtraSpecsFilter filter is not clear: https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregateinstanceextraspecsfilter (note it's also described here: https://docs.openstack.org/nova/latest/user/filter-scheduler.html) It's not clear what aggregate_instance_extra_specs is used for. Note that further down in the document there are some examples: https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#example-specify-compute- hosts-with-ssds So I guess based on that, it means you would just add metadata to a host aggregate like foo=bar and then tie a flavor to that by setting an extra spec of aggregate_instance_extra_specs:foo=bar on the flavor. But what about other standard extra specs like hide_hypervisor_id, you can't put the aggregate_instance_extra_specs prefix on that in the flavor since it would break the extra spec for the actual code that checks for that standard extra spec. Does that mean the flavor has to have both the scoped and unscoped spec? Or that the filter will handle the unscoped spec? It would be nice to have the documentation on the filter itself explain this and give examples of how to use it, for both a standard and custom flavor extra spec (note the latter has an example linked above for the ssd example). This originally came up while triaging bug 1841932 and trying to make sense of the filter (it's not very clear even by looking at the code). 
--- Release: on 2019-08-22 20:13:47 SHA: 0882ea69ea0c46cf97ecd5a1ec49a3007f293c28 Source: https://opendev.org/openstack/nova/src/doc/source/admin/configuration/schedulers.rst URL: https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1842061/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
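One plausible reading of the filter's semantics, sketched below purely to illustrate the question the report raises (this is a simplification, not the filter's actual code): specs scoped to `aggregate_instance_extra_specs:` are matched against aggregate metadata, specs scoped to some other extension are ignored, and unscoped specs are also checked, which is exactly why a standard spec like `hide_hypervisor_id` is problematic:

```python
SCOPE = 'aggregate_instance_extra_specs:'

def host_passes(aggregate_metadata, flavor_extra_specs):
    """Simplified sketch of scoped extra-spec matching."""
    for key, required in flavor_extra_specs.items():
        if ':' in key:
            if not key.startswith(SCOPE):
                continue          # scoped to another extension: ignored
            key = key[len(SCOPE):]
        # Unscoped keys fall through and are checked too, which is
        # the source of the ambiguity described in this report.
        if aggregate_metadata.get(key) != required:
            return False
    return True
```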
[Yahoo-eng-team] [Bug 1512645] Re: Security groups incorrectly applied on new additional interfaces
** Changed in: nova Status: New => Opinion ** Changed in: nova Importance: Undecided => Wishlist -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1512645 Title: Security groups incorrectly applied on new additional interfaces Status in neutron: Invalid Status in OpenStack Compute (nova): Opinion Bug description: When launching an instance with one network interface and enabling 2 security groups everything is working as it supposed to be. But when attaching additional network interfaces only the default security group is applied to those new interfaces. The additional security group isn't enabled at all on those extra interfaces. We had to dig into the iptables chains to discover this behavior. Once adding the rule manually or adding them to the default security group everything is working fine. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1512645/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1841411] Re: Instances recovered after failed migrations enter error state (hyper-v)
** Summary changed: - Instances recovered after failed migrations enter error state + Instances recovered after failed migrations enter error state (hyper-v) ** Tags added: live-migration ** Changed in: nova Importance: Undecided => Medium ** Also affects: nova/ocata Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New ** Also affects: nova/queens Importance: Undecided Status: New ** Also affects: nova/pike Importance: Undecided Status: New ** No longer affects: nova/ocata ** No longer affects: nova/pike -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1841411 Title: Instances recovered after failed migrations enter error state (hyper-v) Status in compute-hyperv: In Progress Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) queens series: New Status in OpenStack Compute (nova) rocky series: New Status in OpenStack Compute (nova) stein series: New Bug description: Most users expect that if a live migration fails but the instance is fully recovered, it shouldn't enter 'error' state. Setting the migration status to 'error' should be enough. This simplifies debugging, making it clear that the instance doesn't have to be manually recovered. This patch changed this behavior, indirectly affecting the Hyper-V driver, which propagates migration errors: Idfdce9e7dd8106af01db0358ada15737cb846395 When using the Hyper-V driver, instances enter error state even after successful recoveries. We may copy the Libvirt driver behavior and avoid propagating exceptions in this case. 
To manage notifications about this bug go to: https://bugs.launchpad.net/compute-hyperv/+bug/1841411/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
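The behavior the report proposes (record the failure on the migration, not the instance) can be sketched as below; the names and the `Migration` stand-in are illustrative, not nova's or compute-hyperv's actual objects:

```python
class Migration:
    """Minimal stand-in for a migration record (hypothetical)."""
    def __init__(self):
        self.status = 'running'
        self.saved = False
    def save(self):
        self.saved = True

def recover_from_failed_live_migration(migration, instance, rollback):
    """Roll the instance back and mark only the migration as failed.

    The error is deliberately not re-raised: the instance is healthy
    after rollback, so putting it in ERROR state would force needless
    manual recovery.
    """
    rollback(instance)             # instance is running again
    migration.status = 'error'     # failure stays visible on the migration
    migration.save()
```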
[Yahoo-eng-team] [Bug 1841667] Re: failing libvirt tests: need ordering
*** This bug is a duplicate of bug 1838666 *** https://bugs.launchpad.net/bugs/1838666 The actual version of libvirt on the system shouldn't matter, these tests should not be running against a real libvirt, everything should be faked out. My guess is the tests are using unordered dicts and that's why the keys are in a different order, or something with the way the xml comparison code is asserting the attributes. ** Tags added: libvirt testing ** Also affects: nova/stein Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1841667 Title: failing libvirt tests: need ordering Status in OpenStack Compute (nova): New Status in OpenStack Compute (nova) stein series: New Bug description: When rebuilding Nova from Stein in Debian Sid, I get 3 unit test errors, probably due to a more recent libvirt (ie: 5.6.0). See for example, on this first one: we get bus= and dev= inverted. == FAIL: nova.tests.unit.virt.libvirt.test_driver.LibvirtDriverTestCase.test_get_disk_xml nova.tests.unit.virt.libvirt.test_driver.LibvirtDriverTestCase.test_get_disk_xml -- _StringException: pythonlogging:'': {{{2019-08-27 20:26:05,026 WARNING [os_brick.initiator.connectors.remotefs] Connection details not present. 
RemoteFsClient may not initialize properly.}}} Traceback (most recent call last): File "/<>/nova/tests/unit/virt/libvirt/test_driver.py", line 20926, in test_get_disk_xml self.assertEqual(diska_xml.strip(), actual_diska_xml.strip()) File "/usr/lib/python3/dist-packages/testtools/testcase.py", line 411, in assertEqual self.assertThat(observed, matcher, message) File "/usr/lib/python3/dist-packages/testtools/testcase.py", line 498, in assertThat raise mismatch_error testtools.matchers._impl.MismatchError: !=: reference = '''\ 0e38683e-f0af-418f-a3f1-6b67ea0f919d ''' actual= '''\ 0e38683e-f0af-418f-a3f1-6b67ea0f919d ''' == FAIL: nova.tests.unit.virt.libvirt.test_driver.LibvirtConnTestCase.test_detach_volume_with_vir_domain_affect_live_flag nova.tests.unit.virt.libvirt.test_driver.LibvirtConnTestCase.test_detach_volume_with_vir_domain_affect_live_flag -- _StringException: pythonlogging:'': {{{2019-08-27 20:26:31,189 WARNING [os_brick.initiator.connectors.remotefs] Connection details not present. RemoteFsClient may not initialize properly.}}} Traceback (most recent call last): File "/usr/lib/python3/dist-packages/mock/mock.py", line 1330, in patched return func(*args, **keywargs) File "/<>/nova/tests/unit/virt/libvirt/test_driver.py", line 7955, in test_detach_volume_with_vir_domain_affect_live_flag """, flags=flags) File "/usr/lib/python3/dist-packages/mock/mock.py", line 944, in assert_called_with six.raise_from(AssertionError(_error_message(cause)), cause) File "", line 3, in raise_from AssertionError: expected call not found. Expected: detachDeviceFlags('\n \n \n\n', flags=3) Actual: detachDeviceFlags('\n \n \n\n', flags=3) == FAIL: nova.tests.unit.virt.libvirt.test_driver.LibvirtConnTestCase.test_update_volume_xml nova.tests.unit.virt.libvirt.test_driver.LibvirtConnTestCase.test_update_volume_xml -- _StringException: pythonlogging:'': {{{2019-08-27 20:26:37,451 WARNING [os_brick.initiator.connectors.remotefs] Connection details not present. 
RemoteFsClient may not initialize properly.}}} Traceback (most recent call last): File "/<>/nova/tests/unit/virt/libvirt/test_driver.py", line 10157, in test_update_volume_xml etree.tostring(config, encoding='unicode')) File "/usr/lib/python3/dist-packages/testtools/testcase.py", line 411, in assertEqual self.assertThat(observed, matcher, message) File "/usr/lib/python3/dist-packages/testtools/testcase.py", line 498, in assertThat raise mismatch_error testtools.matchers._impl.MismatchError: !=: reference = '58a84f6d-3f0c-4e19-a0af-eb657b790657' actual= '58a84f6d-3f0c-4e19-a0af-eb657b790657' To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1841667/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1841481] [NEW] Race during ironic re-balance corrupts local RT ProviderTree and compute_nodes cache
Public bug reported: Seen with an ironic re-balance in this job: https://d01b2e57f0a56cb7edf0-b6bc206936c08bb07a5f77cfa916a2d4.ssl.cf5.rackcdn.com/678298/4/check /ironic-tempest-ipa-wholedisk-direct-tinyipa-multinode/92c65ac/ On the subnode we see the RT detect that the node is moving hosts: Aug 26 18:41:38.818412 ubuntu-bionic-rax-ord-0010443319 nova- compute[747]: INFO nova.compute.resource_tracker [None req-a894abee- a2f1-4423-8ede-2a1b9eef28a4 None None] ComputeNode 61dbc9c7-828b-4c42 -b19c-a3716037965f moving from ubuntu-bionic-rax-ord-0010443317 to ubuntu-bionic-rax-ord-0010443319 On that new host, the ProviderTree cache is getting updated with refreshed associations for inventory: Aug 26 18:41:38.881026 ubuntu-bionic-rax-ord-0010443319 nova- compute[747]: DEBUG nova.scheduler.client.report [None req-a894abee- a2f1-4423-8ede-2a1b9eef28a4 None None] Refreshing inventories for resource provider 61dbc9c7-828b-4c42-b19c-a3716037965f {{(pid=747) _refresh_associations /opt/stack/nova/nova/scheduler/client/report.py:761}} aggregates: Aug 26 18:41:38.953685 ubuntu-bionic-rax-ord-0010443319 nova- compute[747]: DEBUG nova.scheduler.client.report [None req-a894abee- a2f1-4423-8ede-2a1b9eef28a4 None None] Refreshing aggregate associations for resource provider 61dbc9c7-828b-4c42-b19c-a3716037965f, aggregates: None {{(pid=747) _refresh_associations /opt/stack/nova/nova/scheduler/client/report.py:770}} and traits - but when we get traits the provider is gone: Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager [None req-a894abee-a2f1-4423-8ede-2a1b9eef28a4 None None] Error updating resources for node 61dbc9c7-828b-4c42-b19c-a3716037965f.: ResourceProviderTraitRetrievalFailed: Failed to get traits for resource provider with UUID 61dbc9c7-828b-4c42-b19c-a3716037965f Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager Traceback (most recent call last): Aug 26 18:41:38.995595 
ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 8250, in _update_available_resource_for_node Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager startup=startup) Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 715, in update_available_resource Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager self._update_available_resource(context, resources, startup=startup) Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 328, in inner Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager return f(*args, **kwargs) Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 738, in _update_available_resource Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager is_new_compute_node = self._init_compute_node(context, resources) Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 561, in _init_compute_node Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager if self._check_for_nodes_rebalance(context, resources, nodename): Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 516, in _check_for_nodes_rebalance Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR 
nova.compute.manager self._update(context, cn) Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1054, in _update Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager self._update_to_placement(context, compute_node, startup) Aug 26 18:41:38.996935 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 49, in wrapped_f Aug 26 18:41:38.996935 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager return Retrying(*dargs, **dkw).call(f, *args, **kw) Aug 26 18:41:38.996935 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 206, in call
[Yahoo-eng-team] [Bug 1841476] [NEW] Spurious ComputeHostNotFound warnings in nova-compute logs during ironic node re-balance
Public bug reported: Seen here: https://d01b2e57f0a56cb7edf0-b6bc206936c08bb07a5f77cfa916a2d4.ssl.cf5.rackcdn.com/678298/4/check /ironic-tempest-ipa-wholedisk-direct-tinyipa- multinode/92c65ac/compute1/logs/screen-n-cpu.txt.gz We see a warning that a compute node could not be found by host and node but then later is found just by nodename and is moving to the current host: Aug 26 18:41:38.800657 ubuntu-bionic-rax-ord-0010443319 nova- compute[747]: WARNING nova.compute.resource_tracker [None req-a894abee- a2f1-4423-8ede-2a1b9eef28a4 None None] No compute node record for ubuntu-bionic-rax-ord-0010443319:61dbc9c7-828b-4c42-b19c-a3716037965f: ComputeHostNotFound_Remote: Compute host ubuntu-bionic-rax- ord-0010443319 could not be found. Aug 26 18:41:38.818412 ubuntu-bionic-rax-ord-0010443319 nova- compute[747]: INFO nova.compute.resource_tracker [None req-a894abee- a2f1-4423-8ede-2a1b9eef28a4 None None] ComputeNode 61dbc9c7-828b-4c42 -b19c-a3716037965f moving from ubuntu-bionic-rax-ord-0010443317 to ubuntu-bionic-rax-ord-0010443319 The warning comes from this call: https://github.com/openstack/nova/blob/71478c3eedd95e2eeb219f47460603221ee249b9/nova/compute/resource_tracker.py#L554 And the re-balance is found here: https://github.com/openstack/nova/blob/71478c3eedd95e2eeb219f47460603221ee249b9/nova/compute/resource_tracker.py#L561 The warning is then a red herring. We could: 1. add something to the warning message saying this could be due to a re-balance but that might be confusing for non-ironic computes and/or 2. check if self.driver.rebalances_nodes and if True, change the warning to an info level message (and potentially modify the message with the re-balance wording in #1 above). ** Affects: nova Importance: Low Status: Triaged ** Tags: ironic resource-tracker serviceability -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). 
https://bugs.launchpad.net/bugs/1841476 Title: Spurious ComputeHostNotFound warnings in nova-compute logs during ironic node re-balance Status in OpenStack Compute (nova): Triaged Bug description: Seen here: https://d01b2e57f0a56cb7edf0-b6bc206936c08bb07a5f77cfa916a2d4.ssl.cf5.rackcdn.com/678298/4/check /ironic-tempest-ipa-wholedisk-direct-tinyipa- multinode/92c65ac/compute1/logs/screen-n-cpu.txt.gz We see a warning that a compute node could not be found by host and node but then later is found just by nodename and is moving to the current host: Aug 26 18:41:38.800657 ubuntu-bionic-rax-ord-0010443319 nova- compute[747]: WARNING nova.compute.resource_tracker [None req- a894abee-a2f1-4423-8ede-2a1b9eef28a4 None None] No compute node record for ubuntu-bionic-rax-ord-0010443319:61dbc9c7-828b-4c42-b19c- a3716037965f: ComputeHostNotFound_Remote: Compute host ubuntu-bionic- rax-ord-0010443319 could not be found. Aug 26 18:41:38.818412 ubuntu-bionic-rax-ord-0010443319 nova- compute[747]: INFO nova.compute.resource_tracker [None req-a894abee- a2f1-4423-8ede-2a1b9eef28a4 None None] ComputeNode 61dbc9c7-828b-4c42 -b19c-a3716037965f moving from ubuntu-bionic-rax-ord-0010443317 to ubuntu-bionic-rax-ord-0010443319 The warning comes from this call: https://github.com/openstack/nova/blob/71478c3eedd95e2eeb219f47460603221ee249b9/nova/compute/resource_tracker.py#L554 And the re-balance is found here: https://github.com/openstack/nova/blob/71478c3eedd95e2eeb219f47460603221ee249b9/nova/compute/resource_tracker.py#L561 The warning is then a red herring. We could: 1. add something to the warning message saying this could be due to a re-balance but that might be confusing for non-ironic computes and/or 2. check if self.driver.rebalances_nodes and if True, change the warning to an info level message (and potentially modify the message with the re-balance wording in #1 above). 
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1841476/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
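Option 2 from the report can be sketched as a level-selection helper; this is illustrative only (the real resource tracker logs differently), but `rebalances_nodes` is the driver attribute the report refers to:

```python
import logging

def no_compute_node_log_level(driver):
    """Expected misses during a node re-balance are informational."""
    if getattr(driver, 'rebalances_nodes', False):
        return logging.INFO
    return logging.WARNING

def log_no_compute_node(log, driver, host, nodename):
    suffix = (' (possibly due to a node re-balance)'
              if getattr(driver, 'rebalances_nodes', False) else '')
    log.log(no_compute_node_log_level(driver),
            'No compute node record for %s:%s%s', host, nodename, suffix)
```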
[Yahoo-eng-team] [Bug 1833902] Re: Revert resize tests are failing in jobs with iptables_hybrid fw driver
** No longer affects: neutron -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1833902 Title: Revert resize tests are failing in jobs with iptables_hybrid fw driver Status in OpenStack Compute (nova): Fix Released Bug description: Tests: tempest.api.compute.admin.test_migrations.MigrationsAdminTest.test_resize_server_revert_deleted_flavor tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert_with_volume_attached are failing 100% times since last ~2 days. And it happens only in jobs with iptables_hybrid fw driver but I don't know if this is really some source of issue or maybe just red herring. Logstash query: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22tempest.api.compute.admin.test_migrations.MigrationsAdminTest.test_resize_server_revert_deleted_flavor%5C%22%20AND%20message%3A%5C%22FAILED%5C%22 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1833902/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1840978] [NEW] nova-manage commands with unexpected errors returning 1 conflict with expected cases of 1 for flow control
Public bug reported: The archive_deleted_rows command returns 1 meaning some records were archived and the code documents that if automating and not using --until-complete, you should keep going while you get rc=1 until you get rc=0: https://github.com/openstack/nova/blob/0bf81cfe73340ba5cfd9cf44a38905014ba780f0/nova/cmd/manage.py#L505 The problem is if some unexpected error happens, let's say there is a TypeError in the code or something, the command will also return 1: https://github.com/openstack/nova/blob/0bf81cfe73340ba5cfd9cf44a38905014ba780f0/nova/cmd/manage.py#L2625 That unexpected error should probably be a 255 which generally means a command failed in some unexpected way. There might be other nova-manage commands that return 1 for flow control as well. Note that changing the "unexpected error" code from 1 to 255 is an upgrade impacting change worth a release note. ** Affects: nova Importance: Low Status: Triaged ** Tags: nova-manage ** Tags added: nova-manage ** Changed in: nova Importance: Undecided => Low -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). 
https://bugs.launchpad.net/bugs/1840978 Title: nova-manage commands with unexpected errors returning 1 conflict with expected cases of 1 for flow control Status in OpenStack Compute (nova): Triaged Bug description: The archive_deleted_rows command returns 1 meaning some records were archived and the code documents that if automating and not using --until-complete, you should keep going while you get rc=1 until you get rc=0: https://github.com/openstack/nova/blob/0bf81cfe73340ba5cfd9cf44a38905014ba780f0/nova/cmd/manage.py#L505 The problem is if some unexpected error happens, let's say there is a TypeError in the code or something, the command will also return 1: https://github.com/openstack/nova/blob/0bf81cfe73340ba5cfd9cf44a38905014ba780f0/nova/cmd/manage.py#L2625 That unexpected error should probably be a 255 which generally means a command failed in some unexpected way. There might be other nova- manage commands that return 1 for flow control as well. Note that changing the "unexpected error" code from 1 to 255 is an upgrade impacting change worth a release note. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1840978/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
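The flow-control contract described above, and why a distinct fatal code matters, can be sketched as an automation loop. The runner is parameterized for clarity; in real use it would wrap something like `subprocess.call(['nova-manage', 'db', 'archive_deleted_rows', '--max_rows', '1000'])`:

```python
def archive_until_done(run_archive, max_batches=100):
    """Drive archive_deleted_rows under the current rc contract.

    rc=1 means "archived a batch, call again"; rc=0 means done.
    Anything else is treated as fatal -- which is exactly why the
    report wants unexpected errors moved to a distinct code like 255
    instead of overloading 1.
    """
    for _ in range(max_batches):
        rc = run_archive()
        if rc == 0:
            return True        # nothing left to archive
        if rc != 1:
            raise RuntimeError('archive failed with rc=%d' % rc)
    return False               # gave up before completion
```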
[Yahoo-eng-team] [Bug 1704179] Re: Too many period db actions in large scale clusters increase the load of database
*** This bug is a duplicate of bug 1729621 *** https://bugs.launchpad.net/bugs/1729621 ** This bug has been marked a duplicate of bug 1729621 Inconsistent value for vcpu_used -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1704179 Title: Too many period db actions in large scale clusters increase the load of database Status in OpenStack Compute (nova): In Progress Bug description: Too many periodic DB actions in large-scale clusters increase the load on the database, especially unnecessary DB updates and queries. For example, with over 1000 nodes there are 2 * 1000 = 2000 DB updates to the compute_node table every 60s in _update_available_resource, but these two DB updates are not necessary if resource usage has not changed. Deleting the first and second _update() calls in _init_compute_node saves two DB updates per node every 60s when resource usage has not changed for that compute node. Then the self._resource_change(compute_node) check in _update() makes sense. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1704179/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
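The proposed optimization can be sketched like this (a hypothetical minimal class, not nova's actual ResourceTracker): persist a compute-node record only when its resource usage actually changed, instead of unconditionally writing it on every periodic task run.

```python
class ResourceTrackerSketch:
    def __init__(self):
        self._old_resources = {}   # nodename -> last saved resource dict

    def _resource_change(self, nodename, resources):
        """Return True iff `resources` differ from what we last saved."""
        if self._old_resources.get(nodename) != resources:
            self._old_resources[nodename] = dict(resources)
            return True
        return False

    def update(self, nodename, resources, save_fn):
        # save_fn stands in for the real DB save; it is only invoked
        # when something changed, skipping the redundant UPDATE.
        if self._resource_change(nodename, resources):
            save_fn(nodename, resources)
            return True
        return False
```

With 1000 unchanged nodes, every periodic pass then issues zero UPDATEs instead of 2000.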
[Yahoo-eng-team] [Bug 1729621] Re: Inconsistent value for vcpu_used
** Also affects: nova/queens Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New ** Changed in: nova/queens Status: New => Fix Released ** Changed in: nova/rocky Status: New => Fix Released ** Changed in: nova/pike Assignee: Tony Breeds (o-tony) => Radoslav Gerganov (rgerganov) ** Changed in: nova/pike Status: In Progress => Won't Fix ** Changed in: nova/queens Assignee: (unassigned) => Radoslav Gerganov (rgerganov) ** Changed in: nova/rocky Assignee: (unassigned) => Radoslav Gerganov (rgerganov) ** No longer affects: nova/ocata ** Changed in: nova/queens Importance: Undecided => High ** Changed in: nova/rocky Importance: Undecided => High ** Changed in: nova/pike Importance: Undecided => High -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1729621 Title: Inconsistent value for vcpu_used Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) pike series: Won't Fix Status in OpenStack Compute (nova) queens series: Fix Released Status in OpenStack Compute (nova) rocky series: Fix Released Bug description: Description === Nova updates hypervisor resources using a function called ./nova/compute/resource_tracker.py:update_available_resource(). In the case of *shut-down* instances it can produce inconsistent values for resources like vcpu_used. Resources are taken from the function self.driver.get_available_resource(): https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L617 https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/virt/libvirt/driver.py#L5766 This function calculates allocated vCPUs based on the function _get_vcpu_total(). 
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/virt/libvirt/driver.py#L5352 As we can see, _get_vcpu_total() calls *self._host.list_guests()* without the "only_running=False" parameter, so it doesn't respect shut-down instances. At the end of the resource update process the function _update_available_resource() is being called: > /opt/stack/nova/nova/compute/resource_tracker.py(733) 677 @utils.synchronized(COMPUTE_RESOURCE_SEMAPHORE) 678 def _update_available_resource(self, context, resources): 679 681 # initialize the compute node object, creating it 682 # if it does not already exist. 683 self._init_compute_node(context, resources) It initializes the compute node object with resources that are calculated without shut-down instances. If the compute node object already exists it *UPDATES* its fields - *for a while nova-api reports resource values different from the real ones.* 731 # update the compute_node 732 self._update(context, cn) The inconsistency is automatically fixed later during code execution: https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L709 But for heavily loaded hypervisors (like 100 active instances and 30 shut-down instances) it leaves wrong information in the nova database for about 4-5 seconds (in my use case), which can cause other issues like spawning on an already full hypervisor (because the scheduler has wrong information about hypervisor usage). Steps to reproduce == 1) Start devstack 2) Create 120 instances 3) Stop some instances 4) Watch the flapping values in nova hypervisor-show nova hypervisor-show e6dfc16b-7914-48fb-a235-6fe3a41bb6db Expected result === Returned values should stay the same during the test. 
Actual result = while true; do echo -n "$(date) "; echo "select hypervisor_hostname, vcpus_used from compute_nodes where hypervisor_hostname='example.compute.node.com';" | mysql nova_cell1; sleep 0.3; done Thu Nov 2 14:50:09 UTC 2017 example.compute.node.com 120 Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120 Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120 Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120 Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120 Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120 Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120 Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120 Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117 Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117 Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117 Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117 Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117 Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117 Thu Nov 2 14:50:14 UTC
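The flapping values come from counting vCPUs over running guests only. A minimal self-contained sketch of that behaviour (the `Guest` class and `list_guests` helper are hypothetical stand-ins for the libvirt host API, not nova's actual code), alongside the `only_running=False` call the report suggests:

```python
class Guest:
    def __init__(self, vcpus, running):
        self.vcpus, self.running = vcpus, running

def list_guests(guests, only_running=True):
    # mimics a host API that skips shut-off domains by default
    return [g for g in guests if g.running or not only_running]

def vcpus_used(guests, include_stopped):
    return sum(g.vcpus for g in
               list_guests(guests, only_running=not include_stopped))

# 3 running guests with 2 vCPUs each, plus 1 stopped guest with 2 vCPUs
guests = [Guest(2, True), Guest(2, True), Guest(2, True), Guest(2, False)]
low = vcpus_used(guests, include_stopped=False)   # 6: the flapping value
full = vcpus_used(guests, include_stopped=True)   # 8: the stable value
```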
[Yahoo-eng-team] [Bug 1789991] Re: nova-compute error after enrolling ironic baremetal nodes
*** This bug is a duplicate of bug 1839674 *** https://bugs.launchpad.net/bugs/1839674 ** This bug has been marked a duplicate of bug 1839674 ResourceTracker.compute_nodes won't try to create a ComputeNode a second time if the first create() fails -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1789991 Title: nova-compute error after enrolling ironic baremetal nodes Status in OpenStack Compute (nova): New Bug description: Description === After enrolling some ironic baremetal nodes, I noticed the following in nova-compute.log (longer trace below): 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager [req-73ba9d4b- b51d-4ab7-88c8-5fc3f27fd89e - - - - -] Error updating resources for node 0e5705cc-e872-49aa-aff4-1a91278b5cb3.: NotImplementedError: Cannot load 'id' in the base class Steps to reproduce == * Enroll ironic baremetal nodes (openstack baremetal node provide) * Wait * Error repeatedly appears in nova-compute.log Expected result === No errors in log Actual result = Errors in log Environment === openstack-nova-compute-18.0.0-0.20180829095234.45fc232.el7.noarch puppet-nova-13.3.1-0.20180825165256.5d1889b.el7.noarch python-nova-18.0.0-0.20180829095234.45fc232.el7.noarch python-novajoin-1.0.19-0.20180828183900.3d58511.el7.noarch openstack-nova-common-18.0.0-0.20180829095234.45fc232.el7.noarch python2-novaclient-11.0.0-0.20180807085257.f1005ce.el7.noarch Logs & Configs = 2018-08-30 17:00:51.142 7 DEBUG oslo_concurrency.lockutils [req-73ba9d4b-b51d-4ab7-88c8-5fc3f27fd89e - - - - -] Lock "compute_resources" release\ d by "nova.compute.resource_tracker._update_available_resource" :: held 0.001s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils\ .py:285 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager [req-73ba9d4b-b51d-4ab7-88c8-5fc3f27fd89e - - - - -] Error updating resources for node 0e57\ 05cc-e872-49aa-aff4-1a91278b5cb3.: 
NotImplementedError: Cannot load 'id' in the base class 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager Traceback (most recent call last): 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7729, in _update_av\ ailable_resource_for_node 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager rt.update_available_resource(context, nodename) 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 700, in up\ date_available_resource 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager self._update_available_resource(context, resources) 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager return f(*args, **kwargs) 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 723, in _u\ pdate_available_resource 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager self._init_compute_node(context, resources) 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 563, in _i\ nit_compute_node 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager self._setup_pci_tracker(context, cn, resources) 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 594, in _s\ etup_pci_tracker 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager n_id = compute_node.id 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 67, in getter 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager self.obj_load_attr(name) 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager File 
"/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 603, in obj_l\ oad_attr 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager _("Cannot load '%s' in the base class") % attrname) 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager NotImplementedError: Cannot load 'id' in the base class 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1789991/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
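The traceback boils down to reading an attribute that was never set on a versioned object after its create() failed. A minimal sketch of the failure mode and the guarded-access pattern (the class below is a hypothetical stand-in, not the real oslo.versionedobjects implementation):

```python
class VersionedObjectSketch:
    def __init__(self, **fields):
        self._fields = fields

    def __getattr__(self, name):
        # mimics obj_load_attr() on a base object: unset fields raise
        try:
            return self._fields[name]
        except KeyError:
            raise NotImplementedError(
                "Cannot load '%s' in the base class" % name)

    def obj_attr_is_set(self, name):
        return name in self._fields

cn = VersionedObjectSketch(host="ironic-0")  # create() failed: no 'id'
try:
    cn.id                      # naive access reproduces the error above
    crashed = False
except NotImplementedError:
    crashed = True
# guarded access: check before touching .id on a possibly-uncreated node
node_id = cn.id if cn.obj_attr_is_set("id") else None
```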
[Yahoo-eng-team] [Bug 1840930] [NEW] Networking service in neutron - install guide says to configure nova with [neutron]/url which is deprecated
Public bug reported: - [x] This doc is inaccurate in this way: The [neutron]/url option https://docs.openstack.org/nova/latest/configuration/config.html#neutron.url in nova has been deprecated since the Queens release and is being removed in Train. The neutron/compute config guide in the neutron install guides still says to use the url option though. Since Queens, nova has used KSA adapters for working with neutron config: https://review.opendev.org/#/c/509892/ I think we want to avoid configuring the [neutron] section in nova.conf with url or endpoint_override and instead rely on KSA to use the service types authority to find the endpoint based on service name/type and interface; in other words, things should just work without needing to explicitly define an endpoint URL for nova talking to neutron - nova can go through KSA and the service catalog to get the endpoint it needs. --- Release: 14.1.0.dev665 on 2017-06-30 05:58:47 SHA: 490471ebd3ac56d0cee164b9c1c1211687e49437 Source: https://opendev.org/openstack/neutron/src/doc/source/install/index.rst URL: https://docs.openstack.org/neutron/latest/install/ ** Affects: neutron Importance: Undecided Status: New ** Tags: doc -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1840930 Title: Networking service in neutron - install guide says to configure nova with [neutron]/url which is deprecated Status in neutron: New Bug description: - [x] This doc is inaccurate in this way: The [neutron]/url option https://docs.openstack.org/nova/latest/configuration/config.html#neutron.url in nova has been deprecated since the Queens release and is being removed in Train. The neutron/compute config guide in the neutron install guides still says to use the url option though. 
Since Queens, nova has used KSA adapters for working with neutron config: https://review.opendev.org/#/c/509892/ I think we want to avoid configuring the [neutron] section in nova.conf with url or endpoint_override and instead rely on KSA to use the service types authority to find the endpoint based on service name/type and interface; in other words, things should just work without needing to explicitly define an endpoint URL for nova talking to neutron - nova can go through KSA and the service catalog to get the endpoint it needs. --- Release: 14.1.0.dev665 on 2017-06-30 05:58:47 SHA: 490471ebd3ac56d0cee164b9c1c1211687e49437 Source: https://opendev.org/openstack/neutron/src/doc/source/install/index.rst URL: https://docs.openstack.org/neutron/latest/install/ To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1840930/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
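A [neutron] section relying on KSA service discovery, rather than the deprecated url option, could look roughly like the sketch below (all values are illustrative placeholders; option names follow the KSA adapter options nova exposes, but check your deployment's docs before copying):

```ini
[neutron]
# no `url` / `endpoint_override`: KSA resolves the endpoint from the
# service catalog using the service type and interface below
service_type = network
valid_interfaces = internal
region_name = RegionOne
auth_type = password
auth_url = http://controller:5000/v3
project_name = service
username = neutron
password = SECRET
user_domain_name = Default
project_domain_name = Default
```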
[Yahoo-eng-team] [Bug 1840430] Re: Error creating a virtual machine
Looks like the nova-api service isn't configured properly for authenticating to neutron; make sure the [neutron] section of your nova configuration is set up for working with neutron. See: https://docs.openstack.org/neutron/latest/install/controller-install-ubuntu.html#configure-the-compute-service-to-use-the-networking-service ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1840430 Title: Error creating a virtual machine Status in OpenStack Compute (nova): Invalid Bug description: 2019-08-16 17:13:51.274 8884 INFO nova.osapi_compute.wsgi.server [req-71681a14-7b44-471f-8060-30419a0924b2 7a29e10f16db4e4a938f9b73b2599310 81018e90418c4a708459ee88bf9a734c - default default] 192.168.1.115 "GET /v2.1 HTTP/1.1" status: 302 len: 249 time: 0.1194441 2019-08-16 17:13:51.279 8884 INFO nova.osapi_compute.wsgi.server [req-45b7dc8c-f676-40ee-9c41-6698fce3a636 7a29e10f16db4e4a938f9b73b2599310 81018e90418c4a708459ee88bf9a734c - default default] 192.168.1.115 "GET /v2.1/ HTTP/1.1" status: 200 len: 720 time: 0.0041459 2019-08-16 17:13:51.402 8884 INFO nova.api.openstack.wsgi [req-15a014b2-b96a-43bf-b0c7-dd378bb551b3 7a29e10f16db4e4a938f9b73b2599310 81018e90418c4a708459ee88bf9a734c - default default] HTTP exception raised: Flavor m1.tiny could not be found. 2019-08-16 17:13:51.403 8884 INFO nova.osapi_compute.wsgi.server [req-15a014b2-b96a-43bf-b0c7-dd378bb551b3 7a29e10f16db4e4a938f9b73b2599310 81018e90418c4a708459ee88bf9a734c - default default] 192.168.1.115 "GET /v2.1/flavors/m1.tiny HTTP/1.1" status: 404 len: 472 time: 0.0161059 2019-08-16 17:13:51.422 8884 INFO nova.osapi_compute.wsgi.server [req-9136c449-1258-44ce-abbd-46e560724f29 7a29e10f16db4e4a938f9b73b2599310 81018e90418c4a708459ee88bf9a734c - default default] 192.168.1.115 "GET /v2.1/flavors?is_public=None HTTP/1.1" status: 200 len: 1780 time: 0.0171521 2019-08-16 17:13:51.437 8884 INFO 
nova.osapi_compute.wsgi.server [req-66e55bfa-f1de-4727-9853-d6bc833abf36 7a29e10f16db4e4a938f9b73b2599310 81018e90418c4a708459ee88bf9a734c - default default] 192.168.1.115 "GET /v2.1/flavors/cadf12b6-fa82-4e33-a933-b222a2525622 HTTP/1.1" status: 200 len: 800 time: 0.0120480 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions [req-752381d2-f28e-4907-a536-7169473f9698 7a29e10f16db4e4a938f9b73b2599310 81018e90418c4a708459ee88bf9a734c - default default] Unexpected exception in API method 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions Traceback (most recent call last): 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/site-packages/nova/api/openstack/extensions.py", line 338, in wrapped 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions return f(*args, **kwargs) 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, in wrapper 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions return func(*args, **kwargs) 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, in wrapper 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions return func(*args, **kwargs) 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, in wrapper 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions return func(*args, **kwargs) 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, in wrapper 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions return func(*args, **kwargs) 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, in 
wrapper 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions return func(*args, **kwargs) 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, in wrapper 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions return func(*args, **kwargs) 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/site-packages/nova/api/openstack/compute/servers.py", line 642, in create 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions **create_kwargs) 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/site-packages/nova/hooks.py", line 154, in inner 2019-08-16 17:13:51.709 8884 ERROR nova.api.openstack.extensions rv = f(*args, **kwargs) 2019-08-16 17:13:51.709
[Yahoo-eng-team] [Bug 1784874] Re: ResourceTracker doesn't clean up compute_nodes or stats entries
** Also affects: nova/ocata Importance: Undecided Status: New ** Changed in: nova/ocata Status: New => In Progress ** Changed in: nova/ocata Importance: Undecided => Low ** Changed in: nova/ocata Assignee: (unassigned) => Matt Riedemann (mriedem) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1784874 Title: ResourceTracker doesn't clean up compute_nodes or stats entries Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) ocata series: In Progress Status in OpenStack Compute (nova) pike series: In Progress Status in OpenStack Compute (nova) queens series: In Progress Bug description: This was noted in review: https://review.openstack.org/#/c/587636/4/nova/compute/resource_tracker.py@141 That the ResourceTracker.compute_nodes and ResourceTracker.stats (and old_resources) entries only grow and are never cleaned up as we rebalance nodes or nodes are deleted, which means it just takes up memory over time. When we cleanup compute nodes here: https://github.com/openstack/nova/blob/47ef500f4492c731ebfa33a12822ef6b5db4e7e2/nova/compute/manager.py#L7759 We should probably call a cleanup hook into the ResourceTracker to cleanup those entries as well. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1784874/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
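The suggested cleanup hook could look roughly like this (a hypothetical minimal stand-in for the ResourceTracker, not nova's actual code): when a node is deleted or rebalanced to another host, drop its entries from the per-node dicts so they stop leaking memory.

```python
class ResourceTrackerCleanupSketch:
    def __init__(self):
        self.compute_nodes = {}   # nodename -> compute node record
        self.stats = {}           # nodename -> per-node stats
        self.old_resources = {}   # nodename -> last reported resources

    def update_available_resource(self, nodename, resources):
        # entries accumulate here on every periodic update...
        self.compute_nodes[nodename] = resources
        self.stats.setdefault(nodename, {})
        self.old_resources[nodename] = resources

    def remove_node(self, nodename):
        # ...so a cleanup hook, called from the compute manager when it
        # deletes orphaned compute nodes, must forget all three entries.
        self.compute_nodes.pop(nodename, None)
        self.stats.pop(nodename, None)
        self.old_resources.pop(nodename, None)
```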
[Yahoo-eng-team] [Bug 1784874] Re: ResourceTracker doesn't clean up compute_nodes or stats entries
** Also affects: nova/pike Importance: Undecided Status: New ** Changed in: nova/pike Importance: Undecided => Low -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1784874 Title: ResourceTracker doesn't clean up compute_nodes or stats entries Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) pike series: In Progress Status in OpenStack Compute (nova) queens series: In Progress Bug description: This was noted in review: https://review.openstack.org/#/c/587636/4/nova/compute/resource_tracker.py@141 That the ResourceTracker.compute_nodes and ResourceTracker.stats (and old_resources) entries only grow and are never cleaned up as we rebalance nodes or nodes are deleted, which means it just takes up memory over time. When we cleanup compute nodes here: https://github.com/openstack/nova/blob/47ef500f4492c731ebfa33a12822ef6b5db4e7e2/nova/compute/manager.py#L7759 We should probably call a cleanup hook into the ResourceTracker to cleanup those entries as well. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1784874/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1840159] [NEW] nova-grenade-live-migration intermittently fails with "Error monitoring migration: Timed out during operation: cannot acquire state change lock (held by remoteDisp
Public bug reported: Seen here: https://logs.opendev.org/21/655721/14/check/nova-grenade-live- migration/2ee634d/logs/subnode-2/screen-n-cpu.txt.gz?level=TRACE#_Aug_13_10_03_49_974378 Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: WARNING nova.virt.libvirt.driver [-] [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] Error monitoring migration: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainMigratePerform3Params): libvirtError: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainMigratePerform3Params) Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] Traceback (most recent call last): Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] File "/opt/stack/old/nova/nova/virt/libvirt/driver.py", line 8052, in _live_migration Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] finish_event, disk_paths) Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] File "/opt/stack/old/nova/nova/virt/libvirt/driver.py", line 7857, in _live_migration_monitor Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] info = guest.get_job_info() Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] File "/opt/stack/old/nova/nova/virt/libvirt/guest.py", line 709, in get_job_info Aug 13 
10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] stats = self._domain.jobStats() Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 190, in doit Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] result = proxy_call(self._autowrap, f, *args, **kwargs) Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 148, in proxy_call Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] rv = execute(f, *args, **kwargs) Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 129, in execute Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] six.reraise(c, e, tb) Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] 
rv = meth(*args, **kwargs) Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 1403, in jobStats Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] if ret is None: raise libvirtError ('virDomainGetJobStats() failed', dom=self) Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920 nova-compute[25863]: ERROR nova.virt.libvirt.driver [instance: a1637e8b-6f2d-4127-9799-31cefb3f43a6] libvirtError: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainMigratePerform3Params) Aug 13 10:03:49.974378 ubuntu-bionic-limestone-regionone-0010083920
[Yahoo-eng-team] [Bug 1784874] Re: ResourceTracker doesn't clean up compute_nodes or stats entries
** Also affects: nova/queens Importance: Undecided Status: New ** Changed in: nova/queens Importance: Undecided => Low ** Changed in: nova/queens Status: New => Confirmed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1784874 Title: ResourceTracker doesn't clean up compute_nodes or stats entries Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Confirmed Bug description: This was noted in review: https://review.openstack.org/#/c/587636/4/nova/compute/resource_tracker.py@141 That the ResourceTracker.compute_nodes and ResourceTracker.stats (and old_resources) entries only grow and are never cleaned up as we rebalance nodes or nodes are deleted, which means it just takes up memory over time. When we cleanup compute nodes here: https://github.com/openstack/nova/blob/47ef500f4492c731ebfa33a12822ef6b5db4e7e2/nova/compute/manager.py#L7759 We should probably call a cleanup hook into the ResourceTracker to cleanup those entries as well. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1784874/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1840068] Re: (lxc) Instance failed to spawn: TypeError: object of type 'filter' has no len()
This filter code goes back to 2012 so we could backport the fix further (to pike and ocata) but no one is really using the libvirt+lxc code as far as I can tell, at least not with python3, so we can just backport to the non-extended-maintenance branches unless someone wants to backport them upstream to pike and ocata. ** Also affects: nova/stein Importance: Undecided Status: New ** Also affects: nova/queens Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New ** Summary changed: - (lxc) Instance failed to spawn: TypeError: object of type 'filter' has no len() + (lxc) Instance failed to spawn: TypeError: object of type 'filter' has no len() - python3 ** Changed in: nova/queens Importance: Undecided => Medium ** Changed in: nova Importance: High => Medium ** Changed in: nova/queens Status: New => Confirmed ** Changed in: nova/rocky Importance: Undecided => Medium ** Changed in: nova/stein Status: New => Confirmed ** Changed in: nova/stein Importance: Undecided => Medium ** Changed in: nova/rocky Status: New => Confirmed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). 
https://bugs.launchpad.net/bugs/1840068 Title: (lxc) Instance failed to spawn: TypeError: object of type 'filter' has no len() - python3 Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) queens series: Confirmed Status in OpenStack Compute (nova) rocky series: Confirmed Status in OpenStack Compute (nova) stein series: Confirmed Bug description: Seen in the nova-lxc CI job here: https://logs.opendev.org/24/676024/2/experimental/nova- lxc/f9a892c/controller/logs/screen-n-cpu.txt.gz#_Aug_12_23_31_05_043911 Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [None req-55d6dd1b-96ca-4afe-9a0c-cec902d3bd87 tempest-ServerAddressesTestJSON-1311986476 tempest-ServerAddressesTestJSON-1311986476] [instance: 842a9704-3700-42ef-adb3-b038ca525127] Instance failed to spawn: TypeError: object of type 'filter' has no len() Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] Traceback (most recent call last): Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] File "/opt/stack/nova/nova/compute/manager.py", line 2495, in _build_resources Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] yield resources Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] File "/opt/stack/nova/nova/compute/manager.py", line 2256, in _build_and_run_instance Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] block_device_info=block_device_info) Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: 
ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3231, in spawn Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] destroy_disks_on_failure=True) Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5823, in _create_domain_and_network Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] destroy_disks_on_failure) Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] self.force_reraise() Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127]
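The py2-to-py3 behavior change behind this traceback can be reproduced in isolation. A minimal sketch (the variable names here are hypothetical, not nova's actual code):

```python
# Python 3's filter() returns a lazy iterator, so calling len() on it
# raises TypeError -- the same error seen in the traceback above.
devices = [{'mount_device': '/dev/vdb'}, {'mount_device': None}]

mounted = filter(lambda bd: bd['mount_device'], devices)
try:
    len(mounted)  # fine on Python 2 (filter returned a list), fails on 3
except TypeError as exc:
    print(exc)    # object of type 'filter' has no len()

# The usual fix is to materialize the iterator before taking len():
mounted = list(filter(lambda bd: bd['mount_device'], devices))
print(len(mounted))  # 1
```

This is why the bug only shows up when the libvirt+lxc path is run under python3, as noted in the retitled summary.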
[Yahoo-eng-team] [Bug 1840068] [NEW] (lxc) Instance failed to spawn: TypeError: object of type 'filter' has no len()
Public bug reported: Seen in the nova-lxc CI job here: https://logs.opendev.org/24/676024/2/experimental/nova- lxc/f9a892c/controller/logs/screen-n-cpu.txt.gz#_Aug_12_23_31_05_043911 Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [None req-55d6dd1b-96ca-4afe-9a0c-cec902d3bd87 tempest-ServerAddressesTestJSON-1311986476 tempest-ServerAddressesTestJSON-1311986476] [instance: 842a9704-3700-42ef-adb3-b038ca525127] Instance failed to spawn: TypeError: object of type 'filter' has no len() Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] Traceback (most recent call last): Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] File "/opt/stack/nova/nova/compute/manager.py", line 2495, in _build_resources Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] yield resources Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] File "/opt/stack/nova/nova/compute/manager.py", line 2256, in _build_and_run_instance Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] block_device_info=block_device_info) Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3231, in spawn Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] destroy_disks_on_failure=True) Aug 12 23:31:05.043911 
ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5823, in _create_domain_and_network Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] destroy_disks_on_failure) Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] self.force_reraise() Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] six.reraise(self.type_, self.value, self.tb) Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] raise value Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5789, in _create_domain_and_network Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR 
nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] block_device_info): Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__ Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] return next(self.gen) Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5701, in _lxc_disk_handler Aug 12 23:31:05.043911 ubuntu-bionic-rax-ord-0010072710 nova-compute[27015]: ERROR nova.compute.manager [instance: 842a9704-3700-42ef-adb3-b038ca525127] block_device_info) Aug 12
[Yahoo-eng-team] [Bug 1839961] Re: Test tempest.api.compute.servers.test_novnc.NoVNCConsoleTestJSON.test_novnc failing often
*** This bug is a duplicate of bug 1669468 *** https://bugs.launchpad.net/bugs/1669468 ** This bug has been marked a duplicate of bug 1669468 tempest.api.compute.servers.test_novnc.NoVNCConsoleTestJSON.test_novnc fails intermittently in neutron multinode nv job -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1839961 Title: Test tempest.api.compute.servers.test_novnc.NoVNCConsoleTestJSON.test_novnc failing often Status in OpenStack Compute (nova): New Bug description: I see that Tempest API test tempest.api.compute.servers.test_novnc.NoVNCConsoleTestJSON.test_novnc is failing quite often on tempest-multinode-full and tempest-multinode-full-py3 jobs. Example: https://logs.opendev.org/12/672612/4/check/tempest-multinode-full-py3/72623e0/testr_results.html.gz Logstash query which I used to find other occurrences: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AssertionError%3A%20True%20is%20not%20false%20%3A%20Token%20must%20be%20invalid%20because%20the%20connection%20closed.%5C%22 I found 61 entries in last 7 days. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1839961/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1839853] [NEW] Misuse of nova.objects.base.obj_equal_prims in tests
Public bug reported: There are some tests, mostly related to BuildRequest objects, that are calling nova.objects.base.obj_equal_prims which does not assert anything, it only returns True or False - the test code itself must assert the expected result of the obj_equal_prims method. https://github.com/openstack/nova/blob/ab34c941be28f3486cd2699af8d9237e9edac351/nova/tests/functional/db/test_build_request.py https://github.com/openstack/nova/blob/d89579a66ac38fd1e30cea55306e6e7b69bab5b9/nova/tests/unit/objects/test_build_request.py ** Affects: nova Importance: Medium Status: Confirmed ** Tags: low-hanging-fruit testing ** Changed in: nova Status: New => Confirmed ** Tags added: testing ** Changed in: nova Importance: Undecided => Medium -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1839853 Title: Misuse of nova.objects.base.obj_equal_prims in tests Status in OpenStack Compute (nova): Confirmed Bug description: There are some tests, mostly related to BuildRequest objects, that are calling nova.objects.base.obj_equal_prims which does not assert anything, it only returns True or False - the test code itself must assert the expected result of the obj_equal_prims method. https://github.com/openstack/nova/blob/ab34c941be28f3486cd2699af8d9237e9edac351/nova/tests/functional/db/test_build_request.py https://github.com/openstack/nova/blob/d89579a66ac38fd1e30cea55306e6e7b69bab5b9/nova/tests/unit/objects/test_build_request.py To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1839853/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
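The misuse pattern is easy to demonstrate with a stand-in (a simplified sketch: the real helper in nova.objects.base compares the objects' primitive representations, not plain dicts):

```python
def obj_equal_prims(obj1, obj2, ignore=None):
    # Simplified stand-in for nova.objects.base.obj_equal_prims: it only
    # *returns* True/False -- it never raises or asserts anything itself.
    return obj1 == obj2

# Buggy test body: the boolean result is silently discarded, so this
# "check" passes even though the two objects differ.
obj_equal_prims({'host': 'node1'}, {'host': 'node2'})

# Correct test body: feed the result into an assertion
# (self.assertTrue(...) in a real testtools-based test case).
assert obj_equal_prims({'host': 'node1'}, {'host': 'node1'})
```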
[Yahoo-eng-team] [Bug 1839674] Re: ResourceTracker.compute_nodes won't try to create a ComputeNode a second time if the first create() fails
** Also affects: nova/ocata Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Also affects: nova/pike Importance: Undecided Status: New ** Also affects: nova/queens Importance: Undecided Status: New ** Changed in: nova/ocata Status: New => Triaged ** Changed in: nova/pike Status: New => Triaged ** Changed in: nova/queens Status: New => Triaged ** Changed in: nova/stein Status: New => Triaged ** Changed in: nova/pike Importance: Undecided => Medium ** Changed in: nova/rocky Importance: Undecided => Medium ** Changed in: nova/queens Importance: Undecided => Medium -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1839674 Title: ResourceTracker.compute_nodes won't try to create a ComputeNode a second time if the first create() fails Status in OpenStack Compute (nova): Triaged Status in OpenStack Compute (nova) ocata series: Triaged Status in OpenStack Compute (nova) pike series: Triaged Status in OpenStack Compute (nova) queens series: Triaged Status in OpenStack Compute (nova) rocky series: New Status in OpenStack Compute (nova) stein series: Triaged Bug description: I found this while writing a functional recreate test for bug 1839560. 
As of this change in Ocata: https://github.com/openstack/nova/commit/1c967593fbb0ab8b9dc8b0b509e388591d32f537 The ResourceTracker.compute_nodes dict will store the ComputeNode object *before* trying to create it: https://github.com/openstack/nova/blob/6b7d0caad86fe32ffc49a8672de1eb7258f3b919/nova/compute/resource_tracker.py#L570-L571 The problem is if ComputeNode.create() fails for whatever reason, the next run through update_available_resource won't try to create the ComputeNode again because of this: https://github.com/openstack/nova/blob/6b7d0caad86fe32ffc49a8672de1eb7258f3b919/nova/compute/resource_tracker.py#L546 And eventually you get errors like this: b'2019-08-09 17:02:59,356 ERROR [nova.compute.manager] Error updating resources for node node2.' b'Traceback (most recent call last):' b' File "/home/osboxes/git/nova/nova/compute/manager.py", line 8250, in _update_available_resource_for_node' b'startup=startup)' b' File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 715, in update_available_resource' b'self._update_available_resource(context, resources, startup=startup)' b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 328, in inner' b'return f(*args, **kwargs)' b' File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 796, in _update_available_resource' b'self._update(context, cn, startup=startup)' b' File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 1052, in _update' b'self.old_resources[nodename] = old_compute' b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__' b'self.force_reraise()' b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise' b'six.reraise(self.type_, self.value, self.tb)' b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/six.py", line 693, in 
reraise' b'raise value' b' File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 1046, in _update' b'compute_node.save()' b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper' b'return fn(self, *args, **kwargs)' b' File "/home/osboxes/git/nova/nova/objects/compute_node.py", line 352, in save' b'db_compute = db.compute_node_update(self._context, self.id, updates)' b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 67, in getter' b'self.obj_load_attr(name)' b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 603, in obj_load_attr' b'_("Cannot load \'%s\' in the base class") % attrname)' b"NotImplementedError: Cannot load 'id' in the base class" We should only map the ComputeNode when we've successfully created it. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1839674/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe :
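The fix suggested in the last sentence is an ordering change. A hedged sketch with a heavily simplified stand-in (hypothetical class and method names, not the actual ResourceTracker code):

```python
class FakeResourceTracker:
    """Caches ComputeNode records by nodename, like the RT's compute_nodes dict."""

    def __init__(self):
        self.compute_nodes = {}

    def _init_compute_node_buggy(self, nodename, create):
        cn = {'nodename': nodename}
        self.compute_nodes[nodename] = cn  # mapped *before* create()...
        create(cn)  # ...so a failure here leaves a stale entry that
                    # suppresses retries on the next periodic task run
        return cn

    def _init_compute_node_fixed(self, nodename, create):
        cn = {'nodename': nodename}
        create(cn)  # create first; only map the node on success
        self.compute_nodes[nodename] = cn
        return cn


def failing_create(cn):
    raise RuntimeError('DB create failed')

rt = FakeResourceTracker()
try:
    rt._init_compute_node_buggy('node2', failing_create)
except RuntimeError:
    pass
print('node2' in rt.compute_nodes)  # True -- stale mapping, never retried

rt = FakeResourceTracker()
try:
    rt._init_compute_node_fixed('node2', failing_create)
except RuntimeError:
    pass
print('node2' in rt.compute_nodes)  # False -- the next run retries create()
```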
[Yahoo-eng-team] [Bug 1839674] [NEW] ResourceTracker.compute_nodes won't try to create a ComputeNode a second time if the first create() fails
Public bug reported: I found this while writing a functional recreate test for bug 1839560. As of this change in Ocata: https://github.com/openstack/nova/commit/1c967593fbb0ab8b9dc8b0b509e388591d32f537 The ResourceTracker.compute_nodes dict will store the ComputeNode object *before* trying to create it: https://github.com/openstack/nova/blob/6b7d0caad86fe32ffc49a8672de1eb7258f3b919/nova/compute/resource_tracker.py#L570-L571 The problem is if ComputeNode.create() fails for whatever reason, the next run through update_available_resource won't try to create the ComputeNode again because of this: https://github.com/openstack/nova/blob/6b7d0caad86fe32ffc49a8672de1eb7258f3b919/nova/compute/resource_tracker.py#L546 And eventually you get errors like this: b'2019-08-09 17:02:59,356 ERROR [nova.compute.manager] Error updating resources for node node2.' b'Traceback (most recent call last):' b' File "/home/osboxes/git/nova/nova/compute/manager.py", line 8250, in _update_available_resource_for_node' b'startup=startup)' b' File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 715, in update_available_resource' b'self._update_available_resource(context, resources, startup=startup)' b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 328, in inner' b'return f(*args, **kwargs)' b' File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 796, in _update_available_resource' b'self._update(context, cn, startup=startup)' b' File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 1052, in _update' b'self.old_resources[nodename] = old_compute' b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__' b'self.force_reraise()' b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise' b'six.reraise(self.type_, self.value, self.tb)' b' File 
"/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/six.py", line 693, in reraise' b'raise value' b' File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 1046, in _update' b'compute_node.save()' b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper' b'return fn(self, *args, **kwargs)' b' File "/home/osboxes/git/nova/nova/objects/compute_node.py", line 352, in save' b'db_compute = db.compute_node_update(self._context, self.id, updates)' b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 67, in getter' b'self.obj_load_attr(name)' b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 603, in obj_load_attr' b'_("Cannot load \'%s\' in the base class") % attrname)' b"NotImplementedError: Cannot load 'id' in the base class" We should only map the ComputeNode when we've successfully created it. ** Affects: nova Importance: Medium Assignee: Matt Riedemann (mriedem) Status: Triaged ** Tags: resource-tracker -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1839674 Title: ResourceTracker.compute_nodes won't try to create a ComputeNode a second time if the first create() fails Status in OpenStack Compute (nova): Triaged Bug description: I found this while writing a functional recreate test for bug 1839560. 
As of this change in Ocata: https://github.com/openstack/nova/commit/1c967593fbb0ab8b9dc8b0b509e388591d32f537 The ResourceTracker.compute_nodes dict will store the ComputeNode object *before* trying to create it: https://github.com/openstack/nova/blob/6b7d0caad86fe32ffc49a8672de1eb7258f3b919/nova/compute/resource_tracker.py#L570-L571 The problem is if ComputeNode.create() fails for whatever reason, the next run through update_available_resource won't try to create the ComputeNode again because of this: https://github.com/openstack/nova/blob/6b7d0caad86fe32ffc49a8672de1eb7258f3b919/nova/compute/resource_tracker.py#L546 And eventually you get errors like this: b'2019-08-09 17:02:59,356 ERROR [nova.compute.manager] Error updating resources for node node2.' b'Traceback (most recent call last):' b' File "/home/osboxes/git/nova/nova/compute/manager.py", line 8250, in _update_available_resource_for_node' b'startup=startup)' b' File &q
[Yahoo-eng-team] [Bug 1833278] Re: nova-status upgrade check should fail if db sync has not been performed
Some related discussion in IRC today: http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack- nova.2019-08-09.log.html#t2019-08-09T17:21:09 ** Changed in: nova Status: In Progress => Opinion -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1833278 Title: nova-status upgrade check should fail if db sync has not been performed Status in OpenStack Compute (nova): Opinion Bug description: When performing an upgrade, the upgrade check is supposed to be run after the DB schema syncs and data migration. This should be something that is checked by the upgrade check command. Steps to reproduce == Tested in Queens -> Rocky upgrade. Prior to an upgrade, using new code: nova-status upgrade check Expected results Command fails, saying DB sync needs to be performed. Actual results == Command succeeds. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1833278/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1839621] Re: Inappropriate split of transport_url string
** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1839621 Title: Inappropriate split of transport_url string Status in OpenStack Compute (nova): Invalid Bug description: In /etc/nova/nova.conf line 3085, if your password for the messaging provider (such as rabbit) contains the "#" character, the string will be split incorrectly, preventing the nova services from starting. Steps to reproduce 1. In /etc/nova/nova.conf set the transport URL to transport_url=rabbit://openstack:test#passw...@controller.host.example.com 2. systemctl start openstack-nova-api.service openstack-nova-consoleauth.service openstack-nova-scheduler.service openstack-nova-conductor.service openstack-nova-novncproxy.service This will produce: Job for openstack-nova-consoleauth.service failed because the control process exited with error code. See "systemctl status openstack-nova-consoleauth.service" and "journalctl -xe" for details. Job for openstack-nova-api.service failed because the control process exited with error code. See "systemctl status openstack-nova-api.service" and "journalctl -xe" for details. Job for openstack-nova-conductor.service failed because the control process exited with error code. See "systemctl status openstack-nova-conductor.service" and "journalctl -xe" for details. Job for openstack-nova-scheduler.service failed because the control process exited with error code. See "systemctl status openstack-nova-scheduler.service" and "journalctl -xe" for details. 3. Check the journalctl -xe logs and notice: nova-conductor[31437]: ValueError: invalid literal for int() with base 10: 'test' systemd[1]: openstack-nova-conductor.service: main process exited, code=exited, status=1/FAILURE systemd[1]: Failed to start OpenStack Nova Conductor Server. 
Environment: OS: CentOS Linux release 7.6.1810 kernel: 3.10.0-957.21.3.el7.x86_64 rpm -qa | grep nova python2-novaclient-13.0.1-1.el7.noarch openstack-nova-conductor-19.0.1-1.el7.noarch openstack-nova-console-19.0.1-1.el7.noarch openstack-nova-common-19.0.1-1.el7.noarch openstack-nova-novncproxy-19.0.1-1.el7.noarch python2-nova-19.0.1-1.el7.noarch openstack-nova-api-19.0.1-1.el7.noarch openstack-nova-scheduler-19.0.1-1.el7.noarch To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1839621/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
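The reported behavior is standard URL parsing, which is why the bug was closed Invalid: '#' begins the URL fragment, so a password containing it must be percent-encoded. A quick demonstration with the stdlib parser (oslo.messaging's transport_url follows the same URL grammar):

```python
from urllib.parse import quote, unquote, urlsplit

# Unencoded '#': everything after it is parsed as the URL fragment,
# so the credentials and host are mangled ('test' ends up where the
# port belongs, hence "invalid literal for int(): 'test'").
bad = urlsplit('rabbit://openstack:test#password@controller.host.example.com')
print(bad.netloc)    # 'openstack:test'
print(bad.fragment)  # 'password@controller.host.example.com'

# Percent-encode the password ('#' becomes '%23') and parsing works:
url = ('rabbit://openstack:%s@controller.host.example.com'
       % quote('test#password', safe=''))
good = urlsplit(url)
print(good.hostname)           # 'controller.host.example.com'
print(unquote(good.password))  # 'test#password'
```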
[Yahoo-eng-team] [Bug 1669468] Re: tempest.api.compute.servers.test_novnc.NoVNCConsoleTestJSON.test_novnc fails intermittently in neutron multinode nv job
Patch here: https://review.opendev.org/#/c/675652/ ** Also affects: devstack Importance: Undecided Status: New ** No longer affects: nova ** Changed in: devstack Status: New => In Progress ** Changed in: devstack Importance: Undecided => Medium ** Changed in: devstack Assignee: (unassigned) => Matt Riedemann (mriedem) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1669468 Title: tempest.api.compute.servers.test_novnc.NoVNCConsoleTestJSON.test_novnc fails intermittently in neutron multinode nv job Status in devstack: In Progress Bug description: Example output: 2017-02-21 06:42:10.010442 | == 2017-02-21 06:42:10.010458 | Failed 1 tests - output below: 2017-02-21 06:42:10.010471 | == 2017-02-21 06:42:10.010477 | 2017-02-21 06:42:10.010507 | tempest.api.compute.servers.test_novnc.NoVNCConsoleTestJSON.test_novnc[id-c640fdff-8ab4-45a4-a5d8-7e6146cbd0dc] 2017-02-21 06:42:10.010542 | --- 2017-02-21 06:42:10.010548 | 2017-02-21 06:42:10.010558 | Captured traceback: 2017-02-21 06:42:10.010569 | ~~~ 2017-02-21 06:42:10.010583 | Traceback (most recent call last): 2017-02-21 06:42:10.010606 | File "tempest/api/compute/servers/test_novnc.py", line 152, in test_novnc 2017-02-21 06:42:10.010621 | self._validate_rfb_negotiation() 2017-02-21 06:42:10.010646 | File "tempest/api/compute/servers/test_novnc.py", line 77, in _validate_rfb_negotiation 2017-02-21 06:42:10.010665 | 'Token must be invalid because the connection ' 2017-02-21 06:42:10.010721 | File "/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/unittest2/case.py", line 696, in assertFalse 2017-02-21 06:42:10.010737 | raise self.failureException(msg) 2017-02-21 06:42:10.010762 | AssertionError: True is not false : Token must be invalid because the connection closed. 
2017-02-21 06:42:10.010768 | 2017-02-21 06:42:10.010774 | 2017-02-21 06:42:10.010785 | Captured pythonlogging: 2017-02-21 06:42:10.010796 | ~~~ 2017-02-21 06:42:10.010848 | 2017-02-21 06:07:18,545 16286 INFO [tempest.lib.common.rest_client] Request (NoVNCConsoleTestJSON:test_novnc): 200 POST https://10.27.33.58:8774/v2.1/servers/82d4d4ca-c263-4ac5-85bc-a33488af5ff5/action 0.165s 2017-02-21 06:42:10.010905 | 2017-02-21 06:07:18,545 16286 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Accept': 'application/json', 'X-Auth-Token': '', 'Content-Type': 'application/json'} 2017-02-21 06:42:10.010925 | Body: {"os-getVNCConsole": {"type": "novnc"}} 2017-02-21 06:42:10.011109 | Response - Headers: {u'content-type': 'application/json', 'content-location': 'https://10.27.33.58:8774/v2.1/servers/82d4d4ca-c263-4ac5-85bc-a33488af5ff5/action', u'date': 'Tue, 21 Feb 2017 06:07:18 GMT', u'x-openstack-nova-api-version': '2.1', 'status': '200', u'content-length': '121', u'server': 'Apache/2.4.18 (Ubuntu)', u'connection': 'close', u'openstack-api-version': 'compute 2.1', u'vary': 'OpenStack-API-Version,X-OpenStack-Nova-API-Version', u'x-compute-request-id': 'req-d9681919-5b5e-4477-b38d-2734b660a099'} 2017-02-21 06:42:10.011153 | Body: {"console": {"url": "http://10.27.33.58:6080/vnc_auto.html?token=f8a52df3-8e0d-4d64-8877-07f607f84b74;, "type": "novnc"}} 2017-02-21 06:42:10.011161 | 2017-02-21 06:42:10.011167 | 2017-02-21 06:42:10.011172 | Full logs at: http://logs.openstack.org/38/431038/3/check/gate-tempest-dsvm-neutron-multinode-full-ubuntu-xenial-nv/5e1d485/console.html#_2017-02-21_06_07_18_740230 This started at 2017-02-21 The very first change which failed here was https://review.openstack.org/#/c/431038/ but is not related to the error. 
To manage notifications about this bug go to: https://bugs.launchpad.net/devstack/+bug/1669468/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1839524] Re: resize on same host
Just because you configure the API to allow resizing to the same host doesn't mean the scheduler is going to pick the same host, e.g. if the host the instance is on is already full, or does not have spare capacity for the new flavor you're resizing *to*, then the scheduler will pick another host. Or if the scheduler weights are configured such that the scheduler picks another host, etc. ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1839524 Title: resize on same host Status in OpenStack Compute (nova): Invalid Bug description: Resizing an instance on the same host does not work when there is more than one OpenStack compute node. allow_resize_to_same_host=True only works on an all-in-one OpenStack deployment; after adding other compute nodes it no longer works. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1839524/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1839560] Re: ironic: moving node to maintenance makes it unusable afterwards
There are some ideas about hard-deleting the compute node records when they are (soft) deleted, but only for ironic nodes; that gets messy though (the delete path is called from lots of places, like when a nova-compute service record is deleted), so it's probably easiest to just revert this: https://review.opendev.org/#/c/571535/ Note you'd also have to revert this to avoid conflicts: https://review.opendev.org/#/c/611162/ ** Also affects: nova/rocky Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Changed in: nova/rocky Status: New => Confirmed ** Changed in: nova/stein Status: New => Confirmed ** Changed in: nova/rocky Importance: Undecided => High ** Changed in: nova/stein Importance: Undecided => High -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1839560 Title: ironic: moving node to maintenance makes it unusable afterwards Status in OpenStack Compute (nova): Triaged Status in OpenStack Compute (nova) rocky series: Confirmed Status in OpenStack Compute (nova) stein series: Confirmed Bug description: If you use the Ironic API to set a node into maintenance (for whatever reason), it will no longer be included in the list of nodes available to Nova. When Nova refreshes its resources periodically, it will find that the node is no longer in the list of available nodes and will delete it from the database. Once you enable the node again and Nova attempts to create the ComputeNode again, it fails due to the duplicate UUID in the database, because the old record is soft deleted and had the same UUID. 
ref: https://github.com/openstack/nova/commit/9f28727eb75e05e07bad51b6eecce667d09dfb65 - this made computenode.uuid match the baremetal uuid https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L8304-L8316 - this soft-deletes the computenode record when it doesn't see it in the list of active nodes traces: 2019-08-08 17:20:13.921 6379 INFO nova.compute.manager [req-c71e5c81-eb34-4f72-a260-6aa7e802f490 - - - - -] Deleting orphan compute node 31 hypervisor host is 77788ad5-f1a4-46ac-8132-2d88dbd4e594, nodes are set([u'6d556617-2bdc-42b3-a3fe-b9218a1ebf0e', u'a634fab2-ecea-4cfa-be09-032dce6eaf51', u'2dee290d-ef73-46bc-8fc2-af248841ca12']) ... 2019-08-08 22:21:25.284 82770 WARNING nova.compute.resource_tracker [req-a58eb5e2-9be0-4503-bf68-dff32ff87a3a - - - - -] No compute node record for ctl1-:77788ad5-f1a4-46ac-8132-2d88dbd4e594: ComputeHostNotFound_Remote: Compute host ctl1- could not be found. Remote error: DBDuplicateEntry (pymysql.err.IntegrityError) (1062, u"Duplicate entry '77788ad5-f1a4-46ac-8132-2d88dbd4e594' for key 'compute_nodes_uuid_idx'") To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1839560/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
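The DBDuplicateEntry in the trace follows from the schema: compute_nodes rows are only soft-deleted, but compute_nodes_uuid_idx is a unique index on uuid alone, with no regard for the deleted flag. A minimal illustration using SQLite and a hypothetical, heavily simplified version of the table:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE compute_nodes '
             '(id INTEGER PRIMARY KEY, uuid TEXT, deleted INTEGER DEFAULT 0)')
conn.execute('CREATE UNIQUE INDEX compute_nodes_uuid_idx '
             'ON compute_nodes (uuid)')

uuid = '77788ad5-f1a4-46ac-8132-2d88dbd4e594'
conn.execute('INSERT INTO compute_nodes (uuid) VALUES (?)', (uuid,))
# The "orphan" cleanup only soft-deletes the row, so its uuid remains:
conn.execute('UPDATE compute_nodes SET deleted = id WHERE uuid = ?', (uuid,))

# Re-enabling the ironic node makes nova try to create the record again,
# and the uuid collides with the soft-deleted row:
try:
    conn.execute('INSERT INTO compute_nodes (uuid) VALUES (?)', (uuid,))
except sqlite3.IntegrityError as exc:
    print(exc)  # UNIQUE constraint failed on compute_nodes.uuid
```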
[Yahoo-eng-team] [Bug 1839515] [NEW] Weird functional test failures hitting neutron API in unrelated resize flows since 8/5
Public bug reported: Noticed here: https://logs.opendev.org/32/634832/43/check/nova-tox-functional- py36/d4f3be5/testr_results.html.gz With this test: nova.tests.functional.notification_sample_tests.test_service.TestServiceUpdateNotificationSampleLatest.test_service_disabled That's a simple test which disables a service and then asserts there is a service.update notification, but there is another notification happening as well: Traceback (most recent call last): File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/test_service.py", line 122, in test_service_disabled 'uuid': self.service_uuid}) File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/test_service.py", line 37, in _verify_notification base._verify_notification(sample_file_name, replacements, actual) File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/notification_sample_base.py", line 148, in _verify_notification self.assertEqual(1, len(fake_notifier.VERSIONED_NOTIFICATIONS)) File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/testtools/testcase.py", line 411, in assertEqual self.assertThat(observed, matcher, message) File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/testtools/testcase.py", line 498, in assertThat raise mismatch_error testtools.matchers._impl.MismatchError: 1 != 2 And in the error output, we can see this weird traceback of a resize revert failure b/c the NeutronFixture isn't being used: 2019-08-07 23:22:23,621 ERROR [nova.network.neutronv2.api] The [neutron] section of your nova configuration file must be configured for authentication with the networking service endpoint. 
See the networking service install guide for details: https://docs.openstack.org/neutron/latest/install/ 2019-08-07 23:22:23,634 ERROR [nova.compute.manager] Setting instance vm_state to ERROR Traceback (most recent call last): File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 8656, in _error_out_instance_on_exception yield File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 4830, in _resize_instance migration_p) File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 2697, in migrate_instance_start client = _get_ksa_client(context, admin=True) File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 215, in _get_ksa_client auth_plugin = _get_auth_plugin(context, admin=admin) File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 151, in _get_auth_plugin _ADMIN_AUTH = _load_auth_plugin(CONF) File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 82, in _load_auth_plugin raise neutron_client_exc.Unauthorized(message=err_msg) neutronclient.common.exceptions.Unauthorized: Unknown auth type: None According to logstash this started showing up around 8/5: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22ERROR%20%5Bnova.network.neutronv2.api%5D%20The%20%5Bneutron%5D%20section%20of%20your%20nova%20configuration%20file%20must%20be%20configured%20for%20authentication%20with%20the%20networking%20service%20endpoint.%5C%22%20AND%20tags%3A%5C%22console%5C%22=7d Which makes me think this change, which is restarting a compute service and sleeping in a stub: https://review.opendev.org/#/c/670393/ Might be screwing up concurrently running tests. 
Looking at when that test runs and the one that fails: 2019-08-07 23:21:54.157918 | ubuntu-bionic | {4} nova.tests.functional.compute.test_init_host.ComputeManagerInitHostTestCase.test_migrate_disk_and_power_off_crash_finish_revert_migration [4.063814s] ... ok 2019-08-07 23:25:00.073443 | ubuntu-bionic | {4} nova.tests.functional.notification_sample_tests.test_service.TestServiceUpdateNotificationSampleLatest.test_service_disabled [160.155643s] ... FAILED We can see they are on the same worker process and run at about the same time. Furthermore, we can see that TestServiceUpdateNotificationSampleLatest.test_service_disabled eventually times out after 160 seconds and this is in the error output: 2019-08-07 23:24:59,911 ERROR [nova.compute.api] An error occurred while updating the COMPUTE_STATUS_DISABLED trait on compute node resource providers managed by host host1. The trait will be synchronized automatically by the compute service when the update_available_resource periodic task runs. Traceback (most recent call last): File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/api.py", line 5034, in _update_compute_provider_status self.rpcapi.set_host_enabled(context, service.host, enabled) File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/rpcapi.py", line 996, in set_host_enabled
[Yahoo-eng-team] [Bug 1735009] Re: Cannot rebuild baremetal instance when vm_state is ERROR
** Also affects: nova/ocata Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New ** Also affects: nova/queens Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Also affects: nova/pike Importance: Undecided Status: New ** Tags added: rebuild -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1735009 Title: Cannot rebuild baremetal instance when vm_state is ERROR Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) ocata series: New Status in OpenStack Compute (nova) pike series: New Status in OpenStack Compute (nova) queens series: New Status in OpenStack Compute (nova) rocky series: New Status in OpenStack Compute (nova) stein series: New Bug description: Rebuilding an instance in ERROR state has been possible since Havana: http://git.openstack.org/cgit/openstack/nova/commit/?id=99c51e34230394cadf0b82e364ea10c38e193979 This change broke that feature for Ironic in Liberty: http://git.openstack.org/cgit/openstack/nova/commit/?id=ea3967a1fb47297608defd680286fd9051ff5bbe The change adds a check for vm_state=ERROR when waiting for a baremetal instance to become ACTIVE, but the vm_state is restored to ACTIVE only after a successful build. This means rebuilding a baremetal instance using the Ironic driver is impossible, because wait_for_active fails if vm_state=ERROR is found. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1735009/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
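An illustrative reduction of the check described above (function and state names are invented for the sketch, not nova's exact code): the wait loop treats vm_state ERROR as a failed deploy, but a rebuilding instance still carries its old ERROR vm_state until the build succeeds, so the loop aborts instead of letting the rebuild finish.

```python
def wait_check(vm_state, provision_state):
    """Sketch of a wait_for_active-style poll step.

    Raises on vm_state == 'error' -- which is exactly what makes
    rebuild-from-ERROR impossible, since the instance keeps the old
    ERROR vm_state until the rebuild completes.
    """
    if vm_state == 'error':
        raise RuntimeError('Instance failed to provision')
    return provision_state == 'active'
```

A fix would distinguish "instance newly entered ERROR" from "instance is being rebuilt out of ERROR" (e.g. by also considering the task_state) rather than failing on vm_state alone.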
[Yahoo-eng-team] [Bug 1788527] Re: Redundant instance group lookup during scheduling of move operations
** Also affects: nova/rocky Importance: Undecided Status: New ** Changed in: nova/rocky Status: New => In Progress ** Changed in: nova/rocky Assignee: (unassigned) => Balazs Gibizer (balazs-gibizer) ** Changed in: nova/rocky Importance: Undecided => Low -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1788527 Title: Redundant instance group lookup during scheduling of move operations Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) rocky series: In Progress Status in OpenStack Compute (nova) stein series: Fix Committed Bug description: This change: https://github.com/openstack/nova/commit/459ca56de2366aea53efc9ad3295fdf4ddcd452c Added code to the setup_instance_group flow to get the instance group fresh so we had the latest hosts for members of the group. Then change: https://github.com/openstack/nova/commit/94fd36f0582c5dbcf2b9886da7c7bf986d3ad5d1 #diff-cbbdc4d7c140314a7e0b2d97ebcd1f9c Was added to not persist group hosts/members in the RequestSpec since they could be stale after the initial server create. This means when we move a server (evacuate, resize, unshelve, live migrate), we get the request spec with the group plus the current hosts/members of the group. So if the request spec has the group hosts set by the time it gets to setup_instance_group, the call in _get_group_details to get the group fresh is redundant. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1788527/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
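The redundancy above can be sketched as a short-circuit: if the RequestSpec already carries the group's current hosts (as it does for move operations after the second commit), re-fetching the group is wasted work. Names and dict shapes here are illustrative, not nova's exact signatures:

```python
def group_hosts(request_spec, fetch_group):
    """Return the group's hosts, fetching fresh only when needed."""
    group = request_spec.get('instance_group')
    if group is None:
        return []
    if group.get('hosts') is not None:
        # move operation: hosts were just refreshed into the spec,
        # so skip the redundant lookup _get_group_details would do
        return group['hosts']
    # initial create: hosts are not persisted, fetch the group fresh
    return fetch_group(group['uuid'])['hosts']
```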
[Yahoo-eng-team] [Bug 1839391] Re: archive_deleted_rows docs and user-facing messages say CONF.api_database.connection
** Also affects: nova/stein Importance: Undecided Status: New ** Also affects: nova/queens Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New ** Changed in: nova/queens Status: New => Confirmed ** Changed in: nova/stein Status: New => Confirmed ** Changed in: nova/rocky Status: New => Confirmed ** Changed in: nova/rocky Importance: Undecided => Critical ** Changed in: nova/rocky Importance: Critical => Low ** Changed in: nova/stein Importance: Undecided => Low ** Changed in: nova/queens Importance: Undecided => Low -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1839391 Title: archive_deleted_rows docs and user-facing messages say CONF.api_database.connection Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) queens series: Confirmed Status in OpenStack Compute (nova) rocky series: Confirmed Status in OpenStack Compute (nova) stein series: Confirmed Bug description: The docs here: https://docs.openstack.org/nova/latest/cli/nova-manage.html and error message here: https://github.com/openstack/nova/blob/af40e3d1a67c8542683368fd6927ac9c0363a3b8/nova/cmd/manage.py#L526 Those are talking about a variable in code and should be saying something like [api_database]/connection instead. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1839391/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1839391] [NEW] archive_deleted_rows docs and user-facing messages say CONF.api_database.connection
Public bug reported: The docs here: https://docs.openstack.org/nova/latest/cli/nova-manage.html and error message here: https://github.com/openstack/nova/blob/af40e3d1a67c8542683368fd6927ac9c0363a3b8/nova/cmd/manage.py#L526 Those are talking about a variable in code and should be saying something like [api_database]/connection instead. ** Affects: nova Importance: Low Assignee: Matt Riedemann (mriedem) Status: In Progress ** Tags: docs nova-manage -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1839391 Title: archive_deleted_rows docs and user-facing messages say CONF.api_database.connection Status in OpenStack Compute (nova): In Progress Bug description: The docs here: https://docs.openstack.org/nova/latest/cli/nova-manage.html and error message here: https://github.com/openstack/nova/blob/af40e3d1a67c8542683368fd6927ac9c0363a3b8/nova/cmd/manage.py#L526 Those are talking about a variable in code and should be saying something like [api_database]/connection instead. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1839391/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1839360] Re: nova-compute fails with DBNotAllowed error
https://review.opendev.org/#/q/Icddbe4760eaff30e4e13c1e8d3d5d3f489dac3c4 goes back to stable/rocky so this should go back that far as well. ** Changed in: nova Importance: Undecided => Medium ** Also affects: nova/rocky Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Changed in: nova/rocky Status: New => Confirmed ** Changed in: nova/rocky Importance: Undecided => Medium ** Changed in: nova/stein Importance: Undecided => Medium ** Changed in: nova/stein Status: New => Confirmed ** Tags added: docs serviceability -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1839360 Title: nova-compute fails with DBNotAllowed error Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) rocky series: Confirmed Status in OpenStack Compute (nova) stein series: Confirmed Bug description: Description === During routine operations, or while running regular tempest checks, nova-compute tries to reach the database and fails with a DBNotAllowed error: https://logs.opendev.org/33/660333/10/check/openstack-ansible-deploy-aio_metal-ubuntu-bionic/97d8bc3/logs/host/nova-compute.service.journal-23-20-40.log.txt.gz#_Aug_06_22_51_25 Steps to reproduce == This can be reproduced by deploying all nova components (api, scheduler, conductor, compute) on the same host (an OSA all-in-one deployment). In such a setup a single configuration file (nova.conf) is used. As a solution it's possible to log more helpful information about why this happens and add some description to the docs. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1839360/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
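One possible layout for the all-in-one case, sketched as an assumption rather than the bug's eventual fix: nova's deployment guidance is that compute hosts should not carry database credentials, so a layered compute-only config can blank them out. The section and option names ([database], [api_database], connection) are nova's real ones; the file path and layering scheme are illustrative.

```ini
# /etc/nova/nova-compute.conf -- hypothetical overlay loaded after nova.conf
# for the nova-compute service only. Compute reaches the database solely via
# conductor RPC, so leaving it without DB credentials makes any accidental
# direct access fail fast instead of silently working in an all-in-one setup.
[database]
connection =

[api_database]
connection =
```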
[Yahoo-eng-team] [Bug 1838811] Re: /opt/stack/devstack/tools/outfilter.py failing in neutron functional jobs since 8/2
** No longer affects: devstack ** Changed in: neutron Importance: Undecided => Critical -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1838811 Title: /opt/stack/devstack/tools/outfilter.py failing in neutron functional jobs since 8/2 Status in neutron: Fix Released Bug description: Seen here: https://logs.opendev.org/86/673486/4/gate/neutron-functional- python27/c3fe4df/ara-report/result/28d8d223-313a-49ba-b8aa- 8af15fdda973/ ++ ./stack.sh:main:500 : /opt/stack/devstack/tools/outfilter.py -v --no-timestamp -o /opt/stack/logs/devstacklog.txt.2019-08-02-160322 Traceback (most recent call last): File "/opt/stack/devstack/tools/outfilter.py", line 104, in <module> sys.exit(main()) File "/opt/stack/devstack/tools/outfilter.py", line 61, in main outfile = open(opts.outfile, 'ab', 0) IOError: [Errno 13] Permission denied: '/opt/stack/logs/devstacklog.txt.2019-08-02-160322' Looks like it's a result of: https://review.opendev.org/#/c/203698/ Based on logstash data of that failure: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22IOError%3A%20%5BErrno%2013%5D%20Permission%20denied%3A%20'%2Fopt%2Fstack%2Flogs%2Fdevstacklog.txt%5C%22%20AND%20tags%3A%5C%22console%5C%22=7d To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1838811/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1838819] [NEW] Docs needed for tunables at large scale
Public bug reported: Various things come up in IRC every once in a while about configuration options that need to be tweaked at large scale (blizzard, cern, etc) which once you hit hundreds or thousands of compute nodes need to be changed to avoid killing the control plane. One such option is this: https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.heal_instance_info_cache_interval From a blizzard operator: (3:04:18 PM) eandersson: mriedem, we had to set heal_instance_info_cache high because it was killing our control plane (3:05:41 PM) eandersson: It was getting real heavy on large sites with 1k nodes (3:06:26 PM) eandersson: We also ended up adding a variance Similarly, CERN had to totally disable this one: https://docs.openstack.org/nova/latest/configuration/config.html#compute.resource_provider_association_refresh And rely on SIGHUP / restart of the service if they needed to refresh that cache. We should put these things in the admin docs as we come across them so we don't forget about this stuff when new operators/users come along and hit scaling issues. ** Affects: nova Importance: Undecided Status: New ** Tags: docs performance -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1838819 Title: Docs needed for tunables at large scale Status in OpenStack Compute (nova): New Bug description: Various things come up in IRC every once in a while about configuration options that need to be tweaked at large scale (blizzard, cern, etc) which once you hit hundreds or thousands of compute nodes need to be changed to avoid killing the control plane. 
One such option is this: https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.heal_instance_info_cache_interval From a blizzard operator: (3:04:18 PM) eandersson: mriedem, we had to set heal_instance_info_cache high because it was killing our control plane (3:05:41 PM) eandersson: It was getting real heavy on large sites with 1k nodes (3:06:26 PM) eandersson: We also ended up adding a variance Similarly, CERN had to totally disable this one: https://docs.openstack.org/nova/latest/configuration/config.html#compute.resource_provider_association_refresh And rely on SIGHUP / restart of the service if they needed to refresh that cache. We should put these things in the admin docs as we come across them so we don't forget about this stuff when new operators/users come along and hit scaling issues. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1838819/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
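For concreteness, the two tunables discussed above could look like this in nova.conf. Both option names and sections are real; the values are illustrative examples, not recommendations, and whether 0 fully disables the provider-association refresh depends on the nova release (Stein added min=0 for that option):

```ini
[DEFAULT]
# default 60 (seconds); operators at ~1k nodes report raising this
# substantially so the info-cache heal doesn't hammer the control plane
heal_instance_info_cache_interval = 600

[compute]
# default 300 (seconds); CERN-style setups disable the periodic refresh
# (0 in releases that allow it) and rely on SIGHUP/restart of nova-compute
# to refresh the resource provider association cache
resource_provider_association_refresh = 0
```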
[Yahoo-eng-team] [Bug 1838817] [NEW] neutron: refreshing vif model for a server gets the same network multiple times
Public bug reported: As of this change in Rocky: https://review.opendev.org/#/c/585339/ When refreshing the vif model for an instance, e.g. when we get a network-changed event with a specific port ID: https://logs.opendev.org/26/674326/2/experimental/nova-osprofiler- redis/899a204/controller/logs/screen-n-cpu.txt.gz#_Aug_02_18_35_50_884613 Aug 02 18:35:50.884613 ubuntu-bionic-vexxhost-sjc1-0009723918 nova- compute[20428]: DEBUG nova.network.neutronv2.api [req-1a4c2dbf-df86-4044 -a59f-f751a53c5ea6 req-b0e1e2f7-d126-4e55-909d-4803816ca80f service nova] [instance: 5bbe0419-fbeb-4667-8c56-785fdc1d0a62] Refreshing network info cache for port 252040d6-4469-46ec-88c3-85e599a43104 {{(pid=20428) _get_instance_nw_info /opt/stack/nova/nova/network/neutronv2/api.py:1756}} We get the network for the port multiple times, first here: https://github.com/openstack/nova/blob/600ecf3d9a5116d040cd18023ff270b91b06247d/nova/network/neutronv2/api.py#L2966 https://github.com/openstack/nova/blob/600ecf3d9a5116d040cd18023ff270b91b06247d/nova/network/neutronv2/api.py#L392 And then we pass that list of 1 network dict to _build_vif_model here: https://github.com/openstack/nova/blob/600ecf3d9a5116d040cd18023ff270b91b06247d/nova/network/neutronv2/api.py#L2850 and pass it to _nw_info_build_network here: https://github.com/openstack/nova/blob/600ecf3d9a5116d040cd18023ff270b91b06247d/nova/network/neutronv2/api.py#L2883 Which then calls _get_physnet_tunneled_info which gets the network again here: https://github.com/openstack/nova/blob/600ecf3d9a5116d040cd18023ff270b91b06247d/nova/network/neutronv2/api.py#L1904 and/or here: https://github.com/openstack/nova/blob/600ecf3d9a5116d040cd18023ff270b91b06247d/nova/network/neutronv2/api.py#L1928 Furthermore, when we're doing forced _heal_instance_info_cache (stein+) we'll refresh the vif model for all ports that are currently attached to the server: 
https://github.com/openstack/nova/blob/600ecf3d9a5116d040cd18023ff270b91b06247d/nova/network/neutronv2/api.py#L3015 And rebuild the vif model per port here: https://github.com/openstack/nova/blob/600ecf3d9a5116d040cd18023ff270b91b06247d/nova/network/neutronv2/api.py#L3027 If there is more than one port on the same network attached to the server, we'll be calling show_network for each port even though we're getting the same data when those ports are on the same network. I noticed this while checking some osprofiler results and noticed the network-changed event on the port-targeted refresh took a relatively long time: https://logs.opendev.org/26/674326/2/experimental/nova-osprofiler- redis/899a204/osprofiler-traces/trace-fc50ca23-a6c2-474a- ac07-e61e706eb27d.html.gz ** Affects: nova Importance: Medium Assignee: Matt Riedemann (mriedem) Status: Triaged ** Affects: nova/rocky Importance: Medium Status: Confirmed ** Affects: nova/stein Importance: Medium Status: Confirmed ** Tags: neutron performance ** Changed in: nova Status: New => Triaged ** Changed in: nova Importance: Undecided => Medium ** Also affects: nova/rocky Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Changed in: nova/rocky Status: New => Confirmed ** Changed in: nova/stein Importance: Undecided => Medium ** Changed in: nova/stein Status: New => Confirmed ** Changed in: nova/rocky Importance: Undecided => Medium ** Changed in: nova Assignee: (unassigned) => Matt Riedemann (mriedem) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). 
https://bugs.launchpad.net/bugs/1838817 Title: neutron: refreshing vif model for a server gets the same network multiple times Status in OpenStack Compute (nova): Triaged Status in OpenStack Compute (nova) rocky series: Confirmed Status in OpenStack Compute (nova) stein series: Confirmed Bug description: As of this change in Rocky: https://review.opendev.org/#/c/585339/ When refreshing the vif model for an instance, e.g. when we get a network-changed event with a specific port ID: https://logs.opendev.org/26/674326/2/experimental/nova-osprofiler- redis/899a204/controller/logs/screen-n-cpu.txt.gz#_Aug_02_18_35_50_884613 Aug 02 18:35:50.884613 ubuntu-bionic-vexxhost-sjc1-0009723918 nova- compute[20428]: DEBUG nova.network.neutronv2.api [req-1a4c2dbf- df86-4044-a59f-f751a53c5ea6 req-b0e1e2f7-d126-4e55-909d-4803816ca80f service nova] [instance: 5bbe0419-fbeb-4667-8c56-785fdc1d0a62] Refreshing network info cache for port 252040d6-4469-46ec- 88c3-85e599a43104 {{(pid=20428) _get_instance_nw_info /opt/stack/nova/nova/network/neutronv2/api.py:1756}} We get the network for the port multiple times, first here: https://github.com/openstack/nova/blob/600ecf3d9a5116d040cd18023ff270b91b06247d/nova/network/neutronv2/api.py#L2966
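The de-duplication this bug points at can be sketched as a per-refresh cache (names and dict shapes are illustrative, not nova's signatures): cache show_network results by network id while rebuilding vif models for a server's ports, so several ports on one network cost a single network lookup instead of one per port.

```python
def build_vifs(ports, show_network):
    """Rebuild vif models for ports, fetching each network only once."""
    net_cache = {}
    vifs = []
    for port in ports:
        net_id = port['network_id']
        if net_id not in net_cache:
            # only the first port on a given network pays for the lookup
            net_cache[net_id] = show_network(net_id)
        vifs.append({'port_id': port['id'], 'network': net_cache[net_id]})
    return vifs
```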
[Yahoo-eng-team] [Bug 1838811] Re: /opt/stack/devstack/tools/outfilter.py failing in neutron functional jobs since 8/2
** Also affects: neutron Importance: Undecided Status: New ** Changed in: neutron Status: New => Confirmed ** Changed in: devstack Status: New => Confirmed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1838811 Title: /opt/stack/devstack/tools/outfilter.py failing in neutron functional jobs since 8/2 Status in devstack: Confirmed Status in neutron: Confirmed Bug description: Seen here: https://logs.opendev.org/86/673486/4/gate/neutron-functional- python27/c3fe4df/ara-report/result/28d8d223-313a-49ba-b8aa- 8af15fdda973/ ++ ./stack.sh:main:500 : /opt/stack/devstack/tools/outfilter.py -v --no-timestamp -o /opt/stack/logs/devstacklog.txt.2019-08-02-160322 Traceback (most recent call last): File "/opt/stack/devstack/tools/outfilter.py", line 104, in <module> sys.exit(main()) File "/opt/stack/devstack/tools/outfilter.py", line 61, in main outfile = open(opts.outfile, 'ab', 0) IOError: [Errno 13] Permission denied: '/opt/stack/logs/devstacklog.txt.2019-08-02-160322' Looks like it's a result of: https://review.opendev.org/#/c/203698/ Based on logstash data of that failure: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22IOError%3A%20%5BErrno%2013%5D%20Permission%20denied%3A%20'%2Fopt%2Fstack%2Flogs%2Fdevstacklog.txt%5C%22%20AND%20tags%3A%5C%22console%5C%22=7d To manage notifications about this bug go to: https://bugs.launchpad.net/devstack/+bug/1838811/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1838807] [NEW] UnsupportedServiceVersion: Service placement has no discoverable version. The resulting Proxy object will only have direct passthrough REST capabilities.
Public bug reported: I'm seeing this all over the nova tox functional job console logs since the placement client code in nova was changed to use the openstacksdk: https://logs.opendev.org/61/673961/1/gate/nova-tox-functional- py36/a4cb2af/job-output.txt.gz#_2019-08-01_17_51_24_070487 2019-08-01 17:51:24.070487 | ubuntu-bionic | b'/home/zuul/src/opendev.org/openstack/nova/.tox/functional- py36/lib/python3.6/site-packages/openstack/service_description.py:224: UnsupportedServiceVersion: Service placement has no discoverable version. The resulting Proxy object will only have direct passthrough REST capabilities.' I don't know if this is a nova problem, or an sdk problem, or a placement problem, but it's chewing up the functional job logs so if it's external to nova we should add a warnings filter in our tests to only log this once. ** Affects: nova Importance: Medium Status: Confirmed ** Tags: placement testing -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1838807 Title: UnsupportedServiceVersion: Service placement has no discoverable version. The resulting Proxy object will only have direct passthrough REST capabilities. Status in OpenStack Compute (nova): Confirmed Bug description: I'm seeing this all over the nova tox functional job console logs since the placement client code in nova was changed to use the openstacksdk: https://logs.opendev.org/61/673961/1/gate/nova-tox-functional- py36/a4cb2af/job-output.txt.gz#_2019-08-01_17_51_24_070487 2019-08-01 17:51:24.070487 | ubuntu-bionic | b'/home/zuul/src/opendev.org/openstack/nova/.tox/functional- py36/lib/python3.6/site-packages/openstack/service_description.py:224: UnsupportedServiceVersion: Service placement has no discoverable version. The resulting Proxy object will only have direct passthrough REST capabilities.' 
I don't know if this is a nova problem, or an sdk problem, or a placement problem, but it's chewing up the functional job logs so if it's external to nova we should add a warnings filter in our tests to only log this once. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1838807/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
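The "log this once" idea from the bug maps directly onto Python's warnings filters. A minimal sketch using a stand-in warning class (the real one lives in openstacksdk; the names below are invented for the demo): a "once" filter keyed on the category suppresses every repeat of the same warning.

```python
import warnings

class UnsupportedServiceVersion(Warning):
    """Stand-in for the SDK's warning class (illustrative)."""

def discover():
    # stands in for the SDK code path that warns on every Proxy construction
    warnings.warn("Service placement has no discoverable version.",
                  UnsupportedServiceVersion)

# the filter a test base class could install: first occurrence is shown,
# the other 99 are swallowed instead of chewing up the job logs
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("once", UnsupportedServiceVersion)
    for _ in range(100):
        discover()
print(len(caught))
```

A message-regex filter (warnings.filterwarnings("once", message="Service placement has no discoverable version.*")) would work too if the category isn't importable from the tests.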
[Yahoo-eng-team] [Bug 1838541] Re: Spurious warnings in compute logs while building/unshelving an instance: Instance cf1dc8a6-48fe-42fd-90a7-d352c58e1454 is not being actively managed by this compute
Technically this goes back to Pike but I'm not sure we care about fixing it there at this point since Pike is in Extended Maintenance mode upstream. Someone can backport it to stable/pike if they care to. ** Also affects: nova/stein Importance: Undecided Status: New ** Also affects: nova/queens Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1838541 Title: Spurious warnings in compute logs while building/unshelving an instance: Instance cf1dc8a6-48fe-42fd-90a7-d352c58e1454 is not being actively managed by this compute host but has allocations referencing this compute host: {u'resources': {u'VCPU': 1, u'MEMORY_MB': 64}}. Skipping heal of allocation because we do not know what to do. Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) queens series: Confirmed Status in OpenStack Compute (nova) rocky series: Confirmed Status in OpenStack Compute (nova) stein series: Confirmed Bug description: This warning log from the ResourceTracker is logged quite a bit in CI runs: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22is%20not%20being%20actively%20managed%20by%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22=7d 2601 hits in 7 days. Looking at one of these the warning shows up while spawning the instance during an unshelve operation. 
This is a possible race for the rt.instance_claim call because the instance.host/node are set here: https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L208 before the instance would be added to the rt.tracked_instances dict started here: https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L217 If the update_available_resource periodic task runs between those times, we'll call _remove_deleted_instances_allocations with the instance and it will have allocations on the node, created by the scheduler, but may not be in tracked_instances yet so we don't short- circuit here: https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1339 And hit the log condition here: https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1397 We should probably downgrade that warning to DEBUG if the instance task_state is set since clearly the instance is undergoing some state transition. We should log the task_state and only log the message as a warning if the instance does not have a task_state set but is also not tracked on the host. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1838541/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
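The proposed downgrade boils down to choosing the log level from the task_state. A tiny sketch (the helper name is invented; nova's real code logs inline in _remove_deleted_instances_allocations): DEBUG while a task_state shows a transition in flight, WARNING only when the instance is idle yet still untracked on this host.

```python
import logging

def untracked_allocation_log_level(task_state):
    """Pick the level for the 'not actively managed' allocation message."""
    # a set task_state (spawning, unshelving, ...) means a state transition
    # is racing the periodic task -- expected noise, log at DEBUG
    return logging.DEBUG if task_state is not None else logging.WARNING
```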
[Yahoo-eng-team] [Bug 1838541] [NEW] Spurious warnings in compute logs while building/unshelving an instance: Instance cf1dc8a6-48fe-42fd-90a7-d352c58e1454 is not being actively managed by this comput
Public bug reported: This warning log from the ResourceTracker is logged quite a bit in CI runs: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22is%20not%20being%20actively%20managed%20by%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22=7d 2601 hits in 7 days. Looking at one of these the warning shows up while spawning the instance during an unshelve operation. This is a possible race for the rt.instance_claim call because the instance.host/node are set here: https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L208 before the instance would be added to the rt.tracked_instances dict started here: https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L217 If the update_available_resource periodic task runs between those times, we'll call _remove_deleted_instances_allocations with the instance and it will have allocations on the node, created by the scheduler, but may not be in tracked_instances yet so we don't short-circuit here: https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1339 And hit the log condition here: https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1397 We should probably downgrade that warning to DEBUG if the instance task_state is set since clearly the instance is undergoing some state transition. We should log the task_state and only log the message as a warning if the instance does not have a task_state set but is also not tracked on the host. ** Affects: nova Importance: Medium Status: Triaged ** Tags: resource-tracker serviceability -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). 
https://bugs.launchpad.net/bugs/1838541

Title: Spurious warnings in compute logs while building/unshelving an instance: Instance cf1dc8a6-48fe-42fd-90a7-d352c58e1454 is not being actively managed by this compute host but has allocations referencing this compute host: {u'resources': {u'VCPU': 1, u'MEMORY_MB': 64}}. Skipping heal of allocation because we do not know what to do.

Status in OpenStack Compute (nova): Triaged

Bug description:

This warning log from the ResourceTracker is logged quite a bit in CI runs:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22is%20not%20being%20actively%20managed%20by%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22=7d

2601 hits in 7 days. Looking at one of these, the warning shows up while spawning the instance during an unshelve operation.

This is a possible race for the rt.instance_claim call, because the instance.host/node are set here:

https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L208

before the instance is added to the rt.tracked_instances dict here:

https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L217

If the update_available_resource periodic task runs between those points, we'll call _remove_deleted_instances_allocations with the instance. It will have allocations on the node, created by the scheduler, but may not yet be in tracked_instances, so we don't short-circuit here:

https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1339

and we hit the log condition here:

https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1397

We should probably downgrade that warning to DEBUG if the instance task_state is set, since clearly the instance is undergoing some state transition.
We should log the task_state and only log the message as a warning if the instance does not have a task_state set but is also not tracked on the host.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1838541/+subscriptions

-- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
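The proposal in the report can be sketched as follows. `heal_allocation_log_level` is a hypothetical helper, not real nova code; it only illustrates the suggested decision logic of warning solely when an untracked instance has no task_state:

```python
import logging

def heal_allocation_log_level(is_tracked, task_state):
    """Hypothetical helper sketching the proposal: downgrade the
    'not actively managed' message to DEBUG while the instance has a
    task_state (i.e. is mid-transition), and only WARN when the
    instance has no task_state but is also not tracked on the host."""
    if is_tracked:
        # Tracked instances short-circuit earlier; nothing is logged.
        return None
    if task_state is not None:
        # The instance is undergoing a state transition (e.g. spawning
        # during unshelve): this is the racy but benign case.
        return logging.DEBUG
    return logging.WARNING
```

With this split, the unshelve race described above would produce a DEBUG line (task_state is set during spawn), while a genuinely orphaned allocation would still warn.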
[Yahoo-eng-team] [Bug 1819460] Re: instance stuck in BUILD state due to unhandled exceptions in conductor
Actually, ignore comment 15: claim_resources didn't raise AllocationUpdateFailed until Stein (https://github.com/openstack/nova/commit/37301f2f278a3702369eec957402e36d53068973), so the bug doesn't apply to Rocky or Queens.

** No longer affects: nova/rocky
** No longer affects: nova/queens

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).

https://bugs.launchpad.net/bugs/1819460

Title: instance stuck in BUILD state due to unhandled exceptions in conductor

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Committed

Bug description: There are two calls[1][2] during ConductorTaskManager.build_instances, used during re-schedule, that could potentially raise exceptions, which leads to the instance being stuck in BUILD state instead of going to ERROR state.

[1] https://github.com/openstack/nova/blob/892ead1438abc9a8a876209343e6a85c80f0059f/nova/conductor/manager.py#L670
[2] https://github.com/openstack/nova/blob/892ead1438abc9a8a876209343e6a85c80f0059f/nova/conductor/manager.py#L679

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1819460/+subscriptions
[Yahoo-eng-team] [Bug 1819460] Re: instance stuck in BUILD state due to unhandled exceptions in conductor
I'll be backporting the non-fill-provider-mapping part of this to rocky and queens, since the code fix and functional tests related to bug 1837955 rely on changes from the series that fixed this bug.

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).

https://bugs.launchpad.net/bugs/1819460

Title: instance stuck in BUILD state due to unhandled exceptions in conductor

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Confirmed
Status in OpenStack Compute (nova) rocky series: Confirmed
Status in OpenStack Compute (nova) stein series: Fix Committed

Bug description: There are two calls[1][2] during ConductorTaskManager.build_instances, used during re-schedule, that could potentially raise exceptions, which leads to the instance being stuck in BUILD state instead of going to ERROR state.

[1] https://github.com/openstack/nova/blob/892ead1438abc9a8a876209343e6a85c80f0059f/nova/conductor/manager.py#L670
[2] https://github.com/openstack/nova/blob/892ead1438abc9a8a876209343e6a85c80f0059f/nova/conductor/manager.py#L679

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1819460/+subscriptions
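The general shape of the fix can be sketched minimally as below. `schedule` and `set_error_state` are hypothetical stand-ins for the conductor internals (the real calls at manager.py#L670/#L679 have different signatures); the sketch only shows the control flow of converting an unexpected exception into an ERROR state instead of letting it propagate and leave the instance in BUILD:

```python
def build_instances_guarded(instance, schedule, set_error_state):
    """Sketch: guard a re-schedule step so unexpected exceptions put
    the instance into ERROR rather than leaving it stuck in BUILD."""
    try:
        return schedule(instance)
    except Exception as exc:
        # Without this handler the exception propagates out of the
        # conductor task and the instance never leaves BUILD.
        set_error_state(instance, exc)
        return None
```

The real fix wraps the two cited calls similarly, so a failure like AllocationUpdateFailed is surfaced to the user via the instance state.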
[Yahoo-eng-team] [Bug 1838389] Re: Nova-compute try to flush wrong device mapper when live migrate VM
What version of os-brick are you using? There might be fixes in newer releases of os-brick, but you'd probably have to check the change log. Lee Yarwood might be familiar with any related changes to os-brick as well.

** Tags added: libvirt live-migration volumes

** Also affects: os-brick
   Importance: Undecided
   Status: New

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).

https://bugs.launchpad.net/bugs/1838389

Title: Nova-compute try to flush wrong device mapper when live migrate VM

Status in OpenStack Compute (nova): New
Status in os-brick: New

Bug description:

Description
===
When I live-migrate a VM booted from a volume on 3PAR storage (we're using multipath for redundancy), it fails because nova-compute calls os-brick to flush the wrong device mapper; that device mapper belongs to a volume of another VM on the same compute host.

Environment
===
OpenStack version: Rocky
Hypervisors: Libvirt + KVM
Multipath version: 0.4.9-123.el7.x86_64
Storage: 3par8440
Networking: Neutron with OpenVSwitch

compute_server_1 has 10 VMs; two of them are:

VM-1 with UUID 35940aef-cf19-465a-84e7-8aa14da7fe28:
- boots from volume /dev/vda with wwn 360002ac0031a0002107b
- has a volume attached to /dev/vdb with wwn 360002ac003190002107b

VM-2 with UUID b2c3f475-b916-4811-9614-2c81a79868e8:
- boots from volume /dev/vda with wwn 360002ac003130002107b
- has a volume attached to /dev/vdb with wwn 360002ac001ac0002107b

Trying to live-migrate VM-1 to another compute host fails because os-brick tries to flush the device mapper with wwn 360002ac001ac0002107b, which belongs to VM-2. I also tried to live-migrate some other VMs on this compute_server_1 and all of them were OK.

Expected result
===
os-brick flushes the correct device mapper of the VM.

Actual result
===
os-brick flushes the wrong device mapper, belonging to another VM on the same compute host as the VM being live-migrated.
Logs of nova-compute
==
2019-07-30 14:16:09.293 6 INFO nova.virt.libvirt.driver [-] [instance: 35940aef-cf19-465a-84e7-8aa14da7fe28] Migration running for 30 secs, memory 0% remaining; (bytes processed=20294869659, remaining=298622976, total=34377375744)
2019-07-30 14:16:09.628 6 INFO nova.compute.manager [-] [instance: 35940aef-cf19-465a-84e7-8aa14da7fe28] VM Migration completed (Lifecycle Event)
2019-07-30 14:16:09.760 6 INFO nova.compute.manager [req-99b22dd0-8cb2-45d8-b7b7-4241e1ffcfe0 - - - - -] [instance: 35940aef-cf19-465a-84e7-8aa14da7fe28] During sync_power_state the instance has a pending task (migrating). Skip.
2019-07-30 14:16:10.521 6 WARNING nova.compute.manager [req-ea4ac52f-9cac-4d1f-b282-d9e99d76f3d7 f295657702674882b2aab02bd9b15b42 c7fe4b7c1a824f738fe12e32b31c1650 - default default] [instance: 35940aef-cf19-465a-84e7-8aa14da7fe28] Received unexpected event network-vif-unplugged-883d1c97-164f-4c73-a423-afdd8b6ee0f6 for instance with vm_state active and task_state migrating.
2019-07-30 14:16:11.254 6 INFO nova.virt.libvirt.driver [-] [instance: 35940aef-cf19-465a-84e7-8aa14da7fe28] Migration operation has completed
2019-07-30 14:16:11.254 6 INFO nova.compute.manager [-] [instance: 35940aef-cf19-465a-84e7-8aa14da7fe28] _post_live_migration() is started..
2019-07-30 14:16:11.319 6 INFO oslo.privsep.daemon [-] Running privsep helper: ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', 'privsep-helper', '--config-file', '/usr/share/nova/nova-dist.conf', '--config-file', '/etc/nova/nova.conf', '--privsep_context', 'os_brick.privileged.default', '--privsep_sock_path', '/tmp/tmpzyR_mV/privsep.sock']
2019-07-30 14:16:12.131 6 INFO oslo.privsep.daemon [-] Spawned new privsep daemon via rootwrap
2019-07-30 14:16:12.050 260 INFO oslo.privsep.daemon [-] privsep daemon starting
2019-07-30 14:16:12.054 260 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
2019-07-30 14:16:12.056 260 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none
2019-07-30 14:16:12.057 260 INFO oslo.privsep.daemon [-] privsep daemon running as pid 260
2019-07-30 14:16:12.575 6 INFO os_brick.initiator.linuxscsi [-] Find Multipath device file for volume WWN 360002ac001ac0002107b
2019-07-30 14:16:14.065 6 WARNING nova.compute.manager [req-e1ecb028-7af8-4d2c-8a3c-10ecbd627337 f295657702674882b2aab02bd9b15b42 c7fe4b7c1a824f738fe12e32b31c1650 - default default] [instance: 35940aef-cf19-465a-84e7-8aa14da7fe28] Received unexpected event network-vif-plugged-883d1c97-164f-4c73-a423-afdd8b6ee0f6 for instance with vm_state active and task_state migrating.
2019-07-30 14:16:26.253 6 INFO nova.compute.manager [-] [instance:
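A minimal illustration of the selection the reporter expects: the device mapper to flush should be looked up by the WWN of the volume actually being disconnected, never by whatever multipath entry is found first. `multipath_devices` is a hypothetical stand-in for the host's WWN-to-/dev/mapper table (e.g. derived from `multipath -ll` output); this is not os-brick code:

```python
def device_to_flush(volume_wwn, multipath_devices):
    """Return the /dev/mapper path whose WWN matches the volume being
    disconnected, or None if no such multipath device exists.
    Flushing any other entry would hit another VM's volume, which is
    the failure described in this report (VM-1's migration flushed
    the mapper for VM-2's WWN 360002ac001ac0002107b)."""
    # WWNs are hex strings; compare case-insensitively.
    return multipath_devices.get(volume_wwn.lower())
```

In the log above, os-brick resolved WWN 360002ac001ac0002107b, which per the bug description belongs to VM-2, not to the migrating VM-1.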
[Yahoo-eng-team] [Bug 1781391] Re: cellv2_delete_host when host not found by ComputeNodeList
** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Changed in: nova/queens
   Status: New => In Progress

** Changed in: nova/queens
   Importance: Undecided => Medium

** Changed in: nova
   Importance: Undecided => Medium

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).

https://bugs.launchpad.net/bugs/1781391

Title: cellv2_delete_host when host not found by ComputeNodeList

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: In Progress

Bug description:

Problematic situation:

1. Check the hosts visible to nova-compute:

   nova hypervisor-list
   id | hypervisor hostname | state | status
   xx | compute2            | up    | enabled

2. Check the hosts visible to cell_v2:

   nova-manage cell_v2 list_hosts
   cell name | cell uuid | hostname
   cell1     |           | compute1
   cell1     |           | compute2

   Here compute1, which actually no longer exists (e.g. it was renamed), still remains in cell_mappings.

3. Now try to delete host compute1:

   nova-manage cell_v2 delete_host --cell_uuid --host compute1

   Then the following error is shown:

   Traceback (most recent call last):
     File "/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 1620, in main
       ret = fn(*fn_args, **fn_kwargs)
     File "/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 1558, in delete_host
       nodes = objects.ComputeNodeList.get_all_by_host(cctxt, host)
     File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 184, in wrapper
       result = fn(cls, context, *args, **kwargs)
     File "/usr/lib/python2.7/site-packages/nova/objects/compute_node.py", line 437, in get_all_by_host
       use_slave=use_slave)
     File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 225, in wrapper
       return f(*args, **kwargs)
     File "/usr/lib/python2.7/site-packages/nova/objects/compute_node.py", line 432, in _db_compute_node_get_all_by_host
       return db.compute_node_get_all_by_host(context, host)
     File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 297, in compute_node_get_all_by_host
       return IMPL.compute_node_get_all_by_host(context, host)
     File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 270, in wrapped
       return f(context, *args, **kwargs)
     File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 672, in compute_node_get_all_by_host
       raise exception.ComputeHostNotFound(host=host)
   ComputeHostNotFound: Compute host compute1 could not be found.

Not quite sure of the exact way to reproduce it, but I think it would be nicer to allow the delete_host operation in situations like this.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1781391/+subscriptions
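The suggested behavior can be sketched like this. `ComputeHostNotFound`, `get_nodes`, and `delete_host_mapping` are simplified stand-ins for the nova internals; the sketch only shows the control flow of tolerating a host that has a stale cell mapping but no compute-node records:

```python
class ComputeHostNotFound(Exception):
    """Stand-in for nova.exception.ComputeHostNotFound."""

def delete_host(host, get_nodes, delete_host_mapping):
    """Sketch of the proposed fix: if the host has no compute node
    records (e.g. it was renamed), still delete the stale host
    mapping instead of letting the lookup trace back."""
    try:
        nodes = get_nodes(host)
    except ComputeHostNotFound:
        # Stale mapping with no compute nodes: proceed with deletion
        # instead of crashing like the traceback above.
        nodes = []
    delete_host_mapping(host)
    return nodes
```

This mirrors what the eventual fix does in spirit: treat a missing ComputeNodeList as "nothing to unmap" rather than a fatal error, so `nova-manage cell_v2 delete_host` can clean up renamed or removed hosts.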