[Yahoo-eng-team] [Bug 1928675] Re: [stein][neutron] l3 agent error
[Expired for neutron because there has been no activity for 60 days.]

** Changed in: neutron
       Status: Incomplete => Expired

https://bugs.launchpad.net/bugs/1928675

Title: [stein][neutron] l3 agent error

Status in neutron: Expired

Bug description:
I've just upgraded from OpenStack Queens to Rocky and then to Stein on CentOS 7. My configuration uses router high availability. After the upgrade and rebooting each controller one by one, I get the following errors on all 3 controllers in /var/log/neutron/l3-agent.log: http://paste.openstack.org/show/805407/

If I run "openstack router show" for one of the UUIDs in the log: http://paste.openstack.org/show/805408/

The namespace for the router is present on all 3 controllers. After the controller reboots, some routers lost their routing tables, but restarting the l3 agent fixed them. The l3 agent respawning error causes neutron to fill the controller's memory; the controller then stops responding and is fenced by the others, so router HA moves its routers to another controller, which fills its memory in turn, and so on.

I stopped the neutron services and cleaned the directory /var/lib/neutron/ha_confs. After restarting the neutron services the respawning errors disappeared, but I had to recreate some router static routes (not all). Is this a bug?

Ignazio
[Yahoo-eng-team] [Bug 1849425] Re: Extend attached volume case failed
Reviewed: https://review.opendev.org/c/openstack/nova/+/801714
Committed: https://opendev.org/openstack/nova/commit/49ba5a763f9d2c7c496f391ca8149a18541bfec7
Submitter: "Zuul (22348)"
Branch: master

commit 49ba5a763f9d2c7c496f391ca8149a18541bfec7
Author: Lee Yarwood
Date: Thu Jul 22 09:05:51 2021 +0100

    libvirt: Handle silent failures to extend volume within os-brick

    As seen in bug #1849425, os-brick can silently fail to extend an
    underlying volume device on the compute, returning a new_size of
    None to n-cpu. While this should ultimately be addressed in
    os-brick, n-cpu can also handle this before we eventually run into
    a TypeError when attempting floor division later in the volume
    extend flow.

    Change-Id: Ic8091537274a5ad27fb5af8939f81ed154b7ad7c
    Closes-Bug: #1849425

** Changed in: nova
       Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1849425

Title: Extend attached volume case failed

Status in Cinder: New
Status in OpenStack Compute (nova): Fix Released

Bug description:
The online volume-extend use case tempest.api.volume.test_volumes_extend:VolumesExtendAttachedTest.test_extend_attached_volume failed. This use case originally executed successfully and started failing after reinstalling devstack (stein --> train). The cinder driver has not changed, and the multipath.conf configuration has not changed.

There are two kinds of errors in devstack@n-cpu.service: one under the Ubuntu 18 system and the other under the CentOS 7 system, as follows.

ubuntu18:

Oct 23 09:34:52 stack nova-compute[11706]: WARNING nova.compute.manager [req-0ce8f571-5c04-4a72-8049-3839c934c7c9 req-421b45b2-be78-4bc3-8a1b-c469e4d7504d service nova] [instance: 6a4b5b05-20b7-46c1-b903-b6fe40a38aad] Extend volume failed, volume_id=bc3c5af6-b6c6-4e01-8d0e-774ad637debc, reason: unsupported operand type(s) for //: 'NoneType' and 'int'
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server [req-0ce8f571-5c04-4a72-8049-3839c934c7c9 req-421b45b2-be78-4bc3-8a1b-c469e4d7504d service nova] Exception during message handling: TypeError: unsupported operand type(s) for //: 'NoneType' and 'int'
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/dist-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server   File "/opt/stack/nova/nova/exception_wrapper.py", line 79, in wrapped
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server     function_name, call_dict, binary, tb)
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server     self.force_reraise()
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/dist-packages/six.py", line 693, in reraise
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server     raise value
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server   File "/opt/stack/nova/nova/exception_wrapper.py", line 69, in wrapped
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
Oct 23 09:34:52 stack nova-compute[11706]: ERROR oslo_messaging.rpc.server   File "/opt/stack/nova/nova/compute/manager.py", line 8587, in external_instance_event
Oct 23 09:34:52 stack nova-comp
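The committed fix guards against os-brick returning new_size=None before the floor division runs. Below is a minimal runnable sketch of that failure mode and guard; the function name normalize_extended_size is illustrative, not nova's actual code path:

GiB = 1024 ** 3

def normalize_extended_size(new_size_bytes):
    # os-brick should return the extended size in bytes; on a silent
    # failure it returns None, and None // GiB is exactly the
    # "unsupported operand type(s) for //" TypeError in the log above.
    if new_size_bytes is None:
        raise RuntimeError("os-brick failed to extend the volume device")
    return new_size_bytes // GiB

print(normalize_extended_size(2 * GiB))  # 2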
[Yahoo-eng-team] [Bug 1939169] [NEW] glance md-tag-create-multiple overwrites existing tags
Public bug reported:

Our md-tag-create-multiple (/v2/metadefs/namespaces/{namespace_name}/tags) [1] API overwrites the existing tags for the specified namespace rather than creating new ones in addition to the existing tags. By contrast, creating tags one at a time with md-tag-create (/v2/metadefs/namespaces/{namespace_name}/tags/{tag_name}) works as expected: it adds a new tag alongside the existing ones.

Steps to reproduce:

1. Source using admin credentials
$ source devstack/openrc admin admin

2. Create a new public namespace
$ glance md-namespace-create TagsBugNamespace --visibility public
+------------+----------------------------------+
| Property   | Value                            |
+------------+----------------------------------+
| created_at | 2021-08-06T17:43:03Z             |
| namespace  | TagsBugNamespace                 |
| owner      | a14a058e2d1540c3a0dc7c397c55174e |
| protected  | False                            |
| schema     | /v2/schemas/metadefs/namespace   |
| updated_at | 2021-08-06T17:43:03Z             |
| visibility | public                           |
+------------+----------------------------------+

3. Create a single tag using the md-tag-create command
$ glance md-tag-create TagsBugNamespace --name tag1
+------------+----------------------+
| Property   | Value                |
+------------+----------------------+
| created_at | 2021-08-06T17:57:37Z |
| name       | tag1                 |
| updated_at | 2021-08-06T17:57:37Z |
+------------+----------------------+

4. Create another tag
$ glance md-tag-create TagsBugNamespace --name tag2
+------------+----------------------+
| Property   | Value                |
+------------+----------------------+
| created_at | 2021-08-06T17:57:37Z |
| name       | tag2                 |
| updated_at | 2021-08-06T17:57:37Z |
+------------+----------------------+

5. Verify that we have two tags in the list
$ glance md-tag-list TagsBugNamespace
+------+
| name |
+------+
| tag2 |
| tag1 |
+------+

6. Add more tags using the md-tag-create-multiple command
$ glance md-tag-create-multiple TagsBugNamespace --names TestTag1141=TestTag2411 --delim =
+-------------+
| name        |
+-------------+
| TestTag1141 |
| TestTag2411 |
+-------------+

7. Now run the tag list command again
$ glance md-tag-list TagsBugNamespace
+-------------+
| name        |
+-------------+
| TestTag2411 |
| TestTag1141 |
+-------------+

Expected result:
The new tags should have been added to the existing tags.

Actual result:
The existing tags get deleted, and only the tags newly added with the md-tag-create-multiple command remain.

* To further show that adding a new tag using the md-tag-create command still adds a tag and does not overwrite the existing ones:
$ glance md-tag-create TagsBugNamespace --name tag3
+------------+----------------------+
| Property   | Value                |
+------------+----------------------+
| created_at | 2021-08-06T18:12:14Z |
| name       | tag3                 |
| updated_at | 2021-08-06T18:12:14Z |
+------------+----------------------+

* Verify that the existing tags have not been overwritten:
$ glance md-tag-list TagsBugNamespace
+-------------+
| name        |
+-------------+
| tag3        |
| TestTag2411 |
| TestTag1141 |
+-------------+

[1] https://docs.openstack.org/api-ref/image/v2/metadefs-index.html?expanded=create-tag-definition-detail,create-tags-detail,get-tag-definition-detail,delete-all-tag-definitions-detail#create-tags

** Affects: glance
   Importance: High
       Status: New

** Affects: glance/xena
   Importance: High
       Status: New

** Changed in: glance
   Importance: Undecided => High
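To make the overwrite-versus-append semantics concrete, here is a minimal runnable sketch; the dict-based "store" stands in for glance's metadef tag table, and both function names are illustrative rather than glance's actual code:

store = {"TagsBugNamespace": ["tag1", "tag2"]}

def create_tags_overwrite(namespace, names):
    # Observed behavior of POST /v2/metadefs/namespaces/{ns}/tags:
    # the incoming list replaces whatever was there.
    store[namespace] = list(names)

def create_tags_append(namespace, names):
    # Expected behavior: add the new tags alongside the existing ones,
    # the way repeated md-tag-create calls do.
    existing = store.setdefault(namespace, [])
    existing.extend(n for n in names if n not in existing)

create_tags_overwrite("TagsBugNamespace", ["TestTag1141", "TestTag2411"])
print(store["TagsBugNamespace"])  # ['TestTag1141', 'TestTag2411'] -- tag1/tag2 are lost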
[Yahoo-eng-team] [Bug 1936408] Re: [RFE] Neutron quota change should check available existing resources
We discussed that RFE in today's drivers meeting https://meetings.opendev.org/meetings/neutron_drivers/2021/neutron_drivers.2021-08-06-14.03.log.html#l-14 and we all agreed that the current behavior is actually a feature and we shouldn't change it. It also aligns with comment #6 from Brian. So we decided to reject this RFE.

** Tags removed: rfe-triaged
** Tags added: rfe

** Changed in: neutron
       Status: In Progress => Won't Fix

https://bugs.launchpad.net/bugs/1936408

Title: [RFE] Neutron quota change should check available existing resources

Status in neutron: Won't Fix

Bug description:
Neutron quota changes should check the available existing resources. This is done, for example, in Nova: when a quota resource limit is changed, the available resource count is checked first. If the new quota upper limit (lower than the previous one) is lower than the amount of resources in use, the quota driver should raise an exception.

This RFE implies a change in Neutron's current quota behaviour. Some users expect the new quota limit to be applied regardless of being lower than the current resource usage. However, other users (Octavia) expect the quota driver to fail when lowering the quota limit under the existing resource usage. My recommendation is to use a config knob to decide the behaviour of the quota driver; by default, the current behaviour would prevail.

Bugzilla reference: https://bugzilla.redhat.com/show_bug.cgi?id=1980728
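For reference, a minimal runnable sketch of the check the RFE proposed, behind the config knob the reporter suggested; the function and flag names are illustrative, not Neutron's actual quota-driver API:

class QuotaInUseError(Exception):
    pass

def set_quota_limit(resource, new_limit, current_usage, check_usage=False):
    # With check_usage=False (the default, matching today's behavior),
    # any new limit is accepted, even one below current usage.
    if check_usage and new_limit < current_usage:
        raise QuotaInUseError("%s: new limit %d is below current usage %d"
                              % (resource, new_limit, current_usage))
    return new_limit

print(set_quota_limit("port", 10, current_usage=25))               # accepted today
# set_quota_limit("port", 10, current_usage=25, check_usage=True)  # would raise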
[Yahoo-eng-team] [Bug 1939144] [NEW] [OVN] Router Availability Zones don't work with segmented networks
Public bug reported:

Hi,

Looking at the external networks from the edge environment, I see that these fields are None:

| provider:network_type     | None |
| provider:physical_network | None |

Instead we have this:

| segments | [{'provider:network_type': 'flat', 'provider:physical_network': 'leaf0', 'provider:segmentation_id': None}, {'provider:network_type': 'flat', 'provider:physical_network': 'leaf1', 'provider:segmentation_id': None}, {'provider:network_type': 'flat', 'provider:physical_network': 'leaf2', 'provider:segmentation_id': None}] |

When building the list of candidate nodes to schedule the gateway router ports to, the ML2/OVN driver tries to check whether the physical network is present on the nodes, see [0][1]. To do that it uses the "provider:network_type" and "provider:physical_network" fields (see [1]). Here the physnet attribute is None (see [0]), so when it gets to the get_candidates_for_scheduling() method [2] the list of candidates is empty, because no gateway node matched this physnet. It is also in this method that we filter the candidates based on the AZs.

The reason this does not fail outright, and the gateway port still gets scheduled to some other gateway node, is that once the scheduler code runs with an empty candidate list it simply fetches a list of gateway chassis without any consideration [3] of the physnets and uses that as the candidates.

As you can see the code is messy and a future refactor may be needed. For this problem specifically I would recommend a simpler fix where get_candidates_for_scheduling() considers all gateway chassis, independent of the physnet, in case the physnet is None, and then filters those chassis based on their AZ. That would be a simpler fix that is backportable.

[0] https://github.com/openstack/neutron/blob/b7befc98118c270877b42e94f9cb6f7ccad0b072/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1370
[1] https://github.com/openstack/neutron/blob/b7befc98118c270877b42e94f9cb6f7ccad0b072/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1314-L1317
[2] https://github.com/openstack/neutron/blob/b7befc98118c270877b42e94f9cb6f7ccad0b072/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1291-L1296
[3] https://github.com/openstack/neutron/blob/b7befc98118c270877b42e94f9cb6f7ccad0b072/neutron/scheduler/l3_ovn_scheduler.py#L62

** Affects: neutron
   Importance: High
     Assignee: Lucas Alvares Gomes (lucasagomes)
       Status: Confirmed

** Tags: ovn

** Changed in: neutron
       Status: New => Confirmed

** Changed in: neutron
   Importance: Undecided => High

** Changed in: neutron
     Assignee: (unassigned) => Lucas Alvares Gomes (lucasagomes)

** Tags added: ovn

https://bugs.launchpad.net/bugs/1939144

Title: [OVN] Router Availability Zones don't work with segmented networks

Status in neutron: Confirmed
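A minimal runnable sketch of the backportable fix suggested above: when the network has no top-level physnet (the segmented case), consider every gateway chassis and still apply the availability-zone filter. The Chassis structure and function name are illustrative stand-ins for the neutron code referenced in [2]:

from dataclasses import dataclass

@dataclass
class Chassis:
    name: str
    physnets: set
    azs: set

def get_candidates_for_scheduling(chassis_list, physnet, router_azs):
    if physnet is None:
        # Segmented network: no single physnet to match, so consider
        # all gateway chassis instead of returning an empty list.
        candidates = list(chassis_list)
    else:
        candidates = [c for c in chassis_list if physnet in c.physnets]
    # The AZ filter still applies either way.
    if router_azs:
        candidates = [c for c in candidates if c.azs & set(router_azs)]
    return candidates

gws = [Chassis("gw0", {"leaf0"}, {"az0"}), Chassis("gw1", {"leaf1"}, {"az1"})]
print([c.name for c in get_candidates_for_scheduling(gws, None, ["az1"])])  # ['gw1']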
[Yahoo-eng-team] [Bug 1939137] [NEW] ovn + log service plugin reports AttributeError: 'NoneType' object has no attribute 'tenant_id'
Public bug reported:

Originally noticed in a TripleO job[1]; after enabling the log service plugin in devstack, a similar error is seen in the neutron service log.

The following traceback is seen:

ERROR neutron_lib.callbacks.manager [None req-131d215c-1d03-48ce-a16e-28175c0f58ba tempest-DefaultSnatToExternal-1368745793 tempest-DefaultSnatToExternal-1368745793-project] Error during notification for neutron.services.logapi.common.sg_callback.SecurityGroupRuleCallBack.handle_event-423586 security_group_rule, after_create: AttributeError: 'NoneType' object has no attribute 'tenant_id'
Aug 06 09:28:02.985925 ubuntu-focal-airship-kna1-0025793663 neutron-server[107737]: ERROR neutron_lib.callbacks.manager Traceback (most recent call last):
Aug 06 09:28:02.985925 ubuntu-focal-airship-kna1-0025793663 neutron-server[107737]: ERROR neutron_lib.callbacks.manager   File "/usr/local/lib/python3.8/dist-packages/neutron_lib/callbacks/manager.py", line 197, in _notify_loop
Aug 06 09:28:02.985925 ubuntu-focal-airship-kna1-0025793663 neutron-server[107737]: ERROR neutron_lib.callbacks.manager     callback(resource, event, trigger, **kwargs)
Aug 06 09:28:02.985925 ubuntu-focal-airship-kna1-0025793663 neutron-server[107737]: ERROR neutron_lib.callbacks.manager   File "/opt/stack/neutron/neutron/services/logapi/common/sg_callback.py", line 32, in handle_event
Aug 06 09:28:02.985925 ubuntu-focal-airship-kna1-0025793663 neutron-server[107737]: ERROR neutron_lib.callbacks.manager     log_resources = db_api.get_logs_bound_sg(context, sg_id)
Aug 06 09:28:02.985925 ubuntu-focal-airship-kna1-0025793663 neutron-server[107737]: ERROR neutron_lib.callbacks.manager   File "/opt/stack/neutron/neutron/services/logapi/common/db_api.py", line 186, in get_logs_bound_sg
Aug 06 09:28:02.985925 ubuntu-focal-airship-kna1-0025793663 neutron-server[107737]: ERROR neutron_lib.callbacks.manager     project_id = context.tenant_id
Aug 06 09:28:02.985925 ubuntu-focal-airship-kna1-0025793663 neutron-server[107737]: ERROR neutron_lib.callbacks.manager AttributeError: 'NoneType' object has no attribute 'tenant_id'
Aug 06 09:28:02.985925 ubuntu-focal-airship-kna1-0025793663 neutron-server[107737]: ERROR neutron_lib.callbacks.manager

Example logs:
https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c84/803712/1/check/neutron-tempest-plugin-scenario-ovn/c84b228/controller/logs/screen-q-svc.txt
https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_65b/797823/16/check/tripleo-ci-centos-8-standalone/65b6831/logs/undercloud/var/log/containers/neutron/server.log

The support was added as part of https://bugs.launchpad.net/neutron/+bug/1914757

Test patch: https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/803712

** Affects: neutron
   Importance: Undecided
     Assignee: Kamil Sambor (ksambor)
       Status: New

https://bugs.launchpad.net/bugs/1939137

Title: ovn + log service plugin reports AttributeError: 'NoneType' object has no attribute 'tenant_id'

Status in neutron: New
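The traceback shows get_logs_bound_sg() dereferencing context.tenant_id while the OVN-driven callback can fire with context=None. A minimal runnable sketch of the failure and an obvious guard; this helper is illustrative, not the neutron code at db_api.py line 186:

class FakeContext:
    tenant_id = "demo-project"

def get_logs_bound_sg(context, sg_id):
    # Guard for the OVN path, where the callback may fire without a
    # request context.
    if context is None:
        return []
    project_id = context.tenant_id  # the line that raised above
    return [(project_id, sg_id)]

print(get_logs_bound_sg(None, "sg-1"))           # [] instead of AttributeError
print(get_logs_bound_sg(FakeContext(), "sg-1"))  # [('demo-project', 'sg-1')]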
[Yahoo-eng-team] [Bug 1939125] Re: Incorrect Auto schedule new network segments notification listener
** Also affects: neutron/stein
   Importance: Undecided
       Status: New

** Also affects: neutron/queens
   Importance: Undecided
       Status: New

** Also affects: neutron/rocky
   Importance: Undecided
       Status: New

** Changed in: neutron/queens
       Status: New => Triaged

** Changed in: neutron/rocky
       Status: New => Triaged

** Changed in: neutron/stein
       Status: New => Triaged

** Changed in: neutron/queens
   Importance: Undecided => Medium

** Changed in: neutron/stein
   Importance: Undecided => Medium

** Changed in: neutron/rocky
   Importance: Undecided => Medium

** Changed in: neutron
   Importance: Undecided => Medium

https://bugs.launchpad.net/bugs/1939125

Title: Incorrect Auto schedule new network segments notification listener

Status in neutron: New
Status in neutron queens series: Triaged
Status in neutron rocky series: Triaged
Status in neutron stein series: Triaged

Bug description:
auto_schedule_new_network_segments(), added in Ic9e64aa4ecdc3d56f00c26204ad931b810db7599, uses the new payload notification listener in old stable branches of Neutron that still use the old notify syntax. The following branches are affected: stable/stein, stable/rocky, stable/queens.
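A minimal runnable sketch of the signature mismatch described above; both handlers and the notify helper are illustrative, not the neutron_lib callback machinery itself:

def legacy_notify(callback):
    # How the old stable branches invoke subscribers: keyword arguments.
    return callback("segment", "after_create", "ml2", segment={"id": "seg-1"})

def legacy_style_handler(resource, event, trigger, **kwargs):
    return kwargs["segment"]          # what queens/rocky/stein deliver

def payload_style_handler(resource, event, trigger, payload=None):
    return payload.latest_state       # the newer payload-object syntax

print(legacy_notify(legacy_style_handler))   # {'id': 'seg-1'}
# legacy_notify(payload_style_handler)       # TypeError: unexpected keyword argument 'segment'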
[Yahoo-eng-team] [Bug 1938021] Re: nova.tests.functional.test_cross_cell_migrate.TestMultiCellMigrate.test_delete_while_in_verify_resize_status hits oslo.messaging._drivers.impl_fake.send failure
Reviewed: https://review.opendev.org/c/openstack/nova/+/803714
Committed: https://opendev.org/openstack/nova/commit/d4dbcd5fa05ac2f988b65d611f71805f90411581
Submitter: "Zuul (22348)"
Branch: master

commit d4dbcd5fa05ac2f988b65d611f71805f90411581
Author: Lee Yarwood
Date: Fri Aug 6 10:02:15 2021 +0100

    func: Increase rpc_response_timeout in TestMultiCellMigrate tests

    This was previously set really low, to 1 second, which led to more
    involved flows such as test_delete_while_in_verify_resize_status
    timing out when the target calls the conductor to confirm the
    resize on the source.

    This change simply increases the timeout in the test, but we might
    want to think about moving this call over to rpc_long_timeout, as
    this could be an issue in real-world deployments.

    Closes-Bug: #1938021
    Change-Id: Ibba2d1506a0b026d35d7bf35384ec6439f438b01

** Changed in: nova
       Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1938021

Title: nova.tests.functional.test_cross_cell_migrate.TestMultiCellMigrate.test_delete_while_in_verify_resize_status hits oslo.messaging._drivers.impl_fake.send failure

Status in OpenStack Compute (nova): Fix Released
Status in oslo.messaging: New

Bug description:
https://a8ba7f0ac14669316775-62d3a5548ea094caef4a9963ba6c55d1.ssl.cf1.rackcdn.com/798145/4/gate/nova-tox-functional-centos8-py36/1ee0272/testr_results.html

2021-07-25 02:45:22,896 ERROR [nova.api.openstack.wsgi] Unexpected exception in API method
Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_fake.py", line 207, in _send
    reply, failure = reply_q.get(timeout=timeout)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/queue.py", line 322, in get
    return waiter.wait()
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/queue.py", line 141, in wait
    return get_hub().switch()
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 313, in switch
    return self.greenlet.switch()
queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/nova/nova/api/openstack/wsgi.py", line 658, in wrapped
    return f(*args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/api/openstack/compute/servers.py", line 1070, in delete
    self._delete(req.environ['nova.context'], req, id)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/api/openstack/compute/servers.py", line 883, in _delete
    self.compute_api.delete(context, instance)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/api.py", line 226, in inner
    return function(self, context, instance, *args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/api.py", line 153, in inner
    return f(self, context, instance, *args, **kw)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/api.py", line 2541, in delete
    self._delete_instance(context, instance)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/api.py", line 2533, in _delete_instance
    task_state=task_states.DELETING)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/api.py", line 2311, in _delete
    self._confirm_resize_on_deleting(context, instance)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/api.py", line 2405, in _confirm_resize_on_deleting
    context, instance, migration, do_cast=False)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/conductor/api.py", line 182, in confirm_snapshot_based_resize
    ctxt, instance, migration, do_cast=do_cast)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/conductor/rpcapi.py", line 468, in confirm_snapshot_based_resize
    return cctxt.call(ctxt, 'confirm_snapshot_based_resize', **kw)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 179, in call
    transport_options=self.transport_options)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/transport.py", line 128, in _send
    transport_options=transport_options)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_fake.py", line 223, in send
    transport_options)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/
[Yahoo-eng-team] [Bug 1937375] Re: Duplicate BlockDeviceMapping if attaching volumes too fast
Reviewed: https://review.opendev.org/c/openstack/nova/+/801990
Committed: https://opendev.org/openstack/nova/commit/2209b0007fe85d7c5439e0bfdfe2120c63898fa2
Submitter: "Zuul (22348)"
Branch: master

commit 2209b0007fe85d7c5439e0bfdfe2120c63898fa2
Author: Felix Huettner
Date: Fri Jul 23 10:43:32 2021 +0200

    compute: Avoid duplicate BDMs during reserve_block_device_name

    When attaching a volume to a running instance, the nova-api
    validates that the volume is not already attached to the instance.
    However, nova-compute is responsible for actually creating the BDM
    entry in the database. When attach requests are sent fast enough,
    the same "attach_volume" request can be sent to nova-compute for
    the same volume/instance combination. To work around this we add a
    check in nova-compute to validate that the volume has not been
    attached in the meantime.

    Closes-Bug: #1937375
    Change-Id: I92f35514efddcb071c7094370b79d91d34c5bc72

** Changed in: nova
       Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1937375

Title: Duplicate BlockDeviceMapping if attaching volumes too fast

Status in OpenStack Compute (nova): Fix Released

Bug description:

Description
===========
When attaching a volume to a running instance, the nova-api validates that the volume is not already attached to the instance. However, nova-compute is responsible for actually creating the BDM entry in the database. When attach requests are sent fast enough, the same "attach_volume" request can be sent to nova-compute for the same volume/instance combination.

Steps to reproduce
==================
* Create an instance and a volume
* Run "openstack server add volume " in 2 terminals in parallel
* When being fast enough, this results in "openstack server show" reporting the volume id twice

Expected result
===============
The volume is attached to the instance just once.

Actual result
=============
The volume is attached to the instance two or more times (depending on the parallelism).

Environment
===========
1. Happens on master back to queens (possibly also earlier releases)
2. Which hypervisor did you use? Independent of the hypervisor; observed with libvirt
3. Which storage type did you use? Independent of the storage; observed with ceph + NFS
4. Which networking type did you use? Independent of the network; observed with OVS
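A minimal runnable sketch of the race and the guard the fix adds: before creating the BlockDeviceMapping entry, nova-compute re-checks that the volume is not already attached. The in-memory set below stands in for the database table and is illustrative only:

import threading

bdms = set()
bdm_lock = threading.Lock()

def reserve_block_device_name(instance_uuid, volume_id):
    with bdm_lock:
        # The re-check nova-api alone cannot provide: two concurrent
        # attach requests can both pass the API-level validation.
        if (instance_uuid, volume_id) in bdms:
            raise RuntimeError("volume %s is already attached to %s"
                               % (volume_id, instance_uuid))
        bdms.add((instance_uuid, volume_id))

reserve_block_device_name("inst-1", "vol-1")
try:
    reserve_block_device_name("inst-1", "vol-1")
except RuntimeError as exc:
    print(exc)  # the second attach is rejected instead of duplicating the BDM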
[Yahoo-eng-team] [Bug 1939125] [NEW] Incorrect Auto schedule new network segments notification listener
Public bug reported:

auto_schedule_new_network_segments(), added in Ic9e64aa4ecdc3d56f00c26204ad931b810db7599, uses the new payload notification listener in old stable branches of Neutron that still use the old notify syntax. The following branches are affected: stable/stein, stable/rocky, stable/queens.

** Affects: neutron
   Importance: Undecided
     Assignee: Szymon Wróblewski (bluex)
       Status: New

** Changed in: neutron
     Assignee: (unassigned) => Szymon Wróblewski (bluex)

https://bugs.launchpad.net/bugs/1939125

Title: Incorrect Auto schedule new network segments notification listener

Status in neutron: New
[Yahoo-eng-team] [Bug 1939119] [NEW] cloud-init not parsing the instance metadata when provided under "/etc/cloud/cloud.cfg.d/"
Public bug reported:

According to the cloud-init documentation, https://cloudinit.readthedocs.io/en/latest/topics/instancedata.html#using-instance-data, both user-data scripts and #cloud-config data support jinja template rendering. When the first line of the provided user-data begins with:

## template: jinja

cloud-init will use jinja to render that file. I have tested user-data with a jinja variable and it works as expected. Now my question is: does cloud-init support parsing the instance metadata when provided under "/etc/cloud/cloud.cfg.d/"?

For example, a config file /etc/cloud/cloud.cfg.d/99-lob_unmanaged.cfg is created with the below content in a VM on the Azure cloud.

===
## template: jinja
#cloud-config
runcmd:
  - echo "{{ ds.meta_data.imds.compute.tags }}" > /root/vm_tags.txt
===

However, during the cloud-init initialization the instance metadata is not parsed, and the file is created with the below data.

$ cat /root/vm_tags.txt
{{ ds.meta_data.imds.compute.tags }}

The scripts also contain the unparsed data.

$ cat /var/lib/cloud/instance/scripts/runcmd
#!/bin/sh
echo "{{ ds.meta_data.imds.compute.tags }}" > /root/vm_tags.txt

Could anyone please confirm whether this is a bug or the expected result?

** Affects: cloud-init
   Importance: Undecided
       Status: New

https://bugs.launchpad.net/bugs/1939119

Title: cloud-init not parsing the instance metadata when provided under "/etc/cloud/cloud.cfg.d/"

Status in cloud-init: New
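For context, the substitution the reporter expected is the one cloud-init performs for user-data that starts with "## template: jinja". The sketch below reproduces just that rendering step with the jinja2 library and sample instance data; it is not cloud-init's internal code, and the unrendered /root/vm_tags.txt above shows that files under /etc/cloud/cloud.cfg.d/ did not go through it:

from jinja2 import Template

# Illustrative instance data in the shape the report references.
instance_data = {"ds": {"meta_data": {"imds": {"compute": {"tags": "env:prod"}}}}}

user_data = 'echo "{{ ds.meta_data.imds.compute.tags }}" > /root/vm_tags.txt'
print(Template(user_data).render(**instance_data))
# echo "env:prod" > /root/vm_tags.txt  -- rendered, as with jinja user-data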
[Yahoo-eng-team] [Bug 1939116] [NEW] File system freeze before end of mirroring generates kernel task blocked for more than 120 seconds
Public bug reported:

Description
===========
For data consistency with the live snapshot feature, an instance can be booted with the image property hw_qemu_guest_agent set to yes and the QEMU agent installed in the image. An instance with local storage creates a disk mirror to produce a copy of the disk. With the hw_qemu_guest_agent property, the compute contacts the instance to freeze the filesystem before starting the mirror and contacts it again after the end of the mirroring to unfreeze the filesystem. The issue is that for an instance with a big disk the filesystem stays frozen for a long time, and the impact is an instance hang: "task blocked for more than 120 seconds."

Steps to reproduce
==================
1/ Create an image with property hw_qemu_guest_agent=yes and the QEMU agent installed in the image
2/ Create an instance with local storage (disk size 200 GB) with real data on the disk
3/ Start a live snapshot
4/ The instance is frozen during the mirroring

Expected result
===============
Nova compute waits until the end of the mirroring to freeze the filesystem in the instance.

Actual result
=============
The instance is frozen during the whole mirroring.

** Affects: nova
   Importance: Undecided
     Assignee: Pierre Libeau (pierre-libeau)
       Status: New

** Changed in: nova
     Assignee: (unassigned) => Pierre Libeau (pierre-libeau)

https://bugs.launchpad.net/bugs/1939116

Title: File system freeze before end of mirroring generates kernel task blocked for more than 120 seconds

Status in nova: New
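A minimal runnable sketch of the ordering problem; FakeDomain stands in for a libvirt domain (fsFreeze/fsThaw mirror the real python-libvirt method names) and the mirror helper is a stub, not nova's live-snapshot code:

class FakeDomain:
    def fsFreeze(self):
        print("guest filesystem frozen")
    def fsThaw(self):
        print("guest filesystem thawed")

def mirror_disk(dom, gigabytes):
    print("mirroring %d GB of disk (long-running)" % gigabytes)

def snapshot_current(dom):
    dom.fsFreeze()          # frozen before the mirror starts...
    mirror_disk(dom, 200)   # ...so a 200 GB copy keeps it frozen well
    dom.fsThaw()            # past the kernel's 120 s hung-task threshold

def snapshot_expected(dom):
    mirror_disk(dom, 200)   # bulk copy while the guest keeps running
    dom.fsFreeze()          # freeze only around the short final sync
    dom.fsThaw()

snapshot_expected(FakeDomain())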
[Yahoo-eng-team] [Bug 1939108] [NEW] tempest.api.compute.volumes.test_attach_volume_negative.AttachVolumeNegativeTest fails due to detach failure during cleanup
Public bug reported:

Creating a fresh bug for this as we haven't seen it in a while since the libvirt event rewrite landed, but it did just cause a gate failure...

https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_5d6/801990/6/gate/tempest-integrated-compute/5d6a7ec/testr_results.html

Aug 05 16:01:08.964854 ubuntu-focal-ovh-gra1-0025782813 nova-compute[104299]: ERROR nova.virt.libvirt.driver [None req-54714613-ee7e-4c9f-8e60-8251a77a526e tempest-AttachVolumeNegativeTest-1260221999 tempest-AttachVolumeNegativeTest-1260221999-project] Waiting for libvirt event about the detach of device vdb with device alias virtio-disk1 from instance ce4b444f-1587-43e9-b708-d2ae2c9cd59f is timed out.
Aug 05 16:01:08.969793 ubuntu-focal-ovh-gra1-0025782813 nova-compute[104299]: DEBUG nova.virt.libvirt.driver [None req-54714613-ee7e-4c9f-8e60-8251a77a526e tempest-AttachVolumeNegativeTest-1260221999 tempest-AttachVolumeNegativeTest-1260221999-project] Failed to detach device vdb with device alias virtio-disk1 from instance ce4b444f-1587-43e9-b708-d2ae2c9cd59f from the live domain config. Libvirt did not report any error but the device is still in the config. {{(pid=104299) _detach_from_live_with_retry /opt/stack/nova/nova/virt/libvirt/driver.py:2394}}
Aug 05 16:01:08.970117 ubuntu-focal-ovh-gra1-0025782813 nova-compute[104299]: ERROR nova.virt.libvirt.driver [None req-54714613-ee7e-4c9f-8e60-8251a77a526e tempest-AttachVolumeNegativeTest-1260221999 tempest-AttachVolumeNegativeTest-1260221999-project] Run out of retry while detaching device vdb with device alias virtio-disk1 from instance ce4b444f-1587-43e9-b708-d2ae2c9cd59f from the live domain config. Device is still attached to the guest.
Aug 05 16:01:08.971007 ubuntu-focal-ovh-gra1-0025782813 nova-compute[104299]: WARNING nova.virt.block_device [None req-54714613-ee7e-4c9f-8e60-8251a77a526e tempest-AttachVolumeNegativeTest-1260221999 tempest-AttachVolumeNegativeTest-1260221999-project] [instance: ce4b444f-1587-43e9-b708-d2ae2c9cd59f] Guest refused to detach volume ff8d0867-1e92-4f04-8e52-01bf05f52ac3: nova.exception.DeviceDetachFailed: Device detach failed for vdb: Run out of retry while detaching device vdb with device alias virtio-disk1 from instance ce4b444f-1587-43e9-b708-d2ae2c9cd59f from the live domain config. Device is still attached to the guest.
Aug 05 16:01:09.024848 ubuntu-focal-ovh-gra1-0025782813 nova-compute[104299]: DEBUG oslo_concurrency.lockutils [None req-54714613-ee7e-4c9f-8e60-8251a77a526e tempest-AttachVolumeNegativeTest-1260221999 tempest-AttachVolumeNegativeTest-1260221999-project] Lock "ce4b444f-1587-43e9-b708-d2ae2c9cd59f" released by "nova.compute.manager.ComputeManager.detach_volume..do_detach_volume" :: held 201.180s {{(pid=104299) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:367}}
Aug 05 16:01:09.095134 ubuntu-focal-ovh-gra1-0025782813 nova-compute[104299]: ERROR oslo_messaging.rpc.server [None req-54714613-ee7e-4c9f-8e60-8251a77a526e tempest-AttachVolumeNegativeTest-1260221999 tempest-AttachVolumeNegativeTest-1260221999-project] Exception during message handling: nova.exception.DeviceDetachFailed: Device detach failed for vdb: Run out of retry while detaching device vdb with device alias virtio-disk1 from instance ce4b444f-1587-43e9-b708-d2ae2c9cd59f from the live domain config. Device is still attached to the guest.

$ logsearch storedsearch bug-xxx-detach
Running stored search: bug-xxx-detach:
  branches:
  - master
  files:
  - controller/logs/screen-n-cpu.txt
  job-groups:
  - nova-devstack
  limit: 100
  project: openstack/nova
  regex: from the live domain config. Device is still attached to the guest.
  result: FAILURE
[..]
5d6a7ec36f104f10a1f49fe44909aa23:.logsearch/5d6a7ec36f104f10a1f49fe44909aa23/controller/logs/screen-n-cpu.txt:47115:Aug 05 16:01:08.970117 ubuntu-focal-ovh-gra1-0025782813 nova-compute[104299]: ERROR nova.virt.libvirt.driver [None req-54714613-ee7e-4c9f-8e60-8251a77a526e tempest-AttachVolumeNegativeTest-1260221999 tempest-AttachVolumeNegativeTest-1260221999-project] Run out of retry while detaching device vdb with device alias virtio-disk1 from instance ce4b444f-1587-43e9-b708-d2ae2c9cd59f from the live domain config. Device is still attached to the guest.
5d6a7ec36f104f10a1f49fe44909aa23:.logsearch/5d6a7ec36f104f10a1f49fe44909aa23/controller/logs/screen-n-cpu.txt:47116:Aug 05 16:01:08.971007 ubuntu-focal-ovh-gra1-0025782813 nova-compute[104299]: WARNING nova.virt.block_device [None req-54714613-ee7e-4c9f-8e60-8251a77a526e tempest-AttachVolumeNegativeTest-1260221999 tempest-AttachVolumeNegativeTest-1260221999-project] [instance: ce4b444f-1587-43e9-b708-d2ae2c9cd59f] Guest refused to detach volume ff8d0867-1e92-4f04-8e52-01bf05f52ac3: nova.exception.DeviceDeta
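The log describes the retry loop in nova's libvirt driver: request a detach, wait for the device-removed event, and retry a fixed number of times before raising DeviceDetachFailed. A minimal runnable sketch of that pattern under stated assumptions (the callables and attempt/wait values are illustrative, not nova's actual parameters):

import time

class DeviceDetachFailed(Exception):
    pass

def detach_with_retry(request_detach, device_gone, attempts=8, wait=1.0):
    for _ in range(attempts):
        request_detach()                  # ask libvirt to detach the device
        deadline = time.monotonic() + wait
        while time.monotonic() < deadline:
            if device_gone():             # event/config confirms removal
                return
            time.sleep(0.05)
        # "Waiting for libvirt event ... timed out" -> retry
    raise DeviceDetachFailed("Run out of retry while detaching device")

state = {"attached": True}
def request():
    state["attached"] = False             # a cooperative guest honors the detach

detach_with_retry(request, lambda: not state["attached"])
print("detached")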