[Yahoo-eng-team] [Bug 2063459] [NEW] DHCP agent might use default gateway of unrouted subnets
Public bug reported:

When creating a network with two subnets, the DHCP agent will choose any subnet that has a gateway_ip (which is allocated and set by default). In cases where only one of the subnets is attached to a router (which the DHCP agent needs in order to reach resources outside of the internal network, such as an upstream DNS server), the DHCP agent will still arbitrarily choose one of the available subnets. If it chooses the subnet that is not attached to a router, service is disrupted for that network namespace.

The sort order of the subnets is arbitrary but stable; however, if a subnet is added to the network, the DHCP agent might also switch over to this new subnet.

The problem was observed with Neutron Yoga, but it can be reproduced with current upstream code.

To fix this I would propose sorting the subnet gateway selection so that subnets which have a port matching the subnet's gateway IP are used first, as they have a higher likelihood of being connected (a sketch follows below).

** Affects: neutron
   Importance: Undecided
   Status: In Progress

https://bugs.launchpad.net/bugs/2063459
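A minimal sketch of the proposed preference, assuming `subnets` is the list of subnet dicts for the network and `port_fixed_ips` is the set of fixed IPs of the network's ports; both names are illustrative and not the actual DHCP agent data structures:

    # Illustrative only: prefer subnets whose gateway IP is backed by a real
    # port (likely a router interface), then keep a stable secondary order.
    def sort_subnets_for_gateway(subnets, port_fixed_ips):
        def key(subnet):
            gateway_ip = subnet.get('gateway_ip')
            has_gateway_port = gateway_ip is not None and gateway_ip in port_fixed_ips
            return (0 if has_gateway_port else 1, subnet['id'])
        return sorted(subnets, key=key)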
[Yahoo-eng-team] [Bug 2062009] [NEW] Neutron-server + uwsgi deadlocks when running rpc workers
Public bug reported:

In certain situations we observe that neutron-server + uwsgi shares locks between its native threads and its eventlet threads. As eventlet relies on being informed when a lock is released, this may lead to a deadlock, as the eventlet thread waits indefinitely for an already released lock. In our infrastructure this leads to API requests being performed on the Neutron side while the caller never gets a response. On actions like port creation from e.g. Nova or Manila this leads to orphaned ports, as the caller simply retries creating the port.

To better debug this we have reintroduced guru meditation reports into neutron-server[0] and configured uwsgi to send a SIGWINCH on a harakiri[1] to trigger the guru meditation whenever a uwsgi worker deadlocks. The two most interesting candidates seem to be a shared lock inside oslo_messaging and Python's logging lock, which also seems to be taken from oslo_messaging. Both cases identified by the traceback point to oslo_messaging and its RPC server (see the attached guru meditation).

As all RPC servers should run inside neutron-rpc-server anyway (due to the uwsgi/neutron-rpc-server split), we should move these instances over there. This will also fix #1864418.

One easy way to find instances of this is to check via the backdoor (or a manual manhole installation, if the backdoor is not available) and search for instances of oslo_messaging.server.MessageHandlingServer via fo(). In our setup (due to the service_plugins enabled) we see RPC servers running from trunk and logapi:

    >>> [ep for mhs in fo(oslo_messaging.server.MessageHandlingServer) for ep in mhs.dispatcher.endpoints]
    [, ]

The RPC servers should instead be started via start_rpc_listeners() (see the sketch below).

Nova has had similar problems with eventlet and logging in the past, see here[2][3].

Tests were done with Neutron Yoga (or our own branch stable/yoga-m3), but the issue is present in current master.

[0] https://github.com/sapcc/neutron/commit/a7c44263b70089d8106bed6d8d5d0e3ddf44d5ad
[1] https://github.com/sapcc/helm-charts/blob/7a93e91c3af16ad2eb91e0a1d176d56a26faa393/openstack/neutron/templates/etc/_uwsgi.ini.tpl#L46-L50
[2] https://github.com/sapcc/nova/blob/f61bd589796f0cd7ae37683de3d676e5edd9cf80/nova/virt/libvirt/host.py#L197-L201
[3] https://github.com/sapcc/nova/blob/f61bd589796f0cd7ae37683de3d676e5edd9cf80/nova/virt/libvirt/migration.py#L406-L407

** Affects: neutron
   Importance: Undecided
   Status: New

** Attachment added: "guru-meditation-report.txt"
   https://bugs.launchpad.net/bugs/2062009/+attachment/5766806/+files/guru-meditation-report.txt

https://bugs.launchpad.net/bugs/2062009
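A rough sketch of what exposing such an RPC server through start_rpc_listeners() could look like, so it is only started by neutron-rpc-server instead of the uwsgi API workers. The plugin, endpoint and topic names below are placeholders, and the location of the rpc helper module (neutron_lib.rpc here) differs between releases:

    import oslo_messaging
    from neutron_lib import rpc as n_rpc  # module path varies per release


    class ExampleEndpoint(object):
        """Placeholder endpoint; real plugins expose their own RPC API here."""
        target = oslo_messaging.Target(version='1.0')

        def ping(self, context):
            return 'pong'


    class ExampleServicePlugin(object):
        def start_rpc_listeners(self):
            # Consumers registered here are started by neutron-rpc-server,
            # not by the uwsgi API workers, avoiding the mixed native/eventlet
            # thread locking described above.
            conn = n_rpc.Connection()
            conn.create_consumer('example-topic', [ExampleEndpoint()], fanout=False)
            return conn.consume_in_threads()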
[Yahoo-eng-team] [Bug 2057698] [NEW] Concurrent routerroute update fails on deletion with AttributeError
Public bug reported:

When updating a router and providing a set of extra routes / routerroutes that result in some routes being deleted, it can happen that two workers fetch the routes at the same time and then both try to delete the same route. As the route is fetched before deletion, in one of the two workers get_object() will return None, on which delete() is then called, resulting in an AttributeError:

    AttributeError: 'NoneType' object has no attribute 'delete'

The request is then not fulfilled properly and a 500 is returned to the user (a defensive sketch follows below).

This was observed on Neutron Yoga, though the same code (plus a breaking test) seems to confirm this on current master.

** Affects: neutron
   Importance: Undecided
   Status: In Progress

https://bugs.launchpad.net/bugs/2057698
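A minimal sketch of the race and a defensive check; the RouterRoute usage and context handling are illustrative of the pattern, not a verified patch:

    # Hypothetical lookup arguments; another API worker may delete the route
    # between this fetch and the delete below, making get_object() return None.
    route_obj = RouterRoute.get_object(context, router_id=router_id,
                                       destination=destination, nexthop=nexthop)
    if route_obj is not None:
        route_obj.delete()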
[Yahoo-eng-team] [Bug 1998621] [NEW] dnsmasq on DHCP Agent does not listen on tcp/53 after dnsmasq restart
Public bug reported:

When talking to dnsmasq using DNS over TCP, dnsmasq forks for TCP connections. Forked processes stay around until all of their connections have been closed, meaning that dangling connections keep the processes alive and with that also keep the tcp/53 port in a listening state.

On dnsmasq restart (e.g. on network update, subnet create, ...) the parent process is killed with SIGKILL and a new process is started. This new process cannot listen on tcp/53, as the port is still in use by the old child with the dangling connection. This could be prevented by sending SIGTERM instead of SIGKILL, as dnsmasq then does a proper cleanup of its forks and all tcp/53 connections are properly closed (a sketch follows below).

This only happens when starting dnsmasq with --bind-dynamic, as with this flag dnsmasq ignores any errors resulting from it not being able to bind on tcp/53, see here:
https://github.com/imp/dnsmasq/blob/f186bdcbc76cd894133a043b115b4510c0ee1fcf/src/network.c#L725-L726

The flag was introduced here: https://bugs.launchpad.net/neutron/+bug/1828473

** Affects: neutron
   Importance: Undecided
   Status: In Progress

https://bugs.launchpad.net/bugs/1998621
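A minimal sketch of the proposed behaviour change, assuming we already know the pid of the dnsmasq parent; this is illustrative only, not the actual DHCP agent code:

    import os
    import signal

    def stop_dnsmasq(pid):
        # SIGTERM lets dnsmasq reap its forked TCP children, so tcp/53 is
        # released; SIGKILL leaves children with dangling connections that
        # keep the port bound when the new dnsmasq starts.
        os.kill(pid, signal.SIGTERM)
        # ... wait for the parent to exit before spawning the new dnsmasq ...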
[Yahoo-eng-team] [Bug 1959699] [NEW] Disallow users to allocate gateway ip of external subnets as floating ip
Public bug reported:

Currently a user can allocate the gateway IP of an external network as a floating IP. This is possible because the only validation on a user-specified IP address is done by the IPAM module, which checks that the IP is in the range of the subnet(s) and that it is not already allocated. Because OpenStack has no port for the external gateway, the gateway IP of an external network is considered free.

This is a problem because a user can now allocate an IP address that might otherwise be in use (outside of OpenStack / inside a provider network). Depending on the network plugins used, the user could either end up with an unusable floating IP or (in the worst case) create something that ARPs for this IP and redirects traffic away from the original gateway, causing an outage.

Therefore I propose we forbid users from allocating floating IPs that are also the gateway IP of a floating IP network. Note that OpenStack would not allocate the gateway IP itself, as it only allocates from the subnet's allocation pool by default.

To fix this I'd propose we either explicitly deny using the gateway IP or require the user-specified IP for a subnet to be from the allocation pool (a sketch of the first option follows below). I'd be happy to provide a patch once we have decided how to approach this.

This can be recreated with a simple CLI command:

    openstack floating ip create $fip_network --floating-ip-address $gateway_ip_of_subnet

A similar bug was filed and fixed for putting routers into provider networks: https://bugs.launchpad.net/neutron/+bug/1757482

Breaking testcase (neutron/tests/unit/extensions/test_l3.py):

    class L3NatTestCaseBase(L3NatTestCaseMixin):

        def test_create_floatingip_on_external_subnet_gateway_fails(self):
            with self.subnet(cidr='11.0.0.0/24') as public_sub:
                self._set_net_external(public_sub['subnet']['network_id'])
                self._make_floatingip(
                    self.fmt,
                    public_sub['subnet']['network_id'],
                    floating_ip=public_sub['subnet']['gateway_ip'],
                    http_status=exc.HTTPBadRequest.code)

Preliminary discussion in IRC: https://meetings.opendev.org/irclogs/%23openstack-neutron/%23openstack-neutron.2022-02-01.log.html#t2022-02-01T15:02:10

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1959699
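A rough sketch of the "explicitly deny the gateway IP" option; the function name and call site are illustrative, only neutron_lib's BadRequest exception is assumed:

    from neutron_lib import exceptions as lib_exc

    def validate_floating_ip_address(subnet, floating_ip_address):
        # Hypothetical helper: reject a user-requested floating IP that is
        # the gateway IP of the external subnet it would be allocated from.
        if floating_ip_address == subnet.get('gateway_ip'):
            raise lib_exc.BadRequest(
                resource='floatingip',
                msg='The requested floating IP %s is the gateway IP of subnet '
                    '%s and cannot be allocated.'
                    % (floating_ip_address, subnet['id']))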
[Yahoo-eng-team] [Bug 1926428] [NEW] allocate_dynamic_segment() returns different segment dicts if segment exists
Public bug reported:

neutron.plugins.ml2.managers.TypeManager.allocate_dynamic_segment() returns a different segment dict describing the segment depending on whether the segment already exists or not.

If the segment already exists, neutron returns segments_db.get_dynamic_segment(), which generates the dict via neutron.db.segments_db._make_segment_dict(). If it does not exist, it is created and a dict generated by a TypeDriver is returned. In the testcase below this is done via VlanTypeDriver.allocate_tenant_segment(), which does not return a network_id but an MTU instead.

    class TestMultiSegmentNetworks(Ml2PluginV2TestCase):
        ...
        def test_allocate_dynamic_segment_twice(self):
            data = {'network': {'name': 'net1',
                                'tenant_id': 'tenant_one'}}
            network_req = self.new_create_request('networks', data)
            network = self.deserialize(self.fmt,
                                       network_req.get_response(self.api))
            segment = {driver_api.NETWORK_TYPE: 'vlan',
                       driver_api.PHYSICAL_NETWORK: 'physnet1'}
            network_id = network['network']['id']
            seg1 = self.driver.type_manager.allocate_dynamic_segment(
                self.context, network_id, segment)
            seg2 = self.driver.type_manager.allocate_dynamic_segment(
                self.context, network_id, segment)
            self.assertEqual(seg1, seg2)

Which results in this output:

    testtools.matchers._impl.MismatchError: !=:
    reference = {'id': 'a5c92a94-e182-47fb-ae8f-fe9d75ef10ce',
                 'mtu': 1500,
                 'network_type': 'vlan',
                 'physical_network': 'physnet1',
                 'segmentation_id': 83}
    actual    = {'id': 'a5c92a94-e182-47fb-ae8f-fe9d75ef10ce',
                 'network_id': '9ac10cc3-d3d0-46f7-9f6b-31767fadacec',
                 'network_type': 'vlan',
                 'physical_network': 'physnet1',
                 'segmentation_id': 83}

This was tested on current neutron master (98c934ef6a8041bbd7b99ac49f53986798e8ef81).

The easiest way to fix this would probably be to just fetch the segment again from the db if it was created. Then we would also not be at the mercy of whatever the TypeDriver returns on create, though this would result in an extra query on segment create (a sketch follows below). The other option I see would be to make _make_segment_dict() public and let each TypeDriver use it, though this would not work for externally written TypeDrivers.

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1926428
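An abbreviated sketch of the first option inside TypeManager.allocate_dynamic_segment(): re-read the segment from the db after creating it, so both code paths return the _make_segment_dict() shape. This is not a drop-in patch; the exact signatures of the segments_db helpers and the type driver may differ between releases:

    def allocate_dynamic_segment(self, context, network_id, segment):
        dynamic_segment = segments_db.get_dynamic_segment(
            context, network_id, segment.get(api.PHYSICAL_NETWORK),
            segment.get(api.SEGMENTATION_ID))
        if dynamic_segment:
            return dynamic_segment

        driver = self.drivers.get(segment.get(api.NETWORK_TYPE))
        dynamic_segment = driver.obj.allocate_tenant_segment(context)
        segments_db.add_network_segment(context, network_id, dynamic_segment,
                                        is_dynamic=True)
        # Extra query, but both branches now return the dict built by
        # segments_db._make_segment_dict() instead of whatever the
        # TypeDriver happened to return on create.
        return segments_db.get_dynamic_segment(
            context, network_id, dynamic_segment.get(api.PHYSICAL_NETWORK),
            dynamic_segment.get(api.SEGMENTATION_ID))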
[Yahoo-eng-team] [Bug 1888666] [NEW] Disabling/enabling networks in neutron causes traffic loop with linuxbridge agent
Public bug reported:

We observed traffic looping between two linuxbridge agents after a user disabled and then re-enabled a network.

Disabling a network causes all vethX interfaces to be removed from the bridge, but the physical interface remains in the bridge. Disabling the network will also cause a segment release for the segment the network agent is on in a hierarchical port binding setup. When re-enabling the network, a new VLAN id might be generated for the segment the network agent is on, and thus the physical interface will be added again with the new VLAN id. With two network agents bridging two different VLANs we get a loop.

A quick mitigation is to identify the bridge with two physical interfaces in it, identify the stale interface and remove it (a sketch follows below).

Tested with Neutron Queens. The network was disabled/enabled via:

    openstack network set --disable/--enable $uuid

** Affects: neutron
   Importance: Undecided
   Status: New

** Tags: linuxbridge

https://bugs.launchpad.net/bugs/1888666
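A rough operator-side sketch for spotting affected bridges, assuming the usual linuxbridge naming conventions (brq<net-id> bridges, tap<port-id> ports, <iface>.<vlan> physical subinterfaces); the heuristics are illustrative only:

    import os

    SYSFS = '/sys/class/net'

    def bridges_with_multiple_physical_ports():
        """Return linuxbridge bridges (brq*) with more than one non-tap member."""
        suspects = {}
        for dev in os.listdir(SYSFS):
            brif = os.path.join(SYSFS, dev, 'brif')
            if not dev.startswith('brq') or not os.path.isdir(brif):
                continue
            physical = [m for m in os.listdir(brif) if not m.startswith('tap')]
            if len(physical) > 1:
                suspects[dev] = physical
        return suspects

    # The stale member (the one carrying the old VLAN id) can then be removed,
    # e.g.:  brctl delif brq<network-id> <physical-interface>.<old-vlan-id>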
[Yahoo-eng-team] [Bug 1827363] [NEW] Additional port list / get_ports() failures when filtering and limiting at the same time
Public bug reported:

When doing an openstack port list that filters for a fixed-ip/subnet and at the same time limits the number of results, neutron returns a 500 internal server error. This was already addressed in https://bugs.launchpad.net/neutron/+bug/1826186, but the bug is also present in other places.

While running tempest against a Neutron Queens installation I came across another _get_ports_query() in neutron/plugins/ml2/plugin.py where filter() is again called on the result of an already limited query. See https://github.com/openstack/neutron/blob/6f4962dcf89aebf2552ee8ec0993c6389a953024/neutron/plugins/ml2/plugin.py#L2206

    InvalidRequestError: Query.filter() being called on a Query which already has LIMIT or OFFSET applied. To modify the row-limited results of a Query, call from_self() first. Otherwise, call filter() before limit() or offset() are applied.
      File "pecan/core.py", line 683, in __call__
        self.invoke_controller(controller, args, kwargs, state)
      [...]
      File "neutron/db/db_base_plugin_v2.py", line 1417, in get_ports
        page_reverse=page_reverse)
      File "neutron/plugins/ml2/plugin.py", line 1941, in _get_ports_query
        query = query.filter(substr_filter)
      File "", line 2, in filter
      File "sqlalchemy/orm/base.py", line 200, in generate
        assertion(self, fn.__name__)
      File "sqlalchemy/orm/query.py", line 435, in _no_limit_offset
        % (meth, meth)

I applied a patch similar to the one Gabriele Cerami proposed in https://review.opendev.org/#/c/656066/ on our production setup and this seems to have fixed the bug there as well. (A minimal illustration of the underlying SQLAlchemy constraint follows below.)

When doing a grep for _get_ports_query() in the neutron codebase I also find a function with this name being called in neutron/db/dvr_mac_db.py in get_ports_on_host_by_subnet(), though I do not have a stacktrace or test for that. See https://github.com/openstack/neutron/blob/6f4962dcf89aebf2552ee8ec0993c6389a953024/neutron/db/dvr_mac_db.py#L162

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1827363
[Yahoo-eng-team] [Bug 1826186] [NEW] port list / get_ports() fails when filtering and limiting at the same time
Public bug reported:

When doing an openstack port list that filters for a fixed-ip/subnet and at the same time limits the number of results, neutron returns a 500 internal server error. Example command:

    openstack port list --fixed-ip ip-address=192.0.2.23

Limits should be applied automatically with a recent version of the openstacksdk, as pagination is turned on by default. Additionally, I attached a testcase that triggers this bug.

This bug was found on neutron-queens, but the testcase also breaks current master (tested on commit id 1214e59cc2d818f6fde9c3e24c7f26c50d2a8a74).

It looks like _get_ports_query() gets a query with pre-applied limits by calling model_query.get_collection_query() and then tries to filter the results, which triggers a sqlalchemy assertion that disallows filtering after a limit has been applied. The corresponding neutron exception would be the following:

    InvalidRequestError: Query.filter() being called on a Query which already has LIMIT or OFFSET applied. To modify the row-limited results of a Query, call from_self() first. Otherwise, call filter() before limit() or offset() are applied.
      File "pecan/core.py", line 683, in __call__
        self.invoke_controller(controller, args, kwargs, state)
      File "pecan/core.py", line 574, in invoke_controller
        result = controller(*args, **kwargs)
      File "neutron/db/api.py", line 91, in wrapped
        setattr(e, '_RETRY_EXCEEDED', True)
      File "oslo_utils/excutils.py", line 220, in __exit__
        self.force_reraise()
      File "oslo_utils/excutils.py", line 196, in force_reraise
        six.reraise(self.type_, self.value, self.tb)
      File "neutron/db/api.py", line 87, in wrapped
        return f(*args, **kwargs)
      File "oslo_db/api.py", line 147, in wrapper
        ectxt.value = e.inner_exc
      File "oslo_utils/excutils.py", line 220, in __exit__
        self.force_reraise()
      File "oslo_utils/excutils.py", line 196, in force_reraise
        six.reraise(self.type_, self.value, self.tb)
      File "oslo_db/api.py", line 135, in wrapper
        return f(*args, **kwargs)
      File "neutron/db/api.py", line 126, in wrapped
        LOG.debug("Retry wrapper got retriable exception: %s", e)
      File "oslo_utils/excutils.py", line 220, in __exit__
        self.force_reraise()
      File "oslo_utils/excutils.py", line 196, in force_reraise
        six.reraise(self.type_, self.value, self.tb)
      File "neutron/db/api.py", line 122, in wrapped
        return f(*dup_args, **dup_kwargs)
      File "neutron/pecan_wsgi/controllers/utils.py", line 76, in wrapped
        return f(*args, **kwargs)
      File "neutron/pecan_wsgi/controllers/resource.py", line 131, in index
        return self.get(*args, **kwargs)
      File "neutron/pecan_wsgi/controllers/resource.py", line 141, in get
        **query_params)}
      File "neutron/db/api.py", line 161, in wrapped
        return method(*args, **kwargs)
      File "neutron/db/api.py", line 91, in wrapped
        setattr(e, '_RETRY_EXCEEDED', True)
      File "oslo_utils/excutils.py", line 220, in __exit__
        self.force_reraise()
      File "oslo_utils/excutils.py", line 196, in force_reraise
        six.reraise(self.type_, self.value, self.tb)
      File "neutron/db/api.py", line 87, in wrapped
        return f(*args, **kwargs)
      File "oslo_db/api.py", line 147, in wrapper
        ectxt.value = e.inner_exc
      File "oslo_utils/excutils.py", line 220, in __exit__
        self.force_reraise()
      File "oslo_utils/excutils.py", line 196, in force_reraise
        six.reraise(self.type_, self.value, self.tb)
      File "oslo_db/api.py", line 135, in wrapper
        return f(*args, **kwargs)
      File "neutron/db/api.py", line 126, in wrapped
        LOG.debug("Retry wrapper got retriable exception: %s", e)
      File "oslo_utils/excutils.py", line 220, in __exit__
        self.force_reraise()
      File "oslo_utils/excutils.py", line 196, in force_reraise
        six.reraise(self.type_, self.value, self.tb)
      File "neutron/db/api.py", line 122, in wrapped
        return f(*dup_args, **dup_kwargs)
      File "neutron/db/db_base_plugin_v2.py", line 1417, in get_ports
        page_reverse=page_reverse)
      File "neutron/plugins/ml2/plugin.py", line 1941, in _get_ports_query
        query = query.filter(substr_filter)
      File "", line 2, in filter
      File "sqlalchemy/orm/base.py", line 200, in generate
        assertion(self, fn.__name__)
      File "sqlalchemy/orm/query.py", line 435, in _no_limit_offset
        % (meth, meth)

** Affects: neutron
   Importance: Undecided
   Status: New

** Patch added: "neutron testcase for filtering and limiting on a get ports at the same time"
   https://bugs.launchpad.net/bugs/1826186/+attachment/5258554/+files/test_list_ports_filtered_by_fixed_ip_with_limit.patch

https://bugs.launchpad.net/bugs/1826186