[Yahoo-eng-team] [Bug 2063459] [NEW] DHCP agent might use default gateway of unrouted subnets

2024-04-25 Thread Sebastian Lohff
Public bug reported:

When creating a network with two subnets, the DHCP agent will choose any
subnet that has a gateway_ip (which is allocated and set by default). In
cases where only one of the subnets is attached to a router (which the
DHCP agent needs to reach resources outside of the internal network,
like an upstream DNS server), the DHCP agent will still arbitrarily
choose one of the available subnets. If it chooses the subnet that is not
attached to a router, service will be disrupted for that network
namespace. The sort order of the subnets is arbitrary but stable;
however, if a subnet is added to the network, the DHCP agent might also
switch over to that subnet.

Problem was observed with Neutron Yoga, but can be reproduced with
current upstream code.

To fix this I would propose sorting the subnet gateway selection so that
subnets whose gateway IP matches an existing port are preferred, as they
are more likely to be attached to a router. A rough sketch of that
ordering is shown below.
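
The following is a minimal, hypothetical sketch of that ordering, not the
actual Neutron patch; `subnets` is assumed to be a list of subnet dicts
and `gateway_port_ips` a precomputed set of (subnet_id, ip) pairs taken
from the network's ports:

def order_subnets_for_gateway(subnets, gateway_port_ips):
    """Prefer subnets whose gateway_ip is owned by an existing port."""
    def has_gateway_port(subnet):
        return (subnet['id'], subnet['gateway_ip']) in gateway_port_ips

    # sorted() is stable, so the original (deterministic) order is kept
    # within the "has a gateway port" and "has no gateway port" groups.
    return sorted(
        (s for s in subnets if s.get('gateway_ip')),
        key=lambda s: not has_gateway_port(s))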

** Affects: neutron
 Importance: Undecided
 Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2063459

Title:
  DHCP agent might use default gateway of unrouted subnets

Status in neutron:
  In Progress

Bug description:
  When creating a network with two subnets, the DHCP agent will choose
  any subnet that has a gateway_ip (which is allocated and set by
  default). In cases where only one of the subnets is attached to a
  router (which the DHCP agent needs to reach resources outside of the
  internal network, like an upstream DNS server), the DHCP agent will
  still arbitrarily choose one of the available subnets. If it chooses
  the subnet that is not attached to a router, service will be disrupted
  for that network namespace. The sort order of the subnets is arbitrary
  but stable; however, if a subnet is added to the network, the DHCP
  agent might also switch over to that subnet.

  Problem was observed with Neutron Yoga, but can be reproduced with
  current upstream code.

  To fix this I would propose sorting the subnet gateway selection so
  that subnets whose gateway IP matches an existing port are preferred,
  as they are more likely to be attached to a router.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2063459/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2062009] [NEW] Neutron-server + uwsgi deadlocks when running rpc workers

2024-04-17 Thread Sebastian Lohff
Public bug reported:

In certain situations we observe that neutron-server + uwsgi shares
locks between its native threads and its eventlet threads. As eventlet
relies on being informed when a lock is released, this may lead to a
deadlock, with the eventlet thread waiting indefinitely for an already
released lock. In our infrastructure this leads to API requests being
performed on the Neutron side while the caller never gets a response. For
actions like port creation from e.g. Nova or Manila this leads to
orphaned ports, as the caller will simply retry creating the port.

To better debug this we have reintroduced guru meditation reports into
neutron-server[0] and configured uwsgi to send a SIGWINCH on a
harakiri[1] to trigger the guru meditation whenever a uwsgi worker
deadlocks.

The two most interesting candidates seem to be a shared lock inside
oslo_messaging and Python's logging lock, which also seems to be acquired
from oslo_messaging. Both cases identified by the traceback seem to
point to oslo_messaging and its RPC server (see attached guru
meditation).

As all RPC servers should run inside neutron-rpc-server anyway (due to
the uwsgi/neutron-rpc-server split), we should move these instances over
there. This will also fix bug #1864418. One easy way to find instances of
this is to check via the backdoor (or a manual manhole installation,
if the backdoor is not available) and search for instances of
oslo_messaging.server.MessageHandlingServer via fo(). In our setup (due
to the service_plugins enabled) we see RPC servers running from trunk
and logapi:

>>> [ep for mhs in fo(oslo_messaging.server.MessageHandlingServer)
...  for ep in mhs.dispatcher.endpoints]
[, ]
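
If neither the backdoor nor fo() is available, roughly the same check can
be done from a manually installed manhole shell; this is a hedged sketch,
not an official tool:

import gc
from oslo_messaging import server as om_server

# Walk all tracked objects and list the endpoints of every running
# MessageHandlingServer, mirroring what fo() does in the oslo backdoor.
servers = [o for o in gc.get_objects()
           if isinstance(o, om_server.MessageHandlingServer)]
for mhs in servers:
    print(mhs, mhs.dispatcher.endpoints)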

The RPC servers should instead be started via start_rpc_listeners(); a
rough sketch of this is shown below.
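
As a hedged illustration (placeholder names, and the rpc helper module may
differ between releases), a service plugin would defer its RPC server to
start_rpc_listeners() instead of starting it at plugin initialization:

from neutron_lib import rpc as n_rpc

MY_TOPIC = 'my-service-plugin-topic'  # placeholder topic name


class MyEndpoint(object):
    """Placeholder RPC endpoint."""

    def ping(self, context):
        return 'pong'


class MyServicePlugin(object):
    """Placeholder service plugin, not an actual Neutron plugin."""

    def start_rpc_listeners(self):
        # Only the RPC workers (neutron-rpc-server) call this, so the
        # MessageHandlingServer never runs inside the uwsgi API workers.
        self.conn = n_rpc.Connection()
        self.conn.create_consumer(MY_TOPIC, [MyEndpoint()], fanout=False)
        return self.conn.consume_in_threads()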

Nova has had similar problems with eventlet and logging in the past, see
here[2][3]. Tests were done with Neutron Yoga (or our own branch
stable/yoga-m3), but the issue is present in current master.

[0] 
https://github.com/sapcc/neutron/commit/a7c44263b70089d8106bed6d8d5d0e3ddf44d5ad
[1] 
https://github.com/sapcc/helm-charts/blob/7a93e91c3af16ad2eb91e0a1d176d56a26faa393/openstack/neutron/templates/etc/_uwsgi.ini.tpl#L46-L50
[2] 
https://github.com/sapcc/nova/blob/f61bd589796f0cd7ae37683de3d676e5edd9cf80/nova/virt/libvirt/host.py#L197-L201
[3] 
https://github.com/sapcc/nova/blob/f61bd589796f0cd7ae37683de3d676e5edd9cf80/nova/virt/libvirt/migration.py#L406-L407

** Affects: neutron
 Importance: Undecided
 Status: New

** Attachment added: "guru-meditation-report.txt"
   
https://bugs.launchpad.net/bugs/2062009/+attachment/5766806/+files/guru-meditation-report.txt

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2062009

Title:
  Neutron-server + uwsgi deadlocks when running rpc workers

Status in neutron:
  New

Bug description:
  In certain situations we observe that neutron-server + uwsgi shares
  locks between its native threads and its eventlet threads. As eventlet
  relies on being informed when a lock is released, this may lead to a
  deadlock, with the eventlet thread waiting indefinitely for an already
  released lock. In our infrastructure this leads to API requests being
  performed on the Neutron side while the caller never gets a response.
  For actions like port creation from e.g. Nova or Manila this leads to
  orphaned ports, as the caller will simply retry creating the port.

  To better debug this we have reintroduced guru meditation reports into
  neutron-server[0] and configured uwsgi to send a SIGWINCH on a
  harakiri[1] to trigger the guru meditation whenever a uwsgi worker
  deadlocks.

  The two most interesting candidates seem to be a shared lock inside
  oslo_messaging and Python's logging lock, which also seems to be
  acquired from oslo_messaging. Both cases identified by the traceback
  seem to point to oslo_messaging and its RPC server (see attached guru
  meditation).

  As all RPC servers should run inside neutron-rpc-server anyway (due to
  the uwsgi/neutron-rpc-server split), we should move these instances
  over there. This will also fix bug #1864418. One easy way to find
  instances of this is to check via the backdoor (or a manual manhole
  installation, if the backdoor is not available) and search for
  instances of oslo_messaging.server.MessageHandlingServer via fo(). In
  our setup (due to the service_plugins enabled) we see RPC servers
  running from trunk and logapi:

  >>> [ep for mhs in fo(oslo_messaging.server.MessageHandlingServer)
  ...  for ep in mhs.dispatcher.endpoints]
  [, ]

  The RPC servers should instead be started via start_rpc_listeners().

  Nova has had similar problems with eventlet and logging in the past,
  see here[2][3]. Tests were done with Neutron Yoga (or our own branch
  stable/yoga-m3), but the issue is present in current master.

  [0] 
https://github.com/sapcc/neutron/commit/a7c44263b70089d8106bed6d8d5d0e3ddf44d5ad
  [1] 

[Yahoo-eng-team] [Bug 2057698] [NEW] Concurrent routerroute update fails on deletion with AttributeError

2024-03-12 Thread Sebastian Lohff
Public bug reported:

When updating a router and providing a set of extra routes /
routerroutes that results in some routes being deleted, it might happen
that two workers fetch the routes at the same time and then both try to
delete the same route. As the route is fetched before deletion, in one of
the two workers get_object() will return None, on which delete() is then
called, resulting in an AttributeError:

AttributeError: 'NoneType' object has no attribute 'delete'

The request is then not fulfilled properly and a 500 is returned to the user.

This was observed on neutron Yoga, though the same code (plus a breaking
test) seems to confirm this on current master.
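
A minimal sketch of a tolerant delete (an illustration of one possible
fix, not the actual patch; the object module and field names are
assumptions): treat a route that another worker already removed as
success instead of crashing.

from neutron.objects import router as l3_obj  # assumed OVO module

def _delete_extra_route(context, router_id, destination, nexthop):
    route = l3_obj.RouterRoute.get_object(
        context, router_id=router_id,
        destination=destination, nexthop=nexthop)
    if route is None:
        # A concurrent worker deleted it between fetch and delete;
        # nothing left to do, so never call .delete() on None.
        return
    route.delete()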

** Affects: neutron
 Importance: Undecided
 Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2057698

Title:
  Concurrent routerroute update fails on deletion with AttributeError

Status in neutron:
  In Progress

Bug description:
  When updating a router and providing a set of extra routes /
  routerroutes that results in some routes being deleted, it might happen
  that two workers fetch the routes at the same time and then both try
  to delete the same route. As the route is fetched before deletion, in
  one of the two workers get_object() will return None, on which
  delete() is then called, resulting in an AttributeError:

  AttributeError: 'NoneType' object has no attribute 'delete'

  The request is then not fulfilled properly and a 500 is returned to
  the user.

  This was observed on neutron Yoga, though the same code (plus a
  breaking test) seems to confirm this on current master.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2057698/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1998621] [NEW] dnsmasq on DHCP Agent does not listen on tcp/53 after dnsmasq restart

2022-12-02 Thread Sebastian Lohff
Public bug reported:

When talking to dnsmasq using DNS over TCP, dnsmasq will fork child
processes to handle the TCP connections. Forked processes stay around
until all of their connections have been closed, meaning that dangling
connections keep the processes alive and with them the tcp/53 port in
listening state. On a dnsmasq restart (e.g. on network update, subnet
create, ...) the parent process is killed with SIGKILL and a new process
is started. This new process cannot listen on tcp/53, as the port is
still in use by the old child with the dangling connection.

This could be prevented by sending SIGTERM instead of SIGKILL, as
dnsmasq then properly cleans up its forks and all tcp/53 connections are
closed. A sketch of such a shutdown sequence is shown below.
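
Hedged sketch of that shutdown sequence (an illustration only, not the
DHCP agent's actual process management code): send SIGTERM first so the
forked tcp/53 children can clean up, and fall back to SIGKILL only if
dnsmasq does not exit within a grace period.

import os
import signal
import time

def stop_dnsmasq(pid, grace=10):
    os.kill(pid, signal.SIGTERM)      # lets dnsmasq close its TCP children
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)           # probe: still alive?
        except ProcessLookupError:
            return                    # clean exit, tcp/53 is released
        time.sleep(0.5)
    os.kill(pid, signal.SIGKILL)      # last resort if it did not terminate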

This only happens when starting dnsmasq with --bind-dynamic, as with this
flag dnsmasq will ignore any errors resulting from it not being able to
bind on tcp/53, see here:
https://github.com/imp/dnsmasq/blob/f186bdcbc76cd894133a043b115b4510c0ee1fcf/src/network.c#L725-L726
The flag has been introduced here:
https://bugs.launchpad.net/neutron/+bug/1828473

** Affects: neutron
 Importance: Undecided
 Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1998621

Title:
  dnsmasq on DHCP Agent does not listen on tcp/53 after dnsmasq restart

Status in neutron:
  In Progress

Bug description:
  When talking to dnsmasq using DNS over TCP, dnsmasq will fork child
  processes to handle the TCP connections. Forked processes stay around
  until all of their connections have been closed, meaning that dangling
  connections keep the processes alive and with them the tcp/53 port in
  listening state. On a dnsmasq restart (e.g. on network update, subnet
  create, ...) the parent process is killed with SIGKILL and a new
  process is started. This new process cannot listen on tcp/53, as the
  port is still in use by the old child with the dangling connection.

  This could be prevented by sending SIGTERM instead of SIGKILL, as
  dnsmasq then properly cleans up its forks and all tcp/53 connections
  are closed.

  This only happens when starting dnsmasq with --bind-dynamic, as with
  this flag dnsmasq will ignore any errors resulting from it not being
  able to bind on tcp/53, see here:
  
https://github.com/imp/dnsmasq/blob/f186bdcbc76cd894133a043b115b4510c0ee1fcf/src/network.c#L725-L726
  The flag has been introduced here:
  https://bugs.launchpad.net/neutron/+bug/1828473

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1998621/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1959699] [NEW] Disallow users from allocating the gateway IP of external subnets as a floating IP

2022-02-01 Thread Sebastian Lohff
Public bug reported:

Currently a user can allocate the gateway IP of an external network as a
floating IP. This is possible because the only validation of a user-
specified IP address is done by the IPAM module, which checks that the IP
is in the range of the subnet(s) and that it is not already allocated.
Because OpenStack has no port for the external gateway, the gateway IP of
an external network's subnet is considered free.

This is a problem because a user can now allocate an IP address that
might otherwise be in use (outside of OpenStack / inside a provider
network). Depending on the network plugins used, the user could either
end up with an unusable floating IP or (in the worst case) create
something that ARPs for this IP and redirects traffic away from the
original gateway, causing an outage. Therefore I propose we forbid users
from allocating floating IPs that are also the gateway IP of a floating
IP network. Note that OpenStack would not allocate the gateway IP itself,
as it only allocates from the subnet's allocation pool by default.

To fix this I'd propose we either explicitly deny using the gateway IP
or require the user-specified IP for a subnet to be from the allocation
pool. I'd be happy to provide a patch once we have decided how to
approach this; a rough sketch of the first option is shown below.
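
Hypothetical validation sketch for the "explicitly deny the gateway IP"
option (not an actual Neutron patch; the exception usage and call site are
assumptions):

from neutron_lib import exceptions as n_exc

def validate_floating_ip_not_gateway(subnet, floating_ip_address):
    # Reject a user-requested floating IP that equals the subnet's
    # gateway_ip; OpenStack itself never hands this address out because
    # it only allocates from the allocation pool.
    if floating_ip_address and floating_ip_address == subnet.get('gateway_ip'):
        raise n_exc.BadRequest(
            resource='floatingip',
            msg='%s is the gateway IP of subnet %s and cannot be used as '
                'a floating IP' % (floating_ip_address, subnet['id']))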

This can be recreated with a simple cli command: openstack floating ip
create $fip_network --floating-ip-address $gateway_ip_of_subnet

A similar bug was filed and fixed for putting routers into provider
networks: https://bugs.launchpad.net/neutron/+bug/1757482

Breaking testcase (neutron/tests/unit/extensions/test_l3.py):
class L3NatTestCaseBase(L3NatTestCaseMixin):
    def test_create_floatingip_on_external_subnet_gateway_fails(self):
        with self.subnet(cidr='11.0.0.0/24') as public_sub:
            self._set_net_external(public_sub['subnet']['network_id'])
            self._make_floatingip(
                self.fmt,
                public_sub['subnet']['network_id'],
                floating_ip=public_sub['subnet']['gateway_ip'],
                http_status=exc.HTTPBadRequest.code)

Preliminary discussion in IRC:
https://meetings.opendev.org/irclogs/%23openstack-neutron/%23openstack-neutron.2022-02-01.log.html#t2022-02-01T15:02:10

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1959699

Title:
  Disallow users from allocating the gateway IP of external subnets as a
  floating IP

Status in neutron:
  New

Bug description:
  Currently a user can allocate the gateway IP of an external network as
  a floating IP. This is possible because the only validation of a user-
  specified IP address is done by the IPAM module, which checks that the
  IP is in the range of the subnet(s) and that it is not already
  allocated. Because OpenStack has no port for the external gateway, the
  gateway IP of an external network's subnet is considered free.

  This is a problem because a user can now allocate an IP address that
  might otherwise be in use (outside of OpenStack / inside a provider
  network). Depending on the network plugins used, the user could either
  end up with an unusable floating IP or (in the worst case) create
  something that ARPs for this IP and redirects traffic away from the
  original gateway, causing an outage. Therefore I propose we forbid
  users from allocating floating IPs that are also the gateway IP of a
  floating IP network. Note that OpenStack would not allocate the
  gateway IP itself, as it only allocates from the subnet's allocation
  pool by default.

  To fix this I'd propose we either explicitly deny using the gateway IP
  or require the user-specified IP for a subnet to be from the
  allocation pool. I'd be happy to provide a patch once we have decided
  how to approach this.

  This can be recreated with a simple cli command: openstack floating ip
  create $fip_network --floating-ip-address $gateway_ip_of_subnet

  A similar bug was filed and fixed for putting routers into provider
  networks: https://bugs.launchpad.net/neutron/+bug/1757482

  Breaking testcase (neutron/tests/unit/extensions/test_l3.py):
  class L3NatTestCaseBase(L3NatTestCaseMixin):
      def test_create_floatingip_on_external_subnet_gateway_fails(self):
          with self.subnet(cidr='11.0.0.0/24') as public_sub:

[Yahoo-eng-team] [Bug 1926428] [NEW] allocate_dynamic_segment() returns different segment dicts if segment exists

2021-04-28 Thread Sebastian Lohff
Public bug reported:

neutron.plugins.ml2.managers.TypeManager.allocate_dynamic_segment()
returns a different segment dict describing the segment, depending on
whether the segment already exists or not. If the segment already exists,
neutron returns segments_db.get_dynamic_segment(), which generates the
dict via neutron.db.segments_db._make_segment_dict(). If it does not
exist, it is created and a dict generated by a TypeDriver is returned. In
the testcase below this is done via
VlanTypeDriver.allocate_tenant_segment(), which does not return a
network_id but an MTU instead.


class TestMultiSegmentNetworks(Ml2PluginV2TestCase):
    ...
    def test_allocate_dynamic_segment_twice(self):
        data = {'network': {'name': 'net1',
                            'tenant_id': 'tenant_one'}}
        network_req = self.new_create_request('networks', data)
        network = self.deserialize(self.fmt,
                                   network_req.get_response(self.api))
        segment = {driver_api.NETWORK_TYPE: 'vlan',
                   driver_api.PHYSICAL_NETWORK: 'physnet1'}
        network_id = network['network']['id']

        seg1 = self.driver.type_manager.allocate_dynamic_segment(
            self.context, network_id, segment)
        seg2 = self.driver.type_manager.allocate_dynamic_segment(
            self.context, network_id, segment)
        self.assertEqual(seg1, seg2)


Which results in this output:

testtools.matchers._impl.MismatchError: !=:
reference = {'id': 'a5c92a94-e182-47fb-ae8f-fe9d75ef10ce',
 'mtu': 1500,
 'network_type': 'vlan',
 'physical_network': 'physnet1',
 'segmentation_id': 83}
actual= {'id': 'a5c92a94-e182-47fb-ae8f-fe9d75ef10ce',
 'network_id': '9ac10cc3-d3d0-46f7-9f6b-31767fadacec',
 'network_type': 'vlan',
 'physical_network': 'physnet1',
 'segmentation_id': 83}

This was tested on current neutron master
(98c934ef6a8041bbd7b99ac49f53986798e8ef81).

The easiest way to fix this would probably be to just fetch the segment
again from the DB after it was created. Then we would also not be at the
mercy of whatever the TypeDriver returns on create, though this would
result in an extra query on segment create. The other option I see would
be to make _make_segment_dict() public and let each TypeDriver use it,
though this would not work for externally written TypeDrivers. A sketch
of the first option is shown below.
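
Minimal sketch of the "re-fetch after create" idea (not the actual Neutron
code; the wrapper below is illustrative and only reuses
segments_db.get_dynamic_segment() and the existing TypeManager method):

from neutron.db import segments_db

def allocate_dynamic_segment_consistent(type_manager, context, network_id,
                                        segment):
    existing = segments_db.get_dynamic_segment(
        context, network_id,
        physical_network=segment.get('physical_network'),
        segmentation_id=segment.get('segmentation_id'))
    if existing:
        return existing

    # Delegate the actual allocation to the existing TypeManager logic ...
    created = type_manager.allocate_dynamic_segment(context, network_id,
                                                    segment)
    # ... but re-read from the DB so callers always get the
    # _make_segment_dict() shape, not whatever the TypeDriver built.
    return segments_db.get_dynamic_segment(
        context, network_id,
        physical_network=created.get('physical_network'),
        segmentation_id=created.get('segmentation_id'))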

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1926428

Title:
  allocate_dynamic_segment() returns different segment dicts if segment
  exists

Status in neutron:
  New

Bug description:
  neutron.plugins.ml2.managers.TypeManager.allocate_dynamic_segment()
  returns a different segment dict describing the segment, depending on
  whether the segment already exists or not. If the segment already
  exists, neutron returns segments_db.get_dynamic_segment(), which
  generates the dict via neutron.db.segments_db._make_segment_dict(). If
  it does not exist, it is created and a dict generated by a TypeDriver
  is returned. In the testcase below this is done via
  VlanTypeDriver.allocate_tenant_segment(), which does not return a
  network_id but an MTU instead.

  
  class TestMultiSegmentNetworks(Ml2PluginV2TestCase):
      ...
      def test_allocate_dynamic_segment_twice(self):
          data = {'network': {'name': 'net1',
                              'tenant_id': 'tenant_one'}}
          network_req = self.new_create_request('networks', data)
          network = self.deserialize(self.fmt,
                                     network_req.get_response(self.api))
          segment = {driver_api.NETWORK_TYPE: 'vlan',
                     driver_api.PHYSICAL_NETWORK: 'physnet1'}
          network_id = network['network']['id']

          seg1 = self.driver.type_manager.allocate_dynamic_segment(
              self.context, network_id, segment)
          seg2 = self.driver.type_manager.allocate_dynamic_segment(
              self.context, network_id, segment)
          self.assertEqual(seg1, seg2)

  
  Which results in this output:

testtools.matchers._impl.MismatchError: !=:
  reference = {'id': 'a5c92a94-e182-47fb-ae8f-fe9d75ef10ce',
   'mtu': 1500,
   'network_type': 'vlan',
   'physical_network': 'physnet1',
   'segmentation_id': 83}
  actual= {'id': 'a5c92a94-e182-47fb-ae8f-fe9d75ef10ce',
   'network_id': '9ac10cc3-d3d0-46f7-9f6b-31767fadacec',
   'network_type': 'vlan',
   'physical_network': 'physnet1',
   'segmentation_id': 83}

  This was tested on current neutron master
  (98c934ef6a8041bbd7b99ac49f53986798e8ef81).

  The easiest way to fix this would probably be to just fetch the
  segment again from the DB after it was created. Then we would also not
  be at the mercy of whatever the TypeDriver returns on create, though
  this would result in an extra query on segment create. The other
  option I see would be to make _make_segment_dict() public and let

[Yahoo-eng-team] [Bug 1888666] [NEW] Disabling/enabling networks in neutron causes traffic loop with linuxbridge agent

2020-07-23 Thread Sebastian Lohff
Public bug reported:

We observed traffic looping between two linuxbridge agents after a user
disabled and then reenabled the network. Disabling a network causes all
vethX interfaces to be cleaned from the bridge, but the physical
interface remains in the bridge. Disabling the network will also cause a
segment release for the segment the network agent is on in a
hierarchical port binding setup. When reenabling the network a new VLAN
id might be generated for the segment the network agent is on and thus
the physical interface will be added with the new VLAN id. With two
network agents bridging two different VLANs we get a loop.

A quick mitigation would be to identify the bridge with two physical
interfaces in it, identify the stale interface and remove it; a
diagnostic sketch is shown below.
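
Hedged diagnostic sketch (an operator aid under the assumption that
linuxbridge agent bridges are named brq* and that tap*/veth* members are
the non-physical ports; removal is intentionally left to the operator):

import os

SYS_NET = '/sys/class/net'

def bridges_with_multiple_physical_ports():
    suspects = {}
    for dev in os.listdir(SYS_NET):
        brif = os.path.join(SYS_NET, dev, 'brif')
        if not dev.startswith('brq') or not os.path.isdir(brif):
            continue
        members = os.listdir(brif)
        # Anything that is not a tap/veth port is treated as a physical
        # (VLAN) interface; two of them means one is likely stale.
        physical = [m for m in members if not m.startswith(('tap', 'veth'))]
        if len(physical) > 1:
            suspects[dev] = physical
    return suspects

if __name__ == '__main__':
    for bridge, ports in bridges_with_multiple_physical_ports().items():
        print('%s has multiple physical members: %s' % (bridge, ports))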

Tested with Neutron Queens. Network was disabled/enabled via openstack
network set --disable/--enable $uuid

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: linuxbridge

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1888666

Title:
  Disabling/enabling networks in neutron causes traffic loop with
  linuxbridge agent

Status in neutron:
  New

Bug description:
  We observed traffic looping between two linuxbridge agents after a
  user disabled and then reenabled the network. Disabling a network
  causes all vethX interfaces to be cleaned from the bridge, but the
  physical interface remains in the bridge. Disabling the network will
  also cause a segment release for the segment the network agent is on
  in a hierarchical port binding setup. When reenabling the network a
  new VLAN id might be generated for the segment the network agent is on
  and thus the physical interface will be added with the new VLAN id.
  With two network agents bridging two different VLANs we get a loop.

  Quick mitigation would be to identify the bridge with two physical
  interfaces in it, identify the stale interface and remove it.

  Tested with Neutron Queens. Network was disabled/enabled via openstack
  network set --disable/--enable $uuid

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1888666/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1827363] [NEW] Additional port list / get_ports() failures when filtering and limiting at the same time

2019-05-02 Thread Sebastian Lohff
Public bug reported:

When doing an openstack port list that filters for a fixed IP/subnet and
at the same time limits the number of results, neutron returns a 500
internal server error. This was already addressed in
https://bugs.launchpad.net/neutron/+bug/1826186 but the bug is also
present in other places.

While running tempest against a Neutron Queens installation I came
across another _get_ports_query() in neutron/plugins/ml2/plugin.py where
filter() is again called on an already limited query.

See
https://github.com/openstack/neutron/blob/6f4962dcf89aebf2552ee8ec0993c6389a953024/neutron/plugins/ml2/plugin.py#L2206

InvalidRequestError: Query.filter() being called on a Query which already has 
LIMIT or OFFSET applied. To modify the row-limited results of a Query, call 
from_self() first. Otherwise, call filter() before limit() or offset() are 
applied.
  File "pecan/core.py", line 683, in __call__
self.invoke_controller(controller, args, kwargs, state)
[...]
  File "neutron/db/db_base_plugin_v2.py", line 1417, in get_ports
page_reverse=page_reverse)
  File "neutron/plugins/ml2/plugin.py", line 1941, in _get_ports_query
query = query.filter(substr_filter)
  File "", line 2, in filter
  File "sqlalchemy/orm/base.py", line 200, in generate
assertion(self, fn.__name__)
  File "sqlalchemy/orm/query.py", line 435, in _no_limit_offset
% (meth, meth)

I applied a patch similar to the one Gabriele Cerami proposed in
https://review.opendev.org/#/c/656066/ on our production setup and this
seems to have fixed the bug there as well; the general pattern is
sketched below.
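
Hedged sketch of the pattern (names and call sites simplified, not the
actual ml2 plugin code): apply the extra fixed-ip substring filter while
the query is still unlimited, and only add LIMIT as the last step.

from neutron.db import models_v2
from neutron_lib.db import model_query

def _get_ports_query(context, filters=None, substr_filter=None, limit=None):
    # Build the collection query *without* a limit so .filter() is allowed.
    query = model_query.get_collection_query(context, models_v2.Port,
                                             filters=filters)
    if substr_filter is not None:
        query = query.filter(substr_filter)
    if limit:
        query = query.limit(limit)
    return query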

When doing a grep for _get_ports_query() in the neutron codebase I find
a function with this name being called in neutron/db/dvr_mac_db.py in
get_ports_on_host_by_subnet(), I do not have a stacktrace or test for
that though.

See
https://github.com/openstack/neutron/blob/6f4962dcf89aebf2552ee8ec0993c6389a953024/neutron/db/dvr_mac_db.py#L162

** Affects: neutron
 Importance: Undecided
 Status: New

** Description changed:

  When doing a openstack port list that filters for a fixed-ip/subnet and
  at the same time limits the amount of results neutron returns a 500
  internal server error. This was already addressed in
  https://bugs.launchpad.net/neutron/+bug/1826186 but this bug is also
  present in other places.
  
  While running tempest against a Neutron Queens installation I came
  across another _get_ports_query() in neutron/plugins/ml2/plugin.py where
  filter is again called onto the result of an already limited query.
  
  See
  
https://github.com/openstack/neutron/blob/6f4962dcf89aebf2552ee8ec0993c6389a953024/neutron/plugins/ml2/plugin.py#L2206
  
  InvalidRequestError: Query.filter() being called on a Query which already has 
LIMIT or OFFSET applied. To modify the row-limited results of a Query, call 
from_self() first. Otherwise, call filter() before limit() or offset() are 
applied.
-   File "pecan/core.py", line 683, in __call__
- self.invoke_controller(controller, args, kwargs, state)
+   File "pecan/core.py", line 683, in __call__
+ self.invoke_controller(controller, args, kwargs, state)
  [...]
-   File "neutron/db/db_base_plugin_v2.py", line 1417, in get_ports
- page_reverse=page_reverse)
-   File "neutron/plugins/ml2/plugin.py", line 1941, in _get_ports_query
- query = query.filter(substr_filter)
-   File "", line 2, in filter
-   File "sqlalchemy/orm/base.py", line 200, in generate
- assertion(self, fn.__name__)
-   File "sqlalchemy/orm/query.py", line 435, in _no_limit_offset
- % (meth, meth)
+   File "neutron/db/db_base_plugin_v2.py", line 1417, in get_ports
+ page_reverse=page_reverse)
+   File "neutron/plugins/ml2/plugin.py", line 1941, in _get_ports_query
+ query = query.filter(substr_filter)
+   File "", line 2, in filter
+   File "sqlalchemy/orm/base.py", line 200, in generate
+ assertion(self, fn.__name__)
+   File "sqlalchemy/orm/query.py", line 435, in _no_limit_offset
+ % (meth, meth)
  
  I applied a patch similar to the one Gabriele Cerami proposed in
- https://review.opendev.org/#/c/656066/
+ https://review.opendev.org/#/c/656066/ on our production setup and this
+ seems to have fixed the bug there as well.
  
  When doing a grep for _get_ports_query() in the neutron codebase I find
  a function with this name being called in neutron/db/dvr_mac_db.py in
  get_ports_on_host_by_subnet(), I do not have a stacktrace or test for
  that though.
  
  See
  
https://github.com/openstack/neutron/blob/6f4962dcf89aebf2552ee8ec0993c6389a953024/neutron/db/dvr_mac_db.py#L162

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1827363

Title:
  Additional port list / get_ports() failures when filtering and
  limiting at the same time

Status in neutron:
  New

Bug description:
  When doing a openstack port list that filters for a fixed-ip/subnet
  and at the same time limits 

[Yahoo-eng-team] [Bug 1826186] [NEW] port list / get_ports() fails when filtering and limiting at the same time

2019-04-24 Thread Sebastian Lohff
Public bug reported:

When doing an openstack port list that filters for a fixed IP/subnet and
at the same time limits the number of results, neutron returns a 500
internal server error.

Example command: openstack port list --fixed-ip ip-address=192.0.2.23

Limits should be applied automatically with a recent version of the
openstacksdk, with pagination turned on by default. Additionally, I
attached a testcase that triggers this bug. The bug was found on
neutron-queens, but the test case also breaks current master (tested on
commit id 1214e59cc2d818f6fde9c3e24c7f26c50d2a8a74).

It looks like _get_ports_query() gets a query with pre-applied limits by
calling model_query.get_collection_query() and then tries to filter the
results, which triggers a sqlalchemy assertion that disallows filtering
after a limit has been applied; a standalone repro of that restriction is
shown below.
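
Minimal standalone repro of the underlying SQLAlchemy restriction
(independent of Neutron, assuming the SQLAlchemy 1.x Query API in use at
the time): calling .filter() after .limit() raises InvalidRequestError,
which is what surfaces here as the 500.

import sqlalchemy as sa
from sqlalchemy import orm
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Port(Base):
    __tablename__ = 'ports'
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(64))

engine = sa.create_engine('sqlite://')
Base.metadata.create_all(engine)
session = orm.Session(bind=engine)

query = session.query(Port).limit(10)      # LIMIT already applied
try:
    query.filter(Port.name == 'foo')       # what _get_ports_query() does
except sa.exc.InvalidRequestError as e:
    print(e)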

The corresponding neutron exception would be the following:
InvalidRequestError: Query.filter() being called on a Query which already has 
LIMIT or OFFSET applied. To modify the row-limited results of a  Query, call 
from_self() first.  Otherwise, call filter() before limit() or offset() are 
applied.
  File "pecan/core.py", line 683, in __call__
self.invoke_controller(controller, args, kwargs, state)
  File "pecan/core.py", line 574, in invoke_controller
result = controller(*args, **kwargs)
  File "neutron/db/api.py", line 91, in wrapped
setattr(e, '_RETRY_EXCEEDED', True)
  File "oslo_utils/excutils.py", line 220, in __exit__
self.force_reraise()
  File "oslo_utils/excutils.py", line 196, in force_reraise
six.reraise(self.type_, self.value, self.tb)
  File "neutron/db/api.py", line 87, in wrapped
return f(*args, **kwargs)
  File "oslo_db/api.py", line 147, in wrapper
ectxt.value = e.inner_exc
  File "oslo_utils/excutils.py", line 220, in __exit__
self.force_reraise()
  File "oslo_utils/excutils.py", line 196, in force_reraise
six.reraise(self.type_, self.value, self.tb)
  File "oslo_db/api.py", line 135, in wrapper
return f(*args, **kwargs)
  File "neutron/db/api.py", line 126, in wrapped
LOG.debug("Retry wrapper got retriable exception: %s", e)
  File "oslo_utils/excutils.py", line 220, in __exit__
self.force_reraise()
  File "oslo_utils/excutils.py", line 196, in force_reraise
six.reraise(self.type_, self.value, self.tb)
  File "neutron/db/api.py", line 122, in wrapped
return f(*dup_args, **dup_kwargs)
  File "neutron/pecan_wsgi/controllers/utils.py", line 76, in wrapped
return f(*args, **kwargs)
  File "neutron/pecan_wsgi/controllers/resource.py", line 131, in index
return self.get(*args, **kwargs)
  File "neutron/pecan_wsgi/controllers/resource.py", line 141, in get
**query_params)}
  File "neutron/db/api.py", line 161, in wrapped
return method(*args, **kwargs)
  File "neutron/db/api.py", line 91, in wrapped
setattr(e, '_RETRY_EXCEEDED', True)
  File "oslo_utils/excutils.py", line 220, in __exit__
self.force_reraise()
  File "oslo_utils/excutils.py", line 196, in force_reraise
six.reraise(self.type_, self.value, self.tb)
  File "neutron/db/api.py", line 87, in wrapped
return f(*args, **kwargs)
  File "oslo_db/api.py", line 147, in wrapper
ectxt.value = e.inner_exc
  File "oslo_utils/excutils.py", line 220, in __exit__
self.force_reraise()
  File "oslo_utils/excutils.py", line 196, in force_reraise
six.reraise(self.type_, self.value, self.tb)
  File "oslo_db/api.py", line 135, in wrapper
return f(*args, **kwargs)
  File "neutron/db/api.py", line 126, in wrapped
LOG.debug("Retry wrapper got retriable exception: %s", e)
  File "oslo_utils/excutils.py", line 220, in __exit__
self.force_reraise()
  File "oslo_utils/excutils.py", line 196, in force_reraise
six.reraise(self.type_, self.value, self.tb)
  File "neutron/db/api.py", line 122, in wrapped
return f(*dup_args, **dup_kwargs)
  File "neutron/db/db_base_plugin_v2.py", line 1417, in get_ports
page_reverse=page_reverse)
  File "neutron/plugins/ml2/plugin.py", line 1941, in _get_ports_query
query = query.filter(substr_filter)
  File "", line 2, in filter
  File "sqlalchemy/orm/base.py", line 200, in generate
assertion(self, fn.__name__)
  File "sqlalchemy/orm/query.py", line 435, in _no_limit_offset
% (meth, meth)

** Affects: neutron
 Importance: Undecided
 Status: New

** Patch added: "neutron testcase for filtering and limiting on a get ports at 
the same time"
   
https://bugs.launchpad.net/bugs/1826186/+attachment/5258554/+files/test_list_ports_filtered_by_fixed_ip_with_limit.patch

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1826186

Title:
  port list / get_ports() fails when filtering and limiting at the same
  time

Status in neutron:
  New

Bug description:
  When doing a openstack port list that filters for a fixed-ip/subnet
  and at the same time limits