[Yahoo-eng-team] [Bug 2024481] Re: [ndr] neutron-bgp-dragent is racy when a service restart is made just before a speaker is added

2023-06-20 Thread Dmitrii Shcherbakov
** Description changed:

  Hit a race with the Antelope (22.0.0) version of NDR in one of our
  functional test runs:
  
  1) neutron-bgp-dragent got restarted right before creating a speaker and 
adding an external network and tenant network to it;
- 2) As can be seen in the service log below, just after neutron-bgp-dragent 
started, it tried to advertise a route (00:03:21.766) before a speaker got 
added to it (00:03:22.251).
+ 2) As can be seen in the service log below, just after neutron-bgp-dragent 
started, it tried to advertise a route (00:03:21.766) before a speaker got 
added to it (00:03:22.251) - it failed with the `BgpSpeakerNotAdded` exception:
  
https://github.com/openstack/neutron-dynamic-routing/blob/13e0d8a63dbdbd9e1a863144999794d4fc9af22d/neutron_dynamic_routing/services/bgp/agent/driver/os_ken/driver.py#L150-L154
  
  3) As a result, the peer (FRR in our case) only got a floating IP route
(/32) in the test; the tenant network route (/24) was never advertised.
  
  Test steps (downstream) that generated the log lines: 
https://github.com/openstack-charmers/zaza-openstack-tests/blob/edd7717dc2ca300cfb94729d9d6bb7021787906c/zaza/openstack/configure/bgp_speaker.py#L65-L100
  The service restart is done prior to calling the test code above (notably, it 
was done as a workaround for something else but inadvertently helped to trigger 
this edge case):
  
https://github.com/openstack-charmers/zaza-openstack-tests/blob/edd7717dc2ca300cfb94729d9d6bb7021787906c/zaza/openstack/charm_tests/dragent/configure.py#L92-L103
  
  The lack of a route at the peer side can be seen at 2023-06-19 00:03:32 here:
  
https://openstack-ci-reports.ubuntu.com/artifacts/e4c/886157/8/check/jammy-antelope-ovn/e4c9b5d/job-output.txt
  2023-06-19 00:03:32.346994 | focal-medium |
  2023-06-19 00:03:32.347012 | focal-medium | B>* 100.64.0.144/32 [20/0] via 
172.16.27.207, ens3, weight 1, 00:00:07
  2023-06-19 00:03:32.347045 | focal-medium |
  
  Summary: It looks like neutron-bgp-dragent may try to advertise routes
  it gets from the DB before a speaker is added to it. It should make
  sure a speaker is present before trying to advertise routes. If no
  speakers are scheduled to it yet, it should attempt to advertise as
  soon as one is present on it.
  
  ---
  
  Functional test log:
  
  2023-06-19 00:03:19.709430 | focal-medium | 2023-06-19 00:03:19 [INFO] 
Setting up BGP speaker
  2023-06-19 00:03:20.307141 | focal-medium | 2023-06-19 00:03:20 [INFO] 
Creating BGP Speaker
  2023-06-19 00:03:20.434428 | focal-medium | 2023-06-19 00:03:20 [INFO] 
Advertising BGP routes
  2023-06-19 00:03:20.678231 | focal-medium | 2023-06-19 00:03:20 [INFO] 
Advertising ext_net network on BGP Speaker bgp-speaker
  2023-06-19 00:03:20.919232 | focal-medium | 2023-06-19 00:03:20 [INFO] 
Advertising private network on BGP Speaker bgp-speaker
  2023-06-19 00:03:21.155337 | focal-medium | 2023-06-19 00:03:21 [INFO] 
Setting up BGP peer
  2023-06-19 00:03:22.099859 | focal-medium | 2023-06-19 00:03:22 [INFO] 
Creating BGP Peer
  2023-06-19 00:03:22.142524 | focal-medium | 2023-06-19 00:03:22 [INFO] Adding 
BGP peer to BGP speaker
  2023-06-19 00:03:22.143374 | focal-medium | 2023-06-19 00:03:22 [INFO] Adding 
peer osci-frr on BGP Speaker bgp-speaker
  2023-06-19 00:03:22.208265 | focal-medium | 2023-06-19 00:03:22 [INFO] 
Creating floating IP to advertise
  2023-06-19 00:03:22.301280 | focal-medium | 2023-06-19 00:03:22 [INFO] 
Creating port: NDR_TEST_FIP
  2023-06-19 00:03:23.599942 | focal-medium | 2023-06-19 00:03:23 [INFO] 
Creating floatingip
  2023-06-19 00:03:26.351808 | focal-medium | 2023-06-19 00:03:26 [INFO] 
Advertised floating IP: 100.64.0.144
  
  neutron-bgp-dragent.log:
  
  2023-06-19 00:03:20.751 26428 INFO neutron.common.config [-] Logging enabled!
  2023-06-19 00:03:20.751 26428 INFO neutron.common.config [-] 
/usr/bin/neutron-bgp-dragent version 22.0.0
  2023-06-19 00:03:21.533 26428 INFO 
neutron_dynamic_routing.services.bgp.agent.driver.os_ken.driver [-] 
Initializing os-ken driver for BGP functionality.
  2023-06-19 00:03:21.533 26428 INFO 
neutron_dynamic_routing.services.bgp.agent.driver.os_ken.driver [-] Initialized 
os-ken BGP Speaker driver interface with bgp_router_id=172.16.0.46
  2023-06-19 00:03:21.578 26428 INFO 
neutron_dynamic_routing.services.bgp.agent.bgp_dragent [-] BGP dynamic routing 
agent started
  2023-06-19 00:03:21.748 26428 INFO bgpspeaker.api.base [None 
req-3e563ce5-7b78-46d2-9dd3-02067da4e197 - - - - - -] API method core.start 
called with args: {'waiter': , 
'local_as': 4279238701, 'router_id': '172.16.0.46', 'bgp_server_hosts': 
('0.0.0.0', '::'), 'bgp_server_port': 0, 'refresh_stalepath_time': 0, 
'refresh_max_eor_time': 0, 'label_range': (100, 10), 
'allow_local_as_in_count': 0, 'cluster_id': None, 'local_pref': 100}
  2023-06-19 00:03:21.766 26428 ERROR 
neutron_dynamic_routing.services.bgp.agent.bgp_dragent [None 
req-f082fe6d-cb70-4761-b02d-19f38bda7ae2 - - 

[Yahoo-eng-team] [Bug 2024481] [NEW] [ndr] neutron-bgp-dragent is racy when a service restart is made just before a speaker is added

2023-06-20 Thread Dmitrii Shcherbakov
Public bug reported:

Hit a race with the Antelope (22.0.0) version of NDR in one of our
functional test runs:

1) neutron-bgp-dragent got restarted right before creating a speaker and adding 
an external network and tenant network to it;
2) As can be seen in the service log below, just after neutron-bgp-dragent 
started, it tried to advertise a route (00:03:21.766) before a speaker got 
added to it (00:03:22.251) - it failed with the `BgpSpeakerNotAdded` exception:
https://github.com/openstack/neutron-dynamic-routing/blob/13e0d8a63dbdbd9e1a863144999794d4fc9af22d/neutron_dynamic_routing/services/bgp/agent/driver/os_ken/driver.py#L150-L154

3) As a result, the peer (FRR in our case) only got a floating IP route
(/32) in the test; the tenant network route (/24) was never advertised.

Test steps (downstream) that generated the log lines: 
https://github.com/openstack-charmers/zaza-openstack-tests/blob/edd7717dc2ca300cfb94729d9d6bb7021787906c/zaza/openstack/configure/bgp_speaker.py#L65-L100
The service restart is done prior to calling the test code above (notably, it 
was done as a workaround for something else but inadvertently helped to trigger 
this edge case):
https://github.com/openstack-charmers/zaza-openstack-tests/blob/edd7717dc2ca300cfb94729d9d6bb7021787906c/zaza/openstack/charm_tests/dragent/configure.py#L92-L103

The lack of a route at the peer side can be seen at 2023-06-19 00:03:32 here:
https://openstack-ci-reports.ubuntu.com/artifacts/e4c/886157/8/check/jammy-antelope-ovn/e4c9b5d/job-output.txt
2023-06-19 00:03:32.346994 | focal-medium |
2023-06-19 00:03:32.347012 | focal-medium | B>* 100.64.0.144/32 [20/0] via 
172.16.27.207, ens3, weight 1, 00:00:07
2023-06-19 00:03:32.347045 | focal-medium |

Summary: It looks like neutron-bgp-dragent may try to advertise routes
it gets from the DB before a speaker is added to it. It should make sure
a speaker is present before trying to advertise routes. If no speakers
are scheduled to it yet, it should attempt to advertise as soon as one
is present on it.
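
A minimal sketch of one possible mitigation (hypothetical names, not the
actual agent code): queue route advertisements that arrive before the agent
knows about a speaker and flush them once the speaker is added, instead of
letting the driver raise BgpSpeakerNotAdded:

    # Hedged sketch, not neutron-bgp-dragent code; `cached_speakers` and
    # `pending_routes` are hypothetical names.
    class SpeakerAwareAgent:
        def __init__(self, driver):
            self.driver = driver
            self.cached_speakers = set()  # speakers already added to the driver
            self.pending_routes = {}      # speaker_id -> [(cidr, next_hop)]

        def speaker_added(self, speaker_id):
            self.cached_speakers.add(speaker_id)
            # Flush advertisements that arrived before the speaker existed.
            for cidr, next_hop in self.pending_routes.pop(speaker_id, []):
                self.driver.advertise_route(speaker_id, cidr, next_hop)

        def advertise_route(self, speaker_id, cidr, next_hop):
            if speaker_id not in self.cached_speakers:
                # E.g. right after an agent restart: defer instead of failing.
                self.pending_routes.setdefault(
                    speaker_id, []).append((cidr, next_hop))
                return
            self.driver.advertise_route(speaker_id, cidr, next_hop)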

---

Functional test log:

2023-06-19 00:03:19.709430 | focal-medium | 2023-06-19 00:03:19 [INFO] Setting 
up BGP speaker
2023-06-19 00:03:20.307141 | focal-medium | 2023-06-19 00:03:20 [INFO] Creating 
BGP Speaker
2023-06-19 00:03:20.434428 | focal-medium | 2023-06-19 00:03:20 [INFO] 
Advertising BGP routes
2023-06-19 00:03:20.678231 | focal-medium | 2023-06-19 00:03:20 [INFO] 
Advertising ext_net network on BGP Speaker bgp-speaker
2023-06-19 00:03:20.919232 | focal-medium | 2023-06-19 00:03:20 [INFO] 
Advertising private network on BGP Speaker bgp-speaker
2023-06-19 00:03:21.155337 | focal-medium | 2023-06-19 00:03:21 [INFO] Setting 
up BGP peer
2023-06-19 00:03:22.099859 | focal-medium | 2023-06-19 00:03:22 [INFO] Creating 
BGP Peer
2023-06-19 00:03:22.142524 | focal-medium | 2023-06-19 00:03:22 [INFO] Adding 
BGP peer to BGP speaker
2023-06-19 00:03:22.143374 | focal-medium | 2023-06-19 00:03:22 [INFO] Adding 
peer osci-frr on BGP Speaker bgp-speaker
2023-06-19 00:03:22.208265 | focal-medium | 2023-06-19 00:03:22 [INFO] Creating 
floating IP to advertise
2023-06-19 00:03:22.301280 | focal-medium | 2023-06-19 00:03:22 [INFO] Creating 
port: NDR_TEST_FIP
2023-06-19 00:03:23.599942 | focal-medium | 2023-06-19 00:03:23 [INFO] Creating 
floatingip
2023-06-19 00:03:26.351808 | focal-medium | 2023-06-19 00:03:26 [INFO] 
Advertised floating IP: 100.64.0.144

neutron-bgp-dragent.log:

2023-06-19 00:03:20.751 26428 INFO neutron.common.config [-] Logging enabled!
2023-06-19 00:03:20.751 26428 INFO neutron.common.config [-] 
/usr/bin/neutron-bgp-dragent version 22.0.0
2023-06-19 00:03:21.533 26428 INFO 
neutron_dynamic_routing.services.bgp.agent.driver.os_ken.driver [-] 
Initializing os-ken driver for BGP functionality.
2023-06-19 00:03:21.533 26428 INFO 
neutron_dynamic_routing.services.bgp.agent.driver.os_ken.driver [-] Initialized 
os-ken BGP Speaker driver interface with bgp_router_id=172.16.0.46
2023-06-19 00:03:21.578 26428 INFO 
neutron_dynamic_routing.services.bgp.agent.bgp_dragent [-] BGP dynamic routing 
agent started
2023-06-19 00:03:21.748 26428 INFO bgpspeaker.api.base [None 
req-3e563ce5-7b78-46d2-9dd3-02067da4e197 - - - - - -] API method core.start 
called with args: {'waiter': , 
'local_as': 4279238701, 'router_id': '172.16.0.46', 'bgp_server_hosts': 
('0.0.0.0', '::'), 'bgp_server_port': 0, 'refresh_stalepath_time': 0, 
'refresh_max_eor_time': 0, 'label_range': (100, 10), 
'allow_local_as_in_count': 0, 'cluster_id': None, 'local_pref': 100}
2023-06-19 00:03:21.766 26428 ERROR 
neutron_dynamic_routing.services.bgp.agent.bgp_dragent [None 
req-f082fe6d-cb70-4761-b02d-19f38bda7ae2 - - - - - -] Call to driver for BGP 
Speaker 04d9b59c-e4b9-4756-92b3-df364fa7bd0d advertise_route has failed with 
exception BGP Speaker for local_as=4279238701 with router_id=172.16.0.46 not 
added yet..: 
neutron_dynamic_routing.services.bgp.agent.driver.exceptions.BgpSpeakerNotAdded:
 BGP Speaker

[Yahoo-eng-team] [Bug 1959666] Re: Neutron-dynamic-routing does not work with OVN

2023-06-20 Thread Dmitrii Shcherbakov
When it comes to the NDR charm, we enabled it in the charms
(neutron-api-plugin-ovn specifically needed a code change), documenting
those limitations in the charm-guide.

https://review.opendev.org/q/topic:2023-enable-ndr
https://review.opendev.org/q/topic:2023-ovn-ndr

We are also adding some data plane testing to make sure that the
advertised routes are actually reachable.

https://review.opendev.org/c/openstack/charm-neutron-dynamic-routing/+/886157

** Also affects: charm-neutron-api-plugin-ovn
   Importance: Undecided
   Status: New

** Changed in: charm-neutron-api-plugin-ovn
   Status: New => Fix Committed

** Changed in: charm-neutron-dynamic-routing
   Status: New => In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1959666

Title:
  Neutron-dynamic-routing does not work with OVN

Status in OpenStack Neutron API OVN Plugin Charm:
  Fix Committed
Status in OpenStack Neutron Dynamic Routing charm:
  In Progress
Status in Ubuntu Cloud Archive:
  New
Status in neutron:
  Fix Released

Bug description:
  When using OVN as the Neutron backend, announcing prefixes with
  neutron-dynamic-routing is currently not working due to changes in the
  database structure. An attempt to fix this was made in
  https://review.opendev.org/c/openstack/neutron-dynamic-routing/+/814055
  but wasn't successful.

  This is a major blocker for production deployments that use BGP to
  provide connectivity for IPv6 subnets in tenant networks.

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-neutron-api-plugin-ovn/+bug/1959666/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2022058] [NEW] [ovn] l3ha and distributed router extra attributes do not reflect OVN state

2023-06-01 Thread Dmitrii Shcherbakov
Public bug reported:

With https://bugs.launchpad.net/neutron/+bug/1995974 fixed and
https://review.opendev.org/c/openstack/neutron/+/864051 merged, extra
attributes such as `distributed` and `ha` are now created for OVN
routers as well.

Their default values are taken from the global configuration options
more relevant to the default L3 service plugin implementation based on
Linux network namespaces

https://github.com/openstack/neutron/blob/0de6a4d620f1cb780c6a3635e10406b0db97762a/neutron/db/l3_attrs_db.py#L24-L27
https://github.com/openstack/neutron/blob/0de6a4d620f1cb780c6a3635e10406b0db97762a/neutron/conf/db/l3_hamode_db.py#L21
https://github.com/openstack/neutron/blob/0de6a4d620f1cb780c6a3635e10406b0db97762a/neutron/conf/db/l3_dvr_db.py#L19-L27

as opposed to relying on the OVN-specific options. For instance, in
order to enable support for distributed floating IPs, there is an
OVN-specific global option that enables this mode for all OVN routers:

https://github.com/openstack/neutron/blob/598fcb437a0ad3d564435799c70f38429ab4f0eb/neutron/conf/plugins/ml2/drivers/ovn/ovn_conf.py#L133-L140

As a result, OVN routers now have the `distributed` property set to
`False` by default (unless the global ML2/ovs-specific default is
changed) and it does not reflect the state of the
`ovn/enable_distributed_floating_ip` option. It can also be changed via
the API on the router without any apparent effect.

The ML2/ovs and ML2/ovn comparison docs still refer to OVN-based routers having 
no `l3ha` or `distributed` attributes, whereas this is no longer the case: 
https://github.com/openstack/neutron/blame/cd66232c2b26cb4141c2e9426ce2dec0f38c364c/doc/source/ovn/faq/index.rst#L16-L29
 

One place where it becomes relevant is the neutron-dynamic-routing
project which relies on the `distributed` property to determine whether
to add /32 routes with next-hops set to a router gateway port IP
(centralized FIPs case) or not (distributed FIPs case).

https://github.com/openstack/neutron-dynamic-routing/blob/513ea649be9fd652b0c5b391167f851bc3d653bb/neutron_dynamic_routing/db/bgp_db.py#L564
https://github.com/openstack/neutron-dynamic-routing/blob/513ea649be9fd652b0c5b391167f851bc3d653bb/neutron_dynamic_routing/db/bgp_db.py#L567-L580

For distributed routers the logic is such that IP addresses of ports
with a device owner set to `floatingip_agent_gateway` are used as next
hops for /32 routes; however, the OVN-based L3 service plugin
implementation (OVNL3RouterPlugin) does not create those on a
per-external-network basis the way the core L3RouterPlugin-based
implementation does with DVR.

As a result, if an operator uses distributed FIPs with OVN with the
router attribute `distributed == False`, neutron-dynamic-routing will
advertise /32 routes with the centralized FIP logic (the southbound
traffic would go via the router gateway port).

On the other hand, if an operator uses distributed FIPs with OVN with
the router attribute `distributed == True`, neutron-dynamic-routing will
not advertise anything: the centralized routes are not added because the
router appears to be distributed, yet no `floatingip_agent_gateway`
ports are created with OVN.

There are at least two possible outcomes for a fix (for the second one, see
the sketch below):

1) Make sure the distributed state is reflected correctly for OVN routers
based on the OVN-specific config option;
2) Fix neutron-dynamic-routing to still create centralized /32 routes if
there are not any `floatingip_agent_gateway` ports,
OR change the OVN implementation to create those for the purpose of direct
southbound routing.
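
A minimal sketch of what the second option could look like (hypothetical
argument names; the real logic lives in the bgp_db.py code linked above):

    # Hedged sketch of fix option 2, not the actual bgp_db.py code.
    def next_hops_for_fip_route(router, agent_gw_ports, gateway_ip):
        if router['distributed'] and agent_gw_ports:
            # DVR with ML2/ovs: use the floatingip_agent_gateway port IPs.
            return [p['ip_address'] for p in agent_gw_ports]
        # With OVN no floatingip_agent_gateway ports exist, so fall back to
        # the centralized next hop instead of advertising nothing.
        return [gateway_ip]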

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: bgp ndr neutron-dynamic-routing ovn

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2022058

Title:
  [ovn] l3ha and distributed router extra attributes do not reflect OVN
  state

Status in neutron:
  New

Bug description:
  With https://bugs.launchpad.net/neutron/+bug/1995974 fixed and
  https://review.opendev.org/c/openstack/neutron/+/864051 merged, extra
  attributes such as `distributed` and `ha` are now created for OVN
  routers as well.

  Their default values are taken from the global configuration options
  more relevant to the default L3 service plugin implementation based on
  Linux network namespaces

  
https://github.com/openstack/neutron/blob/0de6a4d620f1cb780c6a3635e10406b0db97762a/neutron/db/l3_attrs_db.py#L24-L27
  
https://github.com/openstack/neutron/blob/0de6a4d620f1cb780c6a3635e10406b0db97762a/neutron/conf/db/l3_hamode_db.py#L21
  
https://github.com/openstack/neutron/blob/0de6a4d620f1cb780c6a3635e10406b0db97762a/neutron/conf/db/l3_dvr_db.py#L19-L27

  as opposed to relying on the OVN-specific options. For instance, in
  order to enable support for distributed floating IPs, there is an
  OVN-specific global option that enables this mode for all OVN routers:

  
https://github.com

[Yahoo-eng-team] [Bug 2003842] [NEW] [OVN] A route inferred from a subnet's default gateway is not added to ovn-nb if segment_id is not None for a subnet

2023-01-25 Thread Dmitrii Shcherbakov
Public bug reported:

Context:

* Neutron is configured to use OVN
* An external provider network with one segment is created
* A subnet with a default gateway IP set is associated with this segment 
explicitly (segment_id != None)
* A router's gateway port is set to use the provider network 
(external_gateway_info is set with a network_id passed)

Result: OVN NB does not contain a default route and instance traffic is
blackholed.

--
Detailed description:

Setting the external gateway info for the first time as follows

$ openstack router set --external-gateway pubnet r1

does not result in OVN getting a default route with the next hop set to
the subnet's gateway IP:

$ sudo ovn-nbctl list logical_router_static_route ; echo $?
0

Doing it twice in a row does (the default route appears in the table
after the second command):

$ openstack router set --external-gateway pubnet r1 && openstack router
set --external-gateway pubnet r1

$ sudo ovn-nbctl list logical_router_static_route
_uuid   : df7c6020-83e7-446c-8f5c-31db96eb2dd3
bfd : []
external_ids: {"neutron:is_ext_gw"="true", 
"neutron:subnet_id"="abdae752-034c-4845-b6b3-92bf40cf24a6"}
ip_prefix   : "0.0.0.0/0"
nexthop : "10.1.1.1"
options : {}
output_port : []
policy  : []
route_table : ""

The inferred route is normally installed by this portion of code:
https://github.com/openstack/neutron/blob/21927e79075ce0f3e521e56fca0bed8f1de61066/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1264-L1279

Based on the result from _get_gw_info: 
https://github.com/openstack/neutron/blob/21927e79075ce0f3e521e56fca0bed8f1de61066/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1197-L1204

`_get_gw_info` returns an empty list since `external_fixed_ips` is an
empty list:

self._l3_plugin.get_router(context, 'd51ec4b0-c847-41e0-b43d-5dbf8ddcca32')
{'id': 'd51ec4b0-c847-41e0-b43d-5dbf8ddcca32', 'name': 'r1', 'tenant_id': 
'dbfcc6c6a50f481685fda546abd00cd3', 'admin_state_up': True, 'status': 'ACTIVE', 
'external_gateway_info': {'network_id': 'eef0120b-d01f-4cf7-9d1a-65f1da1eb67c', 
'external_fixed_ips': [], 'enable_snat': True}, 'gw_port_id': 
'2da99728-b04e-4a4f-ac6f-d0930de8264a', 'description': '', 
'availability_zones': [], 'distributed': False, 'ha': False, 'ha_vr_id': 0, 
'availability_zone_hints': [], 'routes': [], 'tags': [], 'created_at': 
'2023-01-20T09:45:55Z', 'updated_at': '2023-01-24T12:44:14Z', 
'revision_number': 35, 'project_id': 'dbfcc6c6a50f481685fda546abd00cd3'}

Meanwhile, the `external_fixed_ips` field is empty because of the
deferred IPAM logic triggered by the presence of `segment_id != None`
for the subnet on the external network. Based on this logic, the port is
unbound and does not get an IP allocation until a port update & port
binding:

https://github.com/openstack/neutron/blob/21927e79075ce0f3e521e56fca0bed8f1de61066/neutron/objects/subnet.py#L341-L343
 (subnets attached to segments are excluded if a host isn't known)
https://github.com/openstack/neutron/blob/21927e79075ce0f3e521e56fca0bed8f1de61066/neutron/objects/subnet.py#L481-L486
 (ipam_exceptions.DeferIpam is raised)
https://github.com/openstack/neutron/blob/21927e79075ce0f3e521e56fca0bed8f1de61066/neutron/db/db_base_plugin_v2.py#L1472-L1478
 (DeferIpam is caught and the port gets IP_ALLOCATION_NONE for its IP
allocation as it has no fixed IPs)
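
In other words (a simplified sketch of the behavior just described, not the
actual ovn_client.py code):

    # Hedged sketch: with deferred IPAM the gateway port has no fixed IPs on
    # the first update, so no default route can be derived from the subnet.
    def default_routes(external_gateway_info, subnets_by_id):
        routes = []
        # `external_fixed_ips` is [] here because of DeferIpam.
        for fixed_ip in external_gateway_info['external_fixed_ips']:
            subnet = subnets_by_id[fixed_ip['subnet_id']]
            if subnet.get('gateway_ip'):
                routes.append(('0.0.0.0/0', subnet['gateway_ip']))
        return routes  # [] after the first `router set --external-gateway`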

Port state after it gets created in the unbound state (the code trying
to add a default route looks for fixed IPs while the gateway port is
unbound and does not have any):

openstack port list --router r1
+--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+
| ID                                   | Name | MAC Address       | Fixed IP Addresses                                                         | Status |
+--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+
| 2da99728-b04e-4a4f-ac6f-d0930de8264a |      | fa:16:3e:eb:cf:76 |                                                                            | DOWN   |
| 97d604f2-addb-46b8-9eaf-745257dddb2f |      | fa:16:3e:c8:73:8b | ip_address='192.168.0.1', subnet_id='89227e7b-d2b0-4953-afe7-2b471736f85a' | ACTIVE |
+--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+

openstack port show 2da99728-b04e-4a4f-ac6f-d0930de8264a
+-------------------------+-------+
| Field                   | Value |
+-------------------------+-------+
| admin_state_up          | UP    |
| allowed_address_pairs   |       |
| bind

[Yahoo-eng-team] [Bug 2002687] [NEW] [RFE] Active-active L3 Gateway with Multihoming

2023-01-12 Thread Dmitrii Shcherbakov
Public bug reported:

Some network designs include multiple L3 gateways to:

* Share the load across different gateways;
* Provide independent network paths for the north-south direction (e.g. via
  different ISPs).

Having multi-homing implemented at the instance level imposes additional burden
on the end user of a cloud and support requirements for the guest OS, whereas
utilizing ECMP and BFD at the router side alleviates the need for instance-side
awareness of a more complex routing setup.

Adding more than one gateway port implies extending the existing data model
described in the multiple external gateways spec
(https://specs.openstack.org/openstack/neutron-specs/specs/xena/multiple-external-gateways.html).
However, it left adding additional gateway routes out of scope, deferring this
to future improvements around dynamic routing. Also, the focus of
neutron-dynamic-routing has so far been on advertising routes, not accepting
new ones from external peers - so dynamic routing support like this is a very
different subject. However, manual addition of extra routes does not utilize
the default gateway IP information available from subnets in Neutron, while
this could be addressed by implementing an extra conditional behavior when
adding more than one gateway port to a router.

ECMP routes can result in black-holing of traffic should the next-hop of a
route become unreachable. BFD is a standard protocol adopted by the IETF
for next-hop failure detection which can be used for route eviction. OVN
supports BFD as of v21.03.0
(https://github.com/ovn-org/ovn/commit/6e0a69ad4bcdf9e4cace5c73ef48ab06065e8519)
with a data model that allows enabling BFD on a per-next-hop basis by
associating BFD session information with routes; however, it is not modeled
at the Neutron level even if a backend supports it.

From the Neutron data model perspective, ECMP for routes is already a
supported concept since the ECMP support spec
(https://specs.openstack.org/openstack/neutron-specs/specs/wallaby/l3-router-support-ecmp.html)
got implemented in Wallaby (albeit the spec focused on the L3-agent based
implementation only).

As for OVN and BFD, the OVN database state needs to be populated by Neutron
based on the data from the Neutron database; therefore, data model changes to
the Neutron DB are needed to represent the BFD session parameters.
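
Purely as an illustration (not the proposed spec), such a model would likely
need to carry the standard BFD knobs that the OVN NB BFD table already has
(min_tx, min_rx, detect_mult); all names below are hypothetical:

    # Hedged sketch of a possible Neutron DB shape for per-next-hop BFD
    # session parameters; not the proposed data model.
    import sqlalchemy as sa
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class ExtraRouteBFD(Base):
        __tablename__ = 'extraroute_bfd'
        router_id = sa.Column(sa.String(36), primary_key=True)
        destination = sa.Column(sa.String(64), primary_key=True)  # prefix
        nexthop = sa.Column(sa.String(64), primary_key=True)
        min_tx = sa.Column(sa.Integer, nullable=False, default=1000)  # ms
        min_rx = sa.Column(sa.Integer, nullable=False, default=1000)  # ms
        detect_mult = sa.Column(sa.Integer, nullable=False, default=5)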

---

The previous work on multiple gateway ports did not get completed and
the neutron-lib changes were reverted. Moreover, the scope of this RFE
is bigger, with some overlap with and augmentation of prior art. A spec
will follow for this RFE with more details on how the data model and API
changes are proposed to be made.

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: rfe

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2002687

Title:
  [RFE] Active-active L3 Gateway with Multihoming

Status in neutron:
  New

Bug description:
  Some network designs include multiple L3 gateways to:

  * Share the load across different gateways;
  * Provide independent network paths for the north-south direction (e.g. via
different ISPs).

  Having multi-homing implemented at the instance level imposes additional 
burden
  on the end user of a cloud and support requirements for the guest OS, whereas
  utilizing ECMP and BFD at the router side alleviates the need for 
instance-side
  awareness of a more complex routing setup.

  Adding more than one gateway port implies extending the existing data model
  described in the multiple external gateways spec
  (https://specs.openstack.org/openstack/neutron-specs/specs/xena/multiple-external-gateways.html).
  However, it left adding additional gateway routes out of scope, deferring
  this to future improvements around dynamic routing. Also, the focus of
  neutron-dynamic-routing has so far been on advertising routes, not
  accepting new ones from external peers - so dynamic routing support like
  this is a very different subject. However, manual addition of extra routes
  does not utilize the default gateway IP information available from subnets
  in Neutron, while this could be addressed by implementing an extra
  conditional behavior when adding more than one gateway port to a router.

  ECMP routes can result in black-holing of traffic should the next-hop of a
  route become unreachable. BFD is a standard protocol adopted by the IETF
  for next-hop failure detection which can be used for route eviction. OVN
  supports BFD as of v21.03.0
  (https://github.com/ovn-org/ovn/commit/6e0a69ad4bcdf9e4cace5c73ef48ab06065e8519)
  with a data model that allows enabling BFD on a per-next-hop basis by
  associating BFD session information with routes; however, it is not
  modeled at the Neutron level even if a backend supports it.

  From the Neutron data model perspective, ECMP for routes is already a 

[Yahoo-eng-team] [Bug 1973276] Re: OVN port loses its virtual type after port update

2022-08-23 Thread Dmitrii Shcherbakov
** Also affects: neutron (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1973276

Title:
  OVN port loses its virtual type after port update

Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  New

Bug description:
  Bug found in Octavia (master)

  Octavia creates at least 2 ports for each load balancer:
  - the VIP port, it is down, it keeps/stores the IP address of the LB
  - the VRRP port, plugged into a VM, it has the VIP address in the 
allowed-address list (and the VIP address is configured on the interface in the 
VM)

  When sending an ARP request for the VIP address, the VRRP port should
  reply with its mac-address.

  In OVN the VIP port is marked as "type: virtual".

  But when the VIP port is updated, it loses its "type: virtual" status
  and that breaks the ARP resolution (OVN replies to the ARP request by
  sending the mac-address of the VIP port - which is not used/down).

  Quick reproducer that simulates the Octavia behavior:

  
  ===

  import subprocess
  import time
   
  import openstack
   
  conn = openstack.connect(cloud="devstack-admin-demo")
   
  network = conn.network.find_network("public")
   
  sg = conn.network.find_security_group('sg')
  if not sg:
      sg = conn.network.create_security_group(name='sg')
   
  vip_port = conn.network.create_port(
  name="lb-vip",
  network_id=network.id,
  device_id="lb-1",
  device_owner="me",
  is_admin_state_up=False)
   
  vip_address = [
  fixed_ip['ip_address']
  for fixed_ip in vip_port.fixed_ips
  if '.' in fixed_ip['ip_address']][0]
   
  vrrp_port = conn.network.create_port(
  name="lb-vrrp",
  device_id="vrrp",
  device_owner="vm",
  network_id=network.id)
  vrrp_port = conn.network.update_port(
  vrrp_port,
  allowed_address_pairs=[
  {"ip_address": vip_address,
   "mac_address": vrrp_port.mac_address}])
   
  time.sleep(1)
   
  output = subprocess.check_output(
  f"sudo ovn-nbctl show | grep -A2 'port {vip_port.id}'",
  shell=True)
  output = output.decode('utf-8')
   
  if 'type: virtual' in output:
      print("Port is virtual, this is ok.")
      print(output)
   
  conn.network.update_port(
  vip_port,
  security_group_ids=[sg.id])
   
  time.sleep(1)
   
  output = subprocess.check_output(
  f"sudo ovn-nbctl show | grep -A2 'port {vip_port.id}'",
  shell=True)
  output = output.decode('utf-8')
   
  if 'type: virtual' not in output:
      print("Port is not virtual, this is an issue.")
      print(output)

  ===

  
  In my env (devstack master on c9s):
  $ python3 /mnt/host/virtual_port_issue.py
  Port is virtual, this is ok.
  port e0fe2894-e306-42d9-8c5e-6e77b77659e2 (aka lb-vip)
  type: virtual
  addresses: ["fa:16:3e:93:00:8f 172.24.4.111 2001:db8::178"]

  Port is not virtual, this is an issue.
  port e0fe2894-e306-42d9-8c5e-6e77b77659e2 (aka lb-vip)
  addresses: ["fa:16:3e:93:00:8f 172.24.4.111 2001:db8::178"]
  port 8ec36278-82b1-436b-bc5e-ea03ef22192f

  
  In Octavia, the "type: virtual" is _sometimes_ back after other updates of
  the ports, but in some cases the LB is unreachable.

  (and "ovn-nbctl lsp-set-type  virtual" fixes the LB)

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1973276/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1964995] [NEW] [yoga][regression] network capabilities in extra info are overridden if vpd is present for a PCI device

2022-03-15 Thread Dmitrii Shcherbakov
Public bug reported:

VPD capability handling was added in
https://opendev.org/openstack/nova/commit/ab49f97b2c08294234c7bfd3dedb75780ca519e6

and now does a device dict update as follows

https://opendev.org/openstack/nova/src/commit/dde15d9c475c8ef709578310d304c9d8ecb9d493/nova/virt/libvirt/host.py#L1428
device.update(_get_device_capabilities(device, dev, net_devs))
device.update(_get_vpd_details(device, dev, pci_devs))


Which results in, for example, this content in the capabilities field:

 'capabilities': {'vpd': {'card_serial_number': 'testserial'}},

instead of this

 'capabilities': {'network': ['rx',
  'tx',
  'sg',
  'tso',
  'gso',
  'gro',
  'rxvlan',
  'txvlan'],
  'vpd': {'card_serial_number': 'testserial'}}

This is a regression from the earlier behavior; however, current unit
and functional tests do not cover this.
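
The underlying issue is that dict.update() replaces the nested 'capabilities'
value wholesale. A quick standalone illustration, plus one possible merge
shape (not the actual Nova fix):

    # Shallow update: the second 'capabilities' dict replaces the first.
    device = {'capabilities': {'network': ['rx', 'tx']}}
    device.update({'capabilities': {'vpd': {'card_serial_number': 'testserial'}}})
    assert device['capabilities'] == {'vpd': {'card_serial_number': 'testserial'}}

    # Merging into the existing nested dict keeps both keys.
    device = {'capabilities': {'network': ['rx', 'tx']}}
    device.setdefault('capabilities', {}).update(
        {'vpd': {'card_serial_number': 'testserial'}})
    assert device['capabilities'] == {
        'network': ['rx', 'tx'],
        'vpd': {'card_serial_number': 'testserial'}}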

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1964995

Title:
  [yoga][regression] network capabilities in extra info are overridden
  if vpd is present for a PCI device

Status in OpenStack Compute (nova):
  New

Bug description:
  VPD capability handling was added in
  
https://opendev.org/openstack/nova/commit/ab49f97b2c08294234c7bfd3dedb75780ca519e6

  and now does a device dict update as follows

  
https://opendev.org/openstack/nova/src/commit/dde15d9c475c8ef709578310d304c9d8ecb9d493/nova/virt/libvirt/host.py#L1428
  device.update(_get_device_capabilities(device, dev, net_devs))
  device.update(_get_vpd_details(device, dev, pci_devs))

  
  Which results in, for example, this content in the capabilities field:

   'capabilities': {'vpd': {'card_serial_number': 'testserial'}},

  instead of this

   'capabilities': {'network': ['rx',
'tx',
'sg',
'tso',
'gso',
'gro',
'rxvlan',
'txvlan'],
'vpd': {'card_serial_number': 'testserial'}}

  This is a regression from the earlier behavior; however, current unit
  and functional tests do not cover this.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1964995/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1884723] Re: [OVS] multicast between VM instances on different compute nodes is broken with IGMP snooping enabled

2021-02-03 Thread Dmitrii Shcherbakov
** Also affects: neutron (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1884723

Title:
  [OVS] multicast between VM instances on different compute nodes is
  broken with IGMP snooping enabled

Status in neutron:
  In Progress
Status in neutron package in Ubuntu:
  New

Bug description:
  It was originally reported by Matt Flusche in Red Hat's bugzilla.
  Below is a description of the issue:

  I was verifying these OVS configuration options and the impact on
  tenant networking.  My thought going into testing was that vxlan would
  not be impacted but vlan tenant networks would break; however, for
  vxlan tenant networks it looks like these options will break multicast
  also.

  In a lab test (osp13), multicast is broken between VM instances on
  different compute nodes after applying:

  >  # ovs-vsctl set Bridge br-int mcast_snooping_enable=true
  >  # ovs-vsctl set Bridge br-int other_config:mcast-snooping-disable-flood-unregistered=true

  The following can be used to temporarily allow multicast over vxlan:

  ovs-vsctl set Port patch-tun other_config:mcast-snooping-flood-reports=true

  This will flood reports to br-tun and the other vxlan endpoints will
  learn the remote port.  This allows multicast snooping to work for a
  period of time; however, since there is no IGMP querier to continue to
  solicit IGMP reports once the Age timer expires (300 secs) the traffic
  will be blocked.

  It seems that this solution as suggested will work if only provider
  networking is used.  Is that correct?

  An option that might work would be:

  ovs-vsctl set Bridge br-int mcast_snooping_enable=true
  ovs-vsctl set Bridge br-int other_config:mcast-snooping-disable-flood-unregistered=false  #<--- change to false; default

  Then, for each patch port on br-int:

  ovs-vsctl set Port  other_config:mcast-snooping-flood-reports=true
  ovs-vsctl set Port  other_config:mcast-snooping-flood=true

  This might provide best-effort snooping: multicast isolation where
  IGMP queriers are available, and flooding everywhere else?

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1884723/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1517180] Re: No support for adding custom certificate chains

2019-09-19 Thread Dmitrii Shcherbakov
** Changed in: maas
   Status: Invalid => New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1517180

Title:
  No support for adding custom certificate chains

Status in cloud-init:
  Triaged
Status in curtin:
  Triaged
Status in MAAS:
  New

Bug description:
  In a MAAS behind a proxy that uses a self-signed certificate, when
  machines provisioned using MAAS attempt to contact e.g.
  https://entropy.ubuntu.com, they fail to validate the cert chain.

  Suggested solution borrowed from an email from kirkland:

  On the MAAS administrative configuration page, we should add a small
  section where the MAAS admin can copy/paste/edit any certificate
  chains that they want to add to machines provisioned by MAAS.  These
  certs should then be inserted into /etc/ssl/certs by cloud-init or
  curtin on initial install (depending on the earliest point at which
  the cert might be needed).

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1517180/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1773967] Re: Application credentials can't be used with group-only role assignments

2019-08-06 Thread Dmitrii Shcherbakov
** Also affects: keystone (Ubuntu)
   Importance: Undecided
   Status: New

** Also affects: cloud-archive
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1773967

Title:
  Application credentials can't be used with group-only role assignments

Status in Ubuntu Cloud Archive:
  New
Status in OpenStack Identity (keystone):
  Fix Released
Status in keystone package in Ubuntu:
  New

Bug description:
  If a user only has a role assignment on a project via a group
  membership, the user can create an application credential for the
  project but it cannot be used. If someone tries to use it, the debug
  logs will report:

   User <user> has no access to project <project>

  We need to ensure that any application credential that is created can
  be used so long as it is not expired and the user exists and has
  access to the project they created the application credential for. If
  we decide that application credentials should not be valid for users
  who have no explicit role assignments on projects, then we should
  prevent it from being created and provide a useful message to the
  user.

  This is probably related to
  https://bugs.launchpad.net/keystone/+bug/1589993

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1773967/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1834009] [NEW] Trust API does not support delegating federated roles (roles obtained from federated groups)

2019-06-24 Thread Dmitrii Shcherbakov
Public bug reported:

When a trust is created, the trustor user is required to have a role on
the project in question. This is verified via a call to the keystone
database without looking at roles that can be inferred from federated
groups present in a token.

In this scenario a federated user does not have any direct role
assignments in the Keystone database - only the ones that can be
inferred from federated group membership.

https://opendev.org/openstack/keystone/src/branch/stable/queens/keystone/trust/controllers.py#L141
https://opendev.org/openstack/keystone/src/branch/stable/queens/keystone/trust/controllers.py#L172-L178

A call to /v3/auth/tokens which verifies that "roles" for groups present in 
"OS-FEDERATION" section are properly populated:
http://paste.openstack.org/show/753298/
"roles": [
  {
"id": "e4ab04a7c6ec4c91a826b2a3ba333407",
"domain_id": null,
"name": "Member"
  }
# ...
"user": {
  "OS-FEDERATION": {
"identity_provider": {
  "id": "adfs"
},
"protocol": {
  "id": "mapped"
},
"groups": [
  {
"id": "7594d86688c54ee2aab4c9df020f5468"
  }
]
  },

This bug is similar to this one for application credentials:
https://bugs.launchpad.net/keystone/+bug/1832092

Users, Member role and role assignments:
http://paste.openstack.org/show/753300/

The issue was discovered while troubleshooting "Error: ERROR: Missing
required credential: roles [u'Member']" shown by the heat dashboard
during a stack creation:

http://paste.openstack.org/show/753301/ (heat API rpdb trace where a
Keystone trust API call is made)

Keystone side:
http://paste.openstack.org/show/753302/ (keystone trust API rpdb trace)

** Affects: keystone
 Importance: Undecided
 Status: New


** Tags: cpe-onsite

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1834009

Title:
  Trust API does not support delegating federated roles (roles obtained
  from federated groups)

Status in OpenStack Identity (keystone):
  New

Bug description:
  When a trust is created, the trustor user is required to have a role
  on the project in question. This is verified via a call to the
  keystone database without looking at roles that can be inferred from
  federated groups present in a token.

  In this scenario a federated user does not have any direct role
  assignments in the Keystone database - only the ones that can be
  inferred from federated group membership.

  
https://opendev.org/openstack/keystone/src/branch/stable/queens/keystone/trust/controllers.py#L141
  
https://opendev.org/openstack/keystone/src/branch/stable/queens/keystone/trust/controllers.py#L172-L178

  A call to /v3/auth/tokens which verifies that "roles" for groups present in 
"OS-FEDERATION" section are properly populated:
  http://paste.openstack.org/show/753298/
  "roles": [
{
  "id": "e4ab04a7c6ec4c91a826b2a3ba333407",
  "domain_id": null,
  "name": "Member"
}
  # ...
  "user": {
"OS-FEDERATION": {
  "identity_provider": {
"id": "adfs"
  },
  "protocol": {
"id": "mapped"
  },
  "groups": [
{
  "id": "7594d86688c54ee2aab4c9df020f5468"
}
  ]
},

  This bug is similar to this one for application credentials:
  https://bugs.launchpad.net/keystone/+bug/1832092

  Users, Member role and role assignments:
  http://paste.openstack.org/show/753300/

  The issue was discovered while troubleshooting "Error: ERROR: Missing
  required credential: roles [u'Member']" shown by the heat dashboard
  during a stack creation:

  http://paste.openstack.org/show/753301/ (heat API rpdb trace where a
  Keystone trust API call is made)

  Keystone side:
  http://paste.openstack.org/show/753302/ (keystone trust API rpdb trace)

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1834009/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1832265] Re: py3: inconsistent encoding of token fields

2019-06-22 Thread Dmitrii Shcherbakov
Ran into a related problem during debugging of dashboard errors ("Unable
to retrieve key pairs") with a Rocky cloud & identity federation.

There was no clear indication as to why failures occurred.

https://paste.ubuntu.com/p/v5HXyyWXC2/ (full pdb trace)

At a high level I was getting validation failures for the identity
provider (which was enabled in Keystone and was otherwise correct in
terms of config) in the /v3/auth/token code path.

I narrowed it down to a validation error due to a type mismatch (bytes
vs str):


1) the error occurs in send_notification:

> /usr/lib/python3/dist-packages/keystone/auth/plugins/mapped.py(101)handle_scoped_token()->None
-> send_notification(taxonomy.OUTCOME_SUCCESS)
(Pdb) l
 96 # send off failed authentication notification, raise the 
exception
 97 # after sending the notification
 98 send_notification(taxonomy.OUTCOME_FAILURE)
 99 raise
100 else:
101  -> send_notification(taxonomy.OUTCOME_SUCCESS)

# ...


2) this is how the validation error looks like:

(Pdb) setattr(self, FED_CRED_KEYNAME_IDENTITY_PROVIDER, identity_provider)
*** ValueError: identity_provider failed validation: <function <lambda> at 0x7fa0016ef9d8>


3) the lambda function where the error occurs

 67 class FederatedCredential(Credential):
 68 identity_provider = cadftype.ValidatorDescriptor(
 69 FED_CRED_KEYNAME_IDENTITY_PROVIDER,
 70  -> lambda x: isinstance(x, six.string_types))
 71 user = cadftype.ValidatorDescriptor(
 72 FED_CRED_KEYNAME_USER,
 73 lambda x: isinstance(x, six.string_types))
 74 groups = cadftype.ValidatorDescriptor(
 75 FED_CRED_KEYNAME_GROUPS,


4) type comparison (b'adfs' is the identity provider name):

((Pdb)) x
b'adfs'
((Pdb)) six.string_types
(<class 'str'>,)
((Pdb)) type(x)
<class 'bytes'>
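
For reference, a quick standalone check of the py3 behavior
(six.string_types is just (str,) on Python 3):

    # Standalone illustration of the type mismatch on Python 3.
    import six
    assert six.string_types == (str,)
    assert not isinstance(b'adfs', six.string_types)  # bytes fail validation
    assert isinstance(b'adfs'.decode('utf-8'), six.string_types)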


Using a package from James' PPA helped as I am not getting errors in the
same code-path anymore.

apt policy keystone
keystone:
  Installed: 2:14.1.0-0ubuntu2~ubuntu18.04.1~ppa201906140719
  Candidate: 2:14.1.0-0ubuntu2~ubuntu18.04.1~ppa201906140719
  Version table:
 *** 2:14.1.0-0ubuntu2~ubuntu18.04.1~ppa201906140719 500


When clicking through tabs very fast I encountered a glitch which
results in the following error messages being displayed (see the
screencast in the attachment):

Error: "Unable to retrieve key pairs"/"Unable to retrieve images"/""Unable to 
retrieve server groups"
Warning: "Policy check failed"

I tried to set breakpoints in the same place - the same validation error
does NOT occur with the patch so this is something else unrelated to py2
vs py3 string handling.

** Attachment added: "2019-06-22-16-12-40.mkv"
   
https://bugs.launchpad.net/charm-keystone-ldap/+bug/1832265/+attachment/5272335/+files/2019-06-22-16-12-40.mkv

** Also affects: cloud-archive
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1832265

Title:
  py3: inconsistent encoding of token fields

Status in OpenStack Keystone LDAP integration:
  Invalid
Status in Ubuntu Cloud Archive:
  New
Status in OpenStack Identity (keystone):
  In Progress
Status in keystone package in Ubuntu:
  Fix Released
Status in keystone source package in Cosmic:
  Triaged
Status in keystone source package in Disco:
  Triaged

Bug description:
  When using an LDAP domain user on a bionic-rocky cloud within horizon,
  we are unable to see the projects listed in the project selection
  drop-down, and are unable to query resources from any projects to
  which we are assigned the role Member.

  It appears that the following log entries in keystone may be helpful
  to troubleshooting this issue:

  (keystone.middleware.auth): 2019-06-10 19:47:02,700 DEBUG RBAC: auth_context: 
{'trust_id': None, 'trustor_id': None, 'trustee_id': None, 'domain_id': None, 
'domain_name': None, 'group_ids': [], 'token': , 'user_id': 
b'd4fb94cfa3ce0f7829d76fe44697488e7765d88e29f5a896f57d43caadb0fad4', 
'user_domain_id': '997b3e91271140feb1635eefba7c65a1', 'system_scope': None, 
'project_id': None, 'project_domain_id': None, 'roles': [], 'is_admin_project': 
True, 'service_user_id': None, 'service_user_domain_id': None, 
'service_project_id': None, 'service_project_domain_id': None, 'service_roles': 
[]}
  (keystone.server.flask.application): 2019-06-10 19:47:02,700 DEBUG 
Dispatching request to legacy mapper: /v3/users
  (keystone.server.flask.application): 2019-06-10 19:47:02,700 DEBUG 
SCRIPT_NAME: `/v3`, PATH_INFO: 
`/users/d4fb94cfa3ce0f7829d76fe44697488e7765d88e29f5a896f57d43caadb0fad4/projects`
  (routes.middleware): 2019-06-10 19:47:02,700 DEBUG Matched GET 
/users/d4fb94cfa3ce0f7829d76fe44697488e7765d88e29f5a896f57d43caadb0fad4/projects
  (routes.middleware): 2019-06-10 19:47:02,700 DEBUG Route path: 
'/users/{user_id}/projects', defaults: {'action': 'list_user_projects

[Yahoo-eng-team] [Bug 1832092] [NEW] [rocky+] Creation of application credentials fails when role assignments only come from federated groups

2019-06-08 Thread Dmitrii Shcherbakov
Public bug reported:

[Version]
Rocky (UCA)

[Problem Description]

(see the User Scenario section below for a description of the
environment)

When federated users have no direct role assignments and only federated
group role assignments are present, application credential creation via
Horizon fails with the following errors:

horizon apache2 error.log:

[Sat Jun 08 14:27:59.153479 2019] [wsgi:error] [pid 150327:tid
139962773473024] [remote 10.232.46.207:35898] Recoverable error: Invalid
application credential: Could not find role assignment with role:
91afa82fab85426fa741370dabad80bf, user or group:
794d430997c64060854bf77f2e7e6e16, project, domain, or system:
7de76f768cb84149b8b2d693d1d21f45. (HTTP 400) (Request-ID:
req-da2e3322-2f6f-468f-bd0d-b08855f9893b)

keystone.log:

(keystone.common.wsgi): 2019-06-08 14:30:55,933 WARNING Invalid application 
credential: Could not find role assignment with role: 
91afa82fab85426fa741370dabad80bf, user or group: 
794d430997c64060854bf77f2e7e6e16, project, domain, or system: 
7de76f768cb84149b8b2d693d1d21f45.
(keystone.middleware.auth): 2019-06-08 14:31:00,940 DEBUG Authenticating user 
token


Code-path:

create_application_credential -> _require_user_has_role_in_project ->
_get_user_roles -> _get_user_roles -> list_role_assignments ->
_list_effective_role_assignments -> _get_group_ids_for_user_id ->
list_groups_for_user -> _get_group_ids_for_user_id

A detailed rpdb trace:
http://paste.openstack.org/show/752652/


 82 def _require_user_has_role_in_project(self, roles, user_id, 
project_id):
 83 user_roles = self._get_user_roles(user_id, project_id)
 84  -> for role in roles:
 85 if role['id'] not in user_roles:
 86 raise 
exception.RoleAssignmentNotFound(role_id=role['id'],
 87actor_id=user_id,
 88
target_id=project_id)


[Possible Solution]

Group membership details obtained dynamically during federated
authentication and embedded into a fernet token (first an unscoped
token, then a project-scoped token) need to be used in addition to
querying the database for user-to-group membership.
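
A minimal sketch of the idea (hypothetical names, not the actual Keystone
code): group IDs carried in the token's OS-FEDERATION section would be merged
with the DB-backed group lookup before computing effective roles:

    # Hedged sketch; both arguments are hypothetical stand-ins for data that
    # Keystone has at hand (DB-backed groups and token-carried groups).
    def effective_group_ids(db_group_ids, federated_groups):
        group_ids = set(db_group_ids)  # memberships persisted in the DB
        # Federated group memberships only live in the token, not in the DB:
        group_ids.update(g['id'] for g in federated_groups)
        return group_ids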

[User Scenario]

Federated authentication via SAML with the following mapping (i.e. no
direct role assignment to a user on a project - only federated group-
based role assignment):

openstack mapping show adfs_mapping
+-------+----------------------------------------------------------------------+
| Field | Value                                                                |
+-------+----------------------------------------------------------------------+
| id    | adfs_mapping                                                         |
| rules | [{'remote': [{'type': 'MELLON_NAME_ID'}, {'type': 'MELLON_groups'}], |
|       | 'local': [{'domain': {'id': 'e834e57943714e058c203d4f544ea946'},     |
|       | 'user': {'name': '{0}'}, 'groups': '{1}'}]}]                         |
+-------+----------------------------------------------------------------------+

# a federated user
openstack user list --domain adfs
+----------------------------------+------------------------+
| ID                               | Name                   |
+----------------------------------+------------------------+
| 794d430997c64060854bf77f2e7e6e16 | intranet\Administrator |
+----------------------------------+------------------------+

# a group that that exists both on the IdP and Keystone (SP) side
openstack group list --domain adfs
+----------------------------------+------------+
| ID                               | Name       |
+----------------------------------+------------+
| 701f70e7549d4de28cecd60127a1a444 | adfs_users |
+----------------------------------+------------+

# grouptest is a project that adfs_users group members get a Member role 
assignment on
openstack project list --domain adfs
+----------------------------------+-----------+
| ID                               | Name      |
+----------------------------------+-----------+
| 7de76f768cb84149b8b2d693d1d21f45 | grouptest |
| 6a0657cf98684a62af99dc7b71a383dd | test      |
+----------------------------------+-----------+

# no direct Member role assignments for federated users 
openstack role assignment list --names
++--+-+-+--++---+
| Role   | U

[Yahoo-eng-team] [Bug 1828126] [NEW] [<= Queens] With token-provider='uuid', roles of dynamically obtained federated groups are not taken into account during token-based authentication (for project-sc

2019-05-07 Thread Dmitrii Shcherbakov
Public bug reported:

[Overview]
The relevant part of the federated authentication process after the IdP and SP 
token parsing stages is as follows:

1) WSGI environment variables created based on token attributes (e.g. SAML 
token attributes) are passed down to Keystone;
2) Keystone creates a shadow mapped user in the db and tries to map token 
attributes to objects such as groups, roles and projects in the DB based on a 
custom mapping created by an operator;
3) groups that may be obtained from token attributes are matched against groups 
in Keystone, but the user is not included in those groups in the Keystone DB (to 
support dynamic group membership changes at the IdP side). If any of the target 
groups do not exist in Keystone, authentication fails;
4) A domain-scoped federated token is created (e.g. by Horizon) and then a 
project-scoped token is created using the previous token as the authentication 
method.

(4) is where the problem occurs.
 
[Environment]
Queens, 19.04 charms, token-provider='uuid' for charm-keystone.

openstack commands used to configure an IdP:
https://paste.ubuntu.com/p/nj6MdQDKk2/

keystone.conf sections:
[auth]
methods = external,password,token,oauth1,totp,application_credential,saml2
[federation]
trusted_dashboard = https://dashboard.maas/auth/websso/
[saml2]
remote_id_attribute = MELLON_IDP

IdP is ADFS in this case which uses a windows account name as NAMEID and adds 
an attribute which corresponds to a group ID (the group name in Active 
Directory is the OpenStack group ID). The resulting SAML token then contains 
the following elements:
   
3f031869ef9f4dc49a342d6be69e98b3  


The direct usage of a group ID is present to rule out group name to ID
resolution problems.

[Use-case]

Automatic project provisioning and Member role assignment to users is
deliberately not used, so that user access to projects is managed via
group-to-role assignments. A user is assigned to a group at the IdP side
and the keystone database does not contain any role assignments for
shadow-mapped users. `openstack role assignment list --names` will not
contain anything related to group assignments - all group membership
information will only be exposed in a token.

[Problem Description]

1) the first token (federated, obtained via v3 federation API) is domain-scoped 
and authentication succeeds for it;
2) then a client (e.g. Horizon) gets a project-scoped token based on that 
federated token (token authentication & regular v3 API) for which roles need to 
be populated - including the roles to access the target project;
3) the roles for the second token are not populated correctly based on the 
(dynamic) group assignments that came with the SAML token for the first token - 
clearly the role population code-path for the second token is not aware of 
groups that came dynamically with the SAML token. The expected result would be 
awareness of groups assigned to the shadow-mapped user and then inference of 
roles from groups based on group-to-role assignments in the Keystone DB. This 
explains the fact that project auto-provisioning and project role assignment 
directly to shadow users works properly (because this can be queried by 
keystone from its db).

The visible end result for a user authenticating via the dashboard is a
series of errors such as "Unauthorized: Unable to..." on any accessed
dashboard pane.

[Symptoms]

Example: https://paste.ubuntu.com/p/syxxWmdyD7/
(keystone.token.provider): 2019-05-07 17:47:01,947 DEBUG Unable to validate 
token: The request you have made requires authentication.

Project-scoped token example (contains the right group and "methods": ["token", 
"saml2"]) as queried directly from the db:
https://paste.ubuntu.com/p/rRgXSctgWT/

rpdb trace - first pass at finding where it fails:
https://paste.ubuntu.com/p/DhG4HXCnBB/

Second pass (the most useful) - a trace point in 
keystone/token/providers/common.py get_token_data() going down to 
keystone/token/providers/common.py(432)_populate_roles() where the Unauthorized 
exception is thrown:
https://paste.ubuntu.com/p/pjRf7qBzcX/


[Root Cause]

Based on the symptoms it is clear that _populate_roles (unlike
populate_roles_for_federated_user) does not include group roles for
groups obtained via federated authentication:

https://opendev.org/openstack/keystone/src/branch/stable/queens/keystone/token/providers/common.py#L408-L432
(_populate_roles)

https://opendev.org/openstack/keystone/src/branch/stable/queens/keystone/token/providers/common.py#L168-L193
(_get_roles_for_user, has a branch to work with group roles but for
system-scoped tokens only)

https://opendev.org/openstack/keystone/src/branch/stable/queens/keystone/token/providers/common.py#L190-L193
(get_roles_for_user_and_project gets user to role assignments which are
not present in this case)

Which in the end leads to exception.Unauthorized being thrown by
Keystone
https://opendev.org/openstack/keystone/src/branch/stable/queens/keystone/token/providers/common.py#L416
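
For illustration, a minimal sketch of the missing step (the helper below and
the exact assignment API signatures are assumptions made for this report,
not keystone code - the real fix may look different). The idea is to resolve
roles not only from direct user assignments but also from the group IDs
carried by the federated token:

    # Hedged sketch: combine direct user-to-role assignments with roles
    # granted to the groups that came with the SAML assertion. Group-to-role
    # grants do live in the keystone DB, so they remain queryable even
    # though shadow-mapped users have no direct assignments there.
    def roles_for_federated_project_scope(assignment_api, user_id,
                                          project_id, federated_group_ids):
        role_ids = set(assignment_api.get_roles_for_user_and_project(
            user_id, project_id))
        if federated_group_ids:
            # signature/return type simplified for the sketch
            grants = assignment_api.get_roles_for_groups(
                federated_group_ids, project_id=project_id)
            role_ids.update(grant['id'] for grant in grants)
        return role_ids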

[Yahoo-eng-team] [Bug 1774710] Re: DHCP agent doesn't do anything with a network's dns_domain attribute

2019-04-23 Thread Dmitrii Shcherbakov
** Also affects: neutron (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1774710

Title:
  DHCP agent doesn't do anything with a network's dns_domain attribute

Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  New

Bug description:
  0) Set up Neutron with ML2/OVS or LB, or anything that uses the DHCP agent
  1) Create a network with dns_domain
  2) Boot a VM on it

  Notice the VM doesn't have the DNS domain in its /etc/resolv.conf

  In short, per-network DNS domains are not respected by the DHCP agent.
  The dns_domain attribute is persisted in the Neutron DB and passed on
  to the DHCP agent via RPC, but the agent doesn't do anything with it.

  Versions:
  Master and all previous versions.

  WIP fix is in https://review.openstack.org/#/c/571546.
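
  For illustration, the direction such a fix can take (a sketch under assumed
  names, not the actual patch - see the review above): when building the
  dnsmasq command line, prefer the network's own dns_domain over the
  agent-wide dns_domain setting:

      # hedged sketch; 'network' stands for the RPC payload the agent gets
      def _build_domain_args(network, agent_conf):
          domain = getattr(network, 'dns_domain', None) or agent_conf.dns_domain
          if domain:
              # dnsmasq hands this out as DHCP option 15 (domain-name)
              return ['--domain=%s' % domain]
          return []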

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1774710/+subscriptions



[Yahoo-eng-team] [Bug 1763608] Re: Netplan ignores Interfaces without IP Addresses

2019-02-04 Thread Dmitrii Shcherbakov
I do not think Neutron is related in any way here by the way because it
is not responsible for bringing OVS bridge interface links up => moving
to invalid for Neutron.

** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1763608

Title:
  Netplan ignores Interfaces without IP Addresses

Status in kolla:
  Invalid
Status in netplan:
  New
Status in neutron:
  Invalid

Bug description:
  The "manual" method in /etc/network/interfaces resulted in an
  interface being brought up, but not having an IP address assigned.

  When configuring an Interface without an IP Address, netplan ignores
  the interface instead of bringing it up.

  ---
  network:
    version: 2
    renderer: networkd
    ethernets:
      eth1: {}

  Expected result from `netplan apply`: eth1 is brought up.
  Actual result: eth1 is still down.

  Similarly `netplan generate` does not generate any file in
  /run/systemd/network for eth1.
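
  For comparison, a systemd-networkd unit that does bring the link up without
  an address looks roughly like this (a hand-written example of what one
  would expect `netplan generate` to emit; the file name is illustrative):

    # /run/systemd/network/10-netplan-eth1.network
    [Match]
    Name=eth1

    [Network]
    # left empty on purpose: once networkd manages a link it brings it up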

To manage notifications about this bug go to:
https://bugs.launchpad.net/kolla/+bug/1763608/+subscriptions



[Yahoo-eng-team] [Bug 1783654] Re: DVR process flow not installed on physical bridge for shared tenant network

2018-08-24 Thread Dmitrii Shcherbakov
** Also affects: cloud-archive
   Importance: Undecided
   Status: New

** Also affects: neutron (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1783654

Title:
  DVR process flow not installed on physical bridge for shared tenant
  network

Status in Ubuntu Cloud Archive:
  New
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  New

Bug description:
  Seems like collateral from
  https://bugs.launchpad.net/neutron/+bug/1751396

  In DVR, the distributed gateway port's IP and MAC are shared in the
  qrouter across all hosts.

  The dvr_process_flow on the physical bridge (which replaces the shared
  router_distributed MAC address with the unique per-host MAC when its
  the source), is missing, and so is the drop rule which instructs the
  bridge to drop all traffic destined for the shared distributed MAC.

  Because of this, we are seeing the router MAC on the network
  infrastructure, causing it to flap on br-int on every compute host:

  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11     4  fa:16:3e:42:a2:ec    1
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11     4  fa:16:3e:42:a2:ec    2
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1     4  fa:16:3e:42:a2:ec    1
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11     4  fa:16:3e:42:a2:ec    0

  
  Where port 1 is phy-br-vlan, connecting to the physical bridge, and port 11
is the correct local qr-interface. Because these DVR flows are missing on
br-vlan, packets with the shared MAC as their source ingress into the host and
br-int learns the MAC on the upstream port.

  
  The symptom is that when pinging a VM's floating IP we see occasional packet
loss (10-30%), and sometimes the responses are sent upstream by br-int instead
of the qrouter, so the ICMP replies come with the fixed IP of the replier
(since no NATing took place) and on the tenant network rather than the
external network.

  When I force net_shared_only to False here, the problem goes away:
  
https://github.com/openstack/neutron/blob/stable/pike/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L436

  It should be noted we *ONLY* need to do this on our dvr_snat host. The
  dvr_process flows are missing on every compute host. But if we shut down
  the qrouter on the snat host, FIP functionality works and the DVR MAC stops
  flapping on the others. Or if we apply the fix only to the snat host, it
  works. Perhaps there is something unique about the SNAT node.
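
  For reference, the missing dvr_process entries on the physical bridge would
  look roughly like the following in `ovs-ofctl dump-flows br-vlan` output
  (table numbers, VLAN tag and the per-host MAC are illustrative, not taken
  from this deployment):

    table=1 priority=4,dl_vlan=4,dl_src=fa:16:3e:42:a2:ec
        actions=mod_dl_src:<per-host unique MAC>,resubmit(,2)
    table=3 priority=2,dl_vlan=4,dl_dst=fa:16:3e:42:a2:ec actions=drop

  The first entry rewrites the shared distributed MAC to the host-unique MAC
  on egress; the second drops traffic destined to the shared MAC so it never
  leaks to (or flaps on) the physical network.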

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1783654/+subscriptions



[Yahoo-eng-team] [Bug 1751396] Re: DVR: Inter Tenant Traffic between two networks and connected through a shared network not reachable with DVR routers

2018-04-18 Thread Dmitrii Shcherbakov
** Also affects: neutron (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1751396

Title:
  DVR: Inter Tenant Traffic between two networks and connected through a
  shared network not reachable with DVR routers

Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  New

Bug description:
  Inter-tenant traffic between two tenants on two different private
  networks connected through a common shared network (created by the admin)
  is not routable through DVR routers.

  Steps to reproduce it:

  (NOTE: no external network, just a shared network)
  This is only reproducible in a multinode scenario (1 controller, 2 computes).
  Make sure that the two VMs are isolated on two different computes.

  openstack network create --share shared_net

  openstack subnet create shared_net_sn --network shared_net --subnet-
  range 172.168.10.0/24

  
  openstack network create net_A
  openstack subnet create net_A_sn --network net_A --subnet-range 10.1.0.0/24

  
  openstack network create net_B
  openstack subnet create net_B_sn --network net_B --subnet-range 10.2.0.0/24

  
  openstack router create router_A

  openstack port create --network=shared_net --fixed-ip 
subnet=shared_net_sn,ip-address=172.168.10.20 port_router_A_shared_net
  openstack router add port router_A port_router_A_shared_net
  openstack router add subnet router_A net_A_sn

  openstack router create router_B
  openstack port create --network=shared_net --fixed-ip 
subnet=shared_net_sn,ip-address=172.168.10.30 port_router_B_shared_net
  openstack router add port router_B port_router_B_shared_net
  openstack router add subnet router_B net_B_sn

  openstack server create server_A --flavor m1.tiny --image cirros --nic 
net-id=net_A
  openstack server create server_B --flavor m1.tiny --image cirros --nic 
net-id=net_B

  Add static routes to the routers (each router needs a route to the other
  tenant subnet via the other router's port on the shared network):
  openstack router set router_A --route destination=10.2.0.0/24,gateway=172.168.10.30
  openstack router set router_B --route destination=10.1.0.0/24,gateway=172.168.10.20

  Ping from one instance to the other times out

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1751396/+subscriptions



[Yahoo-eng-team] [Bug 1759971] Re: [dvr][fast-exit] a route to a tenant network does not get created in fip namespace if an external network is attached after a tenant network have been attached (race

2018-04-06 Thread Dmitrii Shcherbakov
Affects Pike and Queens UCA.

** Also affects: neutron (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1759971

Title:
  [dvr][fast-exit] a route to a tenant network does not get created in
  fip namespace if an external network is attached after a tenant
  network have been attached (race condition)

Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  New

Bug description:
  Overall, similar scenario to
  https://bugs.launchpad.net/neutron/+bug/1759956 but a different
  problem.

  Relevant agent config options:
  http://paste.openstack.org/show/718418/

  OpenStack Queens from UCA (xenial, GA kernel, deployed via OpenStack
  charms), 2 external subnets (one routed provider network), 1 tenant
  subnet, all subnets in the same address scope to trigger "fast exit"
  logic.

  Tenant subnet cidr: 192.168.100.0/24

  openstack address scope create dev
  openstack subnet pool create --address-scope dev --pool-prefix 10.232.40.0/21 
--pool-prefix 10.232.16.0/21 dev
  openstack subnet pool create --address-scope dev --pool-prefix 
192.168.100.0/24 tenant
  openstack network create --external --provider-physical-network physnet1 
--provider-network-type flat pubnet
  openstack network segment set --name segment1 
d8391bfb-4466-4a45-972c-45ffcec9f6bc
  openstack network segment create --physical-network physnet2 --network-type 
flat --network pubnet segment2
  openstack subnet create --no-dhcp --subnet-pool dev --subnet-range 
10.232.16.0/21 --allocation-pool start=10.232.17.0,end=10.232.17.255 
--dns-nameserver 10.232.36.101 --ip-version 4 --network pubnet 
--network-segment segment1 pubsubnetl1
  openstack subnet create --gateway 10.232.40.100 --no-dhcp --subnet-pool dev 
--subnet-range 10.232.40.0/21 --allocation-pool 
start=10.232.41.0,end=10.232.41.255 --dns-nameserver 10.232.36.101 --ip-version 
4 --network pubnet --network-segment segment2 pubsubnetl2
  openstack network create --internal --provider-network-type vxlan tenantnet
   openstack subnet create --dhcp --ip-version 4 --subnet-range 
192.168.100.0/24 --subnet-pool tenant --dns-nameserver 10.232.36.101 --network 
tenantnet tenantsubnet

  # ---
  # Works in this order when an external network is attached first

  openstack router create --disable --no-ha --distributed pubrouter
  openstack router set --disable-snat --external-gateway pubnet --enable 
pubrouter

  openstack router add subnet pubrouter tenantsubnet

  2018-03-29 23:30:48.933 2050638 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'fip-d0f008fc-dc45-4237-9ce0-a9e1977735eb', 'ip', '-4', 'route', 'replace', '192.168.100.0/24', 'via', '169.254.106.114', 'dev', 'fpr-09fd1424-7'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:92

  # --
  # Doesn't work the other way around - as a fip namespace does not get created 
before a tenant network is attached
  openstack router create --disable --no-ha --distributed pubrouter

  openstack router add subnet pubrouter tenantsubnet
  openstack router set --disable-snat --external-gateway pubnet --enable 
pubrouter

  # to "fix" this we need to re-trigger the right code path

  openstack router remove subnet pubrouter tenantsubnet
  openstack router add subnet pubrouter tenantsubnet

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1759971/+subscriptions



[Yahoo-eng-team] [Bug 1761591] [NEW] [dvr] enable_snat attribute is ignored - centralized snat port gets created

2018-04-05 Thread Dmitrii Shcherbakov
Public bug reported:

OpenStack Queens from UCA (xenial, GA kernel), 2 external subnets (one
routed provider network), 1 tenant subnet added to a router.

Tenant subnet cidr: 192.168.100.0/24

Relevant agent configs:
http://paste.openstack.org/show/718514/

Commands and outputs:
http://paste.openstack.org/show/rww2iliACb81IbZDUQ9g/

Although a router is created with --disable-snat and enable_snat is
shown as set to "false"

openstack router set --disable-snat --external-gateway pubnet --enable
pubrouter

a centralized snat port is still created for that router:

| device_owner  | network:router_centralized_snat


I suspect this is because _create_snat_interfaces_after_change does not take 
enable_snat into account:
https://github.com/openstack/neutron/blob/stable/queens/neutron/db/l3_dvr_db.py#L160-L168


Additionally, when agent mode is dvr_snat an snat-<router_id> network 
namespace gets created unconditionally by virtue of DvrEdgeRouter usage:

https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/agent.py#L343-L347
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/dvr_edge_router.py#L32-L33

It seems that right now there is a tight dependency on having a dvr_snat
node in a deployment so even if only fast exit(/entry) functionality is
intended to be used, there is no way to completely disable SNAT.

A gateway port is still required to be bound to a dvr_snat node,
however, DvrEdgeRouter could operate differently depending on whether
enable_snat is actually true (to handle updates to this attribute). In
this case a router_centralized_snat port and an snat namespace would
only be created on addition of external gateway information with
enable_snat or on updates that set enable_snat to true.
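
For illustration, the suspected missing guard would look roughly like this
(a simplified sketch against l3_dvr_db.py written for this report, not the
actual patch; the helper it delegates to is a stand-in name):

    # hedged sketch: skip SNAT plumbing when the gateway disables SNAT
    def _create_snat_interfaces_after_change(self, context, router_id,
                                             router_db):
        gw_info = router_db.get('external_gateway_info') or {}
        if not gw_info.get('enable_snat', True):
            # router created/updated with --disable-snat: do not create
            # the network:router_centralized_snat port or the
            # snat-<router_id> namespace
            return
        # hypothetical helper standing in for the existing behaviour
        self._create_snat_interfaces(context, router_db)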

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: cpe-onsite

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1761591

Title:
  [dvr] enable_snat attribute is ignored - centralized snat port gets
  created

Status in neutron:
  New

Bug description:
  OpenStack Queens from UCA (xenial, GA kernel), 2 external subnets (one
  routed provider network), 1 tenant subnet added to a router.

  Tenant subnet cidr: 192.168.100.0/24

  Relevant agent configs:
  http://paste.openstack.org/show/718514/

  Commands and outputs:
  http://paste.openstack.org/show/rww2iliACb81IbZDUQ9g/

  Although a router is created with --disable-snat and enable_snat is
  shown as set to "false"

  openstack router set --disable-snat --external-gateway pubnet --enable
  pubrouter

  a centralized snat port is still created for that router:

  | device_owner  | network:router_centralized_snat

  
  I suspect this is because _create_snat_interfaces_after_change does not take 
enable_snat into account:
  
https://github.com/openstack/neutron/blob/stable/queens/neutron/db/l3_dvr_db.py#L160-L168

  
  Additionally, when agent mode is dvr_snat an snat-<router_id> network 
namespace gets created unconditionally by virtue of DvrEdgeRouter usage:

  
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/agent.py#L343-L347
  
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/dvr_edge_router.py#L32-L33

  It seems that right now there is a tight dependency on having a
  dvr_snat node in a deployment so even if only fast exit(/entry)
  functionality is intended to be used, there is no way to completely
  disable SNAT.

  A gateway port is still required to be bound to a dvr_snat node,
  however, DvrEdgeRouter could operate differently depending on whether
  enable_snat is actually true (to handle updates to this attribute). In
  this case a router_centralized_snat port and an snat namespace would
  only be created on addition of external gateway information with
  enable_snat or on updates that set enable_snat to true.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1761591/+subscriptions



[Yahoo-eng-team] [Bug 1759956] Re: [dvr][fast-exit] incorrect policy rules get deleted when a distributed router has ports on multiple tenant networks

2018-04-05 Thread Dmitrii Shcherbakov
Affects pike and queens UCA packages.

** Also affects: neutron (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: neutron (Ubuntu)
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1759956

Title:
  [dvr][fast-exit] incorrect policy rules get deleted when a distributed
  router has ports on multiple tenant networks

Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Confirmed

Bug description:
  TL;DR: ip -4 rule del priority <priority> table <table> type unicast will
  delete the first matching rule it encounters: if there are two rules with
  the same priority it will just kill the first one it finds.

  The original setup is described here:
  https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1759918

  OpenStack Queens from UCA (xenial, GA kernel, deployed via OpenStack
  charms), 2 external subnets (one routed provider network), 2 tenant
  subnets all in the same address scope to trigger "fast exit".

  2 tenant networks attached (subnets 192.168.100.0/24 and
  192.168.200.0/24) to a DVR:

  # 2 rules as expected
  ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule
  0:  from all lookup local 
  32766:  from all lookup main 
  32767:  from all lookup default 
  8:  from 192.168.100.0/24 lookup 16 
  8:  from 192.168.200.0/24 lookup 16 

  # remove 192.168.200.0/24 sometimes deletes an incorrect policy rule
  openstack router remove subnet pubrouter othertenantsubnet

  # ip route del contains the cidr
  2018-03-29 20:09:52.946 2083594 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'fip-d0f008fc-dc45-4237-9ce0-a9e1977735eb', 'ip', '-4', 'route', 'del', '192.168.200.0/24', 'via', '169.254.93.94', 'dev', 'fpr-4f9ca9ef-3'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:92

  # ip rule delete is not that specific
  2018-03-29 20:09:53.195 2083594 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800', 'ip', '-4', 'rule', 'del', 'priority', '8', 'table', '16', 'type', 'unicast'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:92

  
  2018-03-29 20:15:59.210 2083594 DEBUG neutron.agent.linux.utils [-] Running 
command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 
'netns', 'exec', 'qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800', 'ip', '-4', 
'rule', 'show'] create_process 
/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:92
  2018-03-29 20:15:59.455 2083594 DEBUG neutron.agent.linux.utils [-] Running 
command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 
'netns', 'exec', 'qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800', 'ip', '-4', 
'rule', 'add', 'from', '192.168.100.0/24', 'priority', '8', 'table', '16', 
'type', 'unicast'] create_process 
/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:92

  

  ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule
  0:  from all lookup local 
  32766:  from all lookup main 
  32767:  from all lookup default 
  8:  from 192.168.100.0/24 lookup 16 
  8:  from 192.168.200.0/24 lookup 16 

  # try to delete a rule manually to see what is going on

  ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule ; ip netns 
exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip -4 rule del priority 8 
table 16 type unicast ; ip netns exec 
qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule
  0:  from all lookup local 
  32766:  from all lookup main 
  32767:  from all lookup default 
  8:  from 192.168.100.0/24 lookup 16 
  8:  from 192.168.200.0/24 lookup 16 

  0:  from all lookup local 
  32766:  from all lookup main 
  32767:  from all lookup default 
  8:  from 192.168.200.0/24 lookup 16 

  # ^^ 192.168.100.0/24 rule got deleted instead of 192.168.200.0/24

  # add the rule back manually
  ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule add from 
192.168.100.0/24 priority 8 table 16 type unicast

  # different order now - 192.168.200.0/24 is first
  ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule
  0:  from all lookup local 
  32766:  from all lookup main 
  32767:  from all lookup default 
  8:  from 192.168.200.0/24 lookup 16 
  8:  from 192.168.100.0/24 lookup 16 

  # now 192.168.200.0/24 got deleted because it was first to match

  ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule ; ip netns 
exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip -4 rule del priority 8 
table 16 type unicast ; ip netns exec 
qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule
  0:  from all lookup l

[Yahoo-eng-team] [Bug 1761555] [NEW] [dvr][fast-exit] router add/remove subnet operations are not idempotent

2018-04-05 Thread Dmitrii Shcherbakov
Public bug reported:

OpenStack Queens from UCA (xenial, GA kernel), 2 external subnets (one
routed provider network), 2 tenant subnets, all subnets in the same
address scope to trigger "fast exit" logic.

Tenant subnet cidr: 192.168.100.0/24
Other tenant subnet cidr: 192.168.200.0/24

Relevant agent configs:
http://paste.openstack.org/show/718514/

Commands and outputs:
http://paste.openstack.org/show/JFYmGJwF1pdtliQOfXgd/

Overall, a similar situation as with
https://bugs.launchpad.net/neutron/+bug/1759956 but with one tenant
subnet at first for which routes and rules do not get deleted at all.

Problem description:

* router add subnet tenantsubnet
* routes in fip namespace and rules in qrouter namespace get created and a 
distributed port gets created for DVR;
* router remove subnet tenantsubnet
* routes are still there, no new logged events in DVR l3 agent logs

If two networks are added then removing one of them triggers removal of
routes and rules and new messages are logged in l3 agent log (the rules
removed are affected by pad.lv/1759956).

A sequence of add subnet/remove subnet commands may result in errors
logged in l3 agent logs: http://paste.openstack.org/show/718511/


Sometimes, after re-adding tenantsubnet in the presence of othertenantsubnet, a 
proper route is added for a few seconds but then removed:

# just do some operations
(openstack) router add subnet pubrouter tenantsubnet
(openstack) router add subnet pubrouter othertenantsubnet
(openstack) router add subnet pubrouter tenantsubnet
(openstack) router add subnet pubrouter tenantsubnet
(openstack) router remove subnet pubrouter tenantsubnet

# lots of errors, see http://paste.openstack.org/show/718511/

# try again without restarting agents
(openstack) router add subnet pubrouter tenantsubnet # ran client command

# ... got 192.168.100.0/24 here for a few seconds while l3 agent was doing 
something
10.232.16.0/21 dev fg-7f42af4f-ad  proto kernel  scope link  src 10.232.17.5 
169.254.106.114/31 dev fpr-3182a7c6-b  proto kernel  scope link  src 
169.254.106.115 
192.168.100.0/24 via 169.254.106.114 dev fpr-3182a7c6-b 
192.168.200.0/24 via 169.254.106.114 dev fpr-3182a7c6-b 

# the server and l3 agent finished processing "router add subnet pubrouter 
tenantsubnet"
# route got deleted
root@ipotane:~# ip netns exec fip-64ab1ec3-4927-4f09-87f9-804e7f4f8748 ip r
10.232.16.0/21 dev fg-7f42af4f-ad  proto kernel  scope link  src 10.232.17.5 
169.254.106.114/31 dev fpr-3182a7c6-b  proto kernel  scope link  src 
169.254.106.115 
192.168.200.0/24 via 169.254.106.114 dev fpr-3182a7c6-b  

Something seems wrong with how tenant network add/remove notifications are
sent, because on the first removal of a tenant network nothing is logged in
l3 agent logs even though there is activity in neutron server logs.

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: cpe-onsite

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1761555

Title:
  [dvr][fast-exit] router add/remove subnet operations are not
  idempotent

Status in neutron:
  New

Bug description:
  OpenStack Queens from UCA (xenial, GA kernel), 2 external subnets (one
  routed provider network), 2 tenant subnets, all subnets in the same
  address scope to trigger "fast exit" logic.

  Tenant subnet cidr: 192.168.100.0/24
  Other tenant subnet cidr: 192.168.200.0/24

  Relevant agent configs:
  http://paste.openstack.org/show/718514/

  Commands and outputs:
  http://paste.openstack.org/show/JFYmGJwF1pdtliQOfXgd/

  Overall, a similar situation as with
  https://bugs.launchpad.net/neutron/+bug/1759956 but with one tenant
  subnet at first for which routes and rules do not get deleted at all.

  Problem description:

  * router add subnet tenantsubnet
  * routes in fip namespace and rules in qrouter namespace get created and a 
distributed port gets created for DVR;
  * router remove subnet tenantsubnet
  * routes are still there, no new logged events in DVR l3 agent logs

  If two networks are added then removing one of them triggers removal
  of routes and rules and new messages are logged in l3 agent log (the
  rules removed are affected by pad.lv/1759956).

  A sequence of add subnet/remove subnet commands may result in errors
  logged in l3 agent logs: http://paste.openstack.org/show/718511/

  
  Sometimes after re-adding a tenantsubnet in presence of othertenantsubnet a 
proper route is added for a few seconds but then removed:

  # just do some operations
  (openstack) router add subnet pubrouter tenantsubnet
  (openstack) router add subnet pubrouter othertenantsubnet
  (openstack) router add subnet pubrouter tenantsubnet
  (openstack) router add subnet pubrouter tenantsubnet
  (openstack) router remove subnet pubrouter tenantsubnet

  # lots of errors, see http://paste.openstack.org/show/718511/

  # try again without restarting agents
  (openstack) router 

[Yahoo-eng-team] [Bug 1761556] [NEW] [dvr][fast-exit] router add/remove subnet operations are not idempotent

2018-04-05 Thread Dmitrii Shcherbakov
Public bug reported:

OpenStack Queens from UCA (xenial, GA kernel), 2 external subnets (one
routed provider network), 2 tenant subnets, all subnets in the same
address scope to trigger "fast exit" logic.

Tenant subnet cidr: 192.168.100.0/24
Other tenant subnet cidr: 192.168.200.0/24

Relevant agent configs:
http://paste.openstack.org/show/718514/

Commands and outputs:
http://paste.openstack.org/show/JFYmGJwF1pdtliQOfXgd/

Overall, a similar situation as with
https://bugs.launchpad.net/neutron/+bug/1759956 but with one tenant
subnet at first for which routes and rules do not get deleted at all.

Problem description:

* router add subnet tenantsubnet
* routes in fip namespace and rules in qrouter namespace get created and a 
distributed port gets created for DVR;
* router remove subnet tenantsubnet
* routes are still there, no new logged events in DVR l3 agent logs

If two networks are added then removing one of them triggers removal of
routes and rules and new messages are logged in l3 agent log (the rules
removed are affected by pad.lv/1759956).

A sequence of add subnet/remove subnet commands may result in errors
logged in l3 agent logs: http://paste.openstack.org/show/718511/


Sometimes, after re-adding tenantsubnet in the presence of othertenantsubnet, a 
proper route is added for a few seconds but then removed:

# just do some operations
(openstack) router add subnet pubrouter tenantsubnet
(openstack) router add subnet pubrouter othertenantsubnet
(openstack) router add subnet pubrouter tenantsubnet
(openstack) router add subnet pubrouter tenantsubnet
(openstack) router remove subnet pubrouter tenantsubnet

# lots of errors, see http://paste.openstack.org/show/718511/

# try again without restarting agents
(openstack) router add subnet pubrouter tenantsubnet # ran client command

# ... got 192.168.100.0/24 here for a few seconds while l3 agent was doing 
something
10.232.16.0/21 dev fg-7f42af4f-ad  proto kernel  scope link  src 10.232.17.5 
169.254.106.114/31 dev fpr-3182a7c6-b  proto kernel  scope link  src 
169.254.106.115 
192.168.100.0/24 via 169.254.106.114 dev fpr-3182a7c6-b 
192.168.200.0/24 via 169.254.106.114 dev fpr-3182a7c6-b 

# the server and l3 agent finished processing "router add subnet pubrouter 
tenantsubnet"
# route got deleted
root@ipotane:~# ip netns exec fip-64ab1ec3-4927-4f09-87f9-804e7f4f8748 ip r
10.232.16.0/21 dev fg-7f42af4f-ad  proto kernel  scope link  src 10.232.17.5 
169.254.106.114/31 dev fpr-3182a7c6-b  proto kernel  scope link  src 
169.254.106.115 
192.168.200.0/24 via 169.254.106.114 dev fpr-3182a7c6-b  

Something seems wrong with how tenant network add/remove notifications are
sent, because on the first removal of a tenant network nothing is logged in
l3 agent logs even though there is activity in neutron server logs.

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: cpe-onsite

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1761556

Title:
  [dvr][fast-exit] router add/remove subnet operations are not
  idempotent

Status in neutron:
  New

Bug description:
  OpenStack Queens from UCA (xenial, GA kernel), 2 external subnets (one
  routed provider network), 2 tenant subnets, all subnets in the same
  address scope to trigger "fast exit" logic.

  Tenant subnet cidr: 192.168.100.0/24
  Other tenant subnet cidr: 192.168.200.0/24

  Relevant agent configs:
  http://paste.openstack.org/show/718514/

  Commands and outputs:
  http://paste.openstack.org/show/JFYmGJwF1pdtliQOfXgd/

  Overall, a similar situation as with
  https://bugs.launchpad.net/neutron/+bug/1759956 but with one tenant
  subnet at first for which routes and rules do not get deleted at all.

  Problem description:

  * router add subnet tenantsubnet
  * routes in fip namespace and rules in qrouter namespace get created and a 
distributed port gets created for DVR;
  * router remove subnet tenantsubnet
  * routes are still there, no new logged events in DVR l3 agent logs

  If two networks are added then removing one of them triggers removal
  of routes and rules and new messages are logged in l3 agent log (the
  rules removed are affected by pad.lv/1759956).

  A sequence of add subnet/remove subnet commands may result in errors
  logged in l3 agent logs: http://paste.openstack.org/show/718511/

  
  Sometimes after re-adding a tenantsubnet in presence of othertenantsubnet a 
proper route is added for a few seconds but then removed:

  # just do some operations
  (openstack) router add subnet pubrouter tenantsubnet
  (openstack) router add subnet pubrouter othertenantsubnet
  (openstack) router add subnet pubrouter tenantsubnet
  (openstack) router add subnet pubrouter tenantsubnet
  (openstack) router remove subnet pubrouter tenantsubnet

  # lots of errors, see http://paste.openstack.org/show/718511/

  # try again without restarting agents
  (openstack) router 

[Yahoo-eng-team] [Bug 1759971] [NEW] [dvr][fast-exit] a route to a tenant network does not get created in fip namespace if an external network is attached after a tenant network have been attached

2018-03-29 Thread Dmitrii Shcherbakov
Public bug reported:

Overall, similar scenario to
https://bugs.launchpad.net/neutron/+bug/1759956 but a different problem.

OpenStack Queens from UCA (xenial, GA kernel, deployed via OpenStack
charms), 2 external subnets (one routed provider network), 1 tenant
subnet, all subnets in the same address scope to trigger "fast exit"
logic.

Tenant subnet cidr: 192.168.100.0/24

openstack address scope create dev
openstack subnet pool create --address-scope dev --pool-prefix 10.232.40.0/21 
--pool-prefix 10.232.16.0/21 dev
openstack subnet pool create --address-scope dev --pool-prefix 192.168.100.0/24 
tenant
openstack network create --external --provider-physical-network physnet1 
--provider-network-type flat pubnet
openstack network segment set --name segment1 
d8391bfb-4466-4a45-972c-45ffcec9f6bc
openstack network segment create --physical-network physnet2 --network-type 
flat --network pubnet segment2
openstack subnet create --no-dhcp --subnet-pool dev --subnet-range 
10.232.16.0/21 --allocation-pool start=10.232.17.0,end=10.232.17.255 
--dns-nameserver 10.232.36.101 --ip-version 4 --network pubnet 
--network-segment segment1 pubsubnetl1
openstack subnet create --gateway 10.232.40.100 --no-dhcp --subnet-pool dev 
--subnet-range 10.232.40.0/21 --allocation-pool 
start=10.232.41.0,end=10.232.41.255 --dns-nameserver 10.232.36.101 --ip-version 
4 --network pubnet --network-segment segment2 pubsubnetl2
openstack network create --internal --provider-network-type vxlan tenantnet
 openstack subnet create --dhcp --ip-version 4 --subnet-range 192.168.100.0/24 
--subnet-pool tenant --dns-nameserver 10.232.36.101 --network tenantnet 
tenantsubnet

# ---
# Works in this order when an external network is attached first

openstack router create --disable --no-ha --distributed pubrouter
openstack router set --disable-snat --external-gateway pubnet --enable pubrouter

openstack router add subnet pubrouter tenantsubnet

2018-03-29 23:30:48.933 2050638 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'fip-d0f008fc-dc45-4237-9ce0-a9e1977735eb', 'ip', '-4', 'route', 'replace', '192.168.100.0/24', 'via', '169.254.106.114', 'dev', 'fpr-09fd1424-7'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:92

# --
# Doesn't work the other way around - as a fip namespace does not get created 
before a tenant network is attached
openstack router create --disable --no-ha --distributed pubrouter

openstack router add subnet pubrouter tenantsubnet
openstack router set --disable-snat --external-gateway pubnet --enable pubrouter

# to "fix" this we need to re-trigger the right code path

openstack router remove subnet pubrouter tenantsubnet
openstack router add subnet pubrouter tenantsubnet

The right code path seems to be in dvr_local_router.py
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/dvr_local_router.py#L413
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/dvr_local_router.py#L623-L632

Based on a quick grep nothing in dvr_fip_ns.py calls
internal_network_added so this never gets triggered.

neutron/agent/l3/dvr_edge_ha_router.py|40| def internal_network_added(self, port):
neutron/agent/l3/dvr_edge_ha_router.py|41| # Call RouterInfo's internal_network_added (Plugs the port, adds IP)
neutron/agent/l3/dvr_edge_ha_router.py|42| router_info.RouterInfo.internal_network_added(self, port)
neutron/agent/l3/dvr_edge_router.py|96| def internal_network_added(self, port):
neutron/agent/l3/dvr_edge_router.py|97| super(DvrEdgeRouter, self).internal_network_added(port)
neutron/agent/l3/dvr_edge_router.py|110| self._internal_network_added(
neutron/agent/l3/dvr_edge_router.py|142| self._internal_network_added(
neutron/agent/l3/dvr_local_router.py|398| def internal_network_added(self, port):
neutron/agent/l3/dvr_local_router.py|399| super(DvrLocalRouter, self).internal_network_added(port)
neutron/agent/l3/ha_router.py|331| def internal_network_added(self, port):
neutron/agent/l3/router_info.py|441| def _internal_network_added(self, ns_name, network_id, port_id,
neutron/agent/l3/router_info.py|458| def internal_network_added(self, port):
neutron/agent/l3/router_info.py|466| self._internal_network_added(self.ns_name,
neutron/agent/l3/router_info.py|556| self.internal_network_added(p)

https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/dvr_fip_ns.py
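
For illustration, the shape of the missing step (a hypothetical helper
written for this report, not the upstream fix): once the gateway is added
and the fip namespace finally exists, the agent would need to replay the
per-port fast-exit wiring for internal ports attached before the gateway:

    # hedged sketch: internal_ports/internal_network_added are RouterInfo
    # names; the wiring of this replay is assumed
    def replay_internal_networks(router_info):
        for port in router_info.internal_ports:
            # mirrors the normal path, where internal_network_added()
            # runs while the fip namespace already exists
            router_info.internal_network_added(port)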

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: cpe-onsite

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1759971

Title:
  [dvr][fast-exit] a route to a tenant network does not get created in
  fip namespace if an external network is attached after a tenant
  network have been attached

Status in neutron:
  New

Bug description:
  Overall, similar scenario to
  https://b

[Yahoo-eng-team] [Bug 1759956] [NEW] [dvr][fast-exit] incorrect policy rules get deleted when a distributed router has ports on multiple tenant networks

2018-03-29 Thread Dmitrii Shcherbakov
32767:  from all lookup default 
8:  from 192.168.200.0/24 lookup 16 
8:  from 192.168.100.0/24 lookup 16 

0:  from all lookup local 
32766:  from all lookup main 
32767:  from all lookup default 
8:  from 192.168.100.0/24 lookup 16 


Code:

_dvr_internal_network_removed
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/dvr_local_router.py#L431-L443

_delete_interface_routing_rule_in_router_ns
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/dvr_local_router.py#L642-L648
    ip_rule = ip_lib.IPRule(namespace=self.ns_name)
    for subnet in router_port['subnets']:
        rtr_port_cidr = subnet['cidr']
        ip_rule.rule.delete(ip=rtr_port_cidr,
                            table=dvr_fip_ns.FIP_RT_TBL,
                            priority=dvr_fip_ns.FAST_PATH_EXIT_PR)

IpRuleCommand
https://github.com/openstack/neutron/blob/master/neutron/agent/linux/ip_lib.py#L486-L494

# TODO(Carl) ip ignored in delete, okay in general?

He-he, experience shows that definitely not.

We need to use the most specific rule description to avoid ordering
issues.

ip -4 rule del from 192.168.200.0/24 priority 8 table 16 type
unicast

With a fix it looks like this:

2018-03-29 20:58:57.023 192084 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800', 'ip', '-4', 'rule', 'del', 'from', '192.168.200.0/24', 'priority', '8', 'table', '16', 'type', 'unicast'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:92
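
A minimal sketch of the idea (a standalone helper for illustration, not the
ip_lib internals): always carry the 'from' CIDR in the delete arguments so
the kernel removes exactly the intended rule:

    # hedged sketch: build a fully-specified rule-delete command
    def build_rule_del_args(cidr, priority, table):
        # 'from <cidr>' pins the delete to one rule; priority/table/type
        # alone match whichever rule the kernel happens to find first
        return ['ip', '-4', 'rule', 'del', 'from', cidr,
                'priority', str(priority), 'table', str(table),
                'type', 'unicast']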

** Affects: neutron
 Importance: Undecided
 Assignee: Dmitrii Shcherbakov (dmitriis)
 Status: In Progress


** Tags: cpe-onsite

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1759956

Title:
  [dvr][fast-exit] incorrect policy rules get deleted when a distributed
  router has ports on multiple tenant networks

Status in neutron:
  In Progress

Bug description:
  TL;DR: ip -4 rule del priority <priority> table <table> type unicast will
  delete the first matching rule it encounters: if there are two rules with
  the same priority it will just kill the first one it finds.

  The original setup is described here:
  https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1759918

  OpenStack Queens from UCA (xenial, GA kernel, deployed via OpenStack
  charms), 2 external subnets (one routed provider network), 2 tenant
  subnets all in the same address scope to trigger "fast exit".

  2 tenant networks attached (subnets 192.168.100.0/24 and
  192.168.200.0/24) to a DVR:

  # 2 rules as expected
  ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule
  0:  from all lookup local 
  32766:  from all lookup main 
  32767:  from all lookup default 
  8:  from 192.168.100.0/24 lookup 16 
  8:  from 192.168.200.0/24 lookup 16 

  # remove 192.168.200.0/24 sometimes deletes an incorrect policy rule
  openstack router remove subnet pubrouter othertenantsubnet

  # ip route del contains the cidr
  2018-03-29 20:09:52.946 2083594 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'fip-d0f008fc-dc45-4237-9ce0-a9e1977735eb', 'ip', '-4', 'route', 'del', '192.168.200.0/24', 'via', '169.254.93.94', 'dev', 'fpr-4f9ca9ef-3'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:92

  # ip rule delete is not that specific
  2018-03-29 20:09:53.195 2083594 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800', 'ip', '-4', 'rule', 'del', 'priority', '8', 'table', '16', 'type', 'unicast'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:92

  
  2018-03-29 20:15:59.210 2083594 DEBUG neutron.agent.linux.utils [-] Running 
command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 
'netns', 'exec', 'qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800', 'ip', '-4', 
'rule', 'show'] create_process 
/usr/lib/python2.7/dist-packages

[Yahoo-eng-team] [Bug 1759120] [NEW] Objects are not returned if domain name is used instead of domain id

2018-03-26 Thread Dmitrii Shcherbakov
Public bug reported:

# OS_USERNAME=user OS_USER_DOMAIN_NAME=admin_domain OS_PROJECT_NAME=admin 
# OS_PROJECT_DOMAIN_NAME=admin_domain
openstack user list --domain testdomain -> users returned for testdomain

# OS_USERNAME=user OS_USER_DOMAIN_NAME=testdomain OS_DOMAIN_NAME=testdomain + 
policy file modification
openstack user list --domain 49a912df2669410faecc6e0ab5d8dc80 -> users returned 
for testdomain
openstack user list --domain testdomain -> no users returned for testdomain

The same is valid for projects and roles. Role assignments have slightly
different policy rules in a sample file.

Environment: OpenStack Pike (UCA) + a slightly modified
https://github.com/openstack/keystone/blob/stable/pike/etc/policy.v3cloudsample.json
file:

https://paste.ubuntu.com/p/Zk7S7d7qm2/
"admin_and_matching_domain_id": "rule:admin_required and 
(domain_id:%(domain_id)s or domain_name:%(domain_id)s)",

domain_name:%(domain_id)s - this was added to allow usage of --domain
<name>, not just an ID as documented, e.g. here
https://docs.openstack.org/python-openstackclient/pike/cli/command-objects/user.html#cmdoption-user-create-domain
("--domain <domain>: Default domain (name or ID)")

https://paste.ubuntu.com/p/D35vMMbdTm/ - the first part of this is a
demonstration that a policy file is not enough to use --domain <name>
without policy file modification in a non-admin project; the second part is
a demonstration of the problem after policy file modification.

The domain_name is taken from auth_context and matched against the domain_id
API call argument as described here:
https://docs.openstack.org/keystone/pike/admin/identity-service-api-protection.html
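
Roughly how the modified rule above evaluates (values from this
reproduction; the dicts are a simplification of what oslo.policy receives):

    # creds come from the token's auth context, target from the API call
    creds = {'domain_id': '49a912df2669410faecc6e0ab5d8dc80',
             'domain_name': 'testdomain'}
    target = {'domain_id': 'testdomain'}  # --domain passed a name here
    # 'domain_id:%(domain_id)s'   -> '49a912df...' == 'testdomain' -> False
    # 'domain_name:%(domain_id)s' -> 'testdomain' == 'testdomain'  -> True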

Debug mode traces for 3 different scenarios:
https://paste.ubuntu.com/p/8ntVt69tYy/


I can see that the whole Admin scoping and policy enforcement implementation is 
being reworked [0][1][2][3] and UUID tokens were deprecated in Pike so 
"domain_name" usage from auth context is not a reliable thing to do [4]. If my 
understanding is correct, please duplicate or "won't fix" this and let this be 
a reference for others to look at. Usage of --domain argument with a domain 
name instead of a domain_id is a bit inconsistent in how it's documented in OSC 
docs because it seems to only work for the admin user with admin project scoped 
tokens (provided that sample policy files are used).


[0] pad.lv/1750673
[1] https://review.openstack.org/#/c/526203/
[2] 
https://specs.openstack.org/openstack/keystone-specs/specs/keystone/ongoing/role-check-from-middleware.html
[3] pad.lv/968696
[4] https://review.openstack.org/#/c/525325/

** Affects: keystone
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1759120

Title:
  Objects are not returned if domain name is used instead of domain id

Status in OpenStack Identity (keystone):
  New

Bug description:
  # OS_USERNAME=user OS_USER_DOMAIN_NAME=admin_domain OS_PROJECT_NAME=admin 
  # OS_PROJECT_DOMAIN_NAME=admin_domain
  openstack user list --domain testdomain -> users returned for testdomain

  # OS_USERNAME=user OS_USER_DOMAIN_NAME=testdomain OS_DOMAIN_NAME=testdomain + 
policy file modification
  openstack user list --domain 49a912df2669410faecc6e0ab5d8dc80 -> users 
returned for testdomain
  openstack user list --domain testdomain -> no users returned for testdomain

  The same is valid for projects and roles. Role assignments have
  slightly different policy rules in a sample file.

  Environment: OpenStack Pike (UCA) + a slightly modified
  
https://github.com/openstack/keystone/blob/stable/pike/etc/policy.v3cloudsample.json
  file:

  https://paste.ubuntu.com/p/Zk7S7d7qm2/
  "admin_and_matching_domain_id": "rule:admin_required and 
(domain_id:%(domain_id)s or domain_name:%(domain_id)s)",

  domain_name:%(domain_id)s - this was added to allow usage of --domain
  <name>, not just an ID as documented, e.g. here
  https://docs.openstack.org/python-openstackclient/pike/cli/command-objects/user.html#cmdoption-user-create-domain
  ("--domain <domain>: Default domain (name or ID)")

  https://paste.ubuntu.com/p/D35vMMbdTm/ - the first part of this is a
  demonstration that a policy file is not enough to use --domain <name>
  without policy file modification in a non-admin project; the second part
  is a demonstration of the problem after policy file modification.

  The domain_name is taken from auth_context and matched against the
  domain_id API call argument as described here:
  https://docs.openstack.org/keystone/pike/admin/identity-service-api-protection.html

  Debug mode traces for 3 different scenarios:
  https://paste.ubuntu.com/p/8ntVt69tYy/

  
  I can see that the whole Admin scoping and policy enforcement implementation 
is being reworked [0][1][2][3] and UUID tokens were deprecated in Pike so 
"domain_name" usage from auth context is not a reliable thing to do [4]. If my 
understanding is c