[Yahoo-eng-team] [Bug 2024481] Re: [ndr] neutron-bgp-dragent is racy when a service restart is made just before a speaker is added
** Description changed:

Hit a race with the Antelope (22.0.0) version of NDR in one of our functional test runs:

1) neutron-bgp-dragent got restarted right before creating a speaker and adding an external network and tenant network to it;

- 2) As can be seen in the service log below, just after neutron-bgp-dragent started, it tried to advertise a route (00:03:21.766) before a speaker got added to it (00:03:22.251).
+ 2) As can be seen in the service log below, just after neutron-bgp-dragent started, it tried to advertise a route (00:03:21.766) before a speaker got added to it (00:03:22.251) - it failed with the `BgpSpeakerNotAdded` exception:
https://github.com/openstack/neutron-dynamic-routing/blob/13e0d8a63dbdbd9e1a863144999794d4fc9af22d/neutron_dynamic_routing/services/bgp/agent/driver/os_ken/driver.py#L150-L154

3) As a result, the peer (FRR in our case) only got a floating IP route (/32) in the test run; the tenant network route (/24) was never advertised.

Test steps (downstream) that generated the log lines:
https://github.com/openstack-charmers/zaza-openstack-tests/blob/edd7717dc2ca300cfb94729d9d6bb7021787906c/zaza/openstack/configure/bgp_speaker.py#L65-L100

The service restart is done prior to calling the test code above (notably, it was done as a workaround for something else but inadvertently helped to trigger this edge case):
https://github.com/openstack-charmers/zaza-openstack-tests/blob/edd7717dc2ca300cfb94729d9d6bb7021787906c/zaza/openstack/charm_tests/dragent/configure.py#L92-L103

The lack of a route at the peer side can be seen at 2023-06-19 00:03:32 here:
https://openstack-ci-reports.ubuntu.com/artifacts/e4c/886157/8/check/jammy-antelope-ovn/e4c9b5d/job-output.txt

2023-06-19 00:03:32.346994 | focal-medium |
2023-06-19 00:03:32.347012 | focal-medium | B>* 100.64.0.144/32 [20/0] via 172.16.27.207, ens3, weight 1, 00:00:07
2023-06-19 00:03:32.347045 | focal-medium |

Summary: It looks like neutron-bgp-dragent may try to advertise routes it gets
from a DB before a speaker is added by it. It should properly make sure a speaker is present before trying to advertise routes. If speakers aren't scheduled to it yet, it should attempt to advertise as soon as one is present on it.

---

Functional test log:

2023-06-19 00:03:19.709430 | focal-medium | 2023-06-19 00:03:19 [INFO] Setting up BGP speaker
2023-06-19 00:03:20.307141 | focal-medium | 2023-06-19 00:03:20 [INFO] Creating BGP Speaker
2023-06-19 00:03:20.434428 | focal-medium | 2023-06-19 00:03:20 [INFO] Advertising BGP routes
2023-06-19 00:03:20.678231 | focal-medium | 2023-06-19 00:03:20 [INFO] Advertising ext_net network on BGP Speaker bgp-speaker
2023-06-19 00:03:20.919232 | focal-medium | 2023-06-19 00:03:20 [INFO] Advertising private network on BGP Speaker bgp-speaker
2023-06-19 00:03:21.155337 | focal-medium | 2023-06-19 00:03:21 [INFO] Setting up BGP peer
2023-06-19 00:03:22.099859 | focal-medium | 2023-06-19 00:03:22 [INFO] Creating BGP Peer
2023-06-19 00:03:22.142524 | focal-medium | 2023-06-19 00:03:22 [INFO] Adding BGP peer to BGP speaker
2023-06-19 00:03:22.143374 | focal-medium | 2023-06-19 00:03:22 [INFO] Adding peer osci-frr on BGP Speaker bgp-speaker
2023-06-19 00:03:22.208265 | focal-medium | 2023-06-19 00:03:22 [INFO] Creating floating IP to advertise
2023-06-19 00:03:22.301280 | focal-medium | 2023-06-19 00:03:22 [INFO] Creating port: NDR_TEST_FIP
2023-06-19 00:03:23.599942 | focal-medium | 2023-06-19 00:03:23 [INFO] Creating floatingip
2023-06-19 00:03:26.351808 | focal-medium | 2023-06-19 00:03:26 [INFO] Advertised floating IP: 100.64.0.144

neutron-bgp-dragent.log:

2023-06-19 00:03:20.751 26428 INFO neutron.common.config [-] Logging enabled!
2023-06-19 00:03:20.751 26428 INFO neutron.common.config [-] /usr/bin/neutron-bgp-dragent version 22.0.0
2023-06-19 00:03:21.533 26428 INFO neutron_dynamic_routing.services.bgp.agent.driver.os_ken.driver [-] Initializing os-ken driver for BGP functionality.
2023-06-19 00:03:21.533 26428 INFO neutron_dynamic_routing.services.bgp.agent.driver.os_ken.driver [-] Initialized os-ken BGP Speaker driver interface with bgp_router_id=172.16.0.46
2023-06-19 00:03:21.578 26428 INFO neutron_dynamic_routing.services.bgp.agent.bgp_dragent [-] BGP dynamic routing agent started
2023-06-19 00:03:21.748 26428 INFO bgpspeaker.api.base [None req-3e563ce5-7b78-46d2-9dd3-02067da4e197 - - - - - -] API method core.start called with args: {'waiter': , 'local_as': 4279238701, 'router_id': '172.16.0.46', 'bgp_server_hosts': ('0.0.0.0', '::'), 'bgp_server_port': 0, 'refresh_stalepath_time': 0, 'refresh_max_eor_time': 0, 'label_range': (100, 10), 'allow_local_as_in_count': 0, 'cluster_id': None, 'local_pref': 100}
2023-06-19 00:03:21.766 26428 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [None req-f082fe6d-cb70-4761-b02d-19f38bda7ae2 - -
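The race above could be avoided by deferring advertisements that arrive before the corresponding speaker exists. The sketch below is a hypothetical illustration of that idea, not the actual neutron-bgp-dragent code; all class and method names here are invented for the example:

```python
# Hypothetical sketch (not the real neutron-dynamic-routing code): queue route
# advertisements that arrive before their speaker has been added, and flush
# them once the speaker appears, instead of raising BgpSpeakerNotAdded.
from collections import defaultdict


class SpeakerCache:
    """Tracks registered speakers and defers early advertisements."""

    def __init__(self):
        self._speakers = set()
        self._pending = defaultdict(list)  # speaker_id -> [(cidr, next_hop)]
        self.advertised = []               # what actually reached the driver

    def _driver_advertise(self, speaker_id, cidr, next_hop):
        # Stand-in for the os-ken driver call that failed in the log above.
        self.advertised.append((speaker_id, cidr, next_hop))

    def advertise_route(self, speaker_id, cidr, next_hop):
        if speaker_id not in self._speakers:
            # Speaker not added yet: defer instead of failing.
            self._pending[speaker_id].append((cidr, next_hop))
            return False
        self._driver_advertise(speaker_id, cidr, next_hop)
        return True

    def add_speaker(self, speaker_id):
        self._speakers.add(speaker_id)
        # Flush anything that arrived before the speaker existed.
        for cidr, next_hop in self._pending.pop(speaker_id, []):
            self._driver_advertise(speaker_id, cidr, next_hop)
```

With this shape, an advertisement issued between agent restart and speaker scheduling is held and replayed when `add_speaker` runs, which is the "attempt to advertise as soon as one is present" behaviour the summary asks for.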
[Yahoo-eng-team] [Bug 2024481] [NEW] [ndr] neutron-bgp-dragent is racy when a service restart is made just before a speaker is added
Public bug reported:

Hit a race with the Antelope (22.0.0) version of NDR in one of our functional test runs:

1) neutron-bgp-dragent got restarted right before creating a speaker and adding an external network and tenant network to it;

2) As can be seen in the service log below, just after neutron-bgp-dragent started, it tried to advertise a route (00:03:21.766) before a speaker got added to it (00:03:22.251) - it failed with the `BgpSpeakerNotAdded` exception:
https://github.com/openstack/neutron-dynamic-routing/blob/13e0d8a63dbdbd9e1a863144999794d4fc9af22d/neutron_dynamic_routing/services/bgp/agent/driver/os_ken/driver.py#L150-L154

3) As a result, the peer (FRR in our case) only got a floating IP route (/32) in the test run; the tenant network route (/24) was never advertised.

Test steps (downstream) that generated the log lines:
https://github.com/openstack-charmers/zaza-openstack-tests/blob/edd7717dc2ca300cfb94729d9d6bb7021787906c/zaza/openstack/configure/bgp_speaker.py#L65-L100

The service restart is done prior to calling the test code above (notably, it was done as a workaround for something else but inadvertently helped to trigger this edge case):
https://github.com/openstack-charmers/zaza-openstack-tests/blob/edd7717dc2ca300cfb94729d9d6bb7021787906c/zaza/openstack/charm_tests/dragent/configure.py#L92-L103

The lack of a route at the peer side can be seen at 2023-06-19 00:03:32 here:
https://openstack-ci-reports.ubuntu.com/artifacts/e4c/886157/8/check/jammy-antelope-ovn/e4c9b5d/job-output.txt

2023-06-19 00:03:32.346994 | focal-medium |
2023-06-19 00:03:32.347012 | focal-medium | B>* 100.64.0.144/32 [20/0] via 172.16.27.207, ens3, weight 1, 00:00:07
2023-06-19 00:03:32.347045 | focal-medium |

Summary: It looks like neutron-bgp-dragent may try to advertise routes it gets from a DB before a speaker is added by it. It should properly make sure a speaker is present before trying to advertise routes.
If speakers aren't scheduled to it yet, it should attempt to advertise as soon as one is present on it.

---

Functional test log:

2023-06-19 00:03:19.709430 | focal-medium | 2023-06-19 00:03:19 [INFO] Setting up BGP speaker
2023-06-19 00:03:20.307141 | focal-medium | 2023-06-19 00:03:20 [INFO] Creating BGP Speaker
2023-06-19 00:03:20.434428 | focal-medium | 2023-06-19 00:03:20 [INFO] Advertising BGP routes
2023-06-19 00:03:20.678231 | focal-medium | 2023-06-19 00:03:20 [INFO] Advertising ext_net network on BGP Speaker bgp-speaker
2023-06-19 00:03:20.919232 | focal-medium | 2023-06-19 00:03:20 [INFO] Advertising private network on BGP Speaker bgp-speaker
2023-06-19 00:03:21.155337 | focal-medium | 2023-06-19 00:03:21 [INFO] Setting up BGP peer
2023-06-19 00:03:22.099859 | focal-medium | 2023-06-19 00:03:22 [INFO] Creating BGP Peer
2023-06-19 00:03:22.142524 | focal-medium | 2023-06-19 00:03:22 [INFO] Adding BGP peer to BGP speaker
2023-06-19 00:03:22.143374 | focal-medium | 2023-06-19 00:03:22 [INFO] Adding peer osci-frr on BGP Speaker bgp-speaker
2023-06-19 00:03:22.208265 | focal-medium | 2023-06-19 00:03:22 [INFO] Creating floating IP to advertise
2023-06-19 00:03:22.301280 | focal-medium | 2023-06-19 00:03:22 [INFO] Creating port: NDR_TEST_FIP
2023-06-19 00:03:23.599942 | focal-medium | 2023-06-19 00:03:23 [INFO] Creating floatingip
2023-06-19 00:03:26.351808 | focal-medium | 2023-06-19 00:03:26 [INFO] Advertised floating IP: 100.64.0.144

neutron-bgp-dragent.log:

2023-06-19 00:03:20.751 26428 INFO neutron.common.config [-] Logging enabled!
2023-06-19 00:03:20.751 26428 INFO neutron.common.config [-] /usr/bin/neutron-bgp-dragent version 22.0.0
2023-06-19 00:03:21.533 26428 INFO neutron_dynamic_routing.services.bgp.agent.driver.os_ken.driver [-] Initializing os-ken driver for BGP functionality.
2023-06-19 00:03:21.533 26428 INFO neutron_dynamic_routing.services.bgp.agent.driver.os_ken.driver [-] Initialized os-ken BGP Speaker driver interface with bgp_router_id=172.16.0.46
2023-06-19 00:03:21.578 26428 INFO neutron_dynamic_routing.services.bgp.agent.bgp_dragent [-] BGP dynamic routing agent started
2023-06-19 00:03:21.748 26428 INFO bgpspeaker.api.base [None req-3e563ce5-7b78-46d2-9dd3-02067da4e197 - - - - - -] API method core.start called with args: {'waiter': , 'local_as': 4279238701, 'router_id': '172.16.0.46', 'bgp_server_hosts': ('0.0.0.0', '::'), 'bgp_server_port': 0, 'refresh_stalepath_time': 0, 'refresh_max_eor_time': 0, 'label_range': (100, 10), 'allow_local_as_in_count': 0, 'cluster_id': None, 'local_pref': 100}
2023-06-19 00:03:21.766 26428 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [None req-f082fe6d-cb70-4761-b02d-19f38bda7ae2 - - - - - -] Call to driver for BGP Speaker 04d9b59c-e4b9-4756-92b3-df364fa7bd0d advertise_route has failed with exception BGP Speaker for local_as=4279238701 with router_id=172.16.0.46 not added yet..: neutron_dynamic_routing.services.bgp.agent.driver.exceptions.BgpSpeakerNotAdded: BGP Speaker
[Yahoo-eng-team] [Bug 1959666] Re: Neutron-dynamic-routing does not work with OVN
When it comes to the NDR charm, we enabled it in the charms (neutron-api-plugin-ovn specifically needed a code change), documenting those limitations in the charm guide.

https://review.opendev.org/q/topic:2023-enable-ndr
https://review.opendev.org/q/topic:2023-ovn-ndr

Also we are adding some data plane testing to make sure that the advertised routes are actually reachable.

https://review.opendev.org/c/openstack/charm-neutron-dynamic-routing/+/886157

** Also affects: charm-neutron-api-plugin-ovn
   Importance: Undecided
   Status: New

** Changed in: charm-neutron-api-plugin-ovn
   Status: New => Fix Committed

** Changed in: charm-neutron-dynamic-routing
   Status: New => In Progress

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1959666

Title: Neutron-dynamic-routing does not work with OVN

Status in OpenStack Neutron API OVN Plugin Charm: Fix Committed
Status in OpenStack Neutron Dynamic Routing charm: In Progress
Status in Ubuntu Cloud Archive: New
Status in neutron: Fix Released

Bug description:
When using OVN as the Neutron backend, announcing prefixes with neutron-dynamic-routing is currently not working due to changes in the database structure. An attempt to fix this was made in https://review.opendev.org/c/openstack/neutron-dynamic-routing/+/814055 but wasn't successful. This is a major blocker for production deployments which are using BGP to provide connectivity for IPv6 subnets in tenant networks.

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-neutron-api-plugin-ovn/+bug/1959666/+subscriptions

-- Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 2022058] [NEW] [ovn] l3ha and distributed router extra attributes do not reflect OVN state
Public bug reported:

With https://bugs.launchpad.net/neutron/+bug/1995974 fixed and https://review.opendev.org/c/openstack/neutron/+/864051 merged, extra attributes such as `distributed` and `ha` are now created for OVN routers as well. Their default values are taken from the global configuration options more relevant for the default L3 service plugin implementation based on Linux network namespaces

https://github.com/openstack/neutron/blob/0de6a4d620f1cb780c6a3635e10406b0db97762a/neutron/db/l3_attrs_db.py#L24-L27
https://github.com/openstack/neutron/blob/0de6a4d620f1cb780c6a3635e10406b0db97762a/neutron/conf/db/l3_hamode_db.py#L21
https://github.com/openstack/neutron/blob/0de6a4d620f1cb780c6a3635e10406b0db97762a/neutron/conf/db/l3_dvr_db.py#L19-L27

as opposed to relying on the OVN-specific options. For instance, in order to enable the support for distributed floating IPs there is an OVN-specific global option that enables this mode for all OVN routers:

https://github.com/openstack/neutron/blob/598fcb437a0ad3d564435799c70f38429ab4f0eb/neutron/conf/plugins/ml2/drivers/ovn/ovn_conf.py#L133-L140

As a result, OVN routers now have the `distributed` property set to `False` by default (unless the global ML2/ovs-specific default is changed) and it does not reflect the state of the `ovn/enable_distributed_floating_ip` option. It can also be changed via the API on the router without any apparent effect.

The ML2/ovs and ML2/ovn comparison docs still refer to OVN-based routers having no `l3ha` or `distributed` attributes, whereas this is not the case anymore:

https://github.com/openstack/neutron/blame/cd66232c2b26cb4141c2e9426ce2dec0f38c364c/doc/source/ovn/faq/index.rst#L16-L29

One place where it becomes relevant is the neutron-dynamic-routing project, which relies on the `distributed` property to determine whether to add /32 routes with next-hops set to a router gateway port IP (centralized FIPs case) or not (distributed FIPs case).
https://github.com/openstack/neutron-dynamic-routing/blob/513ea649be9fd652b0c5b391167f851bc3d653bb/neutron_dynamic_routing/db/bgp_db.py#L564
https://github.com/openstack/neutron-dynamic-routing/blob/513ea649be9fd652b0c5b391167f851bc3d653bb/neutron_dynamic_routing/db/bgp_db.py#L567-L580

For distributed routers the logic is such that IP addresses of ports with a device owner set to `floatingip_agent_gateway` are used as a next hop for /32 routes; however, the OVN-based L3 service plugin implementation (OVNL3RouterPlugin) does not create those on a per-external-network basis the way the core L3RouterPlugin-based implementation does with DVR.

As a result, if an operator uses distributed FIPs with OVN with the router attribute `distributed == False`, neutron-dynamic-routing will advertise /32 routes with the centralized FIP logic (the southbound traffic would go via the router gateway port). On the other hand, if an operator uses distributed FIPs with OVN with the router attribute `distributed == True`, neutron-dynamic-routing will not advertise anything, because the centralized routes will not be added as the router seems to be distributed whereas there are no `floatingip_agent_gateway` ports created with OVN.

There are at least two outcomes to expect from a fix:

1) Make sure the distributed state is reflected correctly for OVN routers based on the OVN-specific config option;
2) Fix neutron-dynamic-routing to still create centralized /32 routes if there are not any `floatingip_agent_gateway` ports OR change the OVN implementation to create those for direct southbound routing purposes.

** Affects: neutron
   Importance: Undecided
   Status: New

** Tags: bgp ndr neutron-dynamic-routing ovn

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2022058

Title: [ovn] l3ha and distributed router extra attributes do not reflect OVN state

Status in neutron: New

Bug description:
With https://bugs.launchpad.net/neutron/+bug/1995974 fixed and https://review.opendev.org/c/openstack/neutron/+/864051 merged, extra attributes such as `distributed` and `ha` are now created for OVN routers as well. Their default values are taken from the global configuration options more relevant for the default L3 service plugin implementation based on Linux network namespaces

https://github.com/openstack/neutron/blob/0de6a4d620f1cb780c6a3635e10406b0db97762a/neutron/db/l3_attrs_db.py#L24-L27
https://github.com/openstack/neutron/blob/0de6a4d620f1cb780c6a3635e10406b0db97762a/neutron/conf/db/l3_hamode_db.py#L21
https://github.com/openstack/neutron/blob/0de6a4d620f1cb780c6a3635e10406b0db97762a/neutron/conf/db/l3_dvr_db.py#L19-L27

as opposed to relying on the OVN-specific options. For instance, in order to enable the support for distributed floating IPs there is an OVN-specific global option that enables this mode for all OVN routers: https://github.com
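The next-hop selection logic described in this bug can be sketched roughly as follows. This is an illustrative model of the behaviour, with invented function and parameter names, not the actual `bgp_db.py` code:

```python
# Hypothetical sketch of neutron-dynamic-routing's /32 next-hop selection for
# a floating IP, illustrating the OVN failure mode described in the bug.
def fip_host_routes(router_distributed, fip_ip, gw_port_ip, agent_gw_ips):
    """Return (/32 prefix, next_hop) tuples to advertise for a floating IP."""
    prefix = f"{fip_ip}/32"
    if not router_distributed:
        # Centralized FIP: next hop is the router gateway port IP.
        return [(prefix, gw_port_ip)]
    if agent_gw_ips:
        # DVR with ML2/OVS: floatingip_agent_gateway port IPs are the next hops.
        return [(prefix, ip) for ip in agent_gw_ips]
    # OVN with distributed == True: no floatingip_agent_gateway ports exist,
    # so nothing gets advertised -- the second failure mode above. A fix could
    # fall back to the centralized route here instead of returning nothing.
    return []
```

The empty-list branch is the gap the bug report asks to close: either fall back to the centralized /32 route, or have OVN create the agent gateway ports.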
[Yahoo-eng-team] [Bug 2003842] [NEW] [OVN] A route inferred from a subnet's default gateway is not added to ovn-nb if segment_id is not None for a subnet
Public bug reported:

Context:
* Neutron is configured to use OVN
* An external provider network with one segment is created
* A subnet with a default gateway IP set is associated with this segment explicitly (segment_id != None)
* A router's gateway port is set to use the provider network (external_gateway_info is set with a network_id passed)

Result: OVN NB does not contain a default route and instance traffic is blackholed.

--

Detailed description:

Setting the external gateway info for the first time as follows

$ openstack router set --external-gateway pubnet r1

does not result in OVN getting a default route with the next-hop set to the subnet's gateway IP:

$ sudo ovn-nbctl list logical_router_static_route ; echo $?
0

Doing it twice in a row does (the default route appears in the table after the second command):

$ openstack router set --external-gateway pubnet r1 && openstack router set --external-gateway pubnet r1
$ sudo ovn-nbctl list logical_router_static_route
_uuid : df7c6020-83e7-446c-8f5c-31db96eb2dd3
bfd : []
external_ids : {"neutron:is_ext_gw"="true", "neutron:subnet_id"="abdae752-034c-4845-b6b3-92bf40cf24a6"}
ip_prefix : "0.0.0.0/0"
nexthop : "10.1.1.1"
options : {}
output_port : []
policy : []
route_table : ""

The inferred route is normally installed by this portion of code:
https://github.com/openstack/neutron/blob/21927e79075ce0f3e521e56fca0bed8f1de61066/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1264-L1279

based on the result from _get_gw_info:
https://github.com/openstack/neutron/blob/21927e79075ce0f3e521e56fca0bed8f1de61066/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1197-L1204

`_get_gw_info` returns an empty list since `external_fixed_ips` is an empty list:

self._l3_plugin.get_router(context, 'd51ec4b0-c847-41e0-b43d-5dbf8ddcca32')
{'id': 'd51ec4b0-c847-41e0-b43d-5dbf8ddcca32', 'name': 'r1', 'tenant_id': 'dbfcc6c6a50f481685fda546abd00cd3', 'admin_state_up': True, 'status': 'ACTIVE', 'external_gateway_info':
{'network_id': 'eef0120b-d01f-4cf7-9d1a-65f1da1eb67c', 'external_fixed_ips': [], 'enable_snat': True}, 'gw_port_id': '2da99728-b04e-4a4f-ac6f-d0930de8264a', 'description': '', 'availability_zones': [], 'distributed': False, 'ha': False, 'ha_vr_id': 0, 'availability_zone_hints': [], 'routes': [], 'tags': [], 'created_at': '2023-01-20T09:45:55Z', 'updated_at': '2023-01-24T12:44:14Z', 'revision_number': 35, 'project_id': 'dbfcc6c6a50f481685fda546abd00cd3'}

Meanwhile, the `external_fixed_ips` field is empty because of the deferred IPAM logic triggered by the presence of `segment_id != None` for the subnet on the external network. Based on this logic, the port is unbound and does not get an IP allocation until a port update & port binding:

https://github.com/openstack/neutron/blob/21927e79075ce0f3e521e56fca0bed8f1de61066/neutron/objects/subnet.py#L341-L343 (subnets attached to segments are excluded if a host isn't known)
https://github.com/openstack/neutron/blob/21927e79075ce0f3e521e56fca0bed8f1de61066/neutron/objects/subnet.py#L481-L486 (ipam_exceptions.DeferIpam is raised)
https://github.com/openstack/neutron/blob/21927e79075ce0f3e521e56fca0bed8f1de61066/neutron/db/db_base_plugin_v2.py#L1472-L1478 (DeferIpam is caught and the port gets IP_ALLOCATION_NONE for its IP allocation as it has no fixed IPs)
Port state after it gets created in the unbound state (the code trying to add a default route is trying to find fixed IPs at the same time the gateway port is unbound and does not have any):

openstack port list --router r1
+--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+
| ID                                   | Name | MAC Address       | Fixed IP Addresses                                                         | Status |
+--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+
| 2da99728-b04e-4a4f-ac6f-d0930de8264a |      | fa:16:3e:eb:cf:76 |                                                                            | DOWN   |
| 97d604f2-addb-46b8-9eaf-745257dddb2f |      | fa:16:3e:c8:73:8b | ip_address='192.168.0.1', subnet_id='89227e7b-d2b0-4953-afe7-2b471736f85a' | ACTIVE |
+--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+

openstack port show 2da99728-b04e-4a4f-ac6f-d0930de8264a
+-----------------------+-------+
| Field                 | Value |
+-----------------------+-------+
| admin_state_up        | UP    |
| allowed_address_pairs |       |
| bind
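The deferred-IPAM behaviour described above can be reduced to a small model: when the gateway port has no fixed IPs yet, no gateway info can be derived and therefore no 0.0.0.0/0 route is installed. The function below is an illustrative simplification with invented names, not the actual `ovn_client.py` logic:

```python
# Illustrative model of why the default route is skipped: deriving default
# routes from a gateway port's fixed IPs yields nothing while the port is
# unbound (deferred IPAM leaves fixed_ips empty). Hypothetical names only.
def default_routes_from_gw_port(gw_port, subnets):
    """Return (prefix, next_hop) default routes derivable from the gw port.

    gw_port: dict with a 'fixed_ips' list of {'subnet_id': ...} entries.
    subnets: dict subnet_id -> {'gateway_ip': ...}.
    """
    routes = []
    for fixed_ip in gw_port.get("fixed_ips", []):
        subnet = subnets.get(fixed_ip["subnet_id"])
        if subnet and subnet.get("gateway_ip"):
            routes.append(("0.0.0.0/0", subnet["gateway_ip"]))
    return routes
```

With `fixed_ips == []` (the deferred-IPAM case) the result is empty, matching the empty `logical_router_static_route` table seen after the first `router set`; once the port is bound and gets an allocation, the route can be derived.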
[Yahoo-eng-team] [Bug 2002687] [NEW] [RFE] Active-active L3 Gateway with Multihoming
Public bug reported:

Some network designs include multiple L3 gateways to:

* Share the load across different gateways;
* Provide independent network paths for the north-south direction (e.g. via different ISPs).

Having multi-homing implemented at the instance level imposes an additional burden on the end user of a cloud and support requirements for the guest OS, whereas utilizing ECMP and BFD at the router side alleviates the need for instance-side awareness of a more complex routing setup.

Adding more than one gateway port implies extending the existing data model, which was described in the multiple external gateways spec (https://specs.openstack.org/openstack/neutron-specs/specs/xena/multiple-external-gateways.html). However, it left adding additional gateway routes out of scope, leaving this to future improvements around dynamic routing. Also, the focus of neutron-dynamic-routing has so far been on advertising routes, not accepting new ones from the external peers - so dynamic routing support like this is a very different subject. However, manual addition of extra routes does not utilize the default gateway IP information available from subnets in Neutron, while this could be addressed by implementing an extra conditional behavior when adding more than one gateway port to a router.

ECMP routes can result in black-holing of traffic should the next-hop of a route become unreachable. BFD is a standard protocol adopted by the IETF for next-hop failure detection which can be used for route eviction. OVN supports BFD as of v21.03.0 (https://github.com/ovn-org/ovn/commit/6e0a69ad4bcdf9e4cace5c73ef48ab06065e8519) with a data model that allows enabling BFD on a per next-hop basis by associating BFD session information with routes; however, it is not modeled at the Neutron level even if a backend supports it.
From the Neutron data model perspective, ECMP for routes is already a supported concept since the ECMP support spec got implemented (https://specs.openstack.org/openstack/neutron-specs/specs/wallaby/l3-router-support-ecmp.html) in Wallaby (albeit the spec focused on the L3-agent based implementation only).

As for OVN and BFD, the OVN database state needs to be populated by Neutron based on the data from the Neutron database; therefore, data model changes to the Neutron DB are needed to represent the BFD session parameters.

---

The previous work on multiple gateway ports did not get completed and the neutron-lib changes were reverted. Likewise, the scope of this RFE is bigger, with some overlap and augmentation compared to prior art. A spec will follow for this RFE with more details as to how the data model and API changes are proposed to be made.

** Affects: neutron
   Importance: Undecided
   Status: New

** Tags: rfe

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2002687

Title: [RFE] Active-active L3 Gateway with Multihoming

Status in neutron: New

Bug description:
Some network designs include multiple L3 gateways to:
* Share the load across different gateways;
* Provide independent network paths for the north-south direction (e.g. via different ISPs).

Having multi-homing implemented at the instance level imposes an additional burden on the end user of a cloud and support requirements for the guest OS, whereas utilizing ECMP and BFD at the router side alleviates the need for instance-side awareness of a more complex routing setup. Adding more than one gateway port implies extending the existing data model, which was described in the multiple external gateways spec (https://specs.openstack.org/openstack/neutron-specs/specs/xena/multiple-external-gateways.html).
However, it left adding additional gateway routes out of scope, leaving this to future improvements around dynamic routing. Also, the focus of neutron-dynamic-routing has so far been on advertising routes, not accepting new ones from the external peers - so dynamic routing support like this is a very different subject. However, manual addition of extra routes does not utilize the default gateway IP information available from subnets in Neutron, while this could be addressed by implementing an extra conditional behavior when adding more than one gateway port to a router. ECMP routes can result in black-holing of traffic should the next-hop of a route become unreachable. BFD is a standard protocol adopted by the IETF for next-hop failure detection which can be used for route eviction. OVN supports BFD as of v21.03.0 (https://github.com/ovn-org/ovn/commit/6e0a69ad4bcdf9e4cace5c73ef48ab06065e8519) with a data model that allows enabling BFD on a per next-hop basis by associating BFD session information with routes; however, it is not modeled at the Neutron level even if a backend supports it. From the Neutron data model perspective, ECMP for routes is already a
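The ECMP-plus-BFD behaviour this RFE targets can be modelled in a few lines: a route has several next hops, and per-next-hop BFD session state is used to evict dead ones. This is a conceptual sketch only, not OVN's or Neutron's implementation, and the eviction policy shown is one possible choice:

```python
# Conceptual model of ECMP route eviction driven by BFD: keep only next hops
# whose BFD session is up. Not the OVN/Neutron implementation; policy for the
# all-down case is an assumption made for the example.
def active_next_hops(next_hops, bfd_state):
    """Filter an ECMP next-hop set down to those with a live BFD session.

    next_hops: list of next-hop IPs for one prefix (e.g. 0.0.0.0/0).
    bfd_state: dict mapping next-hop IP -> 'up' or 'down'.
    """
    alive = [nh for nh in next_hops if bfd_state.get(nh) == "up"]
    # If every session is down, keep the full set rather than deliberately
    # black-holing all traffic; real implementations differ on this policy.
    return alive or list(next_hops)
```

This is exactly the state that would need representing in the Neutron DB: BFD session parameters attached per next hop, so the backend can populate the OVN `bfd` column on static routes.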
[Yahoo-eng-team] [Bug 1973276] Re: OVN port loses its virtual type after port update
** Also affects: neutron (Ubuntu)
   Importance: Undecided
   Status: New

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1973276

Title: OVN port loses its virtual type after port update

Status in neutron: Fix Released
Status in neutron package in Ubuntu: New

Bug description:
Bug found in Octavia (master)

Octavia creates at least 2 ports for each load balancer:
- the VIP port: it is down, it keeps/stores the IP address of the LB
- the VRRP port: plugged into a VM, it has the VIP address in the allowed-address list (and the VIP address is configured on the interface in the VM)

When sending an ARP request for the VIP address, the VRRP port should reply with its mac-address. In OVN the VIP port is marked as "type: virtual". But when the VIP port is updated, it loses its "type: virtual" status and that breaks the ARP resolution (OVN replies to the ARP request by sending the mac-address of the VIP port - which is not used/down).

Quick reproducer that simulates the Octavia behavior:

===
import subprocess
import time

import openstack

conn = openstack.connect(cloud="devstack-admin-demo")
network = conn.network.find_network("public")
sg = conn.network.find_security_group('sg')
if not sg:
    sg = conn.network.create_security_group(name='sg')

vip_port = conn.network.create_port(
    name="lb-vip",
    network_id=network.id,
    device_id="lb-1",
    device_owner="me",
    is_admin_state_up=False)
vip_address = [
    fixed_ip['ip_address']
    for fixed_ip in vip_port.fixed_ips
    if '.' in fixed_ip['ip_address']][0]

vrrp_port = conn.network.create_port(
    name="lb-vrrp",
    device_id="vrrp",
    device_owner="vm",
    network_id=network.id)
vrrp_port = conn.network.update_port(
    vrrp_port,
    allowed_address_pairs=[
        {"ip_address": vip_address, "mac_address": vrrp_port.mac_address}])

time.sleep(1)
output = subprocess.check_output(
    f"sudo ovn-nbctl show | grep -A2 'port {vip_port.id}'", shell=True)
output = output.decode('utf-8')
if 'type: virtual' in output:
    print("Port is virtual, this is ok.")
    print(output)

conn.network.update_port(
    vip_port, security_group_ids=[sg.id])

time.sleep(1)
output = subprocess.check_output(
    f"sudo ovn-nbctl show | grep -A2 'port {vip_port.id}'", shell=True)
output = output.decode('utf-8')
if 'type: virtual' not in output:
    print("Port is not virtual, this is an issue.")
    print(output)
===

In my env (devstack master on c9s):

$ python3 /mnt/host/virtual_port_issue.py
Port is virtual, this is ok.
port e0fe2894-e306-42d9-8c5e-6e77b77659e2 (aka lb-vip)
    type: virtual
    addresses: ["fa:16:3e:93:00:8f 172.24.4.111 2001:db8::178"]
Port is not virtual, this is an issue.
port e0fe2894-e306-42d9-8c5e-6e77b77659e2 (aka lb-vip)
    addresses: ["fa:16:3e:93:00:8f 172.24.4.111 2001:db8::178"]
port 8ec36278-82b1-436b-bc5e-ea03ef22192f

In Octavia, the "type: virtual" is _sometimes_ back after other updates of the ports, but in some cases the LB is unreachable (and "ovn-nbctl lsp-set-type virtual" fixes the LB).

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1973276/+subscriptions
[Yahoo-eng-team] [Bug 1964995] [NEW] [yoga][regression] network capabilities in extra info are overridden if vpd is present for a PCI device
Public bug reported:

VPD capability handling was added in https://opendev.org/openstack/nova/commit/ab49f97b2c08294234c7bfd3dedb75780ca519e6 and now does a device dict update as follows (https://opendev.org/openstack/nova/src/commit/dde15d9c475c8ef709578310d304c9d8ecb9d493/nova/virt/libvirt/host.py#L1428):

device.update(_get_device_capabilities(device, dev, net_devs))
device.update(_get_vpd_details(device, dev, pci_devs))

which results in, for example, this content in the capabilities field:

'capabilities': {'vpd': {'card_serial_number': 'testserial'}},

instead of this:

'capabilities': {'network': ['rx', 'tx', 'sg', 'tso', 'gso', 'gro', 'rxvlan', 'txvlan'], 'vpd': {'card_serial_number': 'testserial'}}

This is a regression from the earlier behavior; however, current unit and functional tests do not cover this.

** Affects: nova
   Importance: Undecided
   Status: New

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1964995

Title: [yoga][regression] network capabilities in extra info are overridden if vpd is present for a PCI device

Status in OpenStack Compute (nova): New

Bug description:
VPD capability handling was added in https://opendev.org/openstack/nova/commit/ab49f97b2c08294234c7bfd3dedb75780ca519e6 and now does a device dict update as follows (https://opendev.org/openstack/nova/src/commit/dde15d9c475c8ef709578310d304c9d8ecb9d493/nova/virt/libvirt/host.py#L1428):

device.update(_get_device_capabilities(device, dev, net_devs))
device.update(_get_vpd_details(device, dev, pci_devs))

which results in, for example, this content in the capabilities field:

'capabilities': {'vpd': {'card_serial_number': 'testserial'}},

instead of this:

'capabilities': {'network': ['rx', 'tx', 'sg', 'tso', 'gso', 'gro', 'rxvlan', 'txvlan'], 'vpd': {'card_serial_number': 'testserial'}}

This is a regression from the earlier behavior; however, current unit and functional tests do not cover this.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1964995/+subscriptions
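The regression boils down to two shallow `dict.update()` calls where each source dict carries its own `capabilities` key, so the second call replaces the nested dict from the first. The demonstration below uses simplified stand-in helpers that only mirror the shape of the nova return values; the merge shown at the end is one possible fix, not necessarily the one nova adopted:

```python
# Minimal demonstration of the clobbering described in the bug: two shallow
# updates whose source dicts both contain a 'capabilities' key. Helper names
# echo the nova code but their bodies are simplified stand-ins.
def get_net_capabilities():
    return {'capabilities': {'network': ['rx', 'tx', 'sg', 'tso']}}

def get_vpd_capabilities():
    return {'capabilities': {'vpd': {'card_serial_number': 'testserial'}}}

device = {}
device.update(get_net_capabilities())
device.update(get_vpd_capabilities())   # replaces the whole 'capabilities' dict
broken = device['capabilities']          # only the 'vpd' entry survives

# One possible fix: merge into the nested dict instead of replacing it.
device = {}
for part in (get_net_capabilities(), get_vpd_capabilities()):
    device.setdefault('capabilities', {}).update(part['capabilities'])
```

After the merge loop, `device['capabilities']` contains both the `network` and `vpd` entries, which is the expected content quoted in the bug description.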
[Yahoo-eng-team] [Bug 1884723] Re: [OVS] multicast between VM instances on different compute nodes is broken with IGMP snooping enabled
** Also affects: neutron (Ubuntu)
Importance: Undecided
Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.

https://bugs.launchpad.net/bugs/1884723
Title: [OVS] multicast between VM instances on different compute nodes is broken with IGMP snooping enabled
Status in neutron: In Progress
Status in neutron package in Ubuntu: New

Bug description:
It was originally reported by Matt Flusche in Red Hat's bugzilla. Below is a description of the issue:

I was verifying these OVS configuration options and their impact on tenant networking. My thought going into testing was that vxlan would not be impacted but vlan tenant networks would break; however, for vxlan tenant networks it looks like these options will break multicast also. In a lab test (osp13), multicast is broken between VM instances on different compute nodes after applying:

# ovs-vsctl set Bridge br-int mcast_snooping_enable=true
# ovs-vsctl set Bridge br-int other_config:mcast-snooping-disable-flood-unregistered=true

The following can be used to temporarily allow multicast over vxlan:

ovs-vsctl set Port patch-tun other_config:mcast-snooping-flood-reports=true

This will flood reports to br-tun and the other vxlan endpoints will learn the remote port. This allows multicast snooping to work for a period of time; however, since there is no IGMP querier to continue to solicit IGMP reports, once the age timer expires (300 secs) the traffic will be blocked. It seems that this solution as suggested will work if only provider networking is used. Is that correct?
An option that might work would be:

ovs-vsctl set Bridge br-int mcast_snooping_enable=true
ovs-vsctl set Bridge br-int other_config:mcast-snooping-disable-flood-unregistered=false  # <--- change to false; the default

Then, for each patch port on br-int:

ovs-vsctl set Port other_config:mcast-snooping-flood-reports=true
ovs-vsctl set Port other_config:mcast-snooping-flood=true

This might provide best-effort snooping: multicast isolation where IGMP queriers are available, and flooding everywhere else?

To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1884723/+subscriptions
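The per-port settings above have to be applied to every patch port on br-int. A small helper can generate the required ovs-vsctl invocations; this is a sketch only, and the patch port names passed in below ('patch-tun', 'int-br-vlan') are hypothetical examples, not taken from the report:

```python
def mcast_flood_commands(patch_ports):
    """Build the ovs-vsctl command lines for best-effort IGMP snooping."""
    cmds = [
        ['ovs-vsctl', 'set', 'Bridge', 'br-int',
         'mcast_snooping_enable=true'],
        # back to the default: do flood unregistered multicast
        ['ovs-vsctl', 'set', 'Bridge', 'br-int',
         'other_config:mcast-snooping-disable-flood-unregistered=false'],
    ]
    for port in patch_ports:
        cmds.append(['ovs-vsctl', 'set', 'Port', port,
                     'other_config:mcast-snooping-flood-reports=true'])
        cmds.append(['ovs-vsctl', 'set', 'Port', port,
                     'other_config:mcast-snooping-flood=true'])
    return cmds

# Hypothetical patch port names for illustration:
cmds = mcast_flood_commands(['patch-tun', 'int-br-vlan'])
```

Each entry could then be passed to subprocess.run() on the host in question.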
[Yahoo-eng-team] [Bug 1517180] Re: No support for adding custom certificate chains
** Changed in: maas
Status: Invalid => New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init.

https://bugs.launchpad.net/bugs/1517180
Title: No support for adding custom certificate chains
Status in cloud-init: Triaged
Status in curtin: Triaged
Status in MAAS: New

Bug description:
In a MAAS behind a proxy that uses a self-signed certificate, machines provisioned using MAAS fail to validate the certificate chain when they attempt to contact e.g. https://entropy.ubuntu.com.

Suggested solution, borrowed from an email from kirkland: on the MAAS administrative configuration page, we should add a small section where the MAAS admin can copy/paste/edit any certificate chains that they want to add to machines provisioned by MAAS. These certs should then be inserted into /etc/ssl/certs by cloud-init or curtin on initial install (depending on the earliest point at which the cert might be needed).

To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1517180/+subscriptions
[Yahoo-eng-team] [Bug 1773967] Re: Application credentials can't be used with group-only role assignments
** Also affects: keystone (Ubuntu)
Importance: Undecided
Status: New

** Also affects: cloud-archive
Importance: Undecided
Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Identity (keystone).

https://bugs.launchpad.net/bugs/1773967
Title: Application credentials can't be used with group-only role assignments
Status in Ubuntu Cloud Archive: New
Status in OpenStack Identity (keystone): Fix Released
Status in keystone package in Ubuntu: New

Bug description:
If a user only has a role assignment on a project via a group membership, the user can create an application credential for the project but it cannot be used. If someone tries to use it, the debug logs will report:

User has no access to project

We need to ensure that any application credential that is created can be used so long as it is not expired and the user exists and has access to the project they created the application credential for. If we decide that application credentials should not be valid for users who have no explicit role assignments on projects, then we should prevent it from being created and provide a useful message to the user.

This is probably related to https://bugs.launchpad.net/keystone/+bug/1589993

To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/1773967/+subscriptions
[Yahoo-eng-team] [Bug 1834009] [NEW] Trust API does not support delegating federated roles (roles obtained from federated groups)
Public bug reported:

When a trust is created, a trustor user is required to have a role on the project in question. This is verified via a call to the keystone database without looking at roles that can be inferred from federated groups present in a token. In this scenario a federated user does not have any direct role assignments in the Keystone database - only the ones that can be inferred from federated group membership.

https://opendev.org/openstack/keystone/src/branch/stable/queens/keystone/trust/controllers.py#L141
https://opendev.org/openstack/keystone/src/branch/stable/queens/keystone/trust/controllers.py#L172-L178

A call to /v3/auth/tokens which verifies that "roles" for groups present in the "OS-FEDERATION" section are properly populated: http://paste.openstack.org/show/753298/

"roles": [
    {
        "id": "e4ab04a7c6ec4c91a826b2a3ba333407",
        "domain_id": null,
        "name": "Member"
    }
# ...
"user": {
    "OS-FEDERATION": {
        "identity_provider": {"id": "adfs"},
        "protocol": {"id": "mapped"},
        "groups": [
            {"id": "7594d86688c54ee2aab4c9df020f5468"}
        ]
    },

This bug is similar to this one for application credentials: https://bugs.launchpad.net/keystone/+bug/1832092

Users, Member role and role assignments: http://paste.openstack.org/show/753300/

The issue was discovered while troubleshooting "Error: ERROR: Missing required credential: roles [u'Member']" shown by the heat dashboard during a stack creation:

http://paste.openstack.org/show/753301/ (heat API rpdb trace where a Keystone trust API call is made)

Keystone side: http://paste.openstack.org/show/753302/ (keystone trust API rpdb trace)

** Affects: keystone
Importance: Undecided
Status: New

** Tags: cpe-onsite

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1834009
Title: Trust API does not support delegating federated roles (roles obtained from federated groups)
Status in OpenStack Identity (keystone): New

To manage notifications about this bug go to: https://bugs.launchpad.net/keystone/+bug/1834009/+subscriptions
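The roles that the trust code misses are already present in the token body itself. A sketch of pulling the federated groups and effective role names out of a v3 token response shaped like the paste above (the dict below is built from the fragments shown in the report; the helper functions are illustrative):

```python
# Token body reconstructed from the fragments in the bug report.
token_body = {
    'token': {
        'roles': [{'id': 'e4ab04a7c6ec4c91a826b2a3ba333407',
                   'domain_id': None, 'name': 'Member'}],
        'user': {
            'name': 'intranet\\Administrator',
            'OS-FEDERATION': {
                'identity_provider': {'id': 'adfs'},
                'protocol': {'id': 'mapped'},
                'groups': [{'id': '7594d86688c54ee2aab4c9df020f5468'}],
            },
        },
    },
}

def federated_groups(body):
    """Group IDs carried only in the token, not in the Keystone DB."""
    fed = body['token']['user'].get('OS-FEDERATION', {})
    return [g['id'] for g in fed.get('groups', [])]

def effective_role_names(body):
    """Role names as the token reports them (group-inferred included)."""
    return [r['name'] for r in body['token'].get('roles', [])]

assert federated_groups(token_body) == ['7594d86688c54ee2aab4c9df020f5468']
assert effective_role_names(token_body) == ['Member']
```

A trust-creation check that consulted this token-provided data, instead of only direct role assignments in the database, would see the Member role.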
[Yahoo-eng-team] [Bug 1832265] Re: py3: inconsistent encoding of token fields
Ran into a related problem while debugging dashboard errors ("Unable to retrieve key pairs") with a Rocky cloud & identity federation. There was no clear indication as to why the failures occurred. https://paste.ubuntu.com/p/v5HXyyWXC2/ (full pdb trace)

At a high level, I was getting validation failures for the identity provider (which was enabled in Keystone and was otherwise correct in terms of config) in the /v3/auth/tokens code path. I narrowed it down to a validation error due to a type mismatch (bytes vs str):

1) the error occurs in send_notification:

> /usr/lib/python3/dist-packages/keystone/auth/plugins/mapped.py(101)handle_scoped_token()->None
-> send_notification(taxonomy.OUTCOME_SUCCESS)
(Pdb) l
 96      # send off failed authentication notification, raise the exception
 97      # after sending the notification
 98      send_notification(taxonomy.OUTCOME_FAILURE)
 99      raise
100  else:
101 ->   send_notification(taxonomy.OUTCOME_SUCCESS)
# ...

2) this is what the validation error looks like:

(Pdb) setattr(self, FED_CRED_KEYNAME_IDENTITY_PROVIDER, identity_provider)
*** ValueError: identity_provider failed validation: <function <lambda> at 0x7fa0016ef9d8>

3) the lambda function where the error occurs:

 67 class FederatedCredential(Credential):
 68     identity_provider = cadftype.ValidatorDescriptor(
 69         FED_CRED_KEYNAME_IDENTITY_PROVIDER,
 70 ->      lambda x: isinstance(x, six.string_types))
 71     user = cadftype.ValidatorDescriptor(
 72         FED_CRED_KEYNAME_USER,
 73         lambda x: isinstance(x, six.string_types))
 74     groups = cadftype.ValidatorDescriptor(
 75         FED_CRED_KEYNAME_GROUPS,

4) type comparison (b'adfs' is the identity provider name):

(Pdb) x
b'adfs'
(Pdb) six.string_types
(<class 'str'>,)
(Pdb) type(x)
<class 'bytes'>

Using a package from James' PPA helped, as I am no longer getting errors in that code path.
apt policy keystone
keystone:
  Installed: 2:14.1.0-0ubuntu2~ubuntu18.04.1~ppa201906140719
  Candidate: 2:14.1.0-0ubuntu2~ubuntu18.04.1~ppa201906140719
  Version table:
 *** 2:14.1.0-0ubuntu2~ubuntu18.04.1~ppa201906140719 500

When clicking through tabs very fast I encountered a glitch which results in the following error messages being displayed (see the screencast in the attachment):

Error: "Unable to retrieve key pairs"/"Unable to retrieve images"/"Unable to retrieve server groups"
Warning: "Policy check failed"

I tried to set breakpoints in the same place - the same validation error does NOT occur with the patch, so this is something else, unrelated to py2 vs py3 string handling.

** Attachment added: "2019-06-22-16-12-40.mkv"
https://bugs.launchpad.net/charm-keystone-ldap/+bug/1832265/+attachment/5272335/+files/2019-06-22-16-12-40.mkv

** Also affects: cloud-archive
Importance: Undecided
Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Identity (keystone).

https://bugs.launchpad.net/bugs/1832265
Title: py3: inconsistent encoding of token fields
Status in OpenStack Keystone LDAP integration: Invalid
Status in Ubuntu Cloud Archive: New
Status in OpenStack Identity (keystone): In Progress
Status in keystone package in Ubuntu: Fix Released
Status in keystone source package in Cosmic: Triaged
Status in keystone source package in Disco: Triaged

Bug description:
When using an LDAP domain user on a bionic-rocky cloud within horizon, we are unable to see the projects listed in the project selection drop-down, and are unable to query resources from any projects to which we are assigned the role Member.
It appears that the following log entries in keystone may be helpful in troubleshooting this issue:

(keystone.middleware.auth): 2019-06-10 19:47:02,700 DEBUG RBAC: auth_context: {'trust_id': None, 'trustor_id': None, 'trustee_id': None, 'domain_id': None, 'domain_name': None, 'group_ids': [], 'token': , 'user_id': b'd4fb94cfa3ce0f7829d76fe44697488e7765d88e29f5a896f57d43caadb0fad4', 'user_domain_id': '997b3e91271140feb1635eefba7c65a1', 'system_scope': None, 'project_id': None, 'project_domain_id': None, 'roles': [], 'is_admin_project': True, 'service_user_id': None, 'service_user_domain_id': None, 'service_project_id': None, 'service_project_domain_id': None, 'service_roles': []}
(keystone.server.flask.application): 2019-06-10 19:47:02,700 DEBUG Dispatching request to legacy mapper: /v3/users
(keystone.server.flask.application): 2019-06-10 19:47:02,700 DEBUG SCRIPT_NAME: `/v3`, PATH_INFO: `/users/d4fb94cfa3ce0f7829d76fe44697488e7765d88e29f5a896f57d43caadb0fad4/projects`
(routes.middleware): 2019-06-10 19:47:02,700 DEBUG Matched GET /users/d4fb94cfa3ce0f7829d76fe44697488e7765d88e29f5a896f57d43caadb0fad4/projects
(routes.middleware): 2019-06-10 19:47:02,700 DEBUG Route path: '/users/{user_id}/projects', defaults: {'action': 'list_user_projects
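The mismatch above reduces to a simple Python 3 type check: six.string_types is just (str,) on py3, so a bytes value such as the user_id in the log fails exactly the kind of isinstance check the CADF ValidatorDescriptor lambda performs. A minimal reproduction, with the fix being to decode to text before validation:

```python
# Value as it might arrive from the WSGI environment on py3 (bytes).
idp = b'adfs'

# The validator's check (six.string_types == (str,) on Python 3) fails:
assert not isinstance(idp, str)

# Normalizing bytes to text before validation resolves the mismatch:
if isinstance(idp, bytes):
    idp = idp.decode('utf-8')

assert isinstance(idp, str)
assert idp == 'adfs'
```

This matches the pdb session above, where x is b'adfs' while six.string_types only contains str.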
[Yahoo-eng-team] [Bug 1832092] [NEW] [rocky+] Creation of application credentials fails when role assignments only come from role assignments of federated groups
Public bug reported:

[Version] Rocky (UCA)

[Problem Description]
(see the User Scenario section below for a description of the environment)

When no direct role assignments to federated users are made and only federated group role assignments are present, application credential creation via Horizon fails with the following errors:

horizon apache2 error.log:
[Sat Jun 08 14:27:59.153479 2019] [wsgi:error] [pid 150327:tid 139962773473024] [remote 10.232.46.207:35898] Recoverable error: Invalid application credential: Could not find role assignment with role: 91afa82fab85426fa741370dabad80bf, user or group: 794d430997c64060854bf77f2e7e6e16, project, domain, or system: 7de76f768cb84149b8b2d693d1d21f45. (HTTP 400) (Request-ID: req-da2e3322-2f6f-468f-bd0d-b08855f9893b)

keystone.log:
(keystone.common.wsgi): 2019-06-08 14:30:55,933 WARNING Invalid application credential: Could not find role assignment with role: 91afa82fab85426fa741370dabad80bf, user or group: 794d430997c64060854bf77f2e7e6e16, project, domain, or system: 7de76f768cb84149b8b2d693d1d21f45.
(keystone.middleware.auth): 2019-06-08 14:31:00,940 DEBUG Authenticating user token

Code-path: create_application_credential -> _require_user_has_role_in_project -> _get_user_roles -> _get_user_roles -> list_role_assignments -> _list_effective_role_assignments -> _get_group_ids_for_user_id -> list_groups_for_user -> _get_group_ids_for_user_id

A detailed rpdb trace: http://paste.openstack.org/show/752652/

 82     def _require_user_has_role_in_project(self, roles, user_id, project_id):
 83         user_roles = self._get_user_roles(user_id, project_id)
 84 ->      for role in roles:
 85             if role['id'] not in user_roles:
 86                 raise exception.RoleAssignmentNotFound(role_id=role['id'],
 87                                                        actor_id=user_id,
 88                                                        target_id=project_id)

[Possible Solution]
Group membership details obtained dynamically during federated authentication and embedded into a fernet token (first an unscoped token, then a project-scoped token) need to be used in addition to querying the database for user-to-group membership.

[User Scenario]
Federated authentication via SAML with the following mapping (i.e.
no direct role assignment to a user on a project - only federated group-based role assignment):

openstack mapping show adfs_mapping
+-------+-------+
| Field | Value |
+-------+-------+
| id    | adfs_mapping |
| rules | [{'remote': [{'type': 'MELLON_NAME_ID'}, {'type': 'MELLON_groups'}], 'local': [{'domain': {'id': 'e834e57943714e058c203d4f544ea946'}, 'user': {'name': '{0}'}, 'groups': '{1}'}]}] |
+-------+-------+

# a federated user
openstack user list --domain adfs
+----------------------------------+------------------------+
| ID                               | Name                   |
+----------------------------------+------------------------+
| 794d430997c64060854bf77f2e7e6e16 | intranet\Administrator |
+----------------------------------+------------------------+

# a group that exists both on the IdP and Keystone (SP) side
openstack group list --domain adfs
+----------------------------------+------------+
| ID                               | Name       |
+----------------------------------+------------+
| 701f70e7549d4de28cecd60127a1a444 | adfs_users |
+----------------------------------+------------+

# grouptest is a project that adfs_users group members get a Member role assignment on
openstack project list --domain adfs
+----------------------------------+-----------+
| ID                               | Name      |
+----------------------------------+-----------+
| 7de76f768cb84149b8b2d693d1d21f45 | grouptest |
| 6a0657cf98684a62af99dc7b71a383dd | test      |
+----------------------------------+-----------+

# no direct Member role assignments for federated users
openstack role assignment list --names
| Role | U
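Conceptually, the failing check in _require_user_has_role_in_project only consults direct user-to-project role assignments; a fix has to union in roles granted via the user's (federated) groups. A schematic sketch of that logic, not Keystone's actual code (all identifiers below are illustrative):

```python
class RoleAssignmentNotFound(Exception):
    pass

# Direct user->project role assignments (empty for a federated user).
direct = {}   # {(user_id, project_id): {role_id, ...}}
# Group->project role assignments stored in the database.
group_roles = {('adfs_users_group', 'grouptest'): {'member_role'}}
# Group membership carried only in the federated token.
token_groups = {'fed_user': ['adfs_users_group']}

def require_role(user_id, project_id, role_id, use_groups=True):
    """Raise unless the user effectively holds role_id on project_id."""
    roles = set(direct.get((user_id, project_id), set()))
    if use_groups:
        for group in token_groups.get(user_id, []):
            roles |= group_roles.get((group, project_id), set())
    if role_id not in roles:
        raise RoleAssignmentNotFound()

# Direct-only lookup reproduces the bug:
try:
    require_role('fed_user', 'grouptest', 'member_role', use_groups=False)
    reproduced = False
except RoleAssignmentNotFound:
    reproduced = True
assert reproduced

# Including group-derived roles lets the check pass:
require_role('fed_user', 'grouptest', 'member_role')
```

The use_groups=False branch mirrors the shipped behavior; use_groups=True mirrors the "Possible Solution" above.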
[Yahoo-eng-team] [Bug 1828126] [NEW] [<= Queens] With token-provider='uuid', roles of dynamically obtained federated groups are not taken into account during token-based authentication (for project-sc
Public bug reported:

[Overview]
The relevant part of the federated authentication process after the IdP and SP token parsing stages is as follows:

1) WSGI environment variables created based on token attributes (e.g. SAML token attributes) are passed down to Keystone;
2) Keystone creates a shadow mapped user in the db and tries to map token attributes to objects such as groups, roles and projects in the DB based on a custom mapping created by an operator;
3) groups that may be obtained from token attributes are matched against groups in Keystone, but the user is not included in those groups in the Keystone DB (to support dynamic group membership changes at the IdP side). If any of the target groups do not exist in Keystone, authentication fails;
4) a domain-scoped federated token is created (e.g. by Horizon) and then a project-scoped token is created using the previous token as the authentication method.

(4) is where the problem occurs.

[Environment]
Queens, 19.04 charms, token-provider='uuid' for charm-keystone.

openstack commands used to configure an IdP: https://paste.ubuntu.com/p/nj6MdQDKk2/

keystone.conf sections:

[auth]
methods = external,password,token,oauth1,totp,application_credential,saml2

[federation]
trusted_dashboard = https://dashboard.maas/auth/websso/

[saml2]
remote_id_attribute = MELLON_IDP

The IdP is ADFS in this case, which uses a Windows account name as NAMEID and adds an attribute which corresponds to a group ID (the group name in Active Directory is the OpenStack group ID). The resulting SAML token then contains the following elements: 3f031869ef9f4dc49a342d6be69e98b3

The direct usage of a group ID is present to rule out group-name-to-ID resolution problems.

[Use-case]
Automatic project provisioning and Member role assignment to users is not used on purpose, to manage user access to projects via group-to-role assignments.
A user is assigned to a group at the IdP side and the keystone database does not contain any role assignments for shadow-mapped users. `openstack role assignment list --names` will not contain anything related to group assignments - all group membership information will only be exposed in a token.

[Problem Description]
1) the first token (federated, obtained via the v3 federation API) is domain-scoped and authentication succeeds for it;
2) then a client (e.g. Horizon) gets a project-scoped token based on that federated token (token authentication & regular v3 API) for which roles need to be populated - including the roles to access the target project;
3) the roles for the second token are not populated correctly based on the (dynamic) group assignments that came from the SAML token for the first token - clearly the role population code-path for the second token is not aware of groups that came dynamically with the SAML token.

The expected result would be awareness of groups assigned to the shadow-mapped user and then inference of roles from groups based on group-to-role assignments in the Keystone DB. This explains the fact that project auto-provisioning and direct project role assignment to shadow users works properly (because this can be queried by keystone from its db).

The visible end-result for a user authenticating via the dashboard is represented in the form of errors such as "Unauthorized: Unable to..." for any accessed dashboard pane.

[Symptoms]
Example: https://paste.ubuntu.com/p/syxxWmdyD7/

(keystone.token.provider): 2019-05-07 17:47:01,947 DEBUG Unable to validate token: The request you have made requires authentication.
Project-scoped token example (contains the right group and "methods": ["token", "saml2"]) as queried directly from the db: https://paste.ubuntu.com/p/rRgXSctgWT/

rpdb trace - first pass at finding where it fails: https://paste.ubuntu.com/p/DhG4HXCnBB/

Second pass (the most useful) - a trace point in keystone/token/providers/common.py get_token_data() going down to keystone/token/providers/common.py(432)_populate_roles() where the Unauthorized exception is thrown: https://paste.ubuntu.com/p/pjRf7qBzcX/

[Root Cause]
Based on the symptoms it is clear that _populate_roles (unlike populate_roles_for_federated_user) does not include group roles for groups obtained via federated authentication:

https://opendev.org/openstack/keystone/src/branch/stable/queens/keystone/token/providers/common.py#L408-L432 (_populate_roles)
https://opendev.org/openstack/keystone/src/branch/stable/queens/keystone/token/providers/common.py#L168-L193 (_get_roles_for_user, has a branch to work with group roles but for system-scoped tokens only)
https://opendev.org/openstack/keystone/src/branch/stable/queens/keystone/token/providers/common.py#L190-L193 (get_roles_for_user_and_project gets user-to-role assignments which are not present in this case)

Which in the end leads to exception.Unauthorized being thrown by Keystone: https://opendev.org/openstack/keystone/src/branch/stable/queens/keystone/token/providers/common.py#L416
[Yahoo-eng-team] [Bug 1774710] Re: DHCP agent doesn't do anything with a network's dns_domain attribute
** Also affects: neutron (Ubuntu)
Importance: Undecided
Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.

https://bugs.launchpad.net/bugs/1774710
Title: DHCP agent doesn't do anything with a network's dns_domain attribute
Status in neutron: Fix Released
Status in neutron package in Ubuntu: New

Bug description:
0) Set up Neutron with ML2/OVS or LB, or anything that uses the DHCP agent
1) Create a network with dns_domain
2) Boot a VM on it

Notice the VM doesn't have the DNS domain in its /etc/resolv.conf.

In short, per-network DNS domains are not respected by the DHCP agent. The dns_domain attribute is persisted in the Neutron DB and passed on to the DHCP agent via RPC, but the agent doesn't do anything with it.

Versions: master and all previous versions. A WIP fix is in https://review.openstack.org/#/c/571546.

To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1774710/+subscriptions
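The DHCP agent drives dnsmasq, and dnsmasq's --domain option is what ends up in a guest's /etc/resolv.conf search list. A sketch of the kind of change a fix involves, i.e. honoring the network's dns_domain when assembling the dnsmasq command line (the function name and the partial argument list are illustrative, not Neutron's actual driver code):

```python
def build_dnsmasq_cmd(network):
    """Assemble a (partial, illustrative) dnsmasq command line."""
    cmd = ['dnsmasq', '--no-hosts', '--no-resolv']
    # Respect a per-network dns_domain if one is set; dnsmasq pushes
    # this to clients via the DHCP domain-name option.
    dns_domain = network.get('dns_domain')
    if dns_domain:
        cmd.append('--domain=%s' % dns_domain)
    return cmd

# With dns_domain set, the flag is included:
cmd = build_dnsmasq_cmd({'id': 'net-1', 'dns_domain': 'example.org.'})
assert '--domain=example.org.' in cmd

# Without it, no --domain flag is emitted:
assert all(not a.startswith('--domain') for a in build_dnsmasq_cmd({'id': 'net-2'}))
```

The bug is precisely that the agent received dns_domain over RPC but never translated it into such an option.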
[Yahoo-eng-team] [Bug 1763608] Re: Netplan ignores Interfaces without IP Addresses
I do not think Neutron is involved here, because it is not responsible for bringing OVS bridge interface links up => moving to Invalid for Neutron.

** Changed in: neutron
Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.

https://bugs.launchpad.net/bugs/1763608
Title: Netplan ignores Interfaces without IP Addresses
Status in kolla: Invalid
Status in netplan: New
Status in neutron: Invalid

Bug description:
The "manual" method in /etc/network/interfaces resulted in an interface being brought up, but not having an IP address assigned. When configuring an interface without an IP address, netplan ignores the interface instead of bringing it up.

---
network:
  version: 2
  renderer: networkd
  ethernets:
    eth1: {}

Expected result from `netplan apply`: eth1 is brought up.
Actual result: eth1 is still down.

Similarly, `netplan generate` does not generate any file in /run/systemd/network for eth1.

To manage notifications about this bug go to: https://bugs.launchpad.net/kolla/+bug/1763608/+subscriptions
[Yahoo-eng-team] [Bug 1783654] Re: DVR process flow not installed on physical bridge for shared tenant network
** Also affects: cloud-archive
Importance: Undecided
Status: New

** Also affects: neutron (Ubuntu)
Importance: Undecided
Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.

https://bugs.launchpad.net/bugs/1783654
Title: DVR process flow not installed on physical bridge for shared tenant network
Status in Ubuntu Cloud Archive: New
Status in neutron: Fix Released
Status in neutron package in Ubuntu: New

Bug description:
Seems like collateral from https://bugs.launchpad.net/neutron/+bug/1751396

In DVR, the distributed gateway port's IP and MAC are shared in the qrouter across all hosts. The dvr_process_flow on the physical bridge (which replaces the shared router_distributed MAC address with the unique per-host MAC when it is the source) is missing, and so is the drop rule which instructs the bridge to drop all traffic destined for the shared distributed MAC. Because of this, we are seeing the router MAC on the network infrastructure, causing it to flap on br-int on every compute host:

root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11     4  fa:16:3e:42:a2:ec    1
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11     4  fa:16:3e:42:a2:ec    2
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1     4  fa:16:3e:42:a2:ec    0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11     4  fa:16:3e:42:a2:ec    0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11     4  fa:16:3e:42:a2:ec    0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1     4  fa:16:3e:42:a2:ec    0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1     4  fa:16:3e:42:a2:ec    0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1     4  fa:16:3e:42:a2:ec    0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1     4  fa:16:3e:42:a2:ec    1
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11     4  fa:16:3e:42:a2:ec    0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11     4  fa:16:3e:42:a2:ec    0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11     4  fa:16:3e:42:a2:ec    0

Where port 1 is phy-br-vlan, connecting to the physical bridge, and port 11 is the correct local qr-interface. Because these DVR flows are missing on br-vlan, packets with this source MAC ingress into the host and br-int learns it upstream.

The symptom is that when pinging a VM's floating IP, we see occasional packet loss (10-30%), and sometimes the responses are sent upstream by br-int instead of the qrouter, so the ICMP replies come with the fixed IP of the replier since no NAT'ing took place, and on the tenant network rather than the external network.

When I force net_shared_only to False here, the problem goes away: https://github.com/openstack/neutron/blob/stable/pike/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L436

It should be noted we *ONLY* need to do this on our dvr_snat host. The DVR process flows are missing on every compute host. But if we shut the qrouter on the snat host, FIP functionality works and the DVR MAC stops flapping on the others. Or if we apply the fix only to the snat host, it works. Perhaps there is something on the SNAT node that is unique.

To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/1783654/+subscriptions
[Yahoo-eng-team] [Bug 1751396] Re: DVR: Inter Tenant Traffic between two networks and connected through a shared network not reachable with DVR routers
** Also affects: neutron (Ubuntu)
Importance: Undecided
Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.

https://bugs.launchpad.net/bugs/1751396
Title: DVR: Inter Tenant Traffic between two networks and connected through a shared network not reachable with DVR routers
Status in neutron: Fix Released
Status in neutron package in Ubuntu: New

Bug description:
Inter-tenant traffic between two tenants on two different private networks connected through a common shared network (created by Admin) is not routable through DVR routers.

Steps to reproduce (NOTE: no external network, just a shared network). This is only reproducible in a multinode scenario (1 controller, 2 computes). Make sure that the two VMs are isolated on two different computes.

openstack network create --share shared_net
openstack subnet create shared_net_sn --network shared_net --subnet-range 172.168.10.0/24
openstack network create net_A
openstack subnet create net_A_sn --network net_A --subnet-range 10.1.0.0/24
openstack network create net_B
openstack subnet create net_B_sn --network net_B --subnet-range 10.2.0.0/24
openstack router create router_A
openstack port create --network=shared_net --fixed-ip subnet=shared_net_sn,ip-address=172.168.10.20 port_router_A_shared_net
openstack router add port router_A port_router_A_shared_net
openstack router add subnet router_A net_A_sn
openstack router create router_B
openstack port create --network=shared_net --fixed-ip subnet=shared_net_sn,ip-address=172.168.10.30 port_router_B_shared_net
openstack router add port router_B port_router_B_shared_net
openstack router add subnet router_B net_B_sn
openstack server create server_A --flavor m1.tiny --image cirros --nic net-id=net_A
openstack server create server_B --flavor m1.tiny --image cirros --nic net-id=net_B

Add static routes to the routers.
openstack router set router_A --route destination=10.1.0.0/24,gateway=172.168.10.20
openstack router set router_B --route destination=10.2.0.0/24,gateway=172.168.10.30

A ping from one instance to the other times out.

To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1751396/+subscriptions
[Yahoo-eng-team] [Bug 1759971] Re: [dvr][fast-exit] a route to a tenant network does not get created in fip namespace if an external network is attached after a tenant network have been attached (race
Affects Pike and Queens UCA.

** Also affects: neutron (Ubuntu)
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1759971

Title:
  [dvr][fast-exit] a route to a tenant network does not get created in fip namespace if an external network is attached after a tenant network has been attached (race condition)

Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  New

Bug description:
  Overall, a similar scenario to https://bugs.launchpad.net/neutron/+bug/1759956 but a different problem.

  Relevant agent config options: http://paste.openstack.org/show/718418/

  OpenStack Queens from UCA (xenial, GA kernel, deployed via OpenStack charms), 2 external subnets (one routed provider network), 1 tenant subnet, all subnets in the same address scope to trigger "fast exit" logic. Tenant subnet cidr: 192.168.100.0/24

  openstack address scope create dev
  openstack subnet pool create --address-scope dev --pool-prefix 10.232.40.0/21 --pool-prefix 10.232.16.0/21 dev
  openstack subnet pool create --address-scope dev --pool-prefix 192.168.100.0/24 tenant
  openstack network create --external --provider-physical-network physnet1 --provider-network-type flat pubnet
  openstack network segment set --name segment1 d8391bfb-4466-4a45-972c-45ffcec9f6bc
  openstack network segment create --physical-network physnet2 --network-type flat --network pubnet segment2
  openstack subnet create --no-dhcp --subnet-pool dev --subnet-range 10.232.16.0/21 --allocation-pool start=10.232.17.0,end=10.232.17.255 --dns-nameserver 10.232.36.101 --ip-version 4 --network pubnet --network-segment segment1 pubsubnetl1
  openstack subnet create --gateway 10.232.40.100 --no-dhcp --subnet-pool dev --subnet-range 10.232.40.0/21 --allocation-pool start=10.232.41.0,end=10.232.41.255 --dns-nameserver 10.232.36.101 --ip-version 4 --network pubnet --network-segment segment2 pubsubnetl2
  openstack network create --internal --provider-network-type vxlan tenantnet
  openstack subnet create --dhcp --ip-version 4 --subnet-range 192.168.100.0/24 --subnet-pool tenant --dns-nameserver 10.232.36.101 --network tenantnet tenantsubnet

  # ---
  # Works in this order, when an external network is attached first
  openstack router create --disable --no-ha --distributed pubrouter
  openstack router set --disable-snat --external-gateway pubnet --enable pubrouter
  openstack router add subnet pubrouter tenantsubnet

  2018-03-29 23:30:48.933 2050638 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'fip-d0f008fc-dc45-4237-9ce0-a9e1977735eb', 'ip', '-4', 'route', 'replace', '192.168.100.0/24', 'via', '169.254.106.114', 'dev', 'fpr-09fd1424-7'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:92

  # --
  # Doesn't work the other way around, as a fip namespace does not get created before a tenant network is attached
  openstack router create --disable --no-ha --distributed pubrouter
  openstack router add subnet pubrouter tenantsubnet
  openstack router set --disable-snat --external-gateway pubnet --enable pubrouter

  # to "fix" this we need to re-trigger the right code path
  openstack router remove subnet pubrouter tenantsubnet
  openstack router add subnet pubrouter tenantsubnet

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1759971/+subscriptions
[Yahoo-eng-team] [Bug 1761591] [NEW] [dvr] enable_snat attribute is ignored - centralized snat port gets created
Public bug reported:

OpenStack Queens from UCA (xenial, GA kernel), 2 external subnets (one routed provider network), 1 tenant subnet added to a router. Tenant subnet cidr: 192.168.100.0/24

Relevant agent configs: http://paste.openstack.org/show/718514/
Commands and outputs: http://paste.openstack.org/show/rww2iliACb81IbZDUQ9g/

Although a router is created with --disable-snat and enable_snat is shown as set to "false"

openstack router set --disable-snat --external-gateway pubnet --enable pubrouter

a centralized snat port is still created for that router:

| device_owner | network:router_centralized_snat

I suspect this is because _create_snat_interfaces_after_change does not take enable_snat into account:
https://github.com/openstack/neutron/blob/stable/queens/neutron/db/l3_dvr_db.py#L160-L168

Additionally, when agent mode is dvr_snat an snat network namespace gets created unconditionally by virtue of DvrEdgeRouter usage:
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/agent.py#L343-L347
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/dvr_edge_router.py#L32-L33

It seems that right now there is a tight dependency on having a dvr_snat node in a deployment, so even if only the fast exit(/entry) functionality is intended to be used, there is no way to completely disable SNAT. A gateway port is still required to be bound to a dvr_snat node; however, DvrEdgeRouter could operate differently depending on whether enable_snat is actually true (to handle updates to this attribute). In this case a router_centralized_snat port and an snat namespace would only be created on addition of external gateway information with enable_snat, or on updates that set enable_snat to true.

** Affects: neutron
   Importance: Undecided
   Status: New

** Tags: cpe-onsite

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1761591

Title:
  [dvr] enable_snat attribute is ignored - centralized snat port gets created

Status in neutron:
  New

Bug description:
  OpenStack Queens from UCA (xenial, GA kernel), 2 external subnets (one routed provider network), 1 tenant subnet added to a router. Tenant subnet cidr: 192.168.100.0/24

  Relevant agent configs: http://paste.openstack.org/show/718514/
  Commands and outputs: http://paste.openstack.org/show/rww2iliACb81IbZDUQ9g/

  Although a router is created with --disable-snat and enable_snat is shown as set to "false"

  openstack router set --disable-snat --external-gateway pubnet --enable pubrouter

  a centralized snat port is still created for that router:

  | device_owner | network:router_centralized_snat

  I suspect this is because _create_snat_interfaces_after_change does not take enable_snat into account:
  https://github.com/openstack/neutron/blob/stable/queens/neutron/db/l3_dvr_db.py#L160-L168

  Additionally, when agent mode is dvr_snat an snat network namespace gets created unconditionally by virtue of DvrEdgeRouter usage:
  https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/agent.py#L343-L347
  https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/dvr_edge_router.py#L32-L33

  It seems that right now there is a tight dependency on having a dvr_snat node in a deployment, so even if only the fast exit(/entry) functionality is intended to be used, there is no way to completely disable SNAT. A gateway port is still required to be bound to a dvr_snat node; however, DvrEdgeRouter could operate differently depending on whether enable_snat is actually true (to handle updates to this attribute). In this case a router_centralized_snat port and an snat namespace would only be created on addition of external gateway information with enable_snat, or on updates that set enable_snat to true.
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1761591/+subscriptions
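A possible shape of the fix suggested in the report (consulting enable_snat before creating centralized snat interfaces) can be sketched with toy stand-ins. The function and argument names below are hypothetical, loosely modeled on the linked l3_dvr_db.py code, and are not the actual neutron implementation:

```python
# Illustrative sketch only: toy stand-ins for neutron's router dict and a
# port-creation callback, showing the proposed enable_snat guard.

def create_snat_interfaces_after_change(router, create_snat_port):
    """Create centralized snat ports only when SNAT is actually enabled."""
    gw_info = router.get("external_gateway_info") or {}
    # Proposed guard: skip snat port creation when enable_snat is false.
    if not gw_info.get("enable_snat", True):
        return []
    return [create_snat_port(s) for s in router.get("subnet_ids", [])]


created = []

def make_port(subnet_id):
    created.append(subnet_id)
    return {"device_owner": "network:router_centralized_snat",
            "subnet_id": subnet_id}

# --disable-snat router: no centralized snat port should appear.
router_no_snat = {"external_gateway_info": {"enable_snat": False},
                  "subnet_ids": ["subnet-a"]}
assert create_snat_interfaces_after_change(router_no_snat, make_port) == []
assert created == []

# SNAT-enabled router: port creation proceeds as today.
router_snat = {"external_gateway_info": {"enable_snat": True},
               "subnet_ids": ["subnet-a"]}
ports = create_snat_interfaces_after_change(router_snat, make_port)
assert created == ["subnet-a"]
assert ports[0]["device_owner"] == "network:router_centralized_snat"
```

The same guard would also have to be re-evaluated on gateway updates, since enable_snat can flip from true to false on an existing router.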
[Yahoo-eng-team] [Bug 1759956] Re: [dvr][fast-exit] incorrect policy rules get deleted when a distributed router has ports on multiple tenant networks
Affects pike and queens UCA packages. ** Also affects: neutron (Ubuntu) Importance: Undecided Status: New ** Changed in: neutron (Ubuntu) Status: New => Confirmed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1759956 Title: [dvr][fast-exit] incorrect policy rules get deleted when a distributed router has ports on multiple tenant networks Status in neutron: Fix Released Status in neutron package in Ubuntu: Confirmed Bug description: TL;DR: ip -4 rule del priority table type unicast will delete the first matching rule it encounters: if there are two rules with the same priority it will just kill the first one it finds. The original setup is described here: https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1759918 OpenStack Queens from UCA (xenial, GA kernel, deployed via OpenStack charms), 2 external subnets (one routed provider network), 2 tenant subnets all in the same address scope to trigger "fast exit". 
2 tenant networks attached (subnets 192.168.100.0/24 and 192.168.200.0/24) to a DVR:

# 2 rules as expected
ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default
8:      from 192.168.100.0/24 lookup 16
8:      from 192.168.200.0/24 lookup 16

# remove 192.168.200.0/24 sometimes deletes an incorrect policy rule
openstack router remove subnet pubrouter othertenantsubnet

# ip route del contains the cidr
2018-03-29 20:09:52.946 2083594 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'fip-d0f008fc-dc45-4237-9ce0-a9e1977735eb', 'ip', '-4', 'route', 'del', '192.168.200.0/24', 'via', '169.254.93.94', 'dev', 'fpr-4f9ca9ef-3'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:92

# ip rule delete is not that specific
2018-03-29 20:09:53.195 2083594 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800', 'ip', '-4', 'rule', 'del', 'priority', '8', 'table', '16', 'type', 'unicast'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:92

2018-03-29 20:15:59.210 2083594 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800', 'ip', '-4', 'rule', 'show'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:92

2018-03-29 20:15:59.455 2083594 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800', 'ip', '-4', 'rule', 'add', 'from', '192.168.100.0/24', 'priority', '8', 'table', '16', 'type', 'unicast'] create_process
/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:92

ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default
8:      from 192.168.100.0/24 lookup 16
8:      from 192.168.200.0/24 lookup 16

# try to delete a rule manually to see what is going on
ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule ; ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip -4 rule del priority 8 table 16 type unicast ; ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default
8:      from 192.168.100.0/24 lookup 16
8:      from 192.168.200.0/24 lookup 16
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default
8:      from 192.168.200.0/24 lookup 16
# ^^ 192.168.100.0/24 rule got deleted instead of 192.168.200.0/24

# add the rule back manually
ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule add from 192.168.100.0/24 priority 8 table 16 type unicast

# different order now - 192.168.200.0/24 is first
ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default
8:      from 192.168.200.0/24 lookup 16
8:      from 192.168.100.0/24 lookup 16

# now 192.168.200.0/24 got deleted because it was first to match
ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule ; ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip -4 rule del priority 8 table 16 type unicast ; ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule
0: from all lookup l
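The first-match deletion behaviour described in the TL;DR can be modeled in a few lines of Python. This is a toy model of the kernel's policy-rule list (not real netlink code), showing why an under-specified delete removes whichever matching rule happens to come first:

```python
# Toy model: deleting a policy rule by a partial match (priority/table only)
# removes the first rule that matches, which may not be the one intended.

def delete_rule(rules, **criteria):
    """Delete and return the first rule matching all given fields."""
    for i, rule in enumerate(rules):
        if all(rule.get(k) == v for k, v in criteria.items()):
            del rules[i]
            return rule
    return None

def fresh_rules():
    return [{"src": "192.168.100.0/24", "priority": 8, "table": 16},
            {"src": "192.168.200.0/24", "priority": 8, "table": 16}]

# Under-specified delete, like `ip -4 rule del priority 8 table 16 type unicast`:
# the 192.168.100.0/24 rule is removed even if we meant 192.168.200.0/24.
removed = delete_rule(fresh_rules(), priority=8, table=16)
assert removed["src"] == "192.168.100.0/24"

# Fully specified delete, like
# `ip -4 rule del from 192.168.200.0/24 priority 8 table 16 type unicast`:
# exactly the intended rule is removed.
removed = delete_rule(fresh_rules(), src="192.168.200.0/24",
                      priority=8, table=16)
assert removed["src"] == "192.168.200.0/24"
```

This is why the fix proposed later in the thread passes the `from <cidr>` selector along with priority and table when deleting fast-exit rules.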
[Yahoo-eng-team] [Bug 1761555] [NEW] [dvr][fast-exit] router add/remove subnet operations are not idempotent
Public bug reported: OpenStack Queens from UCA (xenial, GA kernel), 2 external subnets (one routed provider network), 2 tenant subnets, all subnets in the same address scope to trigger "fast exit" logic. Tenant subnet cidr: 192.168.100.0/24 Other tenant subnet cidr: 192.168.200.0/24 Relevant agent configs: http://paste.openstack.org/show/718514/ Commands and outputs: http://paste.openstack.org/show/JFYmGJwF1pdtliQOfXgd/ Overall, a similar situation as with https://bugs.launchpad.net/neutron/+bug/1759956 but with one tenant subnet at first for which routes and rules do not get deleted at all. Problem description: * router add subnet tenantsubnet * routes in fip namespace and rules in qrouter namespace get created and a distributed port gets created for DVR; * router remove subnet tenantsubnet * routes are still there, no new logged events in DVR l3 agent logs If two networks are added then removing one of them triggers removal of routes and rules and new messages are logged in l3 agent log (the rules removed are affected by pad.lv/1759956). A sequence of add subnet/remove subnet commands may result in errors logged in l3 agent logs: http://paste.openstack.org/show/718511/ Sometimes after re-adding a tenantsubnet in presence of othertenantsubnet a proper route is added for a few seconds but then removed: # just do some operations (openstack) router add subnet pubrouter tenantsubnet (openstack) router add subnet pubrouter othertenantsubnet (openstack) router add subnet pubrouter tenantsubnet (openstack) router add subnet pubrouter tenantsubnet (openstack) router remove subnet pubrouter tenantsubnet # lots of errors, see http://paste.openstack.org/show/718511/ # try again without restarting agents (openstack) router add subnet pubrouter tenantsubnet # ran client command # ... 
got 192.168.100.0/24 here for a few seconds while l3 agent was doing something 10.232.16.0/21 dev fg-7f42af4f-ad proto kernel scope link src 10.232.17.5 169.254.106.114/31 dev fpr-3182a7c6-b proto kernel scope link src 169.254.106.115 192.168.100.0/24 via 169.254.106.114 dev fpr-3182a7c6-b 192.168.200.0/24 via 169.254.106.114 dev fpr-3182a7c6-b # finished server and l3 agent finished processing "router add subnet pubrouter tenantsubnet" # route got deleted root@ipotane:~# ip netns exec fip-64ab1ec3-4927-4f09-87f9-804e7f4f8748 ip r 10.232.16.0/21 dev fg-7f42af4f-ad proto kernel scope link src 10.232.17.5 169.254.106.114/31 dev fpr-3182a7c6-b proto kernel scope link src 169.254.106.115 192.168.200.0/24 via 169.254.106.114 dev fpr-3182a7c6-b There is something wrong with how tenant network add/remove notifications are sent it seems because on first removal of a tenant network nothing is logged in l3 agent logs but there is activity in neutron server logs. ** Affects: neutron Importance: Undecided Status: New ** Tags: cpe-onsite -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1761555 Title: [dvr][fast-exit] router add/remove subnet operations are not idempotent Status in neutron: New Bug description: OpenStack Queens from UCA (xenial, GA kernel), 2 external subnets (one routed provider network), 2 tenant subnets, all subnets in the same address scope to trigger "fast exit" logic. Tenant subnet cidr: 192.168.100.0/24 Other tenant subnet cidr: 192.168.200.0/24 Relevant agent configs: http://paste.openstack.org/show/718514/ Commands and outputs: http://paste.openstack.org/show/JFYmGJwF1pdtliQOfXgd/ Overall, a similar situation as with https://bugs.launchpad.net/neutron/+bug/1759956 but with one tenant subnet at first for which routes and rules do not get deleted at all. 
Problem description: * router add subnet tenantsubnet * routes in fip namespace and rules in qrouter namespace get created and a distributed port gets created for DVR; * router remove subnet tenantsubnet * routes are still there, no new logged events in DVR l3 agent logs If two networks are added then removing one of them triggers removal of routes and rules and new messages are logged in l3 agent log (the rules removed are affected by pad.lv/1759956). A sequence of add subnet/remove subnet commands may result in errors logged in l3 agent logs: http://paste.openstack.org/show/718511/ Sometimes after re-adding a tenantsubnet in presence of othertenantsubnet a proper route is added for a few seconds but then removed: # just do some operations (openstack) router add subnet pubrouter tenantsubnet (openstack) router add subnet pubrouter othertenantsubnet (openstack) router add subnet pubrouter tenantsubnet (openstack) router add subnet pubrouter tenantsubnet (openstack) router remove subnet pubrouter tenantsubnet # lots of errors, see http://paste.openstack.org/show/718511/ # try again without restarting agents (openstack) router
[Yahoo-eng-team] [Bug 1761556] [NEW] [dvr][fast-exit] router add/remove subnet operations are not idempotent
Public bug reported: OpenStack Queens from UCA (xenial, GA kernel), 2 external subnets (one routed provider network), 2 tenant subnets, all subnets in the same address scope to trigger "fast exit" logic. Tenant subnet cidr: 192.168.100.0/24 Other tenant subnet cidr: 192.168.200.0/24 Relevant agent configs: http://paste.openstack.org/show/718514/ Commands and outputs: http://paste.openstack.org/show/JFYmGJwF1pdtliQOfXgd/ Overall, a similar situation as with https://bugs.launchpad.net/neutron/+bug/1759956 but with one tenant subnet at first for which routes and rules do not get deleted at all. Problem description: * router add subnet tenantsubnet * routes in fip namespace and rules in qrouter namespace get created and a distributed port gets created for DVR; * router remove subnet tenantsubnet * routes are still there, no new logged events in DVR l3 agent logs If two networks are added then removing one of them triggers removal of routes and rules and new messages are logged in l3 agent log (the rules removed are affected by pad.lv/1759956). A sequence of add subnet/remove subnet commands may result in errors logged in l3 agent logs: http://paste.openstack.org/show/718511/ Sometimes after re-adding a tenantsubnet in presence of othertenantsubnet a proper route is added for a few seconds but then removed: # just do some operations (openstack) router add subnet pubrouter tenantsubnet (openstack) router add subnet pubrouter othertenantsubnet (openstack) router add subnet pubrouter tenantsubnet (openstack) router add subnet pubrouter tenantsubnet (openstack) router remove subnet pubrouter tenantsubnet # lots of errors, see http://paste.openstack.org/show/718511/ # try again without restarting agents (openstack) router add subnet pubrouter tenantsubnet # ran client command # ... 
got 192.168.100.0/24 here for a few seconds while l3 agent was doing something 10.232.16.0/21 dev fg-7f42af4f-ad proto kernel scope link src 10.232.17.5 169.254.106.114/31 dev fpr-3182a7c6-b proto kernel scope link src 169.254.106.115 192.168.100.0/24 via 169.254.106.114 dev fpr-3182a7c6-b 192.168.200.0/24 via 169.254.106.114 dev fpr-3182a7c6-b # finished server and l3 agent finished processing "router add subnet pubrouter tenantsubnet" # route got deleted root@ipotane:~# ip netns exec fip-64ab1ec3-4927-4f09-87f9-804e7f4f8748 ip r 10.232.16.0/21 dev fg-7f42af4f-ad proto kernel scope link src 10.232.17.5 169.254.106.114/31 dev fpr-3182a7c6-b proto kernel scope link src 169.254.106.115 192.168.200.0/24 via 169.254.106.114 dev fpr-3182a7c6-b There is something wrong with how tenant network add/remove notifications are sent it seems because on first removal of a tenant network nothing is logged in l3 agent logs but there is activity in neutron server logs. ** Affects: neutron Importance: Undecided Status: New ** Tags: cpe-onsite -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1761556 Title: [dvr][fast-exit] router add/remove subnet operations are not idempotent Status in neutron: New Bug description: OpenStack Queens from UCA (xenial, GA kernel), 2 external subnets (one routed provider network), 2 tenant subnets, all subnets in the same address scope to trigger "fast exit" logic. Tenant subnet cidr: 192.168.100.0/24 Other tenant subnet cidr: 192.168.200.0/24 Relevant agent configs: http://paste.openstack.org/show/718514/ Commands and outputs: http://paste.openstack.org/show/JFYmGJwF1pdtliQOfXgd/ Overall, a similar situation as with https://bugs.launchpad.net/neutron/+bug/1759956 but with one tenant subnet at first for which routes and rules do not get deleted at all. 
Problem description: * router add subnet tenantsubnet * routes in fip namespace and rules in qrouter namespace get created and a distributed port gets created for DVR; * router remove subnet tenantsubnet * routes are still there, no new logged events in DVR l3 agent logs If two networks are added then removing one of them triggers removal of routes and rules and new messages are logged in l3 agent log (the rules removed are affected by pad.lv/1759956). A sequence of add subnet/remove subnet commands may result in errors logged in l3 agent logs: http://paste.openstack.org/show/718511/ Sometimes after re-adding a tenantsubnet in presence of othertenantsubnet a proper route is added for a few seconds but then removed: # just do some operations (openstack) router add subnet pubrouter tenantsubnet (openstack) router add subnet pubrouter othertenantsubnet (openstack) router add subnet pubrouter tenantsubnet (openstack) router add subnet pubrouter tenantsubnet (openstack) router remove subnet pubrouter tenantsubnet # lots of errors, see http://paste.openstack.org/show/718511/ # try again without restarting agents (openstack) router
[Yahoo-eng-team] [Bug 1759971] [NEW] [dvr][fast-exit] a route to a tenant network does not get created in fip namespace if an external network is attached after a tenant network has been attached
Public bug reported: Overall, similar scenario to https://bugs.launchpad.net/neutron/+bug/1759956 but a different problem. OpenStack Queens from UCA (xenial, GA kernel, deployed via OpenStack charms), 2 external subnets (one routed provider network), 1 tenant subnet, all subnets in the same address scope to trigger "fast exit" logic. Tenant subnet cidr: 192.168.100.0/24 openstack address scope create dev openstack subnet pool create --address-scope dev --pool-prefix 10.232.40.0/21 --pool-prefix 10.232.16.0/21 dev openstack subnet pool create --address-scope dev --pool-prefix 192.168.100.0/24 tenant openstack network create --external --provider-physical-network physnet1 --provider-network-type flat pubnet openstack network segment set --name segment1 d8391bfb-4466-4a45-972c-45ffcec9f6bc openstack network segment create --physical-network physnet2 --network-type flat --network pubnet segment2 openstack subnet create --no-dhcp --subnet-pool dev --subnet-range 10.232.16.0/21 --allocation-pool start=10.232.17.0,end=10.232.17.255 --dns-nameserver 10.232.36.101 --ip-version 4 --network pubnet --network-segment segment1 pubsubnetl1 openstack subnet create --gateway 10.232.40.100 --no-dhcp --subnet-pool dev --subnet-range 10.232.40.0/21 --allocation-pool start=10.232.41.0,end=10.232.41.255 --dns-nameserver 10.232.36.101 --ip-version 4 --network pubnet --network-segment segment2 pubsubnetl2 openstack network create --internal --provider-network-type vxlan tenantnet openstack subnet create --dhcp --ip-version 4 --subnet-range 192.168.100.0/24 --subnet-pool tenant --dns-nameserver 10.232.36.101 --network tenantnet tenantsubnet # --- # Works in this order when an external network is attached first openstack router create --disable --no-ha --distributed pubrouter openstack router set --disable-snat --external-gateway pubnet --enable pubrouter openstack router add subnet pubrouter tenantsubnet 2018-03-29 23:30:48.933 2050638 DEBUG neutron.agent.linux.utils [-] Running command: 
['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'fip-d0f008fc-dc45-4237-9ce0-a9e1977735eb', 'ip', '-4', 'route', 'replace', '192.168.100.0/24', 'via', '169.254.106.114', 'dev', 'fpr-09fd1424-7'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:92

# --
# Doesn't work the other way around, as a fip namespace does not get created before a tenant network is attached
openstack router create --disable --no-ha --distributed pubrouter
openstack router add subnet pubrouter tenantsubnet
openstack router set --disable-snat --external-gateway pubnet --enable pubrouter

# to "fix" this we need to re-trigger the right code path
openstack router remove subnet pubrouter tenantsubnet
openstack router add subnet pubrouter tenantsubnet

The right code path seems to be in dvr_local_router.py:
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/dvr_local_router.py#L413
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/dvr_local_router.py#L623-L632

Based on a quick grep, nothing in dvr_fip_ns.py calls internal_network_added, so this never gets triggered.
neutron/agent/l3/dvr_edge_ha_router.py|40|    def internal_network_added(self, port):
neutron/agent/l3/dvr_edge_ha_router.py|41|        # Call RouterInfo's internal_network_added (Plugs the port, adds IP)
neutron/agent/l3/dvr_edge_ha_router.py|42|        router_info.RouterInfo.internal_network_added(self, port)
neutron/agent/l3/dvr_edge_router.py|96|     def internal_network_added(self, port):
neutron/agent/l3/dvr_edge_router.py|97|         super(DvrEdgeRouter, self).internal_network_added(port)
neutron/agent/l3/dvr_edge_router.py|110|        self._internal_network_added(
neutron/agent/l3/dvr_edge_router.py|142|        self._internal_network_added(
neutron/agent/l3/dvr_local_router.py|398|    def internal_network_added(self, port):
neutron/agent/l3/dvr_local_router.py|399|        super(DvrLocalRouter, self).internal_network_added(port)
neutron/agent/l3/ha_router.py|331|    def internal_network_added(self, port):
neutron/agent/l3/router_info.py|441|    def _internal_network_added(self, ns_name, network_id, port_id,
neutron/agent/l3/router_info.py|458|    def internal_network_added(self, port):
neutron/agent/l3/router_info.py|466|        self._internal_network_added(self.ns_name,
neutron/agent/l3/router_info.py|556|        self.internal_network_added(p)

https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/dvr_fip_ns.py

** Affects: neutron
   Importance: Undecided
   Status: New

** Tags: cpe-onsite

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1759971

Title:
  [dvr][fast-exit] a route to a tenant network does not get created in fip namespace if an external network is attached after a tenant network has been attached

Status in neutron:
  New

Bug description:
  Overall, similar scenario to https://b
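The ordering problem described above, where routes written by internal_network_added never reappear once the gateway finally arrives, can be illustrated with a toy model. ToyRouter and replay_ports are invented names for illustration; this is not neutron agent code:

```python
# Toy model of the ordering bug: fip-namespace routes are only written when
# an internal network is added *while* a gateway already exists, so attaching
# the gateway after the subnet leaves the route out unless internal ports
# are replayed on gateway addition.

class ToyRouter:
    def __init__(self):
        self.has_gateway = False
        self.internal_ports = []
        self.fip_routes = set()

    def internal_network_added(self, cidr):
        self.internal_ports.append(cidr)
        if self.has_gateway:
            # Corresponds to `ip route replace <cidr> ...` in the fip ns.
            self.fip_routes.add(cidr)

    def external_gateway_added(self, replay_ports=False):
        self.has_gateway = True
        if replay_ports:
            # Hypothetical fix: re-process already-attached internal ports.
            for cidr in self.internal_ports:
                self.fip_routes.add(cidr)

# Buggy ordering: subnet first, gateway second -> no route in the fip ns.
r = ToyRouter()
r.internal_network_added("192.168.100.0/24")
r.external_gateway_added(replay_ports=False)
assert r.fip_routes == set()

# With the replay, the route appears regardless of attachment order.
r2 = ToyRouter()
r2.internal_network_added("192.168.100.0/24")
r2.external_gateway_added(replay_ports=True)
assert r2.fip_routes == {"192.168.100.0/24"}
```

The "remove subnet / add subnet" workaround in the report is effectively a manual version of this replay.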
[Yahoo-eng-team] [Bug 1759956] [NEW] [dvr][fast-exit] incorrect policy rules get deleted when a distributed router has ports on multiple tenant networks
32767:  from all lookup default
8:      from 192.168.200.0/24 lookup 16
8:      from 192.168.100.0/24 lookup 16
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default
8:      from 192.168.100.0/24 lookup 16

Code:

_dvr_internal_network_removed
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/dvr_local_router.py#L431-L443

_delete_interface_routing_rule_in_router_ns
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/dvr_local_router.py#L642-L648

    ip_rule = ip_lib.IPRule(namespace=self.ns_name)
    for subnet in router_port['subnets']:
        rtr_port_cidr = subnet['cidr']
        ip_rule.rule.delete(ip=rtr_port_cidr,
                            table=dvr_fip_ns.FIP_RT_TBL,
                            priority=dvr_fip_ns.FAST_PATH_EXIT_PR)

IpRuleCommand
https://github.com/openstack/neutron/blob/master/neutron/agent/linux/ip_lib.py#L486-L494

    # TODO(Carl) ip ignored in delete, okay in general?

He-he, experience shows that definitely not. We need to use the most specific rule description to avoid ordering issues:

ip -4 rule del from 192.168.200.0/24 priority 8 table 16 type unicast

With a fix it looks like this:

2018-03-29 20:58:57.023 192084 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800', 'ip', '-4', 'rule', 'del', 'from', '192.168.200.0/24', 'priority', '8', 'table', '16', 'type', 'unicast'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:92

** Affects: neutron
   Importance: Undecided
   Assignee: Dmitrii Shcherbakov (dmitriis)
   Status: In Progress

** Tags: cpe-onsite

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1759956

Title:
  [dvr][fast-exit] incorrect policy rules get deleted when a distributed router has ports on multiple tenant networks

Status in neutron:
  In Progress

Bug description:
  TL;DR: ip -4 rule del priority table type unicast will delete the first matching rule it encounters: if there are two rules with the same priority it will just kill the first one it finds.

  The original setup is described here: https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1759918

  OpenStack Queens from UCA (xenial, GA kernel, deployed via OpenStack charms), 2 external subnets (one routed provider network), 2 tenant subnets, all in the same address scope to trigger "fast exit".

  2 tenant networks attached (subnets 192.168.100.0/24 and 192.168.200.0/24) to a DVR:

  # 2 rules as expected
  ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule
  0:      from all lookup local
  32766:  from all lookup main
  32767:  from all lookup default
  8:      from 192.168.100.0/24 lookup 16
  8:      from 192.168.200.0/24 lookup 16

  # remove 192.168.200.0/24 sometimes deletes an incorrect policy rule
  openstack router remove subnet pubrouter othertenantsubnet

  # ip route del contains the cidr
  2018-03-29 20:09:52.946 2083594 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'fip-d0f008fc-dc45-4237-9ce0-a9e1977735eb', 'ip', '-4', 'route', 'del', '192.168.200.0/24', 'via', '169.254.93.94', 'dev', 'fpr-4f9ca9ef-3'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:92

  # ip rule delete is not that specific
  2018-03-29 20:09:53.195 2083594 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800', 'ip', '-4', 'rule', 'del', 'priority', '8', 'table', '16', 'type', 'unicast'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:92
2018-03-29 20:15:59.210 2083594 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800', 'ip', '-4', 'rule', 'show'] create_process /usr/lib/python2.7/dist-packages
[Yahoo-eng-team] [Bug 1759120] [NEW] Objects are not returned if domain name is used instead of domain id
Public bug reported:

# OS_USERNAME=user OS_USER_DOMAIN_NAME=admin_domain OS_PROJECT_NAME=admin
# OS_PROJECT_DOMAIN_NAME=admin_domain
openstack user list --domain testdomain
-> users returned for testdomain

# OS_USERNAME=user OS_USER_DOMAIN_NAME=testdomain OS_DOMAIN_NAME=testdomain
# + policy file modification
openstack user list --domain 49a912df2669410faecc6e0ab5d8dc80
-> users returned for testdomain
openstack user list --domain testdomain
-> no users returned for testdomain

The same is valid for projects and roles. Role assignments have
slightly different policy rules in the sample file.

Environment: OpenStack Pike (UCA) + a slightly modified
https://github.com/openstack/keystone/blob/stable/pike/etc/policy.v3cloudsample.json
file: https://paste.ubuntu.com/p/Zk7S7d7qm2/

"admin_and_matching_domain_id": "rule:admin_required and
(domain_id:%(domain_id)s or domain_name:%(domain_id)s)"

The domain_name:%(domain_id)s part was added to allow using --domain
with a name, not just an ID, as documented, e.g. here
https://docs.openstack.org/python-openstackclient/pike/cli/command-objects/user.html#cmdoption-user-create-domain
("--domain  Default domain (name or ID)").

https://paste.ubuntu.com/p/D35vMMbdTm/ - the first part of this
demonstrates that the sample policy file is not enough to use --domain
in a non-admin project without modification; the second part
demonstrates the problem after the policy file modification.

The domain_name is taken from the auth_context and matched against the
domain_id API call argument as described here:
https://docs.openstack.org/keystone/pike/admin/identity-service-api-protection.html

Debug mode traces for 3 different scenarios:
https://paste.ubuntu.com/p/8ntVt69tYy/

I can see that the whole admin scoping and policy enforcement
implementation is being reworked [0][1][2][3], and UUID tokens were
deprecated in Pike, so using "domain_name" from the auth context is not
a reliable thing to do [4].
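The symptom above (users returned for the domain ID, nothing returned for the domain name) can be sketched with a simplified model. This is a hypothetical illustration, not oslo.policy or the keystone backend: the idea is that the modified policy rule can be taught to accept a name, but a backend that filters by domain_id verbatim still matches nothing when a name is passed.

```python
# Hypothetical sketch of why "--domain testdomain" can return nothing
# even when the modified policy rule passes: the policy layer accepts
# the name, but the backend filter compares it against domain_id.

USERS = [{"name": "alice", "domain_id": "49a912df2669410faecc6e0ab5d8dc80"}]

def policy_allows(creds, domain_arg):
    # models: domain_id:%(domain_id)s or domain_name:%(domain_id)s
    return domain_arg in (creds["domain_id"], creds["domain_name"])

def list_users(creds, domain_arg):
    if not policy_allows(creds, domain_arg):
        raise PermissionError("policy check failed")
    # backend query compares the raw argument against the stored domain_id
    return [u for u in USERS if u["domain_id"] == domain_arg]

creds = {"domain_id": "49a912df2669410faecc6e0ab5d8dc80",
         "domain_name": "testdomain"}

print(list_users(creds, "49a912df2669410faecc6e0ab5d8dc80"))  # one user
print(list_users(creds, "testdomain"))  # [] -- policy passes, filter misses
```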
If my understanding is correct, please mark this as a duplicate or
"won't fix" and let it be a reference for others to look at. Usage of
the --domain argument with a domain name instead of a domain_id is a
bit inconsistent with how it is documented in the OSC docs, because it
seems to only work for the admin user with admin-project-scoped tokens
(provided that the sample policy files are used).

[0] pad.lv/1750673
[1] https://review.openstack.org/#/c/526203/
[2] https://specs.openstack.org/openstack/keystone-specs/specs/keystone/ongoing/role-check-from-middleware.html
[3] pad.lv/968696
[4] https://review.openstack.org/#/c/525325/

** Affects: keystone
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1759120

Title:
  Objects are not returned if domain name is used instead of domain id

Status in OpenStack Identity (keystone):
  New