[Yahoo-eng-team] [Bug 2035332] [NEW] VLAN networks for North / South Traffic Broken
Public bug reported:

## Environment

### Deployment

- Ubuntu 22.04 LTS
- OpenStack release Zed
- Kolla-ansible - stable/zed repo
- Kolla - stable/zed repo
- Containers built with Ubuntu 22.04 LTS
- Containers built on 2023-08-23
- OVN+DVR+VLAN tenant networks.
- We have three controllers: occ1, occ2, occ3
- Neutron version neutron-21.1.3.dev34, commit d6ee668cc32725cb7d15d2e08fdb50a761f91fe4
- ovn-nbctl 22.09.1
- Open vSwitch Library 3.0.3
- DB Schema 6.3.0

1. New provider network deployed into OpenStack, on VLAN 504.
2. Router connected to this provider network.
3. Instance connected to the provider network, no FIP.

## Issues

Attempting to send north/south traffic (ping 8.8.8.8) results in the following symptoms.

Two pings succeed, then all further pings fail. Once the ping is cancelled and a couple of minutes pass, two pings succeed again, then it goes back to failing.

New routers with VLAN networks attached don't create all three ports on the controllers.

Even when the localnet ports on the router are fixed so that there are three, and the priority is changed, N/S traffic is still limited to 2 pings when a FIP is attached.

Only when setting `reside-on-redirect-chassis` to `True` can we get the VLAN to work with FIP and have baremetal nodes have FIP.

## Diagnostics

After looking at the ovn-controller logs on the control nodes, we can see that it tries to claim the port on occ0001, which matches the gateway chassis on the router's LRP port.

```
2023-09-06T14:13:32.454Z|00718|binding|INFO|Claiming lport cr-lrp-1a089d8f-d7a3-4116-a496-94cb87abe57f for this chassis.
2023-09-06T14:13:32.454Z|00719|binding|INFO|cr-lrp-1a089d8f-d7a3-4116-a496-94cb87abe57f: Claiming fa:16:3e:fc:ba:cf 1xx.xx.xxx.xxx/25
```

Gateway chassis of the LRP port:
```
ovn-nbctl list Gateway_Chassis | grep -A2 -B4 lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1
_uuid : cf26be06-206d-443c-b224-25cc06ef2094
chassis_name: occ2
external_ids: {}
name: lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1_occ2
options : {}
priority: 2
--
_uuid : 1d9e8314-ed00-4694-8974-0328b78d34e1
chassis_name: occ1
external_ids: {}
name: lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1_occ1
options : {}
priority: 3
--
_uuid : b1e41ceb-ca2d-42eb-a896-b3551ea1b32f
chassis_name: occ3
external_ids: {}
name: lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1_occ3
options : {}
priority: 1
```

We see nothing about `occ2` or `occ3` trying to claim the LRP port, but when we change the priorities around to try to resolve this, we can see that the port is not on `occ1` but on occ0002.

We change occ0001 = 1 and occ0003 = 3, which means `occ3` will become the highest-priority gateway.

```
ovn-nbctl set gateway_chassis 1d9e8314-ed00-4694-8974-0328b78d34e1 priority=1
ovn-nbctl set gateway_chassis b1e41ceb-ca2d-42eb-a896-b3551ea1b32f priority=3
```

The logs show the following.

occ0001

```
2023-09-06T14:10:06.134Z|00667|binding|INFO|Releasing lport cr-lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1 from this chassis (sb_readonly=0)
2023-09-06T14:10:06.134Z|00668|if_status|WARN|Trying to release unknown interface cr-lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1
```

occ0002

```
2023-09-06T14:10:14.883Z|00444|binding|INFO|Releasing lport cr-lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1 from this chassis (sb_readonly=0)
2023-09-06T14:10:14.883Z|00445|if_status|WARN|Trying to release unknown interface cr-lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1
```

occ0003

```
2023-09-06T14:10:14.789Z|00459|binding|INFO|Changing chassis for lport cr-lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1 from occ2 to occ3.
2023-09-06T14:10:14.789Z|00460|binding|INFO|cr-lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1: Claiming fa:16:3e:71:df:71 1xx.xx.xxx.xxx/25
```

On `occ3` we can see that `occ2` had the gateway and not `occ1`, which should have had it.
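
The expectation behind this diagnosis is that the Gateway_Chassis entry with the highest priority hosts the router's cr-lrp port. It can be sketched roughly as follows. This is only an illustration, not OVN code: the `GatewayChassis` class and the `expected_active_chassis` helper are hypothetical names, all chassis are assumed to be alive, and failover is ignored. The priorities are the ones listed above, before and after the `ovn-nbctl set` commands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewayChassis:
    """Hypothetical stand-in for one Gateway_Chassis row."""
    chassis_name: str
    priority: int


def expected_active_chassis(entries):
    """Return the chassis name with the highest priority.

    Assumes every chassis is alive; failover is not modelled.
    """
    if not entries:
        raise ValueError("no gateway chassis entries")
    return max(entries, key=lambda entry: entry.priority).chassis_name


# Priorities as first listed by `ovn-nbctl list Gateway_Chassis`.
before = [
    GatewayChassis("occ2", 2),
    GatewayChassis("occ1", 3),
    GatewayChassis("occ3", 1),
]

# Priorities after swapping occ1 and occ3 with `ovn-nbctl set`.
after = [
    GatewayChassis("occ2", 2),
    GatewayChassis("occ1", 1),
    GatewayChassis("occ3", 3),
]

print(expected_active_chassis(before))
print(expected_active_chassis(after))
```

Under these assumptions the first call names `occ1` and the second `occ3`. The occ0003 log line "Changing chassis ... from occ2 to occ3" therefore suggests the port was bound to `occ2` before the change, contrary to the configured priorities.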

This happens when creating new routers on the VLAN provider network. All routers that existed before the upgrade are working, and they have the same priority.

## Second diagnostics

Looking at each Logical Router, we can see that when the router is first created, only two of the three ports are created.

Broken router:

```
_uuid : 773bb527-f193-4b47-8685-e62c9325dd7b
copp: []
enabled : true
external_ids: {"neutron:availability_zone_hints"="", "neutron:gw_network_id"="c9d130bc-301d-45c0-9328-a6964af65579", "neutron:gw_port_id"="1a089d8f-d7a3-4116-a496-94cb87abe57f", "neutron:revision_number"="4", "neutron:router_name"=new-r1-test}
load_balancer : []
load_balancer_group : []
name: neutron-2b51e12e-5505-477e-9720-e5db31a05790
nat : [f22e6004-ad69-4b12-9445-7006a03495f5]
options : {always_learn_from_arp_request="false",
[Yahoo-eng-team] [Bug 1832092] Re: [rocky+] Creation of application credentials fails when role assignments only come from role assignments of federated groups
*** This bug is a duplicate of bug 1809116 ***
    https://bugs.launchpad.net/bugs/1809116

** Also affects: charm-keystone-saml-mellon
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1832092

Title:
  [rocky+] Creation of application credentials fails when role assignments only come from role assignments of federated groups

Status in OpenStack Keystone SAML Mellon Charm:
  New
Status in OpenStack Identity (keystone):
  New

Bug description:
  [Version]
  Rocky (UCA)

  [Problem Description]
  (see the User Scenario section below for a description of the environment)

  When no direct role assignments to federated users are done and only federated group role assignments are present, application credential creation via Horizon fails with the following errors:

  horizon apache2 error.log:
  [Sat Jun 08 14:27:59.153479 2019] [wsgi:error] [pid 150327:tid 139962773473024] [remote 10.232.46.207:35898] Recoverable error: Invalid application credential: Could not find role assignment with role: 91afa82fab85426fa741370dabad80bf, user or group: 794d430997c64060854bf77f2e7e6e16, project, domain, or system: 7de76f768cb84149b8b2d693d1d21f45. (HTTP 400) (Request-ID: req-da2e3322-2f6f-468f-bd0d-b08855f9893b)

  keystone.log:
  (keystone.common.wsgi): 2019-06-08 14:30:55,933 WARNING Invalid application credential: Could not find role assignment with role: 91afa82fab85426fa741370dabad80bf, user or group: 794d430997c64060854bf77f2e7e6e16, project, domain, or system: 7de76f768cb84149b8b2d693d1d21f45.
  (keystone.middleware.auth): 2019-06-08 14:31:00,940 DEBUG Authenticating user token

  Code-path:
  create_application_credential -> _require_user_has_role_in_project -> _get_user_roles -> _get_user_roles -> list_role_assignments -> _list_effective_role_assignments -> _get_group_ids_for_user_id -> list_groups_for_user -> _get_group_ids_for_user_id

  A detailed rpdb trace: http://paste.openstack.org/show/752652/

   82       def _require_user_has_role_in_project(self, roles, user_id, project_id):
   83           user_roles = self._get_user_roles(user_id, project_id)
   84  ->       for role in roles:
   85               if role['id'] not in user_roles:
   86                   raise exception.RoleAssignmentNotFound(role_id=role['id'],
   87                                                          actor_id=user_id,
   88                                                          target_id=project_id)

  [Possible Solution]
  Group membership details obtained dynamically during federated authentication and embedded into a fernet token (first an unscoped token, then a project-scoped token) need to be used in addition to querying the database for user-to-group membership.

  [User Scenario]
  Federated authentication via SAML with the following mapping (i.e. no direct role assignment to a user on a project, only federated group-based role assignment):

  openstack mapping show adfs_mapping
  +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | Field | Value                                                                                                                                                                              |
  +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | id    | adfs_mapping                                                                                                                                                                       |
  | rules | [{'remote': [{'type': 'MELLON_NAME_ID'}, {'type': 'MELLON_groups'}], 'local': [{'domain': {'id': 'e834e57943714e058c203d4f544ea946'}, 'user': {'name': '{0}'}, 'groups': '{1}'}]}] |
  +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

  # a federated user
  openstack user list --domain adfs
  +----------------------------------+------------------------+
  | ID                               | Name                   |
  +----------------------------------+------------------------+
  | 794d430997c64060854bf77f2e7e6e16 | intranet\Administrator |
  +----------------------------------+------------------------+

  # a group that exists both on the IdP and Keystone (SP) side
  openstack group list --domain adfs
  +--++
  | ID | Name |
  +--++
  |