[Yahoo-eng-team] [Bug 2035332] [NEW] VLAN networks for North / South Traffic Broken

2023-09-13 Thread Graeme Moss
Public bug reported:

## Environment

### Deployment

- Ubuntu 22.04 LTS
- OpenStack release Zed
- Kolla-ansible - stable/zed repo
- Kolla - stable/zed repo
- Containers built with ubuntu 22.04 LTS
- Containers built on 2023-08-23
- OVN+DVR+VLAN tenant networks.
- Three controllers: occ1, occ2, occ3
- Neutron version neutron-21.1.3.dev34, commit d6ee668cc32725cb7d15d2e08fdb50a761f91fe4
- ovn-nbctl 22.09.1
- Open vSwitch Library 3.0.3
- DB Schema 6.3.0

1.  New provider network deployed into OpenStack, on VLAN 504.
2.  Router connected to this provider network.
3.  Instance connected to the provider network, with no FIP.

## Issues

Attempting to send north/south traffic (ping 8.8.8.8) results in the
following symptoms: two pings succeed and all subsequent pings fail. If
the ping is cancelled and a couple of minutes pass, two pings succeed
again and then the failures resume.

New routers with VLAN networks attached do not get all three ports
created on the controllers.

Even after fixing the localnet ports on the router so that there are
three, and changing the priority, N/S traffic is still limited to 2 pings
when a FIP is attached.

Only when setting `reside-on-redirect-chassis` to `True` can we get the
vlan to work with FIP and have baremetal nodes have FIP.

## Diagnostics

After looking at the ovn-controller logs on the control nodes, we can see
that it tries to claim the port on occ0001, which matches the gateway
chassis on the router's LRP port.

```
2023-09-06T14:13:32.454Z|00718|binding|INFO|Claiming lport cr-lrp-1a089d8f-d7a3-4116-a496-94cb87abe57f for this chassis.
2023-09-06T14:13:32.454Z|00719|binding|INFO|cr-lrp-1a089d8f-d7a3-4116-a496-94cb87abe57f: Claiming fa:16:3e:fc:ba:cf 1xx.xx.xxx.xxx/25
```

Gateway chassis of the LRP port.

```
ovn-nbctl list Gateway_Chassis | grep -A2 -B4 lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1

_uuid   : cf26be06-206d-443c-b224-25cc06ef2094
chassis_name: occ2
external_ids: {}
name: lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1_occ2
options : {}
priority: 2
--

_uuid   : 1d9e8314-ed00-4694-8974-0328b78d34e1
chassis_name: occ1
external_ids: {}
name: lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1_occ1
options : {}
priority: 3
--

_uuid   : b1e41ceb-ca2d-42eb-a896-b3551ea1b32f
chassis_name: occ3
external_ids: {}
name: lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1_occ3
options : {}
priority: 1
```
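
With OVN gateway chassis, the entry with the highest priority is the one expected to host the chassis-redirect (`cr-lrp`) port. A minimal sketch of that expectation, using the three priorities listed above; the helper name `expected_active_chassis` is illustrative only and is not part of any OVN or Neutron tooling.

```python
# Gateway_Chassis priorities as listed above (chassis_name -> priority).
priorities = {"occ2": 2, "occ1": 3, "occ3": 1}


def expected_active_chassis(prio):
    """Return the chassis name with the highest priority (hypothetical helper)."""
    return max(prio, key=prio.get)


print(expected_active_chassis(priorities))
```

For the priorities shown this yields `occ1`, which is why a claim by any other chassis is unexpected.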

We see nothing about `occ2` or `occ3` trying to claim the LRP port.
However, when we swapped the priorities around to try to resolve this, we
found that the port was not on `occ1` but on occ0002.
We set occ0001 = 1 and occ0003 = 3, which means `occ3` becomes the
highest-priority gateway.

```
ovn-nbctl set gateway_chassis 1d9e8314-ed00-4694-8974-0328b78d34e1 priority=1
ovn-nbctl set gateway_chassis b1e41ceb-ca2d-42eb-a896-b3551ea1b32f priority=3
```

The logs show the following.

occ0001

```
2023-09-06T14:10:06.134Z|00667|binding|INFO|Releasing lport cr-lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1 from this chassis (sb_readonly=0)
2023-09-06T14:10:06.134Z|00668|if_status|WARN|Trying to release unknown interface cr-lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1
```

occ0002

```
2023-09-06T14:10:14.883Z|00444|binding|INFO|Releasing lport cr-lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1 from this chassis (sb_readonly=0)
2023-09-06T14:10:14.883Z|00445|if_status|WARN|Trying to release unknown interface cr-lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1
```

occ0003

```
2023-09-06T14:10:14.789Z|00459|binding|INFO|Changing chassis for lport cr-lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1 from occ2 to occ3.
2023-09-06T14:10:14.789Z|00460|binding|INFO|cr-lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1: Claiming fa:16:3e:71:df:71 1xx.xx.xxx.xxx/25
```

On `occ3` we can see that `occ2` had the gateway, not `occ1`, which
should have had it. This happens when creating new routers on the VLAN
provider network. All routers that existed before the upgrade are
working, and they have the same priorities.
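
The mismatch can be checked mechanically from the ovn-controller log. The following is a rough sketch, assuming the "Changing chassis for lport" message format shown in the occ0003 log above and the priorities that were in place before the swap; `parse_chassis_change` and `previous_holder_unexpected` are hypothetical helper names.

```python
import re

# Matches the ovn-controller binding message seen in the occ0003 log.
CHANGE_RE = re.compile(
    r"Changing chassis for lport (\S+) from (\S+) to (\S+?)\.?$"
)


def parse_chassis_change(line):
    """Return (lport, old_chassis, new_chassis), or None if the line does not match."""
    match = CHANGE_RE.search(line.strip())
    return match.groups() if match else None


def previous_holder_unexpected(line, priorities_before):
    """True if the chassis giving up the port was not the highest-priority one."""
    change = parse_chassis_change(line)
    if change is None:
        return False
    _, old_chassis, _ = change
    expected = max(priorities_before, key=priorities_before.get)
    return old_chassis != expected


log_line = (
    "2023-09-06T14:10:14.789Z|00459|binding|INFO|Changing chassis for lport "
    "cr-lrp-71cf7286-de37-4d86-b362-eb7ba689d2d1 from occ2 to occ3."
)
# Priorities before the swap, taken from the Gateway_Chassis listing.
priorities_before = {"occ1": 3, "occ2": 2, "occ3": 1}

print(parse_chassis_change(log_line))
print(previous_holder_unexpected(log_line, priorities_before))
```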

## Second diagnostics

Looking at each Logical Router, we can see that when the router is first
created only two of the three ports are created.

Broken router:

```
_uuid   : 773bb527-f193-4b47-8685-e62c9325dd7b
copp: []
enabled : true
external_ids: {"neutron:availability_zone_hints"="", "neutron:gw_network_id"="c9d130bc-301d-45c0-9328-a6964af65579", "neutron:gw_port_id"="1a089d8f-d7a3-4116-a496-94cb87abe57f", "neutron:revision_number"="4", "neutron:router_name"=new-r1-test}
load_balancer   : []
load_balancer_group : []
name: neutron-2b51e12e-5505-477e-9720-e5db31a05790
nat : [f22e6004-ad69-4b12-9445-7006a03495f5]
options : {always_learn_from_arp_request="false",
```

[Yahoo-eng-team] [Bug 1832092] Re: [rocky+] Creation of application credentials fails when role assignments only come from role assignments of federated groups

2022-05-06 Thread Graeme Moss
*** This bug is a duplicate of bug 1809116 ***
https://bugs.launchpad.net/bugs/1809116

** Also affects: charm-keystone-saml-mellon
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1832092

Title:
  [rocky+] Creation of application credentials fails when role
  assignments only come from role assignments of federated groups

Status in OpenStack Keystone SAML Mellon Charm:
  New
Status in OpenStack Identity (keystone):
  New

Bug description:
  [Version]
  Rocky (UCA)

  [Problem Description]

  (see the User Scenario section below for a description of the
  environment)

  When no direct role assignments to federated users are done and only
  federated group role assignments are present, application credential
  creation via Horizon fails with the following errors:

  horizon apache2 error.log:

  [Sat Jun 08 14:27:59.153479 2019] [wsgi:error] [pid 150327:tid
  139962773473024] [remote 10.232.46.207:35898] Recoverable error:
  Invalid application credential: Could not find role assignment with
  role: 91afa82fab85426fa741370dabad80bf, user or group:
  794d430997c64060854bf77f2e7e6e16, project, domain, or system:
  7de76f768cb84149b8b2d693d1d21f45. (HTTP 400)
  (Request-ID: req-da2e3322-2f6f-468f-bd0d-b08855f9893b)

  keystone.log:

  (keystone.common.wsgi): 2019-06-08 14:30:55,933 WARNING Invalid
  application credential: Could not find role assignment with role:
  91afa82fab85426fa741370dabad80bf, user or group:
  794d430997c64060854bf77f2e7e6e16, project, domain, or system:
  7de76f768cb84149b8b2d693d1d21f45.
  (keystone.middleware.auth): 2019-06-08 14:31:00,940 DEBUG
  Authenticating user token

  Code-path:

  create_application_credential -> _require_user_has_role_in_project ->
  _get_user_roles -> _get_user_roles -> list_role_assignments ->
  _list_effective_role_assignments -> _get_group_ids_for_user_id ->
  list_groups_for_user -> _get_group_ids_for_user_id

  A detailed rpdb trace:
  http://paste.openstack.org/show/752652/

   82      def _require_user_has_role_in_project(self, roles, user_id, project_id):
   83          user_roles = self._get_user_roles(user_id, project_id)
   84  ->      for role in roles:
   85              if role['id'] not in user_roles:
   86                  raise exception.RoleAssignmentNotFound(role_id=role['id'],
   87                                                         actor_id=user_id,
   88                                                         target_id=project_id)

  [Possible Solution]

  Group membership details obtained dynamically during federated
  authentication and embedded into a fernet token (first an unscoped
  token, then a project-scoped token) need to be used in addition to
  querying the database for user to group membership.
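
  The idea can be illustrated with a small standalone sketch. This is not
  Keystone code: `effective_role_ids`, its parameters and the group ID
  "g1" are hypothetical placeholders, and it only shows group IDs carried
  in the token being unioned with those found in the database before the
  role check is made.

```python
def effective_role_ids(direct_role_ids, group_roles, db_group_ids, token_group_ids):
    """Collect role IDs from direct assignments and from every group the user
    belongs to, whether the membership is stored in the database or carried
    in the federated token (hypothetical helper, not the Keystone API)."""
    groups = set(db_group_ids) | set(token_group_ids)
    roles = set(direct_role_ids)
    for group_id in groups:
        roles |= set(group_roles.get(group_id, ()))
    return roles


# Role ID taken from the error message above; "g1" is a placeholder group ID.
role_id = "91afa82fab85426fa741370dabad80bf"
group_roles = {"g1": [role_id]}

# Database-only lookup: the federated user has no persisted group membership.
print(role_id in effective_role_ids([], group_roles, [], []))
# Including the group IDs embedded in the token makes the role visible.
print(role_id in effective_role_ids([], group_roles, [], ["g1"]))
```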

  [User Scenario]

  Federated authentication via SAML with the following mapping (i.e. no
  direct role assignment to a user on a project - only federated group-
  based role assignment):

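  openstack mapping show adfs_mapping
  +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | Field | Value                                                                                                                                                                              |
  +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | id    | adfs_mapping                                                                                                                                                                       |
  | rules | [{'remote': [{'type': 'MELLON_NAME_ID'}, {'type': 'MELLON_groups'}], 'local': [{'domain': {'id': 'e834e57943714e058c203d4f544ea946'}, 'user': {'name': '{0}'}, 'groups': '{1}'}]}] |
  +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+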

  # a federated user
  openstack user list --domain adfs
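  +----------------------------------+------------------------+
  | ID                               | Name                   |
  +----------------------------------+------------------------+
  | 794d430997c64060854bf77f2e7e6e16 | intranet\Administrator |
  +----------------------------------+------------------------+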

  # a group that exists both on the IdP and Keystone (SP) side
  openstack group list --domain adfs
  +--++
  | ID   | Name   |
  +--++
  |