[Yahoo-eng-team] [Bug 1975771] [NEW] instance stuck in BUILD state with vm_state building

2022-05-25 Thread Bjoern
Public bug reported:

Description
===

With a Train cellsv2 deployment we noticed that instances randomly remain in
BUILD state with vm_state building, but nova-compute never seems to actually
attempt building the instance.
On retry the instances may build, which makes this issue hard to debug; the
infrastructure generally seems to work:

+--------------------------------------+-------------------------------------------------------------+
| Property                             | Value                                                       |
+--------------------------------------+-------------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                      |
| OS-EXT-AZ:availability_zone          | nova                                                        |
| OS-EXT-SRV-ATTR:host                 | -                                                           |
| OS-EXT-SRV-ATTR:hostname             | test                                                        |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                                           |
| OS-EXT-SRV-ATTR:instance_name        | instance-26cd                                               |
| OS-EXT-SRV-ATTR:kernel_id            |                                                             |
| OS-EXT-SRV-ATTR:launch_index         | 0                                                           |
| OS-EXT-SRV-ATTR:ramdisk_id           |                                                             |
| OS-EXT-SRV-ATTR:reservation_id       | r-rj1sb4zs                                                  |
| OS-EXT-SRV-ATTR:root_device_name     | -                                                           |
| OS-EXT-SRV-ATTR:user_data            | -                                                           |
| OS-EXT-STS:power_state               | 0                                                           |
| OS-EXT-STS:task_state                | scheduling                                                  |
| OS-EXT-STS:vm_state                  | building                                                    |
| OS-SRV-USG:launched_at               | -                                                           |
| OS-SRV-USG:terminated_at             | -                                                           |
| accessIPv4                           |                                                             |
| accessIPv6                           |                                                             |
| config_drive                         |                                                             |
| created                              | 2022-05-25T22:59:18Z                                        |
| description                          | test                                                        |
| flavor:disk                          | 1                                                           |
| flavor:ephemeral                     | 0                                                           |
| flavor:extra_specs                   | {}                                                          |
| flavor:original_name                 | test-flavor                                                 |
| flavor:ram                           | 512                                                         |
| flavor:swap                          | 0                                                           |
| flavor:vcpus                         | 1                                                           |
| hostId                               |                                                             |
| host_status                          |                                                             |
| id                                   | 2a6cf0bf-8a25-4b9c-997f-e9dbfc7927e5                        |
| image                                | cirros-0.4.0-x86_64 (15f38ee5-b94c-4bc0-a6f4-63cb308ba7bf)  |
| key_name                             | -                                                           |
| locked                               | False                                                       |
| locked_reason                        | -                                                           |
| metadata                             | {}                                                          |
| name                                 | test                                                        |
| os-extended-volumes:volumes_attached | []                                                          |
| progress                             | 0                                                           |
| server_groups                        | []                                                          |
| status                               | BUILD                                                       |
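
For reference, instances stuck like this can be enumerated across projects with
the openstacksdk; a minimal sketch ('mycloud' is a placeholder clouds.yaml entry):

```
# Hedged sketch: list instances stuck in BUILD across all projects and show
# how long they have been waiting. 'mycloud' is a placeholder cloud name.
import openstack

conn = openstack.connect(cloud='mycloud')
for server in conn.compute.servers(all_projects=True, status='BUILD'):
    print(server.id, server.task_state, server.created_at)
```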
|

[Yahoo-eng-team] [Bug 1975743] [NEW] ML2 OVN - Creating an instance with hardware offloaded port is broken

2022-05-25 Thread Itai Levy
Public bug reported:

OpenStack Release: Yoga
Platform: Ubuntu focal

Creating an instance with a vnic-type 'direct' port and a 'switchdev'
binding-profile fails with the following validation error:
```
2022-05-25 19:13:40.331 125269 DEBUG neutron.api.v2.base 
[req-504a0204-6f1a-46ae-8b95-dcfdf2692f91 b2a31335e63b4dd391cc3e6bf4600fe1 - - 
654b9b803e6a4a68b31676c16973e3cc 654b9b803e6a4a68b31676c16973e3cc] Request 
body: {'port': {'device_id': 'd46aef48-e42e-49c8-af9f-a83768747b4f', 
'device_owner': 'compute:nova', 'binding:profile': {'capabilities': 
['switchdev'], 'pci_vendor_info': '15b3:101e', 'pci_slot': ':08:03.2', 
'physical_network': None, 'card_serial_number': 'MT2034X11488', 
'pf_mac_address': '04:3f:72:9e:0b:a1', 'vf_num': 7}, 'binding:host_id': 
'node3.maas', 'dns_name': 'vm1'}} prepare_request_body 
/usr/lib/python3/dist-packages/neutron/api/v2/base.py:729


2022-05-25 19:13:40.429 125269 DEBUG neutron_lib.callbacks.manager 
[req-504a0204-6f1a-46ae-8b95-dcfdf2692f91 b2a31335e63b4dd391cc3e6bf4600fe1 - - 
654b9b803e6a4a68b31676c16973e3cc 654b9b803e6a4a68b31676c16973e3cc] Publish 
callbacks 
['neutron.plugins.ml2.plugin.SecurityGroupDbMixin._ensure_default_security_group_handler-1311372',
 'neutron.services.ovn_l3.plugin.OVNL3RouterPlugin._port_update-8735219071964'] 
for port (0f1e4e9c-68ef-4b38-a3bc-68e624bca6c7), before_update _notify_loop 
/usr/lib/python3/dist-packages/neutron_lib/callbacks/manager.py:176
2022-05-25 19:13:41.221 125269 DEBUG neutron.notifiers.nova 
[req-504a0204-6f1a-46ae-8b95-dcfdf2692f91 b2a31335e63b4dd391cc3e6bf4600fe1 - - 
654b9b803e6a4a68b31676c16973e3cc 654b9b803e6a4a68b31676c16973e3cc] Ignoring 
state change previous_port_status: DOWN current_port_status: DOWN port_id 
0f1e4e9c-68ef-4b38-a3bc-68e624bca6c7 record_port_status_changed 
/usr/lib/python3/dist-packages/neutron/notifiers/nova.py:233
2022-05-25 19:13:41.229 125269 DEBUG neutron_lib.callbacks.manager 
[req-504a0204-6f1a-46ae-8b95-dcfdf2692f91 b2a31335e63b4dd391cc3e6bf4600fe1 - - 
654b9b803e6a4a68b31676c16973e3cc 654b9b803e6a4a68b31676c16973e3cc] Publish 
callbacks [] for port (0f1e4e9c-68ef-4b38-a3bc-68e624bca6c7), precommit_update 
_notify_loop /usr/lib/python3/dist-packages/neutron_lib/callbacks/manager.py:176


2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers 
[req-504a0204-6f1a-46ae-8b95-dcfdf2692f91 b2a31335e63b4dd391cc3e6bf4600fe1 - - 
654b9b803e6a4a68b31676c16973e3cc 654b9b803e6a4a68b31676c16973e3cc] Mechanism 
driver 'ovn' failed in update_port_precommit: 
neutron_lib.exceptions.InvalidInput: Invalid input for operation: Invalid 
binding:profile. too many parameters.
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers Traceback 
(most recent call last):
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers   File 
"/usr/lib/python3/dist-packages/neutron/plugins/ml2/managers.py", line 482, in 
_call_on_drivers
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers 
getattr(driver.obj, method_name)(context)
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers   File 
"/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py",
 line 792, in update_port_precommit
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers 
ovn_utils.validate_and_get_data_from_binding_profile(port)
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers   File 
"/usr/lib/python3/dist-packages/neutron/common/ovn/utils.py", line 266, in 
validate_and_get_data_from_binding_profile
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers raise 
n_exc.InvalidInput(error_message=msg)
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers 
neutron_lib.exceptions.InvalidInput: Invalid input for operation: Invalid 
binding:profile. too many parameters.
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers 
```

The issue seems to be related to this commit:
https://review.opendev.org/c/openstack/neutron/+/818420
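
For illustration, the error is what a strict whitelist over the binding:profile
keys would produce when nova adds the newer hardware-offload keys; a minimal
sketch of that pattern (the helper and the allowed-key set are hypothetical,
not neutron's verbatim code):

```
# Hedged sketch of a strict binding:profile key check (hypothetical helper,
# not neutron's actual implementation).
ALLOWED_KEYS = {'capabilities', 'pci_vendor_info', 'pci_slot',
                'physical_network'}

def validate_binding_profile(profile):
    extra = set(profile) - ALLOWED_KEYS
    if extra:
        # 'card_serial_number', 'pf_mac_address' and 'vf_num' from the
        # request body above would end up here.
        raise ValueError('Invalid binding:profile. too many parameters.')

try:
    validate_binding_profile({
        'capabilities': ['switchdev'],
        'pci_vendor_info': '15b3:101e',
        'pci_slot': ':08:03.2',
        'physical_network': None,
        'card_serial_number': 'MT2034X11488',
        'pf_mac_address': '04:3f:72:9e:0b:a1',
        'vf_num': 7,
    })
except ValueError as exc:
    print(exc)  # Invalid binding:profile. too many parameters.
```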


To reproduce:
https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-ovn.html

1. Prepare a setup with SR-IOV adjusted for OVN HW Offload
2. Create a port with switchdev capabilities

$ openstack port create direct_overlay1 --vnic-type=direct \
    --network gen_data --binding-profile '{"capabilities":["switchdev"]}' \
    --security-group my_policy

3. Create an instance

$ openstack server create --key-name bastion --flavor d1.demo --image ubuntu \
    --port direct_overlay1 --availability-zone nova:node3.maas vm1

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1975743

Title:
  ML2 OVN - Creating an instance with hardware offloaded port is broken

Status in neutron:
  New


[Yahoo-eng-team] [Bug 1975732] [NEW] System Reader cannot read system scope resources

2022-05-25 Thread Oleksandr Kozachenko
Public bug reported:

I created a user with the project member role and assigned the reader role with
system_scope:all.
```
$ openstack role assignment list --names --system all --role reader
+--------+---------------+-------+---------+--------+--------+-----------+
| Role   | User          | Group | Project | Domain | System | Inherited |
+--------+---------------+-------+---------+--------+--------+-----------+
| reader | user1@Default |       |         |        | all    | False     |
+--------+---------------+-------+---------+--------+--------+-----------+
```
But this user can only list resources in their own project.
For example, listing all servers in the system fails with the following error.
```
$ openstack server list --all
Policy doesn't allow os_compute_api:servers:detail:get_all_tenants to be 
performed. (HTTP 403) (Request-ID: req-0be7173f-83cc-4917-9735-82e31464da32)

```
In the nova API log, I can see `system_scope: None` in the policy check.
```
Policy check for os_compute_api:servers:allow_all_filters failed with scope 
check {'is_admin': False, 'user_id': 'c0f8017926b496459fa91995a502c68c', 
'user_domain_id': 'default', 'system_scope': None, 'domain_id': None, 
'project_id': '62a1872ed4a9ef9865311576145b3baa', 'project_domain_id': 
'default', 'roles': ['reader'], 'is_admin_project': True, 'service_user_id': 
None, 'service_user_domain_id': None, 'service_project_id': None, 
'service_project_domain_id': None, 'service_roles': []} authorize 
/var/lib/openstack/lib/python3.8/site-packages/nova/policy.py:192

```

Getting other resources that require system-scope permission, such as services,
endpoints and users, also fails.
System scope seems not to be working at all.
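
The policy check above shows the request arriving with a project-scoped token
(system_scope: None). For comparison, a system-scoped token has to be requested
explicitly; a minimal keystoneauth1 sketch with placeholder credentials (with
python-openstackclient, the equivalent is setting OS_SYSTEM_SCOPE=all and
unsetting the OS_PROJECT_* variables):

```
# Hedged sketch: requesting a system-scoped token with keystoneauth1.
# auth_url, username and password are placeholders.
from keystoneauth1 import session
from keystoneauth1.identity import v3

auth = v3.Password(
    auth_url='https://keystone.example.com/v3',
    username='user1',
    password='secret',
    user_domain_name='Default',
    system_scope='all',   # ask for system scope instead of a project scope
)
sess = session.Session(auth=auth)
print(sess.get_token())
```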

** Affects: keystone
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1975732

Title:
  System Reader cannot read system scope resources

Status in OpenStack Identity (keystone):
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1975732/+subscriptions




[Yahoo-eng-team] [Bug 1975711] Re: tox hangs due to pip backtracking during virtualenv generation

2022-05-25 Thread sean mooney
I'm setting this to Opinion for now, since this seems to be distro-specific.

We have confirmed that this can happen on Fedora 36, but we have also confirmed
that Python 3.10 works on Debian and NixOS, and we have CI jobs running
non-voting unit tests on 3.10 on Ubuntu 22.04.

So in general this does not appear to be a nova bug; it looks like it is
either a Fedora issue or related to the tox/pip versions being used.

We may be able to work around it in nova, but I'm not sure we should, or can,
do that without changing what we test and how we test in an undesirable way.

** Changed in: nova
   Status: New => Opinion

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1975711

Title:
  tox hangs due to pip backtracking during virtualenv generation

Status in OpenStack Compute (nova):
  Opinion

Bug description:
  Description
  ===
  On a fresh checkout of nova, running tox -e pep8 results in the process 
maxing out a CPU core and seemingly getting stuck (I terminated it after 30 
minutes of no progress).

  I believe this is due to pip trying to find a set of packages that
  exactly satisfy cross-requirements of all dependencies, checking
  multiple progressively older versions of each package until the tree
  becomes too complex to handle at all.

  Steps to reproduce
  ==

  * Make a fresh checkout of nova, a shallow one works since we only need 
master:
  git clone --depth 1 https://opendev.org/openstack/nova.git nova
  This makes sure the tox virtualenv from an existing checkout isn't reused.

  * From within the repo, run tox pep8 with verbosity to see pip output:
  $ tox -vvv -e pep8

  Expected result
  ===
  Tox successfully sets up its virtualenv and runs pep8.

  Actual result
  =
  pip downloads several versions of packages, outputting a large amount of 
messages like these for a few packages along the way:

  INFO: pip is looking at multiple versions of certifi to determine which 
version is compatible with other requirements. This could take a while.
Downloading certifi-2020.4.5-py2.py3-none-any.whl (156 kB)
Downloading certifi-2019.11.28-py2.py3-none-any.whl (156 kB)
Downloading certifi-2019.9.11-py2.py3-none-any.whl (154 kB)
Downloading certifi-2019.6.16-py2.py3-none-any.whl (157 kB)
Downloading certifi-2019.3.9-py2.py3-none-any.whl (158 kB)
  INFO: This is taking longer than usual. You might need to provide the 
dependency resolver with stricter constraints to reduce runtime. If you want to 
abort this run, you can press Ctrl + C to do so. To improve how pip performs, 
tell us what happened here: https://pip.pypa.io/surveys/backtracking

  Eventually it seems to get completely stuck after one of those
  downloads, maxing out a CPU core and seemingly making no more progress
  until terminated.

  Environment
  ===
  This happens in dev environments, not in OpenStack deployments. We've
reproduced it on Fedora 35 and 36; I would expect others to be similarly
impacted. Some system Python env info:

  $ python -V
  Python 3.10.4

  $ pip show pip
  Name: pip
  Version: 21.3.1

  $ pip show tox
  Name: tox
  Version: 3.24.5

  Logs & Configs
  ==
  Reproduced on a fresh checkout with no altered configs.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1975711/+subscriptions




[Yahoo-eng-team] [Bug 1975711] [NEW] tox hangs due to pip backtracking during virtualenv generation

2022-05-25 Thread Miguel
Public bug reported:

Description
===
On a fresh checkout of nova, running tox -e pep8 results in the process maxing 
out a CPU core and seemingly getting stuck (I terminated it after 30 minutes of 
no progress).

I believe this is due to pip trying to find a set of packages that
exactly satisfy cross-requirements of all dependencies, checking
multiple progressively older versions of each package until the tree
becomes too complex to handle at all.

Steps to reproduce
==

* Make a fresh checkout of nova, a shallow one works since we only need master:
git clone --depth 1 https://opendev.org/openstack/nova.git nova
This makes sure the tox virtualenv from an existing checkout isn't reused.

* From within the repo, run tox pep8 with verbosity to see pip output:
$ tox -vvv -e pep8

Expected result
===
Tox successfully sets up its virtualenv and runs pep8.

Actual result
=
pip downloads several versions of packages, outputting a large amount of 
messages like these for a few packages along the way:

INFO: pip is looking at multiple versions of certifi to determine which version 
is compatible with other requirements. This could take a while.
  Downloading certifi-2020.4.5-py2.py3-none-any.whl (156 kB)
  Downloading certifi-2019.11.28-py2.py3-none-any.whl (156 kB)
  Downloading certifi-2019.9.11-py2.py3-none-any.whl (154 kB)
  Downloading certifi-2019.6.16-py2.py3-none-any.whl (157 kB)
  Downloading certifi-2019.3.9-py2.py3-none-any.whl (158 kB)
INFO: This is taking longer than usual. You might need to provide the 
dependency resolver with stricter constraints to reduce runtime. If you want to 
abort this run, you can press Ctrl + C to do so. To improve how pip performs, 
tell us what happened here: https://pip.pypa.io/surveys/backtracking

Eventually it seems to get completely stuck after one of those
downloads, maxing out a CPU core and seemingly making no more progress
until terminated.

Environment
===
This happens in dev environments, not in OpenStack deployments. We've
reproduced it on Fedora 35 and 36; I would expect others to be similarly
impacted. Some system Python env info:

$ python -V
Python 3.10.4

$ pip show pip
Name: pip
Version: 21.3.1

$ pip show tox
Name: tox
Version: 3.24.5

Logs & Configs
==
Reproduced on a fresh checkout with no altered configs.
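
Not necessarily the fix for this report, but pip's backtracking can generally
be bounded by pre-pinning candidate versions with OpenStack's upper-constraints
file; a hedged sketch:

```
# Hedged sketch: pre-apply OpenStack's upper-constraints file so the pip
# resolver has exactly one candidate per package instead of backtracking
# through progressively older releases.
import subprocess
import sys

CONSTRAINTS = 'https://releases.openstack.org/constraints/upper/master'
subprocess.check_call([
    sys.executable, '-m', 'pip', 'install',
    '-c', CONSTRAINTS,
    '-r', 'requirements.txt',
    '-r', 'test-requirements.txt',
])
```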

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1975711

Title:
  tox hangs due to pip backtracking during virtualenv generation

Status in OpenStack Compute (nova):
  New


[Yahoo-eng-team] [Bug 1975609] Re: requirements-check job failing on PrettyTable

2022-05-25 Thread Brian Rosmaita
Fixed in openstack/requirements by
https://review.opendev.org/c/openstack/requirements/+/843191

** Changed in: glance
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1975609

Title:
  requirements-check job failing on PrettyTable

Status in Glance:
  Fix Released

Bug description:
  I've got a patch up that updates requirements.txt, and I'm getting a
  requirements-check job failure (not due to the requirement I'm
  actually changing on the patch).

  Patch is:
  https://review.opendev.org/c/openstack/glance/+/841135

  Failure is:

  WARNING: possible mismatch found for package "PrettyTable"
 Attribute "package" does not match
 "PrettyTable" does not match "prettytable"
 Requirement(package='PrettyTable', location='', specifiers='>=0.7.1', 
markers='', comment='# BSD', extras=frozenset())
 Requirement(package='prettytable', location='', specifiers='', markers='', 
comment='# BSD', extras=frozenset())
  ERROR: Package 'prettytable' requirement does not match number of lines (2) 
in openstack/requirements

  Both 'PrettyTable' and 'prettytable' are in global-requirements, but
  it's only 'prettytable' in upper-constraints.

  My guesses are:
  - maybe it needs to be all lowercase? (doubt it, but you never know)
  - maybe the version we're specifying as a minimum is so old it can't run in 
python 3.8?
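
  For context, pip itself treats the two spellings as the same distribution
  under PEP 503 name normalization; the requirements-check output above,
  though, compares the package attribute literally. A quick sketch of the
  normalization rule:

```
# PEP 503 package-name normalization: 'PrettyTable' and 'prettytable'
# refer to the same distribution on PyPI.
import re

def normalize(name):
    return re.sub(r'[-_.]+', '-', name).lower()

assert normalize('PrettyTable') == normalize('prettytable') == 'prettytable'
```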

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1975609/+subscriptions




[Yahoo-eng-team] [Bug 1975692] [NEW] OVN migration failed due to unhandled error in neutron_ovn_db_sync_util

2022-05-25 Thread Rodolfo Alonso
Public bug reported:

While performing an ovs2ovn migration, the following exception occurred; see
neutron/neutron-ovn-db-sync-util.log.1 on controller-0:

Log: https://paste.opendev.org/show/b4OJEldZ3IBAjAJ1xOAd/

This bug is related to:
- https://bugs.launchpad.net/neutron/+bug/1939704
- https://bugs.launchpad.net/neutron/+bug/1964640

Bugzilla reference: (OSP16.2)
https://bugzilla.redhat.com/show_bug.cgi?id=2087721

** Affects: neutron
 Importance: Medium
 Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
 Status: In Progress

** Changed in: neutron
 Assignee: (unassigned) => Rodolfo Alonso (rodolfo-alonso-hernandez)

** Changed in: neutron
   Importance: Undecided => Medium

** Changed in: neutron
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1975692

Title:
  OVN migration failed due to unhandled error in
  neutron_ovn_db_sync_util

Status in neutron:
  In Progress


To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1975692/+subscriptions




[Yahoo-eng-team] [Bug 1947127] Fix included in openstack/neutron 19.3.0

2022-05-25 Thread OpenStack Infra
This issue was fixed in the openstack/neutron 19.3.0 release.

** Changed in: cloud-archive/xena
   Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1947127

Title:
  [SRU] Some DNS extensions not working with OVN

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive xena series:
  Fix Released
Status in Ubuntu Cloud Archive yoga series:
  New
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  New
Status in neutron source package in Impish:
  New
Status in neutron source package in Jammy:
  New
Status in neutron source package in Kinetic:
  New

Bug description:
  [Impact]

  On a fresh devstack install with the q-dns service enabled from the
  neutron devstack plugin, some features still don't work, e.g.:

  $ openstack subnet set private-subnet --dns-publish-fixed-ip
  BadRequestException: 400: Client Error for url: 
https://10.250.8.102:9696/v2.0/subnets/9f50c79e-6396-4c5b-be92-f64aa0f25beb, 
Unrecognized attribute(s) 'dns_publish_fixed_ip'

  $ openstack port create p1 --network private --dns-name p1 --dns-domain a.b.
  BadRequestException: 400: Client Error for url: 
https://10.250.8.102:9696/v2.0/ports, Unrecognized attribute(s) 'dns_domain'

  The reason seems to be that
  https://review.opendev.org/c/openstack/neutron/+/686343/31/neutron/common/ovn/extensions.py
  only added dns_domain_keywords, but not e.g. dns_domain_ports, which OVN
  also supports.
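
  A hedged sketch of the kind of change involved; the real list lives in
  neutron/common/ovn/extensions.py, and the entries below are illustrative
  rather than the verbatim source:

```
# Hedged, illustrative excerpt: the OVN driver keeps an explicit list of
# supported API extension aliases; aliases missing from the list are not
# exposed, so the corresponding attributes are rejected as unrecognized.
ML2_SUPPORTED_API_EXTENSIONS = [
    'dns-integration',
    'dns-integration-domain-keywords',
    'dns-domain-ports',             # needed for: port create --dns-domain
    'subnet-dns-publish-fixed-ip',  # needed for: subnet set --dns-publish-fixed-ip
    # ...
]
```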

  [Test Case]

  Create a normal OpenStack neutron test environment to see if we can
  successfully run the following commands:

  openstack subnet set private_subnet --dns-publish-fixed-ip
  openstack port create p1 --network private --dns-name p1 --dns-domain a.b.

  [Regression Potential]

  The fix has merged into the upstream stable/xena branch [1], here's
  just SRU into the 19.1.0 branch of UCA xena (the fix is already in
  20.0.0 so it's already in jammy and kinetic and focal-yoga), so it is
  a clean backport and might be helpful for deployments migrating to
  OVN.

  [1] https://review.opendev.org/c/openstack/neutron/+/838650

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1947127/+subscriptions




[Yahoo-eng-team] [Bug 1975686] [NEW] MEM_ENCRYPTION_CONTEXT trait is missing from the compute RP even if AMD SEV is enabled on the compute node

2022-05-25 Thread Balazs Gibizer
Public bug reported:

Compute nodes with amd-sev enabled are reporting that support is
available but MEM_ENCRYPTION_CONTEXT is not present in the placement
traits for the compute nodes.

# Domain capabilities report support
[heat-admin@computeamdsev-1 log]$ sudo podman exec -it -u root nova_virtqemud virsh domcapabilities | grep -A 12 features
  <features>
    ... (other feature elements stripped by the mail archive) ...
    <sev supported='yes'>
      <cbitpos>47</cbitpos>
      <reducedPhysBits>1</reducedPhysBits>
      <maxGuests>509</maxGuests>
      <maxESGuests>0</maxESGuests>
    </sev>
  </features>

# It is active as well in /sys/module/kvm_amd
[heat-admin@computeamdsev-1 log]$ cat /sys/module/kvm_amd/parameters/sev
Y
[heat-admin@computeamdsev-1 log]$
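
Note that the kernel exposes the parameter as 'Y' here rather than '1'. A
minimal sketch of the kind of check a driver can do against this file (a
hypothetical helper, not nova's verbatim code; a check that only accepted '1'
would wrongly report no support on this kernel):

```
# Hedged sketch: probing the kvm_amd SEV module parameter. On this node the
# file contains 'Y'; some kernels report '1' instead, so both spellings are
# accepted here.
def kernel_supports_sev(path='/sys/module/kvm_amd/parameters/sev'):
    try:
        with open(path) as f:
            return f.read().strip().lower() in ('1', 'y')
    except FileNotFoundError:
        return False

print(kernel_supports_sev())
```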

# I do not see any errors with sev during startup
[heat-admin@computeamdsev-1 log]$ sudo dmesg | grep -i sev
[0.00] Command line: 
BOOT_IMAGE=(lvmid/nZkWaZ-f6bk-Bfto-h9OG-k1Sc-Y6RB-1Q3yZV/t77pr1-3H2Y-ml4l-MMJh-bp3H-zk2j-6z4W6w)/boot/vmlinuz-5.14.0-70.5.1.el9_0.x86_64
 root=LABEL=img-rootfs ro console=ttyS0 console=ttyS0,115200n81 no_timer_check 
crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M hugepagesz=1GB hugepages=32 
default_hugepagesz=1GB mem_encrypt=on kvm_amd.sev=1 console=tty0 
console=ttyS0,115200 no_timer_check nofb nomodeset vga=normal console=tty0 
console=ttyS0,115200 audit=1 nousb
[0.00] Kernel command line: 
BOOT_IMAGE=(lvmid/nZkWaZ-f6bk-Bfto-h9OG-k1Sc-Y6RB-1Q3yZV/t77pr1-3H2Y-ml4l-MMJh-bp3H-zk2j-6z4W6w)/boot/vmlinuz-5.14.0-70.5.1.el9_0.x86_64
 root=LABEL=img-rootfs ro console=ttyS0 console=ttyS0,115200n81 no_timer_check 
crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M hugepagesz=1GB hugepages=32 
default_hugepagesz=1GB mem_encrypt=on kvm_amd.sev=1 console=tty0 
console=ttyS0,115200 no_timer_check nofb nomodeset vga=normal console=tty0 
console=ttyS0,115200 audit=1 nousb
[0.00] Any video related functionality will be severely degraded, and 
you may not even be able to suspend the system properly
[  101.753478] ccp :24:00.1: sev enabled
[  101.769894] ccp :24:00.1: SEV firmware update successful
[  102.058746] ccp :24:00.1: SEV API:0.24 build:14
[  120.398153] systemd[1]: Hostname set to .
[  149.487548] SEV supported: 509 ASIDs

# MEM_ENCRYPTION_CONTEXT is not present
(overcloud) [stack@undercloud-0 ~]$ !21
openstack --os-placement-api-version 1.17 resource provider trait list ba3bccf9-c283-4cb5-a14d-35ae7ba88533
/usr/lib/python3.9/site-packages/ansible/_vendor/__init__.py:42: UserWarning: 
One or more Python packages bundled by this ansible-core distribution were 
already loaded (pyparsing). This may result in undefined behavior.
  warnings.warn('One or more Python packages bundled by this ansible-core 
distribution were already '
+----------------------------------------+
| name                                   |
+----------------------------------------+
| COMPUTE_GRAPHICS_MODEL_NONE   |
| COMPUTE_ACCELERATORS  |
| COMPUTE_NET_VIF_MODEL_VMXNET3 |
| COMPUTE_STORAGE_BUS_VIRTIO|
| COMPUTE_NET_VIF_MODEL_E1000E  |
| COMPUTE_VOLUME_ATTACH_WITH_TAG|
| COMPUTE_NET_ATTACH_INTERFACE  |
| HW_CPU_X86_BMI2   |
| COMPUTE_VOLUME_EXTEND |
| HW_CPU_X86_SSE|
| COMPUTE_NET_VIF_MODEL_RTL8139 |
| COMPUTE_GRAPHICS_MODEL_VIRTIO |
| COMPUTE_IMAGE_TYPE_RAW|
| COMPUTE_TRUSTED_CERTS |
| HW_CPU_X86_SSE42  |
| HW_CPU_X86_SSSE3  |
| HW_CPU_X86_SSE2   |
| COMPUTE_STORAGE_BUS_IDE   |
| COMPUTE_SECURITY_UEFI_SECURE_BOOT |
| COMPUTE_SOCKET_PCI_NUMA_AFFINITY  |
| COMPUTE_IMAGE_TYPE_AMI|
| COMPUTE_GRAPHICS_MODEL_CIRRUS |
| COMPUTE_VOLUME_MULTI_ATTACH   |
| HW_CPU_X86_SSE4A  |
| HW_CPU_X86_SSE41  |
| COMPUTE_IMAGE_TYPE_QCOW2  |
| COMPUTE_IMAGE_TYPE_AKI|
| HW_CPU_X86_AVX2   |
| HW_CPU_X86_FMA3   |
| HW_CPU_X86_MMX|
| HW_CPU_HYPERTHREADING |
| COMPUTE_NET_VIF_MODEL_NE2K_PCI|
| HW_CPU_X86_SVM|
| HW_CPU_X86_AVX|
| COMPUTE_IMAGE_TYPE_ISO|
| HW_CPU_X86_CLMUL  |
| HW_CPU_X86_ABM|
| COMPUTE_NET_VIF_MODEL_SPAPR_VLAN  |
| COMPUTE_STORAGE_BUS_SCSI  |
| HW_CPU_X86_AMD_SVM|
| COMPUTE_NET_ATTACH_INTERFACE_WITH_TAG |
| COMPUTE_STORAGE_BUS_FDC   |
| COMPUTE_NET_VIF_MODEL_VIRTIO  |
| COMPUTE_NET_VIF_MODEL_PCNET   |
| COMPUTE_STORAGE_BUS_SATA  |
| HW_CPU_X86_F16C   |
| COMPUTE_NET_VIF_MODEL_E1000   |
| COMPUTE_DEVICE_TAGGING|
| COMPUTE_NODE  |
| COMPUTE_GRAPHICS_MODEL_VGA|
| COMPUTE_IMAGE_TYPE_ARI|
| HW_CPU_X86_SHA|
| HW_CPU_X86_AESNI  |
| COMPUTE_RESCUE_BFV|
| COMPUTE_STO

[Yahoo-eng-team] [Bug 1943631] Re: Neutron with OVN fails to bind port if hostname has dots

2022-05-25 Thread Vladislav Belogrudov
I could reproduce the same issue with the latest devstack and a long hostname:

stack@myci:~/devstack$ hostname
myci.home.org


stack@myci:~/devstack$ sudo ovs-vsctl list open_vswitch
_uuid   : 3859f81a-2f37-456c-b3da-bb068f30310f
bridges : [06b3fc03-5783-4401-be39-2562836f2058, 
3e1363b6-fb78-4160-a41e-9c47441ca481]
cur_cfg : 2
datapath_types  : [netdev, system]
datapaths   : {}
db_version  : "8.2.0"
dpdk_initialized: false
dpdk_version: none
external_ids: {hostname=myci, ovn-bridge=br-int, 
ovn-bridge-mappings="public:br-ex", ovn-cms-options=enable-chassis-as-gw, 
ovn-encap-ip="192.168.122.20", ovn-encap-type=geneve, 
ovn-remote="tcp:192.168.122.20:6642", rundir="/var/run/openvswitch", 
system-id="cd754343-5266-4d01-8328-f462916a2a2c"}
iface_types : [erspan, geneve, gre, internal, ip6erspan, ip6gre, 
lisp, patch, stt, system, tap, vxlan]
manager_options : [a92f4ff7-f42f-4f4c-962a-647c82287cf1]
next_cfg: 2
other_config: {}
ovs_version : "2.13.5"
ssl : []
statistics  : {}
system_type : ubuntu
system_version  : "20.04"


A fix for this config:

stack@myci:~/devstack$ sudo ovs-vsctl set open_vswitch . external-ids:hostname='myci.home.org'


After that everything works: the chassis hostname registered in external_ids
now matches the host name Neutron uses when binding the port. So this is
rather a devstack bug.
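
For clarity, the failure reduces to a comparison between those two names; a
tiny illustration using the values above (variable names are mine):

```
# The chassis registered 'myci' while Neutron binds against the FQDN,
# so the chassis lookup during port binding finds no match.
chassis_hostname = 'myci'                 # external_ids:hostname above
neutron_binding_host = 'myci.home.org'    # what `hostname` returns
assert chassis_hostname != neutron_binding_host  # -> "Failed to bind port"
```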


** Changed in: neutron
   Status: Expired => Confirmed

** Project changed: neutron => devstack

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1943631

Title:
  Neutron with OVN fails to bind port if hostname has dots

Status in devstack:
  Confirmed

Bug description:
  If the hostname has dots, as in for example "devstack.localdomain",
  when trying to create an instance, Neutron will fail with "Failed to
  bind port".

  Reproduced with the latest DevStack from master on top of Ubuntu
  Server 20.04.3 LTS with latest packages installed. Minimal
  installation, no additional packages installed (only removed
  python3-simplejson and python3-pyasn1-modules due to recent issues
  with those packages[1]).

  I find this weird, because afaik TripleO uses FQDNs, so in theory
  Neutron with OVN should break their CI (although I'm not sure if they
  use OVN or Open vSwitch). I'm still not sure if this is on my end or
  not, but I was able to reproduce this consistently, trying different
  hostnames, trying the most minimal local.conf possible (setting only
  passwords), so I decided to report it as a bug.

  Steps to reproduce:
  1. Set the system's hostname to something with at least one dot
  2. Deploy DevStack
  3. Create instance on any network
  4. Inspect devstack@q-svc.service to see the error

  Expected output:
  Instance is created successfully

  Actual output:
  Instance enters error state and devstack@q-svc.service reports this: 
https://paste.openstack.org/show/809315/

  Version:
  DevStack from master
  Ubuntu Server 20.04.3 LTS
  OVN mechanism driver

  Environment (local.conf): https://paste.openstack.org/show/809316/

  Perceived severity: low

  [1] https://bugs.launchpad.net/devstack/+bug/1871485

To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack/+bug/1943631/+subscriptions




[Yahoo-eng-team] [Bug 1975674] [NEW] Neutron agent blocks during VM deletion when a remote security group is involved

2022-05-25 Thread Henning Eggers
Public bug reported:

When deleting a VM that has a security group referring to a remote
security group, the neutron agent will block for as long as it takes to
remove the respective flows. This happens when the remote security group
contains many (thousands) ports referring to other VMs.

Steps to reproduce:
  - Create a VM with security group A
  - Add a rule to security group A allowing access from a remote security group B
  - Add a large number of ports to security group B (e.g. 2000)
    - The respective ovs flows will be added
  - Delete the VM
    - The ovs flows will be removed

Expected:
  - VM and flow to be deleted within seconds
  - No impact to other VMs on the same hypervisor

Actual:
  - Flow deletion takes a long time, sometimes up to 10 minutes
  - While flows are being deleted, no VMs can be created on the same hypervisor

The reason for this behavior is that under the hood the agent calls ovs-
ofctl (via execve()) once for each port in the remote security group.
These calls quickly add up to minutes if there are many ports.

The proposed solution would be to use deferred execution for the flow
deletion. In that case it becomes a bulk operation and around 400 flows
are deleted in one call. In addition it runs in the background and does
not block the agent for other operations.
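
A hedged sketch of what that could look like with neutron's OVS bridge library
(method names as in neutron.agent.common.ovs_lib; remote_sg_member_macs is a
stand-in for the remote group's port MAC addresses; an illustration of the
approach, not the final patch):

```
# Hedged sketch: batch the flow deletions through the deferred OVS bridge so
# they are flushed in bulk instead of one ovs-ofctl execve() per
# remote-security-group member.
from neutron.agent.common import ovs_lib

remote_sg_member_macs = ['fa:16:3e:00:00:01']  # placeholder member MACs

br = ovs_lib.OVSBridge('br-int')
with br.deferred(full_ordered=True) as deferred_br:
    for mac in remote_sg_member_macs:          # e.g. ~2000 entries
        deferred_br.delete_flows(dl_dst=mac)
# accumulated deletions are applied in bulk when the context exits
```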

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1975674

Title:
  Neutron agent blocks during VM deletion when a remote security group
  is involved

Status in neutron:
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1975674/+subscriptions

