[Yahoo-eng-team] [Bug 1975771] [NEW] instance stuck in BUILD state with vm_state building
Public bug reported:

Description
===========

With a Train cellsv2 deployment we noticed an issue where instances randomly remain in BUILD state with vm_state building, but nova-compute never seems to actually attempt building the instance. On retry the instances may build, which makes this issue hard to debug, and in general the infrastructure seems to work:

+--------------------------------------+------------------------------------------------------------+
| Property                             | Value                                                      |
+--------------------------------------+------------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                     |
| OS-EXT-AZ:availability_zone          | nova                                                       |
| OS-EXT-SRV-ATTR:host                 | -                                                          |
| OS-EXT-SRV-ATTR:hostname             | test                                                       |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                                          |
| OS-EXT-SRV-ATTR:instance_name        | instance-26cd                                              |
| OS-EXT-SRV-ATTR:kernel_id            |                                                            |
| OS-EXT-SRV-ATTR:launch_index         | 0                                                          |
| OS-EXT-SRV-ATTR:ramdisk_id           |                                                            |
| OS-EXT-SRV-ATTR:reservation_id       | r-rj1sb4zs                                                 |
| OS-EXT-SRV-ATTR:root_device_name     | -                                                          |
| OS-EXT-SRV-ATTR:user_data            | -                                                          |
| OS-EXT-STS:power_state               | 0                                                          |
| OS-EXT-STS:task_state                | scheduling                                                 |
| OS-EXT-STS:vm_state                  | building                                                   |
| OS-SRV-USG:launched_at               | -                                                          |
| OS-SRV-USG:terminated_at             | -                                                          |
| accessIPv4                           |                                                            |
| accessIPv6                           |                                                            |
| config_drive                         |                                                            |
| created                              | 2022-05-25T22:59:18Z                                       |
| description                          | test                                                       |
| flavor:disk                          | 1                                                          |
| flavor:ephemeral                     | 0                                                          |
| flavor:extra_specs                   | {}                                                         |
| flavor:original_name                 | test-flavor                                                |
| flavor:ram                           | 512                                                        |
| flavor:swap                          | 0                                                          |
| flavor:vcpus                         | 1                                                          |
| hostId                               |                                                            |
| host_status                          |                                                            |
| id                                   | 2a6cf0bf-8a25-4b9c-997f-e9dbfc7927e5                       |
| image                                | cirros-0.4.0-x86_64 (15f38ee5-b94c-4bc0-a6f4-63cb308ba7bf) |
| key_name                             | -                                                          |
| locked                               | False                                                      |
| locked_reason                        | -                                                          |
| metadata                             | {}                                                         |
| name                                 | test                                                       |
| os-extended-volumes:volumes_attached | []                                                         |
| progress                             | 0                                                          |
| server_groups                        | []                                                         |
| status                               | BUILD                                                      |
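For triage, it can help to list instances that have sat in vm_state 'building' past some threshold, since stuck builds look exactly like the record above (task_state scheduling, no host assigned). A minimal sketch of that filter, assuming server records shaped like the fields above; this is illustrative only, not nova's actual data model or API:

```python
from datetime import datetime, timedelta, timezone

def find_stuck_builds(servers, max_age_minutes=30):
    """Return ids of servers still in vm_state 'building' past max_age_minutes.

    `servers` is assumed to be a list of dicts with 'id', 'vm_state' and
    'created' keys, where 'created' uses the ISO format shown in the report.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(minutes=max_age_minutes)
    stuck = []
    for s in servers:
        created = datetime.strptime(
            s["created"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        if s["vm_state"] == "building" and created < cutoff:
            stuck.append(s["id"])
    return stuck
```

Anything this flags for more than a few minutes is worth cross-checking against the conductor and scheduler logs for the corresponding request id.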
[Yahoo-eng-team] [Bug 1975743] [NEW] ML2 OVN - Creating an instance with hardware offloaded port is broken
Public bug reported:

OpenStack Release: Yoga
Platform: Ubuntu focal

Creating an instance with a vnic-type 'direct' port and a 'switchdev' binding-profile fails with the following validation error:

```
2022-05-25 19:13:40.331 125269 DEBUG neutron.api.v2.base [req-504a0204-6f1a-46ae-8b95-dcfdf2692f91 b2a31335e63b4dd391cc3e6bf4600fe1 - - 654b9b803e6a4a68b31676c16973e3cc 654b9b803e6a4a68b31676c16973e3cc] Request body: {'port': {'device_id': 'd46aef48-e42e-49c8-af9f-a83768747b4f', 'device_owner': 'compute:nova', 'binding:profile': {'capabilities': ['switchdev'], 'pci_vendor_info': '15b3:101e', 'pci_slot': ':08:03.2', 'physical_network': None, 'card_serial_number': 'MT2034X11488', 'pf_mac_address': '04:3f:72:9e:0b:a1', 'vf_num': 7}, 'binding:host_id': 'node3.maas', 'dns_name': 'vm1'}} prepare_request_body /usr/lib/python3/dist-packages/neutron/api/v2/base.py:729
2022-05-25 19:13:40.429 125269 DEBUG neutron_lib.callbacks.manager [req-504a0204-6f1a-46ae-8b95-dcfdf2692f91 b2a31335e63b4dd391cc3e6bf4600fe1 - - 654b9b803e6a4a68b31676c16973e3cc 654b9b803e6a4a68b31676c16973e3cc] Publish callbacks ['neutron.plugins.ml2.plugin.SecurityGroupDbMixin._ensure_default_security_group_handler-1311372', 'neutron.services.ovn_l3.plugin.OVNL3RouterPlugin._port_update-8735219071964'] for port (0f1e4e9c-68ef-4b38-a3bc-68e624bca6c7), before_update _notify_loop /usr/lib/python3/dist-packages/neutron_lib/callbacks/manager.py:176
2022-05-25 19:13:41.221 125269 DEBUG neutron.notifiers.nova [req-504a0204-6f1a-46ae-8b95-dcfdf2692f91 b2a31335e63b4dd391cc3e6bf4600fe1 - - 654b9b803e6a4a68b31676c16973e3cc 654b9b803e6a4a68b31676c16973e3cc] Ignoring state change previous_port_status: DOWN current_port_status: DOWN port_id 0f1e4e9c-68ef-4b38-a3bc-68e624bca6c7 record_port_status_changed /usr/lib/python3/dist-packages/neutron/notifiers/nova.py:233
2022-05-25 19:13:41.229 125269 DEBUG neutron_lib.callbacks.manager [req-504a0204-6f1a-46ae-8b95-dcfdf2692f91 b2a31335e63b4dd391cc3e6bf4600fe1 - - 654b9b803e6a4a68b31676c16973e3cc 654b9b803e6a4a68b31676c16973e3cc] Publish callbacks [] for port (0f1e4e9c-68ef-4b38-a3bc-68e624bca6c7), precommit_update _notify_loop /usr/lib/python3/dist-packages/neutron_lib/callbacks/manager.py:176
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers [req-504a0204-6f1a-46ae-8b95-dcfdf2692f91 b2a31335e63b4dd391cc3e6bf4600fe1 - - 654b9b803e6a4a68b31676c16973e3cc 654b9b803e6a4a68b31676c16973e3cc] Mechanism driver 'ovn' failed in update_port_precommit: neutron_lib.exceptions.InvalidInput: Invalid input for operation: Invalid binding:profile. too many parameters.
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/managers.py", line 482, in _call_on_drivers
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers     getattr(driver.obj, method_name)(context)
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 792, in update_port_precommit
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers     ovn_utils.validate_and_get_data_from_binding_profile(port)
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3/dist-packages/neutron/common/ovn/utils.py", line 266, in validate_and_get_data_from_binding_profile
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers     raise n_exc.InvalidInput(error_message=msg)
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers neutron_lib.exceptions.InvalidInput: Invalid input for operation: Invalid binding:profile. too many parameters.
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers
```

The issue seems to be related to this commit: https://review.opendev.org/c/openstack/neutron/+/818420

To reproduce: https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-ovn.html

1. Prepare a setup with SR-IOV adjusted for OVN HW Offload
2. Create a port with switchdev capabilities
   $ openstack port create direct_overlay2 --vnic-type=direct --network gen_data --binding-profile '{"capabilities":["switchdev"]}' --security-group my_policy
3. Create an instance
   $ openstack server create --key-name bastion --flavor d1.demo --image ubuntu --port direct_overlay1 vm1 --availability-zone nova:node3.maas

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1975743

Title:
  ML2 OVN - Creating an instance with hardware offloaded port is broken

Status in neutron:
  New
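The traceback points at a strict whitelist-style check on binding:profile keys: the driver rejects any key it does not recognize, and the hardware-offload code path adds several. A toy re-creation of that failure mode (the key sets and function name here are illustrative assumptions; the real validation lives in neutron/common/ovn/utils.py):

```python
# Hypothetical whitelist check sketching the "too many parameters" failure;
# the real code is neutron.common.ovn.utils.validate_and_get_data_from_binding_profile.
EXPECTED_KEYS = {"capabilities", "pci_vendor_info", "pci_slot", "physical_network"}

def validate_binding_profile(profile):
    # Reject any key the driver does not know about.
    extra = set(profile) - EXPECTED_KEYS
    if extra:
        raise ValueError(
            "Invalid binding:profile. too many parameters: %s" % sorted(extra))
    return dict(profile)
```

With the request body from the log above, the card_serial_number, pf_mac_address and vf_num keys would trip exactly this kind of check.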
[Yahoo-eng-team] [Bug 1975732] [NEW] System Reader cannot read system scope resources
Public bug reported:

I created a user with the project member role and assigned the reader role with system_scope:all.

```
$ openstack role assignment list --names --system all --role reader
+--------+---------------+-------+---------+--------+--------+-----------+
| Role   | User          | Group | Project | Domain | System | Inherited |
+--------+---------------+-------+---------+--------+--------+-----------+
| reader | user1@Default |       |         |        | all    | False     |
+--------+---------------+-------+---------+--------+--------+-----------+
```

But this user can only list resources in his own project. For example, listing all servers in the system fails with the following error.

```
$ openstack server list --all
Policy doesn't allow os_compute_api:servers:detail:get_all_tenants to be performed. (HTTP 403) (Request-ID: req-0be7173f-83cc-4917-9735-82e31464da32)
```

In the nova api log, I can see `system_scope: None` in the policy check.

```
Policy check for os_compute_api:servers:allow_all_filters failed with scope check {'is_admin': False, 'user_id': 'c0f8017926b496459fa91995a502c68c', 'user_domain_id': 'default', 'system_scope': None, 'domain_id': None, 'project_id': '62a1872ed4a9ef9865311576145b3baa', 'project_domain_id': 'default', 'roles': ['reader'], 'is_admin_project': True, 'service_user_id': None, 'service_user_domain_id': None, 'service_project_id': None, 'service_project_domain_id': None, 'service_roles': []} authorize /var/lib/openstack/lib/python3.8/site-packages/nova/policy.py:192
```

Getting other resources that require system scope permission, such as services, endpoints and users, also fails. It seems system scope is not working at all.

** Affects: keystone
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1975732

Title:
  System Reader cannot read system scope resources

Status in OpenStack Identity (keystone):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1975732/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
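One configuration detail worth ruling out before treating this as a keystone bug: `system_scope: None` in the policy dict means the token nova received was project-scoped. Having a reader role assigned on the system is not enough by itself; the client has to request a system-scoped token explicitly, for example via environment variables (a sketch; the actual server-list call is commented out since it needs a live cloud):

```shell
# Drop project scoping so the client does not request a project-scoped token,
# then ask keystone for a system-scoped one instead.
unset OS_PROJECT_NAME OS_PROJECT_DOMAIN_NAME OS_PROJECT_ID
export OS_SYSTEM_SCOPE=all
# openstack server list --all   # should now carry system_scope 'all' in the policy check
echo "system scope: $OS_SYSTEM_SCOPE"
```

If the policy check still shows `system_scope: None` with a genuinely system-scoped token, that would point at the service or middleware rather than the role assignment.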
[Yahoo-eng-team] [Bug 1975711] Re: tox hangs due to pip backtracking during virtualenv generation
I'm setting this to Opinion for now since this seems to be distro-specific. We have confirmed that this can happen on Fedora 36, but we have also confirmed that Python 3.10 works on Debian and NixOS, and we have CI jobs running non-voting unit tests on 3.10 on Ubuntu 22.04. So in general this does not appear to be a nova bug; it looks like either a Fedora issue or something related to the tox/pip versions being used. We may be able to work around it in nova, but I'm not sure we should, or can, without changing what we test and how we test it in an undesirable way.

** Changed in: nova
   Status: New => Opinion

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1975711

Title:
  tox hangs due to pip backtracking during virtualenv generation

Status in OpenStack Compute (nova):
  Opinion

Bug description:
  Description
  ===========

  On a fresh checkout of nova, running tox -e pep8 results in the process maxing out a CPU core and seemingly getting stuck (I terminated it after 30 minutes of no progress). I believe this is due to pip trying to find a set of packages that exactly satisfies the cross-requirements of all dependencies, checking multiple progressively older versions of each package until the tree becomes too complex to handle at all.

  Steps to reproduce
  ==================

  * Make a fresh checkout of nova; a shallow one works since we only need master:

      git clone --depth 1 https://opendev.org/openstack/nova.git nova

    This makes sure the tox virtualenv from an existing checkout isn't reused.

  * From within the repo, run tox pep8 with verbosity to see pip output:

      $ tox -vvv -e pep8

  Expected result
  ===============

  Tox successfully sets up its virtualenv and runs pep8.

  Actual result
  =============

  pip downloads several versions of packages, outputting a large amount of messages like these for a few packages along the way:

    INFO: pip is looking at multiple versions of certifi to determine which version is compatible with other requirements. This could take a while.
    Downloading certifi-2020.4.5-py2.py3-none-any.whl (156 kB) || 156 kB 81.6 MB/s
    Downloading certifi-2019.11.28-py2.py3-none-any.whl (156 kB) || 156 kB 86.8 MB/s
    Downloading certifi-2019.9.11-py2.py3-none-any.whl (154 kB) || 154 kB 79.5 MB/s
    Downloading certifi-2019.6.16-py2.py3-none-any.whl (157 kB) || 157 kB 71.6 MB/s
    Downloading certifi-2019.3.9-py2.py3-none-any.whl (158 kB) || 158 kB 84.7 MB/s
    INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking

  Eventually it seems to get completely stuck after one of those downloads, maxing out a CPU core and seemingly making no more progress until terminated.

  Environment
  ===========

  This happens in dev environments, not in OpenStack deployments. We've reproduced it on Fedora 35 and 36; I would expect others to be similarly impacted.

  Some system python env info:

  $ python -V
  Python 3.10.4
  $ pip show pip
  Name: pip
  Version: 21.3.1
  $ pip show tox
  Name: tox
  Version: 3.24.5

  Logs & Configs
  ==============

  Reproduced on a fresh checkout with no altered configs.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1975711/+subscriptions
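A common mitigation for this class of backtracking is to hand pip the OpenStack upper-constraints file, so each package has exactly one candidate version and the resolver never has to walk back through old releases. A sketch of doing that by hand (the URL is the usual published constraints location; adjust to the branch under test, and the install line is left commented since it needs network access):

```shell
# Create a clean virtualenv and install with upper-constraints applied;
# -c pins every transitive dependency to a single known-good version.
python3 -m venv .venv
. .venv/bin/activate
# pip install -c https://releases.openstack.org/constraints/upper/master \
#     -r requirements.txt -r test-requirements.txt
```

If a constrained install completes quickly while an unconstrained one backtracks, that supports the theory that the hang is a resolver/environment issue rather than anything in nova itself.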
[Yahoo-eng-team] [Bug 1975711] [NEW] tox hangs due to pip backtracking during virtualenv generation
Public bug reported:

Description
===========

On a fresh checkout of nova, running tox -e pep8 results in the process maxing out a CPU core and seemingly getting stuck (I terminated it after 30 minutes of no progress). I believe this is due to pip trying to find a set of packages that exactly satisfies the cross-requirements of all dependencies, checking multiple progressively older versions of each package until the tree becomes too complex to handle at all.

Steps to reproduce
==================

* Make a fresh checkout of nova; a shallow one works since we only need master:

    git clone --depth 1 https://opendev.org/openstack/nova.git nova

  This makes sure the tox virtualenv from an existing checkout isn't reused.

* From within the repo, run tox pep8 with verbosity to see pip output:

    $ tox -vvv -e pep8

Expected result
===============

Tox successfully sets up its virtualenv and runs pep8.

Actual result
=============

pip downloads several versions of packages, outputting a large amount of messages like these for a few packages along the way:

  INFO: pip is looking at multiple versions of certifi to determine which version is compatible with other requirements. This could take a while.
  Downloading certifi-2020.4.5-py2.py3-none-any.whl (156 kB) || 156 kB 81.6 MB/s
  Downloading certifi-2019.11.28-py2.py3-none-any.whl (156 kB) || 156 kB 86.8 MB/s
  Downloading certifi-2019.9.11-py2.py3-none-any.whl (154 kB) || 154 kB 79.5 MB/s
  Downloading certifi-2019.6.16-py2.py3-none-any.whl (157 kB) || 157 kB 71.6 MB/s
  Downloading certifi-2019.3.9-py2.py3-none-any.whl (158 kB) || 158 kB 84.7 MB/s
  INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking

Eventually it seems to get completely stuck after one of those downloads, maxing out a CPU core and seemingly making no more progress until terminated.

Environment
===========

This happens in dev environments, not in OpenStack deployments. We've reproduced it on Fedora 35 and 36; I would expect others to be similarly impacted.

Some system python env info:

$ python -V
Python 3.10.4
$ pip show pip
Name: pip
Version: 21.3.1
$ pip show tox
Name: tox
Version: 3.24.5

Logs & Configs
==============

Reproduced on a fresh checkout with no altered configs.

** Affects: nova
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1975711

Title:
  tox hangs due to pip backtracking during virtualenv generation

Status in OpenStack Compute (nova):
  New
[Yahoo-eng-team] [Bug 1975609] Re: requirements-check job failing on PrettyTable
Fixed in openstack/requirements by https://review.opendev.org/c/openstack/requirements/+/843191

** Changed in: glance
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1975609

Title:
  requirements-check job failing on PrettyTable

Status in Glance:
  Fix Released

Bug description:
  I've got a patch up that updates requirements.txt, and I'm getting a requirements-check job failure (not due to the requirement I'm actually changing on the patch).

  Patch is: https://review.opendev.org/c/openstack/glance/+/841135

  Failure is:

  WARNING: possible mismatch found for package "PrettyTable"
     Attribute "package" does not match
     "PrettyTable" does not match "prettytable"
     Requirement(package='PrettyTable', location='', specifiers='>=0.7.1', markers='', comment='# BSD', extras=frozenset())
     Requirement(package='prettytable', location='', specifiers='', markers='', comment='# BSD', extras=frozenset())
  ERROR: Package 'prettytable' requirement does not match number of lines (2) in openstack/requirements

  Both 'PrettyTable' and 'prettytable' are in global-requirements, but it's only 'prettytable' in upper-constraints.

  My guesses are:
  - maybe it needs to be all lowercase? (doubt it, but you never know)
  - maybe the version we're specifying as a minimum is so old it can't run in python 3.8?

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1975609/+subscriptions
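The underlying 'PrettyTable' vs 'prettytable' mismatch is exactly what PEP 503 name normalization exists for: package names compare case-insensitively, with runs of '-', '_' and '.' collapsed to a single '-'. The packaging library ships this as packaging.utils.canonicalize_name; a stdlib-only sketch of the same rule:

```python
import re

def canonicalize_name(name):
    """PEP 503 normalization: lowercase, collapse runs of -, _ and . to '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()
```

Under this rule 'PrettyTable' and 'prettytable' are the same requirement, which is presumably what the requirements-check tooling should compare before counting matching lines.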
[Yahoo-eng-team] [Bug 1975692] [NEW] OVN migration failed due to unhandled error in neutron_ovn_db_sync_util
Public bug reported:

While performing the ovs2ovn migration the following exception occurred (from neutron/neutron-ovn-db-sync-util.log.1 on controller-0).

Log: https://paste.opendev.org/show/b4OJEldZ3IBAjAJ1xOAd/

This bug is related to:
- https://bugs.launchpad.net/neutron/+bug/1939704
- https://bugs.launchpad.net/neutron/+bug/1964640

Bugzilla reference (OSP16.2): https://bugzilla.redhat.com/show_bug.cgi?id=2087721

** Affects: neutron
   Importance: Medium
   Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
   Status: In Progress

** Changed in: neutron
   Assignee: (unassigned) => Rodolfo Alonso (rodolfo-alonso-hernandez)

** Changed in: neutron
   Importance: Undecided => Medium

** Changed in: neutron
   Status: New => Confirmed

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1975692

Title:
  OVN migration failed due to unhandled error in
  neutron_ovn_db_sync_util

Status in neutron:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1975692/+subscriptions
[Yahoo-eng-team] [Bug 1947127] Fix included in openstack/neutron 19.3.0
This issue was fixed in the openstack/neutron 19.3.0 release.

** Changed in: cloud-archive/xena
   Status: Confirmed => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1947127

Title:
  [SRU] Some DNS extensions not working with OVN

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive xena series:
  Fix Released
Status in Ubuntu Cloud Archive yoga series:
  New
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  New
Status in neutron source package in Impish:
  New
Status in neutron source package in Jammy:
  New
Status in neutron source package in Kinetic:
  New

Bug description:
  [Impact]

  On a fresh devstack install with the q-dns service enabled from the neutron devstack plugin, some features still don't work, e.g.:

  $ openstack subnet set private-subnet --dns-publish-fixed-ip
  BadRequestException: 400: Client Error for url: https://10.250.8.102:9696/v2.0/subnets/9f50c79e-6396-4c5b-be92-f64aa0f25beb, Unrecognized attribute(s) 'dns_publish_fixed_ip'

  $ openstack port create p1 --network private --dns-name p1 --dns-domain a.b.
  BadRequestException: 400: Client Error for url: https://10.250.8.102:9696/v2.0/ports, Unrecognized attribute(s) 'dns_domain'

  The reason seems to be that https://review.opendev.org/c/openstack/neutron/+/686343/31/neutron/common/ovn/extensions.py only added dns_domain_keywords, but not e.g. dns_domain_ports as supported by OVN.

  [Test Case]

  Create a normal OpenStack neutron test environment and check whether the following commands run successfully:

  openstack subnet set private_subnet --dns-publish-fixed-ip
  openstack port create p1 --network private --dns-name p1 --dns-domain a.b.

  [Regression Potential]

  The fix has merged into the upstream stable/xena branch [1]; this is just an SRU into the 19.1.0 branch of UCA xena (the fix is already in 20.0.0, so it is already in jammy, kinetic and focal-yoga). It is a clean backport and might be helpful for deployments migrating to OVN.

  [1] https://review.opendev.org/c/openstack/neutron/+/838650

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1947127/+subscriptions
[Yahoo-eng-team] [Bug 1975686] [NEW] MEM_ENCRYPTION_CONTEXT trait is missing from the compute RP even if AMD SEV is enabled on the compute node
Public bug reported:

Compute nodes with amd-sev enabled report that support is available, but MEM_ENCRYPTION_CONTEXT is not present in the placement traits for the compute nodes.

# Domain capabilities report support
[heat-admin@computeamdsev-1 log]$ sudo podman exec -it -u root nova_virtqemud virsh domcapabilities | grep -A 12 features
47 1 509 0

# It is active as well in /sys/module/kvm_amd
[heat-admin@computeamdsev-1 log]$ cat /sys/module/kvm_amd/parameters/sev
Y
[heat-admin@computeamdsev-1 log]$

# I do not see any errors with sev during startup
[heat-admin@computeamdsev-1 log]$ sudo dmesg | grep -i sev
[0.00] Command line: BOOT_IMAGE=(lvmid/nZkWaZ-f6bk-Bfto-h9OG-k1Sc-Y6RB-1Q3yZV/t77pr1-3H2Y-ml4l-MMJh-bp3H-zk2j-6z4W6w)/boot/vmlinuz-5.14.0-70.5.1.el9_0.x86_64 root=LABEL=img-rootfs ro console=ttyS0 console=ttyS0,115200n81 no_timer_check crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M hugepagesz=1GB hugepages=32 default_hugepagesz=1GB mem_encrypt=on kvm_amd.sev=1 console=tty0 console=ttyS0,115200 no_timer_check nofb nomodeset vga=normal console=tty0 console=ttyS0,115200 audit=1 nousb
[0.00] Kernel command line: BOOT_IMAGE=(lvmid/nZkWaZ-f6bk-Bfto-h9OG-k1Sc-Y6RB-1Q3yZV/t77pr1-3H2Y-ml4l-MMJh-bp3H-zk2j-6z4W6w)/boot/vmlinuz-5.14.0-70.5.1.el9_0.x86_64 root=LABEL=img-rootfs ro console=ttyS0 console=ttyS0,115200n81 no_timer_check crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M hugepagesz=1GB hugepages=32 default_hugepagesz=1GB mem_encrypt=on kvm_amd.sev=1 console=tty0 console=ttyS0,115200 no_timer_check nofb nomodeset vga=normal console=tty0 console=ttyS0,115200 audit=1 nousb
[0.00] Any video related functionality will be severely degraded, and you may not even be able to suspend the system properly
[ 101.753478] ccp :24:00.1: sev enabled
[ 101.769894] ccp :24:00.1: SEV firmware update successful
[ 102.058746] ccp :24:00.1: SEV API:0.24 build:14
[ 120.398153] systemd[1]: Hostname set to .
[ 149.487548] SEV supported: 509 ASIDs

# MEM_ENCRYPTION_CONTEXT is not present
(overcloud) [stack@undercloud-0 ~]$ !21
openstack --os-placement-api-version 1.17 resource provider trait list ba3bccf9-c283-4cb5-a14d-35ae7ba88533
/usr/lib/python3.9/site-packages/ansible/_vendor/__init__.py:42: UserWarning: One or more Python packages bundled by this ansible-core distribution were already loaded (pyparsing). This may result in undefined behavior.
  warnings.warn('One or more Python packages bundled by this ansible-core distribution were already '
+---------------------------------------+
| name                                  |
+---------------------------------------+
| COMPUTE_GRAPHICS_MODEL_NONE           |
| COMPUTE_ACCELERATORS                  |
| COMPUTE_NET_VIF_MODEL_VMXNET3         |
| COMPUTE_STORAGE_BUS_VIRTIO            |
| COMPUTE_NET_VIF_MODEL_E1000E          |
| COMPUTE_VOLUME_ATTACH_WITH_TAG        |
| COMPUTE_NET_ATTACH_INTERFACE          |
| HW_CPU_X86_BMI2                       |
| COMPUTE_VOLUME_EXTEND                 |
| HW_CPU_X86_SSE                        |
| COMPUTE_NET_VIF_MODEL_RTL8139         |
| COMPUTE_GRAPHICS_MODEL_VIRTIO         |
| COMPUTE_IMAGE_TYPE_RAW                |
| COMPUTE_TRUSTED_CERTS                 |
| HW_CPU_X86_SSE42                      |
| HW_CPU_X86_SSSE3                      |
| HW_CPU_X86_SSE2                       |
| COMPUTE_STORAGE_BUS_IDE               |
| COMPUTE_SECURITY_UEFI_SECURE_BOOT     |
| COMPUTE_SOCKET_PCI_NUMA_AFFINITY      |
| COMPUTE_IMAGE_TYPE_AMI                |
| COMPUTE_GRAPHICS_MODEL_CIRRUS         |
| COMPUTE_VOLUME_MULTI_ATTACH           |
| HW_CPU_X86_SSE4A                      |
| HW_CPU_X86_SSE41                      |
| COMPUTE_IMAGE_TYPE_QCOW2              |
| COMPUTE_IMAGE_TYPE_AKI                |
| HW_CPU_X86_AVX2                       |
| HW_CPU_X86_FMA3                       |
| HW_CPU_X86_MMX                        |
| HW_CPU_HYPERTHREADING                 |
| COMPUTE_NET_VIF_MODEL_NE2K_PCI        |
| HW_CPU_X86_SVM                        |
| HW_CPU_X86_AVX                        |
| COMPUTE_IMAGE_TYPE_ISO                |
| HW_CPU_X86_CLMUL                      |
| HW_CPU_X86_ABM                        |
| COMPUTE_NET_VIF_MODEL_SPAPR_VLAN      |
| COMPUTE_STORAGE_BUS_SCSI              |
| HW_CPU_X86_AMD_SVM                    |
| COMPUTE_NET_ATTACH_INTERFACE_WITH_TAG |
| COMPUTE_STORAGE_BUS_FDC               |
| COMPUTE_NET_VIF_MODEL_VIRTIO          |
| COMPUTE_NET_VIF_MODEL_PCNET           |
| COMPUTE_STORAGE_BUS_SATA              |
| HW_CPU_X86_F16C                       |
| COMPUTE_NET_VIF_MODEL_E1000           |
| COMPUTE_DEVICE_TAGGING                |
| COMPUTE_NODE                          |
| COMPUTE_GRAPHICS_MODEL_VGA            |
| COMPUTE_IMAGE_TYPE_ARI                |
| HW_CPU_X86_SHA                        |
| HW_CPU_X86_AESNI                      |
| COMPUTE_RESCUE_BFV                    |
| COMPUTE_STO
[Yahoo-eng-team] [Bug 1943631] Re: Neutron with OVN fails to bind port if hostname has dots
I could re-create the same with the latest devstack and a long hostname:

stack@myci:~/devstack$ hostname
myci.home.org
stack@myci:~/devstack$ sudo ovs-vsctl list open_vswitch
_uuid               : 3859f81a-2f37-456c-b3da-bb068f30310f
bridges             : [06b3fc03-5783-4401-be39-2562836f2058, 3e1363b6-fb78-4160-a41e-9c47441ca481]
cur_cfg             : 2
datapath_types      : [netdev, system]
datapaths           : {}
db_version          : "8.2.0"
dpdk_initialized    : false
dpdk_version        : none
external_ids        : {hostname=myci, ovn-bridge=br-int, ovn-bridge-mappings="public:br-ex", ovn-cms-options=enable-chassis-as-gw, ovn-encap-ip="192.168.122.20", ovn-encap-type=geneve, ovn-remote="tcp:192.168.122.20:6642", rundir="/var/run/openvswitch", system-id="cd754343-5266-4d01-8328-f462916a2a2c"}
iface_types         : [erspan, geneve, gre, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
manager_options     : [a92f4ff7-f42f-4f4c-962a-647c82287cf1]
next_cfg            : 2
other_config        : {}
ovs_version         : "2.13.5"
ssl                 : []
statistics          : {}
system_type         : ubuntu
system_version      : "20.04"

A fix for this config:

stack@myci:~/devstack$ sudo ovs-vsctl set open_vswitch . external-ids:hostname='myci.home.org'

After that everything works, so this is rather a devstack bug.

** Changed in: neutron
   Status: Expired => Confirmed

** Project changed: neutron => devstack

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1943631

Title:
  Neutron with OVN fails to bind port if hostname has dots

Status in devstack:
  Confirmed

Bug description:
  If the hostname has dots, as in for example "devstack.localdomain", when trying to create an instance, Neutron will fail with "Failed to bind port".

  Reproduced with the latest DevStack from master on top of Ubuntu Server 20.04.3 LTS with the latest packages installed. Minimal installation, no additional packages installed (only removed python3-simplejson and python3-pyasn1-modules due to recent issues with those packages[1]).
  I find this weird, because afaik TripleO uses FQDNs, so in theory Neutron with OVN should break their CI (although I'm not sure whether they use OVN or Open vSwitch). I'm still not sure if this is on my end or not, but I was able to reproduce this consistently, trying different hostnames and the most minimal local.conf possible (setting only passwords), so I decided to report it as a bug.

  Steps to reproduce:
  1. Set the system's hostname to something with at least one dot
  2. Deploy DevStack
  3. Create an instance on any network
  4. Inspect devstack@q-svc.service to see the error

  Expected output: Instance is created successfully

  Actual output: Instance enters the error state and devstack@q-svc.service reports this: https://paste.openstack.org/show/809315/

  Version:
  DevStack from master
  Ubuntu Server 20.04.3 LTS
  OVN mechanism driver

  Environment (local.conf): https://paste.openstack.org/show/809316/

  Perceived severity: low

  [1] https://bugs.launchpad.net/devstack/+bug/1871485

To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack/+bug/1943631/+subscriptions
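Until devstack handles this itself, the mismatch can be spotted by comparing the system FQDN against the hostname OVS advertises. A defensive sketch of that check (guarded so it also runs on hosts without OVS installed; the fix command is left commented, mirroring the one shown above):

```shell
# Compare the FQDN neutron will use for port binding with the hostname
# stored in the OVS database; a mismatch breaks OVN port binding.
fqdn=$(hostname -f 2>/dev/null || hostname)
if command -v ovs-vsctl >/dev/null 2>&1; then
    ovs_host=$(sudo ovs-vsctl get open_vswitch . external-ids:hostname | tr -d '"')
    if [ "$fqdn" != "$ovs_host" ]; then
        echo "mismatch: system=$fqdn ovs=$ovs_host"
        # sudo ovs-vsctl set open_vswitch . external-ids:hostname="$fqdn"
    fi
fi
```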
[Yahoo-eng-team] [Bug 1975674] [NEW] Neutron agent blocks during VM deletion when a remote security group is involved
Public bug reported:

When deleting a VM that has a security group referring to a remote security group, the neutron agent will block for as long as it takes to remove the respective flows. This happens when the remote security group contains many (thousands of) ports referring to other VMs.

Steps to reproduce:
- Create a VM with security group A
- Add a rule to security group A allowing access from a remote security group B
- Add a large number of ports to security group B (e.g. 2000)
- The respective ovs flows will be added
- Delete the VM
- The ovs flows will be removed

Expected:
- VM and flows to be deleted within seconds
- No impact to other VMs on the same hypervisor

Actual:
- Flow deletion takes a long time, sometimes up to 10 minutes
- While flows are being deleted, no VMs can be created on the same hypervisor

The reason for this behavior is that under the hood the agent calls ovs-ofctl (via execve()) once for each port in the remote security group. These calls quickly add up to minutes if there are many ports.

The proposed solution is to use deferred execution for the flow deletion. It then becomes a bulk operation, and around 400 flows are deleted in one call. In addition it runs in the background and does not block the agent for other operations.

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1975674

Title:
  Neutron agent blocks during VM deletion when a remote security group
  is involved

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1975674/+subscriptions
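The effect of the deferred/bulk deletion proposed in the report above can be sketched by counting external process invocations: one ovs-ofctl call per remote-security-group member today versus one call per ~400-flow batch. The function names and flow-match strings here are illustrative, not the agent's actual code:

```python
def per_port_commands(member_ips, bridge="br-int"):
    # Current behavior: one ovs-ofctl execve() per remote-SG member port.
    return ["ovs-ofctl del-flows %s ip,nw_src=%s" % (bridge, ip)
            for ip in member_ips]

def batched_commands(member_ips, bridge="br-int", batch_size=400):
    # Deferred behavior: matches accumulate and are flushed in bulk,
    # e.g. by feeding a flow list to a single "ovs-ofctl del-flows" call.
    batches = [member_ips[i:i + batch_size]
               for i in range(0, len(member_ips), batch_size)]
    return ["ovs-ofctl del-flows %s - # %d flows via stdin" % (bridge, len(b))
            for b in batches]
```

For the 2000-port example above this is 2000 process spawns versus 5, which is where the minutes of agent blocking go.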