[Yahoo-eng-team] [Bug 1965183] Re: ovn migration executes scripts from /tmp directory
Reviewed:  https://review.opendev.org/c/openstack/neutron/+/834071
Committed: https://opendev.org/openstack/neutron/commit/0529ccdf71dcd093a80180097eeaa5d7cb5e15fb
Submitter: "Zuul (22348)"
Branch:    master

commit 0529ccdf71dcd093a80180097eeaa5d7cb5e15fb
Author: Jakub Libosvar
Date:   Wed Mar 16 16:40:21 2022 -0400

    ovn migration: Don't use executables in /tmp/

    It's a common practice to have /tmp/ mounted separately with the
    noexec option. This effectively means no scripts can be executed
    from the filesystem mounted at /tmp. This patch explicitly calls
    the sh binary to execute scripts from /tmp and removes the
    executable flag from the scripts.

    Closes-Bug: #1965183
    Change-Id: I2f9cd67979a8a75848fcdd7a8c3bb56dd3590473
    Signed-off-by: Jakub Libosvar

** Changed in: neutron
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1965183

Title:
  ovn migration executes scripts from /tmp directory

Status in neutron:
  Fix Released

Bug description:
  Description of problem:
  The /tmp directory is often mounted with the noexec option for
  security reasons, but the migration roles rely on scripts in /tmp/
  being executable.

  Version-Release number of selected component (if applicable): 16.1

  How reproducible: Always

  Steps to Reproduce:
  1. Have /tmp mounted with the noexec option
  2. Run the migration from OVS to OVN
  3.

  Actual results:
  fatal: [tpa-vim-b-computecl-0]: FAILED! => {
      "changed": true,
      "cmd": "/tmp/clone-br-int.sh",
      "delta": "0:00:00.001773",
      "end": "2022-03-16 18:51:30.332449",
      "invocation": {
          "module_args": {
              "_raw_params": "/tmp/clone-br-int.sh",
              "_uses_shell": true,
              "argv": null,
              "chdir": null,
              "creates": null,
              "executable": null,
              "removes": null,
              "stdin": null,
              "stdin_add_newline": true,
              "strip_empty_ends": true,
              "warn": true
          }
      },
      "msg": "non-zero return code",
      "rc": 126,
      "start": "2022-03-16 18:51:30.330676",
      "stderr": "/bin/sh: /tmp/clone-br-int.sh: Permission denied",
      "stderr_lines": ["/bin/sh: /tmp/clone-br-int.sh: Permission denied"],
      "stdout": "",
      "stdout_lines": []
  }

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1965183/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
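The failure mode and the fix are easy to reproduce without a noexec mount: a script lacking the executable bit fails the same way (rc 126), while passing it to the interpreter explicitly works. A minimal sketch (the script content is illustrative):

```python
import os
import subprocess
import tempfile

# Simulate a script that cannot be executed directly (as on a noexec
# /tmp) by clearing its executable bit.
fd, script = tempfile.mkstemp(suffix=".sh")
with os.fdopen(fd, "w") as f:
    f.write('echo "br-int cloned"\n')
os.chmod(script, 0o644)  # readable but not executable

# Direct execution via the shell fails with "Permission denied" (rc 126).
direct = subprocess.run(script, shell=True, capture_output=True, text=True)
print(direct.returncode)  # 126

# The fix: invoke the interpreter explicitly, so neither the executable
# bit nor a noexec mount flag matters.
fixed = subprocess.run(["sh", script], capture_output=True, text=True)
print(fixed.returncode, fixed.stdout.strip())  # 0 br-int cloned

os.unlink(script)
```

This is exactly the shape of the change in the migration roles: `sh /tmp/script.sh` instead of `/tmp/script.sh`.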
[Yahoo-eng-team] [Bug 1959567] Re: QoS Ingress bandwidth limit with OVS backend may not work as expected
Reviewed:  https://review.opendev.org/c/openstack/neutron/+/832662
Committed: https://opendev.org/openstack/neutron/commit/f7ab90baad83823cfb94bc8b9450cd915ea49c03
Submitter: "Zuul (22348)"
Branch:    master

commit f7ab90baad83823cfb94bc8b9450cd915ea49c03
Author: Slawek Kaplonski
Date:   Tue Mar 8 16:44:59 2022 +0100

    Fix ingress bandwidth limit in the openvswitch agent

    For ingress bandwidth limiting, the openvswitch agent uses QoS and
    queues from Open vSwitch, always with queue 0. When this feature
    was initially implemented, we assumed that queue 0 is a kind of
    "default" queue to which all traffic is sent when no other queue
    matches. That is not true, so ingress bandwidth limiting wasn't
    working properly with this agent.

    This patch fixes the issue by adding to table=0 of br-int an
    additional OpenFlow rule that sends all traffic to queue 0. For
    some ports there may be QoS configured in this queue, and it will
    then be applied to the port. If a port has no QoS configured,
    nothing happens and everything works as before this patch.

    The biggest problem with this solution is the case where ports
    with minimum bandwidth are on the same node, because such ports
    use different queues (the queue number is the same as the ofport
    number of the tap interface). When traffic goes from a port with
    a minimum bandwidth QoS to a port with an ingress bandwidth limit
    configured, it passes only through br-int and uses queue 0, so
    the ingress bandwidth limit is applied properly. When traffic
    from a port with minimum bandwidth set needs to leave the host,
    it always traverses a physical bridge (minimum bandwidth is only
    supported on provider networks) and the proper queue is set for
    that traffic in the physical bridge.

    To be able to set the proper queue in the physical bridge, this
    patch adds an additional OpenFlow rule to br-int that stores the
    queue_num value in the pkt_mark field [1], as this seems to be
    the only field that can "survive" passing between bridges.

    [1] https://man7.org/linux/man-pages/man7/ovs-fields.7.html

    Closes-Bug: #1959567
    Change-Id: I1e31565475f38c6ad817268699b165759ac05410

** Changed in: neutron
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1959567

Title:
  QoS Ingress bandwidth limit with OVS backend may not work as expected

Status in neutron:
  Fix Released

Bug description:
  According to the OVS FAQ
  https://docs.openvswitch.org/en/latest/faq/qos/:

  Q: I configured Quality of Service (QoS) in my OpenFlow network by
     adding records to the QoS and Queue table, but the results aren't
     what I expect.
  A: Did you install OpenFlow flows that use your queues? This is the
     primary way to tell Open vSwitch which queues you want to use. If
     you don't do this, then the default queue will be used, which
     will probably not have the effect you want.

  According to OVS developer Ilya Maximets: "OVS doesn't define what
  the "default queue" is. [...] So, using the set_queue action is a
  correct way to configure QoS, even if queue 0 is currently a
  "default queue". It's not guaranteed that it always will be."

  Because of that, the Neutron OVS agent should always configure the
  OpenFlow rules needed to send traffic to the required QoS queue.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1959567/+subscriptions
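For illustration only (the exact match fields, priorities, and register handling are in the patch itself; the placeholders below are assumptions), the shape of the fix is two extra flows in table 0 of br-int, using the standard set_queue action and the pkt_mark field:

```
# Send all traffic to queue 0, where per-port ingress bandwidth-limit
# QoS (if any) is attached:
table=0, priority=0, actions=set_queue:0,NORMAL

# For a port with minimum-bandwidth QoS, remember its queue number in
# pkt_mark so the physical bridge can select the right queue later:
table=0, in_port=<tap ofport>, actions=load:<queue_num>->NXM_NX_PKT_MARK[],NORMAL
```

pkt_mark is used because, per ovs-fields(7), it is preserved as a packet crosses patch ports between bridges, unlike OpenFlow registers.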
[Yahoo-eng-team] [Bug 1960230] Re: resize fails with FileExistsError if earlier resize attempt failed to cleanup
** Also affects: nova/xena
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1960230

Title:
  resize fails with FileExistsError if earlier resize attempt failed
  to cleanup

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) xena series:
  New

Bug description:
  This bug is related to resize with the libvirt driver.

  If a resize fails, the _cleanup_remote_migration() [1] function in
  the libvirt driver tries to clean up the
  /var/lib/nova/instances/<uuid>_resize directory on the remote side
  [2]. If this cleanup fails, the _resize directory is left behind and
  blocks any future resize attempts.

  2021-12-14 14:40:12.535 175177 INFO nova.virt.libvirt.driver [req-9d3477d4-3bb2-456f-9be6-dce9893b0e95 23d6aa8884ab44ef9f214ad195d273c0 050c556faa5944a8953126c867313770 - default default] [instance: 99287438-c37b-44b0-834e-55685b6e83eb] Deletion of /var/lib/nova/instances/99287438-c37b-44b0-834e-55685b6e83eb_resize failed

  Then, on the next resize attempt a long time later:

  2022-02-04 13:07:31.255 175177 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 10429, in migrate_disk_and_power_off
  2022-02-04 13:07:31.255 175177 ERROR oslo_messaging.rpc.server     os.rename(inst_base, inst_base_resize)
  2022-02-04 13:07:31.255 175177 ERROR oslo_messaging.rpc.server FileExistsError: [Errno 17] File exists: '/var/lib/nova/instances/99287438-c37b-44b0-834e-55685b6e83eb' -> '/var/lib/nova/instances/99287438-c37b-44b0-834e-55685b6e83eb_resize'

  This happens here [3] because os.rename tries to rename the
  /var/lib/nova/instances/<uuid> directory to <uuid>_resize, which
  already exists, and fails with FileExistsError. We should check
  whether the directory exists before the rename and delete it first.

  [1] https://opendev.org/openstack/nova/src/branch/stable/xena/nova/virt/libvirt/driver.py#L10773
  [2] https://opendev.org/openstack/nova/src/branch/stable/xena/nova/virt/libvirt/driver.py#L10965
  [3] https://opendev.org/openstack/nova/src/branch/stable/xena/nova/virt/libvirt/driver.py#L10915

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1960230/+subscriptions
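The suggested fix can be sketched as follows (the function name and cleanup strategy here are illustrative, not the actual nova change):

```python
import os
import shutil
import tempfile

def rename_to_resize(inst_base: str, inst_base_resize: str) -> None:
    # os.rename() fails when the target directory is left over from a
    # previously failed resize, so remove the stale copy first.
    if os.path.exists(inst_base_resize):
        shutil.rmtree(inst_base_resize)
    os.rename(inst_base, inst_base_resize)

# Demo: a stale <uuid>_resize directory no longer blocks the resize.
root = tempfile.mkdtemp()
inst_base = os.path.join(root, "99287438-c37b-44b0-834e-55685b6e83eb")
inst_base_resize = inst_base + "_resize"
os.makedirs(inst_base)
os.makedirs(os.path.join(inst_base_resize, "leftover"))  # failed attempt
rename_to_resize(inst_base, inst_base_resize)
print(os.path.isdir(inst_base_resize), os.path.isdir(inst_base))  # True False
```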
[Yahoo-eng-team] [Bug 1960346] Re: Volume detach failure in devstack-platform-centos-9-stream job
Closing it for nova as it is fixed on the tempest side.

** Changed in: nova
   Status: Triaged => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1960346

Title:
  Volume detach failure in devstack-platform-centos-9-stream job

Status in OpenStack Compute (nova):
  Invalid
Status in tempest:
  Fix Released

Bug description:
  The devstack-platform-centos-9-stream job is failing 100% on the
  compute server rescue test with a volume detach error:

  traceback-1: {{{
  Traceback (most recent call last):
    File "/opt/stack/tempest/tempest/common/waiters.py", line 316, in wait_for_volume_resource_status
      raise lib_exc.TimeoutException(message)
  tempest.lib.exceptions.TimeoutException: Request timed out
  Details: volume 70cedb4b-e74d-4a86-a73d-ba8bce29bc99 failed to reach available status (current in-use) within the required time (196 s).
  }}}

  Traceback (most recent call last):
    File "/opt/stack/tempest/tempest/common/waiters.py", line 384, in wait_for_volume_attachment_remove_from_server
      raise lib_exc.TimeoutException(message)
  tempest.lib.exceptions.TimeoutException: Request timed out
  Details: Volume 70cedb4b-e74d-4a86-a73d-ba8bce29bc99 failed to detach from server cf57d12b-5e37-431e-8c71-4a7149e963ae within the required time (196 s) from the compute API perspective

  https://a886e0e70a23f464643f-7cd608bf14cafb686390b86bc06cde2a.ssl.cf1.rackcdn.com/827576/6/check/devstack-platform-centos-9-stream/53de74e/testr_results.html
  https://zuul.openstack.org/builds?job_name=devstack-platform-centos-9-stream&skip=0

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1960346/+subscriptions
[Yahoo-eng-team] [Bug 1960346] Re: Volume detach failure in devstack-platform-centos-9-stream job
Reviewed:  https://review.opendev.org/c/openstack/tempest/+/834350
Committed: https://opendev.org/openstack/tempest/commit/7304e3ac8973a42bcfff91d561ac9d238b187334
Submitter: "Zuul (22348)"
Branch:    master

commit 7304e3ac8973a42bcfff91d561ac9d238b187334
Author: Ghanshyam Mann
Date:   Fri Mar 18 13:58:25 2022 -0500

    Move ServerStableDeviceRescueTest to wait for SSH-able server

    ServerStableDeviceRescueTest also performs attach_volume on the
    rescued server and detaches the volume in cleanup. As described in
    bug #1960346, we need to wait for server readiness before the
    detach_volume call. Also make the CentOS Stream 9 job voting.

    Closes-Bug: #1960346
    Change-Id: Ia213297b13f42d39213dea9a3b2cfee561cdcf28

** Changed in: tempest
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1960346

Title:
  Volume detach failure in devstack-platform-centos-9-stream job

Status in OpenStack Compute (nova):
  Triaged
Status in tempest:
  Fix Released

Bug description:
  The devstack-platform-centos-9-stream job is failing 100% on the
  compute server rescue test with a volume detach error:

  traceback-1: {{{
  Traceback (most recent call last):
    File "/opt/stack/tempest/tempest/common/waiters.py", line 316, in wait_for_volume_resource_status
      raise lib_exc.TimeoutException(message)
  tempest.lib.exceptions.TimeoutException: Request timed out
  Details: volume 70cedb4b-e74d-4a86-a73d-ba8bce29bc99 failed to reach available status (current in-use) within the required time (196 s).
  }}}

  Traceback (most recent call last):
    File "/opt/stack/tempest/tempest/common/waiters.py", line 384, in wait_for_volume_attachment_remove_from_server
      raise lib_exc.TimeoutException(message)
  tempest.lib.exceptions.TimeoutException: Request timed out
  Details: Volume 70cedb4b-e74d-4a86-a73d-ba8bce29bc99 failed to detach from server cf57d12b-5e37-431e-8c71-4a7149e963ae within the required time (196 s) from the compute API perspective

  https://a886e0e70a23f464643f-7cd608bf14cafb686390b86bc06cde2a.ssl.cf1.rackcdn.com/827576/6/check/devstack-platform-centos-9-stream/53de74e/testr_results.html
  https://zuul.openstack.org/builds?job_name=devstack-platform-centos-9-stream&skip=0

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1960346/+subscriptions
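The tempest fix boils down to adding one more wait step (for SSH reachability) before the volume detach is attempted. The waiter pattern involved is a plain timeout poll loop; a generic sketch (names are illustrative, not the tempest API):

```python
import time

def wait_for(predicate, timeout=196, interval=0.01):
    """Poll predicate() until it returns True or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")

# Demo: a condition that becomes true on the third poll.
polls = {"n": 0}
def ssh_ready():
    polls["n"] += 1
    return polls["n"] >= 3

wait_for(ssh_ready)  # returns once the "server" is SSH-able
print(polls["n"])    # 3
# Only after this wait would the detach request be issued.
```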
[Yahoo-eng-team] [Bug 1965819] [NEW] list object has no attribute 'acked'
Public bug reported:

Using python-ovs master, there are errors such as:

ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn [None req-81ffbcd9-59d1-498a-aea2-d57e0d515ff2 None None] OVS database connection to OVN_Southbound failed with error: ''list' object has no attribute 'acked''. Verify that the OVS and OVN services are available and that the 'ovn_nb_connection' and 'ovn_sb_connection' configuration options are correct.: AttributeError: 'list' object has no attribute 'acked'
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn Traceback (most recent call last):
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn   File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/impl_idl_ovn.py", line 127, in start_connection
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn     self.ovsdb_connection.start()
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn   File "/usr/local/lib/python3.10/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 83, in start
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn     idlutils.wait_for_change(self.idl, self.timeout)
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn   File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/impl_idl_ovn.py", line 53, in wait_for_change
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn     while idl_.change_seqno == seqno and not idl_.run():
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn   File "/opt/stack/ovs/python/ovs/db/idl.py", line 506, in run
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn     self.__send_monitor_request()
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn   File "/opt/stack/ovs/python/ovs/db/idl.py", line 814, in __send_monitor_request
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn     not ConditionState.is_true(table.condition.acked)):
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn AttributeError: 'list' object has no attribute 'acked'
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn

ovsdbapp, neutron, and networking-ovn all directly set
'table.condition', which is kinda-sorta not really public API. The
type of this variable changed with
https://github.com/openvswitch/ovs/commit/46d44cf3be0dbf4a44cebea3b279b3d16a326796
and there has been some breakage.

** Affects: networking-ovn
   Importance: Undecided
   Status: New

** Affects: neutron
   Importance: Undecided
   Status: New

** Affects: ovsdbapp
   Importance: Undecided
   Status: New

** Also affects: neutron
   Importance: Undecided
   Status: New

** Also affects: networking-ovn
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1965819

Title:
  list object has no attribute 'acked'

Status in networking-ovn:
  New
Status in neutron:
  New
Status in ovsdbapp:
  New
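The incompatibility can be illustrated with a toy version of the change (ConditionState here is a stand-in, not the real ovs.db.idl class): code that assigns a bare list to table.condition leaves readers that expect the new wrapper without an .acked attribute.

```python
class ConditionState:
    """Toy stand-in for the wrapper python-ovs now keeps in
    table.condition (illustrative only)."""
    def __init__(self, acked):
        self.acked = acked

def acked_condition(condition):
    # Tolerate both the old convention (a bare condition list, as
    # ovsdbapp/neutron used to assign directly) and the new wrapper.
    if isinstance(condition, list):
        return condition
    return condition.acked

print(acked_condition([["name", "==", "foo"]]))  # old-style bare list
print(acked_condition(ConditionState([True])))   # new-style wrapper
```

Without a compatibility shim of this kind on one side or the other, the bare-list assignment hits `AttributeError: 'list' object has no attribute 'acked'` exactly as in the traceback above.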
[Yahoo-eng-team] [Bug 1965807] [NEW] Bulk port creation breaks IPAM module when driver port creation fails
Public bug reported:

This bug is related to https://bugs.launchpad.net/neutron/+bug/1954763.

There is a problem with the IPAM module [1] and the ML2 OVN driver.
When ports are created in "create_port_bulk", we first create the
IPAM allocations [1] to save time and DB accesses. However, if one
port fails in the driver call, that port is deleted, and the port
deletion removes its IPAM reservation too. At the end of
"create_port_bulk", the previously created IPAM reservations are then
deleted a second time. This is what triggers the error in the Neutron
server [2].

The DB now has a port record and an IP allocation record but no IPAM
allocation record. That prevents this port from being deleted
manually (error 500).

Red Hat bugzilla reference:
https://bugzilla.redhat.com/show_bug.cgi?id=2065634

[1] https://review.opendev.org/q/I8877c658446fed155130add6f1c69f2772113c27
[2] https://paste.opendev.org/show/b6e3INfzhPTkOFW5fhPb/

** Affects: neutron
   Importance: Medium
   Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
   Status: New

** Changed in: neutron
   Assignee: (unassigned) => Rodolfo Alonso (rodolfo-alonso-hernandez)

** Changed in: neutron
   Importance: Undecided => Medium

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1965807

Title:
  Bulk port creation breaks IPAM module when driver port creation fails

Status in neutron:
  New

Bug description:
  This bug is related to
  https://bugs.launchpad.net/neutron/+bug/1954763.

  There is a problem with the IPAM module [1] and the ML2 OVN driver.
  When ports are created in "create_port_bulk", we first create the
  IPAM allocations [1] to save time and DB accesses. However, if one
  port fails in the driver call, that port is deleted, and the port
  deletion removes its IPAM reservation too. At the end of
  "create_port_bulk", the previously created IPAM reservations are
  then deleted a second time. This is what triggers the error in the
  Neutron server [2]. The DB now has a port record and an IP
  allocation record but no IPAM allocation record. That prevents this
  port from being deleted manually (error 500).

  Red Hat bugzilla reference:
  https://bugzilla.redhat.com/show_bug.cgi?id=2065634

  [1] https://review.opendev.org/q/I8877c658446fed155130add6f1c69f2772113c27
  [2] https://paste.opendev.org/show/b6e3INfzhPTkOFW5fhPb/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1965807/+subscriptions
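The double-delete the report describes is the classic non-idempotent cleanup problem: the per-port failure path and the end-of-bulk rollback both release the same IPAM reservation. A hedged sketch of making the release idempotent (toy bookkeeping, not the neutron IPAM implementation or its eventual fix):

```python
class IpamPool:
    """Toy IPAM bookkeeping; illustrative only."""
    def __init__(self):
        self.allocations = set()

    def allocate(self, port_id):
        self.allocations.add(port_id)

    def release(self, port_id):
        # Idempotent: releasing an already-released (or never-created)
        # reservation is a no-op instead of corrupting state or raising.
        self.allocations.discard(port_id)

pool = IpamPool()
pool.allocate("port-1")
pool.release("port-1")   # the per-port failure path releases it...
pool.release("port-1")   # ...and the bulk rollback releases it again: harmless
print(pool.allocations)  # set()
```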
[Yahoo-eng-team] [Bug 1965772] [NEW] ovn-octavia-provider does not report status correctly to octavia
Public bug reported:

Hi all,

The OVN Octavia provider does not report status correctly to Octavia
due to a few bugs in the health monitoring implementation:

1) https://opendev.org/openstack/ovn-octavia-provider/src/commit/d6adbcef86e32bc7befbd5890a2bc79256b7a8e2/ovn_octavia_provider/helper.py#L2374

In _get_lb_on_hm_event, the request to the OVN NB API (db_find_rows)
is incorrect:

    lbs = self.ovn_nbdb_api.db_find_rows(
        'Load_Balancer', (('ip_port_mappings', '=', mappings),
                          ('protocol', '=', row.protocol))).execute()

should be:

    lbs = self.ovn_nbdb_api.db_find_rows(
        'Load_Balancer', ('ip_port_mappings', '=', mappings),
        ('protocol', '=', row.protocol[0])).execute()

Note the removed extra parentheses, and that the protocol string is
found in the first element of the protocol[] list.

2) https://opendev.org/openstack/ovn-octavia-provider/src/commit/d6adbcef86e32bc7befbd5890a2bc79256b7a8e2/ovn_octavia_provider/helper.py#L2426

There is confusion about the Pool object returned by
pool = self._octavia_driver_lib.get_pool(pool_id): this object does
not contain any operating_status attribute, and it seems, given the
current state of octavia-lib, that it is possible to set and update
the status for a listener/pool/member but not to retrieve the current
status. See
https://opendev.org/openstack/octavia-lib/src/branch/master/octavia_lib/api/drivers/data_models.py
for the current Pool data model. As a result, the computation done by
_get_new_operating_statuses cannot use the current operating status
to set a new operating status. It is still possible to set an
operating status for the members by setting them to "OFFLINE"
separately when an HM update event is fired.

3) The Load_Balancer_Health_Check NB entry creates the
Service_Monitor SB entries, but there isn't any way to link the
Service_Monitor entries created with the original NB entry. The
result is that health monitor events received from the SB and
processed by the Octavia driver agent cannot be accurately matched
with the correct Octavia health monitor entry. If we have, for
example, two load balancer entries using the same pool members and
the same ports, only the first LB returned by db_find_rows would be
updated (assuming bug #2 is fixed). The case of having two load
balancers with the same members is perfectly valid when using
separate load balancers for public traffic (using a VIP from a public
pool) and for internal/admin traffic (using a VIP from another pool,
with a source range whitelist). The code selecting only the first LB
in that case is the same as in bug #1.

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1965772

Title:
  ovn-octavia-provider does not report status correctly to octavia

Status in neutron:
  New
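The protocol[0] detail in fix 1) comes from how the OVS Python IDL represents optional scalar columns: they surface as lists of zero or one elements rather than bare values. A minimal illustration with plain data (no OVSDB connection involved):

```python
# An optional OVSDB column such as a health monitor row's protocol is
# read back by the Python IDL as a 0- or 1-element list, not a string.
row_protocol = ["tcp"]   # what row.protocol looks like when the column is set

# Unwrap before comparing, guarding against the unset (empty) case:
protocol = row_protocol[0] if row_protocol else None
print(protocol)  # tcp

# Comparing the raw list against a string can never match:
print(row_protocol == "tcp")  # False
```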
[Yahoo-eng-team] [Bug 1965530] Re: ovn-octavia-provider health monitoring fails because doesn't set the correct source IP address for OVN health check
Reviewed:  https://review.opendev.org/c/openstack/ovn-octavia-provider/+/834345
Committed: https://opendev.org/openstack/ovn-octavia-provider/commit/90bd5dc71ecd4891fcd9292d56726f034a59fbdf
Submitter: "Zuul (22348)"
Branch:    master

commit 90bd5dc71ecd4891fcd9292d56726f034a59fbdf
Author: Miguel Lavalle
Date:   Fri Mar 18 13:03:22 2022 -0500

    Remove incorrect character in f-string

    Patch [1] switched to f-strings, inadvertently introducing an 'i'
    into an IP address. This change corrects that error.

    [1] https://review.opendev.org/c/openstack/ovn-octavia-provider/+/816829

    Closes-Bug: #1965530
    Change-Id: I832780660a3658552dc8fb50c3e232306e1fa110

** Changed in: neutron
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1965530

Title:
  ovn-octavia-provider health monitoring fails because doesn't set the
  correct source IP address for OVN health check

Status in neutron:
  Fix Released

Bug description:
  Hello all,

  It seems the health checking feature of the OVN Octavia provider is
  broken and all service monitors show offline status because of a
  typo in
  https://opendev.org/openstack/ovn-octavia-provider/src/commit/1b212ec54fa336cc7a22a502ad3d0ce39658138e/ovn_octavia_provider/helper.py#L2171

  Note the "i" prepended to the IP address in:

      member_src += f'i{hm_source_ip}'

  This causes the IP string to be cast including the "i", which yields
  a bogus IP address as the health check source. That in turn causes
  the response packet of the checked backend to be sent to the gateway
  MAC instead of the health check monitor source MAC defined in
  NB_Global options:svc_monitor_mac. The monitor never gets the check
  response and fails.

  The very simple fix is to remove this "i" to set the correct source
  IP address for the monitor.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1965530/+subscriptions
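The typo is easy to see in isolation (the sample address below is illustrative):

```python
hm_source_ip = "10.0.0.42"

# The f-string conversion accidentally glued an 'i' onto the address:
buggy = f'i{hm_source_ip}'
fixed = f'{hm_source_ip}'

print(buggy)  # i10.0.0.42  (not a usable health-check source address)
print(fixed)  # 10.0.0.42
```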
[Yahoo-eng-team] [Bug 1964339] Re: [OVN Octavia Provider] OVN provider tests using too old version of OVN
Reviewed:  https://review.opendev.org/c/openstack/ovn-octavia-provider/+/833798
Committed: https://opendev.org/openstack/ovn-octavia-provider/commit/5bee600eddd872d43ba6c51400c20fa5f50bd97f
Submitter: "Zuul (22348)"
Branch:    master

commit 5bee600eddd872d43ba6c51400c20fa5f50bd97f
Author: Lucas Alvares Gomes
Date:   Tue Mar 15 14:30:56 2022 +

    Fix zuul templates for functional and tempest tests

    * The "devstack_localrc" key was missing; without it local.conf
      won't be populated.
    * Set OVN_BUILD_FROM_SOURCE to True to make sure OVN is compiled
      from source and not installed from packages.
    * Updated the versions of OVN and OVS in the -released job to a
      newer version.

    Closes-Bug: #1964339
    Signed-off-by: Lucas Alvares Gomes
    Change-Id: I05ed8fb053c78bd16a52b5d82a3ab51faf856d78

** Changed in: neutron
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1964339

Title:
  [OVN Octavia Provider] OVN provider tests using too old version of
  OVN

Status in neutron:
  Fix Released

Bug description:
  While trying to test ovn-octavia-provider in devstack, I copied the
  functional testing vars, e.g.

    OVN_BRANCH: v20.06.0
    OVS_BRANCH: v2.13.0

  However, in devstack the octavia-driver-agent keeps crashing with a
  trace like this:

  ERROR octavia.cmd.driver_agent Traceback (most recent call last):
  ERROR octavia.cmd.driver_agent   File "/opt/stack/octavia/octavia/cmd/driver_agent.py", line 65, in _process_wrapper
  ERROR octavia.cmd.driver_agent     function(exit_event)
  ERROR octavia.cmd.driver_agent   File "/opt/stack/ovn-octavia-provider/ovn_octavia_provider/agent.py", line 42, in OvnProviderAgent
  ERROR octavia.cmd.driver_agent     ovn_sb_idl_for_events = impl_idl_ovn.OvnSbIdlForLb(
  ERROR octavia.cmd.driver_agent   File "/opt/stack/ovn-octavia-provider/ovn_octavia_provider/ovsdb/impl_idl_ovn.py", line 237, in __init__
  ERROR octavia.cmd.driver_agent     super().__init__(
  ERROR octavia.cmd.driver_agent   File "/opt/stack/ovn-octavia-provider/ovn_octavia_provider/ovsdb/ovsdb_monitor.py", line 42, in __init__
  ERROR octavia.cmd.driver_agent     super().__init__(remote, schema)
  ERROR octavia.cmd.driver_agent   File "/usr/local/lib/python3.8/dist-packages/ovs/db/idl.py", line 141, in __init__
  ERROR octavia.cmd.driver_agent     schema = schema_helper.get_idl_schema()
  ERROR octavia.cmd.driver_agent   File "/usr/local/lib/python3.8/dist-packages/ovs/db/idl.py", line 2058, in get_idl_schema
  ERROR octavia.cmd.driver_agent     self._keep_table_columns(schema, table, columns))
  ERROR octavia.cmd.driver_agent   File "/usr/local/lib/python3.8/dist-packages/ovs/db/idl.py", line 2066, in _keep_table_columns
  ERROR octavia.cmd.driver_agent     assert table_name in schema.tables
  ERROR octavia.cmd.driver_agent AssertionError
  ERROR octavia.cmd.driver_agent

  After a bit of debugging, I found out that it is trying to get the
  'Load_Balancer' table in the SB database, which was added by a
  recent commit [1] in v20.12.0.

  Not sure if this is a valid bug, but if the tests can be bumped to a
  working version of OVN, that might help prevent others from
  tripping over this case. Thanks!

  [1] https://github.com/ovn-org/ovn/commit/42e694f03c187137852c2d7349daa0541a4f5e62

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1964339/+subscriptions
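The bare AssertionError comes from deep inside ovs.db.idl when a requested table is absent from the server's schema. A friendlier pre-flight check might look like this (a sketch; the function name, message, and sample table set are illustrative):

```python
def check_required_tables(schema_tables, required):
    """Fail with a clear message instead of ovs.db.idl's bare
    AssertionError when the connected schema predates a table."""
    missing = sorted(set(required) - set(schema_tables))
    if missing:
        raise RuntimeError(
            f"OVN southbound schema is missing tables {missing}; "
            "the Load_Balancer SB table needs OVN >= 20.12")

# An OVN v20.06 southbound schema has no Load_Balancer table:
old_schema = {"Chassis", "Port_Binding", "Service_Monitor"}
try:
    check_required_tables(old_schema, ["Load_Balancer"])
except RuntimeError as exc:
    print(exc)  # explains the version mismatch instead of asserting
```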
[Yahoo-eng-team] [Bug 1965732] [NEW] loadbalancer stuck in PENDING_X if delete_vip_port fails
Public bug reported:

Load balancers get stuck in PENDING_* status if the delete_vip_port
function fails with an error other than PortNotFound when:

- deleting a load balancer
- a load balancer creation has failed

The problem comes from the proper status update not being sent back
to Octavia.

** Affects: neutron
   Importance: Undecided
   Assignee: Luis Tomas Bolivar (ltomasbo)
   Status: In Progress

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1965732

Title:
  loadbalancer stuck in PENDING_X if delete_vip_port fails

Status in neutron:
  In Progress

Bug description:
  Load balancers get stuck in PENDING_* status if the delete_vip_port
  function fails with an error other than PortNotFound when:

  - deleting a load balancer
  - a load balancer creation has failed

  The problem comes from the proper status update not being sent back
  to Octavia.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1965732/+subscriptions
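A common pattern for avoiding a stuck PENDING_* state is to compute the terminal status in a try/except and always push it back to Octavia, treating only PortNotFound as benign. A sketch with stand-in helpers (not the ovn-octavia-provider code or its eventual fix):

```python
class PortNotFound(Exception):
    pass

def delete_load_balancer(lb, delete_vip_port, update_status):
    """Delete the LB's VIP port and always report a terminal status,
    so Octavia never stays in PENDING_DELETE. (Illustrative sketch.)"""
    status = "DELETED"
    try:
        delete_vip_port(lb)
    except PortNotFound:
        pass              # port already gone: still a successful delete
    except Exception:
        status = "ERROR"  # report the failure instead of staying PENDING
    finally:
        update_status(lb, status)

# Demo: an unexpected failure still produces a status update.
reported = []
def boom(lb):
    raise RuntimeError("OVN NB unreachable")

delete_load_balancer("lb-1", boom, lambda lb, s: reported.append((lb, s)))
print(reported)  # [('lb-1', 'ERROR')]
```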