[Yahoo-eng-team] [Bug 1965183] Re: ovn migration executes scripts from /tmp directory

2022-03-21 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/neutron/+/834071
Committed: 
https://opendev.org/openstack/neutron/commit/0529ccdf71dcd093a80180097eeaa5d7cb5e15fb
Submitter: "Zuul (22348)"
Branch:master

commit 0529ccdf71dcd093a80180097eeaa5d7cb5e15fb
Author: Jakub Libosvar 
Date:   Wed Mar 16 16:40:21 2022 -0400

ovn migration: Don't use executables in /tmp/

It's a common practice to have /tmp/ mounted separately with the noexec
option. This effectively means no scripts can be executed from the
filesystem mounted at /tmp.

This patch explicitly calls the sh binary to execute scripts from /tmp and
removes the executable flag from the scripts.

Closes-Bug: #1965183

Change-Id: I2f9cd67979a8a75848fcdd7a8c3bb56dd3590473
Signed-off-by: Jakub Libosvar 
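
As a side note, here is a minimal, hypothetical sketch of the behaviour being
worked around (this is not the actual Ansible task from the migration role;
the script path is taken from the report below):

    import subprocess

    SCRIPT = "/tmp/clone-br-int.sh"  # path taken from the bug report below

    # How the role originally invoked it (through a shell): with /tmp
    # mounted noexec the shell cannot exec the file and returns rc 126,
    # "Permission denied".
    direct = subprocess.run(SCRIPT, shell=True)

    # The approach described in the commit message: pass the script to sh
    # as an argument, so only /bin/sh needs execute permission and the
    # script itself is just data being read.
    via_sh = subprocess.run(["sh", SCRIPT])

    print(direct.returncode, via_sh.returncode)  # e.g. 126 vs 0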


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1965183

Title:
  ovn migration executes scripts from /tmp directory

Status in neutron:
  Fix Released

Bug description:
  Description of problem:
  The /tmp directory is often mounted with the noexec option for security
reasons. The migration roles rely on scripts in /tmp/ being executable.

  Version-Release number of selected component (if applicable):
  16.1

  How reproducible:
  Always

  Steps to Reproduce:
  1. Have /tmp mounted with noexec option
  2. Run migration from ovs to ovn
  3.

  Actual results:
  fatal: [tpa-vim-b-computecl-0]: FAILED! => {
  "changed": true,
  "cmd": "/tmp/clone-br-int.sh",
  "delta": "0:00:00.001773",
  "end": "2022-03-16 18:51:30.332449",
  "invocation": {
  "module_args": {
  "_raw_params": "/tmp/clone-br-int.sh",
  "_uses_shell": true,
  "argv": null,
  "chdir": null,
  "creates": null,
  "executable": null,
  "removes": null,
  "stdin": null,
  "stdin_add_newline": true,
  "strip_empty_ends": true,
  "warn": true
  }
  },
  "msg": "non-zero return code",
  "rc": 126,
  "start": "2022-03-16 18:51:30.330676",
  "stderr": "/bin/sh: /tmp/clone-br-int.sh: Permission denied",
  "stderr_lines": [
  "/bin/sh: /tmp/clone-br-int.sh: Permission denied"
  ],
  "stdout": "",
  "stdout_lines": []
  }

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1965183/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1959567] Re: QoS Ingress bandwidth limit with OVS backend may not work as expected

2022-03-21 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/neutron/+/832662
Committed: 
https://opendev.org/openstack/neutron/commit/f7ab90baad83823cfb94bc8b9450cd915ea49c03
Submitter: "Zuul (22348)"
Branch:master

commit f7ab90baad83823cfb94bc8b9450cd915ea49c03
Author: Slawek Kaplonski 
Date:   Tue Mar 8 16:44:59 2022 +0100

Fix ingress bandwidth limit in the openvswitch agent

For ingress bandwidth limiting the openvswitch agent uses QoS and queues
from Open vSwitch. Queue 0 is always used for that purpose.
Initially, when this feature was implemented, we assumed that queue 0 is
a kind of "default" queue to which all traffic will be sent if there
are no other queues. That is not true, and thus ingress bandwidth
limiting wasn't working properly with this agent.

This patch fixes that issue by adding to table=0 of br-int an
additional OF rule that sends all traffic to queue 0.
For some ports there can be QoS configured on that queue,
and then it will be applied for the port. If a port doesn't have any QoS
configured, nothing happens and everything works as before this
patch.

The biggest problem with that solution was the case when ports with
minimum bandwidth are also on the same node, because such ports use
different queues (the queue number is the same as the ofport number of
the tap interface).
When traffic goes from a port with a minimum bandwidth QoS
to a port which has an ingress bw limit configured, the traffic goes only
through br-int and will use queue 0 to apply the ingress bw limit properly.
When traffic from a port with minimum bandwidth set needs to go
out from the host, it always uses the physical bridge (minimum bandwidth
is only supported for provider networks) and the proper queue will be
set for such traffic in the physical bridge.
To be able to set the proper queue in the physical bridge, this patch adds
an additional OF rule to br-int that stores the queue_num value in the
pkt_mark field [1], as this seems to be the only field which can "survive"
crossing bridges.

[1] https://man7.org/linux/man-pages/man7/ovs-fields.7.html

Closes-Bug: #1959567
Change-Id: I1e31565475f38c6ad817268699b165759ac05410


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1959567

Title:
  QoS Ingress bandwidth limit with OVS backend may not work as expected

Status in neutron:
  Fix Released

Bug description:
  According to the OVS faq
  https://docs.openvswitch.org/en/latest/faq/qos/

  Q: I configured Quality of Service (QoS) in my OpenFlow network by adding
 records to the QoS and Queue table, but the results aren’t what I expect.

  A: Did you install OpenFlow flows that use your queues? This is the primary
 way to tell Open vSwitch which queues you want to use. If you don’t do this,
 then the default queue will be used, which will probably not have the effect
 you want.

  
  According to info from OVS developer Ilya Maximets: "OVS doesn't define
what the "default queue" is. [...] So, using the set_queue action is a correct
way to configure QoS, even if the queue 0 is currently a "default queue". It's
not guaranteed that it always will be."

  Because of that, the Neutron OVS agent should always configure the correct
  OF rules to send traffic to the required QoS queue.
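
  As a rough illustration of the FAQ's point above (install flows that use
  your queues), driven from Python via ovs-ofctl; the bridge name, priority
  and match below are made up and are not the exact rule added by the fix:

      import subprocess

      # Illustrative only: on a hypothetical bridge "br0", steer all
      # traffic to queue 0 so that any QoS/Queue records attached to
      # queue 0 on the egress port actually take effect, instead of
      # relying on an undefined "default queue".
      subprocess.run(
          ["ovs-ofctl", "add-flow", "br0",
           "table=0,priority=1,actions=set_queue:0,normal"],
          check=True,
      )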

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1959567/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1960230] Re: resize fails with FileExistsError if earlier resize attempt failed to cleanup

2022-03-21 Thread melanie witt
** Also affects: nova/xena
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1960230

Title:
  resize fails with FileExistsError if earlier resize attempt failed to
  cleanup

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) xena series:
  New

Bug description:
  This bug is related to resize with the libvirt driver

  If you are performing a resize and it fails, the
  _cleanup_remote_migration() [1] function in the libvirt driver will
  try to clean up the /var/lib/nova/instances/_resize directory on
  the remote side [2]. If this fails, the _resize directory will
  be left behind and block any future resize attempts.

  2021-12-14 14:40:12.535 175177 INFO nova.virt.libvirt.driver
  [req-9d3477d4-3bb2-456f-9be6-dce9893b0e95
  23d6aa8884ab44ef9f214ad195d273c0 050c556faa5944a8953126c867313770 -
  default default] [instance: 99287438-c37b-44b0-834e-55685b6e83eb]
  Deletion of
  /var/lib/nova/instances/99287438-c37b-44b0-834e-55685b6e83eb_resize
  failed

  Then, on the next resize attempt a long time later:

  2022-02-04 13:07:31.255 175177 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 10429, in 
migrate_disk_and_power_off
  2022-02-04 13:07:31.255 175177 ERROR oslo_messaging.rpc.server 
os.rename(inst_base, inst_base_resize)
  2022-02-04 13:07:31.255 175177 ERROR oslo_messaging.rpc.server 
FileExistsError: [Errno 17] File exists: 
'/var/lib/nova/instances/99287438-c37b-44b0-834e-55685b6e83eb' -> 
'/var/lib/nova/instances/99287438-c37b-44b0-834e-55685b6e83eb_resize'

  This happens here [3] because os.rename tries to rename the
  /var/lib/nova/instances/ dir to the _resize dir that already
  exists, and fails with FileExistsError.

  We should check whether the _resize directory already exists before
  trying the rename, and delete it first if it does.

  [1] 
https://opendev.org/openstack/nova/src/branch/stable/xena/nova/virt/libvirt/driver.py#L10773
  [2] 
https://opendev.org/openstack/nova/src/branch/stable/xena/nova/virt/libvirt/driver.py#L10965
  [3] 
https://opendev.org/openstack/nova/src/branch/stable/xena/nova/virt/libvirt/driver.py#L10915
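
  A minimal sketch of the guard suggested above (function and variable names
  are illustrative, not the actual nova driver code):

      import os
      import shutil

      def rename_instance_dir(inst_base: str) -> None:
          """Move the instance directory aside for a resize, tolerating a
          leftover *_resize directory from an earlier failed attempt."""
          inst_base_resize = inst_base + "_resize"
          # Assumption: an already-existing *_resize directory is stale
          # debris from a previous failed resize and is safe to remove.
          if os.path.exists(inst_base_resize):
              shutil.rmtree(inst_base_resize)
          os.rename(inst_base, inst_base_resize)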

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1960230/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1960346] Re: Volume detach failure in devstack-platform-centos-9-stream job

2022-03-21 Thread Ghanshyam Mann
Closing it for nova as it is fixed on the tempest side.

** Changed in: nova
   Status: Triaged => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1960346

Title:
  Volume detach failure in devstack-platform-centos-9-stream job

Status in OpenStack Compute (nova):
  Invalid
Status in tempest:
  Fix Released

Bug description:
  The devstack-platform-centos-9-stream job is failing 100% of the time in
  the compute server rescue test with a volume detach error:

  traceback-1: {{{
  Traceback (most recent call last):
File "/opt/stack/tempest/tempest/common/waiters.py", line 316, in 
wait_for_volume_resource_status
  raise lib_exc.TimeoutException(message)
  tempest.lib.exceptions.TimeoutException: Request timed out
  Details: volume 70cedb4b-e74d-4a86-a73d-ba8bce29bc99 failed to reach 
available status (current in-use) within the required time (196 s).
  }}}

  Traceback (most recent call last):
File "/opt/stack/tempest/tempest/common/waiters.py", line 384, in 
wait_for_volume_attachment_remove_from_server
  raise lib_exc.TimeoutException(message)
  tempest.lib.exceptions.TimeoutException: Request timed out
  Details: Volume 70cedb4b-e74d-4a86-a73d-ba8bce29bc99 failed to detach from 
server cf57d12b-5e37-431e-8c71-4a7149e963ae within the required time (196 s) 
from the compute API perspective

  
https://a886e0e70a23f464643f-7cd608bf14cafb686390b86bc06cde2a.ssl.cf1.rackcdn.com/827576/6/check/devstack-
  platform-centos-9-stream/53de74e/testr_results.html

  
  
https://zuul.openstack.org/builds?job_name=devstack-platform-centos-9-stream&skip=0

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1960346/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1960346] Re: Volume detach failure in devstack-platform-centos-9-stream job

2022-03-21 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/tempest/+/834350
Committed: 
https://opendev.org/openstack/tempest/commit/7304e3ac8973a42bcfff91d561ac9d238b187334
Submitter: "Zuul (22348)"
Branch:master

commit 7304e3ac8973a42bcfff91d561ac9d238b187334
Author: Ghanshyam Mann 
Date:   Fri Mar 18 13:58:25 2022 -0500

Move ServerStableDeviceRescueTest to wait for SSH-able server

ServerStableDeviceRescueTest also performs attach_volume on the
rescued server and detach_volume in cleanup. As described in
bug #1960346, we need to wait for server readiness before
detach_volume is called.

Also make the centos stream 9 job voting.

Closes-Bug: #1960346
Change-Id: Ia213297b13f42d39213dea9a3b2cfee561cdcf28
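
"Wait for an SSH-able server" essentially means polling the guest before the
volume detach is attempted; a generic sketch of that idea (not the tempest
implementation, and with arbitrary timeout values) could look like:

    import socket
    import time

    def wait_for_ssh(host, port=22, timeout=196.0, interval=2.0):
        """Block until the guest accepts TCP connections on the SSH port,
        or raise TimeoutError."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            try:
                with socket.create_connection((host, port), timeout=5):
                    return
            except OSError:
                time.sleep(interval)
        raise TimeoutError("%s:%s not reachable within %ss" % (host, port, timeout))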


** Changed in: tempest
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1960346

Title:
  Volume detach failure in devstack-platform-centos-9-stream job

Status in OpenStack Compute (nova):
  Triaged
Status in tempest:
  Fix Released

Bug description:
  The devstack-platform-centos-9-stream job is failing 100% of the time in
  the compute server rescue test with a volume detach error:

  traceback-1: {{{
  Traceback (most recent call last):
File "/opt/stack/tempest/tempest/common/waiters.py", line 316, in 
wait_for_volume_resource_status
  raise lib_exc.TimeoutException(message)
  tempest.lib.exceptions.TimeoutException: Request timed out
  Details: volume 70cedb4b-e74d-4a86-a73d-ba8bce29bc99 failed to reach 
available status (current in-use) within the required time (196 s).
  }}}

  Traceback (most recent call last):
File "/opt/stack/tempest/tempest/common/waiters.py", line 384, in 
wait_for_volume_attachment_remove_from_server
  raise lib_exc.TimeoutException(message)
  tempest.lib.exceptions.TimeoutException: Request timed out
  Details: Volume 70cedb4b-e74d-4a86-a73d-ba8bce29bc99 failed to detach from 
server cf57d12b-5e37-431e-8c71-4a7149e963ae within the required time (196 s) 
from the compute API perspective

  
https://a886e0e70a23f464643f-7cd608bf14cafb686390b86bc06cde2a.ssl.cf1.rackcdn.com/827576/6/check/devstack-
  platform-centos-9-stream/53de74e/testr_results.html

  
  
https://zuul.openstack.org/builds?job_name=devstack-platform-centos-9-stream&skip=0

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1960346/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1965819] [NEW] list object has no attribute 'acked'

2022-03-21 Thread Terry Wilson
Public bug reported:

Using python-ovs master, there are errors such as:

ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn [None 
req-81ffbcd9-59d1-498a-aea2-d57e0d515ff2 None None] OVS database connection to 
OVN_Southbound failed with error: ''list' object has no attribute 'acked''. 
Verify that the OVS and OVN services are available and that the 
'ovn_nb_connection' and 'ovn_sb_connection' configuration options are correct.: 
AttributeError: 'list' object has no attribute 'acked'
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn Traceback 
(most recent call last):
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/impl_idl_ovn.py",
 line 127, in start_connection
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn 
self.ovsdb_connection.start()
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn   File 
"/usr/local/lib/python3.10/site-packages/ovsdbapp/backend/ovs_idl/connection.py",
 line 83, in start
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn 
idlutils.wait_for_change(self.idl, self.timeout)
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/impl_idl_ovn.py",
 line 53, in wait_for_change
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn while 
idl_.change_seqno == seqno and not idl_.run():
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn   File 
"/opt/stack/ovs/python/ovs/db/idl.py", line 506, in run
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn 
self.__send_monitor_request()
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn   File 
"/opt/stack/ovs/python/ovs/db/idl.py", line 814, in __send_monitor_request
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn not 
ConditionState.is_true(table.condition.acked)):
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn 
AttributeError: 'list' object has no attribute 'acked'
ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn

ovsdbapp, neutron, and networking-ovn all directly set 'table.condition',
which is kinda-sorta not really a public API. The type of this attribute
changed with
https://github.com/openvswitch/ovs/commit/46d44cf3be0dbf4a44cebea3b279b3d16a326796
and there has been some breakage.
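
For reference, the pattern in question looks roughly like the sketch below.
The table name and condition are made up, and the cond_change() alternative
is an assumption about the supported python-ovs API, not something taken
from these projects:

    from ovs.db import idl  # python-ovs

    def narrow_monitoring(conn: idl.Idl) -> None:
        cond = [["name", "==", "some-chassis"]]  # hypothetical condition

        # What the projects above historically did: assign the internal
        # attribute directly. With newer python-ovs this attribute is a
        # ConditionState object rather than a plain list, so replacing it
        # with a list is what later fails with the 'acked' AttributeError.
        conn.tables["Chassis"].condition = cond

        # Assumed safer route: ask the IDL to change the condition instead
        # of poking its internals.
        # conn.cond_change("Chassis", cond)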

** Affects: networking-ovn
 Importance: Undecided
 Status: New

** Affects: neutron
 Importance: Undecided
 Status: New

** Affects: ovsdbapp
 Importance: Undecided
 Status: New

** Also affects: neutron
   Importance: Undecided
   Status: New

** Also affects: networking-ovn
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1965819

Title:
  list object has no attribute 'acked'

Status in networking-ovn:
  New
Status in neutron:
  New
Status in ovsdbapp:
  New

Bug description:
  Using python-ovs master, there are errors such as:

  ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn [None 
req-81ffbcd9-59d1-498a-aea2-d57e0d515ff2 None None] OVS database connection to 
OVN_Southbound failed with error: ''list' object has no attribute 'acked''. 
Verify that the OVS and OVN services are available and that the 
'ovn_nb_connection' and 'ovn_sb_connection' configuration options are correct.: 
AttributeError: 'list' object has no attribute 'acked'
  ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn 
Traceback (most recent call last):
  ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/impl_idl_ovn.py",
 line 127, in start_connection
  ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn 
self.ovsdb_connection.start()
  ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn   File 
"/usr/local/lib/python3.10/site-packages/ovsdbapp/backend/ovs_idl/connection.py",
 line 83, in start
  ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn 
idlutils.wait_for_change(self.idl, self.timeout)
  ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/impl_idl_ovn.py",
 line 53, in wait_for_change
  ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn 
while idl_.change_seqno == seqno and not idl_.run():
  ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn   File 
"/opt/stack/ovs/python/ovs/db/idl.py", line 506, in run
  ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.impl_idl_ovn 
self.__send_monitor_request()
  ERROR neutron.plugins.ml2.drive

[Yahoo-eng-team] [Bug 1965807] [NEW] Bulk port creation breaks IPAM module when driver port creation fails

2022-03-21 Thread Rodolfo Alonso
Public bug reported:

This bug is related to https://bugs.launchpad.net/neutron/+bug/1954763.

There is a problem with the IPAM module [1] and the ML2 OVN driver. When a
port is created in "create_port_bulk", we first create the IPAM
allocations [1] to save time and DB accesses.

However, if one port fails in the driver call, the port is deleted. As
part of this port deletion, the IPAM reservation is deleted too. At the
end of "create_port_bulk", the previously created IPAM reservations are
deleted again. This is what triggers the error in the Neutron server
[2]. The DB now has a port record and an IP allocation record but no
IPAM allocation record, which prevents the port from being deleted
manually (error 500).

Red Hat bugzilla reference:
https://bugzilla.redhat.com/show_bug.cgi?id=2065634

[1]https://review.opendev.org/q/I8877c658446fed155130add6f1c69f2772113c27
[2]https://paste.opendev.org/show/b6e3INfzhPTkOFW5fhPb/
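
A hypothetical sketch of the double rollback described above, and one way to
guard against it (names and structure are invented, not the neutron code):

    def create_port_bulk(ports, ipam, plugin, driver):
        # Bulk IPAM allocation up front, the optimization mentioned above.
        reservations = {port["id"]: ipam.allocate(port) for port in ports}
        try:
            for port in ports:
                try:
                    driver.create(port)
                except Exception:
                    # Deleting the failed port also releases its IPAM
                    # reservation, so forget it here; releasing it a second
                    # time in the cleanup below is the inconsistency this
                    # bug describes.
                    plugin.delete_port(port)
                    reservations.pop(port["id"], None)
                    raise
        except Exception:
            # Roll back only the reservations this function still owns.
            for reservation in reservations.values():
                ipam.release(reservation)
            raise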

** Affects: neutron
 Importance: Medium
 Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
 Status: New

** Changed in: neutron
 Assignee: (unassigned) => Rodolfo Alonso (rodolfo-alonso-hernandez)

** Changed in: neutron
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1965807

Title:
  Bulk port creation breaks IPAM module when driver port creation fails

Status in neutron:
  New

Bug description:
  This bug is related to
  https://bugs.launchpad.net/neutron/+bug/1954763.

  There is a problem with the IPAM module [1] and the ML2 OVN driver. When
  a port is created in "create_port_bulk", we first create the IPAM
  allocations [1] to save time and DB accesses.

  However, if one port fails in the driver call, the port is deleted. As
  part of this port deletion, the IPAM reservation is deleted too. At the
  end of "create_port_bulk", the previously created IPAM reservations are
  deleted again. This is what triggers the error in the Neutron server
  [2]. The DB now has a port record and an IP allocation record but no
  IPAM allocation record, which prevents the port from being deleted
  manually (error 500).

  Red Hat bugzilla reference:
  https://bugzilla.redhat.com/show_bug.cgi?id=2065634

  [1]https://review.opendev.org/q/I8877c658446fed155130add6f1c69f2772113c27
  [2]https://paste.opendev.org/show/b6e3INfzhPTkOFW5fhPb/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1965807/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1965772] [NEW] ovn-octavia-provider does not report status correctly to octavia

2022-03-21 Thread Gabriel Barazer
Public bug reported:

Hi all,

The OVN Octavia provider does not report status correctly to Octavia due
to a few bugs in the health monitoring implementation:

1) 
https://opendev.org/openstack/ovn-octavia-provider/src/commit/d6adbcef86e32bc7befbd5890a2bc79256b7a8e2/ovn_octavia_provider/helper.py#L2374
 :
In _get_lb_on_hm_event, the request to the OVN NB API (db_find_rows) is 
incorrect:
lbs = self.ovn_nbdb_api.db_find_rows(
'Load_Balancer', (('ip_port_mappings', '=', mappings),
  ('protocol', '=', row.protocol))).execute()

Should be :
lbs = self.ovn_nbdb_api.db_find_rows(
'Load_Balancer', ('ip_port_mappings', '=', mappings),
  ('protocol', '=', row.protocol[0])).execute()

Note the removed extra parentheses and that the protocol string is
taken from the first element of the protocol[] list.

2) https://opendev.org/openstack/ovn-octavia-
provider/src/commit/d6adbcef86e32bc7befbd5890a2bc79256b7a8e2/ovn_octavia_provider/helper.py#L2426
:

There is confusion with the Pool object returned by (pool =
self._octavia_driver_lib.get_pool(pool_id)): this object does not
contain any operating_status attribute, and it seems, given the current
state of octavia-lib, that it is possible to set and update the status
for a listener/pool/member but not possible to retrieve the current
status.

See https://opendev.org/openstack/octavia-
lib/src/branch/master/octavia_lib/api/drivers/data_models.py for the
current Pool data model.

As a result, the computation done by _get_new_operating_statuses cannot
use the current operating status to set a new operating status. It is
still possible to set an operating status for the members by setting
them to "OFFLINE" separately when a HM update event is fired.

3) The Load_Balancer_Health_Check NB entry creates the Service_Monitor
SB entries, but there isn't any way to link the Service_Monitor entries
created with the original NB entry. The result is that health monitor
events received from the SB and processed by the octavia driver agent
cannot be accurately matched with the correct octavia health monitor
entry. If we have for example two load balancer entries using the same
pool members and the same ports, only the first LB returned with
db_find_rows would be updated (given the #2 bug is fixed). The case for
having 2 load balancers with the same members is perfectly valid when
using separate load balancers for public traffic (using a VIP from a
public pool) and another one for internal/admin traffic (using a VIP
from another pool, and with a source range whitelist). The code
selecting only the first LB in that case is the same as bug #1.

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1965772

Title:
  ovn-octavia-provider does not report status correctly to octavia

Status in neutron:
  New

Bug description:
  Hi all,

  The OVN Octavia provider does not report status correctly to Octavia
  due to a few bugs in the health monitoring implementation:

  1) 
https://opendev.org/openstack/ovn-octavia-provider/src/commit/d6adbcef86e32bc7befbd5890a2bc79256b7a8e2/ovn_octavia_provider/helper.py#L2374
 :
  In _get_lb_on_hm_event, the request to the OVN NB API (db_find_rows) is 
incorrect:
  lbs = self.ovn_nbdb_api.db_find_rows(
  'Load_Balancer', (('ip_port_mappings', '=', mappings),
('protocol', '=', row.protocol))).execute()

  Should be :
  lbs = self.ovn_nbdb_api.db_find_rows(
  'Load_Balancer', ('ip_port_mappings', '=', mappings),
('protocol', '=', row.protocol[0])).execute()

  Note the removed extra parentheses and that the protocol string is
  taken from the first element of the protocol[] list.

  2) https://opendev.org/openstack/ovn-octavia-
  
provider/src/commit/d6adbcef86e32bc7befbd5890a2bc79256b7a8e2/ovn_octavia_provider/helper.py#L2426
  :

  There is confusion with the Pool object returned by (pool =
  self._octavia_driver_lib.get_pool(pool_id)): this object does not
  contain any operating_status attribute, and it seems, given the current
  state of octavia-lib, that it is possible to set and update the status
  for a listener/pool/member but not possible to retrieve the current
  status.

  See https://opendev.org/openstack/octavia-
  lib/src/branch/master/octavia_lib/api/drivers/data_models.py for the
  current Pool data model.

  As a result, the computation done by _get_new_operating_statuses
  cannot use the current operating status to set a new operating status.
  It is still possible to set an operating status for the members by
  setting them to "OFFLINE" separately when a HM update event is fired.

  3) The Load_Balancer_Health_Check NB entry creates the Service_Monitor
  SB entries, but there is

[Yahoo-eng-team] [Bug 1965530] Re: ovn-octavia-provider health monitoring fails because doesn't set the correct source IP address for OVN health check

2022-03-21 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/ovn-octavia-provider/+/834345
Committed: 
https://opendev.org/openstack/ovn-octavia-provider/commit/90bd5dc71ecd4891fcd9292d56726f034a59fbdf
Submitter: "Zuul (22348)"
Branch:master

commit 90bd5dc71ecd4891fcd9292d56726f034a59fbdf
Author: Miguel Lavalle 
Date:   Fri Mar 18 13:03:22 2022 -0500

Remove incorrect character in f-string

Patch [1] switched to f-strings, inadvertently introducing an 'i' into an
IP address. This change corrects that error.

[1] https://review.opendev.org/c/openstack/ovn-octavia-provider/+/816829

Closes-Bug: #1965530
Change-Id: I832780660a3658552dc8fb50c3e232306e1fa110


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1965530

Title:
  ovn-octavia-provider health monitoring fails because doesn't set the
  correct source IP address for OVN health check

Status in neutron:
  Fix Released

Bug description:
  Hello all,

  It seems the health checking feature of the ovn octavia provider is broken
  and all service monitors show offline status because of a typo in
  https://opendev.org/openstack/ovn-octavia-
  
provider/src/commit/1b212ec54fa336cc7a22a502ad3d0ce39658138e/ovn_octavia_provider/helper.py#L2171

  Note the "i" prepended in the IP address in member_src +=
  f'i{hm_source_ip}'

  This causes the IP string to be cast including the "i", which gives a
  random IP address for the health check source; that in turn causes the
  response packet of the checked backend to be sent to the gateway MAC
  instead of the health check monitor source MAC defined in NB_Global
  options:svc_monitor_mac. The monitor does not get the check response
  and fails.

  The very simple fix is to remove this "i" letter to set the correct
  source IP address for the monitor.
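
  Reconstructed as a tiny standalone example (the value is made up and the
  real code builds a longer match string), the effect of that one character
  is:

      hm_source_ip = "10.0.0.5"        # example value only
      member_src = f'i{hm_source_ip}'  # the buggy f-string: 'i10.0.0.5'
      print(member_src)                # not a usable source address
      member_src = f'{hm_source_ip}'   # after removing the stray 'i'
      print(member_src)                # '10.0.0.5'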

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1965530/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1964339] Re: [OVN Octavia Provider] OVN provider tests using too old version of OVN

2022-03-21 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/ovn-octavia-provider/+/833798
Committed: 
https://opendev.org/openstack/ovn-octavia-provider/commit/5bee600eddd872d43ba6c51400c20fa5f50bd97f
Submitter: "Zuul (22348)"
Branch:master

commit 5bee600eddd872d43ba6c51400c20fa5f50bd97f
Author: Lucas Alvares Gomes 
Date:   Tue Mar 15 14:30:56 2022 +

Fix zuul templates for functional and tempest tests

* The "devstack_localrc" key was missing, without it local.conf won't be
  populated.

* Added OVN_BUILD_FROM_SOURCE to True to make sure OVN is compiled
from code and not installed from packages.

* Updated the version of OVN and OVS in the -released job to a newer
  version.

Closes-Bug: #1964339
Signed-off-by: Lucas Alvares Gomes 
Change-Id: I05ed8fb053c78bd16a52b5d82a3ab51faf856d78


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1964339

Title:
  [OVN Octavia Provider] OVN provider tests using too old version of OVN

Status in neutron:
  Fix Released

Bug description:
  While trying to test out ovn-octavia-provider in devstack, I copied
  the functional testing vars, e.g.

   OVN_BRANCH: v20.06.0
   OVS_BRANCH: v2.13.0

  However, in devstack the octavia-driver-agent keeps crashing with a
  trace like this

  ERROR octavia.cmd.driver_agent Traceback (most recent call last):
  ERROR octavia.cmd.driver_agent   File 
"/opt/stack/octavia/octavia/cmd/driver_agent.py", line 65, in _process_wrapper
  ERROR octavia.cmd.driver_agent function(exit_event)
  ERROR octavia.cmd.driver_agent   File 
"/opt/stack/ovn-octavia-provider/ovn_octavia_provider/agent.py", line 42, in 
OvnProviderAgent
  ERROR octavia.cmd.driver_agent ovn_sb_idl_for_events = 
impl_idl_ovn.OvnSbIdlForLb(
  ERROR octavia.cmd.driver_agent   File 
"/opt/stack/ovn-octavia-provider/ovn_octavia_provider/ovsdb/impl_idl_ovn.py", 
line 237, in __init__
  ERROR octavia.cmd.driver_agent super().__init__(
  ERROR octavia.cmd.driver_agent   File 
"/opt/stack/ovn-octavia-provider/ovn_octavia_provider/ovsdb/ovsdb_monitor.py", 
line 42, in __init__
  ERROR octavia.cmd.driver_agent super().__init__(remote, schema)
  ERROR octavia.cmd.driver_agent   File 
"/usr/local/lib/python3.8/dist-packages/ovs/db/idl.py", line 141, in __init__
  ERROR octavia.cmd.driver_agent schema = schema_helper.get_idl_schema()
  ERROR octavia.cmd.driver_agent   File 
"/usr/local/lib/python3.8/dist-packages/ovs/db/idl.py", line 2058, in 
get_idl_schema
  ERROR octavia.cmd.driver_agent self._keep_table_columns(schema, table, 
columns))
  ERROR octavia.cmd.driver_agent   File 
"/usr/local/lib/python3.8/dist-packages/ovs/db/idl.py", line 2066, in 
_keep_table_columns
  ERROR octavia.cmd.driver_agent assert table_name in schema.tables
  ERROR octavia.cmd.driver_agent AssertionError
  ERROR octavia.cmd.driver_agent

  After a bit of debugging, I found out that it is trying to get the
  'Load_Balancer' table in the SB database, which was only added by a
  recent commit [1] in v20.12.0.

  Not sure if this is a valid bug, but if the test jobs can be bumped
  to a working version of OVN, that might help prevent others from
  tripping over this case. Thanks!

  [1] https://github.com/ovn-
  org/ovn/commit/42e694f03c187137852c2d7349daa0541a4f5e62

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1964339/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1965732] [NEW] loadbalancer stuck in PENDING_X if delete_vip_port fails

2022-03-21 Thread Luis Tomas Bolivar
Public bug reported:

Load balancers are stuck in PENDING_X status if the delete_vip_port function
fails with an error other than PortNotFound when:
- deleting a loadbalancer
- creation of a loadbalancer failed

The problem comes from the proper status update not being sent back to
Octavia.
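
A hedged sketch of the kind of handling the description implies (helper and
exception names are invented; the status dict shape follows the usual Octavia
provider-driver convention, as far as I know):

    class PortNotFound(Exception):
        """Stand-in for the real 'port not found' exception."""

    def delete_loadbalancer(lb, network_client, status_client):
        # Assume DELETED unless the VIP port cleanup hits a real error.
        status = {"loadbalancers": [{"id": lb["id"],
                                     "provisioning_status": "DELETED"}]}
        try:
            network_client.delete_port(lb["vip_port_id"])
        except PortNotFound:
            pass  # already gone, still a successful delete
        except Exception:
            status["loadbalancers"][0]["provisioning_status"] = "ERROR"
        # The point of the bug: always send the status update back,
        # otherwise Octavia leaves the load balancer stuck in PENDING_*.
        status_client.update_loadbalancer_status(status)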

** Affects: neutron
 Importance: Undecided
 Assignee: Luis Tomas Bolivar (ltomasbo)
 Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1965732

Title:
  loadbalancer stuck in PENDING_X if delete_vip_port fails

Status in neutron:
  In Progress

Bug description:
  Load balancers are stuck in PENDING_X status if the delete_vip_port function
  fails with an error other than PortNotFound when:
  - deleting a loadbalancer
  - creation of a loadbalancer failed

  The problem comes from the proper status update not being sent back to
  Octavia.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1965732/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp