[Yahoo-eng-team] [Bug 1921150] Re: [QoS min bw] repeated ERROR log: Unable to save resource provider ... because: re-parenting a provider is not currently allowed

2021-05-01 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/neutron/+/782553
Committed: https://opendev.org/openstack/neutron/commit/7f35e4e857f7c6e83c635125ce9b42df6e10a510
Submitter: "Zuul (22348)"
Branch: master

commit 7f35e4e857f7c6e83c635125ce9b42df6e10a510
Author: Bence Romsics 
Date:   Tue Mar 23 14:07:36 2021 +0100

Physical NIC RP should be child of agent RP

In the fix for #1853840 I made a mistake and since then we have created
the physical NIC resource providers as children of the hypervisor
resource provider instead of the agent resource provider. Here:


https://review.opendev.org/c/openstack/neutron/+/696600/3/neutron/agent/common/placement_report.py#159

This *did not* break the minimum bandwidth aware scheduling.
But still there are multiple problems:

1) If you created your physical NIC RPs before the fix for #1853840
   but later upgraded to a version containing that fix, then resource
   syncs will throw an error in neutron-server at each physical NIC RP
   update. That pollutes the logs and wastes some resources, since the
   prohibited update will be retried forever.

2) If you created your physical NIC RPs after the fix for #1853840,
   then your physical NIC RPs have the wrong parent. Again, this does
   not break minimum bandwidth aware scheduling, but it may pose
   problems for later features wanting to build on the originally
   planned RP tree structure.

3) Cleanup of decommissioned RPs is a bit different than expected.
   This cleanup was always left to the admin, so it only affects a
   manual process.

The proper RP structure was and should be the following:

The hypervisor RP(s) must be the root(s).
As a child of each hypervisor RP, there should be an agent RP.
The physical NIC RPs should be the children of the agent RPs.
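
As an illustrative sketch only (not neutron's code), the intended tree maps
onto plain Placement API calls roughly as follows; the endpoint, token,
provider names and UUIDs are placeholders, and parent_provider_uuid needs
Placement microversion 1.14 or later:

    import uuid
    import requests

    PLACEMENT = 'http://placement.example.com/placement'   # placeholder endpoint
    HEADERS = {'X-Auth-Token': 'ADMIN_TOKEN',
               'OpenStack-API-Version': 'placement 1.14'}

    def create_rp(name, parent_uuid=None):
        # POST /resource_providers; parent_provider_uuid attaches the new RP
        # to its parent at creation time.
        rp_uuid = str(uuid.uuid4())
        body = {'name': name, 'uuid': rp_uuid}
        if parent_uuid:
            body['parent_provider_uuid'] = parent_uuid
        resp = requests.post(PLACEMENT + '/resource_providers',
                             json=body, headers=HEADERS)
        resp.raise_for_status()
        return rp_uuid

    hypervisor = create_rp('compute0')                         # root RP
    agent = create_rp('compute0:NIC Switch agent',
                      parent_uuid=hypervisor)                  # child of hypervisor
    nic = create_rp('compute0:NIC Switch agent:ens785f0',
                    parent_uuid=agent)                         # child of agent

    # Moving an already existing RP under a different parent via
    # PUT /resource_providers/{uuid} is rejected with HTTP 409
    # ("re-parenting a provider is not currently allowed").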

Unfortunately at the moment the Placement API generically prohibits
update of the parent resource provider id in a PUT request:


https://docs.openstack.org/api-ref/placement/?expanded=update-resource-provider-detail#update-resource-provider

Therefore without a later Placement change we cannot fix the RPs
already created with the wrong parent. However we can fix the RPs
to be created later. We do that here. We also fix a bug in the unit
tests that allowed the wrong parent to pass unnoticed. Plus we
add an extra log message to direct the user seeing the pollution
in the logs to the proper bug report.

There may be a follow-up patch later: not all RP re-parenting
operations are problematic, so we are thinking of relaxing this
blanket prohibition in Placement. When Placement allows updates to
the parent id, we can also fix the RPs already created with the
wrong parent.

Change-Id: I7caa8827d22103600ca685a58294640fc831dbd9
Closes-Bug: #1921150
Co-Authored-By: "Balazs Gibizer" 
Related-Bug: #1853840


** Changed in: neutron
   Status: Triaged => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1921150

Title:
  [QoS min bw] repeated ERROR log: Unable to save resource provider ...
  because: re-parenting a provider is not currently allowed

Status in neutron:
  Fix Released

Bug description:
  Description
  ===
  If neutron is configured with QoS guaranteed minimum bandwidth, and the
  deployment is upgraded from Stein 14.0.4 or older, or Train 15.0.1 or older,
  to any newer OpenStack version, the following stack trace appears repeatedly
  in the neutron-server log:

  Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin Traceback (most recent call last):
  Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin   File "/opt/stack/neutron-lib/neutron_lib/placement/client.py", line 53, in wrapper
  Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin return f(self, *a, **k)
  Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin   File "/opt/stack/neutron-lib/neutron_lib/placement/client.py", line 232, in update_resource_provider
  Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin return self._put(url, update_body).json()
  Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin   File "/opt/stack/neutron-lib/neutron_lib/placement/client.py", line 188, in _put
  Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin endpoint_filter=self._ks_filter, **kwargs)
  Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin   File "/usr/local/lib/python3.6/dist-packages/keystoneauth1/session.py", line 1114, in pu

[Yahoo-eng-team] [Bug 1832021] Re: Checksum drop of metadata traffic on isolated networks with DPDK

2021-05-01 Thread Mathew Hodson
Fixed in neutron 17.0.0 and later.

** Changed in: neutron (Ubuntu)
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1832021

Title:
  Checksum drop of metadata traffic on isolated networks with DPDK

Status in OpenStack neutron-openvswitch charm:
  Fix Released
Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive queens series:
  New
Status in Ubuntu Cloud Archive rocky series:
  New
Status in Ubuntu Cloud Archive stein series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Bionic:
  New
Status in neutron source package in Focal:
  Fix Released

Bug description:
  [Impact]

  When an isolated network uses provider networks for tenants (meaning
  there are no virtual routers, i.e. no DVR or network node), metadata access
  occurs in the qdhcp ip netns rather than the qrouter netns.

  The following options are set in the dhcp_agent.ini file:
  force_metadata = True
  enable_isolated_metadata = True

  VMs on the provider tenant network are unable to access metadata, as
  packets are dropped due to invalid checksums.

  [Test Plan]

  1. Create an OpenStack deployment with DPDK options enabled and
  'enable-local-dhcp-and-metadata: true' in neutron-openvswitch. A
  sample, simple 3 node bundle can be found here[1].

  2. Create an external flat network and subnet:

  openstack network show dpdk_net || \
    openstack network create --provider-network-type flat \
      --provider-physical-network physnet1 dpdk_net \
      --external

  openstack subnet show dpdk_net || \
    openstack subnet create --allocation-pool start=10.230.58.100,end=10.230.58.200 \
      --subnet-range 10.230.56.0/21 --dhcp --gateway 10.230.56.1 \
      --dns-nameserver 10.230.56.2 \
      --ip-version 4 --network dpdk_net dpdk_subnet

  
  3. Create an instance attached to that network. The instance must have a 
flavor that uses huge pages.

  openstack flavor create --ram 8192 --disk 50 --vcpus 4 m1.dpdk
  openstack flavor set m1.dpdk --property hw:mem_page_size=large

  openstack server create --wait --image xenial --flavor m1.dpdk --key-
  name testkey --network dpdk_net i1

  4. Log into the instance's host and check the instance console. The
  instance will hang during boot and show the following message:

  2020-11-20 09:43:26,790 - openstack.py[DEBUG]: Failed reading optional
  path http://169.254.169.254/openstack/2015-10-15/user_data due to:
  HTTPConnectionPool(host='169.254.169.254', port=80): Read timed out.
  (read timeout=10.0)

  5. Apply the fix in all computes, restart the DHCP agents in all
  computes and create the instance again.

  6. No errors should be shown and the instance quickly boots.

  
  [Where problems could occur]

  * This code path is only exercised if datapath_type and ovs_use_veth are
  set, which is mostly the case in DPDK environments. The core of the fix is
  to toggle off checksum offload on the DHCP namespace interfaces (a rough
  sketch of that technique follows this list). This has the drawback of adding
  some overhead to packet processing for DHCP traffic, but given that DHCP
  does not move much data, this should be a minor problem.

  * Future changes on the syntax of the ethtool command could cause
  regressions
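
  The sketch mentioned above, as a plain Python/subprocess illustration rather
  than the exact neutron patch; the namespace and device names are made up:

    import subprocess

    def disable_tx_checksum(namespace: str, device: str) -> None:
        # Equivalent to: ip netns exec <namespace> ethtool --offload <device> tx off
        # i.e. turn off TX checksum offload so metadata/DHCP replies leave the
        # qdhcp namespace with valid checksums.
        subprocess.run(
            ['ip', 'netns', 'exec', namespace,
             'ethtool', '--offload', device, 'tx', 'off'],
            check=True,
        )

    # Example call (names are illustrative):
    # disable_tx_checksum('qdhcp-<network-id>', 'tap<port-id>')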


  [Other Info]

   * None


  [1] https://gist.github.com/sombrafam/e0741138773e444960eb4aeace6e3e79

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1832021/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1832021] Re: Checksum drop of metadata traffic on isolated networks with DPDK

2021-05-01 Thread Mathew Hodson
Fixed in Ubuntu Focal.

---
neutron (2:16.1.0-0ubuntu1) focal; urgency=medium

  * New stable point release for OpenStack Ussuri (LP: #1892139).
  * d/control: Align (Build-)Depends with upstream.

 -- Chris MacNaughton   Thu, 27 Aug 2020 05:31:05 +

** Also affects: neutron (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Changed in: neutron (Ubuntu Focal)
   Importance: Undecided => Medium

** Changed in: neutron (Ubuntu Focal)
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1832021



[Yahoo-eng-team] [Bug 1832021] Re: Checksum drop of metadata traffic on isolated networks with DPDK

2021-05-01 Thread Mathew Hodson
** Changed in: neutron (Ubuntu)
   Importance: Undecided => Medium

** Also affects: neutron (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Changed in: neutron (Ubuntu Bionic)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1832021

Title:
  Checksum drop of metadata traffic on isolated networks with DPDK

Status in OpenStack neutron-openvswitch charm:
  Fix Released
Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive queens series:
  New
Status in Ubuntu Cloud Archive rocky series:
  New
Status in Ubuntu Cloud Archive stein series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  New
Status in neutron source package in Bionic:
  New



[Yahoo-eng-team] [Bug 1915815] Re: vmware: Rescue impossible if VM folder renamed

2021-05-01 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/nova/+/775852
Committed: https://opendev.org/openstack/nova/commit/472825a83911c5351cfbd40458ad811dafdacacc
Submitter: "Zuul (22348)"
Branch: master

commit 472825a83911c5351cfbd40458ad811dafdacacc
Author: Johannes Kulik 
Date:   Tue Feb 16 15:56:38 2021 +0100

vmware: Handle folder renames in rescue cmd

When a VM is storage-vMotioned, the name of the folder its files are in
can change. Then, a rescue command trying to put the rescue disk into a
folder named after the instance's UUID cannot work anymore and actually
raises a FileNotFoundException for the directory path.

To fix this, we now take the root VMDK's folder and copy the rescue
image into that.

Change-Id: Icef785b96e51942e7bac2df10c116078c77fedc4
Closes-Bug: #1915815
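
A minimal illustrative sketch of the idea (made-up names, not the actual nova
change): derive the rescue disk location from the folder that already holds
the root VMDK, so a folder renamed by storage vMotion is still found.

    import posixpath
    import re

    def rescue_disk_path(root_vmdk_path: str, instance_uuid: str) -> str:
        # A datastore path looks like "[datastore-name] some/folder/disk.vmdk".
        match = re.match(r'^\[(?P<ds>[^\]]+)\]\s*(?P<rel>.+)$', root_vmdk_path)
        datastore, rel_path = match.group('ds'), match.group('rel')
        folder = posixpath.dirname(rel_path)      # reuse the existing folder
        rescue_name = '%s-rescue.vmdk' % instance_uuid
        return '[%s] %s' % (datastore, posixpath.join(folder, rescue_name))

    # After storage vMotion the folder is "$name ($uuid)" rather than the bare
    # instance UUID, and the rescue disk still lands next to the root disk:
    # rescue_disk_path('[eph-bb145-3] vm1 (551e5570)/551e5570.vmdk', '551e5570')
    # -> '[eph-bb145-3] vm1 (551e5570)/551e5570-rescue.vmdk'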


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1915815

Title:
  vmware: Rescue impossible if VM folder renamed

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Steps to reproduce
  ==

  * storage-vMotion a VM (this renames the folder from "$uuid" to
  "$name ($uuid)")
  * openstack server rescue $uuid

  Actual Result
  =

  Nova's vmware driver raises an exception:

  Traceback (most recent call last):
    File "/nova-base-source/nova-base-archive-stable-queens-m3/nova/compute/manager.py", line 3621, in rescue_instance
      rescue_image_meta, admin_password)
    File "/nova-base-source/nova-base-archive-stable-queens-m3/nova/virt/vmwareapi/driver.py", line 601, in rescue
      self._vmops.rescue(context, instance, network_info, image_meta)
    File "/nova-base-source/nova-base-archive-stable-queens-m3/nova/virt/vmwareapi/vmops.py", line 1802, in rescue
      vi.cache_image_path, rescue_disk_path)
    File "/nova-base-source/nova-base-archive-stable-queens-m3/nova/virt/vmwareapi/ds_util.py", line 311, in disk_copy
      session._wait_for_task(copy_disk_task)
    File "/nova-base-source/nova-base-archive-stable-queens-m3/nova/virt/vmwareapi/driver.py", line 725, in _wait_for_task
      return self.wait_for_task(task_ref)
    File "/plugins/openstack-base-plugin-oslo-vmware-archive-stable-queens-m3/oslo_vmware/api.py", line 402, in wait_for_task
      return evt.wait()
    File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/eventlet/event.py", line 121, in wait
      return hubs.get_hub().switch()
    File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
      return self.greenlet.switch()
    File "/plugins/openstack-base-plugin-oslo-vmware-archive-stable-queens-m3/oslo_vmware/common/loopingcall.py", line 75, in _inner
      self.f(*self.args, **self.kw)
    File "/plugins/openstack-base-plugin-oslo-vmware-archive-stable-queens-m3/oslo_vmware/api.py", line 449, in _poll_task
      raise exceptions.translate_fault(task_info.error)
  FileNotFoundException: File [eph-bb145-3] 551e5570-cf70-4ca0-9f37-e50210c4d2f5/ was not found

  Expected Result
  ===

  VM is put into rescue mode and boots the rescue image.

  Environment
  ===

  This happened on queens, but the same code is still there in master:
  
https://github.com/openstack/nova/blob/a7dd1f8881484ba0bf4270dd48109c2be142c333/nova/virt/vmwareapi/vmops.py#L1228-L1229

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1915815/+subscriptions



[Yahoo-eng-team] [Bug 1926838] [NEW] [OVN] infinite loop in ovsdb_monitor

2021-05-01 Thread frigo
Public bug reported:

I am running the OVN sandbox, a second chassis, and neutron. I
synchronize the neutron database with the databases of the sandbox, run
neutron-server, and possibly run a few ovs-vsctl commands on the chassis to
set up OVS ports.

I notice that some commands on the chassis can trigger some sort of
infinite loop in neutron. For example

ovs-vsctl set open . external-ids:ovn-cms-options=enable-chassis-as-gw
ovs-vsctl set open . external-ids:ovn-cms-options=xx
ovs-vsctl set open . external-ids:ovn-cms-options=enable-chassis-as-gw

on the second chassis will trigger transactions "in a loop" on the
neutron-server:

...
Successfully bumped revision number for resource f32ac6cc (type: ports) to 
571
Router 079cde19-0b92-48f8-bef2-5e35b939a7a1 is bound to host sandbox
Running txn n=1 command(idx=0): CheckRevisionNumberCommand
Running txn n=1 command(idx=1): UpdateLRouterPortCommand
Running txn n=1 command(idx=2): SetLRouterPortInLSwitchPortCommand
Successfully bumped revision number for resource f32ac6cc (type: 
router_ports) to 572
Running txn n=1 command(idx=0): CheckRevisionNumberCommand
Running txn n=1 command(idx=1): SetLSwitchPortCommand
Running txn n=1 command(idx=2): PgDelPortCommand
Successfully bumped revision number for resource f32ac6cc (type: ports) to 
572
Router 079cde19-0b92-48f8-bef2-5e35b939a7a1 is bound to host sandbox
Running txn n=1 command(idx=0): CheckRevisionNumberCommand
Running txn n=1 command(idx=1): UpdateLRouterPortCommand
Running txn n=1 command(idx=2): SetLRouterPortInLSwitchPortCommand
Successfully bumped revision number for resource f32ac6cc (type: 
router_ports) to 573
Running txn n=1 command(idx=0): CheckRevisionNumberCommand
Running txn n=1 command(idx=1): SetLSwitchPortCommand
Running txn n=1 command(idx=2): PgDelPortCommand
...


This is not limited to changes of external-ids:ovn-cms-options; other
ovs-vsctl commands can trigger the same issue.

neutron-server CPU consumption jumps to 100% and the revision_number of
ports keeps increasing. Restarting neutron-server fixes the issue
temporarily.

I am not sure how to provide a simple reproducer because I did not find
any instructions for running neutron standalone with two OVN chassis. I will
investigate what is happening locally.

Version: main branch from OVN (d41a337fe3b608a8f90de8722d148344011f0bd8)
and of Neutron  (94d36862c207b1e4d984d28874ca2f3bd09c855f)

It's not a blocker as long as it happens only on my laptop.

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: ovn

** Attachment added: "logs of one loop"
   https://bugs.launchpad.net/bugs/1926838/+attachment/5494052/+files/logs1

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1926838

Title:
  [OVN] infinite loop in ovsdb_monitor

Status in neutron:
  New


[Yahoo-eng-team] [Bug 1926836] [NEW] Keystone Redis Caching

2021-05-01 Thread Behzad
Public bug reported:

While trying to integrate Keystone with Redis as the caching layer (the same
procedure works fine with Memcached), the following error is shown.
The environment:
OS = Ubuntu 20.04
Openstack repo: Ubuntu cloud (Wallaby edition)
keystone version 19.0.0
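
For reference, a minimal sanity check of the Redis backend that oslo.cache
wires up behind keystone's [cache] backend = dogpile.cache.redis option; this
is a hypothetical sketch, not taken from the report, and the URL and
expiration values are placeholders:

    from dogpile.cache import make_region

    # Configure the same dogpile backend keystone would use via oslo.cache.
    region = make_region().configure(
        'dogpile.cache.redis',
        arguments={
            'url': 'redis://127.0.0.1:6379',   # assumed local Redis instance
            'redis_expiration_time': 600,
        },
    )
    region.set('probe-key', 'ok')
    print(region.get('probe-key'))   # prints 'ok' if the Redis layer works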

When I issue the "openstack endpoint list --service identity" command, the
output shows "Internal Server Error (HTTP 500)".

The following error is shown in /var/log/apache2/keystone.log (also available
at https://paste.ubuntu.com/p/CmrgZ2JrkX/):

2021-05-01 16:37:17.350115 mod_wsgi (pid=241861): Exception occurred processing WSGI script '/usr/bin/keystone-wsgi-public'.
2021-05-01 16:37:17.358549 Traceback (most recent call last):
2021-05-01 16:37:17.359016   File "/usr/lib/python3/dist-packages/flask/app.py", line 2463, in __call__
2021-05-01 16:37:17.359044 return self.wsgi_app(environ, start_response)
2021-05-01 16:37:17.359072   File "/usr/lib/python3/dist-packages/werkzeug/middleware/proxy_fix.py", line 232, in __call__
2021-05-01 16:37:17.359083 return self.app(environ, start_response)
2021-05-01 16:37:17.359104   File "/usr/lib/python3/dist-packages/webob/dec.py", line 129, in __call__
2021-05-01 16:37:17.359114 resp = self.call_func(req, *args, **kw)
2021-05-01 16:37:17.359134   File "/usr/lib/python3/dist-packages/webob/dec.py", line 193, in call_func
2021-05-01 16:37:17.359144 return self.func(req, *args, **kwargs)
2021-05-01 16:37:17.359165   File "/usr/lib/python3/dist-packages/oslo_middleware/base.py", line 124, in __call__
2021-05-01 16:37:17.359174 response = req.get_response(self.application)
2021-05-01 16:37:17.359195   File "/usr/lib/python3/dist-packages/webob/request.py", line 1313, in send
2021-05-01 16:37:17.359204 status, headers, app_iter = self.call_application(
2021-05-01 16:37:17.359225   File "/usr/lib/python3/dist-packages/webob/request.py", line 1278, in call_application
2021-05-01 16:37:17.359239 app_iter = application(self.environ, start_response)
2021-05-01 16:37:17.359260   File "/usr/lib/python3/dist-packages/webob/dec.py", line 143, in __call__
2021-05-01 16:37:17.359271 return resp(environ, start_response)
2021-05-01 16:37:17.359293   File "/usr/lib/python3/dist-packages/webob/dec.py", line 129, in __call__
2021-05-01 16:37:17.359304 resp = self.call_func(req, *args, **kw)
2021-05-01 16:37:17.359325   File "/usr/lib/python3/dist-packages/webob/dec.py", line 193, in call_func
2021-05-01 16:37:17.359339 return self.func(req, *args, **kwargs)
2021-05-01 16:37:17.359363   File "/usr/lib/python3/dist-packages/oslo_middleware/base.py", line 124, in __call__
2021-05-01 16:37:17.359373 response = req.get_response(self.application)
2021-05-01 16:37:17.359395   File "/usr/lib/python3/dist-packages/webob/request.py", line 1313, in send
2021-05-01 16:37:17.359406 status, headers, app_iter = self.call_application(
2021-05-01 16:37:17.359429   File "/usr/lib/python3/dist-packages/webob/request.py", line 1278, in call_application
2021-05-01 16:37:17.359439 app_iter = application(self.environ, start_response)
2021-05-01 16:37:17.359460   File "/usr/lib/python3/dist-packages/webob/dec.py", line 129, in __call__
2021-05-01 16:37:17.359470 resp = self.call_func(req, *args, **kw)
2021-05-01 16:37:17.359492   File "/usr/lib/python3/dist-packages/webob/dec.py", line 193, in call_func
2021-05-01 16:37:17.359502 return self.func(req, *args, **kwargs)
2021-05-01 16:37:17.359523   File "/usr/lib/python3/dist-packages/osprofiler/web.py", line 112, in __call__
2021-05-01 16:37:17.359534 return request.get_response(self.application)
2021-05-01 16:37:17.359579   File "/usr/lib/python3/dist-packages/webob/request.py", line 1313, in send
2021-05-01 16:37:17.359589 status, headers, app_iter = self.call_application(
2021-05-01 16:37:17.359611   File "/usr/lib/python3/dist-packages/webob/request.py", line 1278, in call_application
2021-05-01 16:37:17.359625 app_iter = application(self.environ, start_response)
2021-05-01 16:37:17.359648   File "/usr/lib/python3/dist-packages/webob/dec.py", line 129, in __call__
2021-05-01 16:37:17.359658 resp = self.call_func(req, *args, **kw)
2021-05-01 16:37:17.359679   File "/usr/lib/python3/dist-packages/webob/dec.py", line 193, in call_func
2021-05-01 16:37:17.359688 return self.func(req, *args, **kwargs)
2021-05-01 16:37:17.359709   File "/usr/lib/python3/dist-packages/oslo_middleware/request_id.py", line 58, in __call__
2021-05-01 16:37:17.359722 response = req.get_response(self.application)
2021-05-01 16:37:17.359745   File "/usr/lib/python3/dist-packages/webob/request.py", line 1313, in send
2021-05-01 16:37:17.359754 status, headers, app_iter = self.call_application(
2021-05-01 16:37:17.359776   File "/usr/lib/python3/dist-packages/webob/request.py", line 1