[Yahoo-eng-team] [Bug 1886969] Re: dhcp bulk reload fails with python3

2020-10-10, OpenStack Infra
Reviewed:  https://review.opendev.org/742363
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=20b138ff3118029e86f0525695160c4c7ca8b551
Submitter: Zuul
Branch: master

commit 20b138ff3118029e86f0525695160c4c7ca8b551
Author: Matt Vinall 
Date:   Thu Jul 9 21:08:21 2020 +0100

fix dhcp bulk reload exceptions

1886969 - The bulk reload code was written for python2 and raised
an exception when run under python3. This change makes it work under
python3.

1890027 - There was an additional exception triggered when
deleting networks - reading the network from the cache returned 'None'
and this was not properly checked before use.

Change-Id: I4e546c0e37146b1f34d8b5e6637c407b0c04ad4d
Closes-Bug: 1886969
Closes-Bug: 1890027
Signed-off-by: Matt Vinall 


** Changed in: neutron
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1886969

Title:
  dhcp bulk reload fails with python3

Status in neutron:
  Fix Released

Bug description:
  In ussuri, setting bulk_reload_interval in neutron.conf enables bulk
  reload mode. The current code appears to be incompatible with
  python3.

  With the current ussuri code, which looks unchanged on master, I get
  the following error in docker logs:

  ---
  Running command: 'neutron-dhcp-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/dhcp_agent.ini'
  + exec neutron-dhcp-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/dhcp_agent.ini
  Traceback (most recent call last):
    File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 461, in fire_timers
      timer()
    File "/usr/lib/python3.6/site-packages/eventlet/hubs/timer.py", line 59, in __call__
      cb(*args, **kw)
    File "/usr/lib/python3.6/site-packages/eventlet/semaphore.py", line 147, in _do_acquire
      waiter.switch()
    File "/usr/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py", line 154, in _reload_bulk_allocations
      for network_id in self._network_bulk_allocations.keys():
  RuntimeError: dictionary changed size during iteration
  ---

  After this, I see no further updates to the dnsmasq hosts file.

  The current code looks like this:

  
https://github.com/openstack/neutron/blob/56bb42fcbc43b619c3c07897c7de88f29158e4b8/neutron/agent/dhcp/agent.py#L157
  ---
  def _reload_bulk_allocations(self):
      while True:
          for network_id in self._network_bulk_allocations.keys():
              network = self.cache.get_network_by_id(network_id)
              self.call_driver('bulk_reload_allocations', network)
              del self._network_bulk_allocations[network_id]
          eventlet.greenthread.sleep(self.conf.bulk_reload_interval)
  ---

  I think the problem is the "del" statement: deleting keys while
  iterating works in python2, where .keys() returns a list copy, but
  raises a RuntimeError in python3, where .keys() returns a live view
  of the dict. However, I also wonder if there's some race hazard here
  with new IDs being appended.
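
  A minimal standalone snippet reproducing the difference (illustrative,
  not taken from the agent code):

  ---
  d = {'net-1': True, 'net-2': True}
  # python2: d.keys() returned a fresh list, so deleting entries while
  # iterating was safe. python3: d.keys() is a live view, so this loop
  # raises "RuntimeError: dictionary changed size during iteration".
  for key in d.keys():
      del d[key]
  # A python3-safe variant iterates over a snapshot instead:
  #     for key in list(d):
  #         del d[key]
  ---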

  I suspect something like this might work better:

  ---
  def _reload_bulk_allocations(self):
      while True:
          deleted = self._network_bulk_allocations.copy()
          self._network_bulk_allocations = {}

          for network_id in deleted:
              network = self.cache.get_network_by_id(network_id)
              self.call_driver('bulk_reload_allocations', network)
          eventlet.greenthread.sleep(self.conf.bulk_reload_interval)
  ---

  However, even this is probably susceptible to a race hazard; a mutex
  is probably needed around any update to
  self._network_bulk_allocations, along the lines of the sketch below.
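
  A sketch of that direction (illustrative only, not the committed fix;
  _bulk_lock is a hypothetical attribute that would be created in
  __init__, and any writer queueing network IDs would need to take the
  same lock):

  ---
  import threading

  # in __init__: self._bulk_lock = threading.Lock()

  def _reload_bulk_allocations(self):
      while True:
          with self._bulk_lock:
              # Swap the dict atomically so writers queueing new
              # network IDs cannot race with this drain.
              deleted = self._network_bulk_allocations
              self._network_bulk_allocations = {}
          for network_id in deleted:
              network = self.cache.get_network_by_id(network_id)
              self.call_driver('bulk_reload_allocations', network)
          eventlet.greenthread.sleep(self.conf.bulk_reload_interval)
  ---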

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1886969/+subscriptions



[Yahoo-eng-team] [Bug 1890027] Re: dhcp agent crashes when deleting network if bulk reload enabled

2020-10-10, OpenStack Infra
Reviewed:  https://review.opendev.org/742363
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=20b138ff3118029e86f0525695160c4c7ca8b551
Submitter: Zuul
Branch: master

commit 20b138ff3118029e86f0525695160c4c7ca8b551
Author: Matt Vinall 
Date:   Thu Jul 9 21:08:21 2020 +0100

fix dhcp bulk reload exceptions

1886969 - The bulk reload code was written for python2 and raised
an exception when run under python3. This change makes it work under
python3.

1890027 - There was an additional exception triggered when
deleting networks - reading the network from the cache returned 'None'
and this was not properly checked before use.

Change-Id: I4e546c0e37146b1f34d8b5e6637c407b0c04ad4d
Closes-Bug: 1886969
Closes-Bug: 1890027
Signed-off-by: Matt Vinall 


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1890027

Title:
  dhcp agent crashes when deleting network if bulk reload enabled

Status in neutron:
  Fix Released

Bug description:
  If DHCP bulk reload is enabled, I get the following crash in the bulk
  reload loop when deleting a network:

  Running command: 'neutron-dhcp-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/dhcp_agent.ini'
  + exec neutron-dhcp-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/dhcp_agent.ini
  Traceback (most recent call last):
    File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 461, in fire_timers
      timer()
    File "/usr/lib/python3.6/site-packages/eventlet/hubs/timer.py", line 59, in __call__
      cb(*args, **kw)
    File "/usr/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py", line 161, in _reload_bulk_allocations
      self.call_driver('bulk_reload_allocations', network)
    File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper
      result = f(*args, **kwargs)
    File "/usr/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py", line 167, in call_driver
      {'net': network.id, 'action': action})
  AttributeError: 'NoneType' object has no attribute 'id'
  + sudo -E kolla_set_configs

  Line numbers might differ slightly due to my patch for
  https://bugs.launchpad.net/neutron/+bug/1886969, but the issue is
  present without my change.
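
  Per the commit message above, the fix guards against the cache
  returning 'None'; a minimal sketch of such a guard (illustrative, not
  the verbatim committed code):

  ---
  def _reload_bulk_allocations(self):
      while True:
          for network_id in list(self._network_bulk_allocations):
              network = self.cache.get_network_by_id(network_id)
              del self._network_bulk_allocations[network_id]
              # The network may have been deleted after the reload was
              # queued; the cache then returns None and there is
              # nothing left to reload.
              if network is not None:
                  self.call_driver('bulk_reload_allocations', network)
          eventlet.greenthread.sleep(self.conf.bulk_reload_interval)
  ---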

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1890027/+subscriptions



[Yahoo-eng-team] [Bug 1882521] Re: Failing device detachments on Focal

2020-10-10, OpenStack Infra
Reviewed:  https://review.opendev.org/755799
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=dd1e6d4b0cee465fd89744e306fcd25228b3f7cc
Submitter: Zuul
Branch: master

commit dd1e6d4b0cee465fd89744e306fcd25228b3f7cc
Author: Lee Yarwood 
Date:   Fri Oct 2 15:11:25 2020 +0100

libvirt: Increase incremental and max sleep time during device detach

Bug #1894804 outlines how DEVICE_DELETED events were often missing from
QEMU on Focal based OpenStack CI hosts as originally seen in bug
 #1882521. This has eventually been tracked down to some undefined QEMU
behaviour when a new device_del QMP command is received while another is
still being processed, causing the original attempt to be aborted.

We hit this race in slower OpenStack CI envs because n-cpu rather
crudely retries attempts to detach devices using the RetryDecorator
from oslo.service. The default incremental sleep time is currently
tight enough that QEMU is still processing the first device_del
request on these slower CI hosts when n-cpu asks libvirt to retry the
detach, sending another device_del to QEMU and hitting the above
behaviour.

Additionally, we have seen the following check being hit when testing
with QEMU >= v5.0.0. This check now rejects overlapping device_del
requests in QEMU rather than aborting the original:

https://github.com/qemu/qemu/commit/cce8944cc9efab47d4bf29cfffb3470371c3541b

This change aims to avoid this situation entirely by raising the default
incremental sleep time between detach requests from 2 seconds to 10,
leaving enough time for the first attempt to complete. The overall
maximum sleep time is also increased from 30 to 60 seconds.

Future work will aim to entirely remove this retry logic with a
libvirt event driven approach, polling for the
VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED and
VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED events before retrying.

Finally, the cleanup of unused arguments in detach_device_with_retry is
left for a follow up change in order to keep this initial change small
enough to quickly backport.

Closes-Bug: #1882521
Related-Bug: #1894804
Change-Id: Ib9ed7069cef5b73033351f7a78a3fb566753970d
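
For illustration, a sketch of how oslo.service's RetryDecorator drives
these retries with the new values (the exception class and helper below
are placeholders, not nova code; 7 is the max retry count seen in the
logs of this bug):

---
from oslo_service import loopingcall


class DeviceDetachFailed(Exception):
    """Placeholder for the real nova exception."""


def device_still_attached():
    # Hypothetical helper standing in for a libvirt domain XML check.
    return False


# RetryDecorator re-invokes the decorated function whenever one of the
# listed exceptions is raised, sleeping inc_sleep_time seconds longer
# before each new attempt, capped at max_sleep_time. The commit raises
# these from 2s/30s to the 10s/60s shown here.
@loopingcall.RetryDecorator(max_retry_count=7, inc_sleep_time=10,
                            max_sleep_time=60,
                            exceptions=(DeviceDetachFailed,))
def _do_wait_and_retry_detach():
    if device_still_attached():
        # Raising a listed exception makes the decorator sleep, then
        # call this function again, up to max_retry_count times.
        raise DeviceDetachFailed()
---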


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1882521

Title:
  Failing device detachments on Focal

Status in Cinder:
  New
Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  The following tests are failing consistently when deploying devstack
  on Focal in the CI; see https://review.opendev.org/734029 for
  detailed logs:

  
  tempest.api.compute.servers.test_server_rescue_negative.ServerRescueNegativeTestJSON.test_rescued_vm_detach_volume
  tempest.api.compute.volumes.test_attach_volume.AttachVolumeMultiAttachTest.test_resize_server_with_multiattached_volume
  tempest.api.compute.servers.test_server_rescue.ServerStableDeviceRescueTest.test_stable_device_rescue_disk_virtio_with_volume_attached
  tearDownClass (tempest.api.compute.servers.test_server_rescue.ServerStableDeviceRescueTest)

  Sample extract from nova-compute log:

  Jun 08 08:48:24.384559 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: DEBUG oslo.service.loopingcall [-] Exception which is in the suggested list of exceptions occurred while invoking function: nova.virt.libvirt.guest.Guest.detach_device_with_retry.<locals>._do_wait_and_retry_detach. {{(pid=82495) _func /usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py:410}}
  Jun 08 08:48:24.384862 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: DEBUG oslo.service.loopingcall [-] Cannot retry nova.virt.libvirt.guest.Guest.detach_device_with_retry.<locals>._do_wait_and_retry_detach upon suggested exception since retry count (7) reached max retry count (7). {{(pid=82495) _func /usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py:416}}
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall [-] Dynamic interval looping call 'oslo_service.loopingcall.RetryDecorator.__call__.<locals>._func' failed: nova.exception.DeviceDetachFailed: Device detach failed for vdb: Unable to detach the device from the live config.
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall Traceback (most recent call last):
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall   File "/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py", line 150, in _run_loop
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall     result = func(*self.args, **self.kw)
  Jun 08 08:48:24.388855 ubun

[Yahoo-eng-team] [Bug 1896496] Re: Combination of 'hw_video_ram' image metadata prop, 'hw_video:ram_max_mb' extra spec raises error

2020-10-10, OpenStack Infra
Reviewed:  https://review.opendev.org/753011
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f2ca089bce842127e7d0644b38a11da9278db8ea
Submitter: Zuul
Branch: master

commit f2ca089bce842127e7d0644b38a11da9278db8ea
Author: Stephen Finucane 
Date:   Mon Sep 21 16:11:38 2020 +0100

libvirt: 'video.vram' property must be an integer

The 'vram' property of the 'video' device must be an integer else
libvirt will spit the dummy out, e.g.

  libvirt.libvirtError: XML error: cannot parse video vram '8192.0'

The division operator in Python 3 results in a float, not an integer
like in Python 2. Use the truncation division operator instead.

Change-Id: Iebf678c229da4f455459d068cafeee5f241aea1f
Signed-off-by: Stephen Finucane 
Closes-Bug: #1896496
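
The division behaviour the commit describes, as a standalone snippet
(illustrative, not nova code):

---
# Python 2: 8192 / 2 is integer division and yields 4096.
# Python 3: '/' is true division and always yields a float (4096.0),
# which serializes as '4096.0' in the XML and libvirt rejects it.
print(8192 / 2)   # 4096.0 on python3
# '//' truncates, producing the integer the XML attribute needs:
print(8192 // 2)  # 4096
---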


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896496

Title:
  Combination of 'hw_video_ram' image metadata prop,
  'hw_video:ram_max_mb' extra spec raises error

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  The 'hw_video_ram' image metadata property is used to configure the
  amount of memory allocated to VRAM. Using it requires specifying the
  'hw_video:ram_max_mb' extra spec or you'll get the following error:

nova.exception.RequestedVRamTooHigh: The requested amount of video
  memory 8 is higher than the maximum allowed by flavor 0.

  However, specifying these currently results in a libvirt failure.

ERROR nova.compute.manager [None ...] [instance: 11a71ae4-e410-4856-aeab-eea6ca4784c5] Failed to build and run instance: libvirt.libvirtError: XML error: cannot parse video vram '8192.0'
ERROR nova.compute.manager [instance: ...] Traceback (most recent call last):
ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/compute/manager.py", line 2333, in _build_and_run_instance
ERROR nova.compute.manager [instance: ...]     accel_info=accel_info)
ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3632, in spawn
ERROR nova.compute.manager [instance: ...]     cleanup_instance_disks=created_disks)
ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6527, in _create_domain_and_network
ERROR nova.compute.manager [instance: ...]     cleanup_instance_disks=cleanup_instance_disks)
ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
ERROR nova.compute.manager [instance: ...]     self.force_reraise()
ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
ERROR nova.compute.manager [instance: ...]     six.reraise(self.type_, self.value, self.tb)
ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
ERROR nova.compute.manager [instance: ...]     raise value
ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6496, in _create_domain_and_network
ERROR nova.compute.manager [instance: ...]     post_xml_callback=post_xml_callback)
ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6425, in _create_domain
ERROR nova.compute.manager [instance: ...]     guest = libvirt_guest.Guest.create(xml, self._host)
ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 127, in create
ERROR nova.compute.manager [instance: ...]     encodeutils.safe_decode(xml))
ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
ERROR nova.compute.manager [instance: ...]     self.force_reraise()
ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
ERROR nova.compute.manager [instance: ...]     six.reraise(self.type_, self.value, self.tb)
ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
ERROR nova.compute.manager [instance: ...]     raise value
ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 123, in create
ERROR nova.compute.manager [instance: ...]     guest = host.write_instance_config(xml)
ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/host.py", line 1135, in write_instance_config
ERROR nova.compute.manager [instance: ...]     domain = self.get_connection().

[Yahoo-eng-team] [Bug 1898842] Re: [OVN][QoS] "qos-fip" extension always loaded even without ML2 "qos", error while processing extensions

2020-10-10, OpenStack Infra
Reviewed:  https://review.opendev.org/756483
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7e31f2ae41b4512afd2b3dd4fb72fcd16ef0a373
Submitter: Zuul
Branch: master

commit 7e31f2ae41b4512afd2b3dd4fb72fcd16ef0a373
Author: Rodolfo Alonso Hernandez 
Date:   Wed Oct 7 10:20:20 2020 +

Do not load "qos-fip" if "qos" plugin is not loaded

If the QoS service plugin is not loaded, the L3 QoS extension in the
OVN L3 plugin should not be loaded either.

Prior to this patch, the extension drivers were checked to find the
QoS extension. Although it is a misconfiguration to have the QoS
extension without loading the QoS driver, that is handled correctly
by the Neutron server, disabling the extension silently.

Closes-Bug: #1898842
Related-Bug: #1877408

Change-Id: Iea5ff76fe652ab1c04e23850b9259547c1d54365


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1898842

Title:
  [OVN][QoS] "qos-fip" extension always loaded even without ML2 "qos",
  error while processing extensions

Status in neutron:
  Fix Released

Bug description:
  Since [1][2], QoS is implemented for OVN. If the ML2 QoS service
  plugin is not loaded, the OVN L3 QoS extension should not be loaded
  either.

  As reported in [3], without the ML2 QoS driver loaded [4], the
  "qos-fip" extension fails during initialization: the driver is loaded
  but the extension is not supported by any plugin, raising an
  exception [5].

  [1] https://review.opendev.org/#/c/722415/
  [2] https://bugs.launchpad.net/neutron/+bug/1877408
  [3] https://zuul.opendev.org/t/openstack/build/2e85321c072f4deebc456b75bda0fbf4/log/controller/logs/screen-q-svc.txt#551
  [4] https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_4ee/755726/4/check/kuryr-kubernetes-tempest-containerized-ovn/4ee7612/controller/logs/local_conf.txt
  [5] http://paste.openstack.org/show/798773/
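
  For context, the configuration combination involved looks roughly
  like this (illustrative values; plugin and driver aliases may differ
  per deployment):

  ---
  # neutron.conf
  [DEFAULT]
  service_plugins = ovn-router,qos   # omitting 'qos' must not break qos-fip

  # ml2_conf.ini
  [ml2]
  extension_drivers = port_security,qos
  ---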

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1898842/+subscriptions



[Yahoo-eng-team] [Bug 1898886] Re: Can't establish BGP session with password authentication

2020-10-10, Dr. Jens Harbott
This is an issue with the os-ken library; see
https://storyboard.openstack.org/#!/story/2007910 . The issue is fixed
in the latest release of the library, so make sure to upgrade.

** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1898886

Title:
  Can't establish BGP session with password authentication

Status in neutron:
  Invalid

Bug description:
  Creating a neutron BGP peer with password authentication leads to an
  error reported in neutron-bgp-dragent.log.
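
  For reference, the peer was created along these lines
  (neutron-dynamic-routing CLI; IP, ASN and password are illustrative):

  ---
  neutron bgp-peer-create --peer-ip 100.94.2.2 --remote-as 65200 \
      --auth-type md5 --password secret peer-upstream
  ---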

  2020-10-06 18:58:51.861 125213 DEBUG bgpspeaker.peer [-] Started peer Peer(ip: 100.94.2.2, asn: 65200) _run /usr/lib/python3/dist-packages/os_ken/services/protocols/bgp/peer.py:676
  2020-10-06 18:58:51.861 125213 DEBUG bgpspeaker.peer [-] start connect loop. (mode: active) _on_update_connect_mode /usr/lib/python3/dist-packages/os_ken/services/protocols/bgp/peer.py:582
  2020-10-06 18:58:52.862 125213 DEBUG bgpspeaker.peer [-] Peer 100.94.2.2 BGP FSM went from Idle to Connect bgp_state /usr/lib/python3/dist-packages/os_ken/services/protocols/bgp/peer.py:236
  2020-10-06 18:58:52.863 125213 DEBUG bgpspeaker.peer [-] Peer(ip: 100.94.2.2, asn: 65200) trying to connect to ('100.94.2.2', 179) _connect_loop /usr/lib/python3/dist-packages/os_ken/services/protocols/bgp/peer.py:1292
  2020-10-06 18:58:52.863 125213 DEBUG bgpspeaker.base [-] Connect TCP called for 100.94.2.2:179 _connect_tcp /usr/lib/python3/dist-packages/os_ken/services/protocols/bgp/base.py:412
  2020-10-06 18:58:52.864 125213 ERROR os_ken.lib.hub [-] hub: uncaught exception: Traceback (most recent call last):
    File "/usr/lib/python3/dist-packages/os_ken/lib/hub.py", line 69, in _launch
      return func(*args, **kwargs)
    File "/usr/lib/python3/dist-packages/os_ken/services/protocols/bgp/peer.py", line 1296, in _connect_loop
      self._connect_tcp(peer_address,
    File "/usr/lib/python3/dist-packages/os_ken/services/protocols/bgp/base.py", line 422, in _connect_tcp
      sockopt.set_tcp_md5sig(sock, peer_addr[0], password)
    File "/usr/lib/python3/dist-packages/os_ken/lib/sockopt.py", line 71, in set_tcp_md5sig
      impl(s, addr, key)
    File "/usr/lib/python3/dist-packages/os_ken/lib/sockopt.py", line 38, in _set_tcp_md5sig_linux
      sa = sockaddr.sa_in4(addr)