[Yahoo-eng-team] [Bug 1886969] Re: dhcp bulk reload fails with python3
Reviewed:  https://review.opendev.org/742363
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=20b138ff3118029e86f0525695160c4c7ca8b551
Submitter: Zuul
Branch:    master

commit 20b138ff3118029e86f0525695160c4c7ca8b551
Author: Matt Vinall
Date:   Thu Jul 9 21:08:21 2020 +0100

    fix dhcp bulk reload exceptions

    1886969 - The bulk reload code was written for python2 and caused an
    exception running under python3. This change works under python3.

    1890027 - There was an additional exception triggered when deleting
    networks - reading the network from the cache returned 'None' and
    this was not properly checked before use.

    Change-Id: I4e546c0e37146b1f34d8b5e6637c407b0c04ad4d
    Closes-Bug: 1886969
    Closes-Bug: 1890027
    Signed-off-by: Matt Vinall

** Changed in: neutron
       Status: New => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1886969

Title:
  dhcp bulk reload fails with python3

Status in neutron:
  Fix Released

Bug description:
  In ussuri, configuring the neutron.conf bulk_reload_interval enables
  bulk reload mode. The current code looks to be incompatible with
  python3.
  With the current ussuri code, which looks unchanged on master, I get
  the following error in the docker logs:

  ---
  Running command: 'neutron-dhcp-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/dhcp_agent.ini'
  + exec neutron-dhcp-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/dhcp_agent.ini
  Traceback (most recent call last):
    File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 461, in fire_timers
      timer()
    File "/usr/lib/python3.6/site-packages/eventlet/hubs/timer.py", line 59, in __call__
      cb(*args, **kw)
    File "/usr/lib/python3.6/site-packages/eventlet/semaphore.py", line 147, in _do_acquire
      waiter.switch()
    File "/usr/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py", line 154, in _reload_bulk_allocations
      for network_id in self._network_bulk_allocations.keys():
  RuntimeError: dictionary changed size during iteration
  ---

  After this, I see no further updates to the dnsmasq hosts file.

  The current code looks like this:
  https://github.com/openstack/neutron/blob/56bb42fcbc43b619c3c07897c7de88f29158e4b8/neutron/agent/dhcp/agent.py#L157

  ---
  def _reload_bulk_allocations(self):
      while True:
          for network_id in self._network_bulk_allocations.keys():
              network = self.cache.get_network_by_id(network_id)
              self.call_driver('bulk_reload_allocations', network)
              del self._network_bulk_allocations[network_id]
          eventlet.greenthread.sleep(self.conf.bulk_reload_interval)
  ---

  I think the problem is the "del" statement: code like this works in
  python2 but not in python3. However, I also wonder if there's a race
  hazard here with new IDs being appended.
  I suspect something like this might work better:

  ---
  def _reload_bulk_allocations(self):
      while True:
          deleted = self._network_bulk_allocations.copy()
          self._network_bulk_allocations = {}
          for network_id in deleted:
              network = self.cache.get_network_by_id(network_id)
              self.call_driver('bulk_reload_allocations', network)
          eventlet.greenthread.sleep(self.conf.bulk_reload_interval)
  ---

  However, even this is probably susceptible to a race hazard; we would
  probably need a mutex around any update to
  self._network_bulk_allocations.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1886969/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
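The python2 vs python3 difference described in this bug can be reproduced in isolation. The sketch below is illustrative (the names `pending`, `drain_unsafe`, and `drain_safe` are stand-ins, not neutron's): on python3, `dict.keys()` returns a live view, so deleting entries while iterating it raises the exact `RuntimeError` seen in the traceback above, while snapshotting the keys with `list()` restores the python2-safe behaviour.

```python
def drain_unsafe(pending):
    # python3: .keys() is a live view of the dict; deleting while
    # iterating raises "RuntimeError: dictionary changed size during
    # iteration" (on python2, .keys() returned a plain list).
    for key in pending.keys():
        del pending[key]


def drain_safe(pending):
    # Snapshot the keys first, then deleting entries is safe.
    for key in list(pending):
        del pending[key]


pending = {"net-a": True, "net-b": True}
try:
    drain_unsafe(dict(pending))
except RuntimeError as exc:
    print("unsafe:", exc)

drain_safe(pending)
print("remaining entries:", len(pending))
```

The reporter's alternative of swapping in a fresh dict and iterating a copy also avoids the error, though, as noted, only a lock fully closes the race with IDs being appended concurrently.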
[Yahoo-eng-team] [Bug 1890027] Re: dhcp agent crashes when deleting network if bulk reload enabled
Reviewed:  https://review.opendev.org/742363
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=20b138ff3118029e86f0525695160c4c7ca8b551
Submitter: Zuul
Branch:    master

commit 20b138ff3118029e86f0525695160c4c7ca8b551
Author: Matt Vinall
Date:   Thu Jul 9 21:08:21 2020 +0100

    fix dhcp bulk reload exceptions

    1886969 - The bulk reload code was written for python2 and caused an
    exception running under python3. This change works under python3.

    1890027 - There was an additional exception triggered when deleting
    networks - reading the network from the cache returned 'None' and
    this was not properly checked before use.

    Change-Id: I4e546c0e37146b1f34d8b5e6637c407b0c04ad4d
    Closes-Bug: 1886969
    Closes-Bug: 1890027
    Signed-off-by: Matt Vinall

** Changed in: neutron
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1890027

Title:
  dhcp agent crashes when deleting network if bulk reload enabled

Status in neutron:
  Fix Released

Bug description:
  If DHCP bulk reload is enabled, I get the following crash in the bulk
  reload loop when deleting a network:

  Running command: 'neutron-dhcp-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/dhcp_agent.ini'
  + exec neutron-dhcp-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/dhcp_agent.ini
  Traceback (most recent call last):
    File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 461, in fire_timers
      timer()
    File "/usr/lib/python3.6/site-packages/eventlet/hubs/timer.py", line 59, in __call__
      cb(*args, **kw)
    File "/usr/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py", line 161, in _reload_bulk_allocations
      self.call_driver('bulk_reload_allocations', network)
    File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper
      result = f(*args, **kwargs)
    File "/usr/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py", line 167, in call_driver
      {'net': network.id, 'action': action})
  AttributeError: 'NoneType' object has no attribute 'id'
  + sudo -E kolla_set_configs

  Line numbers might differ slightly due to my patch for
  https://bugs.launchpad.net/neutron/+bug/1886969 but the issue is
  present without my change.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1890027/+subscriptions
[Yahoo-eng-team] [Bug 1882521] Re: Failing device detachments on Focal
Reviewed:  https://review.opendev.org/755799
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=dd1e6d4b0cee465fd89744e306fcd25228b3f7cc
Submitter: Zuul
Branch:    master

commit dd1e6d4b0cee465fd89744e306fcd25228b3f7cc
Author: Lee Yarwood
Date:   Fri Oct 2 15:11:25 2020 +0100

    libvirt: Increase incremental and max sleep time during device detach

    Bug #1894804 outlines how DEVICE_DELETED events were often missing
    from QEMU on Focal based OpenStack CI hosts, as originally seen in
    bug #1882521. This has eventually been tracked down to some
    undefined QEMU behaviour when a new device_del QMP command is
    received while another is still being processed, causing the
    original attempt to be aborted. We hit this race in slower OpenStack
    CI envs as n-cpu rather crudely retries attempts to detach devices
    using the RetryDecorator from oslo.service. The default incremental
    sleep time is currently tight enough that QEMU is still processing
    the first device_del request on these slower CI hosts when n-cpu
    asks libvirt to retry the detach, sending another device_del to QEMU
    and hitting the above behaviour.

    Additionally, we have also seen the following check being hit when
    testing with QEMU >= v5.0.0. This check now rejects overlapping
    device_del requests in QEMU rather than aborting the original:

    https://github.com/qemu/qemu/commit/cce8944cc9efab47d4bf29cfffb3470371c3541b

    This change aims to avoid this situation entirely by raising the
    default incremental sleep time between detach requests from 2
    seconds to 10, leaving enough time for the first attempt to
    complete. The overall maximum sleep time is also increased from 30
    to 60 seconds.

    Future work will aim to entirely remove this retry logic with a
    libvirt event driven approach, polling for the
    VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED and
    VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED events before retrying.
    Finally, the cleanup of unused arguments in detach_device_with_retry
    is left for a follow up change in order to keep this initial change
    small enough to quickly backport.

    Closes-Bug: #1882521
    Related-Bug: #1894804
    Change-Id: Ib9ed7069cef5b73033351f7a78a3fb566753970d

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1882521

Title:
  Failing device detachments on Focal

Status in Cinder:
  New
Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  The following tests are failing consistently when deploying devstack
  on Focal in the CI, see https://review.opendev.org/734029 for
  detailed logs:

  tempest.api.compute.servers.test_server_rescue_negative.ServerRescueNegativeTestJSON.test_rescued_vm_detach_volume
  tempest.api.compute.volumes.test_attach_volume.AttachVolumeMultiAttachTest.test_resize_server_with_multiattached_volume
  tempest.api.compute.servers.test_server_rescue.ServerStableDeviceRescueTest.test_stable_device_rescue_disk_virtio_with_volume_attached
  tearDownClass (tempest.api.compute.servers.test_server_rescue.ServerStableDeviceRescueTest)

  Sample extract from nova-compute log:

  Jun 08 08:48:24.384559 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: DEBUG oslo.service.loopingcall [-] Exception which is in the suggested list of exceptions occurred while invoking function: nova.virt.libvirt.guest.Guest.detach_device_with_retry.._do_wait_and_retry_detach. {{(pid=82495) _func /usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py:410}}
  Jun 08 08:48:24.384862 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: DEBUG oslo.service.loopingcall [-] Cannot retry nova.virt.libvirt.guest.Guest.detach_device_with_retry.._do_wait_and_retry_detach upon suggested exception since retry count (7) reached max retry count (7). {{(pid=82495) _func /usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py:416}}
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall [-] Dynamic interval looping call 'oslo_service.loopingcall.RetryDecorator.__call__.._func' failed: nova.exception.DeviceDetachFailed: Device detach failed for vdb: Unable to detach the device from the live config.
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall Traceback (most recent call last):
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall   File "/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py", line 150, in _run_loop
  Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall     result = func(*self.args, **self.kw)
  Jun 08 08:48:24.388855 ubun
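The retry pattern the commit message describes — an incrementally growing sleep between detach attempts, capped at a maximum — can be sketched as follows. This only loosely mirrors oslo.service's RetryDecorator; `detach_with_retry`, `try_detach`, and the wiring are illustrative, while the 10-second increment and 60-second cap are the new defaults quoted in the commit message.

```python
import time


def detach_with_retry(try_detach, incremental_sleep=10, max_sleep=60,
                      max_retries=7, sleep=time.sleep):
    """Retry try_detach() with an incrementally growing, capped sleep.

    Returns the attempt number that succeeded, or raises RuntimeError
    once max_retries attempts have failed.
    """
    delay = 0
    for attempt in range(1, max_retries + 1):
        if try_detach():
            return attempt
        # Grow the wait by incremental_sleep each time, up to max_sleep,
        # so a slow hypervisor gets longer to finish the previous request.
        delay = min(delay + incremental_sleep, max_sleep)
        sleep(delay)
    raise RuntimeError("device detach failed after %d attempts" % max_retries)
```

With a 10-second increment, the second attempt fires only after the first has had 10 seconds to complete, which is the point of the fix: avoid sending QEMU an overlapping device_del while it is still processing the previous one.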
[Yahoo-eng-team] [Bug 1896496] Re: Combination of 'hw_video_ram' image metadata prop, 'hw_video:ram_max_mb' extra spec raises error
Reviewed:  https://review.opendev.org/753011
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f2ca089bce842127e7d0644b38a11da9278db8ea
Submitter: Zuul
Branch:    master

commit f2ca089bce842127e7d0644b38a11da9278db8ea
Author: Stephen Finucane
Date:   Mon Sep 21 16:11:38 2020 +0100

    libvirt: 'video.vram' property must be an integer

    The 'vram' property of the 'video' device must be an integer else
    libvirt will spit the dummy out, e.g.

      libvirt.libvirtError: XML error: cannot parse video vram '8192.0'

    The division operator in Python 3 results in a float, not an integer
    like in Python 2. Use the truncation division operator instead.

    Change-Id: Iebf678c229da4f455459d068cafeee5f241aea1f
    Signed-off-by: Stephen Finucane
    Closes-Bug: #1896496

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896496

Title:
  Combination of 'hw_video_ram' image metadata prop,
  'hw_video:ram_max_mb' extra spec raises error

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  The 'hw_video_ram' image metadata property is used to configure the
  amount of memory allocated to VRAM. Using it requires specifying the
  'hw_video:ram_max_mb' extra spec or you'll get the following error:

    nova.exception.RequestedVRamTooHigh: The requested amount of video
    memory 8 is higher than the maximum allowed by flavor 0.

  However, specifying these currently results in a libvirt failure.

  ERROR nova.compute.manager [None ...] [instance: 11a71ae4-e410-4856-aeab-eea6ca4784c5] Failed to build and run instance: libvirt.libvirtError: XML error: cannot parse video vram '8192.0'
  ERROR nova.compute.manager [instance: ...] Traceback (most recent call last):
  ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/compute/manager.py", line 2333, in _build_and_run_instance
  ERROR nova.compute.manager [instance: ...]     accel_info=accel_info)
  ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3632, in spawn
  ERROR nova.compute.manager [instance: ...]     cleanup_instance_disks=created_disks)
  ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6527, in _create_domain_and_network
  ERROR nova.compute.manager [instance: ...]     cleanup_instance_disks=cleanup_instance_disks)
  ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  ERROR nova.compute.manager [instance: ...]     self.force_reraise()
  ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  ERROR nova.compute.manager [instance: ...]     six.reraise(self.type_, self.value, self.tb)
  ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
  ERROR nova.compute.manager [instance: ...]     raise value
  ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6496, in _create_domain_and_network
  ERROR nova.compute.manager [instance: ...]     post_xml_callback=post_xml_callback)
  ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6425, in _create_domain
  ERROR nova.compute.manager [instance: ...]     guest = libvirt_guest.Guest.create(xml, self._host)
  ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 127, in create
  ERROR nova.compute.manager [instance: ...]     encodeutils.safe_decode(xml))
  ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  ERROR nova.compute.manager [instance: ...]     self.force_reraise()
  ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  ERROR nova.compute.manager [instance: ...]     six.reraise(self.type_, self.value, self.tb)
  ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
  ERROR nova.compute.manager [instance: ...]     raise value
  ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 123, in create
  ERROR nova.compute.manager [instance: ...]     guest = host.write_instance_config(xml)
  ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/host.py", line 1135, in write_instance_config
  ERROR nova.compute.manager [instance: ...]     domain = self.get_connection().
[Yahoo-eng-team] [Bug 1898842] Re: [OVN][QoS] "qos-fip" extension always loaded even without ML2 "qos", error while processing extensions
Reviewed:  https://review.opendev.org/756483
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7e31f2ae41b4512afd2b3dd4fb72fcd16ef0a373
Submitter: Zuul
Branch:    master

commit 7e31f2ae41b4512afd2b3dd4fb72fcd16ef0a373
Author: Rodolfo Alonso Hernandez
Date:   Wed Oct 7 10:20:20 2020 +

    Do not load "qos-fip" if "qos" plugin is not loaded

    If the QoS service plugin is not loaded, the L3 QoS extension in the
    OVN L3 plugin should not be loaded either. Prior to this patch, the
    extension drivers were checked to find the QoS extension. Although
    it is a misconfiguration to have the QoS extension without loading
    the QoS driver, that is handled correctly by the Neutron server,
    which disables the extension silently.

    Closes-Bug: #1898842
    Related-Bug: #1877408
    Change-Id: Iea5ff76fe652ab1c04e23850b9259547c1d54365

** Changed in: neutron
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1898842

Title:
  [OVN][QoS] "qos-fip" extension always loaded even without ML2 "qos",
  error while processing extensions

Status in neutron:
  Fix Released

Bug description:
  Since [1][2], QoS is implemented for OVN. If the ML2 QoS service
  plugin is not loaded, the OVN L3 QoS extension should not be loaded
  either.

  As reported in [3], without the ML2 QoS driver loaded [4], the
  "qos-fip" extension fails during the initialization: the driver is
  loaded but the extension is not supported by any plugin, raising an
  exception [5].
  [1] https://review.opendev.org/#/c/722415/
  [2] https://bugs.launchpad.net/neutron/+bug/1877408
  [3] https://zuul.opendev.org/t/openstack/build/2e85321c072f4deebc456b75bda0fbf4/log/controller/logs/screen-q-svc.txt#551
  [4] https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_4ee/755726/4/check/kuryr-kubernetes-tempest-containerized-ovn/4ee7612/controller/logs/local_conf.txt
  [5] http://paste.openstack.org/show/798773/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1898842/+subscriptions
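The shape of the fix — gating the L3 extension on the loaded service plugins rather than on the configured extension drivers — can be sketched roughly as below. All names and the `requires` mapping are illustrative stand-ins, not neutron's actual code.

```python
def select_l3_extensions(loaded_service_plugins, candidate_extensions):
    """Silently drop L3 extensions whose required service plugin is absent.

    Illustrative sketch: each candidate extension may declare a service
    plugin it depends on; extensions whose dependency is missing are
    skipped instead of failing at initialization time.
    """
    # Hypothetical dependency table: extension -> required service plugin.
    requires = {'qos-fip': 'qos'}
    selected = []
    for ext in candidate_extensions:
        needed = requires.get(ext)
        if needed and needed not in loaded_service_plugins:
            continue  # e.g. skip 'qos-fip' when the 'qos' plugin is absent
        selected.append(ext)
    return selected
```

Checking the loaded plugins directly means a deployment that configures the QoS extension driver without the QoS service plugin degrades gracefully rather than raising during extension processing.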
[Yahoo-eng-team] [Bug 1898886] Re: Can't establish BGP session with password authentication
This is an issue with the os-ken library; see
https://storyboard.openstack.org/#!/story/2007910 . The issue is fixed
in the latest release of the library, so make sure to upgrade.

** Changed in: neutron
       Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1898886

Title:
  Can't establish BGP session with password authentication

Status in neutron:
  Invalid

Bug description:
  Creating a neutron BGP peer with password authentication leads to an
  error reported in neutron-bgp-dragent.log:

  2020-10-06 18:58:51.861 125213 DEBUG bgpspeaker.peer [-] Started peer Peer(ip: 100.94.2.2, asn: 65200) _run /usr/lib/python3/dist-packages/os_ken/services/protocols/bgp/peer.py:676
  2020-10-06 18:58:51.861 125213 DEBUG bgpspeaker.peer [-] start connect loop. (mode: active) _on_update_connect_mode /usr/lib/python3/dist-packages/os_ken/services/protocols/bgp/peer.py:582
  2020-10-06 18:58:52.862 125213 DEBUG bgpspeaker.peer [-] Peer 100.94.2.2 BGP FSM went from Idle to Connect bgp_state /usr/lib/python3/dist-packages/os_ken/services/protocols/bgp/peer.py:236
  2020-10-06 18:58:52.863 125213 DEBUG bgpspeaker.peer [-] Peer(ip: 100.94.2.2, asn: 65200) trying to connect to ('100.94.2.2', 179) _connect_loop /usr/lib/python3/dist-packages/os_ken/services/protocols/bgp/peer.py:1292
  2020-10-06 18:58:52.863 125213 DEBUG bgpspeaker.base [-] Connect TCP called for 100.94.2.2:179 _connect_tcp /usr/lib/python3/dist-packages/os_ken/services/protocols/bgp/base.py:412
  2020-10-06 18:58:52.864 125213 ERROR os_ken.lib.hub [-] hub: uncaught exception: Traceback (most recent call last):
    File "/usr/lib/python3/dist-packages/os_ken/lib/hub.py", line 69, in _launch
      return func(*args, **kwargs)
    File "/usr/lib/python3/dist-packages/os_ken/services/protocols/bgp/peer.py", line 1296, in _connect_loop
      self._connect_tcp(peer_address,
    File "/usr/lib/python3/dist-packages/os_ken/services/protocols/bgp/base.py", line 422, in _connect_tcp
      sockopt.set_tcp_md5sig(sock, peer_addr[0], password)
    File "/usr/lib/python3/dist-packages/os_ken/lib/sockopt.py", line 71, in set_tcp_md5sig
      impl(s, addr, key)
    File "/usr/lib/python3/dist-packages/os_ken/lib/sockopt.py", line 38, in _set_tcp_md5sig_linux
      sa = sockaddr.sa_in4(addr)