[Yahoo-eng-team] [Bug 1927868] Re: vRouter not working after update to 16.3.1
This bug was fixed in the package neutron - 2:18.1.0+git2021072117.147830620f-0ubuntu2

---
neutron (2:18.1.0+git2021072117.147830620f-0ubuntu2) impish; urgency=medium

  * d/p/revert-l3-ha-retry-when-setting-ha-router-gw-status.patch:
    Revert an upstream patch that introduced a regression preventing the
    full restore of HA routers on restart of the L3 agent (LP: #1927868).

 -- Corey Bryant  Wed, 28 Jul 2021 16:40:07 -0400

** Changed in: neutron (Ubuntu Impish)
   Status: Triaged => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1927868

Title:
  vRouter not working after update to 16.3.1

Status in Ubuntu Cloud Archive: Fix Committed
Status in Ubuntu Cloud Archive train series: Fix Committed
Status in Ubuntu Cloud Archive ussuri series: Triaged
Status in Ubuntu Cloud Archive victoria series: Fix Committed
Status in Ubuntu Cloud Archive wallaby series: Triaged
Status in Ubuntu Cloud Archive xena series: Fix Committed
Status in neutron: New
Status in neutron package in Ubuntu: Fix Released
Status in neutron source package in Focal: Fix Committed
Status in neutron source package in Hirsute: Fix Committed
Status in neutron source package in Impish: Fix Released

Bug description:
  We run a juju-managed OpenStack Ussuri on Bionic. After updating the
  neutron packages from 16.3.0 to 16.3.1, all virtual routers stopped
  working. Most (but not all) namespaces are created, yet they contain
  only the lo interface and sometimes the ha-XYZ interface in DOWN state.
  The underlying tap interfaces are also down.

  neutron-l3-agent has many logs similar to the following:

  2021-05-08 15:01:45.286 39411 ERROR neutron.agent.l3.ha_router [-] Gateway interface for router 02945b59-639b-41be-8237-3b7933b4e32d was not set up; router will not work properly

  and the journal logs report at around the same time:

  May 08 15:01:40 lar1615.srv-louros.grnet.gr neutron-keepalived-state-change[18596]: 2021-05-08 15:01:40.765 18596 INFO neutron.agent.linux.ip_lib [-] Failed sending gratuitous ARP to 62.62.62.62 on qg-5a6efe8c-6b in namespace qrouter-02945b59-639b-41be-8237-3b7933b4e32d: Exit code: 2; Stdin: ; Stdout: Interface "qg-5a6efe8c-6b" is down
  May 08 15:01:40 lar1615.srv-louros.grnet.gr neutron-keepalived-state-change[18596]: 2021-05-08 15:01:40.767 18596 INFO neutron.agent.linux.ip_lib [-] Interface qg-5a6efe8c-6b or address 62.62.62.62 in namespace qrouter-02945b59-639b-41be-8237-3b7933b4e32d was deleted concurrently

  The neutron packages installed are:

  ii neutron-common             2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - common
  ii neutron-dhcp-agent         2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - DHCP agent
  ii neutron-l3-agent           2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - l3 agent
  ii neutron-metadata-agent     2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - metadata agent
  ii neutron-metering-agent     2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - metering agent
  ii neutron-openvswitch-agent  2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - Open vSwitch plugin agent
  ii python3-neutron            2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - Python library
  ii python3-neutron-lib        2.3.0-0ubuntu1~cloud0    all Neutron shared routines and utilities - Python 3.x
  ii python3-neutronclient      1:7.1.1-0ubuntu1~cloud0  all client API library for Neutron - Python 3.x

  Downgrading to 16.3.0 resolves the issue.

  = Ubuntu SRU details =

  [Impact]
  See above.

  [Test Case]
  Deploy OpenStack with l3ha enabled and create several HA routers; the
  number required varies per environment. It is probably best to deploy a
  known-bad version of the package, confirm it is failing, upgrade to the
  version in proposed, and re-test several times to confirm it is fixed.
  After restarting neutron-l3-agent, all HA routers should be restored.

  [Regression Potential]
  This change fixes a regression by reverting a patch that was introduced
  in a stable point release of neutron.

To manage notifications about this bug go to:
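As a rough aid for the [Test Case] above, here is a minimal verification sketch (not from the bug report; the qrouter-/qg- naming conventions are the standard neutron ones) that flags HA routers whose gateway interface did not come back up after an l3-agent restart:

    import subprocess

    # List router namespaces and check that each gateway (qg-) interface is UP.
    ns_out = subprocess.run(["ip", "netns", "list"],
                            capture_output=True, text=True, check=True).stdout
    namespaces = [line.split()[0] for line in ns_out.splitlines() if line.strip()]

    for ns in (n for n in namespaces if n.startswith("qrouter-")):
        links = subprocess.run(
            ["ip", "netns", "exec", ns, "ip", "-o", "link", "show"],
            capture_output=True, text=True, check=True).stdout
        for line in links.splitlines():
            name = line.split(":")[1].strip().split("@")[0]
            if name.startswith("qg-") and "state UP" not in line:
                print(f"{ns}: gateway interface {name} is not UP")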
[Yahoo-eng-team] [Bug 1896734] Re: A privsep daemon spawned by neutron-openvswitch-agent hangs when debug logging is enabled (large number of registered NICs) - an RPC response is too large for msgpac
The Groovy Gorilla has reached end of life, so this bug will not be fixed
for that release.

** Changed in: python-oslo.privsep (Ubuntu Groovy)
   Status: New => Won't Fix

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1896734

Title:
  A privsep daemon spawned by neutron-openvswitch-agent hangs when debug
  logging is enabled (large number of registered NICs) - an RPC response
  is too large for msgpack

Status in OpenStack neutron-openvswitch charm: Invalid
Status in Ubuntu Cloud Archive: Fix Released
Status in Ubuntu Cloud Archive ussuri series: Fix Released
Status in Ubuntu Cloud Archive victoria series: Fix Released
Status in neutron: Fix Released
Status in oslo.privsep: New
Status in neutron package in Ubuntu: Fix Released
Status in python-oslo.privsep package in Ubuntu: New
Status in neutron source package in Focal: Fix Released
Status in python-oslo.privsep source package in Focal: New
Status in neutron source package in Groovy: Fix Released
Status in python-oslo.privsep source package in Groovy: Won't Fix
Status in neutron source package in Hirsute: Fix Released
Status in python-oslo.privsep source package in Hirsute: New

Bug description:
  [Impact]

  When there is a large number of netdevs registered in the kernel and
  debug logging is enabled, neutron-openvswitch-agent and the privsep
  daemon spawned by it hang, because the RPC call result sent by the
  privsep daemon over a unix socket exceeds the message sizes that the
  msgpack library can handle.

  The impact of this is that enabling debug logging on the cloud
  completely stalls neutron-openvswitch-agents and makes them "dead" from
  the Neutron server perspective.

  The issue is summarized in detail in comment #5:
  https://bugs.launchpad.net/oslo.privsep/+bug/1896734/comments/5

  [Test Plan]

  * deploy OpenStack Train/Ussuri/Victoria
  * need at least one compute host
  * enable neutron debug logging
  * create a load of interfaces on your compute host to create a large
    'ip addr show' output:
    for ((i=0;i<400;i++)); do ip tuntap add mode tap tap-`uuidgen | cut -c1-11`; done
  * create a single vm
  * add floating ip
  * ping fip
  * create 20 ports and attach them to the vm:
    for ((i=0;i<20;i++)); do id=`uuidgen`; openstack port create --network private --security-group __SG__ X-$id; openstack server add port __VM__ X-$id; done
  * attaching ports should not result in errors

  [Where problems could occur]

  No problems are anticipated with this patchset.

  When there is a large number of netdevs registered in the kernel and
  debug logging is enabled, neutron-openvswitch-agent and the privsep
  daemon spawned by it hang, because the RPC call result sent by the
  privsep daemon over a unix socket exceeds the message sizes that the
  msgpack library can handle. The impact of this is that enabling debug
  logging on the cloud completely stalls neutron-openvswitch-agents and
  makes them "dead" from the Neutron server perspective.

  The issue is summarized in detail in comment #5:
  https://bugs.launchpad.net/oslo.privsep/+bug/1896734/comments/5

  Old Description

  While trying to debug a different issue, I encountered a situation where
  privsep hangs in the process of handling a request from
  neutron-openvswitch-agent when debug logging is enabled (juju debug-log
  neutron-openvswitch=true):

  https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1895652/comments/11
  https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1895652/comments/12

  The issue is reproduced reliably in the environment where I encountered
  it, on all units. As a result, neutron-openvswitch-agent services hang
  while waiting for a response from the privsep daemon and do not progress
  past basic initialization. They never post any state back to the Neutron
  server and thus are marked dead by it. The processes, though, are shown
  as "active (running)" by systemd, which adds to the confusion since they
  do indeed start from systemd's perspective.

  systemctl --no-pager status neutron-openvswitch-agent.service
  ● neutron-openvswitch-agent.service - Openstack Neutron Open vSwitch Plugin Agent
     Loaded: loaded (/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2020-09-23 08:28:41 UTC; 25min ago
   Main PID: 247772 (/usr/bin/python)
      Tasks: 4 (limit: 9830)
     CGroup: /system.slice/neutron-openvswitch-agent.service
             ├─247772 /usr/bin/python3 /usr/bin/neutron-openvswitch-agent --config-file=/etc/neutron/neutron.conf --config-file=/etc/neutron/plugins/ml2/openvswitch_…og
             └─248272 /usr/bin/python3 /usr/bin/privsep-helper
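For illustration, a minimal Python sketch (not from the bug report) of the failure mode: msgpack refuses to decode a value that exceeds its configured size limits, which is analogous to what happens when the privsep daemon's RPC response carries a huge 'ip addr show'-sized payload:

    import msgpack

    # A 2 MiB string stands in for the oversized RPC result.
    packed = msgpack.packb("x" * (2 * 1024 * 1024))

    # Unpack with a deliberately small per-string cap to trigger the error.
    unpacker = msgpack.Unpacker(max_str_len=1024 * 1024, raw=False)
    unpacker.feed(packed)
    try:
        next(unpacker)
    except ValueError as exc:
        print(f"unpack failed: {exc}")  # e.g. '... exceeds max_str_len ...'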
[Yahoo-eng-team] [Bug 1906266] Re: After upgrade: "libvirt.libvirtError: Requested operation is not valid: format of backing image %s of image %s was not specified"
The Groovy Gorilla has reached end of life, so this bug will not be fixed
for that release.

** Changed in: nova (Ubuntu Groovy)
   Status: New => Won't Fix

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1906266

Title:
  After upgrade: "libvirt.libvirtError: Requested operation is not valid:
  format of backing image %s of image %s was not specified"

Status in Ubuntu Cloud Archive: New
Status in Ubuntu Cloud Archive ussuri series: Fix Committed
Status in OpenStack Compute (nova): Won't Fix
Status in libvirt package in Ubuntu: Fix Released
Status in nova package in Ubuntu: New
Status in libvirt source package in Focal: Fix Released
Status in nova source package in Focal: New
Status in libvirt source package in Groovy: Fix Released
Status in nova source package in Groovy: Won't Fix

Bug description:
  [Impact]

  * The new libvirt got stricter about file format specification. While
    this is generally the right approach, it causes issues for upgraders
    whose old image chains now fail.

  * Upstream has added code to relax those checks under a set of
    conditions, which allows moving forward with the stricter checks as
    planned while not breaking or blocking upgrades.

  [Test Plan]

  * Thanks to Brett Milford for sharing his test steps for this:

    sudo apt-get update
    sudo apt-get install libvirt-daemon-system cloud-image-utils virtinst -y
    IMG="focal-server-cloudimg-amd64.img"
    IMG_PATH="/var/lib/libvirt/images/base/$IMG"
    INSTANCE_NAME=testinst
    [ -f $IMG_PATH ] || {
        sudo mkdir -p /var/lib/libvirt/images/base
        sudo wget https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img \
            -O $IMG_PATH
    }
    sudo mkdir -p /var/lib/libvirt/images/$INSTANCE_NAME
    sudo qemu-img convert -O raw $IMG_PATH ${IMG_PATH%.*}
    sudo qemu-img create -f qcow2 -o backing_file=${IMG_PATH%.*} /var/lib/libvirt/images/$INSTANCE_NAME/root.img
    sudo qemu-img resize /var/lib/libvirt/images/$INSTANCE_NAME/root.img 5G
    virt-install --connect qemu:///system --name $INSTANCE_NAME --cpu host \
        --os-type linux --os-variant generic --graphics vnc \
        --console pty,target_type=serial \
        --disk path=/var/lib/libvirt/images/$INSTANCE_NAME/root.img,bus=virtio,format=qcow2 \
        --network default,model=virtio --noautoconsole --vcpus 1 --memory 1024 --import

  [Where problems could occur]

  * Of the many things that qemu/libvirt do, this changes only the format
    probing. So issues (hopefully none) would be expected mostly around
    complex image file scenarios. We've had a look at image files and
    image file chains, and so far all were good. There are more obscure
    (and unsupported) cases, like an image backed by a real disk, that
    might misbehave. Without the fix, Focal would be the outlier: older
    releases were fine (they did not check), newer releases have the
    relaxed check, and only Focal would be left broken in between.

  [Other Info]

  * A lot has changed in that area, but instead of pulling in a vast set
    of changes, a smaller set was identified to suit the SRU needs. So far
    it has not been found to regress anything and, on the other hand, it
    fixed the issue (tested from PPA) for the affected people.

  At a site upgraded to Ussuri we are getting faults starting instances:

  2020-11-30 13:41:40.586 232871 ERROR oslo_messaging.rpc.server libvirt.libvirtError: Requested operation is not valid: format of backing image '/var/lib/nova/instances/_base/xxx' of image '/var/lib/nova/instances/xxx' was not specified in the image metadata (See https://libvirt.org/kbase/backing_chains.html for troubleshooting)

  Bug #1864020 reports similar symptoms: due to an upstream change in
  libvirt v6.0.0+, images need the backing format specified. The fix for
  Bug #1864020 handles the case for new instances. However, for upgraded
  instances we are hitting the same problem, as those still don't have the
  backing format specified.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1906266/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
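As a hedged illustration of the fix's direction (explicit backing-format metadata, so libvirt no longer has to probe), the overlay creation from the test plan can declare backing_fmt up front; paths below reuse the illustrative ones from the steps above:

    import subprocess

    base = "/var/lib/libvirt/images/base/focal-server-cloudimg-amd64"  # raw base image
    overlay = "/var/lib/libvirt/images/testinst/root.img"

    # Stating backing_fmt records the backing format in the qcow2 metadata,
    # which is what newer libvirt insists on instead of probing.
    subprocess.run(
        ["qemu-img", "create", "-f", "qcow2",
         "-o", f"backing_file={base},backing_fmt=raw", overlay],
        check=True)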
[Yahoo-eng-team] [Bug 1927868] Re: vRouter not working after update to 16.3.1
** Also affects: neutron (Ubuntu Hirsute)
   Importance: Undecided
   Status: New

** Also affects: neutron (Ubuntu Impish)
   Importance: Critical
   Status: Triaged

** Also affects: neutron (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Changed in: neutron (Ubuntu Focal)
   Status: New => Triaged

** Changed in: neutron (Ubuntu Hirsute)
   Importance: Undecided => Critical

** Changed in: neutron (Ubuntu Focal)
   Importance: Undecided => Critical

** Changed in: neutron (Ubuntu Hirsute)
   Status: New => Triaged

** Also affects: cloud-archive
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/wallaby
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/victoria
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/train
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/ussuri
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/xena
   Importance: Undecided
   Status: New

** Changed in: cloud-archive/xena
   Importance: Undecided => Critical

** Changed in: cloud-archive/xena
   Status: New => Triaged

** Changed in: cloud-archive/wallaby
   Importance: Undecided => Critical

** Changed in: cloud-archive/wallaby
   Status: New => Triaged

** Changed in: cloud-archive/victoria
   Importance: Undecided => Critical

** Changed in: cloud-archive/victoria
   Status: New => Triaged

** Changed in: cloud-archive/ussuri
   Importance: Undecided => Critical

** Changed in: cloud-archive/ussuri
   Status: New => Triaged

** Changed in: cloud-archive/train
   Importance: Undecided => Critical

** Changed in: cloud-archive/train
   Status: New => Triaged

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1927868

Title:
  vRouter not working after update to 16.3.1

Status in Ubuntu Cloud Archive: Triaged
Status in Ubuntu Cloud Archive train series: Triaged
Status in Ubuntu Cloud Archive ussuri series: Triaged
Status in Ubuntu Cloud Archive victoria series: Triaged
Status in Ubuntu Cloud Archive wallaby series: Triaged
Status in Ubuntu Cloud Archive xena series: Triaged
Status in neutron: New
Status in neutron package in Ubuntu: Triaged
Status in neutron source package in Focal: Triaged
Status in neutron source package in Hirsute: Triaged
Status in neutron source package in Impish: Triaged

Bug description:
  We run a juju-managed OpenStack Ussuri on Bionic. After updating the
  neutron packages from 16.3.0 to 16.3.1, all virtual routers stopped
  working. Most (but not all) namespaces are created, yet they contain
  only the lo interface and sometimes the ha-XYZ interface in DOWN state.
  The underlying tap interfaces are also down.

  neutron-l3-agent has many logs similar to the following:

  2021-05-08 15:01:45.286 39411 ERROR neutron.agent.l3.ha_router [-] Gateway interface for router 02945b59-639b-41be-8237-3b7933b4e32d was not set up; router will not work properly

  and the journal logs report at around the same time:

  May 08 15:01:40 lar1615.srv-louros.grnet.gr neutron-keepalived-state-change[18596]: 2021-05-08 15:01:40.765 18596 INFO neutron.agent.linux.ip_lib [-] Failed sending gratuitous ARP to 62.62.62.62 on qg-5a6efe8c-6b in namespace qrouter-02945b59-639b-41be-8237-3b7933b4e32d: Exit code: 2; Stdin: ; Stdout: Interface "qg-5a6efe8c-6b" is down
  May 08 15:01:40 lar1615.srv-louros.grnet.gr neutron-keepalived-state-change[18596]: 2021-05-08 15:01:40.767 18596 INFO neutron.agent.linux.ip_lib [-] Interface qg-5a6efe8c-6b or address 62.62.62.62 in namespace qrouter-02945b59-639b-41be-8237-3b7933b4e32d was deleted concurrently

  The neutron packages installed are:

  ii neutron-common             2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - common
  ii neutron-dhcp-agent         2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - DHCP agent
  ii neutron-l3-agent           2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - l3 agent
  ii neutron-metadata-agent     2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - metadata agent
  ii neutron-metering-agent     2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - metering agent
  ii neutron-openvswitch-agent  2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - Open vSwitch plugin agent
  ii python3-neutron
[Yahoo-eng-team] [Bug 1938344] [NEW] Setting snap-store-assertions in a model config causes cloud-init to fail
Public bug reported:

When the juju model config includes `snap-store-assertions`, the
cloud-init.service runs into problems when trying to contact snapd's
socket.

In this setup, cloud-ctrl01 runs a Juju controller in a LXD container and
cloud-vm02 is a VM created by MAAS. Here is how to reproduce the issue:

ubuntu@cloud-ctrl01:~$ juju model-config
...
snap-store-assertions  model    |-
                                  type: account-key
                                  authority-id: canonical
                                  revision: 2
...
snap-store-proxy       model    dI5E5ZV6U3wOc919eLmZ0MtOxAyxxTIP
snap-store-proxy-url   default  ""
...
ubuntu@cloud-ctrl01:~$ juju add-machine
created machine 2
ubuntu@cloud-ctrl01:~$ juju ssh 2
...

Inside "machine 2", cloud-init's journal output:

ubuntu@cloud-vm02:~$ journalctl -u cloud-init.service | cat
-- Logs begin at Wed 2021-07-28 21:06:26 UTC, end at Wed 2021-07-28 21:17:01 UTC. --
Jul 28 21:06:31 ubuntu systemd[1]: Starting Initial cloud-init job (metadata service crawler)...
Jul 28 21:06:33 cloud-vm02 cloud-init[893]: Cloud-init v. 21.2-3-g899bfaa9-0ubuntu2~20.04.1 running 'init' at Wed, 28 Jul 2021 21:06:32 +0000. Up 10.23 seconds.
Jul 28 21:06:33 cloud-vm02 cloud-init[893]: ci-info: [net device info table elided]
...
Jul 28 21:06:33 cloud-vm02 cloud-init[893]: error: cannot assert: cannot communicate with server: Post http://localhost/v2/assertions: dial unix /run/snapd.socket: connect: no such file or directory
Jul 28 21:06:33 cloud-vm02 cloud-init[893]: error: cannot communicate with server: Put http://localhost/v2/snaps/core/conf: dial unix /run/snapd.socket: connect: no such file or directory
Jul 28 21:06:33 cloud-vm02 cloud-init[893]: 2021-07-28 21:06:33,810 - util.py[WARNING]: Failed to run bootcmd module bootcmd
Jul 28 21:06:33 cloud-vm02 cloud-init[893]: 2021-07-28 21:06:33,822 - util.py[WARNING]: Running module bootcmd () failed
Jul 28 21:06:34 cloud-vm02 useradd[989]: new group: name=ubuntu, GID=1000
...
Jul 28 21:06:35 cloud-vm02 systemd[1]: cloud-init.service: Main process exited, code=exited, status=1/FAILURE
Jul 28 21:06:35 cloud-vm02 systemd[1]: cloud-init.service: Failed with result 'exit-code'.
Jul 28 21:06:35 cloud-vm02 systemd[1]: Failed to start Initial cloud-init job (metadata service crawler).

Removing the snap-store-assertions/snap-store-proxy configs from the model
makes cloud-init work again.

FYI, my naive attempt at adding "After=sockets.target" to the
cloud-init.service didn't work :/

Additional information:

ubuntu@cloud-vm02:~$ /var/lib/juju/tools/machine-2/jujud version
2.9.9-ubuntu-amd64
$ lsb_release -rd
Description:    Ubuntu 20.04.2 LTS
Release:        20.04
ubuntu@cloud-vm02:~$ apt-cache policy cloud-init
cloud-init:
  Installed: 21.2-3-g899bfaa9-0ubuntu2~20.04.1
  Candidate: 21.2-3-g899bfaa9-0ubuntu2~20.04.1
  Version table:
 *** 21.2-3-g899bfaa9-0ubuntu2~20.04.1 500
        500 http://us.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     20.1-10-g71af48df-0ubuntu5 500
        500 http://us.archive.ubuntu.com/ubuntu focal/main amd64 Packages
ubuntu@cloud-ctrl01:~$ juju version
2.9.9-ubuntu-amd64

** Affects: cloud-init
   Importance: Undecided
   Status: New

** Affects: juju
   Importance: Undecided
   Status: New

** Also affects: cloud-init
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1938344

Title:
  Setting snap-store-assertions in a model config causes cloud-init to
  fail

Status in cloud-init: New
Status in juju: New

Bug description:
  When the juju model config includes `snap-store-assertions`, the
  cloud-init.service runs into problems when trying to contact snapd's
  socket.

  In this setup, cloud-ctrl01 runs a Juju controller in a LXD container
  and cloud-vm02 is a VM created by MAAS. Here is how to reproduce the
  issue:

  ubuntu@cloud-ctrl01:~$ juju model-config
  ...
  snap-store-assertions  model    |-
                                    type: account-key
                                    authority-id: canonical
                                    revision: 2
  ...
  snap-store-proxy       model    dI5E5ZV6U3wOc919eLmZ0MtOxAyxxTIP
  snap-store-proxy-url   default  ""
  ...
  ubuntu@cloud-ctrl01:~$ juju add-machine
  created machine 2
  ubuntu@cloud-ctrl01:~$ juju ssh 2
  ...

  Inside "machine 2", cloud-init's journal output:

  ubuntu@cloud-vm02:~$ journalctl -u cloud-init.service | cat
  -- Logs begin at Wed
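A hypothetical mitigation sketch (not from the bug report, and only a band-aid for the ordering problem described above): wait for snapd's control socket to exist before letting a bootcmd issue snap commands:

    import os
    import time

    SNAPD_SOCKET = "/run/snapd.socket"

    def wait_for_snapd(timeout=60.0, interval=1.0):
        # Poll for the socket path; snapd creates it once snapd.socket starts.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if os.path.exists(SNAPD_SOCKET):
                return True
            time.sleep(interval)
        return False

    if not wait_for_snapd():
        raise SystemExit(f"{SNAPD_SOCKET} never appeared; is snapd installed?")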
[Yahoo-eng-team] [Bug 1563069] Re: Centralize Configuration Options
** Changed in: neutron
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1563069

Title:
  Centralize Configuration Options

Status in neutron: Fix Released

Bug description:
  [Overview]
  Refactor Neutron configuration options to be in one place,
  'neutron/conf', similar to the Nova implementation found here:
  http://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/centralize-config-options.html
  This would centralize all configuration options and provide an easy way
  to import and gain access to the wide breadth of configuration options
  available to Neutron.

  [Proposal]
  1. Introduce a new package: neutron/conf

  Neutron Quotas Example:
  2. Group modules logically under the new package:
     2a. Example: options from neutron/quotas
     2b. Move to neutron/conf/quotas/common.py
     2c. Aggregate quota options in __init__.py
  4. Import neutron.conf.quotas for usage

  Neutron DB Example w/ Agent Options:
  2. Group modules logically under the new package:
     2a. Example: options from neutron/db/agents_db.py
     2b. Move to neutron/conf/db/agents.py
     2c. Aggregate db options in __init__.py
  4. Import neutron.conf.db for usage

  Neutron DB Example w/ Migration CLI:
  2. Group modules logically under the new package:
     2a. Example: options from neutron/db/migrations/cli.py
     2b. Move to neutron/conf/db/migrations_cli.py
     2c. Migrations CLI does not get aggregated in __init__.py
  4. Import neutron.conf.db.migrations_cli

  ** The neutron.opts list-options methods all get moved to neutron/conf
  as well, in their respective modules, and setup.cfg is modified for this
  adjustment.

  [Benefits]
  - As a developer I will find all config options in one place and will
    add further config options to that central place.
  - The end user is not affected by this change.

  [Related information]
  [1] Nova Implementation: http://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/centralize-config-options.html
  [2] Cross Project Spec: https://review.openstack.org/#/c/295543

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1563069/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
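For illustration, here is a minimal sketch of the proposed layout using oslo.config; the option names and defaults are placeholders, not the actual neutron options:

    # neutron/conf/quotas/common.py (sketch)
    from oslo_config import cfg

    quota_opts = [
        cfg.IntOpt('quota_network', default=100,
                   help='Number of networks allowed per tenant.'),
        cfg.IntOpt('quota_port', default=500,
                   help='Number of ports allowed per tenant.'),
    ]

    def register_quota_opts(conf=cfg.CONF):
        conf.register_opts(quota_opts, group='QUOTAS')

    # neutron/conf/quotas/__init__.py would then aggregate these options,
    # and consumers simply import neutron.conf.quotas, call
    # register_quota_opts(), and read cfg.CONF.QUOTAS.quota_network.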
[Yahoo-eng-team] [Bug 1642770] Re: Security group code is doing unnecessary work removing chains
** Changed in: neutron
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1642770

Title:
  Security group code is doing unnecessary work removing chains

Status in neutron: Fix Released

Bug description:
  The security group code is generating a lot of these messages when
  trying to boot VMs:

  Attempted to remove chain sg-chain which does not exist

  There are also ones specific to the port. It seems to be calling
  remove_chain() even when it's a new port and it's initially setting up
  its filter.

  I dropped a print_stack() in remove_chain() and see tracebacks like
  this:

  Prepare port filter for e8f41910-c24e-41f1-ae7f-355e9bb1d18a _apply_port_filter /opt/stack/neutron/neutron/agent/securitygroups_rpc.py:163
  Preparing device (e8f41910-c24e-41f1-ae7f-355e9bb1d18a) filter prepare_port_filter /opt/stack/neutron/neutron/agent/linux/iptables_firewall.py:170
  Attempted to remove chain sg-chain which does not exist remove_chain /opt/stack/neutron/neutron/agent/linux/iptables_manager.py:177

  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/ryu/lib/hub.py", line 54, in _launch
    return func(*args, **kwargs)
  File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_ryuapp.py", line 37, in agent_main_wrapper
    ovs_agent.main(bridge_classes)
  File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2177, in main
    agent.daemon_loop()
  File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 154, in wrapper
    return f(*args, **kwargs)
  File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2098, in daemon_loop
    self.rpc_loop(polling_manager=pm)
  File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 154, in wrapper
    return f(*args, **kwargs)
  File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2049, in rpc_loop
    port_info, ovs_restarted)
  File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 154, in wrapper
    return f(*args, **kwargs)
  File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1657, in process_network_ports
    port_info.get('updated', set()))
  File "/opt/stack/neutron/neutron/agent/securitygroups_rpc.py", line 266, in setup_port_filters
    self.prepare_devices_filter(new_devices)
  File "/opt/stack/neutron/neutron/agent/securitygroups_rpc.py", line 131, in decorated_function
    *args, **kwargs)
  File "/opt/stack/neutron/neutron/agent/securitygroups_rpc.py", line 139, in prepare_devices_filter
    self._apply_port_filter(device_ids)
  File "/opt/stack/neutron/neutron/agent/securitygroups_rpc.py", line 164, in _apply_port_filter
    self.firewall.prepare_port_filter(device)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/opt/stack/neutron/neutron/agent/firewall.py", line 139, in defer_apply
    self.filter_defer_apply_off()
  File "/opt/stack/neutron/neutron/agent/linux/iptables_firewall.py", line 838, in filter_defer_apply_off
    self._pre_defer_unfiltered_ports)
  File "/opt/stack/neutron/neutron/agent/linux/iptables_firewall.py", line 248, in _remove_chains_apply
    self._remove_chain_by_name_v4v6(SG_CHAIN)
  File "/opt/stack/neutron/neutron/agent/linux/iptables_firewall.py", line 279, in _remove_chain_by_name_v4v6
    self.iptables.ipv4['filter'].remove_chain(chain_name)
  File "/opt/stack/neutron/neutron/agent/linux/iptables_manager.py", line 178, in remove_chain
    traceback.print_stack()

  Looking at the code, there are a couple of interesting things:

  1) prepare_port_filter() calls self._remove_chains() - why?

  2) in the "defer" case above we always do
     _remove_chains_apply()/_setup_chains_apply() - is there some way to
     skip the remove?

  This also led to us timing how long the remove_chain() code takes, since
  that's where the message is printed. As the number of ports and rules
  grows, it spends more time spinning through chains and rules. It looks
  like that can be helped with a small code change, which is just fallout
  from the real problem. I'll send that out since it helps a little. More
  work is still required.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1642770/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   :
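A minimal sketch (assuming the iptables manager keeps a record of the chain names it manages, and not claiming to be the actual neutron fix) of the kind of guard that would skip the pointless removal and its warning:

    # Only ask iptables to remove a chain the manager actually tracks; a
    # brand-new port being set up for the first time has nothing to remove.
    def remove_chain_if_present(table, chain_name):
        if chain_name in getattr(table, "chains", set()):
            table.remove_chain(chain_name)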
[Yahoo-eng-team] [Bug 1816485] Re: [rfe] change neutron process names to match their role
** Changed in: neutron
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1816485

Title:
  [rfe] change neutron process names to match their role

Status in neutron: Fix Released

Bug description:
  See the commit message description here:
  https://review.openstack.org/#/c/637019/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1816485/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1856600] Re: Unit test jobs are failing with ImportError: cannot import name 'engine' from 'flake8'
** Changed in: neutron
   Status: In Progress => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1856600

Title:
  Unit test jobs are failing with ImportError: cannot import name 'engine'
  from 'flake8'

Status in neutron: Invalid

Bug description:
  Neutron unit test CI jobs are failing with the following error:

  = Failures during discovery =
  --- import errors ---
  Failed to import test module: neutron.tests.unit.hacking.test_checks
  Traceback (most recent call last):
    File "/usr/lib/python3.7/unittest/loader.py", line 436, in _find_test_path
      module = self._get_module_from_name(name)
    File "/usr/lib/python3.7/unittest/loader.py", line 377, in _get_module_from_name
      __import__(name)
    File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/unit/hacking/test_checks.py", line 15, in <module>
      from flake8 import engine
  ImportError: cannot import name 'engine' from 'flake8' (/home/zuul/src/opendev.org/openstack/neutron/.tox/py37/lib/python3.7/site-packages/flake8/__init__.py)

  Example:
  https://e859f0a6f5995c9142c5-a232ce3bdc50fca913ceba9a1c600c62.ssl.cf5.rackcdn.com/572767/23/check/openstack-tox-py37/1d036e0/job-output.txt

  It looks like flake8 no longer has an engine module, but they had kept
  the API for backward compatibility [1]; perhaps they broke it somehow.

  [1] based on a comment in
  https://gitlab.com/pycqa/flake8/blob/master/src/flake8/api/legacy.py#L3

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1856600/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
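Since the report mentions flake8's legacy API, here is a short sketch of the supported programmatic entry point that replaced `from flake8 import engine` (the file path is illustrative):

    from flake8.api import legacy as flake8_api

    # get_style_guide() is the documented replacement for the old engine API.
    style_guide = flake8_api.get_style_guide(select=["E", "W", "F"])
    report = style_guide.check_files(["neutron/tests/unit/hacking/test_checks.py"])
    print(report.total_errors)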
[Yahoo-eng-team] [Bug 1938326] [NEW] Migration gets stuck at pre-migrating status if source compute node is down but maintenance enabled
Public bug reported:

Description
===========
Currently nova rejects migration (resize) if the source compute node is
down, but not if the service has previously been disabled.

Steps to reproduce
==================
1. Create an instance
2. Shut down the compute node where the instance is started
3. Enable maintenance of the nova-compute service on the source compute node
4. Migrate the instance

Expected result
===============
Migration is rejected.

Actual result
=============
Migration is accepted but gets stuck in pre-migrating status.

Environment
===========
1. Exact version of OpenStack you are running. See the following list for
   all releases: http://docs.openstack.org/releases/
   If this is from a distro please provide
       $ dpkg -l | grep nova
   or
       $ rpm -ql | grep nova
   If this is from git, please provide
       $ git log -1
2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt +
   XEN, Hyper-V, PowerKVM, ...) What's the version of that?
2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...)
   What's the version of that?
3. Which networking type did you use? (For example: nova-network, Neutron
   with OpenVSwitch, ...)

Logs & Configs
==============
https://bugzilla.redhat.com/show_bug.cgi?id=1985712#c0

** Affects: nova
   Importance: Undecided
   Assignee: Lee Yarwood (lyarwood)
   Status: New

** Tags: api resize

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1938326

Title:
  Migration gets stuck at pre-migrating status if source compute node is
  down but maintenance enabled

Status in OpenStack Compute (nova): New

Bug description:
  Description
  ===========
  Currently nova rejects migration (resize) if the source compute node is
  down, but not if the service has previously been disabled.

  Steps to reproduce
  ==================
  1. Create an instance
  2. Shut down the compute node where the instance is started
  3. Enable maintenance of the nova-compute service on the source compute
     node
  4. Migrate the instance

  Expected result
  ===============
  Migration is rejected.

  Actual result
  =============
  Migration is accepted but gets stuck in pre-migrating status.

  Environment
  ===========
  1. Exact version of OpenStack you are running. See the following list
     for all releases: http://docs.openstack.org/releases/
  2. Which hypervisor did you use? What's the version of that?
  2. Which storage type did you use? What's the version of that?
  3. Which networking type did you use?

  Logs & Configs
  ==============
  https://bugzilla.redhat.com/show_bug.cgi?id=1985712#c0

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1938326/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
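A hypothetical pre-check sketch of the expected behavior (it mirrors nova's internal helpers but is not the actual fix): refuse the migration when the source service is down, regardless of whether it was disabled first:

    from nova import objects
    from nova import servicegroup

    def assert_source_service_up(context, host):
        # Look up the source nova-compute service and verify liveness, not
        # just the enabled/disabled flag.
        service = objects.Service.get_by_compute_host(context, host)
        if not servicegroup.API().service_is_up(service):
            raise RuntimeError(
                f"compute service on {host} is down; refusing migration")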
[Yahoo-eng-team] [Bug 1938323] [NEW] [Queens] tokens generated with nocatalog are not usable in some requests
Public bug reported:

NOTE - this is happening only on Queens, and possibly earlier, and is
already silently fixed in Rocky as part of the major refactor of the token
model. I am posting this issue here so that anyone still running Queens
has a reference and probably a patch to apply.

In the Queens release, if I create a token using the nocatalog option:

curl -X POST /v3/auth/tokens?nocatalog

and then use this token to e.g. list servers with details:

curl -X GET /v2.1/servers/detail

I get a 500 error from nova, with the nova api logs containing

ERROR nova.api.openstack EmptyCatalog: The service catalog is empty.

When repeating the same request with the same token after 5-10 minutes,
the token starts to work. Tokens generated with a catalog work as well.

AFAIU this comes down to token caching - in Queens tokens are
cached/memoized with the catalog, or without it if the token was requested
without a catalog. On token validation, the validation response then takes
the token - with or without catalog - from the cache and returns it to the
caller with minimal processing, e.g. removing the catalog if the
validation call asked for that. It does not, however, ensure that the
catalog is present otherwise.

This breaks some other services like Nova, which expect the catalog to be
present in the request context constructed from the keystonemiddleware
results. Nova needs this, for example, to make API requests to other
services - exactly what happens in the server/details call, where it has
to ask Neutron for some network info about the instances.

After the cache is invalidated, the catalog starts to be generated for the
token validation response anew, and everything works as expected.

** Affects: keystone
   Importance: Undecided
   Assignee: Pavlo Shchelokovskyy (pshchelo)
   Status: In Progress

** Changed in: keystone
   Status: New => In Progress

** Changed in: keystone
   Assignee: (unassigned) => Pavlo Shchelokovskyy (pshchelo)

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1938323

Title:
  [Queens] tokens generated with nocatalog are not usable in some requests

Status in OpenStack Identity (keystone): In Progress

Bug description:
  NOTE - this is happening only on Queens, and possibly earlier, and is
  already silently fixed in Rocky as part of the major refactor of the
  token model. I am posting this issue here so that anyone still running
  Queens has a reference and probably a patch to apply.

  In the Queens release, if I create a token using the nocatalog option:

  curl -X POST /v3/auth/tokens?nocatalog

  and then use this token to e.g. list servers with details:

  curl -X GET /v2.1/servers/detail

  I get a 500 error from nova, with the nova api logs containing

  ERROR nova.api.openstack EmptyCatalog: The service catalog is empty.

  When repeating the same request with the same token after 5-10 minutes,
  the token starts to work. Tokens generated with a catalog work as well.

  AFAIU this comes down to token caching - in Queens tokens are
  cached/memoized with the catalog, or without it if the token was
  requested without a catalog. On token validation, the validation
  response then takes the token - with or without catalog - from the cache
  and returns it to the caller with minimal processing, e.g. removing the
  catalog if the validation call asked for that. It does not, however,
  ensure that the catalog is present otherwise.

  This breaks some other services like Nova, which expect the catalog to
  be present in the request context constructed from the
  keystonemiddleware results. Nova needs this, for example, to make API
  requests to other services - exactly what happens in the server/details
  call, where it has to ask Neutron for some network info about the
  instances.

  After the cache is invalidated, the catalog starts to be generated for
  the token validation response anew, and everything works as expected.

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1938323/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
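To make the caching pitfall concrete, here is a minimal standalone sketch (illustrative names, not keystone's actual internals): validation must re-attach the catalog when the cached token body was built without one:

    _token_cache = {}

    def fetch_catalog(user):
        return [{"type": "compute", "endpoints": ["http://nova.example:8774"]}]

    def issue_token(token_id, user, include_catalog=True):
        body = {"user": user}
        if include_catalog:
            body["catalog"] = fetch_catalog(user)
        _token_cache[token_id] = body
        return body

    def validate_token(token_id, nocatalog=False):
        body = dict(_token_cache[token_id])
        if nocatalog:
            body.pop("catalog", None)
        elif "catalog" not in body:
            # The buggy Queens path returned the cached catalog-less body
            # here; re-fetching restores what services like nova expect.
            body["catalog"] = fetch_catalog(body["user"])
        return body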
[Yahoo-eng-team] [Bug 1931440] Re: Unable to use multiattach volume as boot for new server
Reviewed:  https://review.opendev.org/c/openstack/horizon/+/798730
Committed: https://opendev.org/openstack/horizon/commit/64fe0abb653950c85d455dedd09ef42856c6b07b
Submitter: "Zuul (22348)"
Branch:    master

commit 64fe0abb653950c85d455dedd09ef42856c6b07b
Author: manchandavishal
Date:   Tue Jun 29 23:38:41 2021 +0530

    Fix Unable to use multiattach volume as boot for new server

    If we try to create a new server from a bootable volume that supports
    multiattach, it will fail with an error message that ``multiattach
    volumes are only supported starting with compute API version 2.60``.
    This patch fixes the issue.

    Closes-Bug: #1931440
    Change-Id: Ic8330b947b1a733f70c3bdad8b3493f20a2f26fb

** Changed in: horizon
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1931440

Title:
  Unable to use multiattach volume as boot for new server

Status in OpenStack Dashboard (Horizon): Fix Released

Bug description:
  I'm trying to create a new server using the dashboard, using an existing
  volume for boot. That volume was created previously (from an image) and
  has a type with the multiattach property enabled. The operation fails
  with the dashboard showing an error message balloon with:

  Multiattach volumes are only supported starting with compute API
  version 2.60 (HTTP 400 req: xxx)

  If I execute the action on the command line, what I get is:

  $ openstack server create --volume test --flavor g1t1.small --network ubuntu-net --wait testserver
  Multiattach volumes are only supported starting with compute API version 2.60. (HTTP 400) (Request-ID: req-abe1edc4-527e-469a-8936-fda61e9e8395)

  But if I repeat the operation with the API version, it works:

  $ openstack --os-compute-api-version 2.60 server create --volume test --flavor g1t1.small --network ubuntu-net --wait testserver

  So it works; the problem is that the dashboard is not sending the API
  version parameter when trying to create the server. Inspecting the query
  arguments in the browser, I cannot see the "X-OpenStack-Nova-API-Version:
  2.60" header that I would expect to be there.

  This is very similar to another bug report -- Bug #1751564 -- but that
  one says "Fix Released" at dashboard version 14.0+, so this must not be
  exactly the same issue, or this is a regression.

  I'm using OpenStack Ussuri on Ubuntu Focal. The dashboard package
  version is openstack-dashboard 3:18.3.3-0ubuntu1.

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1931440/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
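On the API-client side, the fix boils down to requesting compute microversion 2.60 or later; a sketch with python-novaclient (the auth details and volume ID are illustrative):

    from keystoneauth1 import session
    from keystoneauth1.identity import v3
    from novaclient import client

    auth = v3.Password(auth_url="http://keystone.example:5000/v3",
                       username="demo", password="secret", project_name="demo",
                       user_domain_id="default", project_domain_id="default")
    nova = client.Client("2.60", session=session.Session(auth=auth))

    # Boot from the multiattach volume; with a microversion < 2.60 this
    # request is rejected exactly as in the report.
    nova.servers.create(
        name="testserver",
        image=None,
        flavor=nova.flavors.find(name="g1t1.small"),
        nics="auto",
        block_device_mapping_v2=[{
            "boot_index": 0,
            "uuid": "MULTIATTACH_VOLUME_UUID",
            "source_type": "volume",
            "destination_type": "volume",
        }])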
[Yahoo-eng-team] [Bug 1885169] Re: Some arping version only accept integer number as -w argument
This bug was fixed in the package neutron - 2:16.4.0-0ubuntu2

---
neutron (2:16.4.0-0ubuntu2) focal; urgency=medium

  * d/p/provide-integer-argument-to-arping.patch: Cherry-pick upstream
    patch to ensure gratuitous ARPs are correctly sent (LP: #1885169).

neutron (2:16.4.0-0ubuntu1) focal; urgency=medium

  * New stable point release for OpenStack Ussuri (LP: #1935030).
  * Remove patches that have landed upstream in this point release:
    - d/p/updates-for-python3.8.patch
    - d/p/0001-Update-arp-entry-of-snat-port-on-qrouter-ns.patch

 -- Chris MacNaughton  Fri, 16 Jul 2021 14:25:28 +0000

** Changed in: neutron (Ubuntu Focal)
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1885169

Title:
  Some arping version only accept integer number as -w argument

Status in devstack: Fix Released
Status in neutron: Fix Released
Status in neutron package in Ubuntu: Fix Released
Status in neutron source package in Focal: Fix Released

Bug description:
  For example, Bionic's arping v2.19-4 accepts "4.5" as a -w argument, but
  Focal's v2.20-1 does not.

  LOG:
  stack@u20:/opt/stack$ arping -A -c 3 -w 4.5 -I br-ex 192.168.20.70
  arping: invalid argument: '4.5'

To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack/+bug/1885169/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
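A minimal sketch of the patch's idea (not the exact upstream code): round the timeout up to a whole number of seconds before handing it to arping, since Focal's arping rejects fractional -w values:

    import math
    import subprocess

    def send_gratuitous_arp(iface, address, timeout=4.5):
        wait = int(math.ceil(timeout))  # 4.5 -> 5, never shortens the wait
        subprocess.run(
            ["arping", "-A", "-c", "3", "-w", str(wait), "-I", iface, address],
            check=True)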
[Yahoo-eng-team] [Bug 1938284] [NEW] Missing Diffie-Hellman-Groups
Public bug reported:

The values for pfs (perfect forward secrecy) when creating an IKE or IPsec
policy are limited to the Diffie-Hellman groups 2, 5 and 14. Strongswan,
as the default provider, supports more than these 3 groups, e.g.
group20 (ecp384).

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1938284

Title:
  Missing Diffie-Hellman-Groups

Status in neutron: New

Bug description:
  The values for pfs (perfect forward secrecy) when creating an IKE or
  IPsec policy are limited to the Diffie-Hellman groups 2, 5 and 14.
  Strongswan, as the default provider, supports more than these 3 groups,
  e.g. group20 (ecp384).

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1938284/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
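For illustration, the limitation amounts to a fixed whitelist; a sketch of how such a validator could be widened (the group names use the conventional IKE labels, and the actual neutron-vpnaas constant may differ):

    SUPPORTED_PFS = ["group2", "group5", "group14"]               # today
    EXTENDED_PFS = SUPPORTED_PFS + ["group19", "group20", "group21"]

    def validate_pfs(value, allowed=EXTENDED_PFS):
        if value not in allowed:
            raise ValueError(f"pfs must be one of {allowed}, got {value!r}")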
[Yahoo-eng-team] [Bug 1860312] Re: compute service failed to delete
Actually the operator would be deleting the compute service after removing
the compute nodes. You should remove the compute service first, but we
should fix this regardless. You should be able to recreate this bug by
just creating a compute service and then deleting it.

** Changed in: nova
   Status: Expired => Triaged

** Changed in: nova
   Importance: Undecided => Medium

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1860312

Title:
  compute service failed to delete

Status in OpenStack Compute (nova): Triaged

Bug description:
  Description
  ===========
  I deployed openstack with openstack-helm on kubernetes. When one of the
  nova-compute services (driver=ironic, replica count of the deployment is
  1) breaks down, it may be rescheduled to another node by kubernetes.
  When I try to delete the old compute service (status down), it fails.

  Steps to reproduce
  ==================
  Firstly, openstack was deployed in a kubernetes cluster, and the replica
  count of nova-compute-ironic is 1.
  * I deleted the pod nova-compute-ironic-x
  * then waited for the new pod to start
  * then ran `openstack compute service list`; there were two compute
    services for ironic, and the status of the old one was down
  * then I tried to delete the old compute service

  Expected result
  ===============
  The old compute service could be deleted successfully.

  Actual result
  =============
  The delete failed and returned an HTTP 500.

  Environment
  ===========
  1. Exact version of OpenStack you are running: 18.2.2, rocky
  2. Which hypervisor did you use? Libvirt + KVM
  2. Which storage type did you use? ceph
  3. Which networking type did you use? Neutron with OpenVSwitch

  Logs & Configs
  ==============
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi [req-922cc601-9aa1-4c3d-ad9c-71f73a341c28 40e7b8c3d59943e08a52acd24fe30652 d13f1690c08d41ac854d720ea510a710 - default default] Unexpected exception in API method: ComputeHostNotFound: Compute host mgt-slave03 could not be found.
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi Traceback (most recent call last):
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi   File "/var/lib/openstack/local/lib/python2.7/site-packages/nova/api/openstack/wsgi.py", line 801, in wrapped
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi     return f(*args, **kwargs)
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi   File "/var/lib/openstack/local/lib/python2.7/site-packages/nova/api/openstack/compute/services.py", line 252, in delete
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi     context, service.host)
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi   File "/var/lib/openstack/local/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 184, in wrapper
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi     result = fn(cls, context, *args, **kwargs)
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi   File "/var/lib/openstack/local/lib/python2.7/site-packages/nova/objects/compute_node.py", line 443, in get_all_by_host
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi     use_slave=use_slave)
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi   File "/var/lib/openstack/local/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 213, in wrapper
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi     return f(*args, **kwargs)
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi   File "/var/lib/openstack/local/lib/python2.7/site-packages/nova/objects/compute_node.py", line 438, in _db_compute_node_get_all_by_host
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi     return db.compute_node_get_all_by_host(context, host)
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi   File "/var/lib/openstack/local/lib/python2.7/site-packages/nova/db/api.py", line 291, in compute_node_get_all_by_host
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi     return IMPL.compute_node_get_all_by_host(context, host)
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi   File "/var/lib/openstack/local/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 258, in wrapped
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi     return f(context, *args, **kwargs)
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi   File "/var/lib/openstack/local/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 659, in compute_node_get_all_by_host
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi     raise exception.ComputeHostNotFound(host=host)
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi ComputeHostNotFound: Compute host mgt-slave03 could not be found.
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi
  2020-01-20 06:44:53.480 1
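A hypothetical hardening sketch (not the actual nova fix): treat missing compute-node rows as "nothing left to clean up" when deleting a service, instead of letting ComputeHostNotFound surface as an HTTP 500:

    from nova import exception
    from nova import objects

    def compute_nodes_for_service(context, host):
        try:
            return objects.ComputeNodeList.get_all_by_host(context, host)
        except exception.ComputeHostNotFound:
            # The (ironic) compute nodes are already gone; the service
            # record itself can still be deleted safely.
            return []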
[Yahoo-eng-team] [Bug 1938265] [NEW] nova-snapshot-fail-multi-store-rbd
Public bug reported:

As of now, with multi-store enabled, adding a new location to an image
will cause the add_location API call to fail if the store metadata is
missing.

Code in glance:
https://github.com/openstack/glance/blob/master/glance/location.py#L134
Then in glance_store:
https://github.com/openstack/glance_store/blob/master/glance_store/location.py#L111

This raises a "KeyError: None" and then a very generic "Invalid Location"
400 error when adding a new location.

The point is, with an rbd backend, nova never specifies this metadata when
creating the image during the direct snapshot process (flatten the image
directly in the ceph image pool + add the location directly in glance), so
the snapshot will always fail.

A solution could be to infer the backend from the location URI, like we do
during the store metadata lazy population.

** Affects: glance
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1938265

Title:
  nova-snapshot-fail-multi-store-rbd

Status in Glance: New

Bug description:
  As of now, with multi-store enabled, adding a new location to an image
  will cause the add_location API call to fail if the store metadata is
  missing.

  Code in glance:
  https://github.com/openstack/glance/blob/master/glance/location.py#L134
  Then in glance_store:
  https://github.com/openstack/glance_store/blob/master/glance_store/location.py#L111

  This raises a "KeyError: None" and then a very generic "Invalid
  Location" 400 error when adding a new location.

  The point is, with an rbd backend, nova never specifies this metadata
  when creating the image during the direct snapshot process (flatten the
  image directly in the ceph image pool + add the location directly in
  glance), so the snapshot will always fail.

  A solution could be to infer the backend from the location URI, like we
  do during the store metadata lazy population.

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1938265/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
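A sketch of the suggested direction - inferring the store from the location URI when the 'store' key is absent, similar to the lazy-population path mentioned above (the prefix map and store names are illustrative):

    URI_PREFIX_TO_STORE = {
        "rbd://": "ceph",
        "file://": "local",
        "cinder://": "cinder",
    }

    def infer_store(location_uri, metadata):
        if metadata.get("store"):
            return metadata["store"]
        for prefix, store in URI_PREFIX_TO_STORE.items():
            if location_uri.startswith(prefix):
                return store
        raise ValueError(f"cannot infer store for {location_uri!r}")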
[Yahoo-eng-team] [Bug 1938262] [NEW] [stable/ussuri] Functional jobs timeout, many tests with fixtures._fixtures.timeout.TimeoutException
Public bug reported:

This started recently (the last backport successfully merged was on July
20th): functional tests now fail 100% of the time on recent backports with
TIMED_OUT.

For example:
https://review.opendev.org/c/openstack/neutron/+/801882
https://review.opendev.org/c/openstack/neutron/+/802528

I confirmed it with a dummy change:
https://review.opendev.org/c/openstack/neutron/+/802552

Many tests fail with:

2021-07-27 17:34:20.939074 | controller |   File "/usr/lib/python3.6/threading.py", line 295, in wait
2021-07-27 17:34:20.939093 | controller |     waiter.acquire()
2021-07-27 17:34:20.939107 | controller |
2021-07-27 17:34:20.939120 | controller |   File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/semaphore.py", line 115, in acquire
2021-07-27 17:34:20.939134 | controller |     hubs.get_hub().switch()
2021-07-27 17:34:20.939147 | controller |
2021-07-27 17:34:20.939161 | controller |   File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
2021-07-27 17:34:20.939177 | controller |     return self.greenlet.switch()
2021-07-27 17:34:20.939191 | controller |
2021-07-27 17:34:20.939205 | controller |   File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 350, in run
2021-07-27 17:34:20.939219 | controller |     self.wait(sleep_time)
2021-07-27 17:34:20.939233 | controller |
2021-07-27 17:34:20.939246 | controller |   File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/hubs/poll.py", line 80, in wait
2021-07-27 17:34:20.939260 | controller |     presult = self.do_poll(seconds)
2021-07-27 17:34:20.939316 | controller |
2021-07-27 17:34:20.939338 | controller |   File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/hubs/epolls.py", line 31, in do_poll
2021-07-27 17:34:20.939352 | controller |     return self.poll.poll(seconds)
2021-07-27 17:34:20.939366 | controller |
2021-07-27 17:34:20.939380 | controller |   File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/fixtures/_fixtures/timeout.py", line 52, in signal_handler
2021-07-27 17:34:20.939394 | controller |     raise TimeoutException()
2021-07-27 17:34:20.939407 | controller |
2021-07-27 17:34:20.939421 | controller | fixtures._fixtures.timeout.TimeoutException

The start of the backtrace depends on the test (namespace creation,
ip_lib, ...), so it looks like a generic issue in a related package.

Note this is specific to stable/ussuri: the first backport mentioned
passed in newer branches and in stable/train without issue. Functional
tests are passing in train with the same OS and the same Python version
(3.6).

Nothing suspicious is logged in the functional test output itself.

** Affects: neutron
   Importance: Undecided
   Status: New

** Tags: gate-failure

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1938262

Title:
  [stable/ussuri] Functional jobs timeout, many tests with
  fixtures._fixtures.timeout.TimeoutException

Status in neutron: New

Bug description:
  This started recently (the last backport successfully merged was on July
  20th): functional tests now fail 100% of the time on recent backports
  with TIMED_OUT.

  For example:
  https://review.opendev.org/c/openstack/neutron/+/801882
  https://review.opendev.org/c/openstack/neutron/+/802528

  I confirmed it with a dummy change:
  https://review.opendev.org/c/openstack/neutron/+/802552

  Many tests fail with:

  2021-07-27 17:34:20.939074 | controller |   File "/usr/lib/python3.6/threading.py", line 295, in wait
  2021-07-27 17:34:20.939093 | controller |     waiter.acquire()
  2021-07-27 17:34:20.939107 | controller |
  2021-07-27 17:34:20.939120 | controller |   File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/semaphore.py", line 115, in acquire
  2021-07-27 17:34:20.939134 | controller |     hubs.get_hub().switch()
  2021-07-27 17:34:20.939147 | controller |
  2021-07-27 17:34:20.939161 | controller |   File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
  2021-07-27 17:34:20.939177 | controller |     return self.greenlet.switch()
  2021-07-27 17:34:20.939191 | controller |
  2021-07-27 17:34:20.939205 | controller |   File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 350, in run
  2021-07-27 17:34:20.939219 | controller |     self.wait(sleep_time)
  2021-07-27 17:34:20.939233 | controller |
  2021-07-27 17:34:20.939246 | controller |   File
[Yahoo-eng-team] [Bug 1938261] [NEW] [ovn]Router scheduler failing for config "default_availability_zones"
Public bug reported:

I have 3 gateway chassis, and the only availability zone present is nova,
which has 1 chassis. default_availability_zones=zone1 is configured in
neutron.conf.

I create a router without setting availability_zone_hints. The router is
created successfully with availability_zones=zone1, yet through the
ovn-nbctl command I can see that the router's gateway_chassis includes all
chassis (4 nodes).

I think this case should fail, indicating that the availability zone does
not exist.

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1938261

Title:
  [ovn]Router scheduler failing for config "default_availability_zones"

Status in neutron: New

Bug description:
  I have 3 gateway chassis, and the only availability zone present is
  nova, which has 1 chassis. default_availability_zones=zone1 is
  configured in neutron.conf.

  I create a router without setting availability_zone_hints. The router is
  created successfully with availability_zones=zone1, yet through the
  ovn-nbctl command I can see that the router's gateway_chassis includes
  all chassis (4 nodes).

  I think this case should fail, indicating that the availability zone
  does not exist.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1938261/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
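A minimal sketch of the validation the reporter expects (an illustrative helper, not neutron's actual plugin hook): fail fast when a requested or default availability zone matches no gateway chassis:

    def validate_availability_zones(requested_azs, available_azs):
        unknown = set(requested_azs) - set(available_azs)
        if unknown:
            raise ValueError(
                f"availability zone(s) {sorted(unknown)} do not exist; "
                f"known zones: {sorted(available_azs)}")

    # e.g. default_availability_zones = ['zone1'] while only 'nova' exists:
    validate_availability_zones(["zone1"], ["nova"])  # raises ValueError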