[Yahoo-eng-team] [Bug 1927868] Re: vRouter not working after update to 16.3.1

2021-07-28 Thread Launchpad Bug Tracker
This bug was fixed in the package neutron -
2:18.1.0+git2021072117.147830620f-0ubuntu2

---
neutron (2:18.1.0+git2021072117.147830620f-0ubuntu2) impish; urgency=medium

  * d/p/revert-l3-ha-retry-when-setting-ha-router-gw-status.patch: Revert
upstream patch that introduced regression that prevented full restore
of HA routers on restart of L3 agent (LP: #1927868).

 -- Corey Bryant   Wed, 28 Jul 2021 16:40:07
-0400

** Changed in: neutron (Ubuntu Impish)
   Status: Triaged => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1927868

Title:
  vRouter not working after update to 16.3.1

Status in Ubuntu Cloud Archive:
  Fix Committed
Status in Ubuntu Cloud Archive train series:
  Fix Committed
Status in Ubuntu Cloud Archive ussuri series:
  Triaged
Status in Ubuntu Cloud Archive victoria series:
  Fix Committed
Status in Ubuntu Cloud Archive wallaby series:
  Triaged
Status in Ubuntu Cloud Archive xena series:
  Fix Committed
Status in neutron:
  New
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Focal:
  Fix Committed
Status in neutron source package in Hirsute:
  Fix Committed
Status in neutron source package in Impish:
  Fix Released

Bug description:
  We run a juju managed Openstack Ussuri on Bionic. After updating
  neutron packages from 16.3.0 to 16.3.1 all virtual routers stopped
  working. It seems that most (not all) namespaces are created but have
  only the lo interface and sometimes the ha-XYZ interface in DOWN state.
  The underlying tap interfaces are also down.

  neutron-l3-agent has many logs similar to the following:
  2021-05-08 15:01:45.286 39411 ERROR neutron.agent.l3.ha_router [-] Gateway interface for router 02945b59-639b-41be-8237-3b7933b4e32d was not set up; router will not work properly

  and journal logs report at around the same time
  May 08 15:01:40 lar1615.srv-louros.grnet.gr neutron-keepalived-state-change[18596]: 2021-05-08 15:01:40.765 18596 INFO neutron.agent.linux.ip_lib [-] Failed sending gratuitous ARP to 62.62.62.62 on qg-5a6efe8c-6b in namespace qrouter-02945b59-639b-41be-8237-3b7933b4e32d: Exit code: 2; Stdin: ; Stdout: Interface "qg-5a6efe8c-6b" is down
  May 08 15:01:40 lar1615.srv-louros.grnet.gr neutron-keepalived-state-change[18596]: 2021-05-08 15:01:40.767 18596 INFO neutron.agent.linux.ip_lib [-] Interface qg-5a6efe8c-6b or address 62.62.62.62 in namespace qrouter-02945b59-639b-41be-8237-3b7933b4e32d was deleted concurrently

  The neutron packages installed are:

  ii  neutron-common             2:16.3.1-0ubuntu1~cloud0  all  Neutron is a virtual network service for Openstack - common
  ii  neutron-dhcp-agent         2:16.3.1-0ubuntu1~cloud0  all  Neutron is a virtual network service for Openstack - DHCP agent
  ii  neutron-l3-agent           2:16.3.1-0ubuntu1~cloud0  all  Neutron is a virtual network service for Openstack - l3 agent
  ii  neutron-metadata-agent     2:16.3.1-0ubuntu1~cloud0  all  Neutron is a virtual network service for Openstack - metadata agent
  ii  neutron-metering-agent     2:16.3.1-0ubuntu1~cloud0  all  Neutron is a virtual network service for Openstack - metering agent
  ii  neutron-openvswitch-agent  2:16.3.1-0ubuntu1~cloud0  all  Neutron is a virtual network service for Openstack - Open vSwitch plugin agent
  ii  python3-neutron            2:16.3.1-0ubuntu1~cloud0  all  Neutron is a virtual network service for Openstack - Python library
  ii  python3-neutron-lib        2.3.0-0ubuntu1~cloud0     all  Neutron shared routines and utilities - Python 3.x
  ii  python3-neutronclient      1:7.1.1-0ubuntu1~cloud0   all  client API library for Neutron - Python 3.x

  Downgrading to 16.3.0 resolves the issues.

  =

  Ubuntu SRU details:

  [Impact]
  See above.

  [Test Case]
  Deploy openstack with l3ha and create several HA routers; the number required
  varies per environment. It is probably best to deploy a known bad version of
  the package, ensure it is failing, upgrade to the version in proposed, and
  re-test several times to confirm it is fixed.

  Restarting neutron-l3-agent should result in all HA routers being restored.

  [Regression Potential]
  This change is fixing a regression by reverting a patch that was introduced 
in a stable point release of neutron.

To manage notifications about this bug go to:

[Yahoo-eng-team] [Bug 1896734] Re: A privsep daemon spawned by neutron-openvswitch-agent hangs when debug logging is enabled (large number of registered NICs) - an RPC response is too large for msgpack

2021-07-28 Thread Brian Murray
The Groovy Gorilla has reached end of life, so this bug will not be
fixed for that release

** Changed in: python-oslo.privsep (Ubuntu Groovy)
   Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1896734

Title:
  A privsep daemon spawned by neutron-openvswitch-agent hangs when debug
  logging is enabled (large number of registered NICs) - an RPC response
  is too large for msgpack

Status in OpenStack neutron-openvswitch charm:
  Invalid
Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in neutron:
  Fix Released
Status in oslo.privsep:
  New
Status in neutron package in Ubuntu:
  Fix Released
Status in python-oslo.privsep package in Ubuntu:
  New
Status in neutron source package in Focal:
  Fix Released
Status in python-oslo.privsep source package in Focal:
  New
Status in neutron source package in Groovy:
  Fix Released
Status in python-oslo.privsep source package in Groovy:
  Won't Fix
Status in neutron source package in Hirsute:
  Fix Released
Status in python-oslo.privsep source package in Hirsute:
  New

Bug description:
  [Impact]

  When there is a large number of netdevs registered in the kernel and
  debug logging is enabled, neutron-openvswitch-agent and the privsep
  daemon spawned by it hang, since the RPC call result sent by the
  privsep daemon over a unix socket exceeds the message sizes that the
  msgpack library can handle.

  The impact of this is that enabling debug logging on the cloud
  completely stalls neutron-openvswitch-agents and makes them "dead"
  from the Neutron server perspective.

  The issue is summarized in detail in comment #5
  https://bugs.launchpad.net/oslo.privsep/+bug/1896734/comments/5
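
  As a rough, hedged illustration of why the response gets so large: the
  privsep daemon returns data for every registered netdev in one RPC reply, so
  the serialized payload grows roughly linearly with the number of devices.
  The sketch below assumes the msgpack Python library is installed; the
  per-device fields are made up for the example and are not the actual
  oslo.privsep wire format.

    # Hedged illustration only - not the actual oslo.privsep wire format.
    import msgpack

    def fake_ip_addr_entry(i):
        # Roughly the kind of per-device data ip_lib returns for each interface.
        return {
            "index": i,
            "name": "tap-%011d" % i,
            "flags": ["BROADCAST", "MULTICAST", "UP", "LOWER_UP"],
            "addresses": [{"cidr": "10.0.%d.%d/24" % (i // 250, i % 250)}],
        }

    for count in (40, 400, 4000):
        payload = msgpack.packb([fake_ip_addr_entry(i) for i in range(count)])
        print(count, "netdevs ->", len(payload), "bytes serialized")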

  [Test Plan]

    * deploy Openstack Train/Ussuri/Victoria
    * need at least one compute host
    * enable neutron debug logging
    * create a load of interfaces on your compute host to create a large 'ip addr show' output
    * for ((i=0;i<400;i++)); do ip tuntap add mode tap tap-`uuidgen| cut -c1-11`; done
    * create a single vm
    * add floating ip
    * ping fip
    * create 20 ports and attach them to the vm
    * for ((i=0;i<20;i++)); do id=`uuidgen`; openstack port create --network private --security-group __SG__ X-$id; openstack server add port __VM__ X-$id; done
    * attaching ports should not result in errors

  [Where problems could occur]

  No problems are anticipated with this patchset.

  

  When there is a large number of netdevs registered in the kernel and
  debug logging is enabled, neutron-openvswitch-agent and the privsep
  daemon spawned by it hang, since the RPC call result sent by the
  privsep daemon over a unix socket exceeds the message sizes that the
  msgpack library can handle.

  The impact of this is that enabling debug logging on the cloud
  completely stalls neutron-openvswitch-agents and makes them "dead"
  from the Neutron server perspective.

  The issue is summarized in detail in comment #5
  https://bugs.launchpad.net/oslo.privsep/+bug/1896734/comments/5

  
  Old Description

  While trying to debug a different issue, I encountered a situation
  where privsep hangs in the process of handling a request from neutron-
  openvswitch-agent when debug logging is enabled (juju debug-log
  neutron-openvswitch=true):

  https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1895652/comments/11
  https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1895652/comments/12

  The issue gets reproduced reliably in the environment where I
  encountered it on all units. As a result, neutron-openvswitch-agent
  services hang while waiting for a response from the privsep daemon and
  do not progress past basic initialization. They never post any state
  back to the Neutron server and thus are marked dead by it.

  The processes, though, are shown as "active (running)" by systemd, which
  adds to the confusion since they do indeed start from systemd's
  perspective.

  systemctl --no-pager status neutron-openvswitch-agent.service
  ● neutron-openvswitch-agent.service - Openstack Neutron Open vSwitch Plugin Agent
     Loaded: loaded (/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2020-09-23 08:28:41 UTC; 25min ago
   Main PID: 247772 (/usr/bin/python)
      Tasks: 4 (limit: 9830)
     CGroup: /system.slice/neutron-openvswitch-agent.service
             ├─247772 /usr/bin/python3 /usr/bin/neutron-openvswitch-agent --config-file=/etc/neutron/neutron.conf --config-file=/etc/neutron/plugins/ml2/openvswitch_…og
             └─248272 /usr/bin/python3 /usr/bin/privsep-helper 

[Yahoo-eng-team] [Bug 1906266] Re: After upgrade: "libvirt.libvirtError: Requested operation is not valid: format of backing image %s of image %s was not specified"

2021-07-28 Thread Brian Murray
The Groovy Gorilla has reached end of life, so this bug will not be
fixed for that release

** Changed in: nova (Ubuntu Groovy)
   Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1906266

Title:
  After upgrade: "libvirt.libvirtError: Requested operation is not
  valid: format of backing image %s of image %s was not specified"

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive ussuri series:
  Fix Committed
Status in OpenStack Compute (nova):
  Won't Fix
Status in libvirt package in Ubuntu:
  Fix Released
Status in nova package in Ubuntu:
  New
Status in libvirt source package in Focal:
  Fix Released
Status in nova source package in Focal:
  New
Status in libvirt source package in Groovy:
  Fix Released
Status in nova source package in Groovy:
  Won't Fix

Bug description:
  [Impact]

   * New libvirt became stricter about file format specification.
     While this is generally the right approach, it causes some issues for
     upgraders whose old image chains now fail.

   * Upstream has added code to relax those checks under a set of conditions,
     which allows going forward with the stricter checks as planned while
     not breaking/blocking upgrades.

  [Test Plan]

   * Thanks to Brett Milford for sharing his test steps for this
   
  sudo apt-get update
  sudo apt-get install libvirt-daemon-system cloud-image-utils virtinst -y

  IMG="focal-server-cloudimg-amd64.img"
  IMG_PATH="/var/lib/libvirt/images/base/$IMG"
  INSTANCE_NAME=testinst
  [ -f $IMG_PATH ] || {
  sudo mkdir -p /var/lib/libvirt/images/base
  sudo wget https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img \
    -O $IMG_PATH
  }
  sudo mkdir -p /var/lib/libvirt/images/$INSTANCE_NAME
  sudo qemu-img convert -O raw $IMG_PATH ${IMG_PATH%.*}
  sudo qemu-img create -f qcow2 -o backing_file=${IMG_PATH%.*} /var/lib/libvirt/images/$INSTANCE_NAME/root.img
  sudo qemu-img resize /var/lib/libvirt/images/$INSTANCE_NAME/root.img 5G

  virt-install --connect qemu:///system --name $INSTANCE_NAME --cpu host --os-type linux --os-variant generic --graphics vnc --console pty,target_type=serial --disk path=/var/lib/libvirt/images/$INSTANCE_NAME/root.img,bus=virtio,format=qcow2 --network default,model=virtio --noautoconsole --vcpus 1 --memory 1024 --import


  [Where problems could occur]

   * Of the many things that qemu/libvirt do, this changes only the format
     probing. So issues (hopefully none) would be expected to appear mostly
     around complex scenarios of image files.
     We've had a look at image files and image file chains, and so far all
     were good. But there are more obscure (and not supported) cases, like an
     image backed by a real disk, that might misbehave. It would also fix
     Focal being the outlier: the past was ok (didn't care), the future has
     the relaxed check, and only Focal was left broken in between.

  [Other Info]

   * A lot has changed in that area, but instead of pulling in a vast set
     of changes a smaller set was identified to suit the SRU needs. It has
     so far not been found to regress anything and, on the other hand, fixed
     the issue (tested from a PPA) for affected people.

  

  In a site upgraded to Ussuri we are getting faults starting instances

  2020-11-30 13:41:40.586 232871 ERROR oslo_messaging.rpc.server
  libvirt.libvirtError: Requested operation is not valid: format of
  backing image '/var/lib/nova/instances/_base/xxx' of image
  '/var/lib/nova/instances/xxx' was not specified in the image metadata
  (See https://libvirt.org/kbase/backing_chains.html for
  troubleshooting)

  Bug #1864020 reports similar symptoms, where due to an upstream change
  in Libvirt v6.0.0+ images need the backing format specified.

  The fix for Bug #1864020 handles the case for new instances. However,
  for upgraded instances we're hitting the same problem, as those still
  don't have backing format specified.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1906266/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1927868] Re: vRouter not working after update to 16.3.1

2021-07-28 Thread Corey Bryant
** Also affects: neutron (Ubuntu Hirsute)
   Importance: Undecided
   Status: New

** Also affects: neutron (Ubuntu Impish)
   Importance: Critical
   Status: Triaged

** Also affects: neutron (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Changed in: neutron (Ubuntu Focal)
   Status: New => Triaged

** Changed in: neutron (Ubuntu Hirsute)
   Importance: Undecided => Critical

** Changed in: neutron (Ubuntu Focal)
   Importance: Undecided => Critical

** Changed in: neutron (Ubuntu Hirsute)
   Status: New => Triaged

** Also affects: cloud-archive
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/wallaby
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/victoria
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/train
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/ussuri
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/xena
   Importance: Undecided
   Status: New

** Changed in: cloud-archive/xena
   Importance: Undecided => Critical

** Changed in: cloud-archive/xena
   Status: New => Triaged

** Changed in: cloud-archive/wallaby
   Importance: Undecided => Critical

** Changed in: cloud-archive/wallaby
   Status: New => Triaged

** Changed in: cloud-archive/victoria
   Importance: Undecided => Critical

** Changed in: cloud-archive/victoria
   Status: New => Triaged

** Changed in: cloud-archive/ussuri
   Importance: Undecided => Critical

** Changed in: cloud-archive/ussuri
   Status: New => Triaged

** Changed in: cloud-archive/train
   Importance: Undecided => Critical

** Changed in: cloud-archive/train
   Status: New => Triaged

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1927868

Title:
  vRouter not working after update to 16.3.1

Status in Ubuntu Cloud Archive:
  Triaged
Status in Ubuntu Cloud Archive train series:
  Triaged
Status in Ubuntu Cloud Archive ussuri series:
  Triaged
Status in Ubuntu Cloud Archive victoria series:
  Triaged
Status in Ubuntu Cloud Archive wallaby series:
  Triaged
Status in Ubuntu Cloud Archive xena series:
  Triaged
Status in neutron:
  New
Status in neutron package in Ubuntu:
  Triaged
Status in neutron source package in Focal:
  Triaged
Status in neutron source package in Hirsute:
  Triaged
Status in neutron source package in Impish:
  Triaged

Bug description:
  We run a juju managed Openstack Ussuri on Bionic. After updating
  neutron packages from 16.3.0 to 16.3.1 all virtual routers stopped
  working. It seems that most (not all) namespaces are created but have
  only the lo interface and sometimes the ha-XYZ interface in DOWN state.
  The underlying tap interfaces are also down.

  neutron-l3-agent has many logs similar to the following:
  2021-05-08 15:01:45.286 39411 ERROR neutron.agent.l3.ha_router [-] Gateway interface for router 02945b59-639b-41be-8237-3b7933b4e32d was not set up; router will not work properly

  and journal logs report at around the same time
  May 08 15:01:40 lar1615.srv-louros.grnet.gr neutron-keepalived-state-change[18596]: 2021-05-08 15:01:40.765 18596 INFO neutron.agent.linux.ip_lib [-] Failed sending gratuitous ARP to 62.62.62.62 on qg-5a6efe8c-6b in namespace qrouter-02945b59-639b-41be-8237-3b7933b4e32d: Exit code: 2; Stdin: ; Stdout: Interface "qg-5a6efe8c-6b" is down
  May 08 15:01:40 lar1615.srv-louros.grnet.gr neutron-keepalived-state-change[18596]: 2021-05-08 15:01:40.767 18596 INFO neutron.agent.linux.ip_lib [-] Interface qg-5a6efe8c-6b or address 62.62.62.62 in namespace qrouter-02945b59-639b-41be-8237-3b7933b4e32d was deleted concurrently

  
  The neutron packages installed are:

  ii  neutron-common             2:16.3.1-0ubuntu1~cloud0  all  Neutron is a virtual network service for Openstack - common
  ii  neutron-dhcp-agent         2:16.3.1-0ubuntu1~cloud0  all  Neutron is a virtual network service for Openstack - DHCP agent
  ii  neutron-l3-agent           2:16.3.1-0ubuntu1~cloud0  all  Neutron is a virtual network service for Openstack - l3 agent
  ii  neutron-metadata-agent     2:16.3.1-0ubuntu1~cloud0  all  Neutron is a virtual network service for Openstack - metadata agent
  ii  neutron-metering-agent     2:16.3.1-0ubuntu1~cloud0  all  Neutron is a virtual network service for Openstack - metering agent
  ii  neutron-openvswitch-agent  2:16.3.1-0ubuntu1~cloud0  all  Neutron is a virtual network service for Openstack - Open vSwitch plugin agent
  ii  python3-neutron   

[Yahoo-eng-team] [Bug 1938344] [NEW] Setting snap-store-assertions in a model config causes cloud-init to fail

2021-07-28 Thread Simon Déziel
Public bug reported:

When the juju model config includes `snap-store-assertions`, the
cloud-init.service runs into problems when trying to contact snapd's
socket.

In this setup, cloud-ctrl01 runs a Juju controller in a LXD container
and cloud-vm02 is a VM created by MAAS. Here is how to reproduce the
issue:

ubuntu@cloud-ctrl01:~$ juju model-config
...
snap-store-assertions model|-
  type: account-key
  authority-id: canonical
  revision: 2
...
snap-store-proxy  modeldI5E5ZV6U3wOc919eLmZ0MtOxAyxxTIP
snap-store-proxy-url  default  ""
...

ubuntu@cloud-ctrl01:~$ juju add-machine
created machine 2

ubuntu@cloud-ctrl01:~$ juju ssh 2
...

Inside "machine 2", cloud-init's journal output:

ubuntu@cloud-vm02:~$ journalctl -u cloud-init.service | cat
-- Logs begin at Wed 2021-07-28 21:06:26 UTC, end at Wed 2021-07-28 21:17:01 UTC. --
Jul 28 21:06:31 ubuntu systemd[1]: Starting Initial cloud-init job (metadata service crawler)...
Jul 28 21:06:33 cloud-vm02 cloud-init[893]: Cloud-init v. 21.2-3-g899bfaa9-0ubuntu2~20.04.1 running 'init' at Wed, 28 Jul 2021 21:06:32 +. Up 10.23 seconds.
Jul 28 21:06:33 cloud-vm02 cloud-init[893]: ci-info: ++Net device info+++
Jul 28 21:06:33 cloud-vm02 cloud-init[893]: ci-info: ++--+-+---++---+
Jul 28 21:06:33 cloud-vm02 cloud-init[893]: ci-info: | Device |  Up  |   Address   |  Mask | Scope  | Hw-Address|
...
Jul 28 21:06:33 cloud-vm02 cloud-init[893]: ci-info: +---+---+-+---+---+
Jul 28 21:06:33 cloud-vm02 cloud-init[893]: error: cannot assert: cannot communicate with server: Post http://localhost/v2/assertions: dial unix /run/snapd.socket: connect: no such file or directory
Jul 28 21:06:33 cloud-vm02 cloud-init[893]: error: cannot communicate with server: Put http://localhost/v2/snaps/core/conf: dial unix /run/snapd.socket: connect: no such file or directory
Jul 28 21:06:33 cloud-vm02 cloud-init[893]: 2021-07-28 21:06:33,810 - util.py[WARNING]: Failed to run bootcmd module bootcmd
Jul 28 21:06:33 cloud-vm02 cloud-init[893]: 2021-07-28 21:06:33,822 - util.py[WARNING]: Running module bootcmd () failed
Jul 28 21:06:34 cloud-vm02 useradd[989]: new group: name=ubuntu, GID=1000
...
Jul 28 21:06:35 cloud-vm02 systemd[1]: cloud-init.service: Main process exited, code=exited, status=1/FAILURE
Jul 28 21:06:35 cloud-vm02 systemd[1]: cloud-init.service: Failed with result 'exit-code'.
Jul 28 21:06:35 cloud-vm02 systemd[1]: Failed to start Initial cloud-init job (metadata service crawler).

Removing the snap-store-assertions/snap-store-proxy configs from the
model makes cloud-init work again.

FYI, my naive attempt at adding "After=sockets.target" to the cloud-
init.service didn't work :/

Additional information:

ubuntu@cloud-vm02:~$ /var/lib/juju/tools/machine-2/jujud version
2.9.9-ubuntu-amd64

$ lsb_release -rd
Description:Ubuntu 20.04.2 LTS
Release:20.04

ubuntu@cloud-vm02:~$ apt-cache policy cloud-init
cloud-init:
  Installed: 21.2-3-g899bfaa9-0ubuntu2~20.04.1
  Candidate: 21.2-3-g899bfaa9-0ubuntu2~20.04.1
  Version table:
 *** 21.2-3-g899bfaa9-0ubuntu2~20.04.1 500
500 http://us.archive.ubuntu.com/ubuntu focal-updates/main amd64 
Packages
100 /var/lib/dpkg/status
 20.1-10-g71af48df-0ubuntu5 500
500 http://us.archive.ubuntu.com/ubuntu focal/main amd64 Packages

ubuntu@cloud-ctrl01:~$ juju version
2.9.9-ubuntu-amd64

** Affects: cloud-init
 Importance: Undecided
 Status: New

** Affects: juju
 Importance: Undecided
 Status: New

** Also affects: cloud-init
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1938344

Title:
  Setting snap-store-assertions in a model config causes cloud-init to
  fail

Status in cloud-init:
  New
Status in juju:
  New

Bug description:
  When the juju model config includes `snap-store-assertions`, the
  cloud-init.service runs into problems when trying to contact snapd's
  socket.

  In this setup, cloud-ctrl01 runs a Juju controller in a LXD container
  and cloud-vm02 is a VM created by MAAS. Here is how to reproduce the
  issue:

  ubuntu@cloud-ctrl01:~$ juju model-config
  ...
  snap-store-assertions model|-
type: account-key
authority-id: canonical
revision: 2
  ...
  snap-store-proxy  modeldI5E5ZV6U3wOc919eLmZ0MtOxAyxxTIP
  snap-store-proxy-url  default  ""
  ...

  ubuntu@cloud-ctrl01:~$ juju add-machine
  created machine 2

  ubuntu@cloud-ctrl01:~$ juju ssh 2
  ...

  Inside "machine 2", cloud-init's journal output:

  ubuntu@cloud-vm02:~$ journalctl -u cloud-init.service | cat
  -- Logs begin at Wed 

[Yahoo-eng-team] [Bug 1563069] Re: Centralize Configuration Options

2021-07-28 Thread Brian Haley
** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1563069

Title:
  Centralize Configuration Options

Status in neutron:
  Fix Released

Bug description:
  [Overview]
  Refactor Neutron configuration options to be in one place 'neutron/conf' 
similar to the Nova implementation found here: 
http://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/centralize-config-options.html

  This would allow for centralization of all configuration options and
  provide an easy way to import and gain access to the wide breadth of
  configuration options available to Neutron.

  [Proposal]

  1. Introduce a new package: neutron/conf

  Neutron Quotas Example:

  2. Group modules logically under new package:
2a. Example: options from neutron/quotas
2b. Move to neutron/conf/quotas/common.py
2c. Aggregate quota options in __init__.py
  4. Import neutron.conf.quotas for usage

  Neutron DB Example /w Agent Options:

  2. Group modules logically under new package:
2a. Example: options from neutron/db/agents_db.py
2b. Move to neutron/conf/db/agents.py
2c. Aggregate db options in __init__.py
  4. Import neutron.conf.db for usage

  Neutron DB Example /w Migration CLI:

  2. Group modules logically under new package:
2a. Example: options from neutron/db/migrations/cli.py
2b. Move to neutron/conf/db/migrations_cli.py
2c. Migrations CLI does not get aggregated in __init__.py
  4. Import neutron.conf.db.migrations_cli

  ** The neutron.opts list-options methods all get moved to neutron/conf as
  well, in their respective modules, and setup.cfg is modified for this
  adjustment.
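
  For illustration, a minimal sketch of what a centralized options module
  under neutron/conf could look like, following the usual oslo.config
  pattern; the option names, group name and helper names are made up for
  the example and are not the actual Neutron options:

    from oslo_config import cfg

    quota_opts = [
        cfg.IntOpt('quota_network',
                   default=100,
                   help='Number of networks allowed per tenant.'),
        cfg.IntOpt('quota_port',
                   default=500,
                   help='Number of ports allowed per tenant.'),
    ]


    def register_quota_opts(conf=cfg.CONF):
        # Called by the code that actually consumes the options.
        conf.register_opts(quota_opts, group='QUOTAS')


    def list_quota_opts():
        # Exposed to oslo-config-generator via an entry point in setup.cfg.
        return [('QUOTAS', quota_opts)]

  A consumer would then simply do "from neutron.conf import quotas" and call
  quotas.register_quota_opts() instead of defining the options locally.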

  [Benefits]

  - As a developer I will find all config options in one place and will add 
further config options to that central place.
  - End user is not affected by this change.

  [Related information]
  [1] Nova Implementation: 
http://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/centralize-config-options.html
  [2] Cross Project Spec: https://review.openstack.org/#/c/295543

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1563069/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1642770] Re: Security group code is doing unnecessary work removing chains

2021-07-28 Thread Brian Haley
** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1642770

Title:
  Security group code is doing unnecessary work removing chains

Status in neutron:
  Fix Released

Bug description:
  The security group code is generating a lot of these messages when
  trying to boot VMs:

  Attempted to remove chain sg-chain which does not exist

  There are also ones specific to the port.  It seems to be calling
  remove_chain(), even when it's a new port and it's initially setting
  up its filter.  I dropped a print_stack() in remove_chain() and see
  tracebacks like this:

  Prepare port filter for e8f41910-c24e-41f1-ae7f-355e9bb1d18a _apply_port_filter /opt/stack/neutron/neutron/agent/securitygroups_rpc.py:163
  Preparing device (e8f41910-c24e-41f1-ae7f-355e9bb1d18a) filter prepare_port_filter /opt/stack/neutron/neutron/agent/linux/iptables_firewall.py:170
  Attempted to remove chain sg-chain which does not exist remove_chain /opt/stack/neutron/neutron/agent/linux/iptables_manager.py:177
File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 
214, in main
  result = function(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/ryu/lib/hub.py", line 54, in 
_launch
  return func(*args, **kwargs)
File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_ryuapp.py",
 line 37, in agent_main_wrapper
  ovs_agent.main(bridge_classes)
File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
 line 2177, in main
  agent.daemon_loop()
File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 
154, in wrapper
  return f(*args, **kwargs)
File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
 line 2098, in daemon_loop
  self.rpc_loop(polling_manager=pm)
File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 
154, in wrapper
  return f(*args, **kwargs)
File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
 line 2049, in rpc_loop
  port_info, ovs_restarted)
File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 
154, in wrapper
  return f(*args, **kwargs)
File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
 line 1657, in process_network_ports
  port_info.get('updated', set()))
File "/opt/stack/neutron/neutron/agent/securitygroups_rpc.py", line 266, in 
setup_port_filters
  self.prepare_devices_filter(new_devices)
File "/opt/stack/neutron/neutron/agent/securitygroups_rpc.py", line 131, in 
decorated_function
  *args, **kwargs)
File "/opt/stack/neutron/neutron/agent/securitygroups_rpc.py", line 139, in 
prepare_devices_filter
  self._apply_port_filter(device_ids)
File "/opt/stack/neutron/neutron/agent/securitygroups_rpc.py", line 164, in 
_apply_port_filter
  self.firewall.prepare_port_filter(device)
File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
  self.gen.next()
File "/opt/stack/neutron/neutron/agent/firewall.py", line 139, in 
defer_apply
  self.filter_defer_apply_off()
File "/opt/stack/neutron/neutron/agent/linux/iptables_firewall.py", line 
838, in filter_defer_apply_off
  self._pre_defer_unfiltered_ports)
File "/opt/stack/neutron/neutron/agent/linux/iptables_firewall.py", line 
248, in _remove_chains_apply
  self._remove_chain_by_name_v4v6(SG_CHAIN)
File "/opt/stack/neutron/neutron/agent/linux/iptables_firewall.py", line 
279, in _remove_chain_by_name_v4v6
  self.iptables.ipv4['filter'].remove_chain(chain_name)
File "/opt/stack/neutron/neutron/agent/linux/iptables_manager.py", line 
178, in remove_chain
  traceback.print_stack()

  Looking at the code, there are a couple of things that are interesting:

  1) prepare_port_filter() calls self._remove_chains() - why?
  2) in the "defer" case above we always do _remove_chains_apply()/_setup_chains_apply() - is there some way to skip the remove?

  This also led to us timing how long it's taking in the remove_chain()
  code, since that's where the message is getting printed.  As the
  number of ports and rules grow, it's spending more time spinning
  through chains and rules.  It looks like that can be helped with a
  small code change, which is just fallout from the real problem.  I'll
  send that out since it helps a little.

  More work still required.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1642770/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : 

[Yahoo-eng-team] [Bug 1816485] Re: [rfe] change neutron process names to match their role

2021-07-28 Thread Brian Haley
** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1816485

Title:
  [rfe] change neutron process names to match their role

Status in neutron:
  Fix Released

Bug description:
  See the commit message description here:
  https://review.openstack.org/#/c/637019/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1816485/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1856600] Re: Unit test jobs are failing with ImportError: cannot import name 'engine' from 'flake8'

2021-07-28 Thread Brian Haley
** Changed in: neutron
   Status: In Progress => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1856600

Title:
  Unit test jobs are failing with ImportError: cannot import name
  'engine' from 'flake8'

Status in neutron:
  Invalid

Bug description:
  Neutron unit test CI jobs are failing with the following error:

  =
  Failures during discovery
  =
  --- import errors ---
  Failed to import test module: neutron.tests.unit.hacking.test_checks
  Traceback (most recent call last):
File "/usr/lib/python3.7/unittest/loader.py", line 436, in _find_test_path
  module = self._get_module_from_name(name)
File "/usr/lib/python3.7/unittest/loader.py", line 377, in 
_get_module_from_name
  __import__(name)
File 
"/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/unit/hacking/test_checks.py",
 line 15, in 
  from flake8 import engine
  ImportError: cannot import name 'engine' from 'flake8' 
(/home/zuul/src/opendev.org/openstack/neutron/.tox/py37/lib/python3.7/site-packages/flake8/__init__.py)

  Example:
  
https://e859f0a6f5995c9142c5-a232ce3bdc50fca913ceba9a1c600c62.ssl.cf5.rackcdn.com/572767/23/check/openstack-
  tox-py37/1d036e0/job-output.txt

  Looks like flake8 no longer has an engine but they had kept the api
  for backward compatibility [1], perhaps they broke it somehow.

  [1] based on comment in
  https://gitlab.com/pycqa/flake8/blob/master/src/flake8/api/legacy.py#L3

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1856600/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1938326] [NEW] Migration gets stuck at pre-migrating status if source compute node is down but maintenance enabled

2021-07-28 Thread Lee Yarwood
Public bug reported:

Description
===
Currently nova rejects migration(resize) if the source compute node is down but 
not if the service has previously been disabled.

Steps to reproduce
==
1. Create an instance
2. Shutdown the compute node where the instance is started
3. Enable maintenance of the nova-compute service on the source compute node
4. Migrate the instance

Expected result
===
Migration is rejected

Actual result
=
Migration is accepted but gets stuck in pre-migrating status

Environment
===
1. Exact version of OpenStack you are running. See the following
  list for all releases: http://docs.openstack.org/releases/

   If this is from a distro please provide
   $ dpkg -l | grep nova
   or
   $ rpm -ql | grep nova
   If this is from git, please provide
   $ git log -1

2. Which hypervisor did you use?
   (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
   What's the version of that?

2. Which storage type did you use?
   (For example: Ceph, LVM, GPFS, ...)
   What's the version of that?

3. Which networking type did you use?
   (For example: nova-network, Neutron with OpenVSwitch, ...)

Logs & Configs
==

https://bugzilla.redhat.com/show_bug.cgi?id=1985712#c0

** Affects: nova
 Importance: Undecided
 Assignee: Lee Yarwood (lyarwood)
 Status: New


** Tags: api resize

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1938326

Title:
  Migration gets stuck at pre-migrating status if source compute node is
  down but maintenance enabled

Status in OpenStack Compute (nova):
  New

Bug description:
  Description
  ===
  Currently nova rejects migration(resize) if the source compute node is down 
but not if the service has previously been disabled.

  Steps to reproduce
  ==
  1. Create an instance
  2. Shutdown the compute node where the instance is started
  3. Enable maintenance of the nova-compute service on the source compute node
  4. Migrate the instance

  Expected result
  ===
  Migration is rejected

  Actual result
  =
  Migration is accepted but gets stuck in pre-migrating status

  Environment
  ===
  1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/

 If this is from a distro please provide
 $ dpkg -l | grep nova
 or
 $ rpm -ql | grep nova
 If this is from git, please provide
 $ git log -1

  2. Which hypervisor did you use?
 (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
 What's the version of that?

  2. Which storage type did you use?
 (For example: Ceph, LVM, GPFS, ...)
 What's the version of that?

  3. Which networking type did you use?
 (For example: nova-network, Neutron with OpenVSwitch, ...)

  Logs & Configs
  ==

  https://bugzilla.redhat.com/show_bug.cgi?id=1985712#c0

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1938326/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1938323] [NEW] [Queens] tokens generated with nocatalog are not usable in some requests

2021-07-28 Thread Pavlo Shchelokovskyy
Public bug reported:

NOTE - this is happening only on Queens, and possibly earlier, and is
already silently fixed in Rocky as part of the major refactor of token
model. I am posting this issue here so that anyone still running Queens
has a reference and probably a patch to apply.


In Queens release, if I create a token using a nocatalog option:

  curl -X POST /v3/auth/tokens?nocatalog

and then use this token to e.g. list servers with details

  curl -X GET /v2.1/servers/detail

I get 500 error from nova, with nova api logs containing

  ERROR nova.api.openstack EmptyCatalog: The service catalog is empty.

When repeating the same request with the same token after 5-10 minutes, the 
token starts to work.
Tokens generated with catalog are working as well.

AFAIU this goes down to token caching - in Queens tokens are
cached/memoized with catalog - or without it if the token was requested w/o
catalog. Then on token validation, the token validation response takes
the token - with or without catalog - from cache and returns it to the
caller with minimal processing - e.g. removing the catalog if the token
validation call asked for it. It does not, however, ensure that the
catalog is present otherwise.

This breaks some other services like Nova, which expect the catalog to
be present in the request context constructed from the
keystonemiddleware results. Nova needs this for example to make API
requests to other services - exactly what happens in the servers/detail
call where it has to ask Neutron for some network info about instances.

After the cache is invalidated, the catalog starts to be generated for
token validation response anew, and everything starts to work as
expected.
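
A hedged end-to-end repro sketch of the description above, using the requests
library; the endpoint URLs and credentials are placeholders for your
environment, and on an affected Queens cloud the Nova call is expected to
return HTTP 500 until the cached token entry expires:

  import requests

  KEYSTONE = "http://keystone.example.com:5000"
  NOVA = "http://nova.example.com:8774"

  auth = {
      "auth": {
          "identity": {
              "methods": ["password"],
              "password": {"user": {"name": "admin",
                                    "domain": {"id": "default"},
                                    "password": "secret"}},
          },
          "scope": {"project": {"name": "admin",
                                "domain": {"id": "default"}}},
      }
  }

  # Request a token without a service catalog.
  resp = requests.post(KEYSTONE + "/v3/auth/tokens?nocatalog", json=auth)
  token = resp.headers["X-Subject-Token"]

  # List servers with details using that token.
  servers = requests.get(NOVA + "/v2.1/servers/detail",
                         headers={"X-Auth-Token": token})
  print(servers.status_code)  # 500 on affected Queens deployments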

** Affects: keystone
 Importance: Undecided
 Assignee: Pavlo Shchelokovskyy (pshchelo)
 Status: In Progress

** Changed in: keystone
   Status: New => In Progress

** Changed in: keystone
 Assignee: (unassigned) => Pavlo Shchelokovskyy (pshchelo)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1938323

Title:
  [Queens] tokens generated with nocatalog are not usable in some
  requests

Status in OpenStack Identity (keystone):
  In Progress

Bug description:
  NOTE - this is happening only on Queens, and possibly earlier, and is
  already silently fixed in Rocky as part of the major refactor of token
  model. I am posting this issue here so that anyone still running Queens
  has a reference and probably a patch to apply.

  
  In Queens release, if I create a token using a nocatalog option:

curl -X POST /v3/auth/tokens?nocatalog

  and then use this token to e.g. list servers with details

curl -X GET /v2.1/servers/detail

  I get 500 error from nova, with nova api logs containing

ERROR nova.api.openstack EmptyCatalog: The service catalog is empty.

  When repeating the same request with the same token after 5-10 minutes, the 
token starts to work.
  Tokens generated with catalog are working as well.

  AFAIU this goes down to token caching - in Queens tokens are
  cached/memoized with catalog - or without it if the token was requested
  w/o catalog. Then on token validation, the token validation response
  takes the token - with or without catalog - from cache and returns it
  to the caller with minimal processing - e.g. removing the catalog if the
  token validation call asked for it. It does not, however, ensure that the
  catalog is present otherwise.

  This breaks some other services like Nova, which expect the catalog to
  be present in the request context constructed from the
  keystonemiddleware results. Nova needs this for example to make API
  requests to other services - exactly what happens in the servers/detail
  call where it has to ask Neutron for some network info about
  instances.

  After the cache is invalidated, the catalog starts to be generated for
  token validation response anew, and everything starts to work as
  expected.

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1938323/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1931440] Re: Unable to use multiattach volume as boot for new server

2021-07-28 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/horizon/+/798730
Committed: 
https://opendev.org/openstack/horizon/commit/64fe0abb653950c85d455dedd09ef42856c6b07b
Submitter: "Zuul (22348)"
Branch:master

commit 64fe0abb653950c85d455dedd09ef42856c6b07b
Author: manchandavishal 
Date:   Tue Jun 29 23:38:41 2021 +0530

Fix Unable to use multiattach volume as boot for new server

If we try to create a new server from a bootable volume that
supports multiattach, it will fail to create with an error
message that ``multiattach volumes are only supported starting
with compute API version 2.60``. This patch fixes the issue.

Closes-Bug: #1931440
Change-Id: Ic8330b947b1a733f70c3bdad8b3493f20a2f26fb


** Changed in: horizon
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1931440

Title:
  Unable to use multiattach volume as boot for new server

Status in OpenStack Dashboard (Horizon):
  Fix Released

Bug description:
  
  I'm trying to create a new server using the dashboard and use an existing
  volume for boot. That volume was created previously (from image) and has a
  type with the multiattach property enabled. The operation fails with the
  dashboard showing an error message balloon with:

  Multiattach volumes are only supported starting with compute API
  version 2.60 (HTTP 400 req: xxx)

  If I execute the action in the command line, what I get is:

  $ openstack server create --volume test --flavor g1t1.small --network ubuntu-net --wait testserver
  Multiattach volumes are only supported starting with compute API version 2.60. (HTTP 400) (Request-ID: req-abe1edc4-527e-469a-8936-fda61e9e8395)

  But if repeat the operation with the API version, it works:

  $ openstack --os-compute-api-version 2.60 server create --volume test --flavor g1t1.small --network ubuntu-net --wait testserver
  

  So it works; the problem is that the dashboard is not sending the API
  version parameter when trying to create the server. Inspecting the
  request in the browser I cannot see the "X-OpenStack-Nova-API-Version:
  2.60" header that I would expect to be there.
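
  For reference, a hedged sketch of how a Python client opts in to
  microversion 2.60, which is roughly what the dashboard needs to do on its
  compute calls; the auth parameters, flavor, volume and network IDs below
  are placeholders:

    from keystoneauth1 import loading, session
    from novaclient import client as nova_client

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='http://keystone.example.com:5000/v3',
        username='admin', password='secret', project_name='admin',
        user_domain_id='default', project_domain_id='default')
    sess = session.Session(auth=auth)

    # Requesting version '2.60' makes novaclient send the
    # X-OpenStack-Nova-API-Version / OpenStack-API-Version header.
    nova = nova_client.Client('2.60', session=sess)

    server = nova.servers.create(
        name='testserver', image=None, flavor='FLAVOR_ID',
        block_device_mapping_v2=[{'boot_index': 0,
                                  'uuid': 'VOLUME_UUID',
                                  'source_type': 'volume',
                                  'destination_type': 'volume'}],
        nics=[{'net-id': 'NETWORK_UUID'}])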

  This is very similar to another bug report -- Bug #1751564 -- but that
  one says "Fix Released" at dashboard version 14.0+ so this must not be
  exactly the same issue or this is a regression.

  I'm using Openstack Ussuri on Ubuntu Focal. Dashboard package version
  is openstack-dashboard  3:18.3.3-0ubuntu1.

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1931440/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1885169] Re: Some arping version only accept integer number as -w argument

2021-07-28 Thread Launchpad Bug Tracker
This bug was fixed in the package neutron - 2:16.4.0-0ubuntu2

---
neutron (2:16.4.0-0ubuntu2) focal; urgency=medium

  * d/p/provide-integer-argument-to-arping.patch: Cherry-pick upstream
    patch to ensure gratuitous ARPs are correctly sent (LP: #1885169).

neutron (2:16.4.0-0ubuntu1) focal; urgency=medium

  * New stable point release for OpenStack Ussuri (LP: #1935030).
  * Remove patches that have landed upstream in this point release:
- d/p/updates-for-python3.8.patch
- d/p/0001-Update-arp-entry-of-snat-port-on-qrouter-ns.patch

 -- Chris MacNaughton   Fri, 16 Jul 2021
14:25:28 +

** Changed in: neutron (Ubuntu Focal)
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1885169

Title:
  Some arping version only accept integer number as -w argument

Status in devstack:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Focal:
  Fix Released

Bug description:
  For example, Bionic v2.19-4 accepts "4.5", but not Focal v2.20-1.

  LOG:
  stack@u20:/opt/stack$ arping -A -c 3 -w 4.5 -I br-ex 192.168.20.70
  arping: invalid argument: '4.5'
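
  For context, a hedged sketch of the kind of fix applied upstream: round the
  timeout up and pass an integer -w value so both arping versions accept it;
  the helper name and defaults are illustrative only:

    import math

    def arping_cmd(iface, address, count=3, timeout=4.5):
        # Newer iputils arping (e.g. Focal v2.20-1) rejects a float -w value,
        # so round the timeout up to a whole number of seconds.
        return ['arping', '-A', '-c', str(count),
                '-w', str(int(math.ceil(timeout))),
                '-I', iface, address]

    print(' '.join(arping_cmd('br-ex', '192.168.20.70')))
    # arping -A -c 3 -w 5 -I br-ex 192.168.20.70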

To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack/+bug/1885169/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1938284] [NEW] Missing Diffie-Hellman-Groups

2021-07-28 Thread Maxim Korezkij
Public bug reported:

The values for the pfs (perfect forward secrecy) when creating an ike or
ipsec policy are limited to the Diffie-Hellman-Groups 2,5 and 14.

Strongswan as the default provider supports more than these 3 groups,
e.g. group20(ecp384).

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1938284

Title:
  Missing Diffie-Hellman-Groups

Status in neutron:
  New

Bug description:
  The values for the pfs (perfect forward secrecy) when creating an ike
  or ipsec policy are limited to the Diffie-Hellman-Groups 2,5 and 14.

  Strongswan as the default provider supports more than these 3 groups,
  e.g. group20(ecp384).

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1938284/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1860312] Re: compute service failed to delete

2021-07-28 Thread sean mooney
Actually the operator would be deleting the compute service after removing the
compute nodes.
You should remove the compute service first, but we should fix this regardless.

You should be able to recreate this bug by just creating a compute service
and then deleting it.

** Changed in: nova
   Status: Expired => Triaged

** Changed in: nova
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1860312

Title:
  compute service failed to delete

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  Description
  ===
  I deployed openstack with openstack-helm on kubernetes. When one of the
  nova-compute services (driver=ironic, replica of the deployment is 1) breaks
  down, it may be rescheduled to another node by kubernetes. When I try to
  delete the old compute service (status down), it fails.

  Steps to reproduce
  ==
  Firstly, openstack was deployed in kubernetes cluster, and the replica of the 
nova-compute-ironic is 1.
  * I deleted the pod nova-compute-ironic-x
  * then wait for the new pod to start
  * then exec openstack compute service list, there will be two compute service 
for ironic, the status of the old one would be down.
  * then I try to delete the old compute service

  Expected result
  ===
  the old compute service could be deleted successfully

  Actual result
  =
  failed to delete, and returned an http 500

  Environment
  ===
  1. Exact version of OpenStack you are running. See the following
 18.2.2, rocky

  2. Which hypervisor did you use?
 Libvirt + KVM

  2. Which storage type did you use?
 ceph

  3. Which networking type did you use?
 Neutron with OpenVSwitch

  Logs & Configs
  ==
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi 
[req-922cc601-9aa1-4c3d-ad9c-71f73a341c28 40e7b8c3d59943e08a52acd24fe30652 
d13f1690c08d41ac854d720ea510a710 - default default] Unexpected exception in API 
method: ComputeHostNotFound: Compute host mgt-slave03 could not be found.
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi Traceback (most 
recent call last):
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi   File 
"/var/lib/openstack/local/lib/python2.7/site-packages/nova/api/openstack/wsgi.py",
 line 801, in wrapped
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi return f(*args, 
**kwargs)
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi   File 
"/var/lib/openstack/local/lib/python2.7/site-packages/nova/api/openstack/compute/services.py",
 line 252, in delete
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi context, 
service.host)
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi   File 
"/var/lib/openstack/local/lib/python2.7/site-packages/oslo_versionedobjects/base.py",
 line 184, in wrapper
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi result = fn(cls, 
context, *args, **kwargs)
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi   File 
"/var/lib/openstack/local/lib/python2.7/site-packages/nova/objects/compute_node.py",
 line 443, in get_all_by_host
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi 
use_slave=use_slave)
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi   File 
"/var/lib/openstack/local/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py",
 line 213, in wrapper
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi return f(*args, 
**kwargs)
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi   File 
"/var/lib/openstack/local/lib/python2.7/site-packages/nova/objects/compute_node.py",
 line 438, in _db_compute_node_get_all_by_host
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi return 
db.compute_node_get_all_by_host(context, host)
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi   File 
"/var/lib/openstack/local/lib/python2.7/site-packages/nova/db/api.py", line 
291, in compute_node_get_all_by_host
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi return 
IMPL.compute_node_get_all_by_host(context, host)
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi   File 
"/var/lib/openstack/local/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py",
 line 258, in wrapped
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi return f(context, 
*args, **kwargs)
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi   File 
"/var/lib/openstack/local/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py",
 line 659, in compute_node_get_all_by_host
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi raise 
exception.ComputeHostNotFound(host=host)
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi ComputeHostNotFound: 
Compute host mgt-slave03 could not be found.
  2020-01-20 06:44:53.480 1 ERROR nova.api.openstack.wsgi 
  2020-01-20 06:44:53.480 1 

[Yahoo-eng-team] [Bug 1938265] [NEW] nova-snapshot-fail-multi-store-rbd

2021-07-28 Thread Victor Coutellier
Public bug reported:

As of now, with multi store enabled, adding a new location to an image
will make the add_location API call fail if the store metadata is
missing:

Code in glance: 
https://github.com/openstack/glance/blob/master/glance/location.py#L134
Then in glance_store: 
https://github.com/openstack/glance_store/blob/master/glance_store/location.py#L111

This will raise a "KeyError: None" and return a very generic "Invalid
Location" 400 error when adding a new location.

The point is, with an rbd backend, nova never specifies this metadata when
creating the image during the direct snapshot process (flattening the image
directly in the ceph image pool + adding the location directly in glance),
so the snapshot will always fail.


A solution can be to infer the backend from the location uri, like we do during 
the store metadata lazy population.
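
A hedged sketch of that proposed fallback: derive the store from the location
URI scheme when the 'store' metadata key is absent. The scheme-to-store
mapping and helper name are hypothetical, not actual Glance code:

  from urllib.parse import urlparse

  SCHEME_TO_STORE = {
      'rbd': 'ceph',          # e.g. rbd://<fsid>/<pool>/<image>/<snap>
      'file': 'local',
      'swift+https': 'swift',
  }

  def infer_store(location_uri, metadata):
      # Prefer the explicit store metadata when the client supplied it.
      store = metadata.get('store')
      if store:
          return store
      # Otherwise fall back to the URI scheme, as the lazy population does.
      return SCHEME_TO_STORE.get(urlparse(location_uri).scheme)

  print(infer_store('rbd://fsid/images/volume-uuid/snap', {}))  # -> 'ceph'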

** Affects: glance
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1938265

Title:
  nova-snapshot-fail-multi-store-rbd

Status in Glance:
  New

Bug description:
  As of now, with multi store enabled, adding a new location to an image
  will make the add_location API call fail if the store metadata is
  missing:

  Code in glance: 
https://github.com/openstack/glance/blob/master/glance/location.py#L134
  Then in glance_store: 
https://github.com/openstack/glance_store/blob/master/glance_store/location.py#L111

  This will raise a "KeyError: None" and return a very generic "Invalid
  Location" 400 error when adding a new location.

  The point is, with an rbd backend, nova never specifies this metadata when
  creating the image during the direct snapshot process (flattening the
  image directly in the ceph image pool + adding the location directly in
  glance), so the snapshot will always fail.

  
  A solution can be to infer the backend from the location uri, like we do 
during the store metadata lazy population.

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1938265/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1938262] [NEW] [stable/ussuri] Functional jobs timeout, many tests with fixtures._fixtures.timeout.TimeoutException

2021-07-28 Thread Bernard Cafarelli
Public bug reported:

This started recently (last backport successfully merged was on July 20th), 
functional tests now fail 100% on  recent backports with TIMED_OUT. For example:
https://review.opendev.org/c/openstack/neutron/+/801882
https://review.opendev.org/c/openstack/neutron/+/802528

I confirmed it with dummy change:
https://review.opendev.org/c/openstack/neutron/+/802552

Many tests fail with:
2021-07-27 17:34:20.939074 | controller |   File 
"/usr/lib/python3.6/threading.py", line 295, in wait
2021-07-27 17:34:20.939093 | controller | waiter.acquire()
2021-07-27 17:34:20.939107 | controller |
2021-07-27 17:34:20.939120 | controller |   File 
"/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/semaphore.py",
 line 115, in acquire
2021-07-27 17:34:20.939134 | controller | hubs.get_hub().switch()
2021-07-27 17:34:20.939147 | controller |
2021-07-27 17:34:20.939161 | controller |   File 
"/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/hubs/hub.py",
 line 298, in switch
2021-07-27 17:34:20.939177 | controller | return self.greenlet.switch()
2021-07-27 17:34:20.939191 | controller |
2021-07-27 17:34:20.939205 | controller |   File 
"/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/hubs/hub.py",
 line 350, in run
2021-07-27 17:34:20.939219 | controller | self.wait(sleep_time)
2021-07-27 17:34:20.939233 | controller |
2021-07-27 17:34:20.939246 | controller |   File 
"/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/hubs/poll.py",
 line 80, in wait
2021-07-27 17:34:20.939260 | controller | presult = self.do_poll(seconds)
2021-07-27 17:34:20.939316 | controller |
2021-07-27 17:34:20.939338 | controller |   File 
"/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/hubs/epolls.py",
 line 31, in do_poll
2021-07-27 17:34:20.939352 | controller | return self.poll.poll(seconds)
2021-07-27 17:34:20.939366 | controller |
2021-07-27 17:34:20.939380 | controller |   File 
"/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/fixtures/_fixtures/timeout.py",
 line 52, in signal_handler
2021-07-27 17:34:20.939394 | controller | raise TimeoutException()
2021-07-27 17:34:20.939407 | controller |
2021-07-27 17:34:20.939421 | controller | 
fixtures._fixtures.timeout.TimeoutException

The start of the backtrace depends on the test (namespace creation, ip_lib, ...)
so it does look like a generic issue in a related package.

Note this is specific to stable/ussuri, the first backport mentioned
passed in newer branches and in stable/train without issue. Functional
tests are passing in train with same OS, same python version 3.6

Nothing suspicious is logged in the functional test output itself.

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: gate-failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1938262

Title:
  [stable/ussuri] Functional jobs timeout, many tests with
  fixtures._fixtures.timeout.TimeoutException

Status in neutron:
  New

Bug description:
  This started recently (last backport successfully merged was on July 20th), 
functional tests now fail 100% on  recent backports with TIMED_OUT. For example:
  https://review.opendev.org/c/openstack/neutron/+/801882
  https://review.opendev.org/c/openstack/neutron/+/802528

  I confirmed it with dummy change:
  https://review.opendev.org/c/openstack/neutron/+/802552

  Many tests fail with:
  2021-07-27 17:34:20.939074 | controller |   File 
"/usr/lib/python3.6/threading.py", line 295, in wait
  2021-07-27 17:34:20.939093 | controller | waiter.acquire()
  2021-07-27 17:34:20.939107 | controller |
  2021-07-27 17:34:20.939120 | controller |   File 
"/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/semaphore.py",
 line 115, in acquire
  2021-07-27 17:34:20.939134 | controller | hubs.get_hub().switch()
  2021-07-27 17:34:20.939147 | controller |
  2021-07-27 17:34:20.939161 | controller |   File 
"/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/hubs/hub.py",
 line 298, in switch
  2021-07-27 17:34:20.939177 | controller | return self.greenlet.switch()
  2021-07-27 17:34:20.939191 | controller |
  2021-07-27 17:34:20.939205 | controller |   File 
"/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/hubs/hub.py",
 line 350, in run
  2021-07-27 17:34:20.939219 | controller | self.wait(sleep_time)
  2021-07-27 17:34:20.939233 | controller |
  2021-07-27 17:34:20.939246 | controller |   File 

[Yahoo-eng-team] [Bug 1938261] [NEW] [ovn]Router scheduler failing for config "default_availability_zones"

2021-07-28 Thread ZhouHeng
Public bug reported:

I have 3 gateway chassis and all available zones are nova, have 1 chassis.
The default_availability_zones=zone1 are configured in the neutron.conf.

I create a router without setting availability_zone_hints; the router is
created successfully and its availability_zones=zone1. Through the ovn-nbctl
command, I can see that the router's gateway_chassis includes all chassis (4 nodes).

I think this should fail in this case, indicating that the availability zone
does not exist.

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1938261

Title:
  [ovn]Router scheduler failing for config  "default_availability_zones"

Status in neutron:
  New

Bug description:
  I have 3 gateway chassis and all available zones are nova, have 1 chassis.
  The default_availability_zones=zone1 are configured in the neutron.conf.

  I create a router without setting availability_zone_hints; the router is
  created successfully and its availability_zones=zone1. Through the
  ovn-nbctl command, I can see that the router's gateway_chassis includes
  all chassis (4 nodes).

  I think this should fail in this case, indicating that the availability
  zone does not exist.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1938261/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp