[Yahoo-eng-team] [Bug 1813551] Re: [OVN]Missing ingress QoS in OVN

2020-02-07 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/703537
Committed: 
https://git.openstack.org/cgit/openstack/neutron/commit/?id=dcec852b7f091c67a378db96c4841c3eec0d496a
Submitter: Zuul
Branch:master

commit dcec852b7f091c67a378db96c4841c3eec0d496a
Author: Yunxiang Tao 
Date:   Mon Feb 3 16:40:19 2020 +0800

[OVN] Update QoS related code from networking-ovn

According to [1], patch [0] imported the latest code of ovn_client.py, but
not "/networking_ovn/ml2/qos_driver.py", so this patch updates it as well.

[0] https://review.opendev.org/#/c/697316/
[1] https://review.opendev.org/#/c/692084/

Change-Id: Iefff6cdf070d234c4ea9c8e1d5fdfe4542eb7fa3
Closes-Bug: #1813551


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1813551

Title:
  [OVN]Missing ingress QoS in OVN

Status in neutron:
  Fix Released

Bug description:
  Currently, Open vSwitch supports QoS in both directions, ingress and
  egress. OVN uses OVS internally, so OVN can support both as well.

  However, looking at the source code in [1], OVN supports egress only, so
  some work is needed to add ingress support.

  
  [1] - 
https://github.com/openstack/networking-ovn/blob/master/networking_ovn/ml2/qos_driver.py#L38
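
  For illustration only, a minimal sketch of how an ML2 QoS driver typically
  advertises the rule types and directions it supports; the structure and
  validator keys below follow neutron-lib conventions but are assumptions
  here, not the actual networking-ovn qos_driver code:

      from neutron_lib import constants
      from neutron_lib.services.qos import constants as qos_consts

      # Listing both directions is what lets the QoS plugin accept ingress
      # as well as egress bandwidth-limit rules for this backend.
      SUPPORTED_RULES = {
          qos_consts.RULE_TYPE_BANDWIDTH_LIMIT: {
              qos_consts.DIRECTION: {
                  'type:values': [constants.EGRESS_DIRECTION,
                                  constants.INGRESS_DIRECTION],
              },
          },
      }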

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1813551/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1830763] Related fix merged to neutron (master)

2020-02-07 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/704686
Committed: 
https://git.openstack.org/cgit/openstack/neutron/commit/?id=18d8d3973a532a36120c2c58136683e834a5e405
Submitter: Zuul
Branch:master

commit 18d8d3973a532a36120c2c58136683e834a5e405
Author: Slawek Kaplonski 
Date:   Tue Jan 28 16:52:29 2020 +0100

Revert "[DVR] Add lock during creation of FIP agent gateway port"

This reverts commit 7b81c1bc67d2d85e03b4c96a8c1c558a2f909836.

It isn't needed anymore with the new solution using a lock "on the db
level", which is introduced in a follow-up patch.

Change-Id: Ibf15ee1969f902e8a266825934d9ac963353f0a0
Related-Bug: #1830763
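
For context, a lock "on the db level" in changes like this usually means
row-level locking with SELECT ... FOR UPDATE. A minimal SQLAlchemy sketch
with hypothetical model and function names (an illustration only, not the
follow-up patch itself):

    # Lock the FIP agent gateway port row for the duration of the
    # transaction so concurrent creators serialize in the database
    # instead of behind an agent-side lock.
    def get_or_create_fip_gw_port(session, FipAgentGwPort, network_id, host):
        with session.begin():
            port = (session.query(FipAgentGwPort)
                    .filter_by(network_id=network_id, host=host)
                    .with_for_update()      # SELECT ... FOR UPDATE
                    .first())
            if port is None:
                port = FipAgentGwPort(network_id=network_id, host=host)
                session.add(port)
            return port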


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1830763

Title:
  Debug neutron-tempest-plugin-dvr-multinode-scenario failures

Status in neutron:
  Fix Released

Bug description:
  This bug is meant to track the activities to debug the neutron-
  tempest-plugin-dvr-multinode-scenario job. We start by trying to
  isolate failures in this test case:
  
http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22test_connectivity_through_2_routers%5C%22%20AND%20build_status:%5C%22FAILURE%5C%22%20AND%20build_branch:%5C%22master%5C%22%20AND%20build_name:%5C
  %22neutron-tempest-plugin-dvr-multinode-
  scenario%5C%22%20AND%20project:%5C%22openstack%2Fneutron%5C%22

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1830763/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1862425] [NEW] Setting mem_stats_period_seconds=0 should prevent the “Virtio memory balloon” driver from loading

2020-02-07 Thread Albert Braden
Public bug reported:

Setting mem_stats_period_seconds=0 in nova.conf should prevent the
“Virtio memory balloon” driver from loading but it doesn't.

We are running Rocky installed with openstack-ansible. To reproduce the
error:

1. In nova.conf set "mem_stats_period_seconds = 0" on controllers and 
hypervisors
2. Restart nova services on controllers and hypervisors
3. Build VM
4. Log into VM and type: lspci
5. lspci output will include "Red Hat, Inc. Virtio memory balloon"

For more information please see mailing list thread:

http://lists.openstack.org/pipermail/openstack-
discuss/2020-February/012336.html

The problem this causes is that the Virtio memory balloon driver is not
able to address large amounts of RAM. We encountered the problem when we
built VMs with 1.4T RAM. The VM cannot boot because the driver fails:

"BUG: unable to handle kernel paging request at 988b19478000"


root@us01odc-dev2-ctrl1:~# dpkg -l | grep nova
ii  nova-api            2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute - API frontend
ii  nova-common         2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute - common files
ii  nova-conductor      2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute - conductor service
ii  nova-novncproxy     2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute - NoVNC proxy
ii  nova-placement-api  2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute - placement API frontend
ii  nova-scheduler      2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute - virtual machine scheduler
ii  python-nova         2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute Python 2 libraries
ii  python-novaclient   2:11.0.0-0ubuntu1~cloud0  all  client library for OpenStack Compute API - Python 2.7

root@us01odc-dev2-hv002:~# virsh --version
4.0.0

root@us01odc-dev2-hv002:~# qemu-system-x86_64 --version
QEMU emulator version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.21)

root@us01odc-dev2-hv002:~# nova --version
11.0.0

root@us01odc-dev2-hv002:~# openstack --version
openstack 3.16.1

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1862425

Title:
  Setting mem_stats_period_seconds=0 should prevent the “Virtio memory
  balloon” driver from loading

Status in OpenStack Compute (nova):
  New

Bug description:
  Setting mem_stats_period_seconds=0 in nova.conf should prevent the
  “Virtio memory balloon” driver from loading but it doesn't.

  We are running Rocky installed with openstack-ansible. To reproduce
  the error:

  1. In nova.conf set "mem_stats_period_seconds = 0" on controllers and 
hypervisors
  2. Restart nova services on controllers and hypervisors
  3. Build VM
  4. Log into VM and type: lspci
  5. lspci output will include "Red Hat, Inc. Virtio memory balloon"

  For more information please see mailing list thread:

  http://lists.openstack.org/pipermail/openstack-
  discuss/2020-February/012336.html

  The problem this causes is that the Virtio memory balloon driver is
  not able to address large amounts of RAM. We encountered the problem
  when we built VMs with 1.4T RAM. The VM cannot boot because the driver
  fails:

  "BUG: unable to handle kernel paging request at 988b19478000"

  
  root@us01odc-dev2-ctrl1:~# dpkg -l | grep nova
  ii  nova-api            2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute - API frontend
  ii  nova-common         2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute - common files
  ii  nova-conductor      2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute - conductor service
  ii  nova-novncproxy     2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute - NoVNC proxy
  ii  nova-placement-api  2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute - placement API frontend
  ii  nova-scheduler      2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute - virtual machine scheduler
  ii  python-nova         2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute Python 2 libraries
  ii  python-novaclient   2:11.0.0-0ubuntu1~cloud0  all  client library for OpenStack Compute API - Python 2.7

  root@us01odc-dev2-hv002:~# virsh --version
  4.0.0

  

[Yahoo-eng-team] [Bug 1862417] Re: cloud-init: Attempt to mkswap on a partition fails: invalid block count argument: ''

2020-02-07 Thread Ryan Harper
 The machine I'm working on uses cloud-init to update itself, it might 
only have the fix after the updates.
 ah, interesting
 cat /etc/cloud/build.info  
 that'll give us a point in time for which version you have 
 and I suspect you're right, the top of your cloud-init.log will show the original version
 build_name: server
 serial: 20190514
 Definitely way older than the PR.
 yep
 I suspect the fix in our case is to use the latest image of Ubuntu 
from end Jan 2020.
 yep

** Changed in: cloud-init
   Status: Incomplete => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1862417

Title:
  cloud-init: Attempt to mkswap on a partition fails: invalid block
  count argument: ''

Status in cloud-init:
  Invalid

Bug description:
  If an attempt is made to configure a swap partition on an Ubuntu
  Bionic machine as follows (not a swap file, a swap partition), the
  attempt to mkswap fails.

  The expected behaviour is that mkswap and swapon are executed
  correctly, and /dev/xvdg becomes a valid swap disk. In addition, when
  filename points at a partition, size and maxsize should be ignored.

  fs_setup:
    - label: vidi
      device: /dev/xvde
      filesystem: ext4
    - label: swap
      device: /dev/xvdg
      filesystem: swap
  mounts:
    - [ /dev/xvde, /var/lib/vidispine, ext4, defaults, 0, 0 ]
    - [ /dev/xvdg, none, swap, sw, 0, 0 ]
  swap:
    filename: /dev/xvdg
    size: auto
    maxsize: 17179869184
  mount_default_fields: [ None, None, "auto", "defaults", "0", "2" ]

  When the machine starts up for the first time, the following error is
  logged after the swap size parameter is passed as the empty string:

  2020-02-07 20:21:55,242 - cc_disk_setup.py[WARNING]: Force flag for swap is unknown.
  2020-02-07 20:21:55,255 - util.py[WARNING]: Failed during filesystem operation
  Failed to exec of '['/sbin/mkswap', '/dev/xvdg', '-L', 'swap', '']':
  Unexpected error while running command.
  Command: ['/sbin/mkswap', '/dev/xvdg', '-L', 'swap', '']
  Exit code: 1
  Reason: -
  Stdout: 
  Stderr: mkswap: invalid block count argument: ''
  2020-02-07 20:21:55,530 - cc_mounts.py[WARNING]: Activate mounts: FAIL:swapon -a
  2020-02-07 20:21:55,530 - util.py[WARNING]: Activate mounts: FAIL:swapon -a
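
  For illustration, a minimal sketch of the guard this report implies (not
  cloud-init's actual cc_disk_setup code): only pass a block count to mkswap
  when one was actually computed, so formatting a whole partition never
  appends an empty string to the command.

      import subprocess

      def mkswap(device, label='swap', block_count=None):
          cmd = ['/sbin/mkswap', device, '-L', label]
          if block_count:              # '' or None means "use the whole device"
              cmd.append(str(block_count))
          subprocess.check_call(cmd)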

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1862417/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1862417] [NEW] cloud-init: Attempt to mkswap on a partition fails: invalid block count argument: ''

2020-02-07 Thread Graham Leggett
Public bug reported:

If an attempt is made to configure a swap partition on an Ubuntu Bionic
machine as follows (not a swap file, a swap partition), the attempt to
mkswap fails.

The expected behaviour is that mkswap and swapon are executed correctly,
and /dev/xvdg becomes a valid swap disk. In addition, when filename
points at a partition, size and maxsize should be ignored.

fs_setup:
  - label: vidi
    device: /dev/xvde
    filesystem: ext4
  - label: swap
    device: /dev/xvdg
    filesystem: swap
mounts:
  - [ /dev/xvde, /var/lib/vidispine, ext4, defaults, 0, 0 ]
  - [ /dev/xvdg, none, swap, sw, 0, 0 ]
swap:
  filename: /dev/xvdg
  size: auto
  maxsize: 17179869184
mount_default_fields: [ None, None, "auto", "defaults", "0", "2" ]

When the machine starts up for the first time, the following error is
logged after the swap size parameter is passed as the empty string:

2020-02-07 20:21:55,242 - cc_disk_setup.py[WARNING]: Force flag for swap is unknown.
2020-02-07 20:21:55,255 - util.py[WARNING]: Failed during filesystem operation
Failed to exec of '['/sbin/mkswap', '/dev/xvdg', '-L', 'swap', '']':
Unexpected error while running command.
Command: ['/sbin/mkswap', '/dev/xvdg', '-L', 'swap', '']
Exit code: 1
Reason: -
Stdout: 
Stderr: mkswap: invalid block count argument: ''
2020-02-07 20:21:55,530 - cc_mounts.py[WARNING]: Activate mounts: FAIL:swapon -a
2020-02-07 20:21:55,530 - util.py[WARNING]: Activate mounts: FAIL:swapon -a

** Affects: cloud-init
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1862417

Title:
  cloud-init: Attempt to mkswap on a partition fails: invalid block
  count argument: ''

Status in cloud-init:
  New

Bug description:
  If an attempt is made to configure a swap partition on an Ubuntu
  Bionic machine as follows (not a swap file, a swap partition), the
  attempt to mkswap fails.

  The expected behaviour is that mkswap and swapon are executed
  correctly, and /dev/xvdg becomes a valid swap disk. In addition, when
  filename points at a partition, size and maxsize should be ignored.

  fs_setup:
    - label: vidi
      device: /dev/xvde
      filesystem: ext4
    - label: swap
      device: /dev/xvdg
      filesystem: swap
  mounts:
    - [ /dev/xvde, /var/lib/vidispine, ext4, defaults, 0, 0 ]
    - [ /dev/xvdg, none, swap, sw, 0, 0 ]
  swap:
    filename: /dev/xvdg
    size: auto
    maxsize: 17179869184
  mount_default_fields: [ None, None, "auto", "defaults", "0", "2" ]

  When the machine starts up for the first time, the following error is
  logged after the swap size parameter is passed as the empty string:

  2020-02-07 20:21:55,242 - cc_disk_setup.py[WARNING]: Force flag for swap is unknown.
  2020-02-07 20:21:55,255 - util.py[WARNING]: Failed during filesystem operation
  Failed to exec of '['/sbin/mkswap', '/dev/xvdg', '-L', 'swap', '']':
  Unexpected error while running command.
  Command: ['/sbin/mkswap', '/dev/xvdg', '-L', 'swap', '']
  Exit code: 1
  Reason: -
  Stdout: 
  Stderr: mkswap: invalid block count argument: ''
  2020-02-07 20:21:55,530 - cc_mounts.py[WARNING]: Activate mounts: FAIL:swapon -a
  2020-02-07 20:21:55,530 - util.py[WARNING]: Activate mounts: FAIL:swapon -a

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1862417/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1636466] Re: HA router interface points to wrong host after network disruption

2020-02-07 Thread Brian Haley
** Changed in: neutron
   Status: In Progress => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1636466

Title:
  HA router interface points to wrong host after network disruption

Status in neutron:
  Won't Fix

Bug description:
  If the overlay network of a network node is down for a while, the slave
  node of an HA router can't receive the VRRP packets, so it promotes itself
  to master. The L3 agent then updates the ha_state of the router bound to
  itself to active and updates the port bindings of the router interfaces to
  the associated host.
  After network recovery, one of the two master nodes of the HA router is
  demoted back to slave. If the demoted node is exactly the previous slave
  node, the L3 agent updates the ha_state of the router bound to itself to
  standby but does not update the port bindings of the router interfaces to
  the host hosting the original master node. Packets sent to the router are
  then delivered to the slave node because l2pop uses the incorrect port
  bindings.
  As the keepalived priorities are both configured to 50, the probability of
  hitting this problem in a two-network-node scenario is 50%.

  How to reproduce:
  - two network nodes: host1, host2.

  - create a ha router: router1, a network: network1 and a subnet: subnet1, add 
interface of subnet1 to router1.
  $ neutron l3-agent-list-hosting-router subnet1
  +--------------------------------------+--------+----------------+-------+----------+
  | id                                   | host   | admin_state_up | alive | ha_state |
  +--------------------------------------+--------+----------------+-------+----------+
  | 3a3b8d27-e5b4-42c0-9433-2ba8b6be98c2 | host1  | True           | :-)   | standby  |
  | 4eba4a33-1452-4f4e-8874-a8eff2f4f357 | host2  | True           | :-)   | active   |
  +--------------------------------------+--------+----------------+-------+----------+
  $ neutron router-port-list subnet1 -c id -c binding:host_id -c fixed_ips
  +--------------------------------------+-----------------+----------------------------------------------------------------------------------------+
  | id                                   | binding:host_id | fixed_ips                                                                              |
  +--------------------------------------+-----------------+----------------------------------------------------------------------------------------+
  | 00a89bc5-a589-4c37-9db0-a7b439c4dee9 | host1           | {"subnet_id": "6bb7aced-6b8f-448f-813d-d1bc91b9ee2d", "ip_address": "169.254.192.6"}  |
  | b83590b2-0bf9-4fe7-b29f-0d37c92a9b3a | host2           | {"subnet_id": "75e30064-a625-4267-8cbf-20d1a7b6e952", "ip_address": "192.168.10.1"}   |
  | ca2a66e0-5525-4302-b00f-0e703dbb48e2 | host2           | {"subnet_id": "6bb7aced-6b8f-448f-813d-d1bc91b9ee2d", "ip_address": "169.254.192.1"}  |
  +--------------------------------------+-----------------+----------------------------------------------------------------------------------------+

  - disconnect host1 from the overlay network, wait until the
    l3-agent-list-hosting-router API shows that the two ha_state values of
    router1 are both active.
  $ neutron l3-agent-list-hosting-router subnet1
  +--------------------------------------+--------+----------------+-------+----------+
  | id                                   | host   | admin_state_up | alive | ha_state |
  +--------------------------------------+--------+----------------+-------+----------+
  | 3a3b8d27-e5b4-42c0-9433-2ba8b6be98c2 | host1  | True           | :-)   | active   |
  | 4eba4a33-1452-4f4e-8874-a8eff2f4f357 | host2  | True           | :-)   | active   |
  +--------------------------------------+--------+----------------+-------+----------+
  $ neutron router-port-list subnet1 -c id -c binding:host_id -c fixed_ips
  +--------------------------------------+-----------------+----------------------------------------------------------------------------------------+
  | id                                   | binding:host_id | fixed_ips                                                                              |
  +--------------------------------------+-----------------+----------------------------------------------------------------------------------------+
  | 00a89bc5-a589-4c37-9db0-a7b439c4dee9 | host1           | {"subnet_id": "6bb7aced-6b8f-448f-813d-d1bc91b9ee2d", "ip_address": "169.254.192.6"}  |
  | b83590b2-0bf9-4fe7-b29f-0d37c92a9b3a | host1           | {"subnet_id": "75e30064-a625-4267-8cbf-20d1a7b6e952", "ip_address": "192.168.10.1"}   |
  | ca2a66e0-5525-4302-b00f-0e703dbb48e2 | host2           | {"subnet_id": "6bb7aced-6b8f-448f-813d-d1bc91b9ee2d", "ip_address": "169.254.192.1"}  |
  +--------------------------------------+-----------------+----------------------------------------------------------------------------------------+

[Yahoo-eng-team] [Bug 1680183] Re: neutron-keepalived-state-change fails with "AssertionError: do not call blocking functions from the mainloop"

2020-02-07 Thread Brian Haley
** Changed in: neutron
   Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1680183

Title:
  neutron-keepalived-state-change fails with "AssertionError: do not
  call blocking functions from the mainloop"

Status in neutron:
  Fix Released

Bug description:
  17:39:17.802 6173 CRITICAL neutron [-] AssertionError: do not call blocking 
functions from the mainloop
  17:39:17.802 6173 ERROR neutron Traceback (most recent call last):
  17:39:17.802 6173 ERROR neutron   File 
"/opt/stack/neutron/.tox/dsvm-functional/bin/neutron-keepalived-state-change", 
line 10, in 
  17:39:17.802 6173 ERROR neutron sys.exit(main())
  17:39:17.802 6173 ERROR neutron   File 
"/opt/stack/neutron/neutron/cmd/keepalived_state_change.py", line 19, in main
  17:39:17.802 6173 ERROR neutron keepalived_state_change.main()
  17:39:17.802 6173 ERROR neutron   File 
"/opt/stack/neutron/neutron/agent/l3/keepalived_state_change.py", line 157, in 
main
  17:39:17.802 6173 ERROR neutron cfg.CONF.monitor_cidr).start()
  17:39:17.802 6173 ERROR neutron   File 
"/opt/stack/neutron/neutron/agent/linux/daemon.py", line 249, in start
  17:39:17.802 6173 ERROR neutron self.run()
  17:39:17.802 6173 ERROR neutron   File 
"/opt/stack/neutron/neutron/agent/l3/keepalived_state_change.py", line 70, in 
run
  17:39:17.802 6173 ERROR neutron for iterable in self.monitor:
  17:39:17.802 6173 ERROR neutron   File 
"/opt/stack/neutron/neutron/agent/linux/async_process.py", line 256, in 
_iter_queue
  17:39:17.802 6173 ERROR neutron yield queue.get(block=block)
  17:39:17.802 6173 ERROR neutron   File 
"/opt/stack/neutron/.tox/dsvm-functional/lib/python2.7/site-packages/eventlet/queue.py",
 line 313, in get
  17:39:17.802 6173 ERROR neutron return waiter.wait()
  17:39:17.802 6173 ERROR neutron   File 
"/opt/stack/neutron/.tox/dsvm-functional/lib/python2.7/site-packages/eventlet/queue.py",
 line 141, in wait
  17:39:17.802 6173 ERROR neutron return get_hub().switch()
  17:39:17.802 6173 ERROR neutron   File 
"/opt/stack/neutron/.tox/dsvm-functional/lib/python2.7/site-packages/eventlet/hubs/hub.py",
 line 294, in switch
  17:39:17.802 6173 ERROR neutron return self.greenlet.switch()
  17:39:17.802 6173 ERROR neutron   File 
"/opt/stack/neutron/.tox/dsvm-functional/lib/python2.7/site-packages/eventlet/hubs/hub.py",
 line 346, in run
  17:39:17.802 6173 ERROR neutron self.wait(sleep_time)
  17:39:17.802 6173 ERROR neutron   File 
"/opt/stack/neutron/.tox/dsvm-functional/lib/python2.7/site-packages/eventlet/hubs/poll.py",
 line 85, in wait
  17:39:17.802 6173 ERROR neutron presult = self.do_poll(seconds)
  17:39:17.802 6173 ERROR neutron   File 
"/opt/stack/neutron/.tox/dsvm-functional/lib/python2.7/site-packages/eventlet/hubs/epolls.py",
 line 62, in do_poll
  17:39:17.802 6173 ERROR neutron return self.poll.poll(seconds)
  17:39:17.802 6173 ERROR neutron   File 
"/opt/stack/neutron/neutron/agent/l3/keepalived_state_change.py", line 134, in 
handle_sigterm
  17:39:17.802 6173 ERROR neutron self._kill_monitor()
  17:39:17.802 6173 ERROR neutron   File 
"/opt/stack/neutron/neutron/agent/l3/keepalived_state_change.py", line 131, in 
_kill_monitor
  17:39:17.802 6173 ERROR neutron run_as_root=True)
  17:39:17.802 6173 ERROR neutron   File 
"/opt/stack/neutron/neutron/agent/linux/utils.py", line 221, in kill_process
  17:39:17.802 6173 ERROR neutron execute(['kill', '-%d' % signal, pid], 
run_as_root=run_as_root)
  17:39:17.802 6173 ERROR neutron   File 
"/opt/stack/neutron/neutron/agent/linux/utils.py", line 155, in execute
  17:39:17.802 6173 ERROR neutron greenthread.sleep(0)
  17:39:17.802 6173 ERROR neutron   File 
"/opt/stack/neutron/.tox/dsvm-functional/lib/python2.7/site-packages/eventlet/greenthread.py",
 line 31, in sleep
  17:39:17.802 6173 ERROR neutron assert hub.greenlet is not current, 'do 
not call blocking functions from the mainloop'
  17:39:17.802 6173 ERROR neutron AssertionError: do not call blocking 
functions from the mainloop
  17:39:17.802 6173 ERROR neutron

  This is what I see when running fullstack l3ha tests, once I enable
  syslog logging for the helper process.
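
  For reference, a hedged sketch of the general eventlet pitfall shown in the
  traceback: the SIGTERM handler runs on the hub's mainloop greenlet, so it
  must not call anything that yields; handing the work off to a new
  greenthread avoids the assertion. Names below are illustrative, not the
  actual keepalived_state_change code.

      import signal
      import eventlet

      def _kill_monitor():
          # placeholder for the real cleanup (runs an external 'kill')
          pass

      def handle_sigterm(signum, frame):
          # Do not block here; schedule the cleanup on a greenthread instead.
          eventlet.spawn_n(_kill_monitor)

      signal.signal(signal.SIGTERM, handle_sigterm)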

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1680183/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1666959] Re: ha_vrrp_auth_type defaults to PASS which is insecure

2020-02-07 Thread Brian Haley
** Changed in: neutron
   Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1666959

Title:
  ha_vrrp_auth_type defaults to PASS which is insecure

Status in neutron:
  Won't Fix
Status in OpenStack Security Advisory:
  Won't Fix

Bug description:
  With l3_ha enabled, ha_vrrp_auth_type defaults to PASS authentication:

  
https://github.com/openstack/neutron/blob/b90ec94dc3f83f63bdb505ace1e4c272435c494b/neutron/conf/agent/l3/ha.py#L28

  which according to http://louwrentius.com/configuring-attacking-and-
  securing-vrrp-on-linux.html is totally insecure because the VRRP
  password is transmitted in the clear.

  I'm not sure if this is currently a serious issue, since if the VRRP
  network is untrusted, maybe there are already bigger problems.  But I
  thought it was worth reporting, at least.
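
  For reference, a hedged sketch of how such a default is typically declared
  with oslo.config and which options an operator would override (this is not
  the actual neutron/conf/agent/l3/ha.py code):

      from oslo_config import cfg

      OPTS = [
          cfg.StrOpt('ha_vrrp_auth_type', default='PASS',
                     choices=['AH', 'PASS'],
                     help='VRRP authentication type'),
          cfg.StrOpt('ha_vrrp_auth_password', secret=True,
                     help='VRRP authentication password'),
      ]

  Switching ha_vrrp_auth_type to AH (and setting ha_vrrp_auth_password) is
  meant to avoid transmitting the credential in clear text, at the cost of
  requiring all L3 agents to share the same configuration.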

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1666959/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1641811] Re: Wrong ha_state, when l3-agent that host the active router is down

2020-02-07 Thread Brian Haley
There have been a number of issues fixed in this area in the past few
releases, closing. If it still happens on a newer release please re-open.

** Changed in: neutron
   Status: Triaged => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1641811

Title:
  Wrong ha_state, when l3-agent that host the active router is down

Status in neutron:
  Invalid

Bug description:
  In an L3 HA Setup with multiple network nodes, we can query the agent
  hosting the Master HA router via l3-agent-list-hosting-router.

  root@node1:~# neutron l3-agent-list-hosting-router demo-router
  +--------------------------------------+-------+----------------+-------+----------+
  | id                                   | host  | admin_state_up | alive | ha_state |
  +--------------------------------------+-------+----------------+-------+----------+
  | 58fbfcf3-6403-4388-b713-523595411de6 | node1 | True           | :-)   | active   |
  | a74be278-e428-41a4-a375-9888e9b99bcd | node2 | True           | :-)   | standby  |
  +--------------------------------------+-------+----------------+-------+----------+

  Now, On the node1, I stop the neutron-l3-agent, and then check the
  state.

  root@node1:~# neutron l3-agent-list-hosting-router demo-router
  +--------------------------------------+-------+----------------+-------+----------+
  | id                                   | host  | admin_state_up | alive | ha_state |
  +--------------------------------------+-------+----------------+-------+----------+
  | 58fbfcf3-6403-4388-b713-523595411de6 | node1 | True           | xxx   | standby  |
  | a74be278-e428-41a4-a375-9888e9b99bcd | node2 | True           | :-)   | standby  |
  +--------------------------------------+-------+----------------+-------+----------+

  You can see that there is no "active" router, but north-south traffic
  still goes through node1 and keepalived works normally. I think the
  ha_state of node1 should be "active".

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1641811/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1668410] Re: [SRU] Infinite loop trying to delete deleted HA router

2020-02-07 Thread Brian Haley
** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1668410

Title:
  [SRU] Infinite loop trying to delete deleted HA router

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive mitaka series:
  Fix Released
Status in neutron:
  Fix Released
Status in OpenStack Security Advisory:
  Won't Fix
Status in neutron package in Ubuntu:
  Invalid
Status in neutron source package in Xenial:
  Fix Released

Bug description:
  [Description]

  When deleting a router the logfile is filled up. See full log -
  http://paste.ubuntu.com/25429257/

  I can see the error 'Error while deleting router
  c0dab368-5ac8-4996-88c9-f5d345a774a6' occured 3343386 times from
  _safe_router_removed() [1]:

  $ grep -r 'Error while deleting router c0dab368-5ac8-4996-88c9-f5d345a774a6' 
|wc -l
  3343386

  This _safe_router_removed() is invoked by L488 [2], if
  _safe_router_removed() goes wrong it will return False, then
  self._resync_router(update) [3] will make the code
  _safe_router_removed be run again and again. So we saw so many errors
  'Error while deleting router X'.

  [1] 
https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L361
  [2] 
https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L488
  [3] 
https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L457
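
  A hedged sketch of the loop described above (not the actual agent code): a
  failed removal simply re-queues the same update, which fails again,
  producing the endless "Error while deleting router" messages.

      def safe_router_removed(router_id):
          # Returns False when deletion raises, e.g. because the synced HA
          # router payload is missing its HA interface information.
          return False

      def process_router_update(update, queue):
          if update['action'] == 'removed':
              if not safe_router_removed(update['router_id']):
                  queue.append(update)    # resync: the same update comes back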

  [Test Case]

  That's because of a race condition between the neutron server and the L3
  agent: after the neutron server deletes the HA interfaces, the L3 agent may
  sync an HA router without HA interface info (one just needs to trigger
  L708 [1] after deleting the HA interfaces and before deleting the HA
  router). If we delete the HA router at this time, this problem will happen.
  So the test case we designed is as below:

  1, First update fixed package, and restart neutron-server by 'sudo
  service neutron-server restart'

  2, Create ha_router

  neutron router-create harouter --ha=True

  3, Delete ports associated with ha_router before deleting ha_router

  neutron router-port-list harouter |grep 'HA port' |awk '{print $2}' |xargs -l 
neutron port-delete
  neutron router-port-list harouter

  4, Update ha_router to trigger l3-agent to update ha_router info
  without ha_port into self.router_info

  neutron router-update harouter --description=test

  5, Delete ha_router this time

  neutron router-delete harouter

  [1] https://github.com/openstack/neutron/blob/mitaka-
  eol/neutron/db/l3_hamode_db.py#L708

  [Regression Potential]

  The fixed patch [1] for neutron-server will no longer return an ha_router
  which is missing ha_ports, so L488 will no longer have a chance to call
  _safe_router_removed() for an ha_router; the problem is fundamentally
  fixed by this patch and there is no regression potential.

  Besides, this fixed patch has been in mitaka-eol branch now, and
  neutron-server mitaka package is based on neutron-8.4.0, so we need to
  backport it to xenial and mitaka.

  $ git tag --contains 8c77ee6b20dd38cc0246e854711cb91cffe3a069
  mitaka-eol

  [1] https://review.openstack.org/#/c/440799/2/neutron/db/l3_hamode_db.py
  [2] 
https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L488

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1668410/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1510757] Re: tempest test api.network; l3 agent can't delete HA-router

2020-02-07 Thread Brian Haley
Closing as I don't think this happens any more.

** Changed in: neutron
   Status: Confirmed => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1510757

Title:
  tempest test api.network;l3 agent can't delete HA-router

Status in neutron:
  Invalid

Bug description:
  I use tempest to test my company's OpenStack environment.

  # tox -eall -- tempest.api.network

  When tox finishes, the L3 agent log always shows:

  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent [-] Error while 
deleting router da4b28ce-33b1-4000-8609-a41a2ab8c982
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent Traceback (most 
recent call last):
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File 
"/usr/local/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 344, 
in _safe_router_removed
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent 
self._router_removed(router_id)
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File 
"/usr/local/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 362, 
in _router_removed
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent ri.delete(self)
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File 
"/usr/local/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 
364, in delete
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent 
super(HaRouter, self).delete(agent)
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File 
"/usr/local/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 
273, in delete
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent 
self.process(agent)
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File 
"/usr/local/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 
370, in process
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent 
super(HaRouter, self).process(agent)
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File 
"/usr/local/lib/python2.7/dist-packages/neutron/common/utils.py", line 359, in 
call
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent self.logger(e)
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File 
"/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 197, in 
__exit__
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent 
six.reraise(self.type_, self.value, self.tb)
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File 
"/usr/local/lib/python2.7/dist-packages/neutron/common/utils.py", line 356, in 
call
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent return 
func(*args, **kwargs)
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File 
"/usr/local/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 
695, in process
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent 
self.routes_updated()
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File 
"/usr/local/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 
181, in routes_updated
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent instance = 
self._get_keepalived_instance()
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File 
"/usr/local/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 
131, in _get_keepalived_instance
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent return 
self.keepalived_manager.config.get_instance(self.ha_vr_id)
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent AttributeError: 
'NoneType' object has no attribute 'config'

  I think the reason is that tempest creates and deletes routers too fast:
  while the l3-agent is creating the HA router, tempest deletes the router
  and the neutron-server deletes the HA interface.

  https://github.com/openstack/neutron/blob/master/neutron/agent/l3/ha_router.py#L79

  The HaRouter class can't be initialized without the HA interface
  information; it just returns without calling _init_keepalived_manager.
  When the l3-agent later deletes the router, it reports the AttributeError:
  'NoneType' shown above.

  When the l3-agent can't delete the router, it keeps doing a fullsync with
  the neutron-server every 30 seconds. On the controller, the neutron-server
  CPU usage stays around 70%.

  The L3 agent should add a check before creating an HA router: if it finds
  that the HA interface is None, the router has already been deleted on the
  neutron-server side (a minimal sketch of such a guard follows).
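
  A minimal sketch of such a guard, with illustrative names rather than the
  actual neutron code:

      def should_initialize_ha_router(router):
          # If the HA interface is already gone, the server has deleted the
          # router; skip initialization instead of failing later with an
          # AttributeError on a missing keepalived manager.
          return router.get('_ha_interface') is not None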

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1510757/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1574092] Re: No router namespace after creating legacy router

2020-02-07 Thread Brian Haley
** Changed in: neutron
   Status: Confirmed => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1574092

Title:
  No router namespace after creating legacy router

Status in neutron:
  Won't Fix

Bug description:
  In case there are temporary MQ connectivity problems during router
  creation, the notification sent by l3_notifier via RPC cast gets lost.
  This leads to the absence of the qrouter namespace on controllers.

  The issue was first faced on mos HA (3 controllers) build -
  https://bugs.launchpad.net/mos/10.0.x/+bug/1529820

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1574092/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1828547] Re: neutron-dynamic-routing TypeError: argument of type 'NoneType' is not iterable

2020-02-07 Thread Brian Haley
** Project changed: neutron => networking-bgp

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1828547

Title:
  neutron-dynamic-routing TypeError: argument of type 'NoneType' is not
  iterable

Status in networking-bgp:
  New

Bug description:
Rocky with Ryu; I don't have a reproducer for this one and don't know what
caused it in the first place.

  python-neutron-13.0.3-1.el7.noarch
  openstack-neutron-openvswitch-13.0.3-1.el7.noarch
  python2-neutron-dynamic-routing-13.0.1-1.el7.noarch
  openstack-neutron-bgp-dragent-13.0.1-1.el7.noarch
  openstack-neutron-common-13.0.3-1.el7.noarch
  openstack-neutron-ml2-13.0.3-1.el7.noarch
  python2-neutronclient-6.9.0-1.el7.noarch
  openstack-neutron-13.0.3-1.el7.noarch
  openstack-neutron-dynamic-routing-common-13.0.1-1.el7.noarch
  python2-neutron-lib-1.18.0-1.el7.noarch

  
  python-ryu-common-4.26-1.el7.noarch
  python2-ryu-4.26-1.el7.noarch

  
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server Traceback (most 
recent call last):
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in 
_process_incoming
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server res = 
self.dispatcher.dispatch(message)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, 
in dispatch
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server return 
self._do_dispatch(endpoint, method, ctxt, args)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, 
in _do_dispatch
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server result = 
func(ctxt, **new_args)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server result = 
f(*args, **kwargs)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in 
inner
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server return 
f(*args, **kwargs)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py",
 line 185, in bgp_speaker_create_end
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server 
self.add_bgp_speaker_helper(bgp_speaker_id)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server result = 
f(*args, **kwargs)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py",
 line 249, in add_bgp_speaker_helper
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server 
self.add_bgp_speaker_on_dragent(bgp_speaker)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server result = 
f(*args, **kwargs)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py",
 line 359, in add_bgp_speaker_on_dragent
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server 
self.add_bgp_peers_to_bgp_speaker(bgp_speaker)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server result = 
f(*args, **kwargs)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py",
 line 390, in add_bgp_peers_to_bgp_speaker
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server bgp_peer)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server result = 
f(*args, **kwargs)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py",
 line 399, in add_bgp_peer_to_bgp_speaker
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server 
self.cache.put_bgp_peer(bgp_speaker_id, bgp_peer)
  2019-05-09 

[Yahoo-eng-team] [Bug 1661717] Re: [linuxbridge agent] vm can't communicate with router with l2pop

2020-02-07 Thread Brian Haley
Since there is a workaround, and this only affects the Linux Bridge
agent, which is not actively maintained, I'm closing this as it doesn't
seem like it will be fixed.

** Changed in: neutron
   Status: Confirmed => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1661717

Title:
  [linuxbridge agent] vm can't communicate with router with l2pop

Status in neutron:
  Won't Fix

Bug description:
  When both l2pop and arp_responder enabled for linuxbridge agent, vxlan
  device is created in "proxy" mode. In this mode, ARP entry must be
  statically added by linuxbridge agent. Because of [1], l2pop driver
  won't notify HA router port, so linuxbridge agent can't add ARP entry
  for router port. As there is no router ARP entry, vxlan device is
  dropping ARP request packets from vm(destined to router), making vm
  unable to communicate with router.

  This issue is only on linuxbridge agent and not on ovs agent.
  A temporary workaround that lets VMs communicate with an HA router is to
  disable arp_responder when l2pop is enabled.
  If users need both the arp_responder and l2pop features with the
  linuxbridge agent, we need an implementation that decouples them, i.e.
  https://bugs.launchpad.net/neutron/+bug/1518392

  [1] https://review.openstack.org/#/c/255237/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1661717/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1252900] Re: Directional network performance issues with Neutron + OpenvSwitch

2020-02-07 Thread Brian Haley
** Changed in: neutron
   Status: Incomplete => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1252900

Title:
  Directional network performance issues with Neutron + OpenvSwitch

Status in neutron:
  Won't Fix
Status in openstack-manuals:
  Fix Released
Status in openvswitch:
  New
Status in Ubuntu:
  Confirmed

Bug description:
  Hello!

  Currently, the Havana L3 router has a serious issue, which makes it
  almost useless (sorry, I do not want to be rude; I am just trying to
  bring more attention to this problem).

  When the tenant network traffic passes through the L3 router (a namespace
  on the network node), it becomes very, very slow and intermittent. The
  issue also affects traffic that hits a "Floating IP" going into the
  tenant subnet.

  The affected topology is: "Per-Tenant Router with Private Networks".

  As a reference, I'm using the following Grizzly guide for my Havana
  deployment:

  https://github.com/mseknibilel/OpenStack-Grizzly-Install-
  Guide/blob/OVS_MultiNode/OpenStack_Grizzly_Install_Guide.rst

  Extra info:

  http://docs.openstack.org/havana/install-guide/install/apt/content
  /section_networking-routers-with-private-networks.html

  The symptoms are:

  1- "Slow connection to Canonical or when browsing the web from within
  a tenant subnet"

  aptitude update ; aptitude safe-upgrade

  From within a Tenant instance, it will take about 1 hour to finish, on
  a link capable of finishing it in 2~3 minutes.

  2- SSH connections using Floating IPs freeze 10 times per minute.

  Connecting from the outside world into an instance using its Floating
  IP address is a pain.

  We're talking about this issue at the OpenStack mail list, here is the
  related thread:
  http://lists.openstack.org/pipermail/openstack/2013-November/002705.html

  Also, I made a video about it, watch it here:
  http://www.youtube.com/watch?v=jVjiphMuuzM

  Tested versions:

  * OpenStack Havana on top of Ubuntu 12.04.3 using Ubuntu Cloud Archive

  * Tested with Open vSwitch versions (none of it works):

  1.10.2 from UCA
  1.11.0 compiled for Ubuntu 12.04.3 using "dpkg-buildpackage"
  1.9.0 from Ubuntu package "openvswitch-datapath-lts-raring-dkms"

  * Not tested (maybe it will work):

  Havana with Ubuntu 12.04.1 + OVS 1.4.0 (does not support VXLAN).

  * Tenant subnet tested types:

  VXLAN
  GRE
  VLAN

  It does not matter the subnet type you choose, it will be always slow.

  Apparently, if you upgrade your Grizzly deployment from Ubuntu 12.04.1 +
  OVS 1.4.0 to Ubuntu 12.04.3 with OVS 1.9.0, it will trigger this problem
  with Grizzly too. So, I think that this problem might be related to Open
  vSwitch itself. But I need more time to check this.

  My private cloud computing based on Havana is open for you guys to
  debug it, just ask for an access!   =)

  My current plan is to test Havana with OVS 1.4.0, but I don't have much
  time this week to do this.

  I'm not sure if the problem is with OVS or not, I'll try to test it
  this week.

  Also, in my video, you can see how I "fixed" it by starting a Squid
  proxy-cache server within the tenant namespace router, proving that the
  problem appears ONLY when you try to establish a connection from a tenant
  subnet directly to the external network.

  I mean, the connection between a tenant and its router is okay, and from
  its router to the Internet is also okay, but from a tenant to the
  Internet it is not. So, Squid was a perfect choice to verify this theory
  in the namespace router... And voilà! "There I fixed it"!   =P

  Please, let me know what configuration files do you guys will need to
  be able to reproduce this problem.

  Best!
  Thiago

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1252900/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1845557] Re: DVR: FWaaS rules created for a router after the FIP and VM created, not applied to routers rfp port on router-update

2020-02-07 Thread Brian Haley
** Changed in: neutron
   Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1845557

Title:
  DVR: FWaaS rules created for a router after the FIP and VM created,
  not applied to routers rfp port on router-update

Status in neutron:
  Fix Released

Bug description:
  This was seen in Rocky.

  When a network, subnet, router and a VM instance with a FloatingIP are
  created before attaching firewall rules to the router, the firewall rules
  are not applied to the 'rfp' port for north-south routing when using
  Firewall-as-a-Service in legacy 'iptables' mode.

  After applying the firewall rules to the router, it is expected that the
  router-update would trigger adding the firewall rules to the existing
  routers, but the logic is not right.

  Any new VMs added to the subnet on a new compute host, gets the
  Firewall rules applied to the 'rfp' interface.

  So the only way to get around this problem is to restart the
  'l3-agent'. Once the 'l3-agent' is restarted, the Firewall rules are
  applied again.

  This is also true when Firewall rules are removed after the VM and
  routers are in place, since the update is not handled properly, the
  firewall rules may stay there until we reboot the l3-agent.

  How to reproduce this problem:

  This is FWaaS v2 with legacy 'iptables':

  1. Create a Network
  2. Create a Subnet
  3. Create a Router (DVR)
  4. Attach the Subnet to the router.
  5. Assign the gateway to the router.
  6. Create a VM on the given private network.
  7. Create a FloatingIP and associate the FloatingIP to the VM's private IP.
  8. Now the VM, router, fipnamespace are all in place.
  9. Now create firewall rules:
   neutron firewall-rule-create --protocol icmp --action allow --name allow-icmp
   neutron firewall-rule-create --protocol tcp --destination-port 80 --action deny --name deny-http
   neutron firewall-rule-create --protocol tcp --destination-port 22 --action allow --name allow-ssh
  10. Then create a firewall policy:
    neutron firewall-policy-create --firewall-rules "allow-icmp deny-http allow-ssh" policy-fw
  11. Create a firewall:
     neutron firewall-create policy-fw --name user-fw
  12. Check that the firewall was created:
     neutron firewall-show user-fw
  13. If the firewall was created after the routers have been created, based on the documentation you need to manually update the routers:
     $ neutron firewall-update --router ... --router ...

  14. After the update we would expect both existing routers, router-1 and
  router-2, to have the firewall rules.

  But we don't see it configured for router-1, which was created before the
  firewall was created, and so the VM is not protected by the firewall rules.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1845557/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1717302] Re: Tempest floatingip scenario tests failing on DVR Multinode setup with HA

2020-02-07 Thread Brian Haley
** Changed in: neutron
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1717302

Title:
  Tempest floatingip scenario tests failing on DVR Multinode setup with
  HA

Status in neutron:
  Fix Released

Bug description:
  neutron.tests.tempest.scenario.test_floatingip.FloatingIpSameNetwork and
  neutron.tests.tempest.scenario.test_floatingip.FloatingIpSeparateNetwork are 
failing on every patch.

  This trace is seen on the node-2 l3-agent.

  Sep 13 07:16:43.404250 ubuntu-xenial-2-node-rax-dfw-10909819-895688 
neutron-keepalived-state-change[5461]: ERROR neutron.agent.linux.ip_lib [-] 
Failed sending gratuitous ARP to 172.24.5.3 on qg-bf79c157-e2 in namespace 
qrouter-796b8715-ca01-43ad-bc08-f81a0b4db8cc: Exit code: 2; Stdin: ; Stdout: ; 
Stderr: bind: Cannot assign requested address

 : ProcessExecutionError: Exit code: 2; Stdin: ; 
Stdout: ; Stderr: bind: Cannot assign requested address

 ERROR neutron.agent.linux.ip_lib Traceback (most 
recent call last):

 ERROR neutron.agent.linux.ip_lib   File 
"/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 1082, in _arping

 ERROR neutron.agent.linux.ip_lib 
ip_wrapper.netns.execute(arping_cmd, extra_ok_codes=[1])

 ERROR neutron.agent.linux.ip_lib   File 
"/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 901, in execute

 ERROR neutron.agent.linux.ip_lib 
log_fail_as_error=log_fail_as_error, **kwargs)

 ERROR neutron.agent.linux.ip_lib   File 
"/opt/stack/new/neutron/neutron/agent/linux/utils.py", line 151, in execute

 ERROR neutron.agent.linux.ip_lib raise 
ProcessExecutionError(msg, returncode=returncode)

 ERROR neutron.agent.linux.ip_lib 
ProcessExecutionError: Exit code: 2; Stdin: ; Stdout: ; Stderr: bind: Cannot 
assign requested address

 ERROR neutron.agent.linux.ip_lib

 ERROR neutron.agent.linux.ip_lib

  If this is a DVR router, then the GARP should not go through the qg
  interface for the floatingIP.

  More information can be seen here.

  http://logs.openstack.org/43/500143/5/check/gate-tempest-dsvm-neutron-
  dvr-multinode-scenario-ubuntu-xenial-
  
nv/0a58fce/logs/subnode-2/screen-q-l3.txt.gz?level=TRACE#_Sep_13_07_16_47_864052

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1717302/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1595043] Re: Make DVR portbinding implementation useful for HA ports

2020-02-07 Thread Brian Haley
** Changed in: neutron
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1595043

Title:
  Make DVR portbinding implementation useful for HA ports

Status in neutron:
  Fix Released

Bug description:
  Make DVR portbinding implementation generic so that it will be useful
  for all distributed router ports(for example, HA router ports).

  Currently HA interface port binding is implemented as a normal port
  binding i.e it uses only ml2_port_bindings table, with host set to
  master host. When a new host becomes master, this binding will be
  updated. But this approach has issues as explained in
  https://bugs.launchpad.net/neutron/+bug/1522980

  As HA router ports(DEVICE_OWNER_HA_REPLICATED_INT, DEVICE_OWNER_ROUTER_SNAT 
for DVR+HA) are distributed ports like DVR, we will follow DVR approach of port 
binding for HA router ports.
  So we make DVR port binding generic, so that it can be used for all 
distributed router ports.

  To make DVR port binding generic for all distributed router ports, we need to
  1) rename the ml2_dvr_port_bindings table to ml2_distributed_port_bindings
  2) rename the functions updating/accessing this table
  3) replace the 'if' condition for DVR ports with one for distributed ports
     (see the sketch below), for example, replace
     if port['device_owner'] == const.DEVICE_OWNER_DVR_INTERFACE:
     with
     if distributed_router_port(port):
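
  A minimal sketch of the proposed helper, assuming the device-owner
  constants named above are exposed by neutron-lib (illustrative only, not
  the merged implementation):

      from neutron_lib import constants as const

      DISTRIBUTED_PORT_OWNERS = (
          const.DEVICE_OWNER_DVR_INTERFACE,
          const.DEVICE_OWNER_HA_REPLICATED_INT,
          const.DEVICE_OWNER_ROUTER_SNAT,
      )

      def distributed_router_port(port):
          return port.get('device_owner') in DISTRIBUTED_PORT_OWNERS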

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1595043/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1835731] Re: Neutron server error: failed to update port DOWN

2020-02-07 Thread Brian Haley
** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1835731

Title:
  Neutron server error: failed to update port DOWN

Status in neutron:
  Fix Released

Bug description:
  Before adding extra logging:

  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc [req-
  2b7602fb-e990-45ee-974f-ef3b55b41bed - - - - -] Failed to update
  device d75fca78-2f64-4c5a-9a94-6684c753bf3d down

  After adding logging:

  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc [req-2b7602fb-e990-45ee-974f-ef3b55b41bed - - - - -] Failed to update device d75fca78-2f64-4c5a-9a94-6684c753bf3d down: 'NoneType' object has no attribute 'started_at': AttributeError: 'NoneType' object has no attribute 'started_at'
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc Traceback (most recent call last):
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 367, in update_device_list
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     **kwargs)
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 233, in update_device_down
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     n_const.PORT_STATUS_DOWN, host)
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 319, in notify_l2pop_port_wiring
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     agent_restarted = l2pop_driver.obj.agent_restarted(port_context)
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 253, in agent_restarted
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     if l2pop_db.get_agent_uptime(agent) < cfg.CONF.l2pop.agent_boot_time:
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/db.py", line 51, in get_agent_uptime
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     return timeutils.delta_seconds(agent.started_at,
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc AttributeError: 'NoneType' object has no attribute 'started_at'
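
  The failure comes from agent_restarted() computing an uptime for an agent
  record that was never found (None). A minimal sketch of the kind of guard
  that avoids this AttributeError, written as a method of the mech driver
  reusing its existing l2pop_db and cfg imports (illustrative only; the lookup
  helper name is an assumption and this is not necessarily the merged fix):

      def agent_restarted(self, port_context):
          # 'lookup_agent' stands in for however the l2pop driver fetches
          # the agent record for the port's host (name assumed here).
          agent = lookup_agent(port_context.host)
          if agent is None or agent.started_at is None:
              # Without an agent record (or a start timestamp) no uptime can
              # be computed, so do not treat the agent as freshly restarted.
              return False
          return (l2pop_db.get_agent_uptime(agent) <
                  cfg.CONF.l2pop.agent_boot_time)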

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1835731/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1766701] Re: Trunk Tests are failing often in dvr-multinode scenario job

2020-02-07 Thread Brian Haley
** Changed in: neutron
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1766701

Title:
  Trunk Tests are failing often in dvr-multinode scenario job

Status in neutron:
  Fix Released

Bug description:
  In about 40% of test runs, tests like the following are failing:

  neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle, example runs:
  * http://logs.openstack.org/03/560703/7/check/neutron-tempest-plugin-dvr-multinode-scenario/1f67afd/logs/testr_results.html.gz
  * http://logs.openstack.org/17/553617/19/check/neutron-tempest-plugin-dvr-multinode-scenario/a13a6fd/logs/testr_results.html.gz
  * http://logs.openstack.org/84/533284/5/check/neutron-tempest-plugin-dvr-multinode-scenario/1c09aa6/logs/testr_results.html.gz

  neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_subport_connectivity, example run:
  * http://logs.openstack.org/90/545490/9/check/neutron-tempest-plugin-dvr-multinode-scenario/c1ed535/logs/testr_results.html.gz

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1766701/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1753434] Re: Unbound ports floating ip not working with address scopes in DVR HA

2020-02-07 Thread Brian Haley
** Changed in: neutron
   Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1753434

Title:
  Unbound ports floating ip not working with address scopes in DVR HA

Status in neutron:
  Fix Released

Bug description:
  Using the latest stable Pike build.

  This commit properly addressed the problem of centralized floating IPs for
  unbound ports:
  https://git.openstack.org/cgit/openstack/neutron/commit/?id=8b4bb9c0b057da175f2d773f8257de3e571aed4e

  However, traffic towards an unbound port (an Octavia Pike VIP) is getting
  blocked in the snat namespace when address scopes are used:

  Chain neutron-l3-agent-scope (1 references)
   pkts bytes target  prot opt in   out             source    destination
     23  1612 DROP    all  --  any  sg-775c0432-f1  anywhere  anywhere      mark match ! 0x401/0x

  It works properly with a centralized HA router with address scopes, and
  with DVR HA without address scopes.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1753434/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1526855] Re: VMs fail to get metadata in large scale environments

2020-02-07 Thread Brian Haley
I am going to close this bug, partly because it is so old without any
updates, but also because there have been a number of scaling improvements
over the past few cycles, so this is probably no longer much of an issue.

** Changed in: neutron
   Status: Confirmed => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1526855

Title:
  VMs fail to get metadata in large scale environments

Status in neutron:
  Won't Fix

Bug description:
  In large-scale environments, instances can fail to get their metadata.
  Tests were performed in a 100-compute-node environment creating 4000 VMs;
  15-20 VMs failed all 20 metadata request attempts. This has been reproduced
  multiple times with similar results. All of the VMs successfully obtain a
  private IP and are pingable, but a small number of VMs fail to be reachable
  via ssh. Increasing the number of metadata request attempts in a CirrOS
  test image shows that the metadata requests eventually succeed; it just
  takes a long time in some cases. Below is a list of instance UUIDs
  collected from one test and the number of metadata request attempts it took
  to be successful. These tests were performed using stable/liberty with a
  small number of patches cherry-picked from Mitaka intended to improve
  performance.

  
  705c3582-f39b-482d-9a6e-d78bc033d3e7    5
  27f93574-19fe-4b88-ad6e-c518022ef66a    2
  ff668db8-196e-4ec3-82d9-f7ab5a302305   57
  b3f97acb-6374-4406-9474-7bacfc3486cd   42
  80c19187-7c19-4adc-ad3a-51342f00d799   51
  071f60d5-2a9a-4448-b14b-9016c9eee4eb   47
  d39f336e-0fb4-4934-b835-e791661d60f1   36
  a5627d9f-fd2d-48b0-ada2-f519a97849ee    5
  3c24145e-8e11-4e79-8618-fca0416ea030   41
  a36ab8fd-4e53-4265-a2bf-6945ac5d8811   46
  a9400361-8941-4f03-b11d-0940b5499b4b   37
  7449efbd-1df6-4fcc-88d5-e4e355ae7963   24
  45c6a108-c18b-4284-9ede-3e5f8d7851be   30
  fbe7c6fc-6aec-464c-87b7-0800836f7754    7
  cb5a3a49-45b9-40de-8c62-903bee1925f4   37
  0c7151ce-79dc-4d55-a617-7f4182cb2194   14
  0f1c24a0-3b97-4d56-8feb-b30d67cf6852   44
  8c359465-198f-4654-84bb-f334f0400d58   10
  b3a5a3df-28c4-40c3-adba-856a0fcbd29e   55
  38ee6525-441e-4640-a998-ad89b8d3f8be    2
  07ecde16-c274-481e-8169-4febb15c7273   48
  f77cd7aa-89e2-4d2c-a89f-e19ff430e5a4   31
  b9acdba1-1794-4fa8-bbe3-ffb94f86d19b    3
  30824aa6-3df5-4a43-a701-dd33da7f704f   13
  5216ffc0-4a8d-4a3e-a4e3-5473b96ca47b   40
  999512ff-70e3-4cfd-9cb4-c5788a02fee6    4

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1526855/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1789434] Re: neutron_tempest_plugin.scenario.test_migration.NetworkMigrationFromHA failing 100% times

2020-02-07 Thread Brian Haley
** Changed in: neutron
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1789434

Title:
  neutron_tempest_plugin.scenario.test_migration.NetworkMigrationFromHA
  failing 100% times

Status in neutron:
  Fix Released

Bug description:
  For the past few days, all migration tests from DVR routers have been failing.
  Example of a failure:
  http://logs.openstack.org/37/382037/71/check/neutron-tempest-plugin-dvr-multinode-scenario/605ed17/logs/testr_results.html.gz
  It may be related somehow to https://review.openstack.org/#/c/589410/ but I'm
  not sure yet.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1789434/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1861670] Re: AttributeError: 'NetworkConnectivityTest' object has no attribute 'safe_client'

2020-02-07 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/705413
Committed: 
https://git.openstack.org/cgit/openstack/neutron-tempest-plugin/commit/?id=2a71a8966492adb222e6fc289e77f7afc681d082
Submitter: Zuul
Branch:master

commit 2a71a8966492adb222e6fc289e77f7afc681d082
Author: Slawek Kaplonski 
Date:   Mon Feb 3 11:48:34 2020 +0100

Fix test_connectivity_dvr_and_no_dvr_routers_in_same_subnet test

This patch fixes couple of issues in scenario test from
test_connectivity module.

1. Replace safe_client with the client object
   The NetworkConnectivityTest class used safe_client, but there is no
   such attribute in this class. The "client" object should be used
   instead.

2. It also fixes, in the same test, how the external network's subnet
   ID is retrieved from the network's info.

3. Change to use admin_client to get the details of the external
   network's subnet, as this subnet doesn't belong to the tenant user,
   so a regular client gets a 404 error when running the subnet_show
   command.

4. Check the subnets IP version to retrieve only an IPv4 one.

Change-Id: Ibebb20b29dd6ae902d194fd26ba1ea728a976286
Closes-bug: #1861670
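
For illustration, a hedged sketch of the pattern the change above describes
(the client attributes and method names here are assumptions for
illustration, not the exact neutron-tempest-plugin code):

    # Use the regular client (instead of the non-existent safe_client) for
    # the network lookup, and an admin client for the external subnets,
    # which do not belong to the tenant.
    ext_network = self.client.show_network(self.external_network_id)['network']

    ext_subnet_v4 = None
    for subnet_id in ext_network['subnets']:
        subnet = self.admin_client.show_subnet(subnet_id)['subnet']
        if subnet['ip_version'] == 4:
            # Keep only the IPv4 subnet of the external network.
            ext_subnet_v4 = subnet
            break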


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1861670

Title:
  AttributeError: 'NetworkConnectivityTest' object has no attribute
  'safe_client'

Status in neutron:
  Fix Released

Bug description:
  Since a few days ago, the test
  neutron_tempest_plugin.scenario.test_connectivity.NetworkConnectivityTest.test_connectivity_dvr_and_no_dvr_routers_in_same_subnet
  has been failing with an error like:

  ft1.1: 
neutron_tempest_plugin.scenario.test_connectivity.NetworkConnectivityTest.test_connectivity_dvr_and_no_dvr_routers_in_same_subnet[id-69d3650a-5c32-40bc-ae56-5c4c849ddd37]testtools.testresult.real._StringException:
 Traceback (most recent call last):
File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 108, in 
wrapper
  return func(*func_args, **func_kwargs)
File 
"/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/neutron_tempest_plugin/scenario/test_connectivity.py",
 line 188, in test_connectivity_dvr_and_no_dvr_routers_in_same_subnet
  ext_network = self.safe_client.show_network(self.external_network_id)
  AttributeError: 'NetworkConnectivityTest' object has no attribute 
'safe_client'

  Logstash query:
  
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AttributeError%3A%20'NetworkConnectivityTest'%20object%20has%20no%20attribute%20'safe_client'%5C%22

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1861670/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1862394] [NEW] Nova ignores delete requests while instance is in deleting state

2020-02-07 Thread Vladyslav Drok
Public bug reported:

Right now the code in the compute.api delete methods ignores delete requests
if the instance is already in the deleting state
(https://github.com/openstack/nova/blob/69ce0f01b60dfe0f020ac57eb82a42e5935064c4/nova/compute/api.py#L2257-L2262).
This was the result of the discussion in
https://bugs.launchpad.net/nova/+bug/1248563 and the mailing list thread
referenced there. However, now that Python 2 is EOL, it is possible to allow
multiple delete requests without having to worry about delete requests piling
up waiting on the instance uuid lock, provided the lock is acquired with a
timeout. Python 3 supports passing a timeout argument to lock.acquire, so it
would be a fairly easy change to oslo.concurrency to allow passing that
timeout through (for example using an acquire call with a timeout in
https://github.com/openstack/oslo.concurrency/blob/c08159119e605dea76580032ca85834d1de21d3e/oslo_concurrency/lockutils.py#L156-L162).
The instance deletion flow could then use this style of lock acquisition and,
if the lock was not acquired, allow the user to retry later.
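
As a rough illustration of the idea (a sketch under assumptions, not the
actual nova or oslo.concurrency code; the helper and exception names below
are made up), the per-instance lock would be acquired with a timeout and a
failed acquisition turned into a retryable error instead of the request
being silently ignored:

    import threading

    # Hypothetical per-instance locks; nova actually synchronizes on the
    # instance uuid via oslo.concurrency decorators.
    _instance_locks = {}

    class DeleteInProgress(Exception):
        """Raised so the API can tell the user to retry the delete later."""

    def delete_instance(instance_uuid, do_delete, timeout=5.0):
        lock = _instance_locks.setdefault(instance_uuid, threading.Lock())
        # Python 3 allows a timeout here; under Python 2 this was not
        # possible, which is why concurrent deletes used to be ignored.
        if not lock.acquire(timeout=timeout):
            raise DeleteInProgress(
                "Instance %s is already being deleted, retry later"
                % instance_uuid)
        try:
            do_delete(instance_uuid)
        finally:
            lock.release()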

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1862394

Title:
  Nova ignores delete requests while instance is in deleting state

Status in OpenStack Compute (nova):
  New

Bug description:
  Right now the code in the compute.api delete methods ignores delete
  requests if the instance is already in the deleting state
  (https://github.com/openstack/nova/blob/69ce0f01b60dfe0f020ac57eb82a42e5935064c4/nova/compute/api.py#L2257-L2262).
  This was the result of the discussion in
  https://bugs.launchpad.net/nova/+bug/1248563 and the mailing list thread
  referenced there. However, now that Python 2 is EOL, it is possible to
  allow multiple delete requests without having to worry about delete
  requests piling up waiting on the instance uuid lock, provided the lock is
  acquired with a timeout. Python 3 supports passing a timeout argument to
  lock.acquire, so it would be a fairly easy change to oslo.concurrency to
  allow passing that timeout through (for example using an acquire call with
  a timeout in
  https://github.com/openstack/oslo.concurrency/blob/c08159119e605dea76580032ca85834d1de21d3e/oslo_concurrency/lockutils.py#L156-L162).
  The instance deletion flow could then use this style of lock acquisition
  and, if the lock was not acquired, allow the user to retry later.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1862394/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1862374] [NEW] Neutron incorrectly selects subnet

2020-02-07 Thread David O Neill
Public bug reported:

Distro bionic
Openstack version: openstack testing cloud

When using the command

openstack server add floating ip --fixed-ip-address 10.66.0.18 juju-53a7bc-north-0 10.245.163.125

for the following machine

ubuntu@dmzoneill-bastion:~$ openstack server list -c ID -c Name -c Networks
+--------------------------------------+--------------------------+------------------------------------------------------------------+
| ID                                   | Name                     | Networks                                                         |
+--------------------------------------+--------------------------+------------------------------------------------------------------+
| 063c2e8e-4d57-4267-a506-4c7b336e71b6 | juju-53a7bc-north-0      | north=10.66.0.18; south=10.55.0.9                                |
| 670bd827-f570-439f-b17f-91710849     | juju-6de968-south-0      | south=10.55.0.4                                                  |
| 0cd3c498-2826-4868-9384-e12e0799f903 | juju-855490-default-0    | dmzoneill_admin_net=10.5.0.6                                     |
| 0b72ccd8-b694-45c7-86be-870511426140 | juju-370447-controller-0 | north=10.66.0.14; south=10.55.0.3; dmzoneill_admin_net=10.5.0.8  |
| 40c68cb2-4e20-4d15-a82c-4c4252b8a0da | dmzoneill-bastion        | dmzoneill_admin_net=10.5.0.7, 10.245.162.200                     |
+--------------------------------------+--------------------------+------------------------------------------------------------------+

Neutron returns the error

GET call to network for 
http://10.245.161.159:9696/v2.0/ports?device_id=063c2e8e-4d57-4267-a506-4c7b336e71b6
 used request id req-d915168a-32c6-4c74-9a83-1ef090b376d8
Manager serverstack ran task network.GET.ports in 0.122786998749s
Manager serverstack running task network.PUT.floatingips
REQ: curl -g -i -X PUT 
http://10.245.161.159:9696/v2.0/floatingips/0c771099-ca95-4447-8a60-5f64a590d943
 -H "User-Agent: osc-lib/1.9.0 keystoneauth1/3.4.0 python-requests/2.18.4 
CPython/2.7.15+" -H "Content-Type: application/json" -H "X-Auth-Token: 
{SHA1}9cf2688baf3b5c3e5dea7e2f7faa6554ee1b6bfb" -d '{"floatingip": 
{"fixed_ip_address": "10.66.0.18", "port_id": 
"6f8626ba-ae8c-492a-93ad-3f349c600a3b"}}'
http://10.245.161.159:9696 "PUT 
/v2.0/floatingips/0c771099-ca95-4447-8a60-5f64a590d943 HTTP/1.1" 400 169
RESP: [400] Content-Type: application/json Content-Length: 169 
X-Openstack-Request-Id: req-1a8be2ec-456e-483d-a4e0-5ab204c55c2d Date: Fri, 07 
Feb 2020 14:45:13 GMT Connection: keep-alive
RESP BODY: {"NeutronError": {"message": "Bad floatingip request: Port 
6f8626ba-ae8c-492a-93ad-3f349c600a3b does not have fixed ip 10.66.0.18.", 
"type": "BadRequest", "detail": ""}}

Neutron seems to look at the list of networks associated with the server and
pops the last network (south) from the list.

Neutron then selects the south port 6f8626ba-ae8c-492a-93ad-3f349c600a3b,
which is not in the requested subnet, and errors out.

ubuntu@dmzoneill-bastion:~$ openstack port list -c ID -c "Fixed IP Addresses"
+--------------------------------------+----------------------------------------------------------------------------+
| ID                                   | Fixed IP Addresses                                                         |
+--------------------------------------+----------------------------------------------------------------------------+
| 1ad11c89-8574-49fc-9efe-b4a0b73b01eb | ip_address='10.55.0.2', subnet_id='eec9530d-df77-41eb-8e85-9ef6e45931a7'  |
| 36b16756-01fe-49c7-9e14-41abaf0059f9 | ip_address='10.5.0.8', subnet_id='7edee502-ab23-46af-b446-17c233d11a94'   |
| 3e20a885-6421-4acd-b936-56b3bcb4930f | ip_address='10.5.0.6', subnet_id='7edee502-ab23-46af-b446-17c233d11a94'   |
| 410beab6-c2ed-4d31-bc3d-4457a3c28b5f | ip_address='10.55.0.4', subnet_id='eec9530d-df77-41eb-8e85-9ef6e45931a7'  |
| 4e1defd4-1185-4f54-bb93-768dbf8d6436 | ip_address='10.66.0.14', subnet_id='fe280e59-e8e0-4cdc-923a-1b15e45b95ce' |
| 56d6a38d-0698-491b-9996-b239a7d95d5b | ip_address='10.55.0.3', subnet_id='eec9530d-df77-41eb-8e85-9ef6e45931a7'  |
| 67558707-ed0b-47b6-be0e-0da0bf2871b5 | ip_address='10.66.0.2', subnet_id='fe280e59-e8e0-4cdc-923a-1b15e45b95ce'  |
| 6d3c6101-75c3-47d0-a496-83cc62092f2d | ip_address='10.5.0.7', subnet_id='7edee502-ab23-46af-b446-17c233d11a94'   |
| 6f8626ba-ae8c-492a-93ad-3f349c600a3b | ip_address='10.55.0.9', subnet_id='eec9530d-df77-41eb-8e85-9ef6e45931a7'  |
| 778899dd-7c15-47a9-8968-261088aa14bf | ip_address='10.66.0.18', subnet_id='fe280e59-e8e0-4cdc-923a-1b15e45b95ce' |
| 7a07c64b-6d7b-4a57-b553-96459337f4cc | ip_address='10.66.0.1', subnet_id='fe280e59-e8e0-4cdc-923a-1b15e45b95ce'  |
| 8ea03785-c046-4840-be82-0cd4bad5b4e8 | ip_address='10.5.0.2', subnet_id='7edee502-ab23-46af-b446-17c233d11a94'   |
| cf0f0aea-e962-4ddf-ac7a-22c43ad483b0 | ip_address='10.5.0.1', subnet_id='7edee502-ab23-46af-b446-17c233d11a94'   |

[Yahoo-eng-team] [Bug 1862375] [NEW] Subsequent nova-api volume attach request waiting for previous one to complete

2020-02-07 Thread Martin Matyáš
Public bug reported:

Description
===
Subsequent nova-api requests to attach different volumes to the same VM block
and wait for the previous attach action to finish and reach the "in-use"
state. In my opinion, this is unnecessary and can lead to timeout errors.
Observed on OpenStack Rocky.

Steps to reproduce
==
Preconditions:
- cinder configured with a backend storage; best if HW storage is used where
the attach action takes considerable time, say >10s
- 1 VM ("vm")
- 2 volumes ("vol1", "vol2")

Actions:

1. 
$ openstack server add volume vm vol1
-> is accepted immediately by nova-api

2. Immediately after (1.), while vol1 is being attached, run
$ openstack server add volume vm vol2
-> this openstack command (i.e. the nova-api call) blocks and does not return
until the volume attach from (1.) has completed and vol1 is in the "in-use" state.

Expected result
===
Step (2.) should be accepted immediately and handled asynchronously. I don't
see a reason why step (2.) should wait until the volume from step (1.) is "in-use".

Logs

In cases where the attachment from (1.) takes more than 60s, it leads to an
error in (2.) with the following messaging timeout, which also shows where
the call waits - obviously in reserve_block_device_name to a compute node:
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi 
[req-44c4d473-9916-4d73-82d6-0115a1305f2a 0b5290e72cf546cb9e1921d81abb303c 
b21f6c73cba24a4280156f1d3b77af98 - default default] Unexpected exception in API 
method: MessagingTimeout:
 Timed out waiting for a reply to message ID 3af45090624b4fa29425e6fc05f41149
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi Traceback (most recent 
call last):
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py", line 801, in 
wrapped
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi return f(*args, 
**kwargs)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi return func(*args, 
**kwargs)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi return func(*args, 
**kwargs)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/compute/volumes.py", line 
336, in create
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi 
supports_multiattach=supports_multiattach)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/compute/api.py", line 205, in inner
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi return 
function(self, context, instance, *args, **kwargs)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/compute/api.py", line 153, in inner
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi return f(self, 
context, instance, *args, **kw)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/compute/api.py", line 4172, in 
attach_volume
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi 
supports_multiattach=supports_multiattach)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/compute/api.py", line 4047, in 
_attach_volume
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi 
device_type=device_type, tag=tag)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/compute/api.py", line 3958, in 
_create_volume_bdm
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi 
multiattach=volume['multiattach'])
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/compute/rpcapi.py", line 897, in 
reserve_block_device_name
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi return 
cctxt.call(ctxt, 'reserve_block_device_name', **kw)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 179, in 
call
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi retry=self.retry)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 133, in 
_send
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi retry=retry)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
645, in send
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi 
call_monitor_timeout, retry=retry)
2020-02-06 

[Yahoo-eng-team] [Bug 1862315] [NEW] Sometimes VMs can't get IP when spawned concurrently

2020-02-07 Thread Oleg Bondarev
Public bug reported:

Version: Stein
Scenario description:
Rally creates 60 VMs with 6 threads. Each thread:
 - creates a VM
 - pings it
 - if the ping succeeds, tries to reach the VM via ssh and execute a command;
   it keeps trying for 2 minutes
 - if ssh succeeds, deletes the VM

For some VMs the ping fails. The console log shows that the VM failed to get
an IP from DHCP.

tcpdump on the corresponding DHCP port shows the VM's DHCP requests, but
dnsmasq does not reply.
From the dnsmasq logs:

Feb  6 00:15:43 dnsmasq[4175]: read 
/var/lib/neutron/dhcp/da73026e-09b9-4f8d-bbdd-84d89c2487b2/addn_hosts - 28 
addresses
Feb  6 00:15:43 dnsmasq[4175]: duplicate dhcp-host IP address 10.2.0.194 at 
line 28 of /var/lib/neutron/dhcp/da73026e-09b9-4f8d-bbdd-84d89c2487b2/host

So something must be wrong with the neutron-dhcp-agent network cache.

From the neutron-dhcp-agent log:

2020-02-06 00:15:20.282 40 DEBUG neutron.agent.dhcp.agent 
[req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Resync event has been 
scheduled _periodic_resync_helper 
/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:276
2020-02-06 00:15:20.282 40 DEBUG neutron.common.utils 
[req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Calling throttled function 
clear wrapper 
/var/lib/openstack/lib/python3.6/site-packages/neutron/common/utils.py:102
2020-02-06 00:15:20.283 40 DEBUG neutron.agent.dhcp.agent 
[req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] resync 
(da73026e-09b9-4f8d-bbdd-84d89c2487b2): ['Duplicate IP addresses found, DHCP 
cache is out of sync'] _periodic_resync_helper 
/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:293

So the agent is aware of the invalid cache for the net, but for an unknown
reason the actual net resync happens only 8 minutes later:

2020-02-06 00:23:55.297 40 INFO neutron.agent.dhcp.agent [req-f5107bdd-
d53a-4171-a283-de3d7cf7c708 - - - - -] Synchronizing state

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-ipam-dhcp

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1862315

Title:
  Sometimes VMs can't get IP when spawned concurrently

Status in neutron:
  New

Bug description:
  Version: Stein
  Scenario description:
  Rally creates 60 VMs with 6 threads. Each thread:
   - creates a VM
   - pings it
   - if the ping succeeds, tries to reach the VM via ssh and execute a
     command; it keeps trying for 2 minutes
   - if ssh succeeds, deletes the VM

  For some VMs the ping fails. The console log shows that the VM failed to
  get an IP from DHCP.

  tcpdump on the corresponding DHCP port shows the VM's DHCP requests, but
  dnsmasq does not reply.
  From the dnsmasq logs:

  Feb  6 00:15:43 dnsmasq[4175]: read 
/var/lib/neutron/dhcp/da73026e-09b9-4f8d-bbdd-84d89c2487b2/addn_hosts - 28 
addresses
  Feb  6 00:15:43 dnsmasq[4175]: duplicate dhcp-host IP address 10.2.0.194 at 
line 28 of /var/lib/neutron/dhcp/da73026e-09b9-4f8d-bbdd-84d89c2487b2/host

  So something must be wrong with the neutron-dhcp-agent network cache.

  From the neutron-dhcp-agent log:

  2020-02-06 00:15:20.282 40 DEBUG neutron.agent.dhcp.agent 
[req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Resync event has been 
scheduled _periodic_resync_helper 
/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:276
  2020-02-06 00:15:20.282 40 DEBUG neutron.common.utils 
[req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Calling throttled function 
clear wrapper 
/var/lib/openstack/lib/python3.6/site-packages/neutron/common/utils.py:102
  2020-02-06 00:15:20.283 40 DEBUG neutron.agent.dhcp.agent 
[req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] resync 
(da73026e-09b9-4f8d-bbdd-84d89c2487b2): ['Duplicate IP addresses found, DHCP 
cache is out of sync'] _periodic_resync_helper 
/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:293

  So the agent is aware of the invalid cache for the net, but for an unknown
  reason the actual net resync happens only 8 minutes later:

  2020-02-06 00:23:55.297 40 INFO neutron.agent.dhcp.agent [req-
  f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Synchronizing state

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1862315/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp