[Yahoo-eng-team] [Bug 1813551] Re: [OVN] Missing ingress QoS in OVN
Reviewed: https://review.opendev.org/703537
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=dcec852b7f091c67a378db96c4841c3eec0d496a
Submitter: Zuul
Branch: master

commit dcec852b7f091c67a378db96c4841c3eec0d496a
Author: Yunxiang Tao
Date: Mon Feb 3 16:40:19 2020 +0800

    [OVN] Update QoS related code from networking-ovn

    In terms of [1], patch [0] imported the latest code of ovn_client.py,
    but not "/networking_ovn/ml2/qos_driver.py", so this patch updates it.

    [0] https://review.opendev.org/#/c/697316/
    [1] https://review.opendev.org/#/c/692084/

    Change-Id: Iefff6cdf070d234c4ea9c8e1d5fdfe4542eb7fa3
    Closes-Bug: #1813551

** Changed in: neutron
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1813551

Title:
  [OVN] Missing ingress QoS in OVN

Status in neutron:
  Fix Released

Bug description:
  Open vSwitch supports QoS in both directions, ingress and egress.
  OVN uses OvS internally, so OVN can support both as well. But the
  source code in [1] shows that OVN currently supports egress only,
  so some work is needed there.

  [1] https://github.com/openstack/networking-ovn/blob/master/networking_ovn/ml2/qos_driver.py#L38
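For context, the ingress rules this fix enables are created through the standard Neutron QoS API; a minimal illustration with the openstack CLI (the policy and port names here are hypothetical):

  $ openstack network qos policy create bw-limiter
  $ openstack network qos rule create --type bandwidth-limit \
      --max-kbps 3000 --max-burst-kbits 300 --ingress bw-limiter
  $ openstack port set --qos-policy bw-limiter my-port

Before this fix, the OVN driver would only honour such a rule in the egress direction.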
[Yahoo-eng-team] [Bug 1830763] Related fix merged to neutron (master)
Reviewed: https://review.opendev.org/704686
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=18d8d3973a532a36120c2c58136683e834a5e405
Submitter: Zuul
Branch: master

commit 18d8d3973a532a36120c2c58136683e834a5e405
Author: Slawek Kaplonski
Date: Tue Jan 28 16:52:29 2020 +0100

    Revert "[DVR] Add lock during creation of FIP agent gateway port"

    This reverts commit 7b81c1bc67d2d85e03b4c96a8c1c558a2f909836.
    It isn't needed anymore with the new solution, which takes the lock
    "on db level" and is introduced in a follow-up patch.

    Change-Id: Ibf15ee1969f902e8a266825934d9ac963353f0a0
    Related-Bug: #1830763

** Changed in: neutron
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1830763

Title:
  Debug neutron-tempest-plugin-dvr-multinode-scenario failures

Status in neutron:
  Fix Released

Bug description:
  This bug is meant to track the activities to debug the
  neutron-tempest-plugin-dvr-multinode-scenario job. We start by trying
  to isolate failures in this test case:

  http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22test_connectivity_through_2_routers%5C%22%20AND%20build_status:%5C%22FAILURE%5C%22%20AND%20build_branch:%5C%22master%5C%22%20AND%20build_name:%5C%22neutron-tempest-plugin-dvr-multinode-scenario%5C%22%20AND%20project:%5C%22openstack%2Fneutron%5C%22
[Yahoo-eng-team] [Bug 1862425] [NEW] Setting mem_stats_period_seconds=0 should prevent the “Virtio memory balloon” driver from loading
Public bug reported:

Setting mem_stats_period_seconds=0 in nova.conf should prevent the "Virtio memory balloon" driver from loading, but it doesn't. We are running Rocky installed with openstack-ansible.

To reproduce the error:

1. In nova.conf, set "mem_stats_period_seconds = 0" on controllers and hypervisors
2. Restart nova services on controllers and hypervisors
3. Build a VM
4. Log into the VM and type: lspci
5. The lspci output will include "Red Hat, Inc. Virtio memory balloon"

For more information please see this mailing list thread:
http://lists.openstack.org/pipermail/openstack-discuss/2020-February/012336.html

The problem this causes is that the Virtio memory balloon driver is not able to address large amounts of RAM. We encountered the problem when we built VMs with 1.4T RAM. The VM cannot boot because the driver fails:

"BUG: unable to handle kernel paging request at 988b19478000"

root@us01odc-dev2-ctrl1:~# dpkg -l | grep nova
ii  nova-api            2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute - API frontend
ii  nova-common         2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute - common files
ii  nova-conductor      2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute - conductor service
ii  nova-novncproxy     2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute - NoVNC proxy
ii  nova-placement-api  2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute - placement API frontend
ii  nova-scheduler      2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute - virtual machine scheduler
ii  python-nova         2:18.2.3-0ubuntu1~cloud0  all  OpenStack Compute Python 2 libraries
ii  python-novaclient   2:11.0.0-0ubuntu1~cloud0  all  client library for OpenStack Compute API - Python 2.7

root@us01odc-dev2-hv002:~# virsh --version
4.0.0
root@us01odc-dev2-hv002:~# qemu-system-x86_64 --version
QEMU emulator version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.21)
root@us01odc-dev2-hv002:~# nova --version
11.0.0
root@us01odc-dev2-hv002:~# openstack --version
openstack 3.16.1

** Affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1862425

Title:
  Setting mem_stats_period_seconds=0 should prevent the "Virtio memory balloon" driver from loading

Status in OpenStack Compute (nova):
  New
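The option in question lives in the [libvirt] section of nova.conf; a minimal sketch of the configuration and of the libvirt device syntax involved (the memballoon XML shown is an illustration of libvirt's syntax, not nova's guaranteed output):

  [libvirt]
  # 0 disables memory-usage statistics polling; the reporter expects
  # this to also keep the balloon device out of the guest.
  mem_stats_period_seconds = 0

  # With a positive period, the libvirt domain XML carries, e.g.:
  #   <memballoon model='virtio'>
  #     <stats period='10'/>
  #   </memballoon>
  # Suppressing the device entirely would instead require:
  #   <memballoon model='none'/>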
[Yahoo-eng-team] [Bug 1862417] Re: cloud-init: Attempt to mkswap on a partition fails: invalid block count argument: ''
The machine I'm working on uses cloud-init to update itself; it might only have the fix after the updates.

ah, interesting

  cat /etc/cloud/build.info

that'll give us a point in time for which version you have, and I suspect you're right, the top of your cloud-init.log will show the original version

  build_name: server
  serial: 20190514

Definitely way older than the PR.

yep, I suspect the fix in our case is to use the latest image of Ubuntu from end Jan 2020.

yep

** Changed in: cloud-init
   Status: Incomplete => Invalid

https://bugs.launchpad.net/bugs/1862417

Title:
  cloud-init: Attempt to mkswap on a partition fails: invalid block count argument: ''

Status in cloud-init:
  Invalid

Bug description:
  If an attempt is made to configure a swap partition on an Ubuntu Bionic machine as follows (not a swap file, a swap partition), the attempt to mkswap fails. The expected behaviour is that mkswap and swapon are executed correctly, and /dev/xvdg becomes a valid swap disk. In addition, when filename points at a partition, size and maxsize should be ignored. (The full configuration and error log are in the [NEW] report for this bug, below.)
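A minimal sketch of the kind of guard the fix implies, in Python (the function is illustrative, not cloud-init's actual API): when the swap target is already a block device, no block-count argument should ever reach mkswap, since the empty-string count is what failed here.

  import os
  import stat
  import subprocess

  def setup_swap_partition(filename):
      """mkswap/swapon a raw partition; never pass a block count."""
      st = os.stat(filename)
      if not stat.S_ISBLK(st.st_mode):
          raise ValueError("this sketch only handles swap partitions")
      # A partition has a fixed size, so size/maxsize are ignored; the
      # bug was an empty string '' landing where a count would go.
      subprocess.check_call(['mkswap', '-L', 'swap', filename])
      subprocess.check_call(['swapon', filename])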
[Yahoo-eng-team] [Bug 1862417] [NEW] cloud-init: Attempt to mkswap on a partition fails: invalid block count argument: ''
Public bug reported:

If an attempt is made to configure a swap partition on an Ubuntu Bionic machine as follows (not a swap file, a swap partition), the attempt to mkswap fails. The expected behaviour is that mkswap and swapon are executed correctly, and /dev/xvdg becomes a valid swap disk. In addition, when filename points at a partition, size and maxsize should be ignored.

fs_setup:
  - label: vidi
    device: /dev/xvde
    filesystem: ext4
  - label: swap
    device: /dev/xvdg
    filesystem: swap

mounts:
  - [ /dev/xvde, /var/lib/vidispine, ext4, defaults, 0, 0 ]
  - [ /dev/xvdg, none, swap, sw, 0, 0 ]

swap:
  filename: /dev/xvdg
  size: auto
  maxsize: 17179869184

mount_default_fields: [ None, None, "auto", "defaults", "0", "2" ]

When the machine starts up for the first time, the following error is logged after the swap size parameter is passed as the empty string:

2020-02-07 20:21:55,242 - cc_disk_setup.py[WARNING]: Force flag for swap is unknown.
2020-02-07 20:21:55,255 - util.py[WARNING]: Failed during filesystem operation
Failed to exec of '['/sbin/mkswap', '/dev/xvdg', '-L', 'swap', '']':
Unexpected error while running command.
Command: ['/sbin/mkswap', '/dev/xvdg', '-L', 'swap', '']
Exit code: 1
Reason: -
Stdout:
Stderr: mkswap: invalid block count argument: ''
2020-02-07 20:21:55,530 - cc_mounts.py[WARNING]: Activate mounts: FAIL:swapon -a
2020-02-07 20:21:55,530 - util.py[WARNING]: Activate mounts: FAIL:swapon -a

** Affects: cloud-init
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1862417

Title:
  cloud-init: Attempt to mkswap on a partition fails: invalid block count argument: ''

Status in cloud-init:
  New
[Yahoo-eng-team] [Bug 1636466] Re: HA router interface points to wrong host after network disruption
** Changed in: neutron
   Status: In Progress => Won't Fix

https://bugs.launchpad.net/bugs/1636466

Title:
  HA router interface points to wrong host after network disruption

Status in neutron:
  Won't Fix

Bug description:
  If the overlay network of a network node is down for a while, the slave node of an HA router can't receive the VRRP packets, so it promotes itself to master. The L3 agent then updates the ha_state of the router bound to itself to active, and updates the port bindings of the router interfaces to the associated host.

  After network recovery, one of the two master nodes of the HA router is demoted back to slave. If the demoted node is exactly the previous slave node, the L3 agent updates the ha_state of the router bound to itself to standby, but it won't update the port bindings of the router interfaces back to the host hosting the original master node. Packets sent to the router are then delivered to the slave node, because l2pop uses the incorrect port bindings.

  As the keepalived priorities are both the same value, 50, the probability of hitting this problem in a two-network-node setup is 50%.

  How to reproduce:
  - two network nodes: host1, host2.
  - create an HA router: router1, a network: network1 and a subnet: subnet1; add the interface of subnet1 to router1.

  $ neutron l3-agent-list-hosting-router router1
  +--------------------------------------+-------+----------------+-------+----------+
  | id                                   | host  | admin_state_up | alive | ha_state |
  +--------------------------------------+-------+----------------+-------+----------+
  | 3a3b8d27-e5b4-42c0-9433-2ba8b6be98c2 | host1 | True           | :-)   | standby  |
  | 4eba4a33-1452-4f4e-8874-a8eff2f4f357 | host2 | True           | :-)   | active   |
  +--------------------------------------+-------+----------------+-------+----------+

  $ neutron router-port-list router1 -c id -c binding:host_id -c fixed_ips
  +--------------------------------------+-----------------+---------------------------------------------------------------------------------------+
  | id                                   | binding:host_id | fixed_ips                                                                             |
  +--------------------------------------+-----------------+---------------------------------------------------------------------------------------+
  | 00a89bc5-a589-4c37-9db0-a7b439c4dee9 | host1           | {"subnet_id": "6bb7aced-6b8f-448f-813d-d1bc91b9ee2d", "ip_address": "169.254.192.6"}  |
  | b83590b2-0bf9-4fe7-b29f-0d37c92a9b3a | host2           | {"subnet_id": "75e30064-a625-4267-8cbf-20d1a7b6e952", "ip_address": "192.168.10.1"}   |
  | ca2a66e0-5525-4302-b00f-0e703dbb48e2 | host2           | {"subnet_id": "6bb7aced-6b8f-448f-813d-d1bc91b9ee2d", "ip_address": "169.254.192.1"}  |
  +--------------------------------------+-----------------+---------------------------------------------------------------------------------------+

  - disconnect host1 from the overlay network, wait until the l3-agent-list-hosting-router API shows both ha_states of router1 as active.

  $ neutron l3-agent-list-hosting-router router1
  +--------------------------------------+-------+----------------+-------+----------+
  | id                                   | host  | admin_state_up | alive | ha_state |
  +--------------------------------------+-------+----------------+-------+----------+
  | 3a3b8d27-e5b4-42c0-9433-2ba8b6be98c2 | host1 | True           | :-)   | active   |
  | 4eba4a33-1452-4f4e-8874-a8eff2f4f357 | host2 | True           | :-)   | active   |
  +--------------------------------------+-------+----------------+-------+----------+

  $ neutron router-port-list router1 -c id -c binding:host_id -c fixed_ips
  +--------------------------------------+-----------------+---------------------------------------------------------------------------------------+
  | id                                   | binding:host_id | fixed_ips                                                                             |
  +--------------------------------------+-----------------+---------------------------------------------------------------------------------------+
  | 00a89bc5-a589-4c37-9db0-a7b439c4dee9 | host1           | {"subnet_id": "6bb7aced-6b8f-448f-813d-d1bc91b9ee2d", "ip_address": "169.254.192.6"}  |
  | b83590b2-0bf9-4fe7-b29f-0d37c92a9b3a | host1           | {"subnet_id": "75e30064-a625-4267-8cbf-20d1a7b6e952", "ip_address": "192.168.10.1"}   |
  | ca2a66e0-5525-4302-b00f-0e703dbb48e2 | host2           | {"subnet_id": "6bb7aced-6b8f-448f-813d-d1bc91b9ee2d", "ip_address": "169.254.192.1"}  |
  +--------------------------------------+-----------------+---------------------------------------------------------------------------------------+
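The "priority 50" mentioned above refers to the keepalived VRRP instance that neutron renders for each HA router; a rough sketch of its shape (field values and names are illustrative, not neutron's exact template):

  vrrp_instance VR_1 {
      state BACKUP
      interface ha-1a2b3c4d
      virtual_router_id 1
      priority 50
      nopreempt
      advert_int 2
      virtual_ipaddress {
          169.254.0.1/24 dev ha-1a2b3c4d
      }
  }

Because both instances carry the same priority, neither node has a deterministic claim to mastership after a partition heals, which is why the stale binding lands on either node with roughly equal probability.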
[Yahoo-eng-team] [Bug 1680183] Re: neutron-keepalived-state-change fails with "AssertionError: do not call blocking functions from the mainloop"
** Changed in: neutron
   Status: Confirmed => Fix Released

https://bugs.launchpad.net/bugs/1680183

Title:
  neutron-keepalived-state-change fails with "AssertionError: do not call blocking functions from the mainloop"

Status in neutron:
  Fix Released

Bug description:
  17:39:17.802 6173 CRITICAL neutron [-] AssertionError: do not call blocking functions from the mainloop
  17:39:17.802 6173 ERROR neutron Traceback (most recent call last):
  17:39:17.802 6173 ERROR neutron   File "/opt/stack/neutron/.tox/dsvm-functional/bin/neutron-keepalived-state-change", line 10, in <module>
  17:39:17.802 6173 ERROR neutron     sys.exit(main())
  17:39:17.802 6173 ERROR neutron   File "/opt/stack/neutron/neutron/cmd/keepalived_state_change.py", line 19, in main
  17:39:17.802 6173 ERROR neutron     keepalived_state_change.main()
  17:39:17.802 6173 ERROR neutron   File "/opt/stack/neutron/neutron/agent/l3/keepalived_state_change.py", line 157, in main
  17:39:17.802 6173 ERROR neutron     cfg.CONF.monitor_cidr).start()
  17:39:17.802 6173 ERROR neutron   File "/opt/stack/neutron/neutron/agent/linux/daemon.py", line 249, in start
  17:39:17.802 6173 ERROR neutron     self.run()
  17:39:17.802 6173 ERROR neutron   File "/opt/stack/neutron/neutron/agent/l3/keepalived_state_change.py", line 70, in run
  17:39:17.802 6173 ERROR neutron     for iterable in self.monitor:
  17:39:17.802 6173 ERROR neutron   File "/opt/stack/neutron/neutron/agent/linux/async_process.py", line 256, in _iter_queue
  17:39:17.802 6173 ERROR neutron     yield queue.get(block=block)
  17:39:17.802 6173 ERROR neutron   File "/opt/stack/neutron/.tox/dsvm-functional/lib/python2.7/site-packages/eventlet/queue.py", line 313, in get
  17:39:17.802 6173 ERROR neutron     return waiter.wait()
  17:39:17.802 6173 ERROR neutron   File "/opt/stack/neutron/.tox/dsvm-functional/lib/python2.7/site-packages/eventlet/queue.py", line 141, in wait
  17:39:17.802 6173 ERROR neutron     return get_hub().switch()
  17:39:17.802 6173 ERROR neutron   File "/opt/stack/neutron/.tox/dsvm-functional/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
  17:39:17.802 6173 ERROR neutron     return self.greenlet.switch()
  17:39:17.802 6173 ERROR neutron   File "/opt/stack/neutron/.tox/dsvm-functional/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 346, in run
  17:39:17.802 6173 ERROR neutron     self.wait(sleep_time)
  17:39:17.802 6173 ERROR neutron   File "/opt/stack/neutron/.tox/dsvm-functional/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 85, in wait
  17:39:17.802 6173 ERROR neutron     presult = self.do_poll(seconds)
  17:39:17.802 6173 ERROR neutron   File "/opt/stack/neutron/.tox/dsvm-functional/lib/python2.7/site-packages/eventlet/hubs/epolls.py", line 62, in do_poll
  17:39:17.802 6173 ERROR neutron     return self.poll.poll(seconds)
  17:39:17.802 6173 ERROR neutron   File "/opt/stack/neutron/neutron/agent/l3/keepalived_state_change.py", line 134, in handle_sigterm
  17:39:17.802 6173 ERROR neutron     self._kill_monitor()
  17:39:17.802 6173 ERROR neutron   File "/opt/stack/neutron/neutron/agent/l3/keepalived_state_change.py", line 131, in _kill_monitor
  17:39:17.802 6173 ERROR neutron     run_as_root=True)
  17:39:17.802 6173 ERROR neutron   File "/opt/stack/neutron/neutron/agent/linux/utils.py", line 221, in kill_process
  17:39:17.802 6173 ERROR neutron     execute(['kill', '-%d' % signal, pid], run_as_root=run_as_root)
  17:39:17.802 6173 ERROR neutron   File "/opt/stack/neutron/neutron/agent/linux/utils.py", line 155, in execute
  17:39:17.802 6173 ERROR neutron     greenthread.sleep(0)
  17:39:17.802 6173 ERROR neutron   File "/opt/stack/neutron/.tox/dsvm-functional/lib/python2.7/site-packages/eventlet/greenthread.py", line 31, in sleep
  17:39:17.802 6173 ERROR neutron     assert hub.greenlet is not current, 'do not call blocking functions from the mainloop'
  17:39:17.802 6173 ERROR neutron AssertionError: do not call blocking functions from the mainloop

  This is what I see when running fullstack l3ha tests, once I enable syslog logging for the helper process.
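A minimal, self-contained sketch of the general eventlet idiom that avoids this assertion (this illustrates the pattern, not neutron's actual fix): the signal can fire while the hub's own greenlet is running (see the do_poll frame above), so the handler must not do anything that yields back to the hub.

  import eventlet
  eventlet.monkey_patch()

  import signal
  import subprocess

  def _shutdown():
      # Blocking work (subprocess calls, socket I/O) is fine on an
      # ordinary green thread.
      subprocess.call(['true'])

  def handle_sigterm(signum, frame):
      # Calling execute()-style helpers directly here trips
      # "do not call blocking functions from the mainloop";
      # deferring to a fresh green thread avoids that.
      eventlet.spawn_n(_shutdown)

  signal.signal(signal.SIGTERM, handle_sigterm)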
[Yahoo-eng-team] [Bug 1666959] Re: ha_vrrp_auth_type defaults to PASS which is insecure
** Changed in: neutron
   Status: New => Won't Fix

https://bugs.launchpad.net/bugs/1666959

Title:
  ha_vrrp_auth_type defaults to PASS which is insecure

Status in neutron:
  Won't Fix
Status in OpenStack Security Advisory:
  Won't Fix

Bug description:
  With l3_ha enabled, ha_vrrp_auth_type defaults to PASS authentication:

  https://github.com/openstack/neutron/blob/b90ec94dc3f83f63bdb505ace1e4c272435c494b/neutron/conf/agent/l3/ha.py#L28

  which according to http://louwrentius.com/configuring-attacking-and-securing-vrrp-on-linux.html is totally insecure, because the VRRP password is transmitted in the clear.

  I'm not sure if this is currently a serious issue, since if the VRRP network is untrusted, maybe there are already bigger problems. But I thought it was worth reporting, at least.
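Both options are set in the L3 agent's configuration; a minimal sketch of overriding the default (the password value is a placeholder):

  # l3_agent.ini
  [DEFAULT]
  ha_vrrp_auth_type = AH
  ha_vrrp_auth_password = <shared-secret>

AH authenticates VRRP advertisements with an IPsec-style header rather than a cleartext password, which addresses the concern above; keepalived's AH support has its own caveats, so treat this as a sketch rather than a recommendation.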
[Yahoo-eng-team] [Bug 1641811] Re: Wrong ha_state, when l3-agent that host the active router is down
There have been a number of issues fixed in this area over the past few releases, so closing. If it still happens on a newer release, please re-open.

** Changed in: neutron
   Status: Triaged => Invalid

https://bugs.launchpad.net/bugs/1641811

Title:
  Wrong ha_state, when l3-agent that host the active router is down

Status in neutron:
  Invalid

Bug description:
  In an L3 HA setup with multiple network nodes, we can query the agent hosting the master HA router via l3-agent-list-hosting-router.

  root@node1:~# neutron l3-agent-list-hosting-router demo-router
  +--------------------------------------+-------+----------------+-------+----------+
  | id                                   | host  | admin_state_up | alive | ha_state |
  +--------------------------------------+-------+----------------+-------+----------+
  | 58fbfcf3-6403-4388-b713-523595411de6 | node1 | True           | :-)   | active   |
  | a74be278-e428-41a4-a375-9888e9b99bcd | node2 | True           | :-)   | standby  |
  +--------------------------------------+-------+----------------+-------+----------+

  Now, on node1, I stop neutron-l3-agent and then check the state again.

  root@node1:~# neutron l3-agent-list-hosting-router demo-router
  +--------------------------------------+-------+----------------+-------+----------+
  | id                                   | host  | admin_state_up | alive | ha_state |
  +--------------------------------------+-------+----------------+-------+----------+
  | 58fbfcf3-6403-4388-b713-523595411de6 | node1 | True           | xxx   | standby  |
  | a74be278-e428-41a4-a375-9888e9b99bcd | node2 | True           | :-)   | standby  |
  +--------------------------------------+-------+----------------+-------+----------+

  You can see that there is no "active" router, but north-south traffic still goes through node1 and keepalived works normally. I think the ha_state of node1 should be "active".
[Yahoo-eng-team] [Bug 1668410] Re: [SRU] Infinite loop trying to delete deleted HA router
** Changed in: neutron
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1668410

Title:
  [SRU] Infinite loop trying to delete deleted HA router

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive mitaka series:
  Fix Released
Status in neutron:
  Fix Released
Status in OpenStack Security Advisory:
  Won't Fix
Status in neutron package in Ubuntu:
  Invalid
Status in neutron source package in Xenial:
  Fix Released

Bug description:
  [Description]
  When deleting a router, the logfile fills up. See the full log: http://paste.ubuntu.com/25429257/

  The error 'Error while deleting router c0dab368-5ac8-4996-88c9-f5d345a774a6' occurred 3343386 times, raised from _safe_router_removed() [1]:

  $ grep -r 'Error while deleting router c0dab368-5ac8-4996-88c9-f5d345a774a6' | wc -l
  3343386

  _safe_router_removed() is invoked at L488 [2]; if it goes wrong it returns False, and then self._resync_router(update) [3] makes _safe_router_removed run again and again. That is why we saw so many 'Error while deleting router X' errors (the retry loop is sketched after this entry).

  [1] https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L361
  [2] https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L488
  [3] https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L457

  [Test Case]
  The cause is a race condition between the neutron server and the L3 agent: after the neutron server deletes the HA interfaces, the L3 agent may sync an HA router without HA interface info (one just needs to trigger L708 [1] after deleting the HA interfaces and before deleting the HA router). If we delete the HA router at this time, the problem occurs. So the test case we designed is:

  1. First install the updated package, then restart neutron-server with 'sudo service neutron-server restart'
  2. Create an HA router:
     neutron router-create harouter --ha=True
  3. Delete the ports associated with the HA router before deleting the router itself:
     neutron router-port-list harouter | grep 'HA port' | awk '{print $2}' | xargs -l neutron port-delete
     neutron router-port-list harouter
  4. Update the HA router to trigger the l3-agent to record the router info without ha_port in self.router_info:
     neutron router-update harouter --description=test
  5. Now delete the HA router:
     neutron router-delete harouter

  [1] https://github.com/openstack/neutron/blob/mitaka-eol/neutron/db/l3_hamode_db.py#L708

  [Regression Potential]
  With the fixed patch [1], neutron-server no longer returns an HA router which is missing its HA ports, so L488 [2] no longer has a chance to call _safe_router_removed() for such a router; the problem is fixed at the root and there is no regression potential. Besides, this fix is already in the mitaka-eol branch, and the neutron-server mitaka package is based on neutron-8.4.0, so we need to backport it to xenial and mitaka.

  $ git tag --contains 8c77ee6b20dd38cc0246e854711cb91cffe3a069
  mitaka-eol

  [1] https://review.openstack.org/#/c/440799/2/neutron/db/l3_hamode_db.py
  [2] https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L488
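A minimal sketch of the retry pattern described above (function names mirror the description, but this is an illustration, not neutron's exact code):

  import logging

  LOG = logging.getLogger(__name__)

  def safe_router_removed(router_id, remove):
      """Mirror of the agent's guard: swallow errors, report success."""
      try:
          remove(router_id)  # raises once the HA interface info is gone
      except Exception:
          LOG.exception('Error while deleting router %s', router_id)
          return False
      return True

  def process_delete(router_id, remove, resync):
      if not safe_router_removed(router_id, remove):
          # The agent requeues the update on failure. If the failure is
          # permanent (the HA router is already gone server-side), this
          # retries forever and floods the log -- the loop the fix breaks
          # by never returning routers that are missing their HA ports.
          resync(router_id)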
[Yahoo-eng-team] [Bug 1510757] Re: tempest test api.network; l3 agent can't delete HA-router
Closing as I don't think this happens any more.

** Changed in: neutron
   Status: Confirmed => Invalid

https://bugs.launchpad.net/bugs/1510757

Title:
  tempest test api.network; l3 agent can't delete HA-router

Status in neutron:
  Invalid

Bug description:
  I used tempest to test my company's OpenStack environment:

  # tox -eall -- tempest.api.network

  After tox finished, the l3-agent log kept showing:

  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent [-] Error while deleting router da4b28ce-33b1-4000-8609-a41a2ab8c982
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent Traceback (most recent call last):
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 344, in _safe_router_removed
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent     self._router_removed(router_id)
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 362, in _router_removed
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent     ri.delete(self)
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 364, in delete
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent     super(HaRouter, self).delete(agent)
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 273, in delete
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent     self.process(agent)
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 370, in process
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent     super(HaRouter, self).process(agent)
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File "/usr/local/lib/python2.7/dist-packages/neutron/common/utils.py", line 359, in call
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent     self.logger(e)
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 197, in __exit__
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent     six.reraise(self.type_, self.value, self.tb)
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File "/usr/local/lib/python2.7/dist-packages/neutron/common/utils.py", line 356, in call
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent     return func(*args, **kwargs)
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 695, in process
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent     self.routes_updated()
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 181, in routes_updated
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent     instance = self._get_keepalived_instance()
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 131, in _get_keepalived_instance
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent     return self.keepalived_manager.config.get_instance(self.ha_vr_id)
  2015-10-27 19:32:03.281 23885 ERROR neutron.agent.l3.agent AttributeError: 'NoneType' object has no attribute 'config'

  I think the reason is that tempest creates and deletes routers too fast: while the l3-agent is creating the HA router, tempest deletes the router and neutron-server deletes the HA interface.

  https://github.com/openstack/neutron/blob/master/neutron/agent/l3/ha_router.py#L79

  The HaRouter class can't be initialized without the ha-interface information; it just returns without calling _init_keepalived_manager. So when the l3-agent later deletes the router, it reports the AttributeError for 'NoneType'. And while the l3 agent can't delete the router, it keeps doing a full sync with neutron-server every 30 seconds; on the controller, neutron-server CPU usage stays around 70%.. ^-^

  The L3 agent should add a check before creating the HA router: if it finds the ha-interface is None, it means the router has already been deleted on the neutron-server side. (A sketch of such a guard follows this entry.)
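A minimal sketch of the guard the reporter proposes (class and attribute names follow the traceback above, but the shape is simplified and illustrative):

  class HaRouter(object):
      """Tolerate a router whose HA interface vanished server-side."""

      def __init__(self, router):
          self.router = router
          self.ha_vr_id = router.get('ha_vr_id')
          self.keepalived_manager = None
          if router.get('_ha_interface') is None:
              # Deleted on the server while the agent was processing it:
              # leave keepalived uninitialized rather than half-set-up.
              return
          # stands in for the real _init_keepalived_manager() call
          self.keepalived_manager = object()

      def _get_keepalived_instance(self):
          if self.keepalived_manager is None:
              # Avoids the "'NoneType' object has no attribute 'config'"
              # seen above when deletion races creation.
              return None
          return self.keepalived_manager.config.get_instance(self.ha_vr_id)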
[Yahoo-eng-team] [Bug 1574092] Re: No router namespace after creating legacy router
** Changed in: neutron
   Status: Confirmed => Won't Fix

https://bugs.launchpad.net/bugs/1574092

Title:
  No router namespace after creating legacy router

Status in neutron:
  Won't Fix

Bug description:
  If there are temporary MQ connectivity problems during router creation, the notification sent by l3_notifier via RPC cast gets lost. This leads to the absence of the qrouter namespace on the controllers. The issue was first seen on a mos HA (3 controllers) build: https://bugs.launchpad.net/mos/10.0.x/+bug/1529820
[Yahoo-eng-team] [Bug 1828547] Re: neutron-dynamic-routing TypeError: argument of type 'NoneType' is not iterable
** Project changed: neutron => networking-bgp

https://bugs.launchpad.net/bugs/1828547

Title:
  neutron-dynamic-routing TypeError: argument of type 'NoneType' is not iterable

Status in networking-bgp:
  New

Bug description:
  Rocky with Ryu; we don't have a reproducer for this one, or know what caused it in the first place.

  python-neutron-13.0.3-1.el7.noarch
  openstack-neutron-openvswitch-13.0.3-1.el7.noarch
  python2-neutron-dynamic-routing-13.0.1-1.el7.noarch
  openstack-neutron-bgp-dragent-13.0.1-1.el7.noarch
  openstack-neutron-common-13.0.3-1.el7.noarch
  openstack-neutron-ml2-13.0.3-1.el7.noarch
  python2-neutronclient-6.9.0-1.el7.noarch
  openstack-neutron-13.0.3-1.el7.noarch
  openstack-neutron-dynamic-routing-common-13.0.1-1.el7.noarch
  python2-neutron-lib-1.18.0-1.el7.noarch
  python-ryu-common-4.26-1.el7.noarch
  python2-ryu-4.26-1.el7.noarch

  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     return f(*args, **kwargs)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 185, in bgp_speaker_create_end
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     self.add_bgp_speaker_helper(bgp_speaker_id)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 249, in add_bgp_speaker_helper
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     self.add_bgp_speaker_on_dragent(bgp_speaker)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 359, in add_bgp_speaker_on_dragent
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     self.add_bgp_peers_to_bgp_speaker(bgp_speaker)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 390, in add_bgp_peers_to_bgp_speaker
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     bgp_peer)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 399, in add_bgp_peer_to_bgp_speaker
  2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     self.cache.put_bgp_peer(bgp_speaker_id, bgp_peer)
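The TypeError ("argument of type 'NoneType' is not iterable") points at a lookup that can return None being used where a container is expected; a minimal sketch of the defensive pattern (names mirror the traceback, but the cache shape is an assumption for illustration):

  class BgpSpeakerCache(object):
      def __init__(self):
          self.cache = {}  # bgp_speaker_id -> {'peers': {peer_ip: peer}}

      def put_bgp_peer(self, bgp_speaker_id, bgp_peer):
          speaker = self.cache.get(bgp_speaker_id)
          if speaker is None:
              # Speaker vanished (or was never cached): bail out instead
              # of poking into None, which raises the TypeError above.
              return
          speaker.setdefault('peers', {})[bgp_peer['peer_ip']] = bgp_peer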
[Yahoo-eng-team] [Bug 1661717] Re: [linuxbridge agent] vm can't communicate with router with l2pop
Since there is a workaround, and this only concerns the Linux bridge agent, which is not actively maintained, I'm closing this since it doesn't seem like it will be fixed.

** Changed in: neutron
   Status: Confirmed => Won't Fix

https://bugs.launchpad.net/bugs/1661717

Title:
  [linuxbridge agent] vm can't communicate with router with l2pop

Status in neutron:
  Won't Fix

Bug description:
  When both l2pop and arp_responder are enabled for the linuxbridge agent, the vxlan device is created in "proxy" mode. In this mode, ARP entries must be statically added by the linuxbridge agent. Because of [1], the l2pop driver won't send notifications for the HA router port, so the linuxbridge agent can't add an ARP entry for it. With no router ARP entry, the vxlan device drops ARP requests from the VM (destined to the router), making the VM unable to communicate with the router. This issue affects only the linuxbridge agent, not the OVS agent.

  A temporary workaround for VMs to communicate with an HA router is to disable arp_responder when l2pop is enabled. If users need both arp_responder and l2pop for the linuxbridge agent, we need an implementation that decouples them, i.e. https://bugs.launchpad.net/neutron/+bug/1518392

  [1] https://review.openstack.org/#/c/255237/
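Roughly what "proxy mode plus static ARP entries" means at the ip(8) level (device names, VNI and addresses are illustrative; the agent's exact invocations may differ):

  # vxlan device created in proxy mode: it answers ARP only from
  # its own neighbour table instead of flooding the request
  ip link add vxlan-1000 type vxlan id 1000 dev eth1 proxy

  # static neighbour entry the agent must install for the router port;
  # when the l2pop notification never arrives, this entry is missing
  # and ARP requests for 192.168.10.1 are simply dropped
  ip neigh replace 192.168.10.1 lladdr fa:16:3e:aa:bb:cc \
      dev vxlan-1000 nud permanent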
[Yahoo-eng-team] [Bug 1252900] Re: Directional network performance issues with Neutron + OpenvSwitch
** Changed in: neutron
   Status: Incomplete => Won't Fix

https://bugs.launchpad.net/bugs/1252900

Title:
  Directional network performance issues with Neutron + OpenvSwitch

Status in neutron:
  Won't Fix
Status in openstack-manuals:
  Fix Released
Status in openvswitch:
  New
Status in Ubuntu:
  Confirmed

Bug description:
  Hello! Currently, the Havana L3 router has a serious issue which makes it almost useless (sorry, I do not want to be rude, but I am trying to bring more attention to this problem). When tenant network traffic passes through the L3 router (a namespace at the network node), it becomes very, very slow and intermittent. The issue also affects traffic that hits a "floating IP" going into the tenant subnet.

  The affected topology is: "Per-Tenant Router with Private Networks". As a reference, I'm using the following Grizzly guide for my Havana deployment: https://github.com/mseknibilel/OpenStack-Grizzly-Install-Guide/blob/OVS_MultiNode/OpenStack_Grizzly_Install_Guide.rst

  Extra info: http://docs.openstack.org/havana/install-guide/install/apt/content/section_networking-routers-with-private-networks.html

  The symptoms are:

  1. Slow connection to Canonical, or when browsing the web from within a tenant subnet:

     aptitude update ; aptitude safe-upgrade

     From within a tenant instance, this takes about 1 hour to finish on a link capable of finishing it in 2~3 minutes.

  2. SSH connections using floating IPs freeze 10 times per minute. Connecting from the outside world into an instance using its floating IP address is a pain.

  We're talking about this issue on the OpenStack mailing list; here is the related thread: http://lists.openstack.org/pipermail/openstack/2013-November/002705.html

  I also made a video about it, watch it here: http://www.youtube.com/watch?v=jVjiphMuuzM

  Tested versions:

  * OpenStack Havana on top of Ubuntu 12.04.3 using the Ubuntu Cloud Archive
  * Tested with Open vSwitch versions (none of them works):
    - 1.10.2 from UCA
    - 1.11.0 compiled for Ubuntu 12.04.3 using "dpkg-buildpackage"
    - 1.9.0 from the Ubuntu package "openvswitch-datapath-lts-raring-dkms"
  * Not tested (maybe it will work): Havana with Ubuntu 12.04.1 + OVS 1.4.0 (does not support VXLAN).
  * Tenant subnet types tested: VXLAN, GRE, VLAN. It does not matter which subnet type you choose; it will always be slow.

  Apparently, upgrading Grizzly from Ubuntu 12.04.1 + OVS 1.4.0 to Ubuntu 12.04.3 with OVS 1.9.0 triggers this problem with Grizzly too. So I think this problem might be related to Open vSwitch itself, but I need more time to check this. My private cloud based on Havana is open for you guys to debug it, just ask for access! =) My current plan is to test Havana with OVS 1.4.0, but I don't have much time this week to do that job. I'm not sure if the problem is with OVS or not; I'll try to test it this week.

  Also, in the video you can see how I "fixed" it, by starting a Squid proxy-cache server within the tenant's namespace router, proving that the problem appears ONLY when you try to establish a connection from a tenant subnet directly to the external network. I mean, the connection between a tenant and its router is okay, and from its router to the Internet is also okay, but from a tenant to the Internet it is not. So Squid was a perfect choice to verify this theory at the namespace router... And Voilà! "There I fixed it"! =P

  Please let me know which configuration files you guys will need to be able to reproduce this problem.

  Best! Thiago
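The Squid experiment above ran inside the tenant router's network namespace on the network node; commands of this shape (the router ID is a placeholder) are how one enters such a namespace to reproduce it:

  # list the namespaces present on the network node
  ip netns
  # run a command -- or start squid -- inside the router's namespace
  ip netns exec qrouter-<router-id> ip addr
  ip netns exec qrouter-<router-id> squid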
[Yahoo-eng-team] [Bug 1845557] Re: DVR: FWaaS rules created for a router after the FIP and VM created, not applied to routers rfp port on router-update
** Changed in: neutron
   Status: Confirmed => Fix Released

https://bugs.launchpad.net/bugs/1845557

Title:
  DVR: FWaaS rules created for a router after the FIP and VM created, not applied to routers rfp port on router-update

Status in neutron:
  Fix Released

Bug description:
  This was seen in Rocky. When a network, subnet, router and a VM instance with a FloatingIP are created before attaching firewall rules to the router, the firewall rules are not applied to the 'rfp' port for north-south routing when using Firewall-as-a-Service in legacy 'iptables' mode.

  After applying the firewall rules to the router, it is expected that the router-update would trigger adding the firewall rules to the existing routers, but the logic is not right. Any new VM added to the subnet on a new compute host gets the firewall rules applied to the 'rfp' interface. So the only way to get around this problem is to restart the l3-agent; once it is restarted, the firewall rules are applied again. The same is true when firewall rules are removed after the VM and routers are in place: since the update is not handled properly, the firewall rules may stay there until the l3-agent is restarted.

  How to reproduce this problem (FWaaS v2 with legacy 'iptables'):

  1. Create a network.
  2. Create a subnet.
  3. Create a router (DVR).
  4. Attach the subnet to the router.
  5. Assign the gateway to the router.
  6. Create a VM on the given private network.
  7. Create a FloatingIP and associate it with the VM's private IP.
  8. Now the VM, router and fip namespace are all in place.
  9. Create firewall rules:
     neutron firewall-rule-create --protocol icmp --action allow --name allow-icmp
     neutron firewall-rule-create --protocol tcp --destination-port 80 --action deny --name deny-http
     neutron firewall-rule-create --protocol tcp --destination-port 22 --action allow --name allow-ssh
  10. Create a firewall policy:
      neutron firewall-policy-create --firewall-rules "allow-icmp deny-http allow-ssh" policy-fw
  11. Create a firewall:
      neutron firewall-create policy-fw --name user-fw
  12. Check that the firewall was created:
      neutron firewall-show user-fw
  13. If the firewall was created after the routers, based on the documentation you need to manually update the routers:
      neutron firewall-update <firewall> --router <router-1> --router <router-2>
  14. After the update we would expect all existing routers, router-1 and router-2, to have the firewall rules. But we don't see them configured on router-1, which was created before the firewall, and so its VM is not protected by the firewall rules.
[Yahoo-eng-team] [Bug 1717302] Re: Tempest floatingip scenario tests failing on DVR Multinode setup with HA
** Changed in: neutron
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1717302

Title:
  Tempest floatingip scenario tests failing on DVR Multinode setup with HA

Status in neutron:
  Fix Released

Bug description:
  neutron.tests.tempest.scenario.test_floatingip.FloatingIpSameNetwork and
  neutron.tests.tempest.scenario.test_floatingip.FloatingIpSeparateNetwork
  are failing on every patch. This trace is seen in the node-2 l3-agent log:

  Sep 13 07:16:43.404250 ubuntu-xenial-2-node-rax-dfw-10909819-895688 neutron-keepalived-state-change[5461]: ERROR neutron.agent.linux.ip_lib [-] Failed sending gratuitous ARP to 172.24.5.3 on qg-bf79c157-e2 in namespace qrouter-796b8715-ca01-43ad-bc08-f81a0b4db8cc: Exit code: 2; Stdin: ; Stdout: ; Stderr: bind: Cannot assign requested address : ProcessExecutionError: Exit code: 2; Stdin: ; Stdout: ; Stderr: bind: Cannot assign requested address
  ERROR neutron.agent.linux.ip_lib Traceback (most recent call last):
  ERROR neutron.agent.linux.ip_lib   File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 1082, in _arping
  ERROR neutron.agent.linux.ip_lib     ip_wrapper.netns.execute(arping_cmd, extra_ok_codes=[1])
  ERROR neutron.agent.linux.ip_lib   File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 901, in execute
  ERROR neutron.agent.linux.ip_lib     log_fail_as_error=log_fail_as_error, **kwargs)
  ERROR neutron.agent.linux.ip_lib   File "/opt/stack/new/neutron/neutron/agent/linux/utils.py", line 151, in execute
  ERROR neutron.agent.linux.ip_lib     raise ProcessExecutionError(msg, returncode=returncode)
  ERROR neutron.agent.linux.ip_lib ProcessExecutionError: Exit code: 2; Stdin: ; Stdout: ; Stderr: bind: Cannot assign requested address

  If this is a DVR router, then the GARP should not go through the qg interface for the floating IP. More information can be seen here:

  http://logs.openstack.org/43/500143/5/check/gate-tempest-dsvm-neutron-dvr-multinode-scenario-ubuntu-xenial-nv/0a58fce/logs/subnode-2/screen-q-l3.txt.gz?level=TRACE#_Sep_13_07_16_47_864052
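The failing call is the agent's gratuitous-ARP helper; by hand it corresponds roughly to the following (namespace, device and address are taken from the log above; the flags are iputils arping conventions, and the agent's exact invocation may differ):

  ip netns exec qrouter-796b8715-ca01-43ad-bc08-f81a0b4db8cc \
      arping -A -I qg-bf79c157-e2 -c 1 172.24.5.3

"bind: Cannot assign requested address" simply means 172.24.5.3 is not configured on qg-bf79c157-e2 in that namespace, which is expected for a DVR router, where the floating IP lives in the fip/snat path rather than on the qrouter's gateway port.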
[Yahoo-eng-team] [Bug 1595043] Re: Make DVR portbinding implementation useful for HA ports
** Changed in: neutron
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1595043

Title:
  Make DVR portbinding implementation useful for HA ports

Status in neutron:
  Fix Released

Bug description:
  Make the DVR portbinding implementation generic so that it is useful for all distributed router ports (for example, HA router ports).

  Currently, HA interface port binding is implemented as a normal port binding, i.e. it uses only the ml2_port_bindings table, with host set to the master host. When a new host becomes master, this binding is updated. But this approach has issues, as explained in https://bugs.launchpad.net/neutron/+bug/1522980

  As HA router ports (DEVICE_OWNER_HA_REPLICATED_INT, and DEVICE_OWNER_ROUTER_SNAT for DVR+HA) are distributed ports like DVR's, we will follow the DVR approach of port binding for HA router ports. So we make DVR port binding generic, so that it can be used for all distributed router ports. To do this, we need to:

  1) rename the ml2_dvr_port_bindings table to ml2_distributed_port_bindings
  2) rename the functions updating/accessing this table
  3) replace the 'if' condition for DVR ports with one for distributed ports; for example, replace

     if port['device_owner'] == const.DEVICE_OWNER_DVR_INTERFACE:

     with

     if distributed_router_port(port):
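A minimal sketch of the predicate change described in point 3 (the constant names come from the description above; the string values shown are the conventional neutron device-owner strings and are included here for illustration):

  # assumed device-owner values, as conventionally defined in neutron
  DEVICE_OWNER_DVR_INTERFACE = 'network:router_interface_distributed'
  DEVICE_OWNER_HA_REPLICATED_INT = 'network:ha_router_replicated_interface'
  DEVICE_OWNER_ROUTER_SNAT = 'network:router_centralized_snat'

  def distributed_router_port(port):
      """True for any router port that needs per-host distributed bindings."""
      return port['device_owner'] in (
          DEVICE_OWNER_DVR_INTERFACE,
          DEVICE_OWNER_HA_REPLICATED_INT,
          DEVICE_OWNER_ROUTER_SNAT,
      )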
[Yahoo-eng-team] [Bug 1835731] Re: Neutron server error: failed to update port DOWN
** Changed in: neutron
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1835731

Title:
  Neutron server error: failed to update port DOWN

Status in neutron:
  Fix Released

Bug description:
  Before adding extra logging:

  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc [req-2b7602fb-e990-45ee-974f-ef3b55b41bed - - - - -] Failed to update device d75fca78-2f64-4c5a-9a94-6684c753bf3d down

  After adding logging:

  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc [req-2b7602fb-e990-45ee-974f-ef3b55b41bed - - - - -] Failed to update device d75fca78-2f64-4c5a-9a94-6684c753bf3d down: 'NoneType' object has no attribute 'started_at': AttributeError: 'NoneType' object has no attribute 'started_at'
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc Traceback (most recent call last):
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 367, in update_device_list
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     **kwargs)
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 233, in update_device_down
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     n_const.PORT_STATUS_DOWN, host)
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 319, in notify_l2pop_port_wiring
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     agent_restarted = l2pop_driver.obj.agent_restarted(port_context)
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 253, in agent_restarted
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     if l2pop_db.get_agent_uptime(agent) < cfg.CONF.l2pop.agent_boot_time:
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/db.py", line 51, in get_agent_uptime
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     return timeutils.delta_seconds(agent.started_at,
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc AttributeError: 'NoneType' object has no attribute 'started_at'
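The crash happens in l2pop's agent-uptime check when the agent lookup returns None; a minimal sketch of the defensive shape (function names mirror the traceback; timeutils.delta_seconds is the real oslo.utils API, the rest is illustrative):

  from oslo_utils import timeutils

  def agent_restarted(agent, boot_time):
      """Treat a missing agent record as not-recently-restarted."""
      if agent is None or agent.started_at is None:
          # Port bound to a host with no live agent row: the traceback
          # above shows started_at being read off None here.
          return False
      uptime = timeutils.delta_seconds(agent.started_at,
                                       timeutils.utcnow())
      return uptime < boot_time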
[Yahoo-eng-team] [Bug 1766701] Re: Trunk Tests are failing often in dvr-multinode scenario job
** Changed in: neutron
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1766701

Title:
  Trunk Tests are failing often in dvr-multinode scenario job

Status in neutron:
  Fix Released

Bug description:
  In about 40% of test runs, tests like
  neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle are failing; example runs:

  * http://logs.openstack.org/03/560703/7/check/neutron-tempest-plugin-dvr-multinode-scenario/1f67afd/logs/testr_results.html.gz
  * http://logs.openstack.org/17/553617/19/check/neutron-tempest-plugin-dvr-multinode-scenario/a13a6fd/logs/testr_results.html.gz
  * http://logs.openstack.org/84/533284/5/check/neutron-tempest-plugin-dvr-multinode-scenario/1c09aa6/logs/testr_results.html.gz

  So is neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_subport_connectivity; example run:

  * http://logs.openstack.org/90/545490/9/check/neutron-tempest-plugin-dvr-multinode-scenario/c1ed535/logs/testr_results.html.gz
[Yahoo-eng-team] [Bug 1753434] Re: Unbound ports floating ip not working with address scopes in DVR HA
** Changed in: neutron
       Status: Confirmed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1753434

Title: Unbound ports floating ip not working with address scopes in DVR HA

Status in neutron: Fix Released

Bug description:
  Using the latest stable Pike build.

  This commit properly addressed the problem of centralized floating IPs for unbound ports:
  https://git.openstack.org/cgit/openstack/neutron/commit/?id=8b4bb9c0b057da175f2d773f8257de3e571aed4e

  However, when using address scopes, traffic towards an unbound port (an Octavia Pike VIP) is getting blocked in the snat namespace:

  Chain neutron-l3-agent-scope (1 references)
   pkts bytes target prot opt in  out            source   destination
     23  1612 DROP   all  --  any sg-775c0432-f1 anywhere anywhere     mark match ! 0x401/0x

  It works properly with centralized HA routers with address scopes, and with DVR HA without address scopes.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1753434/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
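[Editor's note] For anyone reproducing this, the DROP counters above can be watched directly inside the router's snat namespace. A hedged example: the snat-<router-id> namespace name follows Neutron's naming convention and the router ID must be substituted, and which table the chain lives in may vary by release (the default filter table is assumed here):

  $ sudo ip netns exec snat-<router-id> iptables -v -n -L neutron-l3-agent-scope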
[Yahoo-eng-team] [Bug 1526855] Re: VMs fail to get metadata in large scale environments
I am going to close this bug, partly because it is so old without any updates, but also because there have been a number of improvements with respect to scaling over the past few cycles, so this is probably not as much of an issue any more.

** Changed in: neutron
       Status: Confirmed => Won't Fix

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1526855

Title: VMs fail to get metadata in large scale environments

Status in neutron: Won't Fix

Bug description:
  In large scale environments, instances can fail to get their metadata. Tests were performed in a 100 compute node environment creating 4000 VMs; 15-20 VMs will fail all 20 metadata request attempts. This has been reproduced multiple times with similar results. All of the VMs successfully obtain a private IP and are pingable, but a small number of VMs fail to be reachable via ssh. Increasing the number of metadata request attempts in a Cirros test image shows that the metadata requests eventually succeed; it just takes a long time in some cases.

  Below is a list of instance UUIDs collected from one test and the number of metadata request attempts each took to succeed. These tests were performed using stable/liberty with a small number of patches cherry-picked from Mitaka intended to improve performance.

  705c3582-f39b-482d-9a6e-d78bc033d3e7  5
  27f93574-19fe-4b88-ad6e-c518022ef66a  2
  ff668db8-196e-4ec3-82d9-f7ab5a302305  57
  b3f97acb-6374-4406-9474-7bacfc3486cd  42
  80c19187-7c19-4adc-ad3a-51342f00d799  51
  071f60d5-2a9a-4448-b14b-9016c9eee4eb  47
  d39f336e-0fb4-4934-b835-e791661d60f1  36
  a5627d9f-fd2d-48b0-ada2-f519a97849ee  5
  3c24145e-8e11-4e79-8618-fca0416ea030  41
  a36ab8fd-4e53-4265-a2bf-6945ac5d8811  46
  a9400361-8941-4f03-b11d-0940b5499b4b  37
  7449efbd-1df6-4fcc-88d5-e4e355ae7963  24
  45c6a108-c18b-4284-9ede-3e5f8d7851be  30
  fbe7c6fc-6aec-464c-87b7-0800836f7754  7
  cb5a3a49-45b9-40de-8c62-903bee1925f4  37
  0c7151ce-79dc-4d55-a617-7f4182cb2194  14
  0f1c24a0-3b97-4d56-8feb-b30d67cf6852  44
  8c359465-198f-4654-84bb-f334f0400d58  10
  b3a5a3df-28c4-40c3-adba-856a0fcbd29e  55
  38ee6525-441e-4640-a998-ad89b8d3f8be  2
  07ecde16-c274-481e-8169-4febb15c7273  48
  f77cd7aa-89e2-4d2c-a89f-e19ff430e5a4  31
  b9acdba1-1794-4fa8-bbe3-ffb94f86d19b  3
  30824aa6-3df5-4a43-a701-dd33da7f704f  13
  5216ffc0-4a8d-4a3e-a4e3-5473b96ca47b  40
  999512ff-70e3-4cfd-9cb4-c5788a02fee6  4

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1526855/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1789434] Re: neutron_tempest_plugin.scenario.test_migration.NetworkMigrationFromHA failing 100% times
** Changed in: neutron
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1789434

Title: neutron_tempest_plugin.scenario.test_migration.NetworkMigrationFromHA failing 100% times

Status in neutron: Fix Released

Bug description:
  For the past few days, all migration tests from DVR routers have been failing.

  Example of failure:
  http://logs.openstack.org/37/382037/71/check/neutron-tempest-plugin-dvr-multinode-scenario/605ed17/logs/testr_results.html.gz

  It may be related somehow to https://review.openstack.org/#/c/589410/ but I'm not sure yet.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1789434/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1861670] Re: AttributeError: 'NetworkConnectivityTest' object has no attribute 'safe_client'
Reviewed: https://review.opendev.org/705413
Committed: https://git.openstack.org/cgit/openstack/neutron-tempest-plugin/commit/?id=2a71a8966492adb222e6fc289e77f7afc681d082
Submitter: Zuul
Branch: master

commit 2a71a8966492adb222e6fc289e77f7afc681d082
Author: Slawek Kaplonski
Date: Mon Feb 3 11:48:34 2020 +0100

    Fix test_connectivity_dvr_and_no_dvr_routers_in_same_subnet test

    This patch fixes a couple of issues in the scenario test from the
    test_connectivity module:

    1. Replace safe_client with the client object. The
       NetworkConnectivityTest class used safe_client, but there is no
       such attribute in this class; the "client" object should be used
       instead.
    2. Fix how the external network's subnet ID is retrieved from the
       network's info in the same test.
    3. Use admin_client to get the details of the external network's
       subnet, since that subnet does not belong to the tenant user and
       a regular client gets a 404 error when running the subnet_show
       command.
    4. Check the subnets' IP version to retrieve only an IPv4 one.

    Change-Id: Ibebb20b29dd6ae902d194fd26ba1ea728a976286
    Closes-bug: #1861670

** Changed in: neutron
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1861670

Title: AttributeError: 'NetworkConnectivityTest' object has no attribute 'safe_client'

Status in neutron: Fix Released

Bug description:
  For the past few days the test neutron_tempest_plugin.scenario.test_connectivity.NetworkConnectivityTest.test_connectivity_dvr_and_no_dvr_routers_in_same_subnet has been failing with an error like:

  ft1.1: neutron_tempest_plugin.scenario.test_connectivity.NetworkConnectivityTest.test_connectivity_dvr_and_no_dvr_routers_in_same_subnet[id-69d3650a-5c32-40bc-ae56-5c4c849ddd37]testtools.testresult.real._StringException: Traceback (most recent call last):
    File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 108, in wrapper
      return func(*func_args, **func_kwargs)
    File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/neutron_tempest_plugin/scenario/test_connectivity.py", line 188, in test_connectivity_dvr_and_no_dvr_routers_in_same_subnet
      ext_network = self.safe_client.show_network(self.external_network_id)
  AttributeError: 'NetworkConnectivityTest' object has no attribute 'safe_client'

  Logstash query:
  http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AttributeError%3A%20'NetworkConnectivityTest'%20object%20has%20no%20attribute%20'safe_client'%5C%22

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1861670/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
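[Editor's note] Item 1 of the commit message amounts to swapping the attribute on the line quoted in the traceback above. A hedged before/after sketch -- the "before" line is verbatim from the traceback, the "after" is a reconstruction from the commit message rather than the verbatim patch, and the admin_client and IPv4-filtering changes (items 3 and 4) are not shown:

  # Before: raises AttributeError, the test class has no safe_client
  ext_network = self.safe_client.show_network(self.external_network_id)

  # After, per item 1 of the commit message
  ext_network = self.client.show_network(self.external_network_id)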
[Yahoo-eng-team] [Bug 1862394] [NEW] Nova ignores delete requests while instance is in deleting state
Public bug reported:

Right now the code in the compute.api delete methods ignores delete requests if the instance is already in the deleting state (https://github.com/openstack/nova/blob/69ce0f01b60dfe0f020ac57eb82a42e5935064c4/nova/compute/api.py#L2257-L2262). This was the result of a discussion in https://bugs.launchpad.net/nova/+bug/1248563 and the mailing list thread referenced there.

Now, after the Python 2 EOL, it is possible to allow multiple delete requests without having to worry about delete requests piling up waiting on the instance uuid lock, if the lock is acquired with a timeout. Python 3 supports passing a timeout argument to lock.acquire, so it should be a fairly easy change to oslo.concurrency to allow passing that timeout through (for example, using an acquire call with a timeout in https://github.com/openstack/oslo.concurrency/blob/c08159119e605dea76580032ca85834d1de21d3e/oslo_concurrency/lockutils.py#L156-L162). The instance deletion flow could then use this style of lock acquisition and, if the lock was not acquired, allow the user to retry later.

** Affects: nova
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1862394

Title: Nova ignores delete requests while instance is in deleting state

Status in OpenStack Compute (nova): New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1862394/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
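[Editor's note] A minimal sketch of the proposal using only standard-library names (this is not the oslo.concurrency patch itself): Python 3's Lock.acquire takes a timeout, so a caller can bail out instead of queueing behind an in-flight delete.

  import threading

  # Stands in for nova's per-instance-uuid lock.
  _instance_lock = threading.Lock()

  def delete_instance(uuid):
      # Wait at most 10 seconds for the per-instance lock instead of
      # blocking indefinitely behind an in-flight delete request.
      if not _instance_lock.acquire(timeout=10):
          raise RuntimeError("instance %s is busy, retry later" % uuid)
      try:
          pass  # perform the actual deletion here
      finally:
          _instance_lock.release()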
[Yahoo-eng-team] [Bug 1862374] [NEW] Neutron incorrectly selects subnet
Public bug reported:

Distro: bionic
OpenStack version: openstack testing cloud

When using the command

  openstack server add floating ip --fixed-ip-address 10.66.0.18 juju-53a7bc-north-0 10.245.163.125

for the following machine:

ubuntu@dmzoneill-bastion:~$ openstack server list -c ID -c Name -c Networks
+--------------------------------------+--------------------------+-----------------------------------------------------------------+
| ID                                   | Name                     | Networks                                                        |
+--------------------------------------+--------------------------+-----------------------------------------------------------------+
| 063c2e8e-4d57-4267-a506-4c7b336e71b6 | juju-53a7bc-north-0      | north=10.66.0.18; south=10.55.0.9                               |
| 670bd827-f570-439f-b17f-91710849     | juju-6de968-south-0      | south=10.55.0.4                                                 |
| 0cd3c498-2826-4868-9384-e12e0799f903 | juju-855490-default-0    | dmzoneill_admin_net=10.5.0.6                                    |
| 0b72ccd8-b694-45c7-86be-870511426140 | juju-370447-controller-0 | north=10.66.0.14; south=10.55.0.3; dmzoneill_admin_net=10.5.0.8 |
| 40c68cb2-4e20-4d15-a82c-4c4252b8a0da | dmzoneill-bastion        | dmzoneill_admin_net=10.5.0.7, 10.245.162.200                    |
+--------------------------------------+--------------------------+-----------------------------------------------------------------+

Neutron returns the error:

GET call to network for http://10.245.161.159:9696/v2.0/ports?device_id=063c2e8e-4d57-4267-a506-4c7b336e71b6 used request id req-d915168a-32c6-4c74-9a83-1ef090b376d8
Manager serverstack ran task network.GET.ports in 0.122786998749s
Manager serverstack running task network.PUT.floatingips
REQ: curl -g -i -X PUT http://10.245.161.159:9696/v2.0/floatingips/0c771099-ca95-4447-8a60-5f64a590d943 -H "User-Agent: osc-lib/1.9.0 keystoneauth1/3.4.0 python-requests/2.18.4 CPython/2.7.15+" -H "Content-Type: application/json" -H "X-Auth-Token: {SHA1}9cf2688baf3b5c3e5dea7e2f7faa6554ee1b6bfb" -d '{"floatingip": {"fixed_ip_address": "10.66.0.18", "port_id": "6f8626ba-ae8c-492a-93ad-3f349c600a3b"}}'
http://10.245.161.159:9696 "PUT /v2.0/floatingips/0c771099-ca95-4447-8a60-5f64a590d943 HTTP/1.1" 400 169
RESP: [400] Content-Type: application/json Content-Length: 169 X-Openstack-Request-Id: req-1a8be2ec-456e-483d-a4e0-5ab204c55c2d Date: Fri, 07 Feb 2020 14:45:13 GMT Connection: keep-alive
RESP BODY: {"NeutronError": {"message": "Bad floatingip request: Port 6f8626ba-ae8c-492a-93ad-3f349c600a3b does not have fixed ip 10.66.0.18.", "type": "BadRequest", "detail": ""}}

Neutron seems to look at the list of networks associated with the server and pops the last network (south) from the list. It then selects the south port 6f8626ba-ae8c-492a-93ad-3f349c600a3b, which is not in the requested subnet, and errors out.

ubuntu@dmzoneill-bastion:~$ openstack port list -c ID -c "Fixed IP Addresses"
+--------------------------------------+---------------------------------------------------------------------------+
| ID                                   | Fixed IP Addresses                                                        |
+--------------------------------------+---------------------------------------------------------------------------+
| 1ad11c89-8574-49fc-9efe-b4a0b73b01eb | ip_address='10.55.0.2', subnet_id='eec9530d-df77-41eb-8e85-9ef6e45931a7'  |
| 36b16756-01fe-49c7-9e14-41abaf0059f9 | ip_address='10.5.0.8', subnet_id='7edee502-ab23-46af-b446-17c233d11a94'   |
| 3e20a885-6421-4acd-b936-56b3bcb4930f | ip_address='10.5.0.6', subnet_id='7edee502-ab23-46af-b446-17c233d11a94'   |
| 410beab6-c2ed-4d31-bc3d-4457a3c28b5f | ip_address='10.55.0.4', subnet_id='eec9530d-df77-41eb-8e85-9ef6e45931a7'  |
| 4e1defd4-1185-4f54-bb93-768dbf8d6436 | ip_address='10.66.0.14', subnet_id='fe280e59-e8e0-4cdc-923a-1b15e45b95ce' |
| 56d6a38d-0698-491b-9996-b239a7d95d5b | ip_address='10.55.0.3', subnet_id='eec9530d-df77-41eb-8e85-9ef6e45931a7'  |
| 67558707-ed0b-47b6-be0e-0da0bf2871b5 | ip_address='10.66.0.2', subnet_id='fe280e59-e8e0-4cdc-923a-1b15e45b95ce'  |
| 6d3c6101-75c3-47d0-a496-83cc62092f2d | ip_address='10.5.0.7', subnet_id='7edee502-ab23-46af-b446-17c233d11a94'   |
| 6f8626ba-ae8c-492a-93ad-3f349c600a3b | ip_address='10.55.0.9', subnet_id='eec9530d-df77-41eb-8e85-9ef6e45931a7'  |
| 778899dd-7c15-47a9-8968-261088aa14bf | ip_address='10.66.0.18', subnet_id='fe280e59-e8e0-4cdc-923a-1b15e45b95ce' |
| 7a07c64b-6d7b-4a57-b553-96459337f4cc | ip_address='10.66.0.1', subnet_id='fe280e59-e8e0-4cdc-923a-1b15e45b95ce'  |
| 8ea03785-c046-4840-be82-0cd4bad5b4e8 | ip_address='10.5.0.2', subnet_id='7edee502-ab23-46af-b446-17c233d11a94'   |
| cf0f0aea-e962-4ddf-ac7a-22c43ad483b0 | ip_address='10.5.0.1', subnet_id='7edee502-ab23-46af-b446-17c233d11a94'   |
+--------------------------------------+---------------------------------------------------------------------------+
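[Editor's note] When a server has ports on several networks, the association can be made unambiguous by addressing the port directly rather than the server. A hedged workaround, not taken from the report: the port ID below is the one carrying 10.66.0.18 in the listing above, and the floating IP is the one from the original command.

  $ openstack floating ip set \
      --port 778899dd-7c15-47a9-8968-261088aa14bf \
      --fixed-ip-address 10.66.0.18 \
      10.245.163.125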
[Yahoo-eng-team] [Bug 1862375] [NEW] Subsequent nova-api volume attach request waiting for previous one to complete
Public bug reported:

Description
===========
Subsequent nova-api requests to attach different volumes to the same VM block and wait for the previous attach action to finish and the volume to reach the "in-use" state. In my opinion this is unnecessary and can lead to timeout errors. Observed on OpenStack Rocky.

Steps to reproduce
==================
Preconditions:
- cinder configured with a backend storage, ideally a HW storage where the attach action takes considerable time - say >10s
- 1 VM ("vm")
- 2 volumes ("vol1", "vol2")

Actions:
1. $ openstack server add volume vm vol1
   -> is accepted immediately by nova-api
2. Immediately after (1.), while vol1 is being attached, run
   $ openstack server add volume vm vol2
   -> this openstack command (aka nova-api call) blocks and does not return until the volume attach in (1.) has completed and vol1 is in the "in-use" state.

Expected result
===============
Step (2.) should be accepted immediately and handled asynchronously. I don't see a reason why step (2.) should wait until the volume from step (1.) is "in-use".

Logs
====
In cases when the attachment in (1.) takes more than 60s, step (2.) fails with the following messaging timeout, which also exposes where the call waits - obviously in reserve_block_device_name, an RPC call to a compute node:

2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi [req-44c4d473-9916-4d73-82d6-0115a1305f2a 0b5290e72cf546cb9e1921d81abb303c b21f6c73cba24a4280156f1d3b77af98 - default default] Unexpected exception in API method: MessagingTimeout: Timed out waiting for a reply to message ID 3af45090624b4fa29425e6fc05f41149
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi Traceback (most recent call last):
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py", line 801, in wrapped
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi     return f(*args, **kwargs)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi     return func(*args, **kwargs)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi     return func(*args, **kwargs)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/api/openstack/compute/volumes.py", line 336, in create
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi     supports_multiattach=supports_multiattach)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 205, in inner
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi     return function(self, context, instance, *args, **kwargs)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 153, in inner
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi     return f(self, context, instance, *args, **kw)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 4172, in attach_volume
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi     supports_multiattach=supports_multiattach)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 4047, in _attach_volume
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi     device_type=device_type, tag=tag)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 3958, in _create_volume_bdm
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi     multiattach=volume['multiattach'])
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/nova/compute/rpcapi.py", line 897, in reserve_block_device_name
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi     return cctxt.call(ctxt, 'reserve_block_device_name', **kw)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 179, in call
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi     retry=self.retry)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 133, in _send
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi     retry=retry)
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 645, in send
2020-02-06 02:03:14.744 30 ERROR nova.api.openstack.wsgi     call_monitor_timeout, retry=retry)
2020-02-06
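[Editor's note] The serialization the reporter observes is consistent with the compute service funnelling all attach work for one instance through a per-instance lock while it reserves the device name. A minimal Python sketch of that pattern -- names are illustrative, not the actual nova code:

  import threading
  from collections import defaultdict

  # Stand-in for nova's per-instance synchronization on the compute node.
  _locks = defaultdict(threading.Lock)

  def _pick_next_device(instance_uuid, volume_id):
      # Hypothetical placeholder for device-name selection.
      return "/dev/vdb"

  def reserve_block_device_name(instance_uuid, volume_id):
      # A second attach RPC for the same instance waits here behind the
      # first; if the first attach holds the lock past the RPC timeout,
      # the waiting caller gets a MessagingTimeout like the one above.
      with _locks[instance_uuid]:
          return _pick_next_device(instance_uuid, volume_id)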
[Yahoo-eng-team] [Bug 1862315] [NEW] Sometimes VMs can't get IP when spawned concurrently
Public bug reported:

Version: Stein

Scenario description: Rally creates 60 VMs with 6 threads. Each thread:
- creates a VM
- pings it
- if the ping is successful, tries to reach the VM via ssh and execute a command; it tries to do that for 2 minutes
- if ssh is successful, deletes the VM

For some VMs the ping fails. The console log shows that the VM failed to get an IP from DHCP. tcpdump on the corresponding DHCP port shows the VM's DHCP requests, but dnsmasq does not reply.

From the dnsmasq logs:

Feb 6 00:15:43 dnsmasq[4175]: read /var/lib/neutron/dhcp/da73026e-09b9-4f8d-bbdd-84d89c2487b2/addn_hosts - 28 addresses
Feb 6 00:15:43 dnsmasq[4175]: duplicate dhcp-host IP address 10.2.0.194 at line 28 of /var/lib/neutron/dhcp/da73026e-09b9-4f8d-bbdd-84d89c2487b2/host

So something must be wrong with the neutron-dhcp-agent network cache.

From the neutron-dhcp-agent log:

2020-02-06 00:15:20.282 40 DEBUG neutron.agent.dhcp.agent [req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Resync event has been scheduled _periodic_resync_helper /var/lib/openstack/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:276
2020-02-06 00:15:20.282 40 DEBUG neutron.common.utils [req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Calling throttled function clear wrapper /var/lib/openstack/lib/python3.6/site-packages/neutron/common/utils.py:102
2020-02-06 00:15:20.283 40 DEBUG neutron.agent.dhcp.agent [req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] resync (da73026e-09b9-4f8d-bbdd-84d89c2487b2): ['Duplicate IP addresses found, DHCP cache is out of sync'] _periodic_resync_helper /var/lib/openstack/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:293

So the agent is aware of the invalid cache for the net, but for some unknown reason the actual net resync happens only 8 minutes later:

2020-02-06 00:23:55.297 40 INFO neutron.agent.dhcp.agent [req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Synchronizing state

** Affects: neutron
     Importance: High
     Assignee: Oleg Bondarev (obondarev)
         Status: New

** Tags: l3-ipam-dhcp

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1862315

Title: Sometimes VMs can't get IP when spawned concurrently

Status in neutron: New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1862315/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
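[Editor's note] The "Calling throttled function" line in the agent log points at a resync throttle. A hedged sketch of how such a throttle can postpone work past the moment the cache is known to be stale -- illustrative only, not neutron's actual implementation:

  import time

  class Throttler(object):
      """Run a function at most once per `threshold` seconds.

      Calls arriving inside the window are deferred, which is one way a
      needed resync can lag well behind the moment the stale DHCP cache
      is detected.
      """

      def __init__(self, func, threshold):
          self.func = func
          self.threshold = threshold
          self.last_run = None

      def __call__(self, *args, **kwargs):
          now = time.time()
          if self.last_run is not None and now - self.last_run < self.threshold:
              # Defer until the window has elapsed.
              time.sleep(self.threshold - (now - self.last_run))
          self.last_run = time.time()
          return self.func(*args, **kwargs)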