[Yahoo-eng-team] [Bug 1654998] [NEW] fullstack fails: creating ha port runs into StaleDataError
Public bug reported:

An example exception can be found in
http://paste.openstack.org/show/594276/ .

** Affects: neutron
   Importance: High
   Assignee: John Schwarz (jschwarz)
       Status: In Progress

** Tags: gate-failure l3-ha

https://bugs.launchpad.net/bugs/1654998
[Yahoo-eng-team] [Bug 1654032] Re: HA job ping test unstable
** Also affects: neutron
   Importance: Undecided
       Status: New

** Changed in: neutron
       Status: New => Confirmed

** Changed in: neutron
   Importance: Undecided => Critical

** Changed in: neutron
     Assignee: (unassigned) => John Schwarz (jschwarz)

** Changed in: neutron
    Milestone: None => ocata-3

** Tags added: gate-failure l3-ha

https://bugs.launchpad.net/bugs/1654032

Title:
  HA job ping test unstable

Status in neutron:
  Confirmed
Status in tripleo:
  In Progress

Bug description:
  We're seeing a lot of spurious failures in the ping test on HA jobs
  lately.

  Logstash query:
  http://logstash.openstack.org/#dashboard/file/logstash.json?query=build_name%3A%20*tripleo-ci*%20AND%20build_status%3A%20FAILURE%20AND%20message%3A%20%5C%22From%2010.0.0.1%20icmp_seq%3D1%20Destination%20Host%20Unreachable%5C%22

  Sample failure log:
  http://logs.openstack.org/76/416576/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/6db60be/console.html#_2017-01-04_16_40_34_770751
[Yahoo-eng-team] [Bug 1652071] [NEW] Implement migration from iptables-based security groups to ovsfw
Public bug reported:

When switching an ovs-agent from iptables to ovsfw, new instances will
be created using the ovsfw, but old instances will stick with iptables.
In fact, there isn't a way to migrate an instance from iptables to
ovsfw, and one should be provided.

Considerations:

a. It isn't enough to just remove the qvo/qvb/qbr interfaces and then
   attach the tap device directly to the integration bridge - we should
   also change the domain XML of the instance itself, so that when
   migrating an instance from one compute node to another, nova won't
   depend on non-existent devices. Should this be done in Nova or in
   Neutron? Should Nova be notified?

b. On the Neutron side, we should also change the Port table to
   indicate a change. This might require a new RPC call from the agent
   side.

** Affects: neutron
   Importance: Undecided
       Status: New

https://bugs.launchpad.net/bugs/1652071
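For context on consideration (a): with the iptables hybrid plug, each
port gets a qbr Linux bridge plus a qvb/qvo veth pair, conventionally
named with a 3-character prefix plus a truncated port UUID. A
hypothetical helper sketch (not part of neutron, and not the migration
code, which does not exist yet) for spotting ports that are still
hybrid-plugged:

    import os

    DEV_NAME_LEN = 14  # neutron truncates device names to 14 chars

    def hybrid_plug_devices(port_id):
        # qvoXXX (OVS side), qvbXXX (bridge side), qbrXXX (linux bridge)
        return [(prefix + port_id)[:DEV_NAME_LEN]
                for prefix in ('qvo', 'qvb', 'qbr')]

    def is_hybrid_plugged(port_id):
        # Heuristic: if the qbr bridge exists, the port is still on iptables.
        qbr = hybrid_plug_devices(port_id)[2]
        return os.path.exists('/sys/class/net/' + qbr)

    print(hybrid_plug_devices('1652071a-bcde-4f01-a345-67890abcdef0'))
    # -> ['qvo1652071a-bc', 'qvb1652071a-bc', 'qbr1652071a-bc']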
[Yahoo-eng-team] [Bug 1650901] [NEW] dvr gates are broken - no brctl command
Public bug reported:

See [1] and [2] - console.html produces this line: "/bin/sh: 1: brctl:
not found" and the job fails early on.

[1]: http://logs.openstack.org/99/407099/16/check/gate-tempest-dsvm-neutron-dvr-ubuntu-xenial/b28dcbd/console.html
[2]: http://logs.openstack.org/99/407099/16/check/gate-grenade-dsvm-neutron-dvr-multinode-ubuntu-xenial/f3788c0/console.html

** Affects: neutron
   Importance: Critical
       Status: Confirmed

** Tags: gate-failure

https://bugs.launchpad.net/bugs/1650901
[Yahoo-eng-team] [Bug 1649867] Re: Gate tempest dsvm neutron dvr test fails
** Also affects: neutron
   Importance: Undecided
       Status: New

** Tags added: gate-failure l3-dvr-backlog

https://bugs.launchpad.net/bugs/1649867

Title:
  Gate tempest dsvm neutron dvr test fails

Status in neutron:
  New
Status in tempest:
  New

Bug description:
  The following tests are failing in the neutron gate:

  tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON.test_delete_active_server [6.911205s] ... FAILED
  (tempest.api.compute.servers.test_server_addresses_negative.ServerAddressesNegativeTestJSON) [0.00s] ... FAILED
  tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON.test_delete_server_while_in_attached_volume [3.348451s] ... FAILED
  tempest.api.compute.servers.test_create_server.ServersTestManualDisk.test_verify_duplicate_network_nics [8.901531s] ... FAILED

  I spotted this message in logs [1]:
  "Connection to the hypervisor is broken on host: ubuntu-xenial-osic-cloud1-disk-6162583"

  Tracebacks:

  2016-12-14 10:34:36.039551 | Captured traceback:
  2016-12-14 10:34:36.039562 | ~~~
  2016-12-14 10:34:36.039577 |     Traceback (most recent call last):
  2016-12-14 10:34:36.039611 |       File "tempest/api/compute/servers/test_delete_server.py", line 49, in test_delete_active_server
  2016-12-14 10:34:36.039634 |         waiters.wait_for_server_termination(self.client, server['id'])
  2016-12-14 10:34:36.039658 |       File "tempest/common/waiters.py", line 111, in wait_for_server_termination
  2016-12-14 10:34:36.039693 |         raise exceptions.BuildErrorException(server_id=server_id)
  2016-12-14 10:34:36.039728 |     tempest.exceptions.BuildErrorException: Server e127e6ff-c7bb-43a2-bbe9-c2683ffdf018 failed to build and is in ERROR status

  2016-12-14 10:34:36.043578 | Captured traceback:
  2016-12-14 10:34:36.043588 | ~~~
  2016-12-14 10:34:36.043602 |     Traceback (most recent call last):
  2016-12-14 10:34:36.043619 |       File "tempest/test.py", line 100, in wrapper
  2016-12-14 10:34:36.043636 |         return f(self, *func_args, **func_kwargs)
  2016-12-14 10:34:36.043668 |       File "tempest/api/compute/servers/test_delete_server.py", line 110, in test_delete_server_while_in_attached_volume
  2016-12-14 10:34:36.043687 |         server = self.create_test_server(wait_until='ACTIVE')
  2016-12-14 10:34:36.043709 |       File "tempest/api/compute/base.py", line 232, in create_test_server
  2016-12-14 10:34:36.043718 |         **kwargs)
  2016-12-14 10:34:36.043752 |       File "tempest/common/compute.py", line 167, in create_test_server
  2016-12-14 10:34:36.043763 |         % server['id'])
  2016-12-14 10:34:36.043798 |       File "/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
  2016-12-14 10:34:36.043810 |         self.force_reraise()
  2016-12-14 10:34:36.043846 |       File "/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2016-12-14 10:34:36.043864 |         six.reraise(self.type_, self.value, self.tb)
  2016-12-14 10:34:36.043886 |       File "tempest/common/compute.py", line 149, in create_test_server
  2016-12-14 10:34:36.043905 |         clients.servers_client, server['id'], wait_until)
  2016-12-14 10:34:36.043927 |       File "tempest/common/waiters.py", line 75, in wait_for_server_status
  2016-12-14 10:34:36.043939 |         server_id=server_id)
  2016-12-14 10:34:36.044257 |     tempest.exceptions.BuildErrorException: Server b1472499-6bdc-41fb-98ca-9d1f9ef578ed failed to build and is in ERROR status
  2016-12-14 10:34:36.044301 |     Details: {u'code': 500, u'created': u'2016-12-14T10:05:22Z', u'message': u'No valid host was found. There are not enough hosts available.'}

  2016-12-14 10:34:36.039827 | Captured traceback:
  2016-12-14 10:34:36.039838 | ~~~
  2016-12-14 10:34:36.039852 |     Traceback (most recent call last):
  2016-12-14 10:34:36.039870 |       File "tempest/test.py", line 241, in setUpClass
  2016-12-14 10:34:36.039885 |         six.reraise(etype, value, trace)
  2016-12-14 10:34:36.039903 |       File "tempest/test.py", line 234, in setUpClass
  2016-12-14 10:34:36.039915 |         cls.resource_setup()
  2016-12-14 10:34:36.039944 |       File "tempest/api/compute/servers/test_server_addresses_negative.py", line 36, in resource_setup
  2016-12-14 10:34:36.039972 |         cls.server = cls.create_test_server(wait_until='ACTIVE')
  2016-12-14 10:34:36.039995 |       File "tempest/api/compute/base.py", line 232, in create_test_server
  2016-12-14 10:34:36.040005 |         **kwargs)
  2016-12-14 10:34:36.040027 |       File "tempest/common/compute.py", line 167, in create_test_server
  2016-12-14 10:34:36.040038 |         % server['id
[Yahoo-eng-team] [Bug 1647432] [NEW] Multiple SIGHUPs to keepalived might trigger re-election
Public bug reported:

As the title says, multiple SIGHUPs sent to the keepalived process
might cause it to forfeit mastership and re-negotiate a new master
(which might be the original master). This means that when, for
example, associating/disassociating 2 floating IPs in quick succession
(each triggers a SIGHUP), the master node may trigger re-election,
causing it to switch to BACKUP and thus removing all the remaining
FIPs' IP addresses and severing connectivity.

** Affects: neutron
   Importance: High
   Assignee: John Schwarz (jschwarz)
       Status: In Progress

** Tags: l3-ha

https://bugs.launchpad.net/bugs/1647432
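One possible mitigation - a sketch only, not the fix referenced by this
report, with all names made up - is to coalesce bursts of configuration
changes into a single SIGHUP after a short quiet period:

    # Illustrative sketch: coalesce multiple "config changed" events
    # into one SIGHUP to keepalived, so that two floating-ip operations
    # in quick succession cause only one reload.
    import os
    import signal
    import threading

    class SighupThrottler(object):
        def __init__(self, pid, delay=2.0):
            self.pid = pid          # keepalived pid
            self.delay = delay      # quiet period in seconds
            self._timer = None
            self._lock = threading.Lock()

        def config_changed(self):
            """Called on every config rewrite; only the last call wins."""
            with self._lock:
                if self._timer is not None:
                    self._timer.cancel()
                self._timer = threading.Timer(self.delay, self._reload)
                self._timer.start()

        def _reload(self):
            os.kill(self.pid, signal.SIGHUP)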
[Yahoo-eng-team] [Bug 1645716] [NEW] Migrating HA routers to Legacy doesn't update interface's device_owner
Public bug reported:

Patch I322c392529c04aca2448fd957a35f4908b323449 added a new
device_owner for HA interfaces between a router and an internal subnet,
which is used to differentiate them from normal, non-HA interfaces.
However, when migrating a router from HA to legacy, the device_owner
isn't switched back to its non-HA counterpart. This can cause a later
migration of the router to DVR to not work properly, as the snat
interface isn't created.

A log and a reproducer can be found in [1].

[1]: http://paste.openstack.org/show/590804/

** Affects: neutron
   Importance: High
   Assignee: John Schwarz (jschwarz)
       Status: Confirmed

** Tags: l3-ha

https://bugs.launchpad.net/bugs/1645716
[Yahoo-eng-team] [Bug 1562878] Re: L3 HA: Unable to complete operation on subnet
I found the bug, and it's in rally.

Patch Ieab53624dc34dc687a0e8eebd84778f7fc95dd77 added a new type of
router interface value for "device_owner", called
"network:ha_router_replicated_interface". However, rally was not made
aware of it, so it thinks this interface is a normal port and tries to
delete it with a normal 'neutron port-delete' (and not 'neutron
router-interface-remove').

I'll adjust the bug report and will submit a fix for rally.

** Also affects: rally
   Importance: Undecided
       Status: New

** Changed in: neutron
       Status: Confirmed => Invalid

** Changed in: rally
     Assignee: (unassigned) => John Schwarz (jschwarz)

** Changed in: rally
       Status: New => Confirmed

https://bugs.launchpad.net/bugs/1562878

Title:
  L3 HA: Unable to complete operation on subnet

Status in neutron:
  Invalid
Status in Rally:
  In Progress

Bug description:
  Environment: 3 controllers, 46 computes, liberty. L3 HA.

  During several executions of
  NeutronNetworks.create_and_delete_routers, the test failed with
  "Unable to complete operation on subnet . One or more ports have an
  IP allocation from this subnet."

  Trace in neutron-server logs: http://paste.openstack.org/show/491557/
  Rally report attached.

  The current problem is with the HA subnet. A side effect of this
  problem is bug https://bugs.launchpad.net/neutron/+bug/1562892
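The rally-side fix amounts to checking device_owner before deciding how
to delete a port. A minimal sketch under these assumptions:
cleanup_ports() and the owner tuple are illustrative, while the three
client calls used - list_ports, remove_interface_router, delete_port -
are real python-neutronclient methods:

    # Router interface ports (including the new
    # "network:ha_router_replicated_interface" owner) must be removed
    # with router-interface-remove, not port-delete.
    ROUTER_INTERFACE_OWNERS = (
        'network:router_interface',
        'network:ha_router_replicated_interface',
    )

    def cleanup_ports(client, network_id):
        for port in client.list_ports(network_id=network_id)['ports']:
            if port['device_owner'] in ROUTER_INTERFACE_OWNERS:
                # device_id of a router interface port is the router id
                client.remove_interface_router(port['device_id'],
                                               {'port_id': port['id']})
            else:
                client.delete_port(port['id'])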
[Yahoo-eng-team] [Bug 1635554] Re: Delete Router / race condition
No worries :) Glad we could help.

** Changed in: neutron
       Status: Incomplete => Invalid

** Changed in: neutron
   Importance: High => Undecided

https://bugs.launchpad.net/bugs/1635554

Title:
  Delete Router / race condition

Status in neutron:
  Invalid

Bug description:
  When deleting a router, the logfile is filled up.
  CentOS 7, Newton (RDO)

  2016-10-21 09:45:02.526 16200 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:140
  2016-10-21 09:45:02.526 16200 WARNING neutron.agent.l3.namespaces [-] Namespace qrouter-8cf5-5c5c-461c-84f3-c8abeca8f79a does not exist. Skipping delete
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent [-] Error while deleting router 8cf5-5c5c-461c-84f3-c8abeca8f79a
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent Traceback (most recent call last):
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 357, in _safe_router_removed
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     self._router_removed(router_id)
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 376, in _router_removed
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     ri.delete(self)
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 381, in delete
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     self.destroy_state_change_monitor(self.process_monitor)
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 325, in destroy_state_change_monitor
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     pm = self._get_state_change_monitor_process_manager()
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 296, in _get_state_change_monitor_process_manager
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     default_cmd_callback=self._get_state_change_monitor_callback())
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 299, in _get_state_change_monitor_callback
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     ha_device = self.get_ha_device_name()
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 137, in get_ha_device_name
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     return (HA_DEV_PREFIX + self.ha_port['id'])[:self.driver.DEV_NAME_LEN]
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent TypeError: 'NoneType' object has no attribute '__getitem__'
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent
  2016-10-21 09:45:02.528 16200 DEBUG neutron.agent.l3.agent [-] Finished a router update for 8cf5-5c5c-461c-84f3-c8abeca8f79a _process_router_update /usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py:504

  See full log: http://paste.openstack.org/show/586656/
[Yahoo-eng-team] [Bug 1638273] [NEW] find_child_pids crashes under non-English locales
Public bug reported:

Traceback available at [1]. The function execute() raises an error
whose message comes from _("Exit code: %(returncode)d; ...") [2].
Under non-English locales (we checked Japanese, but surely this will
also occur in others), the check 'Exit code: 1' in str(e) [3] will
fail, since the 'Exit code: 1' part of the message is translated. This
ultimately prevents things like booting a new VM.

[1]: http://pastebin.com/x66aqctN
[2]: https://github.com/openstack/neutron/blob/15d65607a47810f7d155d43902d358cb9f953a7a/neutron/agent/linux/utils.py#L127
[3]: https://github.com/openstack/neutron/blob/15d65607a47810f7d155d43902d358cb9f953a7a/neutron/agent/linux/utils.py#L176

** Affects: neutron
   Importance: Critical
   Assignee: John Schwarz (jschwarz)
       Status: Confirmed

** Tags: mitaka-backport-potential newton-backport-potential

https://bugs.launchpad.net/bugs/1638273
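To illustrate why the check breaks, and a locale-independent
alternative: a structured returncode attribute compares equal
regardless of how the message text is translated. ProcessError here is
hypothetical, not neutron's actual exception class:

    # Illustration of the bug and a locale-independent alternative.
    class ProcessError(RuntimeError):
        def __init__(self, returncode, message):
            super(ProcessError, self).__init__(message)
            self.returncode = returncode

    def is_exit_code_one_fragile(e):
        # Breaks under e.g. a Japanese locale, where the message built
        # from _("Exit code: %(returncode)d; ...") is translated.
        return 'Exit code: 1' in str(e)

    def is_exit_code_one_robust(e):
        # Comparing a structured attribute works in every locale.
        return getattr(e, 'returncode', None) == 1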
[Yahoo-eng-team] [Bug 1633306] Re: Partial HA network causing HA router creation failed (race condition)
Looking at the log involving the server ([1] - the same one you
provided in the first comment and in comment #3), and specifically
lines 19 and 21, it's clear that sync_routers() is triggering
auto_schedule_routers(). Before the call was removed in [2],
sync_routers() called auto_schedule_routers() at line 96 of
neutron/api/rpc/handlers/l3_rpc.py, as can be observed from the log:

  2016-10-09 17:03:52.366 144166 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/neutron/api/rpc/handlers/l3_rpc.py", line 96, in sync_routers
  2016-10-09 17:03:52.366 144166 ERROR oslo_messaging.rpc.dispatcher     self.l3plugin.auto_schedule_routers(context, host, router_ids)

In [2], it's evident that line 96 itself is removed. Thus, this can't
be reproduced in master or in stable/mitaka, and there is no (upstream)
bug to fix.

[1]: http://paste.openstack.org/show/585669/
[2]: https://github.com/openstack/neutron/commit/33650bf1d1994a96eff993af0bfdaa62588f08a4

** Changed in: neutron
       Status: New => Invalid

https://bugs.launchpad.net/bugs/1633306

Title:
  Partial HA network causing HA router creation failed (race condition)

Status in neutron:
  Invalid

Bug description:
  ENV: stable/mitaka, VXLAN
  Neutron API: two neutron-servers behind an HA proxy VIP.

  Exception logs:
  [1] http://paste.openstack.org/show/585669/
  [2] http://paste.openstack.org/show/585670/

  Log [1] shows that the subnet of the HA network is concurrently
  deleted while a new HA router create API call comes in. It seems the
  race condition described in
  https://bugs.launchpad.net/neutron/+bug/1533440 still exists; its
  description says:

  """
  Some known exceptions:
  ...
  2. IpAddressGenerationFailure: (HA port creation failed due to the
     concurrent HA subnet deletion)
  ...
  """

  Log [2] shows a very strange behavior: those 3 APIs have the same
  request-id [req-780b1f6e-2b3c-4303-a1de-a5fb4c7ea31e].

  Test scenario: just create one HA router for a tenant, and then
  quickly delete it.

  For now, our mitaka ENV uses VXLAN as the tenant network type, so
  there is a very large range of VNIs. As a local, temporary solution,
  we added a new config option to decide whether to delete the HA
  network every time.
[Yahoo-eng-team] [Bug 1633306] Re: Partial HA network causing HA router creation failed (race condition)
Adding a new configuration option is almost never temporary, as
deleting config options is rarely backward-compatible.

The race condition, as I understand it, is as follows:

1. Create an HA router; worker1 sends 'router_updated' to agent1.
2. Delete the HA router (done by worker2). worker2 will now detect
   that there are no more HA routers and will delete the HA network
   for the tenant.
3. agent1 issues a 'sync_router', which triggers
   auto_schedule_routers. create_ha_port_and_bind will try to create
   the HA port, but there are no more IP addresses available, causing
   add_ha_port to fail as specified in the first paste.

Point #3 is a bit weird to me, as it looks like IPAM is detecting a
"network deleted during function run" as "no more IP addresses". In
addition, this should be caught by [2], forcing a silent retrigger of
this issue.

Aside from the issue that isn't clear to me, I'd like to point out
that the latest stable/mitaka [1] doesn't even trigger
auto_schedule_routers on sync_router (not since [3] - perhaps you're
missing this backport?), hence the trace received in the first paste
can't be reproduced. For this reason, I'm closing this as Invalid.
Liu, feel free to reopen if you disagree with my assessment :)

[1]: https://github.com/openstack/neutron/blob/5860fb21e966ab8f1e011654dd477d7af35f7a27/neutron/api/rpc/handlers/l3_rpc.py#L79
[2]: https://github.com/openstack/neutron/blob/5860fb21e966ab8f1e011654dd477d7af35f7a27/neutron/common/utils.py#L726
[3]: https://github.com/openstack/neutron/commit/33650bf1d1994a96eff993af0bfdaa62588f08a4

(5860fb21e966ab8f1e011654dd477d7af35f7a27 is the latest stable/mitaka
hash that github.com provided.)

** Changed in: neutron
   Importance: High => Undecided

** Changed in: neutron
       Status: Confirmed => Invalid

** Changed in: neutron
    Milestone: ocata-1 => None

https://bugs.launchpad.net/bugs/1633306

Title:
  Partial HA network causing HA router creation failed (race condition)

Status in neutron:
  Invalid

Bug description:
  ENV: stable/mitaka, VXLAN
  Neutron API: two neutron-servers behind an HA proxy VIP.

  Exception logs:
  [1] http://paste.openstack.org/show/585669/
  [2] http://paste.openstack.org/show/585670/

  Log [1] shows that the subnet of the HA network is concurrently
  deleted while a new HA router create API call comes in. It seems the
  race condition described in
  https://bugs.launchpad.net/neutron/+bug/1533440 still exists; its
  description says:

  """
  Some known exceptions:
  ...
  2. IpAddressGenerationFailure: (HA port creation failed due to the
     concurrent HA subnet deletion)
  ...
  """

  Log [2] shows a very strange behavior: those 3 APIs have the same
  request-id [req-780b1f6e-2b3c-4303-a1de-a5fb4c7ea31e].

  Test scenario: just create one HA router for a tenant, and then
  quickly delete it.

  For now, our mitaka ENV uses VXLAN as the tenant network type, so
  there is a very large range of VNIs. As a local, temporary solution,
  we added a new config option to decide whether to delete the HA
  network every time.
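For reference, the retry helper referred to as [2] in the comment above
follows the general shape below. This is a generic sketch of the
retry-on-exception pattern, not neutron's actual code:

    import functools
    import time

    def retry_on(exc_types, max_attempts=3, delay=0.5):
        def decorator(f):
            @functools.wraps(f)
            def wrapper(*args, **kwargs):
                for attempt in range(1, max_attempts + 1):
                    try:
                        return f(*args, **kwargs)
                    except exc_types:
                        if attempt == max_attempts:
                            raise
                        time.sleep(delay)
            return wrapper
        return decorator

    # e.g. silently re-trigger scheduling when the HA network vanished
    # mid-run:
    # @retry_on((IpAddressGenerationFailure,))
    # def create_ha_port_and_bind(...): ...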
[Yahoo-eng-team] [Bug 1633042] [NEW] L3 scheduler: make RouterL3AgentBinding always concurrently safe
Public bug reported:

Changeset I3447ea5bcb7c57365c6f50efe12a1671e86588b3 added a
binding_index column to the RouterL3AgentBinding table, which is
unique together with the router_id. However, the current logic isn't
concurrency-safe: some concurrent cases can raise a DBDuplicateEntry
(if the same binding_index is picked by 2 different workers).

** Affects: neutron
   Importance: Medium
   Assignee: John Schwarz (jschwarz)
       Status: In Progress

** Tags: l3-dvr-backlog l3-ha

https://bugs.launchpad.net/bugs/1633042
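The usual way to make such an allocation concurrency-safe is to let
the unique constraint arbitrate and retry on a duplicate-entry error.
A minimal, self-contained sketch of that pattern, simulated here with
sqlite3 (neutron itself uses SQLAlchemy models and oslo.db's
DBDuplicateEntry):

    import sqlite3

    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE binding (router_id TEXT, binding_index INT, '
                 'agent TEXT, UNIQUE (router_id, binding_index))')

    def bind_router(router_id, agent, max_index=3):
        # Try the lowest free index; on a duplicate-entry race, retry
        # with the next one instead of bubbling the error up to the API.
        for index in range(1, max_index + 1):
            try:
                with conn:
                    conn.execute('INSERT INTO binding VALUES (?, ?, ?)',
                                 (router_id, index, agent))
                return index
            except sqlite3.IntegrityError:
                continue  # another worker grabbed this index; try the next
        raise RuntimeError('no free binding_index for %s' % router_id)

    print(bind_router('router-1', 'agent-a'))  # -> 1
    print(bind_router('router-1', 'agent-b'))  # -> 2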
[Yahoo-eng-team] [Bug 1628886] [NEW] test_reprocess_port_when_ovs_restarts fails nondeterministically
Public bug reported:

Encountered in https://review.openstack.org/#/c/365326/8/, specifically
http://logs.openstack.org/26/365326/8/check/gate-neutron-dsvm-functional-ubuntu-trusty/cc5f8eb/testr_results.html.gz

Stack trace from tempest (if the logs are deleted from the server):
http://paste.openstack.org/show/583476/

Stack trace from the dsvm-functional log dir:
http://paste.openstack.org/show/583478/

** Affects: neutron
   Importance: High
       Status: Confirmed

** Tags: gate-failure

https://bugs.launchpad.net/bugs/1628886
[Yahoo-eng-team] [Bug 1580648] Re: Two HA routers in master state during functional test
This seems like a bug to me. I understand that it stands as a
limitation that keepalived always selects the higher IP to be master,
but then I would expect the non-higher-IP nodes to revert to backups.
If this isn't the case (as it seems from what Ann and Gustavo write),
then this is a bug. Reopening.

** Changed in: neutron
       Status: Opinion => Confirmed

** Changed in: neutron
   Importance: Undecided => High

https://bugs.launchpad.net/bugs/1580648

Title:
  Two HA routers in master state during functional test

Status in neutron:
  Confirmed

Bug description:
  Scheduling HA routers ends with two routers in master state. The
  issue was discovered in bug fix
  https://review.openstack.org/#/c/273546 after preparing a new
  functional test.

  ha_router.py, in the method _get_state_change_monitor_callback(),
  starts a neutron-keepalived-state-change process with the parameter
  --monitor-interface set to the ha_device (ha-xxx) and its IP
  address. That application monitors all address changes in that
  namespace using "ip netns exec xxx ip -o monitor address". Each
  addition of that ha-xxx device produces a call to the neutron-server
  API saying that this router has become "master". It produces false
  results because that device doesn't tell anything about whether the
  router is master or not.

  Logs from test_ha_router.L3HATestFailover.test_ha_router_lost_gw_connection

  Agent2:
  2016-05-10 16:23:20.653 16067 DEBUG neutron.agent.linux.async_process [-] Launching async process [ip netns exec qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1@agent2 ip -o monitor address]. start /neutron/neutron/agent/linux/async_process.py:109
  2016-05-10 16:23:20.654 16067 DEBUG neutron.agent.linux.utils [-] Running command: ['ip', 'netns', 'exec', 'qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1@agent2', 'ip', '-o', 'monitor', 'address'] create_process /neutron/neutron/agent/linux/utils.py:82
  2016-05-10 16:23:20.661 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Monitor: ha-8aedf0c6-2a, 169.254.0.1/24 run /neutron/neutron/agent/l3/keepalived_state_change.py:59
  2016-05-10 16:23:20.661 16067 INFO neutron.agent.linux.daemon [-] Process runs with uid/gid: 1000/1000
  2016-05-10 16:23:20.767 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qr-88c93aa9-5a, fe80::c8fe:deff:fead:beef/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:20.901 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qg-814d252d-26, fe80::c8fe:deff:fead:beee/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:21.324 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: ha-8aedf0c6-2a, fe80::2022:22ff:fe22:/64, True parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:29.807 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: ha-8aedf0c6-2a, 169.254.0.1/24, True parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:29.808 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Wrote router 962f19e6-f592-49f7-8bc4-add116c0b7a3 state master write_state_change /neutron/neutron/agent/l3/keepalived_state_change.py:87
  2016-05-10 16:23:29.808 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] State: master notify_agent /neutron/neutron/agent/l3/keepalived_state_change.py:93

  Agent1:
  2016-05-10 16:23:19.417 15906 DEBUG neutron.agent.linux.async_process [-] Launching async process [ip netns exec qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1 ip -o monitor address]. start /neutron/neutron/agent/linux/async_process.py:109
  2016-05-10 16:23:19.418 15906 DEBUG neutron.agent.linux.utils [-] Running command: ['ip', 'netns', 'exec', 'qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1', 'ip', '-o', 'monitor', 'address'] create_process /neutron/neutron/agent/linux/utils.py:82
  2016-05-10 16:23:19.425 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Monitor: ha-22a4d1e0-ad, 169.254.0.1/24 run /neutron/neutron/agent/l3/keepalived_state_change.py:59
  2016-05-10 16:23:19.426 15906 INFO neutron.agent.linux.daemon [-] Process runs with uid/gid: 1000/1000
  2016-05-10 16:23:19.525 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qr-88c93aa9-5a, fe80::c8fe:deff:fead:beef/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:19.645 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qg-814d252d-26, fe80::c8fe:deff:fead:beee/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:19.927 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: ha-22a4d1e0-ad, fe80::1034:56ff:fe78:2b5d/64, True parse_and_hand
[Yahoo-eng-team] [Bug 1621086] Re: Port delete on router interface remove
Looks like this is working as planned.

** Changed in: neutron
       Status: New => Opinion

https://bugs.launchpad.net/bugs/1621086

Title:
  Port delete on router interface remove

Status in neutron:
  Opinion

Bug description:
  1. I create a port, then a router, and then use add_router_interface.
  2. Then I use remove_router_interface.
  3. The port is deleted - and this is unexpected (for me, at least).

  I was using Heat on devstack master to test this.

  Template for the stack with the port:

    resources:
      media_port:
        type: OS::Neutron::Port
        properties:
          name: media_port
          network: private

  Template for the stack with the router and router interface:

    heat_template_version: newton

    resources:
      media_router:
        type: OS::Neutron::Router
      media_router_interface:
        type: OS::Neutron::RouterInterface
        properties:
          router: { get_resource: media_router }
          port: media_port

  When I delete the second stack, the port from the first stack is also
  deleted in neutron.

  https://github.com/openstack/python-neutronclient/blob/master/neutronclient/v2_0/client.py#L873-L876
  is the method that gets called, and the body here will be:
  { 'port_id': 'SOMEID' }
[Yahoo-eng-team] [Bug 1605966] Re: L3 HA: VIP doesn't change if qr interface or qg interface is down
Marking this as Incomplete, seeing as how no progress has been made on
the bug report or on the patch.

** Changed in: neutron
       Status: In Progress => Invalid

https://bugs.launchpad.net/bugs/1605966

Title:
  L3 HA: VIP doesn't change if qr interface or qg interface is down

Status in neutron:
  Invalid

Bug description:
  === Problem Description ===

  Currently, in L3 HA, we track the "ha" interface to determine whether
  the VIP address should fail over. Unfortunately, if a qr or qg
  interface goes down, the VIP address will not fail over, because we
  don't track these interfaces in a router.

  === How to reproduce ===

  Create an HA router and attach a subnet to it, so that there will be
  a keepalived process monitoring this router.

  Go into the L3 router we created above and execute "ip link set
  qr-xxx down". The VIP address does not fail over, which is not what
  we want.

  === How to resolve it ===

  The current keepalived configuration file looks like this:

  vrrp_instance VR_2 {
      state BACKUP
      interface ha-c00c7b49-d5
      virtual_router_id 2
      priority 50
      garp_master_delay 60
      nopreempt
      advert_int 2
      track_interface {
          ha-c00c7b49-d5
      }
      virtual_ipaddress {
          169.254.0.2/24 dev ha-c00c7b49-d5
      }
      virtual_ipaddress_excluded {
          2.2.2.1/24 dev qr-b312f788-9b
          fe80::f816:3eff:feac:fa12/64 dev qr-b312f788-9b scope link
      }
  }

  Tracked interfaces only include the "ha" interface, so the VIP will
  not be moved if a "qr" or "qg" interface goes down. To address this,
  we track the "qr" and "qg" interfaces as well, like this:

  vrrp_instance VR_2 {
      state BACKUP
      interface ha-c00c7b49-d5
      virtual_router_id 2
      priority 50
      garp_master_delay 60
      nopreempt
      advert_int 2
      track_interface {
          qr-xxx
          qg-xxx
          ha-c00c7b49-d5
      }
      virtual_ipaddress {
          169.254.0.2/24 dev ha-c00c7b49-d5
      }
      virtual_ipaddress_excluded {
          2.2.2.1/24 dev qr-b312f788-9b
          fe80::f816:3eff:feac:fa12/64 dev qr-b312f788-9b scope link
      }
  }

  By doing this, if a qr or qg interface goes down, the HA router will
  fail over.
[Yahoo-eng-team] [Bug 1605282] Re: Transaction rolled back while creating HA router
This should have been mitigated by
https://review.openstack.org/#/c/364278/10/neutron/scheduler/l3_agent_scheduler.py@207
so I'm closing this.

** Changed in: neutron
       Status: In Progress => Fix Released

** Changed in: neutron
   Importance: Undecided => Medium

https://bugs.launchpad.net/bugs/1605282

Title:
  Transaction rolled back while creating HA router

Status in neutron:
  Fix Released

Bug description:
  The stacktrace can be found here:
  http://paste.openstack.org/show/539052/

  This was discovered while running the create_and_delete_router rally
  test with a high (~10) concurrency number. I encountered this on
  stable/mitaka, so it's interesting to see if this reproduces on
  master.
[Yahoo-eng-team] [Bug 1619312] [NEW] dvr: can't migrate legacy router to DVR
Public bug reported:

As the title says:

2016-09-01 16:38:46.026 ERROR neutron.api.v2.resource [req-d738cdb2-01bb-41a7-a2a9-534bf8b06377 admin 85a2b05da4be46b19bc5f7cf41055e45] update failed: No details.
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource Traceback (most recent call last):
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/api/v2/resource.py", line 79, in resource
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     result = method(request=request, **args)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/api/v2/base.py", line 575, in update
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     return self._update(request, id, body, **kwargs)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 151, in wrapper
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     ectxt.value = e.inner_exc
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     self.force_reraise()
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     six.reraise(self.type_, self.value, self.tb)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 139, in wrapper
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     return f(*args, **kwargs)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/db/api.py", line 82, in wrapped
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     traceback.format_exc())
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     self.force_reraise()
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     six.reraise(self.type_, self.value, self.tb)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/db/api.py", line 77, in wrapped
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     return f(*args, **kwargs)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/api/v2/base.py", line 623, in _update
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     obj = obj_updater(request.context, id, **kwargs)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/db/extraroute_db.py", line 76, in update_router
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     context, id, router)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/db/l3_db.py", line 1722, in update_router
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     id, router)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/db/l3_db.py", line 282, in update_router
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     router_db = self._update_router_db(context, id, r)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/db/l3_hamode_db.py", line 533, in _update_router_db
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     context, router_id, data)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/db/l3_dvr_db.py", line 143, in _update_router_db
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     context.elevated(), router_db):
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/db/l3_dvr_db.py", line 829, in _create_snat_intf_ports_if_not_exists
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     intf['fixed_ips'][0]['subnet_id'], do_pop=False)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/db/l3_dvr_db.py", line 782, in _add_csnat_router_interface_port
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     {'port': port_data})
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/plugins/common/utils.py", line 197, in create_port
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     return core_plugin.create_port(context, {'port': port_data})
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/common/utils.py", line 617, in inner
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     "transaction.") % f)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource Ru
[Yahoo-eng-team] [Bug 1612192] [NEW] L3 DVR: Unable to complete operation on subnet
Public bug reported:

There is a new gate failure that can be found using the following
logstash query:

  message:"One or more ports have an IP allocation from this subnet"
  && filename:"console.html" && build_queue:"gate"

This seems to be specific to DVR jobs and is separate from [1] (see
comment #7 on that bug report).

[1]: https://bugs.launchpad.net/neutron/+bug/1562878

** Affects: neutron
   Importance: Critical
       Status: New

** Tags: gate-failure l3-dvr-backlog

https://bugs.launchpad.net/bugs/1612192
[Yahoo-eng-team] [Bug 1610645] [NEW] Migrating last HA router to legacy doesn't delete HA network
Public bug reported:

As the title suggests, migrating a tenant's last HA router from HA to
legacy doesn't clean up the HA network.

[stack@js16 ~]$ neutron router-create x --ha=True
Created a new router:
+-------------------------+--------------------------------------+
| Field                   | Value                                |
+-------------------------+--------------------------------------+
| admin_state_up          | True                                 |
| availability_zone_hints |                                      |
| availability_zones      |                                      |
| description             |                                      |
| distributed             | False                                |
| external_gateway_info   |                                      |
| flavor_id               |                                      |
| ha                      | True                                 |
| id                      | 2bafaae3-776b-4707-958b-f1df77d832fb |
| name                    | x                                    |
| revision                | 2                                    |
| routes                  |                                      |
| status                  | ACTIVE                               |
| tenant_id               | 20482218062b458589b9cffa3a1bb172     |
+-------------------------+--------------------------------------+
[stack@js16 ~]$ neutron router-update x --admin_state_up=False
Updated router: x
[stack@js16 ~]$ neutron router-update x --ha=False
Updated router: x
[stack@js16 ~]$ neutron router-delete x
Deleted router: x
[stack@js16 ~]$ neutron net-list
+--------------------------------------+----------------------------------------------------+-----------------------------------------------------------+
| id                                   | name                                               | subnets                                                   |
+--------------------------------------+----------------------------------------------------+-----------------------------------------------------------+
| 088ffed2-27a0-422e-b92c-c388e825cf8f | HA network tenant 20482218062b458589b9cffa3a1bb172 | 92ee2c83-8fdb-4767-90b3-bfb69fca452f 169.254.192.0/18     |
| e0e366ee-8a94-4753-8b9a-474bf692fb99 | public                                             | e0933b88-7baf-47a9-84d8-7c98e140f747 172.24.4.0/24        |
|                                      |                                                    | 400fa2e9-f373-4c7b-954c-f419d6dfba7b 2001:db8::/64        |
| fa13ed8e-dd44-4499-a3e8-531a25f26256 | private                                            | 26f2f679-c255-4c36-93ac-7f6ec3e98ffe fd22:5205:4fcc::/64  |
|                                      |                                                    | ed866867-1611-4919-bb3a-1ff0b4f1d36a 10.0.0.0/24          |
+--------------------------------------+----------------------------------------------------+-----------------------------------------------------------+

** Affects: neutron
   Importance: Undecided
   Assignee: John Schwarz (jschwarz)
       Status: New

** Tags: l3-ha

** Changed in: neutron
     Assignee: (unassigned) => John Schwarz (jschwarz)

https://bugs.launchpad.net/bugs/1610645
[Yahoo-eng-team] [Bug 1609738] [NEW] l3-ha: a router can be stuck in the ALLOCATING state
Public bug reported:

The scenario is a simple one: during the creation of a router, the
server that deals with the request crashes after creating the router
in the ALLOCATING state [1] but before it's changed to ACTIVE [2]. In
this case, the router will be "stuck" in the ALLOCATING state, and the
only admin action that changes the router back to ACTIVE (and allows
it to be scheduled to agents) is the following (see the sketch after
this report):

1. set admin-state-up to False
2. set ha to False
3. set ha to True
4. set admin-state-up to True

That is, a full migration of the HA router to legacy and back to HA is
required. This will trigger the code in [3] and will fix this issue.

The proposed solution is to add a new state, such that if
admin-state-up is changed to False then the router's status will be
changed to "DOWN" (as opposed to the current "ACTIVE", which doesn't
make much sense since admin-state-up is False).

[1]: https://github.com/openstack/neutron/blob/ff5b38071e7e134baa0dc7a52280f9bcbc06efaf/neutron/db/l3_hamode_db.py#L469
[2]: https://github.com/openstack/neutron/blob/ff5b38071e7e134baa0dc7a52280f9bcbc06efaf/neutron/db/l3_hamode_db.py#L485
[3]: https://github.com/openstack/neutron/blob/ff5b38071e7e134baa0dc7a52280f9bcbc06efaf/neutron/db/l3_hamode_db.py#L570

** Affects: neutron
   Importance: Undecided
   Assignee: John Schwarz (jschwarz)
       Status: New

** Tags: l3-ha

** Changed in: neutron
     Assignee: (unassigned) => John Schwarz (jschwarz)

https://bugs.launchpad.net/bugs/1609738
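The four-step workaround above can be scripted with
python-neutronclient. update_router is a real client method; the
endpoint and credentials below are placeholders:

    from neutronclient.v2_0 import client

    neutron = client.Client(username='admin', password='secret',
                            tenant_name='admin',
                            auth_url='http://controller:5000/v2.0')

    def unstick_allocating_router(router_id):
        for body in ({'admin_state_up': False},  # 1. take the router down
                     {'ha': False},              # 2. migrate HA -> legacy
                     {'ha': True},               # 3. migrate legacy -> HA
                     {'admin_state_up': True}):  # 4. bring it back up
            neutron.update_router(router_id, {'router': body})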
[Yahoo-eng-team] [Bug 1609665] [NEW] Updating a router to HA without enough agents results in partial update
Public bug reported:

As the title says, updating a non-HA router to be HA while there are
fewer than the minimum required available L3 agents to handle this
router results in an invalid state caused by a partial update.

[stack@js16 ~]$ neutron router-create --ha=False x
Created a new router:
+-------------------------+--------------------------------------+
| Field                   | Value                                |
+-------------------------+--------------------------------------+
| admin_state_up          | True                                 |
| availability_zone_hints |                                      |
| availability_zones      |                                      |
| description             |                                      |
| distributed             | False                                |
| external_gateway_info   |                                      |
| ha                      | False                                |
| id                      | 488a0eab-bf7a-4aea-84a4-4146a79eb225 |
| name                    | x                                    |
| routes                  |                                      |
| status                  | ACTIVE                               |
| tenant_id               | 20482218062b458589b9cffa3a1bb172     |
+-------------------------+--------------------------------------+
[stack@js16 ~]$ neutron router-update x --admin-state-up=False
Updated router: x
[stack@js16 ~]$ neutron router-update x --ha=True
Not enough l3 agents available to ensure HA. Minimum required 2, available 1.
Neutron server returns request_ids: ['req-4c5400c5-465e-419b-aeda-e637a76c29a1']
[stack@js16 ~]$ neutron router-show x
+-------------------------+--------------------------------------+
| Field                   | Value                                |
+-------------------------+--------------------------------------+
| admin_state_up          | False                                |
| availability_zone_hints |                                      |
| availability_zones      |                                      |
| description             |                                      |
| distributed             | False                                |
| external_gateway_info   |                                      |
| ha                      | True                                 |
| id                      | 488a0eab-bf7a-4aea-84a4-4146a79eb225 |
| name                    | x                                    |
| routes                  |                                      |
| status                  | ALLOCATING                           |
| tenant_id               | 20482218062b458589b9cffa3a1bb172     |
+-------------------------+--------------------------------------+
[stack@js16 ~]$ neutron l3-agent-list-hosting-router x

[stack@js16 ~]$

The router is set to HA and the status is stuck in ALLOCATING even
though it wasn't scheduled to any agent.

** Affects: neutron
   Importance: Undecided
   Assignee: John Schwarz (jschwarz)
       Status: New

** Tags: l3-ha

** Changed in: neutron
     Assignee: (unassigned) => John Schwarz (jschwarz)

https://bugs.launchpad.net/bugs/1609665
[Yahoo-eng-team] [Bug 1531254] Re: Support migrating of legacy routers to HA and back
** Changed in: neutron
       Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1531254

Title:
  Support migrating of legacy routers to HA and back

Status in neutron:
  Fix Released

Bug description:
  https://review.openstack.org/260528

  Dear bug triager. This bug was created since a commit was marked with
  DOCIMPACT. Your project "openstack/neutron" is set up so that we
  directly report the documentation bugs against it. If this needs
  changing, the docimpact-group option needs to be added for the
  project. You can ask the OpenStack infra team (#openstack-infra on
  freenode) for help if you need to.

  commit 42f4332a2b6c7aaeadc9c1bdc87f6d4bf4b662d7
  Author: John Schwarz
  Date:   Mon Oct 12 16:54:17 2015 +0300

      Support migrating of legacy routers to HA and back

      This patch adds support for migration of legacy routers to HA
      and vice-versa. This patch also:

      1. Reverts I4171ab481e3943e0110bd9a300d965bbebe44871, which was
         used to disable such migrations until support was inserted to
         the codebase.
      2. Adds an exception to indicate that such migrations are only
         available on routers that have their admin_state_up set to
         False.

      (cherry picked from commit 416c76bc6e01ef433506e4aa4ebd7c76b57acc51)

      Closes-Bug: #1365426
      DocImpact (Handled in patch 233695)
      Change-Id: Ie92f8033f47e1bf9ba6310373b3bfc9833317580

  Conflicts:
      neutron/db/l3_hamode_db.py
      neutron/tests/unit/db/test_l3_hamode_db.py
[Yahoo-eng-team] [Bug 1605282] Re: Transaction rolled back while creating HA router
** Changed in: neutron Status: New => Opinion ** Changed in: neutron Status: Opinion => Confirmed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1605282 Title: Transaction rolled back while creating HA router Status in neutron: Confirmed Bug description: The stacktrace can be found here: http://paste.openstack.org/show/539052/ This was discovered while running the create_and_delete_router rally test with a high (~10) concurrency number. I encountered this on stable/mitaka so it's interesting to see if this reproduces on master. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1605282/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1606801] Re: deleting router run into race condition
*** This bug is a duplicate of bug 1533457 *** https://bugs.launchpad.net/bugs/1533457 ** This bug is no longer a duplicate of bug 1605546 Race with deleting HA routers ** This bug has been marked a duplicate of bug 1533457 Neutron server unable to sync HA info after race between HA router creating and deleting -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1606801 Title: deleting router run into race condition Status in neutron: New Bug description: After deleting a router, the logfiles of both network nodes fill up with "RuntimeError: Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot open network namespace "qrouter-3767"". After I restarted the openstack services on the network nodes, no new entries appear. Reproducible: yes Steps:
* add router via CLI or dashboard
* delete router via CLI or dashboard
* logfiles grow
Openstack version: mitaka (this error occurred on liberty too!) OS: Centos 7, latest updates Installed packages on network nodes: openstack-neutron-vpnaas-8.0.0-1.el7.noarch openstack-neutron-common-8.1.2-1.el7.noarch openstack-neutron-metering-agent-8.1.2-1.el7.noarch python-neutronclient-4.1.1-2.el7.noarch python-neutron-8.1.2-1.el7.noarch python-neutron-fwaas-8.0.0-3.el7.noarch openstack-neutron-ml2-8.1.2-1.el7.noarch openstack-neutron-bgp-dragent-8.1.2-1.el7.noarch python-neutron-vpnaas-8.0.0-1.el7.noarch openstack-neutron-openvswitch-8.1.2-1.el7.noarch openstack-neutron-8.1.2-1.el7.noarch python-neutron-lib-0.0.2-1.el7.noarch openstack-neutron-fwaas-8.0.0-3.el7.noarch Logfile network node:
2.770 44778 DEBUG neutron.agent.linux.ra [-] radvd disabled for router 37678766-597a-4e33-b83a-65142ca2ced8 disable /usr/lib/python2.7/site-packages/neutron/agent/linux/ra.py:190
2016-07-27 09:10:02.770 44778 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-37678766-597a-4e33-b83a-65142ca2ced8', 'find', '/sys/class/net', '-maxdepth', '1', '-type', 'l', '-printf', '%f '] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:100
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.linux.utils [-] Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot open network namespace "qrouter-37678766-597a-4e33-b83a-65142ca2ced8": No such file or directory
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent [-] Error while deleting router 37678766-597a-4e33-b83a-65142ca2ced8
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 359, in _safe_router_removed
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent     self._router_removed(router_id)
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 377, in _router_removed
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent     ri.delete(self)
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 380, in delete
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent     super(HaRouter, self).delete(agent)
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 349, in delete
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent     self.router_namespace.delete()
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/namespaces.py", line 100, in delete
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent     for d in ns_ip.get_devices(exclude_loopback=True):
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 130, in get_devices
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent     log_fail_as_error=self.log_fail_as_error
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 140, in execute
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent     raise RuntimeError(msg)
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent RuntimeError: Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot open network namespace "qrouter-37678766-597a-4e33-b83a-65142ca2ced8": No such file or directory
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent
Attached logfiles of control node and both network nodes. At 09:09:00 -> added router. At 09:10:00 -> deleted router. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1606801/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
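As an illustrative aside (standard library only; this is not the Neutron fix), namespace cleanup can be made idempotent so that a namespace already deleted by a racing worker is treated as done rather than retried forever:

    import subprocess


    def delete_namespace(ns_name):
        try:
            subprocess.check_output(["ip", "netns", "delete", ns_name],
                                    stderr=subprocess.STDOUT)
        except subprocess.CalledProcessError as exc:
            if b"No such file or directory" in exc.output:
                # The namespace is already gone: nothing left to clean up.
                return
            raise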
[Yahoo-eng-team] [Bug 1499647] Re: test_ha_router fails intermittently
As per comment #39, this can be closed - this bug report is mostly a tracker bug and I'm under the impression that most of the races that made test_ha_router fail are resolved. Some other races are https://bugs.launchpad.net/neutron/+bug/1605285 and https://bugs.launchpad.net/neutron/+bug/1605282, but these can be addressed separately. ** Changed in: neutron Status: In Progress => Fix Released ** Changed in: neutron/kilo Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1499647 Title: test_ha_router fails intermittently Status in neutron: Fix Released Status in neutron kilo series: Fix Released Bug description: I have tested the work of L3 HA on an environment with 3 controllers and 1 compute (Kilo), keepalived v1.2.13. I created 50 nets with 50 subnets and 50 routers, with an interface set for each subnet (note: I've seen the same errors with just one router and net). I've got the following errors:

root@node-6:~# neutron l3-agent-list-hosting-router router-1
Request Failed: internal server error while processing your request.

In neutron-server error log: http://paste.openstack.org/show/473760/

When I fixed _get_agents_dict_for_router to skip None for further testing, I was able to see:

root@node-6:~# neutron l3-agent-list-hosting-router router-1
+--------------------------------------+-------------------+----------------+-------+----------+
| id                                   | host              | admin_state_up | alive | ha_state |
+--------------------------------------+-------------------+----------------+-------+----------+
| f3baba98-ef5d-41f8-8c74-a91b7016ba62 | node-6.domain.tld | True           | :-)   | active   |
| c9159f09-34d4-404f-b46c-a8c18df677f3 | node-7.domain.tld | True           | :-)   | standby  |
| b458ab49-c294-4bdb-91bf-ae375d87ff20 | node-8.domain.tld | True           | :-)   | standby  |
| f3baba98-ef5d-41f8-8c74-a91b7016ba62 | node-6.domain.tld | True           | :-)   | active   |
+--------------------------------------+-------------------+----------------+-------+----------+

root@node-6:~# neutron port-list --device_id=fcf150c0-f690-4265-974d-8db370e345c4
+--------------------------------------+-------------------------------------------------+-------------------+-----------------------------------------------------------------------------------------+
| id                                   | name                                            | mac_address       | fixed_ips                                                                               |
+--------------------------------------+-------------------------------------------------+-------------------+-----------------------------------------------------------------------------------------+
| 0834f8a2-f109-4060-9312-edebac84aba5 |                                                 | fa:16:3e:73:9f:33 | {"subnet_id": "0c7a2cfa-1cfd-4ecc-a196-ab9e97139352", "ip_address": "172.18.161.223"}   |
| 2b5a7a15-98a2-4ff1-9128-67d098fa3439 | HA port tenant aef8d13bad9d42df9f25d8ee54c80ad6 | fa:16:3e:b8:f6:35 | {"subnet_id": "1915ccb8-9d0f-4f1a-9811-9a196d1e495e", "ip_address": "169.254.192.149"}  |
| 48c887c1-acc3-4804-a993-b99060fa2c75 | HA port tenant aef8d13bad9d42df9f25d8ee54c80ad6 | fa:16:3e:e7:70:13 | {"subnet_id": "1915ccb8-9d0f-4f1a-9811-9a196d1e495e", "ip_address": "169.254.192.151"}  |
| 82ab62d6-7dd1-4294-a0dc-f5ebfbcbb4ca |                                                 | fa:16:3e:c6:fc:74 | {"subnet_id": "c4cc21c9-3b3a-407c-b4a7-b22f783377e7", "ip_address": "10.0.40.1"}        |
| bbca8575-51f1-4b42-b074-96e15aeda420 | HA port tenant aef8d13bad9d42df9f25d8ee54c80ad6 | fa:16:3e:84:4c:fc | {"subnet_id": "1915ccb8-9d0f-4f1a-9811-9a196d1e495e", "ip_address": "169.254.192.150"}  |
| bee5c6d4-7e0a-4510-bb19-2ef9d60b9faf | HA port tenant aef8d13bad9d42df9f25d8ee54c80ad6 | fa:16:3e:09:a1:ae | {"subnet_id": "1915ccb8-9d0f-4f1a-9811-9a196d1e495e", "ip_address": "169.254.193.11"}   |
| f8945a1d-b359-4c36-a8f8-e78c1ba992f0 | HA port tenant aef8d13bad9d42df9f25d8ee54c80ad6 | fa:16:3e:c4:54:b5 | {"subnet_id": "1915ccb8-9d0f-4f1a-9811-9a196d1e495e", "ip_address": "169.254.193.12"}   |
+--------------------------------------+-------------------------------------------------+-------------------+-----------------------------------------------------------------------------------------+

mysql root@192.168.0.2:neutron> SELECT * FROM ha_router_agent_port_bindings WHERE router_id='fcf150c0-f690-4265-974d-8db370e345c4';
+---------+-----------+-------------+-------+
| port_id | router_id | l3_agent_id | state |
|---
[Yahoo-eng-team] [Bug 1523780] Re: Race between HA router create and HA router delete
I've gone through all 5 of the initially reported problems. They are all either fixed or referenced by other bugs:
1. DBReferenceError: referenced by https://bugs.launchpad.net/neutron/+bug/1533460 and fixed by https://review.openstack.org/#/c/260303/
2. AttributeError: referenced by https://bugs.launchpad.net/neutron/+bug/1605546
3. DBError: referenced by https://bugs.launchpad.net/neutron/+bug/1533443
4. port["id"]: referenced by https://bugs.launchpad.net/neutron/+bug/1533457
5. concurrency error: fixed by https://review.openstack.org/#/c/254586/
Therefore, this bug can be closed. ** Changed in: neutron Status: In Progress => Invalid ** Changed in: neutron/kilo Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1523780 Title: Race between HA router create and HA router delete Status in neutron: Invalid Status in neutron kilo series: Invalid Bug description: Set more than one API worker and RPC worker, and then run the rally scenario test create_and_delete_routers; you may get errors such as:
1. DBReferenceError: (IntegrityError) (1452, 'Cannot add or update a child row: a foreign key constraint fails (`neutron`.`ha_router_agent_port_bindings`, CONSTRAINT `ha_router_agent_port_bindings_ibfk_2` FOREIGN KEY (`router_id`) REFERENCES `routers` (`id`) ON DELETE CASCADE)') 'INSERT INTO ha_router_agent_port_bindings (port_id, router_id, l3_agent_id, state) VALUES (%s, %s, %s, %s)' ('xxx', 'xxx', None, 'standby') (InvalidRequestError: This Session's transaction has been rolled back by a nested rollback() call. To begin a new transaction, issue Session.rollback() first.)
2. AttributeError: 'NoneType' object has no attribute 'config' (l3 agent processing the router in the router_delete function)
3. DBError: UPDATE statement on table 'ports' expected to update 1 row(s); 0 were matched.
4. res = {"id": port["id"], TypeError: 'NoneType' object is unsubscriptable
5. deleting the HA network while deleting the last router gets the error message: "Unable to complete operation on network . There are one or more ports still in use on the network."
There are a bunch of sub-bugs related to this one, basically different incarnations of race conditions in the interactions between the l3-agent and the neutron-server:
https://bugs.launchpad.net/neutron/+bug/1499647
https://bugs.launchpad.net/neutron/+bug/1533441
https://bugs.launchpad.net/neutron/+bug/1533443
https://bugs.launchpad.net/neutron/+bug/1533457
https://bugs.launchpad.net/neutron/+bug/1533440
https://bugs.launchpad.net/neutron/+bug/1533454
https://bugs.launchpad.net/neutron/+bug/1533455
https://bugs.launchpad.net/neutron/+bug/1533460
(I suggest we use this main bug as a tracker for the whole thing, as reviews already reference this bug as related). To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1523780/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1533441] Re: HA router can not be deleted in L3 agent after race between HA router creating and deleting
I've gone through the 2 errors initially reported:
1. Concurrency issues with HA ports: fixed by https://review.openstack.org/#/c/257059/ (introduction of the ALLOCATING status for routers)
2. AttributeError: already referenced by https://bugs.launchpad.net/neutron/+bug/1605546
So this bug can be closed. ** Changed in: neutron Status: In Progress => Invalid ** Changed in: neutron/kilo Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1533441 Title: HA router can not be deleted in L3 agent after race between HA router creating and deleting Status in neutron: Invalid Status in neutron kilo series: Invalid Bug description: An HA router cannot be deleted in the L3 agent after a race between HA router creation and deletion. Exceptions:
1. Unable to process HA router %s without HA port (HA router initialization)
2. AttributeError: 'NoneType' object has no attribute 'config' (HA router deletion procedure)
With the newest neutron code, I found an infinite loop in _safe_router_removed. Consider an HA router without an HA port that was placed on the l3 agent, usually because of the race condition. Infinite loop steps:
1. an HA router deletion RPC arrives
2. the l3 agent removes the router
3. the RouterInfo deletes its router namespace (self.router_namespace.delete())
4. the HaRouter, ha_router.delete(), raises the AttributeError: 'NoneType' or some other error
5. _safe_router_removed returns False
6. self._resync_router(update)
7. the router namespace does not exist, a RuntimeError is raised, and we go back to 5; steps 5-7 loop forever
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1533441/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
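A hypothetical sketch of breaking the 5-7 loop described above; the helper names are assumed for illustration and are not Neutron's actual API. Once the router's namespace is confirmed gone, the removal is reported as successful instead of being requeued:

    def safe_router_removed(agent, router_id):
        try:
            agent.router_removed(router_id)            # assumed helper
        except Exception:
            if not agent.namespace_exists(router_id):  # assumed helper
                # Everything is already gone; resyncing would loop forever.
                return True
            return False  # genuine failure, let the caller resync
        return True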
[Yahoo-eng-team] [Bug 1533440] Re: Race between deleting last HA router and a new HA router API call
3 of the 4 original issues in the first post are now fixed, and the one that isn't is addressed by a separate bug report:
1. NetworkNotFound: fixed by the introduction of _create_ha_interfaces_and_ensure_network
2. IpAddressGenerationFailure: https://bugs.launchpad.net/neutron/+bug/1562887
3. DBReferenceError: Opened a separate bug, https://bugs.launchpad.net/neutron/+bug/1533460, and fixed by https://review.openstack.org/#/c/260303/
4. HA Network Attribute Error: fixed by the introduction of _create_ha_interfaces_and_ensure_network
I think this bug can be closed. ** Changed in: neutron Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1533440 Title: Race between deleting last HA router and a new HA router API call Status in neutron: Fix Released Bug description: During the delete of a tenant's last HA router, neutron will also delete the HA network, which can be racy if a new HA router API call arrives concurrently. Some known exceptions:
1. NetworkNotFound: (HA network not found when creating the HA router's HA port)
2. IpAddressGenerationFailure: (HA port creation failed due to a concurrent HA subnet deletion)
3. DBReferenceError (IntegrityError): (HA network was deleted by a concurrent operation, e.g. deleting the last HA router)
4. HA Network Attribute Error http://paste.openstack.org/show/490140/
Consider using Rally to do the following steps to reproduce the race exceptions:
1. Create 200+ tenants, each one having 2 or more users
2. Create ONLY 1 router for each tenant
3. Concurrently do the following: (1) one user tries to delete the LAST HA router (2) other users try to create some HA routers
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1533440/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1606827] [NEW] Agents might be reported as down for 10 minutes after all controllers restart
Public bug reported: The scenario which initially revealed this issue involved multiple controllers and an extra compute node (total of 4), but it should also reproduce on deployments smaller than described. The issue is that if an agent tries to report_state to the neutron-server and it fails because of a timeout (raising oslo_messaging.MessagingTimeout), then there is an exponential back-off effect which was put in place by [1]. The feature was intended for heavy RPC calls (like get_routers()) and not for light calls such as report_state, so this can be considered a regression. This can be reproduced by restarting the controllers on a triple-O deployment as specified before. A solution would be to ensure PluginReportStateAPI doesn't use the exponential backoff, instead seeking to always time out after rpc_response_timeout. [1]: https://review.openstack.org/#/c/280595/14/neutron/common/rpc.py ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: In Progress ** Tags: liberty-backport-potential mitaka-backport-potential ** Description changed:

  The scenario which initially revealed this issue involved multiple
  controllers and an extra compute node (total of 4) but it should also
  reproduce on deployments smaller than described.

  The issue is that if an agent tries to report_state to the neutron-
  server and it fails because of a timeout (raising
  oslo_messaging.MessagingTimeout), then there is an exponential back-off
  effect which was put in place by [1]. The feature was intended for
  heavy RPC calls (like get_routers()) and not for light calls such as
- report_state, so this can be considered a regression.
+ report_state, so this can be considered a regression. This can be
+ reproduced by restarting the controllers on a triple-O deployment and
+ specified before.

  A solution would be to ensure PluginReportStateAPI doesn't use the
  exponential backoff, instead seeking to always time out after
  rpc_response_timeout.

  [1]: https://review.openstack.org/#/c/280595/14/neutron/common/rpc.py

** Tags added: mitaka-backport-potential ** Tags added: liberty-backport-potential -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1606827 Title: Agents might be reported as down for 10 minutes after all controllers restart Status in neutron: In Progress Bug description: The scenario which initially revealed this issue involved multiple controllers and an extra compute node (total of 4), but it should also reproduce on deployments smaller than described. The issue is that if an agent tries to report_state to the neutron-server and it fails because of a timeout (raising oslo_messaging.MessagingTimeout), then there is an exponential back-off effect which was put in place by [1]. The feature was intended for heavy RPC calls (like get_routers()) and not for light calls such as report_state, so this can be considered a regression. This can be reproduced by restarting the controllers on a triple-O deployment as specified before. A solution would be to ensure PluginReportStateAPI doesn't use the exponential backoff, instead seeking to always time out after rpc_response_timeout.
[1]: https://review.openstack.org/#/c/280595/14/neutron/common/rpc.py To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1606827/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
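To see why a back-off on report_state translates into agents looking dead for minutes, here is a small self-contained illustration; the base timeout and the cap are assumed values, not Neutron's actual configuration:

    import itertools


    def backoff_timeouts(base=60, cap=600):
        # A doubling back-off of the kind added for heavy RPC calls.
        timeout = base
        while True:
            yield min(timeout, cap)
            timeout *= 2


    print(list(itertools.islice(backoff_timeouts(), 5)))
    # [60, 120, 240, 480, 600]: after a few lost report_state calls the
    # next attempt waits far longer than any reasonable agent_down_time.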
[Yahoo-eng-team] [Bug 1605285] [NEW] StaleDataError on ha_router_agent_port_bindings update
Public bug reported: Stacktrace: http://paste.openstack.org/show/539055/ There are a number of currently open bugs that might deal with this, but they are clouded with information that might not be relevant. I will wade through them in the upcoming days to see if I can find something similar to the stack (though at first glance I didn't). Also, this happened on stable/mitaka. It's interesting to see if this also happens on master. It reproduced while running rally's create_and_delete_routers with high (=10) concurrency. ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1605285 Title: StaleDataError on ha_router_agent_port_bindings update Status in neutron: New Bug description: Stacktrace: http://paste.openstack.org/show/539055/ There are a number of currently open bugs that might deal with this, but they are clouded with information that might not be relevant. I will wade through them in the upcoming days to see if I can find something similar to the stack (though at first glance I didn't). Also, this happened on stable/mitaka. It's interesting to see if this also happens on master. It reproduced while running rally's create_and_delete_routers with high (=10) concurrency. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1605285/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1605282] [NEW] Transaction rolled back while creating HA router
Public bug reported: The stacktrace can be found here: http://paste.openstack.org/show/539052/ This was discovered while running the create_and_delete_router rally test with a high (~10) concurrency number. I encountered this on stable/mitaka so it's interesting to see if this reproduces on master. ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1605282 Title: Transaction rolled back while creating HA router Status in neutron: New Bug description: The stacktrace can be found here: http://paste.openstack.org/show/539052/ This was discovered while running the create_and_delete_router rally test with a high (~10) concurrency number. I encountered this on stable/mitaka so it's interesting to see if this reproduces on master. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1605282/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1560945] [NEW] Unable to create DVR+HA routers
Public bug reported: When creating a new DVR+HA, the router is created (the API returns successfully) but the l3 agent enters an endless loop: 2016-03-23 13:57:37.340 ERROR neutron.agent.l3.agent [-] Failed to process compatible router 'a04b3fd7-d46c-4520-82af-18d16835469d' 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent Traceback (most recent call last): 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/agent.py", line 497, in _process_router_update 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent self._process_router_if_compatible(router) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/agent.py", line 436, in _process_router_if_compatible 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent self._process_updated_router(router) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/agent.py", line 450, in _process_updated_router 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent ri.process(self) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/dvr_edge_ha_router.py", line 92, in process 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent super(DvrEdgeHaRouter, self).process(agent) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/dvr_local_router.py", line 486, in process 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent super(DvrLocalRouter, self).process(agent) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/dvr_router_base.py", line 30, in process 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent super(DvrRouterBase, self).process(agent) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/ha_router.py", line 386, in process 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent super(HaRouter, self).process(agent) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/common/utils.py", line 377, in call 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent self.logger(e) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent self.force_reraise() 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent six.reraise(self.type_, self.value, self.tb) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/common/utils.py", line 374, in call 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent return func(*args, **kwargs) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/router_info.py", line 963, in process 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent self.process_address_scope() 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/dvr_edge_router.py", line 235, in process_address_scope 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent with snat_iptables_manager.defer_apply(): 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent AttributeError: 'NoneType' object has no attribute 'defer_apply' 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent This happens in upstream master. 
** Affects: neutron Importance: Undecided Status: New ** Tags: l3-bgp l3-dvr-backlog -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1560945 Title: Unable to create DVR+HA routers Status in neutron: New Bug description: When creating a new DVR+HA, the router is created (the API returns successfully) but the l3 agent enters an endless loop: 2016-03-23 13:57:37.340 ERROR neutron.agent.l3.agent [-] Failed to process compatible router 'a04b3fd7-d46c-4520-82af-18d16835469d' 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent Traceback (most recent call last): 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/agent.py", line 497, in _process_router_update 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent self._process_router_if_compatible(router) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/agent.py", line 436, in _process_router_if_compatible 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent self._process_updated_router(router) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/agent.py", line 450, in _pro
[Yahoo-eng-team] [Bug 1552680] [NEW] [RFE] Add support for DLM
Public bug reported: Neutron has many code paths that can collide and be raceful with each other. Current ongoing work can mitigate and minimize these races, but the work is slow and it's very hard to fight against what you don't know (i.e. there can always be more races you're not aware of). A DLM (Distributed Lock Mechanism) such as tooz [1] can help mitigate this greatly. An excellent example of this racefulness in Neutron is the L3 auto_schedule_routers functionality. When creating a tenant's first HA router, more resources must also be created (such as a HA network and HA ports). This specific flow of creating the resources can be invoked simultaneously by 2 codepaths: the original create_router (invoked from the REST API) and the L3 agent's get_router_ids/sync_routers. These simultaneous runs can produce many races, such as creating 2 HA networks (where only one should exist), accidentally deleting valid port bindings and more. Instead of hunting down these races (which can be a long and inaccurate task since more races can always exist), this can be solved much more easily by locking the operations done on a single router_id. Using tooz [1] allows for a distributed lock, which crosses all the API/RPC workers on a single server and even crosses multiple neutron-servers. Also, this will help mitigate all sorts of races with different resources (a lock can be associated with a uuid, so it won't matter if the uuid is a router_id or a network_id). [1]: https://github.com/openstack/tooz/tree/master/ ** Affects: neutron Importance: Undecided Status: New ** Tags: rfe -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1552680 Title: [RFE] Add support for DLM Status in neutron: New Bug description: Neutron has many code paths that can collide and be raceful with each other. Current ongoing work can mitigate and minimize these races, but the work is slow and it's very hard to fight against what you don't know (i.e. there can always be more races you're not aware of). A DLM (Distributed Lock Mechanism) such as tooz [1] can help mitigate this greatly. An excellent example of this racefulness in Neutron is the L3 auto_schedule_routers functionality. When creating a tenant's first HA router, more resources must also be created (such as a HA network and HA ports). This specific flow of creating the resources can be invoked simultaneously by 2 codepaths: the original create_router (invoked from the REST API) and the L3 agent's get_router_ids/sync_routers. These simultaneous runs can produce many races, such as creating 2 HA networks (where only one should exist), accidentally deleting valid port bindings and more. Instead of hunting down these races (which can be a long and inaccurate task since more races can always exist), this can be solved much more easily by locking the operations done on a single router_id. Using tooz [1] allows for a distributed lock, which crosses all the API/RPC workers on a single server and even crosses multiple neutron-servers. Also, this will help mitigate all sorts of races with different resources (a lock can be associated with a uuid, so it won't matter if the uuid is a router_id or a network_id). [1]: https://github.com/openstack/tooz/tree/master/ To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1552680/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
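A minimal tooz sketch, assuming an etcd3 backend URL and an illustrative lock-naming scheme (neither is prescribed by the report): all operations touching a single router are serialized across API/RPC workers and across neutron-server instances:

    from tooz import coordination

    coordinator = coordination.get_coordinator(
        "etcd3://127.0.0.1:2379", b"neutron-server-1")
    coordinator.start()


    def with_router_lock(router_id, func, *args, **kwargs):
        # One lock per router uuid; any uuid-keyed resource works the same.
        lock = coordinator.get_lock(("router-" + router_id).encode())
        with lock:
            # Only one worker cluster-wide runs e.g. create_router vs. the
            # agent-triggered auto_schedule_routers for this router_id.
            return func(*args, **kwargs)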
[Yahoo-eng-team] [Bug 1550886] [NEW] L3 Agent's fullsync is raceful with creation of HA router
Public bug reported: When creating an HA router, after the server creates all the DB objects (including the HA network and ports if it's the first one), the server continues on to schedule the router to (some of) the available agents. The race is achieved when an L3 agent issues a sync_router request, which later down the line ends up in an auto_schedule_routers() call. If this happens before the above scheduling (from the create_router()) is complete, the server will refuse to schedule the router to the other intended L3 agents, resulting in fewer agents being scheduled. The only way to fix this is either restarting one of the L3 agents which didn't get scheduled, or recreating the router. Either is a bad option. ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: In Progress ** Tags: l3-ha ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1550886 Title: L3 Agent's fullsync is raceful with creation of HA router Status in neutron: In Progress Bug description: When creating an HA router, after the server creates all the DB objects (including the HA network and ports if it's the first one), the server continues on to schedule the router to (some of) the available agents. The race is achieved when an L3 agent issues a sync_router request, which later down the line ends up in an auto_schedule_routers() call. If this happens before the above scheduling (from the create_router()) is complete, the server will refuse to schedule the router to the other intended L3 agents, resulting in fewer agents being scheduled. The only way to fix this is either restarting one of the L3 agents which didn't get scheduled, or recreating the router. Either is a bad option. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1550886/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1546490] [NEW] Security groups don't work with fullstack
Public bug reported: Iptables doesn't work properly with fullstack, as can be observed in [1]. The gist is that since all ovs-agents are running in the same namespace, they try to overwrite each other's iptables rules, causing the failures. This will obviously cause security groups to fail. Also, Assaf Muller mentioned that since FakeMachines are directly connected to br-int, security groups will also not work properly on them. Instead, they should be connected through an intermediary linuxbridge. [1]: http://logs.openstack.org/71/270971/3/check/gate-neutron-dsvm-fullstack/c913b51/logs/TestConnectivitySameNetwork.test_connectivity_VLANs,Ofctl_/neutron-openvswitch-agent--2016-02-14--11-40-19-078390.log.txt.gz#_2016-02-14_11_41_03_165 ** Affects: neutron Importance: Undecided Status: Confirmed ** Tags: fullstack -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1546490 Title: Security groups don't work with fullstack Status in neutron: Confirmed Bug description: Iptables doesn't work properly with fullstack, as can be observed in [1]. The gist is that since all ovs-agents are running in the same namespace, they try to overwrite each other's iptables rules, causing the failures. This will obviously cause security groups to fail. Also, Assaf Muller mentioned that since FakeMachines are directly connected to br-int, security groups will also not work properly on them. Instead, they should be connected through an intermediary linuxbridge. [1]: http://logs.openstack.org/71/270971/3/check/gate-neutron-dsvm-fullstack/c913b51/logs/TestConnectivitySameNetwork.test_connectivity_VLANs,Ofctl_/neutron-openvswitch-agent--2016-02-14--11-40-19-078390.log.txt.gz#_2016-02-14_11_41_03_165 To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1546490/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1523845] [NEW] Pip package 'ovs' needed but not present in requirements.txt
Public bug reported: As the title mentions, the 'ovs' pip package is needed for [1], but is not present in the requirements.txt [2] and it should be changed to reflect this dependency. [1]: https://github.com/openstack/neutron/blob/7a5ebc171f9ff342d7526808b1063b58cc631fec/neutron/agent/ovsdb/impl_idl.py#L21 [2]: https://github.com/openstack/neutron/blob/7a5ebc171f9ff342d7526808b1063b58cc631fec/requirements.txt ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: In Progress ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1523845 Title: Pip package 'ovs' needed but not present in requirements.txt Status in neutron: In Progress Bug description: As the title mentions, the 'ovs' pip package is needed for [1], but is not present in the requirements.txt [2] and it should be changed to reflect this dependency. [1]: https://github.com/openstack/neutron/blob/7a5ebc171f9ff342d7526808b1063b58cc631fec/neutron/agent/ovsdb/impl_idl.py#L21 [2]: https://github.com/openstack/neutron/blob/7a5ebc171f9ff342d7526808b1063b58cc631fec/requirements.txt To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1523845/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1520271] [NEW] L3's metadata fail to run on Python 2.7.5 when metadata_proxy_watch_log=True
Public bug reported: Following [1], Neutron depends on WatchedFileHandler having a 'delay' property. This attribute is not defined in Python's API [2] but Neutron depends on it anyway. In Python 2.7.6 and later versions (like the one running at the gate), this attribute exists, but in 2.7.5 and below it does not, causing metadata to not run. [1]: https://review.openstack.org/#/c/161494/18 [2]: https://docs.python.org/2/library/logging.handlers.html#watchedfilehandler ** Affects: neutron Importance: Undecided Status: New ** Summary changed: - L3's metadata functional tests fail on Python 2.7.5 + L3's metadata fail to run on Python 2.7.5 when metadata_proxy_watch_log=True -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1520271 Title: L3's metadata fail to run on Python 2.7.5 when metadata_proxy_watch_log=True Status in neutron: New Bug description: Following [1], Neutron depends on WatchedFileHandler having a 'delay' property. This attribute is not defined in Python's API [2] but Neutron depends on it anyway. In Python 2.7.6 and later versions (like the one running at the gate), this attribute exists, but in 2.7.5 and below it does not, causing metadata to not run. [1]: https://review.openstack.org/#/c/161494/18 [2]: https://docs.python.org/2/library/logging.handlers.html#watchedfilehandler To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1520271/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
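A defensive illustration (the log path is hypothetical and this is not the Neutron patch): reading the attribute through getattr() with a default keeps the code working on 2.7.5, where the handler may simply not store 'delay' on the instance:

    import logging.handlers

    handler = logging.handlers.WatchedFileHandler("/tmp/metadata-proxy.log")
    # On Python <= 2.7.5 the instance may lack 'delay' entirely.
    delay = getattr(handler, "delay", False)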
[Yahoo-eng-team] [Bug 1506503] [NEW] OVS agents periodically fail to start in fullstack
Public bug reported: Changeset [1] introduced a validation that the local_ip specified for tunneling is actually used by one of the devices on the machine running an OVS agent. In Fullstack, multiple tests may run concurrently, which can cause a race condition: suppose an ovs agent starts running as part of test A. It retrieves the list of all devices on the host and starts a sequential loop over them. In the meantime, some *other* fullstack test (test B) completes and deletes the devices it created. The agent still has the deleted device in its list, and when it reaches that device it finds that it no longer exists and crashes. [1]: https://review.openstack.org/#/c/154043/ ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: In Progress ** Tags: fullstack ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1506503 Title: OVS agents periodically fail to start in fullstack Status in neutron: In Progress Bug description: Changeset [1] introduced a validation that the local_ip specified for tunneling is actually used by one of the devices on the machine running an OVS agent. In Fullstack, multiple tests may run concurrently, which can cause a race condition: suppose an ovs agent starts running as part of test A. It retrieves the list of all devices on the host and starts a sequential loop over them. In the meantime, some *other* fullstack test (test B) completes and deletes the devices it created. The agent still has the deleted device in its list, and when it reaches that device it finds that it no longer exists and crashes. [1]: https://review.openstack.org/#/c/154043/ To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1506503/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
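A hypothetical sketch of a tolerant scan; DeviceNotFound and the addresses() helper are assumptions made for illustration. A device deleted mid-loop by a concurrent test is skipped instead of crashing the agent:

    class DeviceNotFound(Exception):
        """Assumed error type for a device deleted under our feet."""


    def device_has_ip(device, local_ip):
        try:
            return local_ip in device.addresses()  # assumed helper
        except DeviceNotFound:
            return False  # deleted mid-scan by another test: skip it


    def local_ip_is_valid(devices, local_ip):
        return any(device_has_ip(d, local_ip) for d in devices)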
[Yahoo-eng-team] [Bug 1506021] [NEW] AsyncProcess.stop() can lead to deadlock
Public bug reported: The bug occurs when calling stop() on an AsyncProcess instance which is running a process that generates substantial amounts of output to stdout/stderr and that has a signal handler for some signal (SIGTERM for example) that causes the program to exit gracefully. Linux Pipes 101: when calling write() on some one-way pipe, if the pipe is full of data [1], write() will block until the other end read()s from the pipe. AsyncProcess is using eventlet.green.subprocess to create an eventlet-safe subprocess, using stdout=subprocess.PIPE and stderr=subprocess.PIPE. In other words, stdout and stderr are redirected through a one-way linux pipe to the executing AsyncProcess. When stopping the subprocess, the current code [2] first kills the readers used to empty stdout/stderr and only then sends the signal. It is clear that if SIGTERM is sent to the subprocess, and if the subprocess is generating a lot of output to stdout/stderr AFTER the readers were killed, a deadlock is achieved: the parent process is blocking on wait() and the subprocess is blocking on write() (waiting for someone to read and empty the pipe). This can be avoided by sending SIGKILL to the AsyncProcesses (this is the code's default), but other signals such as SIGTERM, that can be handled by the userspace code to cause the process to exit gracefully, might trigger this deadlock. For example, I ran into this while trying to modify existing fullstack tests to SIGTERM processes instead of SIGKILL them, and the ovs agent got deadlocked a lot. [1]: http://linux.die.net/man/7/pipe (Section called "Pipe capacity") [2]: https://github.com/openstack/neutron/blob/stable/liberty/neutron/agent/linux/async_process.py#L163 ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: New ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1506021 Title: AsyncProcess.stop() can lead to deadlock Status in neutron: New Bug description: The bug occurs when calling stop() on an AsyncProcess instance which is running a process that generates substantial amounts of output to stdout/stderr and that has a signal handler for some signal (SIGTERM for example) that causes the program to exit gracefully. Linux Pipes 101: when calling write() on some one-way pipe, if the pipe is full of data [1], write() will block until the other end read()s from the pipe. AsyncProcess is using eventlet.green.subprocess to create an eventlet-safe subprocess, using stdout=subprocess.PIPE and stderr=subprocess.PIPE. In other words, stdout and stderr are redirected through a one-way linux pipe to the executing AsyncProcess. When stopping the subprocess, the current code [2] first kills the readers used to empty stdout/stderr and only then sends the signal. It is clear that if SIGTERM is sent to the subprocess, and if the subprocess is generating a lot of output to stdout/stderr AFTER the readers were killed, a deadlock is achieved: the parent process is blocking on wait() and the subprocess is blocking on write() (waiting for someone to read and empty the pipe). This can be avoided by sending SIGKILL to the AsyncProcesses (this is the code's default), but other signals such as SIGTERM, that can be handled by the userspace code to cause the process to exit gracefully, might trigger this deadlock.
For example, I ran into this while trying to modify existing fullstack tests to SIGTERM processes instead of SIGKILL them, and the ovs agent got deadlocked a lot. [1]: http://linux.die.net/man/7/pipe (Section called "Pipe capacity") [2]: https://github.com/openstack/neutron/blob/stable/liberty/neutron/agent/linux/async_process.py#L163 To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1506021/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
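The deadlock class is easy to demonstrate with the standard library alone (the chatty child command is arbitrary): wait() with live PIPEs can block once the pipe buffer fills, while communicate() keeps draining stdout/stderr until the child exits:

    import subprocess

    proc = subprocess.Popen(["yes", "a-very-chatty-process"],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    proc.terminate()  # SIGTERM; the child may still be flushing output

    # communicate() reads both pipes to EOF and then reaps the child, so
    # it cannot deadlock on a full pipe buffer.
    out, err = proc.communicate()

    # By contrast, calling proc.wait() here with no readers attached could
    # block forever if the child kept writing after the readers were gone.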
[Yahoo-eng-team] [Bug 1505203] [NEW] Setting admin_state_up=False on an HA router with gateway raises an exception
Public bug reported: Steps to reproduce: 1. Create an HA router, 2. Connect the router to a gateway, 3. neutron router-update --admin-state-down=False This results in the following traceback on the l3 agent: 2015-10-12 14:43:44.755 ERROR neutron.agent.l3.router_info [-] Command: ['ip', 'netns', 'exec', u'qrouter-0ce494ff-593a-4d6d-bf06-248979d6cf7a', 'ip', '-4', 'addr', 'del', '172.24.4.11/24', 'dev', u'qg-4f6a7587-00'] Exit code: 2 Stdin: Stdout: Stderr: RTNETLINK answers: Cannot assign requested address 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info Traceback (most recent call last): 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/common/utils.py", line 356, in call 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info return func(*args, **kwargs) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/l3/router_info.py", line 695, in process 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info self.process_external(agent) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/l3/router_info.py", line 661, in process_external 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info self._process_external_gateway(ex_gw_port, agent.pd) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/l3/router_info.py", line 575, in _process_external_gateway 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info self.external_gateway_removed(self.ex_gw_port, interface_name) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/l3/ha_router.py", line 368, in external_gateway_removed 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info interface_name) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/l3/router_info.py", line 550, in external_gateway_removed 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info ip_addr['prefixlen'])) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/l3/router_info.py", line 201, in remove_external_gateway_ip 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info device.delete_addr_and_conntrack_state(ip_cidr) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/linux/ip_lib.py", line 255, in delete_addr_and_conntrack_state 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info self.addr.delete(cidr) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/linux/ip_lib.py", line 511, in delete 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info 'dev', self.name)) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/linux/ip_lib.py", line 295, in _as_root 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info use_root_namespace=use_root_namespace) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/linux/ip_lib.py", line 80, in _as_root 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info log_fail_as_error=self.log_fail_as_error) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/linux/ip_lib.py", line 89, in _execute 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info log_fail_as_error=log_fail_as_error) 2015-10-12 14:43:44.755 
TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/linux/utils.py", line 157, in execute 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info raise RuntimeError(m) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info RuntimeError: 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info Command: ['ip', 'netns', 'exec', u'qrouter-0ce494ff-593a-4d6d-bf06-248979d6cf7a', 'ip', '-4', 'addr', 'del', '172.24.4.11/24', 'dev', u'qg-4f6a7587-00'] 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info Exit code: 2 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info Stdin: 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info Stdout: 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info Stderr: RTNETLINK answers: Cannot assign requested address 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info 2015-10-12 14:43:44.755 ERROR neutron.agent.l3.agent [-] Error while deleting router 0ce494ff-593a-4d6d-bf06-248979d6cf7a ** Affects: neutron Importance: Undecided Status: New ** Tags: l3-ha -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1505203 Title: Setting adm
[Yahoo-eng-team] [Bug 1493788] Re: DVR: Restarting the OVS agent does not re-create some of br-tun's flows
*** This bug is a duplicate of bug 1489372 *** https://bugs.launchpad.net/bugs/1489372 @Arthur, you are correct. I've used 'git bisect' and found out that [1] already fixes this issue. I will close this bug as a duplicate. [1]: https://review.openstack.org/#/c/218118/ ** Changed in: neutron Status: New => Fix Released ** This bug has been marked a duplicate of bug 1489372 OVS agent restart breaks connectivity when l2pop is turned on -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1493788 Title: DVR: Restarting the OVS agent does not re-create some of br-tun's flows Status in neutron: Fix Released Bug description: When, on a setup that has a DVR router, an OVS agent restarts, it fails to re-create some of the flows for br-tun. For example: $ # flows before agent restart $ sudo ovs-ofctl dump-flows br-tun NXST_FLOW reply (xid=0x4): cookie=0x0, duration=77.325s, table=0, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop cookie=0xa30fd64e48832cbc, duration=77.292s, table=0, n_packets=0, n_bytes=0, idle_age=190, priority=1,in_port=1 actions=resubmit(,1) cookie=0xa30fd64e48832cbc, duration=77.281s, table=1, n_packets=0, n_bytes=0, idle_age=190, priority=0 actions=resubmit(,2) cookie=0x0, duration=77.324s, table=2, n_packets=0, n_bytes=0, idle_age=77, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20) cookie=0x0, duration=77.324s, table=2, n_packets=0, n_bytes=0, idle_age=77, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22) cookie=0x0, duration=77.324s, table=3, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop cookie=0x0, duration=77.324s, table=4, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop cookie=0xa30fd64e48832cbc, duration=77.018s, table=4, n_packets=0, n_bytes=0, idle_age=193, priority=1,tun_id=0x437 actions=mod_vlan_vid:1,resubmit(,9) cookie=0x0, duration=77.323s, table=6, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop cookie=0xa30fd64e48832cbc, duration=77.286s, table=9, n_packets=0, n_bytes=0, idle_age=190, priority=0 actions=resubmit(,10) cookie=0xa30fd64e48832cbc, duration=77.259s, table=9, n_packets=0, n_bytes=0, idle_age=65534, priority=1,dl_src=fa:16:3f:55:8c:22 actions=output:1 cookie=0x0, duration=77.323s, table=10, n_packets=0, n_bytes=0, idle_age=77, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,cookie=0xa30fd64e48832cbc,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1 cookie=0x0, duration=77.323s, table=20, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=resubmit(,22) cookie=0xa30fd64e48832cbc, duration=77.317s, table=22, n_packets=0, n_bytes=0, idle_age=193, priority=0 actions=drop $ # flows after agent restart $ sudo ovs-ofctl dump-flows br-tun NXST_FLOW reply (xid=0x4): cookie=0xbcfcd1c3b35d83e3, duration=3.072s, table=0, n_packets=0, n_bytes=0, idle_age=223, priority=1,in_port=1 actions=resubmit(,1) cookie=0xbcfcd1c3b35d83e3, duration=3.060s, table=1, n_packets=0, n_bytes=0, idle_age=223, priority=0 actions=resubmit(,2) cookie=0xbcfcd1c3b35d83e3, duration=2.997s, table=4, n_packets=0, n_bytes=0, idle_age=226, priority=1,tun_id=0x437 actions=mod_vlan_vid:1,resubmit(,9) cookie=0xbcfcd1c3b35d83e3, duration=3.067s, table=9, n_packets=0, n_bytes=0, idle_age=223, priority=0 actions=resubmit(,10) cookie=0xbcfcd1c3b35d83e3, duration=3.038s, 
table=9, n_packets=0, n_bytes=0, idle_age=65534, priority=1,dl_src=fa:16:3f:55:8c:22 actions=output:1 cookie=0xbcfcd1c3b35d83e3, duration=3.100s, table=22, n_packets=0, n_bytes=0, idle_age=226, priority=0 actions=drop It is clear that quite a few flows are missing. They can be re-created by deleting all the flows on br-int - this starts a chain reaction which ultimately recreates all the flows, including br-tun's. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1493788/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1493788] [NEW] DVR: Restarting the OVS agent does not re-create some of br-tun's flows
Public bug reported: When, on a setup that has a DVR router, an OVS agent restarts, it fails to re-create some of the flows for br-tun. For example: $ # flows before agent restart $ sudo ovs-ofctl dump-flows br-tun NXST_FLOW reply (xid=0x4): cookie=0x0, duration=77.325s, table=0, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop cookie=0xa30fd64e48832cbc, duration=77.292s, table=0, n_packets=0, n_bytes=0, idle_age=190, priority=1,in_port=1 actions=resubmit(,1) cookie=0xa30fd64e48832cbc, duration=77.281s, table=1, n_packets=0, n_bytes=0, idle_age=190, priority=0 actions=resubmit(,2) cookie=0x0, duration=77.324s, table=2, n_packets=0, n_bytes=0, idle_age=77, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20) cookie=0x0, duration=77.324s, table=2, n_packets=0, n_bytes=0, idle_age=77, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22) cookie=0x0, duration=77.324s, table=3, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop cookie=0x0, duration=77.324s, table=4, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop cookie=0xa30fd64e48832cbc, duration=77.018s, table=4, n_packets=0, n_bytes=0, idle_age=193, priority=1,tun_id=0x437 actions=mod_vlan_vid:1,resubmit(,9) cookie=0x0, duration=77.323s, table=6, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop cookie=0xa30fd64e48832cbc, duration=77.286s, table=9, n_packets=0, n_bytes=0, idle_age=190, priority=0 actions=resubmit(,10) cookie=0xa30fd64e48832cbc, duration=77.259s, table=9, n_packets=0, n_bytes=0, idle_age=65534, priority=1,dl_src=fa:16:3f:55:8c:22 actions=output:1 cookie=0x0, duration=77.323s, table=10, n_packets=0, n_bytes=0, idle_age=77, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,cookie=0xa30fd64e48832cbc,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1 cookie=0x0, duration=77.323s, table=20, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=resubmit(,22) cookie=0xa30fd64e48832cbc, duration=77.317s, table=22, n_packets=0, n_bytes=0, idle_age=193, priority=0 actions=drop $ # flows after agent restart $ sudo ovs-ofctl dump-flows br-tun NXST_FLOW reply (xid=0x4): cookie=0xbcfcd1c3b35d83e3, duration=3.072s, table=0, n_packets=0, n_bytes=0, idle_age=223, priority=1,in_port=1 actions=resubmit(,1) cookie=0xbcfcd1c3b35d83e3, duration=3.060s, table=1, n_packets=0, n_bytes=0, idle_age=223, priority=0 actions=resubmit(,2) cookie=0xbcfcd1c3b35d83e3, duration=2.997s, table=4, n_packets=0, n_bytes=0, idle_age=226, priority=1,tun_id=0x437 actions=mod_vlan_vid:1,resubmit(,9) cookie=0xbcfcd1c3b35d83e3, duration=3.067s, table=9, n_packets=0, n_bytes=0, idle_age=223, priority=0 actions=resubmit(,10) cookie=0xbcfcd1c3b35d83e3, duration=3.038s, table=9, n_packets=0, n_bytes=0, idle_age=65534, priority=1,dl_src=fa:16:3f:55:8c:22 actions=output:1 cookie=0xbcfcd1c3b35d83e3, duration=3.100s, table=22, n_packets=0, n_bytes=0, idle_age=226, priority=0 actions=drop It is clear that quite a few flows are missing. They can be re-created by deleting all the flows on br-int - this starts a chain reaction which ultimately recreates all the flows, including br-tun's. ** Affects: neutron Importance: Undecided Status: New ** Tags: l3-dvr-backlog -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. 
https://bugs.launchpad.net/bugs/1493788 Title: DVR: Restarting the OVS agent does not re-create some of br-tun's flows Status in neutron: New Bug description: When, on a setup that has a DVR router, an OVS agent restarts, it fails to re-create some of the flows for br-tun. For example:

$ # flows before agent restart
$ sudo ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=77.325s, table=0, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop
 cookie=0xa30fd64e48832cbc, duration=77.292s, table=0, n_packets=0, n_bytes=0, idle_age=190, priority=1,in_port=1 actions=resubmit(,1)
 cookie=0xa30fd64e48832cbc, duration=77.281s, table=1, n_packets=0, n_bytes=0, idle_age=190, priority=0 actions=resubmit(,2)
 cookie=0x0, duration=77.324s, table=2, n_packets=0, n_bytes=0, idle_age=77, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
 cookie=0x0, duration=77.324s, table=2, n_packets=0, n_bytes=0, idle_age=77, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22)
 cookie=0x0, duration=77.324s, table=3, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop
 cookie=0x0, duration=77.324s, table=4, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop
 cookie=0xa30fd64e48832cbc, duration=77.018s, table=4, n_packets=0, n_bytes=0, idle_age=193, priority=1,tun_id=0x437 actions=mod_vlan_vid:1,resubmit(,9)
 cookie=0x0, duration=77.324s, table=6, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop
 cookie=0xa30fd64e48832cbc, duration=77.286s, table=9, n_packets=0, n_bytes=0, idle_age=190, priority=0 actions=resubmit(,10)
 cookie=0xa30fd64e48832cbc, duration=77.259s, table=9, n_packets=0, n_bytes=0, idle_age=65534, priority=1,dl_src=fa:16:3f:55:8c:22 actions=output:1
 cookie=0x0, duration=77.323s, table=10, n_packets=0, n_bytes=0, idle_age=77, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,cookie=0xa30fd64e48832cbc,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1
 cookie=0x0, duration=77.323s, table=20, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=resubmit(,22)
 cookie=0xa30fd64e48832cbc, duration=77.317s, table=22, n_packets=0, n_bytes=0, idle_age=193, priority=0 actions=drop

$ # flows after agent restart
$ sudo ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
 cookie=0xbcfcd1c3b35d83e3, duration=3.072s, table=0, n_packets=0, n_bytes=0, idle_age=223, priority=1,in_port=1 actions=resubmit(,1)
 cookie=0xbcfcd1c3b35d83e3, duration=3.060s, table=1, n_packets=0, n_bytes=0, idle_age=223, priority=0 actions=resubmit(,2)
 cookie=0xbcfcd1c3b35d83e3, duration=2.997s, table=4, n_packets=0, n_bytes=0, idle_age=226, priority=1,tun_id=0x437 actions=mod_vlan_vid:1,resubmit(,9)
 cookie=0xbcfcd1c3b35d83e3, duration=3.067s, table=9, n_packets=0, n_bytes=0, idle_age=223, priority=0 actions=resubmit(,10)
 cookie=0xbcfcd1c3b35d83e3, duration=3.038s, table=9, n_packets=0, n_bytes=0, idle_age=65534, priority=1,dl_src=fa:16:3f:55:8c:22 actions=output:1
 cookie=0xbcfcd1c3b35d83e3, duration=3.100s, table=22, n_packets=0, n_bytes=0, idle_age=226, priority=0 actions=drop

It is clear that quite a few flows are missing. They can be re-created by deleting all the flows on br-int - this starts a chain reaction which ultimately recreates all the flows, including br-tun's. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1493788/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
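For reference, below is a minimal sketch of the kind of default br-tun flows the agent is expected to restore on restart, written against an add_flow-style bridge wrapper. The function name, the wrapper API and the table comments are illustrative (derived from the dump above), not the agent's actual code:

    def restore_default_tun_flows(br_tun, patch_int_ofport):
        # Unmatched traffic is dropped by default.
        br_tun.add_flow(table=0, priority=0, actions="drop")
        # Traffic arriving from br-int enters the pipeline at table 1.
        br_tun.add_flow(table=0, priority=1, in_port=patch_int_ofport,
                        actions="resubmit(,1)")
        # Table 2 splits unknown unicast (table 20) from
        # broadcast/multicast (table 22) based on the multicast bit.
        br_tun.add_flow(table=2, priority=0,
                        dl_dst="00:00:00:00:00:00/01:00:00:00:00:00",
                        actions="resubmit(,20)")
        br_tun.add_flow(table=2, priority=0,
                        dl_dst="01:00:00:00:00:00/01:00:00:00:00:00",
                        actions="resubmit(,22)")
        # Unknown unicast falls through to the flood table.
        br_tun.add_flow(table=20, priority=0, actions="resubmit(,22)")

All of these appear in the "before" dump and are absent from the "after" dump.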
[Yahoo-eng-team] [Bug 1488996] [NEW] QoS doesn't work when l2pop is enabled
Public bug reported: My ml2 configuration file contains the following:

[ml2]
extension_drivers = port_security,qos
mechanism_drivers = openvswitch,l2population

However, when trying to get a list of available rule types, neutron-server writes this to the log file:

WARNING neutron.plugins.ml2.managers [req-19db3de7-1a1a-42b5-b4c0-b9f146a6bcac admin b44ee578c44a426e81752b4df76c1a89] l2population does not support QoS; no rule types available

It seems to me that this should not be the case, as l2pop has nothing to do with QoS. Other mechanism drivers probably produce the same error. ** Affects: neutron Importance: Undecided Status: New ** Tags: l2-pop qos -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1488996 Title: QoS doesn't work when l2pop is enabled Status in neutron: New Bug description: My ml2 configuration file contains the following:

[ml2]
extension_drivers = port_security,qos
mechanism_drivers = openvswitch,l2population

However, when trying to get a list of available rule types, neutron-server writes this to the log file:

WARNING neutron.plugins.ml2.managers [req-19db3de7-1a1a-42b5-b4c0-b9f146a6bcac admin b44ee578c44a426e81752b4df76c1a89] l2population does not support QoS; no rule types available

It seems to me that this should not be the case, as l2pop has nothing to do with QoS. Other mechanism drivers probably produce the same error. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1488996/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
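For context, here is a minimal sketch of the kind of aggregation that produces this warning; the names are illustrative, not the exact ml2.managers code. The point is that a single driver declaring no QoS support currently empties the whole result:

    import logging

    LOG = logging.getLogger(__name__)

    def supported_qos_rule_types(mech_drivers, all_rule_types):
        """Intersect the QoS rule types supported by every mechanism driver."""
        rule_types = set(all_rule_types)
        for driver in mech_drivers:
            supported = getattr(driver, 'supported_qos_rule_types', None)
            if supported is None:
                # A driver that declares no support - such as l2population,
                # which never binds ports - currently wipes out the result.
                LOG.warning("%s does not support QoS; no rule types available",
                            driver.name)
                return []
            rule_types &= set(supported)
        return sorted(rule_types)

A driver that does not bind ports arguably has no business vetoing QoS and should be skipped in the intersection instead.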
[Yahoo-eng-team] [Bug 1487053] [NEW] validate_local_ip shouldn't run if no tunneling is enabled
Public bug reported: If no tunnel_types are specified in the ml2 configuration, the local_ip configuration option is ignored by the code. However, validate_local_ip always checks whether local_ip is configured on an actual interface, even though it shouldn't when tunnel_types is empty. ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1487053 Title: validate_local_ip shouldn't run if no tunneling is enabled Status in neutron: In Progress Bug description: If no tunnel_types are specified in the ml2 configuration, the local_ip configuration option is ignored by the code. However, validate_local_ip always checks whether local_ip is configured on an actual interface, even though it shouldn't when tunnel_types is empty. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1487053/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
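A minimal sketch of the proposed guard follows; the function shape and the ip_lib helper call are assumptions for illustration, not necessarily the agent's exact code:

    from neutron.agent.linux import ip_lib

    def validate_local_ip(local_ip, tunnel_types):
        # Without tunneling enabled, local_ip is never used, so there is
        # nothing to validate.
        if not tunnel_types:
            return
        if not ip_lib.IPWrapper().get_device_by_ip(local_ip):
            raise SystemExit("local_ip %s is not bound to any interface "
                             "on this machine" % local_ip)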
[Yahoo-eng-team] [Bug 1486627] [NEW] DVR doesn't always schedule SNAT routers
Public bug reported: Creating a new router, attaching it to some tenant network and then adding a gateway for the router doesn't create the snat resources (such as 'snat-%s' namespace and other interfaces). Adding a gateway first (before attaching the router to a tenant network) creates the snat resources correctly. ** Affects: neutron Importance: High Assignee: Oleg Bondarev (obondarev) Status: Confirmed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1486627 Title: DVR doesn't always schedule SNAT routers Status in neutron: Confirmed Bug description: Creating a new router, attaching it to some tenant network and then adding a gateway for the router doesn't create the snat resources (such as 'snat-%s' namespace and other interfaces). Adding a gateway first (before attaching the router to a tenant network) creates the snat resources correctly. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1486627/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
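For clarity, the two orderings, using the neutron CLI of the time (IDs elided):

    $ # fails to create the snat-* resources:
    $ neutron router-create r1
    $ neutron router-interface-add r1 <subnet-id>
    $ neutron router-gateway-set r1 <external-net-id>

    $ # creates the snat-* resources correctly:
    $ neutron router-create r2
    $ neutron router-gateway-set r2 <external-net-id>
    $ neutron router-interface-add r2 <subnet-id>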
[Yahoo-eng-team] [Bug 1453888] [NEW] Fullstack doesn't clean resources if environment fails to start
Public bug reported: As the title says, if fullstack_fixtures.EnvironmentFixture fails to start because 'wait_until_env_is_up' did not return successfully (for example, because there was a problem with one of the agents), cleanUp isn't called. As a result, none of the resources belonging to the fixtures used in the environment (processes, configurations, namespaces...) are cleaned up. ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: New ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1453888 Title: Fullstack doesn't clean resources if environment fails to start Status in OpenStack Neutron (virtual network service): New Bug description: As the title says, if fullstack_fixtures.EnvironmentFixture fails to start because 'wait_until_env_is_up' did not return successfully (for example, because there was a problem with one of the agents), cleanUp isn't called. As a result, none of the resources belonging to the fixtures used in the environment (processes, configurations, namespaces...) are cleaned up. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1453888/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
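A minimal sketch of the usual remedy with the fixtures library (class and attribute names here are illustrative): anything registered via useFixture() or addCleanup() inside _setUp() is torn down even when a later step of _setUp() raises, which is exactly the property the environment fixture is missing:

    import fixtures

    class EnvironmentFixture(fixtures.Fixture):
        def __init__(self, hosts):
            super(EnvironmentFixture, self).__init__()
            self.hosts = hosts

        def _setUp(self):
            for host in self.hosts:
                # Sub-fixtures registered here are cleaned up automatically,
                # even if _setUp() fails further down.
                self.useFixture(host)
            self.wait_until_env_is_up()  # may raise; cleanups still run

        def wait_until_env_is_up(self):
            # Elided: poll until all agents report in, raising on timeout.
            pass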
[Yahoo-eng-team] [Bug 1446284] [NEW] functional tests fail non-deterministically because of full-stack
Public bug reported: On startup, the L3 agent looks for namespaces that do not belong to it and cleans them up, in order to minimize system resources (namespaces) on the machine. The fullstack tests run an L3 agent which then deletes namespaces it does not recognize. This in turn causes the deletion of namespaces used by the functional tests, leading to non-deterministic failures at the gate. The code responsible for the deletion of namespaces: https://github.com/openstack/neutron/blob/master/neutron/agent/l3/namespace_manager.py#L73 How to replicate:

1. Run 'tox -e dsvm-functional -- neutron.tests.functional.agent.test_l3_agent neutron.tests.fullstack'
2. Some tests are likely to fail
3. ???
4. Profit?

Example of test runs: 1. http://pastebin.com/63n7Y2YK ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1446284 Title: functional tests fail non-deterministically because of full-stack Status in OpenStack Neutron (virtual network service): New Bug description: On startup, the L3 agent looks for namespaces that do not belong to it and cleans them up, in order to minimize system resources (namespaces) on the machine. The fullstack tests run an L3 agent which then deletes namespaces it does not recognize. This in turn causes the deletion of namespaces used by the functional tests, leading to non-deterministic failures at the gate. The code responsible for the deletion of namespaces: https://github.com/openstack/neutron/blob/master/neutron/agent/l3/namespace_manager.py#L73 How to replicate:

1. Run 'tox -e dsvm-functional -- neutron.tests.functional.agent.test_l3_agent neutron.tests.fullstack'
2. Some tests are likely to fail
3. ???
4. Profit?

Example of test runs: 1. http://pastebin.com/63n7Y2YK To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1446284/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
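A sketch of the ownership check at play (prefixes and names illustrative): the namespace manager treats any agent-prefixed namespace it does not recognize as stale, which is exactly what bites when two test suites share a host:

    NS_PREFIXES = ('qrouter-', 'snat-', 'fip-')

    def is_stale(ns_name, known_router_ids):
        """True if ns_name looks agent-owned but matches no known router."""
        if not ns_name.startswith(NS_PREFIXES):
            return False
        ns_id = ns_name.split('-', 1)[1]
        # A functional test's qrouter-<uuid> namespace lands here even
        # though this (fullstack) agent never created it - and gets deleted.
        return ns_id not in known_router_ids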
[Yahoo-eng-team] [Bug 1405584] [NEW] misc-sanity-checks.sh doesn't work on OS X
Public bug reported: The patch introduced by https://review.openstack.org/#/c/143539/ changed the sanity script to run a number of additional checks. Among them, it creates a new temporary directory using the hard-coded path /bin/mktemp; on OS X the executable is located at /usr/bin/mktemp, so the script fails there. ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1405584 Title: misc-sanity-checks.sh doesn't work on OS X Status in OpenStack Neutron (virtual network service): New Bug description: The patch introduced by https://review.openstack.org/#/c/143539/ changed the sanity script to run a number of additional checks. Among them, it creates a new temporary directory using the hard-coded path /bin/mktemp; on OS X the executable is located at /usr/bin/mktemp, so the script fails there. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1405584/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
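The likely fix (a sketch, not the merged patch) is to resolve mktemp via $PATH with an explicit template, which works with both GNU (Linux) and BSD (OS X) mktemp:

    $ TEMP_DIR=$(mktemp -d "${TMPDIR:-/tmp}/check.XXXXXX")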
[Yahoo-eng-team] [Bug 1374946] [NEW] HA should have functional tests
Public bug reported: The current HA-related code should have functional tests merged upstream. All patches relevant to HA functional tests should be associated with this bug. ** Affects: neutron Importance: Medium Assignee: John Schwarz (jschwarz) Status: In Progress ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) ** Changed in: neutron Status: New => In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1374946 Title: HA should have functional tests Status in OpenStack Neutron (virtual network service): In Progress Bug description: The current HA-related code should have functional tests merged upstream. All patches relevant to HA functional tests should be associated with this bug. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1374946/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1374947] [NEW] HA should have integration tests
Public bug reported: The current HA-related code should have integration tests merged upstream. All patches relevant to HA integration tests should be associated with this bug, until a proper blueprint is written for Kilo. ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: In Progress ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) ** Changed in: neutron Status: New => In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1374947 Title: HA should have integration tests Status in OpenStack Neutron (virtual network service): In Progress Bug description: The current HA-related code should have integration tests merged upstream. All patches relevant to HA integration tests should be associated with this bug, until a proper blueprint is written for Kilo. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1374947/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1370914] [NEW] When two OVS ports contain the same external_ids:iface-id field, the OVS agent might fail to find the correct port
Public bug reported: As the title says, if there are two different OVS ports with the same external_ids:iface-id field (which holds the Neutron port_id), and at least one of them is managed by the OVS agent, the agent might fail to find the correct one when they are not connected to the same bridge. Steps to reproduce:

1. Create a router with an internal port on some Neutron network
2. Find the port in 'ovs-vsctl show'
3. Use the following command to find the port_id in ovs:
   sudo ovs-vsctl --columns=external_ids list Interface
4. Use the following commands to create a new port with the same field in a new bridge:
   sudo ovs-vsctl add-br br-a
   sudo ip link add dummy12312312 type dummy
   sudo ovs-vsctl add-port br-a dummy12312312
   sudo ovs-vsctl set Interface dummy12312312 external_ids:iface-id=""  # the port_id obtained in step 3
5. Restart the ovs agent.

At this point the ovs agent's log should show "Port: dummy12312312 is on br-a, not on br-int". Expected result: the ovs agent should iterate through the candidates and find the correct port on the correct bridge. ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: New ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1370914 Title: When two OVS ports contain the same external_ids:iface-id field, the OVS agent might fail to find the correct port Status in OpenStack Neutron (virtual network service): New Bug description: As the title says, if there are two different OVS ports with the same external_ids:iface-id field (which holds the Neutron port_id), and at least one of them is managed by the OVS agent, the agent might fail to find the correct one when they are not connected to the same bridge. Steps to reproduce:

1. Create a router with an internal port on some Neutron network
2. Find the port in 'ovs-vsctl show'
3. Use the following command to find the port_id in ovs:
   sudo ovs-vsctl --columns=external_ids list Interface
4. Use the following commands to create a new port with the same field in a new bridge:
   sudo ovs-vsctl add-br br-a
   sudo ip link add dummy12312312 type dummy
   sudo ovs-vsctl add-port br-a dummy12312312
   sudo ovs-vsctl set Interface dummy12312312 external_ids:iface-id=""  # the port_id obtained in step 3
5. Restart the ovs agent.

At this point the ovs agent's log should show "Port: dummy12312312 is on br-a, not on br-int". Expected result: the ovs agent should iterate through the candidates and find the correct port on the correct bridge. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1370914/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
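A sketch of the expected disambiguation (the function shape and data layout are assumptions): among interfaces sharing an iface-id, prefer the one actually attached to the bridge the agent manages:

    def get_vif_port_by_id(port_id, interfaces, bridge_port_names):
        """interfaces: rows of 'ovs-vsctl list Interface' as dicts."""
        candidates = [i for i in interfaces
                      if i.get('external_ids', {}).get('iface-id') == port_id]
        for iface in candidates:
            # Skip matches living on some other bridge (e.g. br-a above)
            # instead of giving up on the first wrong one.
            if iface['name'] in bridge_port_names:
                return iface
        return None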
[Yahoo-eng-team] [Bug 1358206] Re: ovsdb_monitor.SimpleInterfaceMonitor throws eventlet.timeout.Timeout(5)
** Changed in: neutron Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1358206 Title: ovsdb_monitor.SimpleInterfaceMonitor throws eventlet.timeout.Timeout(5) Status in OpenStack Neutron (virtual network service): Fix Released Bug description: This was found during functional testing, when .start() is called with block=True under slightly high load. This suggests the default timeout needs to be raised to make this module work in all situations. https://review.openstack.org/#/c/112798/14/neutron/agent/linux/ovsdb_monitor.py (I will extract a patch from here) To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1358206/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
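For reference, a tiny sketch of the pattern in question: eventlet.Timeout raises in the calling greenthread if the wrapped call does not finish in time, so a 5-second budget is easy to blow under load (the 60-second value below is illustrative, not the merged fix):

    import eventlet

    def start_blocking(monitor, timeout=60):
        # Raises eventlet.Timeout if the monitor is not up within `timeout`.
        with eventlet.Timeout(timeout):
            monitor.start(block=True)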
[Yahoo-eng-team] [Bug 1362213] [NEW] haproxy configuration spams logged-in users when no servers are available
Public bug reported: On certain systems which use the default syslog configuration, using haproxy-based LBaaS causes error logs to spam all logged-in users:

Message from syslogd@alpha-controller at Jun 9 01:32:07 ...
haproxy[2719]:backend 32fce5ee-b7f7-4415-a572-a83eba1be6b0 has no server available!

Message from syslogd@alpha-controller at Jun 9 01:32:07 ...
haproxy[2719]:backend 32fce5ee-b7f7-4415-a572-a83eba1be6b0 has no server available!

The error message itself is valid - it happens when, for example, no backend servers are available to handle service requests because all members are down. However, there is no point in broadcasting the message to every logged-in user. The desired result is that each namespace has its own log file, containing all the log messages the relevant haproxy process produces. ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: In Progress ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) ** Changed in: neutron Status: New => In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1362213 Title: haproxy configuration spams logged-in users when no servers are available Status in OpenStack Neutron (virtual network service): In Progress Bug description: On certain systems which use the default syslog configuration, using haproxy-based LBaaS causes error logs to spam all logged-in users:

Message from syslogd@alpha-controller at Jun 9 01:32:07 ...
haproxy[2719]:backend 32fce5ee-b7f7-4415-a572-a83eba1be6b0 has no server available!

Message from syslogd@alpha-controller at Jun 9 01:32:07 ...
haproxy[2719]:backend 32fce5ee-b7f7-4415-a572-a83eba1be6b0 has no server available!

The error message itself is valid - it happens when, for example, no backend servers are available to handle service requests because all members are down. However, there is no point in broadcasting the message to every logged-in user. The desired result is that each namespace has its own log file, containing all the log messages the relevant haproxy process produces. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1362213/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1361545] [NEW] dhcp agent shouldn't spawn metadata-proxy for non-isolated networks
Public bug reported: The "enable_isolated_metadata = True" options tells DHCP agents that for each network under its care, a neutron-ns-metadata-proxy process should be spawned, regardless if it's isolated or not. This is fine for isolated networks (networks with no routers and no default gateways), but for networks which are connected to a router (for which the L3 agent spawns a separate neutron-ns-metadata-proxy which is attached to the router's namespace), 2 different metadata proxies are spawned. For these networks, the static routes which are pushed to each instance, letting it know where to search for the metadata-proxy, is not pushed and the proxy spawned from the DHCP agent is left unused. The DHCP agent should know if the network it handles is isolated or not, and for non-isolated networks, no neutron-ns-metadata-proxy processes should spawn. ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: In Progress ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) ** Changed in: neutron Status: New => In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1361545 Title: dhcp agent shouldn't spawn metadata-proxy for non-isolated networks Status in OpenStack Neutron (virtual network service): In Progress Bug description: The "enable_isolated_metadata = True" options tells DHCP agents that for each network under its care, a neutron-ns-metadata-proxy process should be spawned, regardless if it's isolated or not. This is fine for isolated networks (networks with no routers and no default gateways), but for networks which are connected to a router (for which the L3 agent spawns a separate neutron-ns-metadata-proxy which is attached to the router's namespace), 2 different metadata proxies are spawned. For these networks, the static routes which are pushed to each instance, letting it know where to search for the metadata-proxy, is not pushed and the proxy spawned from the DHCP agent is left unused. The DHCP agent should know if the network it handles is isolated or not, and for non-isolated networks, no neutron-ns-metadata-proxy processes should spawn. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1361545/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1350852] [NEW] REST API should allow router filtering by network_id
Public bug reported: There is currently no way to display all routers that are connected to a certain network. This makes it hard for large deployments to figure out which networks are connected to which routers. The proposed change adds this functionality to the REST API, which should also give the end-user the ability to apply this filter using the neutronclient. ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: In Progress ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) ** Changed in: neutron Status: New => In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1350852 Title: REST API should allow router filtering by network_id Status in OpenStack Neutron (virtual network service): In Progress Bug description: There is currently no way to display all routers that are connected to a certain network. This makes it hard for large deployments to figure out which networks are connected to which routers. The proposed change adds this functionality to the REST API, which should also give the end-user the ability to apply this filter using the neutronclient. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1350852/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
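For illustration, with the proposed filter a deployer could simply ask (parameter name taken from the title; router listing already supports this form of filtering for other fields):

    GET /v2.0/routers?network_id=<NETWORK_UUID>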