[Yahoo-eng-team] [Bug 1654998] [NEW] fullstack fails: creating ha port runs into StaleDataError
Public bug reported:

An example exception can be found in
http://paste.openstack.org/show/594276/ .

** Affects: neutron
   Importance: High
   Assignee: John Schwarz (jschwarz)
       Status: In Progress

** Tags: gate-failure l3-ha

https://bugs.launchpad.net/bugs/1654998
[Yahoo-eng-team] [Bug 1654032] Re: HA job ping test unstable
** Also affects: neutron
   Importance: Undecided
       Status: New

** Changed in: neutron
       Status: New => Confirmed

** Changed in: neutron
   Importance: Undecided => Critical

** Changed in: neutron
     Assignee: (unassigned) => John Schwarz (jschwarz)

** Changed in: neutron
    Milestone: None => ocata-3

** Tags added: gate-failure l3-ha

https://bugs.launchpad.net/bugs/1654032

Title:
  HA job ping test unstable

Status in neutron:
  Confirmed
Status in tripleo:
  In Progress

Bug description:
  We're seeing a lot of spurious failures in the ping test on HA jobs
  lately.

  Logstash query:
  http://logstash.openstack.org/#dashboard/file/logstash.json?query=build_name%3A%20*tripleo-ci*%20AND%20build_status%3A%20FAILURE%20AND%20message%3A%20%5C%22From%2010.0.0.1%20icmp_seq%3D1%20Destination%20Host%20Unreachable%5C%22

  Sample failure log:
  http://logs.openstack.org/76/416576/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/6db60be/console.html#_2017-01-04_16_40_34_770751
[Yahoo-eng-team] [Bug 1652071] [NEW] Implement migration from iptables-based security groups to ovsfw
Public bug reported:

When switching an ovs-agent from iptables to ovsfw, new instances will
be created using the ovsfw, but old instances will stick with iptables.
In fact, there isn't a way to migrate an instance from iptables to
ovsfw, and one should be provided.

Considerations:

a. It isn't enough to just remove the qvo/qvb/qbr interfaces and then
   attach the tap device directly to the integration bridge - we should
   also change the domain XML of the instance itself, so that when
   migrating an instance from one compute node to another, nova won't
   depend on non-existent devices. Should this be done in Nova or in
   Neutron? Should Nova be notified?

b. On the Neutron side, we should also change the Port table to
   indicate a change. This might require a new RPC call from the agent
   side.

** Affects: neutron
   Importance: Undecided
       Status: New

https://bugs.launchpad.net/bugs/1652071
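For context on consideration (a): with the iptables hybrid plug, each
port gets a qbr Linux bridge plus a qvb/qvo veth pair, conventionally
named with a 3-character prefix plus a truncated port UUID. A
hypothetical helper sketch (not part of neutron, and not the migration
code, which does not exist yet) for spotting ports that are still
hybrid-plugged:

    import os

    DEV_NAME_LEN = 14  # neutron truncates device names to 14 chars

    def hybrid_plug_devices(port_id):
        # qvoXXX (OVS side), qvbXXX (bridge side), qbrXXX (linux bridge)
        return [(prefix + port_id)[:DEV_NAME_LEN]
                for prefix in ('qvo', 'qvb', 'qbr')]

    def is_hybrid_plugged(port_id):
        # Heuristic: if the qbr bridge exists, the port is still on iptables.
        qbr = hybrid_plug_devices(port_id)[2]
        return os.path.exists('/sys/class/net/' + qbr)

    print(hybrid_plug_devices('1652071a-bcde-4f01-a345-67890abcdef0'))
    # -> ['qvo1652071a-bc', 'qvb1652071a-bc', 'qbr1652071a-bc']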
[Yahoo-eng-team] [Bug 1650901] [NEW] dvr gates are broken - no brctl command
Public bug reported:

See [1] and [2] - console.html produces this line: "/bin/sh: 1: brctl:
not found" and the job fails early on.

[1]: http://logs.openstack.org/99/407099/16/check/gate-tempest-dsvm-neutron-dvr-ubuntu-xenial/b28dcbd/console.html
[2]: http://logs.openstack.org/99/407099/16/check/gate-grenade-dsvm-neutron-dvr-multinode-ubuntu-xenial/f3788c0/console.html

** Affects: neutron
   Importance: Critical
       Status: Confirmed

** Tags: gate-failure

https://bugs.launchpad.net/bugs/1650901
[Yahoo-eng-team] [Bug 1649867] Re: Gate tempest dsvm neutron dvr test fails
** Also affects: neutron
   Importance: Undecided
       Status: New

** Tags added: gate-failure l3-dvr-backlog

https://bugs.launchpad.net/bugs/1649867

Title:
  Gate tempest dsvm neutron dvr test fails

Status in neutron:
  New
Status in tempest:
  New

Bug description:
  The following tests are failing in the neutron gate:

  tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON.test_delete_active_server [6.911205s] ... FAILED
  (tempest.api.compute.servers.test_server_addresses_negative.ServerAddressesNegativeTestJSON) [0.00s] ... FAILED
  tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON.test_delete_server_while_in_attached_volume [3.348451s] ... FAILED
  tempest.api.compute.servers.test_create_server.ServersTestManualDisk.test_verify_duplicate_network_nics [8.901531s] ... FAILED

  I spotted this message in logs [1]:
  "Connection to the hypervisor is broken on host: ubuntu-xenial-osic-cloud1-disk-6162583"

  Tracebacks:

  2016-12-14 10:34:36.039551 | Captured traceback:
  2016-12-14 10:34:36.039562 | ~~~
  2016-12-14 10:34:36.039577 |     Traceback (most recent call last):
  2016-12-14 10:34:36.039611 |       File "tempest/api/compute/servers/test_delete_server.py", line 49, in test_delete_active_server
  2016-12-14 10:34:36.039634 |         waiters.wait_for_server_termination(self.client, server['id'])
  2016-12-14 10:34:36.039658 |       File "tempest/common/waiters.py", line 111, in wait_for_server_termination
  2016-12-14 10:34:36.039693 |         raise exceptions.BuildErrorException(server_id=server_id)
  2016-12-14 10:34:36.039728 |     tempest.exceptions.BuildErrorException: Server e127e6ff-c7bb-43a2-bbe9-c2683ffdf018 failed to build and is in ERROR status

  2016-12-14 10:34:36.043578 | Captured traceback:
  2016-12-14 10:34:36.043588 | ~~~
  2016-12-14 10:34:36.043602 |     Traceback (most recent call last):
  2016-12-14 10:34:36.043619 |       File "tempest/test.py", line 100, in wrapper
  2016-12-14 10:34:36.043636 |         return f(self, *func_args, **func_kwargs)
  2016-12-14 10:34:36.043668 |       File "tempest/api/compute/servers/test_delete_server.py", line 110, in test_delete_server_while_in_attached_volume
  2016-12-14 10:34:36.043687 |         server = self.create_test_server(wait_until='ACTIVE')
  2016-12-14 10:34:36.043709 |       File "tempest/api/compute/base.py", line 232, in create_test_server
  2016-12-14 10:34:36.043718 |         **kwargs)
  2016-12-14 10:34:36.043752 |       File "tempest/common/compute.py", line 167, in create_test_server
  2016-12-14 10:34:36.043763 |         % server['id'])
  2016-12-14 10:34:36.043798 |       File "/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
  2016-12-14 10:34:36.043810 |         self.force_reraise()
  2016-12-14 10:34:36.043846 |       File "/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2016-12-14 10:34:36.043864 |         six.reraise(self.type_, self.value, self.tb)
  2016-12-14 10:34:36.043886 |       File "tempest/common/compute.py", line 149, in create_test_server
  2016-12-14 10:34:36.043905 |         clients.servers_client, server['id'], wait_until)
  2016-12-14 10:34:36.043927 |       File "tempest/common/waiters.py", line 75, in wait_for_server_status
  2016-12-14 10:34:36.043939 |         server_id=server_id)
  2016-12-14 10:34:36.044257 |     tempest.exceptions.BuildErrorException: Server b1472499-6bdc-41fb-98ca-9d1f9ef578ed failed to build and is in ERROR status
  2016-12-14 10:34:36.044301 |     Details: {u'code': 500, u'created': u'2016-12-14T10:05:22Z', u'message': u'No valid host was found. There are not enough hosts available.'}

  2016-12-14 10:34:36.039827 | Captured traceback:
  2016-12-14 10:34:36.039838 | ~~~
  2016-12-14 10:34:36.039852 |     Traceback (most recent call last):
  2016-12-14 10:34:36.039870 |       File "tempest/test.py", line 241, in setUpClass
  2016-12-14 10:34:36.039885 |         six.reraise(etype, value, trace)
  2016-12-14 10:34:36.039903 |       File "tempest/test.py", line 234, in setUpClass
  2016-12-14 10:34:36.039915 |         cls.resource_setup()
  2016-12-14 10:34:36.039944 |       File "tempest/api/compute/servers/test_server_addresses_negative.py", line 36, in resource_setup
  2016-12-14 10:34:36.039972 |         cls.server = cls.create_test_server(wait_until='ACTIVE')
  2016-12-14 10:34:36.039995 |       File "tempest/api/compute/base.py", line 232, in create_test_server
  2016-12-14 10:34:36.040005 |         **kwargs)
  2016-12-14 10:34:36.040027 |       File "tempest/common/compute.py", line 167, in create_test_server
  2016-12-14 10:34:36.040038 |         % server['id
[Yahoo-eng-team] [Bug 1647432] [NEW] Multiple SIGHUPs to keepalived might trigger re-election
Public bug reported:

As the title says, multiple SIGHUPs sent to the keepalived process
might cause it to forfeit mastership and re-negotiate a new master
(which might be the original master). This means that when, for
example, associating/disassociating 2 floating IPs in quick succession
(each triggers a SIGHUP), the master node may trigger re-election,
causing it to switch to BACKUP and thus removing all the remaining
FIPs' IP addresses and severing connectivity.

** Affects: neutron
   Importance: High
   Assignee: John Schwarz (jschwarz)
       Status: In Progress

** Tags: l3-ha

https://bugs.launchpad.net/bugs/1647432
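One possible mitigation - a sketch only, not the fix referenced by this
report, with all names made up - is to coalesce bursts of configuration
changes into a single SIGHUP after a short quiet period:

    # Illustrative sketch: coalesce multiple "config changed" events
    # into one SIGHUP to keepalived, so that two floating-ip operations
    # in quick succession cause only one reload.
    import os
    import signal
    import threading

    class SighupThrottler(object):
        def __init__(self, pid, delay=2.0):
            self.pid = pid          # keepalived pid
            self.delay = delay      # quiet period in seconds
            self._timer = None
            self._lock = threading.Lock()

        def config_changed(self):
            """Called on every config rewrite; only the last call wins."""
            with self._lock:
                if self._timer is not None:
                    self._timer.cancel()
                self._timer = threading.Timer(self.delay, self._reload)
                self._timer.start()

        def _reload(self):
            os.kill(self.pid, signal.SIGHUP)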
[Yahoo-eng-team] [Bug 1645716] [NEW] Migrating HA routers to Legacy doesn't update interface's device_owner
Public bug reported:

Patch I322c392529c04aca2448fd957a35f4908b323449 added a new
device_owner for HA interfaces between a router and an internal subnet,
which is used to differentiate them from normal, non-HA interfaces.
However, when migrating a router from HA to legacy, the device_owner
isn't switched back to its non-HA counterpart. This can cause a later
migration of the router to DVR to not work properly, as the snat
interface isn't created.

A log and a reproducer can be found in [1].

[1]: http://paste.openstack.org/show/590804/

** Affects: neutron
   Importance: High
   Assignee: John Schwarz (jschwarz)
       Status: Confirmed

** Tags: l3-ha

https://bugs.launchpad.net/bugs/1645716
[Yahoo-eng-team] [Bug 1562878] Re: L3 HA: Unable to complete operation on subnet
I found the bug, and it's in rally.

Patch Ieab53624dc34dc687a0e8eebd84778f7fc95dd77 added a new type of
router interface value for "device_owner", called
"network:ha_router_replicated_interface". However, rally was not made
aware of it, so it thinks this interface is a normal port and tries to
delete it with a normal 'neutron port-delete' (and not 'neutron
router-interface-remove').

I'll adjust the bug report and will submit a fix for rally.

** Also affects: rally
   Importance: Undecided
       Status: New

** Changed in: neutron
       Status: Confirmed => Invalid

** Changed in: rally
     Assignee: (unassigned) => John Schwarz (jschwarz)

** Changed in: rally
       Status: New => Confirmed

https://bugs.launchpad.net/bugs/1562878

Title:
  L3 HA: Unable to complete operation on subnet

Status in neutron:
  Invalid
Status in Rally:
  In Progress

Bug description:
  Environment: 3 controllers, 46 computes, liberty. L3 HA.

  During several executions of
  NeutronNetworks.create_and_delete_routers, the test failed with
  "Unable to complete operation on subnet . One or more ports have an
  IP allocation from this subnet."

  Trace in neutron-server logs: http://paste.openstack.org/show/491557/
  Rally report attached.

  The current problem is with the HA subnet. A side effect of this
  problem is bug https://bugs.launchpad.net/neutron/+bug/1562892
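The rally-side fix amounts to checking device_owner before deciding how
to delete a port. A minimal sketch under these assumptions:
cleanup_ports() and the owner tuple are illustrative, while the three
client calls used - list_ports, remove_interface_router, delete_port -
are real python-neutronclient methods:

    # Router interface ports (including the new
    # "network:ha_router_replicated_interface" owner) must be removed
    # with router-interface-remove, not port-delete.
    ROUTER_INTERFACE_OWNERS = (
        'network:router_interface',
        'network:ha_router_replicated_interface',
    )

    def cleanup_ports(client, network_id):
        for port in client.list_ports(network_id=network_id)['ports']:
            if port['device_owner'] in ROUTER_INTERFACE_OWNERS:
                # device_id of a router interface port is the router id
                client.remove_interface_router(port['device_id'],
                                               {'port_id': port['id']})
            else:
                client.delete_port(port['id'])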
[Yahoo-eng-team] [Bug 1635554] Re: Delete Router / race condition
No worries :) Glad we could help.

** Changed in: neutron
       Status: Incomplete => Invalid

** Changed in: neutron
   Importance: High => Undecided

https://bugs.launchpad.net/bugs/1635554

Title:
  Delete Router / race condition

Status in neutron:
  Invalid

Bug description:
  When deleting a router, the logfile is filled up.
  CentOS 7, Newton (RDO)

  2016-10-21 09:45:02.526 16200 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:140
  2016-10-21 09:45:02.526 16200 WARNING neutron.agent.l3.namespaces [-] Namespace qrouter-8cf5-5c5c-461c-84f3-c8abeca8f79a does not exist. Skipping delete
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent [-] Error while deleting router 8cf5-5c5c-461c-84f3-c8abeca8f79a
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent Traceback (most recent call last):
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 357, in _safe_router_removed
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     self._router_removed(router_id)
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 376, in _router_removed
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     ri.delete(self)
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 381, in delete
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     self.destroy_state_change_monitor(self.process_monitor)
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 325, in destroy_state_change_monitor
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     pm = self._get_state_change_monitor_process_manager()
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 296, in _get_state_change_monitor_process_manager
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     default_cmd_callback=self._get_state_change_monitor_callback())
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 299, in _get_state_change_monitor_callback
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     ha_device = self.get_ha_device_name()
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 137, in get_ha_device_name
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     return (HA_DEV_PREFIX + self.ha_port['id'])[:self.driver.DEV_NAME_LEN]
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent TypeError: 'NoneType' object has no attribute '__getitem__'
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent
  2016-10-21 09:45:02.528 16200 DEBUG neutron.agent.l3.agent [-] Finished a router update for 8cf5-5c5c-461c-84f3-c8abeca8f79a _process_router_update /usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py:504

  See full log: http://paste.openstack.org/show/586656/
[Yahoo-eng-team] [Bug 1638273] [NEW] find_child_pids crashes under non-English locales
Public bug reported:

Traceback available at [1]. The function execute() raises an error
whose message comes from _("Exit code: %(returncode)d; ...") [2].
Under non-English locales (we checked Japanese, but surely this will
also occur in others), the check 'Exit code: 1' in str(e) [3] will
fail, since the 'Exit code: 1' part of the message is translated. This
ultimately prevents things like booting a new VM.

[1]: http://pastebin.com/x66aqctN
[2]: https://github.com/openstack/neutron/blob/15d65607a47810f7d155d43902d358cb9f953a7a/neutron/agent/linux/utils.py#L127
[3]: https://github.com/openstack/neutron/blob/15d65607a47810f7d155d43902d358cb9f953a7a/neutron/agent/linux/utils.py#L176

** Affects: neutron
   Importance: Critical
   Assignee: John Schwarz (jschwarz)
       Status: Confirmed

** Tags: mitaka-backport-potential newton-backport-potential

https://bugs.launchpad.net/bugs/1638273
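To illustrate why the check breaks, and a locale-independent
alternative: a structured returncode attribute compares equal
regardless of how the message text is translated. ProcessError here is
hypothetical, not neutron's actual exception class:

    # Illustration of the bug and a locale-independent alternative.
    class ProcessError(RuntimeError):
        def __init__(self, returncode, message):
            super(ProcessError, self).__init__(message)
            self.returncode = returncode

    def is_exit_code_one_fragile(e):
        # Breaks under e.g. a Japanese locale, where the message built
        # from _("Exit code: %(returncode)d; ...") is translated.
        return 'Exit code: 1' in str(e)

    def is_exit_code_one_robust(e):
        # Comparing a structured attribute works in every locale.
        return getattr(e, 'returncode', None) == 1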
[Yahoo-eng-team] [Bug 1633306] Re: Partial HA network causing HA router creation failed (race condition)
Looking at the log involving the server ([1] - the same one you
provided in the first comment and in comment #3), and specifically
lines 19 and 21, it's clear that sync_routers() is triggering
auto_schedule_routers(). Before the call was removed in [2],
sync_routers() called auto_schedule_routers() at line 96 of
neutron/api/rpc/handlers/l3_rpc.py, as can be observed from the log:

  2016-10-09 17:03:52.366 144166 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/neutron/api/rpc/handlers/l3_rpc.py", line 96, in sync_routers
  2016-10-09 17:03:52.366 144166 ERROR oslo_messaging.rpc.dispatcher     self.l3plugin.auto_schedule_routers(context, host, router_ids)

In [2], it's evident that line 96 itself is removed. Thus, this can't
be reproduced in master or in stable/mitaka, and there is no (upstream)
bug to fix.

[1]: http://paste.openstack.org/show/585669/
[2]: https://github.com/openstack/neutron/commit/33650bf1d1994a96eff993af0bfdaa62588f08a4

** Changed in: neutron
       Status: New => Invalid

https://bugs.launchpad.net/bugs/1633306

Title:
  Partial HA network causing HA router creation failed (race condition)

Status in neutron:
  Invalid

Bug description:
  ENV: stable/mitaka, VXLAN
  Neutron API: two neutron-servers behind an HA proxy VIP.

  Exception logs:
  [1] http://paste.openstack.org/show/585669/
  [2] http://paste.openstack.org/show/585670/

  Log [1] shows that the subnet of the HA network is concurrently
  deleted while a new HA router create API call comes in. It seems the
  race condition described in
  https://bugs.launchpad.net/neutron/+bug/1533440 still exists; its
  description says:

  """
  Some known exceptions:
  ...
  2. IpAddressGenerationFailure: (HA port creation failed due to the
     concurrent HA subnet deletion)
  ...
  """

  Log [2] shows a very strange behavior: those 3 APIs have the same
  request-id [req-780b1f6e-2b3c-4303-a1de-a5fb4c7ea31e].

  Test scenario: just create one HA router for a tenant, and then
  quickly delete it.

  For now, our mitaka ENV uses VXLAN as the tenant network type, so
  there is a very large range of VNIs. As a local, temporary solution,
  we added a new config option to decide whether to delete the HA
  network every time.
[Yahoo-eng-team] [Bug 1633306] Re: Partial HA network causing HA router creation failed (race condition)
Adding a new configuration option is almost never temporary, as
deleting config options is rarely backward-compatible.

The race condition, as I understand it, is as follows:

1. Create an HA router; worker1 sends 'router_updated' to agent1.
2. Delete the HA router (done by worker2). worker2 will now detect
   that there are no more HA routers and will delete the HA network
   for the tenant.
3. agent1 issues a 'sync_router', which triggers
   auto_schedule_routers. create_ha_port_and_bind will try to create
   the HA port, but there are no more IP addresses available, causing
   add_ha_port to fail as specified in the first paste.

Point #3 is a bit weird to me, as it looks like IPAM is detecting a
"network deleted during function run" as "no more IP addresses". In
addition, this should be caught by [2], forcing a silent retrigger of
this issue.

Aside from the issue that isn't clear to me, I'd like to point out
that the latest stable/mitaka [1] doesn't even trigger
auto_schedule_routers on sync_router (not since [3] - perhaps you're
missing this backport?), hence the trace received in the first paste
can't be reproduced. For this reason, I'm closing this as Invalid.
Liu, feel free to reopen if you disagree with my assessment :)

[1]: https://github.com/openstack/neutron/blob/5860fb21e966ab8f1e011654dd477d7af35f7a27/neutron/api/rpc/handlers/l3_rpc.py#L79
[2]: https://github.com/openstack/neutron/blob/5860fb21e966ab8f1e011654dd477d7af35f7a27/neutron/common/utils.py#L726
[3]: https://github.com/openstack/neutron/commit/33650bf1d1994a96eff993af0bfdaa62588f08a4

(5860fb21e966ab8f1e011654dd477d7af35f7a27 is the latest stable/mitaka
hash that github.com provided.)

** Changed in: neutron
   Importance: High => Undecided

** Changed in: neutron
       Status: Confirmed => Invalid

** Changed in: neutron
    Milestone: ocata-1 => None

https://bugs.launchpad.net/bugs/1633306

Title:
  Partial HA network causing HA router creation failed (race condition)

Status in neutron:
  Invalid

Bug description:
  ENV: stable/mitaka, VXLAN
  Neutron API: two neutron-servers behind an HA proxy VIP.

  Exception logs:
  [1] http://paste.openstack.org/show/585669/
  [2] http://paste.openstack.org/show/585670/

  Log [1] shows that the subnet of the HA network is concurrently
  deleted while a new HA router create API call comes in. It seems the
  race condition described in
  https://bugs.launchpad.net/neutron/+bug/1533440 still exists; its
  description says:

  """
  Some known exceptions:
  ...
  2. IpAddressGenerationFailure: (HA port creation failed due to the
     concurrent HA subnet deletion)
  ...
  """

  Log [2] shows a very strange behavior: those 3 APIs have the same
  request-id [req-780b1f6e-2b3c-4303-a1de-a5fb4c7ea31e].

  Test scenario: just create one HA router for a tenant, and then
  quickly delete it.

  For now, our mitaka ENV uses VXLAN as the tenant network type, so
  there is a very large range of VNIs. As a local, temporary solution,
  we added a new config option to decide whether to delete the HA
  network every time.
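For reference, the retry helper referred to as [2] in the comment above
follows the general shape below. This is a generic sketch of the
retry-on-exception pattern, not neutron's actual code:

    import functools
    import time

    def retry_on(exc_types, max_attempts=3, delay=0.5):
        def decorator(f):
            @functools.wraps(f)
            def wrapper(*args, **kwargs):
                for attempt in range(1, max_attempts + 1):
                    try:
                        return f(*args, **kwargs)
                    except exc_types:
                        if attempt == max_attempts:
                            raise
                        time.sleep(delay)
            return wrapper
        return decorator

    # e.g. silently re-trigger scheduling when the HA network vanished
    # mid-run:
    # @retry_on((IpAddressGenerationFailure,))
    # def create_ha_port_and_bind(...): ...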
[Yahoo-eng-team] [Bug 1633042] [NEW] L3 scheduler: make RouterL3AgentBinding always concurrently safe
Public bug reported:

Changeset I3447ea5bcb7c57365c6f50efe12a1671e86588b3 added a
binding_index column to the RouterL3AgentBinding table, which is
unique together with the router_id. However, the current logic isn't
concurrency-safe: some concurrent cases can raise a DBDuplicateEntry
(if the same binding_index is picked by 2 different workers).

** Affects: neutron
   Importance: Medium
   Assignee: John Schwarz (jschwarz)
       Status: In Progress

** Tags: l3-dvr-backlog l3-ha

https://bugs.launchpad.net/bugs/1633042
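The usual way to make such an allocation concurrency-safe is to let
the unique constraint arbitrate and retry on a duplicate-entry error.
A minimal, self-contained sketch of that pattern, simulated here with
sqlite3 (neutron itself uses SQLAlchemy models and oslo.db's
DBDuplicateEntry):

    import sqlite3

    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE binding (router_id TEXT, binding_index INT, '
                 'agent TEXT, UNIQUE (router_id, binding_index))')

    def bind_router(router_id, agent, max_index=3):
        # Try the lowest free index; on a duplicate-entry race, retry
        # with the next one instead of bubbling the error up to the API.
        for index in range(1, max_index + 1):
            try:
                with conn:
                    conn.execute('INSERT INTO binding VALUES (?, ?, ?)',
                                 (router_id, index, agent))
                return index
            except sqlite3.IntegrityError:
                continue  # another worker grabbed this index; try the next
        raise RuntimeError('no free binding_index for %s' % router_id)

    print(bind_router('router-1', 'agent-a'))  # -> 1
    print(bind_router('router-1', 'agent-b'))  # -> 2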
[Yahoo-eng-team] [Bug 1628886] [NEW] test_reprocess_port_when_ovs_restarts fails nondeterministically
Public bug reported:

Encountered in https://review.openstack.org/#/c/365326/8/, specifically
http://logs.openstack.org/26/365326/8/check/gate-neutron-dsvm-functional-ubuntu-trusty/cc5f8eb/testr_results.html.gz

Stack trace from tempest (if the logs are deleted from the server):
http://paste.openstack.org/show/583476/

Stack trace from the dsvm-functional log dir:
http://paste.openstack.org/show/583478/

** Affects: neutron
   Importance: High
       Status: Confirmed

** Tags: gate-failure

https://bugs.launchpad.net/bugs/1628886
[Yahoo-eng-team] [Bug 1580648] Re: Two HA routers in master state during functional test
This seems like a bug to me. I understand that it stands as a
limitation that keepalived always selects the higher IP to be master,
but then I would expect the non-higher-IP nodes to revert to backups.
If this isn't the case (as it seems from what Ann and Gustavo write),
then this is a bug. Reopening.

** Changed in: neutron
       Status: Opinion => Confirmed

** Changed in: neutron
   Importance: Undecided => High

https://bugs.launchpad.net/bugs/1580648

Title:
  Two HA routers in master state during functional test

Status in neutron:
  Confirmed

Bug description:
  Scheduling HA routers ends with two routers in master state. The
  issue was discovered in bug fix
  https://review.openstack.org/#/c/273546 after preparing a new
  functional test.

  ha_router.py, in the method _get_state_change_monitor_callback(),
  starts a neutron-keepalived-state-change process with the parameter
  --monitor-interface set to the ha_device (ha-xxx) and its IP
  address. That application monitors all address changes in that
  namespace using "ip netns exec xxx ip -o monitor address". Each
  addition of that ha-xxx device produces a call to the neutron-server
  API saying that this router has become "master". It produces false
  results because that device doesn't tell anything about whether the
  router is master or not.

  Logs from test_ha_router.L3HATestFailover.test_ha_router_lost_gw_connection

  Agent2:
  2016-05-10 16:23:20.653 16067 DEBUG neutron.agent.linux.async_process [-] Launching async process [ip netns exec qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1@agent2 ip -o monitor address]. start /neutron/neutron/agent/linux/async_process.py:109
  2016-05-10 16:23:20.654 16067 DEBUG neutron.agent.linux.utils [-] Running command: ['ip', 'netns', 'exec', 'qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1@agent2', 'ip', '-o', 'monitor', 'address'] create_process /neutron/neutron/agent/linux/utils.py:82
  2016-05-10 16:23:20.661 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Monitor: ha-8aedf0c6-2a, 169.254.0.1/24 run /neutron/neutron/agent/l3/keepalived_state_change.py:59
  2016-05-10 16:23:20.661 16067 INFO neutron.agent.linux.daemon [-] Process runs with uid/gid: 1000/1000
  2016-05-10 16:23:20.767 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qr-88c93aa9-5a, fe80::c8fe:deff:fead:beef/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:20.901 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qg-814d252d-26, fe80::c8fe:deff:fead:beee/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:21.324 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: ha-8aedf0c6-2a, fe80::2022:22ff:fe22:/64, True parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:29.807 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: ha-8aedf0c6-2a, 169.254.0.1/24, True parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:29.808 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Wrote router 962f19e6-f592-49f7-8bc4-add116c0b7a3 state master write_state_change /neutron/neutron/agent/l3/keepalived_state_change.py:87
  2016-05-10 16:23:29.808 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] State: master notify_agent /neutron/neutron/agent/l3/keepalived_state_change.py:93

  Agent1:
  2016-05-10 16:23:19.417 15906 DEBUG neutron.agent.linux.async_process [-] Launching async process [ip netns exec qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1 ip -o monitor address]. start /neutron/neutron/agent/linux/async_process.py:109
  2016-05-10 16:23:19.418 15906 DEBUG neutron.agent.linux.utils [-] Running command: ['ip', 'netns', 'exec', 'qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1', 'ip', '-o', 'monitor', 'address'] create_process /neutron/neutron/agent/linux/utils.py:82
  2016-05-10 16:23:19.425 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Monitor: ha-22a4d1e0-ad, 169.254.0.1/24 run /neutron/neutron/agent/l3/keepalived_state_change.py:59
  2016-05-10 16:23:19.426 15906 INFO neutron.agent.linux.daemon [-] Process runs with uid/gid: 1000/1000
  2016-05-10 16:23:19.525 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qr-88c93aa9-5a, fe80::c8fe:deff:fead:beef/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:19.645 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qg-814d252d-26, fe80::c8fe:deff:fead:beee/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73
  2016-05-10 16:23:19.927 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: ha-22a4d1e0-ad, fe80::1034:56ff:fe78:2b5d/64, True parse_and_hand
[Yahoo-eng-team] [Bug 1621086] Re: Port delete on router interface remove
Looks like this is working as planned.

** Changed in: neutron
       Status: New => Opinion

https://bugs.launchpad.net/bugs/1621086

Title:
  Port delete on router interface remove

Status in neutron:
  Opinion

Bug description:
  1. I create a port, then a router, and then use add_router_interface.
  2. Then I use remove_router_interface.
  3. The port is deleted - and this is unexpected (for me, at least).

  I was using Heat on devstack master to test this.

  Template for the stack with the port:

    resources:
      media_port:
        type: OS::Neutron::Port
        properties:
          name: media_port
          network: private

  Template for the stack with the router and router interface:

    heat_template_version: newton

    resources:
      media_router:
        type: OS::Neutron::Router
      media_router_interface:
        type: OS::Neutron::RouterInterface
        properties:
          router: { get_resource: media_router }
          port: media_port

  When I delete the second stack, the port from the first stack is also
  deleted in neutron.

  https://github.com/openstack/python-neutronclient/blob/master/neutronclient/v2_0/client.py#L873-L876
  is the method that gets called, and the body here will be:
  { 'port_id': 'SOMEID' }
[Yahoo-eng-team] [Bug 1605966] Re: L3 HA: VIP doesn't change if qr interface or qg interface is down
Marking this as Incomplete, seeing as how no progress has been made on
the bug report or on the patch.

** Changed in: neutron
       Status: In Progress => Invalid

https://bugs.launchpad.net/bugs/1605966

Title:
  L3 HA: VIP doesn't change if qr interface or qg interface is down

Status in neutron:
  Invalid

Bug description:
  === Problem Description ===

  Currently, in L3 HA, we track the "ha" interface to determine whether
  the VIP address should fail over. Unfortunately, if a qr or qg
  interface goes down, the VIP address will not fail over, because we
  don't track these interfaces in a router.

  === How to reproduce ===

  Create an HA router and attach a subnet to it, so that there will be
  a keepalived process monitoring this router.

  Go into the L3 router we created above and execute "ip link set
  qr-xxx down". The VIP address does not fail over, which is not what
  we want.

  === How to resolve it ===

  The current keepalived configuration file looks like this:

  vrrp_instance VR_2 {
      state BACKUP
      interface ha-c00c7b49-d5
      virtual_router_id 2
      priority 50
      garp_master_delay 60
      nopreempt
      advert_int 2
      track_interface {
          ha-c00c7b49-d5
      }
      virtual_ipaddress {
          169.254.0.2/24 dev ha-c00c7b49-d5
      }
      virtual_ipaddress_excluded {
          2.2.2.1/24 dev qr-b312f788-9b
          fe80::f816:3eff:feac:fa12/64 dev qr-b312f788-9b scope link
      }
  }

  Tracked interfaces only include the "ha" interface, so the VIP will
  not be moved if a "qr" or "qg" interface goes down. To address this,
  we track the "qr" and "qg" interfaces as well, like this:

  vrrp_instance VR_2 {
      state BACKUP
      interface ha-c00c7b49-d5
      virtual_router_id 2
      priority 50
      garp_master_delay 60
      nopreempt
      advert_int 2
      track_interface {
          qr-xxx
          qg-xxx
          ha-c00c7b49-d5
      }
      virtual_ipaddress {
          169.254.0.2/24 dev ha-c00c7b49-d5
      }
      virtual_ipaddress_excluded {
          2.2.2.1/24 dev qr-b312f788-9b
          fe80::f816:3eff:feac:fa12/64 dev qr-b312f788-9b scope link
      }
  }

  By doing this, if a qr or qg interface goes down, the HA router will
  fail over.
[Yahoo-eng-team] [Bug 1605282] Re: Transaction rolled back while creating HA router
This should have been mitigated by
https://review.openstack.org/#/c/364278/10/neutron/scheduler/l3_agent_scheduler.py@207
so I'm closing this.

** Changed in: neutron
       Status: In Progress => Fix Released

** Changed in: neutron
   Importance: Undecided => Medium

https://bugs.launchpad.net/bugs/1605282

Title:
  Transaction rolled back while creating HA router

Status in neutron:
  Fix Released

Bug description:
  The stacktrace can be found here:
  http://paste.openstack.org/show/539052/

  This was discovered while running the create_and_delete_router rally
  test with a high (~10) concurrency number. I encountered this on
  stable/mitaka, so it's interesting to see if this reproduces on
  master.
[Yahoo-eng-team] [Bug 1619312] [NEW] dvr: can't migrate legacy router to DVR
Public bug reported:

As the title says:

2016-09-01 16:38:46.026 ERROR neutron.api.v2.resource [req-d738cdb2-01bb-41a7-a2a9-534bf8b06377 admin 85a2b05da4be46b19bc5f7cf41055e45] update failed: No details.
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource Traceback (most recent call last):
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/api/v2/resource.py", line 79, in resource
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     result = method(request=request, **args)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/api/v2/base.py", line 575, in update
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     return self._update(request, id, body, **kwargs)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 151, in wrapper
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     ectxt.value = e.inner_exc
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     self.force_reraise()
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     six.reraise(self.type_, self.value, self.tb)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 139, in wrapper
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     return f(*args, **kwargs)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/db/api.py", line 82, in wrapped
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     traceback.format_exc())
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     self.force_reraise()
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     six.reraise(self.type_, self.value, self.tb)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/db/api.py", line 77, in wrapped
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     return f(*args, **kwargs)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/api/v2/base.py", line 623, in _update
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     obj = obj_updater(request.context, id, **kwargs)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/db/extraroute_db.py", line 76, in update_router
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     context, id, router)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/db/l3_db.py", line 1722, in update_router
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     id, router)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/db/l3_db.py", line 282, in update_router
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     router_db = self._update_router_db(context, id, r)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/db/l3_hamode_db.py", line 533, in _update_router_db
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     context, router_id, data)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/db/l3_dvr_db.py", line 143, in _update_router_db
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     context.elevated(), router_db):
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/db/l3_dvr_db.py", line 829, in _create_snat_intf_ports_if_not_exists
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     intf['fixed_ips'][0]['subnet_id'], do_pop=False)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/db/l3_dvr_db.py", line 782, in _add_csnat_router_interface_port
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     {'port': port_data})
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/plugins/common/utils.py", line 197, in create_port
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     return core_plugin.create_port(context, {'port': port_data})
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource   File "/opt/openstack/neutron/neutron/common/utils.py", line 617, in inner
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource     "transaction.") % f)
2016-09-01 16:38:46.026 TRACE neutron.api.v2.resource Ru
[Yahoo-eng-team] [Bug 1612192] [NEW] L3 DVR: Unable to complete operation on subnet
Public bug reported:

There is a new gate failure that can be found using the following
logstash query:

  message:"One or more ports have an IP allocation from this subnet"
  && filename:"console.html" && build_queue:"gate"

This seems to be specific to DVR jobs and is separate from [1] (see
comment #7 on that bug report).

[1]: https://bugs.launchpad.net/neutron/+bug/1562878

** Affects: neutron
   Importance: Critical
       Status: New

** Tags: gate-failure l3-dvr-backlog

https://bugs.launchpad.net/bugs/1612192
[Yahoo-eng-team] [Bug 1610645] [NEW] Migrating last HA router to legacy doesn't delete HA network
Public bug reported:

As the title suggests, migrating a tenant's last HA router from HA to
legacy doesn't clean up the HA network.

[stack@js16 ~]$ neutron router-create x --ha=True
Created a new router:
+-------------------------+--------------------------------------+
| Field                   | Value                                |
+-------------------------+--------------------------------------+
| admin_state_up          | True                                 |
| availability_zone_hints |                                      |
| availability_zones      |                                      |
| description             |                                      |
| distributed             | False                                |
| external_gateway_info   |                                      |
| flavor_id               |                                      |
| ha                      | True                                 |
| id                      | 2bafaae3-776b-4707-958b-f1df77d832fb |
| name                    | x                                    |
| revision                | 2                                    |
| routes                  |                                      |
| status                  | ACTIVE                               |
| tenant_id               | 20482218062b458589b9cffa3a1bb172     |
+-------------------------+--------------------------------------+
[stack@js16 ~]$ neutron router-update x --admin_state_up=False
Updated router: x
[stack@js16 ~]$ neutron router-update x --ha=False
Updated router: x
[stack@js16 ~]$ neutron router-delete x
Deleted router: x
[stack@js16 ~]$ neutron net-list
+--------------------------------------+----------------------------------------------------+-----------------------------------------------------------+
| id                                   | name                                               | subnets                                                   |
+--------------------------------------+----------------------------------------------------+-----------------------------------------------------------+
| 088ffed2-27a0-422e-b92c-c388e825cf8f | HA network tenant 20482218062b458589b9cffa3a1bb172 | 92ee2c83-8fdb-4767-90b3-bfb69fca452f 169.254.192.0/18     |
| e0e366ee-8a94-4753-8b9a-474bf692fb99 | public                                             | e0933b88-7baf-47a9-84d8-7c98e140f747 172.24.4.0/24        |
|                                      |                                                    | 400fa2e9-f373-4c7b-954c-f419d6dfba7b 2001:db8::/64        |
| fa13ed8e-dd44-4499-a3e8-531a25f26256 | private                                            | 26f2f679-c255-4c36-93ac-7f6ec3e98ffe fd22:5205:4fcc::/64  |
|                                      |                                                    | ed866867-1611-4919-bb3a-1ff0b4f1d36a 10.0.0.0/24          |
+--------------------------------------+----------------------------------------------------+-----------------------------------------------------------+

** Affects: neutron
   Importance: Undecided
   Assignee: John Schwarz (jschwarz)
       Status: New

** Tags: l3-ha

** Changed in: neutron
     Assignee: (unassigned) => John Schwarz (jschwarz)

https://bugs.launchpad.net/bugs/1610645
[Yahoo-eng-team] [Bug 1609738] [NEW] l3-ha: a router can be stuck in the ALLOCATING state
Public bug reported:

The scenario is a simple one: during the creation of a router, the
server that deals with the request crashes after creating the router
in the ALLOCATING state [1] but before it's changed to ACTIVE [2]. In
this case, the router will be "stuck" in the ALLOCATING state, and the
only admin action that changes the router back to ACTIVE (and allows
it to be scheduled to agents) is the following (see the sketch after
this report):

1. set admin-state-up to False
2. set ha to False
3. set ha to True
4. set admin-state-up to True

That is, a full migration of the HA router to legacy and back to HA is
required. This will trigger the code in [3] and will fix this issue.

The proposed solution is to add a new state, such that if
admin-state-up is changed to False then the router's status will be
changed to "DOWN" (as opposed to the current "ACTIVE", which doesn't
make much sense since admin-state-up is False).

[1]: https://github.com/openstack/neutron/blob/ff5b38071e7e134baa0dc7a52280f9bcbc06efaf/neutron/db/l3_hamode_db.py#L469
[2]: https://github.com/openstack/neutron/blob/ff5b38071e7e134baa0dc7a52280f9bcbc06efaf/neutron/db/l3_hamode_db.py#L485
[3]: https://github.com/openstack/neutron/blob/ff5b38071e7e134baa0dc7a52280f9bcbc06efaf/neutron/db/l3_hamode_db.py#L570

** Affects: neutron
   Importance: Undecided
   Assignee: John Schwarz (jschwarz)
       Status: New

** Tags: l3-ha

** Changed in: neutron
     Assignee: (unassigned) => John Schwarz (jschwarz)

https://bugs.launchpad.net/bugs/1609738
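The four-step workaround above can be scripted with
python-neutronclient. update_router is a real client method; the
endpoint and credentials below are placeholders:

    from neutronclient.v2_0 import client

    neutron = client.Client(username='admin', password='secret',
                            tenant_name='admin',
                            auth_url='http://controller:5000/v2.0')

    def unstick_allocating_router(router_id):
        for body in ({'admin_state_up': False},  # 1. take the router down
                     {'ha': False},              # 2. migrate HA -> legacy
                     {'ha': True},               # 3. migrate legacy -> HA
                     {'admin_state_up': True}):  # 4. bring it back up
            neutron.update_router(router_id, {'router': body})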
[Yahoo-eng-team] [Bug 1609665] [NEW] Updating a router to HA without enough agents results in partial update
Public bug reported:

As the title says, updating a non-HA router to be HA while there are
fewer than the minimum required available L3 agents to handle this
router results in an invalid state caused by a partial update.

[stack@js16 ~]$ neutron router-create --ha=False x
Created a new router:
+-------------------------+--------------------------------------+
| Field                   | Value                                |
+-------------------------+--------------------------------------+
| admin_state_up          | True                                 |
| availability_zone_hints |                                      |
| availability_zones      |                                      |
| description             |                                      |
| distributed             | False                                |
| external_gateway_info   |                                      |
| ha                      | False                                |
| id                      | 488a0eab-bf7a-4aea-84a4-4146a79eb225 |
| name                    | x                                    |
| routes                  |                                      |
| status                  | ACTIVE                               |
| tenant_id               | 20482218062b458589b9cffa3a1bb172     |
+-------------------------+--------------------------------------+
[stack@js16 ~]$ neutron router-update x --admin-state-up=False
Updated router: x
[stack@js16 ~]$ neutron router-update x --ha=True
Not enough l3 agents available to ensure HA. Minimum required 2, available 1.
Neutron server returns request_ids: ['req-4c5400c5-465e-419b-aeda-e637a76c29a1']
[stack@js16 ~]$ neutron router-show x
+-------------------------+--------------------------------------+
| Field                   | Value                                |
+-------------------------+--------------------------------------+
| admin_state_up          | False                                |
| availability_zone_hints |                                      |
| availability_zones      |                                      |
| description             |                                      |
| distributed             | False                                |
| external_gateway_info   |                                      |
| ha                      | True                                 |
| id                      | 488a0eab-bf7a-4aea-84a4-4146a79eb225 |
| name                    | x                                    |
| routes                  |                                      |
| status                  | ALLOCATING                           |
| tenant_id               | 20482218062b458589b9cffa3a1bb172     |
+-------------------------+--------------------------------------+
[stack@js16 ~]$ neutron l3-agent-list-hosting-router x

[stack@js16 ~]$

The router is set to HA and the status is stuck in ALLOCATING even
though it wasn't scheduled to any agent.

** Affects: neutron
   Importance: Undecided
   Assignee: John Schwarz (jschwarz)
       Status: New

** Tags: l3-ha

** Changed in: neutron
     Assignee: (unassigned) => John Schwarz (jschwarz)

https://bugs.launchpad.net/bugs/1609665
[Yahoo-eng-team] [Bug 1531254] Re: Support migrating of legacy routers to HA and back
** Changed in: neutron
       Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1531254

Title:
  Support migrating of legacy routers to HA and back

Status in neutron:
  Fix Released

Bug description:
  https://review.openstack.org/260528

  Dear bug triager. This bug was created since a commit was marked with
  DOCIMPACT. Your project "openstack/neutron" is set up so that we
  directly report the documentation bugs against it. If this needs
  changing, the docimpact-group option needs to be added for the
  project. You can ask the OpenStack infra team (#openstack-infra on
  freenode) for help if you need to.

  commit 42f4332a2b6c7aaeadc9c1bdc87f6d4bf4b662d7
  Author: John Schwarz
  Date:   Mon Oct 12 16:54:17 2015 +0300

      Support migrating of legacy routers to HA and back

      This patch adds support for migration of legacy routers to HA
      and vice-versa. This patch also:

      1. Reverts I4171ab481e3943e0110bd9a300d965bbebe44871, which was
         used to disable such migrations until support was inserted to
         the codebase.
      2. Adds an exception to indicate that such migrations are only
         available on routers that have their admin_state_up set to
         False.

      (cherry picked from commit 416c76bc6e01ef433506e4aa4ebd7c76b57acc51)

      Closes-Bug: #1365426
      DocImpact (Handled in patch 233695)
      Change-Id: Ie92f8033f47e1bf9ba6310373b3bfc9833317580

  Conflicts:
      neutron/db/l3_hamode_db.py
      neutron/tests/unit/db/test_l3_hamode_db.py
[Yahoo-eng-team] [Bug 1605282] Re: Transaction rolled back while creating HA router
** Changed in: neutron Status: New => Opinion ** Changed in: neutron Status: Opinion => Confirmed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1605282 Title: Transaction rolled back while creating HA router Status in neutron: Confirmed Bug description: The stacktrace can be found here: http://paste.openstack.org/show/539052/ This was discovered while running the create_and_delete_router rally test with a high (~10) concurrency number. I encountered this on stable/mitaka so it's interesting to see if this reproduces on master. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1605282/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1606801] Re: deleting router run into race condition
*** This bug is a duplicate of bug 1533457 *** https://bugs.launchpad.net/bugs/1533457 ** This bug is no longer a duplicate of bug 1605546 Race with deleting HA routers ** This bug has been marked a duplicate of bug 1533457 Neutron server unable to sync HA info after race between HA router creating and deleting -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1606801 Title: deleting router run into race condition Status in neutron: New Bug description: After deleting a router, the logfiles of both network nodes fill up with "RuntimeError: Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot open network namespace "qrouter-3767"". After I restarted the openstack services on the network nodes, no new entries appear. Reproducible: yes Steps:
* add router via CLI or dashboard
* delete router via CLI or dashboard
* logfiles grow
Openstack version: mitaka (this error occurred on liberty too!) OS: Centos 7, latest updates Installed packages on network nodes: openstack-neutron-vpnaas-8.0.0-1.el7.noarch openstack-neutron-common-8.1.2-1.el7.noarch openstack-neutron-metering-agent-8.1.2-1.el7.noarch python-neutronclient-4.1.1-2.el7.noarch python-neutron-8.1.2-1.el7.noarch python-neutron-fwaas-8.0.0-3.el7.noarch openstack-neutron-ml2-8.1.2-1.el7.noarch openstack-neutron-bgp-dragent-8.1.2-1.el7.noarch python-neutron-vpnaas-8.0.0-1.el7.noarch openstack-neutron-openvswitch-8.1.2-1.el7.noarch openstack-neutron-8.1.2-1.el7.noarch python-neutron-lib-0.0.2-1.el7.noarch openstack-neutron-fwaas-8.0.0-3.el7.noarch Logfile network node:
2.770 44778 DEBUG neutron.agent.linux.ra [-] radvd disabled for router 37678766-597a-4e33-b83a-65142ca2ced8 disable /usr/lib/python2.7/site-packages/neutron/agent/linux/ra.py:190
2016-07-27 09:10:02.770 44778 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-37678766-597a-4e33-b83a-65142ca2ced8', 'find', '/sys/class/net', '-maxdepth', '1', '-type', 'l', '-printf', '%f '] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:100
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.linux.utils [-] Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot open network namespace "qrouter-37678766-597a-4e33-b83a-65142ca2ced8": No such file or directory
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent [-] Error while deleting router 37678766-597a-4e33-b83a-65142ca2ced8
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 359, in _safe_router_removed
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent     self._router_removed(router_id)
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 377, in _router_removed
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent     ri.delete(self)
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 380, in delete
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent     super(HaRouter, self).delete(agent)
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 349, in delete
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent     self.router_namespace.delete()
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/namespaces.py", line 100, in delete
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent     for d in ns_ip.get_devices(exclude_loopback=True):
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 130, in get_devices
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent     log_fail_as_error=self.log_fail_as_error
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 140, in execute
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent     raise RuntimeError(msg)
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent RuntimeError: Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot open network namespace "qrouter-37678766-597a-4e33-b83a-65142ca2ced8": No such file or directory
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent
2016-07-27 09:10:02.773 44778 ERROR neutron.agent.l3.agent
Attached logfiles of control node and both network nodes. At 09:09:00 -> added router. At 09:10:00 -> deleted router. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1606801/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
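As an illustrative aside (standard library only; this is not the Neutron fix), namespace cleanup can be made idempotent so that a namespace already deleted by a racing worker is treated as done rather than retried forever:

    import subprocess


    def delete_namespace(ns_name):
        try:
            subprocess.check_output(["ip", "netns", "delete", ns_name],
                                    stderr=subprocess.STDOUT)
        except subprocess.CalledProcessError as exc:
            if b"No such file or directory" in exc.output:
                # The namespace is already gone: nothing left to clean up.
                return
            raise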
[Yahoo-eng-team] [Bug 1499647] Re: test_ha_router fails intermittently
As per comment #39, this can be closed - this bug report is mostly a tracker bug and I'm under the impression that most of the races that made test_ha_router fail are resolved. Some other races are https://bugs.launchpad.net/neutron/+bug/1605285 and https://bugs.launchpad.net/neutron/+bug/1605282, but these can be addressed separately. ** Changed in: neutron Status: In Progress => Fix Released ** Changed in: neutron/kilo Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1499647 Title: test_ha_router fails intermittently Status in neutron: Fix Released Status in neutron kilo series: Fix Released Bug description: I have tested the work of L3 HA on an environment with 3 controllers and 1 compute (Kilo), keepalived v1.2.13. I created 50 nets with 50 subnets and 50 routers, with an interface set for each subnet (note: I've seen the same errors with just one router and net). I've got the following errors:

root@node-6:~# neutron l3-agent-list-hosting-router router-1
Request Failed: internal server error while processing your request.

In neutron-server error log: http://paste.openstack.org/show/473760/

When I fixed _get_agents_dict_for_router to skip None for further testing, I was able to see:

root@node-6:~# neutron l3-agent-list-hosting-router router-1
+--------------------------------------+-------------------+----------------+-------+----------+
| id                                   | host              | admin_state_up | alive | ha_state |
+--------------------------------------+-------------------+----------------+-------+----------+
| f3baba98-ef5d-41f8-8c74-a91b7016ba62 | node-6.domain.tld | True           | :-)   | active   |
| c9159f09-34d4-404f-b46c-a8c18df677f3 | node-7.domain.tld | True           | :-)   | standby  |
| b458ab49-c294-4bdb-91bf-ae375d87ff20 | node-8.domain.tld | True           | :-)   | standby  |
| f3baba98-ef5d-41f8-8c74-a91b7016ba62 | node-6.domain.tld | True           | :-)   | active   |
+--------------------------------------+-------------------+----------------+-------+----------+

root@node-6:~# neutron port-list --device_id=fcf150c0-f690-4265-974d-8db370e345c4
+--------------------------------------+-------------------------------------------------+-------------------+-----------------------------------------------------------------------------------------+
| id                                   | name                                            | mac_address       | fixed_ips                                                                               |
+--------------------------------------+-------------------------------------------------+-------------------+-----------------------------------------------------------------------------------------+
| 0834f8a2-f109-4060-9312-edebac84aba5 |                                                 | fa:16:3e:73:9f:33 | {"subnet_id": "0c7a2cfa-1cfd-4ecc-a196-ab9e97139352", "ip_address": "172.18.161.223"}   |
| 2b5a7a15-98a2-4ff1-9128-67d098fa3439 | HA port tenant aef8d13bad9d42df9f25d8ee54c80ad6 | fa:16:3e:b8:f6:35 | {"subnet_id": "1915ccb8-9d0f-4f1a-9811-9a196d1e495e", "ip_address": "169.254.192.149"}  |
| 48c887c1-acc3-4804-a993-b99060fa2c75 | HA port tenant aef8d13bad9d42df9f25d8ee54c80ad6 | fa:16:3e:e7:70:13 | {"subnet_id": "1915ccb8-9d0f-4f1a-9811-9a196d1e495e", "ip_address": "169.254.192.151"}  |
| 82ab62d6-7dd1-4294-a0dc-f5ebfbcbb4ca |                                                 | fa:16:3e:c6:fc:74 | {"subnet_id": "c4cc21c9-3b3a-407c-b4a7-b22f783377e7", "ip_address": "10.0.40.1"}        |
| bbca8575-51f1-4b42-b074-96e15aeda420 | HA port tenant aef8d13bad9d42df9f25d8ee54c80ad6 | fa:16:3e:84:4c:fc | {"subnet_id": "1915ccb8-9d0f-4f1a-9811-9a196d1e495e", "ip_address": "169.254.192.150"}  |
| bee5c6d4-7e0a-4510-bb19-2ef9d60b9faf | HA port tenant aef8d13bad9d42df9f25d8ee54c80ad6 | fa:16:3e:09:a1:ae | {"subnet_id": "1915ccb8-9d0f-4f1a-9811-9a196d1e495e", "ip_address": "169.254.193.11"}   |
| f8945a1d-b359-4c36-a8f8-e78c1ba992f0 | HA port tenant aef8d13bad9d42df9f25d8ee54c80ad6 | fa:16:3e:c4:54:b5 | {"subnet_id": "1915ccb8-9d0f-4f1a-9811-9a196d1e495e", "ip_address": "169.254.193.12"}   |
+--------------------------------------+-------------------------------------------------+-------------------+-----------------------------------------------------------------------------------------+

mysql root@192.168.0.2:neutron> SELECT * FROM ha_router_agent_port_bindings WHERE router_id='fcf150c0-f690-4265-974d-8db370e345c4';
+---------+-----------+-------------+-------+
| port_id | router_id | l3_agent_id | state |
|---
[Yahoo-eng-team] [Bug 1523780] Re: Race between HA router create and HA router delete
I've gone through all 5 of the initially reported problems. They are all either fixed or referenced by other bugs:
1. DBReferenceError: referenced by https://bugs.launchpad.net/neutron/+bug/1533460 and fixed by https://review.openstack.org/#/c/260303/
2. AttributeError: referenced by https://bugs.launchpad.net/neutron/+bug/1605546
3. DBError: referenced by https://bugs.launchpad.net/neutron/+bug/1533443
4. port["id"]: referenced by https://bugs.launchpad.net/neutron/+bug/1533457
5. concurrency error: fixed by https://review.openstack.org/#/c/254586/
Therefore, this bug can be closed. ** Changed in: neutron Status: In Progress => Invalid ** Changed in: neutron/kilo Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1523780 Title: Race between HA router create and HA router delete Status in neutron: Invalid Status in neutron kilo series: Invalid Bug description: Set more than one API worker and RPC worker, and then run the rally scenario test create_and_delete_routers; you may get errors such as:
1. DBReferenceError: (IntegrityError) (1452, 'Cannot add or update a child row: a foreign key constraint fails (`neutron`.`ha_router_agent_port_bindings`, CONSTRAINT `ha_router_agent_port_bindings_ibfk_2` FOREIGN KEY (`router_id`) REFERENCES `routers` (`id`) ON DELETE CASCADE)') 'INSERT INTO ha_router_agent_port_bindings (port_id, router_id, l3_agent_id, state) VALUES (%s, %s, %s, %s)' ('xxx', 'xxx', None, 'standby') (InvalidRequestError: This Session's transaction has been rolled back by a nested rollback() call. To begin a new transaction, issue Session.rollback() first.)
2. AttributeError: 'NoneType' object has no attribute 'config' (l3 agent processing the router in the router_delete function)
3. DBError: UPDATE statement on table 'ports' expected to update 1 row(s); 0 were matched.
4. res = {"id": port["id"], TypeError: 'NoneType' object is unsubscriptable
5. deleting the HA network while deleting the last router gets the error message: "Unable to complete operation on network . There are one or more ports still in use on the network."
There are a bunch of sub-bugs related to this one, basically different incarnations of race conditions in the interactions between the l3-agent and the neutron-server:
https://bugs.launchpad.net/neutron/+bug/1499647
https://bugs.launchpad.net/neutron/+bug/1533441
https://bugs.launchpad.net/neutron/+bug/1533443
https://bugs.launchpad.net/neutron/+bug/1533457
https://bugs.launchpad.net/neutron/+bug/1533440
https://bugs.launchpad.net/neutron/+bug/1533454
https://bugs.launchpad.net/neutron/+bug/1533455
https://bugs.launchpad.net/neutron/+bug/1533460
(I suggest we use this main bug as a tracker for the whole thing, as reviews already reference this bug as related). To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1523780/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1533441] Re: HA router can not be deleted in L3 agent after race between HA router creating and deleting
I've gone through the 2 errors initially reported:
1. Concurrency issues with HA ports: fixed by https://review.openstack.org/#/c/257059/ (introduction of the ALLOCATING status for routers)
2. AttributeError: already referenced by https://bugs.launchpad.net/neutron/+bug/1605546
So this bug can be closed. ** Changed in: neutron Status: In Progress => Invalid ** Changed in: neutron/kilo Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1533441 Title: HA router can not be deleted in L3 agent after race between HA router creating and deleting Status in neutron: Invalid Status in neutron kilo series: Invalid Bug description: An HA router cannot be deleted in the L3 agent after a race between HA router creation and deletion. Exceptions:
1. Unable to process HA router %s without HA port (HA router initialization)
2. AttributeError: 'NoneType' object has no attribute 'config' (HA router deletion procedure)
With the newest neutron code, I found an infinite loop in _safe_router_removed. Consider an HA router without an HA port that was placed on the l3 agent, usually because of the race condition. Infinite loop steps:
1. an HA router deletion RPC arrives
2. the l3 agent removes the router
3. the RouterInfo deletes its router namespace (self.router_namespace.delete())
4. the HaRouter, ha_router.delete(), raises the AttributeError: 'NoneType' or some other error
5. _safe_router_removed returns False
6. self._resync_router(update)
7. the router namespace does not exist, a RuntimeError is raised, and we go back to 5; steps 5-7 loop forever
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1533441/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
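A hypothetical sketch of breaking the 5-7 loop described above; the helper names are assumed for illustration and are not Neutron's actual API. Once the router's namespace is confirmed gone, the removal is reported as successful instead of being requeued:

    def safe_router_removed(agent, router_id):
        try:
            agent.router_removed(router_id)            # assumed helper
        except Exception:
            if not agent.namespace_exists(router_id):  # assumed helper
                # Everything is already gone; resyncing would loop forever.
                return True
            return False  # genuine failure, let the caller resync
        return True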
[Yahoo-eng-team] [Bug 1533440] Re: Race between deleting last HA router and a new HA router API call
3 of the 4 original issues in the first post are now fixed, and the one that isn't is addressed by a separate bug report:
1. NetworkNotFound: fixed by the introduction of _create_ha_interfaces_and_ensure_network
2. IpAddressGenerationFailure: https://bugs.launchpad.net/neutron/+bug/1562887
3. DBReferenceError: Opened a separate bug, https://bugs.launchpad.net/neutron/+bug/1533460, and fixed by https://review.openstack.org/#/c/260303/
4. HA Network Attribute Error: fixed by the introduction of _create_ha_interfaces_and_ensure_network
I think this bug can be closed. ** Changed in: neutron Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1533440 Title: Race between deleting last HA router and a new HA router API call Status in neutron: Fix Released Bug description: During the delete of a tenant's last HA router, neutron will also delete the HA network, which can be racy if a new HA router API call arrives concurrently. Some known exceptions:
1. NetworkNotFound: (HA network not found when creating the HA router's HA port)
2. IpAddressGenerationFailure: (HA port creation failed due to a concurrent HA subnet deletion)
3. DBReferenceError (IntegrityError): (HA network was deleted by a concurrent operation, e.g. deleting the last HA router)
4. HA Network Attribute Error http://paste.openstack.org/show/490140/
Consider using Rally to do the following steps to reproduce the race exceptions:
1. Create 200+ tenants, each one having 2 or more users
2. Create ONLY 1 router for each tenant
3. Concurrently do the following: (1) one user tries to delete the LAST HA router (2) other users try to create some HA routers
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1533440/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1606827] [NEW] Agents might be reported as down for 10 minutes after all controllers restart
Public bug reported: The scenario which initially revealed this issue involved multiple controllers and an extra compute node (total of 4), but it should also reproduce on deployments smaller than described. The issue is that if an agent tries to report_state to the neutron-server and it fails because of a timeout (raising oslo_messaging.MessagingTimeout), then there is an exponential back-off effect which was put in place by [1]. The feature was intended for heavy RPC calls (like get_routers()) and not for light calls such as report_state, so this can be considered a regression. This can be reproduced by restarting the controllers on a triple-O deployment as specified before. A solution would be to ensure PluginReportStateAPI doesn't use the exponential backoff, instead seeking to always time out after rpc_response_timeout. [1]: https://review.openstack.org/#/c/280595/14/neutron/common/rpc.py ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: In Progress ** Tags: liberty-backport-potential mitaka-backport-potential ** Description changed:

  The scenario which initially revealed this issue involved multiple
  controllers and an extra compute node (total of 4) but it should also
  reproduce on deployments smaller than described.

  The issue is that if an agent tries to report_state to the neutron-
  server and it fails because of a timeout (raising
  oslo_messaging.MessagingTimeout), then there is an exponential back-off
  effect which was put in place by [1]. The feature was intended for
  heavy RPC calls (like get_routers()) and not for light calls such as
- report_state, so this can be considered a regression.
+ report_state, so this can be considered a regression. This can be
+ reproduced by restarting the controllers on a triple-O deployment and
+ specified before.

  A solution would be to ensure PluginReportStateAPI doesn't use the
  exponential backoff, instead seeking to always time out after
  rpc_response_timeout.

  [1]: https://review.openstack.org/#/c/280595/14/neutron/common/rpc.py

** Tags added: mitaka-backport-potential ** Tags added: liberty-backport-potential -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1606827 Title: Agents might be reported as down for 10 minutes after all controllers restart Status in neutron: In Progress Bug description: The scenario which initially revealed this issue involved multiple controllers and an extra compute node (total of 4), but it should also reproduce on deployments smaller than described. The issue is that if an agent tries to report_state to the neutron-server and it fails because of a timeout (raising oslo_messaging.MessagingTimeout), then there is an exponential back-off effect which was put in place by [1]. The feature was intended for heavy RPC calls (like get_routers()) and not for light calls such as report_state, so this can be considered a regression. This can be reproduced by restarting the controllers on a triple-O deployment as specified before. A solution would be to ensure PluginReportStateAPI doesn't use the exponential backoff, instead seeking to always time out after rpc_response_timeout.
[1]: https://review.openstack.org/#/c/280595/14/neutron/common/rpc.py To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1606827/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
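To see why a back-off on report_state translates into agents looking dead for minutes, here is a small self-contained illustration; the base timeout and the cap are assumed values, not Neutron's actual configuration:

    import itertools


    def backoff_timeouts(base=60, cap=600):
        # A doubling back-off of the kind added for heavy RPC calls.
        timeout = base
        while True:
            yield min(timeout, cap)
            timeout *= 2


    print(list(itertools.islice(backoff_timeouts(), 5)))
    # [60, 120, 240, 480, 600]: after a few lost report_state calls the
    # next attempt waits far longer than any reasonable agent_down_time.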
[Yahoo-eng-team] [Bug 1605285] [NEW] StaleDataError on ha_router_agent_port_bindings update
Public bug reported: Stacktrace: http://paste.openstack.org/show/539055/ There are a number of currently open bugs that might deal with this, but they are clouded with information that might not be relevant. I will wade through them in the upcoming days to see if I can find something similar to the stack (though at first glance I didn't). Also, this happened on stable/mitaka. It's interesting to see if this also happens on master. It reproduced while running rally's create_and_delete_routers with high (=10) concurrency. ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1605285 Title: StaleDataError on ha_router_agent_port_bindings update Status in neutron: New Bug description: Stacktrace: http://paste.openstack.org/show/539055/ There are a number of currently open bugs that might deal with this, but they are clouded with information that might not be relevant. I will wade through them in the upcoming days to see if I can find something similar to the stack (though at first glance I didn't). Also, this happened on stable/mitaka. It's interesting to see if this also happens on master. It reproduced while running rally's create_and_delete_routers with high (=10) concurrency. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1605285/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1605282] [NEW] Transaction rolled back while creating HA router
Public bug reported: The stacktrace can be found here: http://paste.openstack.org/show/539052/ This was discovered while running the create_and_delete_router rally test with a high (~10) concurrency number. I encountered this on stable/mitaka so it's interesting to see if this reproduces on master. ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1605282 Title: Transaction rolled back while creating HA router Status in neutron: New Bug description: The stacktrace can be found here: http://paste.openstack.org/show/539052/ This was discovered while running the create_and_delete_router rally test with a high (~10) concurrency number. I encountered this on stable/mitaka so it's interesting to see if this reproduces on master. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1605282/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1560945] [NEW] Unable to create DVR+HA routers
Public bug reported: When creating a new DVR+HA, the router is created (the API returns successfully) but the l3 agent enters an endless loop: 2016-03-23 13:57:37.340 ERROR neutron.agent.l3.agent [-] Failed to process compatible router 'a04b3fd7-d46c-4520-82af-18d16835469d' 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent Traceback (most recent call last): 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/agent.py", line 497, in _process_router_update 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent self._process_router_if_compatible(router) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/agent.py", line 436, in _process_router_if_compatible 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent self._process_updated_router(router) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/agent.py", line 450, in _process_updated_router 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent ri.process(self) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/dvr_edge_ha_router.py", line 92, in process 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent super(DvrEdgeHaRouter, self).process(agent) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/dvr_local_router.py", line 486, in process 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent super(DvrLocalRouter, self).process(agent) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/dvr_router_base.py", line 30, in process 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent super(DvrRouterBase, self).process(agent) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/ha_router.py", line 386, in process 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent super(HaRouter, self).process(agent) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/common/utils.py", line 377, in call 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent self.logger(e) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent self.force_reraise() 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent six.reraise(self.type_, self.value, self.tb) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/common/utils.py", line 374, in call 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent return func(*args, **kwargs) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/router_info.py", line 963, in process 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent self.process_address_scope() 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/dvr_edge_router.py", line 235, in process_address_scope 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent with snat_iptables_manager.defer_apply(): 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent AttributeError: 'NoneType' object has no attribute 'defer_apply' 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent This happens in upstream master. 
** Affects: neutron Importance: Undecided Status: New ** Tags: l3-bgp l3-dvr-backlog -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1560945 Title: Unable to create DVR+HA routers Status in neutron: New Bug description: When creating a new DVR+HA, the router is created (the API returns successfully) but the l3 agent enters an endless loop: 2016-03-23 13:57:37.340 ERROR neutron.agent.l3.agent [-] Failed to process compatible router 'a04b3fd7-d46c-4520-82af-18d16835469d' 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent Traceback (most recent call last): 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/agent.py", line 497, in _process_router_update 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent self._process_router_if_compatible(router) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/agent.py", line 436, in _process_router_if_compatible 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent self._process_updated_router(router) 2016-03-23 13:57:37.340 TRACE neutron.agent.l3.agent File "/opt/openstack/neutron/neutron/agent/l3/agent.py", line 450, in _pro
[Yahoo-eng-team] [Bug 1552680] [NEW] [RFE] Add support for DLM
Public bug reported: Neutron has many code paths that can collide and be raceful with each other. Current ongoing work can mitigate and minimize these races, but the work is slow and it's very hard to fight against what you don't know (i.e. there can always be more races you're not aware of). A DLM (Distributed Lock Mechanism) such as tooz [1] can help mitigate this greatly. An excellent example of this racefulness in Neutron is the L3 auto_schedule_routers functionality. When creating a tenant's first HA router, more resources must also be created (such as a HA network and HA ports). This specific flow of creating the resources can be invoked simultaneously by 2 codepaths: the original create_router (invoked from the REST API) and the L3 agent's get_router_ids/sync_routers. These simultaneous runs can produce many races, such as creating 2 HA networks (where only one should exist), accidentally deleting valid port bindings and more. Instead of hunting down these races (which can be a long and inaccurate task since more races can always exist), this can be solved much more easily by locking the operations done on a single router_id. Using tooz [1] allows for a distributed lock, which crosses all the API/RPC workers on a single server and even crosses multiple neutron-servers. Also, this will help mitigate all sorts of races with different resources (a lock can be associated with a uuid, so it won't matter if the uuid is a router_id or a network_id). [1]: https://github.com/openstack/tooz/tree/master/ ** Affects: neutron Importance: Undecided Status: New ** Tags: rfe -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1552680 Title: [RFE] Add support for DLM Status in neutron: New Bug description: Neutron has many code paths that can collide and be raceful with each other. Current ongoing work can mitigate and minimize these races, but the work is slow and it's very hard to fight against what you don't know (i.e. there can always be more races you're not aware of). A DLM (Distributed Lock Mechanism) such as tooz [1] can help mitigate this greatly. An excellent example of this racefulness in Neutron is the L3 auto_schedule_routers functionality. When creating a tenant's first HA router, more resources must also be created (such as a HA network and HA ports). This specific flow of creating the resources can be invoked simultaneously by 2 codepaths: the original create_router (invoked from the REST API) and the L3 agent's get_router_ids/sync_routers. These simultaneous runs can produce many races, such as creating 2 HA networks (where only one should exist), accidentally deleting valid port bindings and more. Instead of hunting down these races (which can be a long and inaccurate task since more races can always exist), this can be solved much more easily by locking the operations done on a single router_id. Using tooz [1] allows for a distributed lock, which crosses all the API/RPC workers on a single server and even crosses multiple neutron-servers. Also, this will help mitigate all sorts of races with different resources (a lock can be associated with a uuid, so it won't matter if the uuid is a router_id or a network_id). [1]: https://github.com/openstack/tooz/tree/master/ To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1552680/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
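A minimal tooz sketch, assuming an etcd3 backend URL and an illustrative lock-naming scheme (neither is prescribed by the report): all operations touching a single router are serialized across API/RPC workers and across neutron-server instances:

    from tooz import coordination

    coordinator = coordination.get_coordinator(
        "etcd3://127.0.0.1:2379", b"neutron-server-1")
    coordinator.start()


    def with_router_lock(router_id, func, *args, **kwargs):
        # One lock per router uuid; any uuid-keyed resource works the same.
        lock = coordinator.get_lock(("router-" + router_id).encode())
        with lock:
            # Only one worker cluster-wide runs e.g. create_router vs. the
            # agent-triggered auto_schedule_routers for this router_id.
            return func(*args, **kwargs)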
[Yahoo-eng-team] [Bug 1550886] [NEW] L3 Agent's fullsync is raceful with creation of HA router
Public bug reported: When creating an HA router, after the server creates all the DB objects (including the HA network and ports if it's the first one), the server continues on to schedule the router to (some of) the available agents. The race is achieved when an L3 agent issues a sync_router request, which later down the line ends up in an auto_schedule_routers() call. If this happens before the above scheduling (from the create_router()) is complete, the server will refuse to schedule the router to the other intended L3 agents, resulting in fewer agents being scheduled. The only way to fix this is either restarting one of the L3 agents which didn't get scheduled, or recreating the router. Either is a bad option. ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: In Progress ** Tags: l3-ha ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1550886 Title: L3 Agent's fullsync is raceful with creation of HA router Status in neutron: In Progress Bug description: When creating an HA router, after the server creates all the DB objects (including the HA network and ports if it's the first one), the server continues on to schedule the router to (some of) the available agents. The race is achieved when an L3 agent issues a sync_router request, which later down the line ends up in an auto_schedule_routers() call. If this happens before the above scheduling (from the create_router()) is complete, the server will refuse to schedule the router to the other intended L3 agents, resulting in fewer agents being scheduled. The only way to fix this is either restarting one of the L3 agents which didn't get scheduled, or recreating the router. Either is a bad option. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1550886/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1546490] [NEW] Security groups don't work with fullstack
Public bug reported: Iptables doesn't work properly with fullstack, as can be observed in [1]. The gist is that since all ovs-agents are running in the same namespace, they try to overwrite each other's iptables rules, causing the failures. This will obviously cause security groups to fail. Also, Assaf Muller mentioned that since FakeMachines are directly connected to br-int, security groups will also not work properly on them. Instead, they should be connected through an intermediary linuxbridge. [1]: http://logs.openstack.org/71/270971/3/check/gate-neutron-dsvm-fullstack/c913b51/logs/TestConnectivitySameNetwork.test_connectivity_VLANs,Ofctl_/neutron-openvswitch-agent--2016-02-14--11-40-19-078390.log.txt.gz#_2016-02-14_11_41_03_165 ** Affects: neutron Importance: Undecided Status: Confirmed ** Tags: fullstack -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1546490 Title: Security groups don't work with fullstack Status in neutron: Confirmed Bug description: Iptables doesn't work properly with fullstack, as can be observed in [1]. The gist is that since all ovs-agents are running in the same namespace, they try to overwrite each other's iptables rules, causing the failures. This will obviously cause security groups to fail. Also, Assaf Muller mentioned that since FakeMachines are directly connected to br-int, security groups will also not work properly on them. Instead, they should be connected through an intermediary linuxbridge. [1]: http://logs.openstack.org/71/270971/3/check/gate-neutron-dsvm-fullstack/c913b51/logs/TestConnectivitySameNetwork.test_connectivity_VLANs,Ofctl_/neutron-openvswitch-agent--2016-02-14--11-40-19-078390.log.txt.gz#_2016-02-14_11_41_03_165 To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1546490/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1523845] [NEW] Pip package 'ovs' needed but not present in requirements.txt
Public bug reported: As the title mentions, the 'ovs' pip package is needed for [1], but is not present in the requirements.txt [2] and it should be changed to reflect this dependency. [1]: https://github.com/openstack/neutron/blob/7a5ebc171f9ff342d7526808b1063b58cc631fec/neutron/agent/ovsdb/impl_idl.py#L21 [2]: https://github.com/openstack/neutron/blob/7a5ebc171f9ff342d7526808b1063b58cc631fec/requirements.txt ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: In Progress ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1523845 Title: Pip package 'ovs' needed but not present in requirements.txt Status in neutron: In Progress Bug description: As the title mentions, the 'ovs' pip package is needed for [1], but is not present in the requirements.txt [2] and it should be changed to reflect this dependency. [1]: https://github.com/openstack/neutron/blob/7a5ebc171f9ff342d7526808b1063b58cc631fec/neutron/agent/ovsdb/impl_idl.py#L21 [2]: https://github.com/openstack/neutron/blob/7a5ebc171f9ff342d7526808b1063b58cc631fec/requirements.txt To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1523845/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1520271] [NEW] L3's metadata fail to run on Python 2.7.5 when metadata_proxy_watch_log=True
Public bug reported: Following [1], Neutron depends on WatchedFileHandler having a 'delay' property. This attribute is not defined in Python's API [2] but Neutron depends on it anyway. In Python 2.7.6 and later versions (like the one running at the gate), this attribute exists, but in 2.7.5 and below it does not, causing metadata to not run. [1]: https://review.openstack.org/#/c/161494/18 [2]: https://docs.python.org/2/library/logging.handlers.html#watchedfilehandler ** Affects: neutron Importance: Undecided Status: New ** Summary changed: - L3's metadata functional tests fail on Python 2.7.5 + L3's metadata fail to run on Python 2.7.5 when metadata_proxy_watch_log=True -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1520271 Title: L3's metadata fail to run on Python 2.7.5 when metadata_proxy_watch_log=True Status in neutron: New Bug description: Following [1], Neutron depends on WatchedFileHandler having a 'delay' property. This attribute is not defined in Python's API [2] but Neutron depends on it anyway. In Python 2.7.6 and later versions (like the one running at the gate), this attribute exists, but in 2.7.5 and below it does not, causing metadata to not run. [1]: https://review.openstack.org/#/c/161494/18 [2]: https://docs.python.org/2/library/logging.handlers.html#watchedfilehandler To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1520271/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
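A defensive illustration (the log path is hypothetical and this is not the Neutron patch): reading the attribute through getattr() with a default keeps the code working on 2.7.5, where the handler may simply not store 'delay' on the instance:

    import logging.handlers

    handler = logging.handlers.WatchedFileHandler("/tmp/metadata-proxy.log")
    # On Python <= 2.7.5 the instance may lack 'delay' entirely.
    delay = getattr(handler, "delay", False)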
[Yahoo-eng-team] [Bug 1506503] [NEW] OVS agents periodically fail to start in fullstack
Public bug reported: Changeset [1] introduced a validation that the local_ip specified for tunneling is actually used by one of the devices on the machine running an OVS agent. In Fullstack, multiple tests may run concurrently, which can cause a race condition: suppose an ovs agent starts running as part of test A. It retrieves the list of all devices on the host and starts a sequential loop over them. In the meantime, some *other* fullstack test (test B) completes and deletes the devices it created. The agent still has the deleted device in its list, and when it reaches that device it finds that it no longer exists and crashes. [1]: https://review.openstack.org/#/c/154043/ ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: In Progress ** Tags: fullstack ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1506503 Title: OVS agents periodically fail to start in fullstack Status in neutron: In Progress Bug description: Changeset [1] introduced a validation that the local_ip specified for tunneling is actually used by one of the devices on the machine running an OVS agent. In Fullstack, multiple tests may run concurrently, which can cause a race condition: suppose an ovs agent starts running as part of test A. It retrieves the list of all devices on the host and starts a sequential loop over them. In the meantime, some *other* fullstack test (test B) completes and deletes the devices it created. The agent still has the deleted device in its list, and when it reaches that device it finds that it no longer exists and crashes. [1]: https://review.openstack.org/#/c/154043/ To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1506503/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
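A hypothetical sketch of a tolerant scan; DeviceNotFound and the addresses() helper are assumptions made for illustration. A device deleted mid-loop by a concurrent test is skipped instead of crashing the agent:

    class DeviceNotFound(Exception):
        """Assumed error type for a device deleted under our feet."""


    def device_has_ip(device, local_ip):
        try:
            return local_ip in device.addresses()  # assumed helper
        except DeviceNotFound:
            return False  # deleted mid-scan by another test: skip it


    def local_ip_is_valid(devices, local_ip):
        return any(device_has_ip(d, local_ip) for d in devices)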
[Yahoo-eng-team] [Bug 1506021] [NEW] AsyncProcess.stop() can lead to deadlock
Public bug reported: The bug occurs when calling stop() on an AsyncProcess instance which is running a process that generates substantial amounts of output to stdout/stderr and that has a signal handler for some signal (SIGTERM for example) that causes the program to exit gracefully. Linux Pipes 101: when calling write() on some one-way pipe, if the pipe is full of data [1], write() will block until the other end read()s from the pipe. AsyncProcess is using eventlet.green.subprocess to create an eventlet-safe subprocess, using stdout=subprocess.PIPE and stderr=subprocess.PIPE. In other words, stdout and stderr are redirected through a one-way linux pipe to the executing AsyncProcess. When stopping the subprocess, the current code [2] first kills the readers used to empty stdout/stderr and only then sends the signal. It is clear that if SIGTERM is sent to the subprocess, and if the subprocess is generating a lot of output to stdout/stderr AFTER the readers were killed, a deadlock is achieved: the parent process is blocking on wait() and the subprocess is blocking on write() (waiting for someone to read and empty the pipe). This can be avoided by sending SIGKILL to the AsyncProcesses (this is the code's default), but other signals such as SIGTERM, that can be handled by the userspace code to cause the process to exit gracefully, might trigger this deadlock. For example, I ran into this while trying to modify existing fullstack tests to SIGTERM processes instead of SIGKILL them, and the ovs agent got deadlocked a lot. [1]: http://linux.die.net/man/7/pipe (Section called "Pipe capacity") [2]: https://github.com/openstack/neutron/blob/stable/liberty/neutron/agent/linux/async_process.py#L163 ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: New ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1506021 Title: AsyncProcess.stop() can lead to deadlock Status in neutron: New Bug description: The bug occurs when calling stop() on an AsyncProcess instance which is running a process that generates substantial amounts of output to stdout/stderr and that has a signal handler for some signal (SIGTERM for example) that causes the program to exit gracefully. Linux Pipes 101: when calling write() on some one-way pipe, if the pipe is full of data [1], write() will block until the other end read()s from the pipe. AsyncProcess is using eventlet.green.subprocess to create an eventlet-safe subprocess, using stdout=subprocess.PIPE and stderr=subprocess.PIPE. In other words, stdout and stderr are redirected through a one-way linux pipe to the executing AsyncProcess. When stopping the subprocess, the current code [2] first kills the readers used to empty stdout/stderr and only then sends the signal. It is clear that if SIGTERM is sent to the subprocess, and if the subprocess is generating a lot of output to stdout/stderr AFTER the readers were killed, a deadlock is achieved: the parent process is blocking on wait() and the subprocess is blocking on write() (waiting for someone to read and empty the pipe). This can be avoided by sending SIGKILL to the AsyncProcesses (this is the code's default), but other signals such as SIGTERM, that can be handled by the userspace code to cause the process to exit gracefully, might trigger this deadlock.
For example, I ran into this while trying to modify existing fullstack tests to SIGTERM processes instead of SIGKILL them, and the ovs agent got deadlocked a lot. [1]: http://linux.die.net/man/7/pipe (Section called "Pipe capacity") [2]: https://github.com/openstack/neutron/blob/stable/liberty/neutron/agent/linux/async_process.py#L163 To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1506021/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
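The deadlock class is easy to demonstrate with the standard library alone (the chatty child command is arbitrary): wait() with live PIPEs can block once the pipe buffer fills, while communicate() keeps draining stdout/stderr until the child exits:

    import subprocess

    proc = subprocess.Popen(["yes", "a-very-chatty-process"],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    proc.terminate()  # SIGTERM; the child may still be flushing output

    # communicate() reads both pipes to EOF and then reaps the child, so
    # it cannot deadlock on a full pipe buffer.
    out, err = proc.communicate()

    # By contrast, calling proc.wait() here with no readers attached could
    # block forever if the child kept writing after the readers were gone.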
[Yahoo-eng-team] [Bug 1505203] [NEW] Setting admin_state_up=False on an HA router with gateway raises an exception
Public bug reported: Steps to reproduce: 1. Create an HA router, 2. Connect the router to a gateway, 3. neutron router-update --admin-state-down=False This results in the following traceback on the l3 agent: 2015-10-12 14:43:44.755 ERROR neutron.agent.l3.router_info [-] Command: ['ip', 'netns', 'exec', u'qrouter-0ce494ff-593a-4d6d-bf06-248979d6cf7a', 'ip', '-4', 'addr', 'del', '172.24.4.11/24', 'dev', u'qg-4f6a7587-00'] Exit code: 2 Stdin: Stdout: Stderr: RTNETLINK answers: Cannot assign requested address 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info Traceback (most recent call last): 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/common/utils.py", line 356, in call 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info return func(*args, **kwargs) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/l3/router_info.py", line 695, in process 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info self.process_external(agent) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/l3/router_info.py", line 661, in process_external 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info self._process_external_gateway(ex_gw_port, agent.pd) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/l3/router_info.py", line 575, in _process_external_gateway 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info self.external_gateway_removed(self.ex_gw_port, interface_name) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/l3/ha_router.py", line 368, in external_gateway_removed 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info interface_name) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/l3/router_info.py", line 550, in external_gateway_removed 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info ip_addr['prefixlen'])) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/l3/router_info.py", line 201, in remove_external_gateway_ip 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info device.delete_addr_and_conntrack_state(ip_cidr) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/linux/ip_lib.py", line 255, in delete_addr_and_conntrack_state 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info self.addr.delete(cidr) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/linux/ip_lib.py", line 511, in delete 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info 'dev', self.name)) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/linux/ip_lib.py", line 295, in _as_root 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info use_root_namespace=use_root_namespace) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/linux/ip_lib.py", line 80, in _as_root 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info log_fail_as_error=self.log_fail_as_error) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/linux/ip_lib.py", line 89, in _execute 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info log_fail_as_error=log_fail_as_error) 2015-10-12 14:43:44.755 
TRACE neutron.agent.l3.router_info File "/opt/openstack/neutron/neutron/agent/linux/utils.py", line 157, in execute 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info raise RuntimeError(m) 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info RuntimeError: 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info Command: ['ip', 'netns', 'exec', u'qrouter-0ce494ff-593a-4d6d-bf06-248979d6cf7a', 'ip', '-4', 'addr', 'del', '172.24.4.11/24', 'dev', u'qg-4f6a7587-00'] 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info Exit code: 2 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info Stdin: 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info Stdout: 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info Stderr: RTNETLINK answers: Cannot assign requested address 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info 2015-10-12 14:43:44.755 TRACE neutron.agent.l3.router_info 2015-10-12 14:43:44.755 ERROR neutron.agent.l3.agent [-] Error while deleting router 0ce494ff-593a-4d6d-bf06-248979d6cf7a ** Affects: neutron Importance: Undecided Status: New ** Tags: l3-ha -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1505203 Title: Setting adm
[Yahoo-eng-team] [Bug 1493788] Re: DVR: Restarting the OVS agent does not re-create some of br-tun's flows
*** This bug is a duplicate of bug 1489372 *** https://bugs.launchpad.net/bugs/1489372 @Arthur, you are correct. I've used 'git bisect' and found out that [1] already fixes this issue. I will close this bug as a duplicate. [1]: https://review.openstack.org/#/c/218118/ ** Changed in: neutron Status: New => Fix Released ** This bug has been marked a duplicate of bug 1489372 OVS agent restart breaks connectivity when l2pop is turned on -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1493788 Title: DVR: Restarting the OVS agent does not re-create some of br-tun's flows Status in neutron: Fix Released Bug description: When, on a setup that has a DVR router, an OVS agent restarts, it fails to re-create some of the flows for br-tun. For example: $ # flows before agent restart $ sudo ovs-ofctl dump-flows br-tun NXST_FLOW reply (xid=0x4): cookie=0x0, duration=77.325s, table=0, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop cookie=0xa30fd64e48832cbc, duration=77.292s, table=0, n_packets=0, n_bytes=0, idle_age=190, priority=1,in_port=1 actions=resubmit(,1) cookie=0xa30fd64e48832cbc, duration=77.281s, table=1, n_packets=0, n_bytes=0, idle_age=190, priority=0 actions=resubmit(,2) cookie=0x0, duration=77.324s, table=2, n_packets=0, n_bytes=0, idle_age=77, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20) cookie=0x0, duration=77.324s, table=2, n_packets=0, n_bytes=0, idle_age=77, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22) cookie=0x0, duration=77.324s, table=3, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop cookie=0x0, duration=77.324s, table=4, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop cookie=0xa30fd64e48832cbc, duration=77.018s, table=4, n_packets=0, n_bytes=0, idle_age=193, priority=1,tun_id=0x437 actions=mod_vlan_vid:1,resubmit(,9) cookie=0x0, duration=77.323s, table=6, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop cookie=0xa30fd64e48832cbc, duration=77.286s, table=9, n_packets=0, n_bytes=0, idle_age=190, priority=0 actions=resubmit(,10) cookie=0xa30fd64e48832cbc, duration=77.259s, table=9, n_packets=0, n_bytes=0, idle_age=65534, priority=1,dl_src=fa:16:3f:55:8c:22 actions=output:1 cookie=0x0, duration=77.323s, table=10, n_packets=0, n_bytes=0, idle_age=77, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,cookie=0xa30fd64e48832cbc,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1 cookie=0x0, duration=77.323s, table=20, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=resubmit(,22) cookie=0xa30fd64e48832cbc, duration=77.317s, table=22, n_packets=0, n_bytes=0, idle_age=193, priority=0 actions=drop $ # flows after agent restart $ sudo ovs-ofctl dump-flows br-tun NXST_FLOW reply (xid=0x4): cookie=0xbcfcd1c3b35d83e3, duration=3.072s, table=0, n_packets=0, n_bytes=0, idle_age=223, priority=1,in_port=1 actions=resubmit(,1) cookie=0xbcfcd1c3b35d83e3, duration=3.060s, table=1, n_packets=0, n_bytes=0, idle_age=223, priority=0 actions=resubmit(,2) cookie=0xbcfcd1c3b35d83e3, duration=2.997s, table=4, n_packets=0, n_bytes=0, idle_age=226, priority=1,tun_id=0x437 actions=mod_vlan_vid:1,resubmit(,9) cookie=0xbcfcd1c3b35d83e3, duration=3.067s, table=9, n_packets=0, n_bytes=0, idle_age=223, priority=0 actions=resubmit(,10) cookie=0xbcfcd1c3b35d83e3, duration=3.038s, 
table=9, n_packets=0, n_bytes=0, idle_age=65534, priority=1,dl_src=fa:16:3f:55:8c:22 actions=output:1 cookie=0xbcfcd1c3b35d83e3, duration=3.100s, table=22, n_packets=0, n_bytes=0, idle_age=226, priority=0 actions=drop It is clear that quite a few flows are missing. They can be re-created by deleting all the flows on br-int - this starts a chain reaction which ultimately recreates all the flows, including br-tun's. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1493788/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1493788] [NEW] DVR: Restarting the OVS agent does not re-create some of br-tun's flows
Public bug reported: When, on a setup that has a DVR router, an OVS agent restarts, it fails to re-create some of the flows for br-tun. For example: $ # flows before agent restart $ sudo ovs-ofctl dump-flows br-tun NXST_FLOW reply (xid=0x4): cookie=0x0, duration=77.325s, table=0, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop cookie=0xa30fd64e48832cbc, duration=77.292s, table=0, n_packets=0, n_bytes=0, idle_age=190, priority=1,in_port=1 actions=resubmit(,1) cookie=0xa30fd64e48832cbc, duration=77.281s, table=1, n_packets=0, n_bytes=0, idle_age=190, priority=0 actions=resubmit(,2) cookie=0x0, duration=77.324s, table=2, n_packets=0, n_bytes=0, idle_age=77, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20) cookie=0x0, duration=77.324s, table=2, n_packets=0, n_bytes=0, idle_age=77, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22) cookie=0x0, duration=77.324s, table=3, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop cookie=0x0, duration=77.324s, table=4, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop cookie=0xa30fd64e48832cbc, duration=77.018s, table=4, n_packets=0, n_bytes=0, idle_age=193, priority=1,tun_id=0x437 actions=mod_vlan_vid:1,resubmit(,9) cookie=0x0, duration=77.323s, table=6, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop cookie=0xa30fd64e48832cbc, duration=77.286s, table=9, n_packets=0, n_bytes=0, idle_age=190, priority=0 actions=resubmit(,10) cookie=0xa30fd64e48832cbc, duration=77.259s, table=9, n_packets=0, n_bytes=0, idle_age=65534, priority=1,dl_src=fa:16:3f:55:8c:22 actions=output:1 cookie=0x0, duration=77.323s, table=10, n_packets=0, n_bytes=0, idle_age=77, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,cookie=0xa30fd64e48832cbc,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1 cookie=0x0, duration=77.323s, table=20, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=resubmit(,22) cookie=0xa30fd64e48832cbc, duration=77.317s, table=22, n_packets=0, n_bytes=0, idle_age=193, priority=0 actions=drop $ # flows after agent restart $ sudo ovs-ofctl dump-flows br-tun NXST_FLOW reply (xid=0x4): cookie=0xbcfcd1c3b35d83e3, duration=3.072s, table=0, n_packets=0, n_bytes=0, idle_age=223, priority=1,in_port=1 actions=resubmit(,1) cookie=0xbcfcd1c3b35d83e3, duration=3.060s, table=1, n_packets=0, n_bytes=0, idle_age=223, priority=0 actions=resubmit(,2) cookie=0xbcfcd1c3b35d83e3, duration=2.997s, table=4, n_packets=0, n_bytes=0, idle_age=226, priority=1,tun_id=0x437 actions=mod_vlan_vid:1,resubmit(,9) cookie=0xbcfcd1c3b35d83e3, duration=3.067s, table=9, n_packets=0, n_bytes=0, idle_age=223, priority=0 actions=resubmit(,10) cookie=0xbcfcd1c3b35d83e3, duration=3.038s, table=9, n_packets=0, n_bytes=0, idle_age=65534, priority=1,dl_src=fa:16:3f:55:8c:22 actions=output:1 cookie=0xbcfcd1c3b35d83e3, duration=3.100s, table=22, n_packets=0, n_bytes=0, idle_age=226, priority=0 actions=drop It is clear that quite a few flows are missing. They can be re-created by deleting all the flows on br-int - this starts a chain reaction which ultimately recreates all the flows, including br-tun's. ** Affects: neutron Importance: Undecided Status: New ** Tags: l3-dvr-backlog -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. 
https://bugs.launchpad.net/bugs/1493788 Title: DVR: Restarting the OVS agent does not re-create some of br-tun's flows Status in neutron: New Bug description: When, on a setup that has a DVR router, an OVS agent restarts, it fails to re-create some of the flows for br-tun. For example:

$ # flows before agent restart
$ sudo ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=77.325s, table=0, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop
 cookie=0xa30fd64e48832cbc, duration=77.292s, table=0, n_packets=0, n_bytes=0, idle_age=190, priority=1,in_port=1 actions=resubmit(,1)
 cookie=0xa30fd64e48832cbc, duration=77.281s, table=1, n_packets=0, n_bytes=0, idle_age=190, priority=0 actions=resubmit(,2)
 cookie=0x0, duration=77.324s, table=2, n_packets=0, n_bytes=0, idle_age=77, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
 cookie=0x0, duration=77.324s, table=2, n_packets=0, n_bytes=0, idle_age=77, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22)
 cookie=0x0, duration=77.324s, table=3, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop
 cookie=0x0, duration=77.324s, table=4, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop
 cookie=0xa30fd64e48832cbc, duration=77.018s, table=4, n_packets=0, n_bytes=0, idle_age=193, priority=1,tun_id=0x437 actions=mod_vlan_vid:1,resubmit(,9)
 cookie=0x0, duration=77.324s, table=6, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=drop
 cookie=0xa30fd64e48832cbc, duration=77.286s, table=9, n_packets=0, n_bytes=0, idle_age=190, priority=0 actions=resubmit(,10)
 cookie=0xa30fd64e48832cbc, duration=77.259s, table=9, n_packets=0, n_bytes=0, idle_age=65534, priority=1,dl_src=fa:16:3f:55:8c:22 actions=output:1
 cookie=0x0, duration=77.323s, table=10, n_packets=0, n_bytes=0, idle_age=77, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,cookie=0xa30fd64e48832cbc,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1
 cookie=0x0, duration=77.323s, table=20, n_packets=0, n_bytes=0, idle_age=77, priority=0 actions=resubmit(,22)
 cookie=0xa30fd64e48832cbc, duration=77.317s, table=22, n_packets=0, n_bytes=0, idle_age=193, priority=0 actions=drop

$ # flows after agent restart
$ sudo ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
 cookie=0xbcfcd1c3b35d83e3, duration=3.072s, table=0, n_packets=0, n_bytes=0, idle_age=223, priority=1,in_port=1 actions=resubmit(,1)
 cookie=0xbcfcd1c3b35d83e3, duration=3.060s, table=1, n_packets=0, n_bytes=0, idle_age=223, priority=0 actions=resubmit(,2)
 cookie=0xbcfcd1c3b35d83e3, duration=2.997s, table=4, n_packets=0, n_bytes=0, idle_age=226, priority=1,tun_id=0x437 actions=mod_vlan_vid:1,resubmit(,9)
 cookie=0xbcfcd1c3b35d83e3, duration=3.067s, table=9, n_packets=0, n_bytes=0, idle_age=223, priority=0 actions=resubmit(,10)
 cookie=0xbcfcd1c3b35d83e3, duration=3.038s, table=9, n_packets=0, n_bytes=0, idle_age=65534, priority=1,dl_src=fa:16:3f:55:8c:22 actions=output:1
 cookie=0xbcfcd1c3b35d83e3, duration=3.100s, table=22, n_packets=0, n_bytes=0, idle_age=226, priority=0 actions=drop

It is clear that quite a few flows are missing. They can be re-created by deleting all the flows on br-int - this starts a chain reaction which ultimately recreates all the flows, including br-tun's. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1493788/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
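For reference, below is a minimal sketch of the kind of default br-tun flows the agent is expected to restore on restart, written against an add_flow-style bridge wrapper. The function name, the wrapper API and the table comments are illustrative (derived from the dump above), not the agent's actual code:

    def restore_default_tun_flows(br_tun, patch_int_ofport):
        # Unmatched traffic is dropped by default.
        br_tun.add_flow(table=0, priority=0, actions="drop")
        # Traffic arriving from br-int enters the pipeline at table 1.
        br_tun.add_flow(table=0, priority=1, in_port=patch_int_ofport,
                        actions="resubmit(,1)")
        # Table 2 splits unknown unicast (table 20) from
        # broadcast/multicast (table 22) based on the multicast bit.
        br_tun.add_flow(table=2, priority=0,
                        dl_dst="00:00:00:00:00:00/01:00:00:00:00:00",
                        actions="resubmit(,20)")
        br_tun.add_flow(table=2, priority=0,
                        dl_dst="01:00:00:00:00:00/01:00:00:00:00:00",
                        actions="resubmit(,22)")
        # Unknown unicast falls through to the flood table.
        br_tun.add_flow(table=20, priority=0, actions="resubmit(,22)")

All of these appear in the "before" dump and are absent from the "after" dump.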
[Yahoo-eng-team] [Bug 1488996] [NEW] QoS doesn't work when l2pop is enabled
Public bug reported: My ml2 configuration file contains the following:

[ml2]
extension_drivers = port_security,qos
mechanism_drivers = openvswitch,l2population

However, when trying to get a list of available rule types, neutron-server writes this to the log file:

WARNING neutron.plugins.ml2.managers [req-19db3de7-1a1a-42b5-b4c0-b9f146a6bcac admin b44ee578c44a426e81752b4df76c1a89] l2population does not support QoS; no rule types available

It seems to me that this should not be the case, as l2pop has nothing to do with QoS. Other mechanism drivers probably produce the same error. ** Affects: neutron Importance: Undecided Status: New ** Tags: l2-pop qos -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1488996 Title: QoS doesn't work when l2pop is enabled Status in neutron: New Bug description: My ml2 configuration file contains the following:

[ml2]
extension_drivers = port_security,qos
mechanism_drivers = openvswitch,l2population

However, when trying to get a list of available rule types, neutron-server writes this to the log file:

WARNING neutron.plugins.ml2.managers [req-19db3de7-1a1a-42b5-b4c0-b9f146a6bcac admin b44ee578c44a426e81752b4df76c1a89] l2population does not support QoS; no rule types available

It seems to me that this should not be the case, as l2pop has nothing to do with QoS. Other mechanism drivers probably produce the same error. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1488996/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
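For context, here is a minimal sketch of the kind of aggregation that produces this warning; the names are illustrative, not the exact ml2.managers code. The point is that a single driver declaring no QoS support currently empties the whole result:

    import logging

    LOG = logging.getLogger(__name__)

    def supported_qos_rule_types(mech_drivers, all_rule_types):
        """Intersect the QoS rule types supported by every mechanism driver."""
        rule_types = set(all_rule_types)
        for driver in mech_drivers:
            supported = getattr(driver, 'supported_qos_rule_types', None)
            if supported is None:
                # A driver that declares no support - such as l2population,
                # which never binds ports - currently wipes out the result.
                LOG.warning("%s does not support QoS; no rule types available",
                            driver.name)
                return []
            rule_types &= set(supported)
        return sorted(rule_types)

A driver that does not bind ports arguably has no business vetoing QoS and should be skipped in the intersection instead.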
[Yahoo-eng-team] [Bug 1487053] [NEW] validate_local_ip shouldn't run if no tunneling is enabled
Public bug reported: If no tunnel_types are specified in the ml2 configuration, the local_ip configuration option is ignored by the code. However, validate_local_ip always checks whether local_ip is configured on an actual interface, even though it shouldn't when tunnel_types is empty. ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1487053 Title: validate_local_ip shouldn't run if no tunneling is enabled Status in neutron: In Progress Bug description: If no tunnel_types are specified in the ml2 configuration, the local_ip configuration option is ignored by the code. However, validate_local_ip always checks whether local_ip is configured on an actual interface, even though it shouldn't when tunnel_types is empty. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1487053/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
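A minimal sketch of the proposed guard follows; the function shape and the ip_lib helper call are assumptions for illustration, not necessarily the agent's exact code:

    from neutron.agent.linux import ip_lib

    def validate_local_ip(local_ip, tunnel_types):
        # Without tunneling enabled, local_ip is never used, so there is
        # nothing to validate.
        if not tunnel_types:
            return
        if not ip_lib.IPWrapper().get_device_by_ip(local_ip):
            raise SystemExit("local_ip %s is not bound to any interface "
                             "on this machine" % local_ip)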
[Yahoo-eng-team] [Bug 1486627] [NEW] DVR doesn't always schedule SNAT routers
Public bug reported: Creating a new router, attaching it to some tenant network and then adding a gateway for the router doesn't create the snat resources (such as 'snat-%s' namespace and other interfaces). Adding a gateway first (before attaching the router to a tenant network) creates the snat resources correctly. ** Affects: neutron Importance: High Assignee: Oleg Bondarev (obondarev) Status: Confirmed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1486627 Title: DVR doesn't always schedule SNAT routers Status in neutron: Confirmed Bug description: Creating a new router, attaching it to some tenant network and then adding a gateway for the router doesn't create the snat resources (such as 'snat-%s' namespace and other interfaces). Adding a gateway first (before attaching the router to a tenant network) creates the snat resources correctly. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1486627/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
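For clarity, the two orderings, using the neutron CLI of the time (IDs elided):

    $ # fails to create the snat-* resources:
    $ neutron router-create r1
    $ neutron router-interface-add r1 <subnet-id>
    $ neutron router-gateway-set r1 <external-net-id>

    $ # creates the snat-* resources correctly:
    $ neutron router-create r2
    $ neutron router-gateway-set r2 <external-net-id>
    $ neutron router-interface-add r2 <subnet-id>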
[Yahoo-eng-team] [Bug 1453888] [NEW] Fullstack doesn't clean resources if environment fails to start
Public bug reported: As the title says, if fullstack_fixtures.EnvironmentFixture fails to start because 'wait_until_env_is_up' did not return successfully (for example, because there was a problem with one of the agents), cleanUp isn't called. As a result, none of the resources belonging to the fixtures used in the environment (processes, configurations, namespaces...) are cleaned up. ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: New ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1453888 Title: Fullstack doesn't clean resources if environment fails to start Status in OpenStack Neutron (virtual network service): New Bug description: As the title says, if fullstack_fixtures.EnvironmentFixture fails to start because 'wait_until_env_is_up' did not return successfully (for example, because there was a problem with one of the agents), cleanUp isn't called. As a result, none of the resources belonging to the fixtures used in the environment (processes, configurations, namespaces...) are cleaned up. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1453888/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
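A minimal sketch of the usual remedy with the fixtures library (class and attribute names here are illustrative): anything registered via useFixture() or addCleanup() inside _setUp() is torn down even when a later step of _setUp() raises, which is exactly the property the environment fixture is missing:

    import fixtures

    class EnvironmentFixture(fixtures.Fixture):
        def __init__(self, hosts):
            super(EnvironmentFixture, self).__init__()
            self.hosts = hosts

        def _setUp(self):
            for host in self.hosts:
                # Sub-fixtures registered here are cleaned up automatically,
                # even if _setUp() fails further down.
                self.useFixture(host)
            self.wait_until_env_is_up()  # may raise; cleanups still run

        def wait_until_env_is_up(self):
            # Elided: poll until all agents report in, raising on timeout.
            pass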
[Yahoo-eng-team] [Bug 1446284] [NEW] functional tests fail non-deterministically because of full-stack
Public bug reported: On startup, the L3 agent looks for namespaces that do not belong to it and cleans them up, in order to minimize system resources (namespaces) on the machine. The fullstack tests run an L3 agent which then deletes namespaces it does not recognize. This in turn causes the deletion of namespaces used by the functional tests, leading to non-deterministic failures at the gate. The code responsible for the deletion of namespaces: https://github.com/openstack/neutron/blob/master/neutron/agent/l3/namespace_manager.py#L73 How to replicate:

1. Run 'tox -e dsvm-functional -- neutron.tests.functional.agent.test_l3_agent neutron.tests.fullstack'
2. Some tests are likely to fail
3. ???
4. Profit?

Example of test runs: 1. http://pastebin.com/63n7Y2YK ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1446284 Title: functional tests fail non-deterministically because of full-stack Status in OpenStack Neutron (virtual network service): New Bug description: On startup, the L3 agent looks for namespaces that do not belong to it and cleans them up, in order to minimize system resources (namespaces) on the machine. The fullstack tests run an L3 agent which then deletes namespaces it does not recognize. This in turn causes the deletion of namespaces used by the functional tests, leading to non-deterministic failures at the gate. The code responsible for the deletion of namespaces: https://github.com/openstack/neutron/blob/master/neutron/agent/l3/namespace_manager.py#L73 How to replicate:

1. Run 'tox -e dsvm-functional -- neutron.tests.functional.agent.test_l3_agent neutron.tests.fullstack'
2. Some tests are likely to fail
3. ???
4. Profit?

Example of test runs: 1. http://pastebin.com/63n7Y2YK To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1446284/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
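A sketch of the ownership check at play (prefixes and names illustrative): the namespace manager treats any agent-prefixed namespace it does not recognize as stale, which is exactly what bites when two test suites share a host:

    NS_PREFIXES = ('qrouter-', 'snat-', 'fip-')

    def is_stale(ns_name, known_router_ids):
        """True if ns_name looks agent-owned but matches no known router."""
        if not ns_name.startswith(NS_PREFIXES):
            return False
        ns_id = ns_name.split('-', 1)[1]
        # A functional test's qrouter-<uuid> namespace lands here even
        # though this (fullstack) agent never created it - and gets deleted.
        return ns_id not in known_router_ids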
[Yahoo-eng-team] [Bug 1405584] [NEW] misc-sanity-checks.sh doesn't work on OS X
Public bug reported: The patch introduced by https://review.openstack.org/#/c/143539/ changed the sanity script to run a number of additional checks. Among them, it creates a new temporary directory using the hard-coded path /bin/mktemp; on OS X the executable is located at /usr/bin/mktemp, so the script fails there. ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1405584 Title: misc-sanity-checks.sh doesn't work on OS X Status in OpenStack Neutron (virtual network service): New Bug description: The patch introduced by https://review.openstack.org/#/c/143539/ changed the sanity script to run a number of additional checks. Among them, it creates a new temporary directory using the hard-coded path /bin/mktemp; on OS X the executable is located at /usr/bin/mktemp, so the script fails there. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1405584/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
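The likely fix (a sketch, not the merged patch) is to resolve mktemp via $PATH with an explicit template, which works with both GNU (Linux) and BSD (OS X) mktemp:

    $ TEMP_DIR=$(mktemp -d "${TMPDIR:-/tmp}/check.XXXXXX")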
[Yahoo-eng-team] [Bug 1374946] [NEW] HA should have functional tests
Public bug reported: The current HA-related code should have functional tests merged upstream. All patches relevant to HA functional tests should be associated with this bug. ** Affects: neutron Importance: Medium Assignee: John Schwarz (jschwarz) Status: In Progress ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) ** Changed in: neutron Status: New => In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1374946 Title: HA should have functional tests Status in OpenStack Neutron (virtual network service): In Progress Bug description: The current HA-related code should have functional tests merged upstream. All patches relevant to HA functional tests should be associated with this bug. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1374946/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1374947] [NEW] HA should have integration tests
Public bug reported: The current HA-related code should have integration tests merged upstream. All patches relevant to HA integration tests should be associated with this bug, until a proper blueprint is written for Kilo. ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: In Progress ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) ** Changed in: neutron Status: New => In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1374947 Title: HA should have integration tests Status in OpenStack Neutron (virtual network service): In Progress Bug description: The current HA-related code should have integration tests merged upstream. All patches relevant to HA integration tests should be associated with this bug, until a proper blueprint is written for Kilo. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1374947/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1370914] [NEW] When two OVS ports contain the same external_ids:iface-id field, the OVS agent might fail to find the correct port
Public bug reported: As the title says, if there are two different OVS ports with the same external_ids:iface-id field (which holds the Neutron port_id), and at least one of them is managed by the OVS agent, the agent might fail to find the correct one when they are not connected to the same bridge. Steps to reproduce:

1. Create a router with an internal port on some Neutron network
2. Find the port in 'ovs-vsctl show'
3. Use the following command to find the port_id in ovs:
   sudo ovs-vsctl --columns=external_ids list Interface
4. Use the following commands to create a new port with the same field in a new bridge:
   sudo ovs-vsctl add-br br-a
   sudo ip link add dummy12312312 type dummy
   sudo ovs-vsctl add-port br-a dummy12312312
   sudo ovs-vsctl set Interface dummy12312312 external_ids:iface-id=""  # the port_id obtained in step 3
5. Restart the ovs agent.

At this point the ovs agent's log should show "Port: dummy12312312 is on br-a, not on br-int". Expected result: the ovs agent should iterate through the candidates and find the correct port on the correct bridge. ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: New ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1370914 Title: When two OVS ports contain the same external_ids:iface-id field, the OVS agent might fail to find the correct port Status in OpenStack Neutron (virtual network service): New Bug description: As the title says, if there are two different OVS ports with the same external_ids:iface-id field (which holds the Neutron port_id), and at least one of them is managed by the OVS agent, the agent might fail to find the correct one when they are not connected to the same bridge. Steps to reproduce:

1. Create a router with an internal port on some Neutron network
2. Find the port in 'ovs-vsctl show'
3. Use the following command to find the port_id in ovs:
   sudo ovs-vsctl --columns=external_ids list Interface
4. Use the following commands to create a new port with the same field in a new bridge:
   sudo ovs-vsctl add-br br-a
   sudo ip link add dummy12312312 type dummy
   sudo ovs-vsctl add-port br-a dummy12312312
   sudo ovs-vsctl set Interface dummy12312312 external_ids:iface-id=""  # the port_id obtained in step 3
5. Restart the ovs agent.

At this point the ovs agent's log should show "Port: dummy12312312 is on br-a, not on br-int". Expected result: the ovs agent should iterate through the candidates and find the correct port on the correct bridge. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1370914/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
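A sketch of the expected disambiguation (the function shape and data layout are assumptions): among interfaces sharing an iface-id, prefer the one actually attached to the bridge the agent manages:

    def get_vif_port_by_id(port_id, interfaces, bridge_port_names):
        """interfaces: rows of 'ovs-vsctl list Interface' as dicts."""
        candidates = [i for i in interfaces
                      if i.get('external_ids', {}).get('iface-id') == port_id]
        for iface in candidates:
            # Skip matches living on some other bridge (e.g. br-a above)
            # instead of giving up on the first wrong one.
            if iface['name'] in bridge_port_names:
                return iface
        return None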
[Yahoo-eng-team] [Bug 1358206] Re: ovsdb_monitor.SimpleInterfaceMonitor throws eventlet.timeout.Timeout(5)
** Changed in: neutron Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1358206 Title: ovsdb_monitor.SimpleInterfaceMonitor throws eventlet.timeout.Timeout(5) Status in OpenStack Neutron (virtual network service): Fix Released Bug description: This was found during functional testing, when .start() is called with block=True under slightly high load. This suggests the default timeout needs to be raised to make this module work in all situations. https://review.openstack.org/#/c/112798/14/neutron/agent/linux/ovsdb_monitor.py (I will extract a patch from here) To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1358206/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
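For reference, a tiny sketch of the pattern in question: eventlet.Timeout raises in the calling greenthread if the wrapped call does not finish in time, so a 5-second budget is easy to blow under load (the 60-second value below is illustrative, not the merged fix):

    import eventlet

    def start_blocking(monitor, timeout=60):
        # Raises eventlet.Timeout if the monitor is not up within `timeout`.
        with eventlet.Timeout(timeout):
            monitor.start(block=True)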
[Yahoo-eng-team] [Bug 1362213] [NEW] haproxy configuration spams logged-in users when no servers are available
Public bug reported: On certain systems which use the default syslog configuration, using haproxy-based LBaaS causes error logs to spam all logged-in users:

Message from syslogd@alpha-controller at Jun 9 01:32:07 ...
haproxy[2719]:backend 32fce5ee-b7f7-4415-a572-a83eba1be6b0 has no server available!

Message from syslogd@alpha-controller at Jun 9 01:32:07 ...
haproxy[2719]:backend 32fce5ee-b7f7-4415-a572-a83eba1be6b0 has no server available!

The error message itself is valid - it happens when, for example, no backend servers are available to handle service requests because all members are down. However, there is no point in broadcasting the message to every logged-in user. The desired result is that each namespace has its own log file, containing all the log messages the relevant haproxy process produces. ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: In Progress ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) ** Changed in: neutron Status: New => In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1362213 Title: haproxy configuration spams logged-in users when no servers are available Status in OpenStack Neutron (virtual network service): In Progress Bug description: On certain systems which use the default syslog configuration, using haproxy-based LBaaS causes error logs to spam all logged-in users:

Message from syslogd@alpha-controller at Jun 9 01:32:07 ...
haproxy[2719]:backend 32fce5ee-b7f7-4415-a572-a83eba1be6b0 has no server available!

Message from syslogd@alpha-controller at Jun 9 01:32:07 ...
haproxy[2719]:backend 32fce5ee-b7f7-4415-a572-a83eba1be6b0 has no server available!

The error message itself is valid - it happens when, for example, no backend servers are available to handle service requests because all members are down. However, there is no point in broadcasting the message to every logged-in user. The desired result is that each namespace has its own log file, containing all the log messages the relevant haproxy process produces. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1362213/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1361545] [NEW] dhcp agent shouldn't spawn metadata-proxy for non-isolated networks
Public bug reported: The "enable_isolated_metadata = True" options tells DHCP agents that for each network under its care, a neutron-ns-metadata-proxy process should be spawned, regardless if it's isolated or not. This is fine for isolated networks (networks with no routers and no default gateways), but for networks which are connected to a router (for which the L3 agent spawns a separate neutron-ns-metadata-proxy which is attached to the router's namespace), 2 different metadata proxies are spawned. For these networks, the static routes which are pushed to each instance, letting it know where to search for the metadata-proxy, is not pushed and the proxy spawned from the DHCP agent is left unused. The DHCP agent should know if the network it handles is isolated or not, and for non-isolated networks, no neutron-ns-metadata-proxy processes should spawn. ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: In Progress ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) ** Changed in: neutron Status: New => In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1361545 Title: dhcp agent shouldn't spawn metadata-proxy for non-isolated networks Status in OpenStack Neutron (virtual network service): In Progress Bug description: The "enable_isolated_metadata = True" options tells DHCP agents that for each network under its care, a neutron-ns-metadata-proxy process should be spawned, regardless if it's isolated or not. This is fine for isolated networks (networks with no routers and no default gateways), but for networks which are connected to a router (for which the L3 agent spawns a separate neutron-ns-metadata-proxy which is attached to the router's namespace), 2 different metadata proxies are spawned. For these networks, the static routes which are pushed to each instance, letting it know where to search for the metadata-proxy, is not pushed and the proxy spawned from the DHCP agent is left unused. The DHCP agent should know if the network it handles is isolated or not, and for non-isolated networks, no neutron-ns-metadata-proxy processes should spawn. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1361545/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1350852] [NEW] REST API should allow router filtering by network_id
Public bug reported: There is currently no way to display all routers that are connected to a certain network. This makes it hard for large deployments to figure out which networks are connected to which routers. The proposed change adds this functionality to the REST API, which should also give the end-user the ability to apply this filter using the neutronclient. ** Affects: neutron Importance: Undecided Assignee: John Schwarz (jschwarz) Status: In Progress ** Changed in: neutron Assignee: (unassigned) => John Schwarz (jschwarz) ** Changed in: neutron Status: New => In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1350852 Title: REST API should allow router filtering by network_id Status in OpenStack Neutron (virtual network service): In Progress Bug description: There is currently no way to display all routers that are connected to a certain network. This makes it hard for large deployments to figure out which networks are connected to which routers. The proposed change adds this functionality to the REST API, which should also give the end-user the ability to apply this filter using the neutronclient. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1350852/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
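For illustration, with the proposed filter a deployer could simply ask (parameter name taken from the title; router listing already supports this form of filtering for other fields):

    GET /v2.0/routers?network_id=<NETWORK_UUID>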