[Yahoo-eng-team] [Bug 1551288] Re: Fullstack native tests sometimes fail with an OVS agent failing to start with 'Address already in use' error

2016-03-29 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/298056
Committed: 
https://git.openstack.org/cgit/openstack/neutron/commit/?id=0361ac620249950d8bca628719e9c14c4382
Submitter: Jenkins
Branch:    master

commit 0361ac620249950d8bca628719e9c14c4382
Author: Assaf Muller 
Date:   Thu Mar 24 22:14:07 2016 -0400

Add fullstack cross-process port/ip address fixtures

We've had a series of bugs with resources that need
to be unique on the system across test runner
processes. Ports are used by neutron-server and the
OVS agent when run in native openflow mode. The function
that generates ports looks up random unused ports and
starts the service. However, it is racy: between the time the
port is found to be unused and the time the service is started,
another test runner can pick the same random port.
With close to 65536 ports to choose from, the chance
of a collision is low, but given enough test runs, it has
happened a non-trivial number of times, and given that
a voting job needs a very low false-negative rate, we
need a more robust solution. The same applies to IP
addresses that are used by the OVS agent in tunneling
mode, and by the LB agent in all modes. With IP addresses,
we don't check whether the IP address is in use; we simply
pick a random address from a large pool, and again
we've seen a non-trivial number of test failures.
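
The check-then-start race described above can be reproduced with a
small sketch. This is only an illustration: probing for a free port by
binding to port 0 is an assumption (the message does not say how the
lookup is done), and find_unused_port, start_service and
demonstrate_race are hypothetical names, not neutron functions.

```python
import errno
import socket


def find_unused_port(host="127.0.0.1"):
    # Hypothetical probe: let the kernel pick a free port, then release it.
    # Nothing reserves the port once this function returns.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))
        return probe.getsockname()[1]


def start_service(port, host="127.0.0.1"):
    # Stand-in for starting a service: bind and listen on the chosen port.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError:
        sock.close()
        raise
    sock.listen(1)
    return sock


def demonstrate_race():
    port = find_unused_port()      # runner A checks: the port looks free
    other = start_service(port)    # runner B grabs the same port first
    try:
        late = start_service(port)  # runner A starts its service too late
    except OSError as exc:
        return exc.errno
    finally:
        other.close()
    late.close()
    return None


if __name__ == "__main__":
    code = demonstrate_race()
    print(errno.errorcode.get(code, code))
```

In a real run the two "runners" are separate processes, so the window
is only hit occasionally, which matches the intermittent failures.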

The bugs referenced below had simple, short term solutions
applied, but the bugs remain. This patch is a correct,
long term solution that doesn't rely on chance.

This patch adds a resource allocator that uses the disk
to persist allocations. Access to the disk is guarded
via a file lock. The IP address, network and port fixtures
use an allocator internally.

Closes-Bug: #1551288
Closes-Bug: #1561248
Closes-Bug: #1560277
Change-Id: I46c0ca138b806759128462f8d44a5fab96a106d3
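
The allocator idea in the commit message above (allocations persisted
on disk, access guarded by a file lock) can be sketched roughly as
follows. This is a minimal sketch and not the code from the patch:
the class name FileBackedAllocator, the JSON state file, and the use
of fcntl.flock (POSIX only) are all assumptions made for illustration.

```python
import contextlib
import fcntl
import json
import random


@contextlib.contextmanager
def exclusive_lock(lock_path):
    # flock() blocks until no other process holds the lock (POSIX only).
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


class FileBackedAllocator:
    """Hypothetical cross-process allocator; not the class from the patch."""

    def __init__(self, state_path, pool):
        self.state_path = state_path
        self.lock_path = state_path + ".lock"
        self.pool = set(pool)

    def _read(self):
        # A missing state file means nothing has been allocated yet.
        try:
            with open(self.state_path) as state_file:
                return set(json.load(state_file))
        except FileNotFoundError:
            return set()

    def _write(self, used):
        with open(self.state_path, "w") as state_file:
            json.dump(sorted(used), state_file)

    def allocate(self):
        # Read, choose and record under one lock, so two processes can
        # never be handed the same resource.
        with exclusive_lock(self.lock_path):
            used = self._read()
            free = sorted(self.pool - used)
            if not free:
                raise RuntimeError("resource pool exhausted")
            resource = random.choice(free)
            used.add(resource)
            self._write(used)
            return resource

    def release(self, resource):
        with exclusive_lock(self.lock_path):
            used = self._read()
            used.discard(resource)
            self._write(used)
```

Every test runner would point its allocator at the same state file,
for example FileBackedAllocator("/tmp/ports.json", range(20000, 21000))
(path and range are made up), and release the resource on cleanup.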


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1551288

Title:
  Fullstack native tests sometimes fail with an OVS agent failing to
  start with 'Address already in use' error

Status in neutron:
  Fix Released

Bug description:
  Example failure:
  test_connectivity(VLANs,Native) fails with this error:

  http://paste.openstack.org/show/488585/

  wait_until_env_is_up is timing out, which typically means that the
  expected number of agents failed to start. Indeed in this particular
  example I saw this line being output repeatedly in neutron-server.log:

  [29/Feb/2016 04:16:31] "GET /v2.0/agents.json HTTP/1.1" 200 1870 0.005458

  Fullstack calls GET on agents to determine whether the expected number
  of agents were started and are successfully reporting back to
  neutron-server.

  We then see that one of the three OVS agents crashed with this TRACE:
  http://paste.openstack.org/show/488586/

  This happens only with the native tests using the Ryu library.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1551288/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1551288] Re: Fullstack native tests sometimes fail with an OVS agent failing to start with 'Address already in use' error

2016-03-27 Thread Assaf Muller
Still seeing instances of this bug. I have a deterministic solution
coming up.

** Changed in: neutron
   Status: Fix Released => Confirmed



[Yahoo-eng-team] [Bug 1551288] Re: Fullstack native tests sometimes fail with an OVS agent failing to start with 'Address already in use' error

2016-03-14 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/292392
Committed: 
https://git.openstack.org/cgit/openstack/neutron/commit/?id=bb567e9b32bf58cb5f74149f1f5cb9cb656e565e
Submitter: Jenkins
Branch:    master

commit bb567e9b32bf58cb5f74149f1f5cb9cb656e565e
Author: Ihar Hrachyshka 
Date:   Mon Mar 14 14:35:31 2016 +0100

Reset RNG seed with current time and pid for each test started

This will hopefully fix fullstack failures in which process fixtures
running in parallel test processes, all relying on the same
random.choice() generator seeded with the same initial value, could
pick the same value as a free port for a service and spawn their
respective resources on that port.

That caused one of those unlucky services to fail.

Change-Id: I13cfa9392fd138c5e1b1b7d397b9ea91b2a47ed2
Closes-Bug: #1551288
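
The reseeding described in the commit message above might look roughly
like this. It is a sketch under assumptions, not the committed code:
make_seed and reseed_rng are hypothetical names, and adding the time
and the pid is just one simple way to combine them.

```python
import os
import random
import time


def make_seed(now, pid):
    # Two processes started at the same instant share 'now' but not
    # 'pid', so their seeds differ.
    return now + pid


def reseed_rng():
    # Intended to be called at the start of each test, so that parallel
    # test processes stop drawing the same sequence from the RNG.
    seed = make_seed(time.time(), os.getpid())
    random.seed(seed)
    return seed
```

This only makes identical picks less likely. Two runners can still
choose the same port by chance.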


** Changed in: neutron
   Status: In Progress => Fix Released
