[Yahoo-eng-team] [Bug 1551288] Re: Fullstack native tests sometimes fail with an OVS agent failing to start with 'Address already in use' error
Reviewed:  https://review.openstack.org/298056
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0361ac620249950d8bca628719e9c14c4382
Submitter: Jenkins
Branch:    master

commit 0361ac620249950d8bca628719e9c14c4382
Author: Assaf Muller
Date:   Thu Mar 24 22:14:07 2016 -0400

    Add fullstack cross-process port/ip address fixtures

    We've had a series of bugs with resources that need to be unique
    on the system across test runner processes. Ports are used by
    neutron-server and the OVS agent when run in native openflow mode.
    The function that generates ports looks up random unused ports
    and starts the service. However, it is racy: by the time the
    port is found to be unused and the service is started, another
    test runner can pick the same random port. With close to 65536
    ports to choose from, the chance of a collision is low, but given
    enough test runs, it has happened a non-trivial number of times,
    and given that a voting job needs a very low false-negative rate,
    we need a more robust solution.

    The same applies to IP addresses that are used by the OVS agent
    in tunneling mode, and by the LB agent in all modes. With IP
    addresses, we don't check whether the IP address is used; we
    simply pick a random address from a large pool, and again we've
    seen a non-trivial number of test failures.

    The bugs referenced below had simple, short term solutions applied
    but the bugs remain. This patch is a correct, long term solution
    that doesn't rely on chance.

    This patch adds a resource allocator that uses the disk to persist
    allocations. Access to the disk is guarded via a file lock.
    IP address, networks and ports fixtures use an allocator internally.

    Closes-Bug: #1551288
    Closes-Bug: #1561248
    Closes-Bug: #1560277
    Change-Id: I46c0ca138b806759128462f8d44a5fab96a106d3

** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1551288

Title:
  Fullstack native tests sometimes fail with an OVS agent failing to
  start with 'Address already in use' error

Status in neutron:
  Fix Released

Bug description:
  Example failure:
  test_connectivity(VLANs,Native) fails with this error:
  http://paste.openstack.org/show/488585/

  wait_until_env_is_up is timing out, which typically means that the
  expected number of agents failed to start. Indeed, in this particular
  example I saw this line being output repeatedly in neutron-server.log:

  [29/Feb/2016 04:16:31] "GET /v2.0/agents.json HTTP/1.1" 200 1870 0.005458

  Fullstack calls GET on agents to determine whether the expected number
  of agents were started and are successfully reporting back to
  neutron-server.

  We then see that one of the three OVS agents crashed with this TRACE:
  http://paste.openstack.org/show/488586/

  This happens only with the native tests using the Ryu library.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1551288/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
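
The commit message in this notification describes an allocator that persists allocations on disk and guards access with a file lock. The patch itself is not quoted here, so the following is only a minimal sketch of that idea under stated assumptions: the class name `FileBackedAllocator`, the JSON state file, and the use of `fcntl.flock` are all hypothetical choices for illustration, not neutron's actual implementation.

```python
import fcntl
import json
import os
import random
import tempfile


class ExhaustedResourcesError(Exception):
    """Raised when every candidate value is already allocated."""


class FileBackedAllocator:
    """Hand out values that stay unique across processes (hypothetical sketch).

    Allocations are stored in a JSON file, and every read-modify-write
    cycle runs while holding an exclusive lock on a sibling lock file.
    """

    def __init__(self, state_path, candidates):
        self._state_path = state_path
        self._lock_path = state_path + ".lock"
        self._candidates = list(candidates)

    def _read(self):
        try:
            with open(self._state_path) as state_file:
                return set(json.load(state_file))
        except FileNotFoundError:
            return set()  # nothing has been allocated yet

    def _write(self, allocated):
        with open(self._state_path, "w") as state_file:
            json.dump(sorted(allocated), state_file)

    def _locked(self, mutate):
        # The exclusive lock makes "check that it is free" and "record it
        # as taken" one atomic step with respect to other processes.
        with open(self._lock_path, "w") as lock_file:
            fcntl.flock(lock_file, fcntl.LOCK_EX)
            try:
                allocated = self._read()
                result = mutate(allocated)
                self._write(allocated)
                return result
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)

    def allocate(self):
        def pick(allocated):
            free = [c for c in self._candidates if c not in allocated]
            if not free:
                raise ExhaustedResourcesError()
            choice = random.choice(free)
            allocated.add(choice)
            return choice

        return self._locked(pick)

    def release(self, value):
        self._locked(lambda allocated: allocated.discard(value))


if __name__ == "__main__":
    # Two allocators sharing one state file stand in for two test runners.
    state = os.path.join(tempfile.mkdtemp(), "ports.json")
    runner_a = FileBackedAllocator(state, range(10000, 10004))
    runner_b = FileBackedAllocator(state, range(10000, 10004))
    ports = [runner_a.allocate(), runner_b.allocate()]
    print("distinct ports:", len(set(ports)) == 2)
```

Unlike picking a random unused port and hoping no other runner picks the same one before the service binds it, the choice here is recorded before the lock is released, so a second runner cannot be handed the same value. A fixture built on such an allocator would call `release()` during cleanup so the value can be reused.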
[Yahoo-eng-team] [Bug 1551288] Re: Fullstack native tests sometimes fail with an OVS agent failing to start with 'Address already in use' error
Still seeing instances of this bug. I have a deterministic solution
coming up.

** Changed in: neutron
       Status: Fix Released => Confirmed
[Yahoo-eng-team] [Bug 1551288] Re: Fullstack native tests sometimes fail with an OVS agent failing to start with 'Address already in use' error
Reviewed:  https://review.openstack.org/292392
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=bb567e9b32bf58cb5f74149f1f5cb9cb656e565e
Submitter: Jenkins
Branch:    master

commit bb567e9b32bf58cb5f74149f1f5cb9cb656e565e
Author: Ihar Hrachyshka
Date:   Mon Mar 14 14:35:31 2016 +0100

    Reset RNG seed with current time and pid for each test started

    This will hopefully fix fullstack failures where different process
    fixtures running in parallel test processes and relying on the same
    random.choice() generator seeded by the same initial value could pick
    up the same value as a service free port, and spawn their respective
    resources using the same port, which made one of those unlucky
    services fail.

    Change-Id: I13cfa9392fd138c5e1b1b7d397b9ea91b2a47ed2
    Closes-Bug: #1551288

** Changed in: neutron
       Status: In Progress => Fix Released
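
The commit message above attributes the collisions to parallel test processes drawing from generators that started from the same seed. The sketch below illustrates that failure mode and a reseed based on the current time and PID. It is an assumption about the general shape of such a fix, not the code from the patch; `pick_port` and `reseed_rng` are hypothetical names.

```python
import os
import random
import time


def pick_port(rng, low=10000, high=60000):
    """Pick a candidate port from the given generator (hypothetical helper)."""
    return rng.randint(low, high)


def reseed_rng():
    """Reseed the global RNG from the current time and this process's PID.

    Two processes started at the same instant still get different seeds
    because their PIDs differ.
    """
    seed = "%s-%s" % (time.time(), os.getpid())
    random.seed(seed)
    return seed


if __name__ == "__main__":
    # Two generators with the same seed make the same "random" choice,
    # which is how two runners could end up on the same port.
    same_a = pick_port(random.Random(1234))
    same_b = pick_port(random.Random(1234))
    print("identical seeds collide:", same_a == same_b)

    print("new seed:", reseed_rng())
```

Reseeding only makes a collision less likely; two processes can still draw the same port by chance, which is consistent with the later report in this thread that the bug was still being seen.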