Public bug reported:

Here's an example:
http://logs.openstack.org/13/302913/1/check/gate-neutron-dsvm-functional/91dd537/console.html

Logstash query:
build_name:"gate-neutron-dsvm-functional" AND build_status:"FAILURE" AND message:"Killed timeout -s 9"

45 hits in the last 7 days.

Ihar and I checked the timing, and it started happening as we merged:
https://review.openstack.org/#/c/298056/

There are a few problems here:

1) It appears that a test is freezing up. We have a per-test timeout defined. The timeout is defined by OS_TEST_TIMEOUT in tox.ini, and is enforced via a fixtures.Timeout fixture set up in the oslotest base class. It looks like that timeout doesn't always work.

2) When the global 2-hour job timeout is hit, the job doesn't perform post-test tasks such as copying over log files, which makes these problems a lot harder to troubleshoot.

3) And of course, there is likely some sort of issue with https://review.openstack.org/#/c/298056/.

We can fix this via a revert, but that will increase the failure rate of fullstack. Since I've been unable to reproduce this issue locally, I'd like to hold off on a revert and try to get some more information by tackling some combination of problems 1 and 2, and then adding more logging to figure it out.

** Affects: neutron
     Importance: High
         Status: New

** Tags: functional-tests gate-failure

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1567668

Title:
  Functional job sometimes hits global 2 hour limit and fails

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1567668/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
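
For context on problem 1, here is a minimal stdlib-only sketch of how an alarm-based per-test timeout can be wired up, and one way it can fail to stop a test. It assumes the timeout is delivered through SIGALRM, which is how fixtures.Timeout is understood to work. The names alarm_timeout, run_with_timeout, PerTestTimeout and swallows_timeout are hypothetical and are not taken from oslotest or neutron. The swallowed-exception case is only an illustration, not a diagnosis of this bug. The sketch is Unix-only and must run in the main thread.

```python
import os
import signal
import time
from contextlib import contextmanager


class PerTestTimeout(Exception):
    """Raised by the alarm handler when the guarded block runs too long."""


@contextmanager
def alarm_timeout(seconds):
    """Hypothetical stand-in for an alarm-based per-test timeout."""
    def _on_alarm(signum, frame):
        raise PerTestTimeout("test exceeded %.1fs" % seconds)

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    # One-shot timer; setitimer accepts fractional seconds.
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Cancel the timer and restore the previously installed handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


def run_with_timeout(func, seconds=None):
    """Run func under the timeout and return 'ok' or 'timeout'."""
    if seconds is None:
        # Assumption: same variable name as in tox.ini; 0 disables the timeout.
        seconds = float(os.environ.get("OS_TEST_TIMEOUT", "0"))
    if seconds <= 0:
        func()
        return "ok"
    try:
        with alarm_timeout(seconds):
            func()
    except PerTestTimeout:
        return "timeout"
    return "ok"


def swallows_timeout():
    # A broad except clause eats the timeout exception, so the
    # surrounding code never learns that the alarm fired.
    try:
        time.sleep(0.5)
    except Exception:
        pass


if __name__ == "__main__":
    print(run_with_timeout(lambda: time.sleep(0.5), 0.1))
    print(run_with_timeout(swallows_timeout, 0.1))
```

In the second call the alarm does fire, but the exception is caught inside the test body and the run is reported as passing. Because the timer is one-shot, nothing interrupts the code a second time if it keeps blocking after that point.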