Reviewed:  https://review.opendev.org/675553
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f875c9d12fa03e1eb0d6ac0f3dd95f502ae1e6a1
Submitter: Zuul
Branch:    master
commit f875c9d12fa03e1eb0d6ac0f3dd95f502ae1e6a1
Author: Balazs Gibizer <balazs.gibi...@est.tech>
Date:   Fri Aug 9 09:53:45 2019 +0200

    Prevent init_host test from interfering with other tests

    The test_migrate_disk_and_power_off_crash_finish_revert_migration
    test needs to simulate a compute host crash at a certain point. It
    stops the execution at that point by injecting a sleep, then
    simulates a compute restart. However, the sleep is only 30 seconds,
    which allows the stopped function to resume while other functional
    tests are still running in the same test worker process, making
    those tests fail in strange ways.

    One simple solution is to inject a sleep long enough that it can
    never return before the whole functional test run finishes. This
    patch proposes a million seconds, which is more than 277 hours,
    similar to how the other test in this test package works. This
    solution is hacky but simple. A better solution would be to further
    enhance the capabilities of the functional test environment to
    support nova-compute service crash/kill and restart.

    Change-Id: Ib0d142806804e9113dd61d3a7ec15a98232775c8
    Closes-Bug: #1839515

** Changed in: nova
       Status: In Progress => Fix Released
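The fix described in the commit message can be sketched roughly as follows. This is an illustrative stand-in, not the actual nova test code: the stub name mirrors the virt driver method the test interrupts, and the wiring into nova's functional test framework is omitted.

```python
import time

# Sleep far longer than any functional test run (~277 hours), so the
# "crashed" call parked here can never resume and leak side effects
# into other tests sharing the same worker process.
CRASH_SLEEP_SECONDS = 1_000_000


def fake_migrate_disk_and_power_off(*args, **kwargs):
    """Illustrative stub simulating a compute host crash mid-migration.

    The test would monkeypatch the driver with this stub, let the
    resize start, then "restart" the compute service while this
    greenthread stays parked.
    """
    time.sleep(CRASH_SLEEP_SECONDS)
```

The key point is only that the injected sleep outlives the entire test run, unlike the original 30-second version.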
https://bugs.launchpad.net/bugs/1839515

Title:
  Weird functional test failures hitting neutron API in unrelated
  resize flows since 8/5

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Noticed here:
  https://logs.opendev.org/32/634832/43/check/nova-tox-functional-py36/d4f3be5/testr_results.html.gz

  With this test:
  nova.tests.functional.notification_sample_tests.test_service.TestServiceUpdateNotificationSampleLatest.test_service_disabled

  That's a simple test which disables a service and then asserts there
  is a service.update notification, but another notification is
  happening as well:

  Traceback (most recent call last):
    File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/test_service.py", line 122, in test_service_disabled
      'uuid': self.service_uuid})
    File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/test_service.py", line 37, in _verify_notification
      base._verify_notification(sample_file_name, replacements, actual)
    File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/notification_sample_base.py", line 148, in _verify_notification
      self.assertEqual(1, len(fake_notifier.VERSIONED_NOTIFICATIONS))
    File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/testtools/testcase.py", line 411, in assertEqual
      self.assertThat(observed, matcher, message)
    File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/testtools/testcase.py", line 498, in assertThat
      raise mismatch_error
  testtools.matchers._impl.MismatchError: 1 != 2

  And in the error output we can see this weird traceback of a resize
  revert failure because the NeutronFixture isn't being used:

  2019-08-07 23:22:23,621 ERROR [nova.network.neutronv2.api] The
  [neutron] section of your nova configuration file must be configured
  for authentication with the networking service endpoint.
  See the networking service install guide for details:
  https://docs.openstack.org/neutron/latest/install/

  2019-08-07 23:22:23,634 ERROR [nova.compute.manager] Setting instance
  vm_state to ERROR
  Traceback (most recent call last):
    File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 8656, in _error_out_instance_on_exception
      yield
    File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 4830, in _resize_instance
      migration_p)
    File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 2697, in migrate_instance_start
      client = _get_ksa_client(context, admin=True)
    File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 215, in _get_ksa_client
      auth_plugin = _get_auth_plugin(context, admin=admin)
    File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 151, in _get_auth_plugin
      _ADMIN_AUTH = _load_auth_plugin(CONF)
    File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 82, in _load_auth_plugin
      raise neutron_client_exc.Unauthorized(message=err_msg)
  neutronclient.common.exceptions.Unauthorized: Unknown auth type: None

  According to logstash this started showing up around 8/5:
  http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22ERROR%20%5Bnova.network.neutronv2.api%5D%20The%20%5Bneutron%5D%20section%20of%20your%20nova%20configuration%20file%20must%20be%20configured%20for%20authentication%20with%20the%20networking%20service%20endpoint.%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d

  Which makes me think this change, which restarts a compute service
  and sleeps in a stub:
  https://review.opendev.org/#/c/670393/
  might be screwing up concurrently running tests.
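The suspected interference can be reproduced in miniature. The sketch below is not nova code: plain threads stand in for eventlet greenthreads, and all names are hypothetical. It shows how a stub that sleeps for a finite time can wake up during a later test in the same process and leak a side effect into that test's shared state:

```python
import threading
import time

events = []  # shared state, standing in for a fake notifier's list


def stubbed_call(sleep_for):
    # The "crashed" call sleeps for a while, like the 30-second sleep
    # injected by the original test stub...
    time.sleep(sleep_for)
    # ...then resumes and produces a side effect after its test is over.
    events.append('unexpected-notification')


# "Test A" starts the call and finishes without joining the thread.
leaked = threading.Thread(target=stubbed_call, args=(0.05,))
leaked.start()

# "Test B" runs next in the same process, long enough for the sleep to
# expire, and observes state it never produced.
time.sleep(0.1)
print(events)  # ['unexpected-notification']
```

With a sleep of a million seconds instead of 0.05, the leaked call can never wake up before the test process exits, which is exactly the hack the patch applies.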
  Looking at when that test runs and when the failing one runs:

  2019-08-07 23:21:54.157918 | ubuntu-bionic | {4} nova.tests.functional.compute.test_init_host.ComputeManagerInitHostTestCase.test_migrate_disk_and_power_off_crash_finish_revert_migration [4.063814s] ... ok
  2019-08-07 23:25:00.073443 | ubuntu-bionic | {4} nova.tests.functional.notification_sample_tests.test_service.TestServiceUpdateNotificationSampleLatest.test_service_disabled [160.155643s] ... FAILED

  We can see they are on the same worker process ({4}) and run at about
  the same time. Furthermore, we can see that
  TestServiceUpdateNotificationSampleLatest.test_service_disabled
  eventually times out after 160 seconds, and this is in the error
  output:

  2019-08-07 23:24:59,911 ERROR [nova.compute.api] An error occurred
  while updating the COMPUTE_STATUS_DISABLED trait on compute node
  resource providers managed by host host1. The trait will be
  synchronized automatically by the compute service when the
  update_available_resource periodic task runs.
  Traceback (most recent call last):
    File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/api.py", line 5034, in _update_compute_provider_status
      self.rpcapi.set_host_enabled(context, service.host, enabled)
    File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/rpcapi.py", line 996, in set_host_enabled
      return cctxt.call(ctxt, 'set_host_enabled', enabled=enabled)
    File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 181, in call
      transport_options=self.transport_options)
    File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/transport.py", line 129, in _send
      transport_options=transport_options)
    File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_fake.py", line 224, in send
      transport_options)
    File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_fake.py", line 208, in _send
      reply, failure = reply_q.get(timeout=timeout)
    File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/queue.py", line 322, in get
      return waiter.wait()
    File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/queue.py", line 141, in wait
      return get_hub().switch()
    File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
      return self.greenlet.switch()
    File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 350, in run
      self.wait(sleep_time)
    File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/hubs/poll.py", line 77, in wait
      time.sleep(seconds)
    File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/fixtures/_fixtures/timeout.py", line 52, in signal_handler
      raise TimeoutException()
  fixtures._fixtures.timeout.TimeoutException

  So test_migrate_disk_and_power_off_crash_finish_revert_migration is
  probably not cleaning up properly.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839515/+subscriptions