Reviewed:  https://review.openstack.org/534449
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f84781f246004651e0636f8b6507ee1e48bac6b0
Submitter: Zuul
Branch:    master

commit f84781f246004651e0636f8b6507ee1e48bac6b0
Author: Harald Jensas <hjen...@redhat.com>
Date:   Tue Jan 16 21:15:22 2018 +0100

    Add retry decorator update_segment_host_mapping()

    When multiple agents register at the same time there is a possible
    race condition causing segment host mappings updates to fail.
    StaleDataError raised by SQLAlchemy ORM.

    Adding retry_if_session_inactive() decorator to the method fixes
    the issue. Also serialize the method with lockutils. It takes 25+
    seconds to update segment host mappings for 10 agents with the
    retry decorator alone. With the method serialized the same
    operation completes in less than 1 second. The retry decorator is
    still required for active/active scenarios.

    Closes-Bug: #1743579
    Change-Id: I616457f094d000a4016c610b454be8269d9b4948

** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1743579

Title:
  Concurrent report_state from multiple agents: segment_host_mapping
  fails - StaleDataError

Status in neutron:
  Fix Released

Bug description:
  When multiple host agents rapidly call report_state for the first
  time, we get StaleDataError and
  _update_segment_host_mapping_for_agent does not complete for all
  hosts.

  Attached is a file with logs, as well as a reproducer script and
  instructions on how to set up a devstack environment similar to the
  one I am using.

  To Reproduce:
  -------------
  Run the script with the delay, time.sleep(10), commented out.

  Results:
   * 2x StaleDataError
   * Only 1 attempt to add the host to placement/host-aggregate.

  MariaDB [neutron]> SELECT * FROM segmenthostmappings;
  +--------------------------------------+---------------------------------+
  | segment_id                           | host                            |
  +--------------------------------------+---------------------------------+
  | a974ae4c-1389-4e41-9ab9-820165c26acd | host2                           |
  | a974ae4c-1389-4e41-9ab9-820165c26acd | routed-devstack.lab.example.com |
  | bc626d3d-5503-4875-9db8-e1bcfad35979 | host2                           |
  | bc626d3d-5503-4875-9db8-e1bcfad35979 | routed-devstack.lab.example.com |
  | ec7717dd-8533-464f-a3c8-4ecc7dc08d10 | host2                           |
  | ec7717dd-8533-464f-a3c8-4ecc7dc08d10 | routed-devstack.lab.example.com |
  +--------------------------------------+---------------------------------+

  Conclusions:
   * 2x StaleDataError
   * 1x successful _update_segment_host_mapping after_create.

  *** We should see 3x attempts to add to placement/host-aggregate,
  one for each host agent. ***

  Running the reproducer script with the delay uncommented (No issue):
  --------------------------------------------------------------------
  Run the script with the delay, time.sleep(10), enabled.

  Results:
   * No StaleDataError
   * 3 attempts to add the host to placement/host-aggregate.

  MariaDB [neutron]> SELECT * FROM segmenthostmappings;
  +--------------------------------------+---------------------------------+
  | segment_id                           | host                            |
  +--------------------------------------+---------------------------------+
  | 11b9258f-8712-43b7-8f39-3eab627a8c7f | host0                           |
  | 11b9258f-8712-43b7-8f39-3eab627a8c7f | host1                           |
  | 11b9258f-8712-43b7-8f39-3eab627a8c7f | host2                           |
  | 11b9258f-8712-43b7-8f39-3eab627a8c7f | routed-devstack.lab.example.com |
  | 89f96bee-424c-4ee2-8639-2ca8e07a70e6 | host0                           |
  | 89f96bee-424c-4ee2-8639-2ca8e07a70e6 | host1                           |
  | 89f96bee-424c-4ee2-8639-2ca8e07a70e6 | host2                           |
  | 89f96bee-424c-4ee2-8639-2ca8e07a70e6 | routed-devstack.lab.example.com |
  | a7a7d2f4-c809-4ebb-916f-930c97fbec47 | host0                           |
  | a7a7d2f4-c809-4ebb-916f-930c97fbec47 | host1                           |
  | a7a7d2f4-c809-4ebb-916f-930c97fbec47 | host2                           |
  | a7a7d2f4-c809-4ebb-916f-930c97fbec47 | routed-devstack.lab.example.com |
  +--------------------------------------+---------------------------------+

  Conclusion:
   * 3x successful _update_segment_host_mapping after_create.

  ** NOTE: **
  The "RESP BODY: {"itemNotFound": {"message": "Compute host host1
  could not be found.", "code": 404}}" errors in the logs are
  expected: the fake hosts are not registered in Nova.
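
For readers who want to see the shape of the fix described in the commit message (retry on stale data, plus serializing the update), here is a minimal sketch using only the Python standard library. It is not the Neutron patch: the local StaleDataError class, retry_on_stale(), serialized(), MappingTable and register_agents() are all hypothetical stand-ins invented for illustration. They play the roles that SQLAlchemy's StaleDataError, retry_if_session_inactive() and lockutils play in the real change, and the "table" is just an in-memory set guarded by a version counter.

```python
import functools
import threading
import time


class StaleDataError(Exception):
    """Hypothetical stand-in for the error SQLAlchemy raises on a stale write."""


def retry_on_stale(max_retries=10, delay=0.01):
    """Hypothetical retry decorator: re-run the call if the data went stale."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except StaleDataError:
                    if attempt == max_retries - 1:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator


_mapping_lock = threading.Lock()


def serialized(func):
    """Hypothetical in-process lock decorator: one caller at a time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with _mapping_lock:
            return func(*args, **kwargs)
    return wrapper


class MappingTable:
    """In-memory stand-in for the segmenthostmappings table."""

    def __init__(self):
        self.rows = set()
        self.version = 0
        self._guard = threading.Lock()

    def read(self):
        with self._guard:
            return self.version, set(self.rows)

    def write(self, expected_version, rows):
        # Reject the write if someone else changed the table after our read.
        with self._guard:
            if self.version != expected_version:
                raise StaleDataError("table changed since it was read")
            self.rows = rows
            self.version += 1


TABLE = MappingTable()


# The retry wraps the lock, so every retry attempt re-acquires the lock.
@retry_on_stale()
@serialized
def update_segment_host_mapping(host, segment_ids):
    version, rows = TABLE.read()
    time.sleep(0.001)  # widen the race window between read and write
    rows |= {(segment_id, host) for segment_id in segment_ids}
    TABLE.write(version, rows)


def register_agents(hosts, segment_ids):
    """Simulate several agents reporting state at the same time."""
    threads = [
        threading.Thread(target=update_segment_host_mapping,
                         args=(host, segment_ids))
        for host in hosts
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return TABLE.rows
```

Calling register_agents(["host0", "host1", "host2"], [...]) with three segment IDs should leave one row per (segment, host) pair. With the lock in place the read and write of each caller cannot interleave, so the retry path is not exercised here. A process-local lock cannot coordinate separate server processes, which is consistent with the commit message's remark that the retry decorator is still required for active/active scenarios.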