Public bug reported:

In stress testing of a nova+placement scenario where there is only one nova-compute process (and thus only one resource provider) but more than one thread's worth of nova-scheduler, it is fairly easy to trigger the "Failed scheduler client operation claim_resources: out of retries: Retry" error found near https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L110
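For context, the retry wrapper at that location is roughly of this shape (a paraphrase to illustrate the behavior, not the exact nova code):

    import functools

    class Retry(Exception):
        """Raised when placement rejects an update because the resource
        provider generation has moved on (an allocation conflict)."""

    def retries(f):
        """Retry a scheduler client operation a few times on Retry.

        Note: there is no sleep between attempts, so concurrent
        schedulers tend to re-read the provider and collide on its
        generation again right away.
        """
        @functools.wraps(f)
        def wrapper(self, *args, **kwargs):
            # One initial attempt plus three retries.
            for attempt in range(4):
                try:
                    return f(self, *args, **kwargs)
                except Retry:
                    continue
            # This is where "Failed scheduler client operation ...:
            # out of retries: Retry" gets logged.
            return False
        return wrapper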
(In a quick test on a devstack with a fake compute driver, of 100 separate requests to boot one server, 13 failed for this reason.)

If we imagine 4 threads:

* A is one nova-scheduler
* B is one placement request/response
* C is another nova-scheduler
* D is a different placement request/response

A starts a PUT to /allocations, request B, at the start of which it reads the resource provider and gets a generation, and then for whatever reason waits for a while. Then C starts a PUT to /allocations, request D, reads the same resource provider with the same generation, but actually completes, incrementing the generation before B does. When B gets to increment the generation, it fails because the generation it holds is no longer valid for the increment procedure. This is all working as expected, but it is apparently not ideal for high concurrency with low numbers of compute nodes.

The current retry loop has no sleep() and only counts up to 3 retries. It might make sense for it to do a random sleep before retrying (so as to introduce a bit of jitter into the system), and perhaps to retry more times (see the sketch at the end of this message). Input desired. Thoughts?

Another option, of course, is "don't run with so few compute nodes", but as we can likely expect this kind of stress testing (it was a real-life stress test, which worked fine in older (pre-claims-in-the-scheduler) versions, that exposed this), we may wish to make it happier.

** Affects: nova
   Importance: Undecided
       Status: New

** Tags: placement

https://bugs.launchpad.net/bugs/1770220
Title: report client allocation retry handling insufficient
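As a sketch of the jitter idea above (the names and parameters here are illustrative, not a settled design; it reuses the Retry exception from the earlier sketch):

    import functools
    import random
    import time

    def retries_with_jitter(max_attempts=6, backoff=0.5):
        """Retry on Retry, sleeping a random interval before each retry.

        Random jitter spreads competing schedulers out in time so they
        are less likely to collide on the same resource provider
        generation again, and a higher attempt count gives a busy
        provider more chances to succeed.
        """
        def decorator(f):
            @functools.wraps(f)
            def wrapper(self, *args, **kwargs):
                for attempt in range(max_attempts):
                    if attempt:
                        # Uniform jitter, widening with each failed attempt.
                        time.sleep(random.uniform(0, backoff * attempt))
                    try:
                        return f(self, *args, **kwargs)
                    except Retry:
                        continue
                return False
            return wrapper
        return decorator

With these example values the worst case adds about 0.5 * (1+2+3+4+5) = 7.5 seconds of sleeping across all retries, roughly half that on average, which may or may not be an acceptable trade against scheduling latency.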