Public bug reported:

Bug 1248563 "Instance deletion is prevented when another component locks up" provided a partial fix, https://review.openstack.org/#/c/55444/, which introduced another problem: subsequent delete requests are ignored.
When doing Tempest 3rd party CI runs, we see instances fail to build (this could be a scheduling/resource problem, a timeout, or something else). They then get stuck in the deleting task_state and are never cleaned up. The patch itself says: "Dealing with delete requests that never got executed is not in scope of this change and will be submitted separately." That is the bug reported here. For example, this is several hours after our Tempest run finished: http://paste.openstack.org/show/74584/

There is also some history after patch 55444 merged. We had a revert of a revert, https://review.openstack.org/#/c/70187/, which was itself reverted later because it was causing race failures in the Hyper-V CI: https://review.openstack.org/#/c/71363/

So there is a lot of half-baked code here. I haven't been able to get a response from Stan on bug 1248563, but it basically boils down to this: the original change 55444 depended on some later changes working, and those were ultimately reverted because of race conditions that broke the gate. I would propose that, at least for icehouse-rc1, we revert the original patch, since it is not a complete solution and it introduces another bug.

** Affects: nova
     Importance: High
         Status: New

** Tags: api icehouse-rc-potential

** Changed in: nova
   Importance: Undecided => High

** Tags added: icehouse-rc-potential

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1299139

Title:
  Instances stuck in deleting task_state never cleaned up

Status in OpenStack Compute (Nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1299139/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp